* [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

This patch set sorts cache lines by all load operations that hit any
cache level.  In the single cache line view, it shows the load
references for loads with cache hits and with cache misses respectively.

This series is a follow-up to the old patch set "perf c2c: Sort
cacheline with LLC load" [1].  The old patch set tried to sort cache
lines by load operations hitting the last level cache (LLC), but
testing showed that the trace data doesn't contain LLC events if the
platform isn't a NUMA system.  For this reason, this series refines the
implementation to sort on load operations that hit any cache level;
it's reasonable to review the load and store operations this way, since
a cache line that is detected to be accessed by multiple threads is a
candidate for false sharing.
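
For reference, the kind of workload this targets can be sketched as
below.  This is only an illustrative sketch (it is not the
false_sharing_example.c program used for the tests below): two threads
keep writing different fields that fall into the same cache line, so
the line keeps bouncing between the cores' caches.

  /* Minimal false-sharing sketch; build with: gcc -O2 -pthread */
  #include <pthread.h>
  #include <stdio.h>

  struct shared_data {
          volatile long lock_cnt;    /* written by thread A (offset 0x0) */
          volatile long reader_cnt;  /* written by thread B (offset 0x8) */
  };

  static struct shared_data data;    /* both fields share one cache line */

  static void *thread_a(void *arg)
  {
          for (long i = 0; i < 100000000; i++)
                  data.lock_cnt++;
          return arg;
  }

  static void *thread_b(void *arg)
  {
          for (long i = 0; i < 100000000; i++)
                  data.reader_cnt++;
          return arg;
  }

  int main(void)
  {
          pthread_t a, b;

          pthread_create(&a, NULL, thread_a, NULL);
          pthread_create(&b, NULL, thread_b, NULL);
          pthread_join(a, NULL);
          pthread_join(b, NULL);
          printf("%ld %ld\n", data.lock_cnt, data.reader_cnt);
          return 0;
  }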

This patch set applies cleanly on the perf/core branch with the latest
commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
evlist' record methods").  The changes have been tested on x86 and
Arm64; the testing results are shown below.

The testing result on x86:

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
        1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
        2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0     4591        0     2633      616      0x556f25dff100
    -------------------------------------------------------------
            20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
            11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]


The testing result on Arm64 (HiSilicon D06); please note that the Arm
SPE data source patch set has not been merged into the mainline kernel
and a potential issue with store operations is still a work in
progress, so the final output might show minor differences.

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaab4e8b100   N/A       0    35.04%    100447   104933   100447     4486     4486        0        0    11269        0     89178        0         0        0         0         0
        1      0xaaaab4e8af80   N/A       0    17.29%     49571    49571    49571        0        0        0        0    49571        0         0        0         0        0         0         0
        2      0xaaaab4e8afc0   N/A       0    16.72%     47922    47922    47922        0        0        0        0    47922        0         0        0         0        0         0         0
        3      0xaaaab4e8b080   N/A       0     8.94%     25641    67718    25641    42077    42077        0        0     4397        0     21244        0         0        0         0         0
        4      0xaaaab4e7a480   N/A       0     4.42%     12680    12680    12680        0        0        0        0    12680        0         0        0         0        0         0         0
        5      0xffffa2ffc980   N/A       0     2.62%      7511     7511     7511        0        0        0        0     7511        0         0        0         0        0         0         0
        6      0xffffa3ffe980   N/A       0     2.57%      7374     7374     7374        0        0        0        0     7374        0         0        0         0        0         0         0
        7      0xaaaab4e8b000   N/A       0     2.41%      6907     6907     6907        0        0        0        0     6907        0         0        0         0        0         0         0
        8      0xaaaab4e8b0c0   N/A       0     2.30%      6592     6592     6592        0        0        0        0     2822        0      3770        0         0        0         0         0
        9      0xffffa37fd980   N/A       0     2.24%      6408     6408     6408        0        0        0        0     6408        0         0        0         0        0         0         0
       10      0xffffb8d80980   N/A       0     2.18%      6254     6254     6254        0        0        0        0     6254        0         0        0         0        0         0         0
       11      0xffffb9d82980   N/A       0     1.31%      3763     9706     3763     5943     5943        0        0     3763        0         0        0         0        0         0         0
       12      0xffffb9581980   N/A       0     1.22%      3507    11484     3507     7977     7977        0        0     3507        0         0        0         0        0         0         0
       13      0xffffbad84980   N/A       0     0.33%       932     7766      932     6834     6834        0        0      932        0         0        0         0        0         0         0
       14      0xffffba583980   N/A       0     0.24%       700     6503      700     5803     5803        0        0      700        0         0        0         0        0         0         0
  
  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared                                   
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0   100447        0     4486        0      0xaaaab4e8b100
    -------------------------------------------------------------
            15.44%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dd0         0         0         0    15508         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
            14.43%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dd0         0         0         0    14499         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            11.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79db8         0         0         0    11622         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            11.38%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dd0         0         0         0    11429         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
            10.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dd0         0         0         0    10614         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             9.69%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79db8         0         0         0     9731         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             5.74%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79db8         0         0         0     5763         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             4.84%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79db8         0         0         0     4866         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%   14.02%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dbc         0         0         0      629         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    6.44%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79de0         0         0         0      289         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%   12.37%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dbc         0         0         0      555         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             0.00%    0.00%    6.46%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79de0         0         0         0      290         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
             0.00%    0.00%   21.38%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dbc         0         0         0      959         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             0.00%    0.00%    9.61%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79de0         0         0         0      431         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
             0.00%    0.00%   22.14%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dbc         0         0         0      993         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%    7.58%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79de0         0         0         0      340         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             6.66%    0.00%    0.00%    0.00%                0x20   N/A       0    15046    15054:reader_thd      0xaaaab4e79e54         0         0         0     6687         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   2
             3.76%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15052:reader_thd      0xaaaab4e79e80         0         0         0     3774         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   0
             3.54%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15055:reader_thd      0xaaaab4e79e80         0         0         0     3551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   3
             2.39%    0.00%    0.00%    0.00%                0x30   N/A       0    15046    15053:reader_thd      0xaaaab4e79eac         0         0         0     2403         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:163   1


  [...]

Changes from v1:
* Changed from sorting on LLC loads to sorting on all loads with cache
  hits;
* Added patches 06/11 and 07/11 for refactoring macros;
* Added patch 08/11 to refactor the node header, so it can display
  "%loads" rather than "%hitms" in the header;
* Added patch 09/11 to add local pointers for the output metrics string
  and the sort string (Jiri);
* Added a warning in percent_hitm() for the display type "all", which
  should never happen (Jiri).

[1] https://lore.kernel.org/patchwork/cover/1321514/


Leo Yan (11):
  perf c2c: Add dimensions for total load hit
  perf c2c: Add dimensions for load hit
  perf c2c: Add dimensions for load miss
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Refactor display filter macro
  perf c2c: Refactor node display macro
  perf c2c: Refactor node header
  perf c2c: Add local variables for output metrics
  perf c2c: Sort on all cache hit for load operations
  perf c2c: Update documentation for display option 'all'

 tools/perf/Documentation/perf-c2c.txt |  21 +-
 tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
 2 files changed, 487 insertions(+), 82 deletions(-)

-- 
2.17.1


* [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Arm SPE trace data doesn't support HITM, but we still want to use the
"perf c2c" tool to analyze cache false sharing.  Without the HITM tag,
the tool cannot give accurate results for cache false sharing; a
candidate solution is to sort on the total load operations and connect
them with the thread info, e.g. if multiple threads hit the same cache
line many times, this hints that the cache line is likely to cause a
false sharing issue.

Unlike the HITM-based approach, the proposed solution is not accurate
and might introduce false positives, but it's a pragmatic approach for
detecting false sharing if the memory events don't support HITM.

To sort on cache line hits, this patch adds dimensions for the total
load hit count and the associated percentage calculation.
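
As an illustration only (the counter values below are made up; the
field names follow struct c2c_stats as used by the TOT_LD_HIT() macro
in the diff below), the aggregation and percentage can be sketched as
a standalone snippet:

  /* Illustrative sketch: sum the per-level load-hit counters into a
   * "total load hit" value and compute one cache line's share of all
   * load hits, mirroring TOT_LD_HIT() and percent_tot_ld_hit() below.
   */
  #include <stdio.h>

  struct ld_stats {
          unsigned int ld_fbhit, ld_l1hit, ld_l2hit, ld_llchit;
          unsigned int lcl_hitm, rmt_hitm, rmt_hit;
  };

  static unsigned int tot_ld_hit(const struct ld_stats *s)
  {
          return s->ld_fbhit + s->ld_l1hit + s->ld_l2hit + s->ld_llchit +
                 s->lcl_hitm + s->rmt_hitm + s->rmt_hit;
  }

  int main(void)
  {
          struct ld_stats cline = { .ld_fbhit = 849, .ld_l1hit = 2734,
                                    .ld_l2hit = 67, .ld_llchit = 58,
                                    .lcl_hitm = 883 };
          struct ld_stats total = { .ld_fbhit = 1120, .ld_l1hit = 3225,
                                    .ld_l2hit = 100, .ld_llchit = 566,
                                    .lcl_hitm = 981 };
          unsigned int st = tot_ld_hit(&cline);
          unsigned int tot = tot_ld_hit(&total);

          printf("Load Hit Total: %u, Pct: %.2f%%\n",
                 st, tot ? (double)st * 100 / tot : 0.0);
          return 0;
  }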

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c5babeaa3b38..3d5a2dc8b4fd 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return tot_hitm_left - tot_hitm_right;
 }
 
+#define TOT_LD_HIT(stats)		\
+	((stats)->ld_fbhit +		\
+	 (stats)->ld_l1hit +		\
+	 (stats)->ld_l2hit +		\
+	 (stats)->ld_llchit +		\
+	 (stats)->lcl_hitm +		\
+	 (stats)->rmt_hitm +		\
+	 (stats)->rmt_hit)
+
+static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
+			    struct perf_hpp *hpp,
+			    struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	unsigned int tot_hit;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	tot_hit = TOT_LD_HIT(&c2c_he->stats);
+
+	return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_hit);
+}
+
+static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+			      struct hist_entry *left,
+			      struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	uint64_t tot_hit_left;
+	uint64_t tot_hit_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	tot_hit_left  = TOT_LD_HIT(&c2c_left->stats);
+	tot_hit_right = TOT_LD_HIT(&c2c_right->stats);
+
+	return tot_hit_left - tot_hit_right;
+}
+
 #define STAT_FN_ENTRY(__f)					\
 static int							\
 __f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,	\
@@ -860,6 +901,58 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static double percent_tot_ld_hit(struct c2c_hist_entry *c2c_he)
+{
+	struct c2c_hists *hists;
+	int tot = 0, st = 0;
+
+	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+	st  = TOT_LD_HIT(&c2c_he->stats);
+	tot = TOT_LD_HIT(&hists->stats);
+
+	return tot ? (double) st * 100 / tot : 0;
+}
+
+static int
+percent_tot_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			 struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	char buf[10];
+	double per;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	per = percent_tot_ld_hit(c2c_he);
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_tot_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+			 struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_tot_ld_hit);
+}
+
+static int64_t
+percent_tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		   struct hist_entry *left, struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	double per_left;
+	double per_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	per_left  = percent_tot_ld_hit(c2c_left);
+	per_right = percent_tot_ld_hit(c2c_right);
+
+	return per_left - per_right;
+}
+
 static struct c2c_stats *he_stats(struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
@@ -1412,6 +1505,14 @@ static struct c2c_dimension dim_ld_rmthit = {
 	.width		= 8,
 };
 
+static struct c2c_dimension dim_tot_ld_hit = {
+	.header		= HEADER_BOTH("Load Hit", "Total"),
+	.name		= "tot_ld_hit",
+	.cmp		= tot_ld_hit_cmp,
+	.entry		= tot_ld_hit_entry,
+	.width		= 8,
+};
+
 static struct c2c_dimension dim_tot_recs = {
 	.header		= HEADER_BOTH("Total", "records"),
 	.name		= "tot_recs",
@@ -1460,6 +1561,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_tot_ld_hit = {
+	.header         = HEADER_BOTH("Load Hit", "Pct"),
+	.name		= "percent_tot_ld_hit",
+	.cmp		= percent_tot_ld_hit_cmp,
+	.entry		= percent_tot_ld_hit_entry,
+	.color		= percent_tot_ld_hit_color,
+	.width		= 8,
+};
+
 static struct c2c_dimension dim_percent_stores_l1hit = {
 	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
 	.name		= "percent_stores_l1hit",
@@ -1615,11 +1725,13 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_ld_l2hit,
 	&dim_ld_llchit,
 	&dim_ld_rmthit,
+	&dim_tot_ld_hit,
 	&dim_tot_recs,
 	&dim_tot_loads,
 	&dim_percent_hitm,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
+	&dim_percent_tot_ld_hit,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
 	&dim_dram_lcl,
-- 
2.17.1


* [PATCH v2 02/11] perf c2c: Add dimensions for load hit
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Add dimensions for load hits and the associated percentage calculation,
which are to be displayed in the single cache line output.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3d5a2dc8b4fd..00014e3d81fa 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
+{
+	struct c2c_hists *hists;
+	int tot, st;
+
+	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+	st  = TOT_LD_HIT(&c2c_he->stats);
+	tot = TOT_LD_HIT(&hists->stats);
+
+	return percent(st, tot);
+}
+
+static int
+percent_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		     struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	char buf[10];
+	double per;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	per = percent_ld_hit(c2c_he);
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		     struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_ld_hit);
+}
+
+static int64_t
+percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		   struct hist_entry *left, struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	double per_left;
+	double per_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	per_left  = percent_ld_hit(c2c_left);
+	per_right = percent_ld_hit(c2c_right);
+
+	return per_left - per_right;
+}
+
 static int
 percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 			   struct hist_entry *he)
@@ -1417,6 +1469,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_cl_tot_ld_hit = {
+	.header		= HEADER_SPAN("--- Load ---", "Hit", 1),
+	.name		= "cl_tot_ld_hit",
+	.cmp		= tot_ld_hit_cmp,
+	.entry		= tot_ld_hit_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_cl_lcl_hitm = {
 	.header		= HEADER_SPAN_LOW("Lcl"),
 	.name		= "cl_lcl_hitm",
@@ -1570,6 +1630,15 @@ static struct c2c_dimension dim_percent_tot_ld_hit = {
 	.width		= 8,
 };
 
+static struct c2c_dimension dim_percent_ld_hit = {
+	.header		= HEADER_SPAN("--  Load Refs --", "Hit", 1),
+	.name		= "percent_ld_hit",
+	.cmp		= percent_ld_hit_cmp,
+	.entry		= percent_ld_hit_entry,
+	.color		= percent_ld_hit_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_percent_stores_l1hit = {
 	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
 	.name		= "percent_stores_l1hit",
@@ -1715,6 +1784,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_rmt_hitm,
 	&dim_cl_lcl_hitm,
 	&dim_cl_rmt_hitm,
+	&dim_cl_tot_ld_hit,
 	&dim_tot_stores,
 	&dim_stores_l1hit,
 	&dim_stores_l1miss,
@@ -1731,6 +1801,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_percent_hitm,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
+	&dim_percent_ld_hit,
 	&dim_percent_tot_ld_hit,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
-- 
2.17.1


* [PATCH v2 03/11] perf c2c: Add dimensions for load miss
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Add dimensions for load misses and the associated percentage
calculation, which are to be displayed in the single cache line output.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 00014e3d81fa..27745340c14a 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	 (stats)->rmt_hitm +		\
 	 (stats)->rmt_hit)
 
+#define TOT_LD_MISS(stats)		\
+	((stats)->lcl_dram +		\
+	 (stats)->rmt_dram)
+
 static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
 			    struct perf_hpp *hpp,
 			    struct hist_entry *he)
@@ -656,6 +660,38 @@ static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return tot_hit_left - tot_hit_right;
 }
 
+static int tot_ld_miss_entry(struct perf_hpp_fmt *fmt,
+			     struct perf_hpp *hpp,
+			     struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	unsigned int tot_miss;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	tot_miss = TOT_LD_MISS(&c2c_he->stats);
+
+	return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_miss);
+}
+
+static int64_t tot_ld_miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+			       struct hist_entry *left,
+			       struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	uint64_t tot_miss_left;
+	uint64_t tot_miss_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	tot_miss_left  = TOT_LD_MISS(&c2c_left->stats);
+	tot_miss_right = TOT_LD_MISS(&c2c_right->stats);
+
+	return tot_miss_left - tot_miss_right;
+}
+
 #define STAT_FN_ENTRY(__f)					\
 static int							\
 __f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,	\
@@ -1104,6 +1140,58 @@ percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static double percent_ld_miss(struct c2c_hist_entry *c2c_he)
+{
+	struct c2c_hists *hists;
+	int tot, st;
+
+	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+	st  = TOT_LD_MISS(&c2c_he->stats);
+	tot = TOT_LD_MISS(&hists->stats);
+
+	return percent(st, tot);
+}
+
+static int
+percent_ld_miss_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		      struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	char buf[10];
+	double per;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	per = percent_ld_miss(c2c_he);
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_miss_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		      struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_ld_miss);
+}
+
+static int64_t
+percent_ld_miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		    struct hist_entry *left, struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	double per_left;
+	double per_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	per_left  = percent_ld_miss(c2c_left);
+	per_right = percent_ld_miss(c2c_right);
+
+	return per_left - per_right;
+}
+
 static int
 percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 			   struct hist_entry *he)
@@ -1477,6 +1565,14 @@ static struct c2c_dimension dim_cl_tot_ld_hit = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_cl_tot_ld_miss = {
+	.header		= HEADER_SPAN_LOW("Miss"),
+	.name		= "cl_tot_ld_miss",
+	.cmp		= tot_ld_miss_cmp,
+	.entry		= tot_ld_miss_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_cl_lcl_hitm = {
 	.header		= HEADER_SPAN_LOW("Lcl"),
 	.name		= "cl_lcl_hitm",
@@ -1639,6 +1735,15 @@ static struct c2c_dimension dim_percent_ld_hit = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_ld_miss = {
+	.header		= HEADER_SPAN_LOW("Miss"),
+	.name		= "percent_ld_miss",
+	.cmp		= percent_ld_miss_cmp,
+	.entry		= percent_ld_miss_entry,
+	.color		= percent_ld_miss_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_percent_stores_l1hit = {
 	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
 	.name		= "percent_stores_l1hit",
@@ -1785,6 +1890,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_cl_lcl_hitm,
 	&dim_cl_rmt_hitm,
 	&dim_cl_tot_ld_hit,
+	&dim_cl_tot_ld_miss,
 	&dim_tot_stores,
 	&dim_stores_l1hit,
 	&dim_stores_l1miss,
@@ -1802,6 +1908,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
 	&dim_percent_ld_hit,
+	&dim_percent_ld_miss,
 	&dim_percent_tot_ld_hit,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
-- 
2.17.1


* [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

The shared cache line statistics currently rely on HITM.  We can use
more general naming rather than binding only to HITM, so replace
"hitm_stats" with "shared_clines_stats" in struct perf_c2c, and rename
the function resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 27745340c14a..580c4ead68db 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@ struct perf_c2c {
 	bool			 symbol_full;
 	bool			 stitch_lbr;
 
-	/* HITM shared clines stats */
-	struct c2c_stats	hitm_stats;
+	/* Shared clines stats */
+	struct c2c_stats	shared_clines_stats;
 	int			shared_clines;
 
 	int			 display;
@@ -2251,7 +2251,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *c2c_hists;
-	bool display = he__display(he, &c2c.hitm_stats);
+	bool display = he__display(he, &c2c.shared_clines_stats);
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 	c2c_hists = c2c_he->hists;
@@ -2338,14 +2338,14 @@ static int setup_nodes(struct perf_session *session)
 
 #define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
 
-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
 	if (HAS_HITMS(c2c_he)) {
 		c2c.shared_clines++;
-		c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
 
 	return 0;
@@ -2416,7 +2416,7 @@ static void print_c2c__display_stats(FILE *out)
 
 static void print_shared_cacheline_info(FILE *out)
 {
-	struct c2c_stats *stats = &c2c.hitm_stats;
+	struct c2c_stats *stats = &c2c.shared_clines_stats;
 	int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
 
 	fprintf(out, "=================================================\n");
@@ -3117,7 +3117,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
 	hists__collapse_resort(&c2c.hists.hists, NULL);
-	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
 	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
 
 	ui_progress__finish();
-- 
2.17.1


* [PATCH v2 05/11] perf c2c: Refactor hist entry validation
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

This patch contains no functional changes; it only refactors the hist
entry validation for cache line resorting.

It renames the function valid_hitm_or_store() to is_valid_hist_entry()
and changes the return type from int to bool.  In the function, it uses
a switch-case statement instead of ternary operators, which is easier
to extend for more display types.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 580c4ead68db..5cd30c083d6c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2178,16 +2178,32 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 	return he->filtered == 0;
 }
 
-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
-	bool has_hitm;
+	bool has_record = false;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
-	has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
-		   c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
-						c2c_he->stats.rmt_hitm;
-	return has_hitm || c2c_he->stats.store;
+
+	/* It's a valid entry if contains stores */
+	if (c2c_he->stats.store)
+		return true;
+
+	switch (c2c.display) {
+	case DISPLAY_LCL:
+		has_record = !!c2c_he->stats.lcl_hitm;
+		break;
+	case DISPLAY_RMT:
+		has_record = !!c2c_he->stats.rmt_hitm;
+		break;
+	case DISPLAY_TOT:
+		has_record = !!c2c_he->stats.tot_hitm;
+		break;
+	default:
+		break;
+	}
+
+	return has_record;
 }
 
 static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -2241,7 +2257,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
 
 	calc_width(c2c_he);
 
-	if (!valid_hitm_or_store(he))
+	if (!is_valid_hist_entry(he))
 		he->filtered = HIST_FILTER__C2C;
 
 	return 0;
-- 
2.17.1

* [PATCH v2 06/11] perf c2c: Refactor display filter macro
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
the macro FILTER_HITM filters out cache line entries whose overhead is
less than 1%.

This patch refactors the macro FILTER_HITM.  It replaces the old name
with the more general name FILTER_DISPLAY and refines the parameters:
rather than passing a field name of the data structure, it passes the
cache line's statistic value and the sum value.  This is more flexible,
e.g. for extending the sorting to all load hits, which combines
multiple fields from struct c2c_stats.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5cd30c083d6c..f11c3c84bb2b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-#define FILTER_HITM(__h)						\
-	if (stats->__h) {						\
-		ld_dist = ((double)c2c_he->stats.__h / stats->__h);	\
+#define FILTER_DISPLAY(val, sum)					\
+{									\
+	if ((sum)) {							\
+		ld_dist = ((double)(val) / (sum));			\
 		if (ld_dist < DISPLAY_LINE_LIMIT)			\
 			he->filtered = HIST_FILTER__C2C;		\
 	} else {							\
 		he->filtered = HIST_FILTER__C2C;			\
-	}
+	}								\
+}
 
 	switch (c2c.display) {
 	case DISPLAY_LCL:
-		FILTER_HITM(lcl_hitm);
+		FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
 		break;
 	case DISPLAY_RMT:
-		FILTER_HITM(rmt_hitm);
+		FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
 		break;
 	case DISPLAY_TOT:
-		FILTER_HITM(tot_hitm);
+		FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
+		break;
 	default:
 		break;
 	}
-- 
2.17.1


* [PATCH v2 07/11] perf c2c: Refactor node display macro
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

The macro DISPLAY_HITM() calculates the HITM percentage contributed by
every node and is shown in the node info.

This patch refactors the macro and renames it to DISPLAY_METRICS().
The parameters are changed to pass the metric's statistic value and the
sum value; this is more flexible for later extension, e.g. it's easier
to use for metrics which combine multiple fields from struct c2c_stats.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f11c3c84bb2b..50018bfb1089 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 			ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
 			advance_hpp(hpp, ret);
 
-		#define DISPLAY_HITM(__h)						\
-			if (c2c_he->stats.__h> 0) {					\
+		#define DISPLAY_METRICS(val, sum)					\
+		{									\
+			if ((sum) > 0) {						\
 				ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",	\
-						percent(stats->__h, c2c_he->stats.__h));\
+						percent((val), (sum)));			\
 			} else {							\
 				ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");	\
-			}
+			}								\
+		}
 
 			switch (c2c.display) {
 			case DISPLAY_RMT:
-				DISPLAY_HITM(rmt_hitm);
+				DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
 				break;
 			case DISPLAY_LCL:
-				DISPLAY_HITM(lcl_hitm);
+				DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
 				break;
 			case DISPLAY_TOT:
-				DISPLAY_HITM(tot_hitm);
+				DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
+				break;
 			default:
 				break;
 			}
-- 
2.17.1


* [PATCH v2 08/11] perf c2c: Refactor node header
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

The node header array contains 3 items, each used for one of the 3
flavors of node access info.  To extend sorting to all load references
rather than always sticking to HITMs, the second header string
"Node{cpus %hitms %stores}" needs to be adjusted (e.g. changed to
"Node{cpus %loads %stores}").

For this reason, this patch changes the node header array into three
flat variables and uses a switch-case statement in the function
setup_nodes_header(), which makes it easier to alter the header string.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 50018bfb1089..846ee58d6cfb 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1806,12 +1806,6 @@ static struct c2c_dimension dim_dso = {
 	.se		= &sort_dso,
 };
 
-static struct c2c_header header_node[3] = {
-	HEADER_LOW("Node"),
-	HEADER_LOW("Node{cpus %hitms %stores}"),
-	HEADER_LOW("Node{cpu list}"),
-};
-
 static struct c2c_dimension dim_node = {
 	.name		= "node",
 	.cmp		= empty_cmp,
@@ -2293,9 +2287,27 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 	return 0;
 }
 
+static struct c2c_header header_node_0 = HEADER_LOW("Node");
+static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
+
 static void setup_nodes_header(void)
 {
-	dim_node.header = header_node[c2c.node_info];
+	switch (c2c.node_info) {
+	case 0:
+		dim_node.header = header_node_0;
+		break;
+	case 1:
+		dim_node.header = header_node_1;
+		break;
+	case 2:
+		dim_node.header = header_node_2;
+		break;
+	default:
+		break;
+	}
+
+	return;
 }
 
 static int setup_nodes(struct perf_session *session)
-- 
2.17.1


* [PATCH v2 09/11] perf c2c: Add local variables for output metrics
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Different display types might use different metrics for output and
sorting.  So this patch adds several local variables:

  "cl_output": pointer to the output metrics for a single cache line;
  "output_str": pointer to the output metrics for cache lines;
  "sort_str": pointer to the sorting metric.

This is flexible, as the variables can be assigned different strings
based on the specified display type.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 59 ++++++++++++++++++++++------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 846ee58d6cfb..9342c30d86ee 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2500,16 +2500,17 @@ static void print_pareto(FILE *out)
 	struct perf_hpp_list hpp_list;
 	struct rb_node *nd;
 	int ret;
+	const char *cl_output;
+
+	cl_output = "cl_num,"
+		    "cl_rmt_hitm,"
+		    "cl_lcl_hitm,"
+		    "cl_stores_l1hit,"
+		    "cl_stores_l1miss,"
+		    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
-	ret = hpp_list__parse(&hpp_list,
-				"cl_num,"
-				"cl_rmt_hitm,"
-				"cl_lcl_hitm,"
-				"cl_stores_l1hit,"
-				"cl_stores_l1miss,"
-				"dcacheline",
-				NULL);
+	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
 
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
@@ -3053,6 +3054,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_END()
 	};
 	int err = 0;
+	const char *output_str, *sort_str = NULL;
 
 	argc = parse_options(argc, argv, options, report_c2c_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
@@ -3129,24 +3131,29 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_mem2node;
 	}
 
-	c2c_hists__reinit(&c2c.hists,
-			"cl_idx,"
-			"dcacheline,"
-			"dcacheline_node,"
-			"dcacheline_count,"
-			"percent_hitm,"
-			"tot_hitm,lcl_hitm,rmt_hitm,"
-			"tot_recs,"
-			"tot_loads,"
-			"tot_stores,"
-			"stores_l1hit,stores_l1miss,"
-			"ld_fbhit,ld_l1hit,ld_l2hit,"
-			"ld_lclhit,lcl_hitm,"
-			"ld_rmthit,rmt_hitm,"
-			"dram_lcl,dram_rmt",
-			c2c.display == DISPLAY_TOT ? "tot_hitm" :
-			c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
-			);
+	output_str = "cl_idx,"
+		     "dcacheline,"
+		     "dcacheline_node,"
+		     "dcacheline_count,"
+		     "percent_hitm,"
+		     "tot_hitm,lcl_hitm,rmt_hitm,"
+		     "tot_recs,"
+		     "tot_loads,"
+		     "tot_stores,"
+		     "stores_l1hit,stores_l1miss,"
+		     "ld_fbhit,ld_l1hit,ld_l2hit,"
+		     "ld_lclhit,lcl_hitm,"
+		     "ld_rmthit,rmt_hitm,"
+		     "dram_lcl,dram_rmt";
+
+	if (c2c.display == DISPLAY_TOT)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT)
+		sort_str = "rmt_hitm";
+	else if (c2c.display == DISPLAY_LCL)
+		sort_str = "lcl_hitm";
+
+	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
 
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
-- 
2.17.1


* [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Besides the three existing display options 'tot', 'rmt' and 'lcl', this
patch adds the option 'all' so the tool can sort on all cache hits for
load operations.  This newly introduced option can be a choice for
profiling cache false sharing if the memory events don't contain HITM
tags.

When displaying with the option 'all', the "Shared Data Cache Line
Table" and the "Shared Cache Line Distribution Pareto" both differ from
the other three display options.

The "Shared Data Cache Line Table" sorts on the metrics "tot_ld_hit"
and "percent_tot_ld_hit" instead of the HITM metrics.  Without HITM
metrics, users can still analyze the load hit statistics for all cache
levels, so the total load hit dimensions are used to replace the HITM
dimensions.

In the Pareto, every single cache line shows the metrics
"cl_tot_ld_hit" and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and
"percent_lcl_hitm", and the single cache line view is sorted by the
metric "tot_ld_hit".

As a result, we get the 'all' display as follows:

  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
        1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
        2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0     4591        0     2633      616      0x556f25dff100
    -------------------------------------------------------------
            20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
            11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 139 ++++++++++++++++++++++++++++-----------
 1 file changed, 101 insertions(+), 38 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 9342c30d86ee..0df4a4a30f7a 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -113,13 +113,15 @@ enum {
 	DISPLAY_LCL,
 	DISPLAY_RMT,
 	DISPLAY_TOT,
+	DISPLAY_ALL,
 	DISPLAY_MAX,
 };
 
 static const char *display_str[DISPLAY_MAX] = {
-	[DISPLAY_LCL] = "Local",
-	[DISPLAY_RMT] = "Remote",
-	[DISPLAY_TOT] = "Total",
+	[DISPLAY_LCL] = "Local HITMs",
+	[DISPLAY_RMT] = "Remote HITMs",
+	[DISPLAY_TOT] = "Total HITMs",
+	[DISPLAY_ALL] = "All Load Access",
 };
 
 static const struct option c2c_options[] = {
@@ -883,6 +885,11 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
 	case DISPLAY_TOT:
 		st  = stats->tot_hitm;
 		tot = total->tot_hitm;
+		break;
+	case DISPLAY_ALL:
+		ui__warning("Calculate hitm percent for display 'all';\n"
+			    "should never happen!\n");
+		break;
 	default:
 		break;
 	}
@@ -1344,6 +1351,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 			case DISPLAY_TOT:
 				DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
 				break;
+			case DISPLAY_ALL:
+				DISPLAY_METRICS(TOT_LD_HIT(stats),
+						TOT_LD_HIT(&c2c_he->stats));
+				break;
 			default:
 				break;
 			}
@@ -1692,6 +1703,7 @@ static struct c2c_header percent_hitm_header[] = {
 	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
 	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
 	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+	[DISPLAY_ALL] = HEADER_BOTH("LLC", "Hit"),
 };
 
 static struct c2c_dimension dim_percent_hitm = {
@@ -2169,6 +2181,9 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 	case DISPLAY_TOT:
 		FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
 		break;
+	case DISPLAY_ALL:
+		FILTER_DISPLAY(TOT_LD_HIT(&c2c_he->stats), TOT_LD_HIT(stats));
+		break;
 	default:
 		break;
 	}
@@ -2199,6 +2214,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
 	case DISPLAY_TOT:
 		has_record = !!c2c_he->stats.tot_hitm;
 		break;
+	case DISPLAY_ALL:
+		has_record = !!TOT_LD_HIT(&c2c_he->stats);
+		break;
 	default:
 		break;
 	}
@@ -2288,7 +2306,10 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 }
 
 static struct c2c_header header_node_0 = HEADER_LOW("Node");
-static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_hitms_stores =
+		HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_loads_stores =
+		HEADER_LOW("Node{cpus %loads %stores}");
 static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
 
 static void setup_nodes_header(void)
@@ -2298,7 +2319,10 @@ static void setup_nodes_header(void)
 		dim_node.header = header_node_0;
 		break;
 	case 1:
-		dim_node.header = header_node_1;
+		if (c2c.display == DISPLAY_ALL)
+			dim_node.header = header_node_1_loads_stores;
+		else
+			dim_node.header = header_node_1_hitms_stores;
 		break;
 	case 2:
 		dim_node.header = header_node_2;
@@ -2377,11 +2401,13 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-	if (HAS_HITMS(c2c_he)) {
+	if (c2c.display == DISPLAY_ALL && TOT_LD_HIT(&c2c_he->stats)) {
+		c2c.shared_clines++;
+		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
+	} else if (HAS_HITMS(c2c_he)) {
 		c2c.shared_clines++;
 		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
-
 	return 0;
 }
 
@@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
 	int ret;
 	const char *cl_output;
 
-	cl_output = "cl_num,"
-		    "cl_rmt_hitm,"
-		    "cl_lcl_hitm,"
-		    "cl_stores_l1hit,"
-		    "cl_stores_l1miss,"
-		    "dcacheline";
+	if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+	    c2c.display == DISPLAY_RMT)
+		cl_output = "cl_num,"
+			    "cl_rmt_hitm,"
+			    "cl_lcl_hitm,"
+			    "cl_stores_l1hit,"
+			    "cl_stores_l1miss,"
+			    "dcacheline";
+	else /* c2c.display == DISPLAY_ALL */
+		cl_output = "cl_num,"
+			    "cl_tot_ld_hit,"
+			    "cl_tot_ld_miss,"
+			    "cl_stores_l1hit,"
+			    "cl_stores_l1miss,"
+			    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
 	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
@@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
 		fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
 		first = false;
 	}
-	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
+	fprintf(out, "  Cachelines sort on                : %s\n",
 		display_str[c2c.display]);
 	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
 }
@@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
 {
 	scnprintf(bf, size,
 		  "Shared Data Cache Line Table     "
-		  "(%lu entries, sorted on %s HITMs)",
+		  "(%lu entries, sorted on %s)",
 		  browser->nr_non_filtered_entries,
 		  display_str[c2c.display]);
 	return 0;
@@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
 		c2c.display = DISPLAY_RMT;
 	else if (!strcmp(display, "lcl"))
 		c2c.display = DISPLAY_LCL;
+	else if (!strcmp(display, "all"))
+		c2c.display = DISPLAY_ALL;
 	else {
 		pr_err("failed: unknown display type: %s\n", str);
 		return -1;
@@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
 	}
 
 	if (asprintf(&c2c.cl_output,
-		"%s%s%s%s%s%s%s%s%s%s",
+		"%s%s%s%s%s%s%s%s%s%s%s",
 		c2c.use_stdio ? "cl_num_empty," : "",
-		"percent_rmt_hitm,"
-		"percent_lcl_hitm,"
+		c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
+					     "percent_ld_miss," :
+					     "percent_rmt_hitm,"
+					     "percent_lcl_hitm,",
 		"percent_stores_l1hit,"
 		"percent_stores_l1miss,"
 		"offset,offset_node,dcacheline_count,",
@@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 static int setup_coalesce(const char *coalesce, bool no_source)
 {
 	const char *c = coalesce ?: coalesce_default;
+	const char *sort_str = NULL;
 
 	if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
 		return -ENOMEM;
@@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
 	if (build_cl_output(c2c.cl_sort, no_source))
 		return -1;
 
-	if (asprintf(&c2c.cl_resort, "offset,%s",
-		     c2c.display == DISPLAY_TOT ?
-		     "tot_hitm" :
-		     c2c.display == DISPLAY_RMT ?
-		     "rmt_hitm,lcl_hitm" :
-		     "lcl_hitm,rmt_hitm") < 0)
+	if (c2c.display == DISPLAY_TOT)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT)
+		sort_str = "rmt_hitm,lcl_hitm";
+	else if (c2c.display == DISPLAY_LCL)
+		sort_str = "lcl_hitm,rmt_hitm";
+	else if (c2c.display == DISPLAY_ALL)
+		sort_str = "tot_ld_hit";
+
+	if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
 		return -ENOMEM;
 
 	pr_debug("coalesce sort   fields: %s\n", c2c.cl_sort);
@@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_mem2node;
 	}
 
-	output_str = "cl_idx,"
-		     "dcacheline,"
-		     "dcacheline_node,"
-		     "dcacheline_count,"
-		     "percent_hitm,"
-		     "tot_hitm,lcl_hitm,rmt_hitm,"
-		     "tot_recs,"
-		     "tot_loads,"
-		     "tot_stores,"
-		     "stores_l1hit,stores_l1miss,"
-		     "ld_fbhit,ld_l1hit,ld_l2hit,"
-		     "ld_lclhit,lcl_hitm,"
-		     "ld_rmthit,rmt_hitm,"
-		     "dram_lcl,dram_rmt";
+	if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+	    c2c.display == DISPLAY_RMT)
+		output_str = "cl_idx,"
+			     "dcacheline,"
+			     "dcacheline_node,"
+			     "dcacheline_count,"
+			     "percent_hitm,"
+			     "tot_hitm,lcl_hitm,rmt_hitm,"
+			     "tot_recs,"
+			     "tot_loads,"
+			     "tot_stores,"
+			     "stores_l1hit,stores_l1miss,"
+			     "ld_fbhit,ld_l1hit,ld_l2hit,"
+			     "ld_lclhit,lcl_hitm,"
+			     "ld_rmthit,rmt_hitm,"
+			     "dram_lcl,dram_rmt";
+	else /* c2c.display == DISPLAY_ALL */
+		output_str = "cl_idx,"
+			     "dcacheline,"
+			     "dcacheline_node,"
+			     "dcacheline_count,"
+			     "percent_tot_ld_hit,"
+			     "tot_ld_hit,"
+			     "tot_recs,"
+			     "tot_loads,"
+			     "tot_stores,"
+			     "stores_l1hit,stores_l1miss,"
+			     "ld_fbhit,ld_l1hit,ld_l2hit,"
+			     "ld_lclhit,lcl_hitm,"
+			     "ld_rmthit,rmt_hitm,"
+			     "dram_lcl,dram_rmt";
 
 	if (c2c.display == DISPLAY_TOT)
 		sort_str = "tot_hitm";
@@ -3152,6 +3213,8 @@ static int perf_c2c__report(int argc, const char **argv)
 		sort_str = "rmt_hitm";
 	else if (c2c.display == DISPLAY_LCL)
 		sort_str = "lcl_hitm";
+	else if (c2c.display == DISPLAY_ALL)
+		sort_str = "tot_ld_hit";
 
 	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 11/11] perf c2c: Update documentation for display option 'all'
  2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
                   ` (9 preceding siblings ...)
  2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
  2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
  11 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Since the new display option 'all' has been introduced, update the
documentation to reflect it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/Documentation/perf-c2c.txt | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index c81d72e3eecf..da49c3d26316 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS
 
 -d::
 --display::
-	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+	Switch to HITM type (rmt, lcl) or all load cache hit (all) to display
+	and sort on. Total HITMs as default.
 
 --stitch-lbr::
 	Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
   Cacheline
   - cacheline address (hex number)
 
-  Rmt/Lcl Hitm
+  Rmt/Lcl Hitm (For display with HITM types)
   - cacheline percentage of all Remote/Local HITM accesses
 
-  LLC Load Hitm - Total, LclHitm, RmtHitm
+  LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
   - count of Total/Local/Remote load HITMs
 
+  LD Hit Pct (For display 'all')
+  - cacheline percentage of all load hit accesses
+
+  LD Hit Total (For display 'all')
+  - sum of all load hit accesses
+
   Total records
   - sum of all cachelines accesses
 
@@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:
 
 For each offset in the 2) list we display following data:
 
-  HITM - Rmt, Lcl
+  HITM - Rmt, Lcl (For display with HITM types)
   - % of Remote/Local HITM accesses for given offset within cacheline
 
+  Load Refs - Hit, Miss (For display 'all')
+  - % of load accesses that hit/missed cache for given offset within cacheline
+
   Store Refs - L1 Hit, L1 Miss
   - % of store accesses that hit/missed L1 for given offset within cacheline
 
@@ -249,7 +259,8 @@ The 'Node' field displays nodes that accesses given cacheline
 offset. Its output comes in 3 flavors:
   - node IDs separated by ','
   - node IDs with stats for each ID, in following format:
-      Node{cpus %hitms %stores}
+      Node{cpus %hitms %stores} (For display with HITM types)
+      Node{cpus %loads %stores} (For display with "all")
   - node IDs with list of affected CPUs in following format:
       Node{cpu list}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
  2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
                   ` (10 preceding siblings ...)
  2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
@ 2021-01-03 22:52 ` Jiri Olsa
  2021-01-04  2:09   ` Leo Yan
  11 siblings, 1 reply; 31+ messages in thread
From: Jiri Olsa @ 2021-01-03 22:52 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> This patch set is to sort cache line for all load operations which hit
> any cache levels.  For single cache line view, it shows the load
> references for loads with cache hits and with cache misses respectively.
> 
> This series is a following for the old patch set "perf c2c: Sort
> cacheline with LLC load" [1], in the old patch set it tries to sort
> cache line with the load operations in last level cache (LLC), after
> testing we found the trace data doesn't contain LLC events if the
> platform isn't a NUMA system.  For this reason, this series refines the
> implementation to sort on all cache levels hits of load operations; it's
> reasonable for us to review the load and store opreations, if detects
> any cache line is accessed by multi-threads, this hints that the cache
> line is possible for false sharing.
> 
> This patch set is clearly applied on perf/core branch with the latest
> commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> evlist' record methods").  And the changes has been tested on x86 and
> Arm64, the testing result is shown as below.

SNIP

> 
> 
>   [...]
> 
> Changes from v1:
> * Changed from sorting on LLC to sorting on all loads with cache hits;
> * Added patches 06/11, 07/11 for refactoring macros;
> * Added patch 08/11 for refactoring node header, so can display "%loads"
>   rather than "%hitms" in the header;
> * Added patch 09/11 to add local pointers for pointing to output metrics
>   string and sort string (Juri);
> * Added warning in percent_hitm() for the display "all", which should
>   never happen (Juri).
> 
> [1] https://lore.kernel.org/patchwork/cover/1321514/
> 
> 
> Leo Yan (11):
>   perf c2c: Add dimensions for total load hit
>   perf c2c: Add dimensions for load hit
>   perf c2c: Add dimensions for load miss
>   perf c2c: Rename for shared cache line stats
>   perf c2c: Refactor hist entry validation
>   perf c2c: Refactor display filter macro
>   perf c2c: Refactor node display macro
>   perf c2c: Refactor node header
>   perf c2c: Add local variables for output metrics
>   perf c2c: Sort on all cache hit for load operations
>   perf c2c: Update documentation for display option 'all'
> 
>  tools/perf/Documentation/perf-c2c.txt |  21 +-
>  tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
>  2 files changed, 487 insertions(+), 82 deletions(-)

Joe might want to test it first, but it looks all good to me:

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
  2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
@ 2021-01-04  2:09   ` Leo Yan
  2021-01-04  9:35     ` Jiri Olsa
  0 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2021-01-04  2:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Sun, Jan 03, 2021 at 11:52:19PM +0100, Jiri Olsa wrote:
> On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> > This patch set is to sort cache line for all load operations which hit
> > any cache levels.  For single cache line view, it shows the load
> > references for loads with cache hits and with cache misses respectively.
> > 
> > This series is a following for the old patch set "perf c2c: Sort
> > cacheline with LLC load" [1], in the old patch set it tries to sort
> > cache line with the load operations in last level cache (LLC), after
> > testing we found the trace data doesn't contain LLC events if the
> > platform isn't a NUMA system.  For this reason, this series refines the
> > implementation to sort on all cache levels hits of load operations; it's
> > reasonable for us to review the load and store opreations, if detects
> > any cache line is accessed by multi-threads, this hints that the cache
> > line is possible for false sharing.
> > 
> > This patch set is clearly applied on perf/core branch with the latest
> > commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> > evlist' record methods").  And the changes has been tested on x86 and
> > Arm64, the testing result is shown as below.
> 
> SNIP
> 
> > 
> > 
> >   [...]
> > 
> > Changes from v1:
> > * Changed from sorting on LLC to sorting on all loads with cache hits;
> > * Added patches 06/11, 07/11 for refactoring macros;
> > * Added patch 08/11 for refactoring node header, so can display "%loads"
> >   rather than "%hitms" in the header;
> > * Added patch 09/11 to add local pointers for pointing to output metrics
> >   string and sort string (Juri);
> > * Added warning in percent_hitm() for the display "all", which should
> >   never happen (Juri).
> > 
> > [1] https://lore.kernel.org/patchwork/cover/1321514/
> > 
> > 
> > Leo Yan (11):
> >   perf c2c: Add dimensions for total load hit
> >   perf c2c: Add dimensions for load hit
> >   perf c2c: Add dimensions for load miss
> >   perf c2c: Rename for shared cache line stats
> >   perf c2c: Refactor hist entry validation
> >   perf c2c: Refactor display filter macro
> >   perf c2c: Refactor node display macro
> >   perf c2c: Refactor node header
> >   perf c2c: Add local variables for output metrics
> >   perf c2c: Sort on all cache hit for load operations
> >   perf c2c: Update documentation for display option 'all'
> > 
> >  tools/perf/Documentation/perf-c2c.txt |  21 +-
> >  tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
> >  2 files changed, 487 insertions(+), 82 deletions(-)
> 
> Joe might want to test it first, but it looks all good to me:
> 
> Acked-by: Jiri Olsa <jolsa@redhat.com>

Thanks for the review, Jiri.

Note, after testing with Arm SPE, we found the store operations don't
contain the information for L1 cache hit or miss; this leads to there
being no statistics for "st_l1hit" and "st_l1miss".  As a consequence,
the single cache line view can only show the load samples and fails to
show store operations due to the empty statistics for "st_l1hit" and
"st_l1miss".

This is related to a hardware limitation; after some internal
discussion, so far we cannot find an easy way to set the memory flag
for L1 cache hit or miss for store operations (more specifically, to
set the flags PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS for a store's L1
cache access).
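
(For reference, the flags mentioned above come from the perf_mem_data_src
encoding in include/uapi/linux/perf_event.h.  A purely illustrative
sketch of what a store sample would need to carry, assuming a
hypothetical 'l1_hit' result that the hardware does not report today:

	union perf_mem_data_src dsrc = { .val = 0 };

	dsrc.mem_op   = PERF_MEM_OP_STORE;
	dsrc.mem_lvl  = PERF_MEM_LVL_L1;
	/* 'l1_hit' is exactly the information Arm SPE does not provide */
	dsrc.mem_lvl |= l1_hit ? PERF_MEM_LVL_HIT : PERF_MEM_LVL_MISS;

without that bit the "st_l1hit" and "st_l1miss" statistics stay empty.)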

Given the uncertainty around this issue, please hold off on this patch
series and I will resend it once we have a conclusion.

And really sorry for notifying you about this so late.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
  2021-01-04  2:09   ` Leo Yan
@ 2021-01-04  9:35     ` Jiri Olsa
  2021-01-15 15:17       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Olsa @ 2021-01-04  9:35 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Mon, Jan 04, 2021 at 10:09:38AM +0800, Leo Yan wrote:

SNIP

> > > Leo Yan (11):
> > >   perf c2c: Add dimensions for total load hit
> > >   perf c2c: Add dimensions for load hit
> > >   perf c2c: Add dimensions for load miss
> > >   perf c2c: Rename for shared cache line stats
> > >   perf c2c: Refactor hist entry validation
> > >   perf c2c: Refactor display filter macro
> > >   perf c2c: Refactor node display macro
> > >   perf c2c: Refactor node header
> > >   perf c2c: Add local variables for output metrics
> > >   perf c2c: Sort on all cache hit for load operations
> > >   perf c2c: Update documentation for display option 'all'
> > > 
> > >  tools/perf/Documentation/perf-c2c.txt |  21 +-
> > >  tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
> > >  2 files changed, 487 insertions(+), 82 deletions(-)
> > 
> > Joe might want to test it first, but it looks all good to me:
> > 
> > Acked-by: Jiri Olsa <jolsa@redhat.com>
> 
> Thanks for the review, Jiri.
> 
> Note, after testing with Arm SPE, we found the store operations don't
> contain the information for L1 cache hit or miss, this leads to there
> have no statistics for "st_l1hit" and "st_l1miss"; finally the single
> cache line view only can show the load samples and fails to show store
> opreations due to the empty statistics for "st_l1hit" and "st_l1miss".
> 
> This is related the hardware issue, after some discussion internally,
> so far cannot find a easy way to set memory flag for L1 cache hit or
> miss for store operations (more specific, set flags PERF_MEM_LVL_HIT or
> PERF_MEM_LVL_MISS for store's L1 cache accessing).
> 
> Given it is uncertain for this issue, please hold on for this patch
> series and I will resend if have any conclusion.
> 
> And really sorry I notify this too late.

no problem, I think we can take some of the refactoring patches anyway

thanks,
jirka


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
  2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
@ 2021-01-06  7:38   ` Namhyung Kim
  2021-01-11  7:49     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

Hi,

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Arm SPE trace data doesn't support HITM, but we still want to explore
> "perf c2c" tool to analyze cache false sharing.  If without HITM tag,
> the tool cannot give out accurate result for cache false sharing, a
> candidate solution is to sort the total load operations and connect with
> the threads info, e.g. if multiple threads hit the same cache line for
> many times, this can give out the hint that it's likely to cause cache
> false sharing issue.
>
> Unlike having HITM tag, the proposed solution is not accurate and might
> introduce false positive reporting, but it's a pragmatic approach for
> detecting false sharing if memory event doesn't support HITM.
>
> To sort with the cache line hit, this patch adds dimensions for total
> load hit and the associated percentage calculation.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index c5babeaa3b38..3d5a2dc8b4fd 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>         return tot_hitm_left - tot_hitm_right;
>  }
>
> +#define TOT_LD_HIT(stats)              \
> +       ((stats)->ld_fbhit +            \
> +        (stats)->ld_l1hit +            \
> +        (stats)->ld_l2hit +            \
> +        (stats)->ld_llchit +           \
> +        (stats)->lcl_hitm +            \
> +        (stats)->rmt_hitm +            \
> +        (stats)->rmt_hit)

It doesn't need to be a macro, why not use a static inline function?
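
For illustration, a minimal sketch of what that could look like
(untested, the name is just a suggestion), summing the same fields as
the TOT_LD_HIT() macro:

static inline unsigned int total_ld_hit(struct c2c_stats *stats)
{
	/* Loads that hit in any cache level: FB, L1, L2, LLC or a remote cache */
	return stats->ld_fbhit + stats->ld_l1hit + stats->ld_l2hit +
	       stats->ld_llchit + stats->lcl_hitm + stats->rmt_hitm +
	       stats->rmt_hit;
}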

Thanks,
Namhyung


> +
> +static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
> +                           struct perf_hpp *hpp,
> +                           struct hist_entry *he)
> +{
> +       struct c2c_hist_entry *c2c_he;
> +       int width = c2c_width(fmt, hpp, he->hists);
> +       unsigned int tot_hit;
> +
> +       c2c_he = container_of(he, struct c2c_hist_entry, he);
> +       tot_hit = TOT_LD_HIT(&c2c_he->stats);
> +
> +       return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_hit);
> +}
> +
> +static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> +                             struct hist_entry *left,
> +                             struct hist_entry *right)
> +{
> +       struct c2c_hist_entry *c2c_left;
> +       struct c2c_hist_entry *c2c_right;
> +       uint64_t tot_hit_left;
> +       uint64_t tot_hit_right;
> +
> +       c2c_left  = container_of(left, struct c2c_hist_entry, he);
> +       c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> +       tot_hit_left  = TOT_LD_HIT(&c2c_left->stats);
> +       tot_hit_right = TOT_LD_HIT(&c2c_right->stats);
> +
> +       return tot_hit_left - tot_hit_right;
> +}
> +
>  #define STAT_FN_ENTRY(__f)                                     \
>  static int                                                     \
>  __f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,  \
> @@ -860,6 +901,58 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>         return per_left - per_right;
>  }
>
> +static double percent_tot_ld_hit(struct c2c_hist_entry *c2c_he)
> +{
> +       struct c2c_hists *hists;
> +       int tot = 0, st = 0;
> +
> +       hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> +
> +       st  = TOT_LD_HIT(&c2c_he->stats);
> +       tot = TOT_LD_HIT(&hists->stats);
> +
> +       return tot ? (double) st * 100 / tot : 0;
> +}
> +
> +static int
> +percent_tot_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +                        struct hist_entry *he)
> +{
> +       struct c2c_hist_entry *c2c_he;
> +       int width = c2c_width(fmt, hpp, he->hists);
> +       char buf[10];
> +       double per;
> +
> +       c2c_he = container_of(he, struct c2c_hist_entry, he);
> +       per = percent_tot_ld_hit(c2c_he);
> +       return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
> +}
> +
> +static int
> +percent_tot_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +                        struct hist_entry *he)
> +{
> +       return percent_color(fmt, hpp, he, percent_tot_ld_hit);
> +}
> +
> +static int64_t
> +percent_tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> +                  struct hist_entry *left, struct hist_entry *right)
> +{
> +       struct c2c_hist_entry *c2c_left;
> +       struct c2c_hist_entry *c2c_right;
> +       double per_left;
> +       double per_right;
> +
> +       c2c_left  = container_of(left, struct c2c_hist_entry, he);
> +       c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> +       per_left  = percent_tot_ld_hit(c2c_left);
> +       per_right = percent_tot_ld_hit(c2c_right);
> +
> +       return per_left - per_right;
> +}
> +
>  static struct c2c_stats *he_stats(struct hist_entry *he)
>  {
>         struct c2c_hist_entry *c2c_he;
> @@ -1412,6 +1505,14 @@ static struct c2c_dimension dim_ld_rmthit = {
>         .width          = 8,
>  };
>
> +static struct c2c_dimension dim_tot_ld_hit = {
> +       .header         = HEADER_BOTH("Load Hit", "Total"),
> +       .name           = "tot_ld_hit",
> +       .cmp            = tot_ld_hit_cmp,
> +       .entry          = tot_ld_hit_entry,
> +       .width          = 8,
> +};
> +
>  static struct c2c_dimension dim_tot_recs = {
>         .header         = HEADER_BOTH("Total", "records"),
>         .name           = "tot_recs",
> @@ -1460,6 +1561,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
>         .width          = 7,
>  };
>
> +static struct c2c_dimension dim_percent_tot_ld_hit = {
> +       .header         = HEADER_BOTH("Load Hit", "Pct"),
> +       .name           = "percent_tot_ld_hit",
> +       .cmp            = percent_tot_ld_hit_cmp,
> +       .entry          = percent_tot_ld_hit_entry,
> +       .color          = percent_tot_ld_hit_color,
> +       .width          = 8,
> +};
> +
>  static struct c2c_dimension dim_percent_stores_l1hit = {
>         .header         = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
>         .name           = "percent_stores_l1hit",
> @@ -1615,11 +1725,13 @@ static struct c2c_dimension *dimensions[] = {
>         &dim_ld_l2hit,
>         &dim_ld_llchit,
>         &dim_ld_rmthit,
> +       &dim_tot_ld_hit,
>         &dim_tot_recs,
>         &dim_tot_loads,
>         &dim_percent_hitm,
>         &dim_percent_rmt_hitm,
>         &dim_percent_lcl_hitm,
> +       &dim_percent_tot_ld_hit,
>         &dim_percent_stores_l1hit,
>         &dim_percent_stores_l1miss,
>         &dim_dram_lcl,
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 02/11] perf c2c: Add dimensions for load hit
  2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
@ 2021-01-06  7:38   ` Namhyung Kim
  2021-01-11  8:22     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Add dimensions for load hit and its percentage calculation, which is to
> be displayed in the single cache line output.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 3d5a2dc8b4fd..00014e3d81fa 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>         return per_left - per_right;
>  }
>
> +static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
> +{
> +       struct c2c_hists *hists;
> +       int tot, st;
> +
> +       hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> +
> +       st  = TOT_LD_HIT(&c2c_he->stats);
> +       tot = TOT_LD_HIT(&hists->stats);
> +
> +       return percent(st, tot);

It's not clear to me what's different than percent_tot_ld_hit().

Thanks,
Namhyung



> +}
> +
> +static int
> +percent_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +                    struct hist_entry *he)
> +{
> +       struct c2c_hist_entry *c2c_he;
> +       int width = c2c_width(fmt, hpp, he->hists);
> +       char buf[10];
> +       double per;
> +
> +       c2c_he = container_of(he, struct c2c_hist_entry, he);
> +       per = percent_ld_hit(c2c_he);
> +       return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
> +}
> +
> +static int
> +percent_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> +                    struct hist_entry *he)
> +{
> +       return percent_color(fmt, hpp, he, percent_ld_hit);
> +}
> +
> +static int64_t
> +percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> +                  struct hist_entry *left, struct hist_entry *right)
> +{
> +       struct c2c_hist_entry *c2c_left;
> +       struct c2c_hist_entry *c2c_right;
> +       double per_left;
> +       double per_right;
> +
> +       c2c_left  = container_of(left, struct c2c_hist_entry, he);
> +       c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> +       per_left  = percent_ld_hit(c2c_left);
> +       per_right = percent_ld_hit(c2c_right);
> +
> +       return per_left - per_right;
> +}
> +
>  static int
>  percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
>                            struct hist_entry *he)
> @@ -1417,6 +1469,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
>         .width          = 7,
>  };
>
> +static struct c2c_dimension dim_cl_tot_ld_hit = {
> +       .header         = HEADER_SPAN("--- Load ---", "Hit", 1),
> +       .name           = "cl_tot_ld_hit",
> +       .cmp            = tot_ld_hit_cmp,
> +       .entry          = tot_ld_hit_entry,
> +       .width          = 7,
> +};
> +
>  static struct c2c_dimension dim_cl_lcl_hitm = {
>         .header         = HEADER_SPAN_LOW("Lcl"),
>         .name           = "cl_lcl_hitm",
> @@ -1570,6 +1630,15 @@ static struct c2c_dimension dim_percent_tot_ld_hit = {
>         .width          = 8,
>  };
>
> +static struct c2c_dimension dim_percent_ld_hit = {
> +       .header         = HEADER_SPAN("--  Load Refs --", "Hit", 1),
> +       .name           = "percent_ld_hit",
> +       .cmp            = percent_ld_hit_cmp,
> +       .entry          = percent_ld_hit_entry,
> +       .color          = percent_ld_hit_color,
> +       .width          = 7,
> +};
> +
>  static struct c2c_dimension dim_percent_stores_l1hit = {
>         .header         = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
>         .name           = "percent_stores_l1hit",
> @@ -1715,6 +1784,7 @@ static struct c2c_dimension *dimensions[] = {
>         &dim_rmt_hitm,
>         &dim_cl_lcl_hitm,
>         &dim_cl_rmt_hitm,
> +       &dim_cl_tot_ld_hit,
>         &dim_tot_stores,
>         &dim_stores_l1hit,
>         &dim_stores_l1miss,
> @@ -1731,6 +1801,7 @@ static struct c2c_dimension *dimensions[] = {
>         &dim_percent_hitm,
>         &dim_percent_rmt_hitm,
>         &dim_percent_lcl_hitm,
> +       &dim_percent_ld_hit,
>         &dim_percent_tot_ld_hit,
>         &dim_percent_stores_l1hit,
>         &dim_percent_stores_l1miss,
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 03/11] perf c2c: Add dimensions for load miss
  2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
@ 2021-01-06  7:42   ` Namhyung Kim
  2021-01-11  8:41     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:42 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Add dimensions for load miss and its percentage calculation, which is to
> be displayed in the single cache line output.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 107 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 00014e3d81fa..27745340c14a 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>          (stats)->rmt_hitm +            \
>          (stats)->rmt_hit)
>
> +#define TOT_LD_MISS(stats)             \
> +       ((stats)->lcl_dram +            \
> +        (stats)->rmt_dram)
> +

Is this true always?  I'm not sure if there's a case where stores can go to DRAM
directly.. maybe like a kind of uncached accesses.

Also it can be a static function..
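
i.e. something like this sketch (untested, the name is only for
illustration), which would also make it explicit that only loads served
from DRAM are counted as misses here:

static inline unsigned int total_ld_miss(struct c2c_stats *stats)
{
	/* Loads that missed all cache levels and hit local or remote DRAM */
	return stats->lcl_dram + stats->rmt_dram;
}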

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
  2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
@ 2021-01-06  7:44   ` Namhyung Kim
  2021-01-11  8:42     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:44 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> For shared cache line statistics, it relies on HITM.  We can use more
> general naming rather than only binding to HITM, so replace "hitm_stats"
> with "shared_clines_stats" in structure perf_c2c, and rename function
> resort_hitm_cb() to resort_shared_cl_cb().
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 27745340c14a..580c4ead68db 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -97,8 +97,8 @@ struct perf_c2c {
>         bool                     symbol_full;
>         bool                     stitch_lbr;
>
> -       /* HITM shared clines stats */
> -       struct c2c_stats        hitm_stats;
> +       /* Shared clines stats */

Please change it to "Shared cache line stats".

Thanks,
Namhyung


> +       struct c2c_stats        shared_clines_stats;
>         int                     shared_clines;
>
>         int                      display;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 06/11] perf c2c: Refactor display filter macro
  2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
@ 2021-01-06  7:47   ` Namhyung Kim
  2021-01-11  8:43     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> When sort on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
> macro FILTER_HITM is to filter out the cache line entries if its
> overhead is less than 1%.
>
> This patch is to refactor macro FILTER_HITM.  It uses more gernal name
> FILTER_DISPLAY to replace the old name; and refines its parameter,
> rather than passing field name for the data structure, it changes to
> pass the cache line's statistic value and the sum value, this is more
> flexsible, e.g. if consider to extend for sorting on all load hits
> which combines multiple fields from structure c2c_stats.

As it doesn't use field names anymore, I think it's better to change it to
a static function.
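
Maybe something along these lines (completely untested sketch, the
helper name is made up), since the macro's only side effect is setting
he->filtered:

static void set_filtered(struct hist_entry *he, u32 val, u32 sum)
{
	/* Hide entries whose share of 'sum' is below DISPLAY_LINE_LIMIT */
	if (!sum || (double)val / sum < DISPLAY_LINE_LIMIT)
		he->filtered = HIST_FILTER__C2C;
}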

Thanks,
Namhyung


>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 5cd30c083d6c..f11c3c84bb2b 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
>
>         c2c_he = container_of(he, struct c2c_hist_entry, he);
>
> -#define FILTER_HITM(__h)                                               \
> -       if (stats->__h) {                                               \
> -               ld_dist = ((double)c2c_he->stats.__h / stats->__h);     \
> +#define FILTER_DISPLAY(val, sum)                                       \
> +{                                                                      \
> +       if ((sum)) {                                                    \
> +               ld_dist = ((double)(val) / (sum));                      \
>                 if (ld_dist < DISPLAY_LINE_LIMIT)                       \
>                         he->filtered = HIST_FILTER__C2C;                \
>         } else {                                                        \
>                 he->filtered = HIST_FILTER__C2C;                        \
> -       }
> +       }                                                               \
> +}
>
>         switch (c2c.display) {
>         case DISPLAY_LCL:
> -               FILTER_HITM(lcl_hitm);
> +               FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
>                 break;
>         case DISPLAY_RMT:
> -               FILTER_HITM(rmt_hitm);
> +               FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
>                 break;
>         case DISPLAY_TOT:
> -               FILTER_HITM(tot_hitm);
> +               FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
> +               break;
>         default:
>                 break;
>         }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/11] perf c2c: Refactor node display macro
  2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
@ 2021-01-06  7:47   ` Namhyung Kim
  2021-01-11  8:44     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> The macro DISPLAY_HITM() is used to calculate HITM percentage introduced
> by every node and it's shown for the node info.
>
> This patch refactors the macro, it is renamed it as DISPLAY_METRICS().
> And the parameters is changed for passing the metric's statistic value
> and the sum value, this is flexsible for later's extension, e.g. it's
> easier to be used for metrics which combines multiple fields from
> structure c2c_stats.

Same as the previous one.
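
i.e. the same kind of conversion as for the previous patch, roughly
like this (untested, the name is only a suggestion):

static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
{
	/* Print a percentage when there is data, "n/a" otherwise */
	if (sum > 0)
		return scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
				 percent(val, sum));

	return scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");
}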

Thanks,
Namhyung


>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index f11c3c84bb2b..50018bfb1089 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
>                         ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
>                         advance_hpp(hpp, ret);
>
> -               #define DISPLAY_HITM(__h)                                               \
> -                       if (c2c_he->stats.__h> 0) {                                     \
> +               #define DISPLAY_METRICS(val, sum)                                       \
> +               {                                                                       \
> +                       if ((sum) > 0) {                                                \
>                                 ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",        \
> -                                               percent(stats->__h, c2c_he->stats.__h));\
> +                                               percent((val), (sum)));                 \
>                         } else {                                                        \
>                                 ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");    \
> -                       }
> +                       }                                                               \
> +               }
>
>                         switch (c2c.display) {
>                         case DISPLAY_RMT:
> -                               DISPLAY_HITM(rmt_hitm);
> +                               DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
>                                 break;
>                         case DISPLAY_LCL:
> -                               DISPLAY_HITM(lcl_hitm);
> +                               DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
>                                 break;
>                         case DISPLAY_TOT:
> -                               DISPLAY_HITM(tot_hitm);
> +                               DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
> +                               break;
>                         default:
>                                 break;
>                         }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
  2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
@ 2021-01-06  7:52   ` Namhyung Kim
  2021-01-11  8:47     ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06  7:52 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> adds option 'all' so can sort on the all cache hit for load operation.
> This new introduced option can be a choice for profiling cache false
> sharing if the memory event doesn't contain HITM tags.
>
> For displaying with option 'all', the "Shared Data Cache Line Table" and
> "Shared Cache Line Distribution Pareto" both have difference comparing
> to other three display options.
>
> For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
> it sorts with the metrics "tot_ld_hit" and "percent_tot_ld_hit".  If
> without HITM metrics, users can analyze the load hit statistics for all
> cache levels, so the dimensions of total load hit is used to replace
> HITM dimensions.
>
> For Pareto, every single cache line shows the metrics "cl_tot_ld_hit"
> and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and "percent_lcl_hitm",
> and the single cache line view is sorted by metrics "tot_ld_hit".
>
> As result, we can get the 'all' display as follows:
>
>   # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
>
>   [...]
>
>   =================================================
>              Shared Data Cache Line Table
>   =================================================
>   #
>   #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
>         1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
>         2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0
>
>   =================================================
>         Shared Cache Line Distribution Pareto
>   =================================================
>   #
>   #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
>   #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
>   # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
>   #
>     -------------------------------------------------------------
>         0     4591        0     2633      616      0x556f25dff100
>     -------------------------------------------------------------
>             20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>             19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>             18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>             18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>             12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>             11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>
>   [...]
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/builtin-c2c.c | 139 ++++++++++++++++++++++++++++-----------
>  1 file changed, 101 insertions(+), 38 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 9342c30d86ee..0df4a4a30f7a 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
[SNIP]
> @@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
>         int ret;
>         const char *cl_output;
>
> -       cl_output = "cl_num,"
> -                   "cl_rmt_hitm,"
> -                   "cl_lcl_hitm,"
> -                   "cl_stores_l1hit,"
> -                   "cl_stores_l1miss,"
> -                   "dcacheline";
> +       if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> +           c2c.display == DISPLAY_RMT)
> +               cl_output = "cl_num,"
> +                           "cl_rmt_hitm,"
> +                           "cl_lcl_hitm,"
> +                           "cl_stores_l1hit,"
> +                           "cl_stores_l1miss,"
> +                           "dcacheline";
> +       else /* c2c.display == DISPLAY_ALL */
> +               cl_output = "cl_num,"
> +                           "cl_tot_ld_hit,"
> +                           "cl_tot_ld_miss,"
> +                           "cl_stores_l1hit,"
> +                           "cl_stores_l1miss,"
> +                           "dcacheline";

Nit: You can keep the default value as is, and add an if statement
just for the DISPLAY_ALL.
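
i.e. keep the existing assignment unchanged and only override it for
the new display mode, roughly like (untested):

	cl_output = "cl_num,"
		    "cl_rmt_hitm,"
		    "cl_lcl_hitm,"
		    "cl_stores_l1hit,"
		    "cl_stores_l1miss,"
		    "dcacheline";

	if (c2c.display == DISPLAY_ALL)
		cl_output = "cl_num,"
			    "cl_tot_ld_hit,"
			    "cl_tot_ld_miss,"
			    "cl_stores_l1hit,"
			    "cl_stores_l1miss,"
			    "dcacheline";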

>
>         perf_hpp_list__init(&hpp_list);
>         ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> @@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
>                 fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
>                 first = false;
>         }
> -       fprintf(out, "  Cachelines sort on                : %s HITMs\n",
> +       fprintf(out, "  Cachelines sort on                : %s\n",
>                 display_str[c2c.display]);
>         fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
>  }
> @@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
>  {
>         scnprintf(bf, size,
>                   "Shared Data Cache Line Table     "
> -                 "(%lu entries, sorted on %s HITMs)",
> +                 "(%lu entries, sorted on %s)",
>                   browser->nr_non_filtered_entries,
>                   display_str[c2c.display]);
>         return 0;
> @@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
>                 c2c.display = DISPLAY_RMT;
>         else if (!strcmp(display, "lcl"))
>                 c2c.display = DISPLAY_LCL;
> +       else if (!strcmp(display, "all"))
> +               c2c.display = DISPLAY_ALL;
>         else {
>                 pr_err("failed: unknown display type: %s\n", str);
>                 return -1;
> @@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
>         }
>
>         if (asprintf(&c2c.cl_output,
> -               "%s%s%s%s%s%s%s%s%s%s",
> +               "%s%s%s%s%s%s%s%s%s%s%s",
>                 c2c.use_stdio ? "cl_num_empty," : "",
> -               "percent_rmt_hitm,"
> -               "percent_lcl_hitm,"
> +               c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
> +                                            "percent_ld_miss," :
> +                                            "percent_rmt_hitm,"
> +                                            "percent_lcl_hitm,",
>                 "percent_stores_l1hit,"
>                 "percent_stores_l1miss,"
>                 "offset,offset_node,dcacheline_count,",
> @@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
>  static int setup_coalesce(const char *coalesce, bool no_source)
>  {
>         const char *c = coalesce ?: coalesce_default;
> +       const char *sort_str = NULL;
>
>         if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
>                 return -ENOMEM;
> @@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
>         if (build_cl_output(c2c.cl_sort, no_source))
>                 return -1;
>
> -       if (asprintf(&c2c.cl_resort, "offset,%s",
> -                    c2c.display == DISPLAY_TOT ?
> -                    "tot_hitm" :
> -                    c2c.display == DISPLAY_RMT ?
> -                    "rmt_hitm,lcl_hitm" :
> -                    "lcl_hitm,rmt_hitm") < 0)
> +       if (c2c.display == DISPLAY_TOT)
> +               sort_str = "tot_hitm";
> +       else if (c2c.display == DISPLAY_RMT)
> +               sort_str = "rmt_hitm,lcl_hitm";
> +       else if (c2c.display == DISPLAY_LCL)
> +               sort_str = "lcl_hitm,rmt_hitm";
> +       else if (c2c.display == DISPLAY_ALL)
> +               sort_str = "tot_ld_hit";
> +
> +       if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
>                 return -ENOMEM;
>
>         pr_debug("coalesce sort   fields: %s\n", c2c.cl_sort);
> @@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
>                 goto out_mem2node;
>         }
>
> -       output_str = "cl_idx,"
> -                    "dcacheline,"
> -                    "dcacheline_node,"
> -                    "dcacheline_count,"
> -                    "percent_hitm,"
> -                    "tot_hitm,lcl_hitm,rmt_hitm,"
> -                    "tot_recs,"
> -                    "tot_loads,"
> -                    "tot_stores,"
> -                    "stores_l1hit,stores_l1miss,"
> -                    "ld_fbhit,ld_l1hit,ld_l2hit,"
> -                    "ld_lclhit,lcl_hitm,"
> -                    "ld_rmthit,rmt_hitm,"
> -                    "dram_lcl,dram_rmt";
> +       if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> +           c2c.display == DISPLAY_RMT)
> +               output_str = "cl_idx,"
> +                            "dcacheline,"
> +                            "dcacheline_node,"
> +                            "dcacheline_count,"
> +                            "percent_hitm,"
> +                            "tot_hitm,lcl_hitm,rmt_hitm,"
> +                            "tot_recs,"
> +                            "tot_loads,"
> +                            "tot_stores,"
> +                            "stores_l1hit,stores_l1miss,"
> +                            "ld_fbhit,ld_l1hit,ld_l2hit,"
> +                            "ld_lclhit,lcl_hitm,"
> +                            "ld_rmthit,rmt_hitm,"
> +                            "dram_lcl,dram_rmt";
> +       else /* c2c.display == DISPLAY_ALL */
> +               output_str = "cl_idx,"
> +                            "dcacheline,"
> +                            "dcacheline_node,"
> +                            "dcacheline_count,"
> +                            "percent_tot_ld_hit,"
> +                            "tot_ld_hit,"
> +                            "tot_recs,"
> +                            "tot_loads,"
> +                            "tot_stores,"
> +                            "stores_l1hit,stores_l1miss,"
> +                            "ld_fbhit,ld_l1hit,ld_l2hit,"
> +                            "ld_lclhit,lcl_hitm,"
> +                            "ld_rmthit,rmt_hitm,"
> +                            "dram_lcl,dram_rmt";

Ditto.

Thanks,
Namhyung


>
>         if (c2c.display == DISPLAY_TOT)
>                 sort_str = "tot_hitm";
> @@ -3152,6 +3213,8 @@ static int perf_c2c__report(int argc, const char **argv)
>                 sort_str = "rmt_hitm";
>         else if (c2c.display == DISPLAY_LCL)
>                 sort_str = "lcl_hitm";
> +       else if (c2c.display == DISPLAY_ALL)
> +               sort_str = "tot_ld_hit";
>
>         c2c_hists__reinit(&c2c.hists, output_str, sort_str);
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
  2021-01-06  7:38   ` Namhyung Kim
@ 2021-01-11  7:49     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  7:49 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

Hi Namhyung,

On Wed, Jan 06, 2021 at 04:38:01PM +0900, Namhyung Kim wrote:
> Hi,
> 
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Arm SPE trace data doesn't support HITM, but we still want to use the
> > "perf c2c" tool to analyze cache false sharing.  Without the HITM tag,
> > the tool cannot give an accurate result for cache false sharing; a
> > candidate solution is to sort the total load operations and connect
> > them with the thread info, e.g. if multiple threads hit the same cache
> > line many times, this gives a hint that the cache line is likely to
> > cause a cache false sharing issue.
> >
> > Unlike having the HITM tag, the proposed solution is not accurate and
> > might introduce false positive reporting, but it's a pragmatic approach
> > for detecting false sharing if the memory event doesn't support HITM.
> >
> > To support sorting on cache line hits, this patch adds dimensions for
> > the total load hit and the associated percentage calculation.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 112 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index c5babeaa3b38..3d5a2dc8b4fd 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> >         return tot_hitm_left - tot_hitm_right;
> >  }
> >
> > +#define TOT_LD_HIT(stats)              \
> > +       ((stats)->ld_fbhit +            \
> > +        (stats)->ld_l1hit +            \
> > +        (stats)->ld_l2hit +            \
> > +        (stats)->ld_llchit +           \
> > +        (stats)->lcl_hitm +            \
> > +        (stats)->rmt_hitm +            \
> > +        (stats)->rmt_hit)
> 
> It doesn't need to be a macro, why not use a static inline function?

Yes, will change to use static inline function.
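
A minimal sketch of that conversion (just a sketch; it assumes the same
c2c_stats fields as the quoted macro, and the return type is only
illustrative) could be:

  static inline int total_ld_hit(struct c2c_stats *stats)
  {
          /* Sum of loads that hit at any level of the cache hierarchy. */
          return stats->ld_fbhit +
                 stats->ld_l1hit +
                 stats->ld_l2hit +
                 stats->ld_llchit +
                 stats->lcl_hitm +
                 stats->rmt_hitm +
                 stats->rmt_hit;
  }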

As explained to Jiri, this patch series is mainly for Arm SPE, but so
far we have a known issue with store operations, so they cannot be
shown properly in the single cache line view of the perf c2c tool.  For
this reason, I will first send the refactoring patches in the next
version; your comments on patches 01, 02, 03, 10 and 11 will be
addressed if those are upstreamed later.

Thanks a lot for the review,
Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 02/11] perf c2c: Add dimensions for load hit
  2021-01-06  7:38   ` Namhyung Kim
@ 2021-01-11  8:22     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:38:19PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Add dimensions for load hit and its percentage calculation, which is to
> > be displayed in the single cache line output.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 71 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 3d5a2dc8b4fd..00014e3d81fa 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> >         return per_left - per_right;
> >  }
> >
> > +static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
> > +{
> > +       struct c2c_hists *hists;
> > +       int tot, st;
> > +
> > +       hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> > +
> > +       st  = TOT_LD_HIT(&c2c_he->stats);
> > +       tot = TOT_LD_HIT(&hists->stats);
> > +
> > +       return percent(st, tot);
> 
> It's not clear to me what's different than percent_tot_ld_hit().

Yes, the two functions are used for different views (percent_tot_ld_hit()
is for the brief one-line summary view, percent_ld_hit() is used for the
Pareto table), but I think you are suggesting consolidating them into a
single function.

Will refine this in the next version.
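
One possible consolidation (a sketch only; percent() and TOT_LD_HIT()
are the helpers from the quoted code, and the function name is made up
for illustration) is a single routine that both views call with the
right (entry, total) pair of stats:

  static double ld_hit_percent(struct c2c_stats *entry,
                               struct c2c_stats *total)
  {
          /* Share of load hits this entry contributes to the total. */
          return percent(TOT_LD_HIT(entry), TOT_LD_HIT(total));
  }

The Pareto path would pass the entry's stats and its parent hists'
stats, while the summary path would pass the per-line stats and the
global stats.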

Thanks,
Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 03/11] perf c2c: Add dimensions for load miss
  2021-01-06  7:42   ` Namhyung Kim
@ 2021-01-11  8:41     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:42:59PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Add dimensions for load miss and its percentage calculation, which is to
> > be displayed in the single cache line output.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 107 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 00014e3d81fa..27745340c14a 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> >          (stats)->rmt_hitm +            \
> >          (stats)->rmt_hit)
> >
> > +#define TOT_LD_MISS(stats)             \
> > +       ((stats)->lcl_dram +            \
> > +        (stats)->rmt_dram)
> > +
> 
> Is this always true?  I'm not sure if there's a case where stores can go to DRAM
> directly... maybe a kind of uncached access.

Firstly, from my understanding, uncached accesses are not accounted in
the "load miss" metrics.  You can see there are two other metrics,
"stats->st_uncache" and "stats->ld_uncache", which represent uncached
stores and uncached loads respectively.

Furthermore, the function total_records() uses the formula below to
account for the total load accesses:

        lclmiss  = stats->lcl_dram +
                   stats->rmt_dram +
                   stats->rmt_hitm +
                   stats->rmt_hit;

        ldcnt    = lclmiss +
                   stats->ld_fbhit +
                   stats->ld_l1hit +
                   stats->ld_l2hit +
                   stats->ld_llchit +
                   stats->lcl_hitm;

So this patch series classifies these metrics into two numbers: load hit
and load miss.  A load hit is a hit at any cache level (L1, L2, LLC and
even a remote cache), and a load miss is accounted from the remaining
metrics (lcl_dram + rmt_dram).

> Also it can be a static function..

Yeah, will fix.
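
A sketch of the static-function form for the miss side (again assuming
the c2c_stats fields from the quoted macro; the return type is only
illustrative) could be:

  static inline int total_ld_miss(struct c2c_stats *stats)
  {
          /* Loads satisfied from local or remote DRAM, i.e. cache misses. */
          return stats->lcl_dram + stats->rmt_dram;
  }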

Thanks,
Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
  2021-01-06  7:44   ` Namhyung Kim
@ 2021-01-11  8:42     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:42 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:44:34PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > The shared cache line statistics rely on HITM.  We can use more general
> > naming rather than binding only to HITM, so replace "hitm_stats" with
> > "shared_clines_stats" in the structure perf_c2c, and rename the function
> > resort_hitm_cb() to resort_shared_cl_cb().
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 27745340c14a..580c4ead68db 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -97,8 +97,8 @@ struct perf_c2c {
> >         bool                     symbol_full;
> >         bool                     stitch_lbr;
> >
> > -       /* HITM shared clines stats */
> > -       struct c2c_stats        hitm_stats;
> > +       /* Shared clines stats */
> 
> Please change it to "Shared cache line stats".

Will do.  Thanks!

> Thanks,
> Namhyung
> 
> 
> > +       struct c2c_stats        shared_clines_stats;
> >         int                     shared_clines;
> >
> >         int                      display;

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 06/11] perf c2c: Refactor display filter macro
  2021-01-06  7:47   ` Namhyung Kim
@ 2021-01-11  8:43     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:43 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:47:00PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
> > the macro FILTER_HITM filters out cache line entries whose overhead is
> > less than 1%.
> >
> > This patch refactors the macro FILTER_HITM.  It replaces the old name
> > with the more general name FILTER_DISPLAY and refines its parameters:
> > rather than passing a field name of the data structure, it passes the
> > cache line's statistic value and the sum value.  This is more flexible,
> > e.g. when extending to sort on all load hits, which combines multiple
> > fields from the structure c2c_stats.
> 
> As it doesn't use field names anymore, I think it's better to change it to
> a static function.

Agreed.  Will do.
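
A static-function form of the filter (a sketch only, intended to
preserve the quoted macro's behaviour; DISPLAY_LINE_LIMIT,
HIST_FILTER__C2C and the u32 type are the existing definitions in the
file) might look like:

  static bool filter_display(u32 val, u32 sum)
  {
          /* Filter the entry out when its share is below the display limit. */
          if (sum == 0 || ((double)val / sum) < DISPLAY_LINE_LIMIT)
                  return true;

          return false;
  }

with a caller doing, e.g.:

  case DISPLAY_LCL:
          if (filter_display(c2c_he->stats.lcl_hitm, stats->lcl_hitm))
                  he->filtered = HIST_FILTER__C2C;
          break;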

Thanks,
Leo

> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 5cd30c083d6c..f11c3c84bb2b 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
> >
> >         c2c_he = container_of(he, struct c2c_hist_entry, he);
> >
> > -#define FILTER_HITM(__h)                                               \
> > -       if (stats->__h) {                                               \
> > -               ld_dist = ((double)c2c_he->stats.__h / stats->__h);     \
> > +#define FILTER_DISPLAY(val, sum)                                       \
> > +{                                                                      \
> > +       if ((sum)) {                                                    \
> > +               ld_dist = ((double)(val) / (sum));                      \
> >                 if (ld_dist < DISPLAY_LINE_LIMIT)                       \
> >                         he->filtered = HIST_FILTER__C2C;                \
> >         } else {                                                        \
> >                 he->filtered = HIST_FILTER__C2C;                        \
> > -       }
> > +       }                                                               \
> > +}
> >
> >         switch (c2c.display) {
> >         case DISPLAY_LCL:
> > -               FILTER_HITM(lcl_hitm);
> > +               FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
> >                 break;
> >         case DISPLAY_RMT:
> > -               FILTER_HITM(rmt_hitm);
> > +               FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
> >                 break;
> >         case DISPLAY_TOT:
> > -               FILTER_HITM(tot_hitm);
> > +               FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
> > +               break;
> >         default:
> >                 break;
> >         }
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 07/11] perf c2c: Refactor node display macro
  2021-01-06  7:47   ` Namhyung Kim
@ 2021-01-11  8:44     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:47:49PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > The macro DISPLAY_HITM() is used to calculate the HITM percentage
> > contributed by every node, which is shown in the node info.
> >
> > This patch refactors the macro and renames it to DISPLAY_METRICS().
> > The parameters are changed to pass the metric's statistic value and
> > the sum value; this is more flexible for later extension, e.g. it can
> > more easily be used for metrics which combine multiple fields from
> > the structure c2c_stats.
> 
> Same as the previous one.

Will do, thanks.
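
A possible static-function version (only a sketch, keeping the quoted
macro's output format; struct perf_hpp, percent() and u32 are the
existing types and helpers) could be:

  static int display_metrics(struct perf_hpp *hpp, u32 val, u32 sum)
  {
          int ret;

          /* Print the node's share, or "n/a" when there is nothing to divide by. */
          if (sum > 0)
                  ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",
                                  percent(val, sum));
          else
                  ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");

          return ret;
  }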

> Thanks,
> Namhyung
> 
> 
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/builtin-c2c.c | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index f11c3c84bb2b..50018bfb1089 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
> >                         ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
> >                         advance_hpp(hpp, ret);
> >
> > -               #define DISPLAY_HITM(__h)                                               \
> > -                       if (c2c_he->stats.__h> 0) {                                     \
> > +               #define DISPLAY_METRICS(val, sum)                                       \
> > +               {                                                                       \
> > +                       if ((sum) > 0) {                                                \
> >                                 ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",        \
> > -                                               percent(stats->__h, c2c_he->stats.__h));\
> > +                                               percent((val), (sum)));                 \
> >                         } else {                                                        \
> >                                 ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a");    \
> > -                       }
> > +                       }                                                               \
> > +               }
> >
> >                         switch (c2c.display) {
> >                         case DISPLAY_RMT:
> > -                               DISPLAY_HITM(rmt_hitm);
> > +                               DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
> >                                 break;
> >                         case DISPLAY_LCL:
> > -                               DISPLAY_HITM(lcl_hitm);
> > +                               DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
> >                                 break;
> >                         case DISPLAY_TOT:
> > -                               DISPLAY_HITM(tot_hitm);
> > +                               DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
> > +                               break;
> >                         default:
> >                                 break;
> >                         }
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
  2021-01-06  7:52   ` Namhyung Kim
@ 2021-01-11  8:47     ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11  8:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Wed, Jan 06, 2021 at 04:52:15PM +0900, Namhyung Kim wrote:

[...]

> > @@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
> >         int ret;
> >         const char *cl_output;
> >
> > -       cl_output = "cl_num,"
> > -                   "cl_rmt_hitm,"
> > -                   "cl_lcl_hitm,"
> > -                   "cl_stores_l1hit,"
> > -                   "cl_stores_l1miss,"
> > -                   "dcacheline";
> > +       if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> > +           c2c.display == DISPLAY_RMT)
> > +               cl_output = "cl_num,"
> > +                           "cl_rmt_hitm,"
> > +                           "cl_lcl_hitm,"
> > +                           "cl_stores_l1hit,"
> > +                           "cl_stores_l1miss,"
> > +                           "dcacheline";
> > +       else /* c2c.display == DISPLAY_ALL */
> > +               cl_output = "cl_num,"
> > +                           "cl_tot_ld_hit,"
> > +                           "cl_tot_ld_miss,"
> > +                           "cl_stores_l1hit,"
> > +                           "cl_stores_l1miss,"
> > +                           "dcacheline";
> 
> Nit: You can keep the default value as is, and add an if statement
> just for the DISPLAY_ALL.

Okay, that might be friendlier for review.  I will change it that
way.

> >
> >         perf_hpp_list__init(&hpp_list);
> >         ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> > @@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> >                 fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
> >                 first = false;
> >         }
> > -       fprintf(out, "  Cachelines sort on                : %s HITMs\n",
> > +       fprintf(out, "  Cachelines sort on                : %s\n",
> >                 display_str[c2c.display]);
> >         fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
> >  }
> > @@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> >  {
> >         scnprintf(bf, size,
> >                   "Shared Data Cache Line Table     "
> > -                 "(%lu entries, sorted on %s HITMs)",
> > +                 "(%lu entries, sorted on %s)",
> >                   browser->nr_non_filtered_entries,
> >                   display_str[c2c.display]);
> >         return 0;
> > @@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
> >                 c2c.display = DISPLAY_RMT;
> >         else if (!strcmp(display, "lcl"))
> >                 c2c.display = DISPLAY_LCL;
> > +       else if (!strcmp(display, "all"))
> > +               c2c.display = DISPLAY_ALL;
> >         else {
> >                 pr_err("failed: unknown display type: %s\n", str);
> >                 return -1;
> > @@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
> >         }
> >
> >         if (asprintf(&c2c.cl_output,
> > -               "%s%s%s%s%s%s%s%s%s%s",
> > +               "%s%s%s%s%s%s%s%s%s%s%s",
> >                 c2c.use_stdio ? "cl_num_empty," : "",
> > -               "percent_rmt_hitm,"
> > -               "percent_lcl_hitm,"
> > +               c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
> > +                                            "percent_ld_miss," :
> > +                                            "percent_rmt_hitm,"
> > +                                            "percent_lcl_hitm,",
> >                 "percent_stores_l1hit,"
> >                 "percent_stores_l1miss,"
> >                 "offset,offset_node,dcacheline_count,",
> > @@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
> >  static int setup_coalesce(const char *coalesce, bool no_source)
> >  {
> >         const char *c = coalesce ?: coalesce_default;
> > +       const char *sort_str = NULL;
> >
> >         if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
> >                 return -ENOMEM;
> > @@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
> >         if (build_cl_output(c2c.cl_sort, no_source))
> >                 return -1;
> >
> > -       if (asprintf(&c2c.cl_resort, "offset,%s",
> > -                    c2c.display == DISPLAY_TOT ?
> > -                    "tot_hitm" :
> > -                    c2c.display == DISPLAY_RMT ?
> > -                    "rmt_hitm,lcl_hitm" :
> > -                    "lcl_hitm,rmt_hitm") < 0)
> > +       if (c2c.display == DISPLAY_TOT)
> > +               sort_str = "tot_hitm";
> > +       else if (c2c.display == DISPLAY_RMT)
> > +               sort_str = "rmt_hitm,lcl_hitm";
> > +       else if (c2c.display == DISPLAY_LCL)
> > +               sort_str = "lcl_hitm,rmt_hitm";
> > +       else if (c2c.display == DISPLAY_ALL)
> > +               sort_str = "tot_ld_hit";
> > +
> > +       if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
> >                 return -ENOMEM;
> >
> >         pr_debug("coalesce sort   fields: %s\n", c2c.cl_sort);
> > @@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
> >                 goto out_mem2node;
> >         }
> >
> > -       output_str = "cl_idx,"
> > -                    "dcacheline,"
> > -                    "dcacheline_node,"
> > -                    "dcacheline_count,"
> > -                    "percent_hitm,"
> > -                    "tot_hitm,lcl_hitm,rmt_hitm,"
> > -                    "tot_recs,"
> > -                    "tot_loads,"
> > -                    "tot_stores,"
> > -                    "stores_l1hit,stores_l1miss,"
> > -                    "ld_fbhit,ld_l1hit,ld_l2hit,"
> > -                    "ld_lclhit,lcl_hitm,"
> > -                    "ld_rmthit,rmt_hitm,"
> > -                    "dram_lcl,dram_rmt";
> > +       if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> > +           c2c.display == DISPLAY_RMT)
> > +               output_str = "cl_idx,"
> > +                            "dcacheline,"
> > +                            "dcacheline_node,"
> > +                            "dcacheline_count,"
> > +                            "percent_hitm,"
> > +                            "tot_hitm,lcl_hitm,rmt_hitm,"
> > +                            "tot_recs,"
> > +                            "tot_loads,"
> > +                            "tot_stores,"
> > +                            "stores_l1hit,stores_l1miss,"
> > +                            "ld_fbhit,ld_l1hit,ld_l2hit,"
> > +                            "ld_lclhit,lcl_hitm,"
> > +                            "ld_rmthit,rmt_hitm,"
> > +                            "dram_lcl,dram_rmt";
> > +       else /* c2c.display == DISPLAY_ALL */
> > +               output_str = "cl_idx,"
> > +                            "dcacheline,"
> > +                            "dcacheline_node,"
> > +                            "dcacheline_count,"
> > +                            "percent_tot_ld_hit,"
> > +                            "tot_ld_hit,"
> > +                            "tot_recs,"
> > +                            "tot_loads,"
> > +                            "tot_stores,"
> > +                            "stores_l1hit,stores_l1miss,"
> > +                            "ld_fbhit,ld_l1hit,ld_l2hit,"
> > +                            "ld_lclhit,lcl_hitm,"
> > +                            "ld_rmthit,rmt_hitm,"
> > +                            "dram_lcl,dram_rmt";
> 
> Ditto.

Agreed.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
  2021-01-04  9:35     ` Jiri Olsa
@ 2021-01-15 15:17       ` Arnaldo Carvalho de Melo
  2021-01-16  0:45         ` Leo Yan
  0 siblings, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-15 15:17 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Leo Yan, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

Em Mon, Jan 04, 2021 at 10:35:40AM +0100, Jiri Olsa escreveu:
> On Mon, Jan 04, 2021 at 10:09:38AM +0800, Leo Yan wrote:
> 
> SNIP
> 
> > > > Leo Yan (11):
> > > >   perf c2c: Add dimensions for total load hit
> > > >   perf c2c: Add dimensions for load hit
> > > >   perf c2c: Add dimensions for load miss
> > > >   perf c2c: Rename for shared cache line stats
> > > >   perf c2c: Refactor hist entry validation
> > > >   perf c2c: Refactor display filter macro
> > > >   perf c2c: Refactor node display macro
> > > >   perf c2c: Refactor node header
> > > >   perf c2c: Add local variables for output metrics
> > > >   perf c2c: Sort on all cache hit for load operations
> > > >   perf c2c: Update documentation for display option 'all'
> > > > 
> > > >  tools/perf/Documentation/perf-c2c.txt |  21 +-
> > > >  tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
> > > >  2 files changed, 487 insertions(+), 82 deletions(-)
> > > 
> > > Joe might want to test it first, but it looks all good to me:
> > > 
> > > Acked-by: Jiri Olsa <jolsa@redhat.com>
> > 
> > Thanks for the review, Jiri.
> > 
> > Note, after testing with Arm SPE, we found that store operations don't
> > contain the information for L1 cache hit or miss; this means there are
> > no statistics for "st_l1hit" and "st_l1miss", so the single cache line
> > view can only show the load samples and fails to show store operations
> > due to the empty "st_l1hit" and "st_l1miss" statistics.
> > 
> > This is related to a hardware issue; after some internal discussion, so
> > far we cannot find an easy way to set the memory flag for L1 cache hit
> > or miss for store operations (more specifically, to set the
> > PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS flag for a store's L1 cache
> > access).
> > 
> > Given the uncertainty around this issue, please hold off on this patch
> > series and I will resend it if we reach any conclusion.
> > 
> > And I'm really sorry for notifying you about this so late.
> 
> no problem, I think we can take some of the refactoring patches anyway

Agreed, in fact I already processed this series in my local branch and
I'm test building everything now.

- Arnaldo
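
For reference, the store-side limitation quoted above comes down to
whether a synthesized store sample can carry the L1 hit/miss bits in
perf's data_src encoding.  A hedged sketch of what that would look like
(using the uapi perf_mem_data_src definitions; the helper name is made
up for illustration) is:

  #include <stdbool.h>
  #include <linux/perf_event.h>

  static __u64 store_l1_data_src(bool hit)
  {
          union perf_mem_data_src dsrc = { .val = 0 };

          /* Mark the access as a store resolved at the L1 level ... */
          dsrc.mem_op  = PERF_MEM_OP_STORE;
          /* ... and record whether it hit or missed there. */
          dsrc.mem_lvl = PERF_MEM_LVL_L1 |
                         (hit ? PERF_MEM_LVL_HIT : PERF_MEM_LVL_MISS);

          return dsrc.val;
  }

Arm SPE currently cannot tell the decoder which of the two bits to set
for stores, which is why st_l1hit/st_l1miss stay empty.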

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
  2021-01-15 15:17       ` Arnaldo Carvalho de Melo
@ 2021-01-16  0:45         ` Leo Yan
  0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-16  0:45 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Namhyung Kim, Andi Kleen, Ian Rogers,
	Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
	James Clark, linux-kernel

On Fri, Jan 15, 2021 at 12:17:01PM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > > Thanks for the review, Jiri.
> > > 
> > > Note, after testing with Arm SPE, we found that store operations don't
> > > contain the information for L1 cache hit or miss; this means there are
> > > no statistics for "st_l1hit" and "st_l1miss", so the single cache line
> > > view can only show the load samples and fails to show store operations
> > > due to the empty "st_l1hit" and "st_l1miss" statistics.
> > > 
> > > This is related to a hardware issue; after some internal discussion, so
> > > far we cannot find an easy way to set the memory flag for L1 cache hit
> > > or miss for store operations (more specifically, to set the
> > > PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS flag for a store's L1 cache
> > > access).
> > > 
> > > Given the uncertainty around this issue, please hold off on this patch
> > > series and I will resend it if we reach any conclusion.
> > > 
> > > And I'm really sorry for notifying you about this so late.
> > 
> > no problem, I think we can take some of the refactoring patches anyway
> 
> Agreed, in fact I already processed this series in my local branch and
> I'm test building everything now.

Thanks a lot, Arnaldo.

Just a reminder in case you missed the latest series due to the flood of
emails: I have extracted and refined the refactoring patches and sent
out patch series v4 [1] for picking up (and have received ACK tags from
Namhyung and Jiri).

Leo

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2021-01-16  0:46 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
2021-01-06  7:38   ` Namhyung Kim
2021-01-11  7:49     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
2021-01-06  7:38   ` Namhyung Kim
2021-01-11  8:22     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
2021-01-06  7:42   ` Namhyung Kim
2021-01-11  8:41     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
2021-01-06  7:44   ` Namhyung Kim
2021-01-11  8:42     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
2021-01-06  7:47   ` Namhyung Kim
2021-01-11  8:43     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
2021-01-06  7:47   ` Namhyung Kim
2021-01-11  8:44     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
2021-01-06  7:52   ` Namhyung Kim
2021-01-11  8:47     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
2021-01-04  2:09   ` Leo Yan
2021-01-04  9:35     ` Jiri Olsa
2021-01-15 15:17       ` Arnaldo Carvalho de Melo
2021-01-16  0:45         ` Leo Yan
