All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load
@ 2020-10-15 14:50 Leo Yan
  2020-10-15 14:50 ` [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit Leo Yan
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
rely on HITM display to report cache false sharing.  Alternatively, we
can use the LLC access and multi-threads info to locate the potential
false sharing's data address, and if we connect with source code and
analyze the multi-threads' execution timing, if can conclude load and
store the same cache line at the meantime, thus this can be helpful for
resolve the cache false sharing issue.

This patch set is to enable the display with sorting on LLC load
accesses; it adds dimensions for total LLC hit and LLC load accesses,
and these dimensions are used for shared cache line table and pareto.

This patch set is dependend on the patch set "perf c2c: Refine the
organization of metrics" [1].

[1] https://lore.kernel.org/patchwork/cover/1321499/

With this patch set, we can get display 'llc' as follows:

  # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
        1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
        2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
        3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
        4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
        5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
        6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0      143      505     2582      691      0x563b01e83100
    -------------------------------------------------------------
            96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
             0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]


Leo Yan (8):
  perf mem: Add structure field c2c_stats::tot_llchit
  perf c2c: Add dimensions for total LLC hit
  perf c2c: Add dimensions for LLC load hit
  perf c2c: Change to general naming for macros
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Add option '-d llc' for sorting with LLC load
  perf c2c: Update documentation for display option 'llc'

 tools/perf/Documentation/perf-c2c.txt |  18 +-
 tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
 tools/perf/util/mem-events.c          |   3 +
 tools/perf/util/mem-events.h          |   1 +
 4 files changed, 286 insertions(+), 69 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 2/8] perf c2c: Add dimensions for total LLC hit Leo Yan
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Add new field c2c_stats::tot_llchit to count total number for LLC hit:

  c2c_stats::tot_llchit = c2c_stats::lcl_hitm + c2c_stats::ld_llchit

This is the preparation for additional sorting with total LLC hit, and
will be used in perf c2c report in following patches.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/mem-events.c | 3 +++
 tools/perf/util/mem-events.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index ea0af0bc4314..0ad27bef0698 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -371,6 +371,8 @@ do {				\
 					HITM_INC(lcl_hitm);
 				else
 					stats->ld_llchit++;
+
+				stats->tot_llchit++;
 			}
 
 			if (lvl & P(LVL, LOC_RAM)) {
@@ -455,6 +457,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 	stats->ld_fbhit		+= add->ld_fbhit;
 	stats->ld_l1hit		+= add->ld_l1hit;
 	stats->ld_l2hit		+= add->ld_l2hit;
+	stats->tot_llchit	+= add->tot_llchit;
 	stats->ld_llchit	+= add->ld_llchit;
 	stats->lcl_hitm		+= add->lcl_hitm;
 	stats->rmt_hitm		+= add->rmt_hitm;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index 904dad34f7f7..a1fa1c312ddb 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -68,6 +68,7 @@ struct c2c_stats {
 	u32	ld_fbhit;            /* count of loads hitting Fill Buffer */
 	u32	ld_l1hit;            /* count of loads that hit L1D */
 	u32	ld_l2hit;            /* count of loads that hit L2D */
+	u32	tot_llchit;          /* count of all loads that hit LLC */
 	u32	ld_llchit;           /* count of loads that hit LLC */
 	u32	lcl_hitm;            /* count of loads with local HITM  */
 	u32	rmt_hitm;            /* count of loads with remote HITM */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 2/8] perf c2c: Add dimensions for total LLC hit
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
  2020-10-15 14:50 ` [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 3/8] perf c2c: Add dimensions for LLC load hit Leo Yan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Since Arm64 SPE trace data doesn't support HITM, we still want to
explore "perf c2c" tool to analyze cache false sharing.  If without HITM
tag, the tool cannot give out accurate result for cache false sharing,
a candidate solution is to sort the LLC hit and connect with the info of
multiple threads, e.g. if multiple threads hit the LLC on the same
cacheline for many times, this hints that it's likely to cause false
sharing issue.  Though this solution is not perfect due to lacking HITM
tag, it's pragmatic for detecting false sharing.

To support the sorting with LLC hit, this patch adds dimensions for
total LLC hit and the associated percentage calculation.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 76 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 9c2183957c50..5fb77fcd3c9c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -651,6 +651,7 @@ STAT_FN(ld_l1hit)
 STAT_FN(ld_l2hit)
 STAT_FN(ld_llchit)
 STAT_FN(rmt_hit)
+STAT_FN(tot_llchit)
 
 static uint64_t total_records(struct c2c_stats *stats)
 {
@@ -856,6 +857,62 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static double percent_llchit(struct c2c_hist_entry *c2c_he)
+{
+	struct c2c_hists *hists;
+	struct c2c_stats *stats;
+	struct c2c_stats *total;
+	int tot = 0, st = 0;
+
+	hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+	stats = &c2c_he->stats;
+	total = &hists->stats;
+
+	st  = stats->tot_llchit;
+	tot = total->tot_llchit;
+
+	return tot ? (double) st * 100 / tot : 0;
+}
+
+static int
+percent_llchit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		     struct hist_entry *he)
+{
+	struct c2c_hist_entry *c2c_he;
+	int width = c2c_width(fmt, hpp, he->hists);
+	char buf[10];
+	double per;
+
+	c2c_he = container_of(he, struct c2c_hist_entry, he);
+	per = percent_llchit(c2c_he);
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_llchit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		     struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_llchit);
+}
+
+static int64_t
+percent_llchit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		   struct hist_entry *left, struct hist_entry *right)
+{
+	struct c2c_hist_entry *c2c_left;
+	struct c2c_hist_entry *c2c_right;
+	double per_left;
+	double per_right;
+
+	c2c_left  = container_of(left, struct c2c_hist_entry, he);
+	c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+	per_left  = percent_llchit(c2c_left);
+	per_right = percent_llchit(c2c_right);
+
+	return per_left - per_right;
+}
+
 static struct c2c_stats *he_stats(struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
@@ -1392,6 +1449,14 @@ static struct c2c_dimension dim_ld_l2hit = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_tot_llchit = {
+	.header		= HEADER_BOTH("LLC Hit", "Total"),
+	.name		= "tot_llchit",
+	.cmp		= tot_llchit_cmp,
+	.entry		= tot_llchit_entry,
+	.width		= 8,
+};
+
 static struct c2c_dimension dim_ld_llchit = {
 	.header		= HEADER_SPAN("- LLC Load Hit --", "LclHit", 1),
 	.name		= "ld_lclhit",
@@ -1438,6 +1503,15 @@ static struct c2c_dimension dim_percent_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_llchit = {
+	.header         = HEADER_BOTH("LLC Hit", "Pct"),
+	.name		= "percent_llchit",
+	.cmp		= percent_llchit_cmp,
+	.entry		= percent_llchit_entry,
+	.color		= percent_llchit_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_percent_rmt_hitm = {
 	.header		= HEADER_SPAN("----- HITM -----", "RmtHitm", 1),
 	.name		= "percent_rmt_hitm",
@@ -1611,9 +1685,11 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_ld_l2hit,
 	&dim_ld_llchit,
 	&dim_ld_rmthit,
+	&dim_tot_llchit,
 	&dim_tot_recs,
 	&dim_tot_loads,
 	&dim_percent_hitm,
+	&dim_percent_llchit,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
 	&dim_percent_stores_l1hit,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 3/8] perf c2c: Add dimensions for LLC load hit
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
  2020-10-15 14:50 ` [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit Leo Yan
  2020-10-15 14:50 ` [PATCH v1 2/8] perf c2c: Add dimensions for total LLC hit Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 4/8] perf c2c: Change to general naming for macros Leo Yan
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Add dimensions for LLC load hit and its associated percentage
calculation, which is to be displayed in the single cache line output.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 51 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5fb77fcd3c9c..3f6271a779c4 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -947,6 +947,7 @@ static double percent_ ## __f(struct c2c_hist_entry *c2c_he)			\
 
 PERCENT_FN(rmt_hitm)
 PERCENT_FN(lcl_hitm)
+PERCENT_FN(ld_llchit)
 PERCENT_FN(st_l1hit)
 PERCENT_FN(st_l1miss)
 
@@ -1012,6 +1013,37 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 	return per_left - per_right;
 }
 
+static int
+percent_llc_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		       struct hist_entry *he)
+{
+	int width = c2c_width(fmt, hpp, he->hists);
+	double per = PERCENT(he, ld_llchit);
+	char buf[10];
+
+	return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_llc_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+		      struct hist_entry *he)
+{
+	return percent_color(fmt, hpp, he, percent_ld_llchit);
+}
+
+static int64_t
+percent_llc_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+		    struct hist_entry *left, struct hist_entry *right)
+{
+	double per_left;
+	double per_right;
+
+	per_left  = PERCENT(left, ld_llchit);
+	per_right = PERCENT(right, ld_llchit);
+
+	return per_left - per_right;
+}
+
 static int
 percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
 			   struct hist_entry *he)
@@ -1377,6 +1409,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_cl_llc_hit = {
+	.header		= HEADER_SPAN("--- LLC Load ---", "LclHit", 1),
+	.name		= "cl_llc_hit",
+	.cmp		= ld_llchit_cmp,
+	.entry		= ld_llchit_entry,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_cl_lcl_hitm = {
 	.header		= HEADER_SPAN_LOW("Lcl"),
 	.name		= "cl_lcl_hitm",
@@ -1530,6 +1570,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
 	.width		= 7,
 };
 
+static struct c2c_dimension dim_percent_llc_hit = {
+	.header		= HEADER_SPAN("---- LLC LD ----", "LclHit", 1),
+	.name		= "percent_llc_hit",
+	.cmp		= percent_llc_hit_cmp,
+	.entry		= percent_llc_hit_entry,
+	.color		= percent_llc_hit_color,
+	.width		= 7,
+};
+
 static struct c2c_dimension dim_percent_stores_l1hit = {
 	.header		= HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
 	.name		= "percent_stores_l1hit",
@@ -1673,6 +1722,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_tot_hitm,
 	&dim_lcl_hitm,
 	&dim_rmt_hitm,
+	&dim_cl_llc_hit,
 	&dim_cl_lcl_hitm,
 	&dim_cl_rmt_hitm,
 	&dim_tot_stores,
@@ -1692,6 +1742,7 @@ static struct c2c_dimension *dimensions[] = {
 	&dim_percent_llchit,
 	&dim_percent_rmt_hitm,
 	&dim_percent_lcl_hitm,
+	&dim_percent_llc_hit,
 	&dim_percent_stores_l1hit,
 	&dim_percent_stores_l1miss,
 	&dim_dram_lcl,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 4/8] perf c2c: Change to general naming for macros
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (2 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 3/8] perf c2c: Add dimensions for LLC load hit Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 5/8] perf c2c: Rename for shared cache line stats Leo Yan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

The display and filter macros are named with suffix "_HITM", it is bound
to HITM metrics and isn't general if we want to sort cache lines with
other metrics (like LLC load hit) rather than HITM metrics.

This patch changes to use more general naming for macros:

  s/DISPLAY_HITM/DISPLAY_ENTRY
  s/FILTER_HITM/FILTER_ENTRY

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3f6271a779c4..59f141adea3e 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1176,7 +1176,7 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 			ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
 			advance_hpp(hpp, ret);
 
-		#define DISPLAY_HITM(__h)						\
+		#define DISPLAY_ENTRY(__h)						\
 			if (c2c_he->stats.__h> 0) {					\
 				ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ",	\
 						percent(stats->__h, c2c_he->stats.__h));\
@@ -1186,18 +1186,18 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 
 			switch (c2c.display) {
 			case DISPLAY_RMT:
-				DISPLAY_HITM(rmt_hitm);
+				DISPLAY_ENTRY(rmt_hitm);
 				break;
 			case DISPLAY_LCL:
-				DISPLAY_HITM(lcl_hitm);
+				DISPLAY_ENTRY(lcl_hitm);
 				break;
 			case DISPLAY_TOT:
-				DISPLAY_HITM(tot_hitm);
+				DISPLAY_ENTRY(tot_hitm);
 			default:
 				break;
 			}
 
-		#undef DISPLAY_HITM
+		#undef DISPLAY_ENTRY
 
 			advance_hpp(hpp, ret);
 
@@ -1984,7 +1984,7 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-#define FILTER_HITM(__h)						\
+#define FILTER_ENTRY(__h)						\
 	if (stats->__h) {						\
 		ld_dist = ((double)c2c_he->stats.__h / stats->__h);	\
 		if (ld_dist < DISPLAY_LINE_LIMIT)			\
@@ -1995,18 +1995,18 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 
 	switch (c2c.display) {
 	case DISPLAY_LCL:
-		FILTER_HITM(lcl_hitm);
+		FILTER_ENTRY(lcl_hitm);
 		break;
 	case DISPLAY_RMT:
-		FILTER_HITM(rmt_hitm);
+		FILTER_ENTRY(rmt_hitm);
 		break;
 	case DISPLAY_TOT:
-		FILTER_HITM(tot_hitm);
+		FILTER_ENTRY(tot_hitm);
 	default:
 		break;
 	}
 
-#undef FILTER_HITM
+#undef FILTER_ENTRY
 
 	return he->filtered == 0;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 5/8] perf c2c: Rename for shared cache line stats
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (3 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 4/8] perf c2c: Change to general naming for macros Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 6/8] perf c2c: Refactor hist entry validation Leo Yan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

In the structure perf_c2c for shared cache line stats, replace
"hitm_stats" with "shared_clines_stats", and rename function
resort_hitm_cb() to resort_shared_cl_cb().

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 59f141adea3e..7962cc70b855 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@ struct perf_c2c {
 	bool			 symbol_full;
 	bool			 stitch_lbr;
 
-	/* HITM shared clines stats */
-	struct c2c_stats	hitm_stats;
+	/* Shared clines stats */
+	struct c2c_stats	shared_clines_stats;
 	int			shared_clines;
 
 	int			 display;
@@ -2084,7 +2084,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	struct c2c_hists *c2c_hists;
-	bool display = he__display(he, &c2c.hitm_stats);
+	bool display = he__display(he, &c2c.shared_clines_stats);
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 	c2c_hists = c2c_he->hists;
@@ -2171,14 +2171,14 @@ static int setup_nodes(struct perf_session *session)
 
 #define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
 
-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
 	if (HAS_HITMS(c2c_he)) {
 		c2c.shared_clines++;
-		c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
 
 	return 0;
@@ -2249,7 +2249,7 @@ static void print_c2c__display_stats(FILE *out)
 
 static void print_shared_cacheline_info(FILE *out)
 {
-	struct c2c_stats *stats = &c2c.hitm_stats;
+	struct c2c_stats *stats = &c2c.shared_clines_stats;
 	int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
 
 	fprintf(out, "=================================================\n");
@@ -2942,7 +2942,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
 	hists__collapse_resort(&c2c.hists.hists, NULL);
-	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+	hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
 	hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
 
 	ui_progress__finish();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 6/8] perf c2c: Refactor hist entry validation
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (4 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 5/8] perf c2c: Rename for shared cache line stats Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-15 14:50 ` [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load Leo Yan
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

This patch has no any functionality change but only refactors hist entry
validation for cacheline resorting.

It renames function "valid_hitm_or_store()" to "is_valid_hist_entry()",
changes return type from integer type to bool type.  In the function,
it uses switch-case instead of ternary operators for better readability.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 7962cc70b855..d15a6220bfd0 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2011,16 +2011,32 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 	return he->filtered == 0;
 }
 
-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
 {
 	struct c2c_hist_entry *c2c_he;
-	bool has_hitm;
+	bool has_record = false;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
-	has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
-		   c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
-						c2c_he->stats.rmt_hitm;
-	return has_hitm || c2c_he->stats.store;
+
+	/* It's a valid entry if contains stores */
+	if (c2c_he->stats.store)
+		return true;
+
+	switch (c2c.display) {
+	case DISPLAY_LCL:
+		has_record = !!c2c_he->stats.lcl_hitm;
+		break;
+	case DISPLAY_RMT:
+		has_record = !!c2c_he->stats.rmt_hitm;
+		break;
+	case DISPLAY_TOT:
+		has_record = !!c2c_he->stats.tot_hitm;
+		break;
+	default:
+		break;
+	}
+
+	return has_record;
 }
 
 static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -2074,7 +2090,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
 
 	calc_width(c2c_he);
 
-	if (!valid_hitm_or_store(he))
+	if (!is_valid_hist_entry(he))
 		he->filtered = HIST_FILTER__C2C;
 
 	return 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (5 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 6/8] perf c2c: Refactor hist entry validation Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-20  7:25   ` Jiri Olsa
  2020-10-20  7:26   ` Jiri Olsa
  2020-10-15 14:50 ` [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc' Leo Yan
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Except the existed three display options 'tot', 'rmt', 'lcl', this patch
adds option 'llc' so that can sort on LLC load metrics.  The new
introduced option can work as a choice if the memory event doesn't
contain HITM tags.

For the display with option 'llc', both the "Shared Data Cache Line
Table" and "Shared Cache Line Distribution Pareto" have difference
comparing to other three display options.

For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
it sorts with the LLC hit metrics "tot_llchit".  In this case, users
should be interested in LLC related statistics, so the dimensions of
total LLC hit is used to replace HITM related dimensions.

For Pareto, every single cache line shows the metrics "cl_llc_hit"
instead of "cl_rmt_hitm", and the single cache line view is sorted by
metrics "tot_llchit".

As result, we can get the 'llc' display as follows:

  # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
        1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
        2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
        3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
        4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
        5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
        6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0      143      505     2582      691      0x563b01e83100
    -------------------------------------------------------------
            96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
             0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/builtin-c2c.c | 142 ++++++++++++++++++++++++++++-----------
 1 file changed, 101 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index d15a6220bfd0..0147c23ff918 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -113,13 +113,15 @@ enum {
 	DISPLAY_LCL,
 	DISPLAY_RMT,
 	DISPLAY_TOT,
+	DISPLAY_LLC,
 	DISPLAY_MAX,
 };
 
 static const char *display_str[DISPLAY_MAX] = {
-	[DISPLAY_LCL] = "Local",
-	[DISPLAY_RMT] = "Remote",
-	[DISPLAY_TOT] = "Total",
+	[DISPLAY_LCL] = "Local HITMs",
+	[DISPLAY_RMT] = "Remote HITMs",
+	[DISPLAY_TOT] = "Total HITMs",
+	[DISPLAY_LLC] = "LLC Access",
 };
 
 static const struct option c2c_options[] = {
@@ -1193,6 +1195,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 				break;
 			case DISPLAY_TOT:
 				DISPLAY_ENTRY(tot_hitm);
+				break;
+			case DISPLAY_LLC:
+				DISPLAY_ENTRY(tot_llchit);
+				break;
 			default:
 				break;
 			}
@@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
 	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
 	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
 	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+	[DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
 };
 
 static struct c2c_dimension dim_percent_hitm = {
@@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 		break;
 	case DISPLAY_TOT:
 		FILTER_ENTRY(tot_hitm);
+		break;
+	case DISPLAY_LLC:
+		FILTER_ENTRY(tot_llchit);
+		break;
 	default:
 		break;
 	}
@@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
 	case DISPLAY_TOT:
 		has_record = !!c2c_he->stats.tot_hitm;
 		break;
+	case DISPLAY_LLC:
+		has_record = !!c2c_he->stats.tot_llchit;
+		break;
 	default:
 		break;
 	}
@@ -2186,17 +2200,20 @@ static int setup_nodes(struct perf_session *session)
 }
 
 #define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
+#define HAS_LLCHIT(__h) ((__h)->stats.tot_llchit)
 
 static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
 {
 	struct c2c_hist_entry *c2c_he;
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
 
-	if (HAS_HITMS(c2c_he)) {
+	if (c2c.display == DISPLAY_LLC && HAS_LLCHIT(c2c_he)) {
+		c2c.shared_clines++;
+		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
+	} else if (HAS_HITMS(c2c_he)) {
 		c2c.shared_clines++;
 		c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
 	}
-
 	return 0;
 }
 
@@ -2315,16 +2332,26 @@ static void print_pareto(FILE *out)
 	struct perf_hpp_list hpp_list;
 	struct rb_node *nd;
 	int ret;
+	const char *cl_output;
+
+	if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+	    c2c.display == DISPLAY_RMT)
+		cl_output = "cl_num,"
+			    "cl_rmt_hitm,"
+			    "cl_lcl_hitm,"
+			    "cl_stores_l1hit,"
+			    "cl_stores_l1miss,"
+			    "dcacheline";
+	else /* c2c.display == DISPLAY_LLC */
+		cl_output = "cl_num,"
+			    "cl_llc_hit,"
+			    "cl_lcl_hitm,"
+			    "cl_stores_l1hit,"
+			    "cl_stores_l1miss,"
+			    "dcacheline";
 
 	perf_hpp_list__init(&hpp_list);
-	ret = hpp_list__parse(&hpp_list,
-				"cl_num,"
-				"cl_rmt_hitm,"
-				"cl_lcl_hitm,"
-				"cl_stores_l1hit,"
-				"cl_stores_l1miss,"
-				"dcacheline",
-				NULL);
+	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
 
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
@@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
 		fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
 		first = false;
 	}
-	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
+	fprintf(out, "  Cachelines sort on                : %s\n",
 		display_str[c2c.display]);
 	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
 }
@@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
 {
 	scnprintf(bf, size,
 		  "Shared Data Cache Line Table     "
-		  "(%lu entries, sorted on %s HITMs)",
+		  "(%lu entries, sorted on %s)",
 		  browser->nr_non_filtered_entries,
 		  display_str[c2c.display]);
 	return 0;
@@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
 		c2c.display = DISPLAY_RMT;
 	else if (!strcmp(display, "lcl"))
 		c2c.display = DISPLAY_LCL;
+	else if (!strcmp(display, "llc"))
+		c2c.display = DISPLAY_LLC;
 	else {
 		pr_err("failed: unknown display type: %s\n", str);
 		return -1;
@@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
 	}
 
 	if (asprintf(&c2c.cl_output,
-		"%s%s%s%s%s%s%s%s%s%s",
+		"%s%s%s%s%s%s%s%s%s%s%s",
 		c2c.use_stdio ? "cl_num_empty," : "",
-		"percent_rmt_hitm,"
+		c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
+					     "percent_rmt_hitm,",
 		"percent_lcl_hitm,"
 		"percent_stores_l1hit,"
 		"percent_stores_l1miss,"
@@ -2798,6 +2828,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
 static int setup_coalesce(const char *coalesce, bool no_source)
 {
 	const char *c = coalesce ?: coalesce_default;
+	const char *sort_str;
 
 	if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
 		return -ENOMEM;
@@ -2805,12 +2836,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
 	if (build_cl_output(c2c.cl_sort, no_source))
 		return -1;
 
-	if (asprintf(&c2c.cl_resort, "offset,%s",
-		     c2c.display == DISPLAY_TOT ?
-		     "tot_hitm" :
-		     c2c.display == DISPLAY_RMT ?
-		     "rmt_hitm,lcl_hitm" :
-		     "lcl_hitm,rmt_hitm") < 0)
+	if (c2c.display == DISPLAY_TOT)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT)
+		sort_str = "rmt_hitm,lcl_hitm";
+	else if (c2c.display == DISPLAY_LCL)
+		sort_str = "lcl_hitm,rmt_hitm";
+	else if (c2c.display == DISPLAY_LLC)
+		sort_str = "tot_llchit";
+
+	if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
 		return -ENOMEM;
 
 	pr_debug("coalesce sort   fields: %s\n", c2c.cl_sort);
@@ -2862,6 +2897,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_END()
 	};
 	int err = 0;
+	const char *output_str, *sort_str;
 
 	argc = parse_options(argc, argv, options, report_c2c_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
@@ -2936,24 +2972,48 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_mem2node;
 	}
 
-	c2c_hists__reinit(&c2c.hists,
-			"cl_idx,"
-			"dcacheline,"
-			"dcacheline_node,"
-			"dcacheline_count,"
-			"percent_hitm,"
-			"tot_hitm,lcl_hitm,rmt_hitm,"
-			"tot_recs,"
-			"tot_loads,"
-			"tot_stores,"
-			"stores_l1hit,stores_l1miss,"
-			"ld_fbhit,ld_l1hit,ld_l2hit,"
-			"ld_lclhit,lcl_hitm,"
-			"ld_rmthit,rmt_hitm,"
-			"dram_lcl,dram_rmt",
-			c2c.display == DISPLAY_TOT ? "tot_hitm" :
-			c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
-			);
+	if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+	    c2c.display == DISPLAY_RMT)
+		output_str = "cl_idx,"
+			     "dcacheline,"
+			     "dcacheline_node,"
+			     "dcacheline_count,"
+			     "percent_hitm,"
+			     "tot_hitm,lcl_hitm,rmt_hitm,"
+			     "tot_recs,"
+			     "tot_loads,"
+			     "tot_stores,"
+			     "stores_l1hit,stores_l1miss,"
+			     "ld_fbhit,ld_l1hit,ld_l2hit,"
+			     "ld_lclhit,lcl_hitm,"
+			     "ld_rmthit,rmt_hitm,"
+			     "dram_lcl,dram_rmt";
+	else /* c2c.display == DISPLAY_LLC */
+		output_str = "cl_idx,"
+			     "dcacheline,"
+			     "dcacheline_node,"
+			     "dcacheline_count,"
+			     "percent_llchit,"
+			     "tot_llchit,"
+			     "tot_recs,"
+			     "tot_loads,"
+			     "tot_stores,"
+			     "stores_l1hit,stores_l1miss,"
+			     "ld_fbhit,ld_l1hit,ld_l2hit,"
+			     "ld_lclhit,lcl_hitm,"
+			     "ld_rmthit,rmt_hitm,"
+			     "dram_lcl,dram_rmt";
+
+	if (c2c.display == DISPLAY_TOT)
+		sort_str = "tot_hitm";
+	else if (c2c.display == DISPLAY_RMT)
+		sort_str = "rmt_hitm";
+	else if (c2c.display == DISPLAY_LCL)
+		sort_str = "lcl_hitm";
+	else if (c2c.display == DISPLAY_LLC)
+		sort_str = "tot_llchit";
+
+	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
 
 	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc'
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (6 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load Leo Yan
@ 2020-10-15 14:50 ` Leo Yan
  2020-10-20  7:26   ` Jiri Olsa
  2020-10-15 15:05 ` [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Arnaldo Carvalho de Melo
  2020-10-20  8:13 ` Namhyung Kim
  9 siblings, 1 reply; 19+ messages in thread
From: Leo Yan @ 2020-10-15 14:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Kan Liang, Andi Kleen, Ian Rogers, Joe Mario, David Ahern,
	Don Zickus, Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

Since the new display option 'llc' is introduced, this patch is to
update the documentation to reflect it.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/Documentation/perf-c2c.txt | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index c81d72e3eecf..eadce62ecd28 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS
 
 -d::
 --display::
-	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+	Switch to HITM type (rmt, lcl) or LLC load access (llc) to display
+	and sort on. Total HITMs as default.
 
 --stitch-lbr::
 	Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
   Cacheline
   - cacheline address (hex number)
 
-  Rmt/Lcl Hitm
+  Rmt/Lcl Hitm (For display with HITM types)
   - cacheline percentage of all Remote/Local HITM accesses
 
-  LLC Load Hitm - Total, LclHitm, RmtHitm
+  LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
   - count of Total/Local/Remote load HITMs
 
+  LLC Hit Pct (For display 'llc')
+  - cacheline percentage of all LLC load accesses
+
+  LLC Hit Total (For display 'llc')
+  - sum of all LLC load accesses
+
   Total records
   - sum of all cachelines accesses
 
@@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:
 
 For each offset in the 2) list we display following data:
 
-  HITM - Rmt, Lcl
+  HITM - Rmt, Lcl (For display with HITM types)
   - % of Remote/Local HITM accesses for given offset within cacheline
 
+  LLC LD - LclHit, LclHitm (For display 'llc')
+  - % of LLC hits and LLC HITMs accesses for given offset within cacheline
+
   Store Refs - L1 Hit, L1 Miss
   - % of store accesses that hit/missed L1 for given offset within cacheline
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (7 preceding siblings ...)
  2020-10-15 14:50 ` [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc' Leo Yan
@ 2020-10-15 15:05 ` Arnaldo Carvalho de Melo
  2020-10-15 15:14   ` Leo Yan
  2020-10-20  8:13 ` Namhyung Kim
  9 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-10-15 15:05 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Kan Liang, Andi Kleen, Ian Rogers,
	Joe Mario, David Ahern, Don Zickus, Al Grant, James Clark,
	linux-kernel

Em Thu, Oct 15, 2020 at 03:50:33PM +0100, Leo Yan escreveu:
> If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> rely on HITM display to report cache false sharing.  Alternatively, we
> can use the LLC access and multi-threads info to locate the potential
> false sharing's data address, and if we connect with source code and
> analyze the multi-threads' execution timing, if can conclude load and
> store the same cache line at the meantime, thus this can be helpful for
> resolve the cache false sharing issue.
> 
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
> 
> This patch set is dependend on the patch set "perf c2c: Refine the
> organization of metrics" [1].
> 
> [1] https://lore.kernel.org/patchwork/cover/1321499/

Ok, that one is applied and will appear publicly as soon as it goes thru
my usual set of build tests.

- Arnaldo
 
> With this patch set, we can get display 'llc' as follows:
> 
>   # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
> 
>   [...]
> 
>   =================================================
>              Shared Data Cache Line Table
>   =================================================
>   #
>   #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
>         1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
>         2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
>         3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
>         4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
>         5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
>         6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
> 
>   =================================================
>         Shared Cache Line Distribution Pareto
>   =================================================
>   #
>   #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
>   #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
>   # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
>   #
>     -------------------------------------------------------------
>         0      143      505     2582      691      0x563b01e83100
>     -------------------------------------------------------------
>             96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>              0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
> 
>   [...]
> 
> 
> Leo Yan (8):
>   perf mem: Add structure field c2c_stats::tot_llchit
>   perf c2c: Add dimensions for total LLC hit
>   perf c2c: Add dimensions for LLC load hit
>   perf c2c: Change to general naming for macros
>   perf c2c: Rename for shared cache line stats
>   perf c2c: Refactor hist entry validation
>   perf c2c: Add option '-d llc' for sorting with LLC load
>   perf c2c: Update documentation for display option 'llc'
> 
>  tools/perf/Documentation/perf-c2c.txt |  18 +-
>  tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
>  tools/perf/util/mem-events.c          |   3 +
>  tools/perf/util/mem-events.h          |   1 +
>  4 files changed, 286 insertions(+), 69 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load
  2020-10-15 15:05 ` [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Arnaldo Carvalho de Melo
@ 2020-10-15 15:14   ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-15 15:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Kan Liang, Andi Kleen, Ian Rogers,
	Joe Mario, David Ahern, Don Zickus, Al Grant, James Clark,
	linux-kernel

On Thu, Oct 15, 2020 at 12:05:06PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Oct 15, 2020 at 03:50:33PM +0100, Leo Yan escreveu:
> > If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> > rely on HITM display to report cache false sharing.  Alternatively, we
> > can use the LLC access and multi-threads info to locate the potential
> > false sharing's data address, and if we connect with source code and
> > analyze the multi-threads' execution timing, if can conclude load and
> > store the same cache line at the meantime, thus this can be helpful for
> > resolve the cache false sharing issue.
> > 
> > This patch set is to enable the display with sorting on LLC load
> > accesses; it adds dimensions for total LLC hit and LLC load accesses,
> > and these dimensions are used for shared cache line table and pareto.
> > 
> > This patch set is dependend on the patch set "perf c2c: Refine the
> > organization of metrics" [1].
> > 
> > [1] https://lore.kernel.org/patchwork/cover/1321499/
> 
> Ok, that one is applied and will appear publicly as soon as it goes thru
> my usual set of build tests.

Thank you, Arnaldo!

Leo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-15 14:50 ` [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load Leo Yan
@ 2020-10-20  7:25   ` Jiri Olsa
  2020-10-20  8:08     ` Leo Yan
  2020-10-20  7:26   ` Jiri Olsa
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2020-10-20  7:25 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:

SNIP

> @@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
>  	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
>  	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
>  	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
> +	[DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
>  };
>  
>  static struct c2c_dimension dim_percent_hitm = {
> @@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
>  		break;
>  	case DISPLAY_TOT:
>  		FILTER_ENTRY(tot_hitm);
> +		break;
> +	case DISPLAY_LLC:
> +		FILTER_ENTRY(tot_llchit);
> +		break;
>  	default:
>  		break;
>  	}
> @@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
>  	case DISPLAY_TOT:
>  		has_record = !!c2c_he->stats.tot_hitm;
>  		break;
> +	case DISPLAY_LLC:
> +		has_record = !!c2c_he->stats.tot_llchit;
> +		break;
>  	default:
>  		break;
>  	}

there's one more switch with c2c.display in percent_hitm function,
where you did not add DISPLAY_LLC case.. I guess it should not ever
because we will not use that column in llc display mode, but we
should add at least some warning or that

SNIP

> -				"cl_rmt_hitm,"
> -				"cl_lcl_hitm,"
> -				"cl_stores_l1hit,"
> -				"cl_stores_l1miss,"
> -				"dcacheline",
> -				NULL);
> +	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
>  
>  	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
>  		return;
> @@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
>  		fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
>  		first = false;
>  	}
> -	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
> +	fprintf(out, "  Cachelines sort on                : %s\n",
>  		display_str[c2c.display]);
>  	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
>  }
> @@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
>  {
>  	scnprintf(bf, size,
>  		  "Shared Data Cache Line Table     "
> -		  "(%lu entries, sorted on %s HITMs)",
> +		  "(%lu entries, sorted on %s)",
>  		  browser->nr_non_filtered_entries,
>  		  display_str[c2c.display]);
>  	return 0;
> @@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
>  		c2c.display = DISPLAY_RMT;
>  	else if (!strcmp(display, "lcl"))
>  		c2c.display = DISPLAY_LCL;
> +	else if (!strcmp(display, "llc"))
> +		c2c.display = DISPLAY_LLC;

please update man page with this

>  	else {
>  		pr_err("failed: unknown display type: %s\n", str);
>  		return -1;
> @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
>  	}
>  
>  	if (asprintf(&c2c.cl_output,
> -		"%s%s%s%s%s%s%s%s%s%s",
> +		"%s%s%s%s%s%s%s%s%s%s%s",

why is there extra '%s' when we did not add new argument.. ?

SNIP

> +			     "ld_fbhit,ld_l1hit,ld_l2hit,"
> +			     "ld_lclhit,lcl_hitm,"
> +			     "ld_rmthit,rmt_hitm,"
> +			     "dram_lcl,dram_rmt";
> +	else /* c2c.display == DISPLAY_LLC */
> +		output_str = "cl_idx,"
> +			     "dcacheline,"
> +			     "dcacheline_node,"
> +			     "dcacheline_count,"
> +			     "percent_llchit,"
> +			     "tot_llchit,"
> +			     "tot_recs,"
> +			     "tot_loads,"
> +			     "tot_stores,"
> +			     "stores_l1hit,stores_l1miss,"
> +			     "ld_fbhit,ld_l1hit,ld_l2hit,"
> +			     "ld_lclhit,lcl_hitm,"
> +			     "ld_rmthit,rmt_hitm,"
> +			     "dram_lcl,dram_rmt";
> +
> +	if (c2c.display == DISPLAY_TOT)
> +		sort_str = "tot_hitm";
> +	else if (c2c.display == DISPLAY_RMT)
> +		sort_str = "rmt_hitm";
> +	else if (c2c.display == DISPLAY_LCL)
> +		sort_str = "lcl_hitm";
> +	else if (c2c.display == DISPLAY_LLC)
> +		sort_str = "tot_llchit";
> +

could you please split addition of output_str/sort_str into separate
patch and then add DISPLAY_LLC support? it'd ease up review 

perhaps include also print_pareto changes in it 

thanks,
jirka

> +	c2c_hists__reinit(&c2c.hists, output_str, sort_str);
>  
>  	ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
>  
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-15 14:50 ` [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load Leo Yan
  2020-10-20  7:25   ` Jiri Olsa
@ 2020-10-20  7:26   ` Jiri Olsa
  2020-10-20  8:14     ` Leo Yan
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2020-10-20  7:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
> Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> adds option 'llc' so that can sort on LLC load metrics.  The new
> introduced option can work as a choice if the memory event doesn't
> contain HITM tags.
> 
> For the display with option 'llc', both the "Shared Data Cache Line
> Table" and "Shared Cache Line Distribution Pareto" have difference
> comparing to other three display options.
> 
> For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
> it sorts with the LLC hit metrics "tot_llchit".  In this case, users
> should be interested in LLC related statistics, so the dimensions of
> total LLC hit is used to replace HITM related dimensions.
> 
> For Pareto, every single cache line shows the metrics "cl_llc_hit"
> instead of "cl_rmt_hitm", and the single cache line view is sorted by
> metrics "tot_llchit".

hi,
I'm getting compilation error on newer gcc:

  CC       builtin-c2c.o
builtin-c2c.c: In function ‘perf_c2c__report’:
builtin-c2c.c:1979:9: error: ‘sort_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
 1979 |  return hpp_list__parse(&c2c_hists->list, output, sort);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
builtin-c2c.c:2900:27: note: ‘sort_str’ was declared here
 2900 |  const char *output_str, *sort_str;
      |                           ^~~~~~~~


jirka


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc'
  2020-10-15 14:50 ` [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc' Leo Yan
@ 2020-10-20  7:26   ` Jiri Olsa
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2020-10-20  7:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Thu, Oct 15, 2020 at 03:50:41PM +0100, Leo Yan wrote:
> Since the new display option 'llc' is introduced, this patch is to
> update the documentation to reflect it.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/Documentation/perf-c2c.txt | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
> index c81d72e3eecf..eadce62ecd28 100644
> --- a/tools/perf/Documentation/perf-c2c.txt
> +++ b/tools/perf/Documentation/perf-c2c.txt
> @@ -109,7 +109,8 @@ REPORT OPTIONS
>  
>  -d::
>  --display::
> -	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
> +	Switch to HITM type (rmt, lcl) or LLC load access (llc) to display
> +	and sort on. Total HITMs as default.

ok, forget the doc update I asked in previous reply ;-)

thanks,
jirka

>  
>  --stitch-lbr::
>  	Show callgraph with stitched LBRs, which may have more complete
> @@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
>    Cacheline
>    - cacheline address (hex number)
>  
> -  Rmt/Lcl Hitm
> +  Rmt/Lcl Hitm (For display with HITM types)
>    - cacheline percentage of all Remote/Local HITM accesses
>  
> -  LLC Load Hitm - Total, LclHitm, RmtHitm
> +  LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
>    - count of Total/Local/Remote load HITMs
>  
> +  LLC Hit Pct (For display 'llc')
> +  - cacheline percentage of all LLC load accesses
> +
> +  LLC Hit Total (For display 'llc')
> +  - sum of all LLC load accesses
> +
>    Total records
>    - sum of all cachelines accesses
>  
> @@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:
>  
>  For each offset in the 2) list we display following data:
>  
> -  HITM - Rmt, Lcl
> +  HITM - Rmt, Lcl (For display with HITM types)
>    - % of Remote/Local HITM accesses for given offset within cacheline
>  
> +  LLC LD - LclHit, LclHitm (For display 'llc')
> +  - % of LLC hits and LLC HITMs accesses for given offset within cacheline
> +
>    Store Refs - L1 Hit, L1 Miss
>    - % of store accesses that hit/missed L1 for given offset within cacheline
>  
> -- 
> 2.17.1
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-20  7:25   ` Jiri Olsa
@ 2020-10-20  8:08     ` Leo Yan
  2020-10-22  8:43       ` Jiri Olsa
  0 siblings, 1 reply; 19+ messages in thread
From: Leo Yan @ 2020-10-20  8:08 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

Hi Jiri,

On Tue, Oct 20, 2020 at 09:25:53AM +0200, Jiri Olsa wrote:
> On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
> 
> SNIP
> 
> > @@ -1533,6 +1539,7 @@ static struct c2c_header percent_hitm_header[] = {
> >  	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
> >  	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
> >  	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
> > +	[DISPLAY_LLC] = HEADER_BOTH("LLC", "Hit"),
> >  };
> >  
> >  static struct c2c_dimension dim_percent_hitm = {
> > @@ -2002,6 +2009,10 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
> >  		break;
> >  	case DISPLAY_TOT:
> >  		FILTER_ENTRY(tot_hitm);
> > +		break;
> > +	case DISPLAY_LLC:
> > +		FILTER_ENTRY(tot_llchit);
> > +		break;
> >  	default:
> >  		break;
> >  	}
> > @@ -2032,6 +2043,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
> >  	case DISPLAY_TOT:
> >  		has_record = !!c2c_he->stats.tot_hitm;
> >  		break;
> > +	case DISPLAY_LLC:
> > +		has_record = !!c2c_he->stats.tot_llchit;
> > +		break;
> >  	default:
> >  		break;
> >  	}
> 
> there's one more switch with c2c.display in percent_hitm function,
> where you did not add DISPLAY_LLC case.. I guess it should not ever
> because we will not use that column in llc display mode, but we
> should add at least some warning or that

Exactly, for "DISPLAY_LLC" case, it will not run in the function
percent_hitm(); will add warning for that.

> SNIP
> 
> > -				"cl_rmt_hitm,"
> > -				"cl_lcl_hitm,"
> > -				"cl_stores_l1hit,"
> > -				"cl_stores_l1miss,"
> > -				"dcacheline",
> > -				NULL);
> > +	ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> >  
> >  	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
> >  		return;
> > @@ -2357,7 +2384,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> >  		fprintf(out, "%-36s: %s\n", first ? "  Events" : "", evsel__name(evsel));
> >  		first = false;
> >  	}
> > -	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
> > +	fprintf(out, "  Cachelines sort on                : %s\n",
> >  		display_str[c2c.display]);
> >  	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
> >  }
> > @@ -2514,7 +2541,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> >  {
> >  	scnprintf(bf, size,
> >  		  "Shared Data Cache Line Table     "
> > -		  "(%lu entries, sorted on %s HITMs)",
> > +		  "(%lu entries, sorted on %s)",
> >  		  browser->nr_non_filtered_entries,
> >  		  display_str[c2c.display]);
> >  	return 0;
> > @@ -2720,6 +2747,8 @@ static int setup_display(const char *str)
> >  		c2c.display = DISPLAY_RMT;
> >  	else if (!strcmp(display, "lcl"))
> >  		c2c.display = DISPLAY_LCL;
> > +	else if (!strcmp(display, "llc"))
> > +		c2c.display = DISPLAY_LLC;
> 
> please update man page with this
> 
> >  	else {
> >  		pr_err("failed: unknown display type: %s\n", str);
> >  		return -1;
> > @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
> >  	}
> >  
> >  	if (asprintf(&c2c.cl_output,
> > -		"%s%s%s%s%s%s%s%s%s%s",
> > +		"%s%s%s%s%s%s%s%s%s%s%s",
> 
> why is there extra '%s' when we did not add new argument.. ?

This is deliberate.  The change is as below:

        if (asprintf(&c2c.cl_output,
-               "%s%s%s%s%s%s%s%s%s%s",
+               "%s%s%s%s%s%s%s%s%s%s%s",
                c2c.use_stdio ? "cl_num_empty," : "",
-               "percent_rmt_hitm,"
+               c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
+                                            "percent_rmt_hitm,",
                "percent_lcl_hitm,"

In the old code the string "percent_rmt_hitm," is merged with later
lines (the second string is "percent_lcl_hitm,") into single string.

In this patch, it needs to check condition c2c.display and pass
different string ("percent_llc_hit," vs "percent_rmt_hitm,"), thus
the string ("percent_llc_hit," or "percent_rmt_hitm,") is passed
independently, it's _NOT_ jointed with sequnetial lines.

> SNIP
> 
> > +			     "ld_fbhit,ld_l1hit,ld_l2hit,"
> > +			     "ld_lclhit,lcl_hitm,"
> > +			     "ld_rmthit,rmt_hitm,"
> > +			     "dram_lcl,dram_rmt";
> > +	else /* c2c.display == DISPLAY_LLC */
> > +		output_str = "cl_idx,"
> > +			     "dcacheline,"
> > +			     "dcacheline_node,"
> > +			     "dcacheline_count,"
> > +			     "percent_llchit,"
> > +			     "tot_llchit,"
> > +			     "tot_recs,"
> > +			     "tot_loads,"
> > +			     "tot_stores,"
> > +			     "stores_l1hit,stores_l1miss,"
> > +			     "ld_fbhit,ld_l1hit,ld_l2hit,"
> > +			     "ld_lclhit,lcl_hitm,"
> > +			     "ld_rmthit,rmt_hitm,"
> > +			     "dram_lcl,dram_rmt";
> > +
> > +	if (c2c.display == DISPLAY_TOT)
> > +		sort_str = "tot_hitm";
> > +	else if (c2c.display == DISPLAY_RMT)
> > +		sort_str = "rmt_hitm";
> > +	else if (c2c.display == DISPLAY_LCL)
> > +		sort_str = "lcl_hitm";
> > +	else if (c2c.display == DISPLAY_LLC)
> > +		sort_str = "tot_llchit";
> > +
> 
> could you please split addition of output_str/sort_str into separate
> patch and then add DISPLAY_LLC support? it'd ease up review 
> 
> perhaps include also print_pareto changes in it 

Will do.

Thanks a lot for reviewing,
Leo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load
  2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
                   ` (8 preceding siblings ...)
  2020-10-15 15:05 ` [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Arnaldo Carvalho de Melo
@ 2020-10-20  8:13 ` Namhyung Kim
  2020-10-20  8:18   ` Leo Yan
  9 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2020-10-20  8:13 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

Hello,

On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> rely on HITM display to report cache false sharing.  Alternatively, we
> can use the LLC access and multi-threads info to locate the potential
> false sharing's data address, and if we connect with source code and
> analyze the multi-threads' execution timing, if can conclude load and
> store the same cache line at the meantime, thus this can be helpful for
> resolve the cache false sharing issue.
>
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
>
> This patch set is dependend on the patch set "perf c2c: Refine the
> organization of metrics" [1].
>
> [1] https://lore.kernel.org/patchwork/cover/1321499/
>
> With this patch set, we can get display 'llc' as follows:
>
>   # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

I'm not sure if you ran the test on x86 or ARM.
IIUC ARM should have 0 local hitm, right?

Thanks
Namhyung

>
>   [...]
>
>   =================================================
>              Shared Data Cache Line Table
>   =================================================
>   #
>   #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
>   # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
>   # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
>   #
>         0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
>         1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
>         2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
>         3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
>         4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
>         5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
>         6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
>
>   =================================================
>         Shared Cache Line Distribution Pareto
>   =================================================
>   #
>   #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
>   #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
>   # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
>   #
>     -------------------------------------------------------------
>         0      143      505     2582      691      0x563b01e83100
>     -------------------------------------------------------------
>             96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
>              0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
>              1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>              0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
>
>   [...]
>
>
> Leo Yan (8):
>   perf mem: Add structure field c2c_stats::tot_llchit
>   perf c2c: Add dimensions for total LLC hit
>   perf c2c: Add dimensions for LLC load hit
>   perf c2c: Change to general naming for macros
>   perf c2c: Rename for shared cache line stats
>   perf c2c: Refactor hist entry validation
>   perf c2c: Add option '-d llc' for sorting with LLC load
>   perf c2c: Update documentation for display option 'llc'
>
>  tools/perf/Documentation/perf-c2c.txt |  18 +-
>  tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
>  tools/perf/util/mem-events.c          |   3 +
>  tools/perf/util/mem-events.h          |   1 +
>  4 files changed, 286 insertions(+), 69 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-20  7:26   ` Jiri Olsa
@ 2020-10-20  8:14     ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-20  8:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Tue, Oct 20, 2020 at 09:26:03AM +0200, Jiri Olsa wrote:
> On Thu, Oct 15, 2020 at 03:50:40PM +0100, Leo Yan wrote:
> > Except the existed three display options 'tot', 'rmt', 'lcl', this patch
> > adds option 'llc' so that can sort on LLC load metrics.  The new
> > introduced option can work as a choice if the memory event doesn't
> > contain HITM tags.
> > 
> > For the display with option 'llc', both the "Shared Data Cache Line
> > Table" and "Shared Cache Line Distribution Pareto" have difference
> > comparing to other three display options.
> > 
> > For the "Shared Data Cache Line Table", instead of sorting HITM metrics,
> > it sorts with the LLC hit metrics "tot_llchit".  In this case, users
> > should be interested in LLC related statistics, so the dimensions of
> > total LLC hit is used to replace HITM related dimensions.
> > 
> > For Pareto, every single cache line shows the metrics "cl_llc_hit"
> > instead of "cl_rmt_hitm", and the single cache line view is sorted by
> > metrics "tot_llchit".
> 
> hi,
> I'm getting compilation error on newer gcc:
> 
>   CC       builtin-c2c.o
> builtin-c2c.c: In function ‘perf_c2c__report’:
> builtin-c2c.c:1979:9: error: ‘sort_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>  1979 |  return hpp_list__parse(&c2c_hists->list, output, sort);
>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> builtin-c2c.c:2900:27: note: ‘sort_str’ was declared here
>  2900 |  const char *output_str, *sort_str;
>       |                           ^~~~~~~~

Sorry for this building error.  I didn't detect this error on both
Arm64 and x86;  will fix it.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load
  2020-10-20  8:13 ` Namhyung Kim
@ 2020-10-20  8:18   ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2020-10-20  8:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Tue, Oct 20, 2020 at 05:13:01PM +0900, Namhyung Kim wrote:
> Hello,
> 
> On Thu, Oct 15, 2020 at 11:51 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > If the memory event doesn't contain HITM tag (like Arm SPE), it cannot
> > rely on HITM display to report cache false sharing.  Alternatively, we
> > can use the LLC access and multi-threads info to locate the potential
> > false sharing's data address, and if we connect with source code and
> > analyze the multi-threads' execution timing, if can conclude load and
> > store the same cache line at the meantime, thus this can be helpful for
> > resolve the cache false sharing issue.
> >
> > This patch set is to enable the display with sorting on LLC load
> > accesses; it adds dimensions for total LLC hit and LLC load accesses,
> > and these dimensions are used for shared cache line table and pareto.
> >
> > This patch set is dependend on the patch set "perf c2c: Refine the
> > organization of metrics" [1].
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321499/
> >
> > With this patch set, we can get display 'llc' as follows:
> >
> >   # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
> 
> I'm not sure if you ran the test on x86 or ARM.
> IIUC ARM should have 0 local hitm, right?

Yes, on Arm64 the local HITM and remote HITM both are zeros.  Below is
the testing result on x86.

Thanks,
Leo

> >   [...]
> >
> >   =================================================
> >              Shared Data Cache Line Table
> >   =================================================
> >   #
> >   #        ----------- Cacheline ----------  LLC Hit   LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
> >   # Index             Address  Node  PA cnt      Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
> >   # .....  ..................  ....  ......  .......  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
> >   #
> >         0      0x563b01e83100     0    1401   65.32%       648     7011     3738     3273     2582      691      515     2516       59       143      505         0        0         0         0
> >         1      0x563b01e830c0     0       1   26.51%       263      400      400        0        0        0      130        3        4       262        1         0        0         0         0
> >         2      0x563b01e83080     0       1    7.76%        77      650      650        0        0        0      180      348       45        14       63         0        0         0         0
> >         3  0xffff88c3d74e82c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
> >         4  0xffffa587c11e38c0   N/A       0    0.10%         1        2        1        1        1        0        0        0        0         1        0         0        0         0         0
> >         5  0xffffffffbd5e6fc0     0       1    0.10%         1        1        1        0        0        0        0        0        0         0        1         0        0         0         0
> >         6      0x7f90a4d6c2c0     0       1    0.10%         1        1        1        0        0        0        0        0        0         1        0         0        0         0         0
> >
> >   =================================================
> >         Shared Cache Line Distribution Pareto
> >   =================================================
> >   #
> >   #        ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
> >   #   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
> >   # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
> >   #
> >     -------------------------------------------------------------
> >         0      143      505     2582      691      0x563b01e83100
> >     -------------------------------------------------------------
> >             96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
> >              0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
> >              0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
> >              2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
> >              0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100    14102:lock_th         0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
> >              0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100    14103:lock_th         0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
> >              1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
> >              0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
> >
> >   [...]
> >
> >
> > Leo Yan (8):
> >   perf mem: Add structure field c2c_stats::tot_llchit
> >   perf c2c: Add dimensions for total LLC hit
> >   perf c2c: Add dimensions for LLC load hit
> >   perf c2c: Change to general naming for macros
> >   perf c2c: Rename for shared cache line stats
> >   perf c2c: Refactor hist entry validation
> >   perf c2c: Add option '-d llc' for sorting with LLC load
> >   perf c2c: Update documentation for display option 'llc'
> >
> >  tools/perf/Documentation/perf-c2c.txt |  18 +-
> >  tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
> >  tools/perf/util/mem-events.c          |   3 +
> >  tools/perf/util/mem-events.h          |   1 +
> >  4 files changed, 286 insertions(+), 69 deletions(-)
> >
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load
  2020-10-20  8:08     ` Leo Yan
@ 2020-10-22  8:43       ` Jiri Olsa
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2020-10-22  8:43 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Kan Liang,
	Andi Kleen, Ian Rogers, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel

On Tue, Oct 20, 2020 at 04:08:39PM +0800, Leo Yan wrote:

SNIP

> > 
> > please update man page with this
> > 
> > >  	else {
> > >  		pr_err("failed: unknown display type: %s\n", str);
> > >  		return -1;
> > > @@ -2766,9 +2795,10 @@ static int build_cl_output(char *cl_sort, bool no_source)
> > >  	}
> > >  
> > >  	if (asprintf(&c2c.cl_output,
> > > -		"%s%s%s%s%s%s%s%s%s%s",
> > > +		"%s%s%s%s%s%s%s%s%s%s%s",
> > 
> > why is there extra '%s' when we did not add new argument.. ?
> 
> This is deliberate.  The change is as below:
> 
>         if (asprintf(&c2c.cl_output,
> -               "%s%s%s%s%s%s%s%s%s%s",
> +               "%s%s%s%s%s%s%s%s%s%s%s",
>                 c2c.use_stdio ? "cl_num_empty," : "",
> -               "percent_rmt_hitm,"
> +               c2c.display == DISPLAY_LLC ? "percent_llc_hit," :
> +                                            "percent_rmt_hitm,",
>                 "percent_lcl_hitm,"
> 
> In the old code the string "percent_rmt_hitm," is merged with later
> lines (the second string is "percent_lcl_hitm,") into single string.
> 
> In this patch, it needs to check condition c2c.display and pass
> different string ("percent_llc_hit," vs "percent_rmt_hitm,"), thus
> the string ("percent_llc_hit," or "percent_rmt_hitm,") is passed
> independently, it's _NOT_ jointed with sequnetial lines.

ah right, it's now 2 arguments instead of one

thanks,
jirka


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-10-22  8:44 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 14:50 [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Leo Yan
2020-10-15 14:50 ` [PATCH v1 1/8] perf mem: Add structure field c2c_stats::tot_llchit Leo Yan
2020-10-15 14:50 ` [PATCH v1 2/8] perf c2c: Add dimensions for total LLC hit Leo Yan
2020-10-15 14:50 ` [PATCH v1 3/8] perf c2c: Add dimensions for LLC load hit Leo Yan
2020-10-15 14:50 ` [PATCH v1 4/8] perf c2c: Change to general naming for macros Leo Yan
2020-10-15 14:50 ` [PATCH v1 5/8] perf c2c: Rename for shared cache line stats Leo Yan
2020-10-15 14:50 ` [PATCH v1 6/8] perf c2c: Refactor hist entry validation Leo Yan
2020-10-15 14:50 ` [PATCH v1 7/8] perf c2c: Add option '-d llc' for sorting with LLC load Leo Yan
2020-10-20  7:25   ` Jiri Olsa
2020-10-20  8:08     ` Leo Yan
2020-10-22  8:43       ` Jiri Olsa
2020-10-20  7:26   ` Jiri Olsa
2020-10-20  8:14     ` Leo Yan
2020-10-15 14:50 ` [PATCH v1 8/8] perf c2c: Update documentation for display option 'llc' Leo Yan
2020-10-20  7:26   ` Jiri Olsa
2020-10-15 15:05 ` [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load Arnaldo Carvalho de Melo
2020-10-15 15:14   ` Leo Yan
2020-10-20  8:13 ` Namhyung Kim
2020-10-20  8:18   ` Leo Yan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.