* [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
@ 2020-12-13 13:38 Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
` (11 more replies)
0 siblings, 12 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
This patch set sorts cache lines by all load operations that hit any
cache level. In the single cache line view, it shows the load
references for cache hits and for cache misses respectively.
This series is a follow-up to the old patch set "perf c2c: Sort
cacheline with LLC load" [1], which tried to sort cache lines by load
operations in the last level cache (LLC). After testing we found the
trace data doesn't contain LLC events if the platform isn't a NUMA
system. For this reason, this series refines the implementation to
sort on load operations that hit any cache level; this lets us review
the load and store operations together, and if any cache line is
detected to be accessed by multiple threads, this hints that the cache
line is a candidate for false sharing.
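For context, the pattern this tooling is meant to catch can be reproduced with a small program along the lines of the sketch below. This is only an illustrative stand-in for the false_sharing.exe test used in the results that follow; the struct layout, thread bodies and iteration counts are assumptions, not the actual false_sharing_example.c.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Two counters that live in the same 64-byte cache line: the classic
 * false-sharing layout. Each thread only touches its own counter, yet
 * every store invalidates the line in the other core's cache.
 */
struct shared {
	uint64_t a;	/* written by thread 1 */
	uint64_t b;	/* written by thread 2, same cache line as 'a' */
};

static struct shared data;

static void *bump_a(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++)
		__atomic_add_fetch(&data.a, 1, __ATOMIC_RELAXED);
	return NULL;
}

static void *bump_b(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++)
		__atomic_add_fetch(&data.b, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Run both writers to completion and return the combined count. */
uint64_t run_demo(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump_a, NULL);
	pthread_create(&t2, NULL, bump_b, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	return data.a + data.b;
}
```

Running such a program under "perf c2c record" should make the line holding 'data' show up near the top of the Shared Data Cache Line Table, as in the outputs below.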
This patch set applies cleanly on the perf/core branch with the latest
commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
evlist' record methods"). The changes have been tested on x86 and
Arm64; the testing results are shown below.
The testing result on x86:
# perf c2c record -- false_sharing.exe 2
# perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
[...]
=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Load Hit Load Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ........ ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x556f25dff100 0 1895 75.73% 4591 7840 4591 3249 2633 616 849 2734 67 58 883 0 0 0 0
1 0x556f25dff080 0 1 13.10% 794 794 794 0 0 0 164 486 28 20 96 0 0 0 0
2 0x556f25dff0c0 0 1 10.01% 607 607 607 0 0 0 107 5 5 488 2 0 0 0 0
=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# -- Load Refs -- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num Hit Miss L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
#
-------------------------------------------------------------
0 4591 0 2633 616 0x556f25dff100
-------------------------------------------------------------
20.52% 0.00% 0.00% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc1d 0 2200 1276 942 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
19.82% 0.00% 38.06% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc16 0 2190 1130 1912 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
18.25% 0.00% 56.63% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc16 0 2173 1074 2329 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
18.23% 0.00% 0.00% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc1d 0 2013 1220 837 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 3.11% 59.90% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc28 0 0 0 451 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 2.20% 40.10% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc28 0 0 0 305 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
12.00% 0.00% 0.00% 0.00% 0x20 0 1 28079 28083:reader_thd 0x556f25bfdc73 0 159 107 551 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
11.17% 0.00% 0.00% 0.00% 0x20 0 1 28079 28084:reader_thd 0x556f25bfdc73 0 148 108 513 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
[...]
The testing result on Arm64 (Hisilicon D06); please note that the Arm
SPE data source patch set has not been merged into the mainline kernel
and a potential issue with store operations is still a work in
progress, so the final output might differ slightly.
# perf c2c record -- false_sharing.exe 2
# perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
[...]
=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Load Hit Load Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ........ ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0xaaaab4e8b100 N/A 0 35.04% 100447 104933 100447 4486 4486 0 0 11269 0 89178 0 0 0 0 0
1 0xaaaab4e8af80 N/A 0 17.29% 49571 49571 49571 0 0 0 0 49571 0 0 0 0 0 0 0
2 0xaaaab4e8afc0 N/A 0 16.72% 47922 47922 47922 0 0 0 0 47922 0 0 0 0 0 0 0
3 0xaaaab4e8b080 N/A 0 8.94% 25641 67718 25641 42077 42077 0 0 4397 0 21244 0 0 0 0 0
4 0xaaaab4e7a480 N/A 0 4.42% 12680 12680 12680 0 0 0 0 12680 0 0 0 0 0 0 0
5 0xffffa2ffc980 N/A 0 2.62% 7511 7511 7511 0 0 0 0 7511 0 0 0 0 0 0 0
6 0xffffa3ffe980 N/A 0 2.57% 7374 7374 7374 0 0 0 0 7374 0 0 0 0 0 0 0
7 0xaaaab4e8b000 N/A 0 2.41% 6907 6907 6907 0 0 0 0 6907 0 0 0 0 0 0 0
8 0xaaaab4e8b0c0 N/A 0 2.30% 6592 6592 6592 0 0 0 0 2822 0 3770 0 0 0 0 0
9 0xffffa37fd980 N/A 0 2.24% 6408 6408 6408 0 0 0 0 6408 0 0 0 0 0 0 0
10 0xffffb8d80980 N/A 0 2.18% 6254 6254 6254 0 0 0 0 6254 0 0 0 0 0 0 0
11 0xffffb9d82980 N/A 0 1.31% 3763 9706 3763 5943 5943 0 0 3763 0 0 0 0 0 0 0
12 0xffffb9581980 N/A 0 1.22% 3507 11484 3507 7977 7977 0 0 3507 0 0 0 0 0 0 0
13 0xffffbad84980 N/A 0 0.33% 932 7766 932 6834 6834 0 0 932 0 0 0 0 0 0 0
14 0xffffba583980 N/A 0 0.24% 700 6503 700 5803 5803 0 0 700 0 0 0 0 0 0 0
=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# -- Load Refs -- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num Hit Miss L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
#
-------------------------------------------------------------
0 100447 0 4486 0 0xaaaab4e8b100
-------------------------------------------------------------
15.44% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15049:lock_th 0xaaaab4e79dd0 0 0 0 15508 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 1
14.43% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15048:lock_th 0xaaaab4e79dd0 0 0 0 14499 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
11.57% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15048:lock_th 0xaaaab4e79db8 0 0 0 11622 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
11.38% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15050:lock_th 0xaaaab4e79dd0 0 0 0 11429 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 2
10.57% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15051:lock_th 0xaaaab4e79dd0 0 0 0 10614 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 3
9.69% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15049:lock_th 0xaaaab4e79db8 0 0 0 9731 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 1
5.74% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15050:lock_th 0xaaaab4e79db8 0 0 0 5763 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 2
4.84% 0.00% 0.00% 0.00% 0x0 N/A 0 15046 15051:lock_th 0xaaaab4e79db8 0 0 0 4866 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 3
0.00% 0.00% 14.02% 0.00% 0x0 N/A 0 15046 15048:lock_th 0xaaaab4e79dbc 0 0 0 629 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
0.00% 0.00% 6.44% 0.00% 0x0 N/A 0 15046 15048:lock_th 0xaaaab4e79de0 0 0 0 289 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 12.37% 0.00% 0x0 N/A 0 15046 15049:lock_th 0xaaaab4e79dbc 0 0 0 555 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 1
0.00% 0.00% 6.46% 0.00% 0x0 N/A 0 15046 15049:lock_th 0xaaaab4e79de0 0 0 0 290 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 1
0.00% 0.00% 21.38% 0.00% 0x0 N/A 0 15046 15050:lock_th 0xaaaab4e79dbc 0 0 0 959 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 2
0.00% 0.00% 9.61% 0.00% 0x0 N/A 0 15046 15050:lock_th 0xaaaab4e79de0 0 0 0 431 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 2
0.00% 0.00% 22.14% 0.00% 0x0 N/A 0 15046 15051:lock_th 0xaaaab4e79dbc 0 0 0 993 2 [.] read_write_func false_sharing.exe false_sharing_example.c:145 3
0.00% 0.00% 7.58% 0.00% 0x0 N/A 0 15046 15051:lock_th 0xaaaab4e79de0 0 0 0 340 2 [.] read_write_func false_sharing.exe false_sharing_example.c:146 3
6.66% 0.00% 0.00% 0.00% 0x20 N/A 0 15046 15054:reader_thd 0xaaaab4e79e54 0 0 0 6687 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 2
3.76% 0.00% 0.00% 0.00% 0x28 N/A 0 15046 15052:reader_thd 0xaaaab4e79e80 0 0 0 3774 1 [.] read_write_func false_sharing.exe false_sharing_example.c:159 0
3.54% 0.00% 0.00% 0.00% 0x28 N/A 0 15046 15055:reader_thd 0xaaaab4e79e80 0 0 0 3551 1 [.] read_write_func false_sharing.exe false_sharing_example.c:159 3
2.39% 0.00% 0.00% 0.00% 0x30 N/A 0 15046 15053:reader_thd 0xaaaab4e79eac 0 0 0 2403 1 [.] read_write_func false_sharing.exe false_sharing_example.c:163 1
[...]
Changes from v1:
* Changed from sorting on LLC to sorting on all loads with cache hits;
* Added patches 06/11, 07/11 for refactoring macros;
* Added patch 08/11 for refactoring the node header, so it can display
"%loads" rather than "%hitms" in the header;
* Added patch 09/11 to add local pointers for pointing to output metrics
string and sort string (Juri);
* Added a warning in percent_hitm() for the display mode "all", which
should never happen (Juri).
[1] https://lore.kernel.org/patchwork/cover/1321514/
Leo Yan (11):
perf c2c: Add dimensions for total load hit
perf c2c: Add dimensions for load hit
perf c2c: Add dimensions for load miss
perf c2c: Rename for shared cache line stats
perf c2c: Refactor hist entry validation
perf c2c: Refactor display filter macro
perf c2c: Refactor node display macro
perf c2c: Refactor node header
perf c2c: Add local variables for output metrics
perf c2c: Sort on all cache hit for load operations
perf c2c: Update documentation for display option 'all'
tools/perf/Documentation/perf-c2c.txt | 21 +-
tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
2 files changed, 487 insertions(+), 82 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
` (10 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Arm SPE trace data doesn't support HITM, but we still want to use the
"perf c2c" tool to analyze cache false sharing. Without the HITM tag,
the tool cannot give an accurate result for cache false sharing; a
candidate solution is to sort on the total load operations and connect
them with the thread info, e.g. if multiple threads hit the same cache
line many times, this hints that the cache line is likely to cause a
cache false sharing issue.
Unlike sorting on the HITM tag, the proposed solution is not accurate
and might introduce false positives, but it's a pragmatic approach for
detecting false sharing when the memory events don't support HITM.
To support sorting on cache line hits, this patch adds dimensions for
the total load hit count and the associated percentage calculation.
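The sum this patch sorts on can be sketched outside perf as follows. Here 'struct ld_stats' is a simplified stand-in for perf's 'struct c2c_stats', keeping only the counters summed by the new TOT_LD_HIT() macro; the helper names are illustrative, not the actual perf functions.

```c
#include <stdint.h>

/* Simplified stand-in for perf's struct c2c_stats. */
struct ld_stats {
	uint32_t ld_fbhit;	/* load hit in fill buffer */
	uint32_t ld_l1hit;	/* load hit in L1 */
	uint32_t ld_l2hit;	/* load hit in L2 */
	uint32_t ld_llchit;	/* load hit in LLC, no snoop */
	uint32_t lcl_hitm;	/* load hit modified in local LLC */
	uint32_t rmt_hitm;	/* load hit modified in remote cache */
	uint32_t rmt_hit;	/* load hit clean in remote cache */
};

/* Same arithmetic as the TOT_LD_HIT() macro: every load that hit
 * some cache level, regardless of which one. */
static uint32_t tot_ld_hit(const struct ld_stats *s)
{
	return s->ld_fbhit + s->ld_l1hit + s->ld_l2hit +
	       s->ld_llchit + s->lcl_hitm + s->rmt_hitm + s->rmt_hit;
}

/* Percentage of a per-cacheline count against the global total, as
 * done by percent_tot_ld_hit(); guards against a zero total. */
static double percent_of(uint32_t part, uint32_t total)
{
	return total ? (double)part * 100.0 / total : 0.0;
}
```

The per-entry value feeds the "Load Hit Total" column and the percentage feeds the "Load Hit Pct" column added by this patch.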
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c5babeaa3b38..3d5a2dc8b4fd 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return tot_hitm_left - tot_hitm_right;
}
+#define TOT_LD_HIT(stats) \
+ ((stats)->ld_fbhit + \
+ (stats)->ld_l1hit + \
+ (stats)->ld_l2hit + \
+ (stats)->ld_llchit + \
+ (stats)->lcl_hitm + \
+ (stats)->rmt_hitm + \
+ (stats)->rmt_hit)
+
+static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ unsigned int tot_hit;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ tot_hit = TOT_LD_HIT(&c2c_he->stats);
+
+ return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_hit);
+}
+
+static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left,
+ struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ uint64_t tot_hit_left;
+ uint64_t tot_hit_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ tot_hit_left = TOT_LD_HIT(&c2c_left->stats);
+ tot_hit_right = TOT_LD_HIT(&c2c_right->stats);
+
+ return tot_hit_left - tot_hit_right;
+}
+
#define STAT_FN_ENTRY(__f) \
static int \
__f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, \
@@ -860,6 +901,58 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}
+static double percent_tot_ld_hit(struct c2c_hist_entry *c2c_he)
+{
+ struct c2c_hists *hists;
+ int tot = 0, st = 0;
+
+ hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+ st = TOT_LD_HIT(&c2c_he->stats);
+ tot = TOT_LD_HIT(&hists->stats);
+
+ return tot ? (double) st * 100 / tot : 0;
+}
+
+static int
+percent_tot_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ char buf[10];
+ double per;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ per = percent_tot_ld_hit(c2c_he);
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_tot_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_tot_ld_hit);
+}
+
+static int64_t
+percent_tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ double per_left;
+ double per_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ per_left = percent_tot_ld_hit(c2c_left);
+ per_right = percent_tot_ld_hit(c2c_right);
+
+ return per_left - per_right;
+}
+
static struct c2c_stats *he_stats(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
@@ -1412,6 +1505,14 @@ static struct c2c_dimension dim_ld_rmthit = {
.width = 8,
};
+static struct c2c_dimension dim_tot_ld_hit = {
+ .header = HEADER_BOTH("Load Hit", "Total"),
+ .name = "tot_ld_hit",
+ .cmp = tot_ld_hit_cmp,
+ .entry = tot_ld_hit_entry,
+ .width = 8,
+};
+
static struct c2c_dimension dim_tot_recs = {
.header = HEADER_BOTH("Total", "records"),
.name = "tot_recs",
@@ -1460,6 +1561,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
.width = 7,
};
+static struct c2c_dimension dim_percent_tot_ld_hit = {
+ .header = HEADER_BOTH("Load Hit", "Pct"),
+ .name = "percent_tot_ld_hit",
+ .cmp = percent_tot_ld_hit_cmp,
+ .entry = percent_tot_ld_hit_entry,
+ .color = percent_tot_ld_hit_color,
+ .width = 8,
+};
+
static struct c2c_dimension dim_percent_stores_l1hit = {
.header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
.name = "percent_stores_l1hit",
@@ -1615,11 +1725,13 @@ static struct c2c_dimension *dimensions[] = {
&dim_ld_l2hit,
&dim_ld_llchit,
&dim_ld_rmthit,
+ &dim_tot_ld_hit,
&dim_tot_recs,
&dim_tot_loads,
&dim_percent_hitm,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
+ &dim_percent_tot_ld_hit,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
&dim_dram_lcl,
--
2.17.1
* [PATCH v2 02/11] perf c2c: Add dimensions for load hit
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
` (9 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Add dimensions for load hit and its percentage calculation, which are
to be displayed in the single cache line view.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3d5a2dc8b4fd..00014e3d81fa 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}
+static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
+{
+ struct c2c_hists *hists;
+ int tot, st;
+
+ hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+ st = TOT_LD_HIT(&c2c_he->stats);
+ tot = TOT_LD_HIT(&hists->stats);
+
+ return percent(st, tot);
+}
+
+static int
+percent_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ char buf[10];
+ double per;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ per = percent_ld_hit(c2c_he);
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_ld_hit);
+}
+
+static int64_t
+percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ double per_left;
+ double per_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ per_left = percent_ld_hit(c2c_left);
+ per_right = percent_ld_hit(c2c_right);
+
+ return per_left - per_right;
+}
+
static int
percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
struct hist_entry *he)
@@ -1417,6 +1469,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
.width = 7,
};
+static struct c2c_dimension dim_cl_tot_ld_hit = {
+ .header = HEADER_SPAN("--- Load ---", "Hit", 1),
+ .name = "cl_tot_ld_hit",
+ .cmp = tot_ld_hit_cmp,
+ .entry = tot_ld_hit_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_cl_lcl_hitm = {
.header = HEADER_SPAN_LOW("Lcl"),
.name = "cl_lcl_hitm",
@@ -1570,6 +1630,15 @@ static struct c2c_dimension dim_percent_tot_ld_hit = {
.width = 8,
};
+static struct c2c_dimension dim_percent_ld_hit = {
+ .header = HEADER_SPAN("-- Load Refs --", "Hit", 1),
+ .name = "percent_ld_hit",
+ .cmp = percent_ld_hit_cmp,
+ .entry = percent_ld_hit_entry,
+ .color = percent_ld_hit_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_percent_stores_l1hit = {
.header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
.name = "percent_stores_l1hit",
@@ -1715,6 +1784,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_rmt_hitm,
&dim_cl_lcl_hitm,
&dim_cl_rmt_hitm,
+ &dim_cl_tot_ld_hit,
&dim_tot_stores,
&dim_stores_l1hit,
&dim_stores_l1miss,
@@ -1731,6 +1801,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_percent_hitm,
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
+ &dim_percent_ld_hit,
&dim_percent_tot_ld_hit,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
--
2.17.1
* [PATCH v2 03/11] perf c2c: Add dimensions for load miss
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:42 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
` (8 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Add dimensions for load miss and its percentage calculation, which are
to be displayed in the single cache line view.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 00014e3d81fa..27745340c14a 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
(stats)->rmt_hitm + \
(stats)->rmt_hit)
+#define TOT_LD_MISS(stats) \
+ ((stats)->lcl_dram + \
+ (stats)->rmt_dram)
+
static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
struct perf_hpp *hpp,
struct hist_entry *he)
@@ -656,6 +660,38 @@ static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return tot_hit_left - tot_hit_right;
}
+static int tot_ld_miss_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ unsigned int tot_miss;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ tot_miss = TOT_LD_MISS(&c2c_he->stats);
+
+ return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_miss);
+}
+
+static int64_t tot_ld_miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left,
+ struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ uint64_t tot_miss_left;
+ uint64_t tot_miss_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ tot_miss_left = TOT_LD_MISS(&c2c_left->stats);
+ tot_miss_right = TOT_LD_MISS(&c2c_right->stats);
+
+ return tot_miss_left - tot_miss_right;
+}
+
#define STAT_FN_ENTRY(__f) \
static int \
__f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, \
@@ -1104,6 +1140,58 @@ percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
return per_left - per_right;
}
+static double percent_ld_miss(struct c2c_hist_entry *c2c_he)
+{
+ struct c2c_hists *hists;
+ int tot, st;
+
+ hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
+
+ st = TOT_LD_MISS(&c2c_he->stats);
+ tot = TOT_LD_MISS(&hists->stats);
+
+ return percent(st, tot);
+}
+
+static int
+percent_ld_miss_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct c2c_hist_entry *c2c_he;
+ int width = c2c_width(fmt, hpp, he->hists);
+ char buf[10];
+ double per;
+
+ c2c_he = container_of(he, struct c2c_hist_entry, he);
+ per = percent_ld_miss(c2c_he);
+ return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
+}
+
+static int
+percent_ld_miss_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ return percent_color(fmt, hpp, he, percent_ld_miss);
+}
+
+static int64_t
+percent_ld_miss_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
+ struct hist_entry *left, struct hist_entry *right)
+{
+ struct c2c_hist_entry *c2c_left;
+ struct c2c_hist_entry *c2c_right;
+ double per_left;
+ double per_right;
+
+ c2c_left = container_of(left, struct c2c_hist_entry, he);
+ c2c_right = container_of(right, struct c2c_hist_entry, he);
+
+ per_left = percent_ld_miss(c2c_left);
+ per_right = percent_ld_miss(c2c_right);
+
+ return per_left - per_right;
+}
+
static int
percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
struct hist_entry *he)
@@ -1477,6 +1565,14 @@ static struct c2c_dimension dim_cl_tot_ld_hit = {
.width = 7,
};
+static struct c2c_dimension dim_cl_tot_ld_miss = {
+ .header = HEADER_SPAN_LOW("Miss"),
+ .name = "cl_tot_ld_miss",
+ .cmp = tot_ld_miss_cmp,
+ .entry = tot_ld_miss_entry,
+ .width = 7,
+};
+
static struct c2c_dimension dim_cl_lcl_hitm = {
.header = HEADER_SPAN_LOW("Lcl"),
.name = "cl_lcl_hitm",
@@ -1639,6 +1735,15 @@ static struct c2c_dimension dim_percent_ld_hit = {
.width = 7,
};
+static struct c2c_dimension dim_percent_ld_miss = {
+ .header = HEADER_SPAN_LOW("Miss"),
+ .name = "percent_ld_miss",
+ .cmp = percent_ld_miss_cmp,
+ .entry = percent_ld_miss_entry,
+ .color = percent_ld_miss_color,
+ .width = 7,
+};
+
static struct c2c_dimension dim_percent_stores_l1hit = {
.header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
.name = "percent_stores_l1hit",
@@ -1785,6 +1890,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_cl_lcl_hitm,
&dim_cl_rmt_hitm,
&dim_cl_tot_ld_hit,
+ &dim_cl_tot_ld_miss,
&dim_tot_stores,
&dim_stores_l1hit,
&dim_stores_l1miss,
@@ -1802,6 +1908,7 @@ static struct c2c_dimension *dimensions[] = {
&dim_percent_rmt_hitm,
&dim_percent_lcl_hitm,
&dim_percent_ld_hit,
+ &dim_percent_ld_miss,
&dim_percent_tot_ld_hit,
&dim_percent_stores_l1hit,
&dim_percent_stores_l1miss,
--
2.17.1
* [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (2 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:44 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
` (7 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
The shared cache line statistics currently rely on HITM. Use more
general naming rather than binding only to HITM: replace "hitm_stats"
with "shared_clines_stats" in struct perf_c2c, and rename the function
resort_hitm_cb() to resort_shared_cl_cb().
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 27745340c14a..580c4ead68db 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -97,8 +97,8 @@ struct perf_c2c {
bool symbol_full;
bool stitch_lbr;
- /* HITM shared clines stats */
- struct c2c_stats hitm_stats;
+ /* Shared clines stats */
+ struct c2c_stats shared_clines_stats;
int shared_clines;
int display;
@@ -2251,7 +2251,7 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
struct c2c_hists *c2c_hists;
- bool display = he__display(he, &c2c.hitm_stats);
+ bool display = he__display(he, &c2c.shared_clines_stats);
c2c_he = container_of(he, struct c2c_hist_entry, he);
c2c_hists = c2c_he->hists;
@@ -2338,14 +2338,14 @@ static int setup_nodes(struct perf_session *session)
#define HAS_HITMS(__h) ((__h)->stats.lcl_hitm || (__h)->stats.rmt_hitm)
-static int resort_hitm_cb(struct hist_entry *he, void *arg __maybe_unused)
+static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
{
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);
if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
- c2c_add_stats(&c2c.hitm_stats, &c2c_he->stats);
+ c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
return 0;
@@ -2416,7 +2416,7 @@ static void print_c2c__display_stats(FILE *out)
static void print_shared_cacheline_info(FILE *out)
{
- struct c2c_stats *stats = &c2c.hitm_stats;
+ struct c2c_stats *stats = &c2c.shared_clines_stats;
int hitm_cnt = stats->lcl_hitm + stats->rmt_hitm;
fprintf(out, "=================================================\n");
@@ -3117,7 +3117,7 @@ static int perf_c2c__report(int argc, const char **argv)
ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
hists__collapse_resort(&c2c.hists.hists, NULL);
- hists__output_resort_cb(&c2c.hists.hists, &prog, resort_hitm_cb);
+ hists__output_resort_cb(&c2c.hists.hists, &prog, resort_shared_cl_cb);
hists__iterate_cb(&c2c.hists.hists, resort_cl_cb);
ui_progress__finish();
--
2.17.1
* [PATCH v2 05/11] perf c2c: Refactor hist entry validation
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (3 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
` (6 subsequent siblings)
11 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
This patch has no functional changes; it only refactors the hist entry
validation for cache line resorting.
It renames the function "valid_hitm_or_store()" to
"is_valid_hist_entry()" and changes its return type from int to bool.
In the function, it uses a switch statement instead of chained ternary
operators, which is easier to extend for more display types.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 580c4ead68db..5cd30c083d6c 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2178,16 +2178,32 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
return he->filtered == 0;
}
-static inline int valid_hitm_or_store(struct hist_entry *he)
+static inline bool is_valid_hist_entry(struct hist_entry *he)
{
struct c2c_hist_entry *c2c_he;
- bool has_hitm;
+ bool has_record = false;
c2c_he = container_of(he, struct c2c_hist_entry, he);
- has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
- c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
- c2c_he->stats.rmt_hitm;
- return has_hitm || c2c_he->stats.store;
+
+ /* It's a valid entry if contains stores */
+ if (c2c_he->stats.store)
+ return true;
+
+ switch (c2c.display) {
+ case DISPLAY_LCL:
+ has_record = !!c2c_he->stats.lcl_hitm;
+ break;
+ case DISPLAY_RMT:
+ has_record = !!c2c_he->stats.rmt_hitm;
+ break;
+ case DISPLAY_TOT:
+ has_record = !!c2c_he->stats.tot_hitm;
+ break;
+ default:
+ break;
+ }
+
+ return has_record;
}
static void set_node_width(struct c2c_hist_entry *c2c_he, int len)
@@ -2241,7 +2257,7 @@ static int filter_cb(struct hist_entry *he, void *arg __maybe_unused)
calc_width(c2c_he);
- if (!valid_hitm_or_store(he))
+ if (!is_valid_hist_entry(he))
he->filtered = HIST_FILTER__C2C;
return 0;
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 06/11] perf c2c: Refactor display filter macro
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (4 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
` (5 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
When sorting on one of the respective metrics (lcl_hitm, rmt_hitm,
tot_hitm), the macro FILTER_HITM filters out cache line entries whose
overhead is less than 1%.
This patch refactors the macro FILTER_HITM. It replaces the old name
with the more general name FILTER_DISPLAY, and refines the parameters:
rather than passing a field name of the data structure, it passes the
cache line's statistic value and the sum value. This is more flexible,
e.g. when later extending the sort to all load hits, which combines
multiple fields from struct c2c_stats.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 5cd30c083d6c..f11c3c84bb2b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
c2c_he = container_of(he, struct c2c_hist_entry, he);
-#define FILTER_HITM(__h) \
- if (stats->__h) { \
- ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
+#define FILTER_DISPLAY(val, sum) \
+{ \
+ if ((sum)) { \
+ ld_dist = ((double)(val) / (sum)); \
if (ld_dist < DISPLAY_LINE_LIMIT) \
he->filtered = HIST_FILTER__C2C; \
} else { \
he->filtered = HIST_FILTER__C2C; \
- }
+ } \
+}
switch (c2c.display) {
case DISPLAY_LCL:
- FILTER_HITM(lcl_hitm);
+ FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
break;
case DISPLAY_RMT:
- FILTER_HITM(rmt_hitm);
+ FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
break;
case DISPLAY_TOT:
- FILTER_HITM(tot_hitm);
+ FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
+ break;
default:
break;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 07/11] perf c2c: Refactor node display macro
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (5 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
` (4 subsequent siblings)
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
The macro DISPLAY_HITM() calculates the HITM percentage contributed by
every node and is shown in the node info.
This patch refactors the macro, renaming it to DISPLAY_METRICS(). Its
parameters are changed to pass the metric's statistic value and the
sum value, which is flexible for later extension, e.g. it is easier to
use for metrics which combine multiple fields from struct c2c_stats.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f11c3c84bb2b..50018bfb1089 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
advance_hpp(hpp, ret);
- #define DISPLAY_HITM(__h) \
- if (c2c_he->stats.__h> 0) { \
+ #define DISPLAY_METRICS(val, sum) \
+ { \
+ if ((sum) > 0) { \
ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
- percent(stats->__h, c2c_he->stats.__h));\
+ percent((val), (sum))); \
} else { \
ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a"); \
- }
+ } \
+ }
switch (c2c.display) {
case DISPLAY_RMT:
- DISPLAY_HITM(rmt_hitm);
+ DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
break;
case DISPLAY_LCL:
- DISPLAY_HITM(lcl_hitm);
+ DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
break;
case DISPLAY_TOT:
- DISPLAY_HITM(tot_hitm);
+ DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
+ break;
default:
break;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 08/11] perf c2c: Refactor node header
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (6 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
` (3 subsequent siblings)
11 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
The node header array contains 3 items, one for each of the 3 flavors
of node accessing info. To extend sorting to all load references
rather than always sticking to HITMs, the second header string
"Node{cpus %hitms %stores}" needs to be adjustable (e.g. changed to
"Node{cpus %loads %stores}").
For this reason, this patch changes the node header array into three
flat variables and uses a switch-case in setup_nodes_header(), which
makes it easier to alter the header string.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 50018bfb1089..846ee58d6cfb 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1806,12 +1806,6 @@ static struct c2c_dimension dim_dso = {
.se = &sort_dso,
};
-static struct c2c_header header_node[3] = {
- HEADER_LOW("Node"),
- HEADER_LOW("Node{cpus %hitms %stores}"),
- HEADER_LOW("Node{cpu list}"),
-};
-
static struct c2c_dimension dim_node = {
.name = "node",
.cmp = empty_cmp,
@@ -2293,9 +2287,27 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
return 0;
}
+static struct c2c_header header_node_0 = HEADER_LOW("Node");
+static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
+
static void setup_nodes_header(void)
{
- dim_node.header = header_node[c2c.node_info];
+ switch (c2c.node_info) {
+ case 0:
+ dim_node.header = header_node_0;
+ break;
+ case 1:
+ dim_node.header = header_node_1;
+ break;
+ case 2:
+ dim_node.header = header_node_2;
+ break;
+ default:
+ break;
+ }
+
+ return;
}
static int setup_nodes(struct perf_session *session)
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 09/11] perf c2c: Add local variables for output metrics
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (7 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
` (2 subsequent siblings)
11 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Different display types may use different metrics for output and
sorting. So this patch adds several local variables:
"cl_output": pointer for outputting single cache line metrics;
"output_str": pointer for outputting cache line metrics;
"sort_str": pointer to the sorting metrics.
This is flexible: the variables can be assigned different strings
based on the specified display type.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 59 ++++++++++++++++++++++------------------
1 file changed, 33 insertions(+), 26 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 846ee58d6cfb..9342c30d86ee 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2500,16 +2500,17 @@ static void print_pareto(FILE *out)
struct perf_hpp_list hpp_list;
struct rb_node *nd;
int ret;
+ const char *cl_output;
+
+ cl_output = "cl_num,"
+ "cl_rmt_hitm,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";
perf_hpp_list__init(&hpp_list);
- ret = hpp_list__parse(&hpp_list,
- "cl_num,"
- "cl_rmt_hitm,"
- "cl_lcl_hitm,"
- "cl_stores_l1hit,"
- "cl_stores_l1miss,"
- "dcacheline",
- NULL);
+ ret = hpp_list__parse(&hpp_list, cl_output, NULL);
if (WARN_ONCE(ret, "failed to setup sort entries\n"))
return;
@@ -3053,6 +3054,7 @@ static int perf_c2c__report(int argc, const char **argv)
OPT_END()
};
int err = 0;
+ const char *output_str, *sort_str = NULL;
argc = parse_options(argc, argv, options, report_c2c_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
@@ -3129,24 +3131,29 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}
- c2c_hists__reinit(&c2c.hists,
- "cl_idx,"
- "dcacheline,"
- "dcacheline_node,"
- "dcacheline_count,"
- "percent_hitm,"
- "tot_hitm,lcl_hitm,rmt_hitm,"
- "tot_recs,"
- "tot_loads,"
- "tot_stores,"
- "stores_l1hit,stores_l1miss,"
- "ld_fbhit,ld_l1hit,ld_l2hit,"
- "ld_lclhit,lcl_hitm,"
- "ld_rmthit,rmt_hitm,"
- "dram_lcl,dram_rmt",
- c2c.display == DISPLAY_TOT ? "tot_hitm" :
- c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
- );
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_hitm,"
+ "tot_hitm,lcl_hitm,rmt_hitm,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+
+ if (c2c.display == DISPLAY_TOT)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT)
+ sort_str = "rmt_hitm";
+ else if (c2c.display == DISPLAY_LCL)
+ sort_str = "lcl_hitm";
+
+ c2c_hists__reinit(&c2c.hists, output_str, sort_str);
ui_progress__init(&prog, c2c.hists.hists.nr_entries, "Sorting...");
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (8 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-06 7:52 ` Namhyung Kim
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
11 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Besides the three existing display options 'tot', 'rmt' and 'lcl',
this patch adds the option 'all' so it can sort on all cache hits for
load operations. This newly introduced option is a useful choice for
profiling cache false sharing when the memory events don't contain
HITM tags.
When displaying with option 'all', the "Shared Data Cache Line Table"
and the "Shared Cache Line Distribution Pareto" both differ from the
other three display options.
For the "Shared Data Cache Line Table", instead of sorting on HITM
metrics, it sorts on the metrics "tot_ld_hit" and "percent_tot_ld_hit".
Even without HITM metrics, users can analyze the load hit statistics
for all cache levels, so the total load hit dimensions are used to
replace the HITM dimensions.
For the Pareto, every single cache line shows the metrics
"cl_tot_ld_hit" and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and
"cl_lcl_hitm", and the single cache line view is sorted by the metric
"tot_ld_hit".
As a result, we get the 'all' display as follows:
# perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
[...]
=================================================
Shared Data Cache Line Table
=================================================
#
# ----------- Cacheline ---------- Load Hit Load Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
# Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
# ..... .................. .... ...... ........ ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
#
0 0x556f25dff100 0 1895 75.73% 4591 7840 4591 3249 2633 616 849 2734 67 58 883 0 0 0 0
1 0x556f25dff080 0 1 13.10% 794 794 794 0 0 0 164 486 28 20 96 0 0 0 0
2 0x556f25dff0c0 0 1 10.01% 607 607 607 0 0 0 107 5 5 488 2 0 0 0 0
=================================================
Shared Cache Line Distribution Pareto
=================================================
#
# -- Load Refs -- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
# Num Hit Miss L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
# ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
#
-------------------------------------------------------------
0 4591 0 2633 616 0x556f25dff100
-------------------------------------------------------------
20.52% 0.00% 0.00% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc1d 0 2200 1276 942 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
19.82% 0.00% 38.06% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc16 0 2190 1130 1912 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
18.25% 0.00% 56.63% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc16 0 2173 1074 2329 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
18.23% 0.00% 0.00% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc1d 0 2013 1220 837 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 3.11% 59.90% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc28 0 0 0 451 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
0.00% 0.00% 2.20% 40.10% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc28 0 0 0 305 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
12.00% 0.00% 0.00% 0.00% 0x20 0 1 28079 28083:reader_thd 0x556f25bfdc73 0 159 107 551 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
11.17% 0.00% 0.00% 0.00% 0x20 0 1 28079 28084:reader_thd 0x556f25bfdc73 0 148 108 513 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
[...]
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/builtin-c2c.c | 139 ++++++++++++++++++++++++++++-----------
1 file changed, 101 insertions(+), 38 deletions(-)
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 9342c30d86ee..0df4a4a30f7a 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -113,13 +113,15 @@ enum {
DISPLAY_LCL,
DISPLAY_RMT,
DISPLAY_TOT,
+ DISPLAY_ALL,
DISPLAY_MAX,
};
static const char *display_str[DISPLAY_MAX] = {
- [DISPLAY_LCL] = "Local",
- [DISPLAY_RMT] = "Remote",
- [DISPLAY_TOT] = "Total",
+ [DISPLAY_LCL] = "Local HITMs",
+ [DISPLAY_RMT] = "Remote HITMs",
+ [DISPLAY_TOT] = "Total HITMs",
+ [DISPLAY_ALL] = "All Load Access",
};
static const struct option c2c_options[] = {
@@ -883,6 +885,11 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
case DISPLAY_TOT:
st = stats->tot_hitm;
tot = total->tot_hitm;
+ break;
+ case DISPLAY_ALL:
+ ui__warning("Calculate hitm percent for display 'all';\n"
+ "should never happen!\n");
+ break;
default:
break;
}
@@ -1344,6 +1351,10 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
case DISPLAY_TOT:
DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
break;
+ case DISPLAY_ALL:
+ DISPLAY_METRICS(TOT_LD_HIT(stats),
+ TOT_LD_HIT(&c2c_he->stats));
+ break;
default:
break;
}
@@ -1692,6 +1703,7 @@ static struct c2c_header percent_hitm_header[] = {
[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
+ [DISPLAY_ALL] = HEADER_BOTH("LLC", "Hit"),
};
static struct c2c_dimension dim_percent_hitm = {
@@ -2169,6 +2181,9 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
case DISPLAY_TOT:
FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
break;
+ case DISPLAY_ALL:
+ FILTER_DISPLAY(TOT_LD_HIT(&c2c_he->stats), TOT_LD_HIT(stats));
+ break;
default:
break;
}
@@ -2199,6 +2214,9 @@ static inline bool is_valid_hist_entry(struct hist_entry *he)
case DISPLAY_TOT:
has_record = !!c2c_he->stats.tot_hitm;
break;
+ case DISPLAY_ALL:
+ has_record = !!TOT_LD_HIT(&c2c_he->stats);
+ break;
default:
break;
}
@@ -2288,7 +2306,10 @@ static int resort_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
}
static struct c2c_header header_node_0 = HEADER_LOW("Node");
-static struct c2c_header header_node_1 = HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_hitms_stores =
+ HEADER_LOW("Node{cpus %hitms %stores}");
+static struct c2c_header header_node_1_loads_stores =
+ HEADER_LOW("Node{cpus %loads %stores}");
static struct c2c_header header_node_2 = HEADER_LOW("Node{cpu list}");
static void setup_nodes_header(void)
@@ -2298,7 +2319,10 @@ static void setup_nodes_header(void)
dim_node.header = header_node_0;
break;
case 1:
- dim_node.header = header_node_1;
+ if (c2c.display == DISPLAY_ALL)
+ dim_node.header = header_node_1_loads_stores;
+ else
+ dim_node.header = header_node_1_hitms_stores;
break;
case 2:
dim_node.header = header_node_2;
@@ -2377,11 +2401,13 @@ static int resort_shared_cl_cb(struct hist_entry *he, void *arg __maybe_unused)
struct c2c_hist_entry *c2c_he;
c2c_he = container_of(he, struct c2c_hist_entry, he);
- if (HAS_HITMS(c2c_he)) {
+ if (c2c.display == DISPLAY_ALL && TOT_LD_HIT(&c2c_he->stats)) {
+ c2c.shared_clines++;
+ c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
+ } else if (HAS_HITMS(c2c_he)) {
c2c.shared_clines++;
c2c_add_stats(&c2c.shared_clines_stats, &c2c_he->stats);
}
-
return 0;
}
@@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
int ret;
const char *cl_output;
- cl_output = "cl_num,"
- "cl_rmt_hitm,"
- "cl_lcl_hitm,"
- "cl_stores_l1hit,"
- "cl_stores_l1miss,"
- "dcacheline";
+ if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+ c2c.display == DISPLAY_RMT)
+ cl_output = "cl_num,"
+ "cl_rmt_hitm,"
+ "cl_lcl_hitm,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";
+ else /* c2c.display == DISPLAY_ALL */
+ cl_output = "cl_num,"
+ "cl_tot_ld_hit,"
+ "cl_tot_ld_miss,"
+ "cl_stores_l1hit,"
+ "cl_stores_l1miss,"
+ "dcacheline";
perf_hpp_list__init(&hpp_list);
ret = hpp_list__parse(&hpp_list, cl_output, NULL);
@@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
first = false;
}
- fprintf(out, " Cachelines sort on : %s HITMs\n",
+ fprintf(out, " Cachelines sort on : %s\n",
display_str[c2c.display]);
fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
}
@@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
{
scnprintf(bf, size,
"Shared Data Cache Line Table "
- "(%lu entries, sorted on %s HITMs)",
+ "(%lu entries, sorted on %s)",
browser->nr_non_filtered_entries,
display_str[c2c.display]);
return 0;
@@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
c2c.display = DISPLAY_RMT;
else if (!strcmp(display, "lcl"))
c2c.display = DISPLAY_LCL;
+ else if (!strcmp(display, "all"))
+ c2c.display = DISPLAY_ALL;
else {
pr_err("failed: unknown display type: %s\n", str);
return -1;
@@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
}
if (asprintf(&c2c.cl_output,
- "%s%s%s%s%s%s%s%s%s%s",
+ "%s%s%s%s%s%s%s%s%s%s%s",
c2c.use_stdio ? "cl_num_empty," : "",
- "percent_rmt_hitm,"
- "percent_lcl_hitm,"
+ c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
+ "percent_ld_miss," :
+ "percent_rmt_hitm,"
+ "percent_lcl_hitm,",
"percent_stores_l1hit,"
"percent_stores_l1miss,"
"offset,offset_node,dcacheline_count,",
@@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
static int setup_coalesce(const char *coalesce, bool no_source)
{
const char *c = coalesce ?: coalesce_default;
+ const char *sort_str = NULL;
if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
return -ENOMEM;
@@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
if (build_cl_output(c2c.cl_sort, no_source))
return -1;
- if (asprintf(&c2c.cl_resort, "offset,%s",
- c2c.display == DISPLAY_TOT ?
- "tot_hitm" :
- c2c.display == DISPLAY_RMT ?
- "rmt_hitm,lcl_hitm" :
- "lcl_hitm,rmt_hitm") < 0)
+ if (c2c.display == DISPLAY_TOT)
+ sort_str = "tot_hitm";
+ else if (c2c.display == DISPLAY_RMT)
+ sort_str = "rmt_hitm,lcl_hitm";
+ else if (c2c.display == DISPLAY_LCL)
+ sort_str = "lcl_hitm,rmt_hitm";
+ else if (c2c.display == DISPLAY_ALL)
+ sort_str = "tot_ld_hit";
+
+ if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
return -ENOMEM;
pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
@@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
goto out_mem2node;
}
- output_str = "cl_idx,"
- "dcacheline,"
- "dcacheline_node,"
- "dcacheline_count,"
- "percent_hitm,"
- "tot_hitm,lcl_hitm,rmt_hitm,"
- "tot_recs,"
- "tot_loads,"
- "tot_stores,"
- "stores_l1hit,stores_l1miss,"
- "ld_fbhit,ld_l1hit,ld_l2hit,"
- "ld_lclhit,lcl_hitm,"
- "ld_rmthit,rmt_hitm,"
- "dram_lcl,dram_rmt";
+ if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
+ c2c.display == DISPLAY_RMT)
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_hitm,"
+ "tot_hitm,lcl_hitm,rmt_hitm,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
+ else /* c2c.display == DISPLAY_ALL */
+ output_str = "cl_idx,"
+ "dcacheline,"
+ "dcacheline_node,"
+ "dcacheline_count,"
+ "percent_tot_ld_hit,"
+ "tot_ld_hit,"
+ "tot_recs,"
+ "tot_loads,"
+ "tot_stores,"
+ "stores_l1hit,stores_l1miss,"
+ "ld_fbhit,ld_l1hit,ld_l2hit,"
+ "ld_lclhit,lcl_hitm,"
+ "ld_rmthit,rmt_hitm,"
+ "dram_lcl,dram_rmt";
if (c2c.display == DISPLAY_TOT)
sort_str = "tot_hitm";
@@ -3152,6 +3213,8 @@ static int perf_c2c__report(int argc, const char **argv)
sort_str = "rmt_hitm";
else if (c2c.display == DISPLAY_LCL)
sort_str = "lcl_hitm";
+ else if (c2c.display == DISPLAY_ALL)
+ sort_str = "tot_ld_hit";
c2c_hists__reinit(&c2c.hists, output_str, sort_str);
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 11/11] perf c2c: Update documentation for display option 'all'
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (9 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
@ 2020-12-13 13:38 ` Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
11 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2020-12-13 13:38 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
Cc: Leo Yan
Since the new display option 'all' has been introduced, this patch
updates the documentation to reflect it.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/Documentation/perf-c2c.txt | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index c81d72e3eecf..da49c3d26316 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -109,7 +109,8 @@ REPORT OPTIONS
-d::
--display::
- Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+ Switch to HITM type (rmt, lcl) or all load cache hit (all) to display
+ and sort on. Total HITMs as default.
--stitch-lbr::
Show callgraph with stitched LBRs, which may have more complete
@@ -174,12 +175,18 @@ For each cacheline in the 1) list we display following data:
Cacheline
- cacheline address (hex number)
- Rmt/Lcl Hitm
+ Rmt/Lcl Hitm (For display with HITM types)
- cacheline percentage of all Remote/Local HITM accesses
- LLC Load Hitm - Total, LclHitm, RmtHitm
+ LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
- count of Total/Local/Remote load HITMs
+ LD Hit Pct (For display 'all')
+ - cacheline percentage of all load hit accesses
+
+ LD Hit Total (For display 'all')
+ - sum of all load hit accesses
+
Total records
- sum of all cachelines accesses
@@ -207,9 +214,12 @@ For each cacheline in the 1) list we display following data:
For each offset in the 2) list we display following data:
- HITM - Rmt, Lcl
+ HITM - Rmt, Lcl (For display with HITM types)
- % of Remote/Local HITM accesses for given offset within cacheline
+ Load Refs - Hit, Miss (For display 'all')
+ - % of load accesses that hit/missed cache for given offset within cacheline
+
Store Refs - L1 Hit, L1 Miss
- % of store accesses that hit/missed L1 for given offset within cacheline
@@ -249,7 +259,8 @@ The 'Node' field displays nodes that accesses given cacheline
offset. Its output comes in 3 flavors:
- node IDs separated by ','
- node IDs with stats for each ID, in following format:
- Node{cpus %hitms %stores}
+ Node{cpus %hitms %stores} (For display with HITM types)
+ Node{cpus %loads %stores} (For display with "all")
- node IDs with list of affected CPUs in following format:
Node{cpu list}
--
2.17.1
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
` (10 preceding siblings ...)
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
@ 2021-01-03 22:52 ` Jiri Olsa
2021-01-04 2:09 ` Leo Yan
11 siblings, 1 reply; 31+ messages in thread
From: Jiri Olsa @ 2021-01-03 22:52 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> This patch set is to sort cache line for all load operations which hit
> any cache levels. For single cache line view, it shows the load
> references for loads with cache hits and with cache misses respectively.
>
> This series is a following for the old patch set "perf c2c: Sort
> cacheline with LLC load" [1], in the old patch set it tries to sort
> cache line with the load operations in last level cache (LLC), after
> testing we found the trace data doesn't contain LLC events if the
> platform isn't a NUMA system. For this reason, this series refines the
> implementation to sort on all cache levels hits of load operations; it's
> reasonable for us to review the load and store opreations, if detects
> any cache line is accessed by multi-threads, this hints that the cache
> line is possible for false sharing.
>
> This patch set is clearly applied on perf/core branch with the latest
> commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> evlist' record methods"). And the changes has been tested on x86 and
> Arm64, the testing result is shown as below.
SNIP
>
>
> [...]
>
> Changes from v1:
> * Changed from sorting on LLC to sorting on all loads with cache hits;
> * Added patches 06/11, 07/11 for refactoring macros;
> * Added patch 08/11 for refactoring node header, so can display "%loads"
> rather than "%hitms" in the header;
> * Added patch 09/11 to add local pointers for pointing to output metrics
> string and sort string (Juri);
> * Added warning in percent_hitm() for the display "all", which should
> never happen (Juri).
>
> [1] https://lore.kernel.org/patchwork/cover/1321514/
>
>
> Leo Yan (11):
> perf c2c: Add dimensions for total load hit
> perf c2c: Add dimensions for load hit
> perf c2c: Add dimensions for load miss
> perf c2c: Rename for shared cache line stats
> perf c2c: Refactor hist entry validation
> perf c2c: Refactor display filter macro
> perf c2c: Refactor node display macro
> perf c2c: Refactor node header
> perf c2c: Add local variables for output metrics
> perf c2c: Sort on all cache hit for load operations
> perf c2c: Update documentation for display option 'all'
>
> tools/perf/Documentation/perf-c2c.txt | 21 +-
> tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
> 2 files changed, 487 insertions(+), 82 deletions(-)
Joe might want to test it first, but it looks all good to me:
Acked-by: Jiri Olsa <jolsa@redhat.com>
thanks,
jirka
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
@ 2021-01-04 2:09 ` Leo Yan
2021-01-04 9:35 ` Jiri Olsa
0 siblings, 1 reply; 31+ messages in thread
From: Leo Yan @ 2021-01-04 2:09 UTC (permalink / raw)
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
On Sun, Jan 03, 2021 at 11:52:19PM +0100, Jiri Olsa wrote:
> On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> > This patch set sorts cache lines for all load operations which hit
> > any cache level. For the single cache line view, it shows the load
> > references for loads with cache hits and with cache misses respectively.
> >
> > This series is a follow-up to the old patch set "perf c2c: Sort
> > cacheline with LLC load" [1]. The old patch set tried to sort
> > cache lines by load operations in the last level cache (LLC); after
> > testing we found the trace data doesn't contain LLC events if the
> > platform isn't a NUMA system. For this reason, this series refines the
> > implementation to sort on load operations that hit any cache level;
> > it's reasonable for us to review the load and store operations: if any
> > cache line is detected as being accessed by multiple threads, this
> > hints that the cache line is a candidate for false sharing.
> >
> > This patch set applies cleanly on the perf/core branch with the latest
> > commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> > evlist' record methods"). The changes have been tested on x86 and
> > Arm64; the testing results are shown below.
>
> SNIP
>
> >
> >
> > [...]
> >
> > Changes from v1:
> > * Changed from sorting on LLC to sorting on all loads with cache hits;
> > * Added patches 06/11, 07/11 for refactoring macros;
> > * Added patch 08/11 for refactoring the node header, so it can display
> > "%loads" rather than "%hitms" in the header;
> > * Added patch 09/11 to add local pointers for the output metrics
> > string and the sort string (Jiri);
> > * Added a warning in percent_hitm() for the display "all", which should
> > never happen (Jiri).
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321514/
> >
> >
> > Leo Yan (11):
> > perf c2c: Add dimensions for total load hit
> > perf c2c: Add dimensions for load hit
> > perf c2c: Add dimensions for load miss
> > perf c2c: Rename for shared cache line stats
> > perf c2c: Refactor hist entry validation
> > perf c2c: Refactor display filter macro
> > perf c2c: Refactor node display macro
> > perf c2c: Refactor node header
> > perf c2c: Add local variables for output metrics
> > perf c2c: Sort on all cache hit for load operations
> > perf c2c: Update documentation for display option 'all'
> >
> > tools/perf/Documentation/perf-c2c.txt | 21 +-
> > tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
> > 2 files changed, 487 insertions(+), 82 deletions(-)
>
> Joe might want to test it first, but it looks all good to me:
>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
Thanks for the review, Jiri.
Note, after testing with Arm SPE, we found the store operations don't
contain the information for L1 cache hit or miss; this leads to empty
statistics for "st_l1hit" and "st_l1miss". As a result, the single
cache line view can only show the load samples and fails to show store
operations due to the empty "st_l1hit" and "st_l1miss" statistics.
This is related to a hardware limitation; after some internal
discussion, so far we cannot find an easy way to set the memory flags
for L1 cache hit or miss for store operations (more specifically, to set
the flags PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS for a store's L1 cache
access).
Given the uncertainty around this issue, please hold off on this patch
series; I will resend it once we have a conclusion.
And I'm really sorry for notifying you about this so late.
Thanks,
Leo
* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
2021-01-04 2:09 ` Leo Yan
@ 2021-01-04 9:35 ` Jiri Olsa
2021-01-15 15:17 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 31+ messages in thread
From: Jiri Olsa @ 2021-01-04 9:35 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
Al Grant, James Clark, linux-kernel
On Mon, Jan 04, 2021 at 10:09:38AM +0800, Leo Yan wrote:
SNIP
> > > Leo Yan (11):
> > > perf c2c: Add dimensions for total load hit
> > > perf c2c: Add dimensions for load hit
> > > perf c2c: Add dimensions for load miss
> > > perf c2c: Rename for shared cache line stats
> > > perf c2c: Refactor hist entry validation
> > > perf c2c: Refactor display filter macro
> > > perf c2c: Refactor node display macro
> > > perf c2c: Refactor node header
> > > perf c2c: Add local variables for output metrics
> > > perf c2c: Sort on all cache hit for load operations
> > > perf c2c: Update documentation for display option 'all'
> > >
> > > tools/perf/Documentation/perf-c2c.txt | 21 +-
> > > tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
> > > 2 files changed, 487 insertions(+), 82 deletions(-)
> >
> > Joe might want to test it first, but it looks all good to me:
> >
> > Acked-by: Jiri Olsa <jolsa@redhat.com>
>
> Thanks for the review, Jiri.
>
> Note, after testing with Arm SPE, we found the store operations don't
> contain the information for L1 cache hit or miss; this leads to empty
> statistics for "st_l1hit" and "st_l1miss". As a result, the single
> cache line view can only show the load samples and fails to show store
> operations due to the empty "st_l1hit" and "st_l1miss" statistics.
>
> This is related to a hardware limitation; after some internal
> discussion, so far we cannot find an easy way to set the memory flags
> for L1 cache hit or miss for store operations (more specifically, to
> set the flags PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS for a store's L1
> cache access).
>
> Given the uncertainty around this issue, please hold off on this patch
> series; I will resend it once we have a conclusion.
>
> And I'm really sorry for notifying you about this so late.
no problem, I think we can take some of the refactoring patches anyway
thanks,
jirka
* Re: [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
@ 2021-01-06 7:38 ` Namhyung Kim
2021-01-11 7:49 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:38 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
Hi,
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Arm SPE trace data doesn't support HITM, but we still want to use the
> "perf c2c" tool to analyze cache false sharing. Without the HITM tag,
> the tool cannot give an accurate result for cache false sharing; a
> candidate solution is to sort the total load operations and connect
> them with the thread info, e.g. if multiple threads hit the same cache
> line many times, this gives a hint that it's likely to cause a cache
> false sharing issue.
>
> Unlike with the HITM tag, the proposed solution is not accurate and
> might introduce false positive reporting, but it's a pragmatic approach
> for detecting false sharing if the memory event doesn't support HITM.
>
> To sort with the cache line hit, this patch adds dimensions for total
> load hit and the associated percentage calculation.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 112 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index c5babeaa3b38..3d5a2dc8b4fd 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> return tot_hitm_left - tot_hitm_right;
> }
>
> +#define TOT_LD_HIT(stats) \
> + ((stats)->ld_fbhit + \
> + (stats)->ld_l1hit + \
> + (stats)->ld_l2hit + \
> + (stats)->ld_llchit + \
> + (stats)->lcl_hitm + \
> + (stats)->rmt_hitm + \
> + (stats)->rmt_hit)
It doesn't need to be a macro, why not use a static inline function?
Thanks,
Namhyung
> +
> +static int tot_ld_hit_entry(struct perf_hpp_fmt *fmt,
> + struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + struct c2c_hist_entry *c2c_he;
> + int width = c2c_width(fmt, hpp, he->hists);
> + unsigned int tot_hit;
> +
> + c2c_he = container_of(he, struct c2c_hist_entry, he);
> + tot_hit = TOT_LD_HIT(&c2c_he->stats);
> +
> + return scnprintf(hpp->buf, hpp->size, "%*u", width, tot_hit);
> +}
> +
> +static int64_t tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> + struct hist_entry *left,
> + struct hist_entry *right)
> +{
> + struct c2c_hist_entry *c2c_left;
> + struct c2c_hist_entry *c2c_right;
> + uint64_t tot_hit_left;
> + uint64_t tot_hit_right;
> +
> + c2c_left = container_of(left, struct c2c_hist_entry, he);
> + c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> + tot_hit_left = TOT_LD_HIT(&c2c_left->stats);
> + tot_hit_right = TOT_LD_HIT(&c2c_right->stats);
> +
> + return tot_hit_left - tot_hit_right;
> +}
> +
> #define STAT_FN_ENTRY(__f) \
> static int \
> __f ## _entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp, \
> @@ -860,6 +901,58 @@ percent_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> return per_left - per_right;
> }
>
> +static double percent_tot_ld_hit(struct c2c_hist_entry *c2c_he)
> +{
> + struct c2c_hists *hists;
> + int tot = 0, st = 0;
> +
> + hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> +
> + st = TOT_LD_HIT(&c2c_he->stats);
> + tot = TOT_LD_HIT(&hists->stats);
> +
> + return tot ? (double) st * 100 / tot : 0;
> +}
> +
> +static int
> +percent_tot_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + struct c2c_hist_entry *c2c_he;
> + int width = c2c_width(fmt, hpp, he->hists);
> + char buf[10];
> + double per;
> +
> + c2c_he = container_of(he, struct c2c_hist_entry, he);
> + per = percent_tot_ld_hit(c2c_he);
> + return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
> +}
> +
> +static int
> +percent_tot_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + return percent_color(fmt, hpp, he, percent_tot_ld_hit);
> +}
> +
> +static int64_t
> +percent_tot_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> + struct hist_entry *left, struct hist_entry *right)
> +{
> + struct c2c_hist_entry *c2c_left;
> + struct c2c_hist_entry *c2c_right;
> + double per_left;
> + double per_right;
> +
> + c2c_left = container_of(left, struct c2c_hist_entry, he);
> + c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> + per_left = percent_tot_ld_hit(c2c_left);
> + per_right = percent_tot_ld_hit(c2c_right);
> +
> + return per_left - per_right;
> +}
> +
> static struct c2c_stats *he_stats(struct hist_entry *he)
> {
> struct c2c_hist_entry *c2c_he;
> @@ -1412,6 +1505,14 @@ static struct c2c_dimension dim_ld_rmthit = {
> .width = 8,
> };
>
> +static struct c2c_dimension dim_tot_ld_hit = {
> + .header = HEADER_BOTH("Load Hit", "Total"),
> + .name = "tot_ld_hit",
> + .cmp = tot_ld_hit_cmp,
> + .entry = tot_ld_hit_entry,
> + .width = 8,
> +};
> +
> static struct c2c_dimension dim_tot_recs = {
> .header = HEADER_BOTH("Total", "records"),
> .name = "tot_recs",
> @@ -1460,6 +1561,15 @@ static struct c2c_dimension dim_percent_lcl_hitm = {
> .width = 7,
> };
>
> +static struct c2c_dimension dim_percent_tot_ld_hit = {
> + .header = HEADER_BOTH("Load Hit", "Pct"),
> + .name = "percent_tot_ld_hit",
> + .cmp = percent_tot_ld_hit_cmp,
> + .entry = percent_tot_ld_hit_entry,
> + .color = percent_tot_ld_hit_color,
> + .width = 8,
> +};
> +
> static struct c2c_dimension dim_percent_stores_l1hit = {
> .header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
> .name = "percent_stores_l1hit",
> @@ -1615,11 +1725,13 @@ static struct c2c_dimension *dimensions[] = {
> &dim_ld_l2hit,
> &dim_ld_llchit,
> &dim_ld_rmthit,
> + &dim_tot_ld_hit,
> &dim_tot_recs,
> &dim_tot_loads,
> &dim_percent_hitm,
> &dim_percent_rmt_hitm,
> &dim_percent_lcl_hitm,
> + &dim_percent_tot_ld_hit,
> &dim_percent_stores_l1hit,
> &dim_percent_stores_l1miss,
> &dim_dram_lcl,
> --
> 2.17.1
>
* Re: [PATCH v2 02/11] perf c2c: Add dimensions for load hit
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
@ 2021-01-06 7:38 ` Namhyung Kim
2021-01-11 8:22 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:38 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Add dimensions for load hit and its percentage calculation, which are
> to be displayed in the single cache line output.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 3d5a2dc8b4fd..00014e3d81fa 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> return per_left - per_right;
> }
>
> +static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
> +{
> + struct c2c_hists *hists;
> + int tot, st;
> +
> + hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> +
> + st = TOT_LD_HIT(&c2c_he->stats);
> + tot = TOT_LD_HIT(&hists->stats);
> +
> + return percent(st, tot);
It's not clear to me what's different than percent_tot_ld_hit().
Thanks,
Namhyung
> +}
> +
> +static int
> +percent_ld_hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + struct c2c_hist_entry *c2c_he;
> + int width = c2c_width(fmt, hpp, he->hists);
> + char buf[10];
> + double per;
> +
> + c2c_he = container_of(he, struct c2c_hist_entry, he);
> + per = percent_ld_hit(c2c_he);
> + return scnprintf(hpp->buf, hpp->size, "%*s", width, PERC_STR(buf, per));
> +}
> +
> +static int
> +percent_ld_hit_color(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + return percent_color(fmt, hpp, he, percent_ld_hit);
> +}
> +
> +static int64_t
> +percent_ld_hit_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> + struct hist_entry *left, struct hist_entry *right)
> +{
> + struct c2c_hist_entry *c2c_left;
> + struct c2c_hist_entry *c2c_right;
> + double per_left;
> + double per_right;
> +
> + c2c_left = container_of(left, struct c2c_hist_entry, he);
> + c2c_right = container_of(right, struct c2c_hist_entry, he);
> +
> + per_left = percent_ld_hit(c2c_left);
> + per_right = percent_ld_hit(c2c_right);
> +
> + return per_left - per_right;
> +}
> +
> static int
> percent_stores_l1hit_entry(struct perf_hpp_fmt *fmt, struct perf_hpp *hpp,
> struct hist_entry *he)
> @@ -1417,6 +1469,14 @@ static struct c2c_dimension dim_cl_rmt_hitm = {
> .width = 7,
> };
>
> +static struct c2c_dimension dim_cl_tot_ld_hit = {
> + .header = HEADER_SPAN("--- Load ---", "Hit", 1),
> + .name = "cl_tot_ld_hit",
> + .cmp = tot_ld_hit_cmp,
> + .entry = tot_ld_hit_entry,
> + .width = 7,
> +};
> +
> static struct c2c_dimension dim_cl_lcl_hitm = {
> .header = HEADER_SPAN_LOW("Lcl"),
> .name = "cl_lcl_hitm",
> @@ -1570,6 +1630,15 @@ static struct c2c_dimension dim_percent_tot_ld_hit = {
> .width = 8,
> };
>
> +static struct c2c_dimension dim_percent_ld_hit = {
> + .header = HEADER_SPAN("-- Load Refs --", "Hit", 1),
> + .name = "percent_ld_hit",
> + .cmp = percent_ld_hit_cmp,
> + .entry = percent_ld_hit_entry,
> + .color = percent_ld_hit_color,
> + .width = 7,
> +};
> +
> static struct c2c_dimension dim_percent_stores_l1hit = {
> .header = HEADER_SPAN("-- Store Refs --", "L1 Hit", 1),
> .name = "percent_stores_l1hit",
> @@ -1715,6 +1784,7 @@ static struct c2c_dimension *dimensions[] = {
> &dim_rmt_hitm,
> &dim_cl_lcl_hitm,
> &dim_cl_rmt_hitm,
> + &dim_cl_tot_ld_hit,
> &dim_tot_stores,
> &dim_stores_l1hit,
> &dim_stores_l1miss,
> @@ -1731,6 +1801,7 @@ static struct c2c_dimension *dimensions[] = {
> &dim_percent_hitm,
> &dim_percent_rmt_hitm,
> &dim_percent_lcl_hitm,
> + &dim_percent_ld_hit,
> &dim_percent_tot_ld_hit,
> &dim_percent_stores_l1hit,
> &dim_percent_stores_l1miss,
> --
> 2.17.1
>
* Re: [PATCH v2 03/11] perf c2c: Add dimensions for load miss
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
@ 2021-01-06 7:42 ` Namhyung Kim
2021-01-11 8:41 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:42 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Add dimensions for load miss and its percentage calculation, which are
> to be displayed in the single cache line output.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 107 insertions(+)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 00014e3d81fa..27745340c14a 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> (stats)->rmt_hitm + \
> (stats)->rmt_hit)
>
> +#define TOT_LD_MISS(stats) \
> + ((stats)->lcl_dram + \
> + (stats)->rmt_dram)
> +
Is this always true? I'm not sure if there's a case where stores can go
to DRAM directly... maybe a kind of uncached access.
Also, it can be a static function.
Thanks,
Namhyung
* Re: [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
@ 2021-01-06 7:44 ` Namhyung Kim
2021-01-11 8:42 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:44 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> The shared cache line statistics rely on HITM. We can use a more
> general naming rather than binding only to HITM, so replace
> "hitm_stats" with "shared_clines_stats" in struct perf_c2c, and rename
> the function resort_hitm_cb() to resort_shared_cl_cb().
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 27745340c14a..580c4ead68db 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -97,8 +97,8 @@ struct perf_c2c {
> bool symbol_full;
> bool stitch_lbr;
>
> - /* HITM shared clines stats */
> - struct c2c_stats hitm_stats;
> + /* Shared clines stats */
Please change it to "Shared cache line stats".
Thanks,
Namhyung
> + struct c2c_stats shared_clines_stats;
> int shared_clines;
>
> int display;
* Re: [PATCH v2 06/11] perf c2c: Refactor display filter macro
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
@ 2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:43 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:47 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> When sorting on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
> the macro FILTER_HITM filters out cache line entries whose overhead is
> less than 1%.
>
> This patch refactors the macro FILTER_HITM. It uses the more general
> name FILTER_DISPLAY to replace the old name, and refines its
> parameters: rather than passing a field name of the data structure, it
> passes the cache line's statistic value and the sum value. This is
> more flexible, e.g. when extending to sort on all load hits, which
> combines multiple fields from structure c2c_stats.
As it doesn't use field names anymore, I think it's better to change it to
a static function.
Thanks,
Namhyung
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 5cd30c083d6c..f11c3c84bb2b 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
>
> c2c_he = container_of(he, struct c2c_hist_entry, he);
>
> -#define FILTER_HITM(__h) \
> - if (stats->__h) { \
> - ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
> +#define FILTER_DISPLAY(val, sum) \
> +{ \
> + if ((sum)) { \
> + ld_dist = ((double)(val) / (sum)); \
> if (ld_dist < DISPLAY_LINE_LIMIT) \
> he->filtered = HIST_FILTER__C2C; \
> } else { \
> he->filtered = HIST_FILTER__C2C; \
> - }
> + } \
> +}
>
> switch (c2c.display) {
> case DISPLAY_LCL:
> - FILTER_HITM(lcl_hitm);
> + FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
> break;
> case DISPLAY_RMT:
> - FILTER_HITM(rmt_hitm);
> + FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
> break;
> case DISPLAY_TOT:
> - FILTER_HITM(tot_hitm);
> + FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
> + break;
> default:
> break;
> }
> --
> 2.17.1
>
* Re: [PATCH v2 07/11] perf c2c: Refactor node display macro
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
@ 2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:44 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:47 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> The macro DISPLAY_HITM() is used to calculate the HITM percentage
> contributed by every node, which is shown in the node info.
>
> This patch refactors the macro, renaming it to DISPLAY_METRICS(). The
> parameters are changed to pass the metric's statistic value and the
> sum value; this is more flexible for later extension, e.g. it's easier
> to use for metrics which combine multiple fields from structure
> c2c_stats.
Same as the previous one.
Thanks,
Namhyung
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 17 ++++++++++-------
> 1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index f11c3c84bb2b..50018bfb1089 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
> @@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
> ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
> advance_hpp(hpp, ret);
>
> - #define DISPLAY_HITM(__h) \
> - if (c2c_he->stats.__h> 0) { \
> + #define DISPLAY_METRICS(val, sum) \
> + { \
> + if ((sum) > 0) { \
> ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
> - percent(stats->__h, c2c_he->stats.__h));\
> + percent((val), (sum))); \
> } else { \
> ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a"); \
> - }
> + } \
> + }
>
> switch (c2c.display) {
> case DISPLAY_RMT:
> - DISPLAY_HITM(rmt_hitm);
> + DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
> break;
> case DISPLAY_LCL:
> - DISPLAY_HITM(lcl_hitm);
> + DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
> break;
> case DISPLAY_TOT:
> - DISPLAY_HITM(tot_hitm);
> + DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
> + break;
> default:
> break;
> }
> --
> 2.17.1
>
* Re: [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
@ 2021-01-06 7:52 ` Namhyung Kim
2021-01-11 8:47 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Namhyung Kim @ 2021-01-06 7:52 UTC (permalink / raw)
To: Leo Yan
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> In addition to the three existing display options 'tot', 'rmt' and
> 'lcl', this patch adds the option 'all' to sort on all cache hits for
> load operations. This newly introduced option can be a choice for
> profiling cache false sharing if the memory events don't contain HITM
> tags.
>
> When displaying with option 'all', the "Shared Data Cache Line Table"
> and "Shared Cache Line Distribution Pareto" both differ from the other
> three display options.
>
> The "Shared Data Cache Line Table", instead of sorting on HITM metrics,
> sorts on the metrics "tot_ld_hit" and "percent_tot_ld_hit". Without
> HITM metrics, users can analyze the load hit statistics for all cache
> levels, so the total load hit dimensions are used to replace the HITM
> dimensions.
>
> In the Pareto, every single cache line shows the metrics "cl_tot_ld_hit"
> and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and "percent_lcl_hitm",
> and the single cache line view is sorted by the metric "tot_ld_hit".
>
> As result, we can get the 'all' display as follows:
>
> # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
>
> [...]
>
> =================================================
> Shared Data Cache Line Table
> =================================================
> #
> # ----------- Cacheline ---------- Load Hit Load Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> # Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> # ..... .................. .... ...... ........ ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> #
> 0 0x556f25dff100 0 1895 75.73% 4591 7840 4591 3249 2633 616 849 2734 67 58 883 0 0 0 0
> 1 0x556f25dff080 0 1 13.10% 794 794 794 0 0 0 164 486 28 20 96 0 0 0 0
> 2 0x556f25dff0c0 0 1 10.01% 607 607 607 0 0 0 107 5 5 488 2 0 0 0 0
>
> =================================================
> Shared Cache Line Distribution Pareto
> =================================================
> #
> # -- Load Refs -- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
> # Num Hit Miss L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
> # ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
> #
> -------------------------------------------------------------
> 0 4591 0 2633 616 0x556f25dff100
> -------------------------------------------------------------
> 20.52% 0.00% 0.00% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc1d 0 2200 1276 942 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 19.82% 0.00% 38.06% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc16 0 2190 1130 1912 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 18.25% 0.00% 56.63% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc16 0 2173 1074 2329 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 18.23% 0.00% 0.00% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc1d 0 2013 1220 837 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 3.11% 59.90% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc28 0 0 0 451 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 2.20% 40.10% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc28 0 0 0 305 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 12.00% 0.00% 0.00% 0.00% 0x20 0 1 28079 28083:reader_thd 0x556f25bfdc73 0 159 107 551 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> 11.17% 0.00% 0.00% 0.00% 0x20 0 1 28079 28084:reader_thd 0x556f25bfdc73 0 148 108 513 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
>
> [...]
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 139 ++++++++++++++++++++++++++++-----------
> 1 file changed, 101 insertions(+), 38 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 9342c30d86ee..0df4a4a30f7a 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
[SNIP]
> @@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
> int ret;
> const char *cl_output;
>
> - cl_output = "cl_num,"
> - "cl_rmt_hitm,"
> - "cl_lcl_hitm,"
> - "cl_stores_l1hit,"
> - "cl_stores_l1miss,"
> - "dcacheline";
> + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> + c2c.display == DISPLAY_RMT)
> + cl_output = "cl_num,"
> + "cl_rmt_hitm,"
> + "cl_lcl_hitm,"
> + "cl_stores_l1hit,"
> + "cl_stores_l1miss,"
> + "dcacheline";
> + else /* c2c.display == DISPLAY_ALL */
> + cl_output = "cl_num,"
> + "cl_tot_ld_hit,"
> + "cl_tot_ld_miss,"
> + "cl_stores_l1hit,"
> + "cl_stores_l1miss,"
> + "dcacheline";
Nit: you can keep the default value as is, and add an if statement
just for the DISPLAY_ALL case.
>
> perf_hpp_list__init(&hpp_list);
> ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> @@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
> first = false;
> }
> - fprintf(out, " Cachelines sort on : %s HITMs\n",
> + fprintf(out, " Cachelines sort on : %s\n",
> display_str[c2c.display]);
> fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
> }
> @@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> {
> scnprintf(bf, size,
> "Shared Data Cache Line Table "
> - "(%lu entries, sorted on %s HITMs)",
> + "(%lu entries, sorted on %s)",
> browser->nr_non_filtered_entries,
> display_str[c2c.display]);
> return 0;
> @@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
> c2c.display = DISPLAY_RMT;
> else if (!strcmp(display, "lcl"))
> c2c.display = DISPLAY_LCL;
> + else if (!strcmp(display, "all"))
> + c2c.display = DISPLAY_ALL;
> else {
> pr_err("failed: unknown display type: %s\n", str);
> return -1;
> @@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
> }
>
> if (asprintf(&c2c.cl_output,
> - "%s%s%s%s%s%s%s%s%s%s",
> + "%s%s%s%s%s%s%s%s%s%s%s",
> c2c.use_stdio ? "cl_num_empty," : "",
> - "percent_rmt_hitm,"
> - "percent_lcl_hitm,"
> + c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
> + "percent_ld_miss," :
> + "percent_rmt_hitm,"
> + "percent_lcl_hitm,",
> "percent_stores_l1hit,"
> "percent_stores_l1miss,"
> "offset,offset_node,dcacheline_count,",
> @@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
> static int setup_coalesce(const char *coalesce, bool no_source)
> {
> const char *c = coalesce ?: coalesce_default;
> + const char *sort_str = NULL;
>
> if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
> return -ENOMEM;
> @@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
> if (build_cl_output(c2c.cl_sort, no_source))
> return -1;
>
> - if (asprintf(&c2c.cl_resort, "offset,%s",
> - c2c.display == DISPLAY_TOT ?
> - "tot_hitm" :
> - c2c.display == DISPLAY_RMT ?
> - "rmt_hitm,lcl_hitm" :
> - "lcl_hitm,rmt_hitm") < 0)
> + if (c2c.display == DISPLAY_TOT)
> + sort_str = "tot_hitm";
> + else if (c2c.display == DISPLAY_RMT)
> + sort_str = "rmt_hitm,lcl_hitm";
> + else if (c2c.display == DISPLAY_LCL)
> + sort_str = "lcl_hitm,rmt_hitm";
> + else if (c2c.display == DISPLAY_ALL)
> + sort_str = "tot_ld_hit";
> +
> + if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
> return -ENOMEM;
>
> pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
> @@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
> goto out_mem2node;
> }
>
> - output_str = "cl_idx,"
> - "dcacheline,"
> - "dcacheline_node,"
> - "dcacheline_count,"
> - "percent_hitm,"
> - "tot_hitm,lcl_hitm,rmt_hitm,"
> - "tot_recs,"
> - "tot_loads,"
> - "tot_stores,"
> - "stores_l1hit,stores_l1miss,"
> - "ld_fbhit,ld_l1hit,ld_l2hit,"
> - "ld_lclhit,lcl_hitm,"
> - "ld_rmthit,rmt_hitm,"
> - "dram_lcl,dram_rmt";
> + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> + c2c.display == DISPLAY_RMT)
> + output_str = "cl_idx,"
> + "dcacheline,"
> + "dcacheline_node,"
> + "dcacheline_count,"
> + "percent_hitm,"
> + "tot_hitm,lcl_hitm,rmt_hitm,"
> + "tot_recs,"
> + "tot_loads,"
> + "tot_stores,"
> + "stores_l1hit,stores_l1miss,"
> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
> + else /* c2c.display == DISPLAY_ALL */
> + output_str = "cl_idx,"
> + "dcacheline,"
> + "dcacheline_node,"
> + "dcacheline_count,"
> + "percent_tot_ld_hit,"
> + "tot_ld_hit,"
> + "tot_recs,"
> + "tot_loads,"
> + "tot_stores,"
> + "stores_l1hit,stores_l1miss,"
> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
Ditto.
Thanks,
Namhyung
>
> if (c2c.display == DISPLAY_TOT)
> sort_str = "tot_hitm";
> @@ -3152,6 +3213,8 @@ static int perf_c2c__report(int argc, const char **argv)
> sort_str = "rmt_hitm";
> else if (c2c.display == DISPLAY_LCL)
> sort_str = "lcl_hitm";
> + else if (c2c.display == DISPLAY_ALL)
> + sort_str = "tot_ld_hit";
>
> c2c_hists__reinit(&c2c.hists, output_str, sort_str);
>
> --
> 2.17.1
>
* Re: [PATCH v2 01/11] perf c2c: Add dimensions for total load hit
2021-01-06 7:38 ` Namhyung Kim
@ 2021-01-11 7:49 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 7:49 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
Hi Namhyung,
On Wed, Jan 06, 2021 at 04:38:01PM +0900, Namhyung Kim wrote:
> Hi,
>
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Arm SPE trace data doesn't support HITM, but we still want to use the
> > "perf c2c" tool to analyze cache false sharing. Without the HITM tag,
> > the tool cannot give accurate results for cache false sharing; a
> > candidate solution is to sort on the total load operations and connect
> > them with the thread info, e.g. if multiple threads hit the same cache
> > line many times, this hints that a cache false sharing issue is likely.
> >
> > Unlike with the HITM tag, the proposed solution is not accurate and
> > might introduce false positive reports, but it's a pragmatic approach
> > for detecting false sharing if the memory events don't support HITM.
> >
> > To sort by cache line hits, this patch adds dimensions for the total
> > load hit count and the associated percentage calculation.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 112 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 112 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index c5babeaa3b38..3d5a2dc8b4fd 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -615,6 +615,47 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> > return tot_hitm_left - tot_hitm_right;
> > }
> >
> > +#define TOT_LD_HIT(stats) \
> > + ((stats)->ld_fbhit + \
> > + (stats)->ld_l1hit + \
> > + (stats)->ld_l2hit + \
> > + (stats)->ld_llchit + \
> > + (stats)->lcl_hitm + \
> > + (stats)->rmt_hitm + \
> > + (stats)->rmt_hit)
>
> It doesn't need to be a macro, why not use a static inline function?
Yes, will change to use static inline function.
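For reference, the suggested conversion could look like the sketch below. The struct fields mirror the quoted macro; the struct layout and the function name total_ld_hit() are illustrative assumptions, not the final perf code:

```c
/*
 * Sketch: TOT_LD_HIT() as a static inline function. Fields mirror
 * the quoted macro; names here are assumptions for illustration.
 */
struct c2c_stats {
	unsigned int ld_fbhit;	/* loads that hit the fill buffer */
	unsigned int ld_l1hit;	/* loads that hit L1 */
	unsigned int ld_l2hit;	/* loads that hit L2 */
	unsigned int ld_llchit;	/* loads that hit the LLC */
	unsigned int lcl_hitm;	/* loads with local HITM */
	unsigned int rmt_hitm;	/* loads with remote HITM */
	unsigned int rmt_hit;	/* loads that hit a remote cache */
};

static inline unsigned int total_ld_hit(const struct c2c_stats *stats)
{
	return stats->ld_fbhit +
	       stats->ld_l1hit +
	       stats->ld_l2hit +
	       stats->ld_llchit +
	       stats->lcl_hitm +
	       stats->rmt_hitm +
	       stats->rmt_hit;
}
```

Unlike the macro, the inline function gives type checking on its argument and evaluates it exactly once.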
As explained to Jiri, this patch series mainly targets Arm SPE, but so
far we have a known issue with store operations: they cannot be shown
properly in the single cache line view of the perf c2c tool. For this
reason, I will first send the refactoring patches in the next version;
your comments on patches 01, 02, 03, 10 and 11 will be addressed when
those patches are upstreamed later.
Thanks a lot for review,
Leo
* Re: [PATCH v2 02/11] perf c2c: Add dimensions for load hit
2021-01-06 7:38 ` Namhyung Kim
@ 2021-01-11 8:22 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:22 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:38:19PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Add dimensions for load hit and its percentage calculation, which is to
> > be displayed in the single cache line output.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 71 ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 71 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 3d5a2dc8b4fd..00014e3d81fa 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -1052,6 +1052,58 @@ percent_lcl_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> > return per_left - per_right;
> > }
> >
> > +static double percent_ld_hit(struct c2c_hist_entry *c2c_he)
> > +{
> > + struct c2c_hists *hists;
> > + int tot, st;
> > +
> > + hists = container_of(c2c_he->he.hists, struct c2c_hists, hists);
> > +
> > + st = TOT_LD_HIT(&c2c_he->stats);
> > + tot = TOT_LD_HIT(&hists->stats);
> > +
> > + return percent(st, tot);
>
> It's not clear to me what's different than percent_tot_ld_hit().
Yes, these two functions are used for different views
(percent_tot_ld_hit() is for the brief one-line summary view,
percent_ld_hit() is used for the Pareto table), but I think you are
suggesting consolidating the two functions into a single one.
Will refine this in the next version.
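A minimal sketch of that consolidation: both views compute "entry hits as a percentage of table-wide hits", so one helper taking the entry value and the total can serve both callers. All names and the reduced struct here are illustrative assumptions:

```c
/*
 * Sketch of consolidating the two percentage helpers into one.
 * The struct is reduced to the single field the example needs.
 */
struct ld_stats {
	unsigned int ld_hit;	/* loads that hit any cache level */
};

static double percent(unsigned int val, unsigned int tot)
{
	return tot ? 100.0 * (double)val / (double)tot : 0.0;
}

/* One helper serves both the summary view and the Pareto table. */
static double percent_ld_hit(const struct ld_stats *entry,
			     const struct ld_stats *total)
{
	return percent(entry->ld_hit, total->ld_hit);
}
```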
Thanks,
Leo
* Re: [PATCH v2 03/11] perf c2c: Add dimensions for load miss
2021-01-06 7:42 ` Namhyung Kim
@ 2021-01-11 8:41 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:41 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:42:59PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > Add dimensions for load miss and its percentage calculation, which is to
> > be displayed in the single cache line output.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 107 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 107 insertions(+)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 00014e3d81fa..27745340c14a 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -624,6 +624,10 @@ tot_hitm_cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> > (stats)->rmt_hitm + \
> > (stats)->rmt_hit)
> >
> > +#define TOT_LD_MISS(stats) \
> > + ((stats)->lcl_dram + \
> > + (stats)->rmt_dram)
> > +
>
> Is this true always? I'm not sure if there's a case where stores can go to DRAM
> directly.. maybe like a kind of uncached accesses.
Firstly, from my understanding, uncached accesses are not accounted in
the "load miss" metrics. You can see there are two other metrics,
"stats->st_uncache" and "stats->ld_uncache", which record uncached
stores and uncached loads respectively.
Furthermore, the function total_records() uses the formula below to
account for the total load accesses:
lclmiss = stats->lcl_dram +
stats->rmt_dram +
stats->rmt_hitm +
stats->rmt_hit;
ldcnt = lclmiss +
stats->ld_fbhit +
stats->ld_l1hit +
stats->ld_l2hit +
stats->ld_llchit +
stats->lcl_hitm;
So this patch series classifies these metrics into two numbers: load
hit and load miss. A load hit is a hit at any cache level (L1, L2, LLC
and even a remote cache), while load miss is accounted from the
remaining metrics (lcl_dram + rmt_dram).
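The accounting described above can be sketched as follows; the field names mirror struct c2c_stats in the quoted code, while the helper names are assumptions:

```c
/*
 * Sketch of the hit/miss split: misses are loads served from DRAM
 * (local or remote); hits plus misses cover every load counted by
 * the lclmiss/ldcnt formula quoted above.
 */
struct c2c_stats {
	unsigned int ld_fbhit, ld_l1hit, ld_l2hit, ld_llchit;
	unsigned int lcl_hitm, rmt_hitm, rmt_hit;
	unsigned int lcl_dram, rmt_dram;	/* DRAM accesses = misses */
};

static unsigned int tot_ld_miss(const struct c2c_stats *s)
{
	return s->lcl_dram + s->rmt_dram;
}

/* Same total as total_records()'s ldcnt, just grouped differently. */
static unsigned int ldcnt(const struct c2c_stats *s)
{
	unsigned int hit = s->ld_fbhit + s->ld_l1hit + s->ld_l2hit +
			   s->ld_llchit + s->lcl_hitm + s->rmt_hitm +
			   s->rmt_hit;

	return hit + tot_ld_miss(s);
}
```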
> Also it can be a static function..
Yeah, will fix.
Thanks,
Leo
* Re: [PATCH v2 04/11] perf c2c: Rename for shared cache line stats
2021-01-06 7:44 ` Namhyung Kim
@ 2021-01-11 8:42 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:42 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:44:34PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > For shared cache line statistics, it relies on HITM. We can use more
> > general naming rather than only binding to HITM, so replace "hitm_stats"
> > with "shared_clines_stats" in structure perf_c2c, and rename function
> > resort_hitm_cb() to resort_shared_cl_cb().
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 27745340c14a..580c4ead68db 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -97,8 +97,8 @@ struct perf_c2c {
> > bool symbol_full;
> > bool stitch_lbr;
> >
> > - /* HITM shared clines stats */
> > - struct c2c_stats hitm_stats;
> > + /* Shared clines stats */
>
> Please change it to "Shared cache line stats".
Will do. Thanks!
> Thanks,
> Namhyung
>
>
> > + struct c2c_stats shared_clines_stats;
> > int shared_clines;
> >
> > int display;
* Re: [PATCH v2 06/11] perf c2c: Refactor display filter macro
2021-01-06 7:47 ` Namhyung Kim
@ 2021-01-11 8:43 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:43 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:47:00PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > When sort on the respective metrics (lcl_hitm, rmt_hitm, tot_hitm),
> > macro FILTER_HITM is to filter out the cache line entries if its
> > overhead is less than 1%.
> >
> > This patch refactors the macro FILTER_HITM. It uses the more general
> > name FILTER_DISPLAY to replace the old name, and it refines the
> > parameters: rather than passing a field name of the data structure, it
> > passes the cache line's statistic value and the sum value. This is more
> > flexible, e.g. when extending to sort on all load hits, which combines
> > multiple fields from structure c2c_stats.
>
> As it doesn't use field names anymore, I think it's better to change it to
> a static function.
Agreed. Will do.
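A hypothetical static-function form of FILTER_DISPLAY could look like the sketch below: return true when an entry should be filtered from the display because its share of the sum falls below the limit. The limit value here is an assumption for this sketch, not necessarily perf's:

```c
#include <stdbool.h>

/* Assumed display threshold: entries below this share are hidden. */
#define DISPLAY_LINE_LIMIT	0.001

static bool filter_display(unsigned int val, unsigned int sum)
{
	double ld_dist;

	if (!sum)
		return true;	/* no events for this metric: filter out */

	ld_dist = (double)val / (double)sum;
	return ld_dist < DISPLAY_LINE_LIMIT;
}
```

The switch in he__display() would then set `he->filtered` based on the return value instead of expanding the macro per case.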
Thanks,
Leo
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index 5cd30c083d6c..f11c3c84bb2b 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -2151,24 +2151,27 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
> >
> > c2c_he = container_of(he, struct c2c_hist_entry, he);
> >
> > -#define FILTER_HITM(__h) \
> > - if (stats->__h) { \
> > - ld_dist = ((double)c2c_he->stats.__h / stats->__h); \
> > +#define FILTER_DISPLAY(val, sum) \
> > +{ \
> > + if ((sum)) { \
> > + ld_dist = ((double)(val) / (sum)); \
> > if (ld_dist < DISPLAY_LINE_LIMIT) \
> > he->filtered = HIST_FILTER__C2C; \
> > } else { \
> > he->filtered = HIST_FILTER__C2C; \
> > - }
> > + } \
> > +}
> >
> > switch (c2c.display) {
> > case DISPLAY_LCL:
> > - FILTER_HITM(lcl_hitm);
> > + FILTER_DISPLAY(c2c_he->stats.lcl_hitm, stats->lcl_hitm);
> > break;
> > case DISPLAY_RMT:
> > - FILTER_HITM(rmt_hitm);
> > + FILTER_DISPLAY(c2c_he->stats.rmt_hitm, stats->rmt_hitm);
> > break;
> > case DISPLAY_TOT:
> > - FILTER_HITM(tot_hitm);
> > + FILTER_DISPLAY(c2c_he->stats.tot_hitm, stats->tot_hitm);
> > + break;
> > default:
> > break;
> > }
> > --
> > 2.17.1
> >
* Re: [PATCH v2 07/11] perf c2c: Refactor node display macro
2021-01-06 7:47 ` Namhyung Kim
@ 2021-01-11 8:44 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:44 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:47:49PM +0900, Namhyung Kim wrote:
> On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > The macro DISPLAY_HITM() is used to calculate the HITM percentage
> > contributed by every node, which is shown in the node info.
> >
> > This patch refactors the macro, renaming it to DISPLAY_METRICS(). The
> > parameters are changed to pass the metric's statistic value and the sum
> > value; this is more flexible for later extension, e.g. it is easier to
> > use for metrics which combine multiple fields from structure c2c_stats.
>
> Same as the previous one.
Will do, thanks.
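As with the previous macro, DISPLAY_METRICS() could become a small static helper that formats one node's percentage, printing "n/a" when there is no data. This sketch uses plain snprintf() where perf uses scnprintf(), and the helper name is an assumption:

```c
#include <stdio.h>

/* Format one node's share of a metric, or "n/a" when sum is zero. */
static int display_metrics(char *buf, size_t size,
			   unsigned int val, unsigned int sum)
{
	if (sum > 0)
		return snprintf(buf, size, "%5.1f%% ",
				100.0 * (double)val / (double)sum);
	return snprintf(buf, size, "%6s ", "n/a");
}
```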
> Thanks,
> Namhyung
>
>
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > tools/perf/builtin-c2c.c | 17 ++++++++++-------
> > 1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> > index f11c3c84bb2b..50018bfb1089 100644
> > --- a/tools/perf/builtin-c2c.c
> > +++ b/tools/perf/builtin-c2c.c
> > @@ -1324,23 +1324,26 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
> > ret = scnprintf(hpp->buf, hpp->size, "%2d{%2d ", node, num);
> > advance_hpp(hpp, ret);
> >
> > - #define DISPLAY_HITM(__h) \
> > - if (c2c_he->stats.__h> 0) { \
> > + #define DISPLAY_METRICS(val, sum) \
> > + { \
> > + if ((sum) > 0) { \
> > ret = scnprintf(hpp->buf, hpp->size, "%5.1f%% ", \
> > - percent(stats->__h, c2c_he->stats.__h));\
> > + percent((val), (sum))); \
> > } else { \
> > ret = scnprintf(hpp->buf, hpp->size, "%6s ", "n/a"); \
> > - }
> > + } \
> > + }
> >
> > switch (c2c.display) {
> > case DISPLAY_RMT:
> > - DISPLAY_HITM(rmt_hitm);
> > + DISPLAY_METRICS(stats->rmt_hitm, c2c_he->stats.rmt_hitm);
> > break;
> > case DISPLAY_LCL:
> > - DISPLAY_HITM(lcl_hitm);
> > + DISPLAY_METRICS(stats->lcl_hitm, c2c_he->stats.lcl_hitm);
> > break;
> > case DISPLAY_TOT:
> > - DISPLAY_HITM(tot_hitm);
> > + DISPLAY_METRICS(stats->tot_hitm, c2c_he->stats.tot_hitm);
> > + break;
> > default:
> > break;
> > }
> > --
> > 2.17.1
> >
* Re: [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
2021-01-06 7:52 ` Namhyung Kim
@ 2021-01-11 8:47 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-11 8:47 UTC (permalink / raw)
To: Namhyung Kim
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
Mark Rutland, Alexander Shishkin, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Wed, Jan 06, 2021 at 04:52:15PM +0900, Namhyung Kim wrote:
[...]
> > @@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
> > int ret;
> > const char *cl_output;
> >
> > - cl_output = "cl_num,"
> > - "cl_rmt_hitm,"
> > - "cl_lcl_hitm,"
> > - "cl_stores_l1hit,"
> > - "cl_stores_l1miss,"
> > - "dcacheline";
> > + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> > + c2c.display == DISPLAY_RMT)
> > + cl_output = "cl_num,"
> > + "cl_rmt_hitm,"
> > + "cl_lcl_hitm,"
> > + "cl_stores_l1hit,"
> > + "cl_stores_l1miss,"
> > + "dcacheline";
> > + else /* c2c.display == DISPLAY_ALL */
> > + cl_output = "cl_num,"
> > + "cl_tot_ld_hit,"
> > + "cl_tot_ld_miss,"
> > + "cl_stores_l1hit,"
> > + "cl_stores_l1miss,"
> > + "dcacheline";
>
> Nit: You can keep the default value as is, and add an if statement
> just for the DISPLAY_ALL.
Okay, this would be friendlier for review. Will change to this
approach.
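The suggestion amounts to keeping the existing column list as the default and overriding it only for the "all" mode. A self-contained sketch (the enum and function wrapper are added here for illustration; the column strings mirror the quoted patch):

```c
enum display { DISPLAY_TOT, DISPLAY_RMT, DISPLAY_LCL, DISPLAY_ALL };

/* Default to the HITM columns; only DISPLAY_ALL overrides them. */
static const char *pareto_columns(enum display d)
{
	const char *cl_output = "cl_num,"
				"cl_rmt_hitm,"
				"cl_lcl_hitm,"
				"cl_stores_l1hit,"
				"cl_stores_l1miss,"
				"dcacheline";

	if (d == DISPLAY_ALL)
		cl_output = "cl_num,"
			    "cl_tot_ld_hit,"
			    "cl_tot_ld_miss,"
			    "cl_stores_l1hit,"
			    "cl_stores_l1miss,"
			    "dcacheline";

	return cl_output;
}
```

This keeps the diff minimal: the three HITM modes are untouched and only the new mode adds a branch.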
> >
> > perf_hpp_list__init(&hpp_list);
> > ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> > @@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> > fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
> > first = false;
> > }
> > - fprintf(out, " Cachelines sort on : %s HITMs\n",
> > + fprintf(out, " Cachelines sort on : %s\n",
> > display_str[c2c.display]);
> > fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
> > }
> > @@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> > {
> > scnprintf(bf, size,
> > "Shared Data Cache Line Table "
> > - "(%lu entries, sorted on %s HITMs)",
> > + "(%lu entries, sorted on %s)",
> > browser->nr_non_filtered_entries,
> > display_str[c2c.display]);
> > return 0;
> > @@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
> > c2c.display = DISPLAY_RMT;
> > else if (!strcmp(display, "lcl"))
> > c2c.display = DISPLAY_LCL;
> > + else if (!strcmp(display, "all"))
> > + c2c.display = DISPLAY_ALL;
> > else {
> > pr_err("failed: unknown display type: %s\n", str);
> > return -1;
> > @@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
> > }
> >
> > if (asprintf(&c2c.cl_output,
> > - "%s%s%s%s%s%s%s%s%s%s",
> > + "%s%s%s%s%s%s%s%s%s%s%s",
> > c2c.use_stdio ? "cl_num_empty," : "",
> > - "percent_rmt_hitm,"
> > - "percent_lcl_hitm,"
> > + c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
> > + "percent_ld_miss," :
> > + "percent_rmt_hitm,"
> > + "percent_lcl_hitm,",
> > "percent_stores_l1hit,"
> > "percent_stores_l1miss,"
> > "offset,offset_node,dcacheline_count,",
> > @@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
> > static int setup_coalesce(const char *coalesce, bool no_source)
> > {
> > const char *c = coalesce ?: coalesce_default;
> > + const char *sort_str = NULL;
> >
> > if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
> > return -ENOMEM;
> > @@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
> > if (build_cl_output(c2c.cl_sort, no_source))
> > return -1;
> >
> > - if (asprintf(&c2c.cl_resort, "offset,%s",
> > - c2c.display == DISPLAY_TOT ?
> > - "tot_hitm" :
> > - c2c.display == DISPLAY_RMT ?
> > - "rmt_hitm,lcl_hitm" :
> > - "lcl_hitm,rmt_hitm") < 0)
> > + if (c2c.display == DISPLAY_TOT)
> > + sort_str = "tot_hitm";
> > + else if (c2c.display == DISPLAY_RMT)
> > + sort_str = "rmt_hitm,lcl_hitm";
> > + else if (c2c.display == DISPLAY_LCL)
> > + sort_str = "lcl_hitm,rmt_hitm";
> > + else if (c2c.display == DISPLAY_ALL)
> > + sort_str = "tot_ld_hit";
> > +
> > + if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
> > return -ENOMEM;
> >
> > pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
> > @@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
> > goto out_mem2node;
> > }
> >
> > - output_str = "cl_idx,"
> > - "dcacheline,"
> > - "dcacheline_node,"
> > - "dcacheline_count,"
> > - "percent_hitm,"
> > - "tot_hitm,lcl_hitm,rmt_hitm,"
> > - "tot_recs,"
> > - "tot_loads,"
> > - "tot_stores,"
> > - "stores_l1hit,stores_l1miss,"
> > - "ld_fbhit,ld_l1hit,ld_l2hit,"
> > - "ld_lclhit,lcl_hitm,"
> > - "ld_rmthit,rmt_hitm,"
> > - "dram_lcl,dram_rmt";
> > + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> > + c2c.display == DISPLAY_RMT)
> > + output_str = "cl_idx,"
> > + "dcacheline,"
> > + "dcacheline_node,"
> > + "dcacheline_count,"
> > + "percent_hitm,"
> > + "tot_hitm,lcl_hitm,rmt_hitm,"
> > + "tot_recs,"
> > + "tot_loads,"
> > + "tot_stores,"
> > + "stores_l1hit,stores_l1miss,"
> > + "ld_fbhit,ld_l1hit,ld_l2hit,"
> > + "ld_lclhit,lcl_hitm,"
> > + "ld_rmthit,rmt_hitm,"
> > + "dram_lcl,dram_rmt";
> > + else /* c2c.display == DISPLAY_ALL */
> > + output_str = "cl_idx,"
> > + "dcacheline,"
> > + "dcacheline_node,"
> > + "dcacheline_count,"
> > + "percent_tot_ld_hit,"
> > + "tot_ld_hit,"
> > + "tot_recs,"
> > + "tot_loads,"
> > + "tot_stores,"
> > + "stores_l1hit,stores_l1miss,"
> > + "ld_fbhit,ld_l1hit,ld_l2hit,"
> > + "ld_lclhit,lcl_hitm,"
> > + "ld_rmthit,rmt_hitm,"
> > + "dram_lcl,dram_rmt";
>
> Ditto.
Agreed.
Thanks,
Leo
* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
2021-01-04 9:35 ` Jiri Olsa
@ 2021-01-15 15:17 ` Arnaldo Carvalho de Melo
2021-01-16 0:45 ` Leo Yan
0 siblings, 1 reply; 31+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-01-15 15:17 UTC (permalink / raw)
To: Jiri Olsa
Cc: Leo Yan, Peter Zijlstra, Ingo Molnar, Mark Rutland,
Alexander Shishkin, Namhyung Kim, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
Em Mon, Jan 04, 2021 at 10:35:40AM +0100, Jiri Olsa escreveu:
> On Mon, Jan 04, 2021 at 10:09:38AM +0800, Leo Yan wrote:
>
> SNIP
>
> > > > Leo Yan (11):
> > > > perf c2c: Add dimensions for total load hit
> > > > perf c2c: Add dimensions for load hit
> > > > perf c2c: Add dimensions for load miss
> > > > perf c2c: Rename for shared cache line stats
> > > > perf c2c: Refactor hist entry validation
> > > > perf c2c: Refactor display filter macro
> > > > perf c2c: Refactor node display macro
> > > > perf c2c: Refactor node header
> > > > perf c2c: Add local variables for output metrics
> > > > perf c2c: Sort on all cache hit for load operations
> > > > perf c2c: Update documentation for display option 'all'
> > > >
> > > > tools/perf/Documentation/perf-c2c.txt | 21 +-
> > > > tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
> > > > 2 files changed, 487 insertions(+), 82 deletions(-)
> > >
> > > Joe might want to test it first, but it looks all good to me:
> > >
> > > Acked-by: Jiri Olsa <jolsa@redhat.com>
> >
> > Thanks for the review, Jiri.
> >
> > Note, after testing with Arm SPE, we found that store operations don't
> > contain the information for L1 cache hit or miss; this leads to there
> > being no statistics for "st_l1hit" and "st_l1miss", so the single
> > cache line view can only show the load samples and fails to show store
> > operations due to the empty "st_l1hit" and "st_l1miss" statistics.
> >
> > This is related to a hardware issue; after some internal discussion,
> > so far we cannot find an easy way to set the memory flags for L1 cache
> > hit or miss for store operations (more specifically, to set the flags
> > PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS for a store's L1 cache access).
> >
> > Given the uncertainty around this issue, please hold off on this patch
> > series and I will resend if we reach any conclusion.
> >
> > And I'm really sorry for notifying you about this so late.
>
> no problem, I think we can take some of the refactoring patches anyway
Agreed, in fact I already processed this series in my local branch and
I'm test building everything now.
- Arnaldo
* Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
2021-01-15 15:17 ` Arnaldo Carvalho de Melo
@ 2021-01-16 0:45 ` Leo Yan
0 siblings, 0 replies; 31+ messages in thread
From: Leo Yan @ 2021-01-16 0:45 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Peter Zijlstra, Ingo Molnar, Mark Rutland,
Alexander Shishkin, Namhyung Kim, Andi Kleen, Ian Rogers,
Kan Liang, Joe Mario, David Ahern, Don Zickus, Al Grant,
James Clark, linux-kernel
On Fri, Jan 15, 2021 at 12:17:01PM -0300, Arnaldo Carvalho de Melo wrote:
[...]
> > > Thanks for the review, Jiri.
> > >
> > > Note, after testing with Arm SPE, we found that store operations don't
> > > contain the information for L1 cache hit or miss; this leads to there
> > > being no statistics for "st_l1hit" and "st_l1miss", so the single
> > > cache line view can only show the load samples and fails to show store
> > > operations due to the empty "st_l1hit" and "st_l1miss" statistics.
> > >
> > > This is related to a hardware issue; after some internal discussion,
> > > so far we cannot find an easy way to set the memory flags for L1 cache
> > > hit or miss for store operations (more specifically, to set the flags
> > > PERF_MEM_LVL_HIT or PERF_MEM_LVL_MISS for a store's L1 cache access).
> > >
> > > Given the uncertainty around this issue, please hold off on this patch
> > > series and I will resend if we reach any conclusion.
> > >
> > > And I'm really sorry for notifying you about this so late.
> >
> > no problem, I think we can take some of the refactoring patches anyway
>
> Agreed, in fact I already processed this series in my local branch and
> I'm test building everything now.
Thanks a lot, Arnaldo.
Just a reminder in case you missed the latest series in the flood of
emails: I have extracted and refined the refactoring patches and sent
out patch series v4 [1] for picking up (it has received Acked-by tags
from Namhyung and Jiri).
Leo
Thread overview: 31+ messages
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2021-01-11 7:49 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2021-01-11 8:22 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
2021-01-06 7:42 ` Namhyung Kim
2021-01-11 8:41 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
2021-01-06 7:44 ` Namhyung Kim
2021-01-11 8:42 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:43 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:44 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
2021-01-06 7:52 ` Namhyung Kim
2021-01-11 8:47 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
2021-01-04 2:09 ` Leo Yan
2021-01-04 9:35 ` Jiri Olsa
2021-01-15 15:17 ` Arnaldo Carvalho de Melo
2021-01-16 0:45 ` Leo Yan