From: Namhyung Kim <namhyung@kernel.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
Jiri Olsa <jolsa@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Mark Rutland <mark.rutland@arm.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Andi Kleen <ak@linux.intel.com>, Ian Rogers <irogers@google.com>,
Kan Liang <kan.liang@linux.intel.com>,
Joe Mario <jmario@redhat.com>, David Ahern <dsahern@gmail.com>,
Don Zickus <dzickus@redhat.com>, Al Grant <Al.Grant@arm.com>,
James Clark <james.clark@arm.com>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations
Date: Wed, 6 Jan 2021 16:52:15 +0900
Message-ID: <CAM9d7cj=SDLyCgBOKfEX91s7VrWbJZa-Qn+8SuE+rzC+vGGs_A@mail.gmail.com>
In-Reply-To: <20201213133850.10070-11-leo.yan@linaro.org>
On Sun, Dec 13, 2020 at 10:39 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> Besides the three existing display options 'tot', 'rmt' and 'lcl', this
> patch adds the option 'all' so that sorting can be done on all cache
> hits for load operations.  This newly introduced option gives a choice
> for profiling cache false sharing when the memory events don't contain
> HITM tags.
>
> When displaying with the option 'all', both the "Shared Data Cache Line
> Table" and the "Shared Cache Line Distribution Pareto" differ from the
> other three display options.
>
> For the "Shared Data Cache Line Table", instead of sorting on HITM
> metrics, it sorts on the metrics "tot_ld_hit" and "percent_tot_ld_hit".
> Even without HITM metrics, users can then analyze the load hit
> statistics for all cache levels, so the total load hit dimensions are
> used to replace the HITM dimensions.
>
> For the Pareto, every single cache line shows the metrics
> "cl_tot_ld_hit" and "cl_tot_ld_miss" instead of "cl_rmt_hitm" and
> "cl_lcl_hitm", and the single cache line view is sorted by the metric
> "tot_ld_hit".
>
> As a result, we can get the 'all' display as follows:
>
> # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio
>
> [...]
>
> =================================================
> Shared Data Cache Line Table
> =================================================
> #
> # ----------- Cacheline ---------- Load Hit Load Hit Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ----
> # Index Address Node PA cnt Pct Total records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt
> # ..... .................. .... ...... ........ ........ ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........
> #
> 0 0x556f25dff100 0 1895 75.73% 4591 7840 4591 3249 2633 616 849 2734 67 58 883 0 0 0 0
> 1 0x556f25dff080 0 1 13.10% 794 794 794 0 0 0 164 486 28 20 96 0 0 0 0
> 2 0x556f25dff0c0 0 1 10.01% 607 607 607 0 0 0 107 5 5 488 2 0 0 0 0
>
> =================================================
> Shared Cache Line Distribution Pareto
> =================================================
> #
> # -- Load Refs -- -- Store Refs -- --------- Data address --------- ---------- cycles ---------- Total cpu Shared
> # Num Hit Miss L1 Hit L1 Miss Offset Node PA cnt Pid Tid Code address rmt hitm lcl hitm load records cnt Symbol Object Source:Line Node
> # ..... ....... ....... ....... ....... .................. .... ...... ....... .................. .................. ........ ........ ........ ....... ........ ................... ................. ........................... ....
> #
> -------------------------------------------------------------
> 0 4591 0 2633 616 0x556f25dff100
> -------------------------------------------------------------
> 20.52% 0.00% 0.00% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc1d 0 2200 1276 942 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 19.82% 0.00% 38.06% 0.00% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc16 0 2190 1130 1912 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 18.25% 0.00% 56.63% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc16 0 2173 1074 2329 1 [.] read_write_func false_sharing.exe false_sharing_example.c:145 0
> 18.23% 0.00% 0.00% 0.00% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc1d 0 2013 1220 837 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 3.11% 59.90% 0x0 0 1 28079 28081:lock_th 0x556f25bfdc28 0 0 0 451 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 0.00% 0.00% 2.20% 40.10% 0x0 0 1 28079 28082:lock_th 0x556f25bfdc28 0 0 0 305 1 [.] read_write_func false_sharing.exe false_sharing_example.c:146 0
> 12.00% 0.00% 0.00% 0.00% 0x20 0 1 28079 28083:reader_thd 0x556f25bfdc73 0 159 107 551 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
> 11.17% 0.00% 0.00% 0.00% 0x20 0 1 28079 28084:reader_thd 0x556f25bfdc73 0 148 108 513 1 [.] read_write_func false_sharing.exe false_sharing_example.c:155 0
>
> [...]
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> tools/perf/builtin-c2c.c | 139 ++++++++++++++++++++++++++++-----------
> 1 file changed, 101 insertions(+), 38 deletions(-)
>
> diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
> index 9342c30d86ee..0df4a4a30f7a 100644
> --- a/tools/perf/builtin-c2c.c
> +++ b/tools/perf/builtin-c2c.c
[SNIP]
> @@ -2502,12 +2528,21 @@ static void print_pareto(FILE *out)
> int ret;
> const char *cl_output;
>
> - cl_output = "cl_num,"
> - "cl_rmt_hitm,"
> - "cl_lcl_hitm,"
> - "cl_stores_l1hit,"
> - "cl_stores_l1miss,"
> - "dcacheline";
> + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> + c2c.display == DISPLAY_RMT)
> + cl_output = "cl_num,"
> + "cl_rmt_hitm,"
> + "cl_lcl_hitm,"
> + "cl_stores_l1hit,"
> + "cl_stores_l1miss,"
> + "dcacheline";
> + else /* c2c.display == DISPLAY_ALL */
> + cl_output = "cl_num,"
> + "cl_tot_ld_hit,"
> + "cl_tot_ld_miss,"
> + "cl_stores_l1hit,"
> + "cl_stores_l1miss,"
> + "dcacheline";
Nit: you can keep the default value as is, and add an if statement
just for the DISPLAY_ALL case.
>
> perf_hpp_list__init(&hpp_list);
> ret = hpp_list__parse(&hpp_list, cl_output, NULL);
> @@ -2543,7 +2578,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
> fprintf(out, "%-36s: %s\n", first ? " Events" : "", evsel__name(evsel));
> first = false;
> }
> - fprintf(out, " Cachelines sort on : %s HITMs\n",
> + fprintf(out, " Cachelines sort on : %s\n",
> display_str[c2c.display]);
> fprintf(out, " Cacheline data grouping : %s\n", c2c.cl_sort);
> }
> @@ -2700,7 +2735,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
> {
> scnprintf(bf, size,
> "Shared Data Cache Line Table "
> - "(%lu entries, sorted on %s HITMs)",
> + "(%lu entries, sorted on %s)",
> browser->nr_non_filtered_entries,
> display_str[c2c.display]);
> return 0;
> @@ -2906,6 +2941,8 @@ static int setup_display(const char *str)
> c2c.display = DISPLAY_RMT;
> else if (!strcmp(display, "lcl"))
> c2c.display = DISPLAY_LCL;
> + else if (!strcmp(display, "all"))
> + c2c.display = DISPLAY_ALL;
> else {
> pr_err("failed: unknown display type: %s\n", str);
> return -1;
> @@ -2952,10 +2989,12 @@ static int build_cl_output(char *cl_sort, bool no_source)
> }
>
> if (asprintf(&c2c.cl_output,
> - "%s%s%s%s%s%s%s%s%s%s",
> + "%s%s%s%s%s%s%s%s%s%s%s",
> c2c.use_stdio ? "cl_num_empty," : "",
> - "percent_rmt_hitm,"
> - "percent_lcl_hitm,"
> + c2c.display == DISPLAY_ALL ? "percent_ld_hit,"
> + "percent_ld_miss," :
> + "percent_rmt_hitm,"
> + "percent_lcl_hitm,",
> "percent_stores_l1hit,"
> "percent_stores_l1miss,"
> "offset,offset_node,dcacheline_count,",
> @@ -2984,6 +3023,7 @@ static int build_cl_output(char *cl_sort, bool no_source)
> static int setup_coalesce(const char *coalesce, bool no_source)
> {
> const char *c = coalesce ?: coalesce_default;
> + const char *sort_str = NULL;
>
> if (asprintf(&c2c.cl_sort, "offset,%s", c) < 0)
> return -ENOMEM;
> @@ -2991,12 +3031,16 @@ static int setup_coalesce(const char *coalesce, bool no_source)
> if (build_cl_output(c2c.cl_sort, no_source))
> return -1;
>
> - if (asprintf(&c2c.cl_resort, "offset,%s",
> - c2c.display == DISPLAY_TOT ?
> - "tot_hitm" :
> - c2c.display == DISPLAY_RMT ?
> - "rmt_hitm,lcl_hitm" :
> - "lcl_hitm,rmt_hitm") < 0)
> + if (c2c.display == DISPLAY_TOT)
> + sort_str = "tot_hitm";
> + else if (c2c.display == DISPLAY_RMT)
> + sort_str = "rmt_hitm,lcl_hitm";
> + else if (c2c.display == DISPLAY_LCL)
> + sort_str = "lcl_hitm,rmt_hitm";
> + else if (c2c.display == DISPLAY_ALL)
> + sort_str = "tot_ld_hit";
> +
> + if (asprintf(&c2c.cl_resort, "offset,%s", sort_str) < 0)
> return -ENOMEM;
>
> pr_debug("coalesce sort fields: %s\n", c2c.cl_sort);
> @@ -3131,20 +3175,37 @@ static int perf_c2c__report(int argc, const char **argv)
> goto out_mem2node;
> }
>
> - output_str = "cl_idx,"
> - "dcacheline,"
> - "dcacheline_node,"
> - "dcacheline_count,"
> - "percent_hitm,"
> - "tot_hitm,lcl_hitm,rmt_hitm,"
> - "tot_recs,"
> - "tot_loads,"
> - "tot_stores,"
> - "stores_l1hit,stores_l1miss,"
> - "ld_fbhit,ld_l1hit,ld_l2hit,"
> - "ld_lclhit,lcl_hitm,"
> - "ld_rmthit,rmt_hitm,"
> - "dram_lcl,dram_rmt";
> + if (c2c.display == DISPLAY_TOT || c2c.display == DISPLAY_LCL ||
> + c2c.display == DISPLAY_RMT)
> + output_str = "cl_idx,"
> + "dcacheline,"
> + "dcacheline_node,"
> + "dcacheline_count,"
> + "percent_hitm,"
> + "tot_hitm,lcl_hitm,rmt_hitm,"
> + "tot_recs,"
> + "tot_loads,"
> + "tot_stores,"
> + "stores_l1hit,stores_l1miss,"
> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
> + else /* c2c.display == DISPLAY_ALL */
> + output_str = "cl_idx,"
> + "dcacheline,"
> + "dcacheline_node,"
> + "dcacheline_count,"
> + "percent_tot_ld_hit,"
> + "tot_ld_hit,"
> + "tot_recs,"
> + "tot_loads,"
> + "tot_stores,"
> + "stores_l1hit,stores_l1miss,"
> + "ld_fbhit,ld_l1hit,ld_l2hit,"
> + "ld_lclhit,lcl_hitm,"
> + "ld_rmthit,rmt_hitm,"
> + "dram_lcl,dram_rmt";
Ditto.
Thanks,
Namhyung
>
> if (c2c.display == DISPLAY_TOT)
> sort_str = "tot_hitm";
> @@ -3152,6 +3213,8 @@ static int perf_c2c__report(int argc, const char **argv)
> sort_str = "rmt_hitm";
> else if (c2c.display == DISPLAY_LCL)
> sort_str = "lcl_hitm";
> + else if (c2c.display == DISPLAY_ALL)
> + sort_str = "tot_ld_hit";
>
> c2c_hists__reinit(&c2c.hists, output_str, sort_str);
>
> --
> 2.17.1
>
Thread overview: 31+ messages
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2021-01-11 7:49 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
2021-01-06 7:38 ` Namhyung Kim
2021-01-11 8:22 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
2021-01-06 7:42 ` Namhyung Kim
2021-01-11 8:41 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
2021-01-06 7:44 ` Namhyung Kim
2021-01-11 8:42 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:43 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
2021-01-06 7:47 ` Namhyung Kim
2021-01-11 8:44 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
2021-01-06 7:52 ` Namhyung Kim [this message]
2021-01-11 8:47 ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
2021-01-04 2:09 ` Leo Yan
2021-01-04 9:35 ` Jiri Olsa
2021-01-15 15:17 ` Arnaldo Carvalho de Melo
2021-01-16 0:45 ` Leo Yan