linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Namhyung Kim <namhyung@kernel.org>
To: Joe Mario <jmario@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Stephane Eranian <eranian@google.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	linux-toolchains@vger.kernel.org,
	linux-trace-devel@vger.kernel.org,
	Ben Woodard <woodard@redhat.com>,
	Kees Cook <keescook@chromium.org>,
	David Blaikie <blaikie@google.com>, Xu Liu <xliuprof@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Ravi Bangoria <ravi.bangoria@amd.com>
Subject: Re: [RFC 00/48] perf tools: Introduce data type profiling (v1)
Date: Wed, 8 Nov 2023 20:48:50 -0800	[thread overview]
Message-ID: <CAM9d7ciWY6kvVB-JkrJS7BZs_WMy-OZRgwLvBG9WQ-Zi00czfA@mail.gmail.com> (raw)
In-Reply-To: <82cd8b7e-bd46-49ed-9160-eabcfd4c3c20@redhat.com>

Hello,

On Wed, Nov 8, 2023 at 9:12 AM Joe Mario <jmario@redhat.com> wrote:
>
> Hi Namhyung:
>
> I've been playing with your datatype profile patch and it looks really promising.
> I think it would be a big help if it could be integrated into perf c2c.

Great!  Yeah, I think we can collaborate on it.

>
> Perf c2c gives a great insight into what's contributing to cpu cacheline contention, but it
> can be difficult to understand the output.  Having visuals with your datatype profile output
> would be a big help.

Exactly.

>
> I have a simple test program with readers and writers tugging on the data below:
>
>   uint64_t hotVar;
>   typedef struct __foo {
>      uint64_t m1;
>      uint64_t m2;
>      uint64_t m3;
>   } FOO;
>
> The rest of this reply looks at both your datatype output and c2c to see where they
> might compliment each other.
>
>
> When I run perf with your patches on a simple program to cause contention on the above data, I get the following:
>
> # perf mem record --ldlat=1 --all-user --  ./tugtest -r3 -r5 -r7 -r9 -r11 -w10 -w12 -w14 -w16 -b5 -H2000000
> # perf report -s type,typeoff --hierarchy --stdio
>
>    # Samples: 26K of event 'cpu/mem-loads,ldlat=1/P'
>    # Event count (approx.): 2958226
>    #
>    #    Overhead  Data Type / Data Type Offset
>    # ...........  ............................
>    #
>        54.50%     int
>           54.50%     int +0 (no field)
>        23.21%     long int
>           23.21%     long int +0 (no field)
>        18.30%     struct __foo
>            9.57%     struct __foo +8 (m2)
>            8.73%     struct __foo +0 (m1)
>         3.86%     long unsigned int
>            3.86%     long unsigned int +0 (no field)
>        <snip>
>
>    # Samples: 30K of event 'cpu/mem-stores/P'
>    # Event count (approx.): 33880197
>    #
>    #    Overhead  Data Type / Data Type Offset
>    # ...........  ............................
>    #
>        99.85%     struct __foo
>           70.48%     struct __foo +0 (m1)
>           29.34%     struct __foo +16 (m3)
>            0.03%     struct __foo +8 (m2)
>         0.09%     long unsigned int
>            0.09%     long unsigned int +0 (no field)
>         0.06%     (unknown)
>            0.06%     (unknown) +0 (no field)
>        <snip>
>
> Then I run perf annotate with your patches, and I get the following:
>
>   # perf annotate  --data-type
>
>    Annotate type: 'long int' in /home/joe/tugtest/tugtest (2901 samples):
>    ============================================================================
>        samples     offset       size  field
>           2901          0          8  long int  ;
>
>    Annotate type: 'struct __foo' in /home/joe/tugtest/tugtest (5593 samples):
>    ============================================================================
>        samples     offset       size  field
>           5593          0         24  struct __foo       {
>           2755          0          8      uint64_t      m1;
>           2838          8          8      uint64_t      m2;
>              0         16          8      uint64_t      m3;
>                                       };
>
> Now when I run that same simple test using perf c2c, and I focus on the cachline that the struct and hotVar reside in, I get:
>
> # perf c2c record --all-user -- ./tugtest -r3 -r5 -r7 -r9 -r11 -w10 -w12 -w14 -w16 -b5 -H2000000
> # perf c2c report -NNN --stdio
> # <snip>
> #
> #      ----- HITM -----  ------- Store Refs ------  ------ Data address ------                ---------- cycles ----------    Total    cpu               Shared
> # Num  RmtHitm  LclHitm   L1 Hit  L1 Miss      N/A        Offset  Node  PA cnt  Code address  rmt hitm  lcl hitm      load  records    cnt      Symbol   Object    Source:Line  Node{cpu list}
> #....  .......  .......  .......  .......  .......  ............  ....  ......  ............  ........  ........  ........  .......  .....  ..........  .......  .............  ....
> #
>  ---------------------------------------------------------------
>     0     1094     2008    17071    13762        0      0x406100
>  ---------------------------------------------------------------
>          0.00%    0.20%    0.00%    0.00%    0.00%           0x8     1       1      0x401355         0       978      1020     2962      4  [.] writer  tugtest  tugtest.c:129   0{10,12,14,16}
>          0.00%    0.00%    0.12%    0.02%    0.00%           0x8     1       1      0x401360         0         0         0       23      4  [.] writer  tugtest  tugtest.c:129   0{10,12,14,16}
>         68.10%   60.26%    0.00%    0.00%    0.00%          0x10     1       1      0x401505      2181      1541      1393     5813      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>         31.63%   39.34%    0.00%    0.00%    0.00%          0x10     1       1      0x401331      1242      1095       936     3393      4  [.] writer  tugtest  tugtest.c:127   0{10,12,14,16}
>          0.00%    0.00%   40.03%   40.25%    0.00%          0x10     1       1      0x40133c         0         0         0    12372      4  [.] writer  tugtest  tugtest.c:127   0{10,12,14,16}
>          0.27%    0.15%    0.00%    0.00%    0.00%          0x18     1       1      0x401343       834      1136      1032     2930      4  [.] writer  tugtest  tugtest.c:128   0{10,12,14,16}
>          0.00%    0.05%    0.00%    0.00%    0.00%          0x18     1       1      0x40150c         0       933      1567     5050      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>          0.00%    0.00%    0.06%    0.00%    0.00%          0x18     1       1      0x40134e         0         0         0       10      4  [.] writer  tugtest  tugtest.c:128   0{10,12,14,16}
>          0.00%    0.00%   59.80%   59.73%    0.00%          0x20     1       1      0x401516         0         0         0    18428      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>
> With the above c2c output, we can see:
>  - the hottest contended addresses, and the load latencies they caused.
>  - the cacheline offset for the contended addresses.
>  - the cpus and numa nodes where the accesses came from.
>  - the cacheline alignment for the data of interest.
>  - the number of cpus and threads concurrently accessing each address.
>  - the breakdown of reads causing HITM (contention) and writes hitting or missing the cacheline.
>  - the object name, source line and line number for where the accesses occured.
>  - the numa node where the data is allocated.
>  - the number of physical pages the virtual addresses were mapped to (e.g. numa_balancing).
>
> What would really help the c2c output be more usable is if it had a better visual to it.
> It's likely the current c2c output can be trimmed a bit.
>
> Here's one idea that incorporates your datatype info, though I'm sure there are better ways, as this may get unwieldy.:
>
> #      ----- HITM -----  ------- Store Refs ------  ------ Data address ------                ---------- cycles ----------    Total    cpu               Shared
> # Num  RmtHitm  LclHitm   L1 Hit  L1 Miss      N/A        Offset  Node  PA cnt  Code address  rmt hitm  lcl hitm      load  records    cnt      Symbol   Object    Source:Line  Node{cpu list}
> #....  .......  .......  .......  .......  .......  ............  ....  ......  ............  ........  ........  ........  .......  .....  ..........  .......  .............  ....
> #
>  ---------------------------------------------------------------
>     0     1094     2008    17071    13762        0      0x406100
>  ---------------------------------------------------------------
>   uint64_t hotVar: tugtest.c:38
>          0.00%    0.20%    0.00%    0.00%    0.00%           0x8     1       1      0x401355         0       978      1020     2962      4  [.] writer  tugtest  tugtest.c:129   0{10,12,14,16}
>          0.00%    0.00%    0.12%    0.02%    0.00%           0x8     1       1      0x401360         0         0         0       23      4  [.] writer  tugtest  tugtest.c:129   0{10,12,14,16}
>   struct __foo uint64_t m1: tugtest.c:39
>         68.10%   60.26%    0.00%    0.00%    0.00%          0x10     1       1      0x401505      2181      1541      1393     5813      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>         31.63%   39.34%    0.00%    0.00%    0.00%          0x10     1       1      0x401331      1242      1095       936     3393      4  [.] writer  tugtest  tugtest.c:127   0{10,12,14,16}
>          0.00%    0.00%   40.03%   40.25%    0.00%          0x10     1       1      0x40133c         0         0         0    12372      4  [.] writer  tugtest  tugtest.c:127   0{10,12,14,16}
>   struct __foo uint64_t m2: tugtest.c:40
>          0.27%    0.15%    0.00%    0.00%    0.00%          0x18     1       1      0x401343       834      1136      1032     2930      4  [.] writer  tugtest  tugtest.c:128   0{10,12,14,16}
>          0.00%    0.05%    0.00%    0.00%    0.00%          0x18     1       1      0x40150c         0       933      1567     5050      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>          0.00%    0.00%    0.06%    0.00%    0.00%          0x18     1       1      0x40134e         0         0         0       10      4  [.] writer  tugtest  tugtest.c:128   0{10,12,14,16}
>   struct __foo uint64_t m3: tugtest.c:41
>          0.00%    0.00%   59.80%   59.73%    0.00%          0x20     1       1      0x401516         0         0         0    18428      5  [.] reader  tugtest  tugtest.c:163   1{3,5,7,9,11}
>
> And then it would be good to find a clean way to incorporate your sample counts.

I'm not sure we can get the exact source line for the data type/fields.
Of course, we can aggregate the results for each field.  Actually you
can use `perf report -s type,typeoff,symoff --hierarchy` for something
similar. :)

>
> On a related note, is there a way the accesses could be broken down into read counts
> and write counts?   That, with the above source line info for all the accesses,
> helps to convey a picture of "the affinity of the accesses".

Sure, perf report already supports showing events in a group
together.  You can use --group option to force grouping
individual events.  perf annotate with --data-type doesn't have
that yet.  I'll update it in v2.

>
> For example, while it's normally good to separate read-mostly data from hot
> written data, if the reads and writes are done together in the same block of
> code by the same thread, then keeping the two data symbols in the same cacheline
> could be a win.  I've seen this often. Your datatype info might be able to
> make these affinities more visible to the user.
>
> Thanks for doing this. This is great.
> Joe

Thanks for your feedback!
Namhyung

      reply	other threads:[~2023-11-09  4:49 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-12  3:50 [RFC 00/48] perf tools: Introduce data type profiling (v1) Namhyung Kim
2023-10-12  3:50 ` [PATCH 01/48] perf annotate: Move raw_comment and raw_func_start Namhyung Kim
2023-10-12  3:50 ` [PATCH 02/48] perf annotate: Check if operand has multiple regs Namhyung Kim
2023-11-27 19:05   ` Arnaldo Carvalho de Melo
2023-10-12  3:50 ` [PATCH 03/48] perf tools: Add util/debuginfo.[ch] files Namhyung Kim
2023-10-12  3:50 ` [PATCH 04/48] perf dwarf-aux: Fix die_get_typename() for void * Namhyung Kim
2023-11-04 10:52   ` Masami Hiramatsu
2023-10-12  3:50 ` [PATCH 05/48] perf dwarf-aux: Move #ifdef code to the header file Namhyung Kim
2023-11-04 10:59   ` Masami Hiramatsu
2023-10-12  3:50 ` [PATCH 06/48] perf dwarf-aux: Add die_get_scopes() helper Namhyung Kim
2023-11-05  9:50   ` Masami Hiramatsu
2023-10-12  3:50 ` [PATCH 07/48] perf dwarf-aux: Add die_find_variable_by_reg() helper Namhyung Kim
2023-11-05  9:48   ` Masami Hiramatsu
2023-10-12  3:50 ` [PATCH 08/48] perf dwarf-aux: Factor out __die_get_typename() Namhyung Kim
2023-11-05  9:07   ` Masami Hiramatsu
2023-11-06  4:01     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 09/48] perf dwarf-regs: Add get_dwarf_regnum() Namhyung Kim
2023-11-05  8:36   ` Masami Hiramatsu
2023-11-06  4:12     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 10/48] perf annotate-data: Add find_data_type() Namhyung Kim
2023-10-12  3:50 ` [PATCH 11/48] perf annotate-data: Add dso->data_types tree Namhyung Kim
2023-10-12  3:50 ` [PATCH 12/48] perf annotate: Factor out evsel__get_arch() Namhyung Kim
2023-10-12  3:50 ` [PATCH 13/48] perf annotate: Add annotate_get_insn_location() Namhyung Kim
2023-10-23 16:38   ` Arnaldo Carvalho de Melo
2023-10-24 19:10     ` Namhyung Kim
2023-10-26  5:26       ` Namhyung Kim
2023-10-26 19:37         ` Arnaldo Carvalho de Melo
2023-10-12  3:50 ` [PATCH 14/48] perf annotate: Implement hist_entry__get_data_type() Namhyung Kim
2023-10-12  3:50 ` [PATCH 15/48] perf report: Add 'type' sort key Namhyung Kim
2023-10-23 16:53   ` Arnaldo Carvalho de Melo
2023-10-24 19:11     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 16/48] perf report: Support data type profiling Namhyung Kim
2023-10-12  3:50 ` [PATCH 17/48] perf annotate-data: Add member field in the data type Namhyung Kim
2023-10-12  3:50 ` [PATCH 18/48] perf annotate-data: Update sample histogram for type Namhyung Kim
2023-10-12  3:50 ` [PATCH 19/48] perf report: Add 'typeoff' sort key Namhyung Kim
2023-10-12  3:50 ` [PATCH 20/48] perf report: Add 'symoff' " Namhyung Kim
2023-10-12  3:50 ` [PATCH 21/48] perf annotate: Add --data-type option Namhyung Kim
2023-10-12  3:50 ` [PATCH 22/48] perf annotate: Add --type-stat option for debugging Namhyung Kim
2023-10-23 17:28   ` Arnaldo Carvalho de Melo
2023-10-23 17:40     ` Arnaldo Carvalho de Melo
2023-10-24 19:12       ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 23/48] perf annotate: Add --insn-stat " Namhyung Kim
2023-10-12  3:50 ` [PATCH 24/48] perf annotate-data: Parse 'lock' prefix from llvm-objdump Namhyung Kim
2023-10-12  3:50 ` [PATCH 25/48] perf annotate-data: Handle macro fusion on x86 Namhyung Kim
2023-10-12  3:50 ` [PATCH 26/48] perf annotate-data: Handle array style accesses Namhyung Kim
2023-10-12  3:50 ` [PATCH 27/48] perf annotate-data: Add stack operation pseudo type Namhyung Kim
2023-10-12  3:50 ` [PATCH 28/48] perf dwarf-aux: Add die_find_variable_by_addr() Namhyung Kim
2023-11-06 15:25   ` Masami Hiramatsu
2023-11-09  5:36     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 29/48] perf annotate-data: Handle PC-relative addressing Namhyung Kim
2023-10-12  3:50 ` [PATCH 30/48] perf annotate-data: Support global variables Namhyung Kim
2023-10-12  3:50 ` [PATCH 31/48] perf dwarf-aux: Add die_get_cfa() Namhyung Kim
2023-11-07  0:50   ` Masami Hiramatsu
2023-11-08  5:28     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 32/48] perf annotate-data: Support stack variables Namhyung Kim
2023-10-12  3:50 ` [PATCH 33/48] perf dwarf-aux: Check allowed DWARF Ops Namhyung Kim
2023-11-07  9:32   ` Masami Hiramatsu
2023-11-08  5:34     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 34/48] perf dwarf-aux: Add die_collect_vars() Namhyung Kim
2023-11-08 10:52   ` Masami Hiramatsu
2023-11-09  5:05     ` Namhyung Kim
2023-10-12  3:50 ` [PATCH 35/48] perf dwarf-aux: Handle type transfer for memory access Namhyung Kim
2023-11-08 10:57   ` Masami Hiramatsu
2023-10-12  3:50 ` [PATCH 36/48] perf annotate-data: Introduce struct data_loc_info Namhyung Kim
2023-12-03 16:22   ` Athira Rajeev
2023-12-05  0:10     ` Namhyung Kim
2023-12-05  7:17       ` Athira Rajeev
2023-10-12  3:51 ` [PATCH 37/48] perf map: Add map__objdump_2rip() Namhyung Kim
2023-10-12  3:51 ` [PATCH 38/48] perf annotate: Add annotate_get_basic_blocks() Namhyung Kim
2023-10-12  3:51 ` [PATCH 39/48] perf annotate-data: Maintain variable type info Namhyung Kim
2023-10-12  3:51 ` [PATCH 40/48] perf annotate-data: Add update_insn_state() Namhyung Kim
2023-10-12  3:51 ` [PATCH 41/48] perf annotate-data: Handle global variable access Namhyung Kim
2023-10-12  3:51 ` [PATCH 42/48] perf annotate-data: Handle call instructions Namhyung Kim
2023-10-12  3:51 ` [PATCH 43/48] perf annotate-data: Implement instruction tracking Namhyung Kim
2023-10-12  3:51 ` [PATCH 44/48] perf annotate: Parse x86 segment register location Namhyung Kim
2023-10-12  3:51 ` [PATCH 45/48] perf annotate-data: Handle this-cpu variables in kernel Namhyung Kim
2023-10-12  3:51 ` [PATCH 46/48] perf annotate-data: Track instructions with a this-cpu variable Namhyung Kim
2023-10-12  3:51 ` [PATCH 47/48] perf annotate-data: Add stack canary type Namhyung Kim
2023-10-12  3:51 ` [PATCH 48/48] perf annotate-data: Add debug message Namhyung Kim
2023-10-12  6:03 ` [RFC 00/48] perf tools: Introduce data type profiling (v1) Ingo Molnar
2023-10-12 16:19   ` Namhyung Kim
2023-10-12 18:33     ` Ingo Molnar
2023-10-12 20:45       ` Namhyung Kim
2023-10-12  9:11 ` Peter Zijlstra
2023-10-12 16:41   ` Namhyung Kim
     [not found]     ` <CADzB+2mu98v9EUsA1Y-wVDSrXT2kznKi87Tb6QdN5y4mMFNsyg@mail.gmail.com>
2023-10-25  5:58       ` Namhyung Kim
2023-10-12  9:15 ` Peter Zijlstra
2023-10-12 16:52   ` Namhyung Kim
2023-10-13 14:15 ` Arnaldo Carvalho de Melo
2023-10-23 21:58 ` Andi Kleen
2023-10-24 19:16   ` Namhyung Kim
2023-10-25  2:09     ` Andi Kleen
2023-10-25  5:51       ` Namhyung Kim
2023-10-25 20:01         ` Andi Kleen
2023-11-08 17:12 ` Joe Mario
2023-11-09  4:48   ` Namhyung Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM9d7ciWY6kvVB-JkrJS7BZs_WMy-OZRgwLvBG9WQ-Zi00czfA@mail.gmail.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=blaikie@google.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=torvalds@linux-foundation.org \
    --cc=woodard@redhat.com \
    --cc=xliuprof@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).