linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 00/11] perf c2c: Sort cacheline with all loads
@ 2020-12-13 13:38 Leo Yan
From: Leo Yan @ 2020-12-13 13:38 UTC
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Andi Kleen,
	Ian Rogers, Kan Liang, Joe Mario, David Ahern, Don Zickus,
	Al Grant, James Clark, linux-kernel
  Cc: Leo Yan

This patch set sorts cache lines by all load operations that hit any
cache level.  In the single cache line view, it shows the load
references separately for loads with cache hits and for loads with
cache misses.

This series is a follow-up to the old patch set "perf c2c: Sort
cacheline with LLC load" [1].  The old patch set tried to sort cache
lines by the load operations in the last level cache (LLC); after
testing, we found that the trace data doesn't contain LLC events if the
platform isn't a NUMA system.  For this reason, this series refines the
implementation to sort on load hits at all cache levels.  This makes it
reasonable to review the load and store operations together: if a cache
line is accessed by multiple threads, this hints that the cache line is
a candidate for false sharing.
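
As a rough illustration of that hint (not the code in builtin-c2c.c),
the check can be sketched as below.  The struct mem_sample layout, the
64-byte line size and the extra "at least one store" condition are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64u	/* assumed line size; real hardware may differ */

/* One sampled memory access (hypothetical layout, not perf's sample format). */
struct mem_sample {
	uint64_t addr;		/* data address of the access */
	int tid;		/* thread that issued it */
	bool is_store;		/* true for a store, false for a load */
};

/* Round a data address down to the start of its cache line. */
uint64_t cacheline_base(uint64_t addr)
{
	return addr & ~(uint64_t)(CACHELINE_SIZE - 1);
}

/*
 * Return true if the cache line starting at 'line' is touched by at least
 * two different threads and at least one of those accesses is a store.
 * This is only a hint: true sharing of one variable looks the same here.
 */
bool line_may_be_false_shared(const struct mem_sample *s, size_t n,
			      uint64_t line)
{
	bool seen_any = false, multi_thread = false, seen_store = false;
	int first_tid = 0;

	assert(s != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cacheline_base(s[i].addr) != line)
			continue;
		if (!seen_any) {
			first_tid = s[i].tid;
			seen_any = true;
		} else if (s[i].tid != first_tid) {
			multi_thread = true;
		}
		if (s[i].is_store)
			seen_store = true;
	}
	return multi_thread && seen_store;
}
```

A line that passes this check still needs a look at the per-offset
breakdown to tell false sharing from intended sharing.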

This patch set applies cleanly on the perf/core branch with the latest
commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
evlist' record methods").  The changes have been tested on x86 and
Arm64; the testing results are shown below.

The testing result on x86:

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
        1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
        2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0     4591        0     2633      616      0x556f25dff100
    -------------------------------------------------------------
            20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
            11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]
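
Judging from the table above, the "Load Hit Total" column appears to be
the sum of the FB, L1, L2, LclHit, LclHitm, RmtHit and RmtHitm columns.
A minimal sketch of that sum follows; struct line_loads and its field
names are made up for this example and are not the fields used in
builtin-c2c.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Per-cache-line load counters, named after the table columns
 * (hypothetical struct for illustration only).
 */
struct line_loads {
	uint32_t fb, l1, l2;		/* Core Load Hit: FB, L1, L2 */
	uint32_t lcl_hit, lcl_hitm;	/* LLC Load Hit: LclHit, LclHitm */
	uint32_t rmt_hit, rmt_hitm;	/* RMT Load Hit: RmtHit, RmtHitm */
};

/* Sum every load that hit some cache level; DRAM loads are left out. */
uint32_t load_hit_total(const struct line_loads *l)
{
	assert(l != NULL);
	return l->fb + l->l1 + l->l2 + l->lcl_hit + l->lcl_hitm +
	       l->rmt_hit + l->rmt_hitm;
}
```

For the x86 line at index 0 this gives 849 + 2734 + 67 + 58 + 883 =
4591, which matches the "Load Hit Total" value printed for that line.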


The testing result on Arm64 (Hisilicon D06) follows.  Please note that
the Arm SPE data source patch set has not been merged into the mainline
kernel and a potential issue with store operations is still being
worked on, so the final output might have minor differences.

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaab4e8b100   N/A       0    35.04%    100447   104933   100447     4486     4486        0        0    11269        0     89178        0         0        0         0         0
        1      0xaaaab4e8af80   N/A       0    17.29%     49571    49571    49571        0        0        0        0    49571        0         0        0         0        0         0         0
        2      0xaaaab4e8afc0   N/A       0    16.72%     47922    47922    47922        0        0        0        0    47922        0         0        0         0        0         0         0
        3      0xaaaab4e8b080   N/A       0     8.94%     25641    67718    25641    42077    42077        0        0     4397        0     21244        0         0        0         0         0
        4      0xaaaab4e7a480   N/A       0     4.42%     12680    12680    12680        0        0        0        0    12680        0         0        0         0        0         0         0
        5      0xffffa2ffc980   N/A       0     2.62%      7511     7511     7511        0        0        0        0     7511        0         0        0         0        0         0         0
        6      0xffffa3ffe980   N/A       0     2.57%      7374     7374     7374        0        0        0        0     7374        0         0        0         0        0         0         0
        7      0xaaaab4e8b000   N/A       0     2.41%      6907     6907     6907        0        0        0        0     6907        0         0        0         0        0         0         0
        8      0xaaaab4e8b0c0   N/A       0     2.30%      6592     6592     6592        0        0        0        0     2822        0      3770        0         0        0         0         0
        9      0xffffa37fd980   N/A       0     2.24%      6408     6408     6408        0        0        0        0     6408        0         0        0         0        0         0         0
       10      0xffffb8d80980   N/A       0     2.18%      6254     6254     6254        0        0        0        0     6254        0         0        0         0        0         0         0
       11      0xffffb9d82980   N/A       0     1.31%      3763     9706     3763     5943     5943        0        0     3763        0         0        0         0        0         0         0
       12      0xffffb9581980   N/A       0     1.22%      3507    11484     3507     7977     7977        0        0     3507        0         0        0         0        0         0         0
       13      0xffffbad84980   N/A       0     0.33%       932     7766      932     6834     6834        0        0      932        0         0        0         0        0         0         0
       14      0xffffba583980   N/A       0     0.24%       700     6503      700     5803     5803        0        0      700        0         0        0         0        0         0         0
  
  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared                                   
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0   100447        0     4486        0      0xaaaab4e8b100
    -------------------------------------------------------------
            15.44%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dd0         0         0         0    15508         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
            14.43%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dd0         0         0         0    14499         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            11.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79db8         0         0         0    11622         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            11.38%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dd0         0         0         0    11429         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
            10.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dd0         0         0         0    10614         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             9.69%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79db8         0         0         0     9731         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             5.74%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79db8         0         0         0     5763         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             4.84%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79db8         0         0         0     4866         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%   14.02%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dbc         0         0         0      629         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    6.44%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79de0         0         0         0      289         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%   12.37%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dbc         0         0         0      555         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             0.00%    0.00%    6.46%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79de0         0         0         0      290         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
             0.00%    0.00%   21.38%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dbc         0         0         0      959         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             0.00%    0.00%    9.61%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79de0         0         0         0      431         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
             0.00%    0.00%   22.14%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dbc         0         0         0      993         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%    7.58%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79de0         0         0         0      340         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             6.66%    0.00%    0.00%    0.00%                0x20   N/A       0    15046    15054:reader_thd      0xaaaab4e79e54         0         0         0     6687         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   2
             3.76%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15052:reader_thd      0xaaaab4e79e80         0         0         0     3774         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   0
             3.54%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15055:reader_thd      0xaaaab4e79e80         0         0         0     3551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   3
             2.39%    0.00%    0.00%    0.00%                0x30   N/A       0    15046    15053:reader_thd      0xaaaab4e79eac         0         0         0     2403         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:163   1


  [...]
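
In the Pareto view, the "Load Refs Hit" percentage of a row appears to
be that row's load hits divided by the load hit total of its cache
line.  The helper below is a sketch of that arithmetic using integer
rounding; share_x100() is a name invented for this example, and perf's
own formatting may differ.

```c
#include <assert.h>

/*
 * Share of 'part' in 'whole', in hundredths of a percent, rounded to
 * nearest.  A return value of 2052 means 20.52%.
 */
unsigned long long share_x100(unsigned long long part,
			      unsigned long long whole)
{
	assert(whole > 0);
	return (part * 10000ULL + whole / 2) / whole;
}
```

The first x86 row has 942 records and no stores, so share_x100(942,
4591) returns 2052, i.e. the 20.52% shown.  Likewise the first Arm64
row gives share_x100(15508, 100447) = 1544, i.e. 15.44%.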

Changes from v1:
* Changed from sorting on LLC to sorting on all loads with cache hits;
* Added patches 06/11, 07/11 for refactoring macros;
* Added patch 08/11 for refactoring the node header, so that it can
  display "%loads" rather than "%hitms" in the header;
* Added patch 09/11 to add local pointers to the output metrics string
  and the sort string (Jiri);
* Added a warning in percent_hitm() for the display "all", which should
  never happen (Jiri).

[1] https://lore.kernel.org/patchwork/cover/1321514/


Leo Yan (11):
  perf c2c: Add dimensions for total load hit
  perf c2c: Add dimensions for load hit
  perf c2c: Add dimensions for load miss
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Refactor display filter macro
  perf c2c: Refactor node display macro
  perf c2c: Refactor node header
  perf c2c: Add local variables for output metrics
  perf c2c: Sort on all cache hit for load operations
  perf c2c: Update documentation for display option 'all'

 tools/perf/Documentation/perf-c2c.txt |  21 +-
 tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
 2 files changed, 487 insertions(+), 82 deletions(-)

-- 
2.17.1


Thread overview: 31+ messages
2020-12-13 13:38 [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Leo Yan
2020-12-13 13:38 ` [PATCH v2 01/11] perf c2c: Add dimensions for total load hit Leo Yan
2021-01-06  7:38   ` Namhyung Kim
2021-01-11  7:49     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 02/11] perf c2c: Add dimensions for " Leo Yan
2021-01-06  7:38   ` Namhyung Kim
2021-01-11  8:22     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 03/11] perf c2c: Add dimensions for load miss Leo Yan
2021-01-06  7:42   ` Namhyung Kim
2021-01-11  8:41     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 04/11] perf c2c: Rename for shared cache line stats Leo Yan
2021-01-06  7:44   ` Namhyung Kim
2021-01-11  8:42     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 05/11] perf c2c: Refactor hist entry validation Leo Yan
2020-12-13 13:38 ` [PATCH v2 06/11] perf c2c: Refactor display filter macro Leo Yan
2021-01-06  7:47   ` Namhyung Kim
2021-01-11  8:43     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 07/11] perf c2c: Refactor node display macro Leo Yan
2021-01-06  7:47   ` Namhyung Kim
2021-01-11  8:44     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 08/11] perf c2c: Refactor node header Leo Yan
2020-12-13 13:38 ` [PATCH v2 09/11] perf c2c: Add local variables for output metrics Leo Yan
2020-12-13 13:38 ` [PATCH v2 10/11] perf c2c: Sort on all cache hit for load operations Leo Yan
2021-01-06  7:52   ` Namhyung Kim
2021-01-11  8:47     ` Leo Yan
2020-12-13 13:38 ` [PATCH v2 11/11] perf c2c: Update documentation for display option 'all' Leo Yan
2021-01-03 22:52 ` [PATCH v2 00/11] perf c2c: Sort cacheline with all loads Jiri Olsa
2021-01-04  2:09   ` Leo Yan
2021-01-04  9:35     ` Jiri Olsa
2021-01-15 15:17       ` Arnaldo Carvalho de Melo
2021-01-16  0:45         ` Leo Yan
