linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 00/13] perf/x86/amd: Add AMD Fam19h Branch Sampling support
@ 2021-09-09  7:56 Stephane Eranian
From: Stephane Eranian @ 2021-09-09  7:56 UTC
  To: linux-kernel; +Cc: peterz, acme, jolsa, kim.phillips, namhyung, irogers

This patch series adds support for the AMD Fam19h 16-deep branch sampling
feature as described in the AMD PPR Fam19h Model 01h Revision B1 section 2.1.13.
This is a model-specific extension, not an architected AMD feature.

The Branch Sampling Feature (BRS) provides the statistical taken-branch information needed
to enable autoFDO-style optimization by compilers, i.e., basic block execution counts.

BRS operates with a 16-deep saturating buffer in MSRs. There is no hardware
branch type filtering: all control flow changes are captured. BRS relies on
specific programming of the Fam19h core PMU. In particular, the following
requirements must be met:
 - the sampling period must be greater than 16 (the BRS depth)
 - the sampling period must be fixed; frequency mode is not supported

BRS also interacts with the NMI. Because enabling BRS is expensive, it is only activated
after P event occurrences, where P is the desired sampling period. At P occurrences of the
event, the counter overflows, the CPU catches the NMI, activates BRS for 16 branches until
the buffer saturates, and then delivers the NMI to the kernel. Between the overflow and the
time BRS activates, more branches may execute, skewing the period. All along, the sampling
event keeps counting. The skid may be attenuated by reducing the sampling period by 16.

BRS is integrated into perf_events seamlessly via the same PERF_SAMPLE_BRANCH_STACK sample
format. BRS generates perf_branch_entry records in the sampling buffer. No prediction or
latency information is supported. The branches are stored in reverse order of execution:
the most recent branch is the first entry in each record.

Because BRS must be stopped when a CPU goes into low power mode, the series includes patches
to add callbacks on low power entry and exit (mwait and halt).

Given that BRS has no hardware privilege-level filtering, the kernel implements filtering on
privilege level in software.

This version adds a few simple modifications to perf record and report:
1. add the branch-brs event as a builtin so that it can be used directly: perf record -e branch-brs ...
2. improve error handling for AMD IBS (contributed by Kim Phillips)
3. reuse the improved error handling to report branch sampling errors
4. add two new sort dimensions to help display the branch sampling information. Because the branch
   sampling feature provides no latency information, perf report collapses all samples within a
   function into a single histogram entry. This is expected, because the default sort mode for
   PERF_SAMPLE_BRANCH_STACK is symbol_from/symbol_to, and the collapse propagates to the annotation.

For a more detailed view of the branch samples, the new addr_from and addr_to sort dimensions
can be used instead:

$ perf report --sort=overhead,comm,dso,addr_from,addr_to 
# Overhead  Command          Shared Object     Source Address  Target Address
# ........  ...............  ................  ..............  ..............
#
     4.21%  test_prg        test_prg         [.] test_threa+0x3c  [.] test_threa+0x4
     4.14%  test_prg        test_prg         [.] test_threa+0x3e  [.] test_threa+0x2
     4.10%  test_prg        test_prg         [.] test_threa+0x4  [.] test_threa+0x3a
     4.07%  test_prg        test_prg         [.] test_threa+0x2  [.] test_threa+0x3c

Versus the default output:

$ perf report 
# Overhead  Command          Source Shared Object  Source Symbol                        Target Symbol                        Basic Block Cycles
# ........  ...............  ....................  ...................................  ...................................  ..................
#
    99.52%  test_prg        test_prg             [.] test_thread                      [.] test_thread                      -                 

BRS can be used with any sampling event. However, it is recommended to use the RETIRED_BRANCH
event because it matches what BRS captures. For convenience, the kernel exports a pseudo event
matching the branches captured by BRS (branch-brs):

$ perf record -b -e cpu/branch-brs/ -c 1000037 test

$ perf report -D
56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0
... branch stack: nr:16
.....  0: 0000000000401d24 -> 0000000000401d5a 0 cycles      0
.....  1: 0000000000401d5c -> 0000000000401d24 0 cycles      0
.....  2: 0000000000401d22 -> 0000000000401d5c 0 cycles      0
.....  3: 0000000000401d5e -> 0000000000401d22 0 cycles      0
.....  4: 0000000000401d20 -> 0000000000401d5e 0 cycles      0
.....  5: 0000000000401d3e -> 0000000000401d20 0 cycles      0
.....  6: 0000000000401d42 -> 0000000000401d3e 0 cycles      0
.....  7: 0000000000401d3c -> 0000000000401d42 0 cycles      0
.....  8: 0000000000401d44 -> 0000000000401d3c 0 cycles      0
.....  9: 0000000000401d3a -> 0000000000401d44 0 cycles      0
..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles      0
..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles      0
..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles      0
..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles      0
..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles      0
..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles      0
 ... thread: test:18230
 ...... dso: test

Special thanks to Kim Phillips @ AMD for the testing, reviews and contributions.

Kim Phillips (1):
  perf tools: improve IBS error handling

Stephane Eranian (12):
  perf/core: add union to struct perf_branch_entry
  x86/cpufeatures: add AMD Fam19h Branch Sampling feature
  perf/x86/amd: add AMD Fam19h Branch Sampling support
  perf/x86/amd: add branch-brs helper event for Fam19h BRS
  perf/x86/amd: enable branch sampling priv level filtering
  perf/x86/amd: add AMD branch sampling period adjustment
  perf/core: add idle hooks
  perf/x86/core: add idle hooks
  perf/x86/amd: add idle hooks for branch sampling
  perf tools: add branch-brs as a new event
  perf tools: improve error handling of AMD Branch Sampling
  perf report: add addr_from/addr_to sort dimensions

 arch/x86/events/amd/Makefile       |   2 +-
 arch/x86/events/amd/brs.c          | 342 +++++++++++++++++++++++++++++
 arch/x86/events/amd/core.c         | 222 ++++++++++++++++++-
 arch/x86/events/core.c             |  22 +-
 arch/x86/events/intel/lbr.c        |  13 +-
 arch/x86/events/perf_event.h       | 106 +++++++--
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/msr-index.h   |   4 +
 arch/x86/include/asm/mwait.h       |   6 +-
 include/linux/perf_event.h         |   8 +
 include/uapi/linux/perf_event.h    |  19 +-
 kernel/events/core.c               |  58 +++++
 kernel/sched/idle.c                |  19 +-
 tools/perf/util/evsel.c            |  50 +++++
 tools/perf/util/hist.c             |   2 +
 tools/perf/util/hist.h             |   2 +
 tools/perf/util/parse-events.l     |   1 +
 tools/perf/util/sort.c             | 128 +++++++++++
 tools/perf/util/sort.h             |   2 +
 19 files changed, 964 insertions(+), 43 deletions(-)
 create mode 100644 arch/x86/events/amd/brs.c

-- 
2.33.0.153.gba50c8fa24-goog




Thread overview: 41+ messages
2021-09-09  7:56 [PATCH v1 00/13] perf/x86/amd: Add AMD Fam19h Branch Sampling support Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 01/13] perf/core: add union to struct perf_branch_entry Stephane Eranian
2021-09-09 19:03   ` Peter Zijlstra
2021-09-10 12:09     ` Michael Ellerman
2021-09-10 14:16       ` Michael Ellerman
2021-09-15  6:03         ` Stephane Eranian
2021-09-17  6:37           ` Madhavan Srinivasan
2021-09-17  6:48             ` Stephane Eranian
2021-09-17  7:05               ` Michael Ellerman
2021-09-17  7:39                 ` Stephane Eranian
2021-09-17 12:38                   ` Michael Ellerman
2021-09-17 16:42                     ` Stephane Eranian
2021-09-19 10:27                       ` Michael Ellerman
2021-09-09  7:56 ` [PATCH v1 02/13] x86/cpufeatures: add AMD Fam19h Branch Sampling feature Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 03/13] perf/x86/amd: add AMD Fam19h Branch Sampling support Stephane Eranian
2021-09-09 10:44   ` kernel test robot
2021-09-09 15:33   ` kernel test robot
2021-09-09  7:56 ` [PATCH v1 04/13] perf/x86/amd: add branch-brs helper event for Fam19h BRS Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 05/13] perf/x86/amd: enable branch sampling priv level filtering Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 06/13] perf/x86/amd: add AMD branch sampling period adjustment Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 07/13] perf/core: add idle hooks Stephane Eranian
2021-09-09  9:15   ` Peter Zijlstra
2021-09-09 10:42   ` kernel test robot
2021-09-09 11:02   ` kernel test robot
2021-09-09  7:56 ` [PATCH v1 08/13] perf/x86/core: " Stephane Eranian
2021-09-09  9:16   ` Peter Zijlstra
2021-09-09  7:56 ` [PATCH v1 09/13] perf/x86/amd: add idle hooks for branch sampling Stephane Eranian
2021-09-09  9:20   ` Peter Zijlstra
2021-09-09  7:56 ` [PATCH v1 10/13] perf tools: add branch-brs as a new event Stephane Eranian
2021-09-09  7:56 ` [PATCH v1 11/13] perf tools: improve IBS error handling Stephane Eranian
2021-09-13 19:34   ` Arnaldo Carvalho de Melo
2021-10-04 21:57     ` Kim Phillips
2021-10-04 23:44       ` Arnaldo Carvalho de Melo
2021-09-09  7:56 ` [PATCH v1 12/13] perf tools: improve error handling of AMD Branch Sampling Stephane Eranian
2021-10-04 21:57   ` Kim Phillips
2021-09-09  7:57 ` [PATCH v1 13/13] perf report: add addr_from/addr_to sort dimensions Stephane Eranian
2021-09-09  8:55 ` [PATCH v1 00/13] perf/x86/amd: Add AMD Fam19h Branch Sampling support Peter Zijlstra
2021-09-15  5:55   ` Stephane Eranian
2021-09-15  9:04     ` Peter Zijlstra
2021-10-28 18:30       ` Stephane Eranian
2021-09-27 20:17     ` Song Liu
