linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	tglx@linutronix.de, bp@alien8.de, x86@kernel.org,
	linux-kernel@vger.kernel.org
Cc: mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@redhat.com, namhyung@kernel.org, dave.hansen@intel.com,
	yu-cheng.yu@intel.com, bigeasy@linutronix.de, gorcunov@gmail.com,
	hpa@zytor.com, alexey.budankov@linux.intel.com,
	eranian@google.com, ak@linux.intel.com, like.xu@linux.intel.com,
	yao.jin@linux.intel.com, wei.w.wang@intel.com,
	Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 00/23] Support Architectural LBR
Date: Fri,  3 Jul 2020 05:49:06 -0700	[thread overview]
Message-ID: <1593780569-62993-1-git-send-email-kan.liang@linux.intel.com> (raw)

From: Kan Liang <kan.liang@linux.intel.com>

Changes since V2:
- Rebase on top of perf/vlbr
- Drop the patch "Check Arch LBR MSRs", which is unnecessary.
- Replace the variables for the full CPUID leaf in structure x86_pmu
  with the variables for the available enumeration bits, which save
  the space and make the code more readable.
- Clear FEATURE_ARCH_LBR if the check for Architectural LBR fails.
  Use static_cpu_has(X86_FEATURE_ARCH_LBR) to instead of
  x86_pmu.arch_lbr.
- Mark the {rd,wr}lbr_{to,from,info,all} wrappers,
  lbr_is_reset_in_cstate(), and task_context_opt() __always_inline

Changes since V1:
- Move the union for CPUID enumeration bits to asm/perf_event.h
- Union the lbr_ctl_map and lbr_sel_map
- Rename ARCH_LBR_INFO_* to LBR_INFO_*
- Remove the function pointers of lbr_enable/disable. Architectural
  LBR and the model-specific LBR share the same functions for
  lbr_enable/disable now.
- Factor out common codes for lbr_read, lbr_save and lbr_restore to
  reduce the duplicate codes.
- Export copy_fpregs_to_fpstate instead of xfeatures_mask_all
- Add a padding to guarantee the required alignment
- Simplify the intel_pmu_arch_lbr_read_xsave()

LBR (Last Branch Records) enables recording of software path history
by logging taken branches and other control flows within architectural
registers. Intel CPUs have had model-specific LBRs for quite some time
but this evolves them into an architectural feature now.

The main advantages for the users are:
- Faster context switching due to XSAVES support and faster reset of
  LBR MSRs via the new DEPTH MSR
- Faster LBR read for a non-PEBS event due to XSAVES support, which
  lowers the overhead of the NMI handler. (For a PEBS event, the LBR
  information is recorded in the PEBS records. There is no impact on
  the PEBS event.)
- Linux kernel can support the LBR features without knowing the model
  number of the current CPU.
- Clean exposure of LBRs to guests without relying on model-specific
  features. (An improvement for KVM. Not included in this patch series.)
- Supports for running with a smaller number of LBRs than the full 32,
  to lower overhead (currently not exposed, however)

The key improvements for the perf kernel in this patch series include,
- Doesn't require a model check. The capabilities of Architectural LBR
  can be enumerated by CPUID.
- Each LBR record or entry is still comprised of three MSRs,
  IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_TO_IP, but they
  become architectural MSRs.
- Architectural LBR is stack-like now. Entry 0 is always the youngest
  branch, entry 1 the next youngest... The TOS MSR has been removed.
- A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs,
  which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
- The possible LBR depth can be retrieved from CPUID enumeration. The
  max value is written to the new MSR_ARCH_LBR_DEPTH as the number of
  LBR entries.
- Faster LBR MSRs reset via the new DEPTH MSR, which avoids touching
  potentially nearly a hundred MSRs.
- XSAVES and XRSTORS are used to read, save/restore LBR related MSRs
- Faster direct reporting of the branch type with the LBR without needing
  access to the code

The existing LBR capabilities, such as CPL filtering, Branch filtering,
Call stack, Mispredict information, cycles information, Branch Type
information, are still kept for Architectural LBRs.

XSAVES and XRSTORS improvements:

In perf with call stack mode, LBR information is used to reconstruct
a call stack. To get a complete call stack, perf has to save and restore
all LBR registers during a context switch. However, the number of LBR
registers is huge. To reduce the overhead, LBR state component is
introduced with architectural LBR. Perf subsystem will use XSAVES/XRSTORS
to save/restore LBRs during a context switch.

LBR call stack mode is not always enabled. Perf subsystem only needs to
save/restore an LBR state on demand. To avoid unnecessary save/restore of
the LBR state at a context switch, a software concept, dynamic supervisor
state, is introduced, which
- does not allocate a buffer in each task->fpu;
- does not save/restore a state component at each context switch;
- sets the bit corresponding to a dynamic supervisor feature in
  IA32_XSS at boot time, and avoids setting it at run time;
- dynamically allocates a specific buffer for a state component
  on demand, e.g. only allocate a LBR-specific XSAVE buffer when LBR is
  enabled in perf. (Note: The buffer has to include the LBR state
  component, legacy region and XSAVE header.)
- saves/restores a state component on demand, e.g. manually invoke
  the XSAVES/XRSTORS instruction to save/restore the LBR state
  to/from the buffer when perf is active and a call stack is required.

The specification of Architectural LBR can be found in the latest Intel
Architecture Instruction Set Extensions and Future Features Programming
Reference, 319433-038.

Kan Liang (23):
  x86/cpufeatures: Add Architectural LBRs feature bit
  perf/x86/intel/lbr: Add a function pointer for LBR reset
  perf/x86/intel/lbr: Add a function pointer for LBR read
  perf/x86/intel/lbr: Add the function pointers for LBR save and restore
  perf/x86/intel/lbr: Factor out a new struct for generic optimization
  perf/x86/intel/lbr: Use dynamic data structure for task_ctx
  x86/msr-index: Add bunch of MSRs for Arch LBR
  perf/x86: Expose CPUID enumeration bits for arch LBR
  perf/x86/intel/lbr: Support LBR_CTL
  perf/x86/intel/lbr: Unify the stored format of LBR information
  perf/x86/intel/lbr: Mark the {rd,wr}lbr_{to,from} wrappers
    __always_inline
  perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all()
  perf/x86/intel/lbr: Factor out intel_pmu_store_lbr
  perf/x86/intel/lbr: Support Architectural LBR
  perf/core: Factor out functions to allocate/free the task_ctx_data
  perf/core: Use kmem_cache to allocate the PMU specific data
  perf/x86/intel/lbr: Create kmem_cache for the LBR context data
  perf/x86: Remove task_ctx_size
  x86/fpu: Use proper mask to replace full instruction mask
  x86/fpu/xstate: Support dynamic supervisor feature for LBR
  x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature
  perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch
  perf/x86/intel/lbr: Support XSAVES for arch LBR read

 arch/x86/events/core.c              |   2 +-
 arch/x86/events/intel/core.c        |  18 +
 arch/x86/events/intel/ds.c          |   4 +-
 arch/x86/events/intel/lbr.c         | 683 ++++++++++++++++++++++++++++++------
 arch/x86/events/perf_event.h        | 119 ++++++-
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/fpu/internal.h |  47 +--
 arch/x86/include/asm/fpu/types.h    |  27 ++
 arch/x86/include/asm/fpu/xstate.h   |  36 ++
 arch/x86/include/asm/msr-index.h    |  16 +
 arch/x86/include/asm/perf_event.h   |  46 ++-
 arch/x86/kernel/fpu/core.c          |  39 ++
 arch/x86/kernel/fpu/xstate.c        |  89 ++++-
 include/linux/perf_event.h          |   5 +-
 kernel/events/core.c                |  25 +-
 15 files changed, 972 insertions(+), 185 deletions(-)

-- 
2.7.4


             reply	other threads:[~2020-07-03 12:53 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-03 12:49 kan.liang [this message]
2020-07-03 12:49 ` [PATCH V3 01/23] x86/cpufeatures: Add Architectural LBRs feature bit kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-09 23:00     ` Dave Hansen
2020-07-10  9:51       ` Peter Zijlstra
2020-07-10 14:09       ` Liang, Kan
2020-07-03 12:49 ` [PATCH V3 02/23] perf/x86/intel/lbr: Add a function pointer for LBR reset kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 03/23] perf/x86/intel/lbr: Add a function pointer for LBR read kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 04/23] perf/x86/intel/lbr: Add the function pointers for LBR save and restore kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 05/23] perf/x86/intel/lbr: Factor out a new struct for generic optimization kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 06/23] perf/x86/intel/lbr: Use dynamic data structure for task_ctx kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 07/23] x86/msr-index: Add bunch of MSRs for Arch LBR kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 08/23] perf/x86: Expose CPUID enumeration bits for arch LBR kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 09/23] perf/x86/intel/lbr: Support LBR_CTL kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 10/23] perf/x86/intel/lbr: Unify the stored format of LBR information kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 11/23] perf/x86/intel/lbr: Mark the {rd,wr}lbr_{to,from} wrappers __always_inline kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 12/23] perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all() kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 13/23] perf/x86/intel/lbr: Factor out intel_pmu_store_lbr kan.liang
2020-07-03 19:50   ` Peter Zijlstra
2020-07-03 20:59     ` Liang, Kan
2020-07-06 10:25       ` Peter Zijlstra
2020-07-06 13:32         ` Liang, Kan
2020-07-06 14:25           ` Peter Zijlstra
2020-07-06 22:29       ` Liang, Kan
2020-07-07  7:40         ` Peter Zijlstra
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 14/23] perf/x86/intel/lbr: Support Architectural LBR kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 15/23] perf/core: Factor out functions to allocate/free the task_ctx_data kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 16/23] perf/core: Use kmem_cache to allocate the PMU specific data kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 17/23] perf/x86/intel/lbr: Create kmem_cache for the LBR context data kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 18/23] perf/x86: Remove task_ctx_size kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 19/23] x86/fpu: Use proper mask to replace full instruction mask kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 20/23] x86/fpu/xstate: Support dynamic supervisor feature for LBR kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2021-05-27 22:15     ` Thomas Gleixner
2020-07-03 12:49 ` [PATCH V3 21/23] x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 22/23] perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 12:49 ` [PATCH V3 23/23] perf/x86/intel/lbr: Support XSAVES for arch LBR read kan.liang
2020-07-08  9:51   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-03 19:34 ` [PATCH V3 00/23] Support Architectural LBR Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1593780569-62993-1-git-send-email-kan.liang@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.budankov@linux.intel.com \
    --cc=bigeasy@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=eranian@google.com \
    --cc=gorcunov@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=like.xu@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=wei.w.wang@intel.com \
    --cc=x86@kernel.org \
    --cc=yao.jin@linux.intel.com \
    --cc=yu-cheng.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).