All of lore.kernel.org
 help / color / mirror / Atom feed
From: kan.liang@linux.intel.com
To: peterz@infradead.org, mingo@redhat.com, acme@kernel.org,
	tglx@linutronix.de, bp@alien8.de, x86@kernel.org,
	linux-kernel@vger.kernel.org
Cc: mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@redhat.com, namhyung@kernel.org, dave.hansen@intel.com,
	yu-cheng.yu@intel.com, bigeasy@linutronix.de, gorcunov@gmail.com,
	hpa@zytor.com, alexey.budankov@linux.intel.com,
	eranian@google.com, ak@linux.intel.com, like.xu@linux.intel.com,
	yao.jin@linux.intel.com, Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH 11/21] perf/x86/intel/lbr: Support LBR_CTL
Date: Fri, 19 Jun 2020 07:03:59 -0700	[thread overview]
Message-ID: <1592575449-64278-12-git-send-email-kan.liang@linux.intel.com> (raw)
In-Reply-To: <1592575449-64278-1-git-send-email-kan.liang@linux.intel.com>

From: Kan Liang <kan.liang@linux.intel.com>

An IA32_LBR_CTL is introduced for Architecture LBR to enable and config
LBR registers to replace the previous LBR_SELECT.

All the related members in struct cpu_hw_events and struct x86_pmu
have to be renamed.

Some new macros are added to reflect the layout of LBR_CTL.

The mapping from PERF_SAMPLE_BRANCH_* to the corresponding bits in
LBR_CTL MSR is saved in lbr_ctl_map now, which is not a const value.
The value relies on the CPUID enumeration.

A dedicated HW LBR filter is implemented for Architecture LBR. For the
previous model-specific LBR, most of the bits in LBR_SELECT operate in
the suppressed mode. For the bits in LBR_CTL, the polarity is inverted.

For the previous model-specific LBR format 5 (LBR_FORMAT_INFO), if the
NO_CYCLES and NO_FLAGS type are set, the flag LBR_NO_INFO will be set to
avoid the unnecessary LBR_INFO MSR read. Although Architecture LBR also
has a dedicated LBR_INFO MSR, perf doesn't need to check and set the
flag LBR_NO_INFO. For Architecture LBR, XSAVES instruction will be used
as the default way to read the LBR MSRs all together. The overhead which
the flag tries to avoid doesn't exist anymore. Dropping the flag can
save the extra check for the flag in the lbr_read() later, and make the
code cleaner.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c |  4 +--
 arch/x86/events/intel/lbr.c  | 73 +++++++++++++++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h | 18 ++++++++---
 3 files changed, 88 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c3372bd..6462ef2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3680,7 +3680,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
 {
 	cpuc->pebs_record_size = x86_pmu.pebs_record_size;
 
-	if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
+	if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map || x86_pmu.lbr_ctl_map) {
 		cpuc->shared_regs = allocate_shared_regs(cpu);
 		if (!cpuc->shared_regs)
 			goto err;
@@ -3778,7 +3778,7 @@ static void intel_pmu_cpu_starting(int cpu)
 		cpuc->shared_regs->refcnt++;
 	}
 
-	if (x86_pmu.lbr_sel_map)
+	if (x86_pmu.lbr_sel_map || x86_pmu.lbr_ctl_map)
 		cpuc->lbr_sel = &cpuc->shared_regs->regs[EXTRA_REG_LBR];
 
 	if (x86_pmu.flags & PMU_FL_EXCL_CNTRS) {
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 1c253ab..b34beb5 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -132,6 +132,44 @@ enum {
 	 X86_BR_IRQ		|\
 	 X86_BR_INT)
 
+/*
+ * Intel LBR_CTL bits
+ *
+ * Hardware branch filter for Arch LBR
+ */
+#define ARCH_LBR_KERNEL_BIT		1  /* capture at ring0 */
+#define ARCH_LBR_USER_BIT		2  /* capture at ring > 0 */
+#define ARCH_LBR_CALL_STACK_BIT		3  /* enable call stack */
+#define ARCH_LBR_JCC_BIT		16 /* capture conditional branches */
+#define ARCH_LBR_REL_JMP_BIT		17 /* capture relative jumps */
+#define ARCH_LBR_IND_JMP_BIT		18 /* capture indirect jumps */
+#define ARCH_LBR_REL_CALL_BIT		19 /* capture relative calls */
+#define ARCH_LBR_IND_CALL_BIT		20 /* capture indirect calls */
+#define ARCH_LBR_RETURN_BIT		21 /* capture near returns */
+#define ARCH_LBR_OTHER_BRANCH_BIT	22 /* capture other branches */
+
+#define ARCH_LBR_KERNEL			(1ULL << ARCH_LBR_KERNEL_BIT)
+#define ARCH_LBR_USER			(1ULL << ARCH_LBR_USER_BIT)
+#define ARCH_LBR_CALL_STACK		(1ULL << ARCH_LBR_CALL_STACK_BIT)
+#define ARCH_LBR_JCC			(1ULL << ARCH_LBR_JCC_BIT)
+#define ARCH_LBR_REL_JMP		(1ULL << ARCH_LBR_REL_JMP_BIT)
+#define ARCH_LBR_IND_JMP		(1ULL << ARCH_LBR_IND_JMP_BIT)
+#define ARCH_LBR_REL_CALL		(1ULL << ARCH_LBR_REL_CALL_BIT)
+#define ARCH_LBR_IND_CALL		(1ULL << ARCH_LBR_IND_CALL_BIT)
+#define ARCH_LBR_RETURN			(1ULL << ARCH_LBR_RETURN_BIT)
+#define ARCH_LBR_OTHER_BRANCH		(1ULL << ARCH_LBR_OTHER_BRANCH_BIT)
+
+#define ARCH_LBR_ANY			 \
+	(ARCH_LBR_JCC			|\
+	 ARCH_LBR_REL_JMP		|\
+	 ARCH_LBR_IND_JMP		|\
+	 ARCH_LBR_REL_CALL		|\
+	 ARCH_LBR_IND_CALL		|\
+	 ARCH_LBR_RETURN		|\
+	 ARCH_LBR_OTHER_BRANCH)
+
+#define ARCH_LBR_CTL_MASK			0x7f000e
+
 static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc);
 
 /*
@@ -814,6 +852,37 @@ static int intel_pmu_setup_hw_lbr_filter(struct perf_event *event)
 	return 0;
 }
 
+/*
+ * Setup the HW LBR filter for arch LBR
+ * Used only when available, may not be enough to disambiguate
+ * all branches, may need the help of the SW filter
+ */
+static int intel_pmu_setup_hw_arch_lbr_filter(struct perf_event *event)
+{
+	struct hw_perf_event_extra *reg;
+	u64 br_type = event->attr.branch_sample_type;
+	u64 mask = 0, v;
+	int i;
+
+	for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
+		if (!(br_type & (1ULL << i)))
+			continue;
+
+		v = x86_pmu.lbr_ctl_map[i];
+		if (v == LBR_NOT_SUPP)
+			return -EOPNOTSUPP;
+
+		if (v != LBR_IGN)
+			mask |= v;
+	}
+
+	reg = &event->hw.branch_reg;
+	reg->idx = EXTRA_REG_LBR;
+	reg->config = mask;
+
+	return 0;
+}
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event)
 {
 	int ret = 0;
@@ -834,7 +903,9 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
 	/*
 	 * setup HW LBR filter, if any
 	 */
-	if (x86_pmu.lbr_sel_map)
+	if (x86_pmu.lbr_ctl_map)
+		ret = intel_pmu_setup_hw_arch_lbr_filter(event);
+	else if (x86_pmu.lbr_sel_map)
 		ret = intel_pmu_setup_hw_lbr_filter(event);
 
 	return ret;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7112c51..1b91f2b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -244,7 +244,10 @@ struct cpu_hw_events {
 	int				lbr_pebs_users;
 	struct perf_branch_stack	lbr_stack;
 	struct perf_branch_entry	lbr_entries[MAX_LBR_ENTRIES];
-	struct er_account		*lbr_sel;
+	union {
+		struct er_account		*lbr_sel;
+		struct er_account		*lbr_ctl;
+	};
 	u64				br_sel;
 	void				*last_task_ctx;
 	int				last_log_id;
@@ -685,8 +688,12 @@ struct x86_pmu {
 	 */
 	unsigned long	lbr_tos, lbr_from, lbr_to; /* MSR base regs       */
 	int		lbr_nr;			   /* hardware stack size */
-	u64		lbr_sel_mask;		   /* LBR_SELECT valid bits */
+	union {
+		u64		lbr_sel_mask;		   /* LBR_SELECT valid bits */
+		u64		lbr_ctl_mask;		   /* LBR_CTL valid bits */
+	};
 	const int	*lbr_sel_map;		   /* lbr_select mappings */
+	int		*lbr_ctl_map;		   /* LBR_CTL mappings */
 	bool		lbr_double_abort;	   /* duplicated lbr aborts */
 	bool		lbr_pt_coexist;		   /* (LBR|BTS) may coexist with PT */
 	bool		arch_lbr;		   /* Arch LBR supported */
@@ -849,8 +856,11 @@ static inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
 
 static inline bool x86_pmu_has_lbr_callstack(void)
 {
-	return  x86_pmu.lbr_sel_map &&
-		x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
+	if (x86_pmu.lbr_ctl_map)
+		return x86_pmu.lbr_ctl_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
+	if (x86_pmu.lbr_sel_map)
+		return x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0;
+	return false;
 }
 
 static inline int x86_pmu_get_max_arch_lbr_nr(void)
-- 
2.7.4


  parent reply	other threads:[~2020-06-19 14:08 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-19 14:03 [PATCH 00/21] Support Architectural LBR kan.liang
2020-06-19 14:03 ` [PATCH 01/21] x86/cpufeatures: Add Architectural LBRs feature bit kan.liang
2020-06-19 14:03 ` [PATCH 02/21] perf/x86/intel/lbr: Add pointers for LBR enable and disable kan.liang
2020-06-19 14:03 ` [PATCH 03/21] perf/x86/intel/lbr: Add pointer for LBR reset kan.liang
2020-06-19 14:03 ` [PATCH 04/21] perf/x86/intel/lbr: Add pointer for LBR read kan.liang
2020-06-19 14:03 ` [PATCH 05/21] perf/x86/intel/lbr: Add pointers for LBR save and restore kan.liang
2020-06-19 14:03 ` [PATCH 06/21] perf/x86/intel/lbr: Factor out a new struct for generic optimization kan.liang
2020-06-19 14:03 ` [PATCH 07/21] perf/x86/intel/lbr: Use dynamic data structure for task_ctx kan.liang
2020-06-19 14:03 ` [PATCH 08/21] x86/msr-index: Add bunch of MSRs for Arch LBR kan.liang
2020-06-19 19:11   ` Peter Zijlstra
2020-06-19 14:03 ` [PATCH 09/21] perf/x86: Expose CPUID enumeration bits for arch LBR kan.liang
2020-06-19 18:31   ` Peter Zijlstra
2020-06-19 14:03 ` [PATCH 10/21] perf/x86/intel: Check Arch LBR MSRs kan.liang
2020-06-19 14:03 ` kan.liang [this message]
2020-06-19 18:40   ` [PATCH 11/21] perf/x86/intel/lbr: Support LBR_CTL Peter Zijlstra
2020-06-19 19:15     ` Liang, Kan
2020-06-19 19:22       ` Peter Zijlstra
2020-06-19 14:04 ` [PATCH 12/21] perf/x86/intel/lbr: Support Architectural LBR kan.liang
2020-06-19 19:08   ` Peter Zijlstra
2020-06-19 19:40     ` Liang, Kan
2020-06-19 14:04 ` [PATCH 13/21] perf/core: Factor out functions to allocate/free the task_ctx_data kan.liang
2020-06-19 14:04 ` [PATCH 14/21] perf/core: Use kmem_cache to allocate the PMU specific data kan.liang
2020-06-19 14:04 ` [PATCH 15/21] perf/x86/intel/lbr: Create kmem_cache for the LBR context data kan.liang
2020-06-19 14:04 ` [PATCH 16/21] perf/x86: Remove task_ctx_size kan.liang
2020-06-19 14:04 ` [PATCH 17/21] x86/fpu: Use proper mask to replace full instruction mask kan.liang
2020-06-19 19:31   ` Peter Zijlstra
2020-06-22 14:52     ` Liang, Kan
2020-06-22 15:02       ` Dave Hansen
2020-06-22 17:47         ` Liang, Kan
2020-06-22 18:05           ` Dave Hansen
2020-06-22 18:46             ` Liang, Kan
2020-06-19 14:04 ` [PATCH 18/21] x86/fpu/xstate: Support dynamic supervisor feature for LBR kan.liang
2020-06-19 14:04 ` [PATCH 19/21] x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature kan.liang
2020-06-19 14:04 ` [PATCH 20/21] perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch kan.liang
2020-06-19 19:41   ` Peter Zijlstra
2020-06-19 22:28     ` Liang, Kan
2020-06-19 14:04 ` [PATCH 21/21] perf/x86/intel/lbr: Support XSAVES for arch LBR read kan.liang
2020-06-22 18:49   ` Cyrill Gorcunov
2020-06-22 19:11     ` Liang, Kan
2020-06-22 19:31       ` Cyrill Gorcunov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1592575449-64278-12-git-send-email-kan.liang@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.budankov@linux.intel.com \
    --cc=bigeasy@linutronix.de \
    --cc=bp@alien8.de \
    --cc=dave.hansen@intel.com \
    --cc=eranian@google.com \
    --cc=gorcunov@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=like.xu@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yao.jin@linux.intel.com \
    --cc=yu-cheng.yu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.