[V5,00/12] perf: Add Icelake support (kernel only, except Topdown)

Message ID 20190402194509.2832-1-kan.liang@linux.intel.com

Message

Liang, Kan April 2, 2019, 7:44 p.m. UTC
From: Kan Liang <kan.liang@linux.intel.com>

To make the patch series easier to review and integrate, it has been
condensed into a smaller set of patches that includes only the essential
kernel features of Icelake, leaving out the new hardware Metrics counters
for Topdown events. The user-space patches and Topdown support will be
submitted separately once this series is integrated.
(Please let me know if the Topdown/user-space patches are still expected
to be included in this series.)

This patch series adds Icelake support (except Topdown) to Linux perf.

PATCH 1: Fix the wrong PEBS_REGS definition for large PEBS
PATCH 2-12: Kernel patches to support Icelake.
 - 2-6: Support the adaptive PEBS feature, including XMM register
   collection (a user-space usage sketch follows this list)
 - 7-8: Enable core support with new features, e.g. 8 generic
   counters, new event constraints, and a new fixed counter.
 - 9-12: Enable cstate, rapl, msr and uncore support on Icelake
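
For illustration only (not part of this series), the XMM register support
is meant to be used from user space roughly as in the sketch below. The
sketch assumes the PERF_REG_X86_XMM0 index added to the uapi perf_regs.h
by this series, uses the raw perf_event_open() syscall, and names the
helper open_xmm_sampling_event() purely for the example; XMM registers can
only be captured through a precise (PEBS) sampling event.

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <asm/perf_regs.h>	/* PERF_REG_X86_XMM0, added by this series */

/* Open a cycles sampling event that asks PEBS for XMM0 at interrupt time. */
static int open_xmm_sampling_event(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;
	attr.precise_ip = 2;	/* PEBS is required for XMM registers */
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_INTR;
	/* Request XMM0; GPR bits can be OR'ed into the same mask. */
	attr.sample_regs_intr = 1ULL << PERF_REG_X86_XMM0;

	/* Measure the current task on any CPU. */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

If such an event is opened without precise_ip or on a platform without the
feature, the XMM bits are expected to be rejected with -EINVAL, which is
what the x86_pmu_hw_config() check in this series enforces.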

Changes since V4:
- Split out the changes: Topdown support and the user-space patches will
  be submitted separately.
- Add a patch to fix the wrong PEBS_REGS for large PEBS
- Rename has_xmm_regs to pebs_no_xmm_regs
- Rename PEBS_REGS to PEBS_GPRS_REGS
  Add PEBS_XMM_REGS for XMM registers
  Refine the extra check in x86_pmu_hw_config()

Changes since V3:
- Keep the old names for GPRs. Rename PERF_REG_X86_MAX to
  PERF_REG_X86_XMM_MAX
- Remove unnecessary REG_RESERVED
- Add REG_NOSUPPORT for 32-bit

Changes since V2:
- Make the setup_pebs_sample_data() a function pointer argument
- Use cpuc->pebs_record_size unconditionally
- Add comments for EVENT_CONSTRAINT_RANGE
- Correct the Author of "perf/x86: Support constraint ranges"

Changes since V1:
- Avoid the interface changes for perf_reg_value() and
  perf_output_sample_regs().
- Remove the extra_regs in struct perf_sample_data.
- Add struct x86_perf_regs (a sketch follows this change list)
- Add has_xmm_regs to indicate platforms that support XMM register
  collection.
- Add a check in x86_pmu_hw_config() to reject invalid configs of
  regs_user and regs_intr.
- Rename intel_hsw_weight and intel_hsw_transaction
- Add the missing inline for intel_get_tsx_transaction()
- Add a new patch to extract the code for event update in a short period
- Code rebase on top of c634dc6bdede
- Rename @d to pebs_data_cfg
- Make pebs_update_adaptive_cfg readable
- Clear pebs_data_cfg and pebs_record_size for first PEBS in add
- Don't clear ICL_EVENTSEL_ADAPTIVE. Rely on MSR_PEBS_CFG settings
- Change PEBS record parsing order (bug fix)
- Support struct x86_perf_regs
- Make get_pebs_status() generic
- Add a dedicated intel_pmu_drain_pebs_icl()
- Use cpuc->pebs_record_size to replace format_size
- Use 'size' to replace 'range_end' for constraint ranges
- Add x86_pmu.has_xmm_regs = true;
- Add more explanation in the change log of the REMOVE transaction change
- Make perf_regs.h consistent between kernel and user space
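
For reference, the wrapper mentioned above looks roughly like the sketch
below (the authoritative definition is in the arch/x86/include/asm/perf_event.h
change of this series). perf_reg_value() can then reach the XMM values via
container_of() on the pt_regs pointer without any interface change:

struct x86_perf_regs {
	struct pt_regs	regs;		/* regular register dump */
	u64		*xmm_regs;	/* XMM values captured by PEBS, or NULL */
};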

Andi Kleen (2):
  perf/x86/intel: Extract memory code PEBS parser for reuse
  perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them

Kan Liang (9):
  perf/x86: Fix wrong PEBS_REGS
  perf/x86: Support outputting XMM registers
  perf/x86/intel/ds: Extract code of event update in short period
  perf/x86/intel: Support adaptive PEBSv4
  perf/x86/intel: Add Icelake support
  perf/x86/intel/cstate: Add Icelake support
  perf/x86/intel/rapl: Add Icelake support
  perf/x86/msr: Add Icelake support
  perf/x86/intel/uncore: Add Intel Icelake uncore support

Peter Zijlstra (1):
  perf/x86: Support constraint ranges

 arch/x86/events/core.c                |  15 +
 arch/x86/events/intel/core.c          | 117 +++++-
 arch/x86/events/intel/cstate.c        |   2 +
 arch/x86/events/intel/ds.c            | 501 ++++++++++++++++++++++----
 arch/x86/events/intel/lbr.c           |  35 +-
 arch/x86/events/intel/rapl.c          |   2 +
 arch/x86/events/intel/uncore.c        |   6 +
 arch/x86/events/intel/uncore.h        |   1 +
 arch/x86/events/intel/uncore_snb.c    |  91 +++++
 arch/x86/events/msr.c                 |   1 +
 arch/x86/events/perf_event.h          | 113 ++++--
 arch/x86/include/asm/intel_ds.h       |   2 +-
 arch/x86/include/asm/msr-index.h      |   1 +
 arch/x86/include/asm/perf_event.h     |  49 ++-
 arch/x86/include/uapi/asm/perf_regs.h |  23 +-
 arch/x86/kernel/perf_regs.c           |  27 +-
 16 files changed, 882 insertions(+), 104 deletions(-)

Comments

Peter Zijlstra April 8, 2019, 3:41 p.m. UTC | #1
I currently have something like the below on top, is that correct?

If so, I'll fold it back in.


--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -563,16 +563,17 @@ int x86_pmu_hw_config(struct perf_event
 	/* sample_regs_user never support XMM registers */
 	if (unlikely(event->attr.sample_regs_user & PEBS_XMM_REGS))
 		return -EINVAL;
+
 	/*
 	 * Besides the general purpose registers, XMM registers may
 	 * be collected in PEBS on some platforms, e.g. Icelake
 	 */
 	if (unlikely(event->attr.sample_regs_intr & PEBS_XMM_REGS)) {
-		if (!is_sampling_event(event) ||
-		    !event->attr.precise_ip ||
-		    x86_pmu.pebs_no_xmm_regs)
+		if (x86_pmu.pebs_no_xmm_regs)
 			return -EINVAL;
 
+		if (!event->attr.precise_ip)
+			return -EINVAL;
 	}
 
 	return x86_setup_perfctr(event);
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3428,7 +3428,7 @@ icl_get_event_constraints(struct cpu_hw_
 	 * Force instruction:ppp in Fixed counter 0
 	 */
 	if ((event->attr.precise_ip == 3) &&
-	    ((event->hw.config & X86_RAW_EVENT_MASK) == 0x00c0))
+	    (event->hw.config == X86_CONFIG(.event=0xc0)))
 		return &fixed_counter0_constraint;
 
 	return hsw_get_event_constraints(cpuc, idx, event);
@@ -4810,7 +4810,7 @@ __init int intel_pmu_init(void)
 			hsw_format_attr : nhm_format_attr;
 		extra_attr = merge_attr(extra_attr, skl_format_attr);
 		x86_pmu.cpu_events = get_icl_events_attrs();
-		x86_pmu.force_gpr_event = 0x2ca;
+		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
 		intel_pmu_pebs_data_source_skl(false);
 		pr_cont("Icelake events, ");
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -853,13 +853,13 @@ struct event_constraint intel_icl_pebs_e
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
 	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x400000000ULL),	/* SLOTS */
 
-	INTEL_PLD_CONSTRAINT(0x1cd, 0xff),		/* MEM_TRANS_RETIRED.LOAD_LATENCY */
-	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf),  /* MEM_INST_RETIRED.LOAD */
-	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf),  /* MEM_INST_RETIRED.STORE */
+	INTEL_PLD_CONSTRAINT(0x1cd, 0xff),			/* MEM_TRANS_RETIRED.LOAD_LATENCY */
+	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf),	/* MEM_INST_RETIRED.LOAD */
+	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf),	/* MEM_INST_RETIRED.STORE */
 
 	INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf), /* MEM_LOAD_*_RETIRED.* */
 
-	INTEL_FLAGS_EVENT_CONSTRAINT(0xd0, 0xf), 	     /* MEM_INST_RETIRED.* */
+	INTEL_FLAGS_EVENT_CONSTRAINT(0xd0, 0xf),		/* MEM_INST_RETIRED.* */
 
 	/*
 	 * Everything else is handled by PMU_FL_PEBS_ALL, because we
@@ -963,40 +963,42 @@ static u64 pebs_update_adaptive_cfg(stru
 	u64 pebs_data_cfg = 0;
 	bool gprs, tsx_weight;
 
-	if ((sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) ||
-	    attr->precise_ip < 2) {
+	if (!(sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) &&
+	    attr->precise_ip > 1)
+		return pebs_data_cfs;
 
-		if (sample_type & PERF_PEBS_MEMINFO_TYPE)
-			pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
+	if (sample_type & PERF_PEBS_MEMINFO_TYPE)
+		pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
 
+	/*
+	 * We need GPRs when:
+	 * + user requested them
+	 * + precise_ip < 2 for the non event IP
+	 * + For RTM TSX weight we need GPRs for the abort code.
+	 */
+	gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
+	       (attr->sample_regs_intr & PEBS_GPRS_REGS);
+
+	tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&
+		     ((attr->config & INTEL_ARCH_EVENT_MASK) ==
+		      x86_pmu.rtm_abort_event);
+
+	if (gprs || (attr->precise_ip < 2) || tsx_weight)
+		pebs_data_cfg |= PEBS_DATACFG_GPRS;
+
+	if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
+	    (attr->sample_regs_intr & PERF_XMM_REGS))
+		pebs_data_cfg |= PEBS_DATACFG_XMMS;
+
+	if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
 		/*
-		 * Cases we need the registers:
-		 * + user requested registers
-		 * + precise_ip < 2 for the non event IP
-		 * + For RTM TSX weight we need GPRs too for the abort
-		 * code. But we don't want to force GPRs for all other
-		 * weights.  So add it only collectfor the RTM abort event.
+		 * For now always log all LBRs. Could configure this
+		 * later.
 		 */
-		gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
-			      (attr->sample_regs_intr & 0xffffffff);
-		tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&
-			     ((attr->config & 0xffff) == x86_pmu.force_gpr_event);
-		if (gprs || (attr->precise_ip < 2) || tsx_weight)
-			pebs_data_cfg |= PEBS_DATACFG_GPRS;
-
-		if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
-		    (attr->sample_regs_intr >> 32))
-			pebs_data_cfg |= PEBS_DATACFG_XMMS;
-
-		if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
-			/*
-			 * For now always log all LBRs. Could configure this
-			 * later.
-			 */
-			pebs_data_cfg |= PEBS_DATACFG_LBRS |
-				((x86_pmu.lbr_nr-1) << PEBS_DATACFG_LBR_SHIFT);
-		}
+		pebs_data_cfg |= PEBS_DATACFG_LBRS |
+			((x86_pmu.lbr_nr-1) << PEBS_DATACFG_LBR_SHIFT);
 	}
+
 	return pebs_data_cfg;
 }
 
@@ -1022,13 +1024,8 @@ pebs_update_state(bool needed_cb, struct
 	}
 
 	/*
-	 * The PEBS record doesn't shrink on the del. Because to get
-	 * an accurate config needs to go through all the existing pebs events.
-	 * It's not necessary.
-	 * There is no harmful for a bigger PEBS record, except little
-	 * performance impacts.
-	 * Also, for most cases, the same pebs config is applied for all
-	 * pebs events.
+	 * The PEBS record doesn't shrink on pmu::del(). Doing so would require
+	 * iterating all remaining PEBS events to reconstruct the config.
 	 */
 	if (x86_pmu.intel_cap.pebs_baseline && add) {
 		u64 pebs_data_cfg;
@@ -1076,8 +1073,7 @@ void intel_pmu_pebs_enable(struct perf_e
 
 	cpuc->pebs_enabled |= 1ULL << hwc->idx;
 
-	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) &&
-	    (x86_pmu.version < 5))
+	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) && (x86_pmu.version < 5))
 		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
 	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
 		cpuc->pebs_enabled |= 1ULL << 63;
@@ -1766,8 +1762,7 @@ static void intel_pmu_drain_pebs_core(st
 			       setup_pebs_fixed_sample_data);
 }
 
-static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc,
-						 int size)
+static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc, int size)
 {
 	struct perf_event *event;
 	int bit;
@@ -1826,8 +1821,7 @@ static void intel_pmu_drain_pebs_nhm(str
 
 		/* PEBS v3 has more accurate status bits */
 		if (x86_pmu.intel_cap.pebs_format >= 3) {
-			for_each_set_bit(bit, (unsigned long *)&pebs_status,
-					 size)
+			for_each_set_bit(bit, (unsigned long *)&pebs_status, size)
 				counts[bit]++;
 
 			continue;
@@ -1866,8 +1860,7 @@ static void intel_pmu_drain_pebs_nhm(str
 		 * If collision happened, the record will be dropped.
 		 */
 		if (p->status != (1ULL << bit)) {
-			for_each_set_bit(i, (unsigned long *)&pebs_status,
-					 x86_pmu.max_pebs_events)
+			for_each_set_bit(i, (unsigned long *)&pebs_status, size)
 				error[i]++;
 			continue;
 		}
@@ -1875,7 +1868,7 @@ static void intel_pmu_drain_pebs_nhm(str
 		counts[bit]++;
 	}
 
-	for (bit = 0; bit < size; bit++) {
+	for_each_set_bit(bit, (unsigned long *)&mask, size) {
 		if ((counts[bit] == 0) && (error[bit] == 0))
 			continue;
 
@@ -1939,7 +1932,7 @@ static void intel_pmu_drain_pebs_icl(str
 			counts[bit]++;
 	}
 
-	for (bit = 0; bit < size; bit++) {
+	for_each_set_bit(bit, (unsigned long *)mask, size) {
 		if (counts[bit] == 0)
 			continue;
 
@@ -1980,6 +1973,9 @@ void __init intel_ds_init(void)
 		char *pebs_qual = "";
 		int format = x86_pmu.intel_cap.pebs_format;
 
+		if (format < 4)
+			x86_pmu.intel_cap.pebs_baseline = 0;
+
 		switch (format) {
 		case 0:
 			pr_cont("PEBS fmt0%c, ", pebs_type);
@@ -2042,8 +2038,6 @@ void __init intel_ds_init(void)
 			pr_cont("no PEBS fmt%d%c, ", format, pebs_type);
 			x86_pmu.pebs = 0;
 		}
-		if (format != 4)
-			x86_pmu.intel_cap.pebs_baseline = 0;
 	}
 }
 
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -303,8 +303,8 @@ struct cpu_hw_events {
 	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
 
 /*
- * Only works for Intel events, which has 'small' event codes.
- * Need to fix the rang compare for 'big' event codes, e.g AMD64_EVENTSEL_EVENT
+ * The constraint_match() function only works for 'simple' event codes
+ * and not for extended (AMD64_EVENTSEL_EVENT) events codes.
  */
 #define EVENT_CONSTRAINT_RANGE(c, e, n, m) \
 	__EVENT_CONSTRAINT_RANGE(c, e, n, m, HWEIGHT(n), 0, 0)
@@ -672,12 +672,12 @@ struct x86_pmu {
 			pebs_no_xmm_regs	:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
+	int		max_pebs_events;
 	void		(*drain_pebs)(struct pt_regs *regs);
 	struct event_constraint *pebs_constraints;
 	void		(*pebs_aliases)(struct perf_event *event);
-	int 		max_pebs_events;
 	unsigned long	large_pebs_flags;
-	u64		force_gpr_event;
+	u64		rtm_abort_event;
 
 	/*
 	 * Intel LBR
Liang, Kan April 8, 2019, 4:06 p.m. UTC | #2
On 4/8/2019 11:41 AM, Peter Zijlstra wrote:
> 
> I currently have something like the below on top, is that correct?

Yes, it's correct.

> 
> If so, I'll fold it back in.

Thanks. It's really appreciated.

Kan

Liang, Kan April 8, 2019, 4:25 p.m. UTC | #3
>> @@ -963,40 +963,42 @@ static u64 pebs_update_adaptive_cfg(stru
>>       u64 pebs_data_cfg = 0;
>>       bool gprs, tsx_weight;
>> -    if ((sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) ||
>> -        attr->precise_ip < 2) {
>> +    if (!(sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) &&
>> +        attr->precise_ip > 1)
>> +        return pebs_data_cfs;

Sorry, there are two typos I didn't catch in the previous email.
Should be pebs_data_cfg.

>> -        if (sample_type & PERF_PEBS_MEMINFO_TYPE)
>> -            pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
>> +    if (sample_type & PERF_PEBS_MEMINFO_TYPE)
>> +        pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
>> +    /*
>> +     * We need GPRs when:
>> +     * + user requested them
>> +     * + precise_ip < 2 for the non event IP
>> +     * + For RTM TSX weight we need GPRs for the abort code.
>> +     */
>> +    gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
>> +           (attr->sample_regs_intr & PEBS_GPRS_REGS);
>> +
>> +    tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&
>> +             ((attr->config & INTEL_ARCH_EVENT_MASK) ==
>> +              x86_pmu.rtm_abort_event);
>> +
>> +    if (gprs || (attr->precise_ip < 2) || tsx_weight)
>> +        pebs_data_cfg |= PEBS_DATACFG_GPRS;
>> +
>> +    if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
>> +        (attr->sample_regs_intr & PERF_XMM_REGS))

Should be PEBS_XMM_REGS.


Thanks,
Kan
Peter Zijlstra April 8, 2019, 4:28 p.m. UTC | #4
On Mon, Apr 08, 2019 at 12:25:17PM -0400, Liang, Kan wrote:
> 
> > > @@ -963,40 +963,42 @@ static u64 pebs_update_adaptive_cfg(stru
> > >       u64 pebs_data_cfg = 0;
> > >       bool gprs, tsx_weight;
> > > -    if ((sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) ||
> > > -        attr->precise_ip < 2) {
> > > +    if (!(sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) &&
> > > +        attr->precise_ip > 1)
> > > +        return pebs_data_cfs;
> 
> Sorry, there are two typos I didn't catch in the previous email.
> Should be pebs_data_cfg.
> 
> > > -        if (sample_type & PERF_PEBS_MEMINFO_TYPE)
> > > -            pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
> > > +    if (sample_type & PERF_PEBS_MEMINFO_TYPE)
> > > +        pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
> > > +    /*
> > > +     * We need GPRs when:
> > > +     * + user requested them
> > > +     * + precise_ip < 2 for the non event IP
> > > +     * + For RTM TSX weight we need GPRs for the abort code.
> > > +     */
> > > +    gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
> > > +           (attr->sample_regs_intr & PEBS_GPRS_REGS);
> > > +
> > > +    tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&
> > > +             ((attr->config & INTEL_ARCH_EVENT_MASK) ==
> > > +              x86_pmu.rtm_abort_event);
> > > +
> > > +    if (gprs || (attr->precise_ip < 2) || tsx_weight)
> > > +        pebs_data_cfg |= PEBS_DATACFG_GPRS;
> > > +
> > > +    if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
> > > +        (attr->sample_regs_intr & PERF_XMM_REGS))
> 
> Should be PEBS_XMM_REGS.
> 

Ha!, clearly I compile-tested it... oh wait :-)
Liang, Kan April 8, 2019, 10:49 p.m. UTC | #5
On 4/8/2019 12:06 PM, Liang, Kan wrote:
> @@ -1875,7 +1868,7 @@ static void intel_pmu_drain_pebs_nhm(str
>            counts[bit]++;
>        }
> -    for (bit = 0; bit < size; bit++) {
> +    for_each_set_bit(bit, (unsigned long *)&mask, size) {
>            if ((counts[bit] == 0) && (error[bit] == 0))
>                continue;
> @@ -1939,7 +1932,7 @@ static void intel_pmu_drain_pebs_icl(str
>                counts[bit]++;
>        }
> -    for (bit = 0; bit < size; bit++) {
> +    for_each_set_bit(bit, (unsigned long *)mask, size) {
>            if (counts[bit] == 0)
>                continue;

I have finished testing the changes. There is one more regression on ICL,
caused by another typo.

Should be "for_each_set_bit(bit, (unsigned long *)&mask, size) {"

Thanks,
Kan