linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] perf/x86/intel/pt: Add new packet enables
@ 2015-07-17 13:34 Alexander Shishkin
  2015-07-17 13:34 ` [PATCH 1/3] perf/x86/intel/pt: Add new timing " Alexander Shishkin
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-17 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Hi Peter,

There are new PT packets and handles to enable and configure them in the
new version if Intel Architecture SDM. This series deals with timing
related packets: adds bit definitions, cpuid enumeration and relevant
userspace-visible parts.

The bulk of work, as usual, is in the tooling, which I don't include in
this patchset, since it depends on Adrian's earlier work which is still
bouncing between him and the maintainers, but for the sake of completeness
I pushed it to my github tree [1] and Adrian is also sending it out for
review. So anybody interested in trying the whole thing out can pull it
from my tree.

This time around, however, there is no compile time dependency between
mine and his series, since all the relevant bits are communicated through
sysfs.

[1] https://github.com/virtuoso/linux-perf/commits/intel_pt

Alexander Shishkin (3):
  perf/x86/intel/pt: Add new timing packet enables
  perf/x86/intel/pt: Add an option to not force PSB+ on every
    schedule-in
  perf/x86/intel/bts: Set itrace_started in pmu::start to match the new
    logic

 arch/x86/include/asm/msr-index.h           |  8 +++
 arch/x86/kernel/cpu/intel_pt.h             |  6 +++
 arch/x86/kernel/cpu/perf_event.h           |  1 +
 arch/x86/kernel/cpu/perf_event_intel_bts.c |  1 +
 arch/x86/kernel/cpu/perf_event_intel_pt.c  | 78 ++++++++++++++++++++++++++++--
 kernel/events/core.c                       |  2 -
 6 files changed, 91 insertions(+), 5 deletions(-)

-- 
2.1.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] perf/x86/intel/pt: Add new timing packet enables
  2015-07-17 13:34 [PATCH 0/3] perf/x86/intel/pt: Add new packet enables Alexander Shishkin
@ 2015-07-17 13:34 ` Alexander Shishkin
  2015-07-30 10:40   ` Peter Zijlstra
  2015-07-17 13:34 ` [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in Alexander Shishkin
  2015-07-17 13:34 ` [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic Alexander Shishkin
  2 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-17 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Intel PT chapter in the new Intel Architecture SDM adds several packets
corresponding enable bits and registers that control packet generation.
Also, additional bits in the Intel PT CPUID leaf were added to enumerate
presence and parameters of these new packets and features.

The packets and enables are:
  * CYC: cycle accurate mode, provides the number of cycles elapsed since
    previous CYC packet; its presence and available threshold values are
    enumerated via CPUID;
  * MTC: mini time counter packets, used for tracking TSC time between
    full TSC packets; its presence and available resolution options are
    enumerated via CPUID;
  * PSB packet period is now configurable, available period values are
    enumerated via CPUID.

This patch adds corresponding bit and register definitions, pmu driver
capabilities based on CPUID enumeration, new attribute format bits for
the new featurens and extends event configuration validation function
to take these into account.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h          |  8 +++++
 arch/x86/kernel/cpu/intel_pt.h            |  6 ++++
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 56 ++++++++++++++++++++++++++++++-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9ebc3d0093..c665d34f72 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -80,13 +80,21 @@
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
+#define RTIT_CTL_CYCLEACC		BIT(1)
 #define RTIT_CTL_OS			BIT(2)
 #define RTIT_CTL_USR			BIT(3)
 #define RTIT_CTL_CR3EN			BIT(7)
 #define RTIT_CTL_TOPA			BIT(8)
+#define RTIT_CTL_MTC_EN			BIT(9)
 #define RTIT_CTL_TSC_EN			BIT(10)
 #define RTIT_CTL_DISRETC		BIT(11)
 #define RTIT_CTL_BRANCH_EN		BIT(13)
+#define RTIT_CTL_MTC_RANGE_OFFSET	14
+#define RTIT_CTL_MTC_RANGE		(0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET	19
+#define RTIT_CTL_CYC_THRESH		(0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET	24
+#define RTIT_CTL_PSB_FREQ      		(0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
 #define MSR_IA32_RTIT_STATUS		0x00000571
 #define RTIT_STATUS_CONTEXTEN		BIT(1)
 #define RTIT_STATUS_TRIGGEREN		BIT(2)
diff --git a/arch/x86/kernel/cpu/intel_pt.h b/arch/x86/kernel/cpu/intel_pt.h
index 1c338b0eba..feb293e965 100644
--- a/arch/x86/kernel/cpu/intel_pt.h
+++ b/arch/x86/kernel/cpu/intel_pt.h
@@ -72,9 +72,15 @@ struct topa_entry {
 enum pt_capabilities {
 	PT_CAP_max_subleaf = 0,
 	PT_CAP_cr3_filtering,
+	PT_CAP_psb_cyc,
+	PT_CAP_mtc,
 	PT_CAP_topa_output,
 	PT_CAP_topa_multiple_entries,
+	PT_CAP_single_range_output,
 	PT_CAP_payloads_lip,
+	PT_CAP_mtc_periods,
+	PT_CAP_cycle_thresholds,
+	PT_CAP_psb_periods,
 };
 
 struct pt_pmu {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 183de71962..aacd3c9bed 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -65,9 +65,15 @@ static struct pt_cap_desc {
 } pt_caps[] = {
 	PT_CAP(max_subleaf,		0, CR_EAX, 0xffffffff),
 	PT_CAP(cr3_filtering,		0, CR_EBX, BIT(0)),
+	PT_CAP(psb_cyc,			0, CR_EBX, BIT(1)),
+	PT_CAP(mtc,			0, CR_EBX, BIT(3)),
 	PT_CAP(topa_output,		0, CR_ECX, BIT(0)),
 	PT_CAP(topa_multiple_entries,	0, CR_ECX, BIT(1)),
+	PT_CAP(single_range_output,	0, CR_ECX, BIT(2)),
 	PT_CAP(payloads_lip,		0, CR_ECX, BIT(31)),
+	PT_CAP(mtc_periods,		1, CR_EAX, 0xffff0000),
+	PT_CAP(cycle_thresholds,	1, CR_EBX, 0xffff),
+	PT_CAP(psb_periods,		1, CR_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -94,12 +100,22 @@ static struct attribute_group pt_cap_group = {
 	.name	= "caps",
 };
 
+PMU_FORMAT_ATTR(cyc,		"config:1"	);
+PMU_FORMAT_ATTR(mtc,		"config:9"	);
 PMU_FORMAT_ATTR(tsc,		"config:10"	);
 PMU_FORMAT_ATTR(noretcomp,	"config:11"	);
+PMU_FORMAT_ATTR(mtc_period,	"config:14-17"	);
+PMU_FORMAT_ATTR(cyc_thresh,	"config:19-22"	);
+PMU_FORMAT_ATTR(psb_period,	"config:24-27"	);
 
 static struct attribute *pt_formats_attr[] = {
+	&format_attr_cyc.attr,
+	&format_attr_mtc.attr,
 	&format_attr_tsc.attr,
 	&format_attr_noretcomp.attr,
+	&format_attr_mtc_period.attr,
+	&format_attr_cyc_thresh.attr,
+	&format_attr_psb_period.attr,
 	NULL,
 };
 
@@ -170,15 +186,53 @@ fail:
 	return ret;
 }
 
-#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
+#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN     |	\
+			RTIT_CTL_DISRETC    |	\
+			RTIT_CTL_CYCLEACC   |	\
+			RTIT_CTL_MTC_EN     |	\
+			RTIT_CTL_MTC_RANGE  |	\
+			RTIT_CTL_CYC_THRESH |   \
+			RTIT_CTL_PSB_FREQ)
 
 static bool pt_event_valid(struct perf_event *event)
 {
 	u64 config = event->attr.config;
+	u64 allowed, requested;
 
 	if ((config & PT_CONFIG_MASK) != config)
 		return false;
 
+	if (config &
+	    (RTIT_CTL_CYCLEACC | RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ)) {
+		if (!pt_cap_get(PT_CAP_psb_cyc))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_psb_periods);
+		requested = (config & RTIT_CTL_PSB_FREQ) >>
+			RTIT_CTL_PSB_FREQ_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_cycle_thresholds);
+		requested = (config & RTIT_CTL_CYC_THRESH) >>
+			RTIT_CTL_CYC_THRESH_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+	}
+
+	if (config & (RTIT_CTL_MTC_EN | RTIT_CTL_MTC_RANGE)) {
+		allowed = pt_cap_get(PT_CAP_mtc_periods);
+
+		if (!allowed)
+			return false;
+
+		requested = (config & RTIT_CTL_MTC_RANGE) >>
+			RTIT_CTL_MTC_RANGE_OFFSET;
+
+		if (!(allowed & BIT(requested)))
+			return false;
+	}
+
 	return true;
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-17 13:34 [PATCH 0/3] perf/x86/intel/pt: Add new packet enables Alexander Shishkin
  2015-07-17 13:34 ` [PATCH 1/3] perf/x86/intel/pt: Add new timing " Alexander Shishkin
@ 2015-07-17 13:34 ` Alexander Shishkin
  2015-07-30 10:43   ` Peter Zijlstra
  2015-07-17 13:34 ` [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic Alexander Shishkin
  2 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-17 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Currently, the PT driver zeroes out the status register every time before
starting the event. However, all the writable bits are already taken care
of in pt_handle_status() function, except the new PacketByteCnt field,
which in new versions of PT contains the number of packet bytes written
since the last sync (PSB) packet. Zeroing it out before enabling PT forces
a sync packet to be written. This means that, with the existing code, a
sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
time a PT event is scheduled in.

To avoid these unnecessary syncs and save a WRMSR in the fast path, this
patch adds a new attribute config bit "no_force_psb", which will disable
this zeroing WRMSR.

One exception where we do need to force PSB like this when tracing starts,
so that the decoder has a clear sync point in the trace. For this purpose
we aready have hw::itrace_started flag, which we are currently using to
output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
from perf core to the pmu::start, where it should still be 0 on the very
first run.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.h          |  1 +
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 22 ++++++++++++++++++++--
 kernel/events/core.c                      |  2 --
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 3474cf26b6..d7aa6c5630 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -77,6 +77,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_EXCL_ACCT	0x0200 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_FREERUNNING	0x0800 /* use freerunning PEBS */
+#define PERF_X86_EVENT_PT_NO_FORCE_PSB	0x1000 /* don't force PSB+ packets */
 
 
 struct amd_nb {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index aacd3c9bed..6c40455cd1 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -100,6 +100,7 @@ static struct attribute_group pt_cap_group = {
 	.name	= "caps",
 };
 
+PMU_FORMAT_ATTR(no_force_psb,	"config:0"	);
 PMU_FORMAT_ATTR(cyc,		"config:1"	);
 PMU_FORMAT_ATTR(mtc,		"config:9"	);
 PMU_FORMAT_ATTR(tsc,		"config:10"	);
@@ -109,6 +110,7 @@ PMU_FORMAT_ATTR(cyc_thresh,	"config:19-22"	);
 PMU_FORMAT_ATTR(psb_period,	"config:24-27"	);
 
 static struct attribute *pt_formats_attr[] = {
+	&format_attr_no_force_psb.attr,
 	&format_attr_cyc.attr,
 	&format_attr_mtc.attr,
 	&format_attr_tsc.attr,
@@ -199,6 +201,18 @@ static bool pt_event_valid(struct perf_event *event)
 	u64 config = event->attr.config;
 	u64 allowed, requested;
 
+	/*
+	 * Re-use one of the config bits to enable resetting packet byte
+	 * count that will result in outputting PSB+ at every event enable.
+	 * Since we don't allow setting bit 0 (TraceEn) directly from
+	 * attr::config, we can re-use it for our purpose.
+	 */
+	if (config & BIT(0)) {
+		event->hw.flags |= PERF_X86_EVENT_PT_NO_FORCE_PSB;
+		event->attr.config &= ~BIT(0);
+		config = event->attr.config;
+	}
+
 	if ((config & PT_CONFIG_MASK) != config)
 		return false;
 
@@ -245,6 +259,12 @@ static void pt_config(struct perf_event *event)
 {
 	u64 reg;
 
+	if (!(event->hw.flags & PERF_X86_EVENT_PT_NO_FORCE_PSB) ||
+	    !(event->hw.itrace_started)) {
+		event->hw.itrace_started = 1;
+		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+	}
+
 	reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
 
 	if (!event->attr.exclude_kernel)
@@ -964,7 +984,6 @@ void intel_pt_interrupt(void)
 
 		pt_config_buffer(buf->cur->table, buf->cur_idx,
 				 buf->output_off);
-		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 		pt_config(event);
 	}
 }
@@ -988,7 +1007,6 @@ static void pt_event_start(struct perf_event *event, int mode)
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
-	wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 	pt_config(event);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b..15da13237a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6040,8 +6040,6 @@ static void perf_log_itrace_start(struct perf_event *event)
 	    event->hw.itrace_started)
 		return;
 
-	event->hw.itrace_started = 1;
-
 	rec.header.type	= PERF_RECORD_ITRACE_START;
 	rec.header.misc	= 0;
 	rec.header.size	= sizeof(rec);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic
  2015-07-17 13:34 [PATCH 0/3] perf/x86/intel/pt: Add new packet enables Alexander Shishkin
  2015-07-17 13:34 ` [PATCH 1/3] perf/x86/intel/pt: Add new timing " Alexander Shishkin
  2015-07-17 13:34 ` [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in Alexander Shishkin
@ 2015-07-17 13:34 ` Alexander Shishkin
  2015-09-08 12:02   ` Alexander Shishkin
  2015-09-13 10:56   ` [tip:perf/core] perf/x86/intel/bts: Set event-> hw.itrace_started " tip-bot for Alexander Shishkin
  2 siblings, 2 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-17 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Since hw::itrace_started is now set in pmu::start to signal the beginning of
the trace, do so also in the intel_bts driver.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
index 54690e8857..d1c0f254af 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -222,6 +222,7 @@ static void __bts_event_start(struct perf_event *event)
 	if (!buf || bts_buffer_is_full(buf, bts))
 		return;
 
+	event->hw.itrace_started = 1;
 	event->hw.state = 0;
 
 	if (!buf->snapshot)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] perf/x86/intel/pt: Add new timing packet enables
  2015-07-17 13:34 ` [PATCH 1/3] perf/x86/intel/pt: Add new timing " Alexander Shishkin
@ 2015-07-30 10:40   ` Peter Zijlstra
  2015-07-30 11:57     ` Alexander Shishkin
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-30 10:40 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

On Fri, Jul 17, 2015 at 04:34:08PM +0300, Alexander Shishkin wrote:
> +#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN     |	\
> +			RTIT_CTL_DISRETC    |	\
> +			RTIT_CTL_CYCLEACC   |	\
> +			RTIT_CTL_MTC_EN     |	\
> +			RTIT_CTL_MTC_RANGE  |	\
> +			RTIT_CTL_CYC_THRESH |   \
> +			RTIT_CTL_PSB_FREQ)
>  

#define RTIT_CTL_CYC (RTIT_CTL_CYCLEACC   | \
		      RTIT_CTL_CYC_THRESH | \
		      RTIT_CTL_PSB_FREQ)

#define RTIT_CTL_MTC (RTIT_CTL_MTC_EN | \
		      RTIT_CTL_MTC_RANGE)

#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN     | \
			RTIT_CTL_DISRETC    | \
			RTIT_CTL_CYC	    | \
			RTIT_CTL_MTC)

>  static bool pt_event_valid(struct perf_event *event)
>  {
>  	u64 config = event->attr.config;
> +	u64 allowed, requested;
>  
>  	if ((config & PT_CONFIG_MASK) != config)
>  		return false;
>  
> +	if (config &
> +	    (RTIT_CTL_CYCLEACC | RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ)) {

	if (config & RTIT_CTL_CYC) {

> +		if (!pt_cap_get(PT_CAP_psb_cyc))
> +			return false;
> +
> +		allowed = pt_cap_get(PT_CAP_psb_periods);
> +		requested = (config & RTIT_CTL_PSB_FREQ) >>
> +			RTIT_CTL_PSB_FREQ_OFFSET;
> +		if (requested && (!(allowed & BIT(requested))))
> +			return false;
> +
> +		allowed = pt_cap_get(PT_CAP_cycle_thresholds);
> +		requested = (config & RTIT_CTL_CYC_THRESH) >>
> +			RTIT_CTL_CYC_THRESH_OFFSET;
> +		if (requested && (!(allowed & BIT(requested))))
> +			return false;
> +	}
> +
> +	if (config & (RTIT_CTL_MTC_EN | RTIT_CTL_MTC_RANGE)) {

	if (config & RTIT_CTL_MTC) {

> +		allowed = pt_cap_get(PT_CAP_mtc_periods);
> +
> +		if (!allowed)
> +			return false;
> +
> +		requested = (config & RTIT_CTL_MTC_RANGE) >>
> +			RTIT_CTL_MTC_RANGE_OFFSET;
> +
> +		if (!(allowed & BIT(requested)))
> +			return false;
> +	}
> +
>  	return true;
>  }

Would that make sense?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-17 13:34 ` [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in Alexander Shishkin
@ 2015-07-30 10:43   ` Peter Zijlstra
  2015-07-30 11:53     ` Alexander Shishkin
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-30 10:43 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
> Currently, the PT driver zeroes out the status register every time before
> starting the event. However, all the writable bits are already taken care
> of in pt_handle_status() function, except the new PacketByteCnt field,
> which in new versions of PT contains the number of packet bytes written
> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
> a sync packet to be written. This means that, with the existing code, a
> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
> time a PT event is scheduled in.
> 
> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
> patch adds a new attribute config bit "no_force_psb", which will disable
> this zeroing WRMSR.

Why is this exposed?


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-30 10:43   ` Peter Zijlstra
@ 2015-07-30 11:53     ` Alexander Shishkin
  2015-07-30 12:13       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 11:53 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
>> Currently, the PT driver zeroes out the status register every time before
>> starting the event. However, all the writable bits are already taken care
>> of in pt_handle_status() function, except the new PacketByteCnt field,
>> which in new versions of PT contains the number of packet bytes written
>> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
>> a sync packet to be written. This means that, with the existing code, a
>> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
>> time a PT event is scheduled in.
>> 
>> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
>> patch adds a new attribute config bit "no_force_psb", which will disable
>> this zeroing WRMSR.
>
> Why is this exposed?

The default behavior, which we have in the kernel now, is to always
force the PSB, so if we change it, we need to make sure nobody is
relying on it being the default behavior. Granted, for the hardware
that's on the market right now it won't make a difference (it generates
PSB+ at least every 256 bytes anyway), but in future if we change the
default, newer hardware will produce different results with 4.2 than with
the newer kernels and this would be a tad sloppy.

The tools, however, can decide to set no_force_psb=1 when no_force_psb
is available (sysfs caps) and otherwise the default behavior will be the
same as no_force_psb==0. Makes sense?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] perf/x86/intel/pt: Add new timing packet enables
  2015-07-30 10:40   ` Peter Zijlstra
@ 2015-07-30 11:57     ` Alexander Shishkin
  2015-07-30 12:14       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 11:57 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Peter Zijlstra <peterz@infradead.org> writes:

> On Fri, Jul 17, 2015 at 04:34:08PM +0300, Alexander Shishkin wrote:
>> +#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN     |	\
>> +			RTIT_CTL_DISRETC    |	\
>> +			RTIT_CTL_CYCLEACC   |	\
>> +			RTIT_CTL_MTC_EN     |	\
>> +			RTIT_CTL_MTC_RANGE  |	\
>> +			RTIT_CTL_CYC_THRESH |   \
>> +			RTIT_CTL_PSB_FREQ)
>>  
>
> #define RTIT_CTL_CYC (RTIT_CTL_CYCLEACC   | \
> 		      RTIT_CTL_CYC_THRESH | \
> 		      RTIT_CTL_PSB_FREQ)

PSB_FREQ is not, strictly speaking, related to cycle accurate mode. Both
adjustable psb frequency and cycle accurate mode settings are enumerated
with the same CPUID bit, but they really do different things unrelated
to one another.

>> +	if (config & (RTIT_CTL_MTC_EN | RTIT_CTL_MTC_RANGE)) {
>
> 	if (config & RTIT_CTL_MTC) {
>
> Would that make sense?

To me either way is fine. Want me to respin it?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-30 11:53     ` Alexander Shishkin
@ 2015-07-30 12:13       ` Peter Zijlstra
  2015-07-30 12:49         ` Alexander Shishkin
                           ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-30 12:13 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

On Thu, Jul 30, 2015 at 02:53:51PM +0300, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
> >> Currently, the PT driver zeroes out the status register every time before
> >> starting the event. However, all the writable bits are already taken care
> >> of in pt_handle_status() function, except the new PacketByteCnt field,
> >> which in new versions of PT contains the number of packet bytes written
> >> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
> >> a sync packet to be written. This means that, with the existing code, a
> >> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
> >> time a PT event is scheduled in.
> >> 
> >> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
> >> patch adds a new attribute config bit "no_force_psb", which will disable
> >> this zeroing WRMSR.
> >
> > Why is this exposed?
> 
> The default behavior, which we have in the kernel now, is to always
> force the PSB, so if we change it, we need to make sure nobody is
> relying on it being the default behavior. 

Seeing how there's hardly any tools out there for this -- and its only
recently introduced to the kernel, I think we can change this behaviour.

> Granted, for the hardware
> that's on the market right now it won't make a difference (it generates
> PSB+ at least every 256 bytes anyway), but in future if we change the
> default, newer hardware will produce different results with 4.2 than with
> the newer kernels and this would be a tad sloppy.

Even more reason not to expose this. By the time there's hardware out
there that supports this, 4.2 will be an ancient kernel :-)

> The tools, however, can decide to set no_force_psb=1 when no_force_psb
> is available (sysfs caps) and otherwise the default behavior will be the
> same as no_force_psb==0. Makes sense?

I'd rather not expose this. It doesn't make a difference to the actual
trace data other than it being bigger. Any decoder must be able to
decode both versions. And you cannot really tell anything from there
being too many PSB frames.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] perf/x86/intel/pt: Add new timing packet enables
  2015-07-30 11:57     ` Alexander Shishkin
@ 2015-07-30 12:14       ` Peter Zijlstra
  2015-07-30 13:15         ` [PATCH v1] " Alexander Shishkin
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2015-07-30 12:14 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

On Thu, Jul 30, 2015 at 02:57:17PM +0300, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Fri, Jul 17, 2015 at 04:34:08PM +0300, Alexander Shishkin wrote:
> >> +#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN     |	\
> >> +			RTIT_CTL_DISRETC    |	\
> >> +			RTIT_CTL_CYCLEACC   |	\
> >> +			RTIT_CTL_MTC_EN     |	\
> >> +			RTIT_CTL_MTC_RANGE  |	\
> >> +			RTIT_CTL_CYC_THRESH |   \
> >> +			RTIT_CTL_PSB_FREQ)
> >>  
> >
> > #define RTIT_CTL_CYC (RTIT_CTL_CYCLEACC   | \
> > 		      RTIT_CTL_CYC_THRESH | \
> > 		      RTIT_CTL_PSB_FREQ)
> 
> PSB_FREQ is not, strictly speaking, related to cycle accurate mode. Both
> adjustable psb frequency and cycle accurate mode settings are enumerated
> with the same CPUID bit, but they really do different things unrelated
> to one another.

RTIT_CTL_CYC_PSB then, to match the CPUID bit name?

> >> +	if (config & (RTIT_CTL_MTC_EN | RTIT_CTL_MTC_RANGE)) {
> >
> > 	if (config & RTIT_CTL_MTC) {
> >
> > Would that make sense?
> 
> To me either way is fine. Want me to respin it?

Please.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-30 12:13       ` Peter Zijlstra
@ 2015-07-30 12:49         ` Alexander Shishkin
  2015-07-30 13:36           ` Alexander Shishkin
  2015-07-30 13:48         ` [PATCH v1] perf/x86/intel/pt: Do not force sync packets " Alexander Shishkin
  2015-07-31 11:47         ` [PATCH v2] " Alexander Shishkin
  2 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 12:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Jul 30, 2015 at 02:53:51PM +0300, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Fri, Jul 17, 2015 at 04:34:09PM +0300, Alexander Shishkin wrote:
>> >> Currently, the PT driver zeroes out the status register every time before
>> >> starting the event. However, all the writable bits are already taken care
>> >> of in pt_handle_status() function, except the new PacketByteCnt field,
>> >> which in new versions of PT contains the number of packet bytes written
>> >> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
>> >> a sync packet to be written. This means that, with the existing code, a
>> >> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
>> >> time a PT event is scheduled in.
>> >> 
>> >> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
>> >> patch adds a new attribute config bit "no_force_psb", which will disable
>> >> this zeroing WRMSR.
>> >
>> > Why is this exposed?
>> 
>> The default behavior, which we have in the kernel now, is to always
>> force the PSB, so if we change it, we need to make sure nobody is
>> relying on it being the default behavior. 
>
> Seeing how there's hardly any tools out there for this -- and its only
> recently introduced to the kernel, I think we can change this
> behaviour.

Well, aside from perf tool itself (I'm not sure what's been happening to
the PT support there), there's gdb support, which has been merged to the
gdb mainline some time ago and makes use of our perf interface. I don't
know, however, if it relies on this or not.

>> Granted, for the hardware
>> that's on the market right now it won't make a difference (it generates
>> PSB+ at least every 256 bytes anyway), but in future if we change the
>> default, newer hardware will produce different results with 4.2 than with
>> the newer kernels and this would be a tad sloppy.
>
> Even more reason not to expose this. By the time there's hardware out
> there that supports this, 4.2 will be an ancient kernel :-)
>
>> The tools, however, can decide to set no_force_psb=1 when no_force_psb
>> is available (sysfs caps) and otherwise the default behavior will be the
>> same as no_force_psb==0. Makes sense?
>
> I'd rather not expose this. It doesn't make a difference to the actual
> trace data other than it being bigger. Any decoder must be able to
> decode both versions. And you cannot really tell anything from there
> being too many PSB frames.

It kinda does make a difference: inside PSB+ you get a TSC packet, which
carries the tsc timestamp as a payload, so, with the current behavior,
since you'll get one of those at every event schedule-in, you'll have
many more time synchronization points with the perf stream, but when the
default changes, suddenly your timestamps way further apart and the
decoder needs to put in additional guesswork to correlate those MMAP
events in the perf stream with the PT trace.

So normally, instead of forcing the PSB+, one would enable MTC (mini
time counter) packets, which would give better time reference points. If
one knows to do this, one doesn't need extra PSBs, if one doesn't, but
stuff sort of works for them because of the frequent PSBs, they will
break after the default changes. The decoders, ideally, should
eventually learn to deal with both cases, so in the long run it
shouldn't matter at all. I just typed in all that to clarify exactly
what the difference in trace data would mean.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v1] perf/x86/intel/pt: Add new timing packet enables
  2015-07-30 12:14       ` Peter Zijlstra
@ 2015-07-30 13:15         ` Alexander Shishkin
  2015-08-04  8:55           ` [tip:perf/core] " tip-bot for Alexander Shishkin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 13:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Intel PT chapter in the new Intel Architecture SDM adds several packets
corresponding enable bits and registers that control packet generation.
Also, additional bits in the Intel PT CPUID leaf were added to enumerate
presence and parameters of these new packets and features.

The packets and enables are:
  * CYC: cycle accurate mode, provides the number of cycles elapsed since
    previous CYC packet; its presence and available threshold values are
    enumerated via CPUID;
  * MTC: mini time counter packets, used for tracking TSC time between
    full TSC packets; its presence and available resolution options are
    enumerated via CPUID;
  * PSB packet period is now configurable, available period values are
    enumerated via CPUID.

This patch adds corresponding bit and register definitions, pmu driver
capabilities based on CPUID enumeration, new attribute format bits for
the new featurens and extends event configuration validation function
to take these into account.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/include/asm/msr-index.h          |  8 ++++
 arch/x86/kernel/cpu/intel_pt.h            |  6 +++
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 68 ++++++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9ebc3d0093..c665d34f72 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -80,13 +80,21 @@
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
+#define RTIT_CTL_CYCLEACC		BIT(1)
 #define RTIT_CTL_OS			BIT(2)
 #define RTIT_CTL_USR			BIT(3)
 #define RTIT_CTL_CR3EN			BIT(7)
 #define RTIT_CTL_TOPA			BIT(8)
+#define RTIT_CTL_MTC_EN			BIT(9)
 #define RTIT_CTL_TSC_EN			BIT(10)
 #define RTIT_CTL_DISRETC		BIT(11)
 #define RTIT_CTL_BRANCH_EN		BIT(13)
+#define RTIT_CTL_MTC_RANGE_OFFSET	14
+#define RTIT_CTL_MTC_RANGE		(0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET	19
+#define RTIT_CTL_CYC_THRESH		(0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET	24
+#define RTIT_CTL_PSB_FREQ      		(0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
 #define MSR_IA32_RTIT_STATUS		0x00000571
 #define RTIT_STATUS_CONTEXTEN		BIT(1)
 #define RTIT_STATUS_TRIGGEREN		BIT(2)
diff --git a/arch/x86/kernel/cpu/intel_pt.h b/arch/x86/kernel/cpu/intel_pt.h
index 1c338b0eba..feb293e965 100644
--- a/arch/x86/kernel/cpu/intel_pt.h
+++ b/arch/x86/kernel/cpu/intel_pt.h
@@ -72,9 +72,15 @@ struct topa_entry {
 enum pt_capabilities {
 	PT_CAP_max_subleaf = 0,
 	PT_CAP_cr3_filtering,
+	PT_CAP_psb_cyc,
+	PT_CAP_mtc,
 	PT_CAP_topa_output,
 	PT_CAP_topa_multiple_entries,
+	PT_CAP_single_range_output,
 	PT_CAP_payloads_lip,
+	PT_CAP_mtc_periods,
+	PT_CAP_cycle_thresholds,
+	PT_CAP_psb_periods,
 };
 
 struct pt_pmu {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 183de71962..695fd6c562 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -65,9 +65,15 @@ static struct pt_cap_desc {
 } pt_caps[] = {
 	PT_CAP(max_subleaf,		0, CR_EAX, 0xffffffff),
 	PT_CAP(cr3_filtering,		0, CR_EBX, BIT(0)),
+	PT_CAP(psb_cyc,			0, CR_EBX, BIT(1)),
+	PT_CAP(mtc,			0, CR_EBX, BIT(3)),
 	PT_CAP(topa_output,		0, CR_ECX, BIT(0)),
 	PT_CAP(topa_multiple_entries,	0, CR_ECX, BIT(1)),
+	PT_CAP(single_range_output,	0, CR_ECX, BIT(2)),
 	PT_CAP(payloads_lip,		0, CR_ECX, BIT(31)),
+	PT_CAP(mtc_periods,		1, CR_EAX, 0xffff0000),
+	PT_CAP(cycle_thresholds,	1, CR_EBX, 0xffff),
+	PT_CAP(psb_periods,		1, CR_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -94,12 +100,22 @@ static struct attribute_group pt_cap_group = {
 	.name	= "caps",
 };
 
+PMU_FORMAT_ATTR(cyc,		"config:1"	);
+PMU_FORMAT_ATTR(mtc,		"config:9"	);
 PMU_FORMAT_ATTR(tsc,		"config:10"	);
 PMU_FORMAT_ATTR(noretcomp,	"config:11"	);
+PMU_FORMAT_ATTR(mtc_period,	"config:14-17"	);
+PMU_FORMAT_ATTR(cyc_thresh,	"config:19-22"	);
+PMU_FORMAT_ATTR(psb_period,	"config:24-27"	);
 
 static struct attribute *pt_formats_attr[] = {
+	&format_attr_cyc.attr,
+	&format_attr_mtc.attr,
 	&format_attr_tsc.attr,
 	&format_attr_noretcomp.attr,
+	&format_attr_mtc_period.attr,
+	&format_attr_cyc_thresh.attr,
+	&format_attr_psb_period.attr,
 	NULL,
 };
 
@@ -170,15 +186,65 @@ fail:
 	return ret;
 }
 
-#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
+#define RTIT_CTL_CYC_PSB (RTIT_CTL_CYCLEACC	| \
+			  RTIT_CTL_CYC_THRESH	| \
+			  RTIT_CTL_PSB_FREQ)
+
+#define RTIT_CTL_MTC	(RTIT_CTL_MTC_EN	| \
+			 RTIT_CTL_MTC_RANGE)
+
+#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN		| \
+			RTIT_CTL_DISRETC	| \
+			RTIT_CTL_CYC_PSB	| \
+			RTIT_CTL_MTC)
 
 static bool pt_event_valid(struct perf_event *event)
 {
 	u64 config = event->attr.config;
+	u64 allowed, requested;
 
 	if ((config & PT_CONFIG_MASK) != config)
 		return false;
 
+	if (config & RTIT_CTL_CYC_PSB) {
+		if (!pt_cap_get(PT_CAP_psb_cyc))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_psb_periods);
+		requested = (config & RTIT_CTL_PSB_FREQ) >>
+			RTIT_CTL_PSB_FREQ_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_cycle_thresholds);
+		requested = (config & RTIT_CTL_CYC_THRESH) >>
+			RTIT_CTL_CYC_THRESH_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+	}
+
+	if (config & RTIT_CTL_MTC) {
+		/*
+		 * In the unlikely case that CPUID lists valid mtc periods,
+		 * but not the mtc capability, drop out here.
+		 *
+		 * Spec says that setting mtc period bits while mtc bit in
+		 * CPUID is 0 will #GP, so better safe than sorry.
+		 */
+		if (!pt_cap_get(PT_CAP_mtc))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_mtc_periods);
+		if (!allowed)
+			return false;
+
+		requested = (config & RTIT_CTL_MTC_RANGE) >>
+			RTIT_CTL_MTC_RANGE_OFFSET;
+
+		if (!(allowed & BIT(requested)))
+			return false;
+	}
+
 	return true;
 }
 
-- 
2.4.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in
  2015-07-30 12:49         ` Alexander Shishkin
@ 2015-07-30 13:36           ` Alexander Shishkin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 13:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> It kinda does make a difference: inside PSB+ you get a TSC packet, which
> carries the tsc timestamp as a payload, so, with the current behavior,
> since you'll get one of those at every event schedule-in, you'll have
> many more time synchronization points with the perf stream, but when the
> default changes, suddenly your timestamps way further apart and the
> decoder needs to put in additional guesswork to correlate those MMAP
> events in the perf stream with the PT trace.

I spoke hastily, TSC packets will be generated on every schedule-in (as
well as other packets that normally come in PSB+) regardless of the
actual PSB, so scratch the above. Let me replace the above with a
smaller patch.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v1] perf/x86/intel/pt: Do not force sync packets on every schedule-in
  2015-07-30 12:13       ` Peter Zijlstra
  2015-07-30 12:49         ` Alexander Shishkin
@ 2015-07-30 13:48         ` Alexander Shishkin
  2015-07-30 15:24           ` Alexander Shishkin
  2015-08-04  8:55           ` [tip:perf/core] " tip-bot for Alexander Shishkin
  2015-07-31 11:47         ` [PATCH v2] " Alexander Shishkin
  2 siblings, 2 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 13:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Currently, the PT driver zeroes out the status register every time before
starting the event. However, all the writable bits are already taken care
of in pt_handle_status() function, except the new PacketByteCnt field,
which in new versions of PT contains the number of packet bytes written
since the last sync (PSB) packet. Zeroing it out before enabling PT forces
a sync packet to be written. This means that, with the existing code, a
sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
time a PT event is scheduled in.

To avoid these unnecessary syncs and save a WRMSR in the fast path, this
patch changes the default behavior to not clear PacketByteCnt field, so
that the sync packets will be generated with the period specified as
"psb_period" attribute config field. This has little impact on the trace
data as the other packets that are normally sent within PSB+ (between PSB
and PSBEND) have their own generation scenarios which do not depend on the
sync packets.

One exception where we do need to force PSB like this when tracing starts,
so that the decoder has a clear sync point in the trace. For this purpose
we aready have hw::itrace_started flag, which we are currently using to
output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
from perf core to the pmu::start, where it should still be 0 on the very
first run.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

perf/x86/intel/pt: fixup with no force psb

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 7 +++++--
 kernel/events/core.c                      | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 695fd6c562..e20cfacb5a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -257,6 +257,11 @@ static void pt_config(struct perf_event *event)
 {
 	u64 reg;
 
+	if (!event->hw.itrace_started) {
+		event->hw.itrace_started = 1;
+		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+	}
+
 	reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
 
 	if (!event->attr.exclude_kernel)
@@ -976,7 +981,6 @@ void intel_pt_interrupt(void)
 
 		pt_config_buffer(buf->cur->table, buf->cur_idx,
 				 buf->output_off);
-		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 		pt_config(event);
 	}
 }
@@ -1000,7 +1004,6 @@ static void pt_event_start(struct perf_event *event, int mode)
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
-	wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 	pt_config(event);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b..15da13237a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6040,8 +6040,6 @@ static void perf_log_itrace_start(struct perf_event *event)
 	    event->hw.itrace_started)
 		return;
 
-	event->hw.itrace_started = 1;
-
 	rec.header.type	= PERF_RECORD_ITRACE_START;
 	rec.header.misc	= 0;
 	rec.header.size	= sizeof(rec);
-- 
2.4.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v1] perf/x86/intel/pt: Do not force sync packets on every schedule-in
  2015-07-30 13:48         ` [PATCH v1] perf/x86/intel/pt: Do not force sync packets " Alexander Shishkin
@ 2015-07-30 15:24           ` Alexander Shishkin
  2015-08-04  8:55           ` [tip:perf/core] " tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-30 15:24 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Currently, the PT driver zeroes out the status register every time before
> starting the event. However, all the writable bits are already taken care
> of in pt_handle_status() function, except the new PacketByteCnt field,
> which in new versions of PT contains the number of packet bytes written
> since the last sync (PSB) packet. Zeroing it out before enabling PT forces
> a sync packet to be written. This means that, with the existing code, a
> sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
> time a PT event is scheduled in.
>
> To avoid these unnecessary syncs and save a WRMSR in the fast path, this
> patch changes the default behavior to not clear PacketByteCnt field, so
> that the sync packets will be generated with the period specified as
> "psb_period" attribute config field. This has little impact on the trace
> data as the other packets that are normally sent within PSB+ (between PSB
> and PSBEND) have their own generation scenarios which do not depend on the
> sync packets.
>
> One exception where we do need to force PSB like this when tracing starts,
> so that the decoder has a clear sync point in the trace. For this purpose
> we aready have hw::itrace_started flag, which we are currently using to
> output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
> from perf core to the pmu::start, where it should still be 0 on the very
> first run.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>
> perf/x86/intel/pt: fixup with no force psb
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Argh, it is not my day today.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] perf/x86/intel/pt: Do not force sync packets on every schedule-in
  2015-07-30 12:13       ` Peter Zijlstra
  2015-07-30 12:49         ` Alexander Shishkin
  2015-07-30 13:48         ` [PATCH v1] perf/x86/intel/pt: Do not force sync packets " Alexander Shishkin
@ 2015-07-31 11:47         ` Alexander Shishkin
  2 siblings, 0 replies; 21+ messages in thread
From: Alexander Shishkin @ 2015-07-31 11:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme,
	Alexander Shishkin

Currently, the PT driver zeroes out the status register every time before
starting the event. However, all the writable bits are already taken care
of in pt_handle_status() function, except the new PacketByteCnt field,
which in new versions of PT contains the number of packet bytes written
since the last sync (PSB) packet. Zeroing it out before enabling PT forces
a sync packet to be written. This means that, with the existing code, a
sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
time a PT event is scheduled in.

To avoid these unnecessary syncs and save a WRMSR in the fast path, this
patch changes the default behavior to not clear PacketByteCnt field, so
that the sync packets will be generated with the period specified as
"psb_period" attribute config field. This has little impact on the trace
data as the other packets that are normally sent within PSB+ (between PSB
and PSBEND) have their own generation scenarios which do not depend on the
sync packets.

One exception where we do need to force PSB like this when tracing starts,
so that the decoder has a clear sync point in the trace. For this purpose
we aready have hw::itrace_started flag, which we are currently using to
output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
from perf core to the pmu::start, where it should still be 0 on the very
first run.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 7 +++++--
 kernel/events/core.c                      | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 695fd6c562..e20cfacb5a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -257,6 +257,11 @@ static void pt_config(struct perf_event *event)
 {
 	u64 reg;
 
+	if (!event->hw.itrace_started) {
+		event->hw.itrace_started = 1;
+		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+	}
+
 	reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
 
 	if (!event->attr.exclude_kernel)
@@ -976,7 +981,6 @@ void intel_pt_interrupt(void)
 
 		pt_config_buffer(buf->cur->table, buf->cur_idx,
 				 buf->output_off);
-		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 		pt_config(event);
 	}
 }
@@ -1000,7 +1004,6 @@ static void pt_event_start(struct perf_event *event, int mode)
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
-	wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 	pt_config(event);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae3419b..15da13237a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6040,8 +6040,6 @@ static void perf_log_itrace_start(struct perf_event *event)
 	    event->hw.itrace_started)
 		return;
 
-	event->hw.itrace_started = 1;
-
 	rec.header.type	= PERF_RECORD_ITRACE_START;
 	rec.header.misc	= 0;
 	rec.header.size	= sizeof(rec);
-- 
2.4.6


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/intel/pt: Do not force sync packets on every schedule-in
  2015-07-30 13:48         ` [PATCH v1] perf/x86/intel/pt: Do not force sync packets " Alexander Shishkin
  2015-07-30 15:24           ` Alexander Shishkin
@ 2015-08-04  8:55           ` tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot for Alexander Shishkin @ 2015-08-04  8:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, torvalds, peterz, alexander.shishkin, hpa, linux-kernel

Commit-ID:  9a6694cfa2390181dec936a17c0d9d21ef7b08d9
Gitweb:     http://git.kernel.org/tip/9a6694cfa2390181dec936a17c0d9d21ef7b08d9
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Thu, 30 Jul 2015 16:48:24 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 4 Aug 2015 10:16:55 +0200

perf/x86/intel/pt: Do not force sync packets on every schedule-in

Currently, the PT driver zeroes out the status register every time before
starting the event. However, all the writable bits are already taken care
of in pt_handle_status() function, except the new PacketByteCnt field,
which in new versions of PT contains the number of packet bytes written
since the last sync (PSB) packet. Zeroing it out before enabling PT forces
a sync packet to be written. This means that, with the existing code, a
sync packet (PSB and PSBEND, 18 bytes in total) will be generated every
time a PT event is scheduled in.

To avoid these unnecessary syncs and save a WRMSR in the fast path, this
patch changes the default behavior to not clear PacketByteCnt field, so
that the sync packets will be generated with the period specified as
"psb_period" attribute config field. This has little impact on the trace
data as the other packets that are normally sent within PSB+ (between PSB
and PSBEND) have their own generation scenarios which do not depend on the
sync packets.

One exception where we do need to force PSB like this when tracing starts,
so that the decoder has a clear sync point in the trace. For this purpose
we aready have hw::itrace_started flag, which we are currently using to
output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started
from perf core to the pmu::start, where it should still be 0 on the very
first run.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1438264104-16189-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 7 +++++--
 kernel/events/core.c                      | 2 --
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 183de71..cc58ef8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -191,6 +191,11 @@ static void pt_config(struct perf_event *event)
 {
 	u64 reg;
 
+	if (!event->hw.itrace_started) {
+		event->hw.itrace_started = 1;
+		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+	}
+
 	reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN | RTIT_CTL_TRACEEN;
 
 	if (!event->attr.exclude_kernel)
@@ -910,7 +915,6 @@ void intel_pt_interrupt(void)
 
 		pt_config_buffer(buf->cur->table, buf->cur_idx,
 				 buf->output_off);
-		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 		pt_config(event);
 	}
 }
@@ -934,7 +938,6 @@ static void pt_event_start(struct perf_event *event, int mode)
 
 	pt_config_buffer(buf->cur->table, buf->cur_idx,
 			 buf->output_off);
-	wrmsrl(MSR_IA32_RTIT_STATUS, 0);
 	pt_config(event);
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a9796c8..bdea129 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6139,8 +6139,6 @@ static void perf_log_itrace_start(struct perf_event *event)
 	    event->hw.itrace_started)
 		return;
 
-	event->hw.itrace_started = 1;
-
 	rec.header.type	= PERF_RECORD_ITRACE_START;
 	rec.header.misc	= 0;
 	rec.header.size	= sizeof(rec);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/intel/pt: Add new timing packet enables
  2015-07-30 13:15         ` [PATCH v1] " Alexander Shishkin
@ 2015-08-04  8:55           ` tip-bot for Alexander Shishkin
  0 siblings, 0 replies; 21+ messages in thread
From: tip-bot for Alexander Shishkin @ 2015-08-04  8:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, peterz, alexander.shishkin, torvalds, mingo, hpa, linux-kernel

Commit-ID:  b1bf72d6691cc33fc7763fc8ec77df42ca1a8702
Gitweb:     http://git.kernel.org/tip/b1bf72d6691cc33fc7763fc8ec77df42ca1a8702
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Thu, 30 Jul 2015 16:15:31 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 4 Aug 2015 10:16:55 +0200

perf/x86/intel/pt: Add new timing packet enables

Intel PT chapter in the new Intel Architecture SDM adds several packets
corresponding enable bits and registers that control packet generation.
Also, additional bits in the Intel PT CPUID leaf were added to enumerate
presence and parameters of these new packets and features.

The packets and enables are:

  * CYC: cycle accurate mode, provides the number of cycles elapsed since
    previous CYC packet; its presence and available threshold values are
    enumerated via CPUID;

  * MTC: mini time counter packets, used for tracking TSC time between
    full TSC packets; its presence and available resolution options are
    enumerated via CPUID;

  * PSB packet period is now configurable, available period values are
    enumerated via CPUID.

This patch adds corresponding bit and register definitions, pmu driver
capabilities based on CPUID enumeration, new attribute format bits for
the new featurens and extends event configuration validation function
to take these into account.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1438262131-12725-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/msr-index.h          |  8 ++++
 arch/x86/kernel/cpu/intel_pt.h            |  6 +++
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 68 ++++++++++++++++++++++++++++++-
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 9ebc3d0..c665d34 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -80,13 +80,21 @@
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
+#define RTIT_CTL_CYCLEACC		BIT(1)
 #define RTIT_CTL_OS			BIT(2)
 #define RTIT_CTL_USR			BIT(3)
 #define RTIT_CTL_CR3EN			BIT(7)
 #define RTIT_CTL_TOPA			BIT(8)
+#define RTIT_CTL_MTC_EN			BIT(9)
 #define RTIT_CTL_TSC_EN			BIT(10)
 #define RTIT_CTL_DISRETC		BIT(11)
 #define RTIT_CTL_BRANCH_EN		BIT(13)
+#define RTIT_CTL_MTC_RANGE_OFFSET	14
+#define RTIT_CTL_MTC_RANGE		(0x0full << RTIT_CTL_MTC_RANGE_OFFSET)
+#define RTIT_CTL_CYC_THRESH_OFFSET	19
+#define RTIT_CTL_CYC_THRESH		(0x0full << RTIT_CTL_CYC_THRESH_OFFSET)
+#define RTIT_CTL_PSB_FREQ_OFFSET	24
+#define RTIT_CTL_PSB_FREQ      		(0x0full << RTIT_CTL_PSB_FREQ_OFFSET)
 #define MSR_IA32_RTIT_STATUS		0x00000571
 #define RTIT_STATUS_CONTEXTEN		BIT(1)
 #define RTIT_STATUS_TRIGGEREN		BIT(2)
diff --git a/arch/x86/kernel/cpu/intel_pt.h b/arch/x86/kernel/cpu/intel_pt.h
index 1c338b0..feb293e 100644
--- a/arch/x86/kernel/cpu/intel_pt.h
+++ b/arch/x86/kernel/cpu/intel_pt.h
@@ -72,9 +72,15 @@ struct topa_entry {
 enum pt_capabilities {
 	PT_CAP_max_subleaf = 0,
 	PT_CAP_cr3_filtering,
+	PT_CAP_psb_cyc,
+	PT_CAP_mtc,
 	PT_CAP_topa_output,
 	PT_CAP_topa_multiple_entries,
+	PT_CAP_single_range_output,
 	PT_CAP_payloads_lip,
+	PT_CAP_mtc_periods,
+	PT_CAP_cycle_thresholds,
+	PT_CAP_psb_periods,
 };
 
 struct pt_pmu {
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index cc58ef8..e20cfac 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -65,9 +65,15 @@ static struct pt_cap_desc {
 } pt_caps[] = {
 	PT_CAP(max_subleaf,		0, CR_EAX, 0xffffffff),
 	PT_CAP(cr3_filtering,		0, CR_EBX, BIT(0)),
+	PT_CAP(psb_cyc,			0, CR_EBX, BIT(1)),
+	PT_CAP(mtc,			0, CR_EBX, BIT(3)),
 	PT_CAP(topa_output,		0, CR_ECX, BIT(0)),
 	PT_CAP(topa_multiple_entries,	0, CR_ECX, BIT(1)),
+	PT_CAP(single_range_output,	0, CR_ECX, BIT(2)),
 	PT_CAP(payloads_lip,		0, CR_ECX, BIT(31)),
+	PT_CAP(mtc_periods,		1, CR_EAX, 0xffff0000),
+	PT_CAP(cycle_thresholds,	1, CR_EBX, 0xffff),
+	PT_CAP(psb_periods,		1, CR_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -94,12 +100,22 @@ static struct attribute_group pt_cap_group = {
 	.name	= "caps",
 };
 
+PMU_FORMAT_ATTR(cyc,		"config:1"	);
+PMU_FORMAT_ATTR(mtc,		"config:9"	);
 PMU_FORMAT_ATTR(tsc,		"config:10"	);
 PMU_FORMAT_ATTR(noretcomp,	"config:11"	);
+PMU_FORMAT_ATTR(mtc_period,	"config:14-17"	);
+PMU_FORMAT_ATTR(cyc_thresh,	"config:19-22"	);
+PMU_FORMAT_ATTR(psb_period,	"config:24-27"	);
 
 static struct attribute *pt_formats_attr[] = {
+	&format_attr_cyc.attr,
+	&format_attr_mtc.attr,
 	&format_attr_tsc.attr,
 	&format_attr_noretcomp.attr,
+	&format_attr_mtc_period.attr,
+	&format_attr_cyc_thresh.attr,
+	&format_attr_psb_period.attr,
 	NULL,
 };
 
@@ -170,15 +186,65 @@ fail:
 	return ret;
 }
 
-#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
+#define RTIT_CTL_CYC_PSB (RTIT_CTL_CYCLEACC	| \
+			  RTIT_CTL_CYC_THRESH	| \
+			  RTIT_CTL_PSB_FREQ)
+
+#define RTIT_CTL_MTC	(RTIT_CTL_MTC_EN	| \
+			 RTIT_CTL_MTC_RANGE)
+
+#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN		| \
+			RTIT_CTL_DISRETC	| \
+			RTIT_CTL_CYC_PSB	| \
+			RTIT_CTL_MTC)
 
 static bool pt_event_valid(struct perf_event *event)
 {
 	u64 config = event->attr.config;
+	u64 allowed, requested;
 
 	if ((config & PT_CONFIG_MASK) != config)
 		return false;
 
+	if (config & RTIT_CTL_CYC_PSB) {
+		if (!pt_cap_get(PT_CAP_psb_cyc))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_psb_periods);
+		requested = (config & RTIT_CTL_PSB_FREQ) >>
+			RTIT_CTL_PSB_FREQ_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_cycle_thresholds);
+		requested = (config & RTIT_CTL_CYC_THRESH) >>
+			RTIT_CTL_CYC_THRESH_OFFSET;
+		if (requested && (!(allowed & BIT(requested))))
+			return false;
+	}
+
+	if (config & RTIT_CTL_MTC) {
+		/*
+		 * In the unlikely case that CPUID lists valid mtc periods,
+		 * but not the mtc capability, drop out here.
+		 *
+		 * Spec says that setting mtc period bits while mtc bit in
+		 * CPUID is 0 will #GP, so better safe than sorry.
+		 */
+		if (!pt_cap_get(PT_CAP_mtc))
+			return false;
+
+		allowed = pt_cap_get(PT_CAP_mtc_periods);
+		if (!allowed)
+			return false;
+
+		requested = (config & RTIT_CTL_MTC_RANGE) >>
+			RTIT_CTL_MTC_RANGE_OFFSET;
+
+		if (!(allowed & BIT(requested)))
+			return false;
+	}
+
 	return true;
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic
  2015-07-17 13:34 ` [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic Alexander Shishkin
@ 2015-09-08 12:02   ` Alexander Shishkin
  2015-09-08 12:28     ` Peter Zijlstra
  2015-09-13 10:56   ` [tip:perf/core] perf/x86/intel/bts: Set event-> hw.itrace_started " tip-bot for Alexander Shishkin
  1 sibling, 1 reply; 21+ messages in thread
From: Alexander Shishkin @ 2015-09-08 12:02 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Since hw::itrace_started is now set in pmu::start to signal the beginning of
> the trace, do so also in the intel_bts driver.
>
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>

Peter, seems like you overlooked this one. Its counterpart to is in Linus'
master already:

commit 9a6694cfa2390181dec936a17c0d9d21ef7b08d9
Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date:   Thu Jul 30 16:48:24 2015 +0300

    perf/x86/intel/pt: Do not force sync packets on every schedule-in

Cheers,
--
Alex

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic
  2015-09-08 12:02   ` Alexander Shishkin
@ 2015-09-08 12:28     ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2015-09-08 12:28 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, adrian.hunter, x86, hpa, acme

On Tue, Sep 08, 2015 at 03:02:08PM +0300, Alexander Shishkin wrote:
> Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:
> 
> > Since hw::itrace_started is now set in pmu::start to signal the beginning of
> > the trace, do so also in the intel_bts driver.
> >
> > Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> 
> Peter, seems like you overlooked this one. Its counterpart to is in Linus'
> master already:
> 
> commit 9a6694cfa2390181dec936a17c0d9d21ef7b08d9
> Author: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Date:   Thu Jul 30 16:48:24 2015 +0300
> 
>     perf/x86/intel/pt: Do not force sync packets on every schedule-in
> 

Urgh, sorry about that.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip:perf/core] perf/x86/intel/bts: Set event-> hw.itrace_started in pmu::start to match the new logic
  2015-07-17 13:34 ` [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic Alexander Shishkin
  2015-09-08 12:02   ` Alexander Shishkin
@ 2015-09-13 10:56   ` tip-bot for Alexander Shishkin
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot for Alexander Shishkin @ 2015-09-13 10:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, alexander.shishkin, tglx, hpa, linux-kernel, peterz, mingo

Commit-ID:  d249872939bfa86c9cce44a56a8cbdbc7086519b
Gitweb:     http://git.kernel.org/tip/d249872939bfa86c9cce44a56a8cbdbc7086519b
Author:     Alexander Shishkin <alexander.shishkin@linux.intel.com>
AuthorDate: Fri, 17 Jul 2015 16:34:10 +0300
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 11 Sep 2015 10:06:03 +0200

perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the new logic

Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of
the trace, do so also in the intel_bts driver.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
index 54690e8..d1c0f25 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -222,6 +222,7 @@ static void __bts_event_start(struct perf_event *event)
 	if (!buf || bts_buffer_is_full(buf, bts))
 		return;
 
+	event->hw.itrace_started = 1;
 	event->hw.state = 0;
 
 	if (!buf->snapshot)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-09-13 10:57 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-17 13:34 [PATCH 0/3] perf/x86/intel/pt: Add new packet enables Alexander Shishkin
2015-07-17 13:34 ` [PATCH 1/3] perf/x86/intel/pt: Add new timing " Alexander Shishkin
2015-07-30 10:40   ` Peter Zijlstra
2015-07-30 11:57     ` Alexander Shishkin
2015-07-30 12:14       ` Peter Zijlstra
2015-07-30 13:15         ` [PATCH v1] " Alexander Shishkin
2015-08-04  8:55           ` [tip:perf/core] " tip-bot for Alexander Shishkin
2015-07-17 13:34 ` [PATCH 2/3] perf/x86/intel/pt: Add an option to not force PSB+ on every schedule-in Alexander Shishkin
2015-07-30 10:43   ` Peter Zijlstra
2015-07-30 11:53     ` Alexander Shishkin
2015-07-30 12:13       ` Peter Zijlstra
2015-07-30 12:49         ` Alexander Shishkin
2015-07-30 13:36           ` Alexander Shishkin
2015-07-30 13:48         ` [PATCH v1] perf/x86/intel/pt: Do not force sync packets " Alexander Shishkin
2015-07-30 15:24           ` Alexander Shishkin
2015-08-04  8:55           ` [tip:perf/core] " tip-bot for Alexander Shishkin
2015-07-31 11:47         ` [PATCH v2] " Alexander Shishkin
2015-07-17 13:34 ` [PATCH 3/3] perf/x86/intel/bts: Set itrace_started in pmu::start to match the new logic Alexander Shishkin
2015-09-08 12:02   ` Alexander Shishkin
2015-09-08 12:28     ` Peter Zijlstra
2015-09-13 10:56   ` [tip:perf/core] perf/x86/intel/bts: Set event-> hw.itrace_started " tip-bot for Alexander Shishkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).