* [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
@ 2012-04-02 18:19 Robert Richter
  2012-04-02 18:19 ` [PATCH 01/12] perf/x86-ibs: Fix update of period Robert Richter
                   ` (12 more replies)
  0 siblings, 13 replies; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

This patch set adds support for precise event sampling with IBS. It
also contains IBS fixes and updates not directly related to precise
event sampling, but found during testing. No changes to the perf
tools are required, thus this set contains only kernel patches.
Updated perf tools patches will also be available; they are basically
based on my previous postings to this list and additionally implement
IBS pseudo events.

With IBS there are two counting modes available, counting either
cycles or micro-ops. If the corresponding performance counter events
(hw events) are set up with the precise flag set, the request is
redirected to the ibs pmu:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each IBS sample contains a linear address that points to the
instruction that caused the sample to trigger. With IBS we have a
skid of 0.

Though the skid is 0, we map IBS sampling to the following precise levels:

 1: RIP taken from IBS sample or (if invalid) from stack.
 2: RIP always taken from IBS sample, samples with an invalid rip
    are dropped. Thus samples of an event containing two precise
    modifiers (e.g. r076:pp) only contain (precise) addresses
    detected with IBS.

Precise level 3 is reserved for other purposes in the future.
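
For example, both levels can be requested like this (analogous to the
commands above; trailing options elided):

 perf record -a -e cpu-cycles:p ...    # level 1: fall back to the stack
                                       # rip if the ibs rip is invalid
 perf record -a -e r076:pp ...         # level 2: drop samples with an
                                       # invalid rip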

The patches are based on a trivial merge of tip/perf/core into
tip/perf/x86-ibs. The merge and the patches are available here:

The following changes since commit 820b3e44dc22ac8072cd5ecf82d62193392fcca3:

  Merge remote-tracking branch 'tip/perf/core' into HEAD (2012-03-21 19:15:20 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git perf-ibs
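
To fetch the branch for testing, standard git usage applies, e.g.
(the local branch name is only an example):

  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git perf-ibs
  git checkout -b perf-ibs FETCH_HEAD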

-Robert


Robert Richter (12):
  perf/x86-ibs: Fix update of period
  perf: Pass last sampling period to perf_sample_data_init()
  perf/x86-ibs: Enable ibs op micro-ops counting mode
  perf/x86-ibs: Fix frequency profiling
  perf/x86-ibs: Take instruction pointer from ibs sample
  perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  perf/x86-ibs: Rename some variables
  perf/x86-ibs: Trigger overflow if remaining period is too small
  perf/x86-ibs: Extend hw period that triggers overflow
  perf/x86-ibs: Implement workaround for IBS erratum #420
  perf/x86-ibs: Catch spurious interrupts after stopping ibs
  perf/x86-ibs: Fix usage of IBS op current count

 arch/alpha/kernel/perf_event.c            |    3 +-
 arch/arm/kernel/perf_event_v6.c           |    4 +-
 arch/arm/kernel/perf_event_v7.c           |    4 +-
 arch/arm/kernel/perf_event_xscale.c       |    8 +-
 arch/mips/kernel/perf_event_mipsxx.c      |    2 +-
 arch/powerpc/kernel/perf_event.c          |    3 +-
 arch/powerpc/kernel/perf_event_fsl_emb.c  |    3 +-
 arch/sparc/kernel/perf_event.c            |    4 +-
 arch/x86/include/asm/perf_event.h         |    6 +-
 arch/x86/kernel/cpu/perf_event.c          |    4 +-
 arch/x86/kernel/cpu/perf_event_amd.c      |    7 +-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c  |  274 +++++++++++++++++++++--------
 arch/x86/kernel/cpu/perf_event_intel.c    |    4 +-
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    6 +-
 arch/x86/kernel/cpu/perf_event_p4.c       |    6 +-
 include/linux/perf_event.h                |    5 +-
 kernel/events/core.c                      |    9 +-
 17 files changed, 237 insertions(+), 115 deletions(-)

-- 
1.7.8.4




* [PATCH 01/12] perf/x86-ibs: Fix update of period
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:29   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 02/12] perf: Pass last sampling period to perf_sample_data_init() Robert Richter
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

The last sw period was not correctly updated on overflow and thus led
to a wrong distribution of events. We always need to properly
initialize data.period in struct perf_sample_data.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 573d248..6eb6451 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -386,7 +386,21 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (!(*buf++ & perf_ibs->valid_mask))
 		return 0;
 
+	/*
+	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
+	 * supported in all cpus. As this triggered an interrupt, we
+	 * set the current count to the max count.
+	 */
+	config = ibs_data.regs[0];
+	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
+		config &= ~IBS_OP_CUR_CNT;
+		config |= (config & IBS_OP_MAX_CNT) << 36;
+	}
+
+	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0);
+	data.period = event->hw.last_period;
+
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		ibs_data.caps = ibs_caps;
 		size = 1;
@@ -405,19 +419,6 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	regs = *iregs; /* XXX: update ip from ibs sample */
 
-	/*
-	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
-	 * supported in all cpus. As this triggered an interrupt, we
-	 * set the current count to the max count.
-	 */
-	config = ibs_data.regs[0];
-	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		config &= ~IBS_OP_CUR_CNT;
-		config |= (config & IBS_OP_MAX_CNT) << 36;
-	}
-
-	perf_ibs_event_update(perf_ibs, event, config);
-
 	overflow = perf_ibs_set_period(perf_ibs, hwc, &config);
 	reenable = !(overflow && perf_event_overflow(event, &data, &regs));
 	config = (config >> 4) | (reenable ? perf_ibs->enable_mask : 0);
-- 
1.7.8.4




* [PATCH 02/12] perf: Pass last sampling period to perf_sample_data_init()
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
  2012-04-02 18:19 ` [PATCH 01/12] perf/x86-ibs: Fix update of period Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:30   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 03/12] perf/x86-ibs: Enable ibs op micro-ops counting mode Robert Richter
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modify the function interface to take the required
period as an argument. A pattern like this:

        perf_sample_data_init(&data, ~0ULL);
        data.period = event->hw.last_period;

now becomes:

        perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

This avoids an uninitialized data.period and simplifies the code.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/alpha/kernel/perf_event.c            |    3 +--
 arch/arm/kernel/perf_event_v6.c           |    4 +---
 arch/arm/kernel/perf_event_v7.c           |    4 +---
 arch/arm/kernel/perf_event_xscale.c       |    8 ++------
 arch/mips/kernel/perf_event_mipsxx.c      |    2 +-
 arch/powerpc/kernel/perf_event.c          |    3 +--
 arch/powerpc/kernel/perf_event_fsl_emb.c  |    3 +--
 arch/sparc/kernel/perf_event.c            |    4 +---
 arch/x86/kernel/cpu/perf_event.c          |    4 +---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c  |    3 +--
 arch/x86/kernel/cpu/perf_event_intel.c    |    4 +---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    6 ++----
 arch/x86/kernel/cpu/perf_event_p4.c       |    6 +++---
 include/linux/perf_event.h                |    5 ++++-
 kernel/events/core.c                      |    9 ++++-----
 15 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 0dae252..d821b17 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -824,7 +824,6 @@ static void alpha_perf_event_irq_handler(unsigned long la_ptr,
 
 	idx = la_ptr;
 
-	perf_sample_data_init(&data, 0);
 	for (j = 0; j < cpuc->n_events; j++) {
 		if (cpuc->current_idx[j] == idx)
 			break;
@@ -848,7 +847,7 @@ static void alpha_perf_event_irq_handler(unsigned long la_ptr,
 
 	hwc = &event->hw;
 	alpha_perf_event_update(event, hwc, idx, alpha_pmu->pmc_max_period[idx]+1);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, hwc->last_period);
 
 	if (alpha_perf_event_set_period(event, hwc, idx)) {
 		if (perf_event_overflow(event, &data, regs)) {
diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index b78af0c..ab627a7 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -489,8 +489,6 @@ armv6pmu_handle_irq(int irq_num,
 	 */
 	armv6_pmcr_write(pmcr);
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -509,7 +507,7 @@ armv6pmu_handle_irq(int irq_num,
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 4d7095a..ec0c6cc 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -953,8 +953,6 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 	 */
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -973,7 +971,7 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 71a21e6..e34e725 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -248,8 +248,6 @@ xscale1pmu_handle_irq(int irq_num, void *dev)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -263,7 +261,7 @@ xscale1pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
@@ -588,8 +586,6 @@ xscale2pmu_handle_irq(int irq_num, void *dev)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -603,7 +599,7 @@ xscale2pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 811084f..ab73fa2 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -1325,7 +1325,7 @@ static int mipsxx_pmu_handle_shared_irq(void)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
+	perf_sample_data_init(&data, 0, 0);
 
 	switch (counters) {
 #define HANDLE_COUNTER(n)						\
diff --git a/arch/powerpc/kernel/perf_event.c b/arch/powerpc/kernel/perf_event.c
index c2e27ed..df2b284 100644
--- a/arch/powerpc/kernel/perf_event.c
+++ b/arch/powerpc/kernel/perf_event.c
@@ -1268,8 +1268,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	if (record) {
 		struct perf_sample_data data;
 
-		perf_sample_data_init(&data, ~0ULL);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, ~0ULL, event->hw.last_period);
 
 		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 			perf_get_data_addr(regs, &data.addr);
diff --git a/arch/powerpc/kernel/perf_event_fsl_emb.c b/arch/powerpc/kernel/perf_event_fsl_emb.c
index 0a6d2a9..106c533 100644
--- a/arch/powerpc/kernel/perf_event_fsl_emb.c
+++ b/arch/powerpc/kernel/perf_event_fsl_emb.c
@@ -613,8 +613,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	if (record) {
 		struct perf_sample_data data;
 
-		perf_sample_data_init(&data, 0);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (perf_event_overflow(event, &data, regs))
 			fsl_emb_pmu_stop(event, 0);
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 8e16a4a..333a14a 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1294,8 +1294,6 @@ static int __kprobes perf_event_nmi_handler(struct notifier_block *self,
 
 	regs = args->regs;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/* If the PMU has the TOE IRQ enable bits, we need to do a
@@ -1319,7 +1317,7 @@ static int __kprobes perf_event_nmi_handler(struct notifier_block *self,
 		if (val & (1ULL << 31))
 			continue;
 
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!sparc_perf_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 453ac94..56ae3af 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1187,8 +1187,6 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	int idx, handled = 0;
 	u64 val;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/*
@@ -1223,7 +1221,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		 * event overflow
 		 */
 		handled++;
-		data.period	= event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (!x86_perf_event_set_period(event))
 			continue;
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 6eb6451..74b663c 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -398,8 +398,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	}
 
 	perf_ibs_event_update(perf_ibs, event, config);
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, hwc->last_period);
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		ibs_data.caps = ibs_caps;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 26b3e2f..166546e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1027,8 +1027,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	u64 status;
 	int handled;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/*
@@ -1082,7 +1080,7 @@ again:
 		if (!intel_pmu_save_and_restart(event))
 			continue;
 
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (has_branch_stack(event))
 			data.br_stack = &cpuc->lbr_stack;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 7f64df1..5a3edc2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -316,8 +316,7 @@ int intel_pmu_drain_bts_buffer(void)
 
 	ds->bts_index = ds->bts_buffer_base;
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 	regs.ip     = 0;
 
 	/*
@@ -564,8 +563,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 
 	/*
 	 * We use the interrupt regs as a base because the PEBS record
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index ef484d9..ed301c7 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1005,8 +1005,6 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 	int idx, handled = 0;
 	u64 val;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
@@ -1034,10 +1032,12 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 		handled += overflow;
 
 		/* event overflow for sure */
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 
 		if (!x86_perf_event_set_period(event))
 			continue;
+
+
 		if (perf_event_overflow(event, &data, regs))
 			x86_pmu_stop(event, 0);
 	}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 57ae485..12ac652 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1076,11 +1076,14 @@ struct perf_sample_data {
 	struct perf_branch_stack	*br_stack;
 };
 
-static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
+static inline void perf_sample_data_init(struct perf_sample_data *data,
+					 u64 addr, u64 period)
 {
+	/* remaining struct members initialized in perf_prepare_sample() */
 	data->addr = addr;
 	data->raw  = NULL;
 	data->br_stack = NULL;
+	data->period	= period;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index c61234b..8833198 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4957,7 +4957,7 @@ void __perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 	if (rctx < 0)
 		return;
 
-	perf_sample_data_init(&data, addr);
+	perf_sample_data_init(&data, addr, 0);
 
 	do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, &data, regs);
 
@@ -5215,7 +5215,7 @@ void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
 		.data = record,
 	};
 
-	perf_sample_data_init(&data, addr);
+	perf_sample_data_init(&data, addr, 0);
 	data.raw = &raw;
 
 	hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
@@ -5318,7 +5318,7 @@ void perf_bp_event(struct perf_event *bp, void *data)
 	struct perf_sample_data sample;
 	struct pt_regs *regs = data;
 
-	perf_sample_data_init(&sample, bp->attr.bp_addr);
+	perf_sample_data_init(&sample, bp->attr.bp_addr, 0);
 
 	if (!bp->hw.state && !perf_exclude_event(bp, regs))
 		perf_swevent_event(bp, 1, &sample, regs);
@@ -5344,8 +5344,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event->pmu->read(event);
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 	regs = get_irq_regs();
 
 	if (regs && !perf_exclude_event(event, regs)) {
-- 
1.7.8.4




* [PATCH 03/12] perf/x86-ibs: Enable ibs op micro-ops counting mode
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
  2012-04-02 18:19 ` [PATCH 01/12] perf/x86-ibs: Fix update of period Robert Richter
  2012-04-02 18:19 ` [PATCH 02/12] perf: Pass last sampling period to perf_sample_data_init() Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:31   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 04/12] perf/x86-ibs: Fix frequency profiling Robert Richter
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

Allow enabling the ibs op micro-ops counting mode if the cpu supports
it.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 74b663c..6f00ee3 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -468,6 +468,8 @@ static __init int perf_event_ibs_init(void)
 		return -ENODEV;	/* ibs not supported by the cpu */
 
 	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	if (ibs_caps & IBS_CAPS_OPCNT)
+		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 	register_nmi_handler(NMI_LOCAL, &perf_ibs_nmi_handler, 0, "perf_ibs");
 	printk(KERN_INFO "perf: AMD IBS detected (0x%08x)\n", ibs_caps);
-- 
1.7.8.4




* [PATCH 04/12] perf/x86-ibs: Fix frequency profiling
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (2 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 03/12] perf/x86-ibs: Enable ibs op micro-ops counting mode Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:32   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 05/12] perf/x86-ibs: Take instruction pointer from ibs sample Robert Richter
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

This fixes profiling at a fixed frequency; in this case the freq
value and the sample period were set up incorrectly. Since sampling
periods are adjusted anyway, we now also allow periods that have the
lower 4 bits set.

Another fix is the setup of the hw counter: If we modify
hwc->sample_period, we also need to update hwc->last_period and
hwc->period_left.
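
A worked example of the alignment (the numbers are made up): a
frequency-adjusted period of 0x1234f is now accepted and silently
aligned down to 0x12340 instead of being rejected with -EINVAL; a
period that becomes zero after masking is raised to the minimum of
0x10.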

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 6f00ee3..eec3ea2 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -162,9 +162,16 @@ static int perf_ibs_init(struct perf_event *event)
 		if (config & perf_ibs->cnt_mask)
 			/* raw max_cnt may not be set */
 			return -EINVAL;
-		if (hwc->sample_period & 0x0f)
-			/* lower 4 bits can not be set in ibs max cnt */
+		if (!event->attr.sample_freq && hwc->sample_period & 0x0f)
+			/*
+			 * lower 4 bits can not be set in ibs max cnt,
+			 * but allowing it in case we adjust the
+			 * sample period to set a frequency.
+			 */
 			return -EINVAL;
+		hwc->sample_period &= ~0x0FULL;
+		if (!hwc->sample_period)
+			hwc->sample_period = 0x10;
 	} else {
 		max_cnt = config & perf_ibs->cnt_mask;
 		config &= ~perf_ibs->cnt_mask;
@@ -175,6 +182,13 @@ static int perf_ibs_init(struct perf_event *event)
 	if (!hwc->sample_period)
 		return -EINVAL;
 
+	/*
+	 * If we modify hwc->sample_period, we also need to update
+	 * hwc->last_period and hwc->period_left.
+	 */
+	hwc->last_period = hwc->sample_period;
+	local64_set(&hwc->period_left, hwc->sample_period);
+
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 
-- 
1.7.8.4




* [PATCH 05/12] perf/x86-ibs: Take instruction pointer from ibs sample
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (3 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 04/12] perf/x86-ibs: Fix frequency profiling Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:33   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

Each IBS sample contains a linear address of the instruction that
caused the sample to trigger. This address is more precise than the
rip that was taken from the interrupt handler's stack. Update the rip
with that address. We use this in the next patch to implement
precise-event sampling on AMD systems using IBS.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/include/asm/perf_event.h        |    6 ++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   48 +++++++++++++++++++----------
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 9cf6696..651172d 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -157,6 +157,7 @@ struct x86_pmu_capability {
 #define IBS_CAPS_OPCNT			(1U<<4)
 #define IBS_CAPS_BRNTRGT		(1U<<5)
 #define IBS_CAPS_OPCNTEXT		(1U<<6)
+#define IBS_CAPS_RIPINVALIDCHK		(1U<<7)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
 					 | IBS_CAPS_FETCHSAM	\
@@ -169,14 +170,14 @@ struct x86_pmu_capability {
 #define IBSCTL_LVT_OFFSET_VALID		(1ULL<<8)
 #define IBSCTL_LVT_OFFSET_MASK		0x0F
 
-/* IbsFetchCtl bits/masks */
+/* ibs fetch bits/masks */
 #define IBS_FETCH_RAND_EN	(1ULL<<57)
 #define IBS_FETCH_VAL		(1ULL<<49)
 #define IBS_FETCH_ENABLE	(1ULL<<48)
 #define IBS_FETCH_CNT		0xFFFF0000ULL
 #define IBS_FETCH_MAX_CNT	0x0000FFFFULL
 
-/* IbsOpCtl bits */
+/* ibs op bits/masks */
 /* lower 4 bits of the current count are ignored: */
 #define IBS_OP_CUR_CNT		(0xFFFF0ULL<<32)
 #define IBS_OP_CNT_CTL		(1ULL<<19)
@@ -184,6 +185,7 @@ struct x86_pmu_capability {
 #define IBS_OP_ENABLE		(1ULL<<17)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
+#define IBS_RIP_INVALID		(1ULL<<38)
 
 extern u32 get_ibs_caps(void);
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index eec3ea2..0321b64 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -9,6 +9,7 @@
 #include <linux/perf_event.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/ptrace.h>
 
 #include <asm/apic.h>
 
@@ -382,7 +383,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_raw_record raw;
 	struct pt_regs regs;
 	struct perf_ibs_data ibs_data;
-	int offset, size, overflow, reenable;
+	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
 	u64 *buf, config;
 
@@ -413,28 +414,41 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
+	if (!perf_ibs_set_period(perf_ibs, hwc, &config))
+		goto out;	/* no sw counter overflow */
+
+	ibs_data.caps = ibs_caps;
+	size = 1;
+	offset = 1;
+	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
+	if (event->attr.sample_type & PERF_SAMPLE_RAW)
+		offset_max = perf_ibs->offset_max;
+	else if (check_rip)
+		offset_max = 2;
+	else
+		offset_max = 1;
+	do {
+		rdmsrl(msr + offset, *buf++);
+		size++;
+		offset = find_next_bit(perf_ibs->offset_mask,
+				       perf_ibs->offset_max,
+				       offset + 1);
+	} while (offset < offset_max);
+	ibs_data.size = sizeof(u64) * size;
+
+	regs = *iregs;
+	if (!check_rip || !(ibs_data.regs[2] & IBS_RIP_INVALID))
+		instruction_pointer_set(&regs, ibs_data.regs[1]);
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		ibs_data.caps = ibs_caps;
-		size = 1;
-		offset = 1;
-		do {
-		    rdmsrl(msr + offset, *buf++);
-		    size++;
-		    offset = find_next_bit(perf_ibs->offset_mask,
-					   perf_ibs->offset_max,
-					   offset + 1);
-		} while (offset < perf_ibs->offset_max);
-		raw.size = sizeof(u32) + sizeof(u64) * size;
+		raw.size = sizeof(u32) + ibs_data.size;
 		raw.data = ibs_data.data;
 		data.raw = &raw;
 	}
 
-	regs = *iregs; /* XXX: update ip from ibs sample */
-
-	overflow = perf_ibs_set_period(perf_ibs, hwc, &config);
-	reenable = !(overflow && perf_event_overflow(event, &data, &regs));
-	config = (config >> 4) | (reenable ? perf_ibs->enable_mask : 0);
+	throttle = perf_event_overflow(event, &data, &regs);
+out:
+	config = (config >> 4) | (throttle ? 0 : perf_ibs->enable_mask);
 	perf_ibs_enable_event(hwc, config);
 
 	perf_event_update_userpage(event);
-- 
1.7.8.4




* [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (4 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 05/12] perf/x86-ibs: Take instruction pointer from ibs sample Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-04-14 10:21   ` Peter Zijlstra
                     ` (3 more replies)
  2012-04-02 18:19 ` [PATCH 07/12] perf/x86-ibs: Rename some variables Robert Richter
                   ` (6 subsequent siblings)
  12 siblings, 4 replies; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

This patch adds support for precise event sampling with IBS. There
are two counting modes to count either cycles or micro-ops. If the
corresponding performance counter events (hw events) are set up with
the precise flag set, the request is redirected to the ibs pmu:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each IBS sample contains a linear address that points to the
instruction that caused the sample to trigger. With IBS we have a
skid of 0.

Though the skid is 0, we map IBS sampling to the following precise
levels:

 1: RIP taken from IBS sample or (if invalid) from stack
 2: RIP always taken from IBS sample, samples with an invalid rip
    are dropped. Thus samples of an event containing two precise
    modifiers (e.g. r076:pp) only contain (precise) addresses
    detected with IBS.

Precise level 3 is reserved for other purposes in the future.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c     |    7 +++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   71 +++++++++++++++++++++++++++++-
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 95e7fe1..4be3463 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -134,8 +134,13 @@ static u64 amd_pmu_event_map(int hw_event)
 
 static int amd_pmu_hw_config(struct perf_event *event)
 {
-	int ret = x86_pmu_hw_config(event);
+	int ret;
 
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 0321b64..05a359f 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -145,17 +145,82 @@ static struct perf_ibs *get_ibs_pmu(int type)
 	return NULL;
 }
 
+/*
+ * Use IBS for precise event sampling:
+ *
+ *  perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
+ *  perf record -a -e r076:p ...          # same as -e cpu-cycles:p
+ *  perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
+ *
+ * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
+ * MSRC001_1033) is used to select either cycle or micro-ops counting
+ * mode.
+ *
+ * We map IBS sampling to following precise levels:
+ *
+ *  1: RIP taken from IBS sample or (if invalid) from stack
+ *  2: RIP always taken from IBS sample, samples with an invalid rip
+ *     are dropped. Thus samples of an event containing two precise
+ *     modifiers (e.g. r076:pp) only contain (precise) addresses
+ *     detected with IBS.
+ */
+static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
+{
+	switch (event->attr.precise_ip) {
+	case 0:
+		return -ENOENT;
+	case 1:
+	case 2:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (event->attr.type) {
+	case PERF_TYPE_HARDWARE:
+		switch (event->attr.config) {
+		case PERF_COUNT_HW_CPU_CYCLES:
+			*config = 0;
+			return 0;
+		}
+		break;
+	case PERF_TYPE_RAW:
+		switch (event->attr.config) {
+		case 0x0076:
+			*config = 0;
+			return 0;
+		case 0x00C1:
+			*config = IBS_OP_CNT_CTL;
+			return 0;
+		}
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	return -EOPNOTSUPP;
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs;
 	u64 max_cnt, config;
+	int ret;
 
 	perf_ibs = get_ibs_pmu(event->attr.type);
-	if (!perf_ibs)
+	if (perf_ibs) {
+		config = event->attr.config;
+	} else {
+		perf_ibs = &perf_ibs_op;
+		ret = perf_ibs_precise_event(event, &config);
+		if (ret)
+			return ret;
+	}
+
+	if (event->pmu != &perf_ibs->pmu)
 		return -ENOENT;
 
-	config = event->attr.config;
 	if (config & ~perf_ibs->config_mask)
 		return -EINVAL;
 
@@ -439,6 +504,8 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	regs = *iregs;
 	if (!check_rip || !(ibs_data.regs[2] & IBS_RIP_INVALID))
 		instruction_pointer_set(&regs, ibs_data.regs[1]);
+	else if (event->attr.precise_ip > 1)
+		goto out;	/* drop non-precise samples */
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw.size = sizeof(u32) + ibs_data.size;
-- 
1.7.8.4




* [PATCH 07/12] perf/x86-ibs: Rename some variables
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (5 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:34   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 08/12] perf/x86-ibs: Trigger overflow if remaining period is too small Robert Richter
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

A simple patch that just renames some variables for better
readability.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 05a359f..6591b77 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -62,7 +62,7 @@ struct perf_ibs_data {
 };
 
 static int
-perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *count)
+perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period)
 {
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
@@ -91,7 +91,7 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *count)
 	if (left > max)
 		left = max;
 
-	*count = (u64)left;
+	*hw_period = (u64)left;
 
 	return overflow;
 }
@@ -264,13 +264,13 @@ static int perf_ibs_init(struct perf_event *event)
 static int perf_ibs_set_period(struct perf_ibs *perf_ibs,
 			       struct hw_perf_event *hwc, u64 *period)
 {
-	int ret;
+	int overflow;
 
 	/* ignore lower 4 bits in min count: */
-	ret = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
+	overflow = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
 	local64_set(&hwc->prev_count, 0);
 
-	return ret;
+	return overflow;
 }
 
 static u64 get_ibs_fetch_count(u64 config)
-- 
1.7.8.4




* [PATCH 08/12] perf/x86-ibs: Trigger overflow if remaining period is too small
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (6 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 07/12] perf/x86-ibs: Rename some variables Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:35   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 09/12] perf/x86-ibs: Extend hw period that triggers overflow Robert Richter
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

There are cases where the remaining period is smaller than the
minimal possible value. In such a case the counter is restarted with
the minimal period. This is of no use as the interrupt handler will
trigger again immediately and the sample will most likely hit the
handler itself, which biases the results.

So, if the remaining period is below the minimum, we do not restart
the counter and instead trigger the overflow.
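
A worked example (numbers made up): with a minimal period of 0x10 and
a remaining period of only 0x08, the counter was formerly restarted
with 0x10 and the next interrupt fired almost immediately; now the
small remainder is treated like a non-positive value, a full period
is added back and the overflow is reported right away.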

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 6591b77..1f53f16 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -78,16 +78,13 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_perio
 		overflow = 1;
 	}
 
-	if (unlikely(left <= 0)) {
+	if (unlikely(left < (s64)min)) {
 		left += period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		overflow = 1;
 	}
 
-	if (unlikely(left < min))
-		left = min;
-
 	if (left > max)
 		left = max;
 
-- 
1.7.8.4




* [PATCH 09/12] perf/x86-ibs: Extend hw period that triggers overflow
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (7 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 08/12] perf/x86-ibs: Trigger overflow if remaining period is too small Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:36   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 10/12] perf/x86-ibs: Implement workaround for IBS erratum #420 Robert Richter
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

If the last hw period is too short, we might sample the irq handler
itself, which biases the results. Thus, try to use a maximal last
period that triggers the sw overflow.
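
A worked example (numbers made up): with a max period of 0xfff0 and a
remaining period of 0x10000, we now program 0x10000 - 0xfff0 = 0x10
as the next-to-last hw period, so that the last period, which
triggers the sw overflow, is the full 0xfff0 instead of a tiny
remainder.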

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 1f53f16..f0271dd 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -85,8 +85,19 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_perio
 		overflow = 1;
 	}
 
-	if (left > max)
-		left = max;
+	/*
+	 * If the hw period that triggers the sw overflow is too short
+	 * we might hit the irq handler. This biases the results.
+	 * Thus we shorten the next-to-last period and set the last
+	 * period to the max period.
+	 */
+	if (left > max) {
+		left -= max;
+		if (left > max)
+			left = max;
+		else if (left < min)
+			left = min;
+	}
 
 	*hw_period = (u64)left;
 
-- 
1.7.8.4




* [PATCH 10/12] perf/x86-ibs: Implement workaround for IBS erratum #420
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (8 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 09/12] perf/x86-ibs: Extend hw period that triggers overflow Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:37   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 11/12] perf/x86-ibs: Catch spurious interrupts after stopping ibs Robert Richter
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

When disabling ibs there can be cases where the hardware continuously
generates interrupts. This is described in erratum #420 (Instruction-
Based Sampling Engine May Generate Interrupt that Cannot Be Cleared).
To avoid this we must clear the counter mask first and then clear the
enable bit. This patch implements this.

See the Revision Guide for AMD Family 10h Processors, Publication
#41322.

Note: We now keep track of the last-read ibs config value, which is
then used to disable ibs. To update the config value we now pass a
pointer to the functions reading it.
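
In sketch form the resulting stop sequence is (this just mirrors the
perf_ibs_disable_event() hunk below):

	config &= ~perf_ibs->cnt_mask;     /* 1) clear the counter mask */
	wrmsrl(hwc->config_base, config);
	config &= ~perf_ibs->enable_mask;  /* 2) then clear the enable bit */
	wrmsrl(hwc->config_base, config);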

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   62 +++++++++++++++++++-----------
 1 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index f0271dd..35a35be 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -293,20 +293,36 @@ static u64 get_ibs_op_count(u64 config)
 
 static void
 perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
-		      u64 config)
+		      u64 *config)
 {
-	u64 count = perf_ibs->get_count(config);
+	u64 count = perf_ibs->get_count(*config);
 
 	while (!perf_event_try_update(event, count, 20)) {
-		rdmsrl(event->hw.config_base, config);
-		count = perf_ibs->get_count(config);
+		rdmsrl(event->hw.config_base, *config);
+		count = perf_ibs->get_count(*config);
 	}
 }
 
-/* Note: The enable mask must be encoded in the config argument. */
-static inline void perf_ibs_enable_event(struct hw_perf_event *hwc, u64 config)
+static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
+					 struct hw_perf_event *hwc, u64 config)
 {
-	wrmsrl(hwc->config_base, hwc->config | config);
+	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+}
+
+/*
+ * Erratum #420 Instruction-Based Sampling Engine May Generate
+ * Interrupt that Cannot Be Cleared:
+ *
+ * Must clear counter mask first, then clear the enable bit. See
+ * Revision Guide for AMD Family 10h Processors, Publication #41322.
+ */
+static inline void perf_ibs_disable_event(struct perf_ibs *perf_ibs,
+					  struct hw_perf_event *hwc, u64 config)
+{
+	config &= ~perf_ibs->cnt_mask;
+	wrmsrl(hwc->config_base, config);
+	config &= ~perf_ibs->enable_mask;
+	wrmsrl(hwc->config_base, config);
 }
 
 /*
@@ -320,7 +336,7 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 config;
+	u64 period;
 
 	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
 		return;
@@ -328,10 +344,9 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 	hwc->state = 0;
 
-	perf_ibs_set_period(perf_ibs, hwc, &config);
-	config = (config >> 4) | perf_ibs->enable_mask;
+	perf_ibs_set_period(perf_ibs, hwc, &period);
 	set_bit(IBS_STARTED, pcpu->state);
-	perf_ibs_enable_event(hwc, config);
+	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
 }
@@ -341,7 +356,7 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 val;
+	u64 config;
 	int stopping;
 
 	stopping = test_and_clear_bit(IBS_STARTED, pcpu->state);
@@ -349,12 +364,11 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (!stopping && (hwc->state & PERF_HES_UPTODATE))
 		return;
 
-	rdmsrl(hwc->config_base, val);
+	rdmsrl(hwc->config_base, config);
 
 	if (stopping) {
 		set_bit(IBS_STOPPING, pcpu->state);
-		val &= ~perf_ibs->enable_mask;
-		wrmsrl(hwc->config_base, val);
+		perf_ibs_disable_event(perf_ibs, hwc, config);
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
 	}
@@ -362,7 +376,7 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
-	perf_ibs_event_update(perf_ibs, event, val);
+	perf_ibs_event_update(perf_ibs, event, &config);
 	hwc->state |= PERF_HES_UPTODATE;
 }
 
@@ -458,7 +472,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_ibs_data ibs_data;
 	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
-	u64 *buf, config;
+	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
 		/* Catch spurious interrupts after stopping IBS: */
@@ -479,15 +493,15 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	 * supported in all cpus. As this triggered an interrupt, we
 	 * set the current count to the max count.
 	 */
-	config = ibs_data.regs[0];
+	config = &ibs_data.regs[0];
 	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		config &= ~IBS_OP_CUR_CNT;
-		config |= (config & IBS_OP_MAX_CNT) << 36;
+		*config &= ~IBS_OP_CUR_CNT;
+		*config |= (*config & IBS_OP_MAX_CNT) << 36;
 	}
 
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
-	if (!perf_ibs_set_period(perf_ibs, hwc, &config))
+	if (!perf_ibs_set_period(perf_ibs, hwc, &period))
 		goto out;	/* no sw counter overflow */
 
 	ibs_data.caps = ibs_caps;
@@ -523,8 +537,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
-	config = (config >> 4) | (throttle ? 0 : perf_ibs->enable_mask);
-	perf_ibs_enable_event(hwc, config);
+	if (throttle)
+		perf_ibs_disable_event(perf_ibs, hwc, *config);
+	else
+		perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
 
-- 
1.7.8.4




* [PATCH 11/12] perf/x86-ibs: Catch spurious interrupts after stopping ibs
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (9 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 10/12] perf/x86-ibs: Implement workaround for IBS erratum #420 Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:38   ` [tip:perf/core] perf/x86-ibs: Catch spurious interrupts after stopping IBS tip-bot for Robert Richter
  2012-04-02 18:19 ` [PATCH 12/12] perf/x86-ibs: Fix usage of IBS op current count Robert Richter
  2012-04-02 19:11 ` [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Ingo Molnar
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

After disabling IBS there could still be incoming NMIs with samples
that even have the valid bit cleared. Mark all these NMIs as handled
to avoid spurious interrupt messages.

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 35a35be..b44aa63 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -475,11 +475,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
-		/* Catch spurious interrupts after stopping IBS: */
-		if (!test_and_clear_bit(IBS_STOPPING, pcpu->state))
-			return 0;
-		rdmsrl(perf_ibs->msr, *ibs_data.regs);
-		return (*ibs_data.regs & perf_ibs->valid_mask) ? 1 : 0;
+		/*
+		 * Catch spurious interrupts after stopping IBS: After
+		 * disabling IBS there could still be incoming NMIs
+		 * with samples that even have the valid bit cleared.
+		 * Mark all these NMIs as handled.
+		 */
+		return test_and_clear_bit(IBS_STOPPING, pcpu->state) ? 1 : 0;
 	}
 
 	msr = hwc->config_base;
-- 
1.7.8.4




* [PATCH 12/12] perf/x86-ibs: Fix usage of IBS op current count
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (10 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 11/12] perf/x86-ibs: Catch spurious interrupts after stopping ibs Robert Richter
@ 2012-04-02 18:19 ` Robert Richter
  2012-05-09 14:39   ` [tip:perf/core] " tip-bot for Robert Richter
  2012-04-02 19:11 ` [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Ingo Molnar
  12 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-02 18:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML,
	Robert Richter

The value of IbsOpCurCnt rolls over when it reaches IbsOpMaxCnt. Thus,
it is reset to zero by hardware. To get the correct count we need to
add the max count to it in case we received an ibs sample (valid bit
set).
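
A worked example (register values made up): with IbsOpMaxCnt
programmed to a field value of 0x1000 (i.e. a max count of 0x10000,
since the field is scaled by 16) and IbsOpCurCnt reading 0x20 (on a
cpu with IBS_CAPS_RDWROPCNT) after a rollover with the valid bit set,
the accumulated count is 0x10000 + 0x20.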

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   33 +++++++++++++++++++----------
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index b44aa63..0dfe952 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -288,7 +288,15 @@ static u64 get_ibs_fetch_count(u64 config)
 
 static u64 get_ibs_op_count(u64 config)
 {
-	return (config & IBS_OP_CUR_CNT) >> 32;
+	u64 count = 0;
+
+	if (config & IBS_OP_VAL)
+		count += (config & IBS_OP_MAX_CNT) << 4; /* cnt rolled over */
+
+	if (ibs_caps & IBS_CAPS_RDWROPCNT)
+		count += (config & IBS_OP_CUR_CNT) >> 32;
+
+	return count;
 }
 
 static void
@@ -297,7 +305,12 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
 {
 	u64 count = perf_ibs->get_count(*config);
 
-	while (!perf_event_try_update(event, count, 20)) {
+	/*
+	 * Set width to 64 since we do not overflow on max width but
+	 * instead on max count. In perf_ibs_set_period() we clear
+	 * prev count manually on overflow.
+	 */
+	while (!perf_event_try_update(event, count, 64)) {
 		rdmsrl(event->hw.config_base, *config);
 		count = perf_ibs->get_count(*config);
 	}
@@ -376,6 +389,12 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
+	/*
+	 * Clear valid bit to not count rollovers on update, rollovers
+	 * are only updated in the irq handler.
+	 */
+	config &= ~perf_ibs->valid_mask;
+
 	perf_ibs_event_update(perf_ibs, event, &config);
 	hwc->state |= PERF_HES_UPTODATE;
 }
@@ -490,17 +509,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (!(*buf++ & perf_ibs->valid_mask))
 		return 0;
 
-	/*
-	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
-	 * supported in all cpus. As this triggered an interrupt, we
-	 * set the current count to the max count.
-	 */
 	config = &ibs_data.regs[0];
-	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		*config &= ~IBS_OP_CUR_CNT;
-		*config |= (*config & IBS_OP_MAX_CNT) << 36;
-	}
-
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
 	if (!perf_ibs_set_period(perf_ibs, hwc, &period))
-- 
1.7.8.4




* Re: [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                   ` (11 preceding siblings ...)
  2012-04-02 18:19 ` [PATCH 12/12] perf/x86-ibs: Fix usage of IBS op current count Robert Richter
@ 2012-04-02 19:11 ` Ingo Molnar
  2012-04-03 10:48   ` Robert Richter
  12 siblings, 1 reply; 48+ messages in thread
From: Ingo Molnar @ 2012-04-02 19:11 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo, LKML


* Robert Richter <robert.richter@amd.com> wrote:

>  perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count

Cool - this makes IBS really useful!

Mind posting some perf annotate output of any well-known kernel 
function showing skiddy '-e cpu-cycles' output versus skid-less 
'-e cpu-cycles:p' output?

I'm curious how well this works in practice.

Thanks,

	Ingo


* Re: [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 19:11 ` [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Ingo Molnar
@ 2012-04-03 10:48   ` Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Richter @ 2012-04-03 10:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On 02.04.12 21:11:23, Ingo Molnar wrote:
> Mind posting some perf annotate output of any well-known kernel 
> function showing skiddy '-e cpu-cycles' output versus skid-less 
> '-e cpu-cycles:p' output?

This is what I got for _raw_spin_lock_irqsave (first perfctr, second
ibs).

-Robert


 # perf annotate -k vmlinux -s _raw_spin_lock_irqsave -i perf-r076.data | cat
  Percent |	Source code & Disassembly of vmlinux
 ------------------------------------------------
          :
          :
          :
          :	Disassembly of section .text:
          :
          :	ffffffff8145036a <_raw_spin_lock_irqsave>:
     0.00 :	ffffffff8145036a:       push   %rbp
     0.00 :	ffffffff8145036b:       mov    %rsp,%rbp
     0.00 :	ffffffff8145036e:       callq  ffffffff81456c40 <mcount>
     0.00 :	ffffffff81450373:       pushfq 
     0.00 :	ffffffff81450374:       pop    %rax
     0.00 :	ffffffff81450375:       cli    
     0.00 :	ffffffff81450376:       mov    $0x100,%edx
     0.00 :	ffffffff8145037b:       lock xadd %dx,(%rdi)
     0.00 :	ffffffff81450380:       mov    %dl,%cl
     0.00 :	ffffffff81450382:       shr    $0x8,%dx
    10.34 :	ffffffff81450386:       cmp    %dl,%cl
     0.00 :	ffffffff81450388:       je     ffffffff81450390 <_raw_spin_lock_irqsave+0x26>
    10.34 :	ffffffff8145038a:       pause  
    65.52 :	ffffffff8145038c:       mov    (%rdi),%cl
    13.79 :	ffffffff8145038e:       jmp    ffffffff81450386 <_raw_spin_lock_irqsave+0x1c>
     0.00 :	ffffffff81450390:       leaveq 
     0.00 :	ffffffff81450391:       retq   
 # perf annotate -k vmlinux -s _raw_spin_lock_irqsave -i perf-r076pp.data | cat
  Percent | Source code & Disassembly of vmlinux
 ------------------------------------------------
          :
          :
          :
          :	Disassembly of section .text:
          :
          :	ffffffff8145036a <_raw_spin_lock_irqsave>:
     0.00 :	ffffffff8145036a:       push   %rbp
     0.00 :	ffffffff8145036b:       mov    %rsp,%rbp
     0.00 :	ffffffff8145036e:       callq  ffffffff81456c40 <mcount>
     0.00 :	ffffffff81450373:       pushfq 
     0.00 :	ffffffff81450374:       pop    %rax
     0.00 :	ffffffff81450375:       cli    
     0.00 :	ffffffff81450376:       mov    $0x100,%edx
     0.00 :	ffffffff8145037b:       lock xadd %dx,(%rdi)
     2.78 :	ffffffff81450380:       mov    %dl,%cl
     0.00 :	ffffffff81450382:       shr    $0x8,%dx
     2.78 :	ffffffff81450386:       cmp    %dl,%cl
    11.11 :	ffffffff81450388:       je     ffffffff81450390 <_raw_spin_lock_irqsave+0x26>
    72.22 :	ffffffff8145038a:       pause  
     2.78 :	ffffffff8145038c:       mov    (%rdi),%cl
     8.33 :	ffffffff8145038e:       jmp    ffffffff81450386 <_raw_spin_lock_irqsave+0x1c>
     0.00 :	ffffffff81450390:       leaveq 
     0.00 :	ffffffff81450391:       retq   

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
@ 2012-04-14 10:21   ` Peter Zijlstra
  2012-04-23  9:56     ` Robert Richter
  2012-04-14 10:22   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-14 10:21 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> + * We map IBS sampling to following precise levels:
> + *
> + *  1: RIP taken from IBS sample or (if invalid) from stack
> + *  2: RIP always taken from IBS sample, samples with an invalid rip
> + *     are dropped. Thus samples of an event containing two precise
> + *     modifiers (e.g. r076:pp) only contain (precise) addresses
> + *     detected with IBS.

		/*
		 * precise_ip:
		 *
		 *  0 - SAMPLE_IP can have arbitrary skid
		 *  1 - SAMPLE_IP must have constant skid
		 *  2 - SAMPLE_IP requested to have 0 skid
		 *  3 - SAMPLE_IP must have 0 skid
		 *
		 *  See also PERF_RECORD_MISC_EXACT_IP
		 */

your 1 doesn't have constant skid. I would suggest only supporting 2 and
letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.

That said, mixing the IBS pmu into the regular core pmu isn't exactly
pretty..

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
  2012-04-14 10:21   ` Peter Zijlstra
@ 2012-04-14 10:22   ` Peter Zijlstra
  2012-04-23  8:41     ` Robert Richter
  2012-04-14 10:24   ` Peter Zijlstra
  2012-05-02 10:33   ` [PATCH v2] " Robert Richter
  3 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-14 10:22 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> + * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
> + * MSRC001_1033) is used to select either cycle or micro-ops counting
> + * mode. 

Ah is that what it does.. the BKDG doesn't appear to say this.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
  2012-04-14 10:21   ` Peter Zijlstra
  2012-04-14 10:22   ` Peter Zijlstra
@ 2012-04-14 10:24   ` Peter Zijlstra
  2012-04-23 10:08     ` Robert Richter
  2012-05-02 10:33   ` [PATCH v2] " Robert Richter
  3 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-14 10:24 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> +       switch (event->attr.type) {
> +       case PERF_TYPE_HARDWARE:
> +               switch (event->attr.config) {
> +               case PERF_COUNT_HW_CPU_CYCLES:
> +                       *config = 0;
> +                       return 0;
> +               }
> +               break;
> +       case PERF_TYPE_RAW:
> +               switch (event->attr.config) {
> +               case 0x0076:
> +                       *config = 0;
> +                       return 0;
> +               case 0x00C1:
> +                       *config = IBS_OP_CNT_CTL;
> +                       return 0;
> +               }
> +               break;
> +       default:
> +               return -ENOENT;
> +       } 

Another option would be to do this from amd_pmu_hw_config() after you've
already gotten rid of the whole attr.type thing.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-14 10:22   ` Peter Zijlstra
@ 2012-04-23  8:41     ` Robert Richter
  2012-04-23 10:36       ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-23  8:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On 14.04.12 12:22:10, Peter Zijlstra wrote:
> On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> > + * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
> > + * MSRC001_1033) is used to select either cycle or micro-ops counting
> > + * mode. 
> 
> Ah is that what it does.. the BKDG doesn't appear to say this.

"19 IbsOpCntCtl: periodic op counter count control. Revision B:
    Reserved. Revision C: Read-write. Reset 0b. 1=Count dispatched ops
    0=Count clock cycles."

It's here:

 MSRC001_1033 IBS Execution Control Register (IbsOpCtl)
 http://support.amd.com/us/Processor_TechDocs/31116.pdf

Ok, it might not be quite clear that "dispatched ops" is related to
EventSelect 0C1h Retired uops, but there is an exact mapping.
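
For reference, this is the bit the patches use as IBS_OP_CNT_CTL; a
minimal sketch of the kernel-side definition (assuming the plain
single-bit encoding from the BKDG text above):

 /* IbsOpCtl bit 19: 1 = count dispatched ops, 0 = count clock cycles */
 #define IBS_OP_CNT_CTL	(1ULL<<19)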

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-14 10:21   ` Peter Zijlstra
@ 2012-04-23  9:56     ` Robert Richter
  2012-04-27 12:34       ` Robert Richter
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-23  9:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On 14.04.12 12:21:46, Peter Zijlstra wrote:
> On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> > + * We map IBS sampling to following precise levels:
> > + *
> > + *  1: RIP taken from IBS sample or (if invalid) from stack
> > + *  2: RIP always taken from IBS sample, samples with an invalid rip
> > + *     are dropped. Thus samples of an event containing two precise
> > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
> > + *     detected with IBS.
> 
> 		/*
> 		 * precise_ip:
> 		 *
> 		 *  0 - SAMPLE_IP can have arbitrary skid
> 		 *  1 - SAMPLE_IP must have constant skid
> 		 *  2 - SAMPLE_IP requested to have 0 skid
> 		 *  3 - SAMPLE_IP must have 0 skid
> 		 *
> 		 *  See also PERF_RECORD_MISC_EXACT_IP
> 		 */
> 
> your 1 doesn't have constant skid. I would suggest only supporting 2 and
> letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.

Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
flag for precise events.

The problem is that this flag is not yet well supported; only
perf-top uses it, to count the total number of exact samples.
Especially perf-annotate and perf-report do not support it, and there
is no modifier to select precise-only sampling (or is this level 3?).

Both might be useful: you might need only precise-rip samples (the
perf-annotate use case); on the other hand, you might want a sample
on every clock/ops count overflow (e.g. to get counting statistics).
The p-modifier specification (see perf-list) is not sufficient to
select both.

Another question I have: isn't precise level 2 a special case of
level 1 where the skid is constant and 0? The problem I see is that
people who want to measure a precise rip will simply use r076:p.
Level 2 (r076:pp) is actually better than 1, but they might conclude
that precise-rip sampling is not available if we throw an error for
r076:p. Thus, I would prefer to also allow level 1.

> That said, mixing the IBS pmu into the regular core pmu isn't exactly
> pretty..

IBS is currently the only way to do precise-rip sampling on AMD CPUs.
IBS events map well onto their corresponding perfctr events (0x76/
0xc1). So what don't you like about this approach? I will also post
IBS perf tool support where IBS can be used directly.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-14 10:24   ` Peter Zijlstra
@ 2012-04-23 10:08     ` Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: Robert Richter @ 2012-04-23 10:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On 14.04.12 12:24:58, Peter Zijlstra wrote:
> On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> > +       switch (event->attr.type) {
> > +       case PERF_TYPE_HARDWARE:
> > +               switch (event->attr.config) {
> > +               case PERF_COUNT_HW_CPU_CYCLES:
> > +                       *config = 0;
> > +                       return 0;
> > +               }
> > +               break;
> > +       case PERF_TYPE_RAW:
> > +               switch (event->attr.config) {
> > +               case 0x0076:
> > +                       *config = 0;
> > +                       return 0;
> > +               case 0x00C1:
> > +                       *config = IBS_OP_CNT_CTL;
> > +                       return 0;
> > +               }
> > +               break;
> > +       default:
> > +               return -ENOENT;
> > +       } 
> 
> Another option would be to do this from amd_pmu_hw_config() after you've
> already gotten rid of the whole attr.type thing.

I didn't want to have IBS setup code in amd_pmu_hw_config(). Passing
the configuration for precise sampling to the ibs pmu seemed the
simplest approach to me.

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-23  8:41     ` Robert Richter
@ 2012-04-23 10:36       ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-23 10:36 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Mon, 2012-04-23 at 10:41 +0200, Robert Richter wrote:
> On 14.04.12 12:22:10, Peter Zijlstra wrote:
> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> > > + * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
> > > + * MSRC001_1033) is used to select either cycle or micro-ops counting
> > > + * mode. 
> > 
> > Ah is that what it does.. the BKDG doesn't appear to say this.
> 
> "19 IbsOpCntCtl: periodic op counter count control. Revision B:
>     Reserved. Revision C: Read-write. Reset 0b. 1=Count dispatched ops
>     0=Count clock cycles."
> 
> It's here:
> 
>  MSRC001_1033 IBS Execution Control Register (IbsOpCtl)
>  http://support.amd.com/us/Processor_TechDocs/31116.pdf
> 
> Ok, it might not be quite clear that "dispatched ops" is related to
> EventSelect 0C1h Retired uops, but there is an exact mapping.

Ah, looks like my docs are stale.. my fam10 doc didn't have it specified
at all and my fam12 doc just listed bit 19 as "periodic op counter
count control. Read-write." Which isn't very helpful.

Thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-23  9:56     ` Robert Richter
@ 2012-04-27 12:34       ` Robert Richter
  2012-04-27 12:39         ` Stephane Eranian
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-27 12:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On 23.04.12 11:56:59, Robert Richter wrote:
> On 14.04.12 12:21:46, Peter Zijlstra wrote:
> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> > > + * We map IBS sampling to following precise levels:
> > > + *
> > > + *  1: RIP taken from IBS sample or (if invalid) from stack
> > > + *  2: RIP always taken from IBS sample, samples with an invalid rip
> > > + *     are dropped. Thus samples of an event containing two precise
> > > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
> > > + *     detected with IBS.
> > 
> > 		/*
> > 		 * precise_ip:
> > 		 *
> > 		 *  0 - SAMPLE_IP can have arbitrary skid
> > 		 *  1 - SAMPLE_IP must have constant skid
> > 		 *  2 - SAMPLE_IP requested to have 0 skid
> > 		 *  3 - SAMPLE_IP must have 0 skid
> > 		 *
> > 		 *  See also PERF_RECORD_MISC_EXACT_IP
> > 		 */
> > 
> > your 1 doesn't have constant skid. I would suggest only supporting 2 and
> > letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.
> 
> Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
> flag for precise events.

Peter,

I have a patch on top that implements support for the
PERF_RECORD_MISC_EXACT_IP flag. But I am not quite sure how to use
the precise levels. What do you suggest?

Thanks,

-Robert

> 
> The problem is that this flag is not yet well supported; only
> perf-top uses it, to count the total number of exact samples.
> Especially perf-annotate and perf-report do not support it, and there
> is no modifier to select precise-only sampling (or is this level 3?).
>
> Both might be useful: you might need only precise-rip samples (the
> perf-annotate use case); on the other hand, you might want a sample
> on every clock/ops count overflow (e.g. to get counting statistics).
> The p-modifier specification (see perf-list) is not sufficient to
> select both.
> 
> Another question I have: isn't precise level 2 a special case of
> level 1 where the skid is constant and 0? The problem I see is that
> people who want to measure a precise rip will simply use r076:p.
> Level 2 (r076:pp) is actually better than 1, but they might conclude
> that precise-rip sampling is not available if we throw an error for
> r076:p. Thus, I would prefer to also allow level 1.
> 
> > That said, mixing the IBS pmu into the regular core pmu isn't exactly
> > pretty..
> 
> IBS is currently the only way to do precise-rip sampling on AMD CPUs.
> IBS events map well onto their corresponding perfctr events (0x76/
> 0xc1). So what don't you like about this approach? I will also post
> IBS perf tool support where IBS can be used directly.
> 
> -Robert
> 
> -- 
> Advanced Micro Devices, Inc.
> Operating System Research Center

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 12:34       ` Robert Richter
@ 2012-04-27 12:39         ` Stephane Eranian
  2012-04-27 12:54           ` Robert Richter
  0 siblings, 1 reply; 48+ messages in thread
From: Stephane Eranian @ 2012-04-27 12:39 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, Apr 27, 2012 at 2:34 PM, Robert Richter <robert.richter@amd.com> wrote:
> On 23.04.12 11:56:59, Robert Richter wrote:
>> On 14.04.12 12:21:46, Peter Zijlstra wrote:
>> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
>> > > + * We map IBS sampling to following precise levels:
>> > > + *
>> > > + *  1: RIP taken from IBS sample or (if invalid) from stack
>> > > + *  2: RIP always taken from IBS sample, samples with an invalid rip
>> > > + *     are dropped. Thus samples of an event containing two precise
>> > > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
>> > > + *     detected with IBS.
>> >
>> >             /*
>> >              * precise_ip:
>> >              *
>> >              *  0 - SAMPLE_IP can have arbitrary skid
>> >              *  1 - SAMPLE_IP must have constant skid
>> >              *  2 - SAMPLE_IP requested to have 0 skid
>> >              *  3 - SAMPLE_IP must have 0 skid
>> >              *
>> >              *  See also PERF_RECORD_MISC_EXACT_IP
>> >              */
>> >
>> > your 1 doesn't have constant skid. I would suggest only supporting 2 and
>> > letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.
>>
>> Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
>> flag for precise events.
>
Why not use 2? IBS has 0 skid, unless I am mistaken.

> Peter,
>
> I have a patch on top that implements support for the
> PERF_RECORD_MISC_EXACT_IP flag. But I am not quite sure how to use
> the precise levels. What do you suggest?
>
> Thanks,
>
> -Robert
>
>>
>> The problem is that this flag is not yet well supported; only
>> perf-top uses it, to count the total number of exact samples.
>> Especially perf-annotate and perf-report do not support it, and there
>> is no modifier to select precise-only sampling (or is this level 3?).
>>
>> Both might be useful: you might need only precise-rip samples (the
>> perf-annotate use case); on the other hand, you might want a sample
>> on every clock/ops count overflow (e.g. to get counting statistics).
>> The p-modifier specification (see perf-list) is not sufficient to
>> select both.
>>
>> Another question I have: isn't precise level 2 a special case of
>> level 1 where the skid is constant and 0? The problem I see is that
>> people who want to measure a precise rip will simply use r076:p.
>> Level 2 (r076:pp) is actually better than 1, but they might conclude
>> that precise-rip sampling is not available if we throw an error for
>> r076:p. Thus, I would prefer to also allow level 1.
>>
>> > That said, mixing the IBS pmu into the regular core pmu isn't exactly
>> > pretty..
>>
>> IBS is currently the only way to do precise-rip sampling on AMD CPUs.
>> IBS events map well onto their corresponding perfctr events (0x76/
>> 0xc1). So what don't you like about this approach? I will also post
>> IBS perf tool support where IBS can be used directly.
>>
>> -Robert
>>
>> --
>> Advanced Micro Devices, Inc.
>> Operating System Research Center
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 12:39         ` Stephane Eranian
@ 2012-04-27 12:54           ` Robert Richter
  2012-04-27 13:10             ` Stephane Eranian
  2012-04-27 15:30             ` Peter Zijlstra
  0 siblings, 2 replies; 48+ messages in thread
From: Robert Richter @ 2012-04-27 12:54 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On 27.04.12 14:39:21, Stephane Eranian wrote:
> On Fri, Apr 27, 2012 at 2:34 PM, Robert Richter <robert.richter@amd.com> wrote:
> > On 23.04.12 11:56:59, Robert Richter wrote:
> >> On 14.04.12 12:21:46, Peter Zijlstra wrote:
> >> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
> >> > > + * We map IBS sampling to following precise levels:
> >> > > + *
> >> > > + *  1: RIP taken from IBS sample or (if invalid) from stack
> >> > > + *  2: RIP always taken from IBS sample, samples with an invalid rip
> >> > > + *     are dropped. Thus samples of an event containing two precise
> >> > > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
> >> > > + *     detected with IBS.
> >> >
> >> >             /*
> >> >              * precise_ip:
> >> >              *
> >> >              *  0 - SAMPLE_IP can have arbitrary skid
> >> >              *  1 - SAMPLE_IP must have constant skid
> >> >              *  2 - SAMPLE_IP requested to have 0 skid
> >> >              *  3 - SAMPLE_IP must have 0 skid
> >> >              *
> >> >              *  See also PERF_RECORD_MISC_EXACT_IP
> >> >              */
> >> >
> >> > your 1 doesn't have constant skid. I would suggest only supporting 2 and
> >> > letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.
> >>
> >> Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
> >> flag for precise events.
> >
> Why not use 2? IBS has 0 skid, unless I am mistaken.

Events with r076:p would fail then. But r076:pp is actually better
and a subset of level 1. Thus both levels should work.

And there is still the question of how samples with an imprecise rip
should be handled. Sometimes we want all samples; sometimes every
sample should contain a precise rip and the other samples should be
dropped. But there is no option or modifier for this yet.

My suggestion was to use level 1 for all samples and level 2 for
samples that only contain a precise rip, saving level 3 for future
use.

-Robert

> 
> > Peter,
> >
> > I have a patch on top that implements support for the
> > PERF_RECORD_MISC_EXACT_IP flag. But I am not quite sure how to use
> > the precise levels. What do you suggest?
> >
> > Thanks,
> >
> > -Robert
> >
> >>
> >> The problem is that this flag is not yet well supported; only
> >> perf-top uses it, to count the total number of exact samples.
> >> Especially perf-annotate and perf-report do not support it, and there
> >> is no modifier to select precise-only sampling (or is this level 3?).
> >>
> >> Both might be useful: you might need only precise-rip samples (the
> >> perf-annotate use case); on the other hand, you might want a sample
> >> on every clock/ops count overflow (e.g. to get counting statistics).
> >> The p-modifier specification (see perf-list) is not sufficient to
> >> select both.
> >>
> >> Another question I have: isn't precise level 2 a special case of
> >> level 1 where the skid is constant and 0? The problem I see is that
> >> people who want to measure a precise rip will simply use r076:p.
> >> Level 2 (r076:pp) is actually better than 1, but they might conclude
> >> that precise-rip sampling is not available if we throw an error for
> >> r076:p. Thus, I would prefer to also allow level 1.
> >>
> >> > That said, mixing the IBS pmu into the regular core pmu isn't exactly
> >> > pretty..
> >>
> >> IBS is currently the only way to do precise-rip sampling on AMD CPUs.
> >> IBS events map well onto their corresponding perfctr events (0x76/
> >> 0xc1). So what don't you like about this approach? I will also post
> >> IBS perf tool support where IBS can be used directly.
> >>
> >> -Robert
> >>
> >> --
> >> Advanced Micro Devices, Inc.
> >> Operating System Research Center
> >
> > --
> > Advanced Micro Devices, Inc.
> > Operating System Research Center
> >
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 12:54           ` Robert Richter
@ 2012-04-27 13:10             ` Stephane Eranian
  2012-04-27 15:18               ` Robert Richter
  2012-04-27 15:30             ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Stephane Eranian @ 2012-04-27 13:10 UTC (permalink / raw)
  To: Robert Richter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

Robert,

I did not follow the entire discussion, but based on your initial
post:

perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
perf record -a -e r076:p ...          # same as -e cpu-cycles:p
perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each IBS sample contains a linear address that points to the
instruction that was causing the sample to trigger. With ibs we have
skid 0.

Though the skid is 0, we map IBS sampling to following precise levels:

 1: RIP taken from IBS sample or (if invalid) from stack.

I assume by stack you mean pt_regs, right?

2: RIP always taken from IBS sample, samples with an invalid rip
   are dropped. Thus samples of an event containing two precise
   modifiers (e.g. r076:pp) only contain (precise) addresses
   detected with IBS.

I don't think you need the distinction between 1 and 2. You can
always use the pt_regs IP as a fallback. You can mark that the
IP is precise with the MISC_EXACT flag in the sample header.
This is how it's done with PEBS. What's wrong with that?
It may actually be better than silently dropping samples, since
dropping may introduce some bias.
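
I.e., roughly this in the interrupt handler (just a sketch of the
idea; rip_is_valid() and sample_rip are placeholders, not existing
helpers):

	if (rip_is_valid(sample)) {
		/* report the precise IP recorded by the hardware */
		instruction_pointer_set(&regs, sample_rip);
		regs.flags |= PERF_EFLAGS_EXACT;
	} else {
		/* fall back to the interrupted IP from pt_regs */
		regs.flags &= ~PERF_EFLAGS_EXACT;
	}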


On Fri, Apr 27, 2012 at 2:54 PM, Robert Richter <robert.richter@amd.com> wrote:
> On 27.04.12 14:39:21, Stephane Eranian wrote:
>> On Fri, Apr 27, 2012 at 2:34 PM, Robert Richter <robert.richter@amd.com> wrote:
>> > On 23.04.12 11:56:59, Robert Richter wrote:
>> >> On 14.04.12 12:21:46, Peter Zijlstra wrote:
>> >> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
>> >> > > + * We map IBS sampling to following precise levels:
>> >> > > + *
>> >> > > + *  1: RIP taken from IBS sample or (if invalid) from stack
>> >> > > + *  2: RIP always taken from IBS sample, samples with an invalid rip
>> >> > > + *     are dropped. Thus samples of an event containing two precise
>> >> > > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
>> >> > > + *     detected with IBS.
>> >> >
>> >> >             /*
>> >> >              * precise_ip:
>> >> >              *
>> >> >              *  0 - SAMPLE_IP can have arbitrary skid
>> >> >              *  1 - SAMPLE_IP must have constant skid
>> >> >              *  2 - SAMPLE_IP requested to have 0 skid
>> >> >              *  3 - SAMPLE_IP must have 0 skid
>> >> >              *
>> >> >              *  See also PERF_RECORD_MISC_EXACT_IP
>> >> >              */
>> >> >
>> >> > your 1 doesn't have constant skid. I would suggest only supporting 2 and
>> >> > letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.
>> >>
>> >> Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
>> >> flag for precise events.
>> >
>> Why not use 2? IBS has 0 skid, unless I am mistaken.
>
> Events with r076:p would fail then. But r076:pp is actually better
> and a subset of level 1. Thus both levels should work.
>
> And there is still the question of how samples with an imprecise rip
> should be handled. Sometimes we want all samples; sometimes every
> sample should contain a precise rip and the other samples should be
> dropped. But there is no option or modifier for this yet.
>
> My suggestion was to use level 1 for all samples and level 2 for
> samples that only contain a precise rip, saving level 3 for future
> use.
>
> -Robert
>
>>
>> > Peter,
>> >
>> > I have a patch on top that implements support for the
>> > PERF_RECORD_MISC_EXACT_IP flag. But I am not quite sure how to use
>> > the precise levels. What do you suggest?
>> >
>> > Thanks,
>> >
>> > -Robert
>> >
>> >>
>> >> The problem is that this flag is not yet well supported; only
>> >> perf-top uses it, to count the total number of exact samples.
>> >> Especially perf-annotate and perf-report do not support it, and there
>> >> is no modifier to select precise-only sampling (or is this level 3?).
>> >>
>> >> Both might be useful: you might need only precise-rip samples (the
>> >> perf-annotate use case); on the other hand, you might want a sample
>> >> on every clock/ops count overflow (e.g. to get counting statistics).
>> >> The p-modifier specification (see perf-list) is not sufficient to
>> >> select both.
>> >>
>> >> Another question I have: isn't precise level 2 a special case of
>> >> level 1 where the skid is constant and 0? The problem I see is that
>> >> people who want to measure a precise rip will simply use r076:p.
>> >> Level 2 (r076:pp) is actually better than 1, but they might conclude
>> >> that precise-rip sampling is not available if we throw an error for
>> >> r076:p. Thus, I would prefer to also allow level 1.
>> >>
>> >> > That said, mixing the IBS pmu into the regular core pmu isn't exactly
>> >> > pretty..
>> >>
>> >> IBS is currently the only way to do precise-rip sampling on AMD CPUs.
>> >> IBS events map well onto their corresponding perfctr events (0x76/
>> >> 0xc1). So what don't you like about this approach? I will also post
>> >> IBS perf tool support where IBS can be used directly.
>> >>
>> >> -Robert
>> >>
>> >> --
>> >> Advanced Micro Devices, Inc.
>> >> Operating System Research Center
>> >
>> > --
>> > Advanced Micro Devices, Inc.
>> > Operating System Research Center
>> >
>>
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 13:10             ` Stephane Eranian
@ 2012-04-27 15:18               ` Robert Richter
  2012-04-27 15:30                 ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-27 15:18 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On 27.04.12 15:10:22, Stephane Eranian wrote:
> perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
> perf record -a -e r076:p ...          # same as -e cpu-cycles:p
> perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
> 
> Each IBS sample contains a linear address that points to the
> instruction that was causing the sample to trigger. With ibs we have
> skid 0.
> 
> Though the skid is 0, we map IBS sampling to following precise levels:
> 
>  1: RIP taken from IBS sample or (if invalid) from stack.
> 
> I assume by stack you mean pt_regs, right?

Right.

> 
> 2: RIP always taken from IBS sample, samples with an invalid rip
>    are dropped. Thus samples of an event containing two precise
>    modifiers (e.g. r076:pp) only contain (precise) addresses
>    detected with IBS.
> 
> I don't think you need the distinction between 1 and 2. You can
> always use the pt_regs IP as a fallback. You can mark that the
> IP is precise with the MISC_EXACT flag in the sample header.
> This is how it's done with PEBS. What's wrong with that?
> It may actually be better than silently dropping samples, since
> dropping may introduce some bias.

There is nothing wrong with it. I have already implemented support
for the MISC_EXACT flag. But the flag is basically unused in the perf
tool, and there is no modifier to request only samples with a precise
rip.

Suppose you want to use perf-annotate; then you only want precise
rips. With the levels suggested above you can do so with:

 perf record -a -e r076:pp ... | perf annotate ...

(Note the double-p.)

For non-biased sampling (e.g. to get counting statistics) you take
level 1 and you get every sample:

 perf record -a -e r076:p ...

There is no modifier to evaluate MISC_EXACT the same way. That's why
I chose the levels above; I didn't have a better idea.
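
At the syscall level the p-modifiers simply set precise_ip in the
event attributes; a minimal user-space sketch (hypothetical, not part
of this patch set) of what r076:pp corresponds to:

	#include <linux/perf_event.h>

	struct perf_event_attr attr = {
		.type        = PERF_TYPE_RAW,
		.config      = 0x0076,		/* cpu cycles */
		.sample_type = PERF_SAMPLE_IP,
		.precise_ip  = 2,		/* the :pp modifier, 0 skid requested */
	};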

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 12:54           ` Robert Richter
  2012-04-27 13:10             ` Stephane Eranian
@ 2012-04-27 15:30             ` Peter Zijlstra
  2012-04-27 16:09               ` Robert Richter
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-27 15:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, 2012-04-27 at 14:54 +0200, Robert Richter wrote:
> My suggestion was to use level 1 for all samples and level 2 for
> samples that only contain a precise rip, saving level 3 for future
> use. 

No.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 15:18               ` Robert Richter
@ 2012-04-27 15:30                 ` Peter Zijlstra
  2012-04-27 15:57                   ` Stephane Eranian
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-27 15:30 UTC (permalink / raw)
  To: Robert Richter
  Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, 2012-04-27 at 17:18 +0200, Robert Richter wrote:
> There is nothing wrong with it. I have already implemented support
> for the MISC_EXACT flag. But the flag is basically unused in the perf
> tool, and there is no modifier to request only samples with a precise
> rip.

Just because userspace doesn't dtrt doesn't mean it's a good idea to
wreck the kernel side of things.

Instead, fix the userspace.

I'll simply not take patches that silently drop samples.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 15:30                 ` Peter Zijlstra
@ 2012-04-27 15:57                   ` Stephane Eranian
  0 siblings, 0 replies; 48+ messages in thread
From: Stephane Eranian @ 2012-04-27 15:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, Apr 27, 2012 at 5:30 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-04-27 at 17:18 +0200, Robert Richter wrote:
>> There is nothing wrong with it. I have already implemented support
>> for the MISC_EXACT flag. But the flag is basically unused in the perf
>> tool, and there is no modifier to request only samples with a precise
>> rip.
>
> Just because userspace doesn't dtrt doesn't mean it's a good idea to
> wreck the kernel side of things.
>
> Instead, fix the userspace.
>
I was going to suggest you add an option to perf annotate/report to
filter out non-exact_ip samples. That can't be that hard to do.
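
Something like this in the tool's sample handling would do (an
untested sketch; the exact_only flag stands in for a hypothetical
command line option, it does not exist today):

	/* skip samples whose rip is not marked precise by the kernel */
	if (exact_only && !(event->header.misc & PERF_RECORD_MISC_EXACT_IP))
		return 0;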


> I'll simply not take patches that silently drop samples.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 15:30             ` Peter Zijlstra
@ 2012-04-27 16:09               ` Robert Richter
  2012-04-27 16:21                 ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Robert Richter @ 2012-04-27 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On 27.04.12 17:30:33, Peter Zijlstra wrote:
> On Fri, 2012-04-27 at 14:54 +0200, Robert Richter wrote:
> > My suggestion was to use level 1 for all samples and level 2 for
> > samples that only contain a precise rip, saving level 3 for future
> > use. 
> 
> No.

Ok, I will look at how to handle this in userspace.

But do you agree to have levels 1 and 2 mapped to ibs, not just
level 2 (since I don't want to fail with r076:p)?

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 16:09               ` Robert Richter
@ 2012-04-27 16:21                 ` Peter Zijlstra
  2012-04-27 16:23                   ` Stephane Eranian
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-04-27 16:21 UTC (permalink / raw)
  To: Robert Richter
  Cc: Stephane Eranian, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, 2012-04-27 at 18:09 +0200, Robert Richter wrote:
> 
> But do you agree to have levels 1 and 2 mapped to ibs, not just
> level 2 (since I don't want to fail with r076:p)?

Sure, you can have 1 and 2 mean the same. 2 wants 0 skid, 1 wants
constant skid, 0 is a constant, therefore it's consistent.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-27 16:21                 ` Peter Zijlstra
@ 2012-04-27 16:23                   ` Stephane Eranian
  0 siblings, 0 replies; 48+ messages in thread
From: Stephane Eranian @ 2012-04-27 16:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo, LKML

On Fri, Apr 27, 2012 at 6:21 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-04-27 at 18:09 +0200, Robert Richter wrote:
>>
>> But do you agree to have levels 1 and 2 mapped to ibs, not just
>> level 2 (since I don't want to fail with r076:p)?
>
> Sure, you can have 1 and 2 mean the same. 2 wants 0 skid, 1 wants
> constant skid, 0 is a constant, therefore it's consistent.

Yes. With IBS I would expect no difference in the samples between
levels 1 and 2. But that's okay. As Peter says, it still fits within
the definitions of those levels.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
                     ` (2 preceding siblings ...)
  2012-04-14 10:24   ` Peter Zijlstra
@ 2012-05-02 10:33   ` Robert Richter
  2012-05-02 11:14     ` Peter Zijlstra
  2012-05-09 14:34     ` [tip:perf/core] " tip-bot for Robert Richter
  3 siblings, 2 replies; 48+ messages in thread
From: Robert Richter @ 2012-05-02 10:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

Updated version. Levels 1 and 2 are handled the same way now. Don't
drop samples in precise level 2 if the rip is invalid; instead,
support the PERF_EFLAGS_EXACT flag.

No changes in other patches of

 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs

-Robert



From 6d646cefdea9958c3401110caecc958b41f6e84d Mon Sep 17 00:00:00 2001
From: Robert Richter <robert.richter@amd.com>
Date: Mon, 12 Mar 2012 12:54:32 +0100
Subject: [PATCH] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs

This patch adds support for precise event sampling with IBS. There are
two counting modes to count either cycles or micro-ops. If the
corresponding performance counter events (hw events) are set up with
the precise flag set, the request is redirected to the ibs pmu:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each ibs sample contains a linear address that points to the
instruction that caused the sample to trigger. With ibs we have skid
0. Thus, ibs supports precise levels 1 and 2. Samples are marked with
the PERF_EFLAGS_EXACT flag set. In rare cases the rip is invalid
because IBS was not able to record it correctly. Then the
PERF_EFLAGS_EXACT flag is cleared and the rip is taken from pt_regs.

V2:
* don't drop samples in precise level 2 if the rip is invalid;
  instead, support the PERF_EFLAGS_EXACT flag

Signed-off-by: Robert Richter <robert.richter@amd.com>
---
 arch/x86/kernel/cpu/perf_event_amd.c     |    7 +++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   73 ++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 95e7fe1..4be3463 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -134,8 +134,13 @@ static u64 amd_pmu_event_map(int hw_event)
 
 static int amd_pmu_hw_config(struct perf_event *event)
 {
-	int ret = x86_pmu_hw_config(event);
+	int ret;
 
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 0321b64..117b0aa 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -145,17 +145,80 @@ static struct perf_ibs *get_ibs_pmu(int type)
 	return NULL;
 }
 
+/*
+ * Use IBS for precise event sampling:
+ *
+ *  perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
+ *  perf record -a -e r076:p ...          # same as -e cpu-cycles:p
+ *  perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
+ *
+ * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
+ * MSRC001_1033) is used to select either cycle or micro-ops counting
+ * mode.
+ *
+ * The rip of IBS samples has skid 0. Thus, IBS supports precise
+ * levels 1 and 2 and the PERF_EFLAGS_EXACT flag is set. In rare cases
+ * the rip is invalid because IBS was not able to record it correctly.
+ * We clear PERF_EFLAGS_EXACT and take the rip from pt_regs then.
+ *
+ */
+static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
+{
+	switch (event->attr.precise_ip) {
+	case 0:
+		return -ENOENT;
+	case 1:
+	case 2:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (event->attr.type) {
+	case PERF_TYPE_HARDWARE:
+		switch (event->attr.config) {
+		case PERF_COUNT_HW_CPU_CYCLES:
+			*config = 0;
+			return 0;
+		}
+		break;
+	case PERF_TYPE_RAW:
+		switch (event->attr.config) {
+		case 0x0076:
+			*config = 0;
+			return 0;
+		case 0x00C1:
+			*config = IBS_OP_CNT_CTL;
+			return 0;
+		}
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	return -EOPNOTSUPP;
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs;
 	u64 max_cnt, config;
+	int ret;
 
 	perf_ibs = get_ibs_pmu(event->attr.type);
-	if (!perf_ibs)
+	if (perf_ibs) {
+		config = event->attr.config;
+	} else {
+		perf_ibs = &perf_ibs_op;
+		ret = perf_ibs_precise_event(event, &config);
+		if (ret)
+			return ret;
+	}
+
+	if (event->pmu != &perf_ibs->pmu)
 		return -ENOENT;
 
-	config = event->attr.config;
 	if (config & ~perf_ibs->config_mask)
 		return -EINVAL;
 
@@ -437,8 +500,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	ibs_data.size = sizeof(u64) * size;
 
 	regs = *iregs;
-	if (!check_rip || !(ibs_data.regs[2] & IBS_RIP_INVALID))
+	if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
+		regs.flags &= ~PERF_EFLAGS_EXACT;
+	} else {
 		instruction_pointer_set(&regs, ibs_data.regs[1]);
+		regs.flags |= PERF_EFLAGS_EXACT;
+	}
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw.size = sizeof(u32) + ibs_data.size;
-- 
1.7.8.4



-- 
Advanced Micro Devices, Inc.
Operating System Research Center


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-05-02 10:33   ` [PATCH v2] " Robert Richter
@ 2012-05-02 11:14     ` Peter Zijlstra
  2012-05-04 17:53       ` Peter Zijlstra
  2012-05-09 14:34     ` [tip:perf/core] " tip-bot for Robert Richter
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2012-05-02 11:14 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Wed, 2012-05-02 at 12:33 +0200, Robert Richter wrote:
> Updated version. Levels 1 and 2 are handled the same way now. Don't
> drop samples in precise level 2 if the rip is invalid; instead,
> support the PERF_EFLAGS_EXACT flag.
> 
> No changes in other patches of
> 
>  [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs

Thanks! I managed to stomp all patches on top of -tip and shall be
trying it out on my aging opteron-1216.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-05-02 11:14     ` Peter Zijlstra
@ 2012-05-04 17:53       ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2012-05-04 17:53 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Stephane Eranian, Arnaldo Carvalho de Melo, LKML

On Wed, 2012-05-02 at 13:14 +0200, Peter Zijlstra wrote:
> On Wed, 2012-05-02 at 12:33 +0200, Robert Richter wrote:
> > Updated version. Levels 1 and 2 are handled the same way now. Don't
> > drop samples in precise level 2 if the rip is invalid; instead,
> > support the PERF_EFLAGS_EXACT flag.
> > 
> > No changes in other patches of
> > 
> >  [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
> 
> Thanks! I managed to stomp all patches on top of -tip and shall be
> trying it out on my aging opteron-1216.

Hmm, that box isn't reporting X86_FEATURE_IBS; a quick trip to
Wikipedia tells me this is a K8 (Santa Ana), not Fam 10h. That means I
don't actually have any hardware to test this on :-(

I'll have to throw it to Ingo then; IIRC he's got an Istanbul part.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip:perf/core] perf/x86-ibs: Fix update of period
  2012-04-02 18:19 ` [PATCH 01/12] perf/x86-ibs: Fix update of period Robert Richter
@ 2012-05-09 14:29   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  c75841a398d667d9968245b9519d93cedbfb4780
Gitweb:     http://git.kernel.org/tip/c75841a398d667d9968245b9519d93cedbfb4780
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:07 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:11 +0200

perf/x86-ibs: Fix update of period

The last sw period was not correctly updated on overflow and thus led
to a wrong distribution of events. We always need to properly
initialize data.period in struct perf_sample_data.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   27 ++++++++++++++-------------
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 8ff74d4..c8f69be 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -386,7 +386,21 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (!(*buf++ & perf_ibs->valid_mask))
 		return 0;
 
+	/*
+	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
+	 * supported in all cpus. As this triggered an interrupt, we
+	 * set the current count to the max count.
+	 */
+	config = ibs_data.regs[0];
+	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
+		config &= ~IBS_OP_CUR_CNT;
+		config |= (config & IBS_OP_MAX_CNT) << 36;
+	}
+
+	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0);
+	data.period = event->hw.last_period;
+
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		ibs_data.caps = ibs_caps;
 		size = 1;
@@ -405,19 +419,6 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	regs = *iregs; /* XXX: update ip from ibs sample */
 
-	/*
-	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
-	 * supported in all cpus. As this triggered an interrupt, we
-	 * set the current count to the max count.
-	 */
-	config = ibs_data.regs[0];
-	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		config &= ~IBS_OP_CUR_CNT;
-		config |= (config & IBS_OP_MAX_CNT) << 36;
-	}
-
-	perf_ibs_event_update(perf_ibs, event, config);
-
 	overflow = perf_ibs_set_period(perf_ibs, hwc, &config);
 	reenable = !(overflow && perf_event_overflow(event, &data, &regs));
 	config = (config >> 4) | (reenable ? perf_ibs->enable_mask : 0);

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip:perf/core] perf: Pass last sampling period to perf_sample_data_init()
  2012-04-02 18:19 ` [PATCH 02/12] perf: Pass last sampling period to perf_sample_data_init() Robert Richter
@ 2012-05-09 14:30   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  fd0d000b2c34aa43d4e92dcf0dfaeda7e123008a
Gitweb:     http://git.kernel.org/tip/fd0d000b2c34aa43d4e92dcf0dfaeda7e123008a
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:08 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:12 +0200

perf: Pass last sampling period to perf_sample_data_init()

We always need to pass the last sample period to
perf_sample_data_init(), otherwise the event distribution will be
wrong. Thus, modify the function interface to take the required
period as an argument. So basically a pattern like this:

        perf_sample_data_init(&data, ~0ULL);
        data.period = event->hw.last_period;

will now look like this:

        perf_sample_data_init(&data, ~0ULL, event->hw.last_period);

This avoids uninitialized data.period and simplifies the code.
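
The new helper then looks roughly like this (a sketch; the remaining
struct members keep being initialized as before):

	static inline void perf_sample_data_init(struct perf_sample_data *data,
						 u64 addr, u64 period)
	{
		data->addr   = addr;
		data->raw    = NULL;
		data->period = period;
	}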

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/alpha/kernel/perf_event.c            |    3 +--
 arch/arm/kernel/perf_event_v6.c           |    4 +---
 arch/arm/kernel/perf_event_v7.c           |    4 +---
 arch/arm/kernel/perf_event_xscale.c       |    8 ++------
 arch/mips/kernel/perf_event_mipsxx.c      |    2 +-
 arch/powerpc/perf/core-book3s.c           |    3 +--
 arch/powerpc/perf/core-fsl-emb.c          |    3 +--
 arch/sparc/kernel/perf_event.c            |    4 +---
 arch/x86/kernel/cpu/perf_event.c          |    4 +---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c  |    3 +--
 arch/x86/kernel/cpu/perf_event_intel.c    |    4 +---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |    6 ++----
 arch/x86/kernel/cpu/perf_event_p4.c       |    6 +++---
 include/linux/perf_event.h                |    5 ++++-
 kernel/events/core.c                      |    9 ++++-----
 15 files changed, 25 insertions(+), 43 deletions(-)

diff --git a/arch/alpha/kernel/perf_event.c b/arch/alpha/kernel/perf_event.c
index 0dae252..d821b17 100644
--- a/arch/alpha/kernel/perf_event.c
+++ b/arch/alpha/kernel/perf_event.c
@@ -824,7 +824,6 @@ static void alpha_perf_event_irq_handler(unsigned long la_ptr,
 
 	idx = la_ptr;
 
-	perf_sample_data_init(&data, 0);
 	for (j = 0; j < cpuc->n_events; j++) {
 		if (cpuc->current_idx[j] == idx)
 			break;
@@ -848,7 +847,7 @@ static void alpha_perf_event_irq_handler(unsigned long la_ptr,
 
 	hwc = &event->hw;
 	alpha_perf_event_update(event, hwc, idx, alpha_pmu->pmc_max_period[idx]+1);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, hwc->last_period);
 
 	if (alpha_perf_event_set_period(event, hwc, idx)) {
 		if (perf_event_overflow(event, &data, regs)) {
diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index b78af0c..ab627a7 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -489,8 +489,6 @@ armv6pmu_handle_irq(int irq_num,
 	 */
 	armv6_pmcr_write(pmcr);
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -509,7 +507,7 @@ armv6pmu_handle_irq(int irq_num,
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 00755d8..d3c5360 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -1077,8 +1077,6 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 	 */
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -1097,7 +1095,7 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 71a21e6..e34e725 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -248,8 +248,6 @@ xscale1pmu_handle_irq(int irq_num, void *dev)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -263,7 +261,7 @@ xscale1pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
@@ -588,8 +586,6 @@ xscale2pmu_handle_irq(int irq_num, void *dev)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 	for (idx = 0; idx < cpu_pmu->num_events; ++idx) {
 		struct perf_event *event = cpuc->events[idx];
@@ -603,7 +599,7 @@ xscale2pmu_handle_irq(int irq_num, void *dev)
 
 		hwc = &event->hw;
 		armpmu_event_update(event, hwc, idx);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!armpmu_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/mips/kernel/perf_event_mipsxx.c b/arch/mips/kernel/perf_event_mipsxx.c
index 811084f..ab73fa2 100644
--- a/arch/mips/kernel/perf_event_mipsxx.c
+++ b/arch/mips/kernel/perf_event_mipsxx.c
@@ -1325,7 +1325,7 @@ static int mipsxx_pmu_handle_shared_irq(void)
 
 	regs = get_irq_regs();
 
-	perf_sample_data_init(&data, 0);
+	perf_sample_data_init(&data, 0, 0);
 
 	switch (counters) {
 #define HANDLE_COUNTER(n)						\
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 02aee03..8f84bcb 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1299,8 +1299,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	if (record) {
 		struct perf_sample_data data;
 
-		perf_sample_data_init(&data, ~0ULL);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, ~0ULL, event->hw.last_period);
 
 		if (event->attr.sample_type & PERF_SAMPLE_ADDR)
 			perf_get_data_addr(regs, &data.addr);
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 0a6d2a9..106c533 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -613,8 +613,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 	if (record) {
 		struct perf_sample_data data;
 
-		perf_sample_data_init(&data, 0);
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (perf_event_overflow(event, &data, regs))
 			fsl_emb_pmu_stop(event, 0);
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 28559ce..5713957 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1296,8 +1296,6 @@ static int __kprobes perf_event_nmi_handler(struct notifier_block *self,
 
 	regs = args->regs;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/* If the PMU has the TOE IRQ enable bits, we need to do a
@@ -1321,7 +1319,7 @@ static int __kprobes perf_event_nmi_handler(struct notifier_block *self,
 		if (val & (1ULL << 31))
 			continue;
 
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 		if (!sparc_perf_event_set_period(event, hwc, idx))
 			continue;
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e33e9cf..e049d6d 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1183,8 +1183,6 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	int idx, handled = 0;
 	u64 val;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/*
@@ -1219,7 +1217,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 		 * event overflow
 		 */
 		handled++;
-		data.period	= event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (!x86_perf_event_set_period(event))
 			continue;
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index c8f69be..2317228 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -398,8 +398,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	}
 
 	perf_ibs_event_update(perf_ibs, event, config);
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, hwc->last_period);
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		ibs_data.caps = ibs_caps;
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 26b3e2f..166546e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1027,8 +1027,6 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	u64 status;
 	int handled;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	/*
@@ -1082,7 +1080,7 @@ again:
 		if (!intel_pmu_save_and_restart(event))
 			continue;
 
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, event->hw.last_period);
 
 		if (has_branch_stack(event))
 			data.br_stack = &cpuc->lbr_stack;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 7f64df1..5a3edc2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -316,8 +316,7 @@ int intel_pmu_drain_bts_buffer(void)
 
 	ds->bts_index = ds->bts_buffer_base;
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 	regs.ip     = 0;
 
 	/*
@@ -564,8 +563,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	if (!intel_pmu_save_and_restart(event))
 		return;
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 
 	/*
 	 * We use the interrupt regs as a base because the PEBS record
diff --git a/arch/x86/kernel/cpu/perf_event_p4.c b/arch/x86/kernel/cpu/perf_event_p4.c
index a2dfacf..47124a7 100644
--- a/arch/x86/kernel/cpu/perf_event_p4.c
+++ b/arch/x86/kernel/cpu/perf_event_p4.c
@@ -1005,8 +1005,6 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 	int idx, handled = 0;
 	u64 val;
 
-	perf_sample_data_init(&data, 0);
-
 	cpuc = &__get_cpu_var(cpu_hw_events);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
@@ -1034,10 +1032,12 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 		handled += overflow;
 
 		/* event overflow for sure */
-		data.period = event->hw.last_period;
+		perf_sample_data_init(&data, 0, hwc->last_period);
 
 		if (!x86_perf_event_set_period(event))
 			continue;
+
+
 		if (perf_event_overflow(event, &data, regs))
 			x86_pmu_stop(event, 0);
 	}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ddbb6a9..f325786 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1132,11 +1132,14 @@ struct perf_sample_data {
 	struct perf_branch_stack	*br_stack;
 };
 
-static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr)
+static inline void perf_sample_data_init(struct perf_sample_data *data,
+					 u64 addr, u64 period)
 {
+	/* remaining struct members initialized in perf_prepare_sample() */
 	data->addr = addr;
 	data->raw  = NULL;
 	data->br_stack = NULL;
+	data->period	= period;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9789a56..00c58df 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4957,7 +4957,7 @@ void __perf_sw_event(u32 event_id, u64 nr, struct pt_regs *regs, u64 addr)
 	if (rctx < 0)
 		return;
 
-	perf_sample_data_init(&data, addr);
+	perf_sample_data_init(&data, addr, 0);
 
 	do_perf_sw_event(PERF_TYPE_SOFTWARE, event_id, nr, &data, regs);
 
@@ -5215,7 +5215,7 @@ void perf_tp_event(u64 addr, u64 count, void *record, int entry_size,
 		.data = record,
 	};
 
-	perf_sample_data_init(&data, addr);
+	perf_sample_data_init(&data, addr, 0);
 	data.raw = &raw;
 
 	hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
@@ -5318,7 +5318,7 @@ void perf_bp_event(struct perf_event *bp, void *data)
 	struct perf_sample_data sample;
 	struct pt_regs *regs = data;
 
-	perf_sample_data_init(&sample, bp->attr.bp_addr);
+	perf_sample_data_init(&sample, bp->attr.bp_addr, 0);
 
 	if (!bp->hw.state && !perf_exclude_event(bp, regs))
 		perf_swevent_event(bp, 1, &sample, regs);
@@ -5344,8 +5344,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event->pmu->read(event);
 
-	perf_sample_data_init(&data, 0);
-	data.period = event->hw.last_period;
+	perf_sample_data_init(&data, 0, event->hw.last_period);
 	regs = get_irq_regs();
 
 	if (regs && !perf_exclude_event(event, regs)) {

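For illustration, a minimal sketch of the call-site conversion this
patch applies throughout (identifiers as in the hunks above):

	struct perf_sample_data data;

	/* before: the period had to be assigned in a separate step */
	perf_sample_data_init(&data, 0);
	data.period = event->hw.last_period;

	/* after: the last sampling period is passed at init time */
	perf_sample_data_init(&data, 0, event->hw.last_period);

Callers that do not sample a period, such as the software event paths
in kernel/events/core.c, simply pass 0 for the new argument.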

* [tip:perf/core] perf/x86-ibs: Enable ibs op micro-ops counting mode
  2012-04-02 18:19 ` [PATCH 03/12] perf/x86-ibs: Enable ibs op micro-ops counting mode Robert Richter
@ 2012-05-09 14:31   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  7bf352384fda3f678a283928c6c5b2cd9da877e4
Gitweb:     http://git.kernel.org/tip/7bf352384fda3f678a283928c6c5b2cd9da877e4
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:09 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:12 +0200

perf/x86-ibs: Enable ibs op micro-ops counting mode

Allow enabling ibs op micro-ops counting mode.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 2317228..ebf169f 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -468,6 +468,8 @@ static __init int perf_event_ibs_init(void)
 		return -ENODEV;	/* ibs not supported by the cpu */
 
 	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
+	if (ibs_caps & IBS_CAPS_OPCNT)
+		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 	register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
 	printk(KERN_INFO "perf: AMD IBS detected (0x%08x)\n", ibs_caps);

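A hypothetical userspace sketch of requesting the new counting mode
once IBS_CAPS_OPCNT is advertised; the PMU type must be read from
sysfs (e.g. /sys/bus/event_source/devices/ibs_op/type) and the period
is an arbitrary example value:

	#include <linux/perf_event.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define IBS_OP_CNT_CTL	(1ULL << 19)	/* IbsOpCntCtl, as in asm/perf_event.h */

	static int open_ibs_op_uops(__u32 ibs_op_pmu_type)
	{
		struct perf_event_attr attr = {
			.size		= sizeof(attr),
			.type		= ibs_op_pmu_type,	/* dynamic ibs_op pmu type */
			.config		= IBS_OP_CNT_CTL,	/* count micro-ops, not cycles */
			.sample_period	= 100000,
		};

		return syscall(__NR_perf_event_open, &attr,
			       -1 /* pid: all */, 0 /* cpu 0 */, -1, 0);
	}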

* [tip:perf/core] perf/x86-ibs: Fix frequency profiling
  2012-04-02 18:19 ` [PATCH 04/12] perf/x86-ibs: Fix frequency profiling Robert Richter
@ 2012-05-09 14:32   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  6accb9cf76080422d400a641d9068b6b2a2c216f
Gitweb:     http://git.kernel.org/tip/6accb9cf76080422d400a641d9068b6b2a2c216f
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:10 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:13 +0200

perf/x86-ibs: Fix frequency profiling

Fix profiling at a fixed frequency; the freq value and the sample
period were set up incorrectly. Since sampling periods are adjusted
anyway, we now also allow periods that have the lower 4 bits set.

Another fix is the setup of the hw counter: If we modify
hwc->sample_period, we also need to update hwc->last_period and
hwc->period_left.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-5-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index ebf169f..bc401bd 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -162,9 +162,16 @@ static int perf_ibs_init(struct perf_event *event)
 		if (config & perf_ibs->cnt_mask)
 			/* raw max_cnt may not be set */
 			return -EINVAL;
-		if (hwc->sample_period & 0x0f)
-			/* lower 4 bits can not be set in ibs max cnt */
+		if (!event->attr.sample_freq && hwc->sample_period & 0x0f)
+			/*
+			 * lower 4 bits can not be set in ibs max cnt,
+			 * but allowing it in case we adjust the
+			 * sample period to set a frequency.
+			 */
 			return -EINVAL;
+		hwc->sample_period &= ~0x0FULL;
+		if (!hwc->sample_period)
+			hwc->sample_period = 0x10;
 	} else {
 		max_cnt = config & perf_ibs->cnt_mask;
 		config &= ~perf_ibs->cnt_mask;
@@ -175,6 +182,13 @@ static int perf_ibs_init(struct perf_event *event)
 	if (!hwc->sample_period)
 		return -EINVAL;
 
+	/*
+	 * If we modify hwc->sample_period, we also need to update
+	 * hwc->last_period and hwc->period_left.
+	 */
+	hwc->last_period = hwc->sample_period;
+	local64_set(&hwc->period_left, hwc->sample_period);
+
 	hwc->config_base = perf_ibs->msr;
 	hwc->config = config;
 

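A worked sketch of the rounding performed above (the start value is
illustrative):

	u64 sample_period = 0x1234f;	/* as computed by the freq-adjustment code */

	sample_period &= ~0x0FULL;	/* 0x12340: max cnt cannot encode the lower 4 bits */
	if (!sample_period)
		sample_period = 0x10;	/* smallest period the hardware can express */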

* [tip:perf/core] perf/x86-ibs: Take instruction pointer from ibs sample
  2012-04-02 18:19 ` [PATCH 05/12] perf/x86-ibs: Take instruction pointer from ibs sample Robert Richter
@ 2012-05-09 14:33   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  d47e8238cd76f1ffa7c8cd30e08b8e9074fd597e
Gitweb:     http://git.kernel.org/tip/d47e8238cd76f1ffa7c8cd30e08b8e9074fd597e
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:11 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:13 +0200

perf/x86-ibs: Take instruction pointer from ibs sample

Each IBS sample contains a linear address of the instruction that
caused the sample to trigger. This address is more precise than the
rip that was taken from the interrupt handler's stack. Update the rip
with that address. We use this in the next patch to implement
precise-event sampling on AMD systems using IBS.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-6-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/perf_event.h        |    6 ++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   48 +++++++++++++++++++----------
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8a3c75d..4e40a64 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -158,6 +158,7 @@ struct x86_pmu_capability {
 #define IBS_CAPS_OPCNT			(1U<<4)
 #define IBS_CAPS_BRNTRGT		(1U<<5)
 #define IBS_CAPS_OPCNTEXT		(1U<<6)
+#define IBS_CAPS_RIPINVALIDCHK		(1U<<7)
 
 #define IBS_CAPS_DEFAULT		(IBS_CAPS_AVAIL		\
 					 | IBS_CAPS_FETCHSAM	\
@@ -170,14 +171,14 @@ struct x86_pmu_capability {
 #define IBSCTL_LVT_OFFSET_VALID		(1ULL<<8)
 #define IBSCTL_LVT_OFFSET_MASK		0x0F
 
-/* IbsFetchCtl bits/masks */
+/* ibs fetch bits/masks */
 #define IBS_FETCH_RAND_EN	(1ULL<<57)
 #define IBS_FETCH_VAL		(1ULL<<49)
 #define IBS_FETCH_ENABLE	(1ULL<<48)
 #define IBS_FETCH_CNT		0xFFFF0000ULL
 #define IBS_FETCH_MAX_CNT	0x0000FFFFULL
 
-/* IbsOpCtl bits */
+/* ibs op bits/masks */
 /* lower 4 bits of the current count are ignored: */
 #define IBS_OP_CUR_CNT		(0xFFFF0ULL<<32)
 #define IBS_OP_CNT_CTL		(1ULL<<19)
@@ -185,6 +186,7 @@ struct x86_pmu_capability {
 #define IBS_OP_ENABLE		(1ULL<<17)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
+#define IBS_RIP_INVALID		(1ULL<<38)
 
 extern u32 get_ibs_caps(void);
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index bc401bd..cc1f329 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -9,6 +9,7 @@
 #include <linux/perf_event.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/ptrace.h>
 
 #include <asm/apic.h>
 
@@ -382,7 +383,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_raw_record raw;
 	struct pt_regs regs;
 	struct perf_ibs_data ibs_data;
-	int offset, size, overflow, reenable;
+	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
 	u64 *buf, config;
 
@@ -413,28 +414,41 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
+	if (!perf_ibs_set_period(perf_ibs, hwc, &config))
+		goto out;	/* no sw counter overflow */
+
+	ibs_data.caps = ibs_caps;
+	size = 1;
+	offset = 1;
+	check_rip = (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_RIPINVALIDCHK));
+	if (event->attr.sample_type & PERF_SAMPLE_RAW)
+		offset_max = perf_ibs->offset_max;
+	else if (check_rip)
+		offset_max = 2;
+	else
+		offset_max = 1;
+	do {
+		rdmsrl(msr + offset, *buf++);
+		size++;
+		offset = find_next_bit(perf_ibs->offset_mask,
+				       perf_ibs->offset_max,
+				       offset + 1);
+	} while (offset < offset_max);
+	ibs_data.size = sizeof(u64) * size;
+
+	regs = *iregs;
+	if (!check_rip || !(ibs_data.regs[2] & IBS_RIP_INVALID))
+		instruction_pointer_set(&regs, ibs_data.regs[1]);
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		ibs_data.caps = ibs_caps;
-		size = 1;
-		offset = 1;
-		do {
-		    rdmsrl(msr + offset, *buf++);
-		    size++;
-		    offset = find_next_bit(perf_ibs->offset_mask,
-					   perf_ibs->offset_max,
-					   offset + 1);
-		} while (offset < perf_ibs->offset_max);
-		raw.size = sizeof(u32) + sizeof(u64) * size;
+		raw.size = sizeof(u32) + ibs_data.size;
 		raw.data = ibs_data.data;
 		data.raw = &raw;
 	}
 
-	regs = *iregs; /* XXX: update ip from ibs sample */
-
-	overflow = perf_ibs_set_period(perf_ibs, hwc, &config);
-	reenable = !(overflow && perf_event_overflow(event, &data, &regs));
-	config = (config >> 4) | (reenable ? perf_ibs->enable_mask : 0);
+	throttle = perf_event_overflow(event, &data, &regs);
+out:
+	config = (config >> 4) | (throttle ? 0 : perf_ibs->enable_mask);
 	perf_ibs_enable_event(hwc, config);
 
 	perf_event_update_userpage(event);

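The register indices used above follow the order in which the do/while
loop reads the IBS op MSR block into ibs_data.regs[], starting at
IbsOpCtl (MSRC001_1033); the enum names below are illustrative only,
the driver indexes the array directly:

	/* ibs_data.regs[] layout for ibs op samples: */
	enum {
		IBS_REG_OP_CTL	= 0,	/* IbsOpCtl:  enable bits and counters */
		IBS_REG_OP_RIP	= 1,	/* IbsOpRip:  linear address of the sampled op */
		IBS_REG_OP_DATA	= 2,	/* IbsOpData: status, incl. IbsRipInvalid (bit 38) */
	};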

* [tip:perf/core] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
  2012-05-02 10:33   ` [PATCH v2] " Robert Richter
  2012-05-02 11:14     ` Peter Zijlstra
@ 2012-05-09 14:34     ` tip-bot for Robert Richter
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  450bbd493d436f9eadd1b7828158f37559f26674
Gitweb:     http://git.kernel.org/tip/450bbd493d436f9eadd1b7828158f37559f26674
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 12 Mar 2012 12:54:32 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:14 +0200

perf/x86-ibs: Precise event sampling with IBS for AMD CPUs

This patch adds support for precise event sampling with IBS. There are
two counting modes to count either cycles or micro-ops. If the
corresponding performance counter events (hw events) are setup with
the precise flag set, the request is redirected to the ibs pmu:

 perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
 perf record -a -e r076:p ...          # same as -e cpu-cycles:p
 perf record -a -e r0C1:p ...          # use ibs op counting micro-ops

Each ibs sample contains a linear address that points to the
instruction that caused the sample to trigger. With ibs we have
skid 0. Thus, ibs supports precise levels 1 and 2. Samples are marked
with the PERF_EFLAGS_EXACT flag set. In rare cases the rip is invalid
when IBS was not able to record the rip correctly. Then the
PERF_EFLAGS_EXACT flag is cleared and the rip is taken from pt_regs.

V2:
* don't drop samples in precise level 2 if rip is invalid, instead
  support the PERF_EFLAGS_EXACT flag

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120502103309.GP18810@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd.c     |    7 +++-
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   73 ++++++++++++++++++++++++++++-
 2 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 589286f..6565226 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -134,8 +134,13 @@ static u64 amd_pmu_event_map(int hw_event)
 
 static int amd_pmu_hw_config(struct perf_event *event)
 {
-	int ret = x86_pmu_hw_config(event);
+	int ret;
 
+	/* pass precise event sampling to ibs: */
+	if (event->attr.precise_ip && get_ibs_caps())
+		return -ENOENT;
+
+	ret = x86_pmu_hw_config(event);
 	if (ret)
 		return ret;
 
diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index cc1f329..34dfa85 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -145,17 +145,80 @@ static struct perf_ibs *get_ibs_pmu(int type)
 	return NULL;
 }
 
+/*
+ * Use IBS for precise event sampling:
+ *
+ *  perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
+ *  perf record -a -e r076:p ...          # same as -e cpu-cycles:p
+ *  perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
+ *
+ * IbsOpCntCtl (bit 19) of IBS Execution Control Register (IbsOpCtl,
+ * MSRC001_1033) is used to select either cycle or micro-ops counting
+ * mode.
+ *
+ * The rip of IBS samples has skid 0. Thus, IBS supports precise
+ * levels 1 and 2 and the PERF_EFLAGS_EXACT is set. In rare cases the
+ * rip is invalid when IBS was not able to record the rip correctly.
+ * We clear PERF_EFLAGS_EXACT and take the rip from pt_regs then.
+ *
+ */
+static int perf_ibs_precise_event(struct perf_event *event, u64 *config)
+{
+	switch (event->attr.precise_ip) {
+	case 0:
+		return -ENOENT;
+	case 1:
+	case 2:
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	switch (event->attr.type) {
+	case PERF_TYPE_HARDWARE:
+		switch (event->attr.config) {
+		case PERF_COUNT_HW_CPU_CYCLES:
+			*config = 0;
+			return 0;
+		}
+		break;
+	case PERF_TYPE_RAW:
+		switch (event->attr.config) {
+		case 0x0076:
+			*config = 0;
+			return 0;
+		case 0x00C1:
+			*config = IBS_OP_CNT_CTL;
+			return 0;
+		}
+		break;
+	default:
+		return -ENOENT;
+	}
+
+	return -EOPNOTSUPP;
+}
+
 static int perf_ibs_init(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs;
 	u64 max_cnt, config;
+	int ret;
 
 	perf_ibs = get_ibs_pmu(event->attr.type);
-	if (!perf_ibs)
+	if (perf_ibs) {
+		config = event->attr.config;
+	} else {
+		perf_ibs = &perf_ibs_op;
+		ret = perf_ibs_precise_event(event, &config);
+		if (ret)
+			return ret;
+	}
+
+	if (event->pmu != &perf_ibs->pmu)
 		return -ENOENT;
 
-	config = event->attr.config;
 	if (config & ~perf_ibs->config_mask)
 		return -EINVAL;
 
@@ -437,8 +500,12 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	ibs_data.size = sizeof(u64) * size;
 
 	regs = *iregs;
-	if (!check_rip || !(ibs_data.regs[2] & IBS_RIP_INVALID))
+	if (check_rip && (ibs_data.regs[2] & IBS_RIP_INVALID)) {
+		regs.flags &= ~PERF_EFLAGS_EXACT;
+	} else {
 		instruction_pointer_set(&regs, ibs_data.regs[1]);
+		regs.flags |= PERF_EFLAGS_EXACT;
+	}
 
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
 		raw.size = sizeof(u32) + ibs_data.size;

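A hypothetical sketch of the attribute setup behind 'perf record -e
cpu-cycles:p' after this change (the period is an arbitrary example):

	struct perf_event_attr attr = {
		.size		= sizeof(attr),
		.type		= PERF_TYPE_HARDWARE,
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.sample_period	= 100000,
		.sample_type	= PERF_SAMPLE_IP,
		.precise_ip	= 1,	/* 2 for :pp; both map to IBS here */
	};

With precise_ip set, amd_pmu_hw_config() returns -ENOENT, the core
retries the other registered PMUs, and perf_ibs_init() accepts the
event via perf_ibs_precise_event().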

* [tip:perf/core] perf/x86-ibs: Rename some variables
  2012-04-02 18:19 ` [PATCH 07/12] perf/x86-ibs: Rename some variables Robert Richter
@ 2012-05-09 14:34   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  98112d2e957e0d348f06d8a40f2f720204a70b55
Gitweb:     http://git.kernel.org/tip/98112d2e957e0d348f06d8a40f2f720204a70b55
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:13 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:14 +0200

perf/x86-ibs: Rename some variables

Simple patch that just renames some variables for better
understanding.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-8-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 34dfa85..29a1bff 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -62,7 +62,7 @@ struct perf_ibs_data {
 };
 
 static int
-perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *count)
+perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_period)
 {
 	s64 left = local64_read(&hwc->period_left);
 	s64 period = hwc->sample_period;
@@ -91,7 +91,7 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *count)
 	if (left > max)
 		left = max;
 
-	*count = (u64)left;
+	*hw_period = (u64)left;
 
 	return overflow;
 }
@@ -262,13 +262,13 @@ static int perf_ibs_init(struct perf_event *event)
 static int perf_ibs_set_period(struct perf_ibs *perf_ibs,
 			       struct hw_perf_event *hwc, u64 *period)
 {
-	int ret;
+	int overflow;
 
 	/* ignore lower 4 bits in min count: */
-	ret = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
+	overflow = perf_event_set_period(hwc, 1<<4, perf_ibs->max_period, period);
 	local64_set(&hwc->prev_count, 0);
 
-	return ret;
+	return overflow;
 }
 
 static u64 get_ibs_fetch_count(u64 config)


* [tip:perf/core] perf/x86-ibs: Trigger overflow if remaining period is too small
  2012-04-02 18:19 ` [PATCH 08/12] perf/x86-ibs: Trigger overflow if remaining period is too small Robert Richter
@ 2012-05-09 14:35   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  fc006cf7cc7471e1bdf34e40111971e03622af6c
Gitweb:     http://git.kernel.org/tip/fc006cf7cc7471e1bdf34e40111971e03622af6c
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:14 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:15 +0200

perf/x86-ibs: Trigger overflow if remaining period is too small

There are cases where the remaining period is smaller than the minimal
possible value. In this case the counter is restarted with the minimal
period. This is of no use as the interrupt handler will trigger
immediately again and will most likely interrupt itself. This biases
the results.

So, if the remaining period is below the minimum, we do not restart
the counter and instead trigger the overflow right away.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-9-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 29a1bff..3e32908 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -78,16 +78,13 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_perio
 		overflow = 1;
 	}
 
-	if (unlikely(left <= 0)) {
+	if (unlikely(left < (s64)min)) {
 		left += period;
 		local64_set(&hwc->period_left, left);
 		hwc->last_period = period;
 		overflow = 1;
 	}
 
-	if (unlikely(left < min))
-		left = min;
-
 	if (left > max)
 		left = max;
 

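A worked sketch with illustrative numbers (min is 1<<4 for ibs):

	s64 min = 0x10, period = 0x10000;	/* configured sample period */
	s64 left = 0x8;				/* remainder, below the minimum */

	if (left < min) {
		left += period;	/* restart with 0x10008 ... */
		/* ... and report the sw overflow now, instead of
		   restarting with the minimal period and taking the
		   next NMI almost immediately */
	}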

* [tip:perf/core] perf/x86-ibs: Extend hw period that triggers overflow
  2012-04-02 18:19 ` [PATCH 09/12] perf/x86-ibs: Extend hw period that triggers overflow Robert Richter
@ 2012-05-09 14:36   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  7caaf4d8241feecafb87919402b0a6dbb1b71d9e
Gitweb:     http://git.kernel.org/tip/7caaf4d8241feecafb87919402b0a6dbb1b71d9e
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:15 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:15 +0200

perf/x86-ibs: Extend hw period that triggers overflow

If the last hw period is too short, we might re-enter the irq handler
immediately, which biases the results. Thus, make the last hw period
as long as possible (up to the max period) so that it triggers the sw
overflow.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-10-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 3e32908..cb51a3e 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -85,8 +85,19 @@ perf_event_set_period(struct hw_perf_event *hwc, u64 min, u64 max, u64 *hw_perio
 		overflow = 1;
 	}
 
-	if (left > max)
-		left = max;
+	/*
+	 * If the hw period that triggers the sw overflow is too short
+	 * we might hit the irq handler. This biases the results.
+	 * Thus we shorten the next-to-last period and set the last
+	 * period to the max period.
+	 */
+	if (left > max) {
+		left -= max;
+		if (left > max)
+			left = max;
+		else if (left < min)
+			left = min;
+	}
 
 	*hw_period = (u64)left;
 

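A worked trace with illustrative numbers:

	/* left = 0x100010, max = 0xFFFF0, min = 0x10            */
	/* old: program 0xFFFF0 now, leaving a final 0x20-tick   */
	/*      period that re-enters the irq handler instantly  */
	/* new: left -= max, so 0x20 is programmed now and the   */
	/*      last hw period becomes the full 0xFFFF0          */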

* [tip:perf/core] perf/x86-ibs: Implement workaround for IBS erratum #420
  2012-04-02 18:19 ` [PATCH 10/12] perf/x86-ibs: Implement workaround for IBS erratum #420 Robert Richter
@ 2012-05-09 14:37   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  c9574fe0bdb9ac9a2698e02a712088ce8431e9f8
Gitweb:     http://git.kernel.org/tip/c9574fe0bdb9ac9a2698e02a712088ce8431e9f8
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:16 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:16 +0200

perf/x86-ibs: Implement workaround for IBS erratum #420

When disabling ibs, the hardware may continuously generate
interrupts. This is described in erratum #420 (Instruction-Based
Sampling Engine May Generate Interrupt that Cannot Be Cleared). To
avoid this, we must clear the counter mask first and then clear the
enable bit. This patch implements that sequence.

See the Revision Guide for AMD Family 10h Processors, Publication #41322.

Note: We now keep track of the last ibs config value read, which is
then used to disable ibs. To update the config value, we now pass a
pointer to the functions reading it.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-11-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   62 +++++++++++++++++++-----------
 1 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index cb51a3e..b14e711 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -291,20 +291,36 @@ static u64 get_ibs_op_count(u64 config)
 
 static void
 perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
-		      u64 config)
+		      u64 *config)
 {
-	u64 count = perf_ibs->get_count(config);
+	u64 count = perf_ibs->get_count(*config);
 
 	while (!perf_event_try_update(event, count, 20)) {
-		rdmsrl(event->hw.config_base, config);
-		count = perf_ibs->get_count(config);
+		rdmsrl(event->hw.config_base, *config);
+		count = perf_ibs->get_count(*config);
 	}
 }
 
-/* Note: The enable mask must be encoded in the config argument. */
-static inline void perf_ibs_enable_event(struct hw_perf_event *hwc, u64 config)
+static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
+					 struct hw_perf_event *hwc, u64 config)
 {
-	wrmsrl(hwc->config_base, hwc->config | config);
+	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+}
+
+/*
+ * Erratum #420 Instruction-Based Sampling Engine May Generate
+ * Interrupt that Cannot Be Cleared:
+ *
+ * Must clear counter mask first, then clear the enable bit. See
+ * Revision Guide for AMD Family 10h Processors, Publication #41322.
+ */
+static inline void perf_ibs_disable_event(struct perf_ibs *perf_ibs,
+					  struct hw_perf_event *hwc, u64 config)
+{
+	config &= ~perf_ibs->cnt_mask;
+	wrmsrl(hwc->config_base, config);
+	config &= ~perf_ibs->enable_mask;
+	wrmsrl(hwc->config_base, config);
 }
 
 /*
@@ -318,7 +334,7 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 config;
+	u64 period;
 
 	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
 		return;
@@ -326,10 +342,9 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
 	hwc->state = 0;
 
-	perf_ibs_set_period(perf_ibs, hwc, &config);
-	config = (config >> 4) | perf_ibs->enable_mask;
+	perf_ibs_set_period(perf_ibs, hwc, &period);
 	set_bit(IBS_STARTED, pcpu->state);
-	perf_ibs_enable_event(hwc, config);
+	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
 }
@@ -339,7 +354,7 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 val;
+	u64 config;
 	int stopping;
 
 	stopping = test_and_clear_bit(IBS_STARTED, pcpu->state);
@@ -347,12 +362,11 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (!stopping && (hwc->state & PERF_HES_UPTODATE))
 		return;
 
-	rdmsrl(hwc->config_base, val);
+	rdmsrl(hwc->config_base, config);
 
 	if (stopping) {
 		set_bit(IBS_STOPPING, pcpu->state);
-		val &= ~perf_ibs->enable_mask;
-		wrmsrl(hwc->config_base, val);
+		perf_ibs_disable_event(perf_ibs, hwc, config);
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
 	}
@@ -360,7 +374,7 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
-	perf_ibs_event_update(perf_ibs, event, val);
+	perf_ibs_event_update(perf_ibs, event, &config);
 	hwc->state |= PERF_HES_UPTODATE;
 }
 
@@ -456,7 +470,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_ibs_data ibs_data;
 	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
-	u64 *buf, config;
+	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
 		/* Catch spurious interrupts after stopping IBS: */
@@ -477,15 +491,15 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	 * supported in all cpus. As this triggered an interrupt, we
 	 * set the current count to the max count.
 	 */
-	config = ibs_data.regs[0];
+	config = &ibs_data.regs[0];
 	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		config &= ~IBS_OP_CUR_CNT;
-		config |= (config & IBS_OP_MAX_CNT) << 36;
+		*config &= ~IBS_OP_CUR_CNT;
+		*config |= (*config & IBS_OP_MAX_CNT) << 36;
 	}
 
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
-	if (!perf_ibs_set_period(perf_ibs, hwc, &config))
+	if (!perf_ibs_set_period(perf_ibs, hwc, &period))
 		goto out;	/* no sw counter overflow */
 
 	ibs_data.caps = ibs_caps;
@@ -523,8 +537,10 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 
 	throttle = perf_event_overflow(event, &data, &regs);
 out:
-	config = (config >> 4) | (throttle ? 0 : perf_ibs->enable_mask);
-	perf_ibs_enable_event(hwc, config);
+	if (throttle)
+		perf_ibs_disable_event(perf_ibs, hwc, *config);
+	else
+		perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
 

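Expanded for the ibs op case, the disable path above boils down to two
MSR writes in a fixed order (a sketch; cnt_mask is IBS_OP_MAX_CNT for
perf_ibs_op in this series):

	config &= ~IBS_OP_MAX_CNT;		/* 1) clear the counter mask ... */
	wrmsrl(MSR_AMD64_IBSOPCTL, config);
	config &= ~IBS_OP_ENABLE;		/* 2) ... only then clear IbsOpEn */
	wrmsrl(MSR_AMD64_IBSOPCTL, config);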

* [tip:perf/core] perf/x86-ibs: Catch spurious interrupts after stopping IBS
  2012-04-02 18:19 ` [PATCH 11/12] perf/x86-ibs: Catch spurious interrupts after stopping ibs Robert Richter
@ 2012-05-09 14:38   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  fc5fb2b5e1874e5894e2ac503bfb744220db89a1
Gitweb:     http://git.kernel.org/tip/fc5fb2b5e1874e5894e2ac503bfb744220db89a1
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:17 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:16 +0200

perf/x86-ibs: Catch spurious interrupts after stopping IBS

After disabling IBS there could still be incoming NMIs with samples
that even have the valid bit cleared. Mark all these NMIs as handled
to avoid spurious interrupt messages.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-12-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index b14e711..5a9f95b 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -473,11 +473,13 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
-		/* Catch spurious interrupts after stopping IBS: */
-		if (!test_and_clear_bit(IBS_STOPPING, pcpu->state))
-			return 0;
-		rdmsrl(perf_ibs->msr, *ibs_data.regs);
-		return (*ibs_data.regs & perf_ibs->valid_mask) ? 1 : 0;
+		/*
+		 * Catch spurious interrupts after stopping IBS: After
+		 * disabling IBS there could still be incoming NMIs
+		 * with samples that even have the valid bit cleared.
+		 * Mark all these NMIs as handled.
+		 */
+		return test_and_clear_bit(IBS_STOPPING, pcpu->state) ? 1 : 0;
 	}
 
 	msr = hwc->config_base;


* [tip:perf/core] perf/x86-ibs: Fix usage of IBS op current count
  2012-04-02 18:19 ` [PATCH 12/12] perf/x86-ibs: Fix usage of IBS op current count Robert Richter
@ 2012-05-09 14:39   ` tip-bot for Robert Richter
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot for Robert Richter @ 2012-05-09 14:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, robert.richter, a.p.zijlstra, tglx

Commit-ID:  8b1e13638d465863572c8207a5cfceeef0cf0441
Gitweb:     http://git.kernel.org/tip/8b1e13638d465863572c8207a5cfceeef0cf0441
Author:     Robert Richter <robert.richter@amd.com>
AuthorDate: Mon, 2 Apr 2012 20:19:18 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 9 May 2012 15:23:17 +0200

perf/x86-ibs: Fix usage of IBS op current count

The value of IbsOpCurCnt rolls over when it reaches IbsOpMaxCnt; it
is then reset to zero by the hardware. To get the correct count, we
need to add the max count to it whenever we received an ibs sample
(valid bit set).

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333390758-10893-13-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event_amd_ibs.c |   33 +++++++++++++++++++----------
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_amd_ibs.c b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
index 5a9f95b..da9bcdc 100644
--- a/arch/x86/kernel/cpu/perf_event_amd_ibs.c
+++ b/arch/x86/kernel/cpu/perf_event_amd_ibs.c
@@ -286,7 +286,15 @@ static u64 get_ibs_fetch_count(u64 config)
 
 static u64 get_ibs_op_count(u64 config)
 {
-	return (config & IBS_OP_CUR_CNT) >> 32;
+	u64 count = 0;
+
+	if (config & IBS_OP_VAL)
+		count += (config & IBS_OP_MAX_CNT) << 4; /* cnt rolled over */
+
+	if (ibs_caps & IBS_CAPS_RDWROPCNT)
+		count += (config & IBS_OP_CUR_CNT) >> 32;
+
+	return count;
 }
 
 static void
@@ -295,7 +303,12 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
 {
 	u64 count = perf_ibs->get_count(*config);
 
-	while (!perf_event_try_update(event, count, 20)) {
+	/*
+	 * Set width to 64 since we do not overflow on max width but
+	 * instead on max count. In perf_ibs_set_period() we clear
+	 * prev count manually on overflow.
+	 */
+	while (!perf_event_try_update(event, count, 64)) {
 		rdmsrl(event->hw.config_base, *config);
 		count = perf_ibs->get_count(*config);
 	}
@@ -374,6 +387,12 @@ static void perf_ibs_stop(struct perf_event *event, int flags)
 	if (hwc->state & PERF_HES_UPTODATE)
 		return;
 
+	/*
+	 * Clear valid bit to not count rollovers on update, rollovers
+	 * are only updated in the irq handler.
+	 */
+	config &= ~perf_ibs->valid_mask;
+
 	perf_ibs_event_update(perf_ibs, event, &config);
 	hwc->state |= PERF_HES_UPTODATE;
 }
@@ -488,17 +507,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (!(*buf++ & perf_ibs->valid_mask))
 		return 0;
 
-	/*
-	 * Emulate IbsOpCurCnt in MSRC001_1033 (IbsOpCtl), not
-	 * supported in all cpus. As this triggered an interrupt, we
-	 * set the current count to the max count.
-	 */
 	config = &ibs_data.regs[0];
-	if (perf_ibs == &perf_ibs_op && !(ibs_caps & IBS_CAPS_RDWROPCNT)) {
-		*config &= ~IBS_OP_CUR_CNT;
-		*config |= (*config & IBS_OP_MAX_CNT) << 36;
-	}
-
 	perf_ibs_event_update(perf_ibs, event, config);
 	perf_sample_data_init(&data, 0, hwc->last_period);
 	if (!perf_ibs_set_period(perf_ibs, hwc, &period))

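A worked example of the corrected count, assuming the valid bit is set
and IBS_CAPS_RDWROPCNT is available (field values are illustrative):

	/* IbsOpMaxCnt field = 0x1000 -> 0x1000 << 4 = 0x10000 rolled-over ops */
	/* IbsOpCurCnt field = 0x0230 -> 0x230 ops counted since the rollover  */
	/* count = 0x10000 + 0x230 = 0x10230                                   */

Without IBS_CAPS_RDWROPCNT the current count cannot be read back, so
only the rolled-over portion is accounted.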


Thread overview: 48+ messages
2012-04-02 18:19 [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
2012-04-02 18:19 ` [PATCH 01/12] perf/x86-ibs: Fix update of period Robert Richter
2012-05-09 14:29   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 02/12] perf: Pass last sampling period to perf_sample_data_init() Robert Richter
2012-05-09 14:30   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 03/12] perf/x86-ibs: Enable ibs op micro-ops counting mode Robert Richter
2012-05-09 14:31   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 04/12] perf/x86-ibs: Fix frequency profiling Robert Richter
2012-05-09 14:32   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 05/12] perf/x86-ibs: Take instruction pointer from ibs sample Robert Richter
2012-05-09 14:33   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Robert Richter
2012-04-14 10:21   ` Peter Zijlstra
2012-04-23  9:56     ` Robert Richter
2012-04-27 12:34       ` Robert Richter
2012-04-27 12:39         ` Stephane Eranian
2012-04-27 12:54           ` Robert Richter
2012-04-27 13:10             ` Stephane Eranian
2012-04-27 15:18               ` Robert Richter
2012-04-27 15:30                 ` Peter Zijlstra
2012-04-27 15:57                   ` Stephane Eranian
2012-04-27 15:30             ` Peter Zijlstra
2012-04-27 16:09               ` Robert Richter
2012-04-27 16:21                 ` Peter Zijlstra
2012-04-27 16:23                   ` Stephane Eranian
2012-04-14 10:22   ` Peter Zijlstra
2012-04-23  8:41     ` Robert Richter
2012-04-23 10:36       ` Peter Zijlstra
2012-04-14 10:24   ` Peter Zijlstra
2012-04-23 10:08     ` Robert Richter
2012-05-02 10:33   ` [PATCH v2] " Robert Richter
2012-05-02 11:14     ` Peter Zijlstra
2012-05-04 17:53       ` Peter Zijlstra
2012-05-09 14:34     ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 07/12] perf/x86-ibs: Rename some variables Robert Richter
2012-05-09 14:34   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 08/12] perf/x86-ibs: Trigger overflow if remaining period is too small Robert Richter
2012-05-09 14:35   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 09/12] perf/x86-ibs: Extend hw period that triggers overflow Robert Richter
2012-05-09 14:36   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 10/12] perf/x86-ibs: Implement workaround for IBS erratum #420 Robert Richter
2012-05-09 14:37   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 11/12] perf/x86-ibs: Catch spurious interrupts after stopping ibs Robert Richter
2012-05-09 14:38   ` [tip:perf/core] perf/x86-ibs: Catch spurious interrupts after stopping IBS tip-bot for Robert Richter
2012-04-02 18:19 ` [PATCH 12/12] perf/x86-ibs: Fix usage of IBS op current count Robert Richter
2012-05-09 14:39   ` [tip:perf/core] " tip-bot for Robert Richter
2012-04-02 19:11 ` [PATCH 00/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs Ingo Molnar
2012-04-03 10:48   ` Robert Richter
