All of lore.kernel.org
* [PATCH 0/4] bug fix mmap read in large PEBS
@ 2017-12-18 11:34 kan.liang
  2017-12-18 11:34 ` [PATCH 1/4] perf/x86/intel: pass auto-reload information to event update kan.liang
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: kan.liang @ 2017-12-18 11:34 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: tglx, jolsa, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

There is a bug when reading event->count via mmap with large PEBS enabled.
Here is an example:
 #./read_count
 0x71f0
 0x122c0
 0x1000000001c54
 0x100000001257d
 0x200000000bdc5

The bug is caused by two issues.
- In x86_perf_event_update, the calculation of event->count does not
  take the auto-reload values into account.
- In x86_pmu_read, the undrained records still sitting in the large PEBS
  buffer are not counted.

The issue was introduced by commit 851559e35fd5 ("perf/x86/intel: Use
the PEBS auto reload mechanism when possible"), which enabled the
auto-reload mechanism.

The source code of read_count is as below.

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

typedef uint64_t u64;

struct cpu {
        int fd;
        struct perf_event_mmap_page *buf;
};

int perf_open(struct cpu *ctx, int cpu)
{
        struct perf_event_attr attr = {
                .type = PERF_TYPE_HARDWARE,
                .size = sizeof(struct perf_event_attr),
                .sample_period = 100000,
                .config = 0,
                .sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU,
                .precise_ip = 3,
                .mmap = 1,
                .comm = 1,
                .task = 1,
                .mmap2 = 1,
                .sample_id_all = 1,
                .comm_exec = 1,
        };
        ctx->buf = NULL;
        ctx->fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
        if (ctx->fd < 0) {
                perror("perf_event_open");
                return -1;
        }

        ctx->buf = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, ctx->fd, 0);
        if (ctx->buf == MAP_FAILED) {
                close(ctx->fd);
                perror("mmap on perf fd");
                return -1;
        }
        return 0;
}

void perf_close(struct cpu *ctx)
{
        close(ctx->fd);
        if (ctx->buf)
                munmap(ctx->buf, sysconf(_SC_PAGESIZE));
}

int main(int ac, char **av)
{
        struct cpu ctx;
        u64 count;

        if (perf_open(&ctx, 0) < 0)
                return 1;

        while (1) {
                sleep(5);

                if (read(ctx.fd, &count, 8) != 8) {
                        perror("counter read");
                        break;
                }
                printf("0x%llx\n", count);

        }
        perf_close(&ctx);
        return 0;
}

Kan Liang (4):
  perf/x86/intel: pass auto-reload information to event update
  perf/x86/intel: fix event update for auto-reload
  perf/x86: introduce read function for x86_pmu
  perf/x86/intel: drain PEBS buffer in event read

 arch/x86/events/core.c       | 26 ++++++++++++++++++++++----
 arch/x86/events/intel/core.c | 18 ++++++++++++++----
 arch/x86/events/intel/ds.c   | 18 +++++++++++++++++-
 arch/x86/events/intel/knc.c  |  2 +-
 arch/x86/events/intel/p4.c   |  2 +-
 arch/x86/events/perf_event.h |  9 +++++++--
 6 files changed, 62 insertions(+), 13 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] perf/x86/intel: pass auto-reload information to event update
  2017-12-18 11:34 [PATCH 0/4] bug fix mmap read in large PEBS kan.liang
@ 2017-12-18 11:34 ` kan.liang
  2017-12-18 11:34 ` [PATCH 2/4] perf/x86/intel: fix event update for auto-reload kan.liang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2017-12-18 11:34 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: tglx, jolsa, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The auto-reload mechanism is enabled for PEBS events in fixed period
mode. When calculating event->count, the reload value also needs to be
taken into account.

Pass the auto-reload value and the number of reloads to the event
update functions. They will be used in a later patch.

No functional change.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       | 9 +++++----
 arch/x86/events/intel/core.c | 9 +++++----
 arch/x86/events/intel/ds.c   | 2 +-
 arch/x86/events/intel/knc.c  | 2 +-
 arch/x86/events/intel/p4.c   | 2 +-
 arch/x86/events/perf_event.h | 6 ++++--
 6 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 140d332..35552ea 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -64,7 +64,8 @@ u64 __read_mostly hw_cache_extra_regs
  * Can only be executed on the CPU where the event is active.
  * Returns the delta events processed.
  */
-u64 x86_perf_event_update(struct perf_event *event)
+u64 x86_perf_event_update(struct perf_event *event,
+			  u64 reload_val, int reload_times)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	int shift = 64 - x86_pmu.cntval_bits;
@@ -1357,7 +1358,7 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		 * Drain the remaining delta count out of a event
 		 * that we are disabling:
 		 */
-		x86_perf_event_update(event);
+		x86_perf_event_update(event, 0, 0);
 		hwc->state |= PERF_HES_UPTODATE;
 	}
 }
@@ -1456,7 +1457,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 
 		event = cpuc->events[idx];
 
-		val = x86_perf_event_update(event);
+		val = x86_perf_event_update(event, 0, 0);
 		if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
 			continue;
 
@@ -1884,7 +1885,7 @@ early_initcall(init_hw_perf_events);
 
 static inline void x86_pmu_read(struct perf_event *event)
 {
-	x86_perf_event_update(event);
+	x86_perf_event_update(event, 0, 0);
 }
 
 /*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 09c26a4..1f7edaf 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1965,7 +1965,7 @@ static void intel_pmu_nhm_workaround(void)
 	for (i = 0; i < 4; i++) {
 		event = cpuc->events[i];
 		if (event)
-			x86_perf_event_update(event);
+			x86_perf_event_update(event, 0, 0);
 	}
 
 	for (i = 0; i < 4; i++) {
@@ -2135,9 +2135,10 @@ static void intel_pmu_add_event(struct perf_event *event)
  * Save and restart an expired event. Called by NMI contexts,
  * so it has to be careful about preempting normal event ops:
  */
-int intel_pmu_save_and_restart(struct perf_event *event)
+int intel_pmu_save_and_restart(struct perf_event *event,
+			       u64 reload_val, int reload_times)
 {
-	x86_perf_event_update(event);
+	x86_perf_event_update(event, reload_val, reload_times);
 	/*
 	 * For a checkpointed counter always reset back to 0.  This
 	 * avoids a situation where the counter overflows, aborts the
@@ -2299,7 +2300,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 		if (!test_bit(bit, cpuc->active_mask))
 			continue;
 
-		if (!intel_pmu_save_and_restart(event))
+		if (!intel_pmu_save_and_restart(event, 0, 0))
 			continue;
 
 		perf_sample_data_init(&data, 0, event->hw.last_period);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3674a4b..0b693b7 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1260,7 +1260,7 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	struct pt_regs regs;
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 
-	if (!intel_pmu_save_and_restart(event) &&
+	if (!intel_pmu_save_and_restart(event, 0, 0) &&
 	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
 		return;
 
diff --git a/arch/x86/events/intel/knc.c b/arch/x86/events/intel/knc.c
index 618001c..b42a49d 100644
--- a/arch/x86/events/intel/knc.c
+++ b/arch/x86/events/intel/knc.c
@@ -247,7 +247,7 @@ static int knc_pmu_handle_irq(struct pt_regs *regs)
 		if (!test_bit(bit, cpuc->active_mask))
 			continue;
 
-		if (!intel_pmu_save_and_restart(event))
+		if (!intel_pmu_save_and_restart(event, 0, 0))
 			continue;
 
 		perf_sample_data_init(&data, 0, event->hw.last_period);
diff --git a/arch/x86/events/intel/p4.c b/arch/x86/events/intel/p4.c
index d32c0ee..672bf5e 100644
--- a/arch/x86/events/intel/p4.c
+++ b/arch/x86/events/intel/p4.c
@@ -1024,7 +1024,7 @@ static int p4_pmu_handle_irq(struct pt_regs *regs)
 		/* it might be unflagged overflow */
 		overflow = p4_pmu_clear_cccr_ovf(hwc);
 
-		val = x86_perf_event_update(event);
+		val = x86_perf_event_update(event, 0, 0);
 		if (!overflow && (val & (1ULL << (x86_pmu.cntval_bits - 1))))
 			continue;
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index f7aaadf..fabb8b3 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -738,7 +738,8 @@ extern u64 __read_mostly hw_cache_extra_regs
 				[PERF_COUNT_HW_CACHE_OP_MAX]
 				[PERF_COUNT_HW_CACHE_RESULT_MAX];
 
-u64 x86_perf_event_update(struct perf_event *event);
+u64 x86_perf_event_update(struct perf_event *event,
+			  u64 reload_val, int reload_times);
 
 static inline unsigned int x86_pmu_config_addr(int index)
 {
@@ -871,7 +872,8 @@ static inline bool intel_pmu_has_bts(struct perf_event *event)
 	return false;
 }
 
-int intel_pmu_save_and_restart(struct perf_event *event);
+int intel_pmu_save_and_restart(struct perf_event *event,
+			       u64 reload_val, int reload_times);
 
 struct event_constraint *
 x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
-- 
2.7.4


* [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-18 11:34 [PATCH 0/4] bug fix mmap read in large PEBS kan.liang
  2017-12-18 11:34 ` [PATCH 1/4] perf/x86/intel: pass auto-reload information to event update kan.liang
@ 2017-12-18 11:34 ` kan.liang
  2017-12-19 18:58   ` Peter Zijlstra
  2017-12-18 11:34 ` [PATCH 3/4] perf/x86: introduce read function for x86_pmu kan.liang
  2017-12-18 11:34 ` [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read kan.liang
  3 siblings, 1 reply; 13+ messages in thread
From: kan.liang @ 2017-12-18 11:34 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: tglx, jolsa, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

There is a bug when reading event->count via mmap with large PEBS enabled.
Here is an example:
 #./read_count
 0x71f0
 0x122c0
 0x1000000001c54
 0x100000001257d
 0x200000000bdc5

The auto-reload mechanism is enabled for PEBS events in fixed period
mode, but the calculation of event->count does not take the auto-reload
values into account. Anyone who reads event->count will get a wrong
result, e.g. x86_pmu_read. The calculation of hwc->period_left is wrong
as well, which impacts the accuracy of the period for the first record
in a PEBS multi-record batch.

The issue was introduced by commit 851559e35fd5 ("perf/x86/intel: Use
the PEBS auto reload mechanism when possible"), which enabled the
auto-reload mechanism.

Every auto-reload except the last one ran through a whole period (the
reload value), so period * times must be added to event->count.
For the last reload, the elapsed delta must be corrected by adding one
period (the reload value), because the counter starts at -period.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c     | 14 ++++++++++++++
 arch/x86/events/intel/ds.c |  8 +++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 35552ea..f74e21d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
 	 * of the count.
 	 */
 	delta = (new_raw_count << shift) - (prev_raw_count << shift);
+
+	/*
+	 * Take auto-reload into account
+	 * For the auto-reload before the last time, it went through the
+	 * whole period (reload_val) every time.
+	 * Just simply add period * times to the event.
+	 *
+	 * For the last load, the elapsed delta (event-)time need to be
+	 * corrected by adding the period. Because the start point is -period.
+	 */
+	if (reload_times > 0) {
+		delta += (reload_val << shift);
+		local64_add(reload_val * (reload_times - 1), &event->count);
+	}
 	delta >>= shift;
 
 	local64_add(delta, &event->count);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 0b693b7..f0f6026 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 				   void *base, void *top,
 				   int bit, int count)
 {
+	struct hw_perf_event *hwc = &event->hw;
 	struct perf_sample_data data;
 	struct pt_regs regs;
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 
-	if (!intel_pmu_save_and_restart(event, 0, 0) &&
+	/*
+	 * Now, auto-reload is only enabled in fixed period mode.
+	 * The reload value is always hwc->sample_period.
+	 * May need to change it, if auto-reload is enabled in freq mode later.
+	 */
+	if (!intel_pmu_save_and_restart(event, hwc->sample_period, count - 1) &&
 	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
 		return;
 
-- 
2.7.4


* [PATCH 3/4] perf/x86: introduce read function for x86_pmu
  2017-12-18 11:34 [PATCH 0/4] bug fix mmap read in large PEBS kan.liang
  2017-12-18 11:34 ` [PATCH 1/4] perf/x86/intel: pass auto-reload information to event update kan.liang
  2017-12-18 11:34 ` [PATCH 2/4] perf/x86/intel: fix event update for auto-reload kan.liang
@ 2017-12-18 11:34 ` kan.liang
  2017-12-18 11:34 ` [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read kan.liang
  3 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2017-12-18 11:34 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: tglx, jolsa, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Large PEBS needs special handling when the event count is read.
Introduce a read function pointer in struct x86_pmu so that a
PMU-specific read implementation can be provided.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       | 3 +++
 arch/x86/events/perf_event.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f74e21d..c781497 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1899,6 +1899,9 @@ early_initcall(init_hw_perf_events);
 
 static inline void x86_pmu_read(struct perf_event *event)
 {
+	if (x86_pmu.read)
+		return x86_pmu.read(event);
+
 	x86_perf_event_update(event, 0, 0);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index fabb8b3..3a7a2e7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -534,6 +534,7 @@ struct x86_pmu {
 	void		(*disable)(struct perf_event *);
 	void		(*add)(struct perf_event *);
 	void		(*del)(struct perf_event *);
+	void		(*read)(struct perf_event *event);
 	int		(*hw_config)(struct perf_event *event);
 	int		(*schedule_events)(struct cpu_hw_events *cpuc, int n, int *assign);
 	unsigned	eventsel;
-- 
2.7.4


* [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read
  2017-12-18 11:34 [PATCH 0/4] bug fix mmap read in large PEBS kan.liang
                   ` (2 preceding siblings ...)
  2017-12-18 11:34 ` [PATCH 3/4] perf/x86: introduce read function for x86_pmu kan.liang
@ 2017-12-18 11:34 ` kan.liang
  2017-12-19 19:02   ` Peter Zijlstra
  3 siblings, 1 reply; 13+ messages in thread
From: kan.liang @ 2017-12-18 11:34 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel; +Cc: tglx, jolsa, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

When the PEBS interrupt threshold is larger than one, there is no way
to get the exact auto-reload times and value needed for the event
update unless the PEBS buffer is flushed.

Drain the PEBS buffer in event read when large PEBS is enabled.

When the threshold is one, no special handling is needed even if
auto-reload is enabled, because auto-reload only takes effect on
counter overflow, and an event read does not cause an overflow.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c |  9 +++++++++
 arch/x86/events/intel/ds.c   | 10 ++++++++++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 21 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1f7edaf..2411ef0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static void intel_pmu_read_event(struct perf_event *event)
+{
+	if (event->attr.precise_ip)
+		return intel_pmu_pebs_read(event);
+
+	x86_perf_event_update(event, 0, 0);
+}
+
 static void intel_pmu_enable_fixed(struct hw_perf_event *hwc)
 {
 	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
@@ -3496,6 +3504,7 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.disable		= intel_pmu_disable_event,
 	.add			= intel_pmu_add_event,
 	.del			= intel_pmu_del_event,
+	.read			= intel_pmu_read_event,
 	.hw_config		= intel_pmu_hw_config,
 	.schedule_events	= x86_schedule_events,
 	.eventsel		= MSR_ARCH_PERFMON_EVENTSEL0,
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f0f6026..2e0c215 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -926,6 +926,16 @@ void intel_pmu_pebs_del(struct perf_event *event)
 	pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
 }
 
+void intel_pmu_pebs_read(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (pebs_needs_sched_cb(cpuc))
+		return intel_pmu_drain_pebs_buffer();
+
+	x86_perf_event_update(event, 0, 0);
+}
+
 void intel_pmu_pebs_disable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 3a7a2e7..34faf48 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -930,6 +930,8 @@ void intel_pmu_pebs_add(struct perf_event *event);
 
 void intel_pmu_pebs_del(struct perf_event *event);
 
+void intel_pmu_pebs_read(struct perf_event *event);
+
 void intel_pmu_pebs_enable(struct perf_event *event);
 
 void intel_pmu_pebs_disable(struct perf_event *event);
-- 
2.7.4


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-18 11:34 ` [PATCH 2/4] perf/x86/intel: fix event update for auto-reload kan.liang
@ 2017-12-19 18:58   ` Peter Zijlstra
  2017-12-19 20:08     ` Liang, Kan
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-12-19 18:58 UTC (permalink / raw)
  To: kan.liang; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak

On Mon, Dec 18, 2017 at 03:34:49AM -0800, kan.liang@linux.intel.com wrote:
>  arch/x86/events/core.c     | 14 ++++++++++++++
>  arch/x86/events/intel/ds.c |  8 +++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 35552ea..f74e21d 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
>  	 * of the count.
>  	 */
>  	delta = (new_raw_count << shift) - (prev_raw_count << shift);
> +
> +	/*
> +	 * Take auto-reload into account
> +	 * For the auto-reload before the last time, it went through the
> +	 * whole period (reload_val) every time.
> +	 * Just simply add period * times to the event.
> +	 *
> +	 * For the last load, the elapsed delta (event-)time need to be
> +	 * corrected by adding the period. Because the start point is -period.
> +	 */
> +	if (reload_times > 0) {
> +		delta += (reload_val << shift);
> +		local64_add(reload_val * (reload_times - 1), &event->count);
> +	}
>  	delta >>= shift;
>  
>  	local64_add(delta, &event->count);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 0b693b7..f0f6026 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  				   void *base, void *top,
>  				   int bit, int count)
>  {
> +	struct hw_perf_event *hwc = &event->hw;
>  	struct perf_sample_data data;
>  	struct pt_regs regs;
>  	void *at = get_next_pebs_record_by_bit(base, top, bit);
>  
> -	if (!intel_pmu_save_and_restart(event, 0, 0) &&
> +	/*
> +	 * Now, auto-reload is only enabled in fixed period mode.
> +	 * The reload value is always hwc->sample_period.
> +	 * May need to change it, if auto-reload is enabled in freq mode later.
> +	 */
> +	if (!intel_pmu_save_and_restart(event, hwc->sample_period, count - 1) &&
>  	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>  		return;
>  

This all looks very wrong... In auto reload we should never call
intel_pmu_save_and_restore() in the first place I think.

Things like x86_perf_event_update() and x86_perf_event_set_period()
simply _cannot_ do the right thing when we auto reload the counter.


* Re: [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read
  2017-12-18 11:34 ` [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read kan.liang
@ 2017-12-19 19:02   ` Peter Zijlstra
  2017-12-19 20:10     ` Liang, Kan
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2017-12-19 19:02 UTC (permalink / raw)
  To: kan.liang; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak

On Mon, Dec 18, 2017 at 03:34:51AM -0800, kan.liang@linux.intel.com wrote:
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -926,6 +926,16 @@ void intel_pmu_pebs_del(struct perf_event *event)
>  	pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
>  }
>  
> +void intel_pmu_pebs_read(struct perf_event *event)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +
> +	if (pebs_needs_sched_cb(cpuc))
> +		return intel_pmu_drain_pebs_buffer();
> +
> +	x86_perf_event_update(event, 0, 0);
> +}

This is completely broken.. what if @event isn't a pebs event, but we
do have an auto-reloading pebs event configured?


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-19 18:58   ` Peter Zijlstra
@ 2017-12-19 20:08     ` Liang, Kan
  2017-12-19 20:24       ` Liang, Kan
  2017-12-19 22:07       ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Liang, Kan @ 2017-12-19 20:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak



On 12/19/2017 1:58 PM, Peter Zijlstra wrote:
> On Mon, Dec 18, 2017 at 03:34:49AM -0800, kan.liang@linux.intel.com wrote:
>>   arch/x86/events/core.c     | 14 ++++++++++++++
>>   arch/x86/events/intel/ds.c |  8 +++++++-
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>> index 35552ea..f74e21d 100644
>> --- a/arch/x86/events/core.c
>> +++ b/arch/x86/events/core.c
>> @@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
>>   	 * of the count.
>>   	 */
>>   	delta = (new_raw_count << shift) - (prev_raw_count << shift);
>> +
>> +	/*
>> +	 * Take auto-reload into account
>> +	 * For the auto-reload before the last time, it went through the
>> +	 * whole period (reload_val) every time.
>> +	 * Just simply add period * times to the event.
>> +	 *
>> +	 * For the last load, the elapsed delta (event-)time need to be
>> +	 * corrected by adding the period. Because the start point is -period.
>> +	 */
>> +	if (reload_times > 0) {
>> +		delta += (reload_val << shift);
>> +		local64_add(reload_val * (reload_times - 1), &event->count);
>> +	}
>>   	delta >>= shift;
>>   
>>   	local64_add(delta, &event->count);
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 0b693b7..f0f6026 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>>   				   void *base, void *top,
>>   				   int bit, int count)
>>   {
>> +	struct hw_perf_event *hwc = &event->hw;
>>   	struct perf_sample_data data;
>>   	struct pt_regs regs;
>>   	void *at = get_next_pebs_record_by_bit(base, top, bit);
>>   
>> -	if (!intel_pmu_save_and_restart(event, 0, 0) &&
>> +	/*
>> +	 * Now, auto-reload is only enabled in fixed period mode.
>> +	 * The reload value is always hwc->sample_period.
>> +	 * May need to change it, if auto-reload is enabled in freq mode later.
>> +	 */
>> +	if (!intel_pmu_save_and_restart(event, hwc->sample_period, count - 1) &&
>>   	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>>   		return;
>>   
> 
> This all looks very wrong... In auto reload we should never call
> intel_pmu_save_and_restore() in the first place I think.
> 
> Things like x86_perf_event_update() and x86_perf_event_set_period()
> simply _cannot_ do the right thing when we auto reload the counter.
>

I think it should be OK to call it in the first place.
For x86_perf_event_update(), reload_times tells whether it is
auto-reload. Both period_left and event->count are carefully
recalculated for auto-reload.
For x86_perf_event_set_period(), nothing special is needed for
auto-reload. The period is fixed, and the period_left from
x86_perf_event_update() is already handled.


BTW: it should be 'count', not 'count - 1', that is passed to
intel_pmu_save_and_restart(). I just found the issue. I will fix it in
V2 along with any other improvements.

Thanks,
Kan


* Re: [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read
  2017-12-19 19:02   ` Peter Zijlstra
@ 2017-12-19 20:10     ` Liang, Kan
  0 siblings, 0 replies; 13+ messages in thread
From: Liang, Kan @ 2017-12-19 20:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak



On 12/19/2017 2:02 PM, Peter Zijlstra wrote:
> On Mon, Dec 18, 2017 at 03:34:51AM -0800, kan.liang@linux.intel.com wrote:
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -926,6 +926,16 @@ void intel_pmu_pebs_del(struct perf_event *event)
>>   	pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
>>   }
>>   
>> +void intel_pmu_pebs_read(struct perf_event *event)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	if (pebs_needs_sched_cb(cpuc))
>> +		return intel_pmu_drain_pebs_buffer();
>> +
>> +	x86_perf_event_update(event, 0, 0);
>> +}
> 
> This is completely broken.. what if @event isn't a pebs event, but we
> do have an auto-reloading pebs event configured?
>

precise_ip will be checked before intel_pmu_pebs_read is called.
So @event must be a pebs event.

@@ -2060,6 +2060,14 @@ static void intel_pmu_del_event(struct perf_event 
*event)
  		intel_pmu_pebs_del(event);
  }

+static void intel_pmu_read_event(struct perf_event *event)
+{
+	if (event->attr.precise_ip)
+		return intel_pmu_pebs_read(event);
+
+	x86_perf_event_update(event, 0, 0);
+}


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-19 20:08     ` Liang, Kan
@ 2017-12-19 20:24       ` Liang, Kan
  2017-12-19 22:07       ` Peter Zijlstra
  1 sibling, 0 replies; 13+ messages in thread
From: Liang, Kan @ 2017-12-19 20:24 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak



On 12/19/2017 3:08 PM, Liang, Kan wrote:
> 
> 
> On 12/19/2017 1:58 PM, Peter Zijlstra wrote:
>> On Mon, Dec 18, 2017 at 03:34:49AM -0800, kan.liang@linux.intel.com 
>> wrote:
>>>   arch/x86/events/core.c     | 14 ++++++++++++++
>>>   arch/x86/events/intel/ds.c |  8 +++++++-
>>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
>>> index 35552ea..f74e21d 100644
>>> --- a/arch/x86/events/core.c
>>> +++ b/arch/x86/events/core.c
>>> @@ -100,6 +100,20 @@ u64 x86_perf_event_update(struct perf_event *event,
>>>        * of the count.
>>>        */
>>>       delta = (new_raw_count << shift) - (prev_raw_count << shift);
>>> +
>>> +    /*
>>> +     * Take auto-reload into account
>>> +     * For the auto-reload before the last time, it went through the
>>> +     * whole period (reload_val) every time.
>>> +     * Just simply add period * times to the event.
>>> +     *
>>> +     * For the last load, the elapsed delta (event-)time need to be
>>> +     * corrected by adding the period. Because the start point is 
>>> -period.
>>> +     */
>>> +    if (reload_times > 0) {
>>> +        delta += (reload_val << shift);
>>> +        local64_add(reload_val * (reload_times - 1), &event->count);
>>> +    }
>>>       delta >>= shift;
>>>       local64_add(delta, &event->count);
>>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>>> index 0b693b7..f0f6026 100644
>>> --- a/arch/x86/events/intel/ds.c
>>> +++ b/arch/x86/events/intel/ds.c
>>> @@ -1256,11 +1256,17 @@ static void __intel_pmu_pebs_event(struct 
>>> perf_event *event,
>>>                      void *base, void *top,
>>>                      int bit, int count)
>>>   {
>>> +    struct hw_perf_event *hwc = &event->hw;
>>>       struct perf_sample_data data;
>>>       struct pt_regs regs;
>>>       void *at = get_next_pebs_record_by_bit(base, top, bit);
>>> -    if (!intel_pmu_save_and_restart(event, 0, 0) &&
>>> +    /*
>>> +     * Now, auto-reload is only enabled in fixed period mode.
>>> +     * The reload value is always hwc->sample_period.
>>> +     * May need to change it, if auto-reload is enabled in freq mode 
>>> later.
>>> +     */
>>> +    if (!intel_pmu_save_and_restart(event, hwc->sample_period, count 
>>> - 1) &&
>>>           !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
>>>           return;
>>
>> This all looks very wrong... In auto reload we should never call
>> intel_pmu_save_and_restore() in the first place I think.
>>
>> Things like x86_perf_event_update() and x86_perf_event_set_period()
>> simply _cannot_ do the right thing when we auto reload the counter.
>>
> 
> I think it should be OK to call it in first place.
> For x86_perf_event_update(), the reload_times will tell if it's auto 
> reload. Both period_left and event->count are carefully recalculated for 
> auto reload.

Thinking about it a bit more, you are right. We cannot rely on count
to tell us whether it is auto-reload.
The count could also be 1 when auto-reload is enabled.

I will fix it in V2.

Thanks,
Kan

> For x86_perf_event_set_period(), there is nothing special needed for 
> auto reload. The period is fixed. The period_left from 
> x86_perf_event_update() is already handled.
> 
> 
> BTW: It should be 'count' not 'count - 1' which pass to
> intel_pmu_save_and_restart(). I just found the issue. I will fix it in
> V2 with other improvements if there are any.
> 
> Thanks,
> Kan
> 
> 


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-19 20:08     ` Liang, Kan
  2017-12-19 20:24       ` Liang, Kan
@ 2017-12-19 22:07       ` Peter Zijlstra
  2017-12-19 22:11         ` Andi Kleen
  2017-12-19 23:25         ` Liang, Kan
  1 sibling, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-12-19 22:07 UTC (permalink / raw)
  To: Liang, Kan; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak

On Tue, Dec 19, 2017 at 03:08:58PM -0500, Liang, Kan wrote:
> > This all looks very wrong... In auto reload we should never call
> > intel_pmu_save_and_restore() in the first place I think.
> > 
> > Things like x86_perf_event_update() and x86_perf_event_set_period()
> > simply _cannot_ do the right thing when we auto reload the counter.
> > 
> 
> I think it should be OK to call it in first place.
> For x86_perf_event_update(), the reload_times will tell if it's auto reload.
> Both period_left and event->count are carefully recalculated for auto
> reload.

How does prev_count make sense when we've already reloaded a bunch of
times?

> For x86_perf_event_set_period(), there is nothing special needed for auto
> reload. The period is fixed. The period_left from x86_perf_event_update() is
> already handled.

Hurm.. I see. But rather than make an ever bigger trainwreck of things,
I'd rather you just write a special purpose intel_pmu_save_and_restart()
just for AUTO_RELOAD.


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-19 22:07       ` Peter Zijlstra
@ 2017-12-19 22:11         ` Andi Kleen
  2017-12-19 23:25         ` Liang, Kan
  1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2017-12-19 22:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liang, Kan, mingo, acme, linux-kernel, tglx, jolsa, eranian

On Tue, Dec 19, 2017 at 11:07:09PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 03:08:58PM -0500, Liang, Kan wrote:
> > > This all looks very wrong... In auto reload we should never call
> > > intel_pmu_save_and_restart() in the first place I think.
> > > 
> > > Things like x86_perf_event_update() and x86_perf_event_set_period()
> > > simply _cannot_ do the right thing when we auto reload the counter.
> > > 
> > 
> > I think it should be OK to call it in the first place.
> > For x86_perf_event_update(), the reload_times will tell if it's auto reload.
> > Both period_left and event->count are carefully recalculated for auto
> > reload.
> 
> How does prev_count make sense when we've already reloaded a bunch of
> times?

We can figure out how often there was a reload based on the PEBS index.

-Andi


* Re: [PATCH 2/4] perf/x86/intel: fix event update for auto-reload
  2017-12-19 22:07       ` Peter Zijlstra
  2017-12-19 22:11         ` Andi Kleen
@ 2017-12-19 23:25         ` Liang, Kan
  1 sibling, 0 replies; 13+ messages in thread
From: Liang, Kan @ 2017-12-19 23:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, acme, linux-kernel, tglx, jolsa, eranian, ak



On 12/19/2017 5:07 PM, Peter Zijlstra wrote:
> On Tue, Dec 19, 2017 at 03:08:58PM -0500, Liang, Kan wrote:
>>> This all looks very wrong... In auto reload we should never call
>>> intel_pmu_save_and_restart() in the first place I think.
>>>
>>> Things like x86_perf_event_update() and x86_perf_event_set_period()
>>> simply _cannot_ do the right thing when we auto reload the counter.
>>>
>>
>> I think it should be OK to call it in the first place.
>> For x86_perf_event_update(), the reload_times will tell if it's auto reload.
>> Both period_left and event->count are carefully recalculated for auto
>> reload.
> 
> How does prev_count make sense when we've already reloaded a bunch of
> times?

Same as non-auto-reload, it is the 'left' (unfinished) period from last 
time. The period for the first record should always be the 'left' 
period, no matter which case.
For auto-reload, there is no need to advance prev_count by the reload 
value, because for later records the period is exactly the reload value.

To calculate event->count:

For auto-reload:
  event->count = prev_count + (reload times - 1) * reload value
                 + gap between PMI trigger and PMI handler

For non-auto-reload:
  event->count = prev_count + gap between PMI trigger and PMI handler

The 'prev_count' is the same for both auto-reload and non-auto-reload.

The gap is a little bit tricky for auto-reload, because the counter 
starts from -reload_value, while for non-auto-reload it starts from 0.
"delta += (reload_val << shift);" is used to correct for that.

> 
>> For x86_perf_event_set_period(), there is nothing special needed for auto
>> reload. The period is fixed. The period_left from x86_perf_event_update() is
>> already handled.
> 
> Hurm.. I see. But rather than make an ever bigger trainwreck of things,
> I'd rather you just write a special purpose intel_pmu_save_and_restart()
> just for AUTO_RELOAD.

OK. I will do it in V2.

Thanks,
Kan


end of thread, other threads:[~2017-12-19 23:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-18 11:34 [PATCH 0/4] bug fix mmap read in large PEBS kan.liang
2017-12-18 11:34 ` [PATCH 1/4] perf/x86/intel: pass auto-reload information to event update kan.liang
2017-12-18 11:34 ` [PATCH 2/4] perf/x86/intel: fix event update for auto-reload kan.liang
2017-12-19 18:58   ` Peter Zijlstra
2017-12-19 20:08     ` Liang, Kan
2017-12-19 20:24       ` Liang, Kan
2017-12-19 22:07       ` Peter Zijlstra
2017-12-19 22:11         ` Andi Kleen
2017-12-19 23:25         ` Liang, Kan
2017-12-18 11:34 ` [PATCH 3/4] perf/x86: introduce read function for x86_pmu kan.liang
2017-12-18 11:34 ` [PATCH 4/4] perf/x86/intel: drain PEBS buffer in event read kan.liang
2017-12-19 19:02   ` Peter Zijlstra
2017-12-19 20:10     ` Liang, Kan
