linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] perf: Fuzzer fixes
@ 2016-12-28 13:31 Jiri Olsa
  2016-12-28 13:31 ` [PATCH 1/4] perf/x86/intel: Account interrupts for PEBS errors Jiri Olsa
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Jiri Olsa @ 2016-12-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

hi,
sending out 4 fuzzer fixes, which kept my server
up and fuzzing over the xmas ;-)

Also available here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
  perf/fuzzer

jirka


---
Jiri Olsa (4):
      perf/x86/intel: Account interrupts for PEBS errors
      perf/x86: Fix period for non sampling events
      perf: Add perf_event_overflow_throttle function
      perf/x86/intel: Throttle PEBS events only from pmi

 arch/x86/events/core.c       |  7 +++++++
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   | 21 +++++++++++++--------
 arch/x86/events/perf_event.h |  2 +-
 include/linux/perf_event.h   |  5 +++++
 kernel/events/core.c         | 55 +++++++++++++++++++++++++++++++++++--------------------
 6 files changed, 62 insertions(+), 30 deletions(-)


* [PATCH 1/4] perf/x86/intel: Account interrupts for PEBS errors
  2016-12-28 13:31 [PATCH 0/4] perf: Fuzzer fixes Jiri Olsa
@ 2016-12-28 13:31 ` Jiri Olsa
  2017-01-14 12:29   ` [tip:perf/urgent] " tip-bot for Jiri Olsa
  2016-12-28 13:31 ` [PATCH 2/4] perf/x86: Fix period for non sampling events Jiri Olsa
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2016-12-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lockup, because the error path of
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupts
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
  ...
  RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
  ...
  Call Trace:
   ? trace_hardirqs_on_caller+0xf5/0x1b0
   ? perf_cgroup_attach+0x70/0x70
   perf_install_in_context+0x199/0x1b0
   ? ctx_resched+0x90/0x90
   SYSC_perf_event_open+0x641/0xf90
   SyS_perf_event_open+0x9/0x10
   do_syscall_64+0x6c/0x1f0
   entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.
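
(For context, the accounting being factored out is roughly the
following; this is a simplified sketch, not the verbatim kernel code,
and the real __perf_event_account_interrupt() also handles the
unthrottle sequence and frequency-based period adjustment:)

/* Simplified sketch of the per-tick interrupt accounting. */
static int account_interrupt_sketch(struct perf_event *event, int throttle)
{
	struct hw_perf_event *hwc = &event->hw;
	int ret = 0;

	hwc->interrupts++;
	if (unlikely(throttle && hwc->interrupts >= max_samples_per_tick)) {
		hwc->interrupts = MAX_INTERRUPTS;	/* mark as throttled */
		perf_log_throttle(event, 0);		/* emit a THROTTLE record */
		ret = 1;				/* caller should stop the event */
	}

	return ret;
}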

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/intel/ds.c |  6 +++++-
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 47 ++++++++++++++++++++++++++++++----------------
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index be202390bbd3..9dfeeeca0ea8 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1389,9 +1389,13 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 			continue;
 
 		/* log dropped samples number */
-		if (error[bit])
+		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
+			if (perf_event_account_interrupt(event))
+				x86_pmu_stop(event, 0);
+		}
+
 		if (counts[bit]) {
 			__intel_pmu_pebs_event(event, iregs, base,
 					       top, bit, counts[bit]);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecdb9817..78ed8105e64d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1259,6 +1259,7 @@ extern void perf_event_disable(struct perf_event *event);
 extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
+extern int perf_event_account_interrupt(struct perf_event *event);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index faf073d0287f..38f4baef5df5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7034,25 +7034,12 @@ static void perf_log_itrace_start(struct perf_event *event)
 	perf_output_end(&handle);
 }
 
-/*
- * Generic event overflow handling, sampling.
- */
-
-static int __perf_event_overflow(struct perf_event *event,
-				   int throttle, struct perf_sample_data *data,
-				   struct pt_regs *regs)
+static int
+__perf_event_account_interrupt(struct perf_event *event, int throttle)
 {
-	int events = atomic_read(&event->event_limit);
 	struct hw_perf_event *hwc = &event->hw;
-	u64 seq;
 	int ret = 0;
-
-	/*
-	 * Non-sampling counters might still use the PMI to fold short
-	 * hardware counters, ignore those.
-	 */
-	if (unlikely(!is_sampling_event(event)))
-		return 0;
+	u64 seq;
 
 	seq = __this_cpu_read(perf_throttled_seq);
 	if (seq != hwc->interrupts_seq) {
@@ -7080,6 +7067,34 @@ static int __perf_event_overflow(struct perf_event *event,
 			perf_adjust_period(event, delta, hwc->last_period, true);
 	}
 
+	return ret;
+}
+
+int perf_event_account_interrupt(struct perf_event *event)
+{
+	return __perf_event_account_interrupt(event, 1);
+}
+
+/*
+ * Generic event overflow handling, sampling.
+ */
+
+static int __perf_event_overflow(struct perf_event *event,
+				   int throttle, struct perf_sample_data *data,
+				   struct pt_regs *regs)
+{
+	int events = atomic_read(&event->event_limit);
+	int ret = 0;
+
+	/*
+	 * Non-sampling counters might still use the PMI to fold short
+	 * hardware counters, ignore those.
+	 */
+	if (unlikely(!is_sampling_event(event)))
+		return 0;
+
+	ret = __perf_event_account_interrupt(event, throttle);
+
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
-- 
2.7.4


* [PATCH 2/4] perf/x86: Fix period for non sampling events
  2016-12-28 13:31 [PATCH 0/4] perf: Fuzzer fixes Jiri Olsa
  2016-12-28 13:31 ` [PATCH 1/4] perf/x86/intel: Account interrupts for PEBS errors Jiri Olsa
@ 2016-12-28 13:31 ` Jiri Olsa
  2017-01-03  9:40   ` Peter Zijlstra
  2017-01-03 15:09   ` [PATCH 2/4] perf/x86: Fix period for non sampling events Peter Zijlstra
  2016-12-28 13:31 ` [PATCH 3/4] perf: Add perf_event_overflow_throttle function Jiri Olsa
  2016-12-28 13:31 ` [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi Jiri Olsa
  3 siblings, 2 replies; 16+ messages in thread
From: Jiri Olsa @ 2016-12-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

When in counting mode, we set up the counter with the
longest possible period and read the value with the read()
syscall.

We also still set up the PMI to be triggered when such a
counter overflows, so we can reconfigure it.

We also get a PEBS interrupt if such a counter has precise_ip
set (which makes no sense, but it's possible).

Having such a counter with:
  - counting mode
  - precise_ip set

I watched my server get stuck serving the PEBS interrupt
again and again, because of the following (AFAICS):

  - the PEBS interrupt is triggered before the PMI
  - when the PEBS handling path reconfigured the counter, it
    had a remaining value of -256
  - x86_perf_event_set_period() does not consider this
    an extreme value, so it's programmed back as the
    new counter value
  - this makes the PEBS interrupt trigger right
    away again
  - and because it's a non-sampling event, this IRQ storm
    is never throttled

Forcing non-sampling events to reconfigure from scratch
is probably not the best solution, but it seems to work.
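
(For illustration, such a counter can be created from userspace like
this; a minimal sketch, with a made-up helper name:)

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch: open a counting (non-sampling) event that still requests PEBS. */
static int open_counting_pebs_event(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
	attr.sample_period = 0;		/* counting mode: !is_sampling_event() */
	attr.precise_ip = 2;		/* ...but PEBS is still requested */

	/* pid 0 (self), cpu -1 (any), no group leader, no flags */
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}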

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f1c22584a46f..657486be9780 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1116,6 +1116,13 @@ int x86_perf_event_set_period(struct perf_event *event)
 		return 0;
 
 	/*
+	 * For non sampling event, we are not interested
+	 * in leftover, force the count from beginning.
+	 */
+	if (left && !is_sampling_event(event))
+		left = 0;
+
+	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
 	if (unlikely(left <= -period)) {
-- 
2.7.4


* [PATCH 3/4] perf: Add perf_event_overflow_throttle function
  2016-12-28 13:31 [PATCH 0/4] perf: Fuzzer fixes Jiri Olsa
  2016-12-28 13:31 ` [PATCH 1/4] perf/x86/intel: Account interrupts for PEBS errors Jiri Olsa
  2016-12-28 13:31 ` [PATCH 2/4] perf/x86: Fix period for non sampling events Jiri Olsa
@ 2016-12-28 13:31 ` Jiri Olsa
  2016-12-28 13:31 ` [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi Jiri Olsa
  3 siblings, 0 replies; 16+ messages in thread
From: Jiri Olsa @ 2016-12-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

Add a perf_event_overflow_throttle() function to allow
callers to decide whether to throttle events. It's used in
the following patch.
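
(The intended usage pattern is roughly this; a sketch only, the real
call site is added in the next patch and the function name here is
made up:)

/*
 * Sketch: a drain path passes its PMI context through, so only
 * overflows handled from the PMI can throttle the event.
 */
static void drain_one_event_sketch(struct perf_event *event,
				   struct perf_sample_data *data,
				   struct pt_regs *regs, int pmi)
{
	if (perf_event_overflow_throttle(event, pmi, data, regs))
		x86_pmu_stop(event, 0);
}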

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/perf_event.h |  4 ++++
 kernel/events/core.c       | 14 +++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 78ed8105e64d..f5a9468bad90 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -966,6 +966,10 @@ extern int perf_event_overflow(struct perf_event *event,
 				 struct perf_sample_data *data,
 				 struct pt_regs *regs);
 
+extern int perf_event_overflow_throttle(struct perf_event *event,
+					int throttle, struct perf_sample_data *data,
+					struct pt_regs *regs);
+
 extern void perf_event_output_forward(struct perf_event *event,
 				     struct perf_sample_data *data,
 				     struct pt_regs *regs);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 38f4baef5df5..466ed56340bc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7079,9 +7079,9 @@ int perf_event_account_interrupt(struct perf_event *event)
  * Generic event overflow handling, sampling.
  */
 
-static int __perf_event_overflow(struct perf_event *event,
-				   int throttle, struct perf_sample_data *data,
-				   struct pt_regs *regs)
+int perf_event_overflow_throttle(struct perf_event *event,
+				 int throttle, struct perf_sample_data *data,
+				 struct pt_regs *regs)
 {
 	int events = atomic_read(&event->event_limit);
 	int ret = 0;
@@ -7122,7 +7122,7 @@ int perf_event_overflow(struct perf_event *event,
 			  struct perf_sample_data *data,
 			  struct pt_regs *regs)
 {
-	return __perf_event_overflow(event, 1, data, regs);
+	return perf_event_overflow_throttle(event, 1, data, regs);
 }
 
 /*
@@ -7184,8 +7184,8 @@ static void perf_swevent_overflow(struct perf_event *event, u64 overflow,
 		return;
 
 	for (; overflow; overflow--) {
-		if (__perf_event_overflow(event, throttle,
-					    data, regs)) {
+		if (perf_event_overflow_throttle(event, throttle,
+						 data, regs)) {
 			/*
 			 * We inhibit the overflow from happening when
 			 * hwc->interrupts == MAX_INTERRUPTS.
@@ -8298,7 +8298,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	if (regs && !perf_exclude_event(event, regs)) {
 		if (!(event->attr.exclude_idle && is_idle_task(current)))
-			if (__perf_event_overflow(event, 1, &data, regs))
+			if (perf_event_overflow_throttle(event, 1, &data, regs))
 				ret = HRTIMER_NORESTART;
 	}
 
-- 
2.7.4


* [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
  2016-12-28 13:31 [PATCH 0/4] perf: Fuzzer fixes Jiri Olsa
                   ` (2 preceding siblings ...)
  2016-12-28 13:31 ` [PATCH 3/4] perf: Add perf_event_overflow_throttle function Jiri Olsa
@ 2016-12-28 13:31 ` Jiri Olsa
  2017-01-03 13:45   ` Peter Zijlstra
  2017-01-24 16:41   ` Peter Zijlstra
  3 siblings, 2 replies; 16+ messages in thread
From: Jiri Olsa @ 2016-12-28 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

This patch fixes the following WARNING:

  WARNING: CPU: 15 PID: 15768 at arch/x86/events/core.c:1256 x86_pmu_start+0x1b3/0x1c0
  ...
  Call Trace:
   <IRQ>
   dump_stack+0x86/0xc3
   __warn+0xcb/0xf0
   warn_slowpath_null+0x1d/0x20
   x86_pmu_start+0x1b3/0x1c0
   perf_event_task_tick+0x342/0x3f0
   scheduler_tick+0x75/0xd0
   update_process_times+0x47/0x60
   tick_sched_handle.isra.19+0x25/0x60
   tick_sched_timer+0x3d/0x70
   __hrtimer_run_queues+0xfb/0x510
   hrtimer_interrupt+0x9d/0x1a0
   local_apic_timer_interrupt+0x38/0x60
   smp_trace_apic_timer_interrupt+0x56/0x25a
   trace_apic_timer_interrupt+0x9d/0xb0
   ...

which happens AFAICS under the following conditions
(with PEBS events configured):

  - x86_pmu_enable() reconfigures counters and calls:
       - x86_pmu_stop() on a PEBS event
       - x86_pmu_stop() drains the PEBS buffer, crosses
         the throttle limit and sets:
           event->hw.interrupts = MAX_INTERRUPTS
       - the following x86_pmu_start() call starts the event
  - perf_event_task_tick() is triggered
    - perf_adjust_freq_unthr_context() sees the event with
      MAX_INTERRUPTS set and calls x86_pmu_start() on the
      already started event, which triggers the warning

My first attempt to fix this was to unthrottle the event
before starting it in x86_pmu_enable(). But I think that
omitting the throttling completely when we are not in the
PMI is better.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   | 17 +++++++++--------
 arch/x86/events/perf_event.h |  2 +-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 86138267b68a..16332867d435 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2137,7 +2137,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 */
 	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
 		handled++;
-		x86_pmu.drain_pebs(regs);
+		x86_pmu.drain_pebs(regs, 1);
 		status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
 	}
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 9dfeeeca0ea8..75ba3cb06012 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -603,7 +603,7 @@ static inline void intel_pmu_drain_pebs_buffer(void)
 {
 	struct pt_regs regs;
 
-	x86_pmu.drain_pebs(&regs);
+	x86_pmu.drain_pebs(&regs, 0);
 }
 
 void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in)
@@ -1233,7 +1233,7 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
 static void __intel_pmu_pebs_event(struct perf_event *event,
 				   struct pt_regs *iregs,
 				   void *base, void *top,
-				   int bit, int count)
+				   int bit, int count, int pmi)
 {
 	struct perf_sample_data data;
 	struct pt_regs regs;
@@ -1257,14 +1257,14 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	 * All but the last records are processed.
 	 * The last one is left to be able to call the overflow handler.
 	 */
-	if (perf_event_overflow(event, &data, &regs)) {
+	if (perf_event_overflow_throttle(event, pmi, &data, &regs)) {
 		x86_pmu_stop(event, 0);
 		return;
 	}
 
 }
 
-static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
+static void intel_pmu_drain_pebs_core(struct pt_regs *iregs, int pmi)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct debug_store *ds = cpuc->ds;
@@ -1295,10 +1295,10 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
 	if (n <= 0)
 		return;
 
-	__intel_pmu_pebs_event(event, iregs, at, top, 0, n);
+	__intel_pmu_pebs_event(event, iregs, at, top, 0, n, pmi);
 }
 
-static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
+static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, int pmi)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct debug_store *ds = cpuc->ds;
@@ -1392,13 +1392,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
-			if (perf_event_account_interrupt(event))
+			if (pmi && perf_event_account_interrupt(event))
 				x86_pmu_stop(event, 0);
 		}
 
 		if (counts[bit]) {
 			__intel_pmu_pebs_event(event, iregs, base,
-					       top, bit, counts[bit]);
+					       top, bit, counts[bit],
+					       pmi);
 		}
 	}
 }
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bcbb1d2ae10b..cdcfdd06b2d2 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -590,7 +590,7 @@ struct x86_pmu {
 			pebs_prec_dist	:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
-	void		(*drain_pebs)(struct pt_regs *regs);
+	void		(*drain_pebs)(struct pt_regs *regs, int pmi);
 	struct event_constraint *pebs_constraints;
 	void		(*pebs_aliases)(struct perf_event *event);
 	int 		max_pebs_events;
-- 
2.7.4


* Re: [PATCH 2/4] perf/x86: Fix period for non sampling events
  2016-12-28 13:31 ` [PATCH 2/4] perf/x86: Fix period for non sampling events Jiri Olsa
@ 2017-01-03  9:40   ` Peter Zijlstra
  2017-01-03 14:24     ` [PATCH] perf/x86: Reject non sampling events with precise_ip Jiri Olsa
  2017-01-03 15:09   ` [PATCH 2/4] perf/x86: Fix period for non sampling events Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-01-03  9:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Wed, Dec 28, 2016 at 02:31:04PM +0100, Jiri Olsa wrote:
> When in counting mode, we set up the counter with the
> longest possible period and read the value with the read()
> syscall.
> 
> We also still set up the PMI to be triggered when such a
> counter overflows, so we can reconfigure it.
> 
> We also get a PEBS interrupt if such a counter has precise_ip
> set (which makes no sense, but it's possible).

I think we should reject non-sampling PEBS events; as you say, they make
no sense whatsoever.


* Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
  2016-12-28 13:31 ` [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi Jiri Olsa
@ 2017-01-03 13:45   ` Peter Zijlstra
  2017-01-24 16:41   ` Peter Zijlstra
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-01-03 13:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Wed, Dec 28, 2016 at 02:31:06PM +0100, Jiri Olsa wrote:
> This patch fixes the following WARNING:
> 
>   WARNING: CPU: 15 PID: 15768 at arch/x86/events/core.c:1256 x86_pmu_start+0x1b3/0x1c0
>   ...
>   Call Trace:
>    <IRQ>
>    dump_stack+0x86/0xc3
>    __warn+0xcb/0xf0
>    warn_slowpath_null+0x1d/0x20
>    x86_pmu_start+0x1b3/0x1c0
>    perf_event_task_tick+0x342/0x3f0
>    scheduler_tick+0x75/0xd0
>    update_process_times+0x47/0x60
>    tick_sched_handle.isra.19+0x25/0x60
>    tick_sched_timer+0x3d/0x70
>    __hrtimer_run_queues+0xfb/0x510
>    hrtimer_interrupt+0x9d/0x1a0
>    local_apic_timer_interrupt+0x38/0x60
>    smp_trace_apic_timer_interrupt+0x56/0x25a
>    trace_apic_timer_interrupt+0x9d/0xb0
>    ...
> 
> which happens AFAICS under the following conditions
> (with PEBS events configured):
> 
>   - x86_pmu_enable() reconfigures counters and calls:
>        - x86_pmu_stop() on a PEBS event
>        - x86_pmu_stop() drains the PEBS buffer, crosses
>          the throttle limit and sets:
>            event->hw.interrupts = MAX_INTERRUPTS
>        - the following x86_pmu_start() call starts the event
>   - perf_event_task_tick() is triggered
>     - perf_adjust_freq_unthr_context() sees the event with
>       MAX_INTERRUPTS set and calls x86_pmu_start() on the
>       already started event, which triggers the warning
> 
> My first attempt to fix this was to unthrottle the event
> before starting it in x86_pmu_enable(). But I think that
> omitting the throttling completely when we are not in the
> PMI is better.

Hurm,.. good catch that. Makes a bit of a mess of things though. Then
again, I cannot quickly think of something better either :/


* [PATCH] perf/x86: Reject non sampling events with precise_ip
  2017-01-03  9:40   ` Peter Zijlstra
@ 2017-01-03 14:24     ` Jiri Olsa
  2017-01-03 22:06       ` Vince Weaver
  2017-01-14 12:29       ` [tip:perf/urgent] " tip-bot for Jiri Olsa
  0 siblings, 2 replies; 16+ messages in thread
From: Jiri Olsa @ 2017-01-03 14:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Tue, Jan 03, 2017 at 10:40:59AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 28, 2016 at 02:31:04PM +0100, Jiri Olsa wrote:
> > When in counting mode, we set up the counter with the
> > longest possible period and read the value with the read()
> > syscall.
> > 
> > We also still set up the PMI to be triggered when such a
> > counter overflows, so we can reconfigure it.
> > 
> > We also get a PEBS interrupt if such a counter has precise_ip
> > set (which makes no sense, but it's possible).
> 
> I think we should reject non-sampling PEBS events; as you say, they make
> no sense whatsoever.

ook, attached

jirka


---
As Peter suggested [1], reject non-sampling PEBS events,
because they don't make any sense and could cause issues
in the NMI handler [2].

[1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
[2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org
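
(For illustration, with this check in place a counting event that
requests PEBS is rejected at perf_event_open() time; a sketch,
assuming the usual perf_event_open() headers:)

struct perf_event_attr attr = {
	.size		= sizeof(attr),
	.type		= PERF_TYPE_HARDWARE,
	.config		= PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
	.sample_period	= 0,	/* counting mode: !is_sampling_event() */
	.precise_ip	= 2,	/* requests PEBS */
};

/* With this patch applied, the open fails and errno is EINVAL: */
int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);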

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/events/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 64582cfa6976..48b851f0ef89 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -505,6 +505,10 @@ int x86_pmu_hw_config(struct perf_event *event)
 
 		if (event->attr.precise_ip > precise)
 			return -EOPNOTSUPP;
+
+		/* There's no sense in having PEBS for non sampling events. */
+		if (!is_sampling_event(event))
+			return -EINVAL;
 	}
 	/*
 	 * check that PEBS LBR correction does not conflict with
-- 
2.9.3


* Re: [PATCH 2/4] perf/x86: Fix period for non sampling events
  2016-12-28 13:31 ` [PATCH 2/4] perf/x86: Fix period for non sampling events Jiri Olsa
  2017-01-03  9:40   ` Peter Zijlstra
@ 2017-01-03 15:09   ` Peter Zijlstra
  2017-01-03 15:26     ` Jiri Olsa
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-01-03 15:09 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver, Stephane Eranian

On Wed, Dec 28, 2016 at 02:31:04PM +0100, Jiri Olsa wrote:
> When in counting mode, we set up the counter with the
> longest possible period and read the value with the read()
> syscall.
> 
> We also still set up the PMI to be triggered when such a
> counter overflows, so we can reconfigure it.
> 
> We also get a PEBS interrupt if such a counter has precise_ip
> set (which makes no sense, but it's possible).
> 
> Having such a counter with:
>   - counting mode
>   - precise_ip set
> 
> I watched my server get stuck serving the PEBS interrupt
> again and again, because of the following (AFAICS):
> 
>   - the PEBS interrupt is triggered before the PMI

Slightly confused, the PEBS interrupt _is_ the PMI. And how can we get
an interrupt before the counter overflows?

>   - when the PEBS handling path reconfigured the counter, it
>     had a remaining value of -256

You're talking about the actual counter value, right, not @left?

>   - x86_perf_event_set_period() does not consider this
>     an extreme value, so it's programmed back as the
>     new counter value

Right, a counter value of -256 would result in @left being 256 which is
positive and not too large, so we 'retain' the value.
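
(For reference, the arithmetic in question; a simplified sketch of the
relevant part of x86_perf_event_set_period(), not the full function:)

s64 left = local64_read(&hwc->period_left);	/* 256 for a raw count of -256 */
s64 period = hwc->sample_period;

if (unlikely(left <= -period))	/* way out of range: start over */
	left = period;

if (unlikely(left <= 0))	/* overflowed: add a period */
	left += period;

/*
 * left == 256 passes both checks unchanged, so the counter is
 * written back as -256 and can fire almost immediately again:
 */
wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);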

>   - this makes the PEBS interrupt trigger right
>     away again

So I'm curious how this is even possible. The normal described way of
things is:

	- we program the counter with a negative value
	- each 'event' does a counter increment
	- if we 'overflow' (turn positive) we start to arm the PEBS
	  assist
	- once the assist is armed, the next 'event' triggers a PEBS
	  record.
	- if the amount of PEBS records exceeds the DS threshold, we
	  set bit 62 in GLOBAL_STATUS and raise the PMI.

At which point the actual counter value should be at the very least 1
(for having counted the event that triggers the PEBS assist into
creating the record).


Did your kernel include commit:

  daa864b8f8e3 ("perf/x86/pebs: Fix handling of PEBS buffer overflows")

?


* Re: [PATCH 2/4] perf/x86: Fix period for non sampling events
  2017-01-03 15:09   ` [PATCH 2/4] perf/x86: Fix period for non sampling events Peter Zijlstra
@ 2017-01-03 15:26     ` Jiri Olsa
  0 siblings, 0 replies; 16+ messages in thread
From: Jiri Olsa @ 2017-01-03 15:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver, Stephane Eranian

On Tue, Jan 03, 2017 at 04:09:46PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 28, 2016 at 02:31:04PM +0100, Jiri Olsa wrote:
> > When in counting mode, we set up the counter with the
> > longest possible period and read the value with the read()
> > syscall.
> > 
> > We also still set up the PMI to be triggered when such a
> > counter overflows, so we can reconfigure it.
> > 
> > We also get a PEBS interrupt if such a counter has precise_ip
> > set (which makes no sense, but it's possible).
> > 
> > Having such a counter with:
> >   - counting mode
> >   - precise_ip set
> > 
> > I watched my server get stuck serving the PEBS interrupt
> > again and again, because of the following (AFAICS):
> > 
> >   - the PEBS interrupt is triggered before the PMI
> 
> Slightly confused, the PEBS interrupt _is_ the PMI. And how can we get
> an interrupt before the counter overflows?
> 
> >   - when the PEBS handling path reconfigured the counter, it
> >     had a remaining value of -256
> 
> You're talking about the actual counter value, right, not @left?

right

> 
> >   - x86_perf_event_set_period() does not consider this
> >     an extreme value, so it's programmed back as the
> >     new counter value
> 
> Right, a counter value of -256 would result in @left being 256 which is
> positive and not too large, so we 'retain' the value.
> 
> >   - this makes the PEBS interrupt trigger right
> >     away again
> 
> So I'm curious how this is even possible. The normal described way of
> things is:
> 
> 	- we program the counter with a negative value
> 	- each 'event' does a counter increment
> 	- if we 'overflow' (turn positive) we start to arm the PEBS
> 	  assist

heh, I guess I thought this could happen earlier ;-) otherwise
I don't get how I could have seen -256 left in the counter value..

> 	- once the assist is armed, the next 'event' triggers a PEBS
> 	  record.
> 	- if the amount of PEBS records exceeds the DS threshold, we
> 	  set bit 62 in GLOBAL_STATUS and raise the PMI.
> 
> At which point the actual counter value should be at the very least 1
> (for having counted the event that triggers the PEBS assist into
> creating the record).

what I saw was bit 62 set and pebs_drain->__intel_pmu_pebs_event
reconfiguring the event back with -256 again and again..

I'll run the fuzzer again without the fix, with my debug stuff in, and
try to recreate ;-)

> Did your kernel include commit:
> 
>   daa864b8f8e3 ("perf/x86/pebs: Fix handling of PEBS buffer overflows")

yep

jirka


* Re: [PATCH] perf/x86: Reject non sampling events with precise_ip
  2017-01-03 14:24     ` [PATCH] perf/x86: Reject non sampling events with precise_ip Jiri Olsa
@ 2017-01-03 22:06       ` Vince Weaver
  2017-01-14 12:29       ` [tip:perf/urgent] " tip-bot for Jiri Olsa
  1 sibling, 0 replies; 16+ messages in thread
From: Vince Weaver @ 2017-01-03 22:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Jiri Olsa, lkml, Ingo Molnar, Andi Kleen,
	Alexander Shishkin, Arnaldo Carvalho de Melo

On Tue, 3 Jan 2017, Jiri Olsa wrote:
> On Tue, Jan 03, 2017 at 10:40:59AM +0100, Peter Zijlstra wrote:
> > 
> > I think we should reject non-sampling PEBS events; as you say, they make
> > no sense whatsoever.
> 
> ook, attached
> 

you can use PEBS events to gather aggregate stats, though, and they
seem roughly right.  Are they truly meaningless?

I had misremembered that they might not have the determinism problems of
regular events (it turns out that's wrong).  They oddly seem to be worse
in some limited tests I did.

So I guess nothing will be lost if they're disabled.

Vince


* [tip:perf/urgent] perf/x86/intel: Account interrupts for PEBS errors
  2016-12-28 13:31 ` [PATCH 1/4] perf/x86/intel: Account interrupts for PEBS errors Jiri Olsa
@ 2017-01-14 12:29   ` tip-bot for Jiri Olsa
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot for Jiri Olsa @ 2017-01-14 12:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: alexander.shishkin, mingo, acme, jolsa, tglx, eranian, hpa,
	linux-kernel, vince, torvalds, acme, jolsa, peterz,
	vincent.weaver

Commit-ID:  475113d937adfd150eb82b5e2c5507125a68e7af
Gitweb:     http://git.kernel.org/tip/475113d937adfd150eb82b5e2c5507125a68e7af
Author:     Jiri Olsa <jolsa@kernel.org>
AuthorDate: Wed, 28 Dec 2016 14:31:03 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:06:49 +0100

perf/x86/intel: Account interrupts for PEBS errors

It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:

    taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10

This leads to a soft lockup, because the error path of
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupts
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:

  NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
  ...
  RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
  ...
  Call Trace:
   ? trace_hardirqs_on_caller+0xf5/0x1b0
   ? perf_cgroup_attach+0x70/0x70
   perf_install_in_context+0x199/0x1b0
   ? ctx_resched+0x90/0x90
   SYSC_perf_event_open+0x641/0xf90
   SyS_perf_event_open+0x9/0x10
   do_syscall_64+0x6c/0x1f0
   entry_SYSCALL64_slow_path+0x25/0x25

Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.

We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/ds.c |  6 +++++-
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 47 ++++++++++++++++++++++++++++++----------------
 3 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index be20239..9dfeeec 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1389,9 +1389,13 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 			continue;
 
 		/* log dropped samples number */
-		if (error[bit])
+		if (error[bit]) {
 			perf_log_lost_samples(event, error[bit]);
 
+			if (perf_event_account_interrupt(event))
+				x86_pmu_stop(event, 0);
+		}
+
 		if (counts[bit]) {
 			__intel_pmu_pebs_event(event, iregs, base,
 					       top, bit, counts[bit]);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 4741ecd..78ed810 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1259,6 +1259,7 @@ extern void perf_event_disable(struct perf_event *event);
 extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
+extern int perf_event_account_interrupt(struct perf_event *event);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index cbc5937..110b38a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7060,25 +7060,12 @@ static void perf_log_itrace_start(struct perf_event *event)
 	perf_output_end(&handle);
 }
 
-/*
- * Generic event overflow handling, sampling.
- */
-
-static int __perf_event_overflow(struct perf_event *event,
-				   int throttle, struct perf_sample_data *data,
-				   struct pt_regs *regs)
+static int
+__perf_event_account_interrupt(struct perf_event *event, int throttle)
 {
-	int events = atomic_read(&event->event_limit);
 	struct hw_perf_event *hwc = &event->hw;
-	u64 seq;
 	int ret = 0;
-
-	/*
-	 * Non-sampling counters might still use the PMI to fold short
-	 * hardware counters, ignore those.
-	 */
-	if (unlikely(!is_sampling_event(event)))
-		return 0;
+	u64 seq;
 
 	seq = __this_cpu_read(perf_throttled_seq);
 	if (seq != hwc->interrupts_seq) {
@@ -7106,6 +7093,34 @@ static int __perf_event_overflow(struct perf_event *event,
 			perf_adjust_period(event, delta, hwc->last_period, true);
 	}
 
+	return ret;
+}
+
+int perf_event_account_interrupt(struct perf_event *event)
+{
+	return __perf_event_account_interrupt(event, 1);
+}
+
+/*
+ * Generic event overflow handling, sampling.
+ */
+
+static int __perf_event_overflow(struct perf_event *event,
+				   int throttle, struct perf_sample_data *data,
+				   struct pt_regs *regs)
+{
+	int events = atomic_read(&event->event_limit);
+	int ret = 0;
+
+	/*
+	 * Non-sampling counters might still use the PMI to fold short
+	 * hardware counters, ignore those.
+	 */
+	if (unlikely(!is_sampling_event(event)))
+		return 0;
+
+	ret = __perf_event_account_interrupt(event, throttle);
+
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events


* [tip:perf/urgent] perf/x86: Reject non sampling events with precise_ip
  2017-01-03 14:24     ` [PATCH] perf/x86: Reject non sampling events with precise_ip Jiri Olsa
  2017-01-03 22:06       ` Vince Weaver
@ 2017-01-14 12:29       ` tip-bot for Jiri Olsa
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot for Jiri Olsa @ 2017-01-14 12:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vincent.weaver, mingo, torvalds, alexander.shishkin, eranian,
	acme, tglx, peterz, hpa, acme, linux-kernel, jolsa, vince, jolsa

Commit-ID:  18e7a45af91acdde99d3aa1372cc40e1f8142f7b
Gitweb:     http://git.kernel.org/tip/18e7a45af91acdde99d3aa1372cc40e1f8142f7b
Author:     Jiri Olsa <jolsa@redhat.com>
AuthorDate: Tue, 3 Jan 2017 15:24:54 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:06:50 +0100

perf/x86: Reject non sampling events with precise_ip

As Peter suggested [1], reject non-sampling PEBS events,
because they don't make any sense and could cause bugs
in the NMI handler [2].

  [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
  [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 019c588..1635c0c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -505,6 +505,10 @@ int x86_pmu_hw_config(struct perf_event *event)
 
 		if (event->attr.precise_ip > precise)
 			return -EOPNOTSUPP;
+
+		/* There's no sense in having PEBS for non sampling events: */
+		if (!is_sampling_event(event))
+			return -EINVAL;
 	}
 	/*
 	 * check that PEBS LBR correction does not conflict with


* Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
  2016-12-28 13:31 ` [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi Jiri Olsa
  2017-01-03 13:45   ` Peter Zijlstra
@ 2017-01-24 16:41   ` Peter Zijlstra
  2017-01-25 13:02     ` Jiri Olsa
  2017-01-25 13:02     ` Jiri Olsa
  1 sibling, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-01-24 16:41 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Wed, Dec 28, 2016 at 02:31:06PM +0100, Jiri Olsa wrote:
> This patch fixes the following WARNING:
> 
>   WARNING: CPU: 15 PID: 15768 at arch/x86/events/core.c:1256 x86_pmu_start+0x1b3/0x1c0
>   ...
>   Call Trace:
>    <IRQ>
>    dump_stack+0x86/0xc3
>    __warn+0xcb/0xf0
>    warn_slowpath_null+0x1d/0x20
>    x86_pmu_start+0x1b3/0x1c0
>    perf_event_task_tick+0x342/0x3f0
>    scheduler_tick+0x75/0xd0
>    update_process_times+0x47/0x60
>    tick_sched_handle.isra.19+0x25/0x60
>    tick_sched_timer+0x3d/0x70
>    __hrtimer_run_queues+0xfb/0x510
>    hrtimer_interrupt+0x9d/0x1a0
>    local_apic_timer_interrupt+0x38/0x60
>    smp_trace_apic_timer_interrupt+0x56/0x25a
>    trace_apic_timer_interrupt+0x9d/0xb0
>    ...
> 
> which happens AFAICS under the following conditions
> (with PEBS events configured):
> 
>   - x86_pmu_enable() reconfigures counters and calls:
>        - x86_pmu_stop() on a PEBS event
>        - x86_pmu_stop() drains the PEBS buffer, crosses
>          the throttle limit and sets:
>            event->hw.interrupts = MAX_INTERRUPTS
>        - the following x86_pmu_start() call starts the event
>   - perf_event_task_tick() is triggered
>     - perf_adjust_freq_unthr_context() sees the event with
>       MAX_INTERRUPTS set and calls x86_pmu_start() on the
>       already started event, which triggers the warning
> 
> My first attempt to fix this was to unthrottle the event
> before starting it in x86_pmu_enable(). But I think that
> omitting the throttling completely when we are not in the
> PMI is better.

So I don't particularly like these patches... they make a wee bit of a
mess.

Under the assumption that draining a single event is on the same order
of cost as a regular PMI, accounting a drain of multiple events as
an equal number of interrupts makes sense.

We should not disregard this work. Now it looks like both (BTS & PEBS)
drain methods only count a single interrupt; that's something we maybe
ought to fix too.
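
(One way to keep accounting that work, under the above assumption; an
untested sketch, not part of the posted patches:)

/*
 * Sketch: charge one interrupt per drained record instead of one per
 * drain, and stop the event once it crosses the throttle limit.
 */
static void account_drained_records(struct perf_event *event, int count)
{
	while (count--) {
		if (perf_event_account_interrupt(event)) {
			x86_pmu_stop(event, 0);
			break;
		}
	}
}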

So these things that drain are different from the regular case in that
::stop() will do this extra work not 'expected' by the regular core, so
we must do something special. But 'hiding' the work is not correct.

Arguably the x86_pmu_start() call in x86_pmu_enable() is wrong: if the
stop caused a throttle, we should respect that. The problem is that we
'lose' the x86_pmu_stop() call done by the drain. We check
PERF_HES_STOPPED before doing x86_pmu_stop(), but we cannot do so
thereafter because HES_STOPPED will always be set.


Hmm, so we have (__tac = __test_and_clear_bit):

  x86_pmu_enable()
    if (HES_STOPPED)
      hwc->state |= HES_ARCH;

    x86_pmu_stop()
      if __tac(active_mask) (true)
	x86_pmu.disable() := intel_pmu_disable_event()
	  intel_pmu_pebs_disable()
	    intel_pmu_drain_pebs_buffer()
	      x86_pmu_stop()
		__tac(active_mask) (false)
      hwc->state |= HES_STOPPED;

    if (!HES_ARCH)
      x86_pmu_start();


So if we have that recursive stop also set ARCH, things might just work.

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1635c0c8df23..a95707a4140f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1343,6 +1343,8 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
+	} else {
+		hwc->state |= PERF_HES_ARCH;
 	}
 
 	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {


* Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
  2017-01-24 16:41   ` Peter Zijlstra
@ 2017-01-25 13:02     ` Jiri Olsa
  2017-01-25 13:02     ` Jiri Olsa
  1 sibling, 0 replies; 16+ messages in thread
From: Jiri Olsa @ 2017-01-25 13:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Tue, Jan 24, 2017 at 05:41:22PM +0100, Peter Zijlstra wrote:

SNIP

> 
> So I don't particularly like these patches... they make a wee bit of a
> mess.
> 
> Under the assumption that draining a single event is on the same order
> of cost as a regular PMI, accounting a drain of multiple events as
> an equal number of interrupts makes sense.

I understood it as us trying more to protect against interrupt
context abuse than to account the samples here.. but yes,
we compare (hwc->interrupts >= max_samples_per_tick) for throttling,
so I guess we should not cross max_samples_per_tick in any case

> 
> We should not disregard this work. Now it looks like both (BTS & PEBS)
> drain methods only count a single interrupt; that's something we maybe
> ought to fix too.

right, we account just a single hwc->interrupts++, and the BTS drain
does not have the throttle logic

I tried to put the perf_event_overflow() call into both the PEBS and BTS
drains (below, untested). I think it's going to be slower, especially
for the BTS code, which bypassed single-event allocations and allocated
one buffer for all samples at once.

jirka


---
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index be202390bbd3..703bd82c69b1 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -520,10 +520,7 @@ int intel_pmu_drain_bts_buffer(void)
 	};
 	struct perf_event *event = cpuc->events[INTEL_PMC_IDX_FIXED_BTS];
 	struct bts_record *at, *base, *top;
-	struct perf_output_handle handle;
-	struct perf_event_header header;
 	struct perf_sample_data data;
-	unsigned long skip = 0;
 	struct pt_regs regs;
 
 	if (!event)
@@ -562,40 +559,18 @@ int intel_pmu_drain_bts_buffer(void)
 		 */
 		if (event->attr.exclude_kernel &&
 		    (kernel_ip(at->from) || kernel_ip(at->to)))
-			skip++;
-	}
-
-	/*
-	 * Prepare a generic sample, i.e. fill in the invariant fields.
-	 * We will overwrite the from and to address before we output
-	 * the sample.
-	 */
-	rcu_read_lock();
-	perf_prepare_sample(&header, &data, event, &regs);
-
-	if (perf_output_begin(&handle, event, header.size *
-			      (top - base - skip)))
-		goto unlock;
-
-	for (at = base; at < top; at++) {
-		/* Filter out any records that contain kernel addresses. */
-		if (event->attr.exclude_kernel &&
-		    (kernel_ip(at->from) || kernel_ip(at->to)))
 			continue;
 
 		data.ip		= at->from;
 		data.addr	= at->to;
 
-		perf_output_sample(&handle, &header, &data, event);
-	}
+		if (perf_event_overflow(event, &data, &regs)) {
+			x86_pmu_stop(event, 0);
+			break;
+		}
 
-	perf_output_end(&handle);
+	}
 
-	/* There's new data available. */
-	event->hw.interrupts++;
-	event->pending_kill = POLL_IN;
-unlock:
-	rcu_read_unlock();
 	return 1;
 }
 
@@ -1243,25 +1218,18 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	    !(event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD))
 		return;
 
-	while (count > 1) {
+	while (count > 0) {
 		setup_pebs_sample_data(event, iregs, at, &data, &regs);
-		perf_event_output(event, &data, &regs);
+
+		if (perf_event_overflow(event, &data, &regs)) {
+			x86_pmu_stop(event, 0);
+			break;
+		}
+
 		at += x86_pmu.pebs_record_size;
 		at = get_next_pebs_record_by_bit(at, top, bit);
 		count--;
 	}
-
-	setup_pebs_sample_data(event, iregs, at, &data, &regs);
-
-	/*
-	 * All but the last records are processed.
-	 * The last one is left to be able to call the overflow handler.
-	 */
-	if (perf_event_overflow(event, &data, &regs)) {
-		x86_pmu_stop(event, 0);
-		return;
-	}
-
 }
 
 static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)


* Re: [PATCH 4/4] perf/x86/intel: Throttle PEBS events only from pmi
  2017-01-24 16:41   ` Peter Zijlstra
  2017-01-25 13:02     ` Jiri Olsa
@ 2017-01-25 13:02     ` Jiri Olsa
  1 sibling, 0 replies; 16+ messages in thread
From: Jiri Olsa @ 2017-01-25 13:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Olsa, lkml, Ingo Molnar, Andi Kleen, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Vince Weaver

On Tue, Jan 24, 2017 at 05:41:22PM +0100, Peter Zijlstra wrote:

SNIP

> ---
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1635c0c8df23..a95707a4140f 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1343,6 +1343,8 @@ void x86_pmu_stop(struct perf_event *event, int flags)
>  		cpuc->events[hwc->idx] = NULL;
>  		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
>  		hwc->state |= PERF_HES_STOPPED;
> +	} else {
> +		hwc->state |= PERF_HES_ARCH;
>  	}
>  
>  	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {

I think that will work; I'll try to test it

jirka
