* [PATCH 1/2] Switch NMI watchdog to ref cycles on x86
@ 2016-06-09 13:14 Andi Kleen
  2016-06-09 13:14 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
  2016-06-14 11:27 ` [tip:perf/core] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 tip-bot for Andi Kleen
  0 siblings, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2016-06-09 13:14 UTC (permalink / raw)
  To: peterz; +Cc: acme, x86, linux-kernel, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The NMI watchdog uses either the fixed cycles counter or a generic
cycles counter. This causes a lot of conflicts with users of the PMU who
want to run a full group including the fixed cycles counter, for example
the --topdown support recently added to perf stat. That code then has to
fall back to not using groups, which can cause measurement inaccuracy
due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU frequency
due to Turbo mode.

The ref cycles always tick at their fixed reference frequency, or
slower when the system is idling. That means the NMI watchdog can never
expire too early, unlike with cycles.

The reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/apic/hw_nmi.c | 8 ++++++++
 include/linux/nmi.h           | 1 +
 kernel/watchdog.c             | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 7788ce643bf4..016f4263fad4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,8 +18,16 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/perf_event.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
+int hw_nmi_get_event(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return PERF_COUNT_HW_REF_CPU_CYCLES;
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eeae18e0..79858af27209 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -66,6 +66,7 @@ static inline bool trigger_allbutself_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+int hw_nmi_get_event(void);
 extern int nmi_watchdog_enabled;
 extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9acb29f280ec..8dd30fcd91be 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -315,6 +315,12 @@ static int is_softlockup(unsigned long touch_ts)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
+/* Can be overridden by architecture */
+__weak int hw_nmi_get_event(void)
+{
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
 	.config		= PERF_COUNT_HW_CPU_CYCLES,
@@ -604,6 +610,7 @@ static int watchdog_nmi_enable(unsigned int cpu)
 
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	wd_attr->config = hw_nmi_get_event();
 
 	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
-- 
2.5.5
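
A quick way to sanity-check the period computation in
hw_nmi_get_sample_period() above, and the argument that ref cycles can
never expire too early, is the small sketch below. It is not part of the
patch. The 2.4 GHz nominal and 3.0 GHz Turbo frequencies, the 10 second
threshold, and the names expiry_ms() and period_example() are assumptions
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same arithmetic as hw_nmi_get_sample_period() in the patch:
 * cpu_khz is the nominal frequency in kHz, watchdog_thresh is in seconds.
 */
uint64_t sample_period(uint64_t cpu_khz, int watchdog_thresh)
{
	return cpu_khz * 1000 * (uint64_t)watchdog_thresh;
}

/*
 * Milliseconds until a counter ticking at tick_khz reaches the period.
 * A rate of N kHz is N ticks per millisecond. (Hypothetical helper.)
 */
uint64_t expiry_ms(uint64_t period, uint64_t tick_khz)
{
	return period / tick_khz;
}

/* Worked example with assumed frequencies, not measured values. */
void period_example(void)
{
	const uint64_t nominal_khz = 2400000;	/* assumed 2.4 GHz nominal */
	const uint64_t turbo_khz = 3000000;	/* assumed 3.0 GHz in Turbo */
	uint64_t period = sample_period(nominal_khz, 10);

	/* Counter ticking at the nominal rate: full 10 s. */
	assert(expiry_ms(period, nominal_khz) == 10000);
	/* Core cycles ticking at the Turbo rate: only 8 s. */
	assert(expiry_ms(period, turbo_khz) == 8000);
}
```

With these assumed numbers a counter ticking at the nominal rate reaches
the period after the full 10 seconds, while one ticking at the Turbo rate
gets there after 8.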

* [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-09 13:14 [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Andi Kleen
@ 2016-06-09 13:14 ` Andi Kleen
  2016-06-09 13:42   ` Arnaldo Carvalho de Melo
  2016-06-10 19:03   ` Arnaldo Carvalho de Melo
  2016-06-14 11:27 ` [tip:perf/core] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 tip-bot for Andi Kleen
  1 sibling, 2 replies; 8+ messages in thread
From: Andi Kleen @ 2016-06-09 13:14 UTC (permalink / raw)
  To: peterz; +Cc: acme, x86, linux-kernel, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Now that the NMI watchdog runs with reference cycles, and does not
conflict with TopDown anymore, we don't need to check that the
NMI watchdog is off in perf stat --topdown.

Remove the code that does this and always use a group unconditionally.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  6 ------
 tools/perf/arch/x86/util/Build         |  1 -
 tools/perf/arch/x86/util/group.c       | 27 ---------------------------
 tools/perf/builtin-stat.c              | 18 +-----------------
 tools/perf/util/group.h                |  7 -------
 5 files changed, 1 insertion(+), 58 deletions(-)
 delete mode 100644 tools/perf/arch/x86/util/group.c
 delete mode 100644 tools/perf/util/group.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index d96ccd4844df..6cf4028ddc2c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -225,12 +225,6 @@ CPU thread. Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
-Topdown uses the full Performance Monitoring Unit, and needs
-disabling of the NMI watchdog (as root):
-echo 0 > /proc/sys/kernel/nmi_watchdog
-for best results. Otherwise the bottlenecks may be inconsistent
-on workload with changing phases.
-
 This enables --metric-only, unless overriden with --no-metric-only.
 
 To interpret the results it is usually needed to know on which
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index f95e6f46ef0d..bc24b75add88 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,7 +3,6 @@ libperf-y += tsc.o
 libperf-y += pmu.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
-libperf-y += group.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
deleted file mode 100644
index 37f92aa39a5d..000000000000
--- a/tools/perf/arch/x86/util/group.c
+++ /dev/null
@@ -1,27 +0,0 @@
-#include <stdio.h>
-#include "api/fs/fs.h"
-#include "util/group.h"
-
-/*
- * Check whether we can use a group for top down.
- * Without a group may get bad results due to multiplexing.
- */
-bool arch_topdown_check_group(bool *warn)
-{
-	int n;
-
-	if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
-		return false;
-	if (n > 0) {
-		*warn = true;
-		return false;
-	}
-	return true;
-}
-
-void arch_topdown_group_warn(void)
-{
-	fprintf(stderr,
-		"nmi_watchdog enabled with topdown. May give wrong results.\n"
-		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
-}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index dff63733dfb7..a599a78b2f22 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,8 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
-#include "util/group.h"
 #include "util/session.h"
 #include "util/tool.h"
-#include "util/group.h"
 #include "asm/bug.h"
 
 #include <api/fs/fs.h>
@@ -1861,16 +1859,6 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	return 0;
 }
 
-__weak bool arch_topdown_check_group(bool *warn)
-{
-	*warn = false;
-	return false;
-}
-
-__weak void arch_topdown_group_warn(void)
-{
-}
-
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
@@ -2010,7 +1998,6 @@ static int add_default_attributes(void)
 
 	if (topdown_run) {
 		char *str = NULL;
-		bool warn = false;
 
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
@@ -2025,14 +2012,11 @@ static int add_default_attributes(void)
 
 		if (!force_metric_only)
 			metric_only = true;
-		if (topdown_filter_events(topdown_attrs, &str,
-				arch_topdown_check_group(&warn)) < 0) {
+		if (topdown_filter_events(topdown_attrs, &str, true) < 0) {
 			pr_err("Out of memory\n");
 			return -1;
 		}
 		if (topdown_attrs[0] && str) {
-			if (warn)
-				arch_topdown_group_warn();
 			err = parse_events(evsel_list, str, NULL);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
deleted file mode 100644
index 116debe7a995..000000000000
--- a/tools/perf/util/group.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef GROUP_H
-#define GROUP_H 1
-
-bool arch_topdown_check_group(bool *warn);
-void arch_topdown_group_warn(void);
-
-#endif
-- 
2.5.5
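
To make concrete what "always use a group" means for the event string
handed to parse_events(), here is a minimal sketch. It is not the body of
topdown_filter_events(); build_event_string() and append() are
hypothetical helpers, and "cycles" and "instructions" in the usage note
are placeholder event names. The only thing taken from perf is its
{a,b} syntax for an event group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append s to buf at offset *used; fail if it does not fit in len bytes. */
static int append(char *buf, size_t len, size_t *used, const char *s)
{
	size_t n = strlen(s);

	if (*used + n + 1 > len)
		return -1;
	memcpy(buf + *used, s, n + 1);	/* copies the terminating NUL too */
	*used += n;
	return 0;
}

/*
 * Join n event names with commas. With use_group the list is wrapped in
 * braces so that perf schedules the events together as one group.
 * Returns 0 on success, -1 if buf is too small.
 */
int build_event_string(const char **names, size_t n, bool use_group,
		       char *buf, size_t len)
{
	size_t used = 0;
	size_t i;

	assert(names != NULL && buf != NULL);
	if (len == 0)
		return -1;
	buf[0] = '\0';

	if (use_group && append(buf, len, &used, "{") < 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (i > 0 && append(buf, len, &used, ",") < 0)
			return -1;
		if (append(buf, len, &used, names[i]) < 0)
			return -1;
	}
	if (use_group && append(buf, len, &used, "}") < 0)
		return -1;
	return 0;
}
```

Called with the two placeholder names and use_group set, this yields
"{cycles,instructions}"; without use_group it yields
"cycles,instructions", the ungrouped form that is subject to
multiplexing.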

* Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-09 13:14 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
@ 2016-06-09 13:42   ` Arnaldo Carvalho de Melo
  2016-06-09 15:17     ` Andi Kleen
  2016-06-10 19:03   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-09 13:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: peterz, x86, linux-kernel, jolsa, Andi Kleen

On Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Now that the NMI watchdog runs with reference cycles, and does not

Now as in when? We should at least warn the user that the kernel used is
one where the NMI watchdog will not get in the way of topdown. Is there
a programmatic way to discover that?

- Arnaldo

> conflict with TopDown anymore, we don't need to check that the
> NMI watchdog is off in perf stat --topdown.
> 
> Remove the code that does this and always use a group unconditionally.

* Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-09 13:42   ` Arnaldo Carvalho de Melo
@ 2016-06-09 15:17     ` Andi Kleen
  2016-06-09 17:20       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2016-06-09 15:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, peterz, x86, linux-kernel, jolsa, Andi Kleen

On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Now that the NMI watchdog runs with reference cycles, and does not
> 
> Now as in when? We should at least warn the user that the kernel used is
> one where the NMI watchdog will not get in the way of topdown. Is there
> a programmatic way to discover that?

If the other patch gets merged at the same time as the TopDown patches,
the only kernels without it are a few days of tip; no released kernel
is affected. So there is no need to check for this case.

-Andi

* Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-09 15:17     ` Andi Kleen
@ 2016-06-09 17:20       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-09 17:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: peterz, x86, linux-kernel, jolsa, Andi Kleen

On Thu, Jun 09, 2016 at 08:17:16AM -0700, Andi Kleen wrote:
> On Thu, Jun 09, 2016 at 10:42:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen wrote:
> > > Now that the NMI watchdog runs with reference cycles, and does not

> > Now as in when? We should at least warn the user that the kernel used is
> > one where the NMI watchdog will not get in the way of topdown. Is there
> > a programmatic way to discover that?
 
> If the other patch gets merged at the same time as the TopDown patches,
> the only kernels without it are a few days of tip; no released kernel
> is affected. So there is no need to check for this case.

OK, I see: on older kernels it will fail before that point, when it
does not find the topdown counters, so there is no problem. It is a
super small window. Thanks for clarifying.

- Arnaldo

* Re: [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-09 13:14 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
  2016-06-09 13:42   ` Arnaldo Carvalho de Melo
@ 2016-06-10 19:03   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-10 19:03 UTC (permalink / raw)
  To: Peter Zijlstra, x86, linux-kernel, Jiri Olsa, Andi Kleen; +Cc: Andi Kleen

On Thu, Jun 09, 2016 at 06:14:39AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Now that the NMI watchdog runs with reference cycles, and does not
> conflict with TopDown anymore, we don't need to check that the
> NMI watchdog is off in perf stat --topdown.
> 
> Remove the code that does this and always use a group unconditionally.

PeterZ, will you pick this one? I'm ok with it, Andi is just removing
some code he introduced recently.

- Arnaldo
 

* [tip:perf/core] perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
  2016-06-09 13:14 [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Andi Kleen
  2016-06-09 13:14 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
@ 2016-06-14 11:27 ` tip-bot for Andi Kleen
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Andi Kleen @ 2016-06-14 11:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: ak, vincent.weaver, eranian, tglx, linux-kernel,
	alexander.shishkin, hpa, torvalds, peterz, acme, mingo, jolsa

Commit-ID:  2c95afc1e83d93fac3be6923465e1753c2c53b0a
Gitweb:     http://git.kernel.org/tip/2c95afc1e83d93fac3be6923465e1753c2c53b0a
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Thu, 9 Jun 2016 06:14:38 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 14 Jun 2016 11:16:59 +0200

perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86

The NMI watchdog uses either the fixed cycles counter or a generic
cycles counter. This causes a lot of conflicts with users of the PMU who
want to run a full group including the fixed cycles counter, for example
the --topdown support recently added to perf stat. That code then has to
fall back to not using groups, which can cause measurement inaccuracy
due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU frequency
due to Turbo mode.

The ref cycles always tick at their fixed reference frequency, or
slower when the system is idling. That means the NMI watchdog can never
expire too early, unlike with cycles.

The reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/apic/hw_nmi.c | 8 ++++++++
 include/linux/nmi.h           | 1 +
 kernel/watchdog.c             | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 7788ce6..016f426 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,8 +18,16 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/perf_event.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
+int hw_nmi_get_event(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return PERF_COUNT_HW_REF_CPU_CYCLES;
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eea..79858af 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -66,6 +66,7 @@ static inline bool trigger_allbutself_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+int hw_nmi_get_event(void);
 extern int nmi_watchdog_enabled;
 extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9acb29f..8dd30fcd 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -315,6 +315,12 @@ static int is_softlockup(unsigned long touch_ts)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
+/* Can be overridden by architecture */
+__weak int hw_nmi_get_event(void)
+{
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
 	.config		= PERF_COUNT_HW_CPU_CYCLES,
@@ -604,6 +610,7 @@ static int watchdog_nmi_enable(unsigned int cpu)
 
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	wd_attr->config = hw_nmi_get_event();
 
 	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);

* [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-08 21:36 [PATCH 1/2] " Andi Kleen
@ 2016-06-08 21:36 ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2016-06-08 21:36 UTC (permalink / raw)
  To: peterz; +Cc: acme, jolsa, linux-kernel, x86, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Now that the NMI watchdog runs with reference cycles, and does not
conflict with TopDown anymore, we don't need to check that the
NMI watchdog is off in perf stat --topdown.

Remove the code that does this and always use a group unconditionally.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  6 ------
 tools/perf/arch/x86/util/Build         |  1 -
 tools/perf/arch/x86/util/group.c       | 27 ---------------------------
 tools/perf/builtin-stat.c              | 18 +-----------------
 tools/perf/util/group.h                |  7 -------
 5 files changed, 1 insertion(+), 58 deletions(-)
 delete mode 100644 tools/perf/arch/x86/util/group.c
 delete mode 100644 tools/perf/util/group.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index d96ccd4844df..6cf4028ddc2c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -225,12 +225,6 @@ CPU thread. Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
-Topdown uses the full Performance Monitoring Unit, and needs
-disabling of the NMI watchdog (as root):
-echo 0 > /proc/sys/kernel/nmi_watchdog
-for best results. Otherwise the bottlenecks may be inconsistent
-on workload with changing phases.
-
 This enables --metric-only, unless overriden with --no-metric-only.
 
 To interpret the results it is usually needed to know on which
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index f95e6f46ef0d..bc24b75add88 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,7 +3,6 @@ libperf-y += tsc.o
 libperf-y += pmu.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
-libperf-y += group.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
deleted file mode 100644
index 37f92aa39a5d..000000000000
--- a/tools/perf/arch/x86/util/group.c
+++ /dev/null
@@ -1,27 +0,0 @@
-#include <stdio.h>
-#include "api/fs/fs.h"
-#include "util/group.h"
-
-/*
- * Check whether we can use a group for top down.
- * Without a group may get bad results due to multiplexing.
- */
-bool arch_topdown_check_group(bool *warn)
-{
-	int n;
-
-	if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
-		return false;
-	if (n > 0) {
-		*warn = true;
-		return false;
-	}
-	return true;
-}
-
-void arch_topdown_group_warn(void)
-{
-	fprintf(stderr,
-		"nmi_watchdog enabled with topdown. May give wrong results.\n"
-		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
-}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index dff63733dfb7..a599a78b2f22 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,8 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
-#include "util/group.h"
 #include "util/session.h"
 #include "util/tool.h"
-#include "util/group.h"
 #include "asm/bug.h"
 
 #include <api/fs/fs.h>
@@ -1861,16 +1859,6 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	return 0;
 }
 
-__weak bool arch_topdown_check_group(bool *warn)
-{
-	*warn = false;
-	return false;
-}
-
-__weak void arch_topdown_group_warn(void)
-{
-}
-
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
@@ -2010,7 +1998,6 @@ static int add_default_attributes(void)
 
 	if (topdown_run) {
 		char *str = NULL;
-		bool warn = false;
 
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
@@ -2025,14 +2012,11 @@ static int add_default_attributes(void)
 
 		if (!force_metric_only)
 			metric_only = true;
-		if (topdown_filter_events(topdown_attrs, &str,
-				arch_topdown_check_group(&warn)) < 0) {
+		if (topdown_filter_events(topdown_attrs, &str, true) < 0) {
 			pr_err("Out of memory\n");
 			return -1;
 		}
 		if (topdown_attrs[0] && str) {
-			if (warn)
-				arch_topdown_group_warn();
 			err = parse_events(evsel_list, str, NULL);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
deleted file mode 100644
index 116debe7a995..000000000000
--- a/tools/perf/util/group.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef GROUP_H
-#define GROUP_H 1
-
-bool arch_topdown_check_group(bool *warn);
-void arch_topdown_group_warn(void);
-
-#endif
-- 
2.5.5
