* [PATCH 1/2] Switch NMI watchdog to ref cycles on x86
@ 2016-06-08 21:36 Andi Kleen
  2016-06-08 21:36 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
  2016-06-09  8:43 ` [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Andi Kleen @ 2016-06-08 21:36 UTC
  To: peterz; +Cc: acme, jolsa, linux-kernel, x86, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The NMI watchdog uses either the fixed cycles counter or a generic
cycles counter. This conflicts with PMU users who want to run a full
group that includes the fixed cycles counter, for example the
--topdown support recently added to perf stat. That code has to fall
back to not using groups, which can cause measurement inaccuracy
due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU frequency
due to Turbo mode.

Reference cycles always tick at a fixed frequency, or slower when
the system is idling. That means the NMI watchdog can never
expire too early, unlike with cycles.

Reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/apic/hw_nmi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index c546aac36ee7..016f4263fad4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,6 +18,7 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/perf_event.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 int hw_nmi_get_event(void)
-- 
2.5.5


* [PATCH 2/2] perf stat: Remove nmi watchdog check code again
  2016-06-08 21:36 [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Andi Kleen
@ 2016-06-08 21:36 ` Andi Kleen
  2016-06-09  8:43 ` [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2016-06-08 21:36 UTC
  To: peterz; +Cc: acme, jolsa, linux-kernel, x86, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Now that the NMI watchdog runs on reference cycles and no longer
conflicts with TopDown, we don't need to check that the
NMI watchdog is off in perf stat --topdown.

Remove the code that does this and always use a group.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  6 ------
 tools/perf/arch/x86/util/Build         |  1 -
 tools/perf/arch/x86/util/group.c       | 27 ---------------------------
 tools/perf/builtin-stat.c              | 18 +-----------------
 tools/perf/util/group.h                |  7 -------
 5 files changed, 1 insertion(+), 58 deletions(-)
 delete mode 100644 tools/perf/arch/x86/util/group.c
 delete mode 100644 tools/perf/util/group.h

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index d96ccd4844df..6cf4028ddc2c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -225,12 +225,6 @@ CPU thread. Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
-Topdown uses the full Performance Monitoring Unit, and needs
-disabling of the NMI watchdog (as root):
-echo 0 > /proc/sys/kernel/nmi_watchdog
-for best results. Otherwise the bottlenecks may be inconsistent
-on workload with changing phases.
-
 This enables --metric-only, unless overriden with --no-metric-only.
 
 To interpret the results it is usually needed to know on which
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index f95e6f46ef0d..bc24b75add88 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,7 +3,6 @@ libperf-y += tsc.o
 libperf-y += pmu.o
 libperf-y += kvm-stat.o
 libperf-y += perf_regs.o
-libperf-y += group.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
deleted file mode 100644
index 37f92aa39a5d..000000000000
--- a/tools/perf/arch/x86/util/group.c
+++ /dev/null
@@ -1,27 +0,0 @@
-#include <stdio.h>
-#include "api/fs/fs.h"
-#include "util/group.h"
-
-/*
- * Check whether we can use a group for top down.
- * Without a group may get bad results due to multiplexing.
- */
-bool arch_topdown_check_group(bool *warn)
-{
-	int n;
-
-	if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
-		return false;
-	if (n > 0) {
-		*warn = true;
-		return false;
-	}
-	return true;
-}
-
-void arch_topdown_group_warn(void)
-{
-	fprintf(stderr,
-		"nmi_watchdog enabled with topdown. May give wrong results.\n"
-		"Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
-}
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index dff63733dfb7..a599a78b2f22 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,10 +59,8 @@
 #include "util/thread.h"
 #include "util/thread_map.h"
 #include "util/counts.h"
-#include "util/group.h"
 #include "util/session.h"
 #include "util/tool.h"
-#include "util/group.h"
 #include "asm/bug.h"
 
 #include <api/fs/fs.h>
@@ -1861,16 +1859,6 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	return 0;
 }
 
-__weak bool arch_topdown_check_group(bool *warn)
-{
-	*warn = false;
-	return false;
-}
-
-__weak void arch_topdown_group_warn(void)
-{
-}
-
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
@@ -2010,7 +1998,6 @@ static int add_default_attributes(void)
 
 	if (topdown_run) {
 		char *str = NULL;
-		bool warn = false;
 
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
@@ -2025,14 +2012,11 @@ static int add_default_attributes(void)
 
 		if (!force_metric_only)
 			metric_only = true;
-		if (topdown_filter_events(topdown_attrs, &str,
-				arch_topdown_check_group(&warn)) < 0) {
+		if (topdown_filter_events(topdown_attrs, &str, true) < 0) {
 			pr_err("Out of memory\n");
 			return -1;
 		}
 		if (topdown_attrs[0] && str) {
-			if (warn)
-				arch_topdown_group_warn();
 			err = parse_events(evsel_list, str, NULL);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
deleted file mode 100644
index 116debe7a995..000000000000
--- a/tools/perf/util/group.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef GROUP_H
-#define GROUP_H 1
-
-bool arch_topdown_check_group(bool *warn);
-void arch_topdown_group_warn(void);
-
-#endif
-- 
2.5.5

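For readers unfamiliar with perf event groups: passing true for use_group
means the topdown events are requested as a single group, which perf's
event parser writes as a brace-enclosed, comma-separated list. The helper
below is a minimal sketch of building such a string. build_group_string()
is a hypothetical name used only for illustration; it is not the body of
topdown_filter_events(), which may assemble its string differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): join a NULL-terminated
 * array of event names into a perf group string of the form "{a,b,c}".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_group_string(const char *const *events, char *buf, size_t len)
{
	size_t used = 0;
	int i, n;

	/* Need room for at least "{}" plus the terminating NUL. */
	if (len < 3)
		return -1;

	buf[used++] = '{';
	for (i = 0; events[i]; i++) {
		/* Prefix every name except the first with a comma. */
		n = snprintf(buf + used, len - used, "%s%s",
			     i ? "," : "", events[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output was truncated */
		used += (size_t)n;
	}

	/* Room for the closing brace and the NUL. */
	if (len - used < 2)
		return -1;
	buf[used++] = '}';
	buf[used] = '\0';
	return 0;
}
```

Called with the names topdown-total-slots and topdown-slots-issued, it
fills the buffer with {topdown-total-slots,topdown-slots-issued}, which
has the shape parse_events() accepts as one group. Events in a group are
scheduled on the PMU together, so the group does not suffer from the
multiplexing skew mentioned in the removed warning.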

* Re: [PATCH 1/2] Switch NMI watchdog to ref cycles on x86
  2016-06-08 21:36 [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Andi Kleen
  2016-06-08 21:36 ` [PATCH 2/2] perf stat: Remove nmi watchdog check code again Andi Kleen
@ 2016-06-09  8:43 ` Peter Zijlstra
  2016-06-09 13:13   ` Andi Kleen
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2016-06-09  8:43 UTC
  To: Andi Kleen; +Cc: acme, jolsa, linux-kernel, x86, Andi Kleen

On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote:

> This patch switches the NMI watchdog to use reference cycles

Are you sure; it seems to only add an #include

> ---
>  arch/x86/kernel/apic/hw_nmi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
> index c546aac36ee7..016f4263fad4 100644
> --- a/arch/x86/kernel/apic/hw_nmi.c
> +++ b/arch/x86/kernel/apic/hw_nmi.c
> @@ -18,6 +18,7 @@
>  #include <linux/nmi.h>
>  #include <linux/module.h>
>  #include <linux/delay.h>
> +#include <linux/perf_event.h>
>  
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  int hw_nmi_get_event(void)
> -- 
> 2.5.5
> 


* Re: [PATCH 1/2] Switch NMI watchdog to ref cycles on x86
  2016-06-09  8:43 ` [PATCH 1/2] Switch NMI watchdog to ref cycles on x86 Peter Zijlstra
@ 2016-06-09 13:13   ` Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2016-06-09 13:13 UTC
  To: Peter Zijlstra; +Cc: Andi Kleen, acme, jolsa, linux-kernel, x86

On Thu, Jun 09, 2016 at 10:43:14AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 08, 2016 at 02:36:46PM -0700, Andi Kleen wrote:
> 
> > This patch switches the NMI watchdog to use reference cycles
> 
> Are you sure; it seems to only add an #include

Right, sorry, a git rebase went wrong. Let me resend
with the correct patches.

-Andi


* [PATCH 1/2] Switch NMI watchdog to ref cycles on x86
@ 2016-06-09 13:14 Andi Kleen
  0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2016-06-09 13:14 UTC
  To: peterz; +Cc: acme, x86, linux-kernel, jolsa, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The NMI watchdog uses either the fixed cycles counter or a generic
cycles counter. This conflicts with PMU users who want to run a full
group that includes the fixed cycles counter, for example the
--topdown support recently added to perf stat. That code has to fall
back to not using groups, which can cause measurement inaccuracy
due to multiplexing errors.

This patch switches the NMI watchdog to use reference cycles
on Intel systems. This is actually more accurate than cycles,
because cycles can tick faster than the measured CPU frequency
due to Turbo mode.

Reference cycles always tick at a fixed frequency, or slower when
the system is idling. That means the NMI watchdog can never
expire too early, unlike with cycles.

Reference cycles tick roughly at the frequency of the TSC,
so the same period computation can be used.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/apic/hw_nmi.c | 8 ++++++++
 include/linux/nmi.h           | 1 +
 kernel/watchdog.c             | 7 +++++++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 7788ce643bf4..016f4263fad4 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -18,8 +18,16 @@
 #include <linux/nmi.h>
 #include <linux/module.h>
 #include <linux/delay.h>
+#include <linux/perf_event.h>
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
+int hw_nmi_get_event(void)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return PERF_COUNT_HW_REF_CPU_CYCLES;
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 u64 hw_nmi_get_sample_period(int watchdog_thresh)
 {
 	return (u64)(cpu_khz) * 1000 * watchdog_thresh;
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eeae18e0..79858af27209 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -66,6 +66,7 @@ static inline bool trigger_allbutself_cpu_backtrace(void)
 
 #ifdef CONFIG_LOCKUP_DETECTOR
 u64 hw_nmi_get_sample_period(int watchdog_thresh);
+int hw_nmi_get_event(void);
 extern int nmi_watchdog_enabled;
 extern int soft_watchdog_enabled;
 extern int watchdog_user_enabled;
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9acb29f280ec..8dd30fcd91be 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -315,6 +315,12 @@ static int is_softlockup(unsigned long touch_ts)
 
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 
+/* Can be overridden by architecture */
+__weak int hw_nmi_get_event(void)
+{
+	return PERF_COUNT_HW_CPU_CYCLES;
+}
+
 static struct perf_event_attr wd_hw_attr = {
 	.type		= PERF_TYPE_HARDWARE,
 	.config		= PERF_COUNT_HW_CPU_CYCLES,
@@ -604,6 +610,7 @@ static int watchdog_nmi_enable(unsigned int cpu)
 
 	wd_attr = &wd_hw_attr;
 	wd_attr->sample_period = hw_nmi_get_sample_period(watchdog_thresh);
+	wd_attr->config = hw_nmi_get_event();
 
 	/* Try to register using hardware perf events */
 	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback, NULL);
-- 
2.5.5

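The timing argument in the changelog can be checked with a little
arithmetic. The sketch below is a user-space model, not kernel code:
the wd_* names and the enum are hypothetical stand-ins, and the
frequencies in the usage note are assumed example values. It mirrors
the period formula from hw_nmi_get_sample_period() and the vendor check
from hw_nmi_get_event(), then computes how long a counter takes to
overflow at a given tick rate.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the perf event ids used by the patch. */
enum wd_event {
	WD_EVENT_CPU_CYCLES,
	WD_EVENT_REF_CPU_CYCLES,
};

/* Model of the vendor check: reference cycles on Intel, cycles elsewhere. */
enum wd_event wd_pick_event(int is_intel)
{
	return is_intel ? WD_EVENT_REF_CPU_CYCLES : WD_EVENT_CPU_CYCLES;
}

/* Same formula as the patch: events per watchdog_thresh seconds. */
uint64_t wd_sample_period(uint64_t cpu_khz, unsigned int watchdog_thresh)
{
	return cpu_khz * 1000 * watchdog_thresh;
}

/*
 * Milliseconds until a counter programmed with 'period' overflows when
 * the counted event ticks at 'tick_hz'. Integer division rounds down.
 */
uint64_t wd_ms_until_overflow(uint64_t period, uint64_t tick_hz)
{
	return period * 1000 / tick_hz;
}
```

Assuming cpu_khz is 2000000 (2 GHz) and watchdog_thresh is 10, the
period is 20000000000 events. A cycles counter running at an assumed
3 GHz Turbo clock overflows after about 6.7 seconds, earlier than the
10 second threshold. A reference cycles counter ticking at 2 GHz
overflows after 10 seconds, and later if the CPU spends time idle.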
