All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] perf/stat: Add --disable-hwdt
@ 2017-02-06 12:15 Borislav Petkov
  2017-02-06 12:22 ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-06 12:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar
  Cc: Robert Richter, Vince Weaver, lkml

Hi guys,

so I've been tracing recently on an AMD F15h which has those funky counter
constraints and am seeing this:

# ./perf stat sleep 1

 Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized          
                 1      context-switches          #    0.001 M/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                54      page-faults               #    0.072 M/sec                  
         1,122,815      cycles                    #    1.499 GHz                    
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle   
     <not counted>      stalled-cycles-backend                                        (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                                                  (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                                                      (0.00%)
     <not counted>      branch-misses                                                 (0.00%)

       1.001550070 seconds time elapsed


The problem is that the HW watchdog thing is already taking up a
counter so when perf stat uses the default counters and when we reach
stalled-cycles-backend, we run out of counters for the remaining events.

So how about something like this:

# ./perf stat --disable-hwdt sleep 1

 Performance counter stats for 'sleep 1':

          0.782552      task-clock (msec)         #    0.001 CPUs utilized          
                 1      context-switches          #    0.001 M/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                55      page-faults               #    0.070 M/sec                  
         1,163,246      cycles                    #    1.486 GHz                    
           293,598      stalled-cycles-frontend   #   25.24% frontend cycles idle   
           400,017      stalled-cycles-backend    #   34.39% backend cycles idle    
           676,505      instructions              #    0.58  insn per cycle         
                                                  #    0.59  stalled cycles per insn
           133,822      branches                  #  171.007 M/sec                  
             7,319      branch-misses             #    5.47% of all branches        

       1.001660058 seconds time elapsed

We did explore other opportunities on IRC like sharing counters or
making the HW WDT thing a 'soft' counter but all those are nasty and
probably not really worth the trouble of touching perf core just so that
this works.

Besides, future generations don't have those constraints anymore so it
is only F15h.

Below is a silly patch as a syntactic sugar helper for perf stat. This
is just an RFC anyway, I'll do it properly with fopen() if you're ok
with the approach.

---
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a02f2e965628..7b466dde8012 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -146,6 +146,7 @@ static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
 static const char		*output_name;
 static int			output_fd;
+static bool			disable_hwdt;
 
 struct perf_stat {
 	bool			 record;
@@ -1575,6 +1576,10 @@ static void sig_atexit(void)
 
 	sigprocmask(SIG_SETMASK, &oset, NULL);
 
+	if (disable_hwdt)
+		if (system("echo 1 > /proc/sys/kernel/nmi_watchdog"))
+			return;
+
 	if (signr == -1)
 		return;
 
@@ -1659,6 +1664,8 @@ static const struct option stat_options[] = {
 			"Only print computed metrics. No raw values", enable_metric_only),
 	OPT_BOOLEAN(0, "topdown", &topdown_run,
 			"measure topdown level 1 statistics"),
+	OPT_BOOLEAN(0, "disable-hwdt", &disable_hwdt,
+			"Disable HW watchdog to free-up a counter"),
 	OPT_END()
 };
 
@@ -2523,6 +2530,11 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (perf_stat_init_aggr_mode())
 		goto out;
 
+	if (disable_hwdt) {
+		if (system("echo 0 > /proc/sys/kernel/nmi_watchdog"))
+			goto out;
+	}
+
 	/*
 	 * We dont want to block the signals - that would cause
 	 * child tasks to inherit that and Ctrl-C would not work.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:15 [RFC PATCH] perf/stat: Add --disable-hwdt Borislav Petkov
@ 2017-02-06 12:22 ` Ingo Molnar
  2017-02-06 12:41   ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2017-02-06 12:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> Hi guys,
> 
> so I've been tracing recently on an AMD F15h which has those funky counter
> constraints and am seeing this:
> 
> # ./perf stat sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>           0.749208      task-clock (msec)         #    0.001 CPUs utilized          
>                  1      context-switches          #    0.001 M/sec                  
>                  0      cpu-migrations            #    0.000 K/sec                  
>                 54      page-faults               #    0.072 M/sec                  
>          1,122,815      cycles                    #    1.499 GHz                    
>            286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle   
>      <not counted>      stalled-cycles-backend                                        (0.00%)
>      ^^^^^^^^^^^^
>      <not counted>      instructions                                                  (0.00%)
>      ^^^^^^^^^^^^
>      <not counted>      branches                                                      (0.00%)
>      <not counted>      branch-misses                                                 (0.00%)
> 
>        1.001550070 seconds time elapsed
> 
> 
> The problem is that the HW watchdog thing is already taking up a
> counter so when perf stat uses the default counters and when we reach
> stalled-cycles-backend, we run out of counters for the remaining events.
> 
> So how about something like this:
> 
> # ./perf stat --disable-hwdt sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>           0.782552      task-clock (msec)         #    0.001 CPUs utilized          
>                  1      context-switches          #    0.001 M/sec                  
>                  0      cpu-migrations            #    0.000 K/sec                  
>                 55      page-faults               #    0.070 M/sec                  
>          1,163,246      cycles                    #    1.486 GHz                    
>            293,598      stalled-cycles-frontend   #   25.24% frontend cycles idle   
>            400,017      stalled-cycles-backend    #   34.39% backend cycles idle    
>            676,505      instructions              #    0.58  insn per cycle         
>                                                   #    0.59  stalled cycles per insn
>            133,822      branches                  #  171.007 M/sec                  
>              7,319      branch-misses             #    5.47% of all branches        
> 
>        1.001660058 seconds time elapsed
> 
> We did explore other opportunities on IRC like sharing counters or
> making the HW WDT thing a 'soft' counter but all those are nasty and
> probably not really worth the trouble of touching perf core just so that
> this works.
> 
> Besides, future generations don't have those constraints anymore so it
> is only F15h.
> 
> Below is a silly patch as a syntactic sugar helper for perf stat. This
> is just an RFC anyway, I'll do it properly with fopen() if you're ok
> with the approach.

Looks sensible, and I'd in fact make this the new default behavior (if root runs 
perf stat) - i.e. add a flag to re-enable it, for the rare case where we want to 
debug a hard deadlock while running perf stat ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:22 ` Ingo Molnar
@ 2017-02-06 12:41   ` Borislav Petkov
  2017-02-06 12:44     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-06 12:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml

On Mon, Feb 06, 2017 at 01:22:31PM +0100, Ingo Molnar wrote:
> Looks sensible, and I'd in fact make this the new default behavior (if root runs 
> perf stat) - i.e. add a flag to re-enable it, for the rare case where we want to 
> debug a hard deadlock while running perf stat ...

I'd probably only need to save the previous state, in case it was
disabled for whatever reason.

So actually, I'll make it so that we restore the previous state at exit.
Whatever it was. But while perf stat runs as root, we will disable it by
default.

/me scratches head a bit...

Oh ok, even better - no need for a cmdline option then.

Or does it *ever* make sense to have watchdog running *while* perf stat
runs too?

Because if it doesn't, turning it off by default *if* it was enabled

	[ should be, we do enable HW WDT by default ]

and restoring it should be the proper thing to do. Without cmdline
option.

Thanks!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:41   ` Borislav Petkov
@ 2017-02-06 12:44     ` Ingo Molnar
  2017-02-06 12:49       ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2017-02-06 12:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Feb 06, 2017 at 01:22:31PM +0100, Ingo Molnar wrote:
> > Looks sensible, and I'd in fact make this the new default behavior (if root runs 
> > perf stat) - i.e. add a flag to re-enable it, for the rare case where we want to 
> > debug a hard deadlock while running perf stat ...
> 
> I'd probably only need to save the previous state, in case it was
> disabled for whatever reason.
> 
> So actually, I'll make it so that we restore the previous state at exit.
> Whatever it was. But while perf stat runs as root, we will disable it by
> default.
> 
> /me scratches head a bit...
> 
> Oh ok, even better - no need for a cmdline option then.
> 
> Or does it *ever* make sense to have watchdog running *while* perf stat
> runs too?

Yeah, if for some whatever reason perf locks up while running perf stat as root, 
so I'd keep the option, as a general principle. :-/

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:44     ` Ingo Molnar
@ 2017-02-06 12:49       ` Borislav Petkov
  2017-02-06 13:18         ` Robert Richter
  2017-02-07  1:08         ` Borislav Petkov
  0 siblings, 2 replies; 26+ messages in thread
From: Borislav Petkov @ 2017-02-06 12:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml

On Mon, Feb 06, 2017 at 01:44:48PM +0100, Ingo Molnar wrote:
> Yeah, if for some whatever reason perf locks up while running perf stat as root, 
> so I'd keep the option, as a general principle. :-/

--dont-disable-hwdt it is.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:49       ` Borislav Petkov
@ 2017-02-06 13:18         ` Robert Richter
  2017-02-06 13:23           ` Borislav Petkov
  2017-02-06 14:23           ` [RFC PATCH] perf/stat: Add --disable-hwdt Vince Weaver
  2017-02-07  1:08         ` Borislav Petkov
  1 sibling, 2 replies; 26+ messages in thread
From: Robert Richter @ 2017-02-06 13:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

On 06.02.17 13:49:37, Borislav Petkov wrote:
> On Mon, Feb 06, 2017 at 01:44:48PM +0100, Ingo Molnar wrote:
> > Yeah, if for some whatever reason perf locks up while running perf stat as root, 
> > so I'd keep the option, as a general principle. :-/
> 
> --dont-disable-hwdt it is.

Isn't there the danger the previous watchdog state is never restored
if for some reason perf got killed? So maybe have some other task
running that restores it once perf is gone.

-Robert

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 13:18         ` Robert Richter
@ 2017-02-06 13:23           ` Borislav Petkov
  2017-02-07  7:25             ` Ingo Molnar
  2017-02-06 14:23           ` [RFC PATCH] perf/stat: Add --disable-hwdt Vince Weaver
  1 sibling, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-06 13:23 UTC (permalink / raw)
  To: Robert Richter
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

On Mon, Feb 06, 2017 at 02:18:32PM +0100, Robert Richter wrote:
> Isn't there the danger the previous watchdog state is never restored
> if for some reason perf got killed? So maybe have some other task
> running that restores it once perf is gone.

Currently, I'm restoring it in the atexit() sighandler. Isn't that
always called?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 13:18         ` Robert Richter
  2017-02-06 13:23           ` Borislav Petkov
@ 2017-02-06 14:23           ` Vince Weaver
  2017-02-06 17:02             ` Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Vince Weaver @ 2017-02-06 14:23 UTC (permalink / raw)
  To: Robert Richter
  Cc: Borislav Petkov, Ingo Molnar, Arnaldo Carvalho de Melo,
	Peter Zijlstra, lkml

On Mon, 6 Feb 2017, Robert Richter wrote:

> On 06.02.17 13:49:37, Borislav Petkov wrote:
> > On Mon, Feb 06, 2017 at 01:44:48PM +0100, Ingo Molnar wrote:
> > > Yeah, if for some whatever reason perf locks up while running perf stat as root, 
> > > so I'd keep the option, as a general principle. :-/
> > 
> > --dont-disable-hwdt it is.
> 
> Isn't there the danger the previous watchdog state is never restored
> if for some reason perf got killed? So maybe have some other task
> running that restores it once perf is gone.

minor issue, but is it possibly to do anything about dmesg spam?  From 
what I recall every time you enable and disable the watchdog the kernel 
prints a message.  Makes for messy logs, especially when you run the 
perf_fuzzer as root.

Vince

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 14:23           ` [RFC PATCH] perf/stat: Add --disable-hwdt Vince Weaver
@ 2017-02-06 17:02             ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2017-02-06 17:02 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Robert Richter, Ingo Molnar, Arnaldo Carvalho de Melo,
	Peter Zijlstra, lkml

On Mon, Feb 06, 2017 at 09:23:33AM -0500, Vince Weaver wrote:
> minor issue, but is it possibly to do anything about dmesg spam?  From 
> what I recall every time you enable and disable the watchdog the kernel 
> prints a message.  Makes for messy logs, especially when you run the 
> perf_fuzzer as root.

You mean this:

[81304.460656] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

We could turn it onto a

pr_info_once().

I mean, the feedback that the thing has been enabled is the 1 in
/proc/sys/kernel/nmi_watchdog anyway...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 12:49       ` Borislav Petkov
  2017-02-06 13:18         ` Robert Richter
@ 2017-02-07  1:08         ` Borislav Petkov
  2017-02-07  1:09           ` [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers Borislav Petkov
  2017-02-07  1:10           ` [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session Borislav Petkov
  1 sibling, 2 replies; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07  1:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml

Ok,

here's something (as replies to this message) which looks much more
decent.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07  1:08         ` Borislav Petkov
@ 2017-02-07  1:09           ` Borislav Petkov
  2017-02-07  1:43             ` Arnaldo Carvalho de Melo
  2017-02-07  1:10           ` [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07  1:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml

From: Borislav Petkov <bp@suse.de>

Add helper functions to be able to read/write ints into proc files.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 tools/lib/api/fs/fs.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 tools/lib/api/fs/fs.h |  3 +++
 2 files changed, 45 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 4b6bfc43cccf..8e4b9fe18b75 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -314,6 +314,22 @@ int filename__read_int(const char *filename, int *value)
 	return err;
 }
 
+int filename__write_int(const char *filename, int value)
+{
+	char line[64];
+	int fd = open(filename, O_WRONLY), err = -1;
+
+	if (fd < 0)
+		return -1;
+
+	snprintf(line, sizeof(int), "%d", value);
+
+	err = write(fd, line, strnlen(line, 64));
+
+	close(fd);
+	return err;
+}
+
 /*
  * Parses @value out of @filename with strtoull.
  * By using 0 for base, the strtoull detects the
@@ -400,6 +416,32 @@ int procfs__read_str(const char *entry, char **buf, size_t *sizep)
 	return filename__read_str(path, buf, sizep);
 }
 
+int procfs__read_int(const char *entry, int *value)
+{
+	char path[PATH_MAX];
+	const char *procfs = procfs__mountpoint();
+
+	if (!procfs)
+		return -1;
+
+	snprintf(path, sizeof(path), "%s/%s", procfs, entry);
+
+	return filename__read_int(path, value);
+}
+
+int procfs__write_int(const char *entry, int value)
+{
+	char path[PATH_MAX];
+	const char *procfs = procfs__mountpoint();
+
+	if (!procfs)
+		return -1;
+
+	snprintf(path, sizeof(path), "%s/%s", procfs, entry);
+
+	return filename__write_int(path, value);
+}
+
 int sysfs__read_ull(const char *entry, unsigned long long *value)
 {
 	char path[PATH_MAX];
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 6b332dc74498..095f25d4c70f 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -28,9 +28,12 @@ FS(bpf_fs)
 
 
 int filename__read_int(const char *filename, int *value);
+int filename__write_int(const char *filename, int value);
 int filename__read_ull(const char *filename, unsigned long long *value);
 int filename__read_str(const char *filename, char **buf, size_t *sizep);
 
+int procfs__read_int(const char *entry, int *value);
+int procfs__write_int(const char *entry, int value);
 int procfs__read_str(const char *entry, char **buf, size_t *sizep);
 
 int sysctl__read_int(const char *sysctl, int *value);
-- 
2.11.0


-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session
  2017-02-07  1:08         ` Borislav Petkov
  2017-02-07  1:09           ` [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers Borislav Petkov
@ 2017-02-07  1:10           ` Borislav Petkov
  2017-02-07  1:45             ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07  1:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Robert Richter,
	Vince Weaver, lkml

From: Borislav Petkov <bp@suse.de>

When using perf stat on an AMD F15h system with the default hw events
attributes, some of the events don't get counted:

 Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.072 M/sec
         1,122,815      cycles                    #    1.499 GHz
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
     <not counted>      stalled-cycles-backend                                        (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                                                  (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                                                      (0.00%)
     <not counted>      branch-misses                                                 (0.00%)

       1.001550070 seconds time elapsed

The reason is that we have the HW watchdog consume one PMU counter
and when perf tries to schedule 6 events on 6 counters and some of
those counters are constrained to only a specific subset of PMCs by the
hardware, the event scheduling fails.

So let's disable the HW watchdog around a perf stat session running as
root and restore it after it to its previous state. This frees up the
one counter and the scheduling of the default events succeeds:

 Performance counter stats for 'sleep 1':

          0.806902      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                55      page-faults               #    0.068 M/sec
         1,200,677      cycles                    #    1.488 GHz
           308,044      stalled-cycles-frontend   #   25.66% frontend cycles idle
           424,292      stalled-cycles-backend    #   35.34% backend cycles idle
           672,694      instructions              #    0.56  insn per cycle
                                                  #    0.63  stalled cycles per insn
           132,965      branches                  #  164.785 M/sec
             7,300      branch-misses             #    5.49% of all branches

       1.001689739 seconds time elapsed

There's a --dont-disable-hwdt option which preserves the old behavior.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 tools/perf/builtin-stat.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a02f2e965628..b2aa2ed3161c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -146,6 +146,8 @@ static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
 static const char		*output_name;
 static int			output_fd;
+static bool			keep_hwdt;
+static int			prev_hwdt;	/* previous HW watchdog state */
 
 struct perf_stat {
 	bool			 record;
@@ -1539,6 +1541,39 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
 	fflush(stat_config.output);
 }
 
+static void perf_stat_toggle_hwdt(int on)
+{
+	static const char *p = "sys/kernel/nmi_watchdog";
+	int val;
+
+	if (keep_hwdt)
+		return;
+
+	if (geteuid())
+		return;
+
+	if (procfs__read_int(p, &val) < 0)
+		return;
+
+	/* Reenable only when it was enabled before. */
+	if (on) {
+		if (prev_hwdt)
+			goto write;
+	/* Disable HWDT only when it is enabled. */
+	} else {
+		prev_hwdt = val;
+
+		if (val)
+			goto write;
+	}
+
+	return;
+
+write:
+	if (procfs__write_int(p, on) < 0)
+		return;
+}
+
 static volatile int signr = -1;
 
 static void skip_signal(int signo)
@@ -1575,6 +1610,8 @@ static void sig_atexit(void)
 
 	sigprocmask(SIG_SETMASK, &oset, NULL);
 
+	perf_stat_toggle_hwdt(1);
+
 	if (signr == -1)
 		return;
 
@@ -1659,6 +1696,8 @@ static const struct option stat_options[] = {
 			"Only print computed metrics. No raw values", enable_metric_only),
 	OPT_BOOLEAN(0, "topdown", &topdown_run,
 			"measure topdown level 1 statistics"),
+	OPT_BOOLEAN(0, "dont-disable-hwdt", &keep_hwdt,
+			"Do not disable HW NMI watchdog during the current session"),
 	OPT_END()
 };
 
@@ -2523,6 +2562,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (perf_stat_init_aggr_mode())
 		goto out;
 
+	perf_stat_toggle_hwdt(0);
+
 	/*
 	 * We dont want to block the signals - that would cause
 	 * child tasks to inherit that and Ctrl-C would not work.
-- 
2.11.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07  1:09           ` [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers Borislav Petkov
@ 2017-02-07  1:43             ` Arnaldo Carvalho de Melo
  2017-02-07 10:30               ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-02-07  1:43 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

Em Tue, Feb 07, 2017 at 02:09:17AM +0100, Borislav Petkov escreveu:
> From: Borislav Petkov <bp@suse.de>
> 
> Add helper functions to be able to read/write ints into proc files.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  tools/lib/api/fs/fs.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  tools/lib/api/fs/fs.h |  3 +++
>  2 files changed, 45 insertions(+)
> 
> diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
> index 4b6bfc43cccf..8e4b9fe18b75 100644
> --- a/tools/lib/api/fs/fs.c
> +++ b/tools/lib/api/fs/fs.c
> @@ -314,6 +314,22 @@ int filename__read_int(const char *filename, int *value)
>  	return err;
>  }
>  
> +int filename__write_int(const char *filename, int value)
> +{
> +	char line[64];
> +	int fd = open(filename, O_WRONLY), err = -1;
> +
> +	if (fd < 0)
> +		return -1;
> +
> +	snprintf(line, sizeof(int), "%d", value);
> +
> +	err = write(fd, line, strnlen(line, 64));
> +
> +	close(fd);
> +	return err;
> +}
> +
>  /*
>   * Parses @value out of @filename with strtoull.
>   * By using 0 for base, the strtoull detects the
> @@ -400,6 +416,32 @@ int procfs__read_str(const char *entry, char **buf, size_t *sizep)
>  	return filename__read_str(path, buf, sizep);
>  }
>  
> +int procfs__read_int(const char *entry, int *value)
> +{
> +	char path[PATH_MAX];
> +	const char *procfs = procfs__mountpoint();
> +
> +	if (!procfs)
> +		return -1;
> +
> +	snprintf(path, sizeof(path), "%s/%s", procfs, entry);
> +
> +	return filename__read_int(path, value);
> +}
> +
> +int procfs__write_int(const char *entry, int value)
> +{
> +	char path[PATH_MAX];
> +	const char *procfs = procfs__mountpoint();
> +
> +	if (!procfs)
> +		return -1;
> +
> +	snprintf(path, sizeof(path), "%s/%s", procfs, entry);
> +
> +	return filename__write_int(path, value);
> +}
> +
>  int sysfs__read_ull(const char *entry, unsigned long long *value)
>  {
>  	char path[PATH_MAX];
> diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
> index 6b332dc74498..095f25d4c70f 100644
> --- a/tools/lib/api/fs/fs.h
> +++ b/tools/lib/api/fs/fs.h
> @@ -28,9 +28,12 @@ FS(bpf_fs)
>  
>  
>  int filename__read_int(const char *filename, int *value);
> +int filename__write_int(const char *filename, int value);
>  int filename__read_ull(const char *filename, unsigned long long *value);
>  int filename__read_str(const char *filename, char **buf, size_t *sizep);
>  
> +int procfs__read_int(const char *entry, int *value);
> +int procfs__write_int(const char *entry, int value);
>  int procfs__read_str(const char *entry, char **buf, size_t *sizep);
>  
>  int sysctl__read_int(const char *sysctl, int *value);

Isn't sysctl__read_int() what you want?

See next patch...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session
  2017-02-07  1:10           ` [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session Borislav Petkov
@ 2017-02-07  1:45             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-02-07  1:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

Em Tue, Feb 07, 2017 at 02:10:28AM +0100, Borislav Petkov escreveu:
> From: Borislav Petkov <bp@suse.de>
> 
> When using perf stat on an AMD F15h system with the default hw events
> attributes, some of the events don't get counted:
> 
>  Performance counter stats for 'sleep 1':
> 
>           0.749208      task-clock (msec)         #    0.001 CPUs utilized
>                  1      context-switches          #    0.001 M/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 54      page-faults               #    0.072 M/sec
>          1,122,815      cycles                    #    1.499 GHz
>            286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
>      <not counted>      stalled-cycles-backend                                        (0.00%)
>      ^^^^^^^^^^^^
>      <not counted>      instructions                                                  (0.00%)
>      ^^^^^^^^^^^^
>      <not counted>      branches                                                      (0.00%)
>      <not counted>      branch-misses                                                 (0.00%)
> 
>        1.001550070 seconds time elapsed
> 
> The reason is that we have the HW watchdog consume one PMU counter
> and when perf tries to schedule 6 events on 6 counters and some of
> those counters are constrained to only a specific subset of PMCs by the
> hardware, the event scheduling fails.
> 
> So let's disable the HW watchdog around a perf stat session running as
> root and restore it after it to its previous state. This frees up the
> one counter and the scheduling of the default events succeeds:
> 
>  Performance counter stats for 'sleep 1':
> 
>           0.806902      task-clock (msec)         #    0.001 CPUs utilized
>                  1      context-switches          #    0.001 M/sec
>                  0      cpu-migrations            #    0.000 K/sec
>                 55      page-faults               #    0.068 M/sec
>          1,200,677      cycles                    #    1.488 GHz
>            308,044      stalled-cycles-frontend   #   25.66% frontend cycles idle
>            424,292      stalled-cycles-backend    #   35.34% backend cycles idle
>            672,694      instructions              #    0.56  insn per cycle
>                                                   #    0.63  stalled cycles per insn
>            132,965      branches                  #  164.785 M/sec
>              7,300      branch-misses             #    5.49% of all branches
> 
>        1.001689739 seconds time elapsed
> 
> There's a --dont-disable-hwdt option which preserves the old behavior.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  tools/perf/builtin-stat.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index a02f2e965628..b2aa2ed3161c 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -146,6 +146,8 @@ static aggr_get_id_t		aggr_get_id;
>  static bool			append_file;
>  static const char		*output_name;
>  static int			output_fd;
> +static bool			keep_hwdt;
> +static int			prev_hwdt;	/* previous HW watchdog state */
>  
>  struct perf_stat {
>  	bool			 record;
> @@ -1539,6 +1541,39 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
>  	fflush(stat_config.output);
>  }
>  
> +static void perf_stat_toggle_hwdt(int on)
> +{
> +	static const char *p = "sys/kernel/nmi_watchdog";
   	static const char *p = "kernel/nmi_watchdog";
> +	int val;
> +
> +	if (keep_hwdt)
> +		return;
> +
> +	if (geteuid())
> +		return;
> +
> +	if (procfs__read_int(p, &val) < 0)
	    sysctl__read_int(p, &val) < 0)
> +		return;
> +
> +	/* Reenable only when it was enabled before. */
> +	if (on) {
> +		if (prev_hwdt)
> +			goto write;
> +	/* Disable HWDT only when it is enabled. */
> +	} else {
> +		prev_hwdt = val;
> +
> +		if (val)
> +			goto write;
> +	}
> +
> +	return;
> +
> +write:
> +	if (procfs__write_int(p, on) < 0)
> +		return;
> +}
> +
>  static volatile int signr = -1;
>  
>  static void skip_signal(int signo)
> @@ -1575,6 +1610,8 @@ static void sig_atexit(void)
>  
>  	sigprocmask(SIG_SETMASK, &oset, NULL);
>  
> +	perf_stat_toggle_hwdt(1);
> +
>  	if (signr == -1)
>  		return;
>  
> @@ -1659,6 +1696,8 @@ static const struct option stat_options[] = {
>  			"Only print computed metrics. No raw values", enable_metric_only),
>  	OPT_BOOLEAN(0, "topdown", &topdown_run,
>  			"measure topdown level 1 statistics"),
> +	OPT_BOOLEAN(0, "dont-disable-hwdt", &keep_hwdt,
> +			"Do not disable HW NMI watchdog during the current session"),
>  	OPT_END()
>  };
>  
> @@ -2523,6 +2562,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
>  	if (perf_stat_init_aggr_mode())
>  		goto out;
>  
> +	perf_stat_toggle_hwdt(0);
> +
>  	/*
>  	 * We dont want to block the signals - that would cause
>  	 * child tasks to inherit that and Ctrl-C would not work.
> -- 
> 2.11.0
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-06 13:23           ` Borislav Petkov
@ 2017-02-07  7:25             ` Ingo Molnar
  2017-02-07 10:54               ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2017-02-07  7:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Feb 06, 2017 at 02:18:32PM +0100, Robert Richter wrote:
>
> > Isn't there the danger the previous watchdog state is never restored if for 
> > some reason perf got killed? So maybe have some other task running that 
> > restores it once perf is gone.
> 
> Currently, I'm restoring it in the atexit() sighandler. Isn't that always 
> called?

Normally yes, but it's not guaranteed as atexit() is all user-space, SIGKILL (or 
OOM) or a straight exit (or a crash in the exit handler itself) will cause it to 
not run.

But there's only so much we can do about that, the /proc/sys API is fundamentally 
lossy in that regard. We'd have to add much more involved kernel support to 
guarantee that the watchdog state is restored.

A way to do it would be create a new /proc/sys/kernel/watchdog_disable_file that 
disables that watchdog while it's _open_. When a task exits and the kernel 
automatically closes the file, the watchdog is re-enabled again. (Or the process 
itself can close the file too.)

This method would also nest properly and would handle multi-processes races 
correctly: for example if a script runs perf as root, and root uses 'perf top', 
the two should not race and the hardware watchdog should not end up being 
disabled...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07  1:43             ` Arnaldo Carvalho de Melo
@ 2017-02-07 10:30               ` Borislav Petkov
  2017-02-07 15:00                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07 10:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

On Mon, Feb 06, 2017 at 10:43:56PM -0300, Arnaldo Carvalho de Melo wrote:
> >  int sysctl__read_int(const char *sysctl, int *value);
> 
> Isn't sysctl__read_int() what you want?

Right, so looking at this: don't you think that having both sysctl__*
and procfs__* is a little redundant?

The sysctl* things are doing the accesses over proc so shouldn't it all
be procfs__* interfaces and no sysctl__* ones at all

or

at least the sysctl__* ones should call the procfs__* ones?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-07  7:25             ` Ingo Molnar
@ 2017-02-07 10:54               ` Borislav Petkov
  2017-02-07 15:06                 ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07 10:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

On Tue, Feb 07, 2017 at 08:25:12AM +0100, Ingo Molnar wrote:
> But there's only so much we can do about that, the /proc/sys API is fundamentally 
> lossy in that regard. We'd have to add much more involved kernel support to 
> guarantee that the watchdog state is restored.

So I think doing all this is meh but I guess I probably should do it
just so that we're thorough.

> A way to do it would be create a new /proc/sys/kernel/watchdog_disable_file that 
> disables that watchdog while it's _open_. When a task exits and the kernel 
> automatically closes the file, the watchdog is re-enabled again. (Or the process 
> itself can close the file too.)
> 
> This method would also nest properly and would handle multi-processes races 
> correctly: for example if a script runs perf as root, and root uses 'perf top', 
> the two should not race and the hardware watchdog should not end up being 
> disabled...

Hmm, so I don't like the aspect of adding a /proc file just for that.

Can we do something with sys_perf_event_open(..., flags) instead and
pass in a new flag that says:

PERF_FLAG_SHOO_COUNTER_USERS

or so which would go and turn off HW WDT (and possibly future things
using counters) while we're running a session?

The name should be generic enough so that we can use it for future
temporary disabling of things while a perf session runs.

Then on perf's exit path - I see there are a bunch of _destroy() things
being called when events are disappearing - we'd reenable stuff again
based on that flag.

This way we're clean in userspace and have the maximum control over
everything since we're in the kernel.

>From a quick staring, there's PERF_FLAG_FD_CLOEXEC which is a good
example for something like that. It doesn't do what I'd like to do but I
think I should model this in a similar fashion.

Thoughts?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07 10:30               ` Borislav Petkov
@ 2017-02-07 15:00                 ` Arnaldo Carvalho de Melo
  2017-02-07 15:08                   ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-02-07 15:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

Em Tue, Feb 07, 2017 at 11:30:01AM +0100, Borislav Petkov escreveu:
> On Mon, Feb 06, 2017 at 10:43:56PM -0300, Arnaldo Carvalho de Melo wrote:
> > >  int sysctl__read_int(const char *sysctl, int *value);
> > 
> > Isn't sysctl__read_int() what you want?
> 
> Right, so looking at this: don't you think that having both sysctl__*
> and procfs__* is a little redundant?
> 
> The sysctl* things are doing the accesses over proc so shouldn't it all
> be procfs__* interfaces and no sysctl__* ones at all
> 
> or
> 
> at least the sysctl__* ones should call the procfs__* ones?

Well, I see this as: sysctls are implemented as files in procfs, but
could conceivably be implemented somewhere else, just like the events
file was implemented in debugfs but then was moved to a separate
filesystem type, tracefs.

So it being in procfs is an implementation detail, what I'm interested
are sysctls, and names for sysctls will be appended to wherever the
sysctl is made available in the file system.

For instance:

[root@jouet ~]# sysctl kernel.watchdog
kernel.watchdog = 1
[root@jouet ~]#

So the sysctl is "kernel.watchdog" and it is set to one.

So, I suggest you use:

	if (sysctl__read_int("kernel/watchdog", &watchdog_value) == 0)
		/* do whatever you want with it */

And forget that under the hood this is in something called "procfs".

:-)

- Arnaldo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-07 10:54               ` Borislav Petkov
@ 2017-02-07 15:06                 ` Borislav Petkov
  2017-02-11 17:03                   ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07 15:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

Btw,

I'm wondering if, alternatively, we add this to the manpage of perf stat
and be done with it:

"Remember to turn off HW watchdog temporarily as it is taking up one hw
counter:

$ echo  0 > /proc/sys/kernel/nmi_watchdog
 ... perf workload
$ echo  1 > /proc/sys/kernel/nmi_watchdog

in case some of the specified events don't get counted."

This solves the problem rather simply :-)))

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07 15:00                 ` Arnaldo Carvalho de Melo
@ 2017-02-07 15:08                   ` Borislav Petkov
  2017-02-07 15:34                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-07 15:08 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

On Tue, Feb 07, 2017 at 12:00:45PM -0300, Arnaldo Carvalho de Melo wrote:
> And forget that under the hood this is in something called "procfs".

... unless one day you want to set a file in /proc which is not under
the sysctl hierarchy :-)

I mean, I don't care what I do - you're the maintainer - I'm just giving
you 2 cents :-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers
  2017-02-07 15:08                   ` Borislav Petkov
@ 2017-02-07 15:34                     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-02-07 15:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Peter Zijlstra, Robert Richter, Vince Weaver, lkml

Em Tue, Feb 07, 2017 at 04:08:02PM +0100, Borislav Petkov escreveu:
> On Tue, Feb 07, 2017 at 12:00:45PM -0300, Arnaldo Carvalho de Melo wrote:
> > And forget that under the hood this is in something called "procfs".
 
> ... unless one day you want to set a file in /proc which is not under
> the sysctl hierarchy :-)

In that case you use procfs__read_int(), no problem. 
 
> I mean, I don't care what I do - you're the maintainer - I'm just giving
> you 2 cents :-)

:-)

- Arnaldo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-07 15:06                 ` Borislav Petkov
@ 2017-02-11 17:03                   ` Borislav Petkov
  2017-02-11 17:59                     ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2017-02-11 17:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

Ok,

turns out perf-list(1) already talks about it in the "EVENT GROUPS"
section. How about this then:

---
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a02f2e965628..2d18283574db 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -146,6 +146,7 @@ static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
 static const char		*output_name;
 static int			output_fd;
+static int			print_free_counters_hint;
 
 struct perf_stat {
 	bool			 record;
@@ -1109,6 +1110,9 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
 			counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
 			csv_sep);
 
+		if (counter->supported)
+			print_free_counters_hint = 1;
+
 		fprintf(stat_config.output, "%-*s%s",
 			csv_output ? 0 : unit_width,
 			counter->unit, csv_sep);
@@ -1476,7 +1480,13 @@ static void print_footer(void)
 		print_noise_pct(stddev_stats(&walltime_nsecs_stats),
 				avg_stats(&walltime_nsecs_stats));
 	}
+
 	fprintf(output, "\n\n");
+
+	if (print_free_counters_hint)
+		fprintf(output,
+"Some events couldn't be scheduled. Consider freeing some counters by disabling the NMI watchdog temporarily, \n"
+"for example. See perf-list(1) manpage.\n");
 }
 
 static void print_counters(struct timespec *ts, int argc, const char **argv)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-11 17:03                   ` Borislav Petkov
@ 2017-02-11 17:59                     ` Ingo Molnar
  2017-02-11 18:32                       ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2017-02-11 17:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> Ok,
> 
> turns out perf-list(1) already talks about it in the "EVENT GROUPS"
> section. How about this then:
> 
> ---
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index a02f2e965628..2d18283574db 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -146,6 +146,7 @@ static aggr_get_id_t		aggr_get_id;
>  static bool			append_file;
>  static const char		*output_name;
>  static int			output_fd;
> +static int			print_free_counters_hint;
>  
>  struct perf_stat {
>  	bool			 record;
> @@ -1109,6 +1110,9 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
>  			counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
>  			csv_sep);
>  
> +		if (counter->supported)
> +			print_free_counters_hint = 1;
> +
>  		fprintf(stat_config.output, "%-*s%s",
>  			csv_output ? 0 : unit_width,
>  			counter->unit, csv_sep);
> @@ -1476,7 +1480,13 @@ static void print_footer(void)
>  		print_noise_pct(stddev_stats(&walltime_nsecs_stats),
>  				avg_stats(&walltime_nsecs_stats));
>  	}
> +
>  	fprintf(output, "\n\n");
> +
> +	if (print_free_counters_hint)
> +		fprintf(output,
> +"Some events couldn't be scheduled. Consider freeing some counters by disabling the NMI watchdog temporarily, \n"
> +"for example. See perf-list(1) manpage.\n");

So I checked the perf-list manpage and it didn't tell me much about how to disable 
the NMI watchdog.

How about a more proactive hint, something like:

	To disable the NMI watchdog permanently, do:

		sudo echo kernel.nmi_watchdog=0 >> /etc/sysctl.conf
		sudo sysctl -p

? (untested!)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-11 17:59                     ` Ingo Molnar
@ 2017-02-11 18:32                       ` Borislav Petkov
  2017-02-11 20:41                         ` Ingo Molnar
  2017-03-07  7:21                         ` [tip:perf/core] perf stat: Issue a HW watchdog disable hint tip-bot for Borislav Petkov
  0 siblings, 2 replies; 26+ messages in thread
From: Borislav Petkov @ 2017-02-11 18:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml

On Sat, Feb 11, 2017 at 06:59:10PM +0100, Ingo Molnar wrote:
> So I checked the perf-list manpage and it didn't tell me much about how to disable 
> the NMI watchdog.

Oh, it is buried there:

"
EVENT GROUPS
------------

...


Globally pinned events can limit the number of counters available for
other groups. On x86 systems, the NMI watchdog pins a counter by default.
The nmi watchdog can be disabled as root with

	echo 0 > /proc/sys/kernel/nmi_watchdog"

> How about a more proactive hint, something like:
> 
> 	To disable the NMI watchdog permanently, do:

Why permanently? We want it to run and be disabled around the
measurement only. Anyway, here's something more to the point:

---
From: Borislav Petkov <bp@suse.de>
Date: Tue, 7 Feb 2017 01:40:05 +0100
Subject: [PATCH -v2] perf stat: Issue a HW watchdog disable hint

When using perf stat on an AMD F15h system with the default hw events
attributes, some of the events don't get counted:

 Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.072 M/sec
         1,122,815      cycles                    #    1.499 GHz
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
     <not counted>      stalled-cycles-backend                                        (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                                                  (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                                                      (0.00%)
     <not counted>      branch-misses                                                 (0.00%)

       1.001550070 seconds time elapsed

The reason is that we have the HW watchdog consuming one PMU counter
and when perf tries to schedule 6 events on 6 counters and some of
those counters are constrained to only a specific subset of PMCs by the
hardware, the event scheduling fails.

So issue a hint to disable the HW watchdog around a perf stat session.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 tools/perf/builtin-stat.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a02f2e965628..a2763243a03d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -146,6 +146,7 @@ static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
 static const char		*output_name;
 static int			output_fd;
+static int			print_free_counters_hint;
 
 struct perf_stat {
 	bool			 record;
@@ -1109,6 +1110,9 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
 			counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
 			csv_sep);
 
+		if (counter->supported)
+			print_free_counters_hint = 1;
+
 		fprintf(stat_config.output, "%-*s%s",
 			csv_output ? 0 : unit_width,
 			counter->unit, csv_sep);
@@ -1477,6 +1481,13 @@ static void print_footer(void)
 				avg_stats(&walltime_nsecs_stats));
 	}
 	fprintf(output, "\n\n");
+
+	if (print_free_counters_hint)
+		fprintf(output,
+"Some events weren't counted. Try disabling the NMI watchdog:\n"
+"	echo 0 > /proc/sys/kernel/nmi_watchdog\n"
+"	perf stat ...\n"
+"	echo 1 > /proc/sys/kernel/nmi_watchdog\n");
 }
 
 static void print_counters(struct timespec *ts, int argc, const char **argv)
-- 
2.11.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] perf/stat: Add --disable-hwdt
  2017-02-11 18:32                       ` Borislav Petkov
@ 2017-02-11 20:41                         ` Ingo Molnar
  2017-03-07  7:21                         ` [tip:perf/core] perf stat: Issue a HW watchdog disable hint tip-bot for Borislav Petkov
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2017-02-11 20:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Robert Richter, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Vince Weaver, lkml


* Borislav Petkov <bp@alien8.de> wrote:

> On Sat, Feb 11, 2017 at 06:59:10PM +0100, Ingo Molnar wrote:
> > So I checked the perf-list manpage and it didn't tell me much about how to disable 
> > the NMI watchdog.
> 
> Oh, it is buried there:
> 
> "
> EVENT GROUPS
> ------------
> 
> ...
> 
> 
> Globally pinned events can limit the number of counters available for
> other groups. On x86 systems, the NMI watchdog pins a counter by default.
> The nmi watchdog can be disabled as root with
> 
> 	echo 0 > /proc/sys/kernel/nmi_watchdog"
> 
> > How about a more proactive hint, something like:
> > 
> > 	To disable the NMI watchdog permanently, do:
> 
> Why permanently? We want it to run and be disabled around the
> measurement only. Anyway, here's something more to the point:

> +	if (print_free_counters_hint)
> +		fprintf(output,
> +"Some events weren't counted. Try disabling the NMI watchdog:\n"
> +"	echo 0 > /proc/sys/kernel/nmi_watchdog\n"
> +"	perf stat ...\n"
> +"	echo 1 > /proc/sys/kernel/nmi_watchdog\n");
>  }

Ok, looks good to me!

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip:perf/core] perf stat: Issue a HW watchdog disable hint
  2017-02-11 18:32                       ` Borislav Petkov
  2017-02-11 20:41                         ` Ingo Molnar
@ 2017-03-07  7:21                         ` tip-bot for Borislav Petkov
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot for Borislav Petkov @ 2017-03-07  7:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: vince, mingo, tglx, bp, rric, linux-kernel, acme, peterz, hpa

Commit-ID:  02d492e5dcb72c004d213756eb87c9d62a6d76a7
Gitweb:     http://git.kernel.org/tip/02d492e5dcb72c004d213756eb87c9d62a6d76a7
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Tue, 7 Feb 2017 01:40:05 +0100
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 3 Mar 2017 19:07:13 -0300

perf stat: Issue a HW watchdog disable hint

When using perf stat on an AMD F15h system with the default hw events
attributes, some of the events don't get counted:

 Performance counter stats for 'sleep 1':

          0.749208      task-clock (msec)         #    0.001 CPUs utilized
                 1      context-switches          #    0.001 M/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.072 M/sec
         1,122,815      cycles                    #    1.499 GHz
           286,740      stalled-cycles-frontend   #   25.54% frontend cycles idle
     <not counted>      stalled-cycles-backend                                        (0.00%)
     ^^^^^^^^^^^^
     <not counted>      instructions                                                  (0.00%)
     ^^^^^^^^^^^^
     <not counted>      branches                                                      (0.00%)
     <not counted>      branch-misses                                                 (0.00%)

       1.001550070 seconds time elapsed

The reason is that we have the HW watchdog consuming one PMU counter and
when perf tries to schedule 6 events on 6 counters and some of those
counters are constrained to only a specific subset of PMCs by the
hardware, the event scheduling fails.

So issue a hint to disable the HW watchdog around a perf stat session.

Committer note:

Testing it...

  # perf stat -d usleep 1

   Performance counter stats for 'usleep 1':

          1.180203      task-clock (msec)         #    0.490 CPUs utilized
                 1      context-switches          #    0.847 K/sec
                 0      cpu-migrations            #    0.000 K/sec
                54      page-faults               #    0.046 M/sec
           184,754      cycles                    #    0.157 GHz
           714,553      instructions              #    3.87  insn per cycle
           154,661      branches                  #  131.046 M/sec
             7,247      branch-misses             #    4.69% of all branches
           219,984      L1-dcache-loads           #  186.395 M/sec
            17,600      L1-dcache-load-misses     #    8.00% of all L1-dcache hits    (90.16%)
     <not counted>      LLC-loads                                                     (0.00%)
     <not counted>      LLC-load-misses                                               (0.00%)

       0.002406823 seconds time elapsed

  Some events weren't counted. Try disabling the NMI watchdog:
	echo 0 > /proc/sys/kernel/nmi_watchdog
	perf stat ...
	echo 1 > /proc/sys/kernel/nmi_watchdog
  #

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Vince Weaver <vince@deater.net>
Link: http://lkml.kernel.org/r/20170211183218.ijnvb5f7ciyuunx4@pd.tnic
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-stat.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 13b5499..f4f555a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -146,6 +146,7 @@ static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
 static const char		*output_name;
 static int			output_fd;
+static int			print_free_counters_hint;
 
 struct perf_stat {
 	bool			 record;
@@ -1109,6 +1110,9 @@ static void printout(int id, int nr, struct perf_evsel *counter, double uval,
 			counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
 			csv_sep);
 
+		if (counter->supported)
+			print_free_counters_hint = 1;
+
 		fprintf(stat_config.output, "%-*s%s",
 			csv_output ? 0 : unit_width,
 			counter->unit, csv_sep);
@@ -1477,6 +1481,13 @@ static void print_footer(void)
 				avg_stats(&walltime_nsecs_stats));
 	}
 	fprintf(output, "\n\n");
+
+	if (print_free_counters_hint)
+		fprintf(output,
+"Some events weren't counted. Try disabling the NMI watchdog:\n"
+"	echo 0 > /proc/sys/kernel/nmi_watchdog\n"
+"	perf stat ...\n"
+"	echo 1 > /proc/sys/kernel/nmi_watchdog\n");
 }
 
 static void print_counters(struct timespec *ts, int argc, const char **argv)

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-03-07  8:50 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-06 12:15 [RFC PATCH] perf/stat: Add --disable-hwdt Borislav Petkov
2017-02-06 12:22 ` Ingo Molnar
2017-02-06 12:41   ` Borislav Petkov
2017-02-06 12:44     ` Ingo Molnar
2017-02-06 12:49       ` Borislav Petkov
2017-02-06 13:18         ` Robert Richter
2017-02-06 13:23           ` Borislav Petkov
2017-02-07  7:25             ` Ingo Molnar
2017-02-07 10:54               ` Borislav Petkov
2017-02-07 15:06                 ` Borislav Petkov
2017-02-11 17:03                   ` Borislav Petkov
2017-02-11 17:59                     ` Ingo Molnar
2017-02-11 18:32                       ` Borislav Petkov
2017-02-11 20:41                         ` Ingo Molnar
2017-03-07  7:21                         ` [tip:perf/core] perf stat: Issue a HW watchdog disable hint tip-bot for Borislav Petkov
2017-02-06 14:23           ` [RFC PATCH] perf/stat: Add --disable-hwdt Vince Weaver
2017-02-06 17:02             ` Borislav Petkov
2017-02-07  1:08         ` Borislav Petkov
2017-02-07  1:09           ` [PATCH 1/2] tools/lib/api/fs: Add procfs int read/write helpers Borislav Petkov
2017-02-07  1:43             ` Arnaldo Carvalho de Melo
2017-02-07 10:30               ` Borislav Petkov
2017-02-07 15:00                 ` Arnaldo Carvalho de Melo
2017-02-07 15:08                   ` Borislav Petkov
2017-02-07 15:34                     ` Arnaldo Carvalho de Melo
2017-02-07  1:10           ` [PATCH 2/2] perf stat: Disable HW watchdog around a perf stat session Borislav Petkov
2017-02-07  1:45             ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.