linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3 v3] perf: precise mode and exclude_guest
@ 2012-09-13 20:59 David Ahern
  2012-09-13 20:59 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: David Ahern @ 2012-09-13 20:59 UTC (permalink / raw)
  To: acme, linux-kernel, peterz; +Cc: David Ahern

Hopefully this wraps up the precise mode-exclude_guest dependency.
I'm sure someone will let me know if I screwed up the attribution
in the second patch.

David Ahern (2):
  perf tool: precise mode requires exclude_guest
  perf tool: give user better message if precise is not supported

Peter Zijlstra (1):
  perf: require exclude_guest to use PEBS - kernel side enforcement

v3: removed extra tab in patch 2
v2: updated commit messages for patches 1 and 2

 arch/x86/kernel/cpu/perf_event.c |    6 ++++++
 tools/perf/builtin-record.c      |    5 +++++
 tools/perf/builtin-top.c         |    4 ++++
 tools/perf/util/parse-events.c   |    3 +++
 4 files changed, 18 insertions(+)

-- 
1.7.10.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] perf tool: precise mode requires exclude_guest
  2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
@ 2012-09-13 20:59 ` David Ahern
  2012-10-20  0:55   ` [tip:perf/urgent] perf tool: Precise " tip-bot for David Ahern
  2012-09-13 20:59 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2012-09-13 20:59 UTC (permalink / raw)
  To: acme, linux-kernel, peterz
  Cc: David Ahern, Ingo Molnar, Robert Richter, Gleb Natapov, Avi Kivity

Summary of events per Peter:

  "Intel PEBS in VT-x context uses the DS address as a guest linear address,
  even though its programmed by the host as a host linear address. This
  either results in guest memory corruption and or the hardware faulting and
  'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
  enter and re-enable on VT-x exit, enforcing a strict exclude_guest.

  AMB IBS does work but doesn't currently support exclude_* at all,
  setting an exclude_* bit will make it fail."

This patch handles userspace perf command, setting the exclude_guest
attribute if precise mode is requested, but only if a user has not
specified a request for guest or host only profiling (G or H attribute).

Kernel side AMD currently ignores all exclude_* bits, so there is no impact
to existing IBS code paths. Robert Richter has a patch where IBS code will
return EINVAL if an exclude_* bit is set. When this goes in it means use
of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
will be set). Then the existing fallback code within perf will unset
exclude_guest and try again. The second attempt will succeed if the CPU
supports IBS profiling.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: https://lkml.org/lkml/2012/7/9/264
---
 tools/perf/util/parse-events.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 44afcf4..696cc7e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -694,6 +694,9 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
 			eH = 0;
 		} else if (*str == 'p') {
 			precise++;
+			/* use of precise requires exclude_guest */
+			if (!exclude_GH)
+				eG = 1;
 		} else
 			break;
 
-- 
1.7.10.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement
  2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
  2012-09-13 20:59 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
@ 2012-09-13 20:59 ` David Ahern
  2012-10-20  0:56   ` [tip:perf/urgent] perf: Require " tip-bot for Peter Zijlstra
  2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2012-09-13 20:59 UTC (permalink / raw)
  To: acme, linux-kernel, peterz
  Cc: David Ahern, Ingo Molnar, Robert Richter, Gleb Natapov, Avi Kivity

From: Peter Zijlstra <peterz@infradead.org>

Per Peter:
  "Intel PEBS in VT-x context uses the DS address as a guest linear address,
  even though its programmed by the host as a host linear address. This
  either results in guest memory corruption and or the hardware faulting and
  'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
  enter and re-enable on VT-x exit, enforcing a strict exclude_guest.

This patch enforces exclude_guest kernel side.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: https://lkml.org/lkml/2012/7/9/298
---
 arch/x86/kernel/cpu/perf_event.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 915b876..4bc96c5 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -338,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event)
 		/* BTS is currently only allowed for user-mode. */
 		if (!attr->exclude_kernel)
 			return -EOPNOTSUPP;
+
+		if (!attr->exclude_guest)
+			return -EOPNOTSUPP;
 	}
 
 	hwc->config |= config;
@@ -380,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip) {
 		int precise = 0;
 
+		if (!event->attr.exclude_guest)
+			return -EOPNOTSUPP;
+
 		/* Support for constant skid */
 		if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
 			precise++;
-- 
1.7.10.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
  2012-09-13 20:59 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
  2012-09-13 20:59 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
@ 2012-09-13 20:59 ` David Ahern
  2012-09-14  5:43   ` Ingo Molnar
  2012-10-25  8:01   ` [tip:perf/core] perf tools: Give " tip-bot for David Ahern
  2012-09-26  1:24 ` [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
  2012-10-09 15:08 ` David Ahern
  4 siblings, 2 replies; 19+ messages in thread
From: David Ahern @ 2012-09-13 20:59 UTC (permalink / raw)
  To: acme, linux-kernel, peterz; +Cc: David Ahern, Ingo Molnar, Robert Richter

Platforms (e.g., VM's) without support for precise mode get a confusing
error message. e.g.,
$ perf record -e cycles:p -a -- sleep 1

  Error: sys_perf_event_open() syscall returned with 95 (Operation not
  supported).  /bin/dmesg may provide additional information.

  No hardware sampling interrupt available. No APIC? If so then you can
  boot the kernel with the "lapic" boot parameter to force-enable it.
  sleep: Terminated

which is not clear that precise mode might be the root problem. With this
patch:

$ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
  Error:
  'precise' request may not be supported. Try removing 'p' modifier
  sleep: Terminated

v2: softened message to 'may not be' supported per Robert's suggestion

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
---
 tools/perf/builtin-record.c |    5 +++++
 tools/perf/builtin-top.c    |    4 ++++
 2 files changed, 9 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c643ed6..385e272 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -294,6 +294,11 @@ try_again:
 					  perf_evsel__name(pos));
 				rc = -err;
 				goto out;
+			} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+				ui__error("\'precise\' request may not be supported. "
+					  "Try removing 'p' modifier\n");
+				rc = -err;
+				goto out;
 			}
 
 			printf("\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5550754..1d1c98c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -976,6 +976,10 @@ try_again:
 				ui__error("Too many events are opened.\n"
 					    "Try again after reducing the number of events\n");
 				goto out_err;
+			} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+				ui__error("\'precise\' request may not be supported. "
+					  "Try removing 'p' modifier\n");
+				goto out_err;
 			}
 
 			ui__error("The sys_perf_event_open() syscall "
-- 
1.7.10.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
@ 2012-09-14  5:43   ` Ingo Molnar
  2012-09-14 11:13     ` David Ahern
  2012-10-25  8:01   ` [tip:perf/core] perf tools: Give " tip-bot for David Ahern
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-09-14  5:43 UTC (permalink / raw)
  To: David Ahern; +Cc: acme, linux-kernel, peterz, Robert Richter


* David Ahern <dsahern@gmail.com> wrote:

> Platforms (e.g., VM's) without support for precise mode get a confusing
> error message. e.g.,
> $ perf record -e cycles:p -a -- sleep 1
> 
>   Error: sys_perf_event_open() syscall returned with 95 (Operation not
>   supported).  /bin/dmesg may provide additional information.
> 
>   No hardware sampling interrupt available. No APIC? If so then you can
>   boot the kernel with the "lapic" boot parameter to force-enable it.
>   sleep: Terminated
> 
> which is not clear that precise mode might be the root problem. With this
> patch:
> 
> $ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
>   Error:
>   'precise' request may not be supported. Try removing 'p' modifier
>   sleep: Terminated
> 
> v2: softened message to 'may not be' supported per Robert's suggestion

Well, either it's supported on this machine or it's not - why 
does the text have to be so unsure about it?

We use computers to increase determinism, not to insert extra 
uncertainty! ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14  5:43   ` Ingo Molnar
@ 2012-09-14 11:13     ` David Ahern
  2012-09-14 11:36       ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2012-09-14 11:13 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: acme, linux-kernel, peterz, Robert Richter

On 9/13/12 11:43 PM, Ingo Molnar wrote:
>> v2: softened message to 'may not be' supported per Robert's suggestion
>
> Well, either it's supported on this machine or it's not - why
> does the text have to be so unsure about it?

Because EOPNOTSUPP is returned for more than just precise mode. We 
cannot say with certainty that the precise attribute caused that errno.

David



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 11:13     ` David Ahern
@ 2012-09-14 11:36       ` Ingo Molnar
  2012-09-14 11:43         ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-09-14 11:36 UTC (permalink / raw)
  To: David Ahern; +Cc: acme, linux-kernel, peterz, Robert Richter


* David Ahern <dsahern@gmail.com> wrote:

> On 9/13/12 11:43 PM, Ingo Molnar wrote:
> >>v2: softened message to 'may not be' supported per Robert's suggestion
> >
> > Well, either it's supported on this machine or it's not - 
> > why does the text have to be so unsure about it?
> 
> Because EOPNOTSUPP is returned for more than just precise 
> mode. We cannot say with certainty that the precise attribute 
> caused that errno.

Well, then that is useful information we *lost*, and that 
situation needs to be improved on the ABI side: an expanded 
error code present in the event structure, copied back to 
user-space on errors, or so.

(Alternatively, a special event channel just to pass back 
expanded error conditions.)

Computers are supposed to make life easier for humans, by 
answering such "what did go wrong?" questions. Our losing of 
precise error conditions is a usability bug really - and in the 
perf project we are in a unique position to be able to improve 
both the kernel side code and make immediate use of it on the 
tooling side as well.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 11:36       ` Ingo Molnar
@ 2012-09-14 11:43         ` David Ahern
  2012-09-14 18:00           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2012-09-14 11:43 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: acme, linux-kernel, peterz, Robert Richter

On 9/14/12 5:36 AM, Ingo Molnar wrote:
> Well, then that is useful information we *lost*, and that
> situation needs to be improved on the ABI side: an expanded
> error code present in the event structure, copied back to
> user-space on errors, or so.
>
> (Alternatively, a special event channel just to pass back
> expanded error conditions.)
>
> Computers are supposed to make life easier for humans, by
> answering such "what did go wrong?" questions. Our losing of
> precise error conditions is a usability bug really - and in the
> perf project we are in a unique position to be able to improve
> both the kernel side code and make immediate use of it on the
> tooling side as well.

Understood and there have been suggestions on how to definitely state 
what the kernel side did not like. I like Peter's last suggestion -- 
something along the lines of clearing attr on a failure except the 
offending setting.

David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 11:43         ` David Ahern
@ 2012-09-14 18:00           ` Arnaldo Carvalho de Melo
  2012-09-14 18:07             ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-14 18:00 UTC (permalink / raw)
  To: David Ahern; +Cc: Ingo Molnar, linux-kernel, peterz, Robert Richter

Em Fri, Sep 14, 2012 at 05:43:42AM -0600, David Ahern escreveu:
> On 9/14/12 5:36 AM, Ingo Molnar wrote:

> >Well, then that is useful information we *lost*, and that situation
> >needs to be improved on the ABI side: an expanded error code present
> >in the event structure, copied back to user-space on errors, or so.

> Understood and there have been suggestions on how to definitely state
> what the kernel side did not like. I like Peter's last suggestion --
> something along the lines of clearing attr on a failure except the
> offending setting.

I think ws need to use a new bit, attr.clear_opsup_on_error,
that we would set, older kernels would return an error cause they don't
support this new feature, the tool would then clear it and work as
today, giving a vague message, cause that is all it can do.

We can't just clean the unsupported bits because then older tools would
get completely confused when trying to use the current fallback
mechanism, where, for instance, for sample_id_all, we just reset it in
the tooling side, ask again the kernel and it works.

Older tools not setting the clear_supported_bits_on_error would get the
current behaviour.

- Arnaldo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 18:00           ` Arnaldo Carvalho de Melo
@ 2012-09-14 18:07             ` Peter Zijlstra
  2012-09-14 20:05               ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2012-09-14 18:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Ahern, Ingo Molnar, linux-kernel, Robert Richter

On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > Understood and there have been suggestions on how to definitely state
> > what the kernel side did not like. I like Peter's last suggestion --
> > something along the lines of clearing attr on a failure except the
> > offending setting.
> 
> I think ws need to use a new bit 

Quite so, for all the reasons you list. But you like the general idea? I
wasn't sure I did, but it was the only thing I could come up with that
would sort of do what we need it to.

The fact that you destroy the user input is awkward, I don't think
there's another syscall that behaves in this fashion.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 18:07             ` Peter Zijlstra
@ 2012-09-14 20:05               ` Ingo Molnar
  2012-09-14 20:11                 ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-09-14 20:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > > Understood and there have been suggestions on how to definitely state
> > > what the kernel side did not like. I like Peter's last suggestion --
> > > something along the lines of clearing attr on a failure except the
> > > offending setting.
> > 
> > I think ws need to use a new bit 
> 
> Quite so, for all the reasons you list. But you like the 
> general idea? I wasn't sure I did, but it was the only thing I 
> could come up with that would sort of do what we need it to.
> 
> The fact that you destroy the user input is awkward, I don't 
> think there's another syscall that behaves in this fashion.

Destroying/clearing stuff looks really hacky.

Why not use a single error status field, set via a long list of 
enum error constants, a 'perf errnos'?

The only real problem with the kernel's syscall error code is 
that it's not wide enough for historic reasons, so we cannot 
just create our own errnos. But we can create our errors in the 
attr just fine and make them finegrained enough so that tooling 
can figure out what happened exactly when it gets a syscall 
error.

Yes, that's old-fashioned technology, but it works. With time we 
could put some structure into the list of error IDs, to make it 
easily extensible yet grouped in some fashion, etc.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 20:05               ` Ingo Molnar
@ 2012-09-14 20:11                 ` Ingo Molnar
  2012-09-14 21:26                   ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2012-09-14 20:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter


* Ingo Molnar <mingo@kernel.org> wrote:

> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > > > Understood and there have been suggestions on how to definitely state
> > > > what the kernel side did not like. I like Peter's last suggestion --
> > > > something along the lines of clearing attr on a failure except the
> > > > offending setting.
> > > 
> > > I think ws need to use a new bit 
> > 
> > Quite so, for all the reasons you list. But you like the 
> > general idea? I wasn't sure I did, but it was the only thing I 
> > could come up with that would sort of do what we need it to.
> > 
> > The fact that you destroy the user input is awkward, I don't 
> > think there's another syscall that behaves in this fashion.
> 
> Destroying/clearing stuff looks really hacky.
> 
> Why not use a single error status field, set via a long list 
> of enum error constants, a 'perf errnos'?
> 
> The only real problem with the kernel's syscall error code is 
> that it's not wide enough for historic reasons, so we cannot 
> just create our own errnos. But we can create our errors in 
> the attr just fine and make them finegrained enough so that 
> tooling can figure out what happened exactly when it gets a 
> syscall error.
> 
> Yes, that's old-fashioned technology, but it works. With time 
> we could put some structure into the list of error IDs, to 
> make it easily extensible yet grouped in some fashion, etc.

And we could start this small: just covering the problem at hand 
or so, with proper tooling side support - and then extend it as 
needed. I.e. have something in place that we could ask people 
adding new kernel side error paths to use - and with time old 
code could be enriched with better error codes as well.

perf could in fact internally standardize on a 32-bit wide error 
code, and translate it to -EINVAL at the syscall return level. 
That way we could keep all the simple error paths as they are 
today, with a very little syscall level logic (triggered in 
error cases only, so no overhead in the regular case).

I.e.:

  return -EPERF_CPU_PRECISE_EV_NOTSUPP;

would result in that error code being available to new, improved 
tooling that knows about the new attr field. Old tooling would 
still work as well, it would only see the -EINVAL of the 
syscall.

Or something like this - simple and using existing patterns and 
practices. Anything else will likely fail due to non-use.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 20:11                 ` Ingo Molnar
@ 2012-09-14 21:26                   ` Peter Zijlstra
  2012-09-17  7:11                     ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2012-09-14 21:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter

On Fri, 2012-09-14 at 22:11 +0200, Ingo Molnar wrote:
>   return -EPERF_CPU_PRECISE_EV_NOTSUPP;

I just don't like having to enumerate all possible fails, I'm too lazy.
Can't we be smarter about that? Could we do a {reason}x{bit-offset} like
thing?

Where we limit reason to a few simple things like:

 invalid
 out-of-range
 not-supported

and have the bit-offset indicate the field we're having the particular
problem with?

Then all we need is a smart way to generate and map the bit-offsets
without too much manual labour. 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
  2012-09-14 21:26                   ` Peter Zijlstra
@ 2012-09-17  7:11                     ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2012-09-17  7:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, 2012-09-14 at 22:11 +0200, Ingo Molnar wrote:
> >   return -EPERF_CPU_PRECISE_EV_NOTSUPP;
> 
> I just don't like having to enumerate all possible fails, I'm 
> too lazy. Can't we be smarter about that? Could we do a 
> {reason}x{bit-offset} like thing?
> 
> Where we limit reason to a few simple things like:
> 
>  invalid
>  out-of-range
>  not-supported
> 
> and have the bit-offset indicate the field we're having the particular
> problem with?
> 
> Then all we need is a smart way to generate and map the bit-offsets
> without too much manual labour. 

Putting the 'where' into a separate field would do that, and 
thus we could generate and report such structured errors as well 
- but nevertheless there will always be special/individual 
errors as well that won't fit into such a scheme, for which we 
should include a 'boring' errno range as well ...

I.e. a {where},{what} s32 pair of fields - if 'where' is zero 
then 'what' is the enumerated errno value I suggested, if it's 
nonzero then it's the 'where' indication you suggested.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3 v3] perf: precise mode and exclude_guest
  2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
                   ` (2 preceding siblings ...)
  2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
@ 2012-09-26  1:24 ` David Ahern
  2012-10-09 15:08 ` David Ahern
  4 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2012-09-26  1:24 UTC (permalink / raw)
  To: David Ahern, acme, peterz; +Cc: linux-kernel

On 9/13/12 2:59 PM, David Ahern wrote:
> Hopefully this wraps up the precise mode-exclude_guest dependency.
> I'm sure someone will let me know if I screwed up the attribution
> in the second patch.
>
> David Ahern (2):
>    perf tool: precise mode requires exclude_guest
>    perf tool: give user better message if precise is not supported
>
> Peter Zijlstra (1):
>    perf: require exclude_guest to use PEBS - kernel side enforcement
>
> v3: removed extra tab in patch 2
> v2: updated commit messages for patches 1 and 2

Ping? Is this version Peter approved?

David


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3 v3] perf: precise mode and exclude_guest
  2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
                   ` (3 preceding siblings ...)
  2012-09-26  1:24 ` [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
@ 2012-10-09 15:08 ` David Ahern
  4 siblings, 0 replies; 19+ messages in thread
From: David Ahern @ 2012-10-09 15:08 UTC (permalink / raw)
  To: acme, peterz; +Cc: linux-kernel

On 9/13/12 2:59 PM, David Ahern wrote:
> Hopefully this wraps up the precise mode-exclude_guest dependency.
> I'm sure someone will let me know if I screwed up the attribution
> in the second patch.
>
> David Ahern (2):
>    perf tool: precise mode requires exclude_guest
>    perf tool: give user better message if precise is not supported
>
> Peter Zijlstra (1):
>    perf: require exclude_guest to use PEBS - kernel side enforcement
>
> v3: removed extra tab in patch 2
> v2: updated commit messages for patches 1 and 2

Any chance this get committed? At the least userspace code? It has been 
3 months since I first reported the problem and what appears to be an 
acceptable solution has been lingering for a month.

David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip:perf/urgent] perf tool: Precise mode requires exclude_guest
  2012-09-13 20:59 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
@ 2012-10-20  0:55   ` tip-bot for David Ahern
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for David Ahern @ 2012-10-20  0:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, peterz, gleb, robert.richter,
	dsahern, tglx, avi

Commit-ID:  1342798cc13e3b48d9b5738f0c8fa812ccea8101
Gitweb:     http://git.kernel.org/tip/1342798cc13e3b48d9b5738f0c8fa812ccea8101
Author:     David Ahern <dsahern@gmail.com>
AuthorDate: Thu, 13 Sep 2012 14:59:13 -0600
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 16 Oct 2012 12:43:31 -0300

perf tool: Precise mode requires exclude_guest

Summary of events per Peter:

  "Intel PEBS in VT-x context uses the DS address as a guest linear address,
  even though its programmed by the host as a host linear address. This
  either results in guest memory corruption and or the hardware faulting and
  'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
  enter and re-enable on VT-x exit, enforcing a strict exclude_guest.

  AMB IBS does work but doesn't currently support exclude_* at all,
  setting an exclude_* bit will make it fail."

This patch handles userspace perf command, setting the exclude_guest
attribute if precise mode is requested, but only if a user has not
specified a request for guest or host only profiling (G or H attribute).

Kernel side AMD currently ignores all exclude_* bits, so there is no impact
to existing IBS code paths. Robert Richter has a patch where IBS code will
return EINVAL if an exclude_* bit is set. When this goes in it means use
of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
will be set). Then the existing fallback code within perf will unset
exclude_guest and try again. The second attempt will succeed if the CPU
supports IBS profiling.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Robert Richter <robert.richter@amd.com>
Tested-by: Robert Richter <robert.richter@amd.com>
Reviewed-by: Robert Richter <robert.richter@amd.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1347569955-54626-2-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index aed38e4..75c7b0f 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -690,6 +690,9 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
 			eH = 0;
 		} else if (*str == 'p') {
 			precise++;
+			/* use of precise requires exclude_guest */
+			if (!exclude_GH)
+				eG = 1;
 		} else
 			break;
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:perf/urgent] perf: Require exclude_guest to use PEBS - kernel side enforcement
  2012-09-13 20:59 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
@ 2012-10-20  0:56   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-10-20  0:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, peterz, gleb, robert.richter,
	dsahern, tglx, avi

Commit-ID:  20b279ddb38ca42f8863cec07b4d45ec24589f13
Gitweb:     http://git.kernel.org/tip/20b279ddb38ca42f8863cec07b4d45ec24589f13
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 13 Sep 2012 14:59:14 -0600
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 16 Oct 2012 12:43:58 -0300

perf: Require exclude_guest to use PEBS - kernel side enforcement

Intel PEBS in VT-x context uses the DS address as a guest linear
address, even though its programmed by the host as a host linear
address. This either results in guest memory corruption and or the
hardware faulting and 'crashing' the virtual machine.  Therefore we have
to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a
strict exclude_guest.

This patch enforces exclude_guest kernel side.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1347569955-54626-3-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 arch/x86/kernel/cpu/perf_event.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 915b876..3373f84 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -338,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event)
 		/* BTS is currently only allowed for user-mode. */
 		if (!attr->exclude_kernel)
 			return -EOPNOTSUPP;
+
+		if (!attr->exclude_guest)
+			return -EOPNOTSUPP;
 	}
 
 	hwc->config |= config;
@@ -380,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip) {
 		int precise = 0;
 
+		if (!event->attr.exclude_guest)
+			return -EOPNOTSUPP;
+
 		/* Support for constant skid */
 		if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
 			precise++;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip:perf/core] perf tools: Give user better message if precise is not supported
  2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
  2012-09-14  5:43   ` Ingo Molnar
@ 2012-10-25  8:01   ` tip-bot for David Ahern
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for David Ahern @ 2012-10-25  8:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, peterz, robert.richter, dsahern, tglx

Commit-ID:  2305c82fb35dd2c8c9533303bb1693f1636c66e4
Gitweb:     http://git.kernel.org/tip/2305c82fb35dd2c8c9533303bb1693f1636c66e4
Author:     David Ahern <dsahern@gmail.com>
AuthorDate: Thu, 13 Sep 2012 14:59:15 -0600
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Wed, 24 Oct 2012 14:20:11 -0200

perf tools: Give user better message if precise is not supported

Platforms (e.g., VM's) without support for precise mode get a confusing
error message. e.g.,
$ perf record -e cycles:p -a -- sleep 1

  Error: sys_perf_event_open() syscall returned with 95 (Operation not
  supported).  /bin/dmesg may provide additional information.

  No hardware sampling interrupt available. No APIC? If so then you can
  boot the kernel with the "lapic" boot parameter to force-enable it.
  sleep: Terminated

which is not clear that precise mode might be the root problem. With this
patch:

$ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
  Error:
  'precise' request may not be supported. Try removing 'p' modifier
  sleep: Terminated

v2: softened message to 'may not be' supported per Robert's suggestion

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/1347569955-54626-4-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c |    5 +++++
 tools/perf/builtin-top.c    |    4 ++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 73b5d7f..53c9892 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -317,6 +317,11 @@ try_again:
 					  perf_evsel__name(pos));
 				rc = -err;
 				goto out;
+			} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+				ui__error("\'precise\' request may not be supported. "
+					  "Try removing 'p' modifier\n");
+				rc = -err;
+				goto out;
 			}
 
 			printf("\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index fb9da71..f2ecd49 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -977,6 +977,10 @@ try_again:
 				ui__error("Too many events are opened.\n"
 					    "Try again after reducing the number of events\n");
 				goto out_err;
+			} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+				ui__error("\'precise\' request may not be supported. "
+					  "Try removing 'p' modifier\n");
+				goto out_err;
 			}
 
 			ui__error("The sys_perf_event_open() syscall "

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-10-25  8:01 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
2012-09-13 20:59 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
2012-10-20  0:55   ` [tip:perf/urgent] perf tool: Precise " tip-bot for David Ahern
2012-09-13 20:59 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
2012-10-20  0:56   ` [tip:perf/urgent] perf: Require " tip-bot for Peter Zijlstra
2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
2012-09-14  5:43   ` Ingo Molnar
2012-09-14 11:13     ` David Ahern
2012-09-14 11:36       ` Ingo Molnar
2012-09-14 11:43         ` David Ahern
2012-09-14 18:00           ` Arnaldo Carvalho de Melo
2012-09-14 18:07             ` Peter Zijlstra
2012-09-14 20:05               ` Ingo Molnar
2012-09-14 20:11                 ` Ingo Molnar
2012-09-14 21:26                   ` Peter Zijlstra
2012-09-17  7:11                     ` Ingo Molnar
2012-10-25  8:01   ` [tip:perf/core] perf tools: Give " tip-bot for David Ahern
2012-09-26  1:24 ` [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
2012-10-09 15:08 ` David Ahern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).