* [PATCH 0/3 v2] perf: precise mode and exclude_guest
@ 2012-09-12 15:16 David Ahern
2012-09-12 15:16 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: David Ahern @ 2012-09-12 15:16 UTC (permalink / raw)
To: acme, linux-kernel, peterz; +Cc: David Ahern
Hopefully this wraps up the precise mode-exclude_guest dependency.
I'm sure someone will let me know if I screwed up the attribution
in the second patch.
David Ahern (2):
perf tool: precise mode requires exclude_guest
perf tool: give user better message if precise is not supported
Peter Zijlstra (1):
perf: require exclude_guest to use PEBS - kernel side enforcement
v2: updated commit messages for patches 1 and 2
arch/x86/kernel/cpu/perf_event.c | 6 ++++++
tools/perf/builtin-record.c | 5 +++++
tools/perf/builtin-top.c | 4 ++++
tools/perf/util/parse-events.c | 3 +++
4 files changed, 18 insertions(+)
--
1.7.10.1
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 1/3] perf tool: precise mode requires exclude_guest
2012-09-12 15:16 [PATCH 0/3 v2] perf: precise mode and exclude_guest David Ahern
@ 2012-09-12 15:16 ` David Ahern
2012-09-12 17:46 ` Robert Richter
2012-09-12 15:16 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
2012-09-12 15:16 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
2 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-12 15:16 UTC (permalink / raw)
To: acme, linux-kernel, peterz
Cc: David Ahern, Ingo Molnar, Robert Richter, Gleb Natapov, Avi Kivity
Summary of events per Peter:
"Intel PEBS in VT-x context uses the DS address as a guest linear address,
even though its programmed by the host as a host linear address. This
either results in guest memory corruption and or the hardware faulting and
'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
AMB IBS does work but doesn't currently support exclude_* at all,
setting an exclude_* bit will make it fail."
This patch handles userspace perf command, setting the exclude_guest
attribute if precise mode is requested, but only if a user has not
specified a request for guest or host only profiling (G or H attribute).
Kernel side AMD currently ignores all exclude_* bits, so there is no impact
to existing IBS code paths. Robert Richter has a patch where IBS code will
return EINVAL if an exclude_* bit is set. When this goes in it means use
of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
will be set). Then the existing fallback code within perf will unset
exclude_guest and try again. The second attempt will succeed if the CPU
supports IBS profiling.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: https://lkml.org/lkml/2012/7/9/264
---
tools/perf/util/parse-events.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 44afcf4..696cc7e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -694,6 +694,9 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
eH = 0;
} else if (*str == 'p') {
precise++;
+ /* use of precise requires exclude_guest */
+ if (!exclude_GH)
+ eG = 1;
} else
break;
--
1.7.10.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement
2012-09-12 15:16 [PATCH 0/3 v2] perf: precise mode and exclude_guest David Ahern
2012-09-12 15:16 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
@ 2012-09-12 15:16 ` David Ahern
2012-09-13 2:38 ` Namhyung Kim
2012-09-12 15:16 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
2 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-12 15:16 UTC (permalink / raw)
To: acme, linux-kernel, peterz
Cc: David Ahern, Ingo Molnar, Robert Richter, Gleb Natapov, Avi Kivity
From: Peter Zijlstra <peterz@infradead.org>
Per Peter:
"Intel PEBS in VT-x context uses the DS address as a guest linear address,
even though its programmed by the host as a host linear address. This
either results in guest memory corruption and or the hardware faulting and
'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
This patch enforces exclude_guest kernel side.
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Link: https://lkml.org/lkml/2012/7/9/298
---
arch/x86/kernel/cpu/perf_event.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 915b876..4bc96c5 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -338,6 +338,9 @@ int x86_setup_perfctr(struct perf_event *event)
/* BTS is currently only allowed for user-mode. */
if (!attr->exclude_kernel)
return -EOPNOTSUPP;
+
+ if (!attr->exclude_guest)
+ return -EOPNOTSUPP;
}
hwc->config |= config;
@@ -380,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip) {
int precise = 0;
+ if (!event->attr.exclude_guest)
+ return -EOPNOTSUPP;
+
/* Support for constant skid */
if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
--
1.7.10.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-12 15:16 [PATCH 0/3 v2] perf: precise mode and exclude_guest David Ahern
2012-09-12 15:16 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
2012-09-12 15:16 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
@ 2012-09-12 15:16 ` David Ahern
2 siblings, 0 replies; 21+ messages in thread
From: David Ahern @ 2012-09-12 15:16 UTC (permalink / raw)
To: acme, linux-kernel, peterz; +Cc: David Ahern, Ingo Molnar, Robert Richter
Platforms (e.g., VM's) without support for precise mode get a confusing
error message. e.g.,
$ perf record -e cycles:p -a -- sleep 1
Error: sys_perf_event_open() syscall returned with 95 (Operation not
supported). /bin/dmesg may provide additional information.
No hardware sampling interrupt available. No APIC? If so then you can
boot the kernel with the "lapic" boot parameter to force-enable it.
sleep: Terminated
which is not clear that precise mode might be the root problem. With this
patch:
$ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
Error:
'precise' request may not be supported. Try removing 'p' modifier
sleep: Terminated
v2: softened message to 'may not be' supported per Robert's suggestion
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
---
tools/perf/builtin-record.c | 5 +++++
tools/perf/builtin-top.c | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c643ed6..385e272 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -294,6 +294,11 @@ try_again:
perf_evsel__name(pos));
rc = -err;
goto out;
+ } else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+ ui__error("\'precise\' request may not be supported. "
+ "Try removing 'p' modifier\n");
+ rc = -err;
+ goto out;
}
printf("\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5550754..1d1c98c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -976,6 +976,10 @@ try_again:
ui__error("Too many events are opened.\n"
"Try again after reducing the number of events\n");
goto out_err;
+ } else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+ ui__error("\'precise\' request may not be supported. "
+ "Try removing 'p' modifier\n");
+ goto out_err;
}
ui__error("The sys_perf_event_open() syscall "
--
1.7.10.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest
2012-09-12 15:16 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
@ 2012-09-12 17:46 ` Robert Richter
2012-09-12 18:50 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 21+ messages in thread
From: Robert Richter @ 2012-09-12 17:46 UTC (permalink / raw)
To: David Ahern
Cc: acme, linux-kernel, peterz, Ingo Molnar, Gleb Natapov, Avi Kivity
On 12.09.12 09:16:28, David Ahern wrote:
> Summary of events per Peter:
>
> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
> even though its programmed by the host as a host linear address. This
> either results in guest memory corruption and or the hardware faulting and
> 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
> enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
>
> AMB IBS does work but doesn't currently support exclude_* at all,
> setting an exclude_* bit will make it fail."
>
> This patch handles userspace perf command, setting the exclude_guest
> attribute if precise mode is requested, but only if a user has not
> specified a request for guest or host only profiling (G or H attribute).
>
> Kernel side AMD currently ignores all exclude_* bits, so there is no impact
> to existing IBS code paths. Robert Richter has a patch where IBS code will
> return EINVAL if an exclude_* bit is set. When this goes in it means use
> of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
> will be set). Then the existing fallback code within perf will unset
> exclude_guest and try again. The second attempt will succeed if the CPU
> supports IBS profiling.
>
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Robert Richter <robert.richter@amd.com>
> Cc: Gleb Natapov <gleb@redhat.com>
> Cc: Avi Kivity <avi@redhat.com>
> Link: https://lkml.org/lkml/2012/7/9/264
> ---
> tools/perf/util/parse-events.c | 3 +++
> 1 file changed, 3 insertions(+)
Acked-by: Robert Richter <robert.richter@amd.com>
I tested the patch set with AMD IBS and it works fine.
-Robert
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 44afcf4..696cc7e 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -694,6 +694,9 @@ static int get_event_modifier(struct event_modifier *mod, char *str,
> eH = 0;
> } else if (*str == 'p') {
> precise++;
> + /* use of precise requires exclude_guest */
> + if (!exclude_GH)
> + eG = 1;
> } else
> break;
>
> --
> 1.7.10.1
>
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest
2012-09-12 17:46 ` Robert Richter
@ 2012-09-12 18:50 ` Arnaldo Carvalho de Melo
2012-09-13 9:13 ` Robert Richter
0 siblings, 1 reply; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-12 18:50 UTC (permalink / raw)
To: Robert Richter
Cc: David Ahern, linux-kernel, peterz, Ingo Molnar, Gleb Natapov, Avi Kivity
Em Wed, Sep 12, 2012 at 07:46:55PM +0200, Robert Richter escreveu:
> On 12.09.12 09:16:28, David Ahern wrote:
> >
> > Kernel side AMD currently ignores all exclude_* bits, so there is no impact
> > to existing IBS code paths. Robert Richter has a patch where IBS code will
> > return EINVAL if an exclude_* bit is set. When this goes in it means use
> > of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
> > will be set). Then the existing fallback code within perf will unset
> > exclude_guest and try again. The second attempt will succeed if the CPU
> > supports IBS profiling.
> >
> > Signed-off-by: David Ahern <dsahern@gmail.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Robert Richter <robert.richter@amd.com>
> > Cc: Gleb Natapov <gleb@redhat.com>
> > Cc: Avi Kivity <avi@redhat.com>
> > Link: https://lkml.org/lkml/2012/7/9/264
> > ---
> > tools/perf/util/parse-events.c | 3 +++
> > 1 file changed, 3 insertions(+)
>
> Acked-by: Robert Richter <robert.richter@amd.com>
>
> I tested the patch set with AMD IBS and it works fine.
Ok, I think that in this specific patchset I can add:
Tested-by: Robert Richter <robert.richter@amd.com>
Reviewed-by: Robert Richter <robert.richter@amd.com>
Instead of an Acked-by, may I?
- Arnaldo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement
2012-09-12 15:16 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
@ 2012-09-13 2:38 ` Namhyung Kim
2012-09-13 4:33 ` David Ahern
0 siblings, 1 reply; 21+ messages in thread
From: Namhyung Kim @ 2012-09-13 2:38 UTC (permalink / raw)
To: David Ahern
Cc: acme, linux-kernel, peterz, Ingo Molnar, Robert Richter,
Gleb Natapov, Avi Kivity
Hi David,
On Wed, 12 Sep 2012 09:16:29 -0600, David Ahern wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Per Peter:
> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
> even though its programmed by the host as a host linear address. This
> either results in guest memory corruption and or the hardware faulting and
> 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
> enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
>
> This patch enforces exclude_guest kernel side.
>
[snip]
> @@ -380,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
> if (event->attr.precise_ip) {
> int precise = 0;
>
> + if (!event->attr.exclude_guest)
> + return -EOPNOTSUPP;
I see a whitespace problem here. :)
Thanks,
Namhyung
> +
> /* Support for constant skid */
> if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
> precise++;
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement
2012-09-13 2:38 ` Namhyung Kim
@ 2012-09-13 4:33 ` David Ahern
2012-09-13 4:45 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-13 4:33 UTC (permalink / raw)
To: Namhyung Kim
Cc: acme, linux-kernel, peterz, Ingo Molnar, Robert Richter,
Gleb Natapov, Avi Kivity
On 9/12/12 8:38 PM, Namhyung Kim wrote:
> Hi David,
>
> On Wed, 12 Sep 2012 09:16:29 -0600, David Ahern wrote:
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> Per Peter:
>> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
>> even though its programmed by the host as a host linear address. This
>> either results in guest memory corruption and or the hardware faulting and
>> 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
>> enter and re-enable on VT-x exit, enforcing a strict exclude_guest.
>>
>> This patch enforces exclude_guest kernel side.
>>
> [snip]
>> @@ -380,6 +383,9 @@ int x86_pmu_hw_config(struct perf_event *event)
>> if (event->attr.precise_ip) {
>> int precise = 0;
>>
>> + if (!event->attr.exclude_guest)
>> + return -EOPNOTSUPP;
>
> I see a whitespace problem here. :)
grr.... an extra freaking tab.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement
2012-09-13 4:33 ` David Ahern
@ 2012-09-13 4:45 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-13 4:45 UTC (permalink / raw)
To: David Ahern
Cc: Namhyung Kim, linux-kernel, peterz, Ingo Molnar, Robert Richter,
Gleb Natapov, Avi Kivity
Em Wed, Sep 12, 2012 at 10:33:36PM -0600, David Ahern escreveu:
> On 9/12/12 8:38 PM, Namhyung Kim wrote:
> >I see a whitespace problem here. :)
>
> grr.... an extra freaking tab.
As people say down here, it happens, even in the best families... :-P
- Arnaldo
$ shutdown now tho
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] perf tool: precise mode requires exclude_guest
2012-09-12 18:50 ` Arnaldo Carvalho de Melo
@ 2012-09-13 9:13 ` Robert Richter
0 siblings, 0 replies; 21+ messages in thread
From: Robert Richter @ 2012-09-13 9:13 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: David Ahern, linux-kernel, peterz, Ingo Molnar, Gleb Natapov, Avi Kivity
On 12.09.12 11:50:57, Arnaldo Carvalho de Melo wrote:
> Em Wed, Sep 12, 2012 at 07:46:55PM +0200, Robert Richter escreveu:
> > On 12.09.12 09:16:28, David Ahern wrote:
> > >
> > > Kernel side AMD currently ignores all exclude_* bits, so there is no impact
> > > to existing IBS code paths. Robert Richter has a patch where IBS code will
> > > return EINVAL if an exclude_* bit is set. When this goes in it means use
> > > of :p on AMD with IBS will first fail with EINVAL (because exclude_guest
> > > will be set). Then the existing fallback code within perf will unset
> > > exclude_guest and try again. The second attempt will succeed if the CPU
> > > supports IBS profiling.
> > >
> > > Signed-off-by: David Ahern <dsahern@gmail.com>
> > > Cc: Ingo Molnar <mingo@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Robert Richter <robert.richter@amd.com>
> > > Cc: Gleb Natapov <gleb@redhat.com>
> > > Cc: Avi Kivity <avi@redhat.com>
> > > Link: https://lkml.org/lkml/2012/7/9/264
> > > ---
> > > tools/perf/util/parse-events.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> >
> > Acked-by: Robert Richter <robert.richter@amd.com>
> >
> > I tested the patch set with AMD IBS and it works fine.
>
> Ok, I think that in this specific patchset I can add:
>
> Tested-by: Robert Richter <robert.richter@amd.com>
> Reviewed-by: Robert Richter <robert.richter@amd.com>
>
> Instead of an Acked-by, may I?
I am fine with this.
Thanks,
-Robert
>
> - Arnaldo
>
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 21:26 ` Peter Zijlstra
@ 2012-09-17 7:11 ` Ingo Molnar
0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2012-09-17 7:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-09-14 at 22:11 +0200, Ingo Molnar wrote:
> > return -EPERF_CPU_PRECISE_EV_NOTSUPP;
>
> I just don't like having to enumerate all possible fails, I'm
> too lazy. Can't we be smarter about that? Could we do a
> {reason}x{bit-offset} like thing?
>
> Where we limit reason to a few simple things like:
>
> invalid
> out-of-range
> not-supported
>
> and have the bit-offset indicate the field we're having the particular
> problem with?
>
> Then all we need is a smart way to generate and map the bit-offsets
> without too much manual labour.
Putting the 'where' into a separate field would do that, and
thus we could generate and report such structured errors as well
- but nevertheless there will always be special/individual
errors as well that won't fit into such a scheme, for which we
should include a 'boring' errno range as well ...
I.e. a {where},{what} s32 pair of fields - if 'where' is zero
then 'what' is the enumerated errno value I suggested, if it's
nonzero then it's the 'where' indication you suggested.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 20:11 ` Ingo Molnar
@ 2012-09-14 21:26 ` Peter Zijlstra
2012-09-17 7:11 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2012-09-14 21:26 UTC (permalink / raw)
To: Ingo Molnar
Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter
On Fri, 2012-09-14 at 22:11 +0200, Ingo Molnar wrote:
> return -EPERF_CPU_PRECISE_EV_NOTSUPP;
I just don't like having to enumerate all possible fails, I'm too lazy.
Can't we be smarter about that? Could we do a {reason}x{bit-offset} like
thing?
Where we limit reason to a few simple things like:
invalid
out-of-range
not-supported
and have the bit-offset indicate the field we're having the particular
problem with?
Then all we need is a smart way to generate and map the bit-offsets
without too much manual labour.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 20:05 ` Ingo Molnar
@ 2012-09-14 20:11 ` Ingo Molnar
2012-09-14 21:26 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2012-09-14 20:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter
* Ingo Molnar <mingo@kernel.org> wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > > > Understood and there have been suggestions on how to definitely state
> > > > what the kernel side did not like. I like Peter's last suggestion --
> > > > something along the lines of clearing attr on a failure except the
> > > > offending setting.
> > >
> > > I think ws need to use a new bit
> >
> > Quite so, for all the reasons you list. But you like the
> > general idea? I wasn't sure I did, but it was the only thing I
> > could come up with that would sort of do what we need it to.
> >
> > The fact that you destroy the user input is awkward, I don't
> > think there's another syscall that behaves in this fashion.
>
> Destroying/clearing stuff looks really hacky.
>
> Why not use a single error status field, set via a long list
> of enum error constants, a 'perf errnos'?
>
> The only real problem with the kernel's syscall error code is
> that it's not wide enough for historic reasons, so we cannot
> just create our own errnos. But we can create our errors in
> the attr just fine and make them finegrained enough so that
> tooling can figure out what happened exactly when it gets a
> syscall error.
>
> Yes, that's old-fashioned technology, but it works. With time
> we could put some structure into the list of error IDs, to
> make it easily extensible yet grouped in some fashion, etc.
And we could start this small: just covering the problem at hand
or so, with proper tooling side support - and then extend it as
needed. I.e. have something in place that we could ask people
adding new kernel side error paths to use - and with time old
code could be enriched with better error codes as well.
perf could in fact internally standardize on a 32-bit wide error
code, and translate it to -EINVAL at the syscall return level.
That way we could keep all the simple error paths as they are
today, with a very little syscall level logic (triggered in
error cases only, so no overhead in the regular case).
I.e.:
return -EPERF_CPU_PRECISE_EV_NOTSUPP;
would result in that error code being available to new, improved
tooling that knows about the new attr field. Old tooling would
still work as well, it would only see the -EINVAL of the
syscall.
Or something like this - simple and using existing patterns and
practices. Anything else will likely fail due to non-use.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 18:07 ` Peter Zijlstra
@ 2012-09-14 20:05 ` Ingo Molnar
2012-09-14 20:11 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2012-09-14 20:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, David Ahern, linux-kernel, Robert Richter
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > > Understood and there have been suggestions on how to definitely state
> > > what the kernel side did not like. I like Peter's last suggestion --
> > > something along the lines of clearing attr on a failure except the
> > > offending setting.
> >
> > I think ws need to use a new bit
>
> Quite so, for all the reasons you list. But you like the
> general idea? I wasn't sure I did, but it was the only thing I
> could come up with that would sort of do what we need it to.
>
> The fact that you destroy the user input is awkward, I don't
> think there's another syscall that behaves in this fashion.
Destroying/clearing stuff looks really hacky.
Why not use a single error status field, set via a long list of
enum error constants, a 'perf errnos'?
The only real problem with the kernel's syscall error code is
that it's not wide enough for historic reasons, so we cannot
just create our own errnos. But we can create our errors in the
attr just fine and make them finegrained enough so that tooling
can figure out what happened exactly when it gets a syscall
error.
Yes, that's old-fashioned technology, but it works. With time we
could put some structure into the list of error IDs, to make it
easily extensible yet grouped in some fashion, etc.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 18:00 ` Arnaldo Carvalho de Melo
@ 2012-09-14 18:07 ` Peter Zijlstra
2012-09-14 20:05 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2012-09-14 18:07 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: David Ahern, Ingo Molnar, linux-kernel, Robert Richter
On Fri, 2012-09-14 at 11:00 -0700, Arnaldo Carvalho de Melo wrote:
> > Understood and there have been suggestions on how to definitely state
> > what the kernel side did not like. I like Peter's last suggestion --
> > something along the lines of clearing attr on a failure except the
> > offending setting.
>
> I think ws need to use a new bit
Quite so, for all the reasons you list. But you like the general idea? I
wasn't sure I did, but it was the only thing I could come up with that
would sort of do what we need it to.
The fact that you destroy the user input is awkward, I don't think
there's another syscall that behaves in this fashion.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 11:43 ` David Ahern
@ 2012-09-14 18:00 ` Arnaldo Carvalho de Melo
2012-09-14 18:07 ` Peter Zijlstra
0 siblings, 1 reply; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2012-09-14 18:00 UTC (permalink / raw)
To: David Ahern; +Cc: Ingo Molnar, linux-kernel, peterz, Robert Richter
Em Fri, Sep 14, 2012 at 05:43:42AM -0600, David Ahern escreveu:
> On 9/14/12 5:36 AM, Ingo Molnar wrote:
> >Well, then that is useful information we *lost*, and that situation
> >needs to be improved on the ABI side: an expanded error code present
> >in the event structure, copied back to user-space on errors, or so.
> Understood and there have been suggestions on how to definitely state
> what the kernel side did not like. I like Peter's last suggestion --
> something along the lines of clearing attr on a failure except the
> offending setting.
I think ws need to use a new bit, attr.clear_opsup_on_error,
that we would set, older kernels would return an error cause they don't
support this new feature, the tool would then clear it and work as
today, giving a vague message, cause that is all it can do.
We can't just clean the unsupported bits because then older tools would
get completely confused when trying to use the current fallback
mechanism, where, for instance, for sample_id_all, we just reset it in
the tooling side, ask again the kernel and it works.
Older tools not setting the clear_supported_bits_on_error would get the
current behaviour.
- Arnaldo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 11:36 ` Ingo Molnar
@ 2012-09-14 11:43 ` David Ahern
2012-09-14 18:00 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-14 11:43 UTC (permalink / raw)
To: Ingo Molnar; +Cc: acme, linux-kernel, peterz, Robert Richter
On 9/14/12 5:36 AM, Ingo Molnar wrote:
> Well, then that is useful information we *lost*, and that
> situation needs to be improved on the ABI side: an expanded
> error code present in the event structure, copied back to
> user-space on errors, or so.
>
> (Alternatively, a special event channel just to pass back
> expanded error conditions.)
>
> Computers are supposed to make life easier for humans, by
> answering such "what did go wrong?" questions. Our losing of
> precise error conditions is a usability bug really - and in the
> perf project we are in a unique position to be able to improve
> both the kernel side code and make immediate use of it on the
> tooling side as well.
Understood and there have been suggestions on how to definitely state
what the kernel side did not like. I like Peter's last suggestion --
something along the lines of clearing attr on a failure except the
offending setting.
David
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 11:13 ` David Ahern
@ 2012-09-14 11:36 ` Ingo Molnar
2012-09-14 11:43 ` David Ahern
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2012-09-14 11:36 UTC (permalink / raw)
To: David Ahern; +Cc: acme, linux-kernel, peterz, Robert Richter
* David Ahern <dsahern@gmail.com> wrote:
> On 9/13/12 11:43 PM, Ingo Molnar wrote:
> >>v2: softened message to 'may not be' supported per Robert's suggestion
> >
> > Well, either it's supported on this machine or it's not -
> > why does the text have to be so unsure about it?
>
> Because EOPNOTSUPP is returned for more than just precise
> mode. We cannot say with certainty that the precise attribute
> caused that errno.
Well, then that is useful information we *lost*, and that
situation needs to be improved on the ABI side: an expanded
error code present in the event structure, copied back to
user-space on errors, or so.
(Alternatively, a special event channel just to pass back
expanded error conditions.)
Computers are supposed to make life easier for humans, by
answering such "what did go wrong?" questions. Our losing of
precise error conditions is a usability bug really - and in the
perf project we are in a unique position to be able to improve
both the kernel side code and make immediate use of it on the
tooling side as well.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-14 5:43 ` Ingo Molnar
@ 2012-09-14 11:13 ` David Ahern
2012-09-14 11:36 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-14 11:13 UTC (permalink / raw)
To: Ingo Molnar; +Cc: acme, linux-kernel, peterz, Robert Richter
On 9/13/12 11:43 PM, Ingo Molnar wrote:
>> v2: softened message to 'may not be' supported per Robert's suggestion
>
> Well, either it's supported on this machine or it's not - why
> does the text have to be so unsure about it?
Because EOPNOTSUPP is returned for more than just precise mode. We
cannot say with certainty that the precise attribute caused that errno.
David
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
@ 2012-09-14 5:43 ` Ingo Molnar
2012-09-14 11:13 ` David Ahern
0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2012-09-14 5:43 UTC (permalink / raw)
To: David Ahern; +Cc: acme, linux-kernel, peterz, Robert Richter
* David Ahern <dsahern@gmail.com> wrote:
> Platforms (e.g., VM's) without support for precise mode get a confusing
> error message. e.g.,
> $ perf record -e cycles:p -a -- sleep 1
>
> Error: sys_perf_event_open() syscall returned with 95 (Operation not
> supported). /bin/dmesg may provide additional information.
>
> No hardware sampling interrupt available. No APIC? If so then you can
> boot the kernel with the "lapic" boot parameter to force-enable it.
> sleep: Terminated
>
> which is not clear that precise mode might be the root problem. With this
> patch:
>
> $ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
> Error:
> 'precise' request may not be supported. Try removing 'p' modifier
> sleep: Terminated
>
> v2: softened message to 'may not be' supported per Robert's suggestion
Well, either it's supported on this machine or it's not - why
does the text have to be so unsure about it?
We use computers to increase determinism, not to insert extra
uncertainty! ;-)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/3 v2] perf tool: give user better message if precise is not supported
2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
@ 2012-09-13 20:59 ` David Ahern
2012-09-14 5:43 ` Ingo Molnar
0 siblings, 1 reply; 21+ messages in thread
From: David Ahern @ 2012-09-13 20:59 UTC (permalink / raw)
To: acme, linux-kernel, peterz; +Cc: David Ahern, Ingo Molnar, Robert Richter
Platforms (e.g., VM's) without support for precise mode get a confusing
error message. e.g.,
$ perf record -e cycles:p -a -- sleep 1
Error: sys_perf_event_open() syscall returned with 95 (Operation not
supported). /bin/dmesg may provide additional information.
No hardware sampling interrupt available. No APIC? If so then you can
boot the kernel with the "lapic" boot parameter to force-enable it.
sleep: Terminated
which is not clear that precise mode might be the root problem. With this
patch:
$ perf record -e cycles:p -fo /tmp/perf.data -- sleep 1
Error:
'precise' request may not be supported. Try removing 'p' modifier
sleep: Terminated
v2: softened message to 'may not be' supported per Robert's suggestion
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
---
tools/perf/builtin-record.c | 5 +++++
tools/perf/builtin-top.c | 4 ++++
2 files changed, 9 insertions(+)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c643ed6..385e272 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -294,6 +294,11 @@ try_again:
perf_evsel__name(pos));
rc = -err;
goto out;
+ } else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+ ui__error("\'precise\' request may not be supported. "
+ "Try removing 'p' modifier\n");
+ rc = -err;
+ goto out;
}
printf("\n");
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5550754..1d1c98c 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -976,6 +976,10 @@ try_again:
ui__error("Too many events are opened.\n"
"Try again after reducing the number of events\n");
goto out_err;
+ } else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
+ ui__error("\'precise\' request may not be supported. "
+ "Try removing 'p' modifier\n");
+ goto out_err;
}
ui__error("The sys_perf_event_open() syscall "
--
1.7.10.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
end of thread, other threads:[~2012-09-17 7:11 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-12 15:16 [PATCH 0/3 v2] perf: precise mode and exclude_guest David Ahern
2012-09-12 15:16 ` [PATCH 1/3] perf tool: precise mode requires exclude_guest David Ahern
2012-09-12 17:46 ` Robert Richter
2012-09-12 18:50 ` Arnaldo Carvalho de Melo
2012-09-13 9:13 ` Robert Richter
2012-09-12 15:16 ` [PATCH 2/3] perf: require exclude_guest to use PEBS - kernel side enforcement David Ahern
2012-09-13 2:38 ` Namhyung Kim
2012-09-13 4:33 ` David Ahern
2012-09-13 4:45 ` Arnaldo Carvalho de Melo
2012-09-12 15:16 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
2012-09-13 20:59 [PATCH 0/3 v3] perf: precise mode and exclude_guest David Ahern
2012-09-13 20:59 ` [PATCH 3/3 v2] perf tool: give user better message if precise is not supported David Ahern
2012-09-14 5:43 ` Ingo Molnar
2012-09-14 11:13 ` David Ahern
2012-09-14 11:36 ` Ingo Molnar
2012-09-14 11:43 ` David Ahern
2012-09-14 18:00 ` Arnaldo Carvalho de Melo
2012-09-14 18:07 ` Peter Zijlstra
2012-09-14 20:05 ` Ingo Molnar
2012-09-14 20:11 ` Ingo Molnar
2012-09-14 21:26 ` Peter Zijlstra
2012-09-17 7:11 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).