* perf software events broken in containers
From: Brendan Gregg @ 2017-03-22 18:24 UTC (permalink / raw)
  To: linux-perf-use.

G'Day,

I think something broke recently with using perf from within a docker
container. We used to be able to run "docker exec -it --privileged
UUID bash", and then run perf from it for CPU sampling. But we just
noticed on recent kernels (4.10 and updated 4.4) that it no longer
works. Anyone else see this?

perf's error message is contradictory:

# /perf record -F 99 -a -- sleep 1
perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected
error 1 (Operation not permitted)
perf_event_open(..., 0) failed unexpectedly with error 1 (Operation
not permitted)
Error:
You may not have permission to collect system-wide stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is -1:

  -1: Allow use of (almost) all events by all users
>= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

With -v, I can see that it tries the PMC based cycles, then gives up.
What normally happens is it then switches to the cpu-clock software
event, but it's not trying that anymore. Those software events are
also no longer visible:

# /perf list sw

List of pre-defined events (to be used in -e):

(no output)

The kernel is returning errno 1 to the sys_perf_event_open() call in
__perf_evsel__open(). I'm trying to find out which kernel function
throws the EPERM, but almost nothing is traceable via ftrace/kprobes.
It's pretty annoying... Thanks for any ideas,

Brendan
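
The error text above reads as contradictory because, going by the levels it prints, a perf_event_paranoid value of -1 should not restrict anything. A minimal Python sketch of that reading follows. The names `RESTRICTIONS`, `restrictions_for`, and `read_paranoid` are hypothetical, and the thresholds are copied only from the message text quoted above, not from kernel source:

```python
# Thresholds copied from the perf error text quoted above; each entry is
# (minimum perf_event_paranoid value, what the message says is disallowed).
RESTRICTIONS = [
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def restrictions_for(level):
    """List what the message says is disallowed for unprivileged users."""
    return [what for threshold, what in RESTRICTIONS if level >= threshold]


def read_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Read the current sysctl value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        level = read_paranoid()
    except (OSError, ValueError) as exc:
        print(f"could not read perf_event_paranoid: {exc}")
    else:
        print(level, restrictions_for(level))
```

For a value of -1 the list is empty, so under this reading the sysctl is not what produces the EPERM, and the refusal has to come from somewhere else.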


* Re: perf software events broken in containers
From: William Cohen @ 2017-03-22 19:15 UTC (permalink / raw)
  To: Brendan Gregg, linux-perf-use.

On 03/22/2017 02:24 PM, Brendan Gregg wrote:
> G'Day,
> 
> I think something broke recently with using perf from within a docker
> container. We used to be able to run "docker exec -it --privileged
> UUID bash", and then run perf from it for CPU sampling. But we just
> noticed on recent kernels (4.10 and updated 4.4) that it no longer
> works. Anyone else see this?
> 
> perf's error message is contradictory:
> 
> # /perf record -F 99 -a -- sleep 1
> perf_event_open(..., PERF_FLAG_FD_CLOEXEC) failed with unexpected
> error 1 (Operation not permitted)
> perf_event_open(..., 0) failed unexpectedly with error 1 (Operation
> not permitted)
> Error:
> You may not have permission to collect system-wide stats.
> 
> Consider tweaking /proc/sys/kernel/perf_event_paranoid,
> which controls use of the performance events system by
> unprivileged users (without CAP_SYS_ADMIN).
> 
> The current value is -1:
> 
>   -1: Allow use of (almost) all events by all users
> >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
> >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
> >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
> 
> With -v, I can see that it tries the PMC based cycles, then gives up.
> What normally happens is it then switches to the cpu-clock software
> event, but it's not trying that anymore. Those software events are
> also no longer visible:
> 
> # /perf list sw
> 
> List of pre-defined events (to be used in -e):
> 
> (no output)
> 
> The kernel is returning errno 1 to the sys_perf_event_open() call in
> __perf_evsel__open(). I'm trying to find out which kernel function
> throws the EPERM, but almost nothing is traceable via ftrace/kprobes.
> It's pretty annoying... Thanks for any ideas,

Hi Brendan,

Several years ago I had a problem with perf_event_open not working on ARM machines. I looked at the kernel source code and kept putting SystemTap one-liners like the following on the various functions used by perf_event_open to see which one was returning the error code:

stap -v -e 'probe kernel.function("idr_find").return {printf("%s %s 0x%x\n", pn(), $$parms$, $return)}'


Could the docker container not have the perf_event_open syscall on the whitelist of allowed syscalls?

Would bcc's capable.py tool give some insight into whether there are some other capabilities that the perf_event_open might be needing?

-Will

* Re: perf software events broken in containers
From: Brendan Gregg @ 2017-03-22 19:59 UTC (permalink / raw)
  To: William Cohen; +Cc: linux-perf-use.

G'Day Will,

On Wed, Mar 22, 2017 at 12:15 PM, William Cohen <wcohen@redhat.com> wrote:
> On 03/22/2017 02:24 PM, Brendan Gregg wrote:
>> G'Day,
[...]
>>
>> The kernel is returning errno 1 to the sys_perf_event_open() call in
>> __perf_evsel__open(). I'm trying to find out which kernel function
>> throws the EPERM, but almost nothing is traceable via ftrace/kprobes.
>> It's pretty annoying... Thanks for any ideas,
>
> Hi Brendan,
>
> Several years ago I had a problem with the perf_event_open not working on arm machines.  I looked at the kernel source code and just kept putting systemtap one-liners like the following on the various functions used by the perf_event_open to see which function was the one returning the error code:
>
> stap -v -e 'probe kernel.function("idr_find").return {printf("%s %s 0x%x\n", pn(), $$parms$, $return)}'
>

Right, I've been doing that with ftrace/kprobes/bcc/BPF... Many of the
functions aren't visible, though; I suspect they're inlined.

>
> Could the docker container not have the perf_event_open syscall on the whitelist of allowed syscalls?
>

Good thought. I do have a __secure_computing() call returning a -1,
and docker recently changed their blacklist to a whitelist:

https://github.com/docker/docker/pull/18979

I'm digging on this...

> Would bcc's capable.py tool give some insight into whether there are some other capabilities that the perf_event_open might be needing?
>

It should (and I should enhance it to show the result of each
capability lookup; it was first written to show only what was being
requested), but it doesn't reveal anything new in this case: just
CAP_SYS_ADMIN, which perf already has...

Thanks,

Brendan
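
Both suspects mentioned above, the seccomp filter and the capability set, can be read from inside the container via /proc/self/status. A rough Python sketch, assuming the standard `CapEff` (hex bitmask) and `Seccomp` (0 = disabled, 1 = strict, 2 = filter) fields and CAP_SYS_ADMIN being capability number 21. The function names are hypothetical, and this is not how capable.py works:

```python
CAP_SYS_ADMIN = 21  # capability number from linux/capability.h


def parse_status(text):
    """Turn the 'Key:\\tvalue' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_cap(fields, cap_bit=CAP_SYS_ADMIN):
    """True if the given bit is set in the effective capability mask."""
    return bool((int(fields["CapEff"], 16) >> cap_bit) & 1)


def seccomp_mode(fields):
    """Name the seccomp mode reported for the process."""
    names = {"0": "disabled", "1": "strict", "2": "filter"}
    return names.get(fields.get("Seccomp", ""), "unknown")


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            status = parse_status(f.read())
        print("CAP_SYS_ADMIN:", has_cap(status))
        print("seccomp:", seccomp_mode(status))
    except (OSError, KeyError, ValueError) as exc:
        print(f"could not inspect /proc/self/status: {exc}")
```

A shell that reports CAP_SYS_ADMIN as present while seccomp is in "filter" mode would be consistent with the syscall being refused by the filter rather than by a capability check.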


* Re: perf software events broken in containers
From: William Cohen @ 2017-03-22 20:29 UTC (permalink / raw)
  To: Brendan Gregg; +Cc: linux-perf-use.

On 03/22/2017 03:59 PM, Brendan Gregg wrote:
> G'Day Will,
> 
> On Wed, Mar 22, 2017 at 12:15 PM, William Cohen <wcohen@redhat.com> wrote:
>> On 03/22/2017 02:24 PM, Brendan Gregg wrote:
>>> G'Day,
> [...]
>>>
>>> The kernel is returning errno 1 to the sys_perf_event_open() call in
>>> __perf_evsel__open(). I'm trying to find out which kernel function
>>> throws the EPERM, but almost nothing is traceable via ftrace/kprobes.
>>> It's pretty annoying... Thanks for any ideas,
>>
>> Hi Brendan,
>>
>> Several years ago I had a problem with the perf_event_open not working on arm machines.  I looked at the kernel source code and just kept putting systemtap one-liners like the following on the various functions used by the perf_event_open to see which function was the one returning the error code:
>>
>> stap -v -e 'probe kernel.function("idr_find").return {printf("%s %s 0x%x\n", pn(), $$parms$, $return)}'
>>
> 
> Right, I've been doing that with ftrace/kprobes/bcc/BPF... Many of the
> functions aren't visible, though, I suspect inlined.
> 

Hi Brendan,

Yes, the inlined functions are going to make it hard to get some of the return values. However, you can probably narrow down the boundary between the functions that executed and those that did not. Would Intel Processor Trace (http://halobates.de/blog/) be usable to help track down this problem?

-Will


* Re: perf software events broken in containers
From: Brendan Gregg @ 2017-03-22 21:35 UTC (permalink / raw)
  To: William Cohen; +Cc: linux-perf-use.

On Wed, Mar 22, 2017 at 1:29 PM, William Cohen <wcohen@redhat.com> wrote:
> On 03/22/2017 03:59 PM, Brendan Gregg wrote:
>> G'Day Will,
>>
>> On Wed, Mar 22, 2017 at 12:15 PM, William Cohen <wcohen@redhat.com> wrote:
>>> On 03/22/2017 02:24 PM, Brendan Gregg wrote:
>>>> G'Day,
>> [...]
>>>>
>>>> The kernel is returning errno 1 to the sys_perf_event_open() call in
>>>> __perf_evsel__open(). I'm trying to find out which kernel function
>>>> throws the EPERM, but almost nothing is traceable via ftrace/kprobes.
>>>> It's pretty annoying... Thanks for any ideas,
>>>
>>> Hi Brendan,
>>>
>>> Several years ago I had a problem with the perf_event_open not working on arm machines.  I looked at the kernel source code and just kept putting systemtap one-liners like the following on the various functions used by the perf_event_open to see which function was the one returning the error code:
>>>
>>> stap -v -e 'probe kernel.function("idr_find").return {printf("%s %s 0x%x\n", pn(), $$parms$, $return)}'
>>>
>>
>> Right, I've been doing that with ftrace/kprobes/bcc/BPF... Many of the
>> functions aren't visible, though, I suspect inlined.
>>
>
> Hi Brendan,
>
> Yes, the inlined functions are going to make it hard to get some of the return values.  However, you can probably narrow down the boundary between executed functions and not executed functions. Would the Intel Processor Trace (http://halobates.de/blog/) be usable to help track down this problem?
>

I'm in the cloud, so I usually don't even have PMCs. :)

Looks like the issue is a change in docker, I'd guess related to
this syscall whitelisting. Doing a "docker run --cap-add cap_sys_admin
..." results in a container that can run perf (but has sysadmin
privilege always on), although we still need --privileged for it
to work fully. Previously we could run normal containers and use
ad hoc privileged shells, via "docker exec -it --privileged ...
bash", only when we needed to run perf.

Weirdly, the container claims it has cap_sys_admin either way, but one
way it doesn't work and the other it does. I filed
https://github.com/docker/docker/issues/32018

We also found that docker explicitly blocks perf_event_open() by default[1]:

"perf_event_open    Tracing/profiling syscall, which could leak a lot
of information on the host."

[1] https://docs.docker.com/engine/security/seccomp/#significant-syscalls-blocked-by-the-default-profile

Brendan
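
If the default seccomp profile is what blocks the call, one possible workaround is a custom profile that additionally allows perf_event_open. The Python sketch below assumes the JSON layout of Docker's published default profile (a `syscalls` list of rules with `names` and `action` keys; older profiles may use a single `name` key instead). The function name `allow_syscall` and the tiny `base` profile are hypothetical placeholders, and a real profile should start from Docker's full default rather than from an empty list:

```python
import copy
import json


def allow_syscall(profile, name):
    """Return a copy of `profile` with an extra rule allowing `name`."""
    updated = copy.deepcopy(profile)
    rule = {"names": [name], "action": "SCMP_ACT_ALLOW"}
    updated.setdefault("syscalls", []).append(rule)
    return updated


if __name__ == "__main__":
    # Placeholder only: load Docker's default profile here instead.
    base = {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}
    print(json.dumps(allow_syscall(base, "perf_event_open"), indent=2))
```

The resulting file would be passed with "docker run --security-opt seccomp=<profile.json> ...". As the documentation quoted above warns, allowing this syscall can leak information about the host.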


* Re: perf software events broken in containers
From: David Ahern @ 2017-03-23 14:21 UTC (permalink / raw)
  To: Brendan Gregg, William Cohen; +Cc: linux-perf-use.

On 3/22/17 3:35 PM, Brendan Gregg wrote:
> We also found that docker explicitly blocks perf_event_open() by default[1]:
> 
> "perf_event_open    Tracing/profiling syscall, which could leak a lot
> of information on the host."

as it should. The event collection does not discriminate by namespaces.


* Re: perf software events broken in containers
From: Frank Ch. Eigler @ 2017-03-27 16:10 UTC (permalink / raw)
  To: Brendan Gregg; +Cc: William Cohen, linux-perf-use.


brendan.d.gregg wrote:

> [...]
>> stap -v -e 'probe kernel.function("idr_find").return {printf("%s %s 0x%x\n", pn(), $$parms$, $return)}'
>>
>
> Right, I've been doing that with ftrace/kprobes/bcc/BPF... Many of the
> functions aren't visible, though, I suspect inlined.

stap is comfortable with probing inlined functions, or
generally any statements within functions.

See e.g. https://sourceware.org/systemtap/examples/#general/whythefail.stp

- FChE

