* perf segfault in docker container
@ 2016-06-09 21:14 Brendan Gregg
  2016-06-10 10:28 ` Aravinda Prasad
  2016-06-10 20:15 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 6+ messages in thread
From: Brendan Gregg @ 2016-06-09 21:14 UTC
  To: linux-perf-use.; +Cc: Wang Nan

G'Day,

Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:

docker# ./perf record -F 99 -a
Segmentation fault

The segfault is in perf_event__synthesize_kernel_mmap(). I know
symbol__read_kptr_restrict() has been updated recently to fix similar
segfaults, hence getting perf/core.

I think the problem is this:

docker# id
uid=0(root) gid=0(root) groups=0(root)
docker# cat /proc/sys/kernel/kptr_restrict
1

(I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")

The current (May 24) code in symbol__read_kptr_restrict() has:

                if (fgets(line, sizeof(line), fp) != NULL)
                        value = (geteuid() != 0) ?
                                        (atoi(line) != 0) :
                                        (atoi(line) == 2);

This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN
instead of euid == 0?
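A minimal C sketch of the capability-based check suggested above. It assumes the effective capability set is read from the "CapEff:" line of /proc/self/status rather than through libcap, and the helper names (cap_in_mask, have_effective_cap, kptr_restricted) are hypothetical, not perf's. CAP_SYS_ADMIN is bit 21, as proposed here; the kernel's sysctl documentation ties kptr_restrict=1 to CAP_SYSLOG (bit 34), so the capability is left as a parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_SYS_ADMIN_BIT 21
#define CAP_SYSLOG_BIT    34

/* True if capability `cap` is set in a hex mask such as the one printed
 * after "CapEff:" in /proc/<pid>/status (leading whitespace is skipped). */
bool cap_in_mask(const char *hex_mask, int cap)
{
	unsigned long long mask;

	assert(cap >= 0 && cap < 64);
	mask = strtoull(hex_mask, NULL, 16);
	return (mask >> cap) & 1ULL;
}

/* True if this process has `cap` in its effective set; false if the
 * status file cannot be read or has no CapEff: line. */
bool have_effective_cap(int cap)
{
	char line[256];
	bool found = false;
	FILE *fp = fopen("/proc/self/status", "r");

	if (fp == NULL)
		return false;
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "CapEff:", 7) == 0) {
			found = cap_in_mask(line + 7, cap);
			break;
		}
	}
	fclose(fp);
	return found;
}

/* Should kernel addresses be treated as hidden?  Keyed on a capability
 * rather than on geteuid() == 0. */
bool kptr_restricted(int kptr_restrict, bool has_cap)
{
	if (kptr_restrict >= 2)
		return true;      /* hidden from everyone */
	if (kptr_restrict == 1)
		return !has_cap;  /* hidden unless privileged */
	return false;         /* 0: addresses are visible */
}
```

With Docker's default capability set, container root typically has neither CAP_SYS_ADMIN nor CAP_SYSLOG, so kptr_restricted(1, have_effective_cap(...)) would report kernel addresses as hidden, matching what the container actually sees, whereas the euid == 0 test does not.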

Brendan


* Re: perf segfault in docker container
  2016-06-09 21:14 perf segfault in docker container Brendan Gregg
@ 2016-06-10 10:28 ` Aravinda Prasad
  2016-06-21 22:32   ` Brendan Gregg
  2016-06-10 20:15 ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 6+ messages in thread
From: Aravinda Prasad @ 2016-06-10 10:28 UTC
  To: Brendan Gregg
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

Hi Brendan,

I thought I'd reply to your mail as I saw you running perf inside a
docker container. I believe you would be interested in events specific
to the container context as you are using "perf record -a".

We are working on supporting "container-aware tracing" i.e., whenever
you run "perf record -a" inside a container it should report
container-wide events rather than system-wide events. Towards that goal,
we posted an RFC patch in LKML [1] last year and also discussed possible
ways to restrict events within a container in Plumbers (Container
Microconf) [2].

Based on the discussion in Container Microconf, we are coming up with a
new prototype which should be ready for review by next week. The new
prototype introduces a new namespace "perf-namespace" (namespace name is
just a placeholder. Suggestions welcome). If the container is created
with perf-namespace, then "perf record -a" inside the container reports
only those events that are triggered within the container.

We would like to know if you are looking for "container-aware tracing"
and also like to know the scenarios/problems you are trying to debug by
running perf inside a container.

[1] lkml.org/lkml/2015/7/15/192
[2] linuxplumbersconf.org/2015/ocw/sessions/2667.html

Regards,
Aravinda

On Friday 10 June 2016 02:44 AM, Brendan Gregg wrote:
> G'Day,
> 
> Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:
> 
> docker# ./perf record -F 99 -a
> Segmentation fault
> 
> The segfault is in perf_event__synthesize_kernel_mmap(). I know
> symbol__read_kptr_restrict() has been updated recently to fix similar
> segfaults, hence getting perf/core.
> 
> I think the problem is this:
> 
> docker# id
> uid=0(root) gid=0(root) groups=0(root)
> docker# cat /proc/sys/kernel/kptr_restrict
> 1
> 
> (I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")
> 
> The current (May 24) code in symbol__read_kptr_restrict() has:
> 
>                 if (fgets(line, sizeof(line), fp) != NULL)
>                         value = (geteuid() != 0) ?
>                                         (atoi(line) != 0) :
>                                         (atoi(line) == 2);
> 
> This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
> restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN
> instead of euid == 0?
> 
> Brendan

-- 
Regards,
Aravinda


* Re: perf segfault in docker container
  2016-06-09 21:14 perf segfault in docker container Brendan Gregg
  2016-06-10 10:28 ` Aravinda Prasad
@ 2016-06-10 20:15 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-10 20:15 UTC
  To: Brendan Gregg; +Cc: linux-perf-use., Wang Nan

On Thu, Jun 09, 2016 at 02:14:11PM -0700, Brendan Gregg wrote:
> G'Day,
> 
> Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:
> 
> docker# ./perf record -F 99 -a
> Segmentation fault
> 
> The segfault is in perf_event__synthesize_kernel_mmap(). I know
> symbol__read_kptr_restrict() has been updated recently to fix similar
> segfaults, hence getting perf/core.
> 
> I think the problem is this:
> 
> docker# id
> uid=0(root) gid=0(root) groups=0(root)
> docker# cat /proc/sys/kernel/kptr_restrict
> 1
> 
> (I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")
> 
> The current (May 24) code in symbol__read_kptr_restrict() has:
> 
>                 if (fgets(line, sizeof(line), fp) != NULL)
>                         value = (geteuid() != 0) ?
>                                         (atoi(line) != 0) :
>                                         (atoi(line) == 2);
> 
> This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
> restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN
> instead of euid == 0?

Ack, reproduced; will work on it. Thanks for the report.

- Arnaldo


* Re: perf segfault in docker container
  2016-06-10 10:28 ` Aravinda Prasad
@ 2016-06-21 22:32   ` Brendan Gregg
  2016-06-21 22:43     ` Brendan Gregg
  2016-06-22 21:35     ` Aravinda Prasad
  0 siblings, 2 replies; 6+ messages in thread
From: Brendan Gregg @ 2016-06-21 22:32 UTC
  To: Aravinda Prasad
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

G'Day Aravinda,

Sorry for the delay; answers inline:

On Fri, Jun 10, 2016 at 3:28 AM, Aravinda Prasad
<aravinda@linux.vnet.ibm.com> wrote:
>
> Hi Brendan,
>
> I thought I'd reply to your mail as I saw you running perf inside a
> docker container. I believe you would be interested in events specific
> to the container context as you are using "perf record -a".
>
> We are working on supporting "container-aware tracing" i.e., whenever
> you run "perf record -a" inside a container it should report
> container-wide events rather than system-wide events. Towards that goal,
> we posted an RFC patch in LKML [1] last year and also discussed possible
> ways to restrict events within a container in Plumbers (Container
> Microconf) [2].

Sounds great.

>
>
> Based on the discussion in Container Microconf, we are coming up with a
> new prototype which should be ready for review by next week. The new
> prototype introduces a new namespace "perf-namespace" (namespace name is
> just a placeholder. Suggestions welcome). If the container is created
> with perf-namespace, then "perf record -a" inside the container reports
> only those events that are triggered within the container.

I'd think that this restriction should be the default, rather than
needing to create a container with a perf-namespace. Why wouldn't it
make use of the existing pid namespace?

>
> We would like to know if you are looking for "container-aware tracing"
> and also like to know the scenarios/problems you are trying to debug by
> running perf inside a container.

Yes, perf needs to be container-aware.

To start with, we'd like to profile apps running inside Docker
containers, either by running perf in the container, or by running
perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
tried both and had both approaches work, with some wrestling of
/tmp/perf-PID.map files and things.

If perf was container-aware, then running it in the container should
be the easiest way to profile an app, if it's only sampling that
container.

Also, from within a container, I'd expect to be able to sample kernel
stacks that are running for the container processes (eg, syscalls),
but not asynchronous kernel threads that are running host-wide (eg,
background fsflush).

More advanced things would involve tracing syscall latency and using
BPF for latency histograms, from within a container. That should be
allowed.

What about tracepoints? Should a container be able to use the block
I/O tracepoints and see disk I/O latency histograms? Filtering this to
be just the container's block I/O would be tricky. Doing it
system-wide may be allowable, depending on a setting in
perf_event_paranoid. I think in some environments, having a container
trace all tracepoints (disk, tcp, etc) is ok, provided no data is
leaked from another container; whereas in other environments tracing
non-container events would not be ok. Hence setting this in
perf_event_paranoid.

Brendan


* Re: perf segfault in docker container
  2016-06-21 22:32   ` Brendan Gregg
@ 2016-06-21 22:43     ` Brendan Gregg
  2016-06-22 21:35     ` Aravinda Prasad
  1 sibling, 0 replies; 6+ messages in thread
From: Brendan Gregg @ 2016-06-21 22:43 UTC
  To: Aravinda Prasad
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

On Tue, Jun 21, 2016 at 3:32 PM, Brendan Gregg
<brendan.d.gregg@gmail.com> wrote:
> G'Day Aravinda,
>
[...]
>> We would like to know if you are looking for "container-aware tracing"
>> and also like to know the scenarios/problems you are trying to debug by
>> running perf inside a container.
>
> Yes, perf needs to be container-aware.
>
> To start with, we'd like to profile apps running inside Docker
> containers, either by running perf in the container, or by running
> perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
> tried both and had both approaches work, with some wrestling of
> /tmp/perf-PID.map files and things.
>
> If perf was container-aware, then running it in the container should
> be the easiest way to profile an app, if it's only sampling that
> container.
>
> Also, from within a container, I'd expect to be able to sample kernel
> stacks that are running for the container processes (eg, syscalls),
> but not asynchronous kernel threads that are running host-wide (eg,
> background fsflush).
>
> More advanced things would involve tracing syscall latency and using
> BPF for latency histograms, from within a container. That should be
> allowed.
>
> What about tracepoints? Should a container be able to use the block
> I/O tracepoints and see disk I/O latency histograms? Filtering this to
> be just the container's block I/O would be tricky. Doing it
> system-wide may be allowable, depending on a setting in
> perf_event_paranoid. I think in some environments, having a container
> trace all tracepoints (disk, tcp, etc) is ok, provided no data is
> leaked from another container; whereas in other environments tracing
> non-container events would not be ok. Hence setting this in
> perf_event_paranoid.
>
> Brendan

An addition for container-aware tracing: perf from the host should be
able to find the correct /tmp/perf-PID.map files, even though they are
in the container /tmp's.
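One way a host-side tool could locate such a file, sketched in C under two assumptions: the kernel exposes the "NSpid:" line in /proc/<pid>/status (present since Linux 4.1), and the container's /tmp is reachable through /proc/<pid>/root. The helper names are hypothetical, and this is not a description of how perf itself resolves map files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a "NSpid:" line from /proc/<pid>/status.  It lists the task's PID
 * in each nested PID namespace, outermost first; return the last one
 * (the PID the container itself sees), or -1 if the line is not NSpid. */
long innermost_nspid(const char *line)
{
	const char *p;
	char *end;
	long pid = -1;

	if (strncmp(line, "NSpid:", 6) != 0)
		return -1;
	for (p = line + 6;; p = end) {
		long v = strtol(p, &end, 10);
		if (end == p)       /* no more numbers on the line */
			break;
		pid = v;
	}
	return pid;
}

/* Look up the innermost PID of host PID `host_pid`; -1 on failure. */
long lookup_nspid(long host_pid)
{
	char path[64], line[256];
	long pid = -1;
	FILE *fp;

	snprintf(path, sizeof(path), "/proc/%ld/status", host_pid);
	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;
	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "NSpid:", 6) == 0) {
			pid = innermost_nspid(line);
			break;
		}
	}
	fclose(fp);
	return pid;
}

/* Build the host-visible path of the container's perf map file.
 * Returns 0 on success, -1 if `buf` is too small. */
int perf_map_path(char *buf, size_t len, long host_pid, long ns_pid)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/proc/%ld/root/tmp/perf-%ld.map",
		     host_pid, ns_pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a process the host sees as PID 4321 and the container sees as PID 7, the map file would be read from /proc/4321/root/tmp/perf-7.map.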

Brendan


* Re: perf segfault in docker container
  2016-06-21 22:32   ` Brendan Gregg
  2016-06-21 22:43     ` Brendan Gregg
@ 2016-06-22 21:35     ` Aravinda Prasad
  1 sibling, 0 replies; 6+ messages in thread
From: Aravinda Prasad @ 2016-06-22 21:35 UTC
  To: Brendan Gregg
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

Hi Brendan,

On Wednesday 22 June 2016 04:02 AM, Brendan Gregg wrote:
> G'Day Aravinda,
> 
> Sorry for the delay; answers inline:
> 
> On Fri, Jun 10, 2016 at 3:28 AM, Aravinda Prasad
> <aravinda@linux.vnet.ibm.com> wrote:
>>
>> Hi Brendan,
>>
>> I thought I'd reply to your mail as I saw you running perf inside a
>> docker container. I believe you would be interested in events specific
>> to the container context as you are using "perf record -a".
>>
>> We are working on supporting "container-aware tracing" i.e., whenever
>> you run "perf record -a" inside a container it should report
>> container-wide events rather than system-wide events. Towards that goal,
>> we posted an RFC patch in LKML [1] last year and also discussed possible
>> ways to restrict events within a container in Plumbers (Container
>> Microconf) [2].
> 
> Sounds great.
> 
>>
>>
>> Based on the discussion in Container Microconf, we are coming up with a
>> new prototype which should be ready for review by next week. The new
>> prototype introduces a new namespace "perf-namespace" (namespace name is
>> just a placeholder. Suggestions welcome). If the container is created
>> with perf-namespace, then "perf record -a" inside the container reports
>> only those events that are triggered within the container.
> 
> I'd think that this restriction should be the default, rather than
> needing to create a container with a perf-namespace. Why wouldn't it
> make use of the existing pid namespace?

Our initial prototype (lkml.org/lkml/2015/7/15/192) was based on
pid-namespace. However, during the discussion at Plumbers, it was
pointed out that requiring a PID namespace does not work for containers
that need access to the host PID namespace, since such containers are
created without their own PID namespace. Hence, we thought of
introducing perf-namespace.
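The limitation described above can be observed from userspace: per namespaces(7), two processes are in the same PID namespace exactly when stat() on their /proc/<pid>/ns/pid links reports the same device and inode numbers. A small C sketch of that check, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Identify the PID namespace of `pid` by the (device, inode) pair that
 * stat() reports for /proc/<pid>/ns/pid.  Returns false on error. */
bool pidns_id(pid_t pid, dev_t *dev, ino_t *ino)
{
	char path[64];
	struct stat st;

	assert(dev != NULL && ino != NULL);
	snprintf(path, sizeof(path), "/proc/%d/ns/pid", (int)pid);
	if (stat(path, &st) != 0)
		return false;
	*dev = st.st_dev;
	*ino = st.st_ino;
	return true;
}

/* True if both processes are in the same PID namespace; false if they
 * differ or either one cannot be inspected. */
bool same_pidns(pid_t a, pid_t b)
{
	dev_t da, db;
	ino_t ia, ib;

	if (!pidns_id(a, &da, &ia) || !pidns_id(b, &db, &ib))
		return false;
	return da == db && ia == ib;
}
```

For a container started in the host's PID namespace (for example with Docker's --pid=host), same_pidns() on one of its processes and the host's PID 1 would be expected to return true, so a filter keyed only on the PID namespace could not tell that container's events apart from the host's.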

We have posted the RFC patches for perf-namespace prototype:
https://lkml.org/lkml/2016/6/14/547

> 
>>
>> We would like to know if you are looking for "container-aware tracing"
>> and also like to know the scenarios/problems you are trying to debug by
>> running perf inside a container.
> 
> Yes, perf needs to be container-aware.
> 
> To start with, we'd like to profile apps running inside Docker
> containers, either by running perf in the container, or by running
> perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
> tried both and had both approaches work, with some wrestling of
> /tmp/perf-PID.map files and things.

We are also working on enabling running perf from host with a container
ID as an argument. This is in addition to enabling perf inside a container.
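For the host-side case, the kernel already offers a related building block: perf_event_open(2) with PERF_FLAG_PID_CGROUP counts only tasks in one cgroup, with the pid argument replaced by an fd for the cgroup directory and cpu required to be >= 0 (perf exposes this as its -G/--cgroup option). Whether a container ID maps cleanly onto one cgroup depends on the runtime and cgroup layout, so no path is assumed here. The sketch below, with a hypothetical event choice and helper names, only illustrates that mechanism; it is not the prototype mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open(2), so call it directly. */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
				int cpu, int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Describe a CPU-clock software event sampled at `freq` Hz, recording the
 * instruction pointer and pid/tid (an illustrative choice of event). */
void init_cpu_clock_attr(struct perf_event_attr *attr, unsigned long freq)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;
	attr->config = PERF_COUNT_SW_CPU_CLOCK;
	attr->freq = 1;                 /* interpret sample_freq as Hz */
	attr->sample_freq = freq;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
	attr->disabled = 1;             /* enable later with PERF_EVENT_IOC_ENABLE */
}

/* Open one event that only counts tasks in the cgroup at `cgroup_dir`,
 * on CPU `cpu`.  Returns the event fd, or -1 on failure. */
int open_cgroup_event(const char *cgroup_dir, int cpu, unsigned long freq)
{
	struct perf_event_attr attr;
	int cgfd, fd;

	assert(cpu >= 0);               /* cgroup mode is per-CPU only */
	cgfd = open(cgroup_dir, O_RDONLY | O_DIRECTORY);
	if (cgfd < 0)
		return -1;
	init_cpu_clock_attr(&attr, freq);
	/* With PERF_FLAG_PID_CGROUP, the pid argument is the cgroup dir fd. */
	fd = (int)sys_perf_event_open(&attr, cgfd, cpu, -1,
				      PERF_FLAG_PID_CGROUP);
	close(cgfd);                    /* the event holds its own cgroup reference */
	return fd;
}
```

A tool would call open_cgroup_event() once per CPU and read samples from a ring buffer mmap()ed on each returned fd; that part is omitted here.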

> 
> If perf was container-aware, then running it in the container should
> be the easiest way to profile an app, if it's only sampling that
> container.
> 
> Also, from within a container, I'd expect to be able to sample kernel
> stacks that are running for the container processes (eg, syscalls),
> but not asynchronous kernel threads that are running host-wide (eg,
> background fsflush).

Our current and previous prototypes sample kernel events which are
triggered from the container context. And yes, they do not include
events from asynchronous kernel threads.

> 
> More advanced things would involve tracing syscall latency and using
> BPF for latency histograms, from within a container. That should be
> allowed.

Sure. Noted.

> 
> What about tracepoints? Should a container be able to use the block
> I/O tracepoints and see disk I/O latency histograms? Filtering this to
> be just the container's block I/O would be tricky. Doing it
> system-wide may be allowable, depending on a setting in
> perf_event_paranoid. I think in some environments, having a container
> trace all tracepoints (disk, tcp, etc) is ok, provided no data is
> leaked from another container; whereas in other environments tracing
> non-container events would not be ok. Hence setting this in
> perf_event_paranoid.

Yes, filtering such tracepoints to just the container's instance is
tricky and we have not yet figured out any solution for that.

Regards,
Aravinda

> 
> Brendan
> 

-- 
Regards,
Aravinda


Thread overview: 6+ messages
2016-06-09 21:14 perf segfault in docker container Brendan Gregg
2016-06-10 10:28 ` Aravinda Prasad
2016-06-21 22:32   ` Brendan Gregg
2016-06-21 22:43     ` Brendan Gregg
2016-06-22 21:35     ` Aravinda Prasad
2016-06-10 20:15 ` Arnaldo Carvalho de Melo
