* perf segfault in docker container
From: Brendan Gregg @ 2016-06-09 21:14 UTC
  To: linux-perf-use.; +Cc: Wang Nan

G'Day,

Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:

    docker# ./perf record -F 99 -a
    Segmentation fault

The segfault is in perf_event__synthesize_kernel_mmap(). I know
symbol__read_kptr_restrict() has been updated recently to fix similar
segfaults, hence getting perf/core.

I think the problem is this:

    docker# id
    uid=0(root) gid=0(root) groups=0(root)
    docker# cat /proc/sys/kernel/kptr_restrict
    1

(I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")

The current (May 24) code has, in symbol__read_kptr_restrict():

    if (fgets(line, sizeof(line), fp) != NULL)
            value = (geteuid() != 0) ?
                            (atoi(line) != 0) :
                            (atoi(line) == 2);

This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN,
instead of euid == 0?

Brendan

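The suggestion in the message above (test a capability rather than euid == 0) could look roughly like the following. This is only a sketch, not the perf code or the eventual fix: has_effective_cap() and kptr_restricted() are hypothetical names, and CAP_SYS_ADMIN is used only because it is the capability named in the report. The kernel's sysctl documentation describes kptr_restrict=1 in terms of CAP_SYSLOG, so which capability to test is an open assumption here.

```c
#include <linux/capability.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: true if the calling thread holds `cap` in its
 * effective capability set. Uses the raw capget(2) syscall so that no
 * libcap dependency is needed.
 */
static bool has_effective_cap(int cap)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,   /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = { { 0 } };

    if (syscall(SYS_capget, &hdr, data) != 0)
        return false;   /* conservative: on failure, assume not capable */

    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/*
 * Hypothetical decision function with the same shape as the quoted
 * expression, but keyed on a capability instead of geteuid() == 0:
 *   capable:     restricted only when kptr_restrict == 2
 *   not capable: restricted whenever kptr_restrict != 0
 */
static bool kptr_restricted(int kptr_restrict, bool capable)
{
    return capable ? (kptr_restrict == 2) : (kptr_restrict != 0);
}
```

With this shape, a root user in a default Docker container that lacks the capability would be treated as restricted when kptr_restrict is 1: kptr_restricted(1, has_effective_cap(CAP_SYS_ADMIN)) evaluates to true there, where the euid-based test says "not restricted".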
* Re: perf segfault in docker container
From: Aravinda Prasad @ 2016-06-10 10:28 UTC
  To: Brendan Gregg
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

Hi Brendan,

I thought of replying to your mail as I saw you running perf inside a
docker container. I believe you would be interested in events specific
to the container context as you are using "perf record -a".

We are working on supporting "container-aware tracing" i.e., whenever
you run "perf record -a" inside a container it should report
container-wide events rather than system-wide events. Towards that goal,
we posted an RFC patch in LKML [1] last year and also discussed possible
ways to restrict events within a container in Plumbers (Container
Microconf) [2].

Based on the discussion in Container Microconf, we are coming up with a
new prototype which should be ready for review by next week. The new
prototype introduces a new namespace "perf-namespace" (namespace name is
just a placeholder. Suggestions welcome). If the container is created
with perf-namespace, then "perf record -a" inside the container reports
only those events that are triggered within the container.

We would like to know if you are looking for "container-aware tracing"
and also like to know the scenarios/problems you are trying to debug by
running perf inside a container.

[1] lkml.org/lkml/2015/7/15/192
[2] linuxplumbersconf.org/2015/ocw/sessions/2667.html

Regards,
Aravinda

On Friday 10 June 2016 02:44 AM, Brendan Gregg wrote:
> G'Day,
>
> Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:
>
> docker# ./perf record -F 99 -a
> Segmentation fault
>
> The segfault is in perf_event__synthesize_kernel_mmap().
> I know symbol__read_kptr_restrict() has been updated recently to fix
> similar segfaults, hence getting perf/core.
>
> I think the problem is this:
>
> docker# id
> uid=0(root) gid=0(root) groups=0(root)
> docker# cat /proc/sys/kernel/kptr_restrict
> 1
>
> (I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")
>
> The current (May 24) code has, in symbol__read_kptr_restrict():
>
> if (fgets(line, sizeof(line), fp) != NULL)
>         value = (geteuid() != 0) ?
>                         (atoi(line) != 0) :
>                         (atoi(line) == 2);
>
> This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
> restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN,
> instead of euid == 0?
>
> Brendan

--
Regards,
Aravinda

* Re: perf segfault in docker container
From: Brendan Gregg @ 2016-06-21 22:32 UTC
  To: Aravinda Prasad
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

G'Day Aravinda,

Sorry for the delay; answers inline:

On Fri, Jun 10, 2016 at 3:28 AM, Aravinda Prasad
<aravinda@linux.vnet.ibm.com> wrote:
>
> Hi Brendan,
>
> I thought of replying to your mail as I saw you running perf inside a
> docker container. I believe you would be interested in events specific
> to the container context as you are using "perf record -a".
>
> We are working on supporting "container-aware tracing" i.e., whenever
> you run "perf record -a" inside a container it should report
> container-wide events rather than system-wide events. Towards that goal,
> we posted an RFC patch in LKML [1] last year and also discussed possible
> ways to restrict events within a container in Plumbers (Container
> Microconf) [2].

Sounds great.

>
> Based on the discussion in Container Microconf, we are coming up with a
> new prototype which should be ready for review by next week. The new
> prototype introduces a new namespace "perf-namespace" (namespace name is
> just a placeholder. Suggestions welcome). If the container is created
> with perf-namespace, then "perf record -a" inside the container reports
> only those events that are triggered within the container.

I'd think that this restriction should be the default, rather than
needing to create a container with a perf-namespace. Why wouldn't it
make use of the existing pid namespace?

>
> We would like to know if you are looking for "container-aware tracing"
> and also like to know the scenarios/problems you are trying to debug by
> running perf inside a container.

Yes, perf needs to be container-aware.

To start with, we'd like to profile apps running inside Docker
containers, either by running perf in the container, or by running
perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
tried both and had both approaches work, with some wrestling of
/tmp/perf-PID.map files and things.

If perf was container-aware, then running it in the container should
be the easiest way to profile an app, if it's only sampling that
container.

Also, from within a container, I'd expect to be able to sample kernel
stacks that are running for the container processes (eg, syscalls),
but not asynchronous kernel threads that are running host-wide (eg,
background fsflush).

More advanced things would involve tracing syscall latency and using
BPF for latency histograms, from within a container. That should be
allowed.

What about tracepoints? Should a container be able to use the block
I/O tracepoints and see disk I/O latency histograms? Filtering this to
be just the container's block I/O would be tricky. Doing it
system-wide may be allowable, depending on a setting in
perf_event_paranoid. I think in some environments, having a container
trace all tracepoints (disk, tcp, etc) is ok, provided no data is
leaked from another container; whereas in other environments tracing
non-container events would not be ok. Hence setting this in
perf_event_paranoid.

Brendan

* Re: perf segfault in docker container
From: Brendan Gregg @ 2016-06-21 22:43 UTC
  To: Aravinda Prasad
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

On Tue, Jun 21, 2016 at 3:32 PM, Brendan Gregg
<brendan.d.gregg@gmail.com> wrote:
> G'Day Aravinda,
>
[...]
>> We would like to know if you are looking for "container-aware tracing"
>> and also like to know the scenarios/problems you are trying to debug by
>> running perf inside a container.
>
> Yes, perf needs to be container-aware.
>
> To start with, we'd like to profile apps running inside Docker
> containers, either by running perf in the container, or by running
> perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
> tried both and had both approaches work, with some wrestling of
> /tmp/perf-PID.map files and things.
>
> If perf was container-aware, then running it in the container should
> be the easiest way to profile an app, if it's only sampling that
> container.
>
> Also, from within a container, I'd expect to be able to sample kernel
> stacks that are running for the container processes (eg, syscalls),
> but not asynchronous kernel threads that are running host-wide (eg,
> background fsflush).
>
> More advanced things would involve tracing syscall latency and using
> BPF for latency histograms, from within a container. That should be
> allowed.
>
> What about tracepoints? Should a container be able to use the block
> I/O tracepoints and see disk I/O latency histograms? Filtering this to
> be just the container's block I/O would be tricky. Doing it
> system-wide may be allowable, depending on a setting in
> perf_event_paranoid.
> I think in some environments, having a container trace all
> tracepoints (disk, tcp, etc) is ok, provided no data is leaked from
> another container; whereas in other environments tracing
> non-container events would not be ok. Hence setting this in
> perf_event_paranoid.
>
> Brendan

An addition for container-aware tracing: perf from the host should be
able to find the correct /tmp/perf-PID.map files, even though they are
in the container /tmp's.

Brendan

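One way a host-side tool might locate such a map file is sketched below. It assumes several things not stated in the thread: that the JIT runtime inside the container names the file after the PID it sees in its own namespace, that the kernel exposes an "NSpid:" line in /proc/<pid>/status, and that the caller has enough privilege to follow /proc/<pid>/root. Both function names are hypothetical, not perf APIs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the innermost (last) PID on the "NSpid:" line of a
 * /proc/<pid>/status text, or -1 if the line is absent or malformed.
 */
static long innermost_nspid(const char *status_text)
{
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "NSpid:", 6) == 0) {
            const char *p = line + 6;
            long last = -1;

            for (;;) {
                char *end;
                long v;

                while (*p == ' ' || *p == '\t')
                    p++;
                if (*p == '\n' || *p == '\0')
                    break;          /* end of the NSpid line */
                v = strtol(p, &end, 10);
                if (end == p)
                    return -1;      /* non-numeric field */
                last = v;
                p = end;
            }
            return last;
        }
        line = strchr(line, '\n');  /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}

/*
 * Build the host-visible path of the container's map file by going
 * through the target's root directory. Returns 0 on success, -1 if
 * the buffer is too small.
 */
static int container_map_path(char *buf, size_t len, long host_pid, long ns_pid)
{
    int n = snprintf(buf, len, "/proc/%ld/root/tmp/perf-%ld.map",
                     host_pid, ns_pid);

    return (n > 0 && (size_t)n < len) ? 0 : -1;
}
```

For a process that is PID 4242 on the host and PID 17 inside its container, this yields /proc/4242/root/tmp/perf-17.map. Reading the status file and opening the resulting path are left to the caller.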
* Re: perf segfault in docker container
From: Aravinda Prasad @ 2016-06-22 21:35 UTC
  To: Brendan Gregg
  Cc: linux-perf-use., Wang Nan, Hari Bathini, Ananth M, Naveen N. Rao

Hi Brendan,

On Wednesday 22 June 2016 04:02 AM, Brendan Gregg wrote:
> G'Day Aravinda,
>
> Sorry for the delay; answers inline:
>
> On Fri, Jun 10, 2016 at 3:28 AM, Aravinda Prasad
> <aravinda@linux.vnet.ibm.com> wrote:
>>
>> Hi Brendan,
>>
>> I thought of replying to your mail as I saw you running perf inside a
>> docker container. I believe you would be interested in events specific
>> to the container context as you are using "perf record -a".
>>
>> We are working on supporting "container-aware tracing" i.e., whenever
>> you run "perf record -a" inside a container it should report
>> container-wide events rather than system-wide events. Towards that goal,
>> we posted an RFC patch in LKML [1] last year and also discussed possible
>> ways to restrict events within a container in Plumbers (Container
>> Microconf) [2].
>
> Sounds great.
>
>>
>> Based on the discussion in Container Microconf, we are coming up with a
>> new prototype which should be ready for review by next week. The new
>> prototype introduces a new namespace "perf-namespace" (namespace name is
>> just a placeholder. Suggestions welcome). If the container is created
>> with perf-namespace, then "perf record -a" inside the container reports
>> only those events that are triggered within the container.
>
> I'd think that this restriction should be the default, rather than
> needing to create a container with a perf-namespace. Why wouldn't it
> make use of the existing pid namespace?

Our initial prototype (lkml.org/lkml/2015/7/15/192) was based on pid-namespace.
However, during the discussion in Plumbers, it was pointed out that
relying on the PID namespace is insufficient for containers that need
access to the host PID namespace, as these containers are created
without a PID namespace of their own. Hence, we thought of introducing
perf-namespace.

We have posted the RFC patches for the perf-namespace prototype:
https://lkml.org/lkml/2016/6/14/547

>
>>
>> We would like to know if you are looking for "container-aware tracing"
>> and also like to know the scenarios/problems you are trying to debug by
>> running perf inside a container.
>
> Yes, perf needs to be container-aware.
>
> To start with, we'd like to profile apps running inside Docker
> containers, either by running perf in the container, or by running
> perf from the host. As in, "perf record -F49 -a -g -- sleep 30". I've
> tried both and had both approaches work, with some wrestling of
> /tmp/perf-PID.map files and things.

We are also working on enabling perf to be run from the host with a
container ID as an argument. This is in addition to enabling perf inside
a container.

>
> If perf was container-aware, then running it in the container should
> be the easiest way to profile an app, if it's only sampling that
> container.
>
> Also, from within a container, I'd expect to be able to sample kernel
> stacks that are running for the container processes (eg, syscalls),
> but not asynchronous kernel threads that are running host-wide (eg,
> background fsflush).

Our current and previous prototypes sample kernel events which are
triggered from the container context. And yes, they do not include
events from asynchronous kernel threads.

>
> More advanced things would involve tracing syscall latency and using
> BPF for latency histograms, from within a container. That should be
> allowed.

Sure. Noted.

>
> What about tracepoints? Should a container be able to use the block
> I/O tracepoints and see disk I/O latency histograms? Filtering this to
> be just the container's block I/O would be tricky.
> Doing it system-wide may be allowable, depending on a setting in
> perf_event_paranoid. I think in some environments, having a container
> trace all tracepoints (disk, tcp, etc) is ok, provided no data is
> leaked from another container; whereas in other environments tracing
> non-container events would not be ok. Hence setting this in
> perf_event_paranoid.

Yes, filtering such tracepoints to just the container's instance is
tricky and we have not yet figured out any solution for that.

Regards,
Aravinda

>
> Brendan
>

--
Regards,
Aravinda

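The PID-namespace limitation mentioned in this reply can be observed from userspace. The sketch below assumes Linux with /proc mounted and relies on the rule documented in namespaces(7): two processes are in the same namespace when their /proc/<pid>/ns/pid links resolve to the same device and inode. same_pid_namespace() is a hypothetical helper, not part of perf or of the RFC patches.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if both PIDs are in the same PID
 * namespace, 0 if they are not, and -1 if either namespace link
 * cannot be examined (no /proc, process gone, or no permission).
 */
static int same_pid_namespace(pid_t a, pid_t b)
{
    char path_a[64], path_b[64];
    struct stat st_a, st_b;

    snprintf(path_a, sizeof(path_a), "/proc/%ld/ns/pid", (long)a);
    snprintf(path_b, sizeof(path_b), "/proc/%ld/ns/pid", (long)b);

    /* stat() follows the symlink and reports the namespace object. */
    if (stat(path_a, &st_a) != 0 || stat(path_b, &st_b) != 0)
        return -1;

    return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

For a container started without its own PID namespace, a check like this would report its processes as sharing the host's namespace, so the PID namespace alone gives no way to tell them apart from host processes.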
* Re: perf segfault in docker container
From: Arnaldo Carvalho de Melo @ 2016-06-10 20:15 UTC
  To: Brendan Gregg; +Cc: linux-perf-use., Wang Nan

On Thu, Jun 09, 2016 at 02:14:11PM -0700, Brendan Gregg wrote:
> G'Day,
>
> Default docker container, in Linux 4.7-rc2, with latest perf from perf/core:
>
> docker# ./perf record -F 99 -a
> Segmentation fault
>
> The segfault is in perf_event__synthesize_kernel_mmap(). I know
> symbol__read_kptr_restrict() has been updated recently to fix similar
> segfaults, hence getting perf/core.
>
> I think the problem is this:
>
> docker# id
> uid=0(root) gid=0(root) groups=0(root)
> docker# cat /proc/sys/kernel/kptr_restrict
> 1
>
> (I'd previously set "echo -1 > /proc/sys/kernel/perf_event_paranoid")
>
> The current (May 24) code has, in symbol__read_kptr_restrict():
>
> if (fgets(line, sizeof(line), fp) != NULL)
>         value = (geteuid() != 0) ?
>                         (atoi(line) != 0) :
>                         (atoi(line) == 2);
>
> This assumes that if euid is 0 && kptr_restrict isn't 2, then we aren't
> restricted. But we are. Maybe the code should check for CAP_SYS_ADMIN,
> instead of euid == 0?

Ack, reproduced; will work on it.

Thanks for the report.

- Arnaldo

Thread overview: 6+ messages
2016-06-09 21:14 perf segfault in docker container Brendan Gregg
2016-06-10 10:28 ` Aravinda Prasad
2016-06-21 22:32   ` Brendan Gregg
2016-06-21 22:43     ` Brendan Gregg
2016-06-22 21:35     ` Aravinda Prasad
2016-06-10 20:15 ` Arnaldo Carvalho de Melo