* perf not picking up symbols for namespaced processes
@ 2019-12-05  3:46 Ivan Babrou
  2019-12-05 12:33 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: Ivan Babrou @ 2019-12-05  3:46 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-team, Jiri Olsa, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	sashal, Kenton Varda

We have a service that forks a child process in a namespace-based
sandbox where the mount namespace is intentionally designed to reflect
a totally empty filesystem. Our use case is very similar to Chrome's
sandbox, for example, but on a server. Within the sandbox, not even
the service's own binary is present in the mount namespace.

The process tree looks like this:

$ sudo pstree -psc 63989
edgeworker(63989)─┬─edgeworker/sbox(255716)─┬─edgeworker/zygt(255718)
                   │                         ├─{edgeworker/sbox}(255719)
                   │                         ├─{edgeworker/sbox}(255720)
                   │                         ├─{edgeworker/sbox}(255721)
                   ├─edgeworker/stry(5803)
                   ├─edgeworker/stry(63990)
                   ├─edgeworker/stry(106218)
                   ├─edgeworker/stry(191905)
                   ├─edgeworker/stry(255695)
                   ├─edgeworker/supr(255717)

Here the sbox processes do the actual work, living in an empty mount
namespace, and stry is a helper process for error reporting. All tasks
come from the same binary, which lives in the root mount namespace and
is launched by systemd.

During a "perf script" run on a trace obtained from the system, there
are three possible outcomes:

1. The first pid to be processed is a non-namespaced helper and
symbols are present.
2. The first pid is not found and symbols are present.
3. The first pid is a sandboxed task and symbols are missing.

Symbols are missing because "perf script" tries to jump into the empty
sandbox and find the binary there, when in fact it lives outside:

getcwd("/state/home/ivan", 4096)        = 17
open("/proc/self/ns/mnt", O_RDONLY)     = 5
open("/proc/255719/ns/mnt", O_RDONLY)   = 6
setns(6, CLONE_NEWNS)                   = 0
stat("/usr/local/bin/edgeworker", 0x7ffedb9b3ca0) = -1 ENOENT (No such file or directory)

In the second outcome we don't have a PID to figure out the namespace
to jump into, so this doesn't happen. It's a good fallback, but it was
a bit confusing during debugging.

It's not entirely clear to me why a helper PID is sometimes picked,
even though it's not the first sample in the recorded trace (at least
not in the output). This happens deterministically, or at least
appears to. In my process tree it's 255695.

I think perf should fall back to the default namespace to look up
symbols if they are not found inside the task's namespace, to cover
our case. The relevant piece of logic is here:

* https://elixir.free-electrons.com/linux/v5.4.1/source/tools/perf/util/dso.c#L520
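For illustration, here is a minimal C sketch of the lookup the strace
above shows (save cwd, open both namespace files, setns, stat, switch
back), followed by the proposed fallback to perf's own mount namespace.
It assumes a single-threaded caller with CAP_SYS_ADMIN and
CAP_SYS_CHROOT, since setns(2) refuses CLONE_NEWNS otherwise. The
helpers exists_in_mntns() and resolve_binary() are hypothetical and
are not perf's API; in perf the switching is done by
nsinfo__mountns_enter() and nsinfo__mountns_exit().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: is `path` a regular file when looked up inside the
 * mount namespace of `pid`? Returns 1 or 0, or -1 if the namespace could
 * not be entered (no such pid, or setns() refused). */
static int exists_in_mntns(pid_t pid, const char *path)
{
    char cwd[4096], nspath[64];
    struct stat st;
    int ret = -1;

    /* setns(CLONE_NEWNS) moves cwd to the new namespace's root,
     * so remember it first, as the getcwd() in the strace does. */
    if (!getcwd(cwd, sizeof cwd))
        return -1;
    snprintf(nspath, sizeof nspath, "/proc/%d/ns/mnt", (int)pid);

    int oldns = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
    int newns = open(nspath, O_RDONLY | O_CLOEXEC);
    if (oldns >= 0 && newns >= 0 && setns(newns, CLONE_NEWNS) == 0) {
        ret = stat(path, &st) == 0 && S_ISREG(st.st_mode);
        /* Switch back and restore cwd; treat a failed restore as an error. */
        if (setns(oldns, CLONE_NEWNS) != 0 || chdir(cwd) != 0)
            ret = -1;
    }
    if (oldns >= 0)
        close(oldns);
    if (newns >= 0)
        close(newns);
    return ret;
}

/* Proposed fallback: look in the task's namespace first; if the binary is
 * not there, or the namespace is unavailable, look it up in perf's own
 * mount namespace and tell the caller that the fallback was used. */
static int resolve_binary(pid_t pid, const char *path, int *used_fallback)
{
    struct stat st;

    *used_fallback = 0;
    if (exists_in_mntns(pid, path) == 1)
        return 1;
    *used_fallback = 1;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

With a pid whose /proc entry no longer exists, exists_in_mntns()
returns -1 and resolve_binary() falls back to a plain stat() in the
caller's namespace, which matches outcome 2 above.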


* Re: perf not picking up symbols for namespaced processes
  2019-12-05  3:46 perf not picking up symbols for namespaced processes Ivan Babrou
@ 2019-12-05 12:33 ` Arnaldo Carvalho de Melo
  2019-12-06  2:17   ` Ivan Babrou
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-12-05 12:33 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: linux-kernel, kernel-team, Jiri Olsa, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Namhyung Kim, sashal,
	Kenton Varda

On Wed, Dec 04, 2019 at 07:46:10PM -0800, Ivan Babrou wrote:
> I think perf should fall back to the default namespace to look up
> symbols if they are not found inside the task's namespace, to cover
> our case. The relevant piece of logic is here:

That should work for your use case, as you're sure that looking up by
pathname alone will find, outside the namespace, the binary you want.

Even though pathname-based lookups are fragile, this works for your
use case, so please consider providing a patch for such a fallback,
together with a pr_debug() or even a pr_warning(), if that doesn't get
too noisy, to warn the user.
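Since a per-lookup warning could get noisy, one option is to warn once
per binary. A minimal sketch with hypothetical names (struct
warned_set, warn_fallback_once()); it writes to a FILE * rather than
using perf's pr_warning(), and whether perf would reach the fallback
repeatedly for the same DSO is not established in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: binaries we already warned about. A linear scan is fine
 * for the handful of DSOs a profile usually touches. */
struct warned_set {
    char **paths;
    size_t len, cap;
};

/* Print the fallback warning the first time `path` is seen.
 * Returns 1 if it warned, 0 if it stayed quiet, -1 on allocation failure. */
static int warn_fallback_once(struct warned_set *ws, const char *path, FILE *out)
{
    for (size_t i = 0; i < ws->len; i++)
        if (strcmp(ws->paths[i], path) == 0)
            return 0;

    if (ws->len == ws->cap) {
        size_t ncap = ws->cap ? 2 * ws->cap : 8;
        char **np = realloc(ws->paths, ncap * sizeof *np);
        if (!np)
            return -1;
        ws->paths = np;
        ws->cap = ncap;
    }
    char *copy = strdup(path);
    if (!copy)
        return -1;
    ws->paths[ws->len++] = copy;
    fprintf(out, "Using %s from outside of its mount namespace.\n", path);
    return 1;
}

static void warned_set_free(struct warned_set *ws)
{
    for (size_t i = 0; i < ws->len; i++)
        free(ws->paths[i]);
    free(ws->paths);
    ws->paths = NULL;
    ws->len = ws->cap = 0;
}
```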

- Arnaldo
 
> * https://elixir.free-electrons.com/linux/v5.4.1/source/tools/perf/util/dso.c#L520



* Re: perf not picking up symbols for namespaced processes
  2019-12-05 12:33 ` Arnaldo Carvalho de Melo
@ 2019-12-06  2:17   ` Ivan Babrou
  2020-02-04 15:09     ` Marek Majkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Ivan Babrou @ 2019-12-06  2:17 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, kernel-team, Jiri Olsa, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Namhyung Kim, sashal,
	Kenton Varda

I'm not very good at this, but the following works for me. If you think
this is in the general vicinity of what you expected, I can email the
patch properly.

Initially I hoped that setting dso->nsinfo->need_setns to false in
open_dso would do the trick, but it did not work.

$ cat 0001-perf-fallback-to-opening-dso-from-outside-of-mount-n.patch | sed 's/\t/        /g'
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Ivan Babrou <ivan@cloudflare.com>
Date: Thu, 5 Dec 2019 16:27:48 -0800
Subject: [PATCH] perf: fallback to opening dso from outside of mount namespace

Some tasks enter a mount namespace for isolation, and this fallback
allows perf to read symbols from binaries that live outside of the
mount namespace of the running task.

Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
---
 tools/perf/util/dso.c    |  7 +++++++
 tools/perf/util/symbol.c | 20 +++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index e11ddf86f2b3..dac6bf42e43e 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -527,6 +527,13 @@ static int open_dso(struct dso *dso, struct machine *machine)
         fd = __open_dso(dso, machine);
         if (dso->binary_type != DSO_BINARY_TYPE__BUILD_ID_CACHE)
                 nsinfo__mountns_exit(&nsc);
+
+        if (fd < 0) {
+                fd = __open_dso(dso, machine);
+                if (fd >= 0) {
+                        pr_warning("Using debug info for %s from outside of its active mount namespace.\n", dso->long_name);
+                }
+        }

         if (fd >= 0) {
                 dso__list_add(dso);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index a8f80e427674..e85d57dfcc14 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1679,11 +1679,21 @@ int dso__load(struct dso *dso, struct map *map)
          * Read the build id if possible. This is required for
          * DSO_BINARY_TYPE__BUILDID_DEBUGINFO to work
          */
-        if (!dso->has_build_id &&
-            is_regular_file(dso->long_name)) {
-            __symbol__join_symfs(name, PATH_MAX, dso->long_name);
-            if (filename__read_build_id(name, build_id, BUILD_ID_SIZE) > 0)
-                dso__set_build_id(dso, build_id);
+        if (!dso->has_build_id) {
+            bool is_reg = is_regular_file(dso->long_name);
+            if (!is_reg) {
+                nsinfo__mountns_exit(&nsc);
+                is_reg = is_regular_file(dso->long_name);
+                if (!is_reg) {
+                    nsinfo__mountns_enter(dso->nsinfo, &nsc);
+                }
+            }
+
+            if (is_reg) {
+                __symbol__join_symfs(name, PATH_MAX, dso->long_name);
+                if (filename__read_build_id(name, build_id, BUILD_ID_SIZE) > 0)
+                    dso__set_build_id(dso, build_id);
+            }
         }

         /*
--
2.24.0



* Re: perf not picking up symbols for namespaced processes
  2019-12-06  2:17   ` Ivan Babrou
@ 2020-02-04 15:09     ` Marek Majkowski
  2020-02-04 19:26       ` Jiri Olsa
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Majkowski @ 2020-02-04 15:09 UTC (permalink / raw)
  To: Ivan Babrou, kernel-team
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	sashal, Kenton Varda

On Fri, Dec 6, 2019 at 2:17 AM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> I'm not very good at this, but the following works for me. If you think
> this is in the general vicinity of what you expected, I can email the
> patch properly.
>

Thanks for the patch, I can confirm it works. I had this problem today
when playing with gvisor. Gvisor starts up in a fresh mount namespace
and perf fails to read the symbols. Stracing perf shows:

11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
11913 setns(197, CLONE_NEWNS) = 0
11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
11913 setns(196, CLONE_NEWNS) = 0

Which of course makes no sense - the runsc-debug binary does not exist in the
empty mount namespace of the restricted runsc process.

Marek


* Re: perf not picking up symbols for namespaced processes
  2020-02-04 15:09     ` Marek Majkowski
@ 2020-02-04 19:26       ` Jiri Olsa
  2020-02-11 10:06         ` Marek Majkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2020-02-04 19:26 UTC (permalink / raw)
  To: Marek Majkowski
  Cc: Ivan Babrou, kernel-team, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	sashal, Kenton Varda

On Tue, Feb 04, 2020 at 03:09:48PM +0000, Marek Majkowski wrote:
> On Fri, Dec 6, 2019 at 2:17 AM Ivan Babrou <ivan@cloudflare.com> wrote:
> >
> > I'm not very good at this, but the following works for me. If you think
> > this is in the general vicinity of what you expected, I can email the
> > patch properly.
> >
> 
> Thanks for the patch, I can confirm it works. I had this problem today
> when playing with gvisor. Gvisor starts up in a fresh mount namespace
> and perf fails to read the symbols. Stracing perf shows:
> 
> 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> 11913 setns(197, CLONE_NEWNS) = 0
> 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> 11913 setns(196, CLONE_NEWNS) = 0
> 
> Which of course makes no sense - the runsc-debug binary does not exist in the
> empty mount namespace of the restricted runsc process.

hi,
could you guys please share more details on what you run exactly,
and perhaps that change you mentioned?

thanks,
jirka



* Re: perf not picking up symbols for namespaced processes
  2020-02-04 19:26       ` Jiri Olsa
@ 2020-02-11 10:06         ` Marek Majkowski
  2020-02-11 13:46           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Majkowski @ 2020-02-11 10:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ivan Babrou, kernel-team, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	sashal, Kenton Varda

Jirka,

On Tue, Feb 4, 2020 at 7:27 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> > 11913 setns(197, CLONE_NEWNS) = 0
> > 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> > 11913 setns(196, CLONE_NEWNS) = 0
>
> hi,
> could you guys please share more details on what you run exactly,
> and perhaps that change you mentioned?

I was debugging gvisor (runsc), which does execve(/proc/self/exe) and
then modifies its mount namespace. As a result, the running binary
doesn't exist in its own mount namespace. This confuses perf, which
fails to load symbols for that process.

To my understanding, by default perf looks for the binary in the
process's mount namespace. In this case the binary clearly wasn't
there. Ivan wrote a rough patch [1], which I have just confirmed works.
After failing to load the binary from the pid's mount namespace, the
patch attempts to load it from the default mount namespace (the one
perf is running in).

[1] https://lkml.org/lkml/2019/12/5/878

Marek


* Re: perf not picking up symbols for namespaced processes
  2020-02-11 10:06         ` Marek Majkowski
@ 2020-02-11 13:46           ` Arnaldo Carvalho de Melo
  2020-02-11 13:54             ` Marek Majkowski
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-02-11 13:46 UTC (permalink / raw)
  To: Marek Majkowski
  Cc: Jiri Olsa, Ivan Babrou, kernel-team, Arnaldo Carvalho de Melo,
	linux-kernel, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, sashal, Kenton Varda

On Tue, Feb 11, 2020 at 10:06:35AM +0000, Marek Majkowski wrote:
> Jirka,
> 
> On Tue, Feb 4, 2020 at 7:27 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> > > 11913 setns(197, CLONE_NEWNS) = 0
> > > 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> > > 11913 setns(196, CLONE_NEWNS) = 0
> >
> > hi,
> > could you guys please share more details on what you run exactly,
> > and perhaps that change you mentioned?
> 
> I was debugging gvisor (runsc), which does execve(/proc/self/exe) and
> then modifies its mount namespace. As a result, the running binary
> doesn't exist in its own mount namespace. This confuses perf, which
> fails to load symbols for that process.
> 
> To my understanding, by default perf looks for the binary in the
> process's mount namespace. In this case the binary clearly wasn't
> there. Ivan wrote a rough patch [1], which I have just confirmed works.
> After failing to load the binary from the pid's mount namespace, the
> patch attempts to load it from the default mount namespace (the one
> perf is running in).
> 
> [1] https://lkml.org/lkml/2019/12/5/878

That is a fallback that works in this specific case and, with a warning
or some explicitly specified option, makes perf work with this specific
use case. But please, either a warning ("fallback to root namespace
binary /foo/bar") or the explicit option. Is that what that patch does?

- Arnaldo


* Re: perf not picking up symbols for namespaced processes
  2020-02-11 13:46           ` Arnaldo Carvalho de Melo
@ 2020-02-11 13:54             ` Marek Majkowski
  2020-02-11 14:28               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: Marek Majkowski @ 2020-02-11 13:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Ivan Babrou, kernel-team, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	sashal, Kenton Varda

On Tue, Feb 11, 2020 at 1:46 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> On Tue, Feb 11, 2020 at 10:06:35AM +0000, Marek Majkowski wrote:
> > Jirka,
> >
> > On Tue, Feb 4, 2020 at 7:27 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> > > > 11913 setns(197, CLONE_NEWNS) = 0
> > > > 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> > > > 11913 setns(196, CLONE_NEWNS) = 0
> > >
> > > hi,
> > > could you guys please share more details on what you run exactly,
> > > and perhaps that change you mentioned?
> >
> > I was debugging gvisor (runsc), which does execve(/proc/self/exe) and
> > then modifies its mount namespace. As a result, the running binary
> > doesn't exist in its own mount namespace. This confuses perf, which
> > fails to load symbols for that process.
> >
> > To my understanding, by default perf looks for the binary in the
> > process's mount namespace. In this case the binary clearly wasn't
> > there. Ivan wrote a rough patch [1], which I have just confirmed works.
> > After failing to load the binary from the pid's mount namespace, the
> > patch attempts to load it from the default mount namespace (the one
> > perf is running in).
> >
> > [1] https://lkml.org/lkml/2019/12/5/878
>
> That is a fallback that works in this specific case and, with a warning
> or some explicitly specified option, makes perf work with this specific
> use case. But please, either a warning ("fallback to root namespace
> binary /foo/bar") or the explicit option. Is that what that patch does?

You got it right: it's a custom patch that does something custom (look
up in the top-level mount ns), and only as a fallback on failure. I'm
not sure how to make it more generic.

Furthermore, there is one more use case this patch doesn't support:
a situation where the binary is reachable in some mount namespace, but
not under a sensible path. This can happen when we launch a command
under gvisor. The gvisor sandbox runs in an empty mount namespace, and
the binary is delivered over 9p by the gvisor-gofer process, from a
potentially arbitrary path. In that scenario we have three mount
namespaces: the empty one running the process, another one with access
to the binary, and the host one.

I have two ideas for solving symbol discovery here:
 (a) give perf an explicit link to the binary (potentially including
the mount namespace pid)
 (b) supply perf with a /tmp/perf-<pid>.map file with symbols, extracted
via some external helper.

I tried (b) but failed; I'm not sure how to produce perf-<pid>.map from
a proper binary using basic tools like readelf.
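For idea (b), a perf-<pid>.map file is plain text with one
"START SIZE symbolname" line per symbol, START and SIZE in hex without
a 0x prefix (the format perf documents for JIT-generated code). Below
is a minimal sketch that emits such lines from an unstripped ELF64
binary's .symtab. Assumptions: load_bias is the base address the
binary was loaded at (0 for a non-PIE executable), which the caller
would derive from /proc/<pid>/maps (not shown); stripped binaries have
no .symtab and yield no lines; write_perf_map() is a hypothetical
name. Because perf documents this file for JIT code, it may only
consult it for anonymous mappings, so whether it helps for a
file-backed binary is not settled here.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a heap buffer; returns NULL on error. */
static unsigned char *slurp(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long n = 0;

    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (n = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)n)) != NULL &&
        fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);              /* short read: give up */
        buf = NULL;
    }
    fclose(f);
    if (buf)
        *len = (size_t)n;
    return buf;
}

/* Emit "START SIZE name" (hex, no 0x) for each defined function symbol in
 * every SHT_SYMTAB section, shifted by load_bias. Returns count or -1. */
static long emit_perf_map(const unsigned char *img, size_t len,
                          unsigned long load_bias, FILE *out)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    long count = 0;

    if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64 ||
        eh->e_shentsize != sizeof(Elf64_Shdr) || eh->e_shoff > len ||
        eh->e_shnum > (len - eh->e_shoff) / sizeof(Elf64_Shdr))
        return -1;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB || sh[i].sh_link >= eh->e_shnum)
            continue;
        const Elf64_Shdr *str = &sh[sh[i].sh_link];   /* linked string table */
        if (sh[i].sh_offset > len || sh[i].sh_size > len - sh[i].sh_offset ||
            str->sh_offset > len || str->sh_size > len - str->sh_offset)
            continue;
        const Elf64_Sym *sym = (const Elf64_Sym *)(img + sh[i].sh_offset);
        const char *names = (const char *)(img + str->sh_offset);
        size_t nsym = sh[i].sh_size / sizeof(Elf64_Sym);
        for (size_t j = 0; j < nsym; j++) {
            if (ELF64_ST_TYPE(sym[j].st_info) != STT_FUNC ||
                sym[j].st_shndx == SHN_UNDEF || sym[j].st_size == 0 ||
                sym[j].st_name >= str->sh_size)
                continue;
            fprintf(out, "%lx %lx %s\n",
                    load_bias + (unsigned long)sym[j].st_value,
                    (unsigned long)sym[j].st_size, names + sym[j].st_name);
            count++;
        }
    }
    return count;
}

/* Hypothetical entry point: write perf-<pid>.map style lines for `path`. */
static long write_perf_map(const char *path, unsigned long load_bias, FILE *out)
{
    size_t len = 0;
    unsigned char *img = slurp(path, &len);

    if (!img)
        return -1;
    long count = emit_perf_map(img, len, load_bias, out);
    free(img);
    return count;
}
```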


* Re: perf not picking up symbols for namespaced processes
  2020-02-11 13:54             ` Marek Majkowski
@ 2020-02-11 14:28               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-02-11 14:28 UTC (permalink / raw)
  To: Marek Majkowski
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ivan Babrou, kernel-team,
	linux-kernel, Peter Zijlstra, Ingo Molnar, Alexander Shishkin,
	Namhyung Kim, sashal, Kenton Varda

On Tue, Feb 11, 2020 at 01:54:33PM +0000, Marek Majkowski wrote:
> On Tue, Feb 11, 2020 at 1:46 PM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > On Tue, Feb 11, 2020 at 10:06:35AM +0000, Marek Majkowski wrote:
> > > On Tue, Feb 4, 2020 at 7:27 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > 11913 openat(AT_FDCWD, "/proc/9512/ns/mnt", O_RDONLY) = 197
> > > > > 11913 setns(197, CLONE_NEWNS) = 0
> > > > > 11913 stat("/home/marek/bin/runsc-debug", 0x7fffffff8480) = -1 ENOENT (No such file or directory)
> > > > > 11913 setns(196, CLONE_NEWNS) = 0

> > > > could you guys please share more details on what you run exactly,
> > > > and perhaps that change you mentioned?

> > > I was debugging gvisor (runsc), which does execve(/proc/self/exe) and
> > > then modifies its mount namespace. As a result, the running binary
> > > doesn't exist in its own mount namespace. This confuses perf, which
> > > fails to load symbols for that process.

> > > To my understanding, by default perf looks for the binary in the
> > > process's mount namespace. In this case the binary clearly wasn't
> > > there. Ivan wrote a rough patch [1], which I have just confirmed works.
> > > After failing to load the binary from the pid's mount namespace, the
> > > patch attempts to load it from the default mount namespace (the one
> > > perf is running in).

> > > [1] https://lkml.org/lkml/2019/12/5/878

> > That is a fallback that works in this specific case and, with a warning
> > or some explicitly specified option, makes perf work with this specific
> > use case. But please, either a warning ("fallback to root namespace
> > binary /foo/bar") or the explicit option. Is that what that patch does?

> You got it right: it's a custom patch that does something custom (look
> up in the top-level mount ns), and only as a fallback on failure. I'm
> not sure how to make it more generic.

We have buildids in binaries:

[acme@quaco ~]$ file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=0cb50a07a621d02a0d2c7efec6743fddec845bfb, stripped
[acme@quaco ~]$ file /lib64/libc-2.29.so
/lib64/libc-2.29.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7ddecbbf9f22ec76c9e4a256fd1c06004a1907ce, for GNU/Linux 3.2.0, not stripped, too many notes (256)
[acme@quaco ~]$

We need to get this somehow from a given executable map, this comes and
goes in situations like this :-\

I.e. this info is in an ELF section:

[acme@quaco ~]$ readelf -SW /bin/bash | grep build-id
  [ 4] .note.gnu.build-id NOTE            0000000000000340 000340 000024 00   A  0   0  4
[acme@quaco ~]$
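That section is a SHT_NOTE whose entries are an Elf64_Nhdr followed by
the owner name ("GNU") and the descriptor, each padded to 4 bytes (the
alignment readelf shows in the last column above), with type
NT_GNU_BUILD_ID. A minimal sketch of extracting the build-id, assuming
the caller already has the raw section bytes (for example via the
offset and size readelf -SW prints). The function names are
hypothetical; perf's own reader is filename__read_build_id(), which
Ivan's patch calls.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Round a note field length up to the 4-byte alignment GNU notes use. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3) & ~(size_t)3)

/* Scan the raw bytes of a SHT_NOTE section and copy the first GNU
 * build-id descriptor into `out`. Returns its length, or -1. */
static long find_gnu_build_id(const unsigned char *notes, size_t len,
                              unsigned char *out, size_t outlen)
{
    size_t off = 0;

    while (off + sizeof(Elf64_Nhdr) <= len) {
        Elf64_Nhdr nh;
        memcpy(&nh, notes + off, sizeof nh);    /* avoids unaligned reads */
        size_t name = off + sizeof nh;
        size_t desc = name + NOTE_ALIGN(nh.n_namesz);
        if (desc > len || nh.n_descsz > len - desc)
            return -1;                          /* truncated note */
        if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
            memcmp(notes + name, "GNU", 4) == 0) {
            if (nh.n_descsz > outlen)
                return -1;
            memcpy(out, notes + desc, nh.n_descsz);
            return (long)nh.n_descsz;
        }
        off = desc + NOTE_ALIGN(nh.n_descsz);   /* next note entry */
    }
    return -1;
}

/* Hex-encode a build-id the way `file` and `perf buildid-list` print it;
 * `hex` must hold 2 * n + 1 bytes. */
static void build_id_to_hex(const unsigned char *id, size_t n, char *hex)
{
    for (size_t i = 0; i < n; i++)
        sprintf(hex + 2 * i, "%02x", id[i]);
    hex[2 * n] = '\0';
}
```

For a real SHA-1 build-id the descriptor is 20 bytes, and its hex
encoding is the BuildID[sha1] value that file prints.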

Somebody needs to associate that with the executable mmap at load time,
so that perf gets it via PERF_RECORD_MMAP3 instead of having to try,
optimistically, to get it from the binary (which may not be there when
we try to read it, or may be in some place like you describe in this
message, or...) when generating its build-id perf.data header section:

[acme@seventh ~]$ perf record stress-ng --cpu 1 --timeout 1s
stress-ng: info:  [17622] dispatching hogs: 1 cpu
stress-ng: info:  [17622] successful run completed in 1.02s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.159 MB perf.data (4105 samples) ]
[acme@seventh ~]$ perf buildid-list
e9e69be73f7c5a4cee110ced52409371e95fe2a8 [kernel.kallsyms]
7133e5dbdfae821a9bbe4ba5467e49f6cf166e1d /usr/bin/stress-ng
bd5e36f101b175755c7943105390078dff596657 /usr/lib64/ld-2.29.so
1e292b30223c69eff986710c62eda48c561d43ca [vdso]
b8d7438178da8f84d89869addf6b5e36d356c555 /usr/lib64/libm-2.29.so
7ddecbbf9f22ec76c9e4a256fd1c06004a1907ce /usr/lib64/libc-2.29.so
[acme@seventh ~]$ file /usr/bin/stress-ng
/usr/bin/stress-ng: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=7133e5dbdfae821a9bbe4ba5467e49f6cf166e1d, stripped, too many notes (256)
[acme@seventh ~]$
 
> Furthermore, there is one more use case this patch doesn't support:
> a situation where the binary is reachable in some mount namespace, but
> not under a sensible path. This can happen when we launch a command
> under gvisor. The gvisor sandbox runs in an empty mount namespace, and
> the binary is delivered over 9p by the gvisor-gofer process, from a
> potentially arbitrary path. In that scenario we have three mount
> namespaces: the empty one running the process, another one with access
> to the binary, and the host one.
 
> I have two ideas for solving symbol discovery here:
>  (a) give perf an explicit link to the binary (potentially including
> the mount namespace pid)
>  (b) supply perf with a /tmp/perf-<pid>.map file with symbols, extracted
> via some external helper.
> 
> I tried (b) but failed; I'm not sure how to produce perf-<pid>.map from
> a proper binary using basic tools like readelf.

Have you looked at:

[acme@quaco ~]$ perf report -h symfs

 Usage: perf report [<options>]

        --symfs <directory>
                          Look for files with symbols relative to this directory

[acme@quaco ~]$

?

- Arnaldo
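The --symfs option can be pictured as a path prefix: files are looked
up relative to the given directory. The sketch below assumes plain
concatenation of the symfs directory and the DSO's recorded path; the
patch earlier in the thread calls perf's __symbol__join_symfs() for
this step, but its exact slash handling is an assumption here, as are
the symfs_join() name and the example directory. For the gvisor case,
the directory would need to mirror the paths perf recorded.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: build the path a --symfs-style lookup would try.
 * Returns the length written, or -1 if it does not fit in `size`. */
static int symfs_join(char *buf, size_t size, const char *symfs,
                      const char *dso_path)
{
    size_t n = strlen(symfs);

    /* Drop one trailing '/' so "/srv/root/" and "/srv/root" behave alike. */
    if (n > 0 && symfs[n - 1] == '/')
        n--;
    int w = snprintf(buf, size, "%.*s%s", (int)n, symfs, dso_path);
    return (w < 0 || (size_t)w >= size) ? -1 : w;
}
```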


Thread overview: 9+ messages
2019-12-05  3:46 perf not picking up symbols for namespaced processes Ivan Babrou
2019-12-05 12:33 ` Arnaldo Carvalho de Melo
2019-12-06  2:17   ` Ivan Babrou
2020-02-04 15:09     ` Marek Majkowski
2020-02-04 19:26       ` Jiri Olsa
2020-02-11 10:06         ` Marek Majkowski
2020-02-11 13:46           ` Arnaldo Carvalho de Melo
2020-02-11 13:54             ` Marek Majkowski
2020-02-11 14:28               ` Arnaldo Carvalho de Melo
