Re: [lttng-dev] Userspace tracing in docker containers - Eqbal via lttng-dev

From: Eqbal via lttng-dev <lttng-dev@lists.lttng.org>
To: Jonathan Rajotte-Julien <jonathan.rajotte-julien@efficios.com>
Cc: lttng-dev@lists.lttng.org
Subject: Re: [lttng-dev] Userspace tracing in docker containers
Date: Mon, 3 May 2021 16:45:07 -0700
Message-ID: <CAPj=WkhE+DB3_mvtKeRPBPyZ2RHztoM5P2ZKYk8rhTzLQekwZg@mail.gmail.com>
In-Reply-To: <20210406140753.GE79283@joraj-alpa>

Thanks for the responses. The reasoning makes sense. We have decided to run
lttng-sessiond on the host. Our trace-generating application will run in a
container, as will our libbabeltrace-based trace consumer application, which
reads the traces through live sessions.

On Tue, Apr 6, 2021 at 7:07 AM Jonathan Rajotte-Julien <
jonathan.rajotte-julien@efficios.com> wrote:

> Hi,
>
> On Mon, Apr 05, 2021 at 11:09:39AM -0700, Eqbal via lttng-dev wrote:
> > Hi,
> >
> > I am trying to get user space tracing working for an application running
> > in a docker container. I am running lttng session daemon in another
> > container. I mounted the unix socket locations (either /var/run/lttng for
> > root or $HOME/.lttng for another user). By doing that I can run commands
> > like lttng create or lttng list <session-name>, but the tracepoint events
> > from the application don't get registered and there is no trace output.
> >
> > I enabled LTTNG_UST_DEBUG and ran lttng-sessiond in verbose mode (-vvv and
> > --verbose-consumer) and got the following error message:
> >
> > "*Unix socket credential pid=0. Refusing application in distinct,
> > non-nested pid namespace.*"
> >
> > It appears that for some calls to the session daemon there is a getsockopt
> > syscall made with *SO_PEERCRED* which returns 0 for pid, and the call then
> > fails with the *LTTNG_UST_ERR_PEERCRED_PID* error (see the get_cred call in
> > ustctl.c).
> >
> > If I comment out the getsockopt call, my application tracing starts to
> > work.
> >
> > From what I found, docker cannot support getsockopt/SO_PEERCRED call to get
> > peer pid on the unix socket which would make sense as it's in a separate
> > namespace.
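
For reference, the credential lookup discussed above can be reproduced
outside of LTTng. The following is a minimal Python sketch (Linux only), not
the lttng-ust code: `peer_credentials` and `accept_registration` are names
made up for illustration, and the refusal rule only mirrors the quoted error
message. On a socketpair the kernel reports the credentials of the process
that created the pair, so the pid shown is the caller's own. When the peer's
pid cannot be expressed in the reader's pid namespace, the kernel reports 0
instead.

```python
import os
import socket
import struct

# Layout of struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected Unix socket."""
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


def accept_registration(peer_pid):
    """Hypothetical policy: refuse a peer whose pid the kernel reports as 0."""
    return peer_pid != 0


# A socketpair stands in for the application <-> session daemon connection.
daemon_end, app_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(daemon_end)
print("peer pid=%d uid=%d gid=%d accepted=%s"
      % (pid, uid, gid, accept_registration(pid)))
daemon_end.close()
app_end.close()
```

Because the values come from the kernel rather than from the peer, the
application cannot choose them, which is the property the session daemon
relies on.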
> >
> > I have a few questions on this:
> > 1. What is the reason for the get_cred/getsockopt call with SO_PEERCRED? I
> > would like to understand why it's required for some and not other calls.
>
>
> More information is found in the introducing commit:
>
>   commit a834901f2890deadb815d7f9e3ab79c3ba673994
>   Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>   Date:   Mon Oct 12 16:52:03 2020 -0400
>
>     Fix: Use unix socket peercred for pid, uid, gid credentials
>
>     Currently, the session daemon trust the pid, ppid, uid, and gid values
>     passed by the application, but should really validate the uid using unix
>     socket peercred. This fix uses the peercred values rather than the
>     values provided by the application on registration for:
>
>     - pid, uid and gid on Linux,
>     - uid and gid on FreeBSD.
>
>     This should improve how the session daemon deals with containerized
>     applications on Linux as well. Applications are required to be either in
>     the same pid namespace, or in a pid namespace nested within the pid
>     namespace of the lttng-sessiond, so the session daemon can map the
>     application pid to something meaningful within its own pid namespace.
>     Applications in a unrelated (disjoint) pid namespace will be refused by
>     the session daemon.
>
>     About the uid and gid with user namespaces on Linux, those will provide
>     meaningful IDs if the application user namespace is either the same as
>     the user namespace of the session daemon, or a nested user namespace.
>     Otherwise, the IDs will be that of /proc/sys/kernel/overflowuid and
>     /proc/sys/kernel/overflowgid, which typically maps to nobody.nogroup on
>     current distributions.
>
>     Given that fetching the parent pid (ppid) of the application would
>     require to use /proc/<pid>/status (which is racy wrt pid reuse), expose
>     the ppid provided by the application on registration instead, but only
>     in situations where the application sits in the same pid namespace as
>     the session daemon (on Linux), which is detected by checking if the pid
>     provided by the application matches the pid obtained using unix socket
>     credentials. The ppid is only used for logging/debugging purposes in the
>     session daemon anyway, so it is OK to use the value provided by the
>     application for that purpose.
>
>     Fixes: #1286
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>     Change-Id: I94742e57dad642106908d09e2c7e395993c2c48f
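
To make the rule in this commit message concrete, here is a small Python
sketch of the decision it describes. This is my reading of the text, not the
actual session daemon code: `AppIdentity`, `RegistrationRefused` and
`resolve_identity` are hypothetical names, and using `None` for a withheld
ppid is an assumption made for the example.

```python
from collections import namedtuple

# Hypothetical container for the four values discussed in the commit message.
AppIdentity = namedtuple("AppIdentity", "pid ppid uid gid")


class RegistrationRefused(Exception):
    """Raised when the peer sits in a disjoint pid namespace."""


def resolve_identity(declared, cred_pid, cred_uid, cred_gid):
    """Combine application-declared values with unix socket credentials.

    `declared` is the AppIdentity sent by the application on registration;
    the cred_* values are the ones obtained from the socket.
    """
    if cred_pid == 0:
        # The kernel could not map the peer pid into our pid namespace.
        raise RegistrationRefused("peer pid not visible in this pid namespace")
    # Matching pids are taken as the sign of a shared pid namespace, the only
    # case in which the declared ppid is exposed.
    same_pid_ns = declared.pid == cred_pid
    ppid = declared.ppid if same_pid_ns else None  # None: assumption
    # pid, uid and gid always come from the socket, never from the application.
    return AppIdentity(cred_pid, ppid, cred_uid, cred_gid)


claimed = AppIdentity(pid=4242, ppid=1, uid=0, gid=0)  # application claims root
print(resolve_identity(claimed, 4242, 1000, 1000))
```

In this example the application's claim of uid 0 is ignored in favour of the
uid taken from the socket, which is the point of the fix.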
>
> As for "why it's required for some and not other calls", there is a
> difference between communicating with the lttng-sessiond daemon (using the
> lttng CLI) and a userspace application registering. They are essentially two
> distinct communication interfaces. Now, to be honest, I'm not certain of the
> complete "security" policy for the lttng-sessiond <-> CLI interface and
> whether we should be more strict or not.
>
> > 2. Is there any workaround for this problem, so that I can get this to work
> > with the container topology I am working with (app in one container and
> > lttng daemons in another)?
>
> Based on the commit message, lttng-ust explicitly cannot be used across
> non-nested pid namespaces.
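
A quick way to see which pid namespace a process lives in is to read its
`/proc/<pid>/ns/pid` link. The Python sketch below is only a diagnostic aid
under that assumption (Linux, with /proc mounted); the helper names are made
up for illustration. Equal identifiers mean the two processes share a pid
namespace. Different identifiers do not tell a nested namespace from a
disjoint one, so this cannot replace the check done by the session daemon.

```python
import os


def pid_namespace_id(pid="self"):
    """Return the pid namespace identifier of a process, e.g. 'pid:[...]'."""
    return os.readlink("/proc/%s/ns/pid" % pid)


def same_pid_namespace(pid_a, pid_b):
    """True when both processes report the same pid namespace identifier."""
    return pid_namespace_id(pid_a) == pid_namespace_id(pid_b)


print(pid_namespace_id())
```

Reading the link of another user's process may require extra privileges, so
run it as root when comparing the application with lttng-sessiond.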
>
> Could you give us more information on the goal for the topology you plan to
> use? This could lead to further discussion and/or alternative solutions
> based on the goal and constraints of your deployment.
>
> > 3. Related to 2, are there any gotchas to bypassing the getsockopt call in
> > get_cred?
>
> Based on the content of the mentioned bug (1286) [1], the principal concern
> is:
>
> "
> This means a non-root application could theoretically impersonate a root
> application from a tracing perspective, and thus access root tracing buffers
> in a per-uid configuration, which is unwanted.
> "
>
> [1] https://bugs.lttng.org/issues/1286
>
> Cheers
>
> --
> Jonathan Rajotte-Julien
> EfficiOS
>

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev