linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: rostedt@goodmis.org, mhiramat@kernel.org,
	linux-trace-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [PATCH v8 00/12] user_events: Enable user processes to create and write to trace events
Date: Mon, 18 Apr 2022 17:25:49 -0700	[thread overview]
Message-ID: <20220419002549.GA2055@kbox> (raw)
In-Reply-To: <Yl3NcRRIJMrd0WZu@laniakea>

On Mon, Apr 18, 2022 at 10:43:29PM +0200, Hagen Paul Pfeifer wrote:
> * Beau Belgrave | 2021-12-16 09:34:59 [-0800]:
> 
> >The typical scenario is on process start to mmap user_events_status. Processes
> >then register the events they plan to use via the REG ioctl. The ioctl reads
> >and updates the passed in user_reg struct. The status_index of the struct is
> >used to know the byte in the status page to check for that event. The
> >write_index of the struct is used to describe that event when writing out to
> >the fd that was used for the ioctl call. The data must always include this
> >index first when writing out data for an event. Data can be written either by
> >write() or by writev().
> 
> Hey Beau, a little bit late to the party. A few questions from my side: What
> are the exact weak points of USDT compared to User Events that stand in the
> way of further extend USDT (in a non-compatible way, sure, just as an
> different approach!)? The nice thing about USDT is that I can search for all
> possible probes of the system via "find / | readelf | ". Since they are listed
> in a dedicated ELF section (.note.stapsdt) - they are visible & transparent. I
> can also map a hierarchy/structure in Executable/DSO via clever choice of
> names. The big disadvantage of USDT is the lack of type information, but from
> a registration, explicit point of view, they are nice.
> 
> Or in other words: why not extends the USDT approach? Why not
> 
> u32 val = 23;
> const char *garbage = "tracestring";
> 
> DYNAMIC_TRACE_PROBE2("foo:bar", val, u32, garbage, cstring);
> 

We actually tried some USDT extension methods early on, by extending the
.note.stapsdt sections and seeing how far we could get our definitions
into that form.

There are a few problems when running in a highly container/CGROUP
environment even if you can get our formats into stapsdt.

It costs a lot to transverse every ELF file on the machine to find all
the notes. When profiling or tracing many containers, each cgroup's
mount space must be entered and then tracked. Since these files are in
different locations, they each need a separate probe definition, since
the definitions/patches are tied to the location of the binary to patch.

As new cgroups come online, we would have to keep track of each new
binary location and find probes that match their location. This becomes
really hard to manage if for example we just want to always enable a
specific event regardless of where it is on the filesystem. Events are
limited to a max of 2^16 having many duplicate events in the system
might start to approach that limit for high-core machines with many
small cgroup isolations.

We run programs that are built on interpreted or JIT'd code (C#,
javascript, etc.). These don't have great places to put a stap
definition, since they aren't ELF files. I've seen approaches where
temporary ELF files are generated, however, this costs a lot. Now we
have even more temporarily files to go patch, meaning more events and
more probe definitions (many of them in our case would be duplicates of
the others).

In production environments we have them locked down heavily with both
SELINUX and IPE enabled. This prevents us from patching user mode code
on the fly, the typical perf probe calls fail here.

We typically want to know what events are available to us with very
little overhead. Having programs register to a well known location
already (trace_events, tracefs) I can easily see all the user events on
the system by just doing ls on /sys/kernel/tracing/events/user_events. I
can also see all their data formats and easily enable hist and filtering
since these formats are known to the kernel.

In our testing uprobes are much more costly to the running program than
the write syscall.

For managed code, as in java, code is moving around and are not always
in static locations. The probe locations can change, etc. Calling from
a managed location into a native one has performance implications as
well when using a dynamic/temp elf stub approach.

We are actively using user_events to solve these problems in our
environments that have previously seen high overheads to achieve the
same results. Many times we cannot afford to miss any events, so live
scanning for new ELF files doesn't work for us as the programs and
cgroups are short lived.

> 
> Sure, the argument names, here "val" and "garbage" should also be saved. I
> also like the "just one additional header to the project to get things
> running" (#include "sdt.h"). Sure, a DYNAMIC_TRACE_IS_ACTIVE("foo:bar") would
> be great. But in fact we have never needed that in the past.
> 
> 
> hgn

Thanks,
-Beau

      reply	other threads:[~2022-04-19  0:26 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-16 17:34 [PATCH v8 00/12] user_events: Enable user processes to create and write to trace events Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 01/12] user_events: Add minimal support for trace_event into ftrace Beau Belgrave
2021-12-21 15:16   ` Masami Hiramatsu
2022-01-03 18:22     ` Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 02/12] user_events: Add print_fmt generation support for basic types Beau Belgrave
2021-12-22  0:30   ` Masami Hiramatsu
2022-01-03 18:56     ` Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 03/12] user_events: Handle matching arguments from dyn_events Beau Belgrave
2021-12-22  6:19   ` Masami Hiramatsu
2021-12-16 17:35 ` [PATCH v8 04/12] user_events: Add basic perf and eBPF support Beau Belgrave
2021-12-22  7:55   ` Masami Hiramatsu
2021-12-16 17:35 ` [PATCH v8 05/12] user_events: Add self-test for ftrace integration Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 06/12] user_events: Add self-test for dynamic_events integration Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 07/12] user_events: Add self-test for perf_event integration Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 08/12] user_events: Optimize writing events by only copying data once Beau Belgrave
2021-12-22 15:11   ` Masami Hiramatsu
2022-01-03 18:58     ` Beau Belgrave
2022-01-06 22:17   ` Steven Rostedt
2022-01-06 23:05     ` Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 09/12] user_events: Add documentation file Beau Belgrave
2021-12-22 14:18   ` Masami Hiramatsu
2022-01-03 23:01     ` Beau Belgrave
2022-01-06 21:14       ` Steven Rostedt
2021-12-16 17:35 ` [PATCH v8 10/12] user_events: Add sample code for typical usage Beau Belgrave
2021-12-22 23:18   ` Masami Hiramatsu
2022-01-06 22:09     ` Steven Rostedt
2022-01-06 23:06       ` Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 11/12] user_events: Validate user payloads for size and null termination Beau Belgrave
2021-12-23  0:08   ` Masami Hiramatsu
2022-01-03 18:53     ` Beau Belgrave
2022-01-06 23:32       ` Masami Hiramatsu
2022-01-07  1:01         ` Beau Belgrave
2021-12-16 17:35 ` [PATCH v8 12/12] user_events: Add self-test for validator boundaries Beau Belgrave
2022-04-18 20:43 ` [PATCH v8 00/12] user_events: Enable user processes to create and write to trace events Hagen Paul Pfeifer
2022-04-19  0:25   ` Beau Belgrave [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220419002549.GA2055@kbox \
    --to=beaub@linux.microsoft.com \
    --cc=hagen@jauu.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-devel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).