Linux-Trace-Devel Archive on
From: Steven Rostedt <>
To: Primiano Tucci <>
Cc: Joel Fernandes <>,
	Andrew Morton <>,
	Carmen Jackson <>, Daniel Colascione <>,
	Mayank Gupta <>, Tim Murray <>,
	Mathieu Desnoyers <>,
	Linux Trace Devel <>
Subject: Re: [patch 026/158] mm: emit tracepoint when RSS changes
Date: Tue, 3 Dec 2019 09:53:38 -0500
Message-ID: <20191203095338.7974a03c@gandalf.local.home>
In-Reply-To: <>

On Tue, 3 Dec 2019 02:48:37 +0000
Primiano Tucci <> wrote:

> On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <> wrote:
> > On Mon, 2 Dec 2019 18:45:14 -0500
> > Joel Fernandes <> wrote:
> > >
> > > I would love for that to happen but I don't develop Perfetto much. If I am
> > > writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> > > lead developer Primiano -- I believe you have already met Primiano at a
> > > conference before as he mentioned it to me that you guys met. I also believe
> > > this topic of using a common library was discussed before, but something
> > > about licensing came up.
> Oh hello again!
> > libtraceevent is under LGPL, is that an issue?  
> Unfortunately yes, it is :/
> Our process for incorporating GPL or LGPL code makes it problematic to use
> such code in Perfetto [1] (which is Apache-2 licensed), both for us and,
> recursively, for other projects that depend on us.

The reason for the LGPL was so that this code could be used by non-GPL
code. Too bad the process is what limits this.

> For context, Perfetto is a cross-platform tracing project based on shmem and
> protobuf, shipped on production devices and used by other app-developer-facing
> tools (e.g. [6, 7]). It deals with both:
> (1) pure userspace-to-userspace tracing (on all major OSs).
> (2) kernel tracing via ftrace/tracefs (only on Linux/Android).
> explains it a bit more.

> From my viewpoint, it would be great if Linux treated the individual fields of
> ftrace events as an ABI, given that ftrace events are exposed in their binary
> form to userspace through tracefs (which is something I'm extremely grateful
> for).

If that were the case, trace events would not even exist in the
kernel. The reason new trace events do not appear in the scheduler, and
the reason the VFS layer refuses to add *any* trace events, is that they
are already a partial ABI. Making them a full ABI would tightly couple the
implementation of Linux to the exported trace events. The reason I
created libtraceevent was to make it easier to modify trace events
without people worrying too much about the parsing of the events.
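For illustration, this is roughly what parsing an event looks like at the
lowest level: because every event's layout is published in its tracefs format
file, a reader can locate a field using the offset, size, and signedness listed
there instead of hard-coding them, which leaves the kernel free to move fields
around. This is a minimal C sketch, not libtraceevent's API; struct field_desc
and read_int_field() are hypothetical names, and it assumes the record was
captured on a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor, filled in from a tracefs format-file line such as
 * "field:pid_t pid;  offset:8;  size:4;  signed:1;". */
struct field_desc {
    unsigned offset;
    unsigned size;      /* 1, 2, 4 or 8 bytes */
    int is_signed;
};

/* Decodes an integer field from a raw event record using the layout the
 * kernel publishes, rather than a hard-coded offset. Assumes the record was
 * written by a little-endian machine. Returns 0 on success, -1 on error. */
static int read_int_field(const unsigned char *rec, size_t rec_len,
                          const struct field_desc *f, int64_t *out)
{
    if (f->size == 0 || f->size > 8 || f->offset > rec_len ||
        f->size > rec_len - f->offset)
        return -1;

    uint64_t u = 0;
    for (unsigned i = 0; i < f->size; i++)      /* little-endian assembly */
        u |= (uint64_t)rec[f->offset + i] << (8 * i);

    unsigned bits = 8 * f->size;
    if (f->is_signed && bits < 64 && ((u >> (bits - 1)) & 1)) {
        /* Sign-extend: u < 2^bits <= 2^56, so this arithmetic cannot overflow. */
        *out = (int64_t)u - (int64_t)(UINT64_C(1) << bits);
    } else {
        memcpy(out, &u, sizeof u);  /* keep the bit pattern without a lossy cast */
    }
    return 0;
}
```

Unsigned 64-bit fields come back as their raw bit pattern, so a caller should
reinterpret those as uint64_t.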

Just look at this thread. This started because I wanted to move the
shift operation from the fast path to the slow "print" path, and the
reason for not doing so is because we have a user space dependency on

Really, the kernel developers are fearful of trace events causing
problems for future development. And what you just stated reinforces
their beliefs, so expect to see fewer trace events being exposed.
This has come up at several kernel/maintainer summits, and Linus has
finally blacklisted the topic, stating that if it breaks user space,
it gets reverted, PERIOD!

> We have been using ftrace_pipe_raw through Perfetto in Android since 2017, and
> it has been working great with the exception of a few cases, mainly enums (see
> below). I'd love it if we could solve the enum problem in a way that didn't
> involve running a C-preprocessor-like runtime on the format files, regardless
> of licensing.
> Unfortunately I don't have any docs that describe the Perfetto <> ftrace interop
> in great detail. I apologize for that. I'll fix it soon, but in the meantime
> I'll do my best to summarize this part of Perfetto here:
> Our ftrace-interop code read()s the binary per_cpu/*/trace_pipe_raw files, very
> similarly in spirit to what trace-cmd does.
> However, unlike trace-cmd, we convert the trace event stream into protobufs
> (e.g., [4]), doing binary-to-binary conversions (ftrace raw pipe -> protobuf) at
> runtime, asynchronously, on the device being traced.
> The way we deal with format files in Perfetto is twofold:
> 1) We have an archive of known format files for the most common kernels we care
>   about. From this archive we generate protobuf schema files like [4]
>   at Perfetto-compile-time (which is != compile time of the target device's
>   kernel).
>   At runtime, on-device, we read the format files from tracefs and we merge
>   the compile-time knowledge (about messages and field names) with the ABI
>   described by tracefs' format files (message IDs, field types and offsets).
>   We make the following assumptions about the ABI of raw ftrace events:
>   - We do *not* rely on message IDs being stable.
>   - We do *not* rely on field offsets being stable.
>   - We do *not* rely on the size and length of int fields being stable.
>   - We do *not* assume the presence of any field.
>   - We can detect if a field's type no longer matches (e.g. a string became
>     an int) and ignore the field in that case.
>   - We only deal with fields whose names match what is known at compile time.
>   This allows us to turn the raw ftrace into a binary-stable protobuf (modulo
>   some fields that might be missing) and allows us to play some other tricks
>   to reduce the size of the trace (e.g. intern/dedupe thread names).
> 2) We have a generic schema [5] to transcode ftrace events that we didn't know
>    about at Perfetto-compile-time. This lets us handle both ftrace events
>    introduced by future kernel versions and, as a fallback, events in 1) for
>    which we detect ABI mismatches at runtime. The downside of this generic
>    schema is that each event costs significantly more in terms of trace size.
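The runtime half of approach 1) can be sketched as two steps: parse each
"field:" line of a format file, then keep the field only if its name is in the
compile-time schema and its kind (integer vs. string) still matches. This is a
minimal C sketch under stated assumptions, not Perfetto's implementation;
struct fmt_field, struct known_field, parse_field_line(), kind_of() and
match_field() are hypothetical names, and the string/integer classification is
deliberately crude (any char array or __data_loc char type counts as a string).

```c
#include <stdio.h>
#include <string.h>

enum field_kind { KIND_INT, KIND_STRING };

/* One field as described by a tracefs format file. */
struct fmt_field {
    char type[64];
    char name[64];
    int is_array;               /* the name carried a "[N]" suffix */
    unsigned offset, size;
    int is_signed;
};

/* Hypothetical compile-time schema entry: a field name we know how to encode. */
struct known_field {
    const char *name;
    enum field_kind kind;
};

/* Parses a line such as "\tfield:pid_t pid;\toffset:8;\tsize:4;\tsigned:1;".
 * The last word of the declaration is the name (a "[N]" suffix is cut off and
 * recorded in is_array); the rest is the type. Returns 0 on success. */
static int parse_field_line(const char *line, struct fmt_field *out)
{
    const char *p = strstr(line, "field:");
    if (!p)
        return -1;
    p += strlen("field:");
    const char *semi = strchr(p, ';');
    if (!semi || (size_t)(semi - p) >= 128)
        return -1;

    char decl[128];
    memcpy(decl, p, (size_t)(semi - p));
    decl[semi - p] = '\0';

    char *sp = strrchr(decl, ' ');   /* split "TYPE NAME" at the last space */
    if (!sp)
        return -1;
    *sp = '\0';
    char *name = sp + 1;
    char *bracket = strchr(name, '[');
    out->is_array = bracket != NULL;
    if (bracket)
        *bracket = '\0';
    if (strlen(decl) >= sizeof out->type || strlen(name) >= sizeof out->name)
        return -1;
    strcpy(out->type, decl);
    strcpy(out->name, name);

    /* Whitespace in the format string also matches the tabs between attributes. */
    if (sscanf(semi + 1, " offset:%u; size:%u; signed:%d;",
               &out->offset, &out->size, &out->is_signed) != 3)
        return -1;
    return 0;
}

/* Crude classification: char arrays and __data_loc strings are strings. */
static enum field_kind kind_of(const struct fmt_field *f)
{
    if (strstr(f->type, "char") && (f->is_array || strstr(f->type, "__data_loc")))
        return KIND_STRING;
    return KIND_INT;
}

/* Returns the schema index for a runtime field, or -1 if the field is unknown
 * or its kind changed (in which case it should be ignored). */
static int match_field(const struct fmt_field *f,
                       const struct known_field *known, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(known[i].name, f->name) == 0)
            return known[i].kind == kind_of(f) ? (int)i : -1;
    }
    return -1;
}
```

A field that fails match_field() would simply be skipped; per 2) above, an
event with mismatches could instead fall back to the generic schema.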

Thanks for the detailed explanation.

Question: can the tool that reads tracefs to obtain the message IDs,
fields, and sizes have LGPL code in it? Then that tool itself could use
libtraceevent, and you could create your own event format to feed
into the other compiled tool. This could solve a lot of issues.

> There is a point where both 1 and 2 become problematic for us: enums and, more
> generally, any ftrace field that depends on macro expansions (which turns
> out to be mainly enums, in practice).
> For instance, the gfp_flags of ftrace events directly reflect the internal enum
> values, which are not stable across kernel versions. We had to come up
> with an internal map to keep up with the various kernel versions.
> There are a few other cases like gfp_flags, but they are quite rare and we ended
> up not needing those events, at least until now.
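A per-kernel-version map for a flags field like gfp_flags could be a table of
{mask, name} pairs, decoded roughly the way we understand the kernel's
__print_flags() output to be produced: entries are tried in table order, an
entry matches only if all of its bits are set, matched bits are cleared, and
any leftover bits are printed in hex. This is a minimal C sketch; struct
flag_name, append() and decode_flags() are hypothetical names, and any table
you feed it here is illustrative, not real GFP bit values.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One {mask, name} pair of a hypothetical per-kernel-version flag map. */
struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Appends s to the NUL-terminated buffer out, truncating if it is full. */
static void append(char *out, size_t out_len, const char *s)
{
    size_t len = strlen(out);
    if (len + 1 < out_len)
        snprintf(out + len, out_len - len, "%s", s);
}

/* Renders flags as "NAME|NAME|0x..." into out (out_len must be > 0).
 * Entries are tried in table order; an entry matches only if all of its bits
 * are set, and matched bits are cleared. Zero masks are skipped. Leftover
 * bits are appended in hex. */
static void decode_flags(unsigned long flags, const struct flag_name *tbl,
                         size_t n, char *out, size_t out_len)
{
    out[0] = '\0';
    for (size_t i = 0; i < n && flags; i++) {
        unsigned long m = tbl[i].mask;
        if (m == 0 || (flags & m) != m)
            continue;
        flags &= ~m;
        if (out[0])
            append(out, out_len, "|");
        append(out, out_len, tbl[i].name);
    }
    if (flags) {
        char hex[32];
        snprintf(hex, sizeof hex, "0x%lx", flags);
        if (out[0])
            append(out, out_len, "|");
        append(out, out_len, hex);
    }
}
```

Listing composite masks before their component bits lets the composite name
win; a real map would hold one such table per supported kernel version.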

You are probably dealing with older kernels, as a bunch of newer
kernels have in it:


Which will convert the enums into their actual values before exporting
them to tracefs. And when you combine this with
CONFIG_TRACE_EVAL_MAP_FILE, you get a file in tracefs called eval_map
that shows the mapping of enums to their values (although the enums
should already have been converted in the tracefs format files).
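A consumer that still meets an enum name (for example as a cross-check, or on a
kernel where a given enum was not converted) could resolve it from eval_map. This
is a minimal C sketch, assuming each eval_map line has the form
"NAME VALUE (SYSTEM)", which is how we believe the kernel prints it;
struct eval_entry, parse_eval_map_line() and lookup_eval() are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One mapping read from tracefs' eval_map file. */
struct eval_entry {
    char name[128];
    long value;
    char system[64];
};

/* Parses a line assumed to look like "NAME VALUE (SYSTEM)".
 * Returns 0 on success, -1 if the line does not have that shape. */
static int parse_eval_map_line(const char *line, struct eval_entry *e)
{
    return sscanf(line, "%127s %ld (%63[^)])",
                  e->name, &e->value, e->system) == 3 ? 0 : -1;
}

/* Looks up an enum name within one trace system. Returns 0 and stores the
 * value if found, -1 otherwise. */
static int lookup_eval(const struct eval_entry *tab, size_t n,
                       const char *system, const char *name, long *value)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].system, system) == 0 &&
            strcmp(tab[i].name, name) == 0) {
            *value = tab[i].value;
            return 0;
        }
    }
    return -1;
}
```

Keying the lookup on the system as well as the name avoids clashes when two
subsystems define enums with the same name.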

BTW, the gfp flags look to be defines, and are converted in the format

> Beyond this, ingesting the raw trace events from the ftrace raw pipes has been
> great for all other events, without requiring any other parsing library.
> Super thanks for all the hard work on developing and maintaining ftrace.
> Happy to discuss more on IRC, email or VC if you want to know more,
> Primiano.

Feel free to join us on #linux-rt on OFTC IRC if you are not already there.

-- Steve

> [1]
> [2]
> [3]
> [4]
> [5]
> [6]
> [7]
