Re: [patch 026/158] mm: emit tracepoint when RSS changes

From: Primiano Tucci <primiano@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	aneesh.kumar@linux.ibm.com,
	Carmen Jackson <carmenjackson@google.com>,
	dan.j.williams@intel.com, Daniel Colascione <dancol@google.com>,
	jglisse@redhat.com, linux-mm@kvack.org,
	Mayank Gupta <mayankgupta@google.com>,
	mhocko@suse.com, minchan@kernel.org, mm-commits@vger.kernel.org,
	rcampbell@nvidia.com, Tim Murray <timmurray@google.com>,
	torvalds@linux-foundation.org, vbabka@suse.cz,
	willy@infradead.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [patch 026/158] mm: emit tracepoint when RSS changes
Date: Tue, 3 Dec 2019 02:48:37 +0000
Message-ID: <CA+yH71er-nUc2NJsVdJjKi5tgE4JvwEEjJrbbhby4zzR05CxHw@mail.gmail.com>
In-Reply-To: <20191202185324.30b502bb@gandalf.local.home>

On Mon, Dec 2, 2019 at 11:53 PM Steven Rostedt <rostedt@goodmis.org> wrote:
On Mon, 2 Dec 2019 18:45:14 -0500
Joel Fernandes <joel@joelfernandes.org> wrote:

>
> I would love for that to happen but I don't develop Perfetto much. If I am
> writing a tool I will definitely give it a go from my side. CC'ing Perfetto's
> lead developer Primiano -- I believe you have already met Primiano at a
> conference before, as he mentioned to me that you guys met. I also believe
> this topic of using a common library was discussed before, but something
> about licensing came up.

Oh hello again!

> libtraceevent is under LGPL, is that an issue?

Unfortunately yes, it is :/
Our process for incorporating GPL or LGPL code makes it problematic for
Perfetto [1] (which is Apache-2.0 licensed), and recursively for other projects
that depend on us.

For context, Perfetto is a cross-platform tracing project based on shmem and
protobuf, shipped on production devices and used by other app-developer-facing
tools (e.g. [6, 7]). It deals with both:
(1) pure userspace-to-userspace tracing (on all major OSs).
(2) kernel tracing via ftrace/tracefs (only on Linux/Android).
https://docs.perfetto.dev/ explains it a bit more.

Today Perfetto is embedded in and used by both Chrome [2] and the Android
platform [3]. For both projects, pulling in LGPL-licensed code is cumbersome
process-wise: it would require us to put mechanisms in place to guarantee that
the relevant LGPL dependencies are never accidentally linked into any production
binary and are only used by the standalone offline tools that analyze traces.
Such a process is unfortunately very expensive to set up and maintain, both for
us and for the projects that depend on us.
I don't want to start an ideological battle about licensing. To be clear, I
don't have any issues with LGPL, nor do I think there's anything inherently
wrong with it. It just makes things too complicated when a smaller sub-project
like ours is embedded in larger projects.

Anyhow, beyond licensing, the principle of grabbing the format files on-device
and bundling them as part of the trace is also problematic for us on Android for
technical reasons (mainly interoperability with other tools that depend on
Perfetto).

From my viewpoint, it would be great if Linux treated the individual fields of
ftrace events as an ABI, given that ftrace events are exposed in their binary
form to userspace through tracefs (which is something I'm extremely grateful
for).

We have been using the per-CPU trace_pipe_raw files through Perfetto on Android
since 2017, and they have worked great with the exception of a few cases, mainly
enums (see below).
I'd love it if we could solve the enum problem in a way that didn't involve
running a C-preprocessor-like runtime on the format files, regardless of
licensing.

Unfortunately I don't have any docs that describe the Perfetto <> ftrace interop
in great detail, and I apologize for that. I'll fix it soon, but in the meantime
I'll do my best to summarize this part of Perfetto here:

Our ftrace-interop code read()s the binary per_cpu/*/trace_pipe_raw files, in a
way very similar in spirit to what trace-cmd does.
However, unlike trace-cmd, we convert the trace event stream into protobufs
(e.g., [4]), doing binary-to-binary conversions (ftrace raw pipe -> protobuf) at
runtime, asynchronously, on the device being traced.
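
To illustrate the read() side, here is a minimal C sketch (not Perfetto's
actual code) of draining one per-CPU trace_pipe_raw fd in page-sized chunks.
The callback type page_cb and the functions count_bytes_cb and drain_raw_pipe
are hypothetical names; decoding the ring-buffer pages and emitting protobufs
is deliberately left out.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback: receives one raw chunk read from trace_pipe_raw. */
typedef void (*page_cb)(const unsigned char *page, size_t len, void *ctx);

/* Example callback that only counts bytes; a real consumer would decode the
 * ring-buffer page and emit protobuf messages instead. */
void count_bytes_cb(const unsigned char *page, size_t len, void *ctx) {
  (void)page;
  *(size_t *)ctx += len;
}

/* Drains an already-open trace_pipe_raw fd using page-sized reads.
 * Returns the number of chunks delivered, or -1 on an unexpected error. */
long drain_raw_pipe(int fd, page_cb cb, void *ctx) {
  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0) page_size = 4096; /* assumption: common page size */
  unsigned char *buf = malloc((size_t)page_size);
  if (buf == NULL) return -1;
  long chunks = 0;
  for (;;) {
    ssize_t n = read(fd, buf, (size_t)page_size);
    if (n > 0) {
      cb(buf, (size_t)n, ctx);
      chunks++;
    } else if (n == 0) {
      break;                 /* nothing more to read */
    } else if (errno == EINTR) {
      continue;              /* interrupted by a signal: retry */
    } else if (errno == EAGAIN) {
      break;                 /* opened with O_NONBLOCK and buffer is empty */
    } else {
      chunks = -1;           /* genuine I/O error */
      break;
    }
  }
  free(buf);
  return chunks;
}
```

A caller would open, e.g., per_cpu/cpu0/trace_pipe_raw under the tracefs mount
point (commonly /sys/kernel/tracing, or /sys/kernel/debug/tracing on older
setups) with O_RDONLY | O_NONBLOCK and call drain_raw_pipe() periodically for
each CPU.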

The way we deal with format files in Perfetto is twofold:
1) We have an archive of known format files for the most common kernels we care
  about. From this archive we generate protobuf schema files like [4]
  at Perfetto-compile-time (which is distinct from the compile time of the
  target device's kernel).
  At runtime, on-device, we read the format files from tracefs and we merge
  the compile-time knowledge (about messages and field names) with the ABI
  described by tracefs' format files (message IDs, field types and offsets).
  We make the following assumptions about the ABI of raw ftrace events:
  - We do *not* rely on message IDs being stable.
  - We do *not* rely on field offsets being stable.
  - We do *not* rely on the size of int fields being stable.
  - We do *not* assume the presence of any field.
  - We can detect if a field's type doesn't match anymore (e.g. a string became
    an int) and ignore the field in that case.
  - We only deal with fields whose names match those known at compile time.
  This allows us to turn the raw ftrace data into a binary-stable protobuf
  (modulo some fields that might be missing) and to play some other tricks
  to reduce the size of the trace (e.g. intern/dedupe thread names).

2) We have a generic schema [5] to transcode ftrace events that we didn't know
   about at Perfetto-compile-time. This lets us handle both ftrace events
   introduced by future kernel versions and, as a fallback, events from 1) for
   which we detect ABI mismatches at runtime. The downside of this generic
   schema is that the cost of each event, in terms of trace size, is
   significantly higher.
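
The runtime merge described in 1) can be sketched in C as follows. This is a
hedged illustration under simplifying assumptions, not Perfetto's
implementation: the coarse field kinds, the kSchedSwitch table (standing in
for compile-time knowledge like [4]) and all function names are hypothetical.
It parses one field line of a format file, maps it to a compile-time field
only if both the name and the coarse kind still match, and reads an integer at
whatever offset and size the running kernel reports.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Coarse field kinds: a simplification of real ftrace type handling. */
enum field_kind { KIND_INT, KIND_STRING, KIND_UNKNOWN };

/* A field as described at runtime by a tracefs "format" file. */
struct runtime_field {
  char name[64], type[64];
  unsigned offset, size;
  bool is_signed, is_array;
};

/* Compile-time knowledge: names and expected kinds (hypothetical excerpt,
 * standing in for a generated schema such as sched.proto). */
struct known_field { const char *name; enum field_kind kind; };
static const struct known_field kSchedSwitch[] = {
    {"prev_comm", KIND_STRING}, {"prev_pid", KIND_INT}, {"next_pid", KIND_INT}};

/* Parses "field:<decl>;\toffset:N;\tsize:N;\tsigned:N;" into *out. */
bool parse_field_line(const char *line, struct runtime_field *out) {
  const char *p = strstr(line, "field:");
  if (p == NULL) return false;
  p += strlen("field:");
  const char *semi = strchr(p, ';');
  char decl[128];
  if (semi == NULL || semi == p || (size_t)(semi - p) >= sizeof decl) return false;
  memcpy(decl, p, (size_t)(semi - p));
  decl[semi - p] = '\0';
  out->is_array = false;
  char *lb = strrchr(decl, '[');
  if (lb != NULL && decl[strlen(decl) - 1] == ']') { /* e.g. "char comm[16]" */
    *lb = '\0';
    out->is_array = true;
  }
  char *sp = strrchr(decl, ' '); /* the field name is the last token */
  if (sp == NULL) return false;
  *sp = '\0';
  snprintf(out->name, sizeof out->name, "%s", sp + 1);
  snprintf(out->type, sizeof out->type, "%s", decl);
  int sgn = 0;
  if (sscanf(semi + 1, " offset:%u; size:%u; signed:%d", &out->offset,
             &out->size, &sgn) != 3)
    return false;
  out->is_signed = (sgn != 0);
  return true;
}

/* Maps a runtime type to a coarse kind so that type changes are detectable. */
enum field_kind classify(const struct runtime_field *f) {
  if (strstr(f->type, "char") && (f->is_array || strstr(f->type, "[]")))
    return KIND_STRING;
  if (f->size == 1 || f->size == 2 || f->size == 4 || f->size == 8)
    return KIND_INT;
  return KIND_UNKNOWN;
}

/* Index of the compile-time field this runtime field maps to, or -1 if the
 * name is unknown or its kind changed (e.g. a string became an int). */
int match_field(const struct runtime_field *f, const struct known_field *known,
                size_t n) {
  for (size_t i = 0; i < n; i++)
    if (strcmp(known[i].name, f->name) == 0)
      return classify(f) == known[i].kind ? (int)i : -1;
  return -1;
}

/* Reads a 1/2/4/8-byte integer at the runtime offset and size, in host byte
 * order (fine when decoding on the traced device itself). */
bool read_int_field(const unsigned char *rec, size_t rec_len,
                    const struct runtime_field *f, long long *out) {
  if (f->offset > rec_len || f->size > rec_len - f->offset) return false;
  const unsigned char *p = rec + f->offset;
  uint8_t u8; uint16_t u16; uint32_t u32; uint64_t u64;
  switch (f->size) {
    case 1: memcpy(&u8, p, 1);
      *out = f->is_signed ? (long long)(int8_t)u8 : (long long)u8; return true;
    case 2: memcpy(&u16, p, 2);
      *out = f->is_signed ? (long long)(int16_t)u16 : (long long)u16; return true;
    case 4: memcpy(&u32, p, 4);
      *out = f->is_signed ? (long long)(int32_t)u32 : (long long)u32; return true;
    case 8: memcpy(&u64, p, 8);
      *out = (long long)u64; return true;
    default: return false;
  }
}
```

Fields for which match_field() returns -1 are simply skipped in this sketch;
in the scheme described above they could instead be transcoded through the
generic schema of 2).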

There is a point where both 1 and 2 become problematic for us: enums and, more
generally, any ftrace field that depends on macro expansions (which turn out to
be mainly enums, in practice).
For instance, the gfp_flags of ftrace events directly reflect the internal enum
values, which are not stable across kernel versions. We had to come up
with an internal map to keep up with the various kernel versions.
There are a few other cases like gfp_flags, but they are quite rare and we ended
up not needing those events, at least until now.
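
A version-keyed map like the one mentioned above could look roughly like the
following C sketch. Everything in it is hypothetical: the FLAG_* names, their
bit values and the kernel version boundaries are placeholders, not real
gfp_flags values, and this is not Perfetto's actual map. It picks the newest
table whose starting version is not newer than the running kernel and decodes
the raw bits, printing any leftover unknown bits in hex.

```c
#include <stdio.h>
#include <string.h>

/* A raw bit and the symbolic name it had in a given kernel range. */
struct flag_name { unsigned long bit; const char *name; };

/* One flag layout, valid from kernel major.minor onwards. */
struct flag_table { int major, minor; const struct flag_name *flags; size_t n; };

/* PLACEHOLDER values for illustration only; NOT real GFP bit values. */
static const struct flag_name kFlagsOld[] = {
    {0x1ul, "FLAG_A"}, {0x2ul, "FLAG_B"}, {0x4ul, "FLAG_C"}};
static const struct flag_name kFlagsNew[] = {
    {0x1ul, "FLAG_A"}, {0x4ul, "FLAG_B"}, {0x8ul, "FLAG_C"}};

/* Hypothetical version boundaries, sorted by ascending version. */
static const struct flag_table kTables[] = {
    {4, 4, kFlagsOld, 3},
    {4, 14, kFlagsNew, 3},
};

/* Returns the newest table whose starting version is <= major.minor. */
const struct flag_table *table_for(int major, int minor) {
  const struct flag_table *best = NULL;
  for (size_t i = 0; i < sizeof kTables / sizeof kTables[0]; i++) {
    const struct flag_table *t = &kTables[i];
    if (t->major < major || (t->major == major && t->minor <= minor)) best = t;
  }
  return best;
}

/* Appends "|s" (or just "s" if out is still empty), never overflowing out. */
static void append(char *out, size_t cap, size_t *used, const char *s) {
  if (*used + 1 >= cap) return;
  int w = snprintf(out + *used, cap - *used, "%s%s", *used ? "|" : "", s);
  if (w < 0) return;
  *used = (*used + (size_t)w < cap) ? *used + (size_t)w : cap - 1;
}

/* Decodes raw flag bits for the given kernel into e.g. "FLAG_A|FLAG_B". */
void decode_flags(int major, int minor, unsigned long raw, char *out, size_t cap) {
  if (cap == 0) return;
  out[0] = '\0';
  size_t used = 0;
  const struct flag_table *t = table_for(major, minor);
  for (size_t i = 0; t != NULL && i < t->n; i++) {
    if (raw & t->flags[i].bit) {
      append(out, cap, &used, t->flags[i].name);
      raw &= ~t->flags[i].bit;
    }
  }
  if (raw != 0) { /* bits this table doesn't know about */
    char hex[32];
    snprintf(hex, sizeof hex, "0x%lx", raw);
    append(out, cap, &used, hex);
  }
}
```

The same raw value can decode to different names depending on the kernel
version, which is why such fields can't be decoded with a single,
version-independent table.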

Beyond this, ingesting the raw trace events from the ftrace raw pipes has been
great for all other events, without requiring any additional parsing library.
Huge thanks for all the hard work on developing and maintaining ftrace.

Happy to discuss more on IRC, email or VC if you want to know more,
Primiano.

[1] https://docs.perfetto.dev
[2] https://cs.chromium.org/chromium/src/third_party/perfetto/?q=f:perfetto&sq=package:chromium&dr
[3] https://android.googlesource.com/platform/external/perfetto/
[4] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/sched.proto
[5] https://android.googlesource.com/platform/external/perfetto/+/refs/heads/master/protos/perfetto/trace/ftrace/generic.proto
[6] https://github.com/google/gapid/
[7] https://developers.google.com/web/tools/chrome-devtools