Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces

From: Josef Bacik <jbacik@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>,
	"ksummit-discuss@lists.linux-foundation.org"
	<ksummit-discuss@lists.linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>
Subject: Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
Date: Wed, 20 Sep 2017 14:54:07 +0000
Message-ID: <0C1E6F2D-2E7D-4477-9F35-8C59F62BB409@fb.com>
In-Reply-To: <20170920095031.1972fba5@gandalf.local.home>

Cc’ing my personal address so I can reply with a sane email client.

On 9/20/17, 9:50 AM, "Steven Rostedt" <rostedt@goodmis.org> wrote:

The topic came up again at (of all places) the Schedule Workloads
Microconf at Linux Plumbers in LA last week: maintainers resisting the
addition of tracepoints in their code only because they don't want them
to become an ABI for user space tools, an ABI that must then be supported
indefinitely and may prevent future development of the kernel. This
applies to the scheduler as well as to VFS (where it is mandated by Al
Viro).

The current solution at Facebook (described to us by Josef Bacik) is to
hand-write BPF programs attached to kprobes at the locations they need.
When they move to a new kernel, they simply rewrite the programs, because
the kprobes and BPF programs can break with each new release.

First, it was suggested to add hooks at locations where it would be
easier to get at variables: the compiler can optimize them out, which
makes it difficult, even with BPF and kprobes, to get the information
one would like to have. It was asked whether we could add tracepoint
hooks at these locations that are not exported to user space, where they
would run the risk of becoming an ABI. It was pointed out that this
mechanism already exists in the kernel.

A tracepoint is the hook in the kernel. The TRACE_EVENT() macro is
built on top of a tracepoint to export it to user space. But the
tracepoint itself can be added manually anywhere, and then no trace
event files are created in the tracefs directory, nor is perf able to
access it. The advantage of having this hook is that a kernel module can
still attach to it without a problem.

By adding tracepoints in the scheduler and VFS, without the TRACE_EVENT()
macros that export them to user space, it would be much easier for
companies like Facebook, Red Hat and SUSE to add a module that can tap
into these hooks and build their custom analysis tools on top.

Requiring an external, custom module to access the tracepoints on live
systems (that is, an unmodified vanilla or distro kernel) will help these
companies implement advanced analytical tools to monitor their production
kernels. And because it requires a module, and it has been stated several
times in the past that there is no KABI for module interfaces, the
maintainers of these hooks need not fear that they will become a stable
interface.

Now, I will also point out that if one of the tracepoint hooks proves to
be useful for a generic tool, this could be an incentive for the
maintainer to change the tracepoint hook into a full-blown TRACE_EVENT()
and upgrade it to an ABI, after having had time to see how it is useful.
This is better than having tens of trace events where one random event
turns out to be useful for tools, and the maintainer is surprised to
learn that the code it touches can no longer be changed.

Thoughts?

-- Steve