[Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces

* [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
@ 2017-09-20 13:50 Steven Rostedt
  2017-09-20 14:54 ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2017-09-20 13:50 UTC
  To: ksummit-discuss; +Cc: Josef Bacik, Peter Zijlstra

The topic came up again at (of all places) the Scheduler Workloads
Microconf at Linux Plumbers in LA last week: adding tracepoints in
locations where maintainers don't want them, only because they don't
want them to become an ABI for user space tools. Such tools must then
be supported indefinitely, which may prevent future development of the
kernel. This includes the scheduler as well as VFS (the latter mandated
by Al Viro).

Facebook's current solution (as Josef Bacik told us) is to hand-write
kprobes with BPF programs attached at the locations they need. When
they move to a new kernel, they simply rewrite the programs, because
the kprobes and BPF programs can break with each new release.

First, it was suggested to add hooks at locations where they would make
it easier to get at variables: the compiler can optimize those
variables out, which makes it difficult, even with BPF and kprobes, to
get the information one would like to have. Someone asked whether we
could add tracepoint hooks at these locations that are not exported to
user space, where they would run the risk of becoming an ABI. It was
pointed out that this mechanism already exists in the kernel.

A tracepoint is the hook in the kernel. The TRACE_EVENT() macro is
built on top of a tracepoint to export it to user space. But the
tracepoint itself can be added manually anywhere, in which case no
trace event files are created in the tracefs directory, nor can perf
access it. The advantage of having this hook, though, is that a
kernel module can still access it without a problem.
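To make the distinction concrete, here is a simplified userspace model of a bare tracepoint: a named hook holding a list of (probe, private data) pairs that the instrumented code calls, with nothing exported to tracefs or perf. Every name here (`struct tp_hook`, `tp_register()`, `tp_call()`, the `sched_enqueue` hook, `enqueue_task_model()`) is hypothetical. The in-kernel mechanism (`DECLARE_TRACE()`/`DEFINE_TRACE()` from tracepoint.h, with modules attaching through the generated `register_trace_<name>()`) follows the same shape, including the private `void *data` passed as the probe's first argument, but additionally protects the probe array with RCU and uses static keys so that a disabled tracepoint costs almost nothing; this sketch does neither.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model of a bare tracepoint (hypothetical names).
 * No RCU, no static keys: a plain array of callbacks. */

#define TP_MAX_PROBES 4

/* Probe signature: private data first, then the hook's arguments. */
typedef void (*tp_probe_fn)(void *data, int cpu, unsigned int nr_running);

struct tp_hook {
	const char *name;
	tp_probe_fn probes[TP_MAX_PROBES];
	void *data[TP_MAX_PROBES];
	size_t nr_probes;
};

/* Attach a probe; returns 0 on success, -1 if the hook is full. */
static int tp_register(struct tp_hook *tp, tp_probe_fn fn, void *data)
{
	assert(fn != NULL);
	if (tp->nr_probes == TP_MAX_PROBES)
		return -1;
	tp->probes[tp->nr_probes] = fn;
	tp->data[tp->nr_probes] = data;
	tp->nr_probes++;
	return 0;
}

/* Detach a previously registered (fn, data) pair; returns 0 if found. */
static int tp_unregister(struct tp_hook *tp, tp_probe_fn fn, void *data)
{
	for (size_t i = 0; i < tp->nr_probes; i++) {
		if (tp->probes[i] == fn && tp->data[i] == data) {
			/* Move the last entry into the freed slot. */
			tp->nr_probes--;
			tp->probes[i] = tp->probes[tp->nr_probes];
			tp->data[i] = tp->data[tp->nr_probes];
			return 0;
		}
	}
	return -1;
}

/* Called from the instrumented code path; does nothing when no probe is attached. */
static void tp_call(const struct tp_hook *tp, int cpu, unsigned int nr_running)
{
	for (size_t i = 0; i < tp->nr_probes; i++)
		tp->probes[i](tp->data[i], cpu, nr_running);
}

/* The hook itself: no tracefs files, no perf event, just a callback list. */
static struct tp_hook sched_enqueue_hook = { .name = "sched_enqueue" };

/* Hypothetical instrumented "kernel" function. */
static void enqueue_task_model(int cpu, unsigned int *nr_running)
{
	(*nr_running)++;
	tp_call(&sched_enqueue_hook, cpu, *nr_running);
}

/* A "module" probe that keeps its own statistics in its private data. */
struct max_stats {
	unsigned int events;
	unsigned int max_nr_running;
};

static void max_probe(void *data, int cpu, unsigned int nr_running)
{
	struct max_stats *st = data;

	(void)cpu;
	st->events++;
	if (nr_running > st->max_nr_running)
		st->max_nr_running = nr_running;
}
```

A module-side consumer would call `tp_register()` at load time and `tp_unregister()` at unload. Because the only contract is the C probe signature, a maintainer who changes the hook's arguments affects only modules rebuilt against that kernel, not user-space tools.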

By adding tracepoints in the scheduler and VFS, without the TRACE_EVENT
macros that export them to user space, it would be much easier for
companies like Facebook, Red Hat and SuSE to add a module that can tap
into these hooks and build their custom analysis tools on top.

Requiring an external, custom module to access the tracepoints on
live systems (that is, an unmodified vanilla or distro kernel) will
help these companies implement advanced analysis tools to monitor
their production kernels. And because access requires a module, and it
has been stated several times in the past that there is no KABI for
module interfaces, the maintainers of these hooks need not fear that
they will become a stable interface.

Now, I will also point out that if one of the tracepoint hooks proves
useful for a generic tool, that could be an incentive for the
maintainer to convert the hook into a full-blown TRACE_EVENT() and
promote it to an ABI, after there has been time to see how useful it
is. This is a better approach than having tens of trace events, one of
which randomly turns out to be useful to tools and surprises the
maintainer with the news that the code it affects can no longer be
changed.

Thoughts?

-- Steve

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-20 13:50 [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces Steven Rostedt
@ 2017-09-20 14:54 ` Josef Bacik
  2017-09-20 15:04   ` Josef Bacik
  0 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2017-09-20 14:54 UTC
  To: Steven Rostedt, ksummit-discuss; +Cc: Peter Zijlstra, Josef Bacik

Cc’ing my personal address so I can reply with a sane email client.

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-20 14:54 ` Josef Bacik
@ 2017-09-20 15:04   ` Josef Bacik
  2017-09-20 15:13     ` Steven Rostedt
  2017-09-29 23:50     ` Alexei Starovoitov
  0 siblings, 2 replies; 7+ messages in thread
From: Josef Bacik @ 2017-09-20 15:04 UTC
  To: Josef Bacik; +Cc: ksummit-discuss, Peter Zijlstra, Josef Bacik

On Wed, Sep 20, 2017 at 02:54:07PM +0000, Josef Bacik wrote:
> Cc’ing my personal address so I can reply with a sane email client.
> 
> On 9/20/17, 9:50 AM, "Steven Rostedt" <rostedt@goodmis.org> wrote:
> 
> The topic came up again at (of all places) the Schedule Workloads
> Microconf at Linux Plumbers in LA last week. The addition of
> tracepoints in locations that maintainers don't want them, only because
> they don't want them to become an ABI for user space tools. Where
> these tools then must be supported indefinitely, and may prevent
> future development of the kernel. This includes the scheduler as well
> as VFS (mandated by Al Viro).
> 
> The current solution by Facebook (told to us by Josef Bacik) is to just
> hand write kprobes with BPF programs to the locations that they need.
> When they get a new kernel, they just rewrite the programs because the
> kprobes and BPF programs break at each new release (or can break).
> 
> First it was mentioned to add a hook to locations where it would be
> easier to get variables, as the compiler could optimize them out, and
> it becomes difficult even with BPF and kprobes to get the information
> one would like to have. It was asked if we could add a tracepoint hook
> in these locations that are not exported to user space where it runs
> the risk of becoming an ABI. It was pointed out that this mechanism
> already exists in the kernel.
> 
> A tracepoint is the hook in the kernel. The TRACE_EVENT() macro is
> built on top of a tracepoint to export it to user space. But the
> tracepoint itself can be manually added anywhere and there will be no
> creation of trace event files in the tracefs directory, nor would perf
> be able to access it. But the advantage of having this hook is that a
> kernel module could access it without a problem.
> 
> By adding tracepoints in the scheduler and VFS, without the TRACE_EVENT
> macros that export them to user space, it would be much easier for
> companies like Facebook, Red Hat and SuSE to add a module that can tap
> into these hooks and build their custom analysis tools on top.
> 
> Requiring an external and custom module to access the tracepoints on
> live systems (that is, an unmodified vanilla kernel or distro kernel)
> will help these companies implement advance analytical tools to monitor
> their production kernels, and because it requires a module, and it has
> been stated several times in the past that there is no KABI with module
> interfaces, the maintainers of these hooks should have no fear that
> they will become a stable interface.
> 

The tricky part is we want to be able to access these from eBPF.  I argue that
eBPF is run in the kernel so it has the same rules as kernel modules.  Others
seem less convinced of this argument, so it would be good to get a definitive
answer.  Thanks,

Josef

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-20 15:04   ` Josef Bacik
@ 2017-09-20 15:13     ` Steven Rostedt
  2017-09-21  9:45       ` Sergey Senozhatsky
  2017-09-29 23:50     ` Alexei Starovoitov
  1 sibling, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2017-09-20 15:13 UTC
  To: Josef Bacik; +Cc: Josef Bacik, ksummit-discuss, Peter Zijlstra

On Wed, 20 Sep 2017 11:04:05 -0400
Josef Bacik <josef@toxicpanda.com> wrote:

> The tricky part is we want to be able to access these from eBPF.  I argue that
> eBPF is run in the kernel so it has the same rules as kernel modules.  Others
> seem less convinced of this argument, so it would be good to get a definitive
> answer.  Thanks,

Note that adding a module to let eBPF access these tracepoints would
also be trivial. Would a module be an issue at FB? It could easily be
loaded at boot.

-- Steve

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-20 15:13     ` Steven Rostedt
@ 2017-09-21  9:45       ` Sergey Senozhatsky
  0 siblings, 0 replies; 7+ messages in thread
From: Sergey Senozhatsky @ 2017-09-21  9:45 UTC
  To: Steven Rostedt; +Cc: Josef Bacik, ksummit-discuss, Peter Zijlstra, Josef Bacik

Hello,

On (09/20/17 11:13), Steven Rostedt wrote:
> On Wed, 20 Sep 2017 11:04:05 -0400
> Josef Bacik <josef@toxicpanda.com> wrote:
> 
> > The tricky part is we want to be able to access these from eBPF.  I argue that
> > eBPF is run in the kernel so it has the same rules as kernel modules.  Others
> > seem less convinced of this argument, so it would be good to get a definitive
> > answer.  Thanks,
> 
> Note, adding a module to let eBPF access these tracepoints would also
> be trivial. Would a module be of issue at FB? It could be easily added
> at boot up.

JFI, it seems that Josef's reply didn't make it to the ksummit-discuss list.
I'm interested in this topic, tho.

	-ss

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-20 15:04   ` Josef Bacik
  2017-09-20 15:13     ` Steven Rostedt
@ 2017-09-29 23:50     ` Alexei Starovoitov
  2017-10-04  0:55       ` Steven Rostedt
  1 sibling, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2017-09-29 23:50 UTC
  To: Josef Bacik; +Cc: Josef Bacik, ksummit-discuss, Peter Zijlstra

On Wed, Sep 20, 2017 at 11:04:05AM -0400, Josef Bacik wrote:
> On Wed, Sep 20, 2017 at 02:54:07PM +0000, Josef Bacik wrote:
> > Cc’ing my personal address so I can reply with a sane email client.
> > 
> > On 9/20/17, 9:50 AM, "Steven Rostedt" <rostedt@goodmis.org> wrote:
> > 
> > The topic came up again at (of all places) the Schedule Workloads
> > Microconf at Linux Plumbers in LA last week. The addition of
> > tracepoints in locations that maintainers don't want them, only because
> > they don't want them to become an ABI for user space tools. Where
> > these tools then must be supported indefinitely, and may prevent
> > future development of the kernel. This includes the scheduler as well
> > as VFS (mandated by Al Viro).
> > 
> > The current solution by Facebook (told to us by Josef Bacik) is to just
> > hand write kprobes with BPF programs to the locations that they need.
> > When they get a new kernel, they just rewrite the programs because the
> > kprobes and BPF programs break at each new release (or can break).
> > 
> > First it was mentioned to add a hook to locations where it would be
> > easier to get variables, as the compiler could optimize them out, and
> > it becomes difficult even with BPF and kprobes to get the information
> > one would like to have. It was asked if we could add a tracepoint hook
> > in these locations that are not exported to user space where it runs
> > the risk of becoming an ABI. It was pointed out that this mechanism
> > already exists in the kernel.

Aren't we beating the dead horse?
A year ago at the kernel summit:
https://lwn.net/Articles/705270/
"The session concluded with Linus saying that, in the history of kernel development,
nobody has ever screamed about a change to a tracepoint. He allowed that this might
happen as the use of tracepoints increases. But, he said, there is no point in
making a big deal about that possibility before it proves to be a problem."

So instead of inventing trace markers and other new things that are just
like existing tracepoints but without arguments, how about
adding normal tracepoints with one or two arguments, task* and rq*?
BPF progs can walk whatever internals of these structs they need
with probe_read(), and that would be plenty of info for most users,
including kernel developers.
In that sense, the only difference between these new sched tracepoints
and existing kprobe-based scripts would be the speed and ease of
access to the task/rq pointers.
If pretty-printing tracepoints into trace_pipe is an ABI
concern, then don't print anything.
Existing sched tracepoints are not useful from a BPF point of view,
since they don't have pointers in their arguments and instead print
comm/pid/cpu, which is not very interesting.
A dumb kprobe in enqueue_task_*() is more powerful,
since progs can simply bpf_trace_printk("%d\n", rq->nr_running);
BTW, I won't be in Prague, so it's best to discuss over email.

* Re: [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces
  2017-09-29 23:50     ` Alexei Starovoitov
@ 2017-10-04  0:55       ` Steven Rostedt
  0 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2017-10-04  0:55 UTC
  To: Alexei Starovoitov
  Cc: Josef Bacik, ksummit-discuss, Peter Zijlstra, Josef Bacik

On Fri, 29 Sep 2017 16:50:23 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> Aren't we beating the dead horse?

Not really, because this dead horse can still give quite a hell of a
kick back.

> A year ago at the kernel summit:
> https://lwn.net/Articles/705270/
> "The session concluded with Linus saying that, in the history of kernel development,
> nobody has ever screamed about a change to a tracepoint. He allowed that this might
> happen as the use of tracepoints increases. But, he said, there is no point in
> making a big deal about that possibility before it proves to be a problem."

We already have two, possibly three, instances where user space has
caused tracepoint hell. Yes, powertop screamed about a tracepoint
change. We have silly crap in sched_switch and sched_wakeup because
user space doesn't want that to change. And just recently another
tracepoint had to carry blank fields because user space expects them
to exist.

> 
> So instead of inventing trace markers and other new things that are just

There is no "inventing". They already exist; they are what the
TRACE_EVENT() macros are built on. In fact, what we are talking about
is how tracepoints were originally introduced: the code in tracepoint.h,
implemented in tracepoint.c. No new code needs to be written to
implement this. All it would take is to add the tracepoints by hand,
without using the TRACE_EVENT() macros.

> like existing tracepoints but without arguments how about

We are not talking about tracepoints without arguments.

> adding normal tracepoints with one or two arguments task* and rq*
> bpf progs can walk whatever internals of these structs they need
> with probe_read() and that would be plenty of info for most users
> including kernel developers.
> In that sense the only difference between these new sched tracepoints
> and existing kprobe-based scripts will be the speed and ease of
> access to task/rq pointers.

That may be what we are talking about ;-)

> If pretty print of tracepoints into trace_pipe is an abi
> concern then don't print anything.

No, that's not the issue. The issue is what gets written into the
binary buffers of perf or ftrace.
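To illustrate why the binary record, rather than any printed text, is what becomes the contract: a TRACE_EVENT()-style event copies selected fields into a fixed-layout record in the ring buffer and publishes a description of that layout (in tracefs, each event has a `format` file listing every field's name, offset and size), and tools decode records by looking fields up in that description. The sketch below is a simplified userspace model; `struct switch_record`, `record_switch()`, `find_field()` and `read_field()` are hypothetical and do not reproduce ftrace's or perf's actual buffer format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-layout record written into a trace buffer. */
struct switch_record {
	int prev_pid;
	long prev_state;
	int next_pid;
};

/* Published self-description, loosely modeled on the tracefs "format"
 * file: each field's name, offset and size within the record. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

static const struct field_desc switch_format[] = {
	{ "prev_pid",   offsetof(struct switch_record, prev_pid),   sizeof(int)  },
	{ "prev_state", offsetof(struct switch_record, prev_state), sizeof(long) },
	{ "next_pid",   offsetof(struct switch_record, next_pid),   sizeof(int)  },
};

#define SWITCH_NR_FIELDS (sizeof(switch_format) / sizeof(switch_format[0]))

/* "Kernel" side: copy the hook's arguments into a raw record. */
static size_t record_switch(unsigned char *buf, int prev_pid, long prev_state,
			    int next_pid)
{
	struct switch_record rec = { prev_pid, prev_state, next_pid };

	memcpy(buf, &rec, sizeof(rec));
	return sizeof(rec);
}

/* "Tool" side: find a field by name; NULL means the tool cannot decode it. */
static const struct field_desc *find_field(const char *name)
{
	for (size_t i = 0; i < SWITCH_NR_FIELDS; i++)
		if (strcmp(switch_format[i].name, name) == 0)
			return &switch_format[i];
	return NULL;
}

/* "Tool" side: decode an int- or long-sized field from a raw record. */
static long long read_field(const unsigned char *rec, const struct field_desc *f)
{
	if (f->size == sizeof(int)) {
		int v;

		memcpy(&v, rec + f->offset, sizeof(v));
		return v;
	}
	assert(f->size == sizeof(long));
	long v;

	memcpy(&v, rec + f->offset, sizeof(v));
	return v;
}
```

A tool that honors the published offsets survives a field being moved, but not one being removed: every existing tool that looks up `prev_state` would get NULL back. A bare tracepoint writes no such record, so only its C prototype constrains the maintainer.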

> Existing sched tracepoints are not useful from bpf point of view,
> since they don't have pointers in arguments and instead print
> comm/pid/cpu which is not very interesting.

?? The sched tracepoints pass in the task pointers that they deal with.

> Dumb kprobe in enqueue_task_*() is more powerful
> since progs can simply bpf_trace_printk("%d\n", rq->nr_running);
> btw I won't be in Prague, so best to discuss over email.

Well, this isn't just about BPF; it's also about tracing.

-- Steve

Thread overview: 7+ messages
2017-09-20 13:50 [Ksummit-discuss] [MAINTAINER TOPIC] tracepoints without user space interfaces Steven Rostedt
2017-09-20 14:54 ` Josef Bacik
2017-09-20 15:04   ` Josef Bacik
2017-09-20 15:13     ` Steven Rostedt
2017-09-21  9:45       ` Sergey Senozhatsky
2017-09-29 23:50     ` Alexei Starovoitov
2017-10-04  0:55       ` Steven Rostedt
