All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: "Valentin Schneider" <valentin.schneider@arm.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Andy Lutomirski" <luto@kernel.org>,
	"Jirka Hladký" <jhladky@redhat.com>,
	"Jiří Vozár" <jvozar@redhat.com>,
	x86@kernel.org
Subject: Re: [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint
Date: Wed, 4 Sep 2019 09:06:28 -0400	[thread overview]
Message-ID: <20190904130628.GE144846@google.com> (raw)
In-Reply-To: <20190904104332.ogsjtbtuadhsglxh@e107158-lin.cambridge.arm.com>

On Wed, Sep 04, 2019 at 11:43:33AM +0100, Qais Yousef wrote:
> On 09/04/19 00:23, Joel Fernandes wrote:
> > On Tue, Sep 03, 2019 at 05:05:47PM +0100, Valentin Schneider wrote:
> > > On 03/09/2019 16:43, Radim Krčmář wrote:
> > > > The paper "The Linux Scheduler: a Decade of Wasted Cores" used several
> > > > custom data gathering points to better understand what was going on in
> > > > the scheduler.
> > > > Red Hat adapted one of them for the tracepoint framework and created a
> > > > tool to plot a heatmap of nr_running, where the sched_update_nr_running
> > > > tracepoint is being used for fine grained monitoring of scheduling
> > > > imbalance.
> > > > The tool is available from https://github.com/jirvoz/plot-nr-running.
> > > > 
> > > > The best place for the tracepoints is inside the add/sub_nr_running,
> > > > which requires some shenanigans to make it work as they are defined
> > > > inside sched.h.
> > > > The tracepoints have to be included from sched.h, which means that
> > > > CREATE_TRACE_POINTS has to be defined for the whole header and this
> > > > might cause problems if tree-wide headers expose tracepoints in sched.h
> > > > dependencies, but I'd argue it's the other side's misuse of tracepoints.
> > > > 
> > > > Moving the import sched.h line lower would require fixes in s390 and ppc
> > > > headers, because they don't include dependecies properly and expect
> > > > sched.h to do it, so it is simpler to keep sched.h there and
> > > > preventively undefine CREATE_TRACE_POINTS right after.
> > > > 
> > > > Exports of the pelt tracepoints remain because they don't need to be
> > > > protected by CREATE_TRACE_POINTS and moving them closer would be
> > > > unsightly.
> > > > 
> > > 
> > > Pure trace events are frowned upon in scheduler world, try going with
> > > trace points. Qais did something very similar recently:
> > > 
> > > https://lore.kernel.org/lkml/20190604111459.2862-1-qais.yousef@arm.com/
> > > 
> > > You'll have to implement the associated trace events in a module, which
> > > lets you define your own event format and doesn't form an ABI :).
> > 
> > Is that really true? eBPF programs loaded from userspace can access
> > tracepoints through BPF_RAW_TRACEPOINT_OPEN, which is UAPI:
> > https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L103
> > 
> > I don't have a strong opinion about considering tracepoints as ABI / API or
> > not, but just want to get the facts straight :)
> 
> It is actually true.
>
> But you need to make the distinction between a tracepoint
> and a trace event first.

I know this distinction well.

> What Valentin is talking about here is the *bare*
> tracepoint without any event associated with them like the one I added to the
> scheduler recently. These ones are not accessible via eBPF, unless something
> has changed since I last tried.

Can this tracepoint be registered on with tracepoint_probe_register()?
Quickly looking at these new tracepoint, they can be otherwise how would they
even work right? If so, then eBPF can very well access it. Look at
__bpf_probe_register() and bpf_raw_tracepoint_open() which implement the
BPF_RAW_TRACEPOINT_OPEN.

> The current infrastructure needs to be expanded to allow eBPF to attach these
> bare tracepoints. Something similar to what I have in [1] is needed - but
> instead of creating a new macro it needs to expand the current macro. [2] might
> give full context of when I was trying to come up with alternatives to using
> trace events.
> 
> [1] https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> [2] https://lore.kernel.org/lkml/20190415144945.tumeop4djyj45v6k@e107158-lin.cambridge.arm.com/


As I was mentioning, tracepoints, not "trace events" can already be opened
directly with BPF. I don't see how these new tracepoints are different.

I wonder if this distinction of "tracepoint" being non-ABI can be documented
somewhere. I would be happy to do that if there is a place for the same. I
really want some general "policy" in the kernel on where we draw a line in
the sand with respect to tracepoints and ABI :).

For instance, perhaps VFS can also start having non-ABI tracepoints for the
benefit of people tracing the VFS.

thanks,

 - Joel


  reply	other threads:[~2019-09-04 13:06 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 15:43 [PATCH 0/2] sched/debug: add sched_update_nr_running tracepoint Radim Krčmář
2019-09-03 15:43 ` [PATCH 1/2] x86/mm/tlb: include tracepoints from tlb.c instead of mmu_context.h Radim Krčmář
2019-09-03 15:43 ` [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint Radim Krčmář
2019-09-03 16:05   ` Valentin Schneider
2019-09-04  4:23     ` Joel Fernandes
2019-09-04  8:14       ` Peter Zijlstra
2019-09-04 10:52         ` Qais Yousef
2019-09-04 13:11         ` Joel Fernandes
2019-09-04 15:26           ` Alexei Starovoitov
2019-09-04 15:33             ` Joel Fernandes
2019-09-04 15:37               ` Alexei Starovoitov
2019-09-04 15:41                 ` Joel Fernandes
2019-09-04 10:43       ` Qais Yousef
2019-09-04 13:06         ` Joel Fernandes [this message]
2019-09-04 14:20           ` Qais Yousef
2019-09-04 14:41             ` Joel Fernandes
2019-09-04 14:57               ` Qais Yousef
2019-09-04 15:46                 ` Joel Fernandes
2019-09-04 15:25           ` Alexei Starovoitov
2019-09-04 15:40             ` Joel Fernandes
2019-09-04 15:51               ` Alexei Starovoitov
2019-09-04 17:47                 ` Peter Zijlstra
2019-09-04 17:53                   ` Alexei Starovoitov
2019-09-05  8:13                     ` Ingo Molnar
2019-09-05 16:49                       ` Alexei Starovoitov
2019-09-04  6:55     ` chengjian (D)
2019-09-04 13:13   ` Peter Zijlstra
2019-09-04 13:21     ` Valentin Schneider
2019-09-04 14:37   ` Qais Yousef
2019-09-04 17:48     ` Peter Zijlstra
2019-09-09 10:59       ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190904130628.GE144846@google.com \
    --to=joel@joelfernandes.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jhladky@redhat.com \
    --cc=jvozar@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=rkrcmar@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=valentin.schneider@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.