linux-kernel.vger.kernel.org archive mirror
* Instrumentation and RCU
@ 2020-03-09 17:02 Thomas Gleixner
  2020-03-09 18:15 ` Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Thomas Gleixner @ 2020-03-09 17:02 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Steven Rostedt, Masami Hiramatsu,
	Alexei Starovoitov, Mathieu Desnoyers, Paul E. McKenney,
	Joel Fernandes, Frederic Weisbecker

Folks,

I'm starting a new conversation because there are about 20 different
threads which look at that problem in various ways and the information
is so scattered that creating a coherent picture is pretty much
impossible.

There are several problems to solve:

   1) Fragile low level entry code

   2) Breakpoint utilization

   3) RCU idle

   4) Callchain protection

#1) Fragile low level entry code

   While I understand the desire of instrumentation to observe
   everything, we really have to ask whether it is worth the trouble,
   especially with entry trainwrecks like x86, PTI and other horrors
   in that area.

   I don't think so and we really should just bite the bullet and forbid
   any instrumentation in that code unless it is explicitly designed
   for that case, makes sense and has a real value from an observation
   perspective.

   This is very much related to #3.

#2) Breakpoint utilization

    As recent findings have shown, breakpoint utilization needs to be
    extremely careful about not creating infinite breakpoint recursions.

    I think that's pretty much obvious, but falls into the overall
    question of how to protect callchains.
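
    A minimal user-space sketch of a recursion guard, to illustrate the
    problem. All names here (bp_enter, bp_exit, breakpoint_handler) are
    hypothetical and not kernel APIs; a kernel would track the depth per
    CPU or per context rather than in one global.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: nesting depth plus counters for illustration. */
static int bp_depth;
static int handled;	/* events the handler actually processed */
static int dropped;	/* nested events rejected by the guard */

/* Refuse to enter the handler if it is already running. */
static bool bp_enter(void)
{
	if (bp_depth > 0) {
		dropped++;
		return false;
	}
	bp_depth++;
	return true;
}

static void bp_exit(void)
{
	assert(bp_depth > 0);
	bp_depth--;
}

/*
 * Hypothetical handler which hits another breakpoint while it runs.
 * Without the guard this would recurse nested_hits levels deep, or
 * forever if every invocation hit a breakpoint again.
 */
static void breakpoint_handler(int nested_hits)
{
	if (!bp_enter())
		return;
	handled++;
	if (nested_hits > 0)
		breakpoint_handler(nested_hits - 1);
	bp_exit();
}
```

    Calling breakpoint_handler(3) processes one event and drops the
    first nested one, which stops the recursion right there.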

#3) RCU idle

    Being able to trace code inside RCU idle sections is very similar to
    the question raised in #1.

    Assume all of the instrumentation would be doing conditional RCU
    schemes, i.e.:

    if (rcuidle)
    	....
    else
        rcu_read_lock_sched()

    before invoking the actual instrumentation functions and of course
    undoing that right after it. That really raises the question whether
    it's worth it.
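
    The conditional scheme above can be sketched in user space as
    follows. The stub_*() functions are stand-ins invented for this
    example which only count calls; they are not the kernel's RCU
    primitives, and call_probe() is a hypothetical wrapper name.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in counters, so the cost of each path is visible. */
static int idle_dance_calls;
static int read_lock_calls;
static int unlock_calls;
static int probe_calls;

/* Stubs (assumptions): placeholders for the RCU idle enter/exit dance. */
static void stub_rcuidle_enter(void) { idle_dance_calls++; }
static void stub_rcuidle_exit(void) { unlock_calls++; }

/* Stubs (assumptions): placeholders for the plain read side section. */
static void stub_read_lock_sched(void) { read_lock_calls++; }
static void stub_read_unlock_sched(void) { unlock_calls++; }

/* Hypothetical instrumentation function. */
static void probe(void) { probe_calls++; }

/* Every probe invocation pays for the branch and the enter/exit pair. */
static void call_probe(bool rcuidle, void (*fn)(void))
{
	assert(fn);

	if (rcuidle)
		stub_rcuidle_enter();
	else
		stub_read_lock_sched();

	fn();

	if (rcuidle)
		stub_rcuidle_exit();
	else
		stub_read_unlock_sched();
}
```

    Each invocation, e.g. call_probe(true, probe), brackets the probe
    with an enter/exit pair, which is the per-call overhead in question.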

    Especially constructs like:

    trace_hardirqs_off()
       idx = srcu_read_lock()
       rcu_irq_enter_irqson();
       ...
       rcu_irq_exit_irqson();
       srcu_read_unlock(idx);

    if (user_mode)
       user_exit_irqsoff();
    else
       rcu_irq_enter();

    are really more than questionable. For 99.9999% of instrumentation
    users it's absolutely irrelevant whether this traces the interrupt
    disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.

    But what's relevant is the tracer overhead which is, e.g., inflicted
    by today's trace_hardirqs_off/on() implementation because that
    unconditionally uses the rcuidle variant with the srcu/rcu_irq dance
    around every tracepoint.

    Even if the tracepoint sits in the ASM code it just covers about 20
    more low level ASM instructions. The tracer invocation, which is
    even done twice when coming from user space on x86 (the second call
    is optimized in the tracer C-code), definitely costs way more
    cycles. When you take the srcu/rcu_irq dance into account it's a
    complete disaster performance-wise.

#4) Callchain protection

   Our current approach of annotating functions with notrace/noprobe is
   pretty much broken.

   Functions which are marked NOPROBE or notrace call out into functions
   which are not marked and while this might be ok, there are enough
   places where it is not. But we have no way to verify that.

   That's just a recipe for disaster. We really cannot expect
   sysadmins who want to use instrumentation to stare at the code first
   to figure out whether they can place/enable an instrumentation point
   somewhere. That'd be just a bad joke.

   I really think we need to have proper text sections which are off
   limits for any form of instrumentation and have tooling to analyze the
   calls into other sections. These calls need to be annotated as safe
   and intentional.
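
   To make the tooling idea concrete, here is a small model of such a
   check. It is only a sketch under assumptions: the section tags, the
   call_edge structure and all function names in the table are
   hypothetical, and a real tool would derive the edges from the
   object code instead of a hand written table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical section tags: regular text vs. instrumentation-free text. */
enum section { SEC_TEXT, SEC_NOINSTR };

/* One call from caller to callee, as a checker might record it. */
struct call_edge {
	const char *caller;
	enum section caller_sec;
	const char *callee;
	enum section callee_sec;
	bool annotated_safe;	/* call explicitly marked safe and intentional */
};

/*
 * Count calls which leave the protected section without an annotation.
 * Calls within the protected section and calls from regular text are fine.
 */
static size_t count_violations(const struct call_edge *edges, size_t n)
{
	size_t bad = 0;

	assert(edges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct call_edge *e = &edges[i];

		if (e->caller_sec == SEC_NOINSTR &&
		    e->callee_sec != SEC_NOINSTR &&
		    !e->annotated_safe)
			bad++;
	}
	return bad;
}

/* Made-up example call graph; none of these are real kernel functions. */
static const struct call_edge example_edges[] = {
	{ "entry_stub", SEC_NOINSTR, "entry_helper", SEC_NOINSTR, false },
	{ "entry_stub", SEC_NOINSTR, "trace_hook",   SEC_TEXT,    true  },
	{ "entry_stub", SEC_NOINSTR, "debug_print",  SEC_TEXT,    false },
	{ "normal_fn",  SEC_TEXT,    "debug_print",  SEC_TEXT,    false },
};
```

   In this table only the unannotated call from entry_stub to
   debug_print is flagged; the annotated call to trace_hook passes.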

Thoughts?

Thanks,

        tglx

Thread overview: 44+ messages
2020-03-09 17:02 Instrumentation and RCU Thomas Gleixner
2020-03-09 18:15 ` Steven Rostedt
2020-03-09 18:42   ` Joel Fernandes
2020-03-09 19:07     ` Steven Rostedt
2020-03-09 19:20       ` Mathieu Desnoyers
2020-03-16 15:02       ` Joel Fernandes
2020-03-09 18:59   ` Thomas Gleixner
2020-03-10  8:09     ` Masami Hiramatsu
2020-03-10 11:43       ` Thomas Gleixner
2020-03-10 15:31         ` Mathieu Desnoyers
2020-03-10 15:46           ` Steven Rostedt
2020-03-10 16:21             ` Mathieu Desnoyers
2020-03-11  0:18               ` Masami Hiramatsu
2020-03-11  0:37                 ` Mathieu Desnoyers
2020-03-11  7:48                   ` Masami Hiramatsu
2020-03-10 16:06         ` Masami Hiramatsu
2020-03-12 13:53         ` Peter Zijlstra
2020-03-10 15:24       ` Mathieu Desnoyers
2020-03-10 17:05       ` Daniel Thompson
2020-03-09 18:37 ` Mathieu Desnoyers
2020-03-09 18:44   ` Steven Rostedt
2020-03-09 18:52     ` Mathieu Desnoyers
2020-03-09 19:09       ` Steven Rostedt
2020-03-09 19:25         ` Mathieu Desnoyers
2020-03-09 19:52   ` Thomas Gleixner
2020-03-10 15:03     ` Mathieu Desnoyers
2020-03-10 16:48       ` Thomas Gleixner
2020-03-10 17:40         ` Mathieu Desnoyers
2020-03-10 18:31           ` Thomas Gleixner
2020-03-10 18:37             ` Mathieu Desnoyers
2020-03-10  1:40   ` Alexei Starovoitov
2020-03-10  8:02     ` Thomas Gleixner
2020-03-10 16:54     ` Paul E. McKenney
2020-03-17 17:56     ` Joel Fernandes
2020-03-09 20:18 ` Peter Zijlstra
2020-03-09 20:47 ` Paul E. McKenney
2020-03-09 20:58   ` Steven Rostedt
2020-03-09 21:25     ` Paul E. McKenney
2020-03-09 23:52   ` Frederic Weisbecker
2020-03-10  2:26     ` Paul E. McKenney
2020-03-10 15:13   ` Mathieu Desnoyers
2020-03-10 16:49     ` Paul E. McKenney
2020-03-10 17:22       ` Mathieu Desnoyers
2020-03-10 17:26         ` Paul E. McKenney
