linux-kernel.vger.kernel.org archive mirror
From: Joel Fernandes <joel@joelfernandes.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Brian Gerst <brgerst@gmail.com>, Juergen Gross <JGross@suse.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>
Subject: Re: [patch 4/8] x86/entry: Move irq tracing on syscall entry to C-code
Date: Sun, 1 Mar 2020 21:36:54 -0500
Message-ID: <20200302023654.GA211042@google.com>
In-Reply-To: <DC74BDD5-C71D-4083-A13C-BA066C8C56F8@amacapital.net>

On Sun, Mar 01, 2020 at 06:18:51PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Mar 1, 2020, at 5:10 PM, Joel Fernandes <joel@joelfernandes.org> wrote:
> > 
> > On Sun, Mar 01, 2020 at 10:54:23AM -0800, Andy Lutomirski wrote:
> >>> On Sun, Mar 1, 2020 at 10:26 AM Paul E. McKenney <paulmck@kernel.org> wrote:
> >>> 
> >>> On Sun, Mar 01, 2020 at 07:12:25PM +0100, Thomas Gleixner wrote:
> >>>> Andy Lutomirski <luto@kernel.org> writes:
> >>>>> On Sun, Mar 1, 2020 at 7:21 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>>>> Andy Lutomirski <luto@amacapital.net> writes:
> >>>>>>>> On Mar 1, 2020, at 2:16 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>>>>>> Ok, but for the time being anything before/after CONTEXT_KERNEL is unsafe
> >>>>>>>> except trace_hardirqs_off/on(), as those trace functions do not allow
> >>>>>>>> anything to be attached AFAICT.
> >>>>>>> 
> >>>>>>> Can you point to whatever makes those particular functions special?  I
> >>>>>>> failed to follow the macro maze.
> >>>>>> 
> >>>>>> Those are not tracepoints and not going through the macro maze. See
> >>>>>> kernel/trace/trace_preemptirq.c
> >>>>> 
> >>>>> That has:
> >>>>> 
> >>>>> void trace_hardirqs_on(void)
> >>>>> {
> >>>>>        if (this_cpu_read(tracing_irq_cpu)) {
> >>>>>                if (!in_nmi())
> >>>>>                        trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> >>>>>                tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
> >>>>>                this_cpu_write(tracing_irq_cpu, 0);
> >>>>>        }
> >>>>> 
> >>>>>        lockdep_hardirqs_on(CALLER_ADDR0);
> >>>>> }
> >>>>> EXPORT_SYMBOL(trace_hardirqs_on);
> >>>>> NOKPROBE_SYMBOL(trace_hardirqs_on);
> >>>>> 
> >>>>> But this calls trace_irq_enable_rcuidle(), and that's the part of the
> >>>>> macro maze I got lost in.  I found:
> >>>>> 
> >>>>> #ifdef CONFIG_TRACE_IRQFLAGS
> >>>>> DEFINE_EVENT(preemptirq_template, irq_disable,
> >>>>>             TP_PROTO(unsigned long ip, unsigned long parent_ip),
> >>>>>             TP_ARGS(ip, parent_ip));
> >>>>> 
> >>>>> DEFINE_EVENT(preemptirq_template, irq_enable,
> >>>>>             TP_PROTO(unsigned long ip, unsigned long parent_ip),
> >>>>>             TP_ARGS(ip, parent_ip));
> >>>>> #else
> >>>>> #define trace_irq_enable(...)
> >>>>> #define trace_irq_disable(...)
> >>>>> #define trace_irq_enable_rcuidle(...)
> >>>>> #define trace_irq_disable_rcuidle(...)
> >>>>> #endif
> >>>>> 
> >>>>> But the DEFINE_EVENT doesn't have the "_rcuidle" part.  And that's
> >>>>> where I got lost in the macro maze.  I looked at the gcc asm output,
> >>>>> and there is, indeed:
> >>>> 
> >>>> DEFINE_EVENT
> >>>>  DECLARE_TRACE
> >>>>    __DECLARE_TRACE
> >>>>       __DECLARE_TRACE_RCU
> >>>>         static inline void trace_##name##_rcuidle(proto)
> >>>>            __DO_TRACE
> >>>>               if (rcuidle)
> >>>>                  ....
> >>>> 
> >>>>> But I also don't see why this is any different from any other tracepoint.
> >>>> 
> >>>> Indeed. I took a wrong turn at some point in the macro jungle :)
> >>>> 
> >>>> So tracing itself is fine, but then if you have probes or bpf programs
> >>>> attached to a tracepoint these use rcu_read_lock()/unlock() which is
> >>>> obviously wrong in rcuidle context.
> >>> 
> >>> Definitely, any such code needs to use tricks similar to that of the
> >>> tracing code.  Or instead use something like SRCU, which is OK with
> >>> readers from idle.  Or use something like Steve Rostedt's workqueue-based
> >>> approach, though please be very careful with this latter, lest the
> >>> battery-powered embedded guys come after you for waking up idle CPUs
> >>> too often.  ;-)
> >>> 
> >> 
> >> Are we okay if we somehow ensure that all the entry code before
> >> enter_from_user_mode() only does rcuidle tracing variants and has
> >> kprobes off?  Including for BPF use cases?
> >> 
> >> It would be *really* nice if we could statically verify this, as has
> >> been mentioned elsewhere in the thread.  It would also probably be
> >> good enough if we could do it at runtime.  Maybe with lockdep on, we
> >> verify rcu state in tracepoints even if the tracepoint isn't active?
> >> And we could plausibly have some widget that could inject something
> >> into *every* kprobeable function to check rcu state.
> > 
> > You are talking about verifying that a non-rcuidle tracepoint is not called
> > into when RCU is not watching, right? I think that's fine, though I feel
> > lockdep kernels should not be slowed down any more than they already are. I
> > feel that if we keep adding checks to lockdep-enabled kernels over time, they
> > become too slow even for "debug" kernels. Maybe it is time for a
> > CONFIG_LOCKDEP_SLOW or some such? Then anyone who wants to go crazy on
> > runtime checking can do so. I myself want to add a few.
> > 
> > Note that the checking is being added to "non-rcuidle" tracepoints, many of
> > which are probably always called when RCU is watching. That makes such
> > checking useless for those tracepoints (while still slowing them down,
> > however slightly).
> > 
> 
> Indeed. Static analysis would help a lot here.
> 
> > Another note: the whole reason we are getting rid of the "make RCU watch
> > when rcuidle" logic in __DO_TRACE is that it is slow for tracepoints that
> > are called frequently. Another reason is that tracepoint callbacks are
> > expected to know what they are doing and turn on RCU watching as
> > appropriate (as the consensus on the matter suggests).
> 
> Whoa there. We arch people need crystal clear rules as to what tracepoints
> can be called in what contexts and what is the responsibility of the
> callbacks.
> 
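To make the rule concrete, here is a simplified userspace model in C of the
two tracepoint flavours, roughly as the __DO_TRACE() expansion quoted above
behaves at this point: the _rcuidle variant makes RCU watch around the probe
call, while a plain tracepoint assumes RCU is already watching. It is a
sketch, not kernel code. rcu_watching, model_do_trace(), rcu_using_probe()
and the counters are made-up stand-ins; the real macro also handles
preemption and an SRCU read-side section, which the model leaves out. The
debug_checks branch is only a hypothetical version of Andy's lockdep-time
check, not something the kernel has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for "RCU is watching this CPU". */
static bool rcu_watching = true;
/* Probe invocations that ran while RCU was NOT watching (bugs). */
static int unsafe_probe_calls;
/* Warnings from the hypothetical debug-only check on plain tracepoints. */
static int debug_warnings;

typedef void (*probe_fn)(void);

/* A probe that relies on RCU internally, like perf or a BPF program. */
static void rcu_using_probe(void)
{
	if (!rcu_watching)
		unsafe_probe_calls++;
}

/*
 * Rough model of __DO_TRACE(): the _rcuidle variant makes RCU watch
 * around the probe call; the plain variant assumes RCU already is.
 * With debug_checks set (hypothetical, not existing kernel code), a
 * plain tracepoint hit while RCU is not watching is flagged even when
 * no probe is attached.
 */
static void model_do_trace(probe_fn probe, bool rcuidle, bool debug_checks)
{
	bool entered = false;

	if (!rcuidle && debug_checks && !rcu_watching)
		debug_warnings++;

	if (probe == NULL)	/* tracepoint not enabled */
		return;

	if (rcuidle && !rcu_watching) {
		rcu_watching = true;	/* stands in for rcu_irq_enter_irqson() */
		entered = true;
	}

	probe();

	if (entered)
		rcu_watching = false;	/* stands in for rcu_irq_exit_irqson() */
}

/* Entry code before enter_from_user_mode(): RCU is not watching yet. */
static void demo_entry_path(void)
{
	rcu_watching = false;

	model_do_trace(rcu_using_probe, true, true);	/* _rcuidle: safe */
	assert(unsafe_probe_calls == 0 && debug_warnings == 0);

	model_do_trace(rcu_using_probe, false, true);	/* plain: unsafe */
	assert(unsafe_probe_calls == 1 && debug_warnings == 1);

	model_do_trace(NULL, false, true);		/* disabled, still flagged */
	assert(debug_warnings == 2);

	assert(!rcu_watching);	/* the _rcuidle path restored the state */
}
```

Calling demo_entry_path() walks through the entry-code case: the _rcuidle
call is safe, the plain call runs an RCU user while RCU is not watching, and
the debug check fires even for a disabled tracepoint, which is the overhead
concern raised above for lockdep kernels.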

The direction that Peter, Mathieu and Steve are taking is that callbacks
registered on "rcu idle" tracepoints, such as perf, need to turn on "RCU
watching" themselves. Turning on "RCU watching" is not free, as I found when
testing in 2017. We removed it back then, but it was added right back when
perf started splatting. Now it is being removed again, and RCU watching is
turned on in the perf callback itself, since perf uses RCU.

If you are calling trace_.._rcuidle(), can you not ensure that RCU is
watching by calling the appropriate RCU APIs in your callbacks? Or did I miss
the point?

thanks,

 - Joel



Thread overview: 56+ messages
2020-02-25 22:08 [patch 0/8] x86/entry: Consolidation - Part II Thomas Gleixner
2020-02-25 22:08 ` [patch 1/8] x86/entry/64: Trace irqflags unconditionally on when returning to user space Thomas Gleixner
2020-02-27 19:49   ` Borislav Petkov
2020-02-27 22:45   ` Frederic Weisbecker
2020-02-28  8:58   ` Alexandre Chartre
2020-02-25 22:08 ` [patch 2/8] x86/entry/common: Consolidate syscall entry code Thomas Gleixner
2020-02-27 19:57   ` Borislav Petkov
2020-02-27 22:52   ` Frederic Weisbecker
2020-02-28  8:59   ` Alexandre Chartre
2020-02-25 22:08 ` [patch 3/8] x86/entry/common: Mark syscall entry points notrace/nokprobe Thomas Gleixner
2020-02-27 23:15   ` Frederic Weisbecker
2020-02-28  8:59   ` Alexandre Chartre
2020-02-25 22:08 ` [patch 4/8] x86/entry: Move irq tracing on syscall entry to C-code Thomas Gleixner
2020-02-26  5:43   ` Andy Lutomirski
2020-02-26  8:17     ` Peter Zijlstra
2020-02-26 11:20       ` Andy Lutomirski
2020-02-26 19:51         ` Thomas Gleixner
2020-02-29 14:44           ` Thomas Gleixner
2020-02-29 19:25             ` Andy Lutomirski
2020-02-29 23:58               ` Steven Rostedt
2020-03-01 10:16                 ` Thomas Gleixner
2020-03-01 14:37                   ` Andy Lutomirski
2020-03-01 15:21                     ` Thomas Gleixner
2020-03-01 16:00                       ` Andy Lutomirski
2020-03-01 18:12                         ` Thomas Gleixner
2020-03-01 18:26                           ` Paul E. McKenney
2020-03-01 18:54                             ` Andy Lutomirski
2020-03-01 19:30                               ` Paul E. McKenney
2020-03-01 19:39                                 ` Andy Lutomirski
2020-03-01 20:18                                   ` Paul E. McKenney
2020-03-02  0:35                                   ` Steven Rostedt
2020-03-02  6:47                                     ` Masami Hiramatsu
2020-03-02  1:10                               ` Joel Fernandes
2020-03-02  2:18                                 ` Andy Lutomirski
2020-03-02  2:36                                   ` Joel Fernandes [this message]
2020-03-02  5:40                                     ` Andy Lutomirski
2020-03-02  8:10                               ` Thomas Gleixner
2020-03-01 18:23                         ` Steven Rostedt
2020-03-01 18:20                       ` Steven Rostedt
2020-02-27 23:11   ` Frederic Weisbecker
2020-02-28  9:00   ` Alexandre Chartre
2020-02-25 22:08 ` [patch 5/8] x86/entry/common: Provide trace/kprobe safe exit to user space functions Thomas Gleixner
2020-02-26  5:45   ` Andy Lutomirski
2020-02-26  8:15     ` Peter Zijlstra
2020-02-27 15:43   ` Alexandre Chartre
2020-02-27 15:53     ` Thomas Gleixner
2020-02-25 22:08 ` [patch 6/8] x86/entry: Move irq tracing to syscall_slow_exit_work Thomas Gleixner
2020-02-26  5:47   ` Andy Lutomirski
2020-02-27 16:12   ` Alexandre Chartre
2020-02-25 22:08 ` [patch 7/8] x86/entry: Move irq tracing to prepare_exit_to_user_mode() Thomas Gleixner
2020-02-26  5:50   ` Andy Lutomirski
2020-02-26 19:53     ` Thomas Gleixner
2020-02-26 20:07       ` Andy Lutomirski
2020-02-25 22:08 ` [patch 8/8] x86/entry: Move irqflags tracing to do_int80_syscall_32() Thomas Gleixner
2020-02-27 16:46   ` Alexandre Chartre
2020-02-28 13:49     ` Thomas Gleixner
