From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>, paulmck <paulmck@kernel.org>,
	"Joel Fernandes, Google" <joel@joelfernandes.org>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: Instrumentation and RCU
Date: Tue, 10 Mar 2020 11:03:15 -0400 (EDT)
Message-ID: <1489283504.23399.1583852595008.JavaMail.zimbra@efficios.com>
In-Reply-To: <871rq171ca.fsf@nanos.tec.linutronix.de>

----- On Mar 9, 2020, at 3:52 PM, Thomas Gleixner tglx@linutronix.de wrote:

> Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
>> ----- On Mar 9, 2020, at 1:02 PM, Thomas Gleixner tglx@linutronix.de wrote:
[...]
> 
>>> #3) RCU idle
>>>    are really more than questionable. For 99.9999% of instrumentation
>>>    users it's absolutely irrelevant whether this traces the interrupt
>>>    disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.
>>> 
>>>    But what's relevant is the tracer overhead which is e.g. inflicted
>>>    with todays trace_hardirqs_off/on() implementation because that
>>>    unconditionally uses the rcuidle variant with the srcu/rcu_irq dance
>>>    around every tracepoint.
>>
>> I think one of the big issues here is that most of the uses of
>> trace_hardirqs_off() are from sites which already have RCU watching,
>> so we are doing heavy-weight operations for nothing.
> 
> That and in some places in the entry code we do the heavy weight
> operations to cover 100 instructions which turn on RCU anyway. That does
> not make any sense at all.
> 
>> I strongly suspect that most of the overhead we've been trying to avoid when
>> introducing use of SRCU in rcuidle tracepoints was actually caused by callsites
>> which use rcuidle tracepoints while having RCU watching, just because there is a
>> handful of callsites which don't have RCU watching. This is confirmed
>> by the commit message of commit e6753f23d9 "tracepoint: Make rcuidle
>> tracepoint callers use SRCU":
>>
>>    "In recent tests with IRQ on/off tracepoints, a large performance
>>     overhead ~10% is noticed when running hackbench. This is root caused to
>>     calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
>>     tracepoint code. Following a long discussion on the list [1] about this,
>>     we concluded that srcu is a better alternative for use during rcu idle.
>>     Although it does involve extra barriers, its lighter than the sched-rcu
>>     version which has to do additional RCU calls to notify RCU idle about
>>     entry into RCU sections.
>> [...]
>>     Test: Tested idle and preempt/irq tracepoints."
> 
> In a quick test I did with an invalid syscall number under profiling,
> trace_hardirqs_off() is pretty prominent and goes down by roughly a
> factor of 2 when I move it past enter_from_user_mode() and use just the
> non-RCU-idle variant.

I think one issue here is that trace_hardirqs_off() is now shared between
lockdep and tracing. For lockdep, we have the following comment:

        /*
         * IRQ from user mode.
         *
         * We need to tell lockdep that IRQs are off.  We can't do this until
         * we fix gsbase, and we should do it before enter_from_user_mode
         * (which can take locks).  Since TRACE_IRQS_OFF is idempotent,
         * the simplest way to handle it is to just call it twice if
         * we enter from user mode.  There's no reason to optimize this since
         * TRACE_IRQS_OFF is a no-op if lockdep is off.
         */
        TRACE_IRQS_OFF

        CALL_enter_from_user_mode

1:
        ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
        /* We entered an interrupt context - irqs are off: */
        TRACE_IRQS_OFF

which seems to imply that lockdep requires TRACE_IRQS_OFF to be performed
_before_ entering from user mode. I don't expect this to be useful at all for
other tracers though. I think this should be replaced by a new macro, e.g.
LOCKDEP_ENTER_FROM_USER_MODE or such, which would call into lockdep without
invoking the other tracers.
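
Roughly what I have in mind, as a sketch (LOCKDEP_ENTER_FROM_USER_MODE and the
helper name below are made up for the example; lockdep_hardirqs_off() is the
existing lockdep entry point, and the usual config guards are omitted):

/*
 * Sketch only: tell lockdep that IRQs are off without going through the
 * tracing variant of trace_hardirqs_off(), so that no tracepoint fires
 * before RCU is watching.  The names are invented here and the config
 * guards are left out for brevity.
 */
static __always_inline void lockdep_enter_from_user_mode(void)
{
	lockdep_hardirqs_off(CALLER_ADDR0);
}

The entry asm would then invoke a LOCKDEP_ENTER_FROM_USER_MODE macro expanding
to a call to this helper where it currently issues the first TRACE_IRQS_OFF.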

> 
>> So I think we could go back to plain RCU for rcuidle tracepoints if we do
>> the cheaper "rcu_is_watching()" check rather than invoking
>> rcu_irq_{enter,exit}_irqson() unconditionally.
> 
> Haven't tried that yet for the overall usage of trace_hardirqs_off(),
> but yes, it's going to be a measurable difference.

Ideally we should do both: turn the TRACE_IRQS_OFF issued prior to entering from
user mode into a lockdep-specific call, which would remove the frequent case that
requires temporarily enabling RCU, *and* change rcuidle tracepoints to check
rcu_is_watching(), only temporarily enabling RCU when it is not watching, and
otherwise use standard RCU read-side protection.
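
Concretely, on the tracepoint side, the rcuidle path could look roughly like
this (an untested sketch of the idea, not a drop-in patch against
include/linux/tracepoint.h):

	bool exit_rcu = false;

	/*
	 * Cheap check: in the common case where the caller already has
	 * RCU watching, skip the expensive rcu_irq_enter_irqson() /
	 * rcu_irq_exit_irqson() pair entirely.
	 */
	if (!rcu_is_watching()) {
		rcu_irq_enter_irqson();
		exit_rcu = true;
	}

	rcu_read_lock_sched_notrace();
	/* ... iterate over and invoke the tracepoint probes under plain RCU ... */
	rcu_read_unlock_sched_notrace();

	if (exit_rcu)
		rcu_irq_exit_irqson();

This keeps the rcu_irq dance only for the handful of callsites which really do
not have RCU watching, and reduces it to a single rcu_is_watching() check
everywhere else.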

> 
>>>    Even if the tracepoint sits in the ASM code it just covers about ~20
>>>    low level ASM instructions more. The tracer invocation, which is
>>>    even done twice when coming from user space on x86 (the second call
>>>    is optimized in the tracer C-code), costs definitely way more
>>>    cycles. When you take the srcu/rcu_irq dance into account it's a
>>>    complete disaster performance wise.
>>
>> Part of the issue here is the current overhead of the SRCU read-side lock,
>> which contains memory barriers. The other part of the issue is the fact that
>> rcu_irq_{enter,exit}_irqson() contains an atomic_add_return instruction.
>>
>> We could use the approach proposed by Peterz's and Steven's patches to basically
>> do a lightweight "rcu_is_watching()" check for rcuidle tracepoints, and only
>> enable RCU for those cases. We could then simply go back to using regular RCU
>> like so:
> 
> Right, but that still does the whole rcu_irq dance especially in the
> entry code just to trace 50 or 100 instructions which are turning on RCU
> anyway.

Agreed. Would changing this to a lockdep-specific call, as I suggest above,
solve this?

> 
>>> #4 Protecting call chains
[...]
> 
>> In addition to splitting tracing code into a separate section, which I think
>> makes sense, I can think of another alternative way to provide call chains
>> protection: adding a "in_tracing" flag somewhere alongside each kernel stack.
>> Each thread and interrupt stack would have its own flag. However, trap handlers
>> should share the "in_tracing" flag with the context which triggers the trap.
>>
>> If a tracer recurses, or if a tracer attempts to trace another tracer, the
>> instrumentation would break the recursion chain by preventing instrumentation
>> from firing. If we end up caring about tracers tracing other tracers, we could
>> have one distinct flag per tracer and let each tracer break the recursion chain.
>>
>> Having this flag per kernel stack rather than per CPU or per thread would
>> allow tracing of nested interrupt handlers (and NMIs), but would break
>> call chains both within the same stack or going through a trap. I think
>> it could be a nice complementary safety net to handle mishaps in a non-fatal
>> way.
> 
> That works as long as none of this uses breakpoint based patching to
> dynamically disable/enable stuff.

I'm clearly missing something here. I was expecting the "in_tracing" flag trick
to be able to fix the breakpoint recursion issue. What is the problem I'm
missing?

An additional point about splitting core code into separate non-instrumentable
sections: the part I'm concerned about is instrumentation of trap handlers.
_That_ instrumentation will likely need to remain shared between the kernel and
the tracers. This is where I think the "in_tracing" flag approach would help us
solve the recursion issue without losing that important coverage.
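
To illustrate what I mean by the per-stack flag, here is a very rough sketch
(all names below are invented for the example; the flag would live alongside
each thread, interrupt and NMI stack, and trap handlers would share the flag
of the context which triggered the trap):

static inline bool tracer_recursion_enter(void)
{
	/*
	 * this_stack_in_tracing() is a hypothetical accessor returning a
	 * pointer to the flag stored alongside the currently active
	 * kernel stack.
	 */
	bool *in_tracing = this_stack_in_tracing();

	if (*in_tracing)
		return false;	/* nested tracer: break the call chain */
	*in_tracing = true;
	return true;
}

static inline void tracer_recursion_exit(void)
{
	*this_stack_in_tracing() = false;
}

A tracer callback (including one invoked from a trap handler) would bracket its
work with tracer_recursion_enter()/tracer_recursion_exit() and simply drop the
event, non-fatally, when the former returns false.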

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
