From: Joel Fernandes <joel@joelfernandes.org> To: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>, Thomas Gleixner <tglx@linutronix.de>, linux-kernel <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>, rostedt <rostedt@goodmis.org>, Masami Hiramatsu <mhiramat@kernel.org>, Alexei Starovoitov <ast@kernel.org>, paulmck <paulmck@kernel.org>, Frederic Weisbecker <frederic@kernel.org>, bpf@vger.kernel.org Subject: Re: Instrumentation and RCU Date: Tue, 17 Mar 2020 13:56:14 -0400 Message-ID: <20200317175614.GA13090@google.com> (raw) In-Reply-To: <20200310014043.4dbagqbr2wsbuarm@ast-mbp> On Mon, Mar 09, 2020 at 06:40:45PM -0700, Alexei Starovoitov wrote: > On Mon, Mar 09, 2020 at 02:37:40PM -0400, Mathieu Desnoyers wrote: > > > > > > But what's relevant is the tracer overhead which is e.g. inflicted > > > with todays trace_hardirqs_off/on() implementation because that > > > unconditionally uses the rcuidle variant with the scru/rcu_irq dance > > > around every tracepoint. > > > > I think one of the big issues here is that most of the uses of > > trace_hardirqs_off() are from sites which already have RCU watching, > > so we are doing heavy-weight operations for nothing. > > I think kernel/trace/trace_preemptirq.c created too many problems for the > kernel without providing tangible benefits. My understanding no one is using it > in production. Hi Alexei, There are various people use the preempt/irq disable tracepoints for last 2 years at Google and ARM. There's also a BPF tool (in BCC) that uses those for tracing critical sections. Also Daniel Bristot's entire Preempt-IRQ formal verification stuff depends on it. > It's a tool to understand how kernel works. And such debugging > tool can and should be removed. If we go by that line of reasoning, then function tracing also should be removed from the kernel. I am glad Thomas and Peter are working on it and looking forward to seeing the patches, thanks, - Joel > One of Thomas's patches mentioned that bpf can be invoked from hardirq and > preempt tracers. This connection doesn't exist in a direct way, but > theoretically it's possible. There is no practical use though and I would be > happy to blacklist such bpf usage at a minimum. > > > We could use the approach proposed by Peterz's and Steven's patches to basically > > do a lightweight "is_rcu_watching()" check for rcuidle tracepoint, and only enable > > RCU for those cases. We could then simply go back on using regular RCU like so: > > > > #define __DO_TRACE(tp, proto, args, cond, rcuidle) \ > > do { \ > > struct tracepoint_func *it_func_ptr; \ > > void *it_func; \ > > void *__data; \ > > bool exit_rcu = false; \ > > \ > > if (!(cond)) \ > > return; \ > > \ > > if (rcuidle && !rcu_is_watching()) { \ > > rcu_irq_enter_irqson(); \ > > exit_rcu = true; \ > > } \ > > preempt_disable_notrace(); \ > > it_func_ptr = rcu_dereference_raw((tp)->funcs); \ > > if (it_func_ptr) { \ > > do { \ > > it_func = (it_func_ptr)->func; \ > > __data = (it_func_ptr)->data; \ > > ((void(*)(proto))(it_func))(args); \ > > } while ((++it_func_ptr)->func); \ > > } \ > > preempt_enable_notrace(); \ > > if (exit_rcu) \ > > rcu_irq_exit_irqson(); \ > > } while (0) > > I think it's a fine approach interim. > > Long term sounds like Paul is going to provide sleepable and low overhead > rcu_read_lock_for_tracers() that will include bpf. > My understanding that this new rcu flavor won't have "idle" issues, > so rcu_is_watching() checks will not be necessary. > And if we remove trace_preemptirq.c the only thing left will be Thomas's points > 1 (low level entry) and 2 (breakpoints) that can be addressed without > creating fancy .text annotations and teach objtool about it. > > In the mean time I've benchmarked srcu for sleepable bpf and it's quite heavy. > srcu_read_lock+unlock roughly adds 10x execution cost to trivial bpf prog. > I'm proceeding with it anyway, but really hoping that > rcu_read_lock_for_tracers() will materialize soon. > > In general I'm sceptical that .text annotations will work. Let's say all of > idle is a red zone. But a ton of normal functions are called when idle. So > objtool will go and mark them as red zone too. This way large percent of the > kernel will be off limits for tracers. Which is imo not a good trade off. I > think addressing 1 and 2 with explicit notrace/nokprobe annotations will cover > all practical cases where people can shot themselves in a foot with a tracer. I > realize that there will be forever whack-a-mole game and these annotations will > never reach 100%. I think it's a fine trade off. Security is never 100% either. > Tracing is never going to be 100% safe too.
next prev parent reply index Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-03-09 17:02 Thomas Gleixner 2020-03-09 18:15 ` Steven Rostedt 2020-03-09 18:42 ` Joel Fernandes 2020-03-09 19:07 ` Steven Rostedt 2020-03-09 19:20 ` Mathieu Desnoyers 2020-03-16 15:02 ` Joel Fernandes 2020-03-09 18:59 ` Thomas Gleixner 2020-03-10 8:09 ` Masami Hiramatsu 2020-03-10 11:43 ` Thomas Gleixner 2020-03-10 15:31 ` Mathieu Desnoyers 2020-03-10 15:46 ` Steven Rostedt 2020-03-10 16:21 ` Mathieu Desnoyers 2020-03-11 0:18 ` Masami Hiramatsu 2020-03-11 0:37 ` Mathieu Desnoyers 2020-03-11 7:48 ` Masami Hiramatsu 2020-03-10 16:06 ` Masami Hiramatsu 2020-03-12 13:53 ` Peter Zijlstra 2020-03-10 15:24 ` Mathieu Desnoyers 2020-03-10 17:05 ` Daniel Thompson 2020-03-09 18:37 ` Mathieu Desnoyers 2020-03-09 18:44 ` Steven Rostedt 2020-03-09 18:52 ` Mathieu Desnoyers 2020-03-09 19:09 ` Steven Rostedt 2020-03-09 19:25 ` Mathieu Desnoyers 2020-03-09 19:52 ` Thomas Gleixner 2020-03-10 15:03 ` Mathieu Desnoyers 2020-03-10 16:48 ` Thomas Gleixner 2020-03-10 17:40 ` Mathieu Desnoyers 2020-03-10 18:31 ` Thomas Gleixner 2020-03-10 18:37 ` Mathieu Desnoyers 2020-03-10 1:40 ` Alexei Starovoitov 2020-03-10 8:02 ` Thomas Gleixner 2020-03-10 16:54 ` Paul E. McKenney 2020-03-17 17:56 ` Joel Fernandes [this message] 2020-03-09 20:18 ` Peter Zijlstra 2020-03-09 20:47 ` Paul E. McKenney 2020-03-09 20:58 ` Steven Rostedt 2020-03-09 21:25 ` Paul E. McKenney 2020-03-09 23:52 ` Frederic Weisbecker 2020-03-10 2:26 ` Paul E. McKenney 2020-03-10 15:13 ` Mathieu Desnoyers 2020-03-10 16:49 ` Paul E. McKenney 2020-03-10 17:22 ` Mathieu Desnoyers 2020-03-10 17:26 ` Paul E. McKenney
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200317175614.GA13090@google.com \ --to=joel@joelfernandes.org \ --cc=alexei.starovoitov@gmail.com \ --cc=ast@kernel.org \ --cc=bpf@vger.kernel.org \ --cc=frederic@kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=mathieu.desnoyers@efficios.com \ --cc=mhiramat@kernel.org \ --cc=paulmck@kernel.org \ --cc=peterz@infradead.org \ --cc=rostedt@goodmis.org \ --cc=tglx@linutronix.de \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
LKML Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \ linux-kernel@vger.kernel.org public-inbox-index lkml Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel AGPL code for this site: git clone https://public-inbox.org/public-inbox.git