linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Brendan Gregg <brendan.d.gregg@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann <daniel@iogearbox.net>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	joel@joelfernandes.org,
	"Naveen N . Rao" <naveen.n.rao@linux.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Subject: Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
Date: Thu, 23 Jan 2020 02:03:18 -0800	[thread overview]
Message-ID: <20200123100318.GP2935@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20200123151328.f977525ea447da3b7fe4256d@kernel.org>

On Thu, Jan 23, 2020 at 03:13:28PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 18:26:47 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Anyway, without this update, I added a printk to count optimizer
> > > queue-length and found that all optimizer call with a single kprobe
> > > on the quenes. I think this depends on the machine, but as far as
> > > I tested on 8-threads qemu x86, shows this result...
> > > 
> > > Probes: 256 kprobe_events
> > > Enable events
> > > real	0m 0.02s
> > > user	0m 0.00s
> > > sys	0m 0.02s
> > > [   17.730548] Queue-update: 180, skipped, Total Queued: 180
> > > [   17.739445] Queue-update: 1, go, Total Queued: 180
> > > Disable events
> [...]
> > > [   41.135594] Queue-update: 1, go, Total Queued: 1
> > > real	0m 21.40s
> > > user	0m 0.00s
> > > sys	0m 0.04s
> > 
> > So 21.4s/256 = 84 milliseconds per event disable, correct?
> 
> Actually, it seems only 172 probes are on the unoptimized list, so
> the number will be a bit different.
> 
> Anyway, that above elapsed time is including non-batch optimizer
> working time as below.
> 
> (1) start looping on probe events
>   (2) disabling-kprobe
>     (2.1) wait kprobe_mutex if optimizer is running
>     (2.2) if the kprobe is on optimized kprobe, queue it to unoptimizing
>           list and kick optimizer with 5 jiffies delay
>   (4) unlink enabled event
>   (5) wait synchronize_rcu()
>   (6) optimizer start optimization before finishing (5)
> (7) goto (1)
> 
> I think the disabling performance issue came from (6) (and (2.1)).
> Thus, if we change (2.2) to 1 HZ jiffies, the optimizer will start
> after some loops are done. (and the optimizer detects "active"
> queuing, postpone the process)
> 
> > 
> > It might be worth trying synchronize_rcu_expedited() just as an experiment
> > to see if it speeds things up significantly.
> 
> Would you mean replacing synchronize_rcu() in disabling loop, or
> replacing synchronize_rcu_tasks() in optimizer?

The former.  (There is no synchronize_rcu_tasks_expedited().)

> I think that is not a root cause of this behavior, since if we
> make the optimizer delay to 1 sec, it seems enough for making
> it a batch operation. See below, this is the result with patched
> kernel (1 HZ delay).
> 
> Probes: 256 kprobe_events
> Enable events
> real	0m 0.07s
> user	0m 0.00s
> sys	0m 0.07s
> [   19.191181] Queue-update: 180, skipped, Total Queued: 180
> Disable events
> [   20.214966] Queue-update: 1, go, Total Queued: 172
> [   21.302924] Queue-update: 86, skipped, Total Queued: 86
> real	0m 2.11s
> user	0m 0.00s
> sys	0m 0.03s
> [   22.327173] Queue-update: 87, skipped, Total Queued: 172
> [   23.350933] Queue-update: 1, go, Total Queued: 172
> Remove events
> real	0m 2.13s
> user	0m 0.02s
> sys	0m 0.02s
> 
> As you can see, the optimizer ran outside of the disabling loop.
> In that case, it is OK to synchronize RCU tasks in the optimizer
> because it just runs *once* per multiple probe events.

Even better!

> >From above result, 86 probes are disabled per 1 sec delay.
> Each probe disabling took 11-12 msec in average. So 
> (HZ / 10) can also be good. (But note that the optimizer
> will retry to run each time until the disabling loop is
> finished.)
> 
> BTW, testing kernel was build with HZ=1000, if HZ=250 or HZ=100,
> the result will be different in the current code. So I think
> we should use HZ-based delay instead of fixed number.

Agreed, the HZ-based delay seems worth a try.

							Thanx, Paul

  reply	other threads:[~2020-01-23 10:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
2020-01-22  0:29   ` Steven Rostedt
2020-01-22  7:23     ` Masami Hiramatsu
2020-01-22 12:11       ` Steven Rostedt
2020-01-22 13:12         ` Masami Hiramatsu
2020-01-22 16:54           ` Paul E. McKenney
2020-01-23  1:33             ` Masami Hiramatsu
2020-01-23  2:26               ` Paul E. McKenney
2020-01-23  6:13                 ` Masami Hiramatsu
2020-01-23 10:03                   ` Paul E. McKenney [this message]
2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
2020-01-27 15:02   ` kbuild test robot
2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
2020-01-28 15:02     ` Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200123100318.GP2935@paulmck-ThinkPad-P72 \
    --to=paulmck@kernel.org \
    --cc=acme@kernel.org \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brendan.d.gregg@gmail.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).