From: "Paul E. McKenney" <paulmck@kernel.org>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Brendan Gregg <brendan.d.gregg@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Ingo Molnar <mingo@kernel.org>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
Daniel Borkmann <daniel@iogearbox.net>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
"David S . Miller" <davem@davemloft.net>,
joel@joelfernandes.org,
"Naveen N . Rao" <naveen.n.rao@linux.ibm.com>,
Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Subject: Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
Date: Thu, 23 Jan 2020 02:03:18 -0800 [thread overview]
Message-ID: <20200123100318.GP2935@paulmck-ThinkPad-P72> (raw)
In-Reply-To: <20200123151328.f977525ea447da3b7fe4256d@kernel.org>
On Thu, Jan 23, 2020 at 03:13:28PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 18:26:47 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
>
> > > Anyway, without this update, I added a printk to count optimizer
> > > queue-length and found that all optimizer call with a single kprobe
> > > on the quenes. I think this depends on the machine, but as far as
> > > I tested on 8-threads qemu x86, shows this result...
> > >
> > > Probes: 256 kprobe_events
> > > Enable events
> > > real 0m 0.02s
> > > user 0m 0.00s
> > > sys 0m 0.02s
> > > [ 17.730548] Queue-update: 180, skipped, Total Queued: 180
> > > [ 17.739445] Queue-update: 1, go, Total Queued: 180
> > > Disable events
> [...]
> > > [ 41.135594] Queue-update: 1, go, Total Queued: 1
> > > real 0m 21.40s
> > > user 0m 0.00s
> > > sys 0m 0.04s
> >
> > So 21.4s/256 = 84 milliseconds per event disable, correct?
>
> Actually, it seems only 172 probes are on the unoptimized list, so
> the number will be a bit different.
>
> Anyway, that above elapsed time is including non-batch optimizer
> working time as below.
>
> (1) start looping on probe events
> (2) disabling-kprobe
> (2.1) wait kprobe_mutex if optimizer is running
> (2.2) if the kprobe is on optimized kprobe, queue it to unoptimizing
> list and kick optimizer with 5 jiffies delay
> (4) unlink enabled event
> (5) wait synchronize_rcu()
> (6) optimizer start optimization before finishing (5)
> (7) goto (1)
>
> I think the disabling performance issue came from (6) (and (2.1)).
> Thus, if we change (2.2) to 1 HZ jiffies, the optimizer will start
> after some loops are done. (and the optimizer detects "active"
> queuing, postpone the process)
>
> >
> > It might be worth trying synchronize_rcu_expedited() just as an experiment
> > to see if it speeds things up significantly.
>
> Would you mean replacing synchronize_rcu() in disabling loop, or
> replacing synchronize_rcu_tasks() in optimizer?
The former. (There is no synchronize_rcu_tasks_expedited().)
> I think that is not a root cause of this behavior, since if we
> make the optimizer delay to 1 sec, it seems enough for making
> it a batch operation. See below, this is the result with patched
> kernel (1 HZ delay).
>
> Probes: 256 kprobe_events
> Enable events
> real 0m 0.07s
> user 0m 0.00s
> sys 0m 0.07s
> [ 19.191181] Queue-update: 180, skipped, Total Queued: 180
> Disable events
> [ 20.214966] Queue-update: 1, go, Total Queued: 172
> [ 21.302924] Queue-update: 86, skipped, Total Queued: 86
> real 0m 2.11s
> user 0m 0.00s
> sys 0m 0.03s
> [ 22.327173] Queue-update: 87, skipped, Total Queued: 172
> [ 23.350933] Queue-update: 1, go, Total Queued: 172
> Remove events
> real 0m 2.13s
> user 0m 0.02s
> sys 0m 0.02s
>
> As you can see, the optimizer ran outside of the disabling loop.
> In that case, it is OK to synchronize RCU tasks in the optimizer
> because it just runs *once* per multiple probe events.
Even better!
> >From above result, 86 probes are disabled per 1 sec delay.
> Each probe disabling took 11-12 msec in average. So
> (HZ / 10) can also be good. (But note that the optimizer
> will retry to run each time until the disabling loop is
> finished.)
>
> BTW, testing kernel was build with HZ=1000, if HZ=250 or HZ=100,
> the result will be different in the current code. So I think
> we should use HZ-based delay instead of fixed number.
Agreed, the HZ-based delay seems worth a try.
Thanx, Paul
next prev parent reply other threads:[~2020-01-23 10:03 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
2020-01-22 0:29 ` Steven Rostedt
2020-01-22 7:23 ` Masami Hiramatsu
2020-01-22 12:11 ` Steven Rostedt
2020-01-22 13:12 ` Masami Hiramatsu
2020-01-22 16:54 ` Paul E. McKenney
2020-01-23 1:33 ` Masami Hiramatsu
2020-01-23 2:26 ` Paul E. McKenney
2020-01-23 6:13 ` Masami Hiramatsu
2020-01-23 10:03 ` Paul E. McKenney [this message]
2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
2020-01-27 15:02 ` kbuild test robot
2020-01-27 15:02 ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
2020-01-28 15:02 ` Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200123100318.GP2935@paulmck-ThinkPad-P72 \
--to=paulmck@kernel.org \
--cc=acme@kernel.org \
--cc=anil.s.keshavamurthy@intel.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brendan.d.gregg@gmail.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@kernel.org \
--cc=naveen.n.rao@linux.ibm.com \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).