* [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC
To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

Hello,

Here is an experimental series of patches for asynchronous kprobe event
disabling and unregistration. This fixes the performance issue (from the
operator's viewpoint) of disabling/removing kprobe events. This series is
still in the testing phase, so please try it out, especially with eBPF.

You can download this series from here:

  https://github.com/mhiramat/linux.git ftrace-nowait

Background
==========

At the last Plumbers conference, Brendan showed me a demo in which a BPF
tracer took too long (more than 30 seconds?) to exit from tracing. I had
seen a similar issue with systemtap 10 years ago and had fixed it with an
array-based batch API (see commit 9861668f7478 ("kprobes: add
(un)register_kprobes for batch registration")). I suspected there was a
similar issue here.

Investigation
=============

While checking the current ftrace-side performance with 256 kprobes
(without ftrace-based kprobes)[1], I found another issue.

----
Probes: 256 kprobe_events
Enable events
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
Disable events
real	0m 21.40s
user	0m 0.00s
sys	0m 0.02s
Remove events
real	0m 2.24s
user	0m 0.00s
sys	0m 0.01s
----

Removing events took more than 2 seconds for 256 probe events, but
disabling events took 21 seconds, roughly 10 times longer than removing
them. Moreover, since perf-events (the basis of the BPF tracer) disables
and removes events in a single operation, it will take even longer than
that.
I also measured it without kprobe jump optimization
(echo 0 > /proc/sys/debug/kprobe-optimization), which changed the results
as below.

----
Probes: 256 kprobe_events
Enable events
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
Disable events
real	0m 2.07s
user	0m 0.00s
sys	0m 0.04s
Remove events
real	0m 2.13s
user	0m 0.00s
sys	0m 0.01s
----

I checked the kprobes and tracing code to see what happened, and found
mutex contention while waiting for unoptimization. Disabling an optimized
kprobe is done without any RCU grace period at the kprobes API level, but
it kicks the unoptimizer, which runs afterwards from a delayed workqueue.
On the other hand, since ftrace needs to remove the disabled probe from an
event (which is on an RCU-protected list), it waits for an RCU grace
period before freeing the event.

Thus, if we disable multiple probes at once, it repeats (1) disable the
kprobe, (2) kick the (un)optimizer, (3) remove the event from the RCU
list, and (4) wait for an RCU grace period. If the (un)optimizer starts
during (4), the next iteration's step (1) stalls because kprobe_mutex is
already held by the (un)optimizer. For kprobe unoptimization, the
(un)optimizer takes the global kprobe_mutex and waits for some task to be
scheduled on all CPUs for safety, which takes a while. So repeating the
above process makes disabling take a long time (in fact, it took 21
seconds).

The root cause is that the optimizer's delay period is too short for
kprobes: at the moment it waits only 5 jiffies, but it should be
comfortably longer than the batch disabling process.

Also, since such a batch disabling process involves many RCU grace-period
waits, it takes a long time from the operator's viewpoint. For example, in
the experiment above, disabling 256 probes still took around 2 seconds.
These waits can be replaced by call_rcu() and workqueues.

Improvements
============

This series is an attempt to solve these issues by introducing several
asynchronous improvements. Here is the result.
I have removed redundant RCU synchronizations from kprobes and kprobe
events by using asynchronous APIs.

-----
Probes: 256 kprobe_events
Enable events
real	0m 0.03s
user	0m 0.00s
sys	0m 0.02s
Disable events
real	0m 0.05s
user	0m 0.00s
sys	0m 0.01s
Remove events
real	0m 0.03s
user	0m 0.00s
sys	0m 0.01s
-----

Note that these results are for the ftrace side; bpf will still take some
time to stop tracing, because perf_event does not have an independent
event-disabling operation. Ftrace has 2 operations to stop tracing,
disable events and remove events, so it is easy to wait for an RCU grace
period between them. On the other hand, a perf event has only a "destroy"
operation, which disables and removes the event at once. To do that
safely, in the destroy operation, perf_trace_event_unreg() calls
tracepoint_synchronize_unregister(), which is an alias of
synchronize_rcu(), for each event. Thus I guess eBPF still takes O(n) to
stop tracing, but with this series it might be about 1/10 of what it was.

Patches
=======

The first 2 patches are a minor fix and a cleanup; I will send them as a
separate series. [3/13] and [4/13] postpone (and further delay) the kprobe
unoptimizing process. This minimizes kprobe_mutex contention with the
disabling/unregistering process. [5/13] modifies the ftrace side to free
unused event_file_link objects via call_rcu(). [6/13] enables the
kprobe-booster on preemptive kernels as preparation; its purpose is to
allow replacing synchronize_rcu() with synchronize_rcu_tasks().
[7/13]-[10/13] make the instruction-cache reclaiming process asynchronous.
[11/13] adds asynchronous unregistration APIs for kprobes, and [12/13] and
[13/13] use those APIs from the ftrace side.

Thank you,

[1] Here is the test script
------------------
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: Register/unregister many kprobe events

cd /sys/kernel/debug/tracing
# ftrace fentry skip size depends on the machine architecture.
# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc64le
case `uname -m` in
  x86_64|i[3456]86) OFFS=5;;
  ppc64le) OFFS=8;;
  *) OFFS=0;;
esac

N=0
echo "Setup up kprobes on first available 256 text symbols"
grep -i " t " /proc/kallsyms | cut -f3 -d" " | grep -v .*\\..* | \
while read i; do
  echo p ${i}+${OFFS} >> kprobe_events && N=$((N+1)) &> /dev/null ||:
  test $N -eq 256 && break
done

echo -n "Probes: " ; wc -l kprobe_events
echo "Enable events"
echo 1 | time tee events/kprobes/enable > /dev/null
sleep 2
echo "Disable events"
echo 0 | time tee events/kprobes/enable > /dev/null
sleep 2
echo "Remove events"
echo | time tee kprobe_events > /dev/null
------------------

---

Masami Hiramatsu (13):
      kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
      kprobes: Remove redundant arch_disarm_kprobe() call
      kprobes: Postpone optimizer until a bunch of probes (un)registered
      kprobes: Make optimizer delay to 1 second
      tracing/kprobe: Use call_rcu to defer freeing event_file_link
      kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y
      kprobes: Use normal list traversal API if a mutex is held
      kprobes: Use workqueue for reclaiming kprobe insn cache pages
      kprobes: Free kprobe_insn_page asynchronously
      kprobes: Make free_*insn_slot() mutex-less
      kprobes: Add asynchronous unregistration APIs
      tracing/kprobe: Free probe event asynchronously
      tracing/kprobe: perf_event: Remove local kprobe event asynchronously

 arch/Kconfig                    |    4 +
 arch/ia64/Kconfig               |    1 
 arch/ia64/kernel/kprobes.c      |    3 
 arch/powerpc/kernel/optprobes.c |    1 
 arch/x86/Kconfig                |    1 
 arch/x86/kernel/kprobes/core.c  |    2 
 include/linux/kprobes.h         |   13 ++
 kernel/kprobes.c                |  286 ++++++++++++++++++++++++++-------------
 kernel/trace/trace_dynevent.c   |    5 +
 kernel/trace/trace_kprobe.c     |   85 +++++++++---
 kernel/trace/trace_probe.c      |   10 +
 kernel/trace/trace_probe.h      |    1 
 12 files changed, 293 insertions(+), 119 deletions(-)

-- 
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>
* [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC
To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

In kprobe_optimizer(), kick_kprobe_optimizer() is called without
kprobe_mutex held, but this can race with other callers, which are
protected by kprobe_mutex. To fix this, expand the kprobe_mutex-protected
area to cover the kick_kprobe_optimizer() call.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 5a664f995377..52b05ab9c323 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -592,11 +592,12 @@ static void kprobe_optimizer(struct work_struct *work)
 	mutex_unlock(&module_mutex);
 	mutex_unlock(&text_mutex);
 	cpus_read_unlock();
-	mutex_unlock(&kprobe_mutex);
 
 	/* Step 5: Kick optimizer again if needed */
 	if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
 		kick_kprobe_optimizer();
+
+	mutex_unlock(&kprobe_mutex);
 }
 
 /* Wait for completing optimization and unoptimization */
* [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC
To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

Remove the redundant arch_disarm_kprobe() call in
force_unoptimize_kprobe(). This arch_disarm_kprobe() is called when the
kprobe is optimized but disabled, but in that case the kprobe (optprobe)
is already in the unused (unoptimizing) state. In that case,
unoptimize_kprobe() puts it on freeing_list and the kprobe optimizer
disarms it automatically, so this arch_disarm_kprobe() is redundant.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 52b05ab9c323..a2c755e79be7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -674,8 +674,6 @@ static void force_unoptimize_kprobe(struct optimized_kprobe *op)
 	lockdep_assert_cpus_held();
 	arch_unoptimize_kprobe(op);
 	op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
-	if (kprobe_disabled(&op->kp))
-		arch_disarm_kprobe(&op->kp);
 }
 
 /* Unoptimize a kprobe if p is optimized */
* [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC
To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

Add a counter to kick_kprobe_optimizer() to detect additional kprobe
registrations/unregistrations, and postpone kprobe_optimizer() until the
whole batch of probes has been registered. This can shorten the long wait
when unregistering a batch of kprobes.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a2c755e79be7..0dacdcecc90f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -465,6 +465,7 @@ static struct kprobe *get_optimized_kprobe(unsigned long addr)
 static LIST_HEAD(optimizing_list);
 static LIST_HEAD(unoptimizing_list);
 static LIST_HEAD(freeing_list);
+static int kprobe_optimizer_queue_update;
 
 static void kprobe_optimizer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
@@ -555,12 +556,22 @@ static void do_free_cleaned_kprobes(void)
 static void kick_kprobe_optimizer(void)
 {
 	schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+	kprobe_optimizer_queue_update++;
 }
 
 /* Kprobe jump optimizer */
 static void kprobe_optimizer(struct work_struct *work)
 {
 	mutex_lock(&kprobe_mutex);
+
+	/*
+	 * If new kprobe is queued in optimized/unoptimized list while
+	 * OPTIMIZE_DELAY waiting period, wait again for a series of
+	 * probes registration/unregistrations.
+	 */
+	if (kprobe_optimizer_queue_update > 1)
+		goto end;
+
 	cpus_read_lock();
 	mutex_lock(&text_mutex);
 	/* Lock modules while optimizing kprobes */
@@ -593,6 +604,8 @@ static void kprobe_optimizer(struct work_struct *work)
 	mutex_unlock(&text_mutex);
 	cpus_read_unlock();
 
+end:
+	kprobe_optimizer_queue_update = 0;
 	/* Step 5: Kick optimizer again if needed */
 	if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
 		kick_kprobe_optimizer();
* [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC
To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

Since the 5 jiffies delay for the optimizer is too
short to wait for other probes, make it longer,
like 1 second.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0dacdcecc90f..9c6e230852ad 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
 
 static void kprobe_optimizer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
-#define OPTIMIZE_DELAY 5
+/* Wait 1 second for starting optimization */
+#define OPTIMIZE_DELAY HZ
 
 /*
  * Optimize (replace a breakpoint with a jump) kprobes listed on
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Steven Rostedt @ 2020-01-22 0:29 UTC
To: Masami Hiramatsu
Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

On Thu, 16 Jan 2020 23:44:52 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Since the 5 jiffies delay for the optimizer is too
> short to wait for other probes, make it longer,
> like 1 second.

Hi Masami,

Can you explain more *why* 5 jiffies is too short.

Thanks!

-- Steve

> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  kernel/kprobes.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 0dacdcecc90f..9c6e230852ad 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
>  
>  static void kprobe_optimizer(struct work_struct *work);
>  static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
> -#define OPTIMIZE_DELAY 5
> +/* Wait 1 second for starting optimization */
> +#define OPTIMIZE_DELAY HZ
>  
>  /*
>   * Optimize (replace a breakpoint with a jump) kprobes listed on
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Masami Hiramatsu @ 2020-01-22 7:23 UTC
To: Steven Rostedt
Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

On Tue, 21 Jan 2020 19:29:05 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 16 Jan 2020 23:44:52 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Since the 5 jiffies delay for the optimizer is too
> > short to wait for other probes, make it longer,
> > like 1 second.
> 
> Hi Masami,
> 
> Can you explain more *why* 5 jiffies is too short.

Yes, I had introduced this 5 jiffies delay for multiple probe registration
and unregistration like systemtap, which will use array-based interface to
register/unregister. In that case, 5 jiffies will be enough for the delay
to wait for other kprobe registration/unregistration.

However, since perf and ftrace register/unregister probes one-by-one with
RCU synchronization interval, the optimizer will be started before
finishing to register/unregister all probes.
And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
Since the kprobe_mutex is also involved in disabling kprobes, this also
stops probe-event disabling.

Maybe 5 jiffies is enough for adding/removing a few probe events, but
not enough for dozens of probe events.

Thank you,

> 
> Thanks!
> 
> -- Steve
> 
> 
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> >  kernel/kprobes.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 0dacdcecc90f..9c6e230852ad 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
> >  
> >  static void kprobe_optimizer(struct work_struct *work);
> >  static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
> > -#define OPTIMIZE_DELAY 5
> > +/* Wait 1 second for starting optimization */
> > +#define OPTIMIZE_DELAY HZ
> >  
> >  /*
> >   * Optimize (replace a breakpoint with a jump) kprobes listed on

-- 
Masami Hiramatsu <mhiramat@kernel.org>
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Steven Rostedt @ 2020-01-22 12:11 UTC
To: Masami Hiramatsu
Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

On Wed, 22 Jan 2020 16:23:17 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Tue, 21 Jan 2020 19:29:05 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 16 Jan 2020 23:44:52 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > Since the 5 jiffies delay for the optimizer is too
> > > short to wait for other probes, make it longer,
> > > like 1 second.
> > 
> > Hi Masami,
> > 
> > Can you explain more *why* 5 jiffies is too short.
> 
> Yes, I had introduced this 5 jiffies delay for multiple probe registration
> and unregistration like systemtap, which will use array-based interface to
> register/unregister. In that case, 5 jiffies will be enough for the delay
> to wait for other kprobe registration/unregistration.
> 
> However, since perf and ftrace register/unregister probes one-by-one with
> RCU synchronization interval, the optimizer will be started before
> finishing to register/unregister all probes.
> And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> Since the kprobe_mutex is also involved in disabling kprobes, this also
> stops probe-event disabling.
> 
> Maybe 5 jiffies is enough for adding/removing a few probe events, but
> not enough for dozens of probe events.

Perhaps we should have a mechanism that can detect new probes being
added, and just continue to delay the optimization, instead of having
some arbitrary delay.

-- Steve
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Masami Hiramatsu @ 2020-01-22 13:12 UTC
To: Steven Rostedt
Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel, Naveen N. Rao, Anil S Keshavamurthy

On Wed, 22 Jan 2020 07:11:15 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 22 Jan 2020 16:23:17 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Tue, 21 Jan 2020 19:29:05 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > 
> > > > Since the 5 jiffies delay for the optimizer is too
> > > > short to wait for other probes, make it longer,
> > > > like 1 second.
> > > 
> > > Hi Masami,
> > > 
> > > Can you explain more *why* 5 jiffies is too short.
> > 
> > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > and unregistration like systemtap, which will use array-based interface to
> > register/unregister. In that case, 5 jiffies will be enough for the delay
> > to wait for other kprobe registration/unregistration.
> > 
> > However, since perf and ftrace register/unregister probes one-by-one with
> > RCU synchronization interval, the optimizer will be started before
> > finishing to register/unregister all probes.
> > And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> > Since the kprobe_mutex is also involved in disabling kprobes, this also
> > stops probe-event disabling.
> > 
> > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > not enough for dozens of probe events.
> > 
> 
> Perhaps we should have a mechanism that can detect new probes being
> added, and just continue to delay the optimization, instead of having
> some arbitrary delay.

Yes, that is what [03/13] does :)
Anyway, it seems that the RCU synchronization takes more than 5 jiffies.
And in that case, [03/13] still doesn't work. That's why I added this patch
after that.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Paul E. McKenney @ 2020-01-22 16:54 UTC
To: Masami Hiramatsu
Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, joel, Naveen N. Rao, Anil S Keshavamurthy

On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 07:11:15 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 22 Jan 2020 16:23:17 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > 
> > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > short to wait for other probes, make it longer,
> > > > > like 1 second.
> > > > 
> > > > Hi Masami,
> > > > 
> > > > Can you explain more *why* 5 jiffies is too short.
> > > 
> > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > and unregistration like systemtap, which will use array-based interface to
> > > register/unregister. In that case, 5 jiffies will be enough for the delay
> > > to wait for other kprobe registration/unregistration.
> > > 
> > > However, since perf and ftrace register/unregister probes one-by-one with
> > > RCU synchronization interval, the optimizer will be started before
> > > finishing to register/unregister all probes.
> > > And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> > > Since the kprobe_mutex is also involved in disabling kprobes, this also
> > > stops probe-event disabling.
> > > 
> > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > not enough for dozens of probe events.
> > 
> > 
> > Perhaps we should have a mechanism that can detect new probes being
> > added, and just continue to delay the optimization, instead of having
> > some arbitrary delay.
> 
> Yes, that is what [03/13] does :)
> Anyway, it seems that the RCU synchronization takes more than 5 jiffies.
> And in that case, [03/13] still doesn't work. That's why I added this patch
> after that.

If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
will often take way more than 5 jiffies.  If it is synchronize_rcu(),
5 jiffies would not be unusual, especially on larger systems.

But in the case of synchronize_rcu(), one option is to instead use
synchronize_rcu_expedited().  It is not clear that this last is really
justified in this case, but figured it might be worth mentioning.

							Thanx, Paul
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
From: Masami Hiramatsu @ 2020-01-23 1:33 UTC
To: paulmck
Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S. Miller, joel, Naveen N. Rao, Anil S Keshavamurthy

On Wed, 22 Jan 2020 08:54:32 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> > On Wed, 22 Jan 2020 07:11:15 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Wed, 22 Jan 2020 16:23:17 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > 
> > > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > > 
> > > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > > short to wait for other probes, make it longer,
> > > > > > like 1 second.
> > > > > 
> > > > > Hi Masami,
> > > > > 
> > > > > Can you explain more *why* 5 jiffies is too short.
> > > > 
> > > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > > and unregistration like systemtap, which will use array-based interface to
> > > > register/unregister. In that case, 5 jiffies will be enough for the delay
> > > > to wait for other kprobe registration/unregistration.
> > > > 
> > > > However, since perf and ftrace register/unregister probes one-by-one with
> > > > RCU synchronization interval, the optimizer will be started before
> > > > finishing to register/unregister all probes.
> > > > And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> > > > Since the kprobe_mutex is also involved in disabling kprobes, this also
> > > > stops probe-event disabling.
> > > > 
> > > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > > not enough for dozens of probe events.
> > > 
> > > 
> > > Perhaps we should have a mechanism that can detect new probes being
> > > added, and just continue to delay the optimization, instead of having
> > > some arbitrary delay.
> > 
> > Yes, that is what [03/13] does :)
> > Anyway, it seems that the RCU synchronization takes more than 5 jiffies.
> > And in that case, [03/13] still doesn't work. That's why I added this patch
> > after that.
> 
> If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
> will often take way more than 5 jiffies.  If it is synchronize_rcu(),
> 5 jiffies would not be unusual, especially on larger systems.
> 
> But in the case of synchronize_rcu(), one option is to instead use
> synchronize_rcu_expedited().  It is not clear that this last is really
> justified in this case, but figured it might be worth mentioning.

It is synchronize_rcu(), but it seems it is sometimes called several
times for one probe disabling.

Anyway, without this update, I added a printk to count the optimizer
queue length, and found that every optimizer call had only a single
kprobe on the queues. I think this depends on the machine, but as far
as I tested on an 8-thread qemu x86 guest, it shows this result...
Probes: 256 kprobe_events
Enable events
real	0m 0.02s
user	0m 0.00s
sys	0m 0.02s
[   17.730548] Queue-update: 180, skipped, Total Queued: 180
[   17.739445] Queue-update: 1, go, Total Queued: 180
Disable events
[   19.744634] Queue-update: 1, go, Total Queued: 1
[   19.770743] Queue-update: 1, go, Total Queued: 1
[   19.886625] Queue-update: 1, go, Total Queued: 1
[   20.006631] Queue-update: 1, go, Total Queued: 1
[   20.126628] Queue-update: 1, go, Total Queued: 1
[...] (the same single-probe "Queue-update: 1, go" line repeats roughly every 120 ms for the whole run)
[   41.135594] Queue-update: 1, go, Total Queued: 1
real	0m 21.40s
user	0m 0.00s
sys	0m 0.04s
Remove events
real	0m 2.14s
user	0m 0.00s
sys	0m 0.04s

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  1:33           ` Masami Hiramatsu
@ 2020-01-23  2:26             ` Paul E. McKenney
  2020-01-23  6:13               ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-01-23  2:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Thu, Jan 23, 2020 at 10:33:34AM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 08:54:32 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> > > On Wed, 22 Jan 2020 07:11:15 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Wed, 22 Jan 2020 16:23:17 +0900
> > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > 
> > > > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > > 
> > > > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > > > 
> > > > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > > > short to wait for other probes, make it longer,
> > > > > > > like 1 second.
> > > > > > 
> > > > > > Hi Masami,
> > > > > > 
> > > > > > Can you explain more *why* 5 jiffies is too short.
> > > > > 
> > > > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > > > and unregistration like systemtap, which will use array-based interface to
> > > > > register/unregister. In that case, 5 jiffies will be enough for the delay
> > > > > to wait for other kprobe registration/unregistration.
> > > > > 
> > > > > However, since perf and ftrace register/unregister probes one-by-one with
> > > > > RCU synchronization interval, the optimizer will be started before
> > > > > finishing to register/unregister all probes.
> > > > > And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> > > > > Since the kprobe_mutex is also involved in disabling kprobes, this also
> > > > > stops probe-event disabling.
> > > > > 
> > > > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > > > not enough for dozens of probe events.
> > > > 
> > > > Perhaps we should have a mechanism that can detect new probes being
> > > > added, and just continue to delay the optimization, instead of having
> > > > some arbitrary delay.
> > > 
> > > Yes, that is what [03/13] does :)
> > > Anyway, it seems that the RCU synchronization takes more than 5 jiffies.
> > > And in that case, [03/13] still doesn't work. That's why I added this patch
> > > after that.
> > 
> > If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
> > will often take way more than 5 jiffies. If it is synchronize_rcu(),
> > 5 jiffies would not be unusual, especially on larger systems.
> > But in the case of synchronize_rcu(), one option is to instead use
> > synchronize_rcu_expedited(). It is not clear that this last is really
> > justified in this case, but figured it might be worth mentioning.
> 
> It is synchronize_rcu(), but it seems it is sometimes called several
> times for one probe disabling.

OK, good to know.

> Anyway, without this update, I added a printk to count the optimizer
> queue length and found that every optimizer call ran with just a single
> kprobe on the queue. I think this depends on the machine, but as far as
> I tested on an 8-thread qemu x86, it shows this result...
> 
> Probes: 256 kprobe_events
> Enable events
> real	0m 0.02s
> user	0m 0.00s
> sys	0m 0.02s
> [ 17.730548] Queue-update: 180, skipped, Total Queued: 180
> [ 17.739445] Queue-update: 1, go, Total Queued: 180
> Disable events
> [ 19.744634] Queue-update: 1, go, Total Queued: 1
> [ 19.770743] Queue-update: 1, go, Total Queued: 1
> [...] (the same single-probe "go" line repeats roughly every 120 ms)
> [ 41.135594] Queue-update: 1, go, Total Queued: 1
> real	0m 21.40s
> user	0m 0.00s
> sys	0m 0.04s

So 21.4s/256 = 84 milliseconds per event disable, correct?

It might be worth trying synchronize_rcu_expedited() just as an experiment
to see if it speeds things up significantly.
							Thanx, Paul

> Remove events
> real	0m 2.14s
> user	0m 0.00s
> sys	0m 0.04s
> 
> Thank you,
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  2:26             ` Paul E. McKenney
@ 2020-01-23  6:13               ` Masami Hiramatsu
  2020-01-23 10:03                 ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-23  6:13 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy,
	Masami Hiramatsu

On Wed, 22 Jan 2020 18:26:47 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Anyway, without this update, I added a printk to count the optimizer
> > queue length and found that every optimizer call ran with just a single
> > kprobe on the queue. I think this depends on the machine, but as far as
> > I tested on an 8-thread qemu x86, it shows this result...
> > 
> > Probes: 256 kprobe_events
> > Enable events
> > real	0m 0.02s
> > user	0m 0.00s
> > sys	0m 0.02s
> > [ 17.730548] Queue-update: 180, skipped, Total Queued: 180
> > [ 17.739445] Queue-update: 1, go, Total Queued: 180
> > Disable events
[...]
> > [ 41.135594] Queue-update: 1, go, Total Queued: 1
> > real	0m 21.40s
> > user	0m 0.00s
> > sys	0m 0.04s
> 
> So 21.4s/256 = 84 milliseconds per event disable, correct?

Actually, it seems only 172 probes are on the unoptimized list, so
the number will be a bit different.

Anyway, the elapsed time above includes the non-batch optimizer's
working time, as below:

(1) start looping on probe events
(2) disable the kprobe
  (2.1) wait on kprobe_mutex if the optimizer is running
  (2.2) if the kprobe is optimized, queue it on the unoptimizing list
        and kick the optimizer with a 5-jiffies delay
(3) unlink the enabled event
(4) wait for synchronize_rcu()
(5) the optimizer starts optimization before (4) finishes
(6) goto (1)

I think the disabling performance issue came from (5) (and (2.1)).
Thus, if we change the delay in (2.2) to 1 second (HZ jiffies), the
optimizer will start only after some loops are done. (And when the
optimizer detects "active" queuing, it postpones the process.)

> It might be worth trying synchronize_rcu_expedited() just as an experiment
> to see if it speeds things up significantly.

Do you mean replacing synchronize_rcu() in the disabling loop, or
replacing synchronize_rcu_tasks() in the optimizer?

I don't think that is the root cause of this behavior, since if we
make the optimizer delay 1 second, that seems enough to make it a
batch operation. See below; this is the result with the patched
kernel (1 HZ delay).

Probes: 256 kprobe_events
Enable events
real	0m 0.07s
user	0m 0.00s
sys	0m 0.07s
[ 19.191181] Queue-update: 180, skipped, Total Queued: 180
Disable events
[ 20.214966] Queue-update: 1, go, Total Queued: 172
[ 21.302924] Queue-update: 86, skipped, Total Queued: 86
real	0m 2.11s
user	0m 0.00s
sys	0m 0.03s
[ 22.327173] Queue-update: 87, skipped, Total Queued: 172
[ 23.350933] Queue-update: 1, go, Total Queued: 172
Remove events
real	0m 2.13s
user	0m 0.02s
sys	0m 0.02s

As you can see, the optimizer ran outside of the disabling loop. In
that case, it is OK to synchronize RCU tasks in the optimizer because
it just runs *once* for multiple probe events.

From the above result, 86 probes are disabled per 1 sec delay, so each
probe disabling took 11-12 msec on average. That means (HZ / 10) could
also be good. (But note that the optimizer will keep retrying until
the disabling loop is finished.)

BTW, the test kernel was built with HZ=1000; with HZ=250 or HZ=100 the
result will differ under the current code. So I think we should use an
HZ-based delay instead of a fixed number.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  6:13               ` Masami Hiramatsu
@ 2020-01-23 10:03                 ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-01-23 10:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Thu, Jan 23, 2020 at 03:13:28PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 18:26:47 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Anyway, without this update, I added a printk to count the optimizer
> > > queue length and found that every optimizer call ran with just a single
> > > kprobe on the queue. I think this depends on the machine, but as far as
> > > I tested on an 8-thread qemu x86, it shows this result...
> > > 
> > > Probes: 256 kprobe_events
> > > Enable events
> > > real	0m 0.02s
> > > user	0m 0.00s
> > > sys	0m 0.02s
> > > [ 17.730548] Queue-update: 180, skipped, Total Queued: 180
> > > [ 17.739445] Queue-update: 1, go, Total Queued: 180
> > > Disable events
> [...]
> > > [ 41.135594] Queue-update: 1, go, Total Queued: 1
> > > real	0m 21.40s
> > > user	0m 0.00s
> > > sys	0m 0.04s
> > 
> > So 21.4s/256 = 84 milliseconds per event disable, correct?
> 
> Actually, it seems only 172 probes are on the unoptimized list, so
> the number will be a bit different.
> 
> Anyway, the elapsed time above includes the non-batch optimizer's
> working time, as below:
> 
> (1) start looping on probe events
> (2) disable the kprobe
>   (2.1) wait on kprobe_mutex if the optimizer is running
>   (2.2) if the kprobe is optimized, queue it on the unoptimizing list
>         and kick the optimizer with a 5-jiffies delay
> (3) unlink the enabled event
> (4) wait for synchronize_rcu()
> (5) the optimizer starts optimization before (4) finishes
> (6) goto (1)
> 
> I think the disabling performance issue came from (5) (and (2.1)).
> Thus, if we change the delay in (2.2) to 1 second (HZ jiffies), the
> optimizer will start only after some loops are done. (And when the
> optimizer detects "active" queuing, it postpones the process.)
> 
> > It might be worth trying synchronize_rcu_expedited() just as an experiment
> > to see if it speeds things up significantly.
> 
> Do you mean replacing synchronize_rcu() in the disabling loop, or
> replacing synchronize_rcu_tasks() in the optimizer?

The former. (There is no synchronize_rcu_tasks_expedited().)

> I don't think that is the root cause of this behavior, since if we
> make the optimizer delay 1 second, that seems enough to make it a
> batch operation. See below; this is the result with the patched
> kernel (1 HZ delay).
> 
> Probes: 256 kprobe_events
> Enable events
> real	0m 0.07s
> user	0m 0.00s
> sys	0m 0.07s
> [ 19.191181] Queue-update: 180, skipped, Total Queued: 180
> Disable events
> [ 20.214966] Queue-update: 1, go, Total Queued: 172
> [ 21.302924] Queue-update: 86, skipped, Total Queued: 86
> real	0m 2.11s
> user	0m 0.00s
> sys	0m 0.03s
> [ 22.327173] Queue-update: 87, skipped, Total Queued: 172
> [ 23.350933] Queue-update: 1, go, Total Queued: 172
> Remove events
> real	0m 2.13s
> user	0m 0.02s
> sys	0m 0.02s
> 
> As you can see, the optimizer ran outside of the disabling loop. In
> that case, it is OK to synchronize RCU tasks in the optimizer because
> it just runs *once* for multiple probe events.

Even better!

> From the above result, 86 probes are disabled per 1 sec delay, so each
> probe disabling took 11-12 msec on average. That means (HZ / 10) could
> also be good. (But note that the optimizer will keep retrying until
> the disabling loop is finished.)
> 
> BTW, the test kernel was built with HZ=1000; with HZ=250 or HZ=100 the
> result will differ under the current code. So I think we should use an
> HZ-based delay instead of a fixed number.

Agreed, the HZ-based delay seems worth a try.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread
* [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-27 15:02   ` kbuild test robot
  2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
  2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Use call_rcu() to defer freeing the event_file_link data structure.
This removes the RCU synchronization from the per-probe event disabling
operation.

Since unregistering a kprobe event requires all handlers to be disabled
and finished, this also introduces a gatekeeper to ensure that. If there
is any disabled event which has not finished, the unregister process
synchronizes RCU once (IOW, it may sleep a while.)
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_kprobe.c | 35 +++++++++++++++++++++++++++++------
 kernel/trace/trace_probe.c  | 10 ++++++++--
 kernel/trace/trace_probe.h  |  1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index cbdc4f4e64c7..906af1ffe2b2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -328,10 +328,25 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
 	return ret;
 }
 
+atomic_t trace_kprobe_disabled_finished;
+
+static void trace_kprobe_disabled_handlers_finish(void)
+{
+	if (atomic_read(&trace_kprobe_disabled_finished))
+		synchronize_rcu();
+}
+
+static void trace_kprobe_disabled_finish_cb(struct rcu_head *head)
+{
+	atomic_dec(&trace_kprobe_disabled_finished);
+	kfree(head);
+}
+
 static void __disable_trace_kprobe(struct trace_probe *tp)
 {
 	struct trace_probe *pos;
 	struct trace_kprobe *tk;
+	struct rcu_head *head;
 
 	list_for_each_entry(pos, trace_probe_probe_list(tp), list) {
 		tk = container_of(pos, struct trace_kprobe, tp);
@@ -342,6 +357,13 @@ static void __disable_trace_kprobe(struct trace_probe *tp)
 		else
 			disable_kprobe(&tk->rp.kp);
 	}
+
+	/* Handler exit gatekeeper */
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (WARN_ON(!head))
+		return;
+	atomic_inc(&trace_kprobe_disabled_finished);
+	call_rcu(head, trace_kprobe_disabled_finish_cb);
 }
 
 /*
@@ -422,13 +444,11 @@ static int disable_trace_kprobe(struct trace_event_call *call,
 
 out:
 	if (file)
-		/*
-		 * Synchronization is done in below function. For perf event,
-		 * file == NULL and perf_trace_event_unreg() calls
-		 * tracepoint_synchronize_unregister() to ensure synchronize
-		 * event. We don't need to care about it.
-		 */
 		trace_probe_remove_file(tp, file);
+	/*
+	 * We have no RCU synchronization here. Caller must wait for the
+	 * completion of disabling.
+	 */
 
 	return 0;
 }
@@ -542,6 +562,9 @@ static int unregister_trace_kprobe(struct trace_kprobe *tk)
 	if (trace_probe_is_enabled(&tk->tp))
 		return -EBUSY;
 
+	/* Make sure all disabled trace_kprobe handlers finished */
+	trace_kprobe_disabled_handlers_finish();
+
 	/* Will fail if probe is being used by ftrace or perf */
 	if (unregister_kprobe_event(tk))
 		return -EBUSY;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 905b10af5d5c..b18df8e1b2d6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1067,6 +1067,13 @@ struct event_file_link *trace_probe_get_file_link(struct trace_probe *tp,
 	return NULL;
 }
 
+static void event_file_link_free_cb(struct rcu_head *head)
+{
+	struct event_file_link *link = container_of(head, typeof(*link), rcu);
+
+	kfree(link);
+}
+
 int trace_probe_remove_file(struct trace_probe *tp,
 			    struct trace_event_file *file)
 {
@@ -1077,8 +1084,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
 		return -ENOENT;
 
 	list_del_rcu(&link->list);
-	synchronize_rcu();
-	kfree(link);
+	call_rcu(&link->rcu, event_file_link_free_cb);
 
 	if (list_empty(&tp->event->files))
 		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 4ee703728aec..71ac01a50815 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -243,6 +243,7 @@ struct trace_probe {
 struct event_file_link {
 	struct trace_event_file	*file;
 	struct list_head	list;
+	struct rcu_head		rcu;
 };
 
 static inline bool trace_probe_test_flag(struct trace_probe *tp,

^ permalink raw reply related	[flat|nested] 26+ messages in thread
* Re: [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link 2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu @ 2020-01-27 15:02 ` kbuild test robot 2020-01-27 15:02 ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot 1 sibling, 0 replies; 26+ messages in thread From: kbuild test robot @ 2020-01-27 15:02 UTC (permalink / raw) To: Masami Hiramatsu Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov, mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Hi Masami, I love your patch! Perhaps something to improve: [auto build test WARNING on next-20200115] [cannot apply to tip/perf/core ia64/next tip/x86/core linus/master v5.5-rc6 v5.5-rc5 v5.5-rc4 v5.5] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/tracing-kprobes-Introduce-async-unregistration/20200117-191143 base: 5b483a1a0ea1ab19a5734051c9692c2cfabe1327 reproduce: # apt-get install sparse # sparse version: v0.6.1-153-g47b6dfef-dirty make ARCH=x86_64 allmodconfig make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' If you fix the issue, kindly add following tag Reported-by: kbuild test robot <lkp@intel.com> sparse warnings: (new ones prefixed by >>) >> kernel/trace/trace_kprobe.c:331:10: sparse: sparse: symbol 'trace_kprobe_disabled_finished' was not declared. Should it be static? Please review and possibly fold the followup patch. 
--- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation
* [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static 2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu 2020-01-27 15:02 ` kbuild test robot @ 2020-01-27 15:02 ` kbuild test robot 2020-01-28 15:02 ` Masami Hiramatsu 1 sibling, 1 reply; 26+ messages in thread From: kbuild test robot @ 2020-01-27 15:02 UTC (permalink / raw) To: Masami Hiramatsu Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov, mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Fixes: 3c794bf25a2b ("tracing/kprobe: Use call_rcu to defer freeing event_file_link") Signed-off-by: kbuild test robot <lkp@intel.com> --- trace_kprobe.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index 1a5882bb77471..fba738aa458af 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -328,7 +328,7 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk) return ret; } -atomic_t trace_kprobe_disabled_finished; +static atomic_t trace_kprobe_disabled_finished; static void trace_kprobe_disabled_handlers_finish(void) { ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static 2020-01-27 15:02 ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot @ 2020-01-28 15:02 ` Masami Hiramatsu 0 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-28 15:02 UTC (permalink / raw) To: kbuild test robot Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy On Mon, 27 Jan 2020 23:02:43 +0800 kbuild test robot <lkp@intel.com> wrote: > > Fixes: 3c794bf25a2b ("tracing/kprobe: Use call_rcu to defer freeing event_file_link") > Signed-off-by: kbuild test robot <lkp@intel.com> > --- > trace_kprobe.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c > index 1a5882bb77471..fba738aa458af 100644 > --- a/kernel/trace/trace_kprobe.c > +++ b/kernel/trace/trace_kprobe.c > @@ -328,7 +328,7 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk) > return ret; > } > > -atomic_t trace_kprobe_disabled_finished; > +static atomic_t trace_kprobe_disabled_finished; > > static void trace_kprobe_disabled_handlers_finish(void) > { Oops, right. I forgot the static. I'll update it. Thanks, -- Masami Hiramatsu <mhiramat@kernel.org> ^ permalink raw reply [flat|nested] 26+ messages in thread
* [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (4 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu @ 2020-01-16 14:45 ` Masami Hiramatsu 2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu ` (6 subsequent siblings) 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy As we did in commit a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y"), we can also enable the kprobe-booster, which depends on a trampoline execution buffer just as the optprobe does. Before releasing the trampoline buffer (kprobe_insn_page), the garbage collector waits for all tasks potentially preempted on the trampoline buffer using synchronize_rcu_tasks() instead of synchronize_rcu(). This requires CONFIG_TASKS_RCU=y as well, so this also introduces HAVE_KPROBES_BOOSTER for the archs which support the kprobe-booster (currently only x86 and ia64.) If both CONFIG_PREEMPTION and HAVE_KPROBES_BOOSTER are y, CONFIG_KPROBES selects CONFIG_TASKS_RCU=y. 
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- arch/Kconfig | 4 ++++ arch/ia64/Kconfig | 1 + arch/ia64/kernel/kprobes.c | 3 +-- arch/x86/Kconfig | 1 + arch/x86/kernel/kprobes/core.c | 2 -- kernel/kprobes.c | 4 ++-- 6 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 48b5e103bdb0..ead87084c8bf 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -64,6 +64,7 @@ config KPROBES depends on MODULES depends on HAVE_KPROBES select KALLSYMS + select TASKS_RCU if PREEMPTION && HAVE_KPROBES_BOOSTER help Kprobes allows you to trap at almost any kernel address and execute a callback function. register_kprobe() establishes @@ -189,6 +190,9 @@ config HAVE_KPROBES config HAVE_KRETPROBES bool +config HAVE_KPROBES_BOOSTER + bool + config HAVE_OPTPROBES bool diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index bab7cd878464..341f9ca8a745 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -25,6 +25,7 @@ config IA64 select HAVE_IDE select HAVE_OPROFILE select HAVE_KPROBES + select HAVE_KPROBES_BOOSTER select HAVE_KRETPROBES select HAVE_FTRACE_MCOUNT_RECORD select HAVE_DYNAMIC_FTRACE if (!ITANIUM) diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c index a6d6a0556f08..1680a10c9f49 100644 --- a/arch/ia64/kernel/kprobes.c +++ b/arch/ia64/kernel/kprobes.c @@ -841,7 +841,6 @@ static int __kprobes pre_kprobes_handler(struct die_args *args) return 1; } -#if !defined(CONFIG_PREEMPTION) if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) { /* Boost up -- we can execute copied instructions directly */ ia64_psr(regs)->ri = p->ainsn.slot; @@ -853,7 +852,7 @@ static int __kprobes pre_kprobes_handler(struct die_args *args) preempt_enable_no_resched(); return 1; } -#endif + prepare_ss(p, regs); kcb->kprobe_status = KPROBE_HIT_SS; return 1; diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e5800e52a59a..d509578d824b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -181,6 +181,7 @@ config X86 select 
HAVE_KERNEL_LZO select HAVE_KERNEL_XZ select HAVE_KPROBES + select HAVE_KPROBES_BOOSTER select HAVE_KPROBES_ON_FTRACE select HAVE_FUNCTION_ERROR_INJECTION select HAVE_KRETPROBES diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c index 4d7022a740ab..7aba45037885 100644 --- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -587,7 +587,6 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs, if (setup_detour_execution(p, regs, reenter)) return; -#if !defined(CONFIG_PREEMPTION) if (p->ainsn.boostable && !p->post_handler) { /* Boost up -- we can execute copied instructions directly */ if (!reenter) @@ -600,7 +599,6 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs, regs->ip = (unsigned long)p->ainsn.insn; return; } -#endif if (reenter) { save_previous_kprobe(kcb); set_current_kprobe(p, regs, kcb); diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 9c6e230852ad..848c14e92ccc 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -220,8 +220,8 @@ static int collect_garbage_slots(struct kprobe_insn_cache *c) { struct kprobe_insn_page *kip, *next; - /* Ensure no-one is interrupted on the garbages */ - synchronize_rcu(); + /* Ensure no-one is running on the garbages. */ + synchronize_rcu_tasks(); list_for_each_entry_safe(kip, next, &c->pages, list) { int i; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (5 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu @ 2020-01-16 14:45 ` Masami Hiramatsu 2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu ` (5 subsequent siblings) 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Use normal list traversal API instead of rcu_read_lock, RCU list traversal and rcu_read_unlock pair if a mutex which protects the list is held in the methods of kprobe_insn_cache. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- kernel/kprobes.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 848c14e92ccc..09b0e33bc845 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -141,8 +141,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) /* Since the slot array is not protected by rcu, we need a mutex */ mutex_lock(&c->mutex); retry: - rcu_read_lock(); - list_for_each_entry_rcu(kip, &c->pages, list) { + list_for_each_entry(kip, &c->pages, list) { if (kip->nused < slots_per_page(c)) { int i; for (i = 0; i < slots_per_page(c); i++) { @@ -150,7 +149,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) kip->slot_used[i] = SLOT_USED; kip->nused++; slot = kip->insns + (i * c->insn_size); - rcu_read_unlock(); goto out; } } @@ -159,7 +157,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) WARN_ON(1); } } - rcu_read_unlock(); /* If there are any 
garbage slots, collect it and try again. */ if (c->nr_garbage && collect_garbage_slots(c) == 0) @@ -244,8 +241,7 @@ void __free_insn_slot(struct kprobe_insn_cache *c, long idx; mutex_lock(&c->mutex); - rcu_read_lock(); - list_for_each_entry_rcu(kip, &c->pages, list) { + list_for_each_entry(kip, &c->pages, list) { idx = ((long)slot - (long)kip->insns) / (c->insn_size * sizeof(kprobe_opcode_t)); if (idx >= 0 && idx < slots_per_page(c)) @@ -255,7 +251,6 @@ void __free_insn_slot(struct kprobe_insn_cache *c, WARN_ON(1); kip = NULL; out: - rcu_read_unlock(); /* Mark and sweep: this may sleep */ if (kip) { /* Check double free */ ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (6 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu @ 2020-01-16 14:45 ` Masami Hiramatsu 2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu ` (4 subsequent siblings) 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Use workqueues for reclaiming kprobe insn cache pages. This can split the heaviest part from the unregistration process. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- include/linux/kprobes.h | 2 ++ kernel/kprobes.c | 29 ++++++++++++++++++----------- 2 files changed, 20 insertions(+), 11 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 04bdaf01112c..0f832817fca3 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -245,6 +245,7 @@ struct kprobe_insn_cache { struct list_head pages; /* list of kprobe_insn_page */ size_t insn_size; /* size of instruction slot */ int nr_garbage; + struct work_struct work; }; #ifdef __ARCH_WANT_KPROBES_INSN_SLOT @@ -254,6 +255,7 @@ extern void __free_insn_slot(struct kprobe_insn_cache *c, /* sleep-less address checking routine */ extern bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr); +void kprobe_insn_cache_gc(struct work_struct *work); #define DEFINE_INSN_CACHE_OPS(__name) \ extern struct kprobe_insn_cache kprobe_##__name##_slots; \ diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 09b0e33bc845..a9114923da4c 100644 --- a/kernel/kprobes.c +++ 
b/kernel/kprobes.c @@ -126,8 +126,15 @@ struct kprobe_insn_cache kprobe_insn_slots = { .pages = LIST_HEAD_INIT(kprobe_insn_slots.pages), .insn_size = MAX_INSN_SIZE, .nr_garbage = 0, + .work = __WORK_INITIALIZER(kprobe_insn_slots.work, + kprobe_insn_cache_gc), }; -static int collect_garbage_slots(struct kprobe_insn_cache *c); + +static void kick_kprobe_insn_cache_gc(struct kprobe_insn_cache *c) +{ + if (!work_pending(&c->work)) + schedule_work(&c->work); +} /** * __get_insn_slot() - Find a slot on an executable page for an instruction. @@ -140,7 +147,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) /* Since the slot array is not protected by rcu, we need a mutex */ mutex_lock(&c->mutex); - retry: list_for_each_entry(kip, &c->pages, list) { if (kip->nused < slots_per_page(c)) { int i; @@ -158,11 +164,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) } } - /* If there are any garbage slots, collect it and try again. */ - if (c->nr_garbage && collect_garbage_slots(c) == 0) - goto retry; - - /* All out of space. Need to allocate a new page. */ + /* All out of space. Need to allocate a new page. */ kip = kmalloc(KPROBE_INSN_PAGE_SIZE(slots_per_page(c)), GFP_KERNEL); if (!kip) goto out; @@ -213,10 +215,12 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx) return 0; } -static int collect_garbage_slots(struct kprobe_insn_cache *c) +void kprobe_insn_cache_gc(struct work_struct *work) { + struct kprobe_insn_cache *c = container_of(work, typeof(*c), work); struct kprobe_insn_page *kip, *next; + mutex_lock(&c->mutex); /* Ensure no-one is running on the garbages. 
*/ synchronize_rcu_tasks(); @@ -226,12 +230,13 @@ static int collect_garbage_slots(struct kprobe_insn_cache *c) continue; kip->ngarbage = 0; /* we will collect all garbages */ for (i = 0; i < slots_per_page(c); i++) { - if (kip->slot_used[i] == SLOT_DIRTY && collect_one_slot(kip, i)) + if (kip->slot_used[i] == SLOT_DIRTY && + collect_one_slot(kip, i)) break; } } c->nr_garbage = 0; - return 0; + mutex_unlock(&c->mutex); } void __free_insn_slot(struct kprobe_insn_cache *c, @@ -259,7 +264,7 @@ void __free_insn_slot(struct kprobe_insn_cache *c, kip->slot_used[idx] = SLOT_DIRTY; kip->ngarbage++; if (++c->nr_garbage > slots_per_page(c)) - collect_garbage_slots(c); + kick_kprobe_insn_cache_gc(c); } else { collect_one_slot(kip, idx); } @@ -299,6 +304,8 @@ struct kprobe_insn_cache kprobe_optinsn_slots = { .pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages), /* .insn_size is initialized later */ .nr_garbage = 0, + .work = __WORK_INITIALIZER(kprobe_optinsn_slots.work, + kprobe_insn_cache_gc), }; #endif #endif ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (7 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu @ 2020-01-16 14:45 ` Masami Hiramatsu 2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu ` (3 subsequent siblings) 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Free the kprobe_insn_page data structure asynchronously using call_rcu(). Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- kernel/kprobes.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/kprobes.c b/kernel/kprobes.c index a9114923da4c..60ffc9d54d87 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -87,6 +87,7 @@ static LIST_HEAD(kprobe_blacklist); */ struct kprobe_insn_page { struct list_head list; + struct rcu_head rcu; kprobe_opcode_t *insns; /* Page of instruction slots */ struct kprobe_insn_cache *cache; int nused; @@ -192,6 +193,13 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) return slot; } +static void free_kprobe_insn_page(struct rcu_head *head) +{ + struct kprobe_insn_page *kip = container_of(head, typeof(*kip), rcu); + + kfree(kip); +} + /* Return 1 if all garbages are collected, otherwise 0. 
*/ static int collect_one_slot(struct kprobe_insn_page *kip, int idx) { @@ -206,9 +214,8 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx) */ if (!list_is_singular(&kip->list)) { list_del_rcu(&kip->list); - synchronize_rcu(); kip->cache->free(kip->insns); - kfree(kip); + call_rcu(&kip->rcu, free_kprobe_insn_page); } return 1; } ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (8 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu @ 2020-01-16 14:45 ` Masami Hiramatsu 2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu ` (2 subsequent siblings) 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Rewrite the kprobe_insn_cache implementation so that free_*insn_slot() does not acquire kprobe_insn_cache->mutex. This allows us to call it from a call_rcu() callback function. For this purpose, this introduces a flip-flop dirty generation in kprobe_insn_cache. When __free_insn_slot() is called, if a slot is dirty (meaning it can still be in use even after an RCU grace period), it is marked "current-generation" dirty. The garbage collector (kprobe_insn_cache_gc()) flips the generation bit, waits for a long enough safe period via synchronize_rcu_tasks(), and then collects the "previous-generation" dirty slots. As a result, it collects the dirty slots which were returned by __free_insn_slot() before the GC started waiting (and the dirty slots returned during the safe period are marked new-generation dirty.) Since the GC never runs concurrently with itself, we do not need more than two generations, so it flips the generation bit instead of counting it up. 
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- arch/powerpc/kernel/optprobes.c | 1 include/linux/kprobes.h | 2 kernel/kprobes.c | 172 ++++++++++++++++++++++----------------- 3 files changed, 96 insertions(+), 79 deletions(-) diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 024f7aad1952..8304f3814515 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -53,7 +53,6 @@ struct kprobe_insn_cache kprobe_ppc_optinsn_slots = { /* insn_size initialized later */ .alloc = __ppc_alloc_insn_page, .free = __ppc_free_insn_page, - .nr_garbage = 0, }; /* diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 0f832817fca3..1cd53b7b8409 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -244,7 +244,7 @@ struct kprobe_insn_cache { void (*free)(void *); /* free insn page */ struct list_head pages; /* list of kprobe_insn_page */ size_t insn_size; /* size of instruction slot */ - int nr_garbage; + int generation; /* dirty generation */ struct work_struct work; }; diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 60ffc9d54d87..5c12eb7fa8e1 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -90,8 +90,7 @@ struct kprobe_insn_page { struct rcu_head rcu; kprobe_opcode_t *insns; /* Page of instruction slots */ struct kprobe_insn_cache *cache; - int nused; - int ngarbage; + atomic_t nused; char slot_used[]; }; @@ -106,8 +105,9 @@ static int slots_per_page(struct kprobe_insn_cache *c) enum kprobe_slot_state { SLOT_CLEAN = 0, - SLOT_DIRTY = 1, - SLOT_USED = 2, + SLOT_USED = 1, + SLOT_DIRTY0 = 2, + SLOT_DIRTY1 = 3, }; void __weak *alloc_insn_page(void) @@ -126,7 +126,6 @@ struct kprobe_insn_cache kprobe_insn_slots = { .free = free_insn_page, .pages = LIST_HEAD_INIT(kprobe_insn_slots.pages), .insn_size = MAX_INSN_SIZE, - .nr_garbage = 0, .work = __WORK_INITIALIZER(kprobe_insn_slots.work, kprobe_insn_cache_gc), }; @@ -137,6 +136,22 @@ static void kick_kprobe_insn_cache_gc(struct 
kprobe_insn_cache *c) schedule_work(&c->work); } +static void *try_to_get_clean_slot(struct kprobe_insn_page *kip) +{ + struct kprobe_insn_cache *c = kip->cache; + int i; + + for (i = 0; i < slots_per_page(c); i++) { + if (kip->slot_used[i] == SLOT_CLEAN) { + kip->slot_used[i] = SLOT_USED; + atomic_inc(&kip->nused); + return kip->insns + (i * c->insn_size); + } + } + + return NULL; +} + /** * __get_insn_slot() - Find a slot on an executable page for an instruction. * We allocate an executable page if there's no room on existing ones. @@ -145,25 +160,20 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) { struct kprobe_insn_page *kip; kprobe_opcode_t *slot = NULL; + bool reclaimable = false; /* Since the slot array is not protected by rcu, we need a mutex */ mutex_lock(&c->mutex); list_for_each_entry(kip, &c->pages, list) { - if (kip->nused < slots_per_page(c)) { - int i; - for (i = 0; i < slots_per_page(c); i++) { - if (kip->slot_used[i] == SLOT_CLEAN) { - kip->slot_used[i] = SLOT_USED; - kip->nused++; - slot = kip->insns + (i * c->insn_size); - goto out; - } - } - /* kip->nused is broken. Fix it. */ - kip->nused = slots_per_page(c); - WARN_ON(1); + if (atomic_read(&kip->nused) < slots_per_page(c)) { + slot = try_to_get_clean_slot(kip); + if (slot) + goto out; + reclaimable = true; } } + if (reclaimable) + kick_kprobe_insn_cache_gc(c); /* All out of space. Need to allocate a new page. 
*/ kip = kmalloc(KPROBE_INSN_PAGE_SIZE(slots_per_page(c)), GFP_KERNEL); @@ -183,8 +193,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) INIT_LIST_HEAD(&kip->list); memset(kip->slot_used, SLOT_CLEAN, slots_per_page(c)); kip->slot_used[0] = SLOT_USED; - kip->nused = 1; - kip->ngarbage = 0; + atomic_set(&kip->nused, 1); kip->cache = c; list_add_rcu(&kip->list, &c->pages); slot = kip->insns; @@ -193,90 +202,106 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c) return slot; } -static void free_kprobe_insn_page(struct rcu_head *head) +static void free_kprobe_insn_page_cb(struct rcu_head *head) { struct kprobe_insn_page *kip = container_of(head, typeof(*kip), rcu); kfree(kip); } -/* Return 1 if all garbages are collected, otherwise 0. */ -static int collect_one_slot(struct kprobe_insn_page *kip, int idx) +static void free_kprobe_insn_page(struct kprobe_insn_page *kip) { - kip->slot_used[idx] = SLOT_CLEAN; - kip->nused--; - if (kip->nused == 0) { - /* - * Page is no longer in use. Free it unless - * it's the last one. We keep the last one - * so as not to have to set it up again the - * next time somebody inserts a probe. - */ - if (!list_is_singular(&kip->list)) { - list_del_rcu(&kip->list); - kip->cache->free(kip->insns); - call_rcu(&kip->rcu, free_kprobe_insn_page); - } - return 1; + if (WARN_ON_ONCE(atomic_read(&kip->nused) != 0)) + return; + /* + * Page is no longer in use. Free it unless + * it's the last one. We keep the last one + * so as not to have to set it up again the + * next time somebody inserts a probe. + */ + if (!list_is_singular(&kip->list)) { + list_del_rcu(&kip->list); + kip->cache->free(kip->insns); + call_rcu(&kip->rcu, free_kprobe_insn_page_cb); } - return 0; } void kprobe_insn_cache_gc(struct work_struct *work) { struct kprobe_insn_cache *c = container_of(work, typeof(*c), work); struct kprobe_insn_page *kip, *next; + int dirtygen = c->generation ? 
SLOT_DIRTY1 : SLOT_DIRTY0; + int i, nr; mutex_lock(&c->mutex); + + c->generation ^= 1; /* flip generation (0->1, 1->0) */ + + /* Make sure the generation update is shown in __free_insn_slot() */ + smp_wmb(); + /* Ensure no-one is running on the garbages. */ synchronize_rcu_tasks(); list_for_each_entry_safe(kip, next, &c->pages, list) { - int i; - if (kip->ngarbage == 0) - continue; - kip->ngarbage = 0; /* we will collect all garbages */ + nr = 0; + /* Reclaim previous generation dirty slots */ for (i = 0; i < slots_per_page(c); i++) { - if (kip->slot_used[i] == SLOT_DIRTY && - collect_one_slot(kip, i)) - break; + if (kip->slot_used[i] == dirtygen) + kip->slot_used[i] = SLOT_CLEAN; + else if (kip->slot_used[i] != SLOT_CLEAN) + nr++; } + if (!nr) + free_kprobe_insn_page(kip); } - c->nr_garbage = 0; mutex_unlock(&c->mutex); } +static struct kprobe_insn_page * +find_kprobe_insn_page(struct kprobe_insn_cache *c, unsigned long addr) +{ + struct kprobe_insn_page *kip; + + list_for_each_entry_rcu(kip, &c->pages, list) { + if (addr >= (unsigned long)kip->insns && + addr < (unsigned long)kip->insns + PAGE_SIZE) + return kip; + } + return NULL; +} + void __free_insn_slot(struct kprobe_insn_cache *c, kprobe_opcode_t *slot, int dirty) { struct kprobe_insn_page *kip; + int dirtygen; long idx; - mutex_lock(&c->mutex); - list_for_each_entry(kip, &c->pages, list) { + rcu_read_lock(); + kip = find_kprobe_insn_page(c, (unsigned long)slot); + if (kip) { idx = ((long)slot - (long)kip->insns) / (c->insn_size * sizeof(kprobe_opcode_t)); - if (idx >= 0 && idx < slots_per_page(c)) + /* Check double free */ + if (WARN_ON(kip->slot_used[idx] != SLOT_USED)) goto out; - } - /* Could not find this slot. */ - WARN_ON(1); - kip = NULL; + + /* Make sure to use new generation */ + smp_rmb(); + + dirtygen = c->generation ? 
SLOT_DIRTY1 : SLOT_DIRTY0; + if (dirty) + kip->slot_used[idx] = dirtygen; + else + kip->slot_used[idx] = SLOT_CLEAN; + + if (!atomic_dec_return(&kip->nused)) + kick_kprobe_insn_cache_gc(c); + } else + WARN_ON(1); /* Not found: what happen? */ out: - /* Mark and sweep: this may sleep */ - if (kip) { - /* Check double free */ - WARN_ON(kip->slot_used[idx] != SLOT_USED); - if (dirty) { - kip->slot_used[idx] = SLOT_DIRTY; - kip->ngarbage++; - if (++c->nr_garbage > slots_per_page(c)) - kick_kprobe_insn_cache_gc(c); - } else { - collect_one_slot(kip, idx); - } - } - mutex_unlock(&c->mutex); + rcu_read_unlock(); } /* @@ -286,17 +311,11 @@ void __free_insn_slot(struct kprobe_insn_cache *c, */ bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr) { - struct kprobe_insn_page *kip; bool ret = false; rcu_read_lock(); - list_for_each_entry_rcu(kip, &c->pages, list) { - if (addr >= (unsigned long)kip->insns && - addr < (unsigned long)kip->insns + PAGE_SIZE) { - ret = true; - break; - } - } + if (find_kprobe_insn_page(c, addr)) + ret = true; rcu_read_unlock(); return ret; @@ -310,7 +329,6 @@ struct kprobe_insn_cache kprobe_optinsn_slots = { .free = free_insn_page, .pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages), /* .insn_size is initialized later */ - .nr_garbage = 0, .work = __WORK_INITIALIZER(kprobe_optinsn_slots.work, kprobe_insn_cache_gc), }; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs 2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu ` (9 preceding siblings ...) 2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu @ 2020-01-16 14:46 ` Masami Hiramatsu 2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu 2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu 12 siblings, 0 replies; 26+ messages in thread From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw) To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel, Naveen N . Rao, Anil S Keshavamurthy Add asynchronous unregistration APIs for kprobes and kretprobes. These APIs can accelerate the unregistration of multiple probes because the user does not need to wait for RCU synchronization. However, callers must take care of the following notes. - If you want to synchronize unregistration (for example, to make sure all handlers have finished running), you have to call synchronize_rcu() once at the end. - If you need to free objects related to the kprobes, you can pass a callback, but that callback must call kprobe_free_callback() or kretprobe_free_callback() first. Since it is easy to shoot yourself in the foot, at this moment I don't export these APIs to modules. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> --- include/linux/kprobes.h | 9 ++++++++ kernel/kprobes.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 1cd53b7b8409..f892c3a11dac 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -98,6 +98,9 @@ struct kprobe { * Protected by kprobe_mutex after this kprobe is registered. 
*/ u32 flags; + + /* For asynchronous unregistration callback */ + struct rcu_head rcu; }; /* Kprobe status flags */ @@ -364,6 +367,12 @@ void unregister_kretprobe(struct kretprobe *rp); int register_kretprobes(struct kretprobe **rps, int num); void unregister_kretprobes(struct kretprobe **rps, int num); +/* Async unregister APIs (Do not wait for rcu sync) */ +void kprobe_free_callback(struct rcu_head *head); +void kretprobe_free_callback(struct rcu_head *head); +void unregister_kprobe_async(struct kprobe *kp, rcu_callback_t free_cb); +void unregister_kretprobe_async(struct kretprobe *kp, rcu_callback_t free_cb); + void kprobe_flush_task(struct task_struct *tk); void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head); diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 5c12eb7fa8e1..ab57c22b64f9 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1887,6 +1887,31 @@ void unregister_kprobes(struct kprobe **kps, int num) } EXPORT_SYMBOL_GPL(unregister_kprobes); +void kprobe_free_callback(struct rcu_head *head) +{ + struct kprobe *kp = container_of(head, struct kprobe, rcu); + + __unregister_kprobe_bottom(kp); +} + +/* + * If you call this function, you must call kprobe_free_callback() at first + * in your free_cb(), or set free_cb = NULL. 
+ */ +void unregister_kprobe_async(struct kprobe *kp, rcu_callback_t free_cb) +{ + mutex_lock(&kprobe_mutex); + if (__unregister_kprobe_top(kp) < 0) + kp->addr = NULL; + mutex_unlock(&kprobe_mutex); + + if (!kp->addr) + return; + if (!free_cb) + free_cb = kprobe_free_callback; + call_rcu(&kp->rcu, free_cb); +} + int __weak kprobe_exceptions_notify(struct notifier_block *self, unsigned long val, void *data) { @@ -2080,6 +2105,29 @@ void unregister_kretprobes(struct kretprobe **rps, int num) } EXPORT_SYMBOL_GPL(unregister_kretprobes); +void kretprobe_free_callback(struct rcu_head *head) +{ + struct kprobe *kp = container_of(head, struct kprobe, rcu); + struct kretprobe *rp = container_of(kp, struct kretprobe, kp); + + __unregister_kprobe_bottom(kp); + cleanup_rp_inst(rp); +} + +void unregister_kretprobe_async(struct kretprobe *rp, rcu_callback_t free_cb) +{ + mutex_lock(&kprobe_mutex); + if (__unregister_kprobe_top(&rp->kp) < 0) + rp->kp.addr = NULL; + mutex_unlock(&kprobe_mutex); + + if (!rp->kp.addr) + return; + if (!free_cb) + free_cb = kretprobe_free_callback; + call_rcu(&rp->kp.rcu, free_cb); +} + #else /* CONFIG_KRETPROBES */ int register_kretprobe(struct kretprobe *rp) { @@ -2109,6 +2157,14 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs) } NOKPROBE_SYMBOL(pre_handler_kretprobe); +void kretprobe_free_callback(struct rcu_head *head) +{ +} + +void unregister_kretprobe_async(struct kretprobe *rp, rcu_callback_t free_cb) +{ +} + #endif /* CONFIG_KRETPROBES */ /* Set the kprobe gone and remove its instruction buffer. */ ^ permalink raw reply related [flat|nested] 26+ messages in thread
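Under the rules above, an in-kernel caller that embeds a kprobe in a larger object would tear down a batch of probes roughly as follows. This is a hypothetical sketch, not code from the series: `struct my_probe`, `my_free_cb()`, and `teardown()` are illustrative names, and note that these APIs are deliberately not exported to modules.

```c
/* Hypothetical caller of the new async APIs (names are illustrative). */
struct my_probe {
	struct kprobe kp;
	char *buf;		/* some data owned by this wrapper */
};

/* Per the rule above, the callback must run kprobe_free_callback() first. */
static void my_free_cb(struct rcu_head *head)
{
	struct kprobe *kp = container_of(head, struct kprobe, rcu);
	struct my_probe *mp = container_of(kp, struct my_probe, kp);

	kprobe_free_callback(head);	/* runs __unregister_kprobe_bottom() */
	kfree(mp->buf);
	kfree(mp);
}

static void teardown(struct my_probe **probes, int num)
{
	int i;

	/* Queue all unregistrations; no per-probe grace-period wait. */
	for (i = 0; i < num; i++)
		unregister_kprobe_async(&probes[i]->kp, my_free_cb);

	/* One wait at the end, only if we must know handlers have finished. */
	synchronize_rcu();
}
```

The point of the design is that the O(n) loop no longer contains any grace-period wait; the single synchronize_rcu() at the end amortizes that cost over the whole batch.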
* [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                     ` (10 preceding siblings ...)
  2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
@ 2020-01-16 14:46 ` Masami Hiramatsu
  2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
      Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel,
      Naveen N. Rao, Anil S Keshavamurthy

Free each probe event data structure asynchronously when deleting probe
events. Note that the deletion path still synchronizes RCU once at the
end, so that we can make sure all event handlers have finished.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_dynevent.c |    5 ++++
 kernel/trace/trace_kprobe.c   |   46 +++++++++++++++++++++++++++++++++++------
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index 89779eb84a07..2d5e8d457309 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -70,6 +70,9 @@ int dyn_event_release(int argc, char **argv, struct dyn_event_operations *type)
		if (ret)
			break;
	}
+
+	/* Wait for running events because of async event unregistration */
+	synchronize_rcu();
	mutex_unlock(&event_mutex);
 
	return ret;
@@ -164,6 +167,8 @@ int dyn_events_release_all(struct dyn_event_operations *type)
		if (ret)
			break;
	}
+	/* Wait for running events because of async event unregistration */
+	synchronize_rcu();
 out:
	mutex_unlock(&event_mutex);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 906af1ffe2b2..f7e0370b10ae 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -551,7 +551,35 @@ static void __unregister_trace_kprobe(struct trace_kprobe *tk)
	}
 }
 
-/* Unregister a trace_probe and probe_event */
+static void free_trace_kprobe_cb(struct rcu_head *head)
+{
+	struct kprobe *kp = container_of(head, struct kprobe, rcu);
+	struct kretprobe *rp = container_of(kp, struct kretprobe, kp);
+	struct trace_kprobe *tk = container_of(rp, struct trace_kprobe, rp);
+
+	if (trace_kprobe_is_return(tk))
+		kretprobe_free_callback(head);
+	else
+		kprobe_free_callback(head);
+	free_trace_kprobe(tk);
+}
+
+static void __unregister_trace_kprobe_async(struct trace_kprobe *tk)
+{
+	if (trace_kprobe_is_registered(tk)) {
+		if (trace_kprobe_is_return(tk))
+			unregister_kretprobe_async(&tk->rp,
+						   free_trace_kprobe_cb);
+		else
+			unregister_kprobe_async(&tk->rp.kp,
+						free_trace_kprobe_cb);
+	}
+}
+
+/*
+ * Unregister a trace_probe and probe_event asynchronously.
+ * Caller must wait for RCU.
+ */
 static int unregister_trace_kprobe(struct trace_kprobe *tk)
 {
	/* If other probes are on the event, just unregister kprobe */
@@ -570,9 +598,17 @@ static int unregister_trace_kprobe(struct trace_kprobe *tk)
		return -EBUSY;
 
 unreg:
-	__unregister_trace_kprobe(tk);
	dyn_event_remove(&tk->devent);
+	/*
+	 * This trace_probe_unlink() can free the trace_event_call linked to
+	 * this probe.
+	 * We can do this before unregistering because this probe is
+	 * already disabled and the disabling process waits long enough
+	 * for all handlers to finish. IOW, the disabling process must wait
+	 * for RCU sync at least once before returning to its caller.
+	 */
	trace_probe_unlink(&tk->tp);
+	__unregister_trace_kprobe_async(tk);
 
	return 0;
 }
@@ -928,11 +964,7 @@ static int create_or_delete_trace_kprobe(int argc, char **argv)
 static int trace_kprobe_release(struct dyn_event *ev)
 {
	struct trace_kprobe *tk = to_trace_kprobe(ev);
-	int ret = unregister_trace_kprobe(tk);
-
-	if (!ret)
-		free_trace_kprobe(tk);
-	return ret;
+	return unregister_trace_kprobe(tk);
 }
 
 static int trace_kprobe_show(struct seq_file *m, struct dyn_event *ev)

^ permalink raw reply related	[flat|nested] 26+ messages in thread
* [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe event asynchronously
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                     ` (11 preceding siblings ...)
  2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
@ 2020-01-16 14:46 ` Masami Hiramatsu
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
      Arnaldo Carvalho de Melo, David S. Miller, paulmck, joel,
      Naveen N. Rao, Anil S Keshavamurthy

Remove the local kprobe event asynchronously. Note that this removes
the kprobe_event part asynchronously, but perf_event still needs to
wait until all handlers have finished before removing the local kprobe
event. So from the perf_event (and eBPF) point of view, this shortens
the trace termination process a bit, but it still takes O(n) time to
finish. To fix that, the perf_event termination process would need to
decouple "disable events" from "destroy events", as ftrace does.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_kprobe.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index f7e0370b10ae..e8c4828c21ae 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1707,9 +1707,7 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
		return;
	}
 
-	__unregister_trace_kprobe(tk);
-
-	free_trace_kprobe(tk);
+	__unregister_trace_kprobe_async(tk);
 }
 #endif /* CONFIG_PERF_EVENTS */

^ permalink raw reply related	[flat|nested] 26+ messages in thread
end of thread, other threads:[~2020-01-28 15:02 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
2020-01-22  0:29   ` Steven Rostedt
2020-01-22  7:23     ` Masami Hiramatsu
2020-01-22 12:11       ` Steven Rostedt
2020-01-22 13:12         ` Masami Hiramatsu
2020-01-22 16:54   ` Paul E. McKenney
2020-01-23  1:33     ` Masami Hiramatsu
2020-01-23  2:26       ` Paul E. McKenney
2020-01-23  6:13         ` Masami Hiramatsu
2020-01-23 10:03           ` Paul E. McKenney
2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
2020-01-27 15:02   ` kbuild test robot
2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
2020-01-28 15:02     ` Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu