BPF Archive on lore.kernel.org
* [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration
@ 2020-01-16 14:44 Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
                   ` (12 more replies)
  0 siblings, 13 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Hello,

Here is an experimental series of patches for asynchronous kprobe
event disabling and unregistration. This fixes the performance
issue (from the operator's viewpoint) of disabling/removing
kprobe events.
This series is still in the testing phase, so please try it,
especially with eBPF. You can download the series from here:

  https://github.com/mhiramat/linux.git ftrace-nowait


Background
==========

At the last Plumbers conference, Brendan showed me a demo in
which a BPF tracer took too long (more than 30 seconds?) to exit
from tracing. I had seen a similar issue with systemtap 10 years
ago and fixed it with an array-based batch API (see commit
9861668f7478 ("kprobes: add (un)register_kprobes for batch
registration")). I suspected a similar issue here.


Investigation
=============

While checking the current ftrace-side performance with 256
kprobes (without ftrace-based kprobes) [1], I found another
issue.

----
Probes: 256 kprobe_events
Enable events
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
Disable events
real	0m 21.40s
user	0m 0.00s
sys	0m 0.02s
Remove events
real	0m 2.24s
user	0m 0.00s
sys	0m 0.01s
----

Removing 256 probe events took more than 2 seconds, but
disabling them took 21 seconds, 10 times longer than removing.
Moreover, since perf-events (the base of the BPF tracer) does
the disable and the remove at once, it will take even longer.

I also measured it without kprobe jump optimization
(echo 0 > /proc/sys/debug/kprobe-optimization), which changed
the results as below.

----
Probes: 256 kprobe_events
Enable events
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
Disable events
real	0m 2.07s
user	0m 0.00s
sys	0m 0.04s
Remove events
real	0m 2.13s
user	0m 0.00s
sys	0m 0.01s
----

I checked the kprobes and tracing code to see what happened, and
found that the disabling path can end up waiting on the mutex
held for unoptimization.

Disabling an optimized kprobe is done without any RCU grace
period at the kprobes API level, but it kicks the unoptimizer,
which runs later from a delayed workqueue.
On the other hand, since ftrace needs to remove the disabled
probe from an event (which is on an RCU-protected list), it
waits for an RCU grace period before freeing the event.
Thus, if we disable multiple probes at once, each iteration
(1) disables a kprobe, (2) kicks the (un)optimizer, (3) removes
the event from the RCU list, and (4) waits for an RCU grace
period. During (4), if the (un)optimizer starts running, the
next iteration's step (1) stalls because kprobe_mutex is
already held by the (un)optimizer.

For kprobe unoptimization, the (un)optimizer takes the global
kprobe_mutex and, for safety, waits until some task has been
scheduled on every CPU, which takes a while.

So if we repeat the above process, the disabling takes a long
time (in fact, it took 21 seconds).

The root cause is that the optimizer's delay period is too short
for this: at the moment it waits only 5 jiffies, but it should
be long enough to cover the whole batch disabling process.

Also, since the batch disabling process involves many RCU
grace-period waits, it takes a long time from the operator's
viewpoint; in the experiment above, disabling 256 probes took
around 2 seconds even without jump optimization. These
synchronous waits can be replaced with call_rcu() and
workqueues.


Improvements
============

This series is a trial at solving these issues by making the
heavy operations asynchronous: I removed the redundant RCU
synchronizations from kprobes and kprobe events by using
asynchronous APIs.

Here is the result.

-----
Probes: 256 kprobe_events
Enable events
real	0m 0.03s
user	0m 0.00s
sys	0m 0.02s
Disable events
real	0m 0.05s
user	0m 0.00s
sys	0m 0.01s
Remove events
real	0m 0.03s
user	0m 0.00s
sys	0m 0.01s
-----

Note that these results are on the ftrace side; bpf will still
need some time to stop tracing, because perf_event has no
independent event-disabling operation.
ftrace stops tracing in 2 operations, disable events and remove
events, so it is easy to wait for an RCU grace period between
them. On the other hand, a perf event has only a "destroy"
operation, which disables and removes the event at once.
To do that safely, in the destroy operation,
perf_trace_event_unreg() calls tracepoint_synchronize_unregister(),
which is an alias of synchronize_rcu(), for each event.

Thus I guess eBPF still takes O(n) to stop tracing, but with
this series it might take only about 1/10 of the time it did
before.


Patches
=======

The first 2 patches are a minor fix and a cleanup; I will send
those as a separate series.

Patches [3/13] and [4/13] postpone (and lengthen the delay of)
the kprobe unoptimizing process. This minimizes kprobe_mutex
contention with the disabling/unregistering process.

[5/13] modifies the ftrace side to free unused event_file_link
structures via call_rcu().

[6/13] enables the kprobe-booster on preemptible kernels as
preparation. The purpose of this is to replace synchronize_rcu()
with synchronize_rcu_tasks().

[7/13]-[10/13] make the instruction cache reclaiming process
asynchronous.

[11/13] adds asynchronous unregistration APIs for kprobes, and
[12/13] and [13/13] use those APIs from the ftrace side.


Thank you,


[1] Here is the test script

------------------
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
# description: Register/unregister many kprobe events

cd /sys/kernel/debug/tracing

# ftrace fentry skip size depends on the machine architecture.
# Currently HAVE_KPROBES_ON_FTRACE is defined on x86 and powerpc64le
case `uname -m` in
  x86_64|i[3456]86) OFFS=5;;
  ppc64le) OFFS=8;;
  *) OFFS=0;;
esac

N=0
echo "Set up kprobes on the first available 256 text symbols"
grep -i " t " /proc/kallsyms | cut -f3 -d" " | grep -v .*\\..* | \
while read i; do
  echo p ${i}+${OFFS} >> kprobe_events && N=$((N+1)) &> /dev/null ||:
  test $N -eq 256 && break
done


echo -n "Probes: " ; wc -l kprobe_events
echo "Enable events"
echo 1 | time tee events/kprobes/enable > /dev/null
sleep 2
echo "Disable events"
echo 0 | time tee events/kprobes/enable > /dev/null
sleep 2
echo "Remove events"
echo | time tee kprobe_events > /dev/null
------------------

---

Masami Hiramatsu (13):
      kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
      kprobes: Remove redundant arch_disarm_kprobe() call
      kprobes: Postpone optimizer until a bunch of probes (un)registered
      kprobes: Make optimizer delay to 1 second
      tracing/kprobe: Use call_rcu to defer freeing event_file_link
      kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y
      kprobes: Use normal list traversal API if a mutex is held
      kprobes: Use workqueue for reclaiming kprobe insn cache pages
      kprobes: Free kprobe_insn_page asynchronously
      kprobes: Make free_*insn_slot() mutex-less
      kprobes: Add asynchronous unregistration APIs
      tracing/kprobe: Free probe event asynchronously
      tracing/kprobe: perf_event: Remove local kprobe event asynchronously


 arch/Kconfig                    |    4 +
 arch/ia64/Kconfig               |    1 
 arch/ia64/kernel/kprobes.c      |    3 
 arch/powerpc/kernel/optprobes.c |    1 
 arch/x86/Kconfig                |    1 
 arch/x86/kernel/kprobes/core.c  |    2 
 include/linux/kprobes.h         |   13 ++
 kernel/kprobes.c                |  286 ++++++++++++++++++++++++++-------------
 kernel/trace/trace_dynevent.c   |    5 +
 kernel/trace/trace_kprobe.c     |   85 +++++++++---
 kernel/trace/trace_probe.c      |   10 +
 kernel/trace/trace_probe.h      |    1 
 12 files changed, 293 insertions(+), 119 deletions(-)

--
Masami Hiramatsu (Linaro) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
@ 2020-01-16 14:44 ` Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

In kprobe_optimizer(), kick_kprobe_optimizer() is called
without holding kprobe_mutex, which can race with other
callers that are protected by kprobe_mutex.

To fix this, expand the kprobe_mutex-protected area to cover
the kick_kprobe_optimizer() call.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 5a664f995377..52b05ab9c323 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -592,11 +592,12 @@ static void kprobe_optimizer(struct work_struct *work)
 	mutex_unlock(&module_mutex);
 	mutex_unlock(&text_mutex);
 	cpus_read_unlock();
-	mutex_unlock(&kprobe_mutex);
 
 	/* Step 5: Kick optimizer again if needed */
 	if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
 		kick_kprobe_optimizer();
+
+	mutex_unlock(&kprobe_mutex);
 }
 
 /* Wait for completing optimization and unoptimization */



* [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
@ 2020-01-16 14:44 ` Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Remove the redundant arch_disarm_kprobe() call in
force_unoptimize_kprobe(). This arch_disarm_kprobe() runs
if the kprobe is optimized but disabled, but that means the
kprobe (optprobe) is in the unused (unoptimizing) state.
In that case, unoptimize_kprobe() has already put it on
freeing_list, and kprobe_optimizer() automatically disarms
it. So this arch_disarm_kprobe() is redundant.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 52b05ab9c323..a2c755e79be7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -674,8 +674,6 @@ static void force_unoptimize_kprobe(struct optimized_kprobe *op)
 	lockdep_assert_cpus_held();
 	arch_unoptimize_kprobe(op);
 	op->kp.flags &= ~KPROBE_FLAG_OPTIMIZED;
-	if (kprobe_disabled(&op->kp))
-		arch_disarm_kprobe(&op->kp);
 }
 
 /* Unoptimize a kprobe if p is optimized */



* [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
@ 2020-01-16 14:44 ` Masami Hiramatsu
  2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Add a counter to kick_kprobe_optimizer() to detect additional
kprobe registrations/unregistrations and postpone
kprobe_optimizer() until a bunch of probes have been
(un)registered.

This may shorten the long wait in the unregistration process
for a bunch of kprobes.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a2c755e79be7..0dacdcecc90f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -465,6 +465,7 @@ static struct kprobe *get_optimized_kprobe(unsigned long addr)
 static LIST_HEAD(optimizing_list);
 static LIST_HEAD(unoptimizing_list);
 static LIST_HEAD(freeing_list);
+static int kprobe_optimizer_queue_update;
 
 static void kprobe_optimizer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
@@ -555,12 +556,22 @@ static void do_free_cleaned_kprobes(void)
 static void kick_kprobe_optimizer(void)
 {
 	schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+	kprobe_optimizer_queue_update++;
 }
 
 /* Kprobe jump optimizer */
 static void kprobe_optimizer(struct work_struct *work)
 {
 	mutex_lock(&kprobe_mutex);
+
+	/*
+	 * If new kprobe is queued in optimized/unoptimized list while
+	 * OPTIMIZE_DELAY waiting period, wait again for a series of
+	 * probes registration/unregistrations.
+	 */
+	if (kprobe_optimizer_queue_update > 1)
+		goto end;
+
 	cpus_read_lock();
 	mutex_lock(&text_mutex);
 	/* Lock modules while optimizing kprobes */
@@ -593,6 +604,8 @@ static void kprobe_optimizer(struct work_struct *work)
 	mutex_unlock(&text_mutex);
 	cpus_read_unlock();
 
+end:
+	kprobe_optimizer_queue_update = 0;
 	/* Step 5: Kick optimizer again if needed */
 	if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
 		kick_kprobe_optimizer();



* [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (2 preceding siblings ...)
  2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
@ 2020-01-16 14:44 ` Masami Hiramatsu
  2020-01-22  0:29   ` Steven Rostedt
  2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:44 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Since the 5-jiffy delay for the optimizer is too short to
wait for other probes, make it longer: 1 second.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 0dacdcecc90f..9c6e230852ad 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
 
 static void kprobe_optimizer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
-#define OPTIMIZE_DELAY 5
+/* Wait 1 second for starting optimization */
+#define OPTIMIZE_DELAY HZ
 
 /*
  * Optimize (replace a breakpoint with a jump) kprobes listed on



* [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (3 preceding siblings ...)
  2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-27 15:02   ` kbuild test robot
  2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
  2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
                   ` (7 subsequent siblings)
  12 siblings, 2 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Use call_rcu() to defer freeing the event_file_link data
structure. This removes the RCU synchronization from the
per-probe event disabling operation.

Since unregistering a kprobe event requires all handlers to
be disabled and finished, this also introduces a gatekeeper
to ensure that. If there is any disabled event that has not
finished, the unregister process synchronizes RCU once (IOW,
it may sleep for a while).

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_kprobe.c |   35 +++++++++++++++++++++++++++++------
 kernel/trace/trace_probe.c  |   10 ++++++++--
 kernel/trace/trace_probe.h  |    1 +
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index cbdc4f4e64c7..906af1ffe2b2 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -328,10 +328,25 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
 	return ret;
 }
 
+atomic_t trace_kprobe_disabled_finished;
+
+static void trace_kprobe_disabled_handlers_finish(void)
+{
+	if (atomic_read(&trace_kprobe_disabled_finished))
+		synchronize_rcu();
+}
+
+static void trace_kprobe_disabled_finish_cb(struct rcu_head *head)
+{
+	atomic_dec(&trace_kprobe_disabled_finished);
+	kfree(head);
+}
+
 static void __disable_trace_kprobe(struct trace_probe *tp)
 {
 	struct trace_probe *pos;
 	struct trace_kprobe *tk;
+	struct rcu_head *head;
 
 	list_for_each_entry(pos, trace_probe_probe_list(tp), list) {
 		tk = container_of(pos, struct trace_kprobe, tp);
@@ -342,6 +357,13 @@ static void __disable_trace_kprobe(struct trace_probe *tp)
 		else
 			disable_kprobe(&tk->rp.kp);
 	}
+
+	/* Handler exit gatekeeper */
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (WARN_ON(!head))
+		return;
+	atomic_inc(&trace_kprobe_disabled_finished);
+	call_rcu(head, trace_kprobe_disabled_finish_cb);
 }
 
 /*
@@ -422,13 +444,11 @@ static int disable_trace_kprobe(struct trace_event_call *call,
 
  out:
 	if (file)
-		/*
-		 * Synchronization is done in below function. For perf event,
-		 * file == NULL and perf_trace_event_unreg() calls
-		 * tracepoint_synchronize_unregister() to ensure synchronize
-		 * event. We don't need to care about it.
-		 */
 		trace_probe_remove_file(tp, file);
+	/*
+	 * We have no RCU synchronization here. Caller must wait for the
+	 * completion of disabling.
+	 */
 
 	return 0;
 }
@@ -542,6 +562,9 @@ static int unregister_trace_kprobe(struct trace_kprobe *tk)
 	if (trace_probe_is_enabled(&tk->tp))
 		return -EBUSY;
 
+	/* Make sure all disabled trace_kprobe handlers finished */
+	trace_kprobe_disabled_handlers_finish();
+
 	/* Will fail if probe is being used by ftrace or perf */
 	if (unregister_kprobe_event(tk))
 		return -EBUSY;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 905b10af5d5c..b18df8e1b2d6 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1067,6 +1067,13 @@ struct event_file_link *trace_probe_get_file_link(struct trace_probe *tp,
 	return NULL;
 }
 
+static void event_file_link_free_cb(struct rcu_head *head)
+{
+	struct event_file_link *link = container_of(head, typeof(*link), rcu);
+
+	kfree(link);
+}
+
 int trace_probe_remove_file(struct trace_probe *tp,
 			    struct trace_event_file *file)
 {
@@ -1077,8 +1084,7 @@ int trace_probe_remove_file(struct trace_probe *tp,
 		return -ENOENT;
 
 	list_del_rcu(&link->list);
-	synchronize_rcu();
-	kfree(link);
+	call_rcu(&link->rcu, event_file_link_free_cb);
 
 	if (list_empty(&tp->event->files))
 		trace_probe_clear_flag(tp, TP_FLAG_TRACE);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 4ee703728aec..71ac01a50815 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -243,6 +243,7 @@ struct trace_probe {
 struct event_file_link {
 	struct trace_event_file		*file;
 	struct list_head		list;
+	struct rcu_head			rcu;
 };
 
 static inline bool trace_probe_test_flag(struct trace_probe *tp,



* [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (4 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

As we did in commit a30b85df7d59 ("kprobes: Use synchronize_rcu_tasks()
for optprobe with CONFIG_PREEMPT=y"), we can also enable the
kprobe-booster, which depends on a trampoline execution buffer
just like optprobes. Before releasing the trampoline buffer
(kprobe_insn_page), the garbage collector waits for all
potentially preempted tasks on the trampoline buffer using
synchronize_rcu_tasks() instead of synchronize_rcu().

This requires CONFIG_TASKS_RCU=y too, so this also introduces
HAVE_KPROBES_BOOSTER for the archs which support the
kprobe-booster (currently only x86 and ia64.)

If both CONFIG_PREEMPTION and HAVE_KPROBES_BOOSTER are y,
CONFIG_KPROBES selects CONFIG_TASKS_RCU=y.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/Kconfig                   |    4 ++++
 arch/ia64/Kconfig              |    1 +
 arch/ia64/kernel/kprobes.c     |    3 +--
 arch/x86/Kconfig               |    1 +
 arch/x86/kernel/kprobes/core.c |    2 --
 kernel/kprobes.c               |    4 ++--
 6 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 48b5e103bdb0..ead87084c8bf 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -64,6 +64,7 @@ config KPROBES
 	depends on MODULES
 	depends on HAVE_KPROBES
 	select KALLSYMS
+	select TASKS_RCU if PREEMPTION && HAVE_KPROBES_BOOSTER
 	help
 	  Kprobes allows you to trap at almost any kernel address and
 	  execute a callback function.  register_kprobe() establishes
@@ -189,6 +190,9 @@ config HAVE_KPROBES
 config HAVE_KRETPROBES
 	bool
 
+config HAVE_KPROBES_BOOSTER
+	bool
+
 config HAVE_OPTPROBES
 	bool
 
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index bab7cd878464..341f9ca8a745 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -25,6 +25,7 @@ config IA64
 	select HAVE_IDE
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select HAVE_KPROBES_BOOSTER
 	select HAVE_KRETPROBES
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_DYNAMIC_FTRACE if (!ITANIUM)
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index a6d6a0556f08..1680a10c9f49 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -841,7 +841,6 @@ static int __kprobes pre_kprobes_handler(struct die_args *args)
 		return 1;
 	}
 
-#if !defined(CONFIG_PREEMPTION)
 	if (p->ainsn.inst_flag == INST_FLAG_BOOSTABLE && !p->post_handler) {
 		/* Boost up -- we can execute copied instructions directly */
 		ia64_psr(regs)->ri = p->ainsn.slot;
@@ -853,7 +852,7 @@ static int __kprobes pre_kprobes_handler(struct die_args *args)
 		preempt_enable_no_resched();
 		return 1;
 	}
-#endif
+
 	prepare_ss(p, regs);
 	kcb->kprobe_status = KPROBE_HIT_SS;
 	return 1;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index e5800e52a59a..d509578d824b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,6 +181,7 @@ config X86
 	select HAVE_KERNEL_LZO
 	select HAVE_KERNEL_XZ
 	select HAVE_KPROBES
+	select HAVE_KPROBES_BOOSTER
 	select HAVE_KPROBES_ON_FTRACE
 	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_KRETPROBES
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d7022a740ab..7aba45037885 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -587,7 +587,6 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 	if (setup_detour_execution(p, regs, reenter))
 		return;
 
-#if !defined(CONFIG_PREEMPTION)
 	if (p->ainsn.boostable && !p->post_handler) {
 		/* Boost up -- we can execute copied instructions directly */
 		if (!reenter)
@@ -600,7 +599,6 @@ static void setup_singlestep(struct kprobe *p, struct pt_regs *regs,
 		regs->ip = (unsigned long)p->ainsn.insn;
 		return;
 	}
-#endif
 	if (reenter) {
 		save_previous_kprobe(kcb);
 		set_current_kprobe(p, regs, kcb);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 9c6e230852ad..848c14e92ccc 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -220,8 +220,8 @@ static int collect_garbage_slots(struct kprobe_insn_cache *c)
 {
 	struct kprobe_insn_page *kip, *next;
 
-	/* Ensure no-one is interrupted on the garbages */
-	synchronize_rcu();
+	/* Ensure no-one is running on the garbages. */
+	synchronize_rcu_tasks();
 
 	list_for_each_entry_safe(kip, next, &c->pages, list) {
 		int i;



* [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (5 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Use the normal list traversal API instead of the
rcu_read_lock()/rcu_read_unlock() pair around RCU list
traversal in the kprobe_insn_cache methods, where the mutex
protecting the list is already held.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |    9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 848c14e92ccc..09b0e33bc845 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -141,8 +141,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	/* Since the slot array is not protected by rcu, we need a mutex */
 	mutex_lock(&c->mutex);
  retry:
-	rcu_read_lock();
-	list_for_each_entry_rcu(kip, &c->pages, list) {
+	list_for_each_entry(kip, &c->pages, list) {
 		if (kip->nused < slots_per_page(c)) {
 			int i;
 			for (i = 0; i < slots_per_page(c); i++) {
@@ -150,7 +149,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 					kip->slot_used[i] = SLOT_USED;
 					kip->nused++;
 					slot = kip->insns + (i * c->insn_size);
-					rcu_read_unlock();
 					goto out;
 				}
 			}
@@ -159,7 +157,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 			WARN_ON(1);
 		}
 	}
-	rcu_read_unlock();
 
 	/* If there are any garbage slots, collect it and try again. */
 	if (c->nr_garbage && collect_garbage_slots(c) == 0)
@@ -244,8 +241,7 @@ void __free_insn_slot(struct kprobe_insn_cache *c,
 	long idx;
 
 	mutex_lock(&c->mutex);
-	rcu_read_lock();
-	list_for_each_entry_rcu(kip, &c->pages, list) {
+	list_for_each_entry(kip, &c->pages, list) {
 		idx = ((long)slot - (long)kip->insns) /
 			(c->insn_size * sizeof(kprobe_opcode_t));
 		if (idx >= 0 && idx < slots_per_page(c))
@@ -255,7 +251,6 @@ void __free_insn_slot(struct kprobe_insn_cache *c,
 	WARN_ON(1);
 	kip = NULL;
 out:
-	rcu_read_unlock();
 	/* Mark and sweep: this may sleep */
 	if (kip) {
 		/* Check double free */



* [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (6 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Use a workqueue for reclaiming kprobe insn cache pages. This
splits the heaviest part out of the unregistration process.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/linux/kprobes.h |    2 ++
 kernel/kprobes.c        |   29 ++++++++++++++++++-----------
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 04bdaf01112c..0f832817fca3 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -245,6 +245,7 @@ struct kprobe_insn_cache {
 	struct list_head pages; /* list of kprobe_insn_page */
 	size_t insn_size;	/* size of instruction slot */
 	int nr_garbage;
+	struct work_struct work;
 };
 
 #ifdef __ARCH_WANT_KPROBES_INSN_SLOT
@@ -254,6 +255,7 @@ extern void __free_insn_slot(struct kprobe_insn_cache *c,
 /* sleep-less address checking routine  */
 extern bool __is_insn_slot_addr(struct kprobe_insn_cache *c,
 				unsigned long addr);
+void kprobe_insn_cache_gc(struct work_struct *work);
 
 #define DEFINE_INSN_CACHE_OPS(__name)					\
 extern struct kprobe_insn_cache kprobe_##__name##_slots;		\
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 09b0e33bc845..a9114923da4c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -126,8 +126,15 @@ struct kprobe_insn_cache kprobe_insn_slots = {
 	.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
 	.insn_size = MAX_INSN_SIZE,
 	.nr_garbage = 0,
+	.work = __WORK_INITIALIZER(kprobe_insn_slots.work,
+				   kprobe_insn_cache_gc),
 };
-static int collect_garbage_slots(struct kprobe_insn_cache *c);
+
+static void kick_kprobe_insn_cache_gc(struct kprobe_insn_cache *c)
+{
+	if (!work_pending(&c->work))
+		schedule_work(&c->work);
+}
 
 /**
  * __get_insn_slot() - Find a slot on an executable page for an instruction.
@@ -140,7 +147,6 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 
 	/* Since the slot array is not protected by rcu, we need a mutex */
 	mutex_lock(&c->mutex);
- retry:
 	list_for_each_entry(kip, &c->pages, list) {
 		if (kip->nused < slots_per_page(c)) {
 			int i;
@@ -158,11 +164,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 		}
 	}
 
-	/* If there are any garbage slots, collect it and try again. */
-	if (c->nr_garbage && collect_garbage_slots(c) == 0)
-		goto retry;
-
-	/* All out of space.  Need to allocate a new page. */
+	/* All out of space. Need to allocate a new page. */
 	kip = kmalloc(KPROBE_INSN_PAGE_SIZE(slots_per_page(c)), GFP_KERNEL);
 	if (!kip)
 		goto out;
@@ -213,10 +215,12 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
 	return 0;
 }
 
-static int collect_garbage_slots(struct kprobe_insn_cache *c)
+void kprobe_insn_cache_gc(struct work_struct *work)
 {
+	struct kprobe_insn_cache *c = container_of(work, typeof(*c), work);
 	struct kprobe_insn_page *kip, *next;
 
+	mutex_lock(&c->mutex);
 	/* Ensure no-one is running on the garbages. */
 	synchronize_rcu_tasks();
 
@@ -226,12 +230,13 @@ static int collect_garbage_slots(struct kprobe_insn_cache *c)
 			continue;
 		kip->ngarbage = 0;	/* we will collect all garbages */
 		for (i = 0; i < slots_per_page(c); i++) {
-			if (kip->slot_used[i] == SLOT_DIRTY && collect_one_slot(kip, i))
+			if (kip->slot_used[i] == SLOT_DIRTY &&
+			    collect_one_slot(kip, i))
 				break;
 		}
 	}
 	c->nr_garbage = 0;
-	return 0;
+	mutex_unlock(&c->mutex);
 }
 
 void __free_insn_slot(struct kprobe_insn_cache *c,
@@ -259,7 +264,7 @@ void __free_insn_slot(struct kprobe_insn_cache *c,
 			kip->slot_used[idx] = SLOT_DIRTY;
 			kip->ngarbage++;
 			if (++c->nr_garbage > slots_per_page(c))
-				collect_garbage_slots(c);
+				kick_kprobe_insn_cache_gc(c);
 		} else {
 			collect_one_slot(kip, idx);
 		}
@@ -299,6 +304,8 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
 	.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
 	/* .insn_size is initialized later */
 	.nr_garbage = 0,
+	.work = __WORK_INITIALIZER(kprobe_optinsn_slots.work,
+				   kprobe_insn_cache_gc),
 };
 #endif
 #endif


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (7 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Free the kprobe_insn_page data structure asynchronously
using call_rcu().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/kprobes.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index a9114923da4c..60ffc9d54d87 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -87,6 +87,7 @@ static LIST_HEAD(kprobe_blacklist);
  */
 struct kprobe_insn_page {
 	struct list_head list;
+	struct rcu_head rcu;
 	kprobe_opcode_t *insns;		/* Page of instruction slots */
 	struct kprobe_insn_cache *cache;
 	int nused;
@@ -192,6 +193,13 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	return slot;
 }
 
+static void free_kprobe_insn_page(struct rcu_head *head)
+{
+	struct kprobe_insn_page *kip = container_of(head, typeof(*kip), rcu);
+
+	kfree(kip);
+}
+
 /* Return 1 if all garbages are collected, otherwise 0. */
 static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
 {
@@ -206,9 +214,8 @@ static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
 		 */
 		if (!list_is_singular(&kip->list)) {
 			list_del_rcu(&kip->list);
-			synchronize_rcu();
 			kip->cache->free(kip->insns);
-			kfree(kip);
+			call_rcu(&kip->rcu, free_kprobe_insn_page);
 		}
 		return 1;
 	}


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (8 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
@ 2020-01-16 14:45 ` Masami Hiramatsu
  2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:45 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Rewrite the kprobe_insn_cache implementation so that free_*insn_slot()
does not acquire kprobe_insn_cache->mutex. This allows us to call it
from a call_rcu() callback function.

For this purpose, this introduces a flip-flop dirty generation in
kprobe_insn_cache. When __free_insn_slot() is called on a dirty slot
(meaning the slot can still be in use until an RCU grace period has
passed), the slot is marked dirty in the "current" generation. The
garbage collector (kprobe_insn_cache_gc()) flips the generation bit,
waits for a safe period via synchronize_rcu_tasks(), and then collects
the "previous"-generation dirty slots. As a result, it collects only
the dirty slots which were returned by __free_insn_slot() before the
GC started waiting (dirty slots returned during the safe period are
marked as new-generation dirty and survive until the next GC run).
Since the GC never runs concurrently with itself, no more than two
generations are needed, so the generation is a single flipped bit
rather than a counter.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/powerpc/kernel/optprobes.c |    1 
 include/linux/kprobes.h         |    2 
 kernel/kprobes.c                |  172 ++++++++++++++++++++++-----------------
 3 files changed, 96 insertions(+), 79 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 024f7aad1952..8304f3814515 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -53,7 +53,6 @@ struct kprobe_insn_cache kprobe_ppc_optinsn_slots = {
 	/* insn_size initialized later */
 	.alloc = __ppc_alloc_insn_page,
 	.free = __ppc_free_insn_page,
-	.nr_garbage = 0,
 };
 
 /*
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 0f832817fca3..1cd53b7b8409 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -244,7 +244,7 @@ struct kprobe_insn_cache {
 	void (*free)(void *);	/* free insn page */
 	struct list_head pages; /* list of kprobe_insn_page */
 	size_t insn_size;	/* size of instruction slot */
-	int nr_garbage;
+	int generation;		/* dirty generation */
 	struct work_struct work;
 };
 
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 60ffc9d54d87..5c12eb7fa8e1 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -90,8 +90,7 @@ struct kprobe_insn_page {
 	struct rcu_head rcu;
 	kprobe_opcode_t *insns;		/* Page of instruction slots */
 	struct kprobe_insn_cache *cache;
-	int nused;
-	int ngarbage;
+	atomic_t nused;
 	char slot_used[];
 };
 
@@ -106,8 +105,9 @@ static int slots_per_page(struct kprobe_insn_cache *c)
 
 enum kprobe_slot_state {
 	SLOT_CLEAN = 0,
-	SLOT_DIRTY = 1,
-	SLOT_USED = 2,
+	SLOT_USED = 1,
+	SLOT_DIRTY0 = 2,
+	SLOT_DIRTY1 = 3,
 };
 
 void __weak *alloc_insn_page(void)
@@ -126,7 +126,6 @@ struct kprobe_insn_cache kprobe_insn_slots = {
 	.free = free_insn_page,
 	.pages = LIST_HEAD_INIT(kprobe_insn_slots.pages),
 	.insn_size = MAX_INSN_SIZE,
-	.nr_garbage = 0,
 	.work = __WORK_INITIALIZER(kprobe_insn_slots.work,
 				   kprobe_insn_cache_gc),
 };
@@ -137,6 +136,22 @@ static void kick_kprobe_insn_cache_gc(struct kprobe_insn_cache *c)
 		schedule_work(&c->work);
 }
 
+static void *try_to_get_clean_slot(struct kprobe_insn_page *kip)
+{
+	struct kprobe_insn_cache *c = kip->cache;
+	int i;
+
+	for (i = 0; i < slots_per_page(c); i++) {
+		if (kip->slot_used[i] == SLOT_CLEAN) {
+			kip->slot_used[i] = SLOT_USED;
+			atomic_inc(&kip->nused);
+			return kip->insns + (i * c->insn_size);
+		}
+	}
+
+	return NULL;
+}
+
 /**
  * __get_insn_slot() - Find a slot on an executable page for an instruction.
  * We allocate an executable page if there's no room on existing ones.
@@ -145,25 +160,20 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 {
 	struct kprobe_insn_page *kip;
 	kprobe_opcode_t *slot = NULL;
+	bool reclaimable = false;
 
 	/* Since the slot array is not protected by rcu, we need a mutex */
 	mutex_lock(&c->mutex);
 	list_for_each_entry(kip, &c->pages, list) {
-		if (kip->nused < slots_per_page(c)) {
-			int i;
-			for (i = 0; i < slots_per_page(c); i++) {
-				if (kip->slot_used[i] == SLOT_CLEAN) {
-					kip->slot_used[i] = SLOT_USED;
-					kip->nused++;
-					slot = kip->insns + (i * c->insn_size);
-					goto out;
-				}
-			}
-			/* kip->nused is broken. Fix it. */
-			kip->nused = slots_per_page(c);
-			WARN_ON(1);
+		if (atomic_read(&kip->nused) < slots_per_page(c)) {
+			slot = try_to_get_clean_slot(kip);
+			if (slot)
+				goto out;
+			reclaimable = true;
 		}
 	}
+	if (reclaimable)
+		kick_kprobe_insn_cache_gc(c);
 
 	/* All out of space. Need to allocate a new page. */
 	kip = kmalloc(KPROBE_INSN_PAGE_SIZE(slots_per_page(c)), GFP_KERNEL);
@@ -183,8 +193,7 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	INIT_LIST_HEAD(&kip->list);
 	memset(kip->slot_used, SLOT_CLEAN, slots_per_page(c));
 	kip->slot_used[0] = SLOT_USED;
-	kip->nused = 1;
-	kip->ngarbage = 0;
+	atomic_set(&kip->nused, 1);
 	kip->cache = c;
 	list_add_rcu(&kip->list, &c->pages);
 	slot = kip->insns;
@@ -193,90 +202,106 @@ kprobe_opcode_t *__get_insn_slot(struct kprobe_insn_cache *c)
 	return slot;
 }
 
-static void free_kprobe_insn_page(struct rcu_head *head)
+static void free_kprobe_insn_page_cb(struct rcu_head *head)
 {
 	struct kprobe_insn_page *kip = container_of(head, typeof(*kip), rcu);
 
 	kfree(kip);
 }
 
-/* Return 1 if all garbages are collected, otherwise 0. */
-static int collect_one_slot(struct kprobe_insn_page *kip, int idx)
+static void free_kprobe_insn_page(struct kprobe_insn_page *kip)
 {
-	kip->slot_used[idx] = SLOT_CLEAN;
-	kip->nused--;
-	if (kip->nused == 0) {
-		/*
-		 * Page is no longer in use.  Free it unless
-		 * it's the last one.  We keep the last one
-		 * so as not to have to set it up again the
-		 * next time somebody inserts a probe.
-		 */
-		if (!list_is_singular(&kip->list)) {
-			list_del_rcu(&kip->list);
-			kip->cache->free(kip->insns);
-			call_rcu(&kip->rcu, free_kprobe_insn_page);
-		}
-		return 1;
+	if (WARN_ON_ONCE(atomic_read(&kip->nused) != 0))
+		return;
+	/*
+	 * Page is no longer in use.  Free it unless
+	 * it's the last one.  We keep the last one
+	 * so as not to have to set it up again the
+	 * next time somebody inserts a probe.
+	 */
+	if (!list_is_singular(&kip->list)) {
+		list_del_rcu(&kip->list);
+		kip->cache->free(kip->insns);
+		call_rcu(&kip->rcu, free_kprobe_insn_page_cb);
 	}
-	return 0;
 }
 
 void kprobe_insn_cache_gc(struct work_struct *work)
 {
 	struct kprobe_insn_cache *c = container_of(work, typeof(*c), work);
 	struct kprobe_insn_page *kip, *next;
+	int dirtygen = c->generation ? SLOT_DIRTY1 : SLOT_DIRTY0;
+	int i, nr;
 
 	mutex_lock(&c->mutex);
+
+	c->generation ^= 1;	/* flip generation (0->1, 1->0) */
+
+	/* Make sure the generation update is shown in __free_insn_slot() */
+	smp_wmb();
+
 	/* Ensure no-one is running on the garbages. */
 	synchronize_rcu_tasks();
 
 	list_for_each_entry_safe(kip, next, &c->pages, list) {
-		int i;
-		if (kip->ngarbage == 0)
-			continue;
-		kip->ngarbage = 0;	/* we will collect all garbages */
+		nr = 0;
+		/* Reclaim previous generation dirty slots */
 		for (i = 0; i < slots_per_page(c); i++) {
-			if (kip->slot_used[i] == SLOT_DIRTY &&
-			    collect_one_slot(kip, i))
-				break;
+			if (kip->slot_used[i] == dirtygen)
+				kip->slot_used[i] = SLOT_CLEAN;
+			else if (kip->slot_used[i] != SLOT_CLEAN)
+				nr++;
 		}
+		if (!nr)
+			free_kprobe_insn_page(kip);
 	}
-	c->nr_garbage = 0;
 	mutex_unlock(&c->mutex);
 }
 
+static struct kprobe_insn_page *
+find_kprobe_insn_page(struct kprobe_insn_cache *c, unsigned long addr)
+{
+	struct kprobe_insn_page *kip;
+
+	list_for_each_entry_rcu(kip, &c->pages, list) {
+		if (addr >= (unsigned long)kip->insns &&
+		    addr < (unsigned long)kip->insns + PAGE_SIZE)
+			return kip;
+	}
+	return NULL;
+}
+
 void __free_insn_slot(struct kprobe_insn_cache *c,
 		      kprobe_opcode_t *slot, int dirty)
 {
 	struct kprobe_insn_page *kip;
+	int dirtygen;
 	long idx;
 
-	mutex_lock(&c->mutex);
-	list_for_each_entry(kip, &c->pages, list) {
+	rcu_read_lock();
+	kip = find_kprobe_insn_page(c, (unsigned long)slot);
+	if (kip) {
 		idx = ((long)slot - (long)kip->insns) /
 			(c->insn_size * sizeof(kprobe_opcode_t));
-		if (idx >= 0 && idx < slots_per_page(c))
+		/* Check double free */
+		if (WARN_ON(kip->slot_used[idx] != SLOT_USED))
 			goto out;
-	}
-	/* Could not find this slot. */
-	WARN_ON(1);
-	kip = NULL;
+
+		/* Make sure to use new generation */
+		smp_rmb();
+
+		dirtygen = c->generation ? SLOT_DIRTY1 : SLOT_DIRTY0;
+		if (dirty)
+			kip->slot_used[idx] = dirtygen;
+		else
+			kip->slot_used[idx] = SLOT_CLEAN;
+
+		if (!atomic_dec_return(&kip->nused))
+			kick_kprobe_insn_cache_gc(c);
+	} else
+		WARN_ON(1);	/* Not found: what happened? */
 out:
-	/* Mark and sweep: this may sleep */
-	if (kip) {
-		/* Check double free */
-		WARN_ON(kip->slot_used[idx] != SLOT_USED);
-		if (dirty) {
-			kip->slot_used[idx] = SLOT_DIRTY;
-			kip->ngarbage++;
-			if (++c->nr_garbage > slots_per_page(c))
-				kick_kprobe_insn_cache_gc(c);
-		} else {
-			collect_one_slot(kip, idx);
-		}
-	}
-	mutex_unlock(&c->mutex);
+	rcu_read_unlock();
 }
 
 /*
@@ -286,17 +311,11 @@ void __free_insn_slot(struct kprobe_insn_cache *c,
  */
 bool __is_insn_slot_addr(struct kprobe_insn_cache *c, unsigned long addr)
 {
-	struct kprobe_insn_page *kip;
 	bool ret = false;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(kip, &c->pages, list) {
-		if (addr >= (unsigned long)kip->insns &&
-		    addr < (unsigned long)kip->insns + PAGE_SIZE) {
-			ret = true;
-			break;
-		}
-	}
+	if (find_kprobe_insn_page(c, addr))
+		ret = true;
 	rcu_read_unlock();
 
 	return ret;
@@ -310,7 +329,6 @@ struct kprobe_insn_cache kprobe_optinsn_slots = {
 	.free = free_insn_page,
 	.pages = LIST_HEAD_INIT(kprobe_optinsn_slots.pages),
 	/* .insn_size is initialized later */
-	.nr_garbage = 0,
 	.work = __WORK_INITIALIZER(kprobe_optinsn_slots.work,
 				   kprobe_insn_cache_gc),
 };


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (9 preceding siblings ...)
  2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
@ 2020-01-16 14:46 ` Masami Hiramatsu
  2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
  2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Add asynchronous unregistration APIs for kprobes and kretprobes.
These APIs can accelerate the unregistration of multiple probes
because the caller does not have to wait for an RCU synchronization
per probe.

However, callers must take care of the following:

- If you want to synchronize the unregistration (for example, to make
  sure all handlers have finished running), you have to call
  synchronize_rcu() once at the end.

- If you need to free objects related to the kprobes, you can pass a
  callback, but that callback must call kprobe_free_callback() or
  kretprobe_free_callback() first.

Since it is easy to shoot yourself in the foot with these, the APIs
are not exported to modules at this moment.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 include/linux/kprobes.h |    9 ++++++++
 kernel/kprobes.c        |   56 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 1cd53b7b8409..f892c3a11dac 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -98,6 +98,9 @@ struct kprobe {
 	 * Protected by kprobe_mutex after this kprobe is registered.
 	 */
 	u32 flags;
+
+	/* For asynchronous unregistration callback */
+	struct rcu_head rcu;
 };
 
 /* Kprobe status flags */
@@ -364,6 +367,12 @@ void unregister_kretprobe(struct kretprobe *rp);
 int register_kretprobes(struct kretprobe **rps, int num);
 void unregister_kretprobes(struct kretprobe **rps, int num);
 
+/* Async unregister APIs (Do not wait for rcu sync) */
+void kprobe_free_callback(struct rcu_head *head);
+void kretprobe_free_callback(struct rcu_head *head);
+void unregister_kprobe_async(struct kprobe *kp, rcu_callback_t free_cb);
+void unregister_kretprobe_async(struct kretprobe *kp, rcu_callback_t free_cb);
+
 void kprobe_flush_task(struct task_struct *tk);
 void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head);
 
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 5c12eb7fa8e1..ab57c22b64f9 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1887,6 +1887,31 @@ void unregister_kprobes(struct kprobe **kps, int num)
 }
 EXPORT_SYMBOL_GPL(unregister_kprobes);
 
+void kprobe_free_callback(struct rcu_head *head)
+{
+	struct kprobe *kp = container_of(head, struct kprobe, rcu);
+
+	__unregister_kprobe_bottom(kp);
+}
+
+/*
+ * If you call this function, you must call kprobe_free_callback() at first
+ * in your free_cb(), or set free_cb = NULL.
+ */
+void unregister_kprobe_async(struct kprobe *kp, rcu_callback_t free_cb)
+{
+	mutex_lock(&kprobe_mutex);
+	if (__unregister_kprobe_top(kp) < 0)
+		kp->addr = NULL;
+	mutex_unlock(&kprobe_mutex);
+
+	if (!kp->addr)
+		return;
+	if (!free_cb)
+		free_cb = kprobe_free_callback;
+	call_rcu(&kp->rcu, free_cb);
+}
+
 int __weak kprobe_exceptions_notify(struct notifier_block *self,
 					unsigned long val, void *data)
 {
@@ -2080,6 +2105,29 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
 }
 EXPORT_SYMBOL_GPL(unregister_kretprobes);
 
+void kretprobe_free_callback(struct rcu_head *head)
+{
+	struct kprobe *kp = container_of(head, struct kprobe, rcu);
+	struct kretprobe *rp = container_of(kp, struct kretprobe, kp);
+
+	__unregister_kprobe_bottom(kp);
+	cleanup_rp_inst(rp);
+}
+
+void unregister_kretprobe_async(struct kretprobe *rp, rcu_callback_t free_cb)
+{
+	mutex_lock(&kprobe_mutex);
+	if (__unregister_kprobe_top(&rp->kp) < 0)
+		rp->kp.addr = NULL;
+	mutex_unlock(&kprobe_mutex);
+
+	if (!rp->kp.addr)
+		return;
+	if (!free_cb)
+		free_cb = kretprobe_free_callback;
+	call_rcu(&rp->kp.rcu, free_cb);
+}
+
 #else /* CONFIG_KRETPROBES */
 int register_kretprobe(struct kretprobe *rp)
 {
@@ -2109,6 +2157,14 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(pre_handler_kretprobe);
 
+void kretprobe_free_callback(struct rcu_head *head)
+{
+}
+
+void unregister_kretprobe_async(struct kretprobe *rp, rcu_callback_t free_cb)
+{
+}
+
 #endif /* CONFIG_KRETPROBES */
 
 /* Set the kprobe gone and remove its instruction buffer. */


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (10 preceding siblings ...)
  2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
@ 2020-01-16 14:46 ` Masami Hiramatsu
  2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Free each probe event data structure asynchronously when deleting
probe events, and synchronize RCU once at the end of the deletion so
that we can make sure all event handlers have finished.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_dynevent.c |    5 ++++
 kernel/trace/trace_kprobe.c   |   46 +++++++++++++++++++++++++++++++++++------
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_dynevent.c b/kernel/trace/trace_dynevent.c
index 89779eb84a07..2d5e8d457309 100644
--- a/kernel/trace/trace_dynevent.c
+++ b/kernel/trace/trace_dynevent.c
@@ -70,6 +70,9 @@ int dyn_event_release(int argc, char **argv, struct dyn_event_operations *type)
 		if (ret)
 			break;
 	}
+
+	/* Wait for running events because of async event unregistration */
+	synchronize_rcu();
 	mutex_unlock(&event_mutex);
 
 	return ret;
@@ -164,6 +167,8 @@ int dyn_events_release_all(struct dyn_event_operations *type)
 		if (ret)
 			break;
 	}
+	/* Wait for running events because of async event unregistration */
+	synchronize_rcu();
 out:
 	mutex_unlock(&event_mutex);
 
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 906af1ffe2b2..f7e0370b10ae 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -551,7 +551,35 @@ static void __unregister_trace_kprobe(struct trace_kprobe *tk)
 	}
 }
 
-/* Unregister a trace_probe and probe_event */
+static void free_trace_kprobe_cb(struct rcu_head *head)
+{
+	struct kprobe *kp = container_of(head, struct kprobe, rcu);
+	struct kretprobe *rp = container_of(kp, struct kretprobe, kp);
+	struct trace_kprobe *tk = container_of(rp, struct trace_kprobe, rp);
+
+	if (trace_kprobe_is_return(tk))
+		kretprobe_free_callback(head);
+	else
+		kprobe_free_callback(head);
+	free_trace_kprobe(tk);
+}
+
+static void __unregister_trace_kprobe_async(struct trace_kprobe *tk)
+{
+	if (trace_kprobe_is_registered(tk)) {
+		if (trace_kprobe_is_return(tk))
+			unregister_kretprobe_async(&tk->rp,
+						   free_trace_kprobe_cb);
+		else
+			unregister_kprobe_async(&tk->rp.kp,
+						free_trace_kprobe_cb);
+	}
+}
+
+/*
+ * Unregister a trace_probe and probe_event asynchronously.
+ * Caller must wait for RCU.
+ */
 static int unregister_trace_kprobe(struct trace_kprobe *tk)
 {
 	/* If other probes are on the event, just unregister kprobe */
@@ -570,9 +598,17 @@ static int unregister_trace_kprobe(struct trace_kprobe *tk)
 		return -EBUSY;
 
 unreg:
-	__unregister_trace_kprobe(tk);
 	dyn_event_remove(&tk->devent);
+	/*
+	 * This trace_probe_unlink() can free the trace_event_call linked to
+	 * this probe.
+	 * We can do this before unregistering because this probe is
+	 * already disabled and the disabling process waits long enough
+	 * for all handlers to finish. IOW, the disabling process must
+	 * wait for an RCU sync at least once before returning to its caller.
+	 */
 	trace_probe_unlink(&tk->tp);
+	__unregister_trace_kprobe_async(tk);
 
 	return 0;
 }
@@ -928,11 +964,7 @@ static int create_or_delete_trace_kprobe(int argc, char **argv)
 static int trace_kprobe_release(struct dyn_event *ev)
 {
 	struct trace_kprobe *tk = to_trace_kprobe(ev);
-	int ret = unregister_trace_kprobe(tk);
-
-	if (!ret)
-		free_trace_kprobe(tk);
-	return ret;
+	return unregister_trace_kprobe(tk);
 }
 
 static int trace_kprobe_show(struct seq_file *m, struct dyn_event *ev)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe event asynchronously
  2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
                   ` (11 preceding siblings ...)
  2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
@ 2020-01-16 14:46 ` " Masami Hiramatsu
  12 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-16 14:46 UTC (permalink / raw)
  To: Brendan Gregg, Steven Rostedt, Alexei Starovoitov
  Cc: mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Remove the local kprobe event asynchronously. Note that this removes
only the kprobe_event part asynchronously; the perf_event still needs
to wait for all handlers to finish before removing the local kprobe
event. So from the perf_event (and eBPF) point of view, this shortens
the trace termination process a bit, but it still takes O(n) time to
finish.

To fix this, the perf_event termination process needs to be changed
to decouple "disable events" from "destroy events", as ftrace does.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 kernel/trace/trace_kprobe.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index f7e0370b10ae..e8c4828c21ae 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1707,9 +1707,7 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
 		return;
 	}
 
-	__unregister_trace_kprobe(tk);
-
-	free_trace_kprobe(tk);
+	__unregister_trace_kprobe_async(tk);
 }
 #endif /* CONFIG_PERF_EVENTS */
 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
@ 2020-01-22  0:29   ` Steven Rostedt
  2020-01-22  7:23     ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2020-01-22  0:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf,
	linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, paulmck, joel, Naveen N . Rao,
	Anil S Keshavamurthy

On Thu, 16 Jan 2020 23:44:52 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> Since the 5 jiffies delay for the optimizer is too
> short to wait for other probes, make it longer,
> like 1 second.

Hi Masami,

Can you explain more *why* 5 jiffies is too short.

Thanks!

-- Steve

> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
>  kernel/kprobes.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 0dacdcecc90f..9c6e230852ad 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
>  
>  static void kprobe_optimizer(struct work_struct *work);
>  static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
> -#define OPTIMIZE_DELAY 5
> +/* Wait 1 second for starting optimization */
> +#define OPTIMIZE_DELAY HZ
>  
>  /*
>   * Optimize (replace a breakpoint with a jump) kprobes listed on


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-22  0:29   ` Steven Rostedt
@ 2020-01-22  7:23     ` Masami Hiramatsu
  2020-01-22 12:11       ` Steven Rostedt
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-22  7:23 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf,
	linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, paulmck, joel, Naveen N . Rao,
	Anil S Keshavamurthy

On Tue, 21 Jan 2020 19:29:05 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 16 Jan 2020 23:44:52 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > Since the 5 jiffies delay for the optimizer is too
> > short to wait for other probes, make it longer,
> > like 1 second.
> 
> Hi Masami,
> 
> Can you explain more *why* 5 jiffies is too short.

Yes, I introduced this 5 jiffies delay for batched probe registration
and unregistration, as systemtap does via its array-based interface
for registering/unregistering. In that case, 5 jiffies is a long
enough delay to wait for the other kprobe registrations and
unregistrations.

However, since perf and ftrace register/unregister probes one-by-one
with an RCU synchronization interval in between, the optimizer starts
before all the probes have finished registering/unregistering.
And the optimizer holds kprobe_mutex for a while -- for the RCU-tasks
synchronization. Since kprobe_mutex is also involved in disabling
kprobes, this also stalls probe-event disabling.

Maybe 5 jiffies is enough for adding/removing a few probe events, but
not for dozens of probe events.

Thank you,

> 
> Thanks!
> 
> -- Steve
> 
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> >  kernel/kprobes.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index 0dacdcecc90f..9c6e230852ad 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -469,7 +469,8 @@ static int kprobe_optimizer_queue_update;
> >  
> >  static void kprobe_optimizer(struct work_struct *work);
> >  static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
> > -#define OPTIMIZE_DELAY 5
> > +/* Wait 1 second for starting optimization */
> > +#define OPTIMIZE_DELAY HZ
> >  
> >  /*
> >   * Optimize (replace a breakpoint with a jump) kprobes listed on
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-22  7:23     ` Masami Hiramatsu
@ 2020-01-22 12:11       ` Steven Rostedt
  2020-01-22 13:12         ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Rostedt @ 2020-01-22 12:11 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf,
	linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, paulmck, joel, Naveen N . Rao,
	Anil S Keshavamurthy

On Wed, 22 Jan 2020 16:23:17 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> On Tue, 21 Jan 2020 19:29:05 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Thu, 16 Jan 2020 23:44:52 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >   
> > > Since the 5 jiffies delay for the optimizer is too
> > > short to wait for other probes, make it longer,
> > > like 1 second.  
> > 
> > Hi Masami,
> > 
> > Can you explain more *why* 5 jiffies is too short.  
> 
> Yes, I had introduced this 5 jiffies delay for multiple probe registration
> and unregistration like systemtap, which will use array-based interface to
> register/unregister. In that case, 5 jiffies will be enough for the delay
> to wait for other kprobe registration/unregistration.
> 
> However, since perf and ftrace register/unregister probes one-by-one with
> RCU synchronization interval, the optimizer will be started before
> finishing to register/unregister all probes.
> And the optimizer locks kprobe_mutex a while -- RCU-tasks synchronization.
> Since the kprobe_mutex is also involved in disabling kprobes, this also
> stops probe-event disabling.
> 
> Maybe 5 jiffies is enough for adding/removing a few probe events, but
> not enough for dozens of probe events.
> 

Perhaps we should have a mechanism that can detect new probes being
added, and just continue to delay the optimization, instead of having
some arbitrary delay.

-- Steve

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-22 12:11       ` Steven Rostedt
@ 2020-01-22 13:12         ` Masami Hiramatsu
  2020-01-22 16:54           ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-22 13:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brendan Gregg, Alexei Starovoitov, Ingo Molnar, bpf,
	linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, paulmck, joel, Naveen N . Rao,
	Anil S Keshavamurthy

On Wed, 22 Jan 2020 07:11:15 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 22 Jan 2020 16:23:17 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > On Tue, 21 Jan 2020 19:29:05 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > >   
> > > > Since the 5 jiffies delay for the optimizer is too
> > > > short to wait for other probes, make it longer,
> > > > like 1 second.  
> > > 
> > > Hi Masami,
> > > 
> > > Can you explain more *why* 5 jiffies is too short.  
> > 
> > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > and unregistration like systemtap, which uses an array-based interface to
> > register/unregister. In that case, 5 jiffies is enough of a delay to wait
> > for other kprobe registrations/unregistrations.
> > 
> > However, since perf and ftrace register/unregister probes one by one with
> > an RCU synchronization interval in between, the optimizer starts before
> > all probes have been registered/unregistered. The optimizer also holds
> > kprobe_mutex for a while -- for RCU-tasks synchronization. Since
> > kprobe_mutex is also involved in disabling kprobes, this also stalls
> > probe-event disabling.
> > 
> > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > not for dozens of probe events.
> > 
> 
> Perhaps we should have a mechanism that can detect new probes being
> added, and just continue to delay the optimization, instead of having
> some arbitrary delay.

Yes, that is what [03/13] does :)
Anyway, it seems that the RCU synchronization takes more than 5 jiffies,
and in that case [03/13] still doesn't work. That's why I added this
patch on top of it.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-22 13:12         ` Masami Hiramatsu
@ 2020-01-22 16:54           ` Paul E. McKenney
  2020-01-23  1:33             ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-01-22 16:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 07:11:15 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Wed, 22 Jan 2020 16:23:17 +0900
> > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > >   
> > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > short to wait for other probes, make it longer,
> > > > > like 1 second.  
> > > > 
> > > > Hi Masami,
> > > > 
> > > > Can you explain more *why* 5 jiffies is too short.  
> > > 
> > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > and unregistration like systemtap, which uses an array-based interface to
> > > register/unregister. In that case, 5 jiffies is enough of a delay to wait
> > > for other kprobe registrations/unregistrations.
> > > 
> > > However, since perf and ftrace register/unregister probes one by one with
> > > an RCU synchronization interval in between, the optimizer starts before
> > > all probes have been registered/unregistered. The optimizer also holds
> > > kprobe_mutex for a while -- for RCU-tasks synchronization. Since
> > > kprobe_mutex is also involved in disabling kprobes, this also stalls
> > > probe-event disabling.
> > > 
> > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > not for dozens of probe events.
> > > 
> > 
> > Perhaps we should have a mechanism that can detect new probes being
> > added, and just continue to delay the optimization, instead of having
> > some arbitrary delay.
> 
> Yes, that is what [03/13] does :) 
> Anyway, it seems that the RCU-synchronization takes more than 5 jiffies.
> And in that case, [03/13] still doesn't work. That's why I added this patch
> after that.

If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
will often take way more than 5 jiffies.  If it is synchronize_rcu(),
5 jiffies would not be unusual, especially on larger systems.
But in the case of synchronize_rcu(), one option is to instead use
synchronize_rcu_expedited().  It is not clear that this last option is
really justified in this case, but I figured it might be worth mentioning.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-22 16:54           ` Paul E. McKenney
@ 2020-01-23  1:33             ` Masami Hiramatsu
  2020-01-23  2:26               ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-23  1:33 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Wed, 22 Jan 2020 08:54:32 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> > On Wed, 22 Jan 2020 07:11:15 -0500
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> > 
> > > On Wed, 22 Jan 2020 16:23:17 +0900
> > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > 
> > > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > 
> > > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > >   
> > > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > > short to wait for other probes, make it longer,
> > > > > > like 1 second.  
> > > > > 
> > > > > Hi Masami,
> > > > > 
> > > > > Can you explain more *why* 5 jiffies is too short.  
> > > > 
> > > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > > and unregistration like systemtap, which uses an array-based interface to
> > > > register/unregister. In that case, 5 jiffies is enough of a delay to wait
> > > > for other kprobe registrations/unregistrations.
> > > > 
> > > > However, since perf and ftrace register/unregister probes one by one with
> > > > an RCU synchronization interval in between, the optimizer starts before
> > > > all probes have been registered/unregistered. The optimizer also holds
> > > > kprobe_mutex for a while -- for RCU-tasks synchronization. Since
> > > > kprobe_mutex is also involved in disabling kprobes, this also stalls
> > > > probe-event disabling.
> > > > 
> > > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > > not for dozens of probe events.
> > > > 
> > > 
> > > Perhaps we should have a mechanism that can detect new probes being
> > > added, and just continue to delay the optimization, instead of having
> > > some arbitrary delay.
> > 
> > Yes, that is what [03/13] does :) 
> > Anyway, it seems that the RCU-synchronization takes more than 5 jiffies.
> > And in that case, [03/13] still doesn't work. That's why I added this patch
> > after that.
> 
> If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
> will often take way more than 5 jiffies.  If it is synchronize_rcu(),
> 5 jiffies would not be unusual, especially on larger systems.
> But in the case of synchronize_rcu(), one option is to instead use
> synchronize_rcu_expedited().  It is not clear that this last is really
> justified in this case, but figured it might be worth mentioning.

It is synchronize_rcu(), but it seems it is sometimes called several
times for one probe disabling.

Anyway, without this update, I added a printk to count the optimizer
queue length and found that every optimizer call ran with a single
kprobe on the queue. I think this depends on the machine, but as far
as I tested on 8-thread qemu x86, it shows this result...

Probes: 256 kprobe_events
Enable events
real	0m 0.02s
user	0m 0.00s
sys	0m 0.02s
[   17.730548] Queue-update: 180, skipped, Total Queued: 180
[   17.739445] Queue-update: 1, go, Total Queued: 180
Disable events
[   19.744634] Queue-update: 1, go, Total Queued: 1
[   19.770743] Queue-update: 1, go, Total Queued: 1
[   19.886625] Queue-update: 1, go, Total Queued: 1
[   20.006631] Queue-update: 1, go, Total Queued: 1
[   20.126628] Queue-update: 1, go, Total Queued: 1
[   20.246620] Queue-update: 1, go, Total Queued: 1
[   20.374665] Queue-update: 1, go, Total Queued: 1
[   20.486617] Queue-update: 1, go, Total Queued: 1
[   20.606608] Queue-update: 1, go, Total Queued: 1
[   20.726596] Queue-update: 1, go, Total Queued: 1
[   20.846608] Queue-update: 1, go, Total Queued: 1
[   20.966723] Queue-update: 1, go, Total Queued: 1
[   21.086611] Queue-update: 1, go, Total Queued: 1
[   21.206605] Queue-update: 1, go, Total Queued: 1
[   21.326603] Queue-update: 1, go, Total Queued: 1
[   21.462609] Queue-update: 1, go, Total Queued: 1
[   21.566755] Queue-update: 1, go, Total Queued: 1
[   21.686606] Queue-update: 1, go, Total Queued: 1
[   21.806611] Queue-update: 1, go, Total Queued: 1
[   21.926609] Queue-update: 1, go, Total Queued: 1
[   22.046621] Queue-update: 1, go, Total Queued: 1
[   22.166729] Queue-update: 1, go, Total Queued: 1
[   22.302609] Queue-update: 1, go, Total Queued: 1
[   22.510627] Queue-update: 1, go, Total Queued: 1
[   22.536638] Queue-update: 1, go, Total Queued: 1
[   22.654618] Queue-update: 1, go, Total Queued: 1
[   22.774643] Queue-update: 1, go, Total Queued: 1
[   22.902609] Queue-update: 1, go, Total Queued: 1
[   23.022608] Queue-update: 1, go, Total Queued: 1
[   23.158606] Queue-update: 1, go, Total Queued: 1
[   23.254618] Queue-update: 1, go, Total Queued: 1
[   23.374647] Queue-update: 1, go, Total Queued: 1
[   23.494605] Queue-update: 1, go, Total Queued: 1
[   23.614610] Queue-update: 1, go, Total Queued: 1
[   23.734606] Queue-update: 1, go, Total Queued: 1
[   23.854609] Queue-update: 1, go, Total Queued: 1
[   23.974615] Queue-update: 1, go, Total Queued: 1
[   24.094609] Queue-update: 1, go, Total Queued: 1
[   24.230607] Queue-update: 1, go, Total Queued: 1
[   24.342625] Queue-update: 1, go, Total Queued: 1
[   24.475478] Queue-update: 1, go, Total Queued: 1
[   24.574554] Queue-update: 1, go, Total Queued: 1
[   24.694605] Queue-update: 1, go, Total Queued: 1
[   24.814585] Queue-update: 1, go, Total Queued: 1
[   24.934591] Queue-update: 1, go, Total Queued: 1
[   25.054614] Queue-update: 1, go, Total Queued: 1
[   25.174628] Queue-update: 1, go, Total Queued: 1
[   25.294637] Queue-update: 1, go, Total Queued: 1
[   25.414620] Queue-update: 1, go, Total Queued: 1
[   25.534553] Queue-update: 1, go, Total Queued: 1
[   25.654610] Queue-update: 1, go, Total Queued: 1
[   25.774627] Queue-update: 1, go, Total Queued: 1
[   25.894609] Queue-update: 1, go, Total Queued: 1
[   26.030548] Queue-update: 1, go, Total Queued: 1
[   26.134626] Queue-update: 1, go, Total Queued: 1
[   26.254620] Queue-update: 1, go, Total Queued: 1
[   26.373603] Queue-update: 1, go, Total Queued: 1
[   26.502606] Queue-update: 1, go, Total Queued: 1
[   26.614607] Queue-update: 1, go, Total Queued: 1
[   26.734610] Queue-update: 1, go, Total Queued: 1
[   26.854620] Queue-update: 1, go, Total Queued: 1
[   26.974660] Queue-update: 1, go, Total Queued: 1
[   27.094620] Queue-update: 1, go, Total Queued: 1
[   27.214596] Queue-update: 1, go, Total Queued: 1
[   27.334640] Queue-update: 1, go, Total Queued: 1
[   27.494606] Queue-update: 1, go, Total Queued: 1
[   27.574703] Queue-update: 1, go, Total Queued: 1
[   27.694609] Queue-update: 1, go, Total Queued: 1
[   27.814607] Queue-update: 1, go, Total Queued: 1
[   27.934631] Queue-update: 1, go, Total Queued: 1
[   28.054606] Queue-update: 1, go, Total Queued: 1
[   28.182685] Queue-update: 1, go, Total Queued: 1
[   28.318568] Queue-update: 1, go, Total Queued: 1
[   28.422610] Queue-update: 1, go, Total Queued: 1
[   28.534621] Queue-update: 1, go, Total Queued: 1
[   28.654618] Queue-update: 1, go, Total Queued: 1
[   28.774622] Queue-update: 1, go, Total Queued: 1
[   28.894609] Queue-update: 1, go, Total Queued: 1
[   29.022609] Queue-update: 1, go, Total Queued: 1
[   29.150615] Queue-update: 1, go, Total Queued: 1
[   29.262610] Queue-update: 1, go, Total Queued: 1
[   29.374621] Queue-update: 1, go, Total Queued: 1
[   29.494606] Queue-update: 1, go, Total Queued: 1
[   29.614616] Queue-update: 1, go, Total Queued: 1
[   29.734607] Queue-update: 1, go, Total Queued: 1
[   29.854601] Queue-update: 1, go, Total Queued: 1
[   29.974610] Queue-update: 1, go, Total Queued: 1
[   30.094625] Queue-update: 1, go, Total Queued: 1
[   30.214606] Queue-update: 1, go, Total Queued: 1
[   30.334602] Queue-update: 1, go, Total Queued: 1
[   30.454634] Queue-update: 1, go, Total Queued: 1
[   30.574606] Queue-update: 1, go, Total Queued: 1
[   30.694589] Queue-update: 1, go, Total Queued: 1
[   30.814613] Queue-update: 1, go, Total Queued: 1
[   30.934602] Queue-update: 1, go, Total Queued: 1
[   31.054605] Queue-update: 1, go, Total Queued: 1
[   31.182596] Queue-update: 1, go, Total Queued: 1
[   31.318621] Queue-update: 1, go, Total Queued: 1
[   31.414615] Queue-update: 1, go, Total Queued: 1
[   31.534610] Queue-update: 1, go, Total Queued: 1
[   31.670608] Queue-update: 1, go, Total Queued: 1
[   31.774626] Queue-update: 1, go, Total Queued: 1
[   31.894607] Queue-update: 1, go, Total Queued: 1
[   32.014609] Queue-update: 1, go, Total Queued: 1
[   32.134607] Queue-update: 1, go, Total Queued: 1
[   32.254611] Queue-update: 1, go, Total Queued: 1
[   32.374608] Queue-update: 1, go, Total Queued: 1
[   32.494619] Queue-update: 1, go, Total Queued: 1
[   32.614607] Queue-update: 1, go, Total Queued: 1
[   32.734612] Queue-update: 1, go, Total Queued: 1
[   32.862616] Queue-update: 1, go, Total Queued: 1
[   32.974620] Queue-update: 1, go, Total Queued: 1
[   33.110609] Queue-update: 1, go, Total Queued: 1
[   33.214609] Queue-update: 1, go, Total Queued: 1
[   33.342611] Queue-update: 1, go, Total Queued: 1
[   33.454607] Queue-update: 1, go, Total Queued: 1
[   33.574611] Queue-update: 1, go, Total Queued: 1
[   33.694619] Queue-update: 1, go, Total Queued: 1
[   33.814607] Queue-update: 1, go, Total Queued: 1
[   33.950614] Queue-update: 1, go, Total Queued: 1
[   34.062609] Queue-update: 1, go, Total Queued: 1
[   34.174609] Queue-update: 1, go, Total Queued: 1
[   34.294619] Queue-update: 1, go, Total Queued: 1
[   34.430533] Queue-update: 1, go, Total Queued: 1
[   34.534594] Queue-update: 1, go, Total Queued: 1
[   34.654605] Queue-update: 1, go, Total Queued: 1
[   34.790596] Queue-update: 1, go, Total Queued: 1
[   34.902611] Queue-update: 1, go, Total Queued: 1
[   35.014630] Queue-update: 1, go, Total Queued: 1
[   35.134634] Queue-update: 1, go, Total Queued: 1
[   35.262608] Queue-update: 1, go, Total Queued: 1
[   35.374634] Queue-update: 1, go, Total Queued: 1
[   35.494617] Queue-update: 1, go, Total Queued: 1
[   35.622608] Queue-update: 1, go, Total Queued: 1
[   35.742610] Queue-update: 1, go, Total Queued: 1
[   35.854572] Queue-update: 1, go, Total Queued: 1
[   35.982596] Queue-update: 1, go, Total Queued: 1
[   36.094603] Queue-update: 1, go, Total Queued: 1
[   36.222612] Queue-update: 1, go, Total Queued: 1
[   36.334610] Queue-update: 1, go, Total Queued: 1
[   36.454619] Queue-update: 1, go, Total Queued: 1
[   36.574619] Queue-update: 1, go, Total Queued: 1
[   36.694643] Queue-update: 1, go, Total Queued: 1
[   36.814614] Queue-update: 1, go, Total Queued: 1
[   36.934610] Queue-update: 1, go, Total Queued: 1
[   37.062609] Queue-update: 1, go, Total Queued: 1
[   37.198611] Queue-update: 1, go, Total Queued: 1
[   37.294618] Queue-update: 1, go, Total Queued: 1
[   37.414618] Queue-update: 1, go, Total Queued: 1
[   37.534595] Queue-update: 1, go, Total Queued: 1
[   37.662594] Queue-update: 1, go, Total Queued: 1
[   37.774610] Queue-update: 1, go, Total Queued: 1
[   37.894618] Queue-update: 1, go, Total Queued: 1
[   38.014619] Queue-update: 1, go, Total Queued: 1
[   38.142612] Queue-update: 1, go, Total Queued: 1
[   38.254609] Queue-update: 1, go, Total Queued: 1
[   38.374619] Queue-update: 1, go, Total Queued: 1
[   38.502481] Queue-update: 1, go, Total Queued: 1
[   38.614596] Queue-update: 1, go, Total Queued: 1
[   38.734597] Queue-update: 1, go, Total Queued: 1
[   38.854606] Queue-update: 1, go, Total Queued: 1
[   38.974620] Queue-update: 1, go, Total Queued: 1
[   39.094617] Queue-update: 1, go, Total Queued: 1
[   39.222597] Queue-update: 1, go, Total Queued: 1
[   39.334610] Queue-update: 1, go, Total Queued: 1
[   39.454609] Queue-update: 1, go, Total Queued: 1
[   39.574633] Queue-update: 1, go, Total Queued: 1
[   39.694611] Queue-update: 1, go, Total Queued: 1
[   39.814608] Queue-update: 1, go, Total Queued: 1
[   39.934610] Queue-update: 1, go, Total Queued: 1
[   40.054621] Queue-update: 1, go, Total Queued: 1
[   40.174611] Queue-update: 1, go, Total Queued: 1
[   40.297471] Queue-update: 1, go, Total Queued: 1
[   40.414504] Queue-update: 1, go, Total Queued: 1
[   40.534601] Queue-update: 1, go, Total Queued: 1
[   40.654611] Queue-update: 1, go, Total Queued: 1
[   40.774609] Queue-update: 1, go, Total Queued: 1
[   40.894628] Queue-update: 1, go, Total Queued: 1
[   41.014608] Queue-update: 1, go, Total Queued: 1
[   41.135594] Queue-update: 1, go, Total Queued: 1
real	0m 21.40s
user	0m 0.00s
sys	0m 0.04s
Remove events
real	0m 2.14s
user	0m 0.00s
sys	0m 0.04s


Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  1:33             ` Masami Hiramatsu
@ 2020-01-23  2:26               ` Paul E. McKenney
  2020-01-23  6:13                 ` Masami Hiramatsu
  0 siblings, 1 reply; 26+ messages in thread
From: Paul E. McKenney @ 2020-01-23  2:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Thu, Jan 23, 2020 at 10:33:34AM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 08:54:32 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > On Wed, Jan 22, 2020 at 10:12:40PM +0900, Masami Hiramatsu wrote:
> > > On Wed, 22 Jan 2020 07:11:15 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > On Wed, 22 Jan 2020 16:23:17 +0900
> > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > 
> > > > > On Tue, 21 Jan 2020 19:29:05 -0500
> > > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > > 
> > > > > > On Thu, 16 Jan 2020 23:44:52 +0900
> > > > > > Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > > > > >   
> > > > > > > Since the 5 jiffies delay for the optimizer is too
> > > > > > > short to wait for other probes, make it longer,
> > > > > > > like 1 second.  
> > > > > > 
> > > > > > Hi Masami,
> > > > > > 
> > > > > > Can you explain more *why* 5 jiffies is too short.  
> > > > > 
> > > > > Yes, I had introduced this 5 jiffies delay for multiple probe registration
> > > > > and unregistration like systemtap, which uses an array-based interface to
> > > > > register/unregister. In that case, 5 jiffies is enough of a delay to wait
> > > > > for other kprobe registrations/unregistrations.
> > > > > 
> > > > > However, since perf and ftrace register/unregister probes one by one with
> > > > > an RCU synchronization interval in between, the optimizer starts before
> > > > > all probes have been registered/unregistered. The optimizer also holds
> > > > > kprobe_mutex for a while -- for RCU-tasks synchronization. Since
> > > > > kprobe_mutex is also involved in disabling kprobes, this also stalls
> > > > > probe-event disabling.
> > > > > 
> > > > > Maybe 5 jiffies is enough for adding/removing a few probe events, but
> > > > > not for dozens of probe events.
> > > > > 
> > > > 
> > > > Perhaps we should have a mechanism that can detect new probes being
> > > > added, and just continue to delay the optimization, instead of having
> > > > some arbitrary delay.
> > > 
> > > Yes, that is what [03/13] does :) 
> > > Anyway, it seems that the RCU-synchronization takes more than 5 jiffies.
> > > And in that case, [03/13] still doesn't work. That's why I added this patch
> > > after that.
> > 
> > If the RCU synchronization is synchronize_rcu_tasks(), then yes, it
> > will often take way more than 5 jiffies.  If it is synchronize_rcu(),
> > 5 jiffies would not be unusual, especially on larger systems.
> > But in the case of synchronize_rcu(), one option is to instead use
> > synchronize_rcu_expedited().  It is not clear that this last is really
> > justified in this case, but figured it might be worth mentioning.
> 
> It is synchronize_rcu(), but it seems it is sometimes called several
> times for one probe disabling.

OK, good to know.

> Anyway, without this update, I added a printk to count the optimizer
> queue length and found that every optimizer call ran with a single
> kprobe on the queue. I think this depends on the machine, but as far
> as I tested on 8-thread qemu x86, it shows this result...
> 
> Probes: 256 kprobe_events
> Enable events
> real	0m 0.02s
> user	0m 0.00s
> sys	0m 0.02s
> [   17.730548] Queue-update: 180, skipped, Total Queued: 180
> [   17.739445] Queue-update: 1, go, Total Queued: 180
> Disable events
> [   19.744634] Queue-update: 1, go, Total Queued: 1
> [   19.770743] Queue-update: 1, go, Total Queued: 1
> [   19.886625] Queue-update: 1, go, Total Queued: 1
> [ ... 176 similar "Queue-update: 1, go, Total Queued: 1" lines snipped ... ]
> [   41.135594] Queue-update: 1, go, Total Queued: 1
> real	0m 21.40s
> user	0m 0.00s
> sys	0m 0.04s

So 21.4s/256 = 84 milliseconds per event disable, correct?

It might be worth trying synchronize_rcu_expedited() just as an experiment
to see if it speeds things up significantly.

							Thanx, Paul

> Remove events
> real	0m 2.14s
> user	0m 0.00s
> sys	0m 0.04s
> 
> 
> Thank you,
> 
> 
> -- 
> Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  2:26               ` Paul E. McKenney
@ 2020-01-23  6:13                 ` Masami Hiramatsu
  2020-01-23 10:03                   ` Paul E. McKenney
  0 siblings, 1 reply; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-23  6:13 UTC (permalink / raw)
  To: paulmck
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy,
	Masami Hiramatsu

On Wed, 22 Jan 2020 18:26:47 -0800
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> > Anyway, without this update, I added a printk to count the
> > optimizer queue length and found that every optimizer call had
> > only a single kprobe on the queue. I think this depends on the
> > machine, but as far as I tested on 8-thread qemu x86, it shows
> > this result...
> > 
> > Probes: 256 kprobe_events
> > Enable events
> > real	0m 0.02s
> > user	0m 0.00s
> > sys	0m 0.02s
> > [   17.730548] Queue-update: 180, skipped, Total Queued: 180
> > [   17.739445] Queue-update: 1, go, Total Queued: 180
> > Disable events
[...]
> > [   41.135594] Queue-update: 1, go, Total Queued: 1
> > real	0m 21.40s
> > user	0m 0.00s
> > sys	0m 0.04s
> 
> So 21.4s/256 = 84 milliseconds per event disable, correct?

Actually, it seems only 172 probes are on the unoptimized list, so
the numbers will be a bit different.

Anyway, the elapsed time above includes the non-batch optimizer's
working time, as below.

(1) start looping on probe events
  (2) disable the kprobe
    (2.1) wait for kprobe_mutex if the optimizer is running
    (2.2) if the kprobe is optimized, queue it on the unoptimizing
          list and kick the optimizer with a 5-jiffies delay
  (3) unlink the enabled event
  (4) wait for synchronize_rcu()
  (5) the optimizer starts working before (4) finishes
(6) goto (1)

I think the disabling performance issue comes from (5) (and (2.1)).
Thus, if we change the delay in (2.2) to 1 second (HZ jiffies), the
optimizer will start only after several loop iterations are done.
(And when the optimizer detects "active" queuing, it postpones the
process.)

> 
> It might be worth trying synchronize_rcu_expedited() just as an experiment
> to see if it speeds things up significantly.

Do you mean replacing synchronize_rcu() in the disabling loop, or
replacing synchronize_rcu_tasks() in the optimizer?

I don't think that is the root cause of this behavior, since making
the optimizer delay 1 second seems to be enough to turn it into a
batch operation. See below; this is the result with the patched
kernel (HZ-jiffies, i.e. 1-second, delay).

Probes: 256 kprobe_events
Enable events
real	0m 0.07s
user	0m 0.00s
sys	0m 0.07s
[   19.191181] Queue-update: 180, skipped, Total Queued: 180
Disable events
[   20.214966] Queue-update: 1, go, Total Queued: 172
[   21.302924] Queue-update: 86, skipped, Total Queued: 86
real	0m 2.11s
user	0m 0.00s
sys	0m 0.03s
[   22.327173] Queue-update: 87, skipped, Total Queued: 172
[   23.350933] Queue-update: 1, go, Total Queued: 172
Remove events
real	0m 2.13s
user	0m 0.02s
sys	0m 0.02s

As you can see, the optimizer ran outside of the disabling loop.
In that case, it is OK to synchronize RCU tasks in the optimizer,
because it runs just *once* for multiple probe events.

From the above result, 86 probes were disabled per 1-second delay,
so each probe disabling took 11-12 msec on average. Thus (HZ / 10)
could also be a good delay. (But note that the optimizer will keep
retrying until the disabling loop is finished.)

BTW, the test kernel was built with HZ=1000; with HZ=250 or HZ=100
the result will differ under the current code. So I think we should
use an HZ-based delay instead of a fixed jiffy count.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second
  2020-01-23  6:13                 ` Masami Hiramatsu
@ 2020-01-23 10:03                   ` Paul E. McKenney
  0 siblings, 0 replies; 26+ messages in thread
From: Paul E. McKenney @ 2020-01-23 10:03 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Brendan Gregg, Alexei Starovoitov, Ingo Molnar,
	bpf, linux-kernel, Daniel Borkmann, Arnaldo Carvalho de Melo,
	David S . Miller, joel, Naveen N . Rao, Anil S Keshavamurthy

On Thu, Jan 23, 2020 at 03:13:28PM +0900, Masami Hiramatsu wrote:
> On Wed, 22 Jan 2020 18:26:47 -0800
> "Paul E. McKenney" <paulmck@kernel.org> wrote:
> 
> > > Anyway, without this update, I added a printk to count the
> > > optimizer queue length and found that every optimizer call had
> > > only a single kprobe on the queue. I think this depends on the
> > > machine, but as far as I tested on 8-thread qemu x86, it shows
> > > this result...
> > > 
> > > Probes: 256 kprobe_events
> > > Enable events
> > > real	0m 0.02s
> > > user	0m 0.00s
> > > sys	0m 0.02s
> > > [   17.730548] Queue-update: 180, skipped, Total Queued: 180
> > > [   17.739445] Queue-update: 1, go, Total Queued: 180
> > > Disable events
> [...]
> > > [   41.135594] Queue-update: 1, go, Total Queued: 1
> > > real	0m 21.40s
> > > user	0m 0.00s
> > > sys	0m 0.04s
> > 
> > So 21.4s/256 = 84 milliseconds per event disable, correct?
> 
> Actually, it seems only 172 probes are on the unoptimized list, so
> the numbers will be a bit different.
> 
> Anyway, the elapsed time above includes the non-batch optimizer's
> working time, as below.
> 
> (1) start looping on probe events
>   (2) disable the kprobe
>     (2.1) wait for kprobe_mutex if the optimizer is running
>     (2.2) if the kprobe is optimized, queue it on the unoptimizing
>           list and kick the optimizer with a 5-jiffies delay
>   (3) unlink the enabled event
>   (4) wait for synchronize_rcu()
>   (5) the optimizer starts working before (4) finishes
> (6) goto (1)
> 
> I think the disabling performance issue comes from (5) (and (2.1)).
> Thus, if we change the delay in (2.2) to 1 second (HZ jiffies), the
> optimizer will start only after several loop iterations are done.
> (And when the optimizer detects "active" queuing, it postpones the
> process.)
> 
> > 
> > It might be worth trying synchronize_rcu_expedited() just as an experiment
> > to see if it speeds things up significantly.
> 
> Do you mean replacing synchronize_rcu() in the disabling loop, or
> replacing synchronize_rcu_tasks() in the optimizer?

The former.  (There is no synchronize_rcu_tasks_expedited().)

> I don't think that is the root cause of this behavior, since making
> the optimizer delay 1 second seems to be enough to turn it into a
> batch operation. See below; this is the result with the patched
> kernel (HZ-jiffies, i.e. 1-second, delay).
> 
> Probes: 256 kprobe_events
> Enable events
> real	0m 0.07s
> user	0m 0.00s
> sys	0m 0.07s
> [   19.191181] Queue-update: 180, skipped, Total Queued: 180
> Disable events
> [   20.214966] Queue-update: 1, go, Total Queued: 172
> [   21.302924] Queue-update: 86, skipped, Total Queued: 86
> real	0m 2.11s
> user	0m 0.00s
> sys	0m 0.03s
> [   22.327173] Queue-update: 87, skipped, Total Queued: 172
> [   23.350933] Queue-update: 1, go, Total Queued: 172
> Remove events
> real	0m 2.13s
> user	0m 0.02s
> sys	0m 0.02s
> 
> As you can see, the optimizer ran outside of the disabling loop.
> In that case, it is OK to synchronize RCU tasks in the optimizer,
> because it runs just *once* for multiple probe events.

Even better!

> From the above result, 86 probes were disabled per 1-second delay,
> so each probe disabling took 11-12 msec on average. Thus (HZ / 10)
> could also be a good delay. (But note that the optimizer will keep
> retrying until the disabling loop is finished.)
> 
> BTW, the test kernel was built with HZ=1000; with HZ=250 or HZ=100
> the result will differ under the current code. So I think we should
> use an HZ-based delay instead of a fixed jiffy count.

Agreed, the HZ-based delay seems worth a try.

							Thanx, Paul


* Re: [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link
  2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
@ 2020-01-27 15:02   ` kbuild test robot
  2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
  1 sibling, 0 replies; 26+ messages in thread
From: kbuild test robot @ 2020-01-27 15:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov,
	mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

Hi Masami,

I love your patch! Perhaps something to improve:

[auto build test WARNING on next-20200115]
[cannot apply to tip/perf/core ia64/next tip/x86/core linus/master v5.5-rc6 v5.5-rc5 v5.5-rc4 v5.5]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/tracing-kprobes-Introduce-async-unregistration/20200117-191143
base:    5b483a1a0ea1ab19a5734051c9692c2cfabe1327
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-153-g47b6dfef-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> kernel/trace/trace_kprobe.c:331:10: sparse: sparse: symbol 'trace_kprobe_disabled_finished' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation


* [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static
  2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
  2020-01-27 15:02   ` kbuild test robot
@ 2020-01-27 15:02   ` kbuild test robot
  2020-01-28 15:02     ` Masami Hiramatsu
  1 sibling, 1 reply; 26+ messages in thread
From: kbuild test robot @ 2020-01-27 15:02 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov,
	mhiramat, Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy


Fixes: 3c794bf25a2b ("tracing/kprobe: Use call_rcu to defer freeing event_file_link")
Signed-off-by: kbuild test robot <lkp@intel.com>
---
 trace_kprobe.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 1a5882bb77471..fba738aa458af 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -328,7 +328,7 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
 	return ret;
 }
 
-atomic_t trace_kprobe_disabled_finished;
+static atomic_t trace_kprobe_disabled_finished;
 
 static void trace_kprobe_disabled_handlers_finish(void)
 {


* Re: [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static
  2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
@ 2020-01-28 15:02     ` Masami Hiramatsu
  0 siblings, 0 replies; 26+ messages in thread
From: Masami Hiramatsu @ 2020-01-28 15:02 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, Brendan Gregg, Steven Rostedt, Alexei Starovoitov,
	Ingo Molnar, bpf, linux-kernel, Daniel Borkmann,
	Arnaldo Carvalho de Melo, David S . Miller, paulmck, joel,
	Naveen N . Rao, Anil S Keshavamurthy

On Mon, 27 Jan 2020 23:02:43 +0800
kbuild test robot <lkp@intel.com> wrote:

> 
> Fixes: 3c794bf25a2b ("tracing/kprobe: Use call_rcu to defer freeing event_file_link")
> Signed-off-by: kbuild test robot <lkp@intel.com>
> ---
>  trace_kprobe.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 1a5882bb77471..fba738aa458af 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -328,7 +328,7 @@ static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
>  	return ret;
>  }
>  
> -atomic_t trace_kprobe_disabled_finished;
> +static atomic_t trace_kprobe_disabled_finished;
>  
>  static void trace_kprobe_disabled_handlers_finish(void)
>  {

Oops, right. I forgot the static. I'll update it.

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


Thread overview: 26+ messages
2020-01-16 14:44 [RFT PATCH 00/13] tracing: kprobes: Introduce async unregistration Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 01/13] kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 02/13] kprobes: Remove redundant arch_disarm_kprobe() call Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 03/13] kprobes: Postpone optimizer until a bunch of probes (un)registered Masami Hiramatsu
2020-01-16 14:44 ` [RFT PATCH 04/13] kprobes: Make optimizer delay to 1 second Masami Hiramatsu
2020-01-22  0:29   ` Steven Rostedt
2020-01-22  7:23     ` Masami Hiramatsu
2020-01-22 12:11       ` Steven Rostedt
2020-01-22 13:12         ` Masami Hiramatsu
2020-01-22 16:54           ` Paul E. McKenney
2020-01-23  1:33             ` Masami Hiramatsu
2020-01-23  2:26               ` Paul E. McKenney
2020-01-23  6:13                 ` Masami Hiramatsu
2020-01-23 10:03                   ` Paul E. McKenney
2020-01-16 14:45 ` [RFT PATCH 05/13] tracing/kprobe: Use call_rcu to defer freeing event_file_link Masami Hiramatsu
2020-01-27 15:02   ` kbuild test robot
2020-01-27 15:02   ` [RFC PATCH] tracing/kprobe: trace_kprobe_disabled_finished can be static kbuild test robot
2020-01-28 15:02     ` Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 06/13] kprobes: Enable kprobe-booster with CONFIG_PREEMPT=y Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 07/13] kprobes: Use normal list traversal API if a mutex is held Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 08/13] kprobes: Use workqueue for reclaiming kprobe insn cache pages Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 09/13] kprobes: Free kprobe_insn_page asynchronously Masami Hiramatsu
2020-01-16 14:45 ` [RFT PATCH 10/13] kprobes: Make free_*insn_slot() mutex-less Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 11/13] kprobes: Add asynchronous unregistration APIs Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 12/13] tracing/kprobe: Free probe event asynchronously Masami Hiramatsu
2020-01-16 14:46 ` [RFT PATCH 13/13] tracing/kprobe: perf_event: Remove local kprobe " Masami Hiramatsu
