All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
@ 2013-07-04  3:33 Steven Rostedt
  2013-07-04  3:33 ` [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call Steven Rostedt
                   ` (6 more replies)
  0 siblings, 7 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04  3:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton


Currently there exists a race with deleting a kprobe or uprobe and
a user opening the probe event file or using perf events.

The problem stems from not being able to take the probe_lock from the
unregister code because we may have the event_mutex at the time, and
the event mutex may be taken with the probe_lock held.

To solve this, the events get a ref count (using the flags field), where
when an event file is opened, the ftrace_event_call ref count increments.
Then this is checked under event_mutex and if set, the unregistering
of the probe will fail.

Here's a test that shows how things break:

     # cd /sys/kernel/debug/tracing
     # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
     # enable_probe() {
        sleep 10
        echo 1
     }
     # file=events/kprobes/sigprocmask/enable
     # enable_probe > $file &
     > kprobe_events
    
    The above will corrupt the kprobe system, as the write to the enable
    file will happen after the kprobe was deleted.
    
    Trying to create the probe again fails:
    
     # echo 'p:sigprocmask sigprocmask' > kprobe_events
     # cat kprobe_events
    p:kprobes/sigprocmask sigprocmask
     # ls events/kprobes/
    enable  filter

After applying these patches, the "> kprobe_events" fails due to the
event being busy.

Masami, please review these patches and give your ack.

Srikar, can you please review the last patch. I didn't test uprobes
with this. I'll do that after the 4th.

Thanks,

-- Steve


Oleg Nesterov (1):
      tracing: trace_remove_event_call() should fail if call/file is in use

Steven Rostedt (Red Hat) (3):
      tracing: Add ref count to ftrace_event_call
      tracing/kprobes: Fail to unregister if probe event files are open
      tracing/uprobes: Fail to unregister if probe event files are open

----
 include/linux/ftrace_event.h |    8 +++-
 kernel/trace/trace_events.c  |  109 +++++++++++++++++++++++++++++++++++++++---
 kernel/trace/trace_kprobe.c  |   21 +++++---
 kernel/trace/trace_uprobe.c  |   48 ++++++++++++++-----
 4 files changed, 160 insertions(+), 26 deletions(-)


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
@ 2013-07-04  3:33 ` Steven Rostedt
  2013-07-04  4:22   ` Masami Hiramatsu
  2013-07-04  3:33 ` [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use Steven Rostedt
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04  3:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: 0001-tracing-Add-ref-count-to-ftrace_event_call.patch --]
[-- Type: text/plain, Size: 6487 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

When one of the event files is opened, we need to prevent them from
being removed. Modules do with with the module owner set (automated
from the VFS layer).  The ftrace buffer instances have a ref count added
to the trace_array when the enabled file is opened (the others are not
that big of a deal, as they only reference the event calls which
still exist when an instance disappears). But kprobes and uprobes
do not have any protection.

 # cd /sys/kernel/debug/tracing
 # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
 # enable_probe() {
	sleep 10
	echo 1
 }
 # file=events/kprobes/sigprocmask/enable
 # enable_probe > $file &
 > kprobe_events

The above will corrupt the kprobe system, as the write to the enable
file will happen after the kprobe was deleted.

Trying to create the probe again fails:

 # echo 'p:sigprocmask sigprocmask' > kprobe_events
 # cat kprobe_events
p:kprobes/sigprocmask sigprocmask
 # ls events/kprobes/
enable  filter

Notice that the sigprocmask does not exist.

The first step to fix this is by adding ref counts to the ftrace_event_calls.
Using the flags field, and protecting the updates with the event_mutex
the ref count will be use the first 20 bits of the flags field, and the
current flags will be shifted 20 bits.

Link: http://lkml.kernel.org/r/1372883643.22688.118.camel@gandalf.local.home

Suggested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h |    6 ++++
 kernel/trace/trace_events.c  |   76 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4372658..72ff2c6 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -195,6 +195,7 @@ extern int ftrace_event_reg(struct ftrace_event_call *event,
 			    enum trace_reg type, void *data);
 
 enum {
+	TRACE_EVENT_FL_REF_MAX_BIT = 20,
 	TRACE_EVENT_FL_FILTERED_BIT,
 	TRACE_EVENT_FL_CAP_ANY_BIT,
 	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
@@ -203,6 +204,8 @@ enum {
 };
 
 /*
+ * Event ref count uses the first 20 bits of the flags field.
+ *
  * Event flags:
  *  FILTERED	  - The event has a filter attached
  *  CAP_ANY	  - Any user can enable for perf
@@ -213,6 +216,7 @@ enum {
  *                     it is best to clear the buffers that used it).
  */
 enum {
+	TRACE_EVENT_FL_REF_MAX		= (1 << TRACE_EVENT_FL_REF_MAX_BIT),
 	TRACE_EVENT_FL_FILTERED		= (1 << TRACE_EVENT_FL_FILTERED_BIT),
 	TRACE_EVENT_FL_CAP_ANY		= (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
 	TRACE_EVENT_FL_NO_SET_FILTER	= (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
@@ -220,6 +224,8 @@ enum {
 	TRACE_EVENT_FL_WAS_ENABLED	= (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
 };
 
+#define TRACE_EVENT_FL_REF_MASK	(TRACE_EVENT_FL_REF_MAX - 1)
+
 struct ftrace_event_call {
 	struct list_head	list;
 	struct ftrace_event_class *class;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7d85429..90cf243 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -391,6 +391,28 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
 	__get_system(dir->subsystem);
 }
 
+static int ftrace_event_call_get(struct ftrace_event_call *call)
+{
+	int ret = 0;
+
+	mutex_lock(&event_mutex);
+	if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
+		ret = -EBUSY;
+	else
+		call->flags++;
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+static void ftrace_event_call_put(struct ftrace_event_call *call)
+{
+	mutex_lock(&event_mutex);
+	if (!WARN_ON_ONCE(!(call->flags & TRACE_EVENT_FL_REF_MASK)))
+		call->flags--;
+	mutex_unlock(&event_mutex);
+}
+
 static void __put_system_dir(struct ftrace_subsystem_dir *dir)
 {
 	WARN_ON_ONCE(dir->ref_count == 0);
@@ -424,7 +446,15 @@ static int tracing_open_generic_file(struct inode *inode, struct file *filp)
 
 	ret = tracing_open_generic(inode, filp);
 	if (ret < 0)
-		trace_array_put(tr);
+		goto fail;
+
+	ret = ftrace_event_call_get(file->event_call);
+	if (ret < 0)
+		goto fail;
+
+	return 0;
+ fail:
+	trace_array_put(tr);
 	return ret;
 }
 
@@ -433,12 +463,40 @@ static int tracing_release_generic_file(struct inode *inode, struct file *filp)
 	struct ftrace_event_file *file = inode->i_private;
 	struct trace_array *tr = file->tr;
 
+	ftrace_event_call_put(file->event_call);
 	trace_array_put(tr);
 
 	return 0;
 }
 
 /*
+ * Open and update call ref count.
+ * Must have ftrace_event_call passed in.
+ */
+static int tracing_open_generic_call(struct inode *inode, struct file *filp)
+{
+	struct ftrace_event_call *call = inode->i_private;
+
+	return ftrace_event_call_get(call);
+}
+
+static int tracing_release_generic_call(struct inode *inode, struct file *filp)
+{
+	struct ftrace_event_call *call = inode->i_private;
+
+	ftrace_event_call_put(call);
+	return 0;
+}
+
+static int tracing_seq_release_call(struct inode *inode, struct file *filp)
+{
+	struct ftrace_event_call *call = inode->i_private;
+
+	ftrace_event_call_put(call);
+	return seq_release(inode, filp);
+}
+
+/*
  * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
  */
 static int
@@ -949,10 +1007,16 @@ static int trace_format_open(struct inode *inode, struct file *file)
 	struct seq_file *m;
 	int ret;
 
-	ret = seq_open(file, &trace_format_seq_ops);
+	ret = ftrace_event_call_get(call);
 	if (ret < 0)
 		return ret;
 
+	ret = seq_open(file, &trace_format_seq_ops);
+	if (ret < 0) {
+		ftrace_event_call_put(call);
+		return ret;
+	}
+
 	m = file->private_data;
 	m->private = call;
 
@@ -1260,20 +1324,22 @@ static const struct file_operations ftrace_event_format_fops = {
 	.open = trace_format_open,
 	.read = seq_read,
 	.llseek = seq_lseek,
-	.release = seq_release,
+	.release = tracing_seq_release_call,
 };
 
 static const struct file_operations ftrace_event_id_fops = {
-	.open = tracing_open_generic,
+	.open = tracing_open_generic_call,
 	.read = event_id_read,
 	.llseek = default_llseek,
+	.release = tracing_release_generic_call,
 };
 
 static const struct file_operations ftrace_event_filter_fops = {
-	.open = tracing_open_generic,
+	.open = tracing_open_generic_call,
 	.read = event_filter_read,
 	.write = event_filter_write,
 	.llseek = default_llseek,
+	.release = tracing_release_generic_call,
 };
 
 static const struct file_operations ftrace_subsystem_filter_fops = {
-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
  2013-07-04  3:33 ` [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call Steven Rostedt
@ 2013-07-04  3:33 ` Steven Rostedt
  2013-07-04 12:48   ` Masami Hiramatsu
  2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04  3:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: 0002-tracing-trace_remove_event_call-should-fail-if-call-.patch --]
[-- Type: text/plain, Size: 2834 bytes --]

From: Oleg Nesterov <oleg@redhat.com>

Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.

We also check TRACE_EVENT_FL_REF_MASK to ensure that nobody opened
the files we are going to remove, these means that nobody can access
the soon-to-be-freed ftrace_event_file/call via filp->private_data.

Link: http://lkml.kernel.org/r/20130702222359.GA27629@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace_event.h |    2 +-
 kernel/trace/trace_events.c  |   33 +++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 72ff2c6..bdf6bdd 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -338,7 +338,7 @@ extern int trace_define_field(struct ftrace_event_call *call, const char *type,
 			      const char *name, int offset, int size,
 			      int is_signed, int filter_type);
 extern int trace_add_event_call(struct ftrace_event_call *call);
-extern void trace_remove_event_call(struct ftrace_event_call *call);
+extern int trace_remove_event_call(struct ftrace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < (type)1)
 
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 90cf243..1a5547e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1766,16 +1766,45 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
 	destroy_preds(call);
 }
 
+static int probe_remove_event_call(struct ftrace_event_call *call)
+{
+	struct trace_array *tr;
+	struct ftrace_event_file *file;
+
+	if (call->flags & TRACE_EVENT_FL_REF_MASK)
+		return -EBUSY;
+
+#ifdef CONFIG_PERF_EVENTS
+	if (call->perf_refcount)
+		return -EBUSY;
+#endif
+	do_for_each_event_file(tr, file) {
+		if (file->event_call != call)
+			continue;
+		if (file->flags & FTRACE_EVENT_FL_ENABLED)
+			return -EBUSY;
+		break;
+	} while_for_each_event_file();
+
+	__trace_remove_event_call(call);
+
+	return 0;
+}
+
 /* Remove an event_call */
-void trace_remove_event_call(struct ftrace_event_call *call)
+int trace_remove_event_call(struct ftrace_event_call *call)
 {
+	int ret;
+
 	mutex_lock(&trace_types_lock);
 	mutex_lock(&event_mutex);
 	down_write(&trace_event_sem);
-	__trace_remove_event_call(call);
+	ret = probe_remove_event_call(call);
 	up_write(&trace_event_sem);
 	mutex_unlock(&event_mutex);
 	mutex_unlock(&trace_types_lock);
+
+	return ret;
 }
 
 #define for_each_event(event, start, end)			\
-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
  2013-07-04  3:33 ` [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call Steven Rostedt
  2013-07-04  3:33 ` [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use Steven Rostedt
@ 2013-07-04  3:33 ` Steven Rostedt
  2013-07-04 12:45   ` Masami Hiramatsu
                     ` (2 more replies)
  2013-07-04  3:33 ` [RFC][PATCH 4/4] tracing/uprobes: " Steven Rostedt
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04  3:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: 0003-tracing-kprobes-Fail-to-unregister-if-probe-event-fi.patch --]
[-- Type: text/plain, Size: 3008 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

When one of the event files is opened, we need to prevent them from
being removed. Modules do with with the module owner set (automated
from the VFS layer).  The ftrace buffer instances have a ref count added
to the trace_array when the enabled file is opened (the others are not
that big of a deal, as they only reference the event calls which
still exist when an instance disappears). But kprobes and uprobes
do not have any protection.

 # cd /sys/kernel/debug/tracing
 # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
 # enable_probe() {
    sleep 10
    echo 1
 }
 # file=events/kprobes/sigprocmask/enable
 # enable_probe > $file &
 > kprobe_events

The above will corrupt the kprobe system, as the write to the enable
file will happen after the kprobe was deleted.

Trying to create the probe again fails:

 # echo 'p:sigprocmask sigprocmask' > kprobe_events
 # cat kprobe_events
p:kprobes/sigprocmask sigprocmask
 # ls events/kprobes/
enable  filter

Have the unregister probe fail when the event files are open, in use
are used by perf.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_kprobe.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7ed6976..ffcaf42 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
 }
 
 static int register_probe_event(struct trace_probe *tp);
-static void unregister_probe_event(struct trace_probe *tp);
+static int unregister_probe_event(struct trace_probe *tp);
 
 static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
@@ -340,9 +340,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
 	if (trace_probe_is_enabled(tp))
 		return -EBUSY;
 
+	/* Will fail if probe is being used by ftrace or perf */
+	if (unregister_probe_event(tp))
+		return -EBUSY;
+
 	__unregister_trace_probe(tp);
 	list_del(&tp->list);
-	unregister_probe_event(tp);
 
 	return 0;
 }
@@ -621,7 +624,9 @@ static int release_all_trace_probes(void)
 	/* TODO: Use batch unregistration */
 	while (!list_empty(&probe_list)) {
 		tp = list_entry(probe_list.next, struct trace_probe, list);
-		unregister_trace_probe(tp);
+		ret = unregister_trace_probe(tp);
+		if (ret)
+			goto end;
 		free_trace_probe(tp);
 	}
 
@@ -1242,11 +1247,15 @@ static int register_probe_event(struct trace_probe *tp)
 	return ret;
 }
 
-static void unregister_probe_event(struct trace_probe *tp)
+static int unregister_probe_event(struct trace_probe *tp)
 {
+	int ret;
+
 	/* tp->event is unregistered in trace_remove_event_call() */
-	trace_remove_event_call(&tp->call);
-	kfree(tp->call.print_fmt);
+	ret = trace_remove_event_call(&tp->call);
+	if (!ret)
+		kfree(tp->call.print_fmt);
+	return ret;
 }
 
 /* Make a debugfs interface for controlling probe points */
-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC][PATCH 4/4] tracing/uprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
                   ` (2 preceding siblings ...)
  2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
@ 2013-07-04  3:33 ` Steven Rostedt
  2013-08-01  3:40   ` Steven Rostedt
  2013-08-01 14:08   ` Oleg Nesterov
  2013-07-04  4:00 ` [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Masami Hiramatsu
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04  3:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: 0004-tracing-uprobes-Fail-to-unregister-if-probe-event-fi.patch --]
[-- Type: text/plain, Size: 4112 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

When one of the event files is opened, we need to prevent them from
being removed. Modules do with with the module owner set (automated
from the VFS layer).  The ftrace buffer instances have a ref count added
to the trace_array when the enabled file is opened (the others are not
that big of a deal, as they only reference the event calls which
still exist when an instance disappears). But kprobes and uprobes
do not have any protection.

Have the unregister probe fail when the event files are open, in use
are used by perf.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_uprobe.c |   48 ++++++++++++++++++++++++++++++++-----------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index d5d0cd3..4916da5 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -70,7 +70,7 @@ struct trace_uprobe {
 	(sizeof(struct probe_arg) * (n)))
 
 static int register_uprobe_event(struct trace_uprobe *tu);
-static void unregister_uprobe_event(struct trace_uprobe *tu);
+static int unregister_uprobe_event(struct trace_uprobe *tu);
 
 static DEFINE_MUTEX(uprobe_lock);
 static LIST_HEAD(uprobe_list);
@@ -164,11 +164,17 @@ static struct trace_uprobe *find_probe_event(const char *event, const char *grou
 }
 
 /* Unregister a trace_uprobe and probe_event: call with locking uprobe_lock */
-static void unregister_trace_uprobe(struct trace_uprobe *tu)
+static int unregister_trace_uprobe(struct trace_uprobe *tu)
 {
+	int ret;
+
+	ret = unregister_uprobe_event(tu);
+	if (ret)
+		return ret;
+
 	list_del(&tu->list);
-	unregister_uprobe_event(tu);
 	free_trace_uprobe(tu);
+	return 0;
 }
 
 /* Register a trace_uprobe and probe_event */
@@ -181,9 +187,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 
 	/* register as an event */
 	old_tp = find_probe_event(tu->call.name, tu->call.class->system);
-	if (old_tp)
+	if (old_tp) {
 		/* delete old event */
-		unregister_trace_uprobe(old_tp);
+		ret = unregister_trace_uprobe(old_tp);
+		if (ret)
+			goto end;
+	}
 
 	ret = register_uprobe_event(tu);
 	if (ret) {
@@ -256,6 +265,8 @@ static int create_trace_uprobe(int argc, char **argv)
 		group = UPROBE_EVENT_SYSTEM;
 
 	if (is_delete) {
+		int ret;
+
 		if (!event) {
 			pr_info("Delete command needs an event name.\n");
 			return -EINVAL;
@@ -269,9 +280,9 @@ static int create_trace_uprobe(int argc, char **argv)
 			return -ENOENT;
 		}
 		/* delete an event */
-		unregister_trace_uprobe(tu);
+		ret = unregister_trace_uprobe(tu);
 		mutex_unlock(&uprobe_lock);
-		return 0;
+		return ret;
 	}
 
 	if (argc < 2) {
@@ -408,16 +419,20 @@ fail_address_parse:
 	return ret;
 }
 
-static void cleanup_all_probes(void)
+static int cleanup_all_probes(void)
 {
 	struct trace_uprobe *tu;
+	int ret = 0;
 
 	mutex_lock(&uprobe_lock);
 	while (!list_empty(&uprobe_list)) {
 		tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
-		unregister_trace_uprobe(tu);
+		ret = unregister_trace_uprobe(tu);
+		if (ret)
+			break;
 	}
 	mutex_unlock(&uprobe_lock);
+	return ret;
 }
 
 /* Probes listing interfaces */
@@ -462,8 +477,12 @@ static const struct seq_operations probes_seq_op = {
 
 static int probes_open(struct inode *inode, struct file *file)
 {
+	int ret = 0;
+
 	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC))
-		cleanup_all_probes();
+		ret = cleanup_all_probes();
+	if (ret)
+		return ret;
 
 	return seq_open(file, &probes_seq_op);
 }
@@ -970,12 +989,17 @@ static int register_uprobe_event(struct trace_uprobe *tu)
 	return ret;
 }
 
-static void unregister_uprobe_event(struct trace_uprobe *tu)
+static int unregister_uprobe_event(struct trace_uprobe *tu)
 {
+	int ret;
+
 	/* tu->event is unregistered in trace_remove_event_call() */
-	trace_remove_event_call(&tu->call);
+	ret = trace_remove_event_call(&tu->call);
+	if (ret)
+		return ret;
 	kfree(tu->call.print_fmt);
 	tu->call.print_fmt = NULL;
+	return 0;
 }
 
 /* Make a trace interface for controling probe points */
-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
                   ` (3 preceding siblings ...)
  2013-07-04  3:33 ` [RFC][PATCH 4/4] tracing/uprobes: " Steven Rostedt
@ 2013-07-04  4:00 ` Masami Hiramatsu
  2013-07-04  6:14   ` Masami Hiramatsu
  2013-07-12 13:09 ` Masami Hiramatsu
  2013-07-15 18:01 ` Oleg Nesterov
  6 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04  4:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
> 
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
> 
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.
> 
> Here's a test that shows how things break:
> 
>      # cd /sys/kernel/debug/tracing
>      # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
>      # enable_probe() {
>         sleep 10
>         echo 1
>      }
>      # file=events/kprobes/sigprocmask/enable
>      # enable_probe > $file &
>      > kprobe_events
>     
>     The above will corrupt the kprobe system, as the write to the enable
>     file will happen after the kprobe was deleted.
>     
>     Trying to create the probe again fails:
>     
>      # echo 'p:sigprocmask sigprocmask' > kprobe_events
>      # cat kprobe_events
>     p:kprobes/sigprocmask sigprocmask
>      # ls events/kprobes/
>     enable  filter
> 
> After applying these patches, the "> kprobe_events" fails due to the
> event being busy.
> 
> Masami, please review these patches and give your ack.

Thanks Steven!

> Oleg Nesterov (1):
>       tracing: trace_remove_event_call() should fail if call/file is in use
> 
> Steven Rostedt (Red Hat) (3):
>       tracing: Add ref count to ftrace_event_call
>       tracing/kprobes: Fail to unregister if probe event files are open
>       tracing/uprobes: Fail to unregister if probe event files are open

I just started to look into the series, but the 3/4 and 4/4 seems same...
Which one is good to go?

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call
  2013-07-04  3:33 ` [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call Steven Rostedt
@ 2013-07-04  4:22   ` Masami Hiramatsu
  2013-07-04 11:55     ` [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04  4:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 7d85429..90cf243 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -391,6 +391,28 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
>  	__get_system(dir->subsystem);
>  }
>  
> +static int ftrace_event_call_get(struct ftrace_event_call *call)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&event_mutex);
> +	if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
> +		ret = -EBUSY;
> +	else
> +		call->flags++;
> +	mutex_unlock(&event_mutex);
> +
> +	return ret;
> +}
> +
> +static void ftrace_event_call_put(struct ftrace_event_call *call)
> +{
> +	mutex_lock(&event_mutex);
> +	if (!WARN_ON_ONCE(!(call->flags & TRACE_EVENT_FL_REF_MASK)))
> +		call->flags--;
> +	mutex_unlock(&event_mutex);
> +}

Hmm, I might misunderstand, but it seems that there is a small unsafe
time slot.

> @@ -424,7 +446,15 @@ static int tracing_open_generic_file(struct inode *inode, struct file *filp)
>  
>  	ret = tracing_open_generic(inode, filp);
>  	if (ret < 0)
> -		trace_array_put(tr);
> +		goto fail;
> +
> +	ret = ftrace_event_call_get(file->event_call);
> +	if (ret < 0)
> +		goto fail;
> +
> +	return 0;
> + fail:
> +	trace_array_put(tr);
>  	return ret;
>  }
>  
> @@ -433,12 +463,40 @@ static int tracing_release_generic_file(struct inode *inode, struct file *filp)
>  	struct ftrace_event_file *file = inode->i_private;
>  	struct trace_array *tr = file->tr;
>  
> +	ftrace_event_call_put(file->event_call);
>  	trace_array_put(tr);
>  
>  	return 0;
>  }

Here, at first we get an event_file from inode->i_private, and then locks
event_mutex and get/put refcount. This should be safe if we get (find) the object
from some list of event_file under the mutex, but we just use inode->i_private.

This can cause a race as below scenario,

CPU0                              CPU1
open(kprobe_events)
  trace_remove_event_call()    open(enable)
    lock event_mutex             refer event_file
      event_remove()             wait for unlock event_mutex
        ...
        free event_file
    unlock event_mutex
                                 lock event_mutex
                                    add refcount of event_file->call (*)

So, at (*) point, the event_file is already freed. Thus there still be
an unsafe time slot. Or, did I miss something (possibly..)?

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-04  4:00 ` [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Masami Hiramatsu
@ 2013-07-04  6:14   ` Masami Hiramatsu
  0 siblings, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04  6:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 13:00), Masami Hiramatsu wrote:
>> Steven Rostedt (Red Hat) (3):
>>       tracing: Add ref count to ftrace_event_call
>>       tracing/kprobes: Fail to unregister if probe event files are open
>>       tracing/uprobes: Fail to unregister if probe event files are open
> 
> I just started to look into the series, but the 3/4 and 4/4 seems same...
> Which one is good to go?

Ah, now I see, those are for kprobes and uprobes...

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array
  2013-07-04  4:22   ` Masami Hiramatsu
@ 2013-07-04 11:55     ` Masami Hiramatsu
  2013-07-04 12:12       ` Masami Hiramatsu
  2013-07-05  0:32       ` Oleg Nesterov
  0 siblings, 2 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04 11:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Currently ftrace_open_generic_file gets an event_file from
inode->i_private, and then locks event_mutex and gets refcount.
However, this can cause a race as below scenario;

CPU0                              CPU1
open(kprobe_events)
  trace_remove_event_call()    open(enable)
    lock event_mutex             get event_file from inode->i_private
      event_remove()             wait for unlock event_mutex
        ...
        free event_file
    unlock event_mutex
                                 lock event_mutex
                                 add refcount of event_file->call (*)

So, at (*) point, the event_file is already freed and we
may access the corrupted object.
The same thing could happen on trace_array because it is also
directly accessed from event_file.

To avoid this, when opening events/*/*/enable, we must atomically
do; ensure the ftrace_event_file object still exists on a trace_array,
and get refcounts of event_file->call and the trace_array.


CPU0                              CPU1
open(kprobe_events)
  trace_remove_event_call()    open(enable)
    lock event_mutex             get event_file from inode->i_private
      event_remove()             wait for unlock event_mutex
        ...
        free event_file
    unlock event_mutex
                                 lock event_mutex
                                 search the event_file and failed
                                 unlock event_mutex
                                 return -ENODEV

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/trace/trace_events.c |   58 +++++++++++++++++++++++++++++++++----------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 1a5547e..db6b107 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -391,15 +391,24 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
 	__get_system(dir->subsystem);
 }
 
-static int ftrace_event_call_get(struct ftrace_event_call *call)
+static int __ftrace_event_call_get(struct ftrace_event_call *call)
 {
 	int ret = 0;
 
-	mutex_lock(&event_mutex);
 	if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
 		ret = -EBUSY;
 	else
 		call->flags++;
+
+	return ret;
+}
+
+static int ftrace_event_call_get(struct ftrace_event_call *call)
+{
+	int ret = 0;
+
+	mutex_lock(&event_mutex);
+	ret = __ftrace_event_call_get(call);
 	mutex_unlock(&event_mutex);
 
 	return ret;
@@ -413,6 +422,35 @@ static void ftrace_event_call_put(struct ftrace_event_call *call)
 	mutex_unlock(&event_mutex);
 }
 
+static int ftrace_event_file_get(struct ftrace_event_file *this_file)
+{
+	struct ftrace_event_file *file;
+	struct trace_array *tr;
+	int ret = -ENODEV;
+
+	mutex_lock(&event_mutex);
+	do_for_each_event_file(tr, file) {
+		if (file == this_file) {
+			ret = __ftrace_event_call_get(file->event_call);
+			if (!ret)
+				tr->ref++;
+			goto out_unlock;
+		}
+	} while_for_each_event_file();
+ out_unlock:
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+static void ftrace_event_file_put(struct ftrace_event_file *file)
+{
+	struct trace_array *tr = file->tr;
+
+	ftrace_event_call_put(file->event_call);
+	trace_array_put(tr);
+}
+
 static void __put_system_dir(struct ftrace_subsystem_dir *dir)
 {
 	WARN_ON_ONCE(dir->ref_count == 0);
@@ -438,33 +476,27 @@ static void put_system(struct ftrace_subsystem_dir *dir)
 static int tracing_open_generic_file(struct inode *inode, struct file *filp)
 {
 	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
 	int ret;
 
-	if (trace_array_get(tr) < 0)
-		return -ENODEV;
-
-	ret = tracing_open_generic(inode, filp);
+	ret = ftrace_event_file_get(file);
 	if (ret < 0)
-		goto fail;
+		return ret;
 
-	ret = ftrace_event_call_get(file->event_call);
+	ret = tracing_open_generic(inode, filp);
 	if (ret < 0)
 		goto fail;
 
 	return 0;
  fail:
-	trace_array_put(tr);
+	ftrace_event_file_put(file);
 	return ret;
 }
 
 static int tracing_release_generic_file(struct inode *inode, struct file *filp)
 {
 	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
 
-	ftrace_event_call_put(file->event_call);
-	trace_array_put(tr);
+	ftrace_event_file_put(file);
 
 	return 0;
 }


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array
  2013-07-04 11:55     ` [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array Masami Hiramatsu
@ 2013-07-04 12:12       ` Masami Hiramatsu
  2013-07-04 12:23         ` Steven Rostedt
  2013-07-05  0:32       ` Oleg Nesterov
  1 sibling, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04 12:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Steven, Oleg,

I think your patches are OK, but not enough.
Here is an additional patch to fix the unsafe case which I found.
Could you review this too?

(2013/07/04 20:55), Masami Hiramatsu wrote:
> Currently ftrace_open_generic_file gets an event_file from
> inode->i_private, and then locks event_mutex and gets refcount.
> However, this can cause a race as below scenario;
> 
> CPU0                              CPU1
> open(kprobe_events)
>   trace_remove_event_call()    open(enable)
>     lock event_mutex             get event_file from inode->i_private
>       event_remove()             wait for unlock event_mutex
>         ...
>         free event_file
>     unlock event_mutex
>                                  lock event_mutex
>                                  add refcount of event_file->call (*)
> 
> So, at (*) point, the event_file is already freed and we
> may access the corrupted object.
> The same thing could happen on trace_array because it is also
> directly accessed from event_file.
> 
> To avoid this, when opening events/*/*/enable, we must atomically
> do; ensure the ftrace_event_file object still exists on a trace_array,
> and get refcounts of event_file->call and the trace_array.
> 
> 
> CPU0                              CPU1
> open(kprobe_events)
>   trace_remove_event_call()    open(enable)
>     lock event_mutex             get event_file from inode->i_private
>       event_remove()             wait for unlock event_mutex
>         ...
>         free event_file
>     unlock event_mutex
>                                  lock event_mutex
>                                  search the event_file and failed
>                                  unlock event_mutex
>                                  return -ENODEV
> 
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> ---
>  kernel/trace/trace_events.c |   58 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 1a5547e..db6b107 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -391,15 +391,24 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
>  	__get_system(dir->subsystem);
>  }
>  
> -static int ftrace_event_call_get(struct ftrace_event_call *call)
> +static int __ftrace_event_call_get(struct ftrace_event_call *call)
>  {
>  	int ret = 0;
>  
> -	mutex_lock(&event_mutex);
>  	if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
>  		ret = -EBUSY;
>  	else
>  		call->flags++;
> +
> +	return ret;
> +}
> +
> +static int ftrace_event_call_get(struct ftrace_event_call *call)
> +{
> +	int ret = 0;
> +
> +	mutex_lock(&event_mutex);
> +	ret = __ftrace_event_call_get(call);
>  	mutex_unlock(&event_mutex);
>  
>  	return ret;
> @@ -413,6 +422,35 @@ static void ftrace_event_call_put(struct ftrace_event_call *call)
>  	mutex_unlock(&event_mutex);
>  }
>  
> +static int ftrace_event_file_get(struct ftrace_event_file *this_file)
> +{
> +	struct ftrace_event_file *file;
> +	struct trace_array *tr;
> +	int ret = -ENODEV;
> +
> +	mutex_lock(&event_mutex);
> +	do_for_each_event_file(tr, file) {
> +		if (file == this_file) {
> +			ret = __ftrace_event_call_get(file->event_call);
> +			if (!ret)
> +				tr->ref++;
> +			goto out_unlock;
> +		}
> +	} while_for_each_event_file();
> + out_unlock:
> +	mutex_unlock(&event_mutex);
> +
> +	return ret;
> +}
> +
> +static void ftrace_event_file_put(struct ftrace_event_file *file)
> +{
> +	struct trace_array *tr = file->tr;
> +
> +	ftrace_event_call_put(file->event_call);
> +	trace_array_put(tr);
> +}
> +
>  static void __put_system_dir(struct ftrace_subsystem_dir *dir)
>  {
>  	WARN_ON_ONCE(dir->ref_count == 0);
> @@ -438,33 +476,27 @@ static void put_system(struct ftrace_subsystem_dir *dir)
>  static int tracing_open_generic_file(struct inode *inode, struct file *filp)
>  {
>  	struct ftrace_event_file *file = inode->i_private;
> -	struct trace_array *tr = file->tr;
>  	int ret;
>  
> -	if (trace_array_get(tr) < 0)
> -		return -ENODEV;
> -
> -	ret = tracing_open_generic(inode, filp);
> +	ret = ftrace_event_file_get(file);
>  	if (ret < 0)
> -		goto fail;
> +		return ret;
>  
> -	ret = ftrace_event_call_get(file->event_call);
> +	ret = tracing_open_generic(inode, filp);
>  	if (ret < 0)
>  		goto fail;
>  
>  	return 0;
>   fail:
> -	trace_array_put(tr);
> +	ftrace_event_file_put(file);
>  	return ret;
>  }
>  
>  static int tracing_release_generic_file(struct inode *inode, struct file *filp)
>  {
>  	struct ftrace_event_file *file = inode->i_private;
> -	struct trace_array *tr = file->tr;
>  
> -	ftrace_event_call_put(file->event_call);
> -	trace_array_put(tr);
> +	ftrace_event_file_put(file);
>  
>  	return 0;
>  }

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array
  2013-07-04 12:12       ` Masami Hiramatsu
@ 2013-07-04 12:23         ` Steven Rostedt
  0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-04 12:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Sorry for the top post. FYI today is a US holiday and tomorrow I'll probably only work a half day. I may not get a chance to look it till Monday.

-- Steve


Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> wrote:

>Steven, Oleg,
>
>I think your patches are OK, but not enough.
>Here is an additional patch to fix the unsafe case which I found.
>Could you review this too?
>
>(2013/07/04 20:55), Masami Hiramatsu wrote:
>> Currently ftrace_open_generic_file gets an event_file from
>> inode->i_private, and then locks event_mutex and gets refcount.
>> However, this can cause a race as below scenario;
>> 
>> CPU0                              CPU1
>> open(kprobe_events)
>>   trace_remove_event_call()    open(enable)
>>     lock event_mutex             get event_file from inode->i_private
>>       event_remove()             wait for unlock event_mutex
>>         ...
>>         free event_file
>>     unlock event_mutex
>>                                  lock event_mutex
>>                                  add refcount of event_file->call (*)
>> 
>> So, at (*) point, the event_file is already freed and we
>> may access the corrupted object.
>> The same thing could happen on trace_array because it is also
>> directly accessed from event_file.
>> 
>> To avoid this, when opening events/*/*/enable, we must atomically
>> do; ensure the ftrace_event_file object still exists on a
>trace_array,
>> and get refcounts of event_file->call and the trace_array.
>> 
>> 
>> CPU0                              CPU1
>> open(kprobe_events)
>>   trace_remove_event_call()    open(enable)
>>     lock event_mutex             get event_file from inode->i_private
>>       event_remove()             wait for unlock event_mutex
>>         ...
>>         free event_file
>>     unlock event_mutex
>>                                  lock event_mutex
>>                                  search the event_file and failed
>>                                  unlock event_mutex
>>                                  return -ENODEV
>> 
>> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> ---
>>  kernel/trace/trace_events.c |   58
>+++++++++++++++++++++++++++++++++----------
>>  1 file changed, 45 insertions(+), 13 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_events.c
>b/kernel/trace/trace_events.c
>> index 1a5547e..db6b107 100644
>> --- a/kernel/trace/trace_events.c
>> +++ b/kernel/trace/trace_events.c
>> @@ -391,15 +391,24 @@ static void __get_system_dir(struct
>ftrace_subsystem_dir *dir)
>>  	__get_system(dir->subsystem);
>>  }
>>  
>> -static int ftrace_event_call_get(struct ftrace_event_call *call)
>> +static int __ftrace_event_call_get(struct ftrace_event_call *call)
>>  {
>>  	int ret = 0;
>>  
>> -	mutex_lock(&event_mutex);
>>  	if ((call->flags & TRACE_EVENT_FL_REF_MASK) ==
>TRACE_EVENT_FL_REF_MAX - 1)
>>  		ret = -EBUSY;
>>  	else
>>  		call->flags++;
>> +
>> +	return ret;
>> +}
>> +
>> +static int ftrace_event_call_get(struct ftrace_event_call *call)
>> +{
>> +	int ret = 0;
>> +
>> +	mutex_lock(&event_mutex);
>> +	ret = __ftrace_event_call_get(call);
>>  	mutex_unlock(&event_mutex);
>>  
>>  	return ret;
>> @@ -413,6 +422,35 @@ static void ftrace_event_call_put(struct
>ftrace_event_call *call)
>>  	mutex_unlock(&event_mutex);
>>  }
>>  
>> +static int ftrace_event_file_get(struct ftrace_event_file
>*this_file)
>> +{
>> +	struct ftrace_event_file *file;
>> +	struct trace_array *tr;
>> +	int ret = -ENODEV;
>> +
>> +	mutex_lock(&event_mutex);
>> +	do_for_each_event_file(tr, file) {
>> +		if (file == this_file) {
>> +			ret = __ftrace_event_call_get(file->event_call);
>> +			if (!ret)
>> +				tr->ref++;
>> +			goto out_unlock;
>> +		}
>> +	} while_for_each_event_file();
>> + out_unlock:
>> +	mutex_unlock(&event_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +static void ftrace_event_file_put(struct ftrace_event_file *file)
>> +{
>> +	struct trace_array *tr = file->tr;
>> +
>> +	ftrace_event_call_put(file->event_call);
>> +	trace_array_put(tr);
>> +}
>> +
>>  static void __put_system_dir(struct ftrace_subsystem_dir *dir)
>>  {
>>  	WARN_ON_ONCE(dir->ref_count == 0);
>> @@ -438,33 +476,27 @@ static void put_system(struct
>ftrace_subsystem_dir *dir)
>>  static int tracing_open_generic_file(struct inode *inode, struct
>file *filp)
>>  {
>>  	struct ftrace_event_file *file = inode->i_private;
>> -	struct trace_array *tr = file->tr;
>>  	int ret;
>>  
>> -	if (trace_array_get(tr) < 0)
>> -		return -ENODEV;
>> -
>> -	ret = tracing_open_generic(inode, filp);
>> +	ret = ftrace_event_file_get(file);
>>  	if (ret < 0)
>> -		goto fail;
>> +		return ret;
>>  
>> -	ret = ftrace_event_call_get(file->event_call);
>> +	ret = tracing_open_generic(inode, filp);
>>  	if (ret < 0)
>>  		goto fail;
>>  
>>  	return 0;
>>   fail:
>> -	trace_array_put(tr);
>> +	ftrace_event_file_put(file);
>>  	return ret;
>>  }
>>  
>>  static int tracing_release_generic_file(struct inode *inode, struct
>file *filp)
>>  {
>>  	struct ftrace_event_file *file = inode->i_private;
>> -	struct trace_array *tr = file->tr;
>>  
>> -	ftrace_event_call_put(file->event_call);
>> -	trace_array_put(tr);
>> +	ftrace_event_file_put(file);
>>  
>>  	return 0;
>>  }

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
@ 2013-07-04 12:45   ` Masami Hiramatsu
  2013-07-04 18:48     ` Oleg Nesterov
  2013-07-30  8:15   ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Masami Hiramatsu
  2013-07-31 19:49   ` Steven Rostedt
  2 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04 12:45 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 7ed6976..ffcaf42 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
>  }
>  
>  static int register_probe_event(struct trace_probe *tp);
> -static void unregister_probe_event(struct trace_probe *tp);
> +static int unregister_probe_event(struct trace_probe *tp);
>  
>  static DEFINE_MUTEX(probe_lock);
>  static LIST_HEAD(probe_list);
> @@ -340,9 +340,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
>  	if (trace_probe_is_enabled(tp))
>  		return -EBUSY;
>  
> +	/* Will fail if probe is being used by ftrace or perf */
> +	if (unregister_probe_event(tp))
> +		return -EBUSY;
> +
>  	__unregister_trace_probe(tp);
>  	list_del(&tp->list);
> -	unregister_probe_event(tp);
>  
>  	return 0;
>  }

This may cause an unexpected access violation at kprobe handler because
unregister_probe_event frees event_call/event_files and it will be
accessed until kprobe is *completely* disabled.
Actually disable_kprobe() doesn't ensure to finish the current running
kprobe handlers. Thus, even if trace_probe_is_enabled() returns false,
we must do synchronize_sched() for waiting, before unregister_probe_event().
OTOH, unregister_kprobe() waits for that. That's why I do
__unregister_trace_probe(tp) first here.

I think there are 2 solutions, one is adding a wait after disable_k*probe
at disable_trace_probe(), another is adding a wait before unregister_probe_event()
at unregister_trace_probe().

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use
  2013-07-04  3:33 ` [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use Steven Rostedt
@ 2013-07-04 12:48   ` Masami Hiramatsu
  0 siblings, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-04 12:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> From: Oleg Nesterov <oleg@redhat.com>
> 
> Change trace_remove_event_call(call) to return the error if this
> call is active. This is what the callers assume but can't verify
> outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
> need the additional changes, unregister_trace_probe() should abort
> if trace_remove_event_call() fails.
> 
> We also check TRACE_EVENT_FL_REF_MASK to ensure that nobody opened
> the files we are going to remove, these means that nobody can access
> the soon-to-be-freed ftrace_event_file/call via filp->private_data.
> 
> Link: http://lkml.kernel.org/r/20130702222359.GA27629@redhat.com

This looks good for me ;)

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thanks!

> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  include/linux/ftrace_event.h |    2 +-
>  kernel/trace/trace_events.c  |   33 +++++++++++++++++++++++++++++++--
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 72ff2c6..bdf6bdd 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -338,7 +338,7 @@ extern int trace_define_field(struct ftrace_event_call *call, const char *type,
>  			      const char *name, int offset, int size,
>  			      int is_signed, int filter_type);
>  extern int trace_add_event_call(struct ftrace_event_call *call);
> -extern void trace_remove_event_call(struct ftrace_event_call *call);
> +extern int trace_remove_event_call(struct ftrace_event_call *call);
>  
>  #define is_signed_type(type)	(((type)(-1)) < (type)1)
>  
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 90cf243..1a5547e 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1766,16 +1766,45 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
>  	destroy_preds(call);
>  }
>  
> +static int probe_remove_event_call(struct ftrace_event_call *call)
> +{
> +	struct trace_array *tr;
> +	struct ftrace_event_file *file;
> +
> +	if (call->flags & TRACE_EVENT_FL_REF_MASK)
> +		return -EBUSY;
> +
> +#ifdef CONFIG_PERF_EVENTS
> +	if (call->perf_refcount)
> +		return -EBUSY;
> +#endif
> +	do_for_each_event_file(tr, file) {
> +		if (file->event_call != call)
> +			continue;
> +		if (file->flags & FTRACE_EVENT_FL_ENABLED)
> +			return -EBUSY;
> +		break;
> +	} while_for_each_event_file();
> +
> +	__trace_remove_event_call(call);
> +
> +	return 0;
> +}
> +
>  /* Remove an event_call */
> -void trace_remove_event_call(struct ftrace_event_call *call)
> +int trace_remove_event_call(struct ftrace_event_call *call)
>  {
> +	int ret;
> +
>  	mutex_lock(&trace_types_lock);
>  	mutex_lock(&event_mutex);
>  	down_write(&trace_event_sem);
> -	__trace_remove_event_call(call);
> +	ret = probe_remove_event_call(call);
>  	up_write(&trace_event_sem);
>  	mutex_unlock(&event_mutex);
>  	mutex_unlock(&trace_types_lock);
> +
> +	return ret;
>  }
>  
>  #define for_each_event(event, start, end)			\
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04 12:45   ` Masami Hiramatsu
@ 2013-07-04 18:48     ` Oleg Nesterov
  2013-07-05  2:53       ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-04 18:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/04, Masami Hiramatsu wrote:
>
> (2013/07/04 12:33), Steven Rostedt wrote:
> > +	/* Will fail if probe is being used by ftrace or perf */
> > +	if (unregister_probe_event(tp))
> > +		return -EBUSY;
> > +
> >  	__unregister_trace_probe(tp);
> >  	list_del(&tp->list);
> > -	unregister_probe_event(tp);
> >
> >  	return 0;
> >  }
>
> This may cause an unexpected access violation at kprobe handler because
> unregister_probe_event frees event_call/event_files and it will be
> accessed until kprobe is *completely* disabled.

I don't think so... Please correct me.

(but yes I think the patch needs a small update, see below).

> Actually disable_kprobe() doesn't ensure to finish the current running
> kprobe handlers.

Yes. in fact disable_trace_probe(file != NULL) does, but perf doesn't.

> Thus, even if trace_probe_is_enabled() returns false,
> we must do synchronize_sched() for waiting, before unregister_probe_event().

No, I think we should simply kill trace_probe_is_enabled() here.
And synchronize_sched() _before_ unregister_probe_event() can't
help, exactly because trace_probe_is_enabled() is racy.

> OTOH, unregister_kprobe() waits for that.

Yes.

So I think we only need to move kfree(tp->call.print_fmt). In fact I
already wrote the patch assuming that trace_remove_event_call() will
be changed as we discussed.

So the sequence should be:

	if (trace_remove_event_call(...))
		return;

	/* does synchronize_sched */
	unregister_kprobe();

	kfree(everything);

Agreed?

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array
  2013-07-04 11:55     ` [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array Masami Hiramatsu
  2013-07-04 12:12       ` Masami Hiramatsu
@ 2013-07-05  0:32       ` Oleg Nesterov
  2013-07-05  2:32         ` Masami Hiramatsu
  2013-07-09  7:55         ` [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private Masami Hiramatsu
  1 sibling, 2 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-05  0:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/04, Masami Hiramatsu wrote:
>
> Currently ftrace_open_generic_file gets an event_file from
> inode->i_private, and then locks event_mutex and gets refcount.
> However, this can cause a race as below scenario;
>
> CPU0                              CPU1
> open(kprobe_events)
>   trace_remove_event_call()    open(enable)
>     lock event_mutex             get event_file from inode->i_private
>       event_remove()             wait for unlock event_mutex
>         ...
>         free event_file
>     unlock event_mutex
>                                  lock event_mutex
>                                  add refcount of event_file->call (*)
>
> So, at (*) point, the event_file is already freed and we
> may access the corrupted object.

Yes, but the same can happen with event_call, so it seems that
this patch is not enough too.

Say, open(id) can take event_mutex when the caller of
trace_remove_event_call() has already freed ftrace_event_call.

Or I missed something...

Perhaps we can rely on d_unlinked() ? IOW, the caller of
__ftrace_event_call_get() should take event_mutex, check
d_unhashed(f_dentry) and only then do _get().

Nasty, I agree.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array
  2013-07-05  0:32       ` Oleg Nesterov
@ 2013-07-05  2:32         ` Masami Hiramatsu
  2013-07-09  7:55         ` [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private Masami Hiramatsu
  1 sibling, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-05  2:32 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/05 9:32), Oleg Nesterov wrote:
> On 07/04, Masami Hiramatsu wrote:
>>
>> Currently ftrace_open_generic_file gets an event_file from
>> inode->i_private, and then locks event_mutex and gets refcount.
>> However, this can cause a race as below scenario;
>>
>> CPU0                              CPU1
>> open(kprobe_events)
>>   trace_remove_event_call()    open(enable)
>>     lock event_mutex             get event_file from inode->i_private
>>       event_remove()             wait for unlock event_mutex
>>         ...
>>         free event_file
>>     unlock event_mutex
>>                                  lock event_mutex
>>                                  add refcount of event_file->call (*)
>>
>> So, at (*) point, the event_file is already freed and we
>> may access the corrupted object.
> 
> Yes, but the same can happen with event_call, so it seems that
> this patch is not enough too.

Oops, right!

> Say, open(id) can take event_mutex when the caller of
> trace_remove_event_call() has already freed ftrace_event_call.
> 
> Or I missed something...
> 
> Perhaps we can rely on d_unlinked() ? IOW, the caller of
> __ftrace_event_call_get() should take event_mutex, check
> d_unhashed(f_dentry) and only then do _get().

It sounds a good idea! :)

Thank you!

> 
> Nasty, I agree.
> 
> Oleg.
> 
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04 18:48     ` Oleg Nesterov
@ 2013-07-05  2:53       ` Masami Hiramatsu
  2013-07-05 17:26         ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-05  2:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/05 3:48), Oleg Nesterov wrote:
> On 07/04, Masami Hiramatsu wrote:
>>
>> (2013/07/04 12:33), Steven Rostedt wrote:
>>> +	/* Will fail if probe is being used by ftrace or perf */
>>> +	if (unregister_probe_event(tp))
>>> +		return -EBUSY;
>>> +
>>>  	__unregister_trace_probe(tp);
>>>  	list_del(&tp->list);
>>> -	unregister_probe_event(tp);
>>>
>>>  	return 0;
>>>  }
>>
>> This may cause an unexpected access violation at kprobe handler because
>> unregister_probe_event frees event_call/event_files and it will be
>> accessed until kprobe is *completely* disabled.
> 
> I don't think so... Please correct me.
> 
> (but yes I think the patch needs a small update, see below).
> 
>> Actually disable_kprobe() doesn't ensure to finish the current running
>> kprobe handlers.
> 
> Yes. in fact disable_trace_probe(file != NULL) does, but perf doesn't.

Ah, right. we did that.

> 
>> Thus, even if trace_probe_is_enabled() returns false,
>> we must do synchronize_sched() for waiting, before unregister_probe_event().
> 
> No, I think we should simply kill trace_probe_is_enabled() here.
> And synchronize_sched() _before_ unregister_probe_event() can't
> help, exactly because trace_probe_is_enabled() is racy.

Right, it should be useless.


>> OTOH, unregister_kprobe() waits for that.
> 
> Yes.
> 
> So I think we only need to move kfree(tp->call.print_fmt). In fact I
> already wrote the patch assuming that trace_remove_event_call() will
> be changed as we discussed.
> 
> So the sequence should be:
> 
> 	if (trace_remove_event_call(...))
> 		return;
> 
> 	/* does synchronize_sched */
> 	unregister_kprobe();
> 
> 	kfree(everything);
> 
> Agreed?

If we can free everything after all, I'd like to do so.
Hmm, but AFAICS, trace_remove_event_call() supposes that
all event is disabled completely.

A safe way is to wait rcu always right after disable_*probe
in disable_trace_probe. If we have an unused link, we can
free it after that.

Or, do more aggressively, introducing a dying-bit for each
trace-probe could be another way. If the bit is set, all
enable operations are failed. It works like as a per-event lock.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-05  2:53       ` Masami Hiramatsu
@ 2013-07-05 17:26         ` Oleg Nesterov
  2013-07-08  2:36           ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-05 17:26 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/05, Masami Hiramatsu wrote:
>
> (2013/07/05 3:48), Oleg Nesterov wrote:
> > On 07/04, Masami Hiramatsu wrote:
> >>
> >> Actually disable_kprobe() doesn't ensure to finish the current running
> >> kprobe handlers.
> >
> > Yes. in fact disable_trace_probe(file != NULL) does, but perf doesn't.
>
> Ah, right. we did that.

And thus we only need to synchronize kprobe_dispatcher()->kprobe_perf_func()
path. And afaics kprobe_perf_func() doesn't use anything which can be freed
by trace_remove_event_call?

> >> OTOH, unregister_kprobe() waits for that.
> >
> > Yes.
> >
> > So I think we only need to move kfree(tp->call.print_fmt).

OOPS. I was wrong. It seems that ->print_fmt is only for event/format ?
Then it is fine to kfree it right after trace_remove_event_call().

> > So the sequence should be:
> >
> > 	if (trace_remove_event_call(...))
> > 		return;
> >
> > 	/* does synchronize_sched */
> > 	unregister_kprobe();
> >
> > 	kfree(everything);
> >
> > Agreed?
>
> If we can free everything after all, I'd like to do so.
> Hmm, but AFAICS, trace_remove_event_call() supposes that
> all event is disabled completely.

Yes, but kprobe_trace_func() is really disabled?

> A safe way is to wait rcu always right after disable_*probe
> in disable_trace_probe. If we have an unused link, we can
> free it after that.

Aaaah... I am starting to understand... Even if kprobe_perf_func()
is fine, synchronize_sched() is calles _before_ disable_kprobe()
and thus it can't synchronize with the handlers which hit this probe
after we start synchronize_sched().

Did you mean this or I misssed something else?

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-05 17:26         ` Oleg Nesterov
@ 2013-07-08  2:36           ` Masami Hiramatsu
  2013-07-08 14:25             ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-08  2:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/06 2:26), Oleg Nesterov wrote:
> On 07/05, Masami Hiramatsu wrote:
>>
>> (2013/07/05 3:48), Oleg Nesterov wrote:
>>> On 07/04, Masami Hiramatsu wrote:
>>>>
>>>> Actually disable_kprobe() doesn't ensure to finish the current running
>>>> kprobe handlers.
>>>
>>> Yes. in fact disable_trace_probe(file != NULL) does, but perf doesn't.
>>
>> Ah, right. we did that.
> 
> And thus we only need to synchronize kprobe_dispatcher()->kprobe_perf_func()
> path. And afaics kprobe_perf_func() doesn't use anything which can be freed
> by trace_remove_event_call?
> 
>>>> OTOH, unregister_kprobe() waits for that.
>>>
>>> Yes.
>>>
>>> So I think we only need to move kfree(tp->call.print_fmt).
> 
> OOPS. I was wrong. It seems that ->print_fmt is only for event/format ?
> Then it is fine to kfree it right after trace_remove_event_call().
> 
>>> So the sequence should be:
>>>
>>> 	if (trace_remove_event_call(...))
>>> 		return;
>>>
>>> 	/* does synchronize_sched */
>>> 	unregister_kprobe();
>>>
>>> 	kfree(everything);
>>>
>>> Agreed?
>>
>> If we can free everything after all, I'd like to do so.
>> Hmm, but AFAICS, trace_remove_event_call() supposes that
>> all event is disabled completely.
> 
> Yes, but kprobe_trace_func() is really disabled?

No, currently, doesn't. We need to fix that.
(Perhaps, uprobes too.)

>> A safe way is to wait rcu always right after disable_*probe
>> in disable_trace_probe. If we have an unused link, we can
>> free it after that.
> 
> Aaaah... I am starting to understand... Even if kprobe_perf_func()
> is fine, synchronize_sched() is calles _before_ disable_kprobe()
> and thus it can't synchronize with the handlers which hit this probe
> after we start synchronize_sched().
> 
> Did you mean this or I misssed something else?

Right, thus perf path also need to be synchronized.

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-08  2:36           ` Masami Hiramatsu
@ 2013-07-08 14:25             ` Oleg Nesterov
  2013-07-09  8:01               ` [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-08 14:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/08, Masami Hiramatsu wrote:
>
> (2013/07/06 2:26), Oleg Nesterov wrote:
> (Perhaps, uprobes too.)

Sure, uprobes needs all the same fixes, lets ignore it for now.

> >> A safe way is to wait rcu always right after disable_*probe
> >> in disable_trace_probe. If we have an unused link, we can
> >> free it after that.
> >
> > Aaaah... I am starting to understand... Even if kprobe_perf_func()
> > is fine, synchronize_sched() is calles _before_ disable_kprobe()
> > and thus it can't synchronize with the handlers which hit this probe
> > after we start synchronize_sched().
> >
> > Did you mean this or I misssed something else?
>
> Right, thus perf path also need to be synchronized.

kprobe_perf_func() looks fine. This path can't use anything which
can be freed by trace_remove_event_call(). We are going to free
tp / tp->call but only after unregister_kprobe().

Afaics. But even if I am right, I agree it would be safer to not
rely on this and simply do synchronize_sched() after disable as
you suggested.

Thanks,

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-05  0:32       ` Oleg Nesterov
  2013-07-05  2:32         ` Masami Hiramatsu
@ 2013-07-09  7:55         ` Masami Hiramatsu
  2013-07-15 18:16           ` Oleg Nesterov
  1 sibling, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-09  7:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Currently ftrace_open_generic_file gets an event_file from
inode->i_private, and then locks event_mutex and gets refcount.
However, this can cause a race as below scenario;

CPU0                              CPU1
open(kprobe_events)
  trace_remove_event_call()    open(enable)
    lock event_mutex             get event_file from inode->i_private
      event_remove()             wait for unlock event_mutex
        ...
        free event_file
    unlock event_mutex
                                 lock event_mutex
                                 add refcount of event_file->call (*)

So, at (*) point, the event_file is already freed and we
may access the corrupted object.
The same thing could happen on event_call because it is also
directly accessed from i_private on some points.

To avoid this, when opening events/*/*/enable, we have to ensure
the dentry of the file is not unlinked yet, under event_mutex
is locked.

CPU0                              CPU1
open(kprobe_events)
  trace_remove_event_call()    open(enable)
    lock event_mutex             get event_file from inode->i_private
      event_remove()             wait for unlock event_mutex
        ...
        dput(dentry)
        free event_file
    unlock event_mutex
                                 lock event_mutex
                                 check !d_unhashed(dentry) and failed
                                 unlock event_mutex
                                 return -ENODEV

Note: trace_array_get(tr) always ensures existance of tr in
trace_arrays, so above check is not needed in the case that
i_private directly points the tr.

Changes from V1:
 - Use d_unhashed(f_dentry) to ensure the event exists according
   to Oleg's comment.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_events.c |   70 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 1a5547e..06fdac0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -391,15 +391,32 @@ static void __get_system_dir(struct ftrace_subsystem_dir *dir)
 	__get_system(dir->subsystem);
 }
 
-static int ftrace_event_call_get(struct ftrace_event_call *call)
+static int __ftrace_event_call_get(struct ftrace_event_call *call)
 {
 	int ret = 0;
 
-	mutex_lock(&event_mutex);
 	if ((call->flags & TRACE_EVENT_FL_REF_MASK) == TRACE_EVENT_FL_REF_MAX - 1)
 		ret = -EBUSY;
 	else
 		call->flags++;
+
+	return ret;
+}
+
+static int ftrace_event_call_get(struct ftrace_event_call *call,
+				 struct dentry *dentry)
+{
+	int ret = -ENODEV;
+
+	mutex_lock(&event_mutex);
+	spin_lock(&dentry->d_lock);
+	if (d_unhashed(dentry)) /* This file is already unlinked */
+		goto out_unlock;
+
+	ret = __ftrace_event_call_get(call);
+
+ out_unlock:
+	spin_unlock(&dentry->d_lock);
 	mutex_unlock(&event_mutex);
 
 	return ret;
@@ -413,6 +430,35 @@ static void ftrace_event_call_put(struct ftrace_event_call *call)
 	mutex_unlock(&event_mutex);
 }
 
+static int ftrace_event_file_get(struct ftrace_event_file *file,
+				 struct dentry *dentry)
+{
+	int ret = -ENODEV;
+
+	mutex_lock(&event_mutex);
+	spin_lock(&dentry->d_lock);
+	if (d_unhashed(dentry)) /* This file is already unlinked */
+		goto out_unlock;
+
+	ret = __ftrace_event_call_get(file->event_call);
+	if (!ret)
+		file->tr->ref++;
+
+ out_unlock:
+	spin_unlock(&dentry->d_lock);
+	mutex_unlock(&event_mutex);
+
+	return ret;
+}
+
+static void ftrace_event_file_put(struct ftrace_event_file *file)
+{
+	struct trace_array *tr = file->tr;
+
+	ftrace_event_call_put(file->event_call);
+	trace_array_put(tr);
+}
+
 static void __put_system_dir(struct ftrace_subsystem_dir *dir)
 {
 	WARN_ON_ONCE(dir->ref_count == 0);
@@ -438,33 +484,27 @@ static void put_system(struct ftrace_subsystem_dir *dir)
 static int tracing_open_generic_file(struct inode *inode, struct file *filp)
 {
 	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
 	int ret;
 
-	if (trace_array_get(tr) < 0)
-		return -ENODEV;
-
-	ret = tracing_open_generic(inode, filp);
+	ret = ftrace_event_file_get(file, filp->f_dentry);
 	if (ret < 0)
-		goto fail;
+		return ret;
 
-	ret = ftrace_event_call_get(file->event_call);
+	ret = tracing_open_generic(inode, filp);
 	if (ret < 0)
 		goto fail;
 
 	return 0;
  fail:
-	trace_array_put(tr);
+	ftrace_event_file_put(file);
 	return ret;
 }
 
 static int tracing_release_generic_file(struct inode *inode, struct file *filp)
 {
 	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
 
-	ftrace_event_call_put(file->event_call);
-	trace_array_put(tr);
+	ftrace_event_file_put(file);
 
 	return 0;
 }
@@ -477,7 +517,7 @@ static int tracing_open_generic_call(struct inode *inode, struct file *filp)
 {
 	struct ftrace_event_call *call = inode->i_private;
 
-	return ftrace_event_call_get(call);
+	return ftrace_event_call_get(call, filp->f_dentry);
 }
 
 static int tracing_release_generic_call(struct inode *inode, struct file *filp)
@@ -1007,7 +1047,7 @@ static int trace_format_open(struct inode *inode, struct file *file)
 	struct seq_file *m;
 	int ret;
 
-	ret = ftrace_event_call_get(call);
+	ret = ftrace_event_call_get(call, file->f_dentry);
 	if (ret < 0)
 		return ret;
 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-08 14:25             ` Oleg Nesterov
@ 2013-07-09  8:01               ` Masami Hiramatsu
  2013-07-09  8:07                 ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-09  8:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Wait for disabling all running kprobe handlers when a kprobe
event is disabled, since the caller, trace_remove_event_call()
supposes that a removing event is disabled completely by
disabling the event.
With this change, ftrace can ensure that there is no running
event handlers after disabling it.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/trace/trace_kprobe.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7ed6976..a9a4fe5 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -243,11 +243,11 @@ find_event_file_link(struct trace_probe *tp, struct ftrace_event_file *file)
 static int
 disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 {
+	struct event_file_link *link = NULL;
+	int wait = 0;
 	int ret = 0;
 
 	if (file) {
-		struct event_file_link *link;
-
 		link = find_event_file_link(tp, file);
 		if (!link) {
 			ret = -EINVAL;
@@ -255,10 +255,7 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 		}
 
 		list_del_rcu(&link->list);
-		/* synchronize with kprobe_trace_func/kretprobe_trace_func */
-		synchronize_sched();
-		kfree(link);
-
+		wait = 1;
 		if (!list_empty(&tp->files))
 			goto out;
 
@@ -271,8 +268,18 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 			disable_kretprobe(&tp->rp);
 		else
 			disable_kprobe(&tp->rp.kp);
+		wait = 1;
 	}
  out:
+	if (wait) {
+		/*
+		 * synchronize with kprobe_trace_func/kretprobe_trace_func
+		 * to ensure disabled (all running handlers are finished)
+		 */
+		synchronize_sched();
+		kfree(link);	/* Ignored if link == NULL */
+	}
+
 	return ret;
 }
 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  8:01               ` [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers Masami Hiramatsu
@ 2013-07-09  8:07                 ` Peter Zijlstra
  2013-07-09  8:20                   ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Peter Zijlstra @ 2013-07-09  8:07 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On Tue, Jul 09, 2013 at 05:01:45PM +0900, Masami Hiramatsu wrote:
> +	if (wait) {
> +		/*
> +		 * synchronize with kprobe_trace_func/kretprobe_trace_func
> +		 * to ensure disabled (all running handlers are finished)
> +		 */
> +		synchronize_sched();
> +		kfree(link);	/* Ignored if link == NULL */
> +	}

What's not clear to me from this comment is if we're only waiting for kfree()?
In that case shouldn't we use call_rcu() to free the thing? Or do we need the
sync_sched() for other purposes as well?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  8:07                 ` Peter Zijlstra
@ 2013-07-09  8:20                   ` Masami Hiramatsu
  2013-07-09  8:21                     ` Peter Zijlstra
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-09  8:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/09 17:07), Peter Zijlstra wrote:
> On Tue, Jul 09, 2013 at 05:01:45PM +0900, Masami Hiramatsu wrote:
>> +	if (wait) {
>> +		/*
>> +		 * synchronize with kprobe_trace_func/kretprobe_trace_func
>> +		 * to ensure disabled (all running handlers are finished)
>> +		 */
>> +		synchronize_sched();
>> +		kfree(link);	/* Ignored if link == NULL */
>> +	}
> 
> What's not clear to me from this comment is if we're only waiting for kfree()?
> In that case shouldn't we use call_rcu() to free the thing? Or do we need the
> sync_sched() for other purposes as well?

No, this is not only for kfree, but also to ensure completing
disabling process, because trace_remove_event_call() supposes
that for releasing event_call related objects (Those objects
will be accessed in the handlers).

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  8:20                   ` Masami Hiramatsu
@ 2013-07-09  8:21                     ` Peter Zijlstra
  2013-07-09  8:50                       ` Masami Hiramatsu
  2013-07-09  9:35                       ` [RFC PATCH V2] " Masami Hiramatsu
  0 siblings, 2 replies; 67+ messages in thread
From: Peter Zijlstra @ 2013-07-09  8:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On Tue, Jul 09, 2013 at 05:20:17PM +0900, Masami Hiramatsu wrote:
> (2013/07/09 17:07), Peter Zijlstra wrote:
> > On Tue, Jul 09, 2013 at 05:01:45PM +0900, Masami Hiramatsu wrote:
> >> +	if (wait) {
> >> +		/*
> >> +		 * synchronize with kprobe_trace_func/kretprobe_trace_func
> >> +		 * to ensure disabled (all running handlers are finished)
> >> +		 */
> >> +		synchronize_sched();
> >> +		kfree(link);	/* Ignored if link == NULL */
> >> +	}
> > 
> > What's not clear to me from this comment is if we're only waiting for kfree()?
> > In that case shouldn't we use call_rcu() to free the thing? Or do we need the
> > sync_sched() for other purposes as well?
> 
> No, this is not only for kfree, but also to ensure completing
> disabling process, because trace_remove_event_call() supposes
> that for releasing event_call related objects (Those objects
> will be accessed in the handlers).

Then may I kindly suggest you clarify this in the comment? :-)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  8:21                     ` Peter Zijlstra
@ 2013-07-09  8:50                       ` Masami Hiramatsu
  2013-07-09  9:35                       ` [RFC PATCH V2] " Masami Hiramatsu
  1 sibling, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-09  8:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/09 17:21), Peter Zijlstra wrote:
> On Tue, Jul 09, 2013 at 05:20:17PM +0900, Masami Hiramatsu wrote:
>> (2013/07/09 17:07), Peter Zijlstra wrote:
>>> On Tue, Jul 09, 2013 at 05:01:45PM +0900, Masami Hiramatsu wrote:
>>>> +	if (wait) {
>>>> +		/*
>>>> +		 * synchronize with kprobe_trace_func/kretprobe_trace_func
>>>> +		 * to ensure disabled (all running handlers are finished)
>>>> +		 */
>>>> +		synchronize_sched();
>>>> +		kfree(link);	/* Ignored if link == NULL */
>>>> +	}
>>>
>>> What's not clear to me from this comment is if we're only waiting for kfree()?
>>> In that case shouldn't we use call_rcu() to free the thing? Or do we need the
>>> sync_sched() for other purposes as well?
>>
>> No, this is not only for kfree, but also to ensure completing
>> disabling process, because trace_remove_event_call() supposes
>> that for releasing event_call related objects (Those objects
>> will be accessed in the handlers).
> 
> Then may I kindly suggest you clarify this in the comment? :-)
> 

Ah, right! I'll update it :)

Thank you!

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* [RFC PATCH V2] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  8:21                     ` Peter Zijlstra
  2013-07-09  8:50                       ` Masami Hiramatsu
@ 2013-07-09  9:35                       ` Masami Hiramatsu
  2013-07-15 18:20                         ` Oleg Nesterov
  1 sibling, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-09  9:35 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, Frederic Weisbecker, Oleg Nesterov, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

Wait for disabling all running kprobe handlers when a kprobe
event is disabled, since the caller, trace_remove_event_call()
supposes that a removing event is disabled completely by
disabling the event.
With this change, ftrace can ensure that there is no running
event handlers after disabling it.

Changes in V2:
 - Comment (in code) for clarify why we need to wait there.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/trace/trace_kprobe.c |   23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7ed6976..8374f78 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -243,11 +243,11 @@ find_event_file_link(struct trace_probe *tp, struct ftrace_event_file *file)
 static int
 disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 {
+	struct event_file_link *link = NULL;
+	int wait = 0;
 	int ret = 0;
 
 	if (file) {
-		struct event_file_link *link;
-
 		link = find_event_file_link(tp, file);
 		if (!link) {
 			ret = -EINVAL;
@@ -255,10 +255,7 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 		}
 
 		list_del_rcu(&link->list);
-		/* synchronize with kprobe_trace_func/kretprobe_trace_func */
-		synchronize_sched();
-		kfree(link);
-
+		wait = 1;
 		if (!list_empty(&tp->files))
 			goto out;
 
@@ -271,8 +268,22 @@ disable_trace_probe(struct trace_probe *tp, struct ftrace_event_file *file)
 			disable_kretprobe(&tp->rp);
 		else
 			disable_kprobe(&tp->rp.kp);
+		wait = 1;
 	}
  out:
+	if (wait) {
+		/*
+		 * Synchronize with kprobe_trace_func/kretprobe_trace_func
+		 * to ensure disabled (all running handlers are finished).
+		 * This is not only for kfree(), but also the caller,
+		 * trace_remove_event_call() supposes it for releasing
+		 * event_call related objects, which will be accessed in
+		 * the kprobe_trace_func/kretprobe_trace_func.
+		 */
+		synchronize_sched();
+		kfree(link);	/* Ignored if link == NULL */
+	}
+
 	return ret;
 }
 


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
                   ` (4 preceding siblings ...)
  2013-07-04  4:00 ` [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Masami Hiramatsu
@ 2013-07-12 13:09 ` Masami Hiramatsu
  2013-07-12 17:53   ` Oleg Nesterov
  2013-07-15 18:01 ` Oleg Nesterov
  6 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-12 13:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
> 
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
> 
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.
> 
> Here's a test that shows how things break:
> 
>      # cd /sys/kernel/debug/tracing
>      # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
>      # enable_probe() {
>         sleep 10
>         echo 1
>      }
>      # file=events/kprobes/sigprocmask/enable
>      # enable_probe > $file &
>      > kprobe_events
>     
>     The above will corrupt the kprobe system, as the write to the enable
>     file will happen after the kprobe was deleted.
>     
>     Trying to create the probe again fails:
>     
>      # echo 'p:sigprocmask sigprocmask' > kprobe_events
>      # cat kprobe_events
>     p:kprobes/sigprocmask sigprocmask
>      # ls events/kprobes/
>     enable  filter
> 
> After applying these patches, the "> kprobe_events" fails due to the
> event being busy.
> 
> Masami, please review these patches and give your ack.

Steve, Oleg, could you also take a look on my additional 2 patches too?

I think both this series and my patches are required for
fixing most of the problem. (what we need is a patch to ensure
finishing all running uprobe handlers at disabling it *if needed*)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-12 13:09 ` Masami Hiramatsu
@ 2013-07-12 17:53   ` Oleg Nesterov
  0 siblings, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-12 17:53 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/12, Masami Hiramatsu wrote:
>
> Steve, Oleg, could you also take a look on my additional 2 patches too?

Sorry, sorry, I was a bit busy.

Masami, I am not sure I can help, but so far I _feel_ that perhaps we
should do something else. I'll try to recheck and report on Monday.

But "Wait for disabling all running kprobe handlers" looks "obviously
fine" to me.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
                   ` (5 preceding siblings ...)
  2013-07-12 13:09 ` Masami Hiramatsu
@ 2013-07-15 18:01 ` Oleg Nesterov
  2013-07-16 16:38   ` Oleg Nesterov
  6 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-15 18:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/03, Steven Rostedt wrote:
>
> Currently there exists a race with deleting a kprobe or uprobe and
> a user opening the probe event file or using perf events.
>
> The problem stems from not being able to take the probe_lock from the
> unregister code because we may have the event_mutex at the time, and
> the event mutex may be taken with the probe_lock held.
>
> To solve this, the events get a ref count (using the flags field), where
> when an event file is opened, the ftrace_event_call ref count increments.
> Then this is checked under event_mutex and if set, the unregistering
> of the probe will fail.

So. As Masami pointed out, this is not enough. Probably we can add more
hacks, but I'd like to discuss the alternative approach.

Note also that this ref count has the unfortunate property, if someone
keeps the file opened we can't remove an event.

Not sure you will like it, not sure this can actually work, but let me
try ;)

Note:
	- the "patch" below is only for illustration, it is intentionally
	  incomplete, it only "fixes" ftrace_enable_fops and ignores
	  id/format/filter. However I they could be fixed too. The most
	  problematic is ftrace_event_format_fops, it needs to update
	  trace_format_seq_ops.

	- we still need "trace_remove_event_call() should fail if call/file
	  is in use" which everyone seems to agree with

	- and we still need the discussed changes in trace_kpobes/uprobes

What this patch does:

	- add the new "topmost" rw_semaphore, file_sem.

	- trace_remove_event_call() takes this sem for writing and
	  cleares enable/id/format/filter->i_private

	- removes tracing_open_generic_file/tracing_release_generic_file,
	  we do not use file->f_private any longer

	- changes event_enable_read/event_enable_write to read
	  ftrace_event_file = i_private under read_lock(file_sem) and
	  abort if it is null.

Question: why event_enable_write() needs trace_array_get() ?

Steven, Masami, do you think this can make any sense?

Oleg.

--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -410,35 +410,6 @@ static void put_system(struct ftrace_subsystem_dir *dir)
 }
 
 /*
- * Open and update trace_array ref count.
- * Must have the current trace_array passed to it.
- */
-static int tracing_open_generic_file(struct inode *inode, struct file *filp)
-{
-	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
-	int ret;
-
-	if (trace_array_get(tr) < 0)
-		return -ENODEV;
-
-	ret = tracing_open_generic(inode, filp);
-	if (ret < 0)
-		trace_array_put(tr);
-	return ret;
-}
-
-static int tracing_release_generic_file(struct inode *inode, struct file *filp)
-{
-	struct ftrace_event_file *file = inode->i_private;
-	struct trace_array *tr = file->tr;
-
-	trace_array_put(tr);
-
-	return 0;
-}
-
-/*
  * __ftrace_set_clr_event(NULL, NULL, NULL, set) will set/unset all events.
  */
 static int
@@ -675,19 +646,51 @@ static void t_stop(struct seq_file *m, void *p)
 	mutex_unlock(&event_mutex);
 }
 
+static DECLARE_RWSEM(file_sem);
+
+static void *get_i_private(struct file *filp)
+{
+	void *data;
+	down_read(&file_sem);
+	data = file_inode(filp)->i_private;
+	if (!data)
+		up_read(&file_sem);
+	return data;
+}
+
+static void put_i_private(struct file *filp)
+{
+	WARN_ON(!file_inode(filp)->i_private);
+	up_read(&file_sem);
+}
+
+static void clear_i_private(struct dentry *dir)
+{
+	struct dentry *child;
+	list_for_each_entry(child, &dir->d_subdirs, d_u.d_child)
+		child->d_inode->i_private = NULL;
+}
+
 static ssize_t
 event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
 		  loff_t *ppos)
 {
-	struct ftrace_event_file *file = filp->private_data;
+	struct ftrace_event_file *file;
+	unsigned long flags;
 	char buf[4] = "0";
 
-	if (file->flags & FTRACE_EVENT_FL_ENABLED &&
-	    !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+	file = get_i_private(filp);
+	if (!file)
+		return -ENODEV;
+	flags = file->flags;
+	put_i_private(filp);
+
+	if (flags & FTRACE_EVENT_FL_ENABLED &&
+	    !(flags & FTRACE_EVENT_FL_SOFT_DISABLED))
 		strcpy(buf, "1");
 
-	if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
-	    file->flags & FTRACE_EVENT_FL_SOFT_MODE)
+	if (flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
+	    flags & FTRACE_EVENT_FL_SOFT_MODE)
 		strcat(buf, "*");
 
 	strcat(buf, "\n");
@@ -699,13 +702,10 @@ static ssize_t
 event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
 		   loff_t *ppos)
 {
-	struct ftrace_event_file *file = filp->private_data;
+	struct ftrace_event_file *file;
 	unsigned long val;
 	int ret;
 
-	if (!file)
-		return -EINVAL;
-
 	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
 	if (ret)
 		return ret;
@@ -717,9 +717,18 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
 	switch (val) {
 	case 0:
 	case 1:
-		mutex_lock(&event_mutex);
-		ret = ftrace_event_enable_disable(file, val);
-		mutex_unlock(&event_mutex);
+		file = get_i_private(filp);
+		if (!file)
+			return -ENODEV;
+
+		ret = trace_array_get(file->tr);
+		if (!ret) {
+			mutex_lock(&event_mutex);
+			ret = ftrace_event_enable_disable(file, val);
+			mutex_unlock(&event_mutex);
+			trace_array_put(file->tr);
+		}
+		put_i_private(filp);
 		break;
 
 	default:
@@ -1249,10 +1258,8 @@ static const struct file_operations ftrace_set_event_fops = {
 };
 
 static const struct file_operations ftrace_enable_fops = {
-	.open = tracing_open_generic_file,
 	.read = event_enable_read,
 	.write = event_enable_write,
-	.release = tracing_release_generic_file,
 	.llseek = default_llseek,
 };
 
@@ -1545,6 +1552,7 @@ static void remove_event_from_tracers(struct ftrace_event_call *call)
 			continue;
 
 		list_del(&file->list);
+		clear_i_private(file->dir);
 		debugfs_remove_recursive(file->dir);
 		remove_subsystem(file->system);
 		kmem_cache_free(file_cachep, file);
@@ -1703,6 +1711,7 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
 /* Remove an event_call */
 void trace_remove_event_call(struct ftrace_event_call *call)
 {
+	down_write(&file_sem);
 	mutex_lock(&trace_types_lock);
 	mutex_lock(&event_mutex);
 	down_write(&trace_event_sem);
@@ -1710,6 +1719,7 @@ void trace_remove_event_call(struct ftrace_event_call *call)
 	up_write(&trace_event_sem);
 	mutex_unlock(&event_mutex);
 	mutex_unlock(&trace_types_lock);
+	up_write(&file_sem);
 }
 
 #define for_each_event(event, start, end)			\


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-09  7:55         ` [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private Masami Hiramatsu
@ 2013-07-15 18:16           ` Oleg Nesterov
  2013-07-17  2:10             ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-15 18:16 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/09, Masami Hiramatsu wrote:
>
> To avoid this, when opening events/*/*/enable, we have to ensure
> the dentry of the file is not unlinked yet, under event_mutex
> is locked.

Probably this can work, but I am starting to think that this ref
count becomes toooooo complex....

Could you please look at the draft patch I sent in another email?

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-09  9:35                       ` [RFC PATCH V2] " Masami Hiramatsu
@ 2013-07-15 18:20                         ` Oleg Nesterov
  2013-07-18 12:07                           ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-15 18:20 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/09, Masami Hiramatsu wrote:
>
> Wait for disabling all running kprobe handlers when a kprobe
> event is disabled, since the caller, trace_remove_event_call()
> supposes that a removing event is disabled completely by
> disabling the event.
> With this change, ftrace can ensure that there is no running
> event handlers after disabling it.
>
> Changes in V2:
>  - Comment (in code) for clarify why we need to wait there.
>
> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Agreed.

We need this change in any case, whatever we do in trace/trace_events.c
and in unregister_trace_probe().

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-15 18:01 ` Oleg Nesterov
@ 2013-07-16 16:38   ` Oleg Nesterov
  2013-07-16 19:10     ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-16 16:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/15, Oleg Nesterov wrote:
>
> So. As Masami pointed out, this is not enough. Probably we can add more
> hacks, but I'd like to discuss the alternative approach.
>
> Note also that this ref count has the unfortunate property, if someone
> keeps the file opened we can't remove an event.

And please correct me, but afaics there are similar problems with
rmdir instances/xxx.

> What this patch does:
>
> 	- add the new "topmost" rw_semaphore, file_sem.

probably unneeded...

> 	- trace_remove_event_call() takes this sem for writing and
> 	  cleares enable/id/format/filter->i_private
>
> 	- removes tracing_open_generic_file/tracing_release_generic_file,
> 	  we do not use file->f_private any longer
>
> 	- changes event_enable_read/event_enable_write to read
> 	  ftrace_event_file = i_private under read_lock(file_sem) and
> 	  abort if it is null.
>
> Question: why event_enable_write() needs trace_array_get() ?

probably it doesn't...

> Steven, Masami, do you think this can make any sense?

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-16 16:38   ` Oleg Nesterov
@ 2013-07-16 19:10     ` Steven Rostedt
  2013-07-16 19:22       ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-16 19:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Tue, 2013-07-16 at 18:38 +0200, Oleg Nesterov wrote:
> On 07/15, Oleg Nesterov wrote:
> >
> > So. As Masami pointed out, this is not enough. Probably we can add more
> > hacks, but I'd like to discuss the alternative approach.
> >
> > Note also that this ref count has the unfortunate property, if someone
> > keeps the file opened we can't remove an event.
> 
> And please correct me, but afaics there are similar problems with
> rmdir instances/xxx.

The instances have their own refcount. And if a file is open, it wont
remove the directory. Just try it...


# cd /sys/kernel/debug/tracing/instances
# mkdir foo
# sleep 10 < foo/events/ext3/enable &
[1] 2087
# rmdir foo
rmdir: failed to remove `foo': Device or resource busy
# 
[1]+  Done                    sleep 10 < foo/events/ext3/enable
# rmdir foo
# 

> 
> > What this patch does:
> >
> > 	- add the new "topmost" rw_semaphore, file_sem.
> 
> probably unneeded...
> 
> > 	- trace_remove_event_call() takes this sem for writing and
> > 	  cleares enable/id/format/filter->i_private
> >
> > 	- removes tracing_open_generic_file/tracing_release_generic_file,
> > 	  we do not use file->f_private any longer
> >
> > 	- changes event_enable_read/event_enable_write to read
> > 	  ftrace_event_file = i_private under read_lock(file_sem) and
> > 	  abort if it is null.
> >
> > Question: why event_enable_write() needs trace_array_get() ?
> 
> probably it doesn't...

I'm confused. Which code has event_enable_write doing a
trace_array_get()?

> 
> > Steven, Masami, do you think this can make any sense?

I need to sort these patches out. They are all over the place. What
exactly do we have for a solution here. Did we figure out which patches
are needed?

I'll have to go back and re-read the entire thread. I've been all over
the place lately with different topics, and don't remember all that has
been said here.

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-16 19:10     ` Steven Rostedt
@ 2013-07-16 19:22       ` Oleg Nesterov
  2013-07-16 19:38         ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-16 19:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/16, Steven Rostedt wrote:
>
> On Tue, 2013-07-16 at 18:38 +0200, Oleg Nesterov wrote:
> > On 07/15, Oleg Nesterov wrote:
> > >
> > > So. As Masami pointed out, this is not enough. Probably we can add more
> > > hacks, but I'd like to discuss the alternative approach.
> > >
> > > Note also that this ref count has the unfortunate property, if someone
> > > keeps the file opened we can't remove an event.
> >
> > And please correct me, but afaics there are similar problems with
> > rmdir instances/xxx.
>
> The instances have their own refcount. And if a file is open, it wont
> remove the directory. Just try it...

I guess you mean "if (tr->ref)" check.

But tracing_open_generic_file()->trace_array_get() is racy, no?
Somehow we need to ensure it is safe to use file / file->tr.

> > > Question: why event_enable_write() needs trace_array_get() ?
> >
> > probably it doesn't...
>
> I'm confused. Which code has event_enable_write doing a
> trace_array_get()?

I meant tracing_open_generic_file(), sorry. I _think_ it can die,
please see RFC I sent.

But let me repeat once again, I do not pretend I understand this
code ;)

> I need to sort these patches out. They are all over the place. What
> exactly do we have for a solution here. Did we figure out which patches
> are needed?

Well, see above. But I think:

	1. We should close open/delete races

	2. We still need probe_remove_event_call() which return -EBUSY
	   if perf_refcount || FTRACE_EVENT_FL_ENABLED

	   2/4 in your series but without TRACE_EVENT_FL_REF_MASK check

	3. We need the changes in trace_kprobes.c (and uprobes but lets
	   ignore it for now). The patch from you (3/4) and another one
	   from Masami (Wait for disabling all running kprobe handlers).

	   However, I think it would be better to discuss 3. later.

Btw, Steven, what about other pending (and orthogonal changes) ?
>From Jovi, from me. I see nothing new in linux-trace.git#for-next...

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-16 19:22       ` Oleg Nesterov
@ 2013-07-16 19:38         ` Steven Rostedt
  2013-07-17 16:03           ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-16 19:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Tue, 2013-07-16 at 21:22 +0200, Oleg Nesterov wrote:

> But tracing_open_generic_file()->trace_array_get() is racy, no?
> Somehow we need to ensure it is safe to use file / file->tr.

Ah crap. Yeah, I see what you mean. It's the race that I described with
deleting private data and using it on open. :-p


> Btw, Steven, what about other pending (and orthogonal changes) ?
> >From Jovi, from me. I see nothing new in linux-trace.git#for-next...

My INBOX is full with too much crap. I thought I grabbed the patches
from you and Jovi, but they may have been something else.

Can you just tell me what the subjects were. My INBOX currently has
>10,000 emails, and it's just getting worse.

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-15 18:16           ` Oleg Nesterov
@ 2013-07-17  2:10             ` Masami Hiramatsu
  2013-07-17 14:51               ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-17  2:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/16 3:16), Oleg Nesterov wrote:
> On 07/09, Masami Hiramatsu wrote:
>>
>> To avoid this, when opening events/*/*/enable, we have to ensure
>> the dentry of the file is not unlinked yet, under event_mutex
>> is locked.
> 
> Probably this can work, but I am starting to think that this ref
> count becomes toooooo complex....

Oleg, I agree that is a bit complex way to fix but it can fix.

However, I think we should fix bugs first, because there are bugs.
I don't like to leave a bug in the kernel as it is, while we re-
start discussing new approach...

> Could you please look at the draft patch I sent in another email?

I'm happy to see the series, and I guess it will take a long
discussion and review time again.

So, I'd like to suggest you getting these fixes into ftrace as
first-aid treatment, and discuss your new series.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-17  2:10             ` Masami Hiramatsu
@ 2013-07-17 14:51               ` Oleg Nesterov
  2013-07-18  2:20                 ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-17 14:51 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/17, Masami Hiramatsu wrote:
>
> (2013/07/16 3:16), Oleg Nesterov wrote:
> > On 07/09, Masami Hiramatsu wrote:
> >>
> >> To avoid this, when opening events/*/*/enable, we have to ensure
> >> the dentry of the file is not unlinked yet, under event_mutex
> >> is locked.
> >
> > Probably this can work, but I am starting to think that this ref
> > count becomes toooooo complex....
>
> Oleg, I agree that is a bit complex way to fix but it can fix.
>
> However, I think we should fix bugs first, because there are bugs.
> I don't like to leave a bug in the kernel as it is, while we re-
> start discussing new approach...

And I am trying to fix the same problems, just _I think_ there is
another and simpler approach.

> So, I'd like to suggest you getting these fixes into ftrace as
> first-aid treatment, and discuss your new series.

Well, perhaps you are right... But this TRACE_EVENT_FL_REF_MASK code
is new too, it is not that we only need a small fixlets to finish it.

So I think that it makes sense to discuss the alternatives before we
decide what exactly we should do.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-16 19:38         ` Steven Rostedt
@ 2013-07-17 16:03           ` Steven Rostedt
  2013-07-17 17:37             ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-17 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Tue, 2013-07-16 at 15:38 -0400, Steven Rostedt wrote:

> 
> > Btw, Steven, what about other pending (and orthogonal changes) ?
> > >From Jovi, from me. I see nothing new in linux-trace.git#for-next...
> 
> My INBOX is full with too much crap. I thought I grabbed the patches
> from you and Jovi, but they may have been something else.
> 
> Can you just tell me what the subjects were. My INBOX currently has
> >10,000 emails, and it's just getting worse.

Ping?

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe
  2013-07-17 16:03           ` Steven Rostedt
@ 2013-07-17 17:37             ` Oleg Nesterov
  0 siblings, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-17 17:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 998 bytes --]

On 07/17, Steven Rostedt wrote:
>
> On Tue, 2013-07-16 at 15:38 -0400, Steven Rostedt wrote:
>
> > > Btw, Steven, what about other pending (and orthogonal changes) ?
> > > >From Jovi, from me. I see nothing new in linux-trace.git#for-next...
> >
> > My INBOX is full with too much crap. I thought I grabbed the patches
> > from you and Jovi, but they may have been something else.
> >
> > Can you just tell me what the subjects were. My INBOX currently has
> > >10,000 emails, and it's just getting worse.

Please see the attached mbox. It contains 2 series with
perf_trace_buf/perf_xxx hacks.

1-3 looks very simple and straightforward.

4-6 reimplement TP_perf_assign(), we already discussed this (and you even
did the performance testing), but I am still not sure if you agree with
this hack or not.

And 2 patches from Jovi, acked by Masami and me:

	[PATCH 1/2 v6] tracing/uprobes: Support ftrace_event_file base multibuffer
	[PATCH 2/2 v6] tracing/uprobes: Support soft-mode disabling

Oleg.

[-- Attachment #2: RSND.m --]
[-- Type: text/plain, Size: 16628 bytes --]

>From aa40385bba7692f4b53572da24119c3b293c9aa1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 17 Jun 2013 19:02:04 +0200
Subject: [PATCH 1/6] tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty

perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
make no sense if hlist_empty(head). Change perf_ftrace_function_call()
to check event_function.perf_events beforehand.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_event_perf.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 84b1e04..12df557 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -266,6 +266,10 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 	struct pt_regs regs;
 	int rctx;
 
+	head = this_cpu_ptr(event_function.perf_events);
+	if (hlist_empty(head))
+		return;
+
 #define ENTRY_SIZE (ALIGN(sizeof(struct ftrace_entry) + sizeof(u32), \
 		    sizeof(u64)) - sizeof(u32))
 
@@ -279,8 +283,6 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 
 	entry->ip = ip;
 	entry->parent_ip = parent_ip;
-
-	head = this_cpu_ptr(event_function.perf_events);
 	perf_trace_buf_submit(entry, ENTRY_SIZE, rctx, 0,
 			      1, &regs, head, NULL);
 
-- 
1.5.5.1


>From 803ce880302ef7d21bdad2196cddc76b4c1a49a1 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 17 Jun 2013 19:02:07 +0200
Subject: [PATCH 2/6] tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty

perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
make no sense if hlist_empty(head). Change perf_syscall_enter/exit()
to check sys_data->{enter,exit}_event->perf_events beforehand.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/trace/trace_syscalls.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 322e164..ea5b672 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -566,6 +566,10 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	if (!sys_data)
 		return;
 
+	head = this_cpu_ptr(sys_data->enter_event->perf_events);
+	if (hlist_empty(head))
+		return;
+
 	/* get the size after alignment with the u32 buffer size field */
 	size = sizeof(unsigned long) * sys_data->nb_args + sizeof(*rec);
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
@@ -583,8 +587,6 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	rec->nr = syscall_nr;
 	syscall_get_arguments(current, regs, 0, sys_data->nb_args,
 			       (unsigned long *)&rec->args);
-
-	head = this_cpu_ptr(sys_data->enter_event->perf_events);
 	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head, NULL);
 }
 
@@ -642,6 +644,10 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	if (!sys_data)
 		return;
 
+	head = this_cpu_ptr(sys_data->exit_event->perf_events);
+	if (hlist_empty(head))
+		return;
+
 	/* We can probably do that at build time */
 	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
@@ -661,8 +667,6 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 
 	rec->nr = syscall_nr;
 	rec->ret = syscall_get_return_value(current, regs);
-
-	head = this_cpu_ptr(sys_data->exit_event->perf_events);
 	perf_trace_buf_submit(rec, size, rctx, 0, 1, regs, head, NULL);
 }
 
-- 
1.5.5.1


>From 6037b2b9af2e6122380e967406f89ae9410fc5d6 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Mon, 17 Jun 2013 19:02:11 +0200
Subject: [PATCH 3/6] tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()

Every perf_trace_buf_prepare() caller does
WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
almost the same.

Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
the meaning of _ONCE, but I think this is fine.

	- 4947014 2932448 10104832  17984294  1126b26 vmlinux
	+ 4948422 2932448 10104832  17985702  11270a6 vmlinux

on my build.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/ftrace.h          |    4 ----
 kernel/trace/trace_event_perf.c |    4 ++++
 kernel/trace/trace_kprobe.c     |    6 ------
 kernel/trace/trace_syscalls.c   |   12 ------------
 kernel/trace/trace_uprobe.c     |    2 --
 5 files changed, 4 insertions(+), 24 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 19edd7f..c162a57 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -666,10 +666,6 @@ perf_trace_##call(void *__data, proto)					\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	if (WARN_ONCE(__entry_size > PERF_MAX_TRACE_SIZE,		\
-		      "profile buffer not large enough"))		\
-		return;							\
-									\
 	entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare(	\
 		__entry_size, event_call->event.type, &__regs, &rctx);	\
 	if (!entry)							\
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 12df557..80c36bc 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -236,6 +236,10 @@ __kprobes void *perf_trace_buf_prepare(int size, unsigned short type,
 
 	BUILD_BUG_ON(PERF_MAX_TRACE_SIZE % sizeof(unsigned long));
 
+	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
+			"perf buffer not large enough"))
+		return NULL;
+
 	pc = preempt_count();
 
 	*rctxp = perf_swevent_get_recursion_context();
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 7ed6976..ae6ce83 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1087,9 +1087,6 @@ kprobe_perf_func(struct trace_probe *tp, struct pt_regs *regs)
 	__size = sizeof(*entry) + tp->size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
-	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
-		     "profile buffer not large enough"))
-		return;
 
 	entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
 	if (!entry)
@@ -1120,9 +1117,6 @@ kretprobe_perf_func(struct trace_probe *tp, struct kretprobe_instance *ri,
 	__size = sizeof(*entry) + tp->size + dsize;
 	size = ALIGN(__size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
-	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
-		     "profile buffer not large enough"))
-		return;
 
 	entry = perf_trace_buf_prepare(size, call->event.type, regs, &rctx);
 	if (!entry)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index ea5b672..b7e3447 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -575,10 +575,6 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	size = ALIGN(size + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
-		      "perf buffer not large enough"))
-		return;
-
 	rec = (struct syscall_trace_enter *)perf_trace_buf_prepare(size,
 				sys_data->enter_event->event.type, regs, &rctx);
 	if (!rec)
@@ -652,14 +648,6 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	size = ALIGN(sizeof(*rec) + sizeof(u32), sizeof(u64));
 	size -= sizeof(u32);
 
-	/*
-	 * Impossible, but be paranoid with the future
-	 * How to put this check outside runtime?
-	 */
-	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE,
-		"exit event has grown above perf buffer size"))
-		return;
-
 	rec = (struct syscall_trace_exit *)perf_trace_buf_prepare(size,
 				sys_data->exit_event->event.type, regs, &rctx);
 	if (!rec)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index d5d0cd3..a23d2d7 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -818,8 +818,6 @@ static void uprobe_perf_print(struct trace_uprobe *tu,
 
 	size = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
 	size = ALIGN(size + tu->size + sizeof(u32), sizeof(u64)) - sizeof(u32);
-	if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
-		return;
 
 	preempt_disable();
 	head = this_cpu_ptr(call->perf_events);
-- 
1.5.5.1


>From f883693f6e02a9c5c1e32b47490671ffa554a6fe Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 18 Jun 2013 21:22:11 +0200
Subject: [PATCH 4/6] tracing/perf: expand TRACE_EVENT(sched_stat_runtime)

To simplify the review of the next patch:

1. We are going to reimplent __perf_task/counter and embedd them
   into TP_ARGS(). expand TRACE_EVENT(sched_stat_runtime) into
   DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
   different TP_ARGS's.

2. Change perf_trace_##call() macri to do perf_fetch_caller_regs()
   right before perf_trace_buf_prepare(). This way it evaluates
   "args" asap.

   Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
   "struct pt_regs *regs", perhaps it makes sense to remove this
   argument. And perhaps we can teach perf_trace_buf_submit()
   to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
   in this case.

3. Cosmetic, but the typecast from "void*" buys nothing. It just
   adds the noise, remove it.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/events/sched.h |    6 +++++-
 include/trace/ftrace.h       |    7 +++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index e5586ca..249c024 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -372,7 +372,7 @@ DEFINE_EVENT(sched_stat_template, sched_stat_blocked,
  * Tracepoint for accounting runtime (time the task is executing
  * on a CPU).
  */
-TRACE_EVENT(sched_stat_runtime,
+DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
@@ -401,6 +401,10 @@ TRACE_EVENT(sched_stat_runtime,
 			(unsigned long long)__entry->vruntime)
 );
 
+DEFINE_EVENT(sched_stat_runtime, sched_stat_runtime,
+	     TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
+	     TP_ARGS(tsk, runtime, vruntime));
+
 /*
  * Tracepoint for showing priority inheritance modifying a tasks
  * priority.
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index c162a57..aed594a 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -659,15 +659,14 @@ perf_trace_##call(void *__data, proto)					\
 	int __data_size;						\
 	int rctx;							\
 									\
-	perf_fetch_caller_regs(&__regs);				\
-									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
 									\
-	entry = (struct ftrace_raw_##call *)perf_trace_buf_prepare(	\
-		__entry_size, event_call->event.type, &__regs, &rctx);	\
+	perf_fetch_caller_regs(&__regs);				\
+	entry = perf_trace_buf_prepare(__entry_size,			\
+			event_call->event.type, &__regs, &rctx);	\
 	if (!entry)							\
 		return;							\
 									\
-- 
1.5.5.1


>From 74318b1e89e7d6e6de9bb4992ab41e761c1175f6 Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 18 Jun 2013 21:22:14 +0200
Subject: [PATCH 5/6] tracing/perf: reimplement TP_perf_assign() logic

TP_perf_assign/__perf_xxx is used to change the default values
of __addr/__count/__task variables for perf_trace_buf_submit().

Unfortunately, TP_perf_assign() is called "too late", we want to
have a fast-path "__task != NULL" check in perf_trace_##call() at
the start. So this patch simply embeds __perf_xxx() into TP_ARGS(),
this way DECLARE_EVENT_CLASS() can use the result of assignments
hidden in "args" right after ftrace_get_offsets_##call() which is
mostly trivial.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/events/sched.h |   16 +++-------------
 include/trace/ftrace.h       |   19 +++++++++++--------
 2 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 249c024..2e7d994 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -57,7 +57,7 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 
 	TP_PROTO(struct task_struct *p, int success),
 
-	TP_ARGS(p, success),
+	TP_ARGS(__perf_task(p), success),
 
 	TP_STRUCT__entry(
 		__array(	char,	comm,	TASK_COMM_LEN	)
@@ -73,9 +73,6 @@ DECLARE_EVENT_CLASS(sched_wakeup_template,
 		__entry->prio		= p->prio;
 		__entry->success	= success;
 		__entry->target_cpu	= task_cpu(p);
-	)
-	TP_perf_assign(
-		__perf_task(p);
 	),
 
 	TP_printk("comm=%s pid=%d prio=%d success=%d target_cpu=%03d",
@@ -313,7 +310,7 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 
 	TP_PROTO(struct task_struct *tsk, u64 delay),
 
-	TP_ARGS(tsk, delay),
+	TP_ARGS(__perf_task(tsk), __perf_count(delay)),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -325,10 +322,6 @@ DECLARE_EVENT_CLASS(sched_stat_template,
 		memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
 		__entry->pid	= tsk->pid;
 		__entry->delay	= delay;
-	)
-	TP_perf_assign(
-		__perf_count(delay);
-		__perf_task(tsk);
 	),
 
 	TP_printk("comm=%s pid=%d delay=%Lu [ns]",
@@ -376,7 +369,7 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 
 	TP_PROTO(struct task_struct *tsk, u64 runtime, u64 vruntime),
 
-	TP_ARGS(tsk, runtime, vruntime),
+	TP_ARGS(tsk, __perf_count(runtime), vruntime),
 
 	TP_STRUCT__entry(
 		__array( char,	comm,	TASK_COMM_LEN	)
@@ -390,9 +383,6 @@ DECLARE_EVENT_CLASS(sched_stat_runtime,
 		__entry->pid		= tsk->pid;
 		__entry->runtime	= runtime;
 		__entry->vruntime	= vruntime;
-	)
-	TP_perf_assign(
-		__perf_count(runtime);
 	),
 
 	TP_printk("comm=%s pid=%d runtime=%Lu [ns] vruntime=%Lu [ns]",
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index aed594a..8886877 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -503,8 +503,14 @@ static inline notrace int ftrace_get_offsets_##call(			\
 #undef TP_fast_assign
 #define TP_fast_assign(args...) args
 
-#undef TP_perf_assign
-#define TP_perf_assign(args...)
+#undef __perf_addr
+#define __perf_addr(a)	(a)
+
+#undef __perf_count
+#define __perf_count(c)	(c)
+
+#undef __perf_task
+#define __perf_task(t)	(t)
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
@@ -632,16 +638,13 @@ __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call
 #define __get_str(field) (char *)__get_dynamic_array(field)
 
 #undef __perf_addr
-#define __perf_addr(a) __addr = (a)
+#define __perf_addr(a)	(__addr = (a))
 
 #undef __perf_count
-#define __perf_count(c) __count = (c)
+#define __perf_count(c)	(__count = (c))
 
 #undef __perf_task
-#define __perf_task(t) __task = (t)
-
-#undef TP_perf_assign
-#define TP_perf_assign(args...) args
+#define __perf_task(t)	(__task = (t))
 
 #undef DECLARE_EVENT_CLASS
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
-- 
1.5.5.1


>From f5e6c3976bd68a72d7259e3282c5fda20fcefa4c Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 18 Jun 2013 21:22:18 +0200
Subject: [PATCH 6/6] tracing/perf: Avoid perf_trace_buf_*() in perf_trace_##call() when possible

perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
make no sense if hlist_empty(head). Change perf_trace_##call()
to check ->perf_events beforehand and do nothing if it is empty.

However, we can only do this if __task == NULL, so we also add
the __builtin_constant_p(__task) check.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/ftrace.h |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 8886877..04455b8 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -663,6 +663,12 @@ perf_trace_##call(void *__data, proto)					\
 	int rctx;							\
 									\
 	__data_size = ftrace_get_offsets_##call(&__data_offsets, args); \
+									\
+	head = this_cpu_ptr(event_call->perf_events);			\
+	if (__builtin_constant_p(!__task) && !__task &&			\
+				hlist_empty(head))			\
+		return;							\
+									\
 	__entry_size = ALIGN(__data_size + sizeof(*entry) + sizeof(u32),\
 			     sizeof(u64));				\
 	__entry_size -= sizeof(u32);					\
@@ -677,7 +683,6 @@ perf_trace_##call(void *__data, proto)					\
 									\
 	{ assign; }							\
 									\
-	head = this_cpu_ptr(event_call->perf_events);			\
 	perf_trace_buf_submit(entry, __entry_size, rctx, __addr,	\
 		__count, &__regs, head, __task);			\
 }
-- 
1.5.5.1


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-17 14:51               ` Oleg Nesterov
@ 2013-07-18  2:20                 ` Masami Hiramatsu
  2013-07-18 14:51                   ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-18  2:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/17 23:51), Oleg Nesterov wrote:
> On 07/17, Masami Hiramatsu wrote:
>>
>> (2013/07/16 3:16), Oleg Nesterov wrote:
>>> On 07/09, Masami Hiramatsu wrote:
>>>>
>>>> To avoid this, when opening events/*/*/enable, we have to ensure
>>>> the dentry of the file is not unlinked yet, under event_mutex
>>>> is locked.
>>>
>>> Probably this can work, but I am starting to think that this ref
>>> count becomes toooooo complex....
>>
>> Oleg, I agree that is a bit complex way to fix but it can fix.
>>
>> However, I think we should fix bugs first, because there are bugs.
>> I don't like to leave a bug in the kernel as it is, while we re-
>> start discussing new approach...
> 
> And I am trying to fix the same problems, just _I think_ there is
> another and simpler approach.

I see.
I also just feel uncomfortable with leaving obvious bugs, like as
having ants in my pants :)

>> So, I'd like to suggest you getting these fixes into ftrace as
>> first-aid treatment, and discuss your new series.
> 
> Well, perhaps you are right... But this TRACE_EVENT_FL_REF_MASK code
> is new too, it is not that we only need a small fixlets to finish it.

Would you mean that TRACE_EVENT_FL_REF_MASK may also have some problems?

> So I think that it makes sense to discuss the alternatives before we
> decide what exactly we should do.

Your approach is also interesting for me, indeed. However, it is so
different from current one. I think you should clarify what bug you
would like to solve and how. And we should clarify what ants^H^H^H^H
bugs we have found in ftrace too. ;)

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-15 18:20                         ` Oleg Nesterov
@ 2013-07-18 12:07                           ` Masami Hiramatsu
  2013-07-18 14:35                             ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-18 12:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Peter Zijlstra, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/16 3:20), Oleg Nesterov wrote:
> On 07/09, Masami Hiramatsu wrote:
>>
>> Wait for disabling all running kprobe handlers when a kprobe
>> event is disabled, since the caller, trace_remove_event_call()
>> supposes that a removing event is disabled completely by
>> disabling the event.
>> With this change, ftrace can ensure that there is no running
>> event handlers after disabling it.
>>
>> Changes in V2:
>>  - Comment (in code) for clarify why we need to wait there.
>>
>> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> 
> Agreed.
> 
> We need this change in any case, whatever we do in trace/trace_events.c
> and in unregister_trace_probe().

Steven, I think this patch is still needed anyway even if we are
discussing file handling bugs, because this fixes another timing bug.

Thank you,

-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing/kprobe: Wait for disabling all running kprobe handlers
  2013-07-18 12:07                           ` Masami Hiramatsu
@ 2013-07-18 14:35                             ` Steven Rostedt
  0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-18 14:35 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Oleg Nesterov, Peter Zijlstra, Frederic Weisbecker, linux-kernel,
	Ingo Molnar, Arnaldo Carvalho de Melo, Andrew Morton,
	jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On Thu, 2013-07-18 at 21:07 +0900, Masami Hiramatsu wrote:
> (2013/07/16 3:20), Oleg Nesterov wrote:
> > On 07/09, Masami Hiramatsu wrote:
> >>
> >> Wait for disabling all running kprobe handlers when a kprobe
> >> event is disabled, since the caller, trace_remove_event_call()
> >> supposes that a removing event is disabled completely by
> >> disabling the event.
> >> With this change, ftrace can ensure that there is no running
> >> event handlers after disabling it.
> >>
> >> Changes in V2:
> >>  - Comment (in code) for clarify why we need to wait there.
> >>
> >> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> > 
> > Agreed.
> > 
> > We need this change in any case, whatever we do in trace/trace_events.c
> > and in unregister_trace_probe().
> 
> Steven, I think this patch is still needed anyway even if we are
> discussing file handling bugs, because this fixes another timing bug.

Sure, I'll pull that into my queue now.

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-18  2:20                 ` Masami Hiramatsu
@ 2013-07-18 14:51                   ` Oleg Nesterov
  2013-07-19  5:21                     ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-18 14:51 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/18, Masami Hiramatsu wrote:
>
> (2013/07/17 23:51), Oleg Nesterov wrote:
> > Well, perhaps you are right... But this TRACE_EVENT_FL_REF_MASK code
> > is new too, it is not that we only need a small fixlets to finish it.
>
> Would you mean that TRACE_EVENT_FL_REF_MASK may also have some problems?

It was you who initially pointed that it does have problems ;)

And, _afaics_ your patch which tries to fix this problem is not
exactly correct.

It removes trace_array_get/put from tracing_open_generic_file() and
tracing_release_generic_file(). This assumes that "call->flags++" is
enough, but it is not.

Yes, the next patch adds the "flags & TRACE_EVENT_FL_REF_MASK" check
into trace_remove_event_call() path. But this is still racy wrt
instance_delete() unless I missed something.

IOW, I believe that either .open() should do trace_array_get(), or
__trace_remove_event_dirs() needs another for-each-file loop which
checks file->call->flags & TRACE_EVENT_FL_REF_MASK.


> > So I think that it makes sense to discuss the alternatives before we
> > decide what exactly we should do.
>
> Your approach is also interesting for me, indeed. However, it is so
> different from current one. I think you should clarify what bug you
> would like to solve and how.

The same bugs which Steven's 1/4 tries to solve ;)

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-18 14:51                   ` Oleg Nesterov
@ 2013-07-19  5:21                     ` Masami Hiramatsu
  2013-07-19 13:33                       ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-19  5:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/18 23:51), Oleg Nesterov wrote:
> On 07/18, Masami Hiramatsu wrote:
>>
>> (2013/07/17 23:51), Oleg Nesterov wrote:
>>> Well, perhaps you are right... But this TRACE_EVENT_FL_REF_MASK code
>>> is new too, it is not that we only need a small fixlets to finish it.
>>
>> Would you mean that TRACE_EVENT_FL_REF_MASK may also have some problems?
> 
> It was you who initially pointed that it does have problems ;)
> 
> And, _afaics_ your patch which tries to fix this problem is not
> exactly correct.

Hm,

> It removes trace_array_get/put from tracing_open_generic_file() and
> tracing_release_generic_file(). This assumes that "call->flags++" is
> enough, but it is not.

No, it replaces trace_array_get/put with ftrace_event_file_get/put
which calls trace_array_get/put inside.
(Just one point, previous ftrace_event_file_get has a racy point
when it does tr->ref++, it should be fixed.)

> Yes, the next patch adds the "flags & TRACE_EVENT_FL_REF_MASK" check
> into trace_remove_event_call() path. But this is still racy wrt
> instance_delete() unless I missed something.
> 
> IOW, I believe that either .open() should do trace_array_get(), or
> __trace_remove_event_dirs() needs another for-each-file loop which
> checks file->call->flags & TRACE_EVENT_FL_REF_MASK.

Agreed :)

>>> So I think that it makes sense to discuss the alternatives before we
>>> decide what exactly we should do.
>>
>> Your approach is also interesting for me, indeed. However, it is so
>> different from current one. I think you should clarify what bug you
>> would like to solve and how.
> 
> The same bugs which Steven's 1/4 tries to solve ;)

OK, let me confirm that, would you mean we still need 2/4 - 4/4?

Thank you,
-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-19  5:21                     ` Masami Hiramatsu
@ 2013-07-19 13:33                       ` Oleg Nesterov
  2013-07-22  9:57                         ` Masami Hiramatsu
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-19 13:33 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/19, Masami Hiramatsu wrote:
>
> (2013/07/18 23:51), Oleg Nesterov wrote:
> > It removes trace_array_get/put from tracing_open_generic_file() and
> > tracing_release_generic_file(). This assumes that "call->flags++" is
> > enough, but it is not.
>
> No, it replaces trace_array_get/put with ftrace_event_file_get/put
> which calls trace_array_get/put inside.

Ah, I didn't notice your patch adds "file->tr->ref++" into
ftrace_event_file_get...

So I was wrong in any case, thanks for correcting me.

But,

> (Just one point, previous ftrace_event_file_get has a racy point
> when it does tr->ref++, it should be fixed.)

Not sure what do you mean, but unless I missed something again this
"tr->ref++" above still looks racy. instance_delete() checks tr->ref
first, then it takes event_mutex and removes/kfrees event_files.

But this doesn't really matter even if I am right, surely this can
be fixed. My only point, imho this is more complex than necessary.

In particular,

> > IOW, I believe that either .open() should do trace_array_get(), or
> > __trace_remove_event_dirs() needs another for-each-file loop which
> > checks file->call->flags & TRACE_EVENT_FL_REF_MASK.
>
> Agreed :)

Yes ;) and this makes the ref-counting even more complex, we use
different methods to avoid the races with rmdir and event_remove().

> > The same bugs which Steven's 1/4 tries to solve ;)
>
> OK, let me confirm that, would you mean we still need 2/4 - 4/4?

Yes, yes.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-19 13:33                       ` Oleg Nesterov
@ 2013-07-22  9:57                         ` Masami Hiramatsu
  2013-07-22 17:04                           ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-22  9:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

(2013/07/19 22:33), Oleg Nesterov wrote:
> On 07/19, Masami Hiramatsu wrote:
>>
>> (2013/07/18 23:51), Oleg Nesterov wrote:
>>> It removes trace_array_get/put from tracing_open_generic_file() and
>>> tracing_release_generic_file(). This assumes that "call->flags++" is
>>> enough, but it is not.
>>
>> No, it replaces trace_array_get/put with ftrace_event_file_get/put
>> which calls trace_array_get/put inside.
> 
> Ah, I didn't notice your patch adds "file->tr->ref++" into
> ftrace_event_file_get...
> 
> So I was wrong in any case, thanks for correcting me.
> 
> But,
> 
>> (Just one point, previous ftrace_event_file_get has a racy point
>> when it does tr->ref++, it should be fixed.)
> 
> Not sure what do you mean, but unless I missed something again this
> "tr->ref++" above still looks racy. instance_delete() checks tr->ref
> first, then it takes event_mutex and removes/kfrees event_files.
> 
> But this doesn't really matter even if I am right, surely this can
> be fixed. My only point, imho this is more complex than necessary.

I see, so I'd like to see the fix. However, I'm not sure
we have enough time to fix that cleanly. Note that except
for the timing bug, we still leave a kernel bug which can
easily be reproduced as Jovi reported.

> In particular,
> 
>>> IOW, I believe that either .open() should do trace_array_get(), or
>>> __trace_remove_event_dirs() needs another for-each-file loop which
>>> checks file->call->flags & TRACE_EVENT_FL_REF_MASK.
>>
>> Agreed :)
> 
> Yes ;) and this makes the ref-counting even more complex, we use
> different methods to avoid the races with rmdir and event_remove().
> 
>>> The same bugs which Steven's 1/4 tries to solve ;)
>>
>> OK, let me confirm that, would you mean we still need 2/4 - 4/4?
> 
> Yes, yes.

And those are depends on 1/4...

Thank you,


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-22  9:57                         ` Masami Hiramatsu
@ 2013-07-22 17:04                           ` Oleg Nesterov
  2013-07-23 21:04                             ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-22 17:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/22, Masami Hiramatsu wrote:
>
> (2013/07/19 22:33), Oleg Nesterov wrote:
> > My only point, imho this is more complex than necessary.
>
> I see, so I'd like to see the fix. However, I'm not sure
> we have enough time to fix that cleanly.

I promise, tomorrow I'll re-send the RFC patches, so if you don't
like them we can switch back to refcounting.

Sorry for delay. Today I was busy with other bugs I "found" in
subsystem_open/etc code, but when I tried to fix them I realized
that I have misread this code.

> Note that except
> for the timing bug, we still leave a kernel bug which can
> easily be reproduced as Jovi reported.

Could you please remind ?

> >> OK, let me confirm that, would you mean we still need 2/4 - 4/4?
> >
> > Yes, yes.
>
> And those are depends on 1/4...

Not at all or I missed something (quite possible). Just 2/4 should
not check ->flags, of course. 3/4 looks "obviously fine", 4/4 was
already merged.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private
  2013-07-22 17:04                           ` Oleg Nesterov
@ 2013-07-23 21:04                             ` Oleg Nesterov
  0 siblings, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-23 21:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Peter Zijlstra, Frederic Weisbecker,
	linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo,
	Andrew Morton, jovi.zhangwei, Jiri Olsa, Srikar Dronamraju

On 07/22, Oleg Nesterov wrote:
>
> On 07/22, Masami Hiramatsu wrote:
> >
> > (2013/07/19 22:33), Oleg Nesterov wrote:
>
> > >> OK, let me confirm that, would you mean we still need 2/4 - 4/4?
> > >
> > > Yes, yes.
> >
> > And those are depends on 1/4...
>
> Not at all or I missed something (quite possible). Just 2/4 should
> not check ->flags, of course. 3/4 looks "obviously fine", 4/4 was
> already merged.

Sorry for confusion, I meant that your a232e270dcb is already merged.

4/4 is still needed. But again, _afaics_ 2-4 can go on top of the
"tracing: open/delete fixes" series I sent, only 2/4 needs the small
update.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
  2013-07-04 12:45   ` Masami Hiramatsu
@ 2013-07-30  8:15   ` Masami Hiramatsu
  2013-07-31 19:49   ` Steven Rostedt
  2 siblings, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-07-30  8:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Oleg Nesterov, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/07/04 12:33), Steven Rostedt wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> When one of the event files is opened, we need to prevent them from
> being removed. Modules do with with the module owner set (automated
> from the VFS layer).  The ftrace buffer instances have a ref count added
> to the trace_array when the enabled file is opened (the others are not
> that big of a deal, as they only reference the event calls which
> still exist when an instance disappears). But kprobes and uprobes
> do not have any protection.
> 
>  # cd /sys/kernel/debug/tracing
>  # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
>  # enable_probe() {
>     sleep 10
>     echo 1
>  }
>  # file=events/kprobes/sigprocmask/enable
>  # enable_probe > $file &
>  > kprobe_events
> 
> The above will corrupt the kprobe system, as the write to the enable
> file will happen after the kprobe was deleted.
> 
> Trying to create the probe again fails:
> 
>  # echo 'p:sigprocmask sigprocmask' > kprobe_events
>  # cat kprobe_events
> p:kprobes/sigprocmask sigprocmask
>  # ls events/kprobes/
> enable  filter
> 
> Have the unregister probe fail when the event files are open, in use
> are used by perf.
> 

Now, since the commit a232e270d makes sure that disable_trace_probe()
waits until all running handlers are done, this patch has no problem.

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Thank you!


> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_kprobe.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 7ed6976..ffcaf42 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
>  }
>  
>  static int register_probe_event(struct trace_probe *tp);
> -static void unregister_probe_event(struct trace_probe *tp);
> +static int unregister_probe_event(struct trace_probe *tp);
>  
>  static DEFINE_MUTEX(probe_lock);
>  static LIST_HEAD(probe_list);
> @@ -340,9 +340,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
>  	if (trace_probe_is_enabled(tp))
>  		return -EBUSY;
>  
> +	/* Will fail if probe is being used by ftrace or perf */
> +	if (unregister_probe_event(tp))
> +		return -EBUSY;
> +
>  	__unregister_trace_probe(tp);
>  	list_del(&tp->list);
> -	unregister_probe_event(tp);
>  
>  	return 0;
>  }
> @@ -621,7 +624,9 @@ static int release_all_trace_probes(void)
>  	/* TODO: Use batch unregistration */
>  	while (!list_empty(&probe_list)) {
>  		tp = list_entry(probe_list.next, struct trace_probe, list);
> -		unregister_trace_probe(tp);
> +		ret = unregister_trace_probe(tp);
> +		if (ret)
> +			goto end;
>  		free_trace_probe(tp);
>  	}
>  
> @@ -1242,11 +1247,15 @@ static int register_probe_event(struct trace_probe *tp)
>  	return ret;
>  }
>  
> -static void unregister_probe_event(struct trace_probe *tp)
> +static int unregister_probe_event(struct trace_probe *tp)
>  {
> +	int ret;
> +
>  	/* tp->event is unregistered in trace_remove_event_call() */
> -	trace_remove_event_call(&tp->call);
> -	kfree(tp->call.print_fmt);
> +	ret = trace_remove_event_call(&tp->call);
> +	if (!ret)
> +		kfree(tp->call.print_fmt);
> +	return ret;
>  }
>  
>  /* Make a debugfs interface for controlling probe points */
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
  2013-07-04 12:45   ` Masami Hiramatsu
  2013-07-30  8:15   ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Masami Hiramatsu
@ 2013-07-31 19:49   ` Steven Rostedt
  2013-07-31 20:40     ` Oleg Nesterov
  2 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-07-31 19:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Wed, 2013-07-03 at 23:33 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0003-tracing-kprobes-Fail-to-unregister-if-probe-event-fi.patch)
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> When one of the event files is opened, we need to prevent them from
> being removed. Modules do with with the module owner set (automated
> from the VFS layer).  The ftrace buffer instances have a ref count added
> to the trace_array when the enabled file is opened (the others are not
> that big of a deal, as they only reference the event calls which
> still exist when an instance disappears). But kprobes and uprobes
> do not have any protection.
> 
>  # cd /sys/kernel/debug/tracing
>  # echo 'p:sigprocmask sigprocmask' > kprobe_events || exit -1
>  # enable_probe() {
>     sleep 10
>     echo 1
>  }
>  # file=events/kprobes/sigprocmask/enable
>  # enable_probe > $file &
>  > kprobe_events
> 
> The above will corrupt the kprobe system, as the write to the enable
> file will happen after the kprobe was deleted.
> 
> Trying to create the probe again fails:
> 
>  # echo 'p:sigprocmask sigprocmask' > kprobe_events
>  # cat kprobe_events
> p:kprobes/sigprocmask sigprocmask
>  # ls events/kprobes/
> enable  filter

Oleg,

The above no longer triggers the bug due to your changes. The race is
much tighter now and requires a process with the enable file opened and
races with a write to enable it where the removal of the trace file
checks the trace disabled, sees that it is, continues, but then the
write enables it just as it gets deleted.

Is this correct?

Do you know of a way to trigger this bug? Probably requires writing
something in C and hammer at writing '0's and '1's into the enable file,
and then remove it. Check if it succeeds and try again.

Hmm, what happens without this patch now? If it is active, and we delete
it? It will call back into the kprobes and access a tracepoint that does
not exist? Would this cause a crash?

-- Steve



> 
> Have the unregister probe fail when the event files are open, in use
> are used by perf.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_kprobe.c |   21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 7ed6976..ffcaf42 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
>  }
>  
>  static int register_probe_event(struct trace_probe *tp);
> -static void unregister_probe_event(struct trace_probe *tp);
> +static int unregister_probe_event(struct trace_probe *tp);
>  
>  static DEFINE_MUTEX(probe_lock);
>  static LIST_HEAD(probe_list);
> @@ -340,9 +340,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
>  	if (trace_probe_is_enabled(tp))
>  		return -EBUSY;
>  
> +	/* Will fail if probe is being used by ftrace or perf */
> +	if (unregister_probe_event(tp))
> +		return -EBUSY;
> +
>  	__unregister_trace_probe(tp);
>  	list_del(&tp->list);
> -	unregister_probe_event(tp);
>  
>  	return 0;
>  }
> @@ -621,7 +624,9 @@ static int release_all_trace_probes(void)
>  	/* TODO: Use batch unregistration */
>  	while (!list_empty(&probe_list)) {
>  		tp = list_entry(probe_list.next, struct trace_probe, list);
> -		unregister_trace_probe(tp);
> +		ret = unregister_trace_probe(tp);
> +		if (ret)
> +			goto end;
>  		free_trace_probe(tp);
>  	}
>  
> @@ -1242,11 +1247,15 @@ static int register_probe_event(struct trace_probe *tp)
>  	return ret;
>  }
>  
> -static void unregister_probe_event(struct trace_probe *tp)
> +static int unregister_probe_event(struct trace_probe *tp)
>  {
> +	int ret;
> +
>  	/* tp->event is unregistered in trace_remove_event_call() */
> -	trace_remove_event_call(&tp->call);
> -	kfree(tp->call.print_fmt);
> +	ret = trace_remove_event_call(&tp->call);
> +	if (!ret)
> +		kfree(tp->call.print_fmt);
> +	return ret;
>  }
>  
>  /* Make a debugfs interface for controlling probe points */



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-31 19:49   ` Steven Rostedt
@ 2013-07-31 20:40     ` Oleg Nesterov
  2013-07-31 22:42       ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-07-31 20:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/31, Steven Rostedt wrote:
>
> On Wed, 2013-07-03 at 23:33 -0400, Steven Rostedt wrote:
> > The above will corrupt the kprobe system, as the write to the enable
> > file will happen after the kprobe was deleted.
>
> Oleg,
>
> The above no longer triggers the bug due to your changes. The race is
> much tighter now

Yes, the changelog should be updated...

> and requires a process with the enable file opened and
> races with a write to enable it where the removal of the trace file
> checks the trace disabled, sees that it is, continues, but then the
> write enables it just as it gets deleted.

This should be fine. Either event_remove() path takes event_mutex
first and then ->write() fails, or ftrace_event_enable_disable()
actually disables this even successfully.

> Do you know of a way to trigger this bug?

I'll try to think more tomorrow, but most probably no. The race is
unlikely.

We need perf_trace_event_init() or ":enable_event:this-event" right
before trace_remove_event_call() takes the mutex.

And right after the caller (kprobes) checks "disabled".

> Hmm, what happens without this patch now? If it is active, and we delete
> it? It will call back into the kprobes and access a tracepoint that does
> not exist? Would this cause a crash?

I think yes, the crash is possible.

perf or FL_SOFT_MODE, this call/file has the external references, and
we are going to free it.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-31 20:40     ` Oleg Nesterov
@ 2013-07-31 22:42       ` Steven Rostedt
  2013-08-01  2:07         ` Steven Rostedt
  2013-08-01 13:10         ` Oleg Nesterov
  0 siblings, 2 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-07-31 22:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Wed, 2013-07-31 at 22:40 +0200, Oleg Nesterov wrote:
> On 07/31, Steven Rostedt wrote:
> >
> > On Wed, 2013-07-03 at 23:33 -0400, Steven Rostedt wrote:
> > > The above will corrupt the kprobe system, as the write to the enable
> > > file will happen after the kprobe was deleted.
> >
> > Oleg,
> >
> > The above no longer triggers the bug due to your changes. The race is
> > much tighter now
> 
> Yes, the changelog should be updated...
> 
> > and requires a process with the enable file opened and
> > races with a write to enable it where the removal of the trace file
> > checks the trace disabled, sees that it is, continues, but then the
> > write enables it just as it gets deleted.
> 
> This should be fine. Either event_remove() path takes event_mutex
> first and then ->write() fails, or ftrace_event_enable_disable()
> actually disables this even successfully.

Actually I meant while in unregister_trace_probe(), it gets by the
trace_probe_is_enabled() part first, then the write succeeds (as the
event_mutex isn't taken till unregister_probe_event()). The the
unregister_probe_event fails, but the tp was freed. The event files
still reference the tp and this is where a crash can happen without this
patch set.

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-31 22:42       ` Steven Rostedt
@ 2013-08-01  2:07         ` Steven Rostedt
  2013-08-01  2:50           ` Steven Rostedt
  2013-08-01 13:10         ` Oleg Nesterov
  1 sibling, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01  2:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 5860 bytes --]

On Wed, 2013-07-31 at 18:42 -0400, Steven Rostedt wrote:
>  
> > This should be fine. Either event_remove() path takes event_mutex
> > first and then ->write() fails, or ftrace_event_enable_disable()
> > actually disables this even successfully.
> 
> Actually I meant while in unregister_trace_probe(), it gets by the
> trace_probe_is_enabled() part first, then the write succeeds (as the
> event_mutex isn't taken till unregister_probe_event()). The the
> unregister_probe_event fails, but the tp was freed. The event files
> still reference the tp and this is where a crash can happen without this
> patch set.

And it's not just theoretical. I worked on a program to try to trigger
this bug, and succeeded :-)  (I've never been so happy being able to
crash my kernel)

[  128.999772] BUG: unable to handle kernel paging request at 00000005000000f9
[  129.000015] IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
[  129.000015] PGD 7808a067 PUD 0 
[  129.000015] Oops: 0000 [#1] PREEMPT SMP 
[  129.000015] Dumping ftrace buffer:
   <skip>
[  129.000015] ---------------------------------
[  129.000015] Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt kvm_intel snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm kvm snd_page_alloc snd_timer shpchp snd microcode i2c_i801 soundcore pata_acpi firewire_ohci firewire_core crc_itu_t ata_generic i915 drm_kms_helper drm i2c_algo_bit i2c_core video
[  129.000015] CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
[  129.000015] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[  129.000015] task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
[  129.000015] RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
[  129.000015] RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
[  129.000015] RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
[  129.000015] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
[  129.000015] RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
[  129.000015] R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
[  129.000015] R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
[  129.000015] FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
[  129.000015] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  129.000015] CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
[  129.000015] Stack:
[  129.000015]  ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
[  129.000015]  ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
[  129.000015]  ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
[  129.000015] Call Trace:
[  129.000015]  [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
[  129.000015]  [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
[  129.000015]  [<ffffffff81130f78>] do_dentry_open+0x162/0x226
[  129.000015]  [<ffffffff81131186>] finish_open+0x46/0x54
[  129.000015]  [<ffffffff8113f30b>] do_last+0x7f6/0x996
[  129.000015]  [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
[  129.000015]  [<ffffffff8113f6dd>] path_openat+0x232/0x496
[  129.000015]  [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
[  129.000015]  [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
[  129.000015]  [<ffffffff81131f4e>] do_sys_open+0x70/0x102
[  129.000015]  [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
[  129.000015]  [<ffffffff81131ffe>] SyS_open+0x1e/0x20
[  129.000015]  [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
[  129.000015] Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7 c7 d0 e6 a4 81 e8 3a 91 43 00 48 8b 05 f2 f8 96 00 eb 0c <f6> 80 f8 00 00 00 03 75 31 48 8b 00 48 3d 60 e7 a4 81 75 ec eb 
[  129.000015] RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
[  129.000015]  RSP <ffff880076e53c38>
[  129.000015] CR2: 00000005000000f9
[  130.555521] ---[ end trace 35f17d68fc569897 ]---

Attached is the program I used. It took lots of tweaks and doesn't
always trigger the bug on the first run. It can take several runs to
trigger. Each bug I've seen has been rather random. Twice it crashed due
to memory errors, once it just screwed up the kprobes, but the system
still ran fine. I had another kprobes error bug, when when I went to do
more tracing, it crashed.

What my program does is creates two threads (well, one thread and the
main thread). They place themselves onto CPU 0 and 1. Then the following
occurs:

	CPU 0			CPU 1
	-----			-----
  create sigprocmask probe

loop:			loop:

			fd = open(events/kprobes/sigprocmask/enable)

			pthread_barrier_wait()

  nanosleep(interval)

  pthread_barrier_wait()

			write(fd, "1", 1);

			write(fd, "0", 1);

  truncate kprobe_events
  // deletes all probes

			pthread_barrier_wait();
  pthread_barrier_wait();


			write(fd, "0", 1); // just in case

  create sigprocmask probe

			pthread_barrier_wait()

  pthread_barrier_wait()

  update interval

  goto loop		goto loop



I had to tweak the interval times to see where it would most likely
cause a race. When I found where it started to conflict, I placed the
range around that time and incremented it in steps related to 7/5
(~e/2). That eventually triggered the bug.

Now with this patch, I had to modify the test to give it a negative
number to place the pause before the write. That's because the switch to
removing the events slowed down the deletion and allowed the write to
always succeed (without the delay).

Anyway, I ran this in a while loop with the patch for an hour (and still
running) and it seems to keep it from crashing.

I'll update the change log to reflect this.

Thanks!

-- Steve


[-- Attachment #2: test-kprobe-removal.c --]
[-- Type: text/x-csrc, Size: 3959 bytes --]

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>
#include <pthread.h>

#define gettid() syscall(__NR_gettid)

#define KPROBE_FILE "/debug/tracing/kprobe_events"
#define KPROBE_ENABLE "/debug/tracing/events/kprobes/sigprocmask/enable"
#define PROBE "p:sigprocmask sigprocmask"

#define INTERVALS	1000
#define INTERVAL_START	60000
#define INTERVAL_STOP	92000
#define INTERVAL_STEP	(((INTERVAL_STOP - INTERVAL_START) * 7) / (INTERVALS * 5))

#define nano2sec(nan) (nan / 1000000000ULL)
#define ms2nano(ms) (ms * 1000000ULL)
#define sec2nano(sec) (sec * 1000000000ULL)

static int pause_main = 1;

static unsigned long long interval = INTERVAL_START;

static pthread_barrier_t initiate;
static pthread_barrier_t finish;
static pthread_barrier_t restart;

static int done;

static void *write_it(void *ignore)
{
	struct timespec intv;
	cpu_set_t cpumask;
	int ret;
	int fd;

	CPU_ZERO(&cpumask);
	CPU_SET(1, &cpumask);

	sched_setaffinity(gettid(), sizeof(long), &cpumask);

	while (!done) {
		intv.tv_sec = nano2sec(interval);
		intv.tv_nsec = interval % sec2nano(1);

		fd = open(KPROBE_ENABLE, O_WRONLY);
		if (fd < 0) {
			fprintf(stderr, "Can't open '%s'\n", KPROBE_ENABLE);
			exit(-1);
		}
		pthread_barrier_wait(&initiate);

		if (!pause_main)
			nanosleep(&intv, NULL);
		ret = write(fd, "1", 1);
		if (ret >= 0)
			printf("Enabled event! (%lld)\n", interval);
		write(fd, "0", 1);
		pthread_barrier_wait(&finish);
		write(fd, "0", 1);
		close(fd);
		pthread_barrier_wait(&restart);
	}

	return NULL;
}

static int open_kprobe(int trunc)
{
	int fd;
	int flags = O_WRONLY;

	if (trunc)
		flags |=  O_TRUNC;

	fd = open(KPROBE_FILE, flags);

	return fd;
}

static void create_probe(int fd)
{
	if (write(fd, PROBE, strlen(PROBE)) < 0) {
		fprintf(stderr, "Failed to write to '%s'\n", KPROBE_FILE);
		exit(-1);
	}
}

int main (int argc, char **argv)
{
	struct timespec intv;
	unsigned int intervals = INTERVALS;
	int start = INTERVAL_START;
	unsigned int stop = INTERVAL_STOP;
	unsigned int step = INTERVAL_STEP;
	cpu_set_t cpumask;
	pthread_t writer;
	int ret;
	int fd;
	int i;

	if (argc > 1)
		start = atoi(argv[1]);
	if (argc > 2)
		stop = atoi(argv[2]);
	if (argc > 3)
		intervals = atoi(argv[3]);

	if (start < 0) {
		pause_main = 0;
		start = start * -1;
		printf("PAUSING WRITE\n");
	}

	/* Check for bad input (also 0 skips parameters) */
	if (!start)
		start = INTERVAL_START;
	if (!stop)
		stop = INTERVAL_STOP;
	if (stop < start)
		stop = start + INTERVALS;
	if (!intervals)
		intervals = INTERVALS;

	if (argc > 1) {
		step = (((stop - start) * 7) / (intervals * 5));
		interval = start;
		if (!step)
			step = 1;
	}

	printf("interval:\t%d\n", intervals);
	printf("start:\t%d\n", start);
	printf("stop:\t%d\n", stop);
	printf("step:\t%d\n", step);

	CPU_ZERO(&cpumask);
	CPU_SET(0, &cpumask);

	pthread_barrier_init(&initiate, NULL, 2);
	pthread_barrier_init(&finish, NULL, 2);
	pthread_barrier_init(&restart, NULL, 2);

	fd = open_kprobe(1);
	if (fd < 0) {
		fprintf(stderr, "Can't open '%s'\n", KPROBE_FILE);
		exit(-1);
	}
	create_probe(fd);
	close(fd);

	ret = pthread_create(&writer, NULL, write_it, NULL);
	if (ret < 0)
		exit(-1);

	sched_setaffinity(gettid(), sizeof(long), &cpumask);

	for (i = 0; i < intervals; i++) {
		if (pause_main) {
			intv.tv_sec = nano2sec(interval);
			intv.tv_nsec = interval % sec2nano(1);
		} else {
			intv.tv_sec = 0;
			intv.tv_nsec = 1;
		}

		pthread_barrier_wait(&initiate);
		nanosleep(&intv, NULL);

		while ((fd = open_kprobe(1)) < 0)
		       ;
		pthread_barrier_wait(&finish);
		create_probe(fd);
		close(fd);
		pthread_barrier_wait(&restart);

		interval += step;
		if (interval > stop)
			interval = start + interval - stop;
	}
	done = 1;

	pthread_join(writer, NULL);
	fd = open_kprobe(1);
	close(fd);

	return 0;
}

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01  2:07         ` Steven Rostedt
@ 2013-08-01  2:50           ` Steven Rostedt
  2013-08-01  3:48             ` Masami Hiramatsu
  2013-08-01 13:34             ` Oleg Nesterov
  0 siblings, 2 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01  2:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

Here's the new change log, but the same patch. Does this sound ok to you
guys?

-- Steve

>From 40c32592668b727cbfcf7b1c0567f581bd62a5e4 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
Date: Wed, 3 Jul 2013 23:33:50 -0400
Subject: [PATCH] tracing/kprobes: Fail to unregister if probe event files are
 in use

When a probe is being removed, it cleans up the event files that correspond
to the probe. But there is a race between writing to one of these files
and deleting the probe. This is especially true for the "enable" file.

	CPU 0				CPU 1
	-----				-----

				  fd = open("enable",O_WRONLY);

  probes_open()
  release_all_trace_probes()
  unregister_trace_probe()
  if (trace_probe_is_enabled(tp))
	return -EBUSY

				   write(fd, "1", 1)
				   __ftrace_set_clr_event()
				   call->class->reg()
				    (kprobe_register)
				     enable_trace_probe(tp)

  __unregister_trace_probe(tp);
  list_del(&tp->list)
  unregister_probe_event(tp) <-- fails!
  free_trace_probe(tp)

				   write(fd, "0", 1)
				   __ftrace_set_clr_event()
				   call->class->unreg
				    (kprobe_register)
				    disable_trace_probe(tp) <-- BOOM!

A test program was written that used two threads to simulate the
above scenario adding a nanosleep() interval to change the timings
and after several thousand runs, it was able to trigger this bug
and crash:

BUG: unable to handle kernel paging request at 00000005000000f9
IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
PGD 7808a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Dumping ftrace buffer:
---------------------------------
Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
Stack:
 ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
 ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
 ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
Call Trace:
 [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
 [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
 [<ffffffff81130f78>] do_dentry_open+0x162/0x226
 [<ffffffff81131186>] finish_open+0x46/0x54
 [<ffffffff8113f30b>] do_last+0x7f6/0x996
 [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
 [<ffffffff8113f6dd>] path_openat+0x232/0x496
 [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
 [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
 [<ffffffff81131f4e>] do_sys_open+0x70/0x102
 [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
 [<ffffffff81131ffe>] SyS_open+0x1e/0x20
 [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
 RSP <ffff880076e53c38>
CR2: 00000005000000f9
---[ end trace 35f17d68fc569897 ]---

The unregister_trace_probe() must be done first, and if it fails it must
fail the removal of the kprobe.

Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
to allow moving the unregister_probe_event() before the removal of
the probe and exit the function if it fails. This prevents the tp
structure from being used after it is freed.

Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_kprobe.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3811487..243f683 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
 }
 
 static int register_probe_event(struct trace_probe *tp);
-static void unregister_probe_event(struct trace_probe *tp);
+static int unregister_probe_event(struct trace_probe *tp);
 
 static DEFINE_MUTEX(probe_lock);
 static LIST_HEAD(probe_list);
@@ -351,9 +351,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
 	if (trace_probe_is_enabled(tp))
 		return -EBUSY;
 
+	/* Will fail if probe is being used by ftrace or perf */
+	if (unregister_probe_event(tp))
+		return -EBUSY;
+
 	__unregister_trace_probe(tp);
 	list_del(&tp->list);
-	unregister_probe_event(tp);
 
 	return 0;
 }
@@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
 	/* TODO: Use batch unregistration */
 	while (!list_empty(&probe_list)) {
 		tp = list_entry(probe_list.next, struct trace_probe, list);
-		unregister_trace_probe(tp);
+		ret = unregister_trace_probe(tp);
+		if (ret)
+			goto end;
 		free_trace_probe(tp);
 	}
 
@@ -1247,11 +1252,15 @@ static int register_probe_event(struct trace_probe *tp)
 	return ret;
 }
 
-static void unregister_probe_event(struct trace_probe *tp)
+static int unregister_probe_event(struct trace_probe *tp)
 {
+	int ret;
+
 	/* tp->event is unregistered in trace_remove_event_call() */
-	trace_remove_event_call(&tp->call);
-	kfree(tp->call.print_fmt);
+	ret = trace_remove_event_call(&tp->call);
+	if (!ret)
+		kfree(tp->call.print_fmt);
+	return ret;
 }
 
 /* Make a debugfs interface for controlling probe points */
-- 
1.8.1.4




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 4/4] tracing/uprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 ` [RFC][PATCH 4/4] tracing/uprobes: " Steven Rostedt
@ 2013-08-01  3:40   ` Steven Rostedt
  2013-08-01 14:08   ` Oleg Nesterov
  1 sibling, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01  3:40 UTC (permalink / raw)
  To: linux-kernel, Srikar Dronamraju
  Cc: Oleg Nesterov, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Frederic Weisbecker, Ingo Molnar, Andrew Morton

Srikar,

Can you give your Acked-by to this patch.

Thanks!

-- Steve


On Wed, 2013-07-03 at 23:33 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0004-tracing-uprobes-Fail-to-unregister-if-probe-event-fi.patch)
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> When one of the event files is opened, we need to prevent them from
> being removed. Modules do with with the module owner set (automated
> from the VFS layer).  The ftrace buffer instances have a ref count added
> to the trace_array when the enabled file is opened (the others are not
> that big of a deal, as they only reference the event calls which
> still exist when an instance disappears). But kprobes and uprobes
> do not have any protection.
> 
> Have the unregister probe fail when the event files are open, in use
> are used by perf.
> 
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_uprobe.c |   48 ++++++++++++++++++++++++++++++++-----------
>  1 file changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index d5d0cd3..4916da5 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -70,7 +70,7 @@ struct trace_uprobe {
>  	(sizeof(struct probe_arg) * (n)))
>  
>  static int register_uprobe_event(struct trace_uprobe *tu);
> -static void unregister_uprobe_event(struct trace_uprobe *tu);
> +static int unregister_uprobe_event(struct trace_uprobe *tu);
>  
>  static DEFINE_MUTEX(uprobe_lock);
>  static LIST_HEAD(uprobe_list);
> @@ -164,11 +164,17 @@ static struct trace_uprobe *find_probe_event(const char *event, const char *grou
>  }
>  
>  /* Unregister a trace_uprobe and probe_event: call with locking uprobe_lock */
> -static void unregister_trace_uprobe(struct trace_uprobe *tu)
> +static int unregister_trace_uprobe(struct trace_uprobe *tu)
>  {
> +	int ret;
> +
> +	ret = unregister_uprobe_event(tu);
> +	if (ret)
> +		return ret;
> +
>  	list_del(&tu->list);
> -	unregister_uprobe_event(tu);
>  	free_trace_uprobe(tu);
> +	return 0;
>  }
>  
>  /* Register a trace_uprobe and probe_event */
> @@ -181,9 +187,12 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
>  
>  	/* register as an event */
>  	old_tp = find_probe_event(tu->call.name, tu->call.class->system);
> -	if (old_tp)
> +	if (old_tp) {
>  		/* delete old event */
> -		unregister_trace_uprobe(old_tp);
> +		ret = unregister_trace_uprobe(old_tp);
> +		if (ret)
> +			goto end;
> +	}
>  
>  	ret = register_uprobe_event(tu);
>  	if (ret) {
> @@ -256,6 +265,8 @@ static int create_trace_uprobe(int argc, char **argv)
>  		group = UPROBE_EVENT_SYSTEM;
>  
>  	if (is_delete) {
> +		int ret;
> +
>  		if (!event) {
>  			pr_info("Delete command needs an event name.\n");
>  			return -EINVAL;
> @@ -269,9 +280,9 @@ static int create_trace_uprobe(int argc, char **argv)
>  			return -ENOENT;
>  		}
>  		/* delete an event */
> -		unregister_trace_uprobe(tu);
> +		ret = unregister_trace_uprobe(tu);
>  		mutex_unlock(&uprobe_lock);
> -		return 0;
> +		return ret;
>  	}
>  
>  	if (argc < 2) {
> @@ -408,16 +419,20 @@ fail_address_parse:
>  	return ret;
>  }
>  
> -static void cleanup_all_probes(void)
> +static int cleanup_all_probes(void)
>  {
>  	struct trace_uprobe *tu;
> +	int ret = 0;
>  
>  	mutex_lock(&uprobe_lock);
>  	while (!list_empty(&uprobe_list)) {
>  		tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
> -		unregister_trace_uprobe(tu);
> +		ret = unregister_trace_uprobe(tu);
> +		if (ret)
> +			break;
>  	}
>  	mutex_unlock(&uprobe_lock);
> +	return ret;
>  }
>  
>  /* Probes listing interfaces */
> @@ -462,8 +477,12 @@ static const struct seq_operations probes_seq_op = {
>  
>  static int probes_open(struct inode *inode, struct file *file)
>  {
> +	int ret = 0;
> +
>  	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC))
> -		cleanup_all_probes();
> +		ret = cleanup_all_probes();
> +	if (ret)
> +		return ret;
>  
>  	return seq_open(file, &probes_seq_op);
>  }
> @@ -970,12 +989,17 @@ static int register_uprobe_event(struct trace_uprobe *tu)
>  	return ret;
>  }
>  
> -static void unregister_uprobe_event(struct trace_uprobe *tu)
> +static int unregister_uprobe_event(struct trace_uprobe *tu)
>  {
> +	int ret;
> +
>  	/* tu->event is unregistered in trace_remove_event_call() */
> -	trace_remove_event_call(&tu->call);
> +	ret = trace_remove_event_call(&tu->call);
> +	if (ret)
> +		return ret;
>  	kfree(tu->call.print_fmt);
>  	tu->call.print_fmt = NULL;
> +	return 0;
>  }
>  
>  /* Make a trace interface for controling probe points */



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01  2:50           ` Steven Rostedt
@ 2013-08-01  3:48             ` Masami Hiramatsu
  2013-08-01 13:34             ` Oleg Nesterov
  1 sibling, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-08-01  3:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/08/01 11:50), Steven Rostedt wrote:
> Here's the new change log, but the same patch. Does this sound ok to you
> guys?

Great! This looks good for me. ;)

Thank you very much!

> 
> -- Steve
> 
>>From 40c32592668b727cbfcf7b1c0567f581bd62a5e4 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> Date: Wed, 3 Jul 2013 23:33:50 -0400
> Subject: [PATCH] tracing/kprobes: Fail to unregister if probe event files are
>  in use
> 
> When a probe is being removed, it cleans up the event files that correspond
> to the probe. But there is a race between writing to one of these files
> and deleting the probe. This is especially true for the "enable" file.
> 
> 	CPU 0				CPU 1
> 	-----				-----
> 
> 				  fd = open("enable",O_WRONLY);
> 
>   probes_open()
>   release_all_trace_probes()
>   unregister_trace_probe()
>   if (trace_probe_is_enabled(tp))
> 	return -EBUSY
> 
> 				   write(fd, "1", 1)
> 				   __ftrace_set_clr_event()
> 				   call->class->reg()
> 				    (kprobe_register)
> 				     enable_trace_probe(tp)
> 
>   __unregister_trace_probe(tp);
>   list_del(&tp->list)
>   unregister_probe_event(tp) <-- fails!
>   free_trace_probe(tp)
> 
> 				   write(fd, "0", 1)
> 				   __ftrace_set_clr_event()
> 				   call->class->unreg
> 				    (kprobe_register)
> 				    disable_trace_probe(tp) <-- BOOM!
> 
> A test program was written that used two threads to simulate the
> above scenario adding a nanosleep() interval to change the timings
> and after several thousand runs, it was able to trigger this bug
> and crash:
> 
> BUG: unable to handle kernel paging request at 00000005000000f9
> IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
> PGD 7808a067 PUD 0
> Oops: 0000 [#1] PREEMPT SMP
> Dumping ftrace buffer:
> ---------------------------------
> Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
> CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
> Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
> task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
> RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
> RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
> RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
> RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
> RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
> R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
> R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
> FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
> Stack:
>  ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
>  ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
>  ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
> Call Trace:
>  [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
>  [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
>  [<ffffffff81130f78>] do_dentry_open+0x162/0x226
>  [<ffffffff81131186>] finish_open+0x46/0x54
>  [<ffffffff8113f30b>] do_last+0x7f6/0x996
>  [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
>  [<ffffffff8113f6dd>] path_openat+0x232/0x496
>  [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
>  [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
>  [<ffffffff81131f4e>] do_sys_open+0x70/0x102
>  [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
>  [<ffffffff81131ffe>] SyS_open+0x1e/0x20
>  [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
> Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
> RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
>  RSP <ffff880076e53c38>
> CR2: 00000005000000f9
> ---[ end trace 35f17d68fc569897 ]---
> 
> The unregister_trace_probe() must be done first, and if it fails it must
> fail the removal of the kprobe.
> 
> Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
> to allow moving the unregister_probe_event() before the removal of
> the probe and exit the function if it fails. This prevents the tp
> structure from being used after it is freed.
> 
> Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.org
> 
> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  kernel/trace/trace_kprobe.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 3811487..243f683 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -95,7 +95,7 @@ static __kprobes bool trace_probe_is_on_module(struct trace_probe *tp)
>  }
>  
>  static int register_probe_event(struct trace_probe *tp);
> -static void unregister_probe_event(struct trace_probe *tp);
> +static int unregister_probe_event(struct trace_probe *tp);
>  
>  static DEFINE_MUTEX(probe_lock);
>  static LIST_HEAD(probe_list);
> @@ -351,9 +351,12 @@ static int unregister_trace_probe(struct trace_probe *tp)
>  	if (trace_probe_is_enabled(tp))
>  		return -EBUSY;
>  
> +	/* Will fail if probe is being used by ftrace or perf */
> +	if (unregister_probe_event(tp))
> +		return -EBUSY;
> +
>  	__unregister_trace_probe(tp);
>  	list_del(&tp->list);
> -	unregister_probe_event(tp);
>  
>  	return 0;
>  }
> @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
>  	/* TODO: Use batch unregistration */
>  	while (!list_empty(&probe_list)) {
>  		tp = list_entry(probe_list.next, struct trace_probe, list);
> -		unregister_trace_probe(tp);
> +		ret = unregister_trace_probe(tp);
> +		if (ret)
> +			goto end;
>  		free_trace_probe(tp);
>  	}
>  
> @@ -1247,11 +1252,15 @@ static int register_probe_event(struct trace_probe *tp)
>  	return ret;
>  }
>  
> -static void unregister_probe_event(struct trace_probe *tp)
> +static int unregister_probe_event(struct trace_probe *tp)
>  {
> +	int ret;
> +
>  	/* tp->event is unregistered in trace_remove_event_call() */
> -	trace_remove_event_call(&tp->call);
> -	kfree(tp->call.print_fmt);
> +	ret = trace_remove_event_call(&tp->call);
> +	if (!ret)
> +		kfree(tp->call.print_fmt);
> +	return ret;
>  }
>  
>  /* Make a debugfs interface for controlling probe points */
> 


-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-07-31 22:42       ` Steven Rostedt
  2013-08-01  2:07         ` Steven Rostedt
@ 2013-08-01 13:10         ` Oleg Nesterov
  1 sibling, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 13:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/31, Steven Rostedt wrote:
>
> On Wed, 2013-07-31 at 22:40 +0200, Oleg Nesterov wrote:
> >
> > This should be fine. Either event_remove() path takes event_mutex
> > first and then ->write() fails, or ftrace_event_enable_disable()
> > actually disables this even successfully.
>
> Actually I meant while in unregister_trace_probe(), it gets by the
> trace_probe_is_enabled() part first, then the write succeeds (as the
> event_mutex isn't taken till unregister_probe_event()). The the
> unregister_probe_event fails,

Yes sure. In this case evrything is clear. But this looks as if
unregister_probe_event() should not fail. IOW, this looks as if
the bug was introduce by the previous "trace_remove_event_call()
should fail if call/file is in use" change.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01  2:50           ` Steven Rostedt
  2013-08-01  3:48             ` Masami Hiramatsu
@ 2013-08-01 13:34             ` Oleg Nesterov
  2013-08-01 13:49               ` Oleg Nesterov
                                 ` (2 more replies)
  1 sibling, 3 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 13:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/31, Steven Rostedt wrote:
>
> Subject: [PATCH] tracing/kprobes: Fail to unregister if probe event files are
>  in use
>
> When a probe is being removed, it cleans up the event files that correspond
> to the probe. But there is a race between writing to one of these files
> and deleting the probe. This is especially true for the "enable" file.
>
> 	CPU 0				CPU 1
> 	-----				-----
>
> 				  fd = open("enable",O_WRONLY);
>
>   probes_open()
>   release_all_trace_probes()
>   unregister_trace_probe()
>   if (trace_probe_is_enabled(tp))
> 	return -EBUSY
>
> 				   write(fd, "1", 1)
> 				   __ftrace_set_clr_event()
> 				   call->class->reg()
> 				    (kprobe_register)
> 				     enable_trace_probe(tp)
>
>   __unregister_trace_probe(tp);
>   list_del(&tp->list)
>   unregister_probe_event(tp) <-- fails!
>   free_trace_probe(tp)

Yes. But again, this doesn't explain why unregister_probe_event()->
__trace_remove_event_call() can't simply proceed and
do ftrace_event_enable_disable() + remove_event_from_tracers().

IOW, if we do not apply the previous "trace_remove_event_call() should
fail if call/file is in use" patch, then everything is fine:


> 				   write(fd, "0", 1)

this will fail with ENODEV.

Let's consider another race:

	CPU 0					CPU 1
	-----					-----

	probes_open()
	trace_probe_is_enabled() == F;

						sys_perf_event_open(attr.config == id)


	...
	trace_remove_event_call()

Whatever we do, we should abort. We could even do TRACE_REG_PERF_UNREGISTER
(although the current code doesn't do this), but we have no way to cleanup
the perf event's which have ->>tp_event = call.

trace_remove_event_call() was already changed to return the error.

And. Since it can fail, this obviously means that it should be checked,
we can't blindly do free_trace_probe().

IOW, the changelog could be very simple, I think. Either
trace_remove_event_call() should always succeed or we should check the
possible failure.

But I won't argue with this changelog. The only important thing is that
we all seem to agree that we do have the races here which can be fixed
by this and the previous change.

And just in case. I believe that the patch is fine.

Just one off-topic note,

> @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
>  	/* TODO: Use batch unregistration */
>  	while (!list_empty(&probe_list)) {
>  		tp = list_entry(probe_list.next, struct trace_probe, list);
> -		unregister_trace_probe(tp);
> +		ret = unregister_trace_probe(tp);
> +		if (ret)
> +			goto end;
>  		free_trace_probe(tp);
>  	}

This obviously breaks all-or-nothing semantics (I mean, this breaks
the intent, the current code is buggy).

I think we can't avoid this, and I hope this is fine. But then perhaps
we should simply remove the "list_for_each_entry" check above?

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 13:34             ` Oleg Nesterov
@ 2013-08-01 13:49               ` Oleg Nesterov
  2013-08-01 14:17               ` Steven Rostedt
  2013-08-02  4:57               ` Masami Hiramatsu
  2 siblings, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 13:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 08/01, Oleg Nesterov wrote:
>
> And just in case. I believe that the patch is fine.
>
> Just one off-topic note,

Forgot to mention,

> > @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
> >  	/* TODO: Use batch unregistration */
> >  	while (!list_empty(&probe_list)) {
> >  		tp = list_entry(probe_list.next, struct trace_probe, list);
> > -		unregister_trace_probe(tp);
> > +		ret = unregister_trace_probe(tp);
> > +		if (ret)
> > +			goto end;
> >  		free_trace_probe(tp);
> >  	}
>
> This obviously breaks all-or-nothing semantics (I mean, this breaks
> the intent, the current code is buggy).
>
> I think we can't avoid this, and I hope this is fine. But then perhaps
> we should simply remove the "list_for_each_entry" check above?

And, of course, turn this "while (!list_empty())" into list_for_each_safe().

But again, this is almost off-topic and we can do this later.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 4/4] tracing/uprobes: Fail to unregister if probe event files are open
  2013-07-04  3:33 ` [RFC][PATCH 4/4] tracing/uprobes: " Steven Rostedt
  2013-08-01  3:40   ` Steven Rostedt
@ 2013-08-01 14:08   ` Oleg Nesterov
  2013-08-01 14:25     ` Steven Rostedt
  1 sibling, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 14:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 07/03, Steven Rostedt wrote:
>
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Acked-by: Oleg Nesterov <oleg@redhat.com>


Just a couple of nits in the case you are going to redo this change,

> Modules do with with the module owner set (automated
> from the VFS layer).

This logic is dead, I think.

> The ftrace buffer instances have a ref count added
> to the trace_array when the enabled file is opened

This is too.

> -static void cleanup_all_probes(void)
> +static int cleanup_all_probes(void)
>  {
>  	struct trace_uprobe *tu;
> +	int ret = 0;
>  
>  	mutex_lock(&uprobe_lock);
>  	while (!list_empty(&uprobe_list)) {
>  		tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
> -		unregister_trace_uprobe(tu);
> +		ret = unregister_trace_uprobe(tu);
> +		if (ret)
> +			break;
>  	}
>  	mutex_unlock(&uprobe_lock);
> +	return ret;
>  }

Again, it is not clear what exactly we should do and I won't argue
either way. But note that (with or without this patch) this doesn't
match kprobe's release_all_trace_probes() which checks (tries to,
actually) trace_probe_is_enabled() for every probe first. Perhaps
we should cleanup this later.

>  static int probes_open(struct inode *inode, struct file *file)
>  {
> +	int ret = 0;
> +
>  	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC))
> -		cleanup_all_probes();
> +		ret = cleanup_all_probes();
> +	if (ret)
> +		return ret;

Cosmetic, but perhaps it would be a bit more clean to move this check
(with "int ret") under if (WRITE && TRUNC) block.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 13:34             ` Oleg Nesterov
  2013-08-01 13:49               ` Oleg Nesterov
@ 2013-08-01 14:17               ` Steven Rostedt
  2013-08-01 14:33                 ` Oleg Nesterov
  2013-08-02  4:57               ` Masami Hiramatsu
  2 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01 14:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Thu, 2013-08-01 at 15:34 +0200, Oleg Nesterov wrote:
> On 07/31, Steven Rostedt wrote:
> >
> > Subject: [PATCH] tracing/kprobes: Fail to unregister if probe event files are
> >  in use
> >
> > When a probe is being removed, it cleans up the event files that correspond
> > to the probe. But there is a race between writing to one of these files
> > and deleting the probe. This is especially true for the "enable" file.
> >
> > 	CPU 0				CPU 1
> > 	-----				-----
> >
> > 				  fd = open("enable",O_WRONLY);
> >
> >   probes_open()
> >   release_all_trace_probes()
> >   unregister_trace_probe()
> >   if (trace_probe_is_enabled(tp))
> > 	return -EBUSY
> >
> > 				   write(fd, "1", 1)
> > 				   __ftrace_set_clr_event()
> > 				   call->class->reg()
> > 				    (kprobe_register)
> > 				     enable_trace_probe(tp)
> >
> >   __unregister_trace_probe(tp);
> >   list_del(&tp->list)
> >   unregister_probe_event(tp) <-- fails!
> >   free_trace_probe(tp)
> 
> Yes. But again, this doesn't explain why unregister_probe_event()->
> __trace_remove_event_call() can't simply proceed and
> do ftrace_event_enable_disable() + remove_event_from_tracers().

The problem is with the soft disable. We would have to look at what
functions have been attached to soft enable this event and remove them.
This can become quite complex. When the function is hit, it will try to
access the event call file. There's no inode associated to that, so the
i_private wont work. The answer would be to actually remove the
functions that are referencing it. See event_enable_probe() and
event_enable_count_probe(). The problem is that these are called from
the function tracer which means it can not take *any* locks.

This will become exponentially more complex when Tom Zanussi's patches
make it in that have events enabling other events and also using the
soft enable too.

This means that if we fail to disable, we must not destroy the event.

We may be able to start adding ref counts, such that it doesn't get
freed till it goes to zero, but every user will see it "not existing".
But that will take a bit of work to do and not for 3.11.

> 
> IOW, if we do not apply the previous "trace_remove_event_call() should
> fail if call/file is in use" patch, then everything is fine:
> 
> 
> > 				   write(fd, "0", 1)
> 
> this will fail with ENODEV.

Currently it does not, because the failure in probe_remove_event_call()
due to the event being busy wont remove the event (event_remove() is
never called). Thus the event is still alive and the write will still
have access to it.

> 
> Let's consider another race:
> 
> 	CPU 0					CPU 1
> 	-----					-----
> 
> 	probes_open()
> 	trace_probe_is_enabled() == F;
> 
> 						sys_perf_event_open(attr.config == id)
> 
> 
> 	...
> 	trace_remove_event_call()
> 
> Whatever we do, we should abort. We could even do TRACE_REG_PERF_UNREGISTER
> (although the current code doesn't do this), but we have no way to cleanup
> the perf event's which have ->>tp_event = call.
> 
> trace_remove_event_call() was already changed to return the error.

Right, and that's what this patch is about. The bug is that
trace_remove_event_call() returns an error which makes
unregister_probe_event() fail silently, which means that there's still a
reference to the tp and the write (which does happen) references it.

> 
> And. Since it can fail, this obviously means that it should be checked,
> we can't blindly do free_trace_probe().

Right! That's what this patch does :-)

What you showed is the same race with perf as I have tested with ftrace.

> 
> IOW, the changelog could be very simple, I think. Either
> trace_remove_event_call() should always succeed or we should check the
> possible failure.

I can update the change log to remove some of the functions that are
being called to be less confusing.

> 
> But I won't argue with this changelog. The only important thing is that
> we all seem to agree that we do have the races here which can be fixed
> by this and the previous change.
> 
> And just in case. I believe that the patch is fine.

OK, I'm currently running this through my tests. I'm hoping to get an
Acked-by from Srikar for uprobes. Time is running out for 3.11.

> 
> Just one off-topic note,
> 
> > @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
> >  	/* TODO: Use batch unregistration */
> >  	while (!list_empty(&probe_list)) {
> >  		tp = list_entry(probe_list.next, struct trace_probe, list);
> > -		unregister_trace_probe(tp);
> > +		ret = unregister_trace_probe(tp);
> > +		if (ret)
> > +			goto end;
> >  		free_trace_probe(tp);
> >  	}
> 
> This obviously breaks all-or-nothing semantics (I mean, this breaks
> the intent, the current code is buggy).
> 

I agree, this isn't really nice, but for now we have to deal with it.

> I think we can't avoid this, and I hope this is fine. But then perhaps
> we should simply remove the "list_for_each_entry" check above?

I think this can be pushed for 3.12 to fix.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 4/4] tracing/uprobes: Fail to unregister if probe event files are open
  2013-08-01 14:08   ` Oleg Nesterov
@ 2013-08-01 14:25     ` Steven Rostedt
  0 siblings, 0 replies; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01 14:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Thu, 2013-08-01 at 16:08 +0200, Oleg Nesterov wrote:
> On 07/03, Steven Rostedt wrote:
> >
> > From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
> 
> Acked-by: Oleg Nesterov <oleg@redhat.com>

Thanks!

> 
> 
> Just a couple of nits in the case you are going to redo this change,
> 
> > Modules do with with the module owner set (automated
> > from the VFS layer).
> 
> This logic is dead, I think.
> 
> > The ftrace buffer instances have a ref count added
> > to the trace_array when the enabled file is opened
> 
> This is too.
> 

Yeah, the change log needs an update.

> > -static void cleanup_all_probes(void)
> > +static int cleanup_all_probes(void)
> >  {
> >  	struct trace_uprobe *tu;
> > +	int ret = 0;
> >  
> >  	mutex_lock(&uprobe_lock);
> >  	while (!list_empty(&uprobe_list)) {
> >  		tu = list_entry(uprobe_list.next, struct trace_uprobe, list);
> > -		unregister_trace_uprobe(tu);
> > +		ret = unregister_trace_uprobe(tu);
> > +		if (ret)
> > +			break;
> >  	}
> >  	mutex_unlock(&uprobe_lock);
> > +	return ret;
> >  }
> 
> Again, it is not clear what exactly we should do and I won't argue
> either way. But note that (with or without this patch) this doesn't
> match kprobe's release_all_trace_probes() which checks (tries to,
> actually) trace_probe_is_enabled() for every probe first. Perhaps
> we should cleanup this later.

Agreed on all accounts. Especially the "we should cleanup this later"
part ;-)

> 
> >  static int probes_open(struct inode *inode, struct file *file)
> >  {
> > +	int ret = 0;
> > +
> >  	if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC))
> > -		cleanup_all_probes();
> > +		ret = cleanup_all_probes();
> > +	if (ret)
> > +		return ret;
> 
> Cosmetic, but perhaps it would be a bit more clean to move this check
> (with "int ret") under if (WRITE && TRUNC) block.
> 

Yeah, agreed.

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 14:17               ` Steven Rostedt
@ 2013-08-01 14:33                 ` Oleg Nesterov
  2013-08-01 14:45                   ` Steven Rostedt
  0 siblings, 1 reply; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 14:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 08/01, Steven Rostedt wrote:
>
> On Thu, 2013-08-01 at 15:34 +0200, Oleg Nesterov wrote:
> > >
> > >   __unregister_trace_probe(tp);
> > >   list_del(&tp->list)
> > >   unregister_probe_event(tp) <-- fails!
> > >   free_trace_probe(tp)
> >
> > Yes. But again, this doesn't explain why unregister_probe_event()->
> > __trace_remove_event_call() can't simply proceed and
> > do ftrace_event_enable_disable() + remove_event_from_tracers().
>
> The problem is with the soft disable.

Exactly! This is another (also unlikely) race we need to prevent.

> so the
> i_private wont work.

Yes, and this is another reason why trace_remove_event_call() can't
always succeed, and the comment/changelog in probe_remove_event_call()
(added by the previous change) even tries to document the problems
with FL_SOFT_MODE.

> > IOW, if we do not apply the previous "trace_remove_event_call() should
> > fail if call/file is in use" patch, then everything is fine:
> >
> > > 				   write(fd, "0", 1)
> >
> > this will fail with ENODEV.
>
> Currently it does not, because the failure in probe_remove_event_call()
> due to the event being busy wont remove the event (event_remove() is
> never called). Thus the event is still alive and the write will still
> have access to it.

Yes, yes. That is why the changelog says "Both trace_kprobe.c/trace_uprobe.c
need the additional changes".

IOW, the previous change itself adds the new races fixed by this patch
(and the similar change in trace_uprobe.c). Hopefully this is fine because
the code is buggy anyway.

> I can update the change log to remove some of the functions that are
> being called to be less confusing.

I am fine either way. Just I wanted to be sure that we understand each
other and I didn't miss something.

> I agree, this isn't really nice, but for now we have to deal with it.

Yes, yes, this is not for 3.11.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 14:33                 ` Oleg Nesterov
@ 2013-08-01 14:45                   ` Steven Rostedt
  2013-08-01 14:46                     ` Oleg Nesterov
  0 siblings, 1 reply; 67+ messages in thread
From: Steven Rostedt @ 2013-08-01 14:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On Thu, 2013-08-01 at 16:33 +0200, Oleg Nesterov wrote:
> On 08/01, Steven Rostedt wrote:
> >
> > On Thu, 2013-08-01 at 15:34 +0200, Oleg Nesterov wrote:
> > > >
> > > >   __unregister_trace_probe(tp);
> > > >   list_del(&tp->list)
> > > >   unregister_probe_event(tp) <-- fails!
> > > >   free_trace_probe(tp)
> > >
> > > Yes. But again, this doesn't explain why unregister_probe_event()->
> > > __trace_remove_event_call() can't simply proceed and
> > > do ftrace_event_enable_disable() + remove_event_from_tracers().
> >
> > The problem is with the soft disable.
> 
> Exactly! This is another (also unlikely) race we need to prevent.
> 

Is there a race even with these patches? I don't see one. To link a
function to an event (set the soft disable mode), the event_mutex is
taken. If the event is gone, it wont be able to link. If the soft
disable is attached (after found and set under event_mutex), the event
can't be deleted.

Or are you just saying that these patches fix that case too?

-- Steve



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 14:45                   ` Steven Rostedt
@ 2013-08-01 14:46                     ` Oleg Nesterov
  0 siblings, 0 replies; 67+ messages in thread
From: Oleg Nesterov @ 2013-08-01 14:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

On 08/01, Steven Rostedt wrote:
>
> On Thu, 2013-08-01 at 16:33 +0200, Oleg Nesterov wrote:
> > On 08/01, Steven Rostedt wrote:
> > >
> > > On Thu, 2013-08-01 at 15:34 +0200, Oleg Nesterov wrote:
> > > > >
> > > > >   __unregister_trace_probe(tp);
> > > > >   list_del(&tp->list)
> > > > >   unregister_probe_event(tp) <-- fails!
> > > > >   free_trace_probe(tp)
> > > >
> > > > Yes. But again, this doesn't explain why unregister_probe_event()->
> > > > __trace_remove_event_call() can't simply proceed and
> > > > do ftrace_event_enable_disable() + remove_event_from_tracers().
> > >
> > > The problem is with the soft disable.
> >
> > Exactly! This is another (also unlikely) race we need to prevent.
> >
>
> Is there a race even with these patches?

Sorry for confusion,

> I don't see one.

Neither me, but only with these changes.

I meant that this is another reason why trace_remove_event_call() should
fail and the caller should obviously abort in this case.

> Or are you just saying that these patches fix that case too?

Yes, sorry.

Oleg.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: Re: [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open
  2013-08-01 13:34             ` Oleg Nesterov
  2013-08-01 13:49               ` Oleg Nesterov
  2013-08-01 14:17               ` Steven Rostedt
@ 2013-08-02  4:57               ` Masami Hiramatsu
  2 siblings, 0 replies; 67+ messages in thread
From: Masami Hiramatsu @ 2013-08-02  4:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, linux-kernel, zhangwei(Jovi),
	Jiri Olsa, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Srikar Dronamraju, Frederic Weisbecker, Ingo Molnar,
	Andrew Morton

(2013/08/01 22:34), Oleg Nesterov wrote:
> Just one off-topic note,
> 
>> > @@ -632,7 +635,9 @@ static int release_all_trace_probes(void)
>> >  	/* TODO: Use batch unregistration */
>> >  	while (!list_empty(&probe_list)) {
>> >  		tp = list_entry(probe_list.next, struct trace_probe, list);
>> > -		unregister_trace_probe(tp);
>> > +		ret = unregister_trace_probe(tp);
>> > +		if (ret)
>> > +			goto end;
>> >  		free_trace_probe(tp);
>> >  	}
> This obviously breaks all-or-nothing semantics (I mean, this breaks
> the intent, the current code is buggy).

I see, since we can't lock all operations, we have to change the
semantics. I think it's OK.

> I think we can't avoid this, and I hope this is fine. But then perhaps
> we should simply remove the "list_for_each_entry" check above?

Yes, it should be done. And in that case, I'd like to remove
all removable probes, as much as possible.

BTW, I'd like to replace this "remove all" behavior with
writing "-:*", instead of writing without append flag.

Thank you,
-- 
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2013-08-02  4:57 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-04  3:33 [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Steven Rostedt
2013-07-04  3:33 ` [RFC][PATCH 1/4] tracing: Add ref count to ftrace_event_call Steven Rostedt
2013-07-04  4:22   ` Masami Hiramatsu
2013-07-04 11:55     ` [RFC PATCH] tracing: Atomically get refcounts of event_call and trace_array Masami Hiramatsu
2013-07-04 12:12       ` Masami Hiramatsu
2013-07-04 12:23         ` Steven Rostedt
2013-07-05  0:32       ` Oleg Nesterov
2013-07-05  2:32         ` Masami Hiramatsu
2013-07-09  7:55         ` [RFC PATCH V2] tracing: Check f_dentry before accessing event_file/call in inode->i_private Masami Hiramatsu
2013-07-15 18:16           ` Oleg Nesterov
2013-07-17  2:10             ` Masami Hiramatsu
2013-07-17 14:51               ` Oleg Nesterov
2013-07-18  2:20                 ` Masami Hiramatsu
2013-07-18 14:51                   ` Oleg Nesterov
2013-07-19  5:21                     ` Masami Hiramatsu
2013-07-19 13:33                       ` Oleg Nesterov
2013-07-22  9:57                         ` Masami Hiramatsu
2013-07-22 17:04                           ` Oleg Nesterov
2013-07-23 21:04                             ` Oleg Nesterov
2013-07-04  3:33 ` [RFC][PATCH 2/4] tracing: trace_remove_event_call() should fail if call/file is in use Steven Rostedt
2013-07-04 12:48   ` Masami Hiramatsu
2013-07-04  3:33 ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Steven Rostedt
2013-07-04 12:45   ` Masami Hiramatsu
2013-07-04 18:48     ` Oleg Nesterov
2013-07-05  2:53       ` Masami Hiramatsu
2013-07-05 17:26         ` Oleg Nesterov
2013-07-08  2:36           ` Masami Hiramatsu
2013-07-08 14:25             ` Oleg Nesterov
2013-07-09  8:01               ` [RFC PATCH] tracing/kprobe: Wait for disabling all running kprobe handlers Masami Hiramatsu
2013-07-09  8:07                 ` Peter Zijlstra
2013-07-09  8:20                   ` Masami Hiramatsu
2013-07-09  8:21                     ` Peter Zijlstra
2013-07-09  8:50                       ` Masami Hiramatsu
2013-07-09  9:35                       ` [RFC PATCH V2] " Masami Hiramatsu
2013-07-15 18:20                         ` Oleg Nesterov
2013-07-18 12:07                           ` Masami Hiramatsu
2013-07-18 14:35                             ` Steven Rostedt
2013-07-30  8:15   ` [RFC][PATCH 3/4] tracing/kprobes: Fail to unregister if probe event files are open Masami Hiramatsu
2013-07-31 19:49   ` Steven Rostedt
2013-07-31 20:40     ` Oleg Nesterov
2013-07-31 22:42       ` Steven Rostedt
2013-08-01  2:07         ` Steven Rostedt
2013-08-01  2:50           ` Steven Rostedt
2013-08-01  3:48             ` Masami Hiramatsu
2013-08-01 13:34             ` Oleg Nesterov
2013-08-01 13:49               ` Oleg Nesterov
2013-08-01 14:17               ` Steven Rostedt
2013-08-01 14:33                 ` Oleg Nesterov
2013-08-01 14:45                   ` Steven Rostedt
2013-08-01 14:46                     ` Oleg Nesterov
2013-08-02  4:57               ` Masami Hiramatsu
2013-08-01 13:10         ` Oleg Nesterov
2013-07-04  3:33 ` [RFC][PATCH 4/4] tracing/uprobes: " Steven Rostedt
2013-08-01  3:40   ` Steven Rostedt
2013-08-01 14:08   ` Oleg Nesterov
2013-08-01 14:25     ` Steven Rostedt
2013-07-04  4:00 ` [RFC][PATCH 0/4] tracing/kprobes/uprobes: Fix race between opening probe event files and deleting probe Masami Hiramatsu
2013-07-04  6:14   ` Masami Hiramatsu
2013-07-12 13:09 ` Masami Hiramatsu
2013-07-12 17:53   ` Oleg Nesterov
2013-07-15 18:01 ` Oleg Nesterov
2013-07-16 16:38   ` Oleg Nesterov
2013-07-16 19:10     ` Steven Rostedt
2013-07-16 19:22       ` Oleg Nesterov
2013-07-16 19:38         ` Steven Rostedt
2013-07-17 16:03           ` Steven Rostedt
2013-07-17 17:37             ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.