linux-trace-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children
@ 2016-04-19 14:34 Steven Rostedt
  2016-04-19 14:34 ` [RFC][PATCH 1/4] tracing: Rename check_ignore_pid() to ignore_this_task() Steven Rostedt
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 14:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, linux-trace-users

This code adds the event-fork option that, when set, will have tasks
with their PIDs in set_event_pid add their children PIDs when they
fork. It will also remove their PID from the file on exit.

Thanks to HPA for suggesting the bitmask over the more complex special
"hash" algorithm I originally had. It simplified the code and makes
it more robust.

Steven Rostedt (3):
      tracing: Rename check_ignore_pid() to ignore_this_task()
      tracing: Use pid bitmap instead of a pid array for set_event_pid
      tracing: Add infrastructure to allow set_event_pid to follow children

Steven Rostedt (Red Hat) (1):
      tracing: Update the documentation to describe "event-fork" option

----
 Documentation/trace/ftrace.txt |  34 +++--
 kernel/trace/trace.c           |   3 +
 kernel/trace/trace.h           |   7 +-
 kernel/trace/trace_events.c    | 297 +++++++++++++++++++++++------------------
 4 files changed, 201 insertions(+), 140 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC][PATCH 1/4] tracing: Rename check_ignore_pid() to ignore_this_task()
  2016-04-19 14:34 [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt
@ 2016-04-19 14:34 ` Steven Rostedt
  2016-04-19 14:34 ` [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid Steven Rostedt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 14:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, linux-trace-users

[-- Attachment #1: 0001-tracing-Rename-check_ignore_pid-to-ignore_this_task.patch --]
[-- Type: text/plain, Size: 2578 bytes --]

From: Steven Rostedt <rostedt@goodmis.org>

The name "check_ignore_pid" is confusing in trying to figure out if the pid
should be ignored or not. Rename it to "ignore_this_task" which is pretty
straight forward, as a task (not a pid) is passed in, and should if true
should be ignored.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace_events.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 05ddc0820771..598a18675a6b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -484,7 +484,7 @@ static int cmp_pid(const void *key, const void *elt)
 }
 
 static bool
-check_ignore_pid(struct trace_pid_list *filtered_pids, struct task_struct *task)
+ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
 {
 	pid_t search_pid;
 	pid_t *pid;
@@ -517,8 +517,8 @@ event_filter_pid_sched_switch_probe_pre(void *data, bool preempt,
 	pid_list = rcu_dereference_sched(tr->filtered_pids);
 
 	this_cpu_write(tr->trace_buffer.data->ignore_pid,
-		       check_ignore_pid(pid_list, prev) &&
-		       check_ignore_pid(pid_list, next));
+		       ignore_this_task(pid_list, prev) &&
+		       ignore_this_task(pid_list, next));
 }
 
 static void
@@ -531,7 +531,7 @@ event_filter_pid_sched_switch_probe_post(void *data, bool preempt,
 	pid_list = rcu_dereference_sched(tr->filtered_pids);
 
 	this_cpu_write(tr->trace_buffer.data->ignore_pid,
-		       check_ignore_pid(pid_list, next));
+		       ignore_this_task(pid_list, next));
 }
 
 static void
@@ -547,7 +547,7 @@ event_filter_pid_sched_wakeup_probe_pre(void *data, struct task_struct *task)
 	pid_list = rcu_dereference_sched(tr->filtered_pids);
 
 	this_cpu_write(tr->trace_buffer.data->ignore_pid,
-		       check_ignore_pid(pid_list, task));
+		       ignore_this_task(pid_list, task));
 }
 
 static void
@@ -564,7 +564,7 @@ event_filter_pid_sched_wakeup_probe_post(void *data, struct task_struct *task)
 
 	/* Set tracing if current is enabled */
 	this_cpu_write(tr->trace_buffer.data->ignore_pid,
-		       check_ignore_pid(pid_list, current));
+		       ignore_this_task(pid_list, current));
 }
 
 static void __ftrace_clear_event_pids(struct trace_array *tr)
@@ -1561,7 +1561,7 @@ static void ignore_task_cpu(void *data)
 					     mutex_is_locked(&event_mutex));
 
 	this_cpu_write(tr->trace_buffer.data->ignore_pid,
-		       check_ignore_pid(pid_list, current));
+		       ignore_this_task(pid_list, current));
 }
 
 static ssize_t
-- 
2.8.0.rc3



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
  2016-04-19 14:34 [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt
  2016-04-19 14:34 ` [RFC][PATCH 1/4] tracing: Rename check_ignore_pid() to ignore_this_task() Steven Rostedt
@ 2016-04-19 14:34 ` Steven Rostedt
       [not found]   ` <1694657549.62933.1461084928341.JavaMail.zimbra@efficios.com>
       [not found]   ` <20160422024530.GA1790@sejong>
  2016-04-19 14:34 ` [RFC][PATCH 3/4] tracing: Add infrastructure to allow set_event_pid to follow children Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 14:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, linux-trace-users

[-- Attachment #1: 0002-tracing-Use-pid-bitmap-instead-of-a-pid-array-for-se.patch --]
[-- Type: text/plain, Size: 10472 bytes --]

From: Steven Rostedt <rostedt@goodmis.org>

In order to add the ability to let tasks that are filtered by the events
have their children also be traced on fork (and then not traced on exit),
convert the array into a pid bitmask. Most of the time the number of pids is
only 32768 pids or a 4k bitmask, which is the same size as the default list
currently is, and that list could grow if more pids are listed.

This also greatly simplifies the code.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.h        |   5 +-
 kernel/trace/trace_events.c | 221 ++++++++++++++++++++------------------------
 2 files changed, 102 insertions(+), 124 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3fff4adfd431..68cbb8e10aea 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -177,9 +177,8 @@ struct trace_options {
 };
 
 struct trace_pid_list {
-	unsigned int			nr_pids;
-	int				order;
-	pid_t				*pids;
+	int				pid_max;
+	unsigned long			*pids;
 };
 
 /*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 598a18675a6b..45f7cc72bf25 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -15,7 +15,7 @@
 #include <linux/kthread.h>
 #include <linux/tracefs.h>
 #include <linux/uaccess.h>
-#include <linux/bsearch.h>
+#include <linux/vmalloc.h>
 #include <linux/module.h>
 #include <linux/ctype.h>
 #include <linux/sort.h>
@@ -471,23 +471,13 @@ static void ftrace_clear_events(struct trace_array *tr)
 	mutex_unlock(&event_mutex);
 }
 
-static int cmp_pid(const void *key, const void *elt)
-{
-	const pid_t *search_pid = key;
-	const pid_t *pid = elt;
-
-	if (*search_pid == *pid)
-		return 0;
-	if (*search_pid < *pid)
-		return -1;
-	return 1;
-}
+/* Shouldn't this be in a header? */
+extern int pid_max;
 
 static bool
 ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
 {
-	pid_t search_pid;
-	pid_t *pid;
+	pid_t pid;
 
 	/*
 	 * Return false, because if filtered_pids does not exist,
@@ -496,15 +486,16 @@ ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
 	if (!filtered_pids)
 		return false;
 
-	search_pid = task->pid;
+	pid = task->pid;
 
-	pid = bsearch(&search_pid, filtered_pids->pids,
-		      filtered_pids->nr_pids, sizeof(pid_t),
-		      cmp_pid);
-	if (!pid)
+	/*
+	 * If pid_max changed after filtered_pids was created, we
+	 * by default ignore all pids greater than the previous pid_max.
+	 */
+	if (task->pid >= filtered_pids->pid_max)
 		return true;
 
-	return false;
+	return !test_bit(task->pid, filtered_pids->pids);
 }
 
 static void
@@ -602,7 +593,7 @@ static void __ftrace_clear_event_pids(struct trace_array *tr)
 	/* Wait till all users are no longer using pid filtering */
 	synchronize_sched();
 
-	free_pages((unsigned long)pid_list->pids, pid_list->order);
+	vfree(pid_list->pids);
 	kfree(pid_list);
 }
 
@@ -946,11 +937,32 @@ static void t_stop(struct seq_file *m, void *p)
 	mutex_unlock(&event_mutex);
 }
 
+static void *
+p_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct trace_array *tr = m->private;
+	struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
+	unsigned long pid = (unsigned long)v;
+
+	(*pos)++;
+
+	/* pid already is +1 of the actual prevous bit */
+	pid = find_next_bit(pid_list->pids, pid_list->pid_max, pid);
+
+	/* Return pid + 1 to allow zero to be represented */
+	if (pid < pid_list->pid_max)
+		return (void *)(pid + 1);
+
+	return NULL;
+}
+
 static void *p_start(struct seq_file *m, loff_t *pos)
 	__acquires(RCU)
 {
 	struct trace_pid_list *pid_list;
 	struct trace_array *tr = m->private;
+	unsigned long pid;
+	loff_t l = 0;
 
 	/*
 	 * Grab the mutex, to keep calls to p_next() having the same
@@ -963,10 +975,18 @@ static void *p_start(struct seq_file *m, loff_t *pos)
 
 	pid_list = rcu_dereference_sched(tr->filtered_pids);
 
-	if (!pid_list || *pos >= pid_list->nr_pids)
+	if (!pid_list)
+		return NULL;
+
+	pid = find_first_bit(pid_list->pids, pid_list->pid_max);
+	if (pid >= pid_list->pid_max)
 		return NULL;
 
-	return (void *)&pid_list->pids[*pos];
+	/* Return pid + 1 so that zero can be the exit value */
+	for (pid++; pid && l < *pos;
+	     pid = (unsigned long)p_next(m, (void *)pid, &l))
+		;
+	return (void *)pid;
 }
 
 static void p_stop(struct seq_file *m, void *p)
@@ -976,25 +996,11 @@ static void p_stop(struct seq_file *m, void *p)
 	mutex_unlock(&event_mutex);
 }
 
-static void *
-p_next(struct seq_file *m, void *v, loff_t *pos)
-{
-	struct trace_array *tr = m->private;
-	struct trace_pid_list *pid_list = rcu_dereference_sched(tr->filtered_pids);
-
-	(*pos)++;
-
-	if (*pos >= pid_list->nr_pids)
-		return NULL;
-
-	return (void *)&pid_list->pids[*pos];
-}
-
 static int p_show(struct seq_file *m, void *v)
 {
-	pid_t *pid = v;
+	unsigned long pid = (unsigned long)v - 1;
 
-	seq_printf(m, "%d\n", *pid);
+	seq_printf(m, "%lu\n", pid);
 	return 0;
 }
 
@@ -1543,11 +1549,6 @@ show_header(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
 	return r;
 }
 
-static int max_pids(struct trace_pid_list *pid_list)
-{
-	return (PAGE_SIZE << pid_list->order) / sizeof(pid_t);
-}
-
 static void ignore_task_cpu(void *data)
 {
 	struct trace_array *tr = data;
@@ -1571,7 +1572,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 	struct seq_file *m = filp->private_data;
 	struct trace_array *tr = m->private;
 	struct trace_pid_list *filtered_pids = NULL;
-	struct trace_pid_list *pid_list = NULL;
+	struct trace_pid_list *pid_list;
 	struct trace_event_file *file;
 	struct trace_parser parser;
 	unsigned long val;
@@ -1579,7 +1580,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 	ssize_t read = 0;
 	ssize_t ret = 0;
 	pid_t pid;
-	int i;
+	int nr_pids = 0;
 
 	if (!cnt)
 		return 0;
@@ -1592,10 +1593,43 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 		return -ENOMEM;
 
 	mutex_lock(&event_mutex);
+	filtered_pids = rcu_dereference_protected(tr->filtered_pids,
+					     lockdep_is_held(&event_mutex));
+
 	/*
-	 * Load as many pids into the array before doing a
-	 * swap from the tr->filtered_pids to the new list.
+	 * Always recreate a new array. The write is an all or nothing
+	 * operation. Always create a new array when adding new pids by
+	 * the user. If the operation fails, then the current list is
+	 * not modified.
 	 */
+	pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
+	if (!pid_list) {
+		read = -ENOMEM;
+		goto out;
+	}
+	pid_list->pid_max = READ_ONCE(pid_max);
+	/* Only truncating will shrink pid_max */
+	if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max)
+		pid_list->pid_max = filtered_pids->pid_max;
+	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
+	if (!pid_list->pids) {
+		kfree(pid_list);
+		read = -ENOMEM;
+		goto out;
+	}
+	if (filtered_pids) {
+		/* copy the current bits to the new max */
+		pid = find_first_bit(filtered_pids->pids,
+				     filtered_pids->pid_max);
+		while (pid < filtered_pids->pid_max) {
+			set_bit(pid, pid_list->pids);
+			pid = find_next_bit(filtered_pids->pids,
+					    filtered_pids->pid_max,
+					    pid + 1);
+			nr_pids++;
+		}
+	}
+
 	while (cnt > 0) {
 
 		this_pos = 0;
@@ -1613,92 +1647,35 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 		ret = -EINVAL;
 		if (kstrtoul(parser.buffer, 0, &val))
 			break;
-		if (val > INT_MAX)
+		if (val >= pid_list->pid_max)
 			break;
 
 		pid = (pid_t)val;
 
-		ret = -ENOMEM;
-		if (!pid_list) {
-			pid_list = kmalloc(sizeof(*pid_list), GFP_KERNEL);
-			if (!pid_list)
-				break;
-
-			filtered_pids = rcu_dereference_protected(tr->filtered_pids,
-							lockdep_is_held(&event_mutex));
-			if (filtered_pids)
-				pid_list->order = filtered_pids->order;
-			else
-				pid_list->order = 0;
-
-			pid_list->pids = (void *)__get_free_pages(GFP_KERNEL,
-								  pid_list->order);
-			if (!pid_list->pids)
-				break;
-
-			if (filtered_pids) {
-				pid_list->nr_pids = filtered_pids->nr_pids;
-				memcpy(pid_list->pids, filtered_pids->pids,
-				       pid_list->nr_pids * sizeof(pid_t));
-			} else
-				pid_list->nr_pids = 0;
-		}
-
-		if (pid_list->nr_pids >= max_pids(pid_list)) {
-			pid_t *pid_page;
-
-			pid_page = (void *)__get_free_pages(GFP_KERNEL,
-							    pid_list->order + 1);
-			if (!pid_page)
-				break;
-			memcpy(pid_page, pid_list->pids,
-			       pid_list->nr_pids * sizeof(pid_t));
-			free_pages((unsigned long)pid_list->pids, pid_list->order);
-
-			pid_list->order++;
-			pid_list->pids = pid_page;
-		}
+		set_bit(pid, pid_list->pids);
+		nr_pids++;
 
-		pid_list->pids[pid_list->nr_pids++] = pid;
 		trace_parser_clear(&parser);
 		ret = 0;
 	}
 	trace_parser_put(&parser);
 
 	if (ret < 0) {
-		if (pid_list)
-			free_pages((unsigned long)pid_list->pids, pid_list->order);
+		vfree(pid_list->pids);
 		kfree(pid_list);
-		mutex_unlock(&event_mutex);
-		return ret;
-	}
-
-	if (!pid_list) {
-		mutex_unlock(&event_mutex);
-		return ret;
+		read = ret;
+		goto out;
 	}
 
-	sort(pid_list->pids, pid_list->nr_pids, sizeof(pid_t), cmp_pid, NULL);
-
-	/* Remove duplicates */
-	for (i = 1; i < pid_list->nr_pids; i++) {
-		int start = i;
-
-		while (i < pid_list->nr_pids &&
-		       pid_list->pids[i - 1] == pid_list->pids[i])
-			i++;
-
-		if (start != i) {
-			if (i < pid_list->nr_pids) {
-				memmove(&pid_list->pids[start], &pid_list->pids[i],
-					(pid_list->nr_pids - i) * sizeof(pid_t));
-				pid_list->nr_pids -= i - start;
-				i = start;
-			} else
-				pid_list->nr_pids = start;
-		}
+	if (!nr_pids) {
+		/* Cleared the list of pids */
+		vfree(pid_list->pids);
+		kfree(pid_list);
+		read = ret;
+		if (!filtered_pids)
+			goto out;
+		pid_list = NULL;
 	}
-
 	rcu_assign_pointer(tr->filtered_pids, pid_list);
 
 	list_for_each_entry(file, &tr->events, list) {
@@ -1708,7 +1685,7 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 	if (filtered_pids) {
 		synchronize_sched();
 
-		free_pages((unsigned long)filtered_pids->pids, filtered_pids->order);
+		vfree(filtered_pids->pids);
 		kfree(filtered_pids);
 	} else {
 		/*
@@ -1745,10 +1722,12 @@ ftrace_event_pid_write(struct file *filp, const char __user *ubuf,
 	 */
 	on_each_cpu(ignore_task_cpu, tr, 1);
 
+ out:
 	mutex_unlock(&event_mutex);
 
 	ret = read;
-	*ppos += read;
+	if (read > 0)
+		*ppos += read;
 
 	return ret;
 }
-- 
2.8.0.rc3



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC][PATCH 3/4] tracing: Add infrastructure to allow set_event_pid to follow children
  2016-04-19 14:34 [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt
  2016-04-19 14:34 ` [RFC][PATCH 1/4] tracing: Rename check_ignore_pid() to ignore_this_task() Steven Rostedt
  2016-04-19 14:34 ` [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid Steven Rostedt
@ 2016-04-19 14:34 ` Steven Rostedt
       [not found]   ` <1887707510.62932.1461084911586.JavaMail.zimbra@efficios.com>
  2016-04-19 14:34 ` [RFC][PATCH 4/4] tracing: Update the documentation to describe "event-fork" option Steven Rostedt
       [not found] ` <5716E3E8.7000609@redhat.com>
  4 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 14:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, linux-trace-users

[-- Attachment #1: 0003-tracing-Add-infrastructure-to-allow-set_event_pid-to.patch --]
[-- Type: text/plain, Size: 5444 bytes --]

From: Steven Rostedt <rostedt@goodmis.org>

Add the infrastructure needed to have the PIDs in set_event_pid to
automatically add PIDs of the children of the tasks that have their PIDs in
set_event_pid. This will also remove PIDs from set_event_pid when a task
exits

This is implemented by adding hooks into the fork and exit tracepoints. On
fork, the PIDs are added to the list, and on exit, they are removed.

Add a new option called event_fork that when set, PIDs in set_event_pid will
automatically get their children PIDs added when they fork, as well as any
task that exits will have its PID removed from set_event_pid.

This works for instances as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/trace/trace.c        |  3 ++
 kernel/trace/trace.h        |  2 ++
 kernel/trace/trace_events.c | 84 +++++++++++++++++++++++++++++++++++++++------
 3 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a2f0b9f33e9b..0d12dbde8399 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3571,6 +3571,9 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled)
 	if (mask == TRACE_ITER_RECORD_CMD)
 		trace_event_enable_cmd_record(enabled);
 
+	if (mask == TRACE_ITER_EVENT_FORK)
+		trace_event_follow_fork(tr, enabled);
+
 	if (mask == TRACE_ITER_OVERWRITE) {
 		ring_buffer_change_overwrite(tr->trace_buffer.buffer, enabled);
 #ifdef CONFIG_TRACER_MAX_TRACE
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 68cbb8e10aea..2525042760e6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -655,6 +655,7 @@ static inline void __trace_stack(struct trace_array *tr, unsigned long flags,
 extern cycle_t ftrace_now(int cpu);
 
 extern void trace_find_cmdline(int pid, char comm[]);
+extern void trace_event_follow_fork(struct trace_array *tr, bool enable);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern unsigned long ftrace_update_tot_cnt;
@@ -966,6 +967,7 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 		C(STOP_ON_FREE,		"disable_on_free"),	\
 		C(IRQ_INFO,		"irq-info"),		\
 		C(MARKERS,		"markers"),		\
+		C(EVENT_FORK,		"event-fork"),		\
 		FUNCTION_FLAGS					\
 		FGRAPH_FLAGS					\
 		STACK_FLAGS					\
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 45f7cc72bf25..add81dff7520 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -474,11 +474,23 @@ static void ftrace_clear_events(struct trace_array *tr)
 /* Shouldn't this be in a header? */
 extern int pid_max;
 
+/* Returns true if found in filter */
 static bool
-ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
+find_filtered_pid(struct trace_pid_list *filtered_pids, pid_t search_pid)
 {
-	pid_t pid;
+	/*
+	 * If pid_max changed after filtered_pids was created, we
+	 * by default ignore all pids greater than the previous pid_max.
+	 */
+	if (search_pid >= filtered_pids->pid_max)
+		return false;
+
+	return test_bit(search_pid, filtered_pids->pids);
+}
 
+static bool
+ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
+{
 	/*
 	 * Return false, because if filtered_pids does not exist,
 	 * all pids are good to trace.
@@ -486,16 +498,68 @@ ignore_this_task(struct trace_pid_list *filtered_pids, struct task_struct *task)
 	if (!filtered_pids)
 		return false;
 
-	pid = task->pid;
+	return !find_filtered_pid(filtered_pids, task->pid);
+}
 
-	/*
-	 * If pid_max changed after filtered_pids was created, we
-	 * by default ignore all pids greater than the previous pid_max.
-	 */
-	if (task->pid >= filtered_pids->pid_max)
-		return true;
+static void filter_add_remove_task(struct trace_pid_list *pid_list,
+				   struct task_struct *self,
+				   struct task_struct *task)
+{
+	if (!pid_list)
+		return;
+
+	/* For forks, we only add if the forking task is listed */
+	if (self) {
+		if (!find_filtered_pid(pid_list, self->pid))
+			return;
+	}
+
+	/* Sorry, but we don't support pid_max changing after setting */
+	if (task->pid >= pid_list->pid_max)
+		return;
+
+	/* "self" is set for forks, and NULL for exits */
+	if (self)
+		set_bit(task->pid, pid_list->pids);
+	else
+		clear_bit(task->pid, pid_list->pids);
+}
+
+static void
+event_filter_pid_sched_process_exit(void *data, struct task_struct *task)
+{
+	struct trace_pid_list *pid_list;
+	struct trace_array *tr = data;
+
+	pid_list = rcu_dereference_sched(tr->filtered_pids);
+	filter_add_remove_task(pid_list, NULL, task);
+}
 
-	return !test_bit(task->pid, filtered_pids->pids);
+static void
+event_filter_pid_sched_process_fork(void *data,
+				    struct task_struct *self,
+				    struct task_struct *task)
+{
+	struct trace_pid_list *pid_list;
+	struct trace_array *tr = data;
+
+	pid_list = rcu_dereference_sched(tr->filtered_pids);
+	filter_add_remove_task(pid_list, self, task);
+}
+
+void trace_event_follow_fork(struct trace_array *tr, bool enable)
+{
+	if (enable) {
+		register_trace_prio_sched_process_fork(event_filter_pid_sched_process_fork,
+						       tr, INT_MIN);
+		register_trace_prio_sched_process_exit(event_filter_pid_sched_process_exit,
+						       tr, INT_MAX);
+	} else {
+		unregister_trace_sched_process_fork(event_filter_pid_sched_process_fork,
+						    tr);
+		unregister_trace_sched_process_exit(event_filter_pid_sched_process_exit,
+						    tr);
+	}
 }
 
 static void
-- 
2.8.0.rc3



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC][PATCH 4/4] tracing: Update the documentation to describe "event-fork" option
  2016-04-19 14:34 [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt
                   ` (2 preceding siblings ...)
  2016-04-19 14:34 ` [RFC][PATCH 3/4] tracing: Add infrastructure to allow set_event_pid to follow children Steven Rostedt
@ 2016-04-19 14:34 ` Steven Rostedt
       [not found] ` <5716E3E8.7000609@redhat.com>
  4 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 14:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, H. Peter Anvin, Thomas Gleixner,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, linux-trace-users

[-- Attachment #1: 0004-tracing-Update-the-documentation-to-describe-event-f.patch --]
[-- Type: text/plain, Size: 3377 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Add documentation to the ftrace.txt file in Documentation to describe the
event-fork option. Also add the missing "display-graph" option now that it
shows up in the trace_options file (from a previous commit).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Documentation/trace/ftrace.txt | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index f52f297cb406..fdf18485db2b 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -210,6 +210,11 @@ of ftrace. Here is a list of some of the key files:
 	Note, sched_switch and sched_wake_up will also trace events
 	listed in this file.
 
+	To have the PIDs of children of tasks with their PID in this file
+	added on fork, enable the "event-fork" option. That option will also
+	cause the PIDs of tasks to be removed from this file when the task
+	exits.
+
   set_graph_function:
 
 	Set a "trigger" function where tracing should start
@@ -725,16 +730,14 @@ noraw
 nohex
 nobin
 noblock
-nostacktrace
 trace_printk
-noftrace_preempt
 nobranch
 annotate
 nouserstacktrace
 nosym-userobj
 noprintk-msg-only
 context-info
-latency-format
+nolatency-format
 sleep-time
 graph-time
 record-cmd
@@ -742,7 +745,10 @@ overwrite
 nodisable_on_free
 irq-info
 markers
+noevent-fork
 function-trace
+nodisplay-graph
+nostacktrace
 
 To disable one of the options, echo in the option prepended with
 "no".
@@ -796,11 +802,6 @@ Here are the available options:
 
   block - When set, reading trace_pipe will not block when polled.
 
-  stacktrace - This is one of the options that changes the trace
-	       itself. When a trace is recorded, so is the stack
-	       of functions. This allows for back traces of
-	       trace sites.
-
   trace_printk - Can disable trace_printk() from writing into the buffer.
 
   branch - Enable branch tracing with the tracer.
@@ -897,6 +898,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
   	    When disabled, the trace_marker will error with EINVAL
 	    on write.
 
+  event-fork - When set, tasks with PIDs listed in set_event_pid will have
+	       the PIDs of their children added to set_event_pid when those
+	       tasks fork. Also, when tasks with PIDs in set_event_pid exit,
+	       their PIDs will be removed from the file.
 
   function-trace - The latency tracers will enable function tracing
   	    if this option is enabled (default it is). When
@@ -904,8 +909,17 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
 	    functions. This keeps the overhead of the tracer down
 	    when performing latency tests.
 
- Note: Some tracers have their own options. They only appear
-       when the tracer is active.
+  display-graph - When set, the latency tracers (irqsoff, wakeup, etc) will
+	          use function graph tracing instead of function tracing.
+
+  stacktrace - This is one of the options that changes the trace
+	       itself. When a trace is recorded, so is the stack
+	       of functions. This allows for back traces of
+	       trace sites.
+
+ Note: Some tracers have their own options. They only appear in this
+       file when the tracer is active. They always appear in the
+       options directory.
 
 
 
-- 
2.8.0.rc3



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 3/4] tracing: Add infrastructure to allow set_event_pid to follow children
       [not found]   ` <1887707510.62932.1461084911586.JavaMail.zimbra@efficios.com>
@ 2016-04-19 17:13     ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 17:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, H. Peter Anvin,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 16:55:11 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Apr 19, 2016, at 10:34 AM, rostedt rostedt@goodmis.org wrote:
> 
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > Add the infrastructure needed to have the PIDs in set_event_pid to
> > automatically add PIDs of the children of the tasks that have their PIDs in
> > set_event_pid. This will also remove PIDs from set_event_pid when a task
> > exits
> > 
> > This is implemented by adding hooks into the fork and exit tracepoints. On
> > fork, the PIDs are added to the list, and on exit, they are removed.
> > 
> > Add a new option called event_fork that when set, PIDs in set_event_pid will
> > automatically get their children PIDs added when they fork, as well as any
> > task that exits will have its PID removed from set_event_pid.  
> 
> Just out of curiosity: how does it deal with multi-process and multi-thread ?
> What events are expected in each case ?
> 

Not sure what you mean by that. This is in-kernel, and it's simply
tasks. That is, any task (process or thread) that creates another task
has its kernel pid checked. That would be the thread ID as well. So it
works the same with processes as with threads because within the kernel
they are just all just "tasks".

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]   ` <1694657549.62933.1461084928341.JavaMail.zimbra@efficios.com>
@ 2016-04-19 17:19     ` Steven Rostedt
       [not found]       ` <4ACF15B6-D344-4647-9CF8-CEDE5BF5EF70@zytor.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 17:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, H. Peter Anvin,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 16:55:28 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Apr 19, 2016, at 10:34 AM, rostedt rostedt@goodmis.org wrote:
> 
> > From: Steven Rostedt <rostedt@goodmis.org>
> > 
> > In order to add the ability to let tasks that are filtered by the events
> > have their children also be traced on fork (and then not traced on exit),
> > convert the array into a pid bitmask. Most of the time the number of pids is
> > only 32768 pids or a 4k bitmask, which is the same size as the default list
> > currently is, and that list could grow if more pids are listed.
> > 
> > This also greatly simplifies the code.  
> 
> The maximum PID number can be increased with sysctl.
> 
> See "pid_max" in Documentation/sysctl/kernel.txt
> 
> What happens when you have a very large pid_max set ?

I discussed this with HPA, and it appears that the pid_max max would
require a bitmap of about 1/2 meg (the current default is 8k). This is
also why I chose to keep the bitmap as vmalloc and not a continuous
page allocation.

> 
> You say "most of the time" as if this was a fast-path vs a slow-path,
> but it is not the case here.

I meant "most of the time" as "default". Yes, you can make the pid_max
really big, but in that case you better have enough memory in your
system to handle that many threads. Thus a 1/2 meg used for tracking
pids shouldn't be an issue.

> 
> This is a configuration option that can significantly hurt memory usage
> in configurations using a large pid_max.

No, it is created dynamically. If you never write anything into the
set_event_pid file, then you have nothing to worry about, as nothing
is allocated. It creates the array when a pid is added to the file, and
only then. If it fails to allocate, the write will return -ENOMEM as the
errno.

Again, if you have a large pid_max your box had better have a lot of
memory to begin with, because this array will be negligible compared to
the memory required to handle large number of tasks.

> 
> FWIW, I implement a similar feature with a hash table in lttng-modules.
> I don't have the child process tracking though, which is a neat improvement.

I originally had a complex hash algorithm because I too was worried
about the size of pid_max and using a bitmap, but HPA convinced me it
was the way to go.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]       ` <4ACF15B6-D344-4647-9CF8-CEDE5BF5EF70@zytor.com>
@ 2016-04-19 19:41         ` Steven Rostedt
       [not found]           ` <2093660141.63332.1461097049611.JavaMail.zimbra@efficios.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 19:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 11:57:32 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> Also, I understand there is one of these bitmaps per ring buffer, and
> the ring buffer is in the tens of megabytes.

Right, there's only one bitmap per tracing instance, which in most
cases is just one (I know of people who make more). And by default, the
tracing buffer is 1.4 megs per CPU.

If you have a pid_max of the max size, I highly doubt you will be doing
that on a single CPU machine. If you have 48 CPUs, the ring buffer will
be 1.4 * 48 megs, making the 1/2 meg bitmap a nit.

I will say, there may be two bitmaps soon, because I plan on adding
this same code to the function tracer logic.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]           ` <2093660141.63332.1461097049611.JavaMail.zimbra@efficios.com>
@ 2016-04-19 20:50             ` Steven Rostedt
       [not found]               ` <568915868.63547.1461100941927.JavaMail.zimbra@efficios.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 20:50 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 20:17:29 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:


> Ah indeed, since there is a hard limit to 4194304, that makes the
> worse case bitmap 512k.

Yep.

> 
> We could argue that given a sparse dataset in the PID table (typical
> in our use-cases), a small hash table would have better cache locality
> than the bitmap. But I agree that the hash table does add a bit of
> complexity, so it becomes a complexity vs cache locality tradeoff.
> So I understand why you would want to go for the simpler bitmap
> solution, unless the hash table would prove to bring a measurable
> performance improvement.

We discussed this too (cache locality), and came to the same conclusion
that a bitmask would still be better. If you think about it, if you
have a lot of CPUs and lots of PIDs, tasks don't migrate as much, and
if they do, cache locality of this bitmap will be the least of the
performance issues. Then you have a limited amount of PIDs per CPU, and
thus those PIDs will probably be in the CPU cache for the bitmap.

Note, that the check of the bitmap to trace a task or not is not done
at every tracepoint. It's only done at sched_switch, and then an
internal flag is set. That flag will determine if the event should be
traced, and that is a single bit checked all the time (very good for
cache).

-- Steve



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]               ` <568915868.63547.1461100941927.JavaMail.zimbra@efficios.com>
@ 2016-04-19 22:49                 ` Steven Rostedt
       [not found]                   ` <2099338042.63665.1461106754326.JavaMail.zimbra@efficios.com>
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 22:49 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 21:22:21 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> It makes sense. Anyway, looking back at my own implementation, I have
> an array of 64 hlist_head entries (64 * 8 = 512 bytes), typically
> populated by NULL pointers. It's only a factor 8 smaller than the
> bitmap, so it's not a huge gain.

Actually we talked about a second level bitmap for quicker searchers. I
can't remember what it was called, but I'm sure HPA can ;-)

Basically it was a much smaller bitmap, where each bit represents a
number of bits in the main bitmap. When a bit is set in the main
bitmap, its corresponding bit is set in the smaller one. This means
that if you don't have many PIDs, the smaller bitmap wont have many
bits set either, and you keep all the checks very cache local, because
you are checking the smaller bitmap most of the time. But this too
makes things more complex, especially when clearing a bit (although,
that only happens on exit, where speed isn't a big deal). But we
decided it still wasn't worth it.

> 
> One alternative approach would be to keep a few entries (e.g. 16 PIDs)
> in a fast-path lookup array that fits in a single-cache line. When the
> number of PIDs to track go beyond that, fall-back to the bitmap instead.
> 
> > 
> > Note, that the check of the bitmap to trace a task or not is not done
> > at every tracepoint. It's only done at sched_switch, and then an
> > internal flag is set. That flag will determine if the event should be
> > traced, and that is a single bit checked all the time (very good for
> > cache).  
> 
> Could this be used by multiple tracers, and used in a multi-session scheme ?
> In lttng, one user may want to track a set of PIDs, whereas another user may
> be concurrently interested by another set.

I should specify, the bit isn't in the task struct, because different
trace instances may have different criteria to what task may be traced.
That is, you can have multiple buffers tracing multiple tasks. The
tracepoint has a private data structure attached to it that is added
when a tracepoint is registered. This data is a descriptor that
represents the trace instance. This instance descriptor has a flag to
ignore or trace the task.


> 
> Currently, in the lttng kernel tracer, we do the hash table query for
> each tracepoint hit, which is clearly not as efficient as checking a
> task struct flag. One option I see would be to set the task struct flag
> whenever there is at least one tracer/tracing session that is interested
> in this event (this would end up being a reference count on the flag). Then,
> for every flag check that passes, lttng could do HT/bitmap lookups to see if
> the event needs to go to each specific session.
> 
> Is this task struct "trace" flag currently exposed to tracers through a
> reference-counting enable/disable API ? If not, do you think it would make
> sense ?
> 

Nope. As I said, it's on my own descriptor that is passed through the
tracepoint private data.

If you look at my code, you'll notice that I pass around a
"trace_array" struct (tr), which represents all the information about a
single trace instance. What tracer is running, what events are enabled,
and even keeps track of the file descriptors holding the trace event
information. This "tr" has a per_cpu "buffer" section that contains per
cpu data (like the ring buffer). It also has a "data" section for
miscellaneous fields. One of them is now "ignore" which is set when
filtering is on and the sched_switch event noticed that the new task
shouldn't be traced for this instance.

If there's multiple instances, then there will be multiple callbacks
done at each sched_switch. One for each instance.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]                   ` <2099338042.63665.1461106754326.JavaMail.zimbra@efficios.com>
@ 2016-04-19 23:06                     ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-19 23:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 22:59:14 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- On Apr 19, 2016, at 6:49 PM, rostedt rostedt@goodmis.org wrote:
> 
> > On Tue, 19 Apr 2016 21:22:21 +0000 (UTC)
> > Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> >   
> >> It makes sense. Anyway, looking back at my own implementation, I have
> >> an array of 64 hlist_head entries (64 * 8 = 512 bytes), typically
> >> populated by NULL pointers. It's only a factor 8 smaller than the
> >> bitmap, so it's not a huge gain.  
> > 
> > Actually we talked about a second level bitmap for quicker searchers. I
> > can't remember what it was called, but I'm sure HPA can ;-)
> > 
> > Basically it was a much smaller bitmap, where each bit represents a
> > number of bits in the main bitmap. When a bit is set in the main
> > bitmap, its corresponding bit is set in the smaller one. This means
> > that if you don't have many PIDs, the smaller bitmap wont have many
> > bits set either, and you keep all the checks very cache local, because
> > you are checking the smaller bitmap most of the time. But this too
> > makes things more complex, especially when clearing a bit (although,
> > that only happens on exit, where speed isn't a big deal). But we
> > decided it still wasn't worth it.  
> 
> Seems like an interesting possible improvement if ever needed.
> 

One reason we decided against it is, if you think about use cases; if
you are tracing a single task, and other tasks are created around it in
the same time period, you will have pids of tasks running that are
close to the pid of the traced task. That means the bit in the smaller
array will most likely be always set, and now you are taking two cache
hits to find the pid you are looking for, and not gaining anything out
of it.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children
       [not found] ` <5716E3E8.7000609@redhat.com>
@ 2016-04-20  2:30   ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-20  2:30 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, H. Peter Anvin,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, Namhyung Kim,
	linux-trace-users

On Tue, 19 Apr 2016 23:05:28 -0300
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> On 04/19/2016 11:34 AM, Steven Rostedt wrote:
> > This code adds the event-fork option that, when set, will have tasks
> > with their PIDs in set_event_pid add their children PIDs when they
> > fork. It will also remove their PID from the file on exit.  
> 
> That is a nice feature! I tested it and it works. But, look this...
> 
> Set the event-fork, the current shell pid, and enable the trace:
> 
> [root@f23 tracing]# echo event-fork > trace_options 
> [root@f23 tracing]# echo $$ > set_ftrace_pid 

event-fork does not affect function tracing. I'm currently working on
patches to create a function-fork option for 4.8.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid
       [not found]   ` <20160422024530.GA1790@sejong>
@ 2016-04-22 15:30     ` Steven Rostedt
  0 siblings, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2016-04-22 15:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, H. Peter Anvin,
	Thomas Gleixner, Jiri Olsa, Masami Hiramatsu, linux-trace-users

On Fri, 22 Apr 2016 11:45:30 +0900
Namhyung Kim <namhyung@kernel.org> wrote:

> > +	pid_list->pid_max = READ_ONCE(pid_max);
> > +	/* Only truncating will shrink pid_max */
> > +	if (filtered_pids && filtered_pids->pid_max > pid_list->pid_max)
> > +		pid_list->pid_max = filtered_pids->pid_max;
> > +	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
> > +	if (!pid_list->pids) {
> > +		kfree(pid_list);
> > +		read = -ENOMEM;
> > +		goto out;
> > +	}
> > +	if (filtered_pids) {
> > +		/* copy the current bits to the new max */
> > +		pid = find_first_bit(filtered_pids->pids,
> > +				     filtered_pids->pid_max);
> > +		while (pid < filtered_pids->pid_max) {
> > +			set_bit(pid, pid_list->pids);
> > +			pid = find_next_bit(filtered_pids->pids,
> > +					    filtered_pids->pid_max,
> > +					    pid + 1);
> > +			nr_pids++;
> > +		}  
> 
> Why not just use memcpy and keep nr_pids in the pid_list?

This is the slow path (very slow ;-), and this was the first method
that came to my mind (while I programmed this during a conference). I
could use memcpy, or simply one of the bitmask copies, and then get the
nr_pids from bitmask_weight(). I would not keep nr_pids in pid_list
because that would mean that I would have to manage it in the fast path.

Maybe later I'll convert it to bitmap_copy(), but for now I'll just
keep it as is. I move this code in my queue for 4.8, and don't want to
deal with conflicts unless there's a real bug discovered.

Thanks for looking at this code!

-- Steve


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-04-22 15:30 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-19 14:34 [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt
2016-04-19 14:34 ` [RFC][PATCH 1/4] tracing: Rename check_ignore_pid() to ignore_this_task() Steven Rostedt
2016-04-19 14:34 ` [RFC][PATCH 2/4] tracing: Use pid bitmap instead of a pid array for set_event_pid Steven Rostedt
     [not found]   ` <1694657549.62933.1461084928341.JavaMail.zimbra@efficios.com>
2016-04-19 17:19     ` Steven Rostedt
     [not found]       ` <4ACF15B6-D344-4647-9CF8-CEDE5BF5EF70@zytor.com>
2016-04-19 19:41         ` Steven Rostedt
     [not found]           ` <2093660141.63332.1461097049611.JavaMail.zimbra@efficios.com>
2016-04-19 20:50             ` Steven Rostedt
     [not found]               ` <568915868.63547.1461100941927.JavaMail.zimbra@efficios.com>
2016-04-19 22:49                 ` Steven Rostedt
     [not found]                   ` <2099338042.63665.1461106754326.JavaMail.zimbra@efficios.com>
2016-04-19 23:06                     ` Steven Rostedt
     [not found]   ` <20160422024530.GA1790@sejong>
2016-04-22 15:30     ` Steven Rostedt
2016-04-19 14:34 ` [RFC][PATCH 3/4] tracing: Add infrastructure to allow set_event_pid to follow children Steven Rostedt
     [not found]   ` <1887707510.62932.1461084911586.JavaMail.zimbra@efficios.com>
2016-04-19 17:13     ` Steven Rostedt
2016-04-19 14:34 ` [RFC][PATCH 4/4] tracing: Update the documentation to describe "event-fork" option Steven Rostedt
     [not found] ` <5716E3E8.7000609@redhat.com>
2016-04-20  2:30   ` [RFC][PATCH 0/4] tracing: Add event-fork to trace tasks children Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).