All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
@ 2012-06-05 12:06 Hiraku Toyooka
  2012-07-04 10:47 ` Hiraku Toyooka
  2012-07-11 23:26 ` Steven Rostedt
  0 siblings, 2 replies; 8+ messages in thread
From: Hiraku Toyooka @ 2012-06-05 12:06 UTC (permalink / raw)
  Cc: yrl.pp-manager.tt, Hiraku Toyooka, Steven Rostedt,
	Frederic Weisbecker, Ingo Molnar, Rob Landley, Masami Hiramatsu,
	linux-doc, linux-kernel

Hello,

This patch provides new debugfs interfaces for taking a snapshot
in ftrace.

Ftrace is available as a flight recorder. When a kernel panic
occurs, we can get trace data from the dumped kernel image.

But in case of some systems, non-critical errors (such as
application's errors) are difficult to handle because of following
reasons.

 - When the system has to continue to run, it can't stop tracing.
 - Even then, failure analysis is necessary to prevent the same
   errors.

So, I propose making the ftrace's snapshot feature available from
userspace to solve above dilemma.
(See also
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_toyooka.pdf)

This enables to preserve trace buffer at a particular point in time
without stopping tracing.  When a snapshot is taken, ftrace swaps the
current buffer with a spare buffer which is prepared in advance. This
means that the tracing itself continues on the spare buffer.

Currently, ftrace has a snapshot feature available from kernel space
and some tracers (e.g. irqsoff) are using it. This patch allows users
to use the same snapshot feature via debugfs.

I added two debugfs files in "tracing" directory.

  snapshot_enabled:

        This is used to set or display whether the snapshot is
        enabled. Echo 1 into this file to prepare a spare buffer
        or 0 to shrink it. So, the memory for the spare buffer
        will be consumed only when this knob is set.

  snapshot_pipe:

        This is used to take a snapshot and to read the output
        of the snapshot (max_tr). Echo 1 into this file to take a
        snapshot. Reads from this file is the same as the
        "trace_pipe" file.

Here is an example of using the snapshot feature.

 # echo nop > current_tracer
 # echo 1 > snapshot_enabled
 # echo 1 > events/sched/enable
 [...]
 # echo 1 > snapshot_pipe
 # cat snapshot_pipe
            bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 p
id=28 prio=0 success=1 target_cpu=006
            bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 p
id=32 prio=0 success=1 target_cpu=007
            bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash pre
v_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_
prio=0
[...]


Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rob Landley <rob@landley.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---

 Documentation/trace/ftrace.txt    |   47 +++++++++++
 kernel/trace/trace.c              |  152 +++++++++++++++++++++++++++++++++++++
 kernel/trace/trace.h              |    8 ++
 kernel/trace/trace_irqsoff.c      |    3 +
 kernel/trace/trace_sched_wakeup.c |    2 
 5 files changed, 210 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 6f51fed..df9fa13 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1842,6 +1842,53 @@ an error.
  # cat buffer_size_kb
 85
 
+
+Snapshot
+--------
+If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
+feature is available in all tracers except for the special
+tracers which use a snapshot inside themselves(such as "irqsoff"
+or "wakeup").
+
+This enables to preserve trace buffer at a particular point in
+time without stopping tracing.  When a snapshot is taken, ftrace
+swaps the current buffer with a spare buffer which is prepared
+in advance. This means that the tracing itself continues on the
+spare buffer.
+
+Following debugfs files in "tracing" directory are related with
+this feature.
+
+  snapshot_enabled:
+
+	This is used to set or display whether the snapshot is
+	enabled. Echo 1 into this file to prepare a spare buffer
+	or 0 to shrink it. So, the memory for the spare buffer
+	will be consumed only when this knob is set.
+
+  snapshot_pipe:
+
+	This is used to take a snapshot and to read the output
+	of the snapshot. Echo 1 into this file to take a
+	snapshot. Reads from this file is the same as the
+	"trace_pipe" file (described above "The File System"
+	section), so that both reads from the snapshot and
+	tracing are executable in parallel.
+
+Here is an example of using the snapshot feature.
+
+ # echo nop > current_tracer
+ # echo 1 > snapshot_enabled
+ # echo 1 > events/sched/enable
+ [...]
+ # echo 1 > snapshot_pipe
+ # cat snapshot_pipe
+            bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
+            bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
+            bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_prio=0
+[...]
+
+
 -----------
 
 More details can be found in the source code, in the
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 68032c6..8e722c3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3266,7 +3266,8 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf,
 	return cnt;
 }
 
-static int tracing_open_pipe(struct inode *inode, struct file *filp)
+static int __tracing_open_pipe(struct inode *inode, struct file *filp,
+			       struct trace_array *tr)
 {
 	long cpu_file = (long) inode->i_private;
 	struct trace_iterator *iter;
@@ -3308,7 +3309,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
 	iter->cpu_file = cpu_file;
-	iter->tr = &global_trace;
+	iter->tr = tr;
 	mutex_init(&iter->mutex);
 	filp->private_data = iter;
 
@@ -3327,6 +3328,11 @@ fail:
 	return ret;
 }
 
+static int tracing_open_pipe(struct inode *inode, struct file *filp)
+{
+	return __tracing_open_pipe(inode, filp, &global_trace);
+}
+
 static int tracing_release_pipe(struct inode *inode, struct file *file)
 {
 	struct trace_iterator *iter = file->private_data;
@@ -4008,6 +4014,122 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
 	return single_open(file, tracing_clock_show, NULL);
 }
 
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int tracing_open_snapshot_pipe(struct inode *inode, struct file *filp)
+{
+	return __tracing_open_pipe(inode, filp, &max_tr);
+}
+
+static ssize_t
+tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
+		       size_t cnt, loff_t *ppos)
+{
+	unsigned long val = 0;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+	if (ret)
+		return ret;
+
+	mutex_lock(&trace_types_lock);
+
+	/* If current tracer's use_max_tr == 0, we prevent taking a snapshot */
+	if (!current_trace->use_max_tr) {
+		ret = -EBUSY;
+		goto out;
+	}
+	if (val) {
+		unsigned long flags;
+		local_irq_save(flags);
+		update_max_tr(&global_trace, current, raw_smp_processor_id());
+		local_irq_restore(flags);
+	}
+out:
+	mutex_unlock(&trace_types_lock);
+
+	if (!ret) {
+		*ppos += cnt;
+		ret = cnt;
+	}
+
+	return ret;
+}
+
+static ssize_t
+tracing_snapshot_ctrl_read(struct file *filp, char __user *ubuf,
+			   size_t cnt, loff_t *ppos)
+{
+	char buf[64];
+	int r;
+
+	r = sprintf(buf, "%d\n", current_trace->use_max_tr);
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
+			    size_t cnt, loff_t *ppos)
+{
+	unsigned long val;
+	int ret;
+
+	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+	if (ret)
+		return ret;
+
+	val = !!val;
+
+	mutex_lock(&trace_types_lock);
+	tracing_stop();
+	arch_spin_lock(&ftrace_max_lock);
+
+	/* When always_use_max_tr == 1, we can't toggle use_max_tr. */
+	if (current_trace->always_use_max_tr) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	if (!(current_trace->use_max_tr ^ val))
+		goto out;
+
+	if (val) {
+		int cpu;
+		for_each_tracing_cpu(cpu) {
+			ret = ring_buffer_resize(max_tr.buffer,
+						global_trace.data[cpu]->entries,
+						cpu);
+			if (ret < 0)
+				break;
+			max_tr.data[cpu]->entries =
+				global_trace.data[cpu]->entries;
+		}
+		if (ret < 0) {
+			ring_buffer_resize(max_tr.buffer, 1,
+					   RING_BUFFER_ALL_CPUS);
+			set_buffer_entries(&max_tr, 1);
+			ret = -ENOMEM;
+			goto out;
+		}
+		current_trace->use_max_tr = 1;
+	} else {
+		/* shrink buffer for saving memory */
+		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+		set_buffer_entries(&max_tr, 1);
+		current_trace->use_max_tr = 0;
+	}
+out:
+	arch_spin_unlock(&ftrace_max_lock);
+	tracing_start();
+	mutex_unlock(&trace_types_lock);
+
+	if (!ret) {
+		*ppos += cnt;
+		ret = cnt;
+	}
+	return ret;
+}
+#endif /* CONFIG_TRACER_MAX_TRACE */
+
 static const struct file_operations tracing_max_lat_fops = {
 	.open		= tracing_open_generic,
 	.read		= tracing_max_lat_read,
@@ -4071,6 +4193,25 @@ static const struct file_operations trace_clock_fops = {
 	.write		= tracing_clock_write,
 };
 
+#ifdef CONFIG_TRACER_MAX_TRACE
+static const struct file_operations tracing_snapshot_pipe_fops = {
+	.open           = tracing_open_snapshot_pipe,
+	.poll           = tracing_poll_pipe,
+	.read           = tracing_read_pipe,
+	.write          = tracing_write_snapshot_pipe,
+	.splice_read    = tracing_splice_read_pipe,
+	.release        = tracing_release_pipe,
+	.llseek         = no_llseek,
+};
+
+static const struct file_operations tracing_snapshot_ctrl_fops = {
+	.open           = tracing_open_generic,
+	.read           = tracing_snapshot_ctrl_read,
+	.write          = tracing_snapshot_ctrl_write,
+	.llseek         = generic_file_llseek,
+};
+#endif /* CONFIG_TRACER_MAX_TRACE */
+
 struct ftrace_buffer_info {
 	struct trace_array	*tr;
 	void			*spare;
@@ -4819,6 +4960,13 @@ static __init int tracer_init_debugfs(void)
 #ifdef CONFIG_TRACER_MAX_TRACE
 	trace_create_file("tracing_max_latency", 0644, d_tracer,
 			&tracing_max_latency, &tracing_max_lat_fops);
+
+	trace_create_file("snapshot_pipe", 0644, d_tracer,
+			  (void *) TRACE_PIPE_ALL_CPU,
+			  &tracing_snapshot_pipe_fops);
+
+	trace_create_file("snapshot_enabled", 0644, d_tracer,
+			  NULL, &tracing_snapshot_ctrl_fops);
 #endif
 
 	trace_create_file("tracing_thresh", 0644, d_tracer,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5aec220..602657a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -254,6 +254,8 @@ struct tracer_flags {
  * @print_line: callback that prints a trace
  * @set_flag: signals one of your private flags changed (trace_options file)
  * @flags: your private flags
+ * @use_max_tr: show whether the snapshot feature is available on this tracer
+ * @always_use_max_tr: show that max_tr is always used on this tracer
  */
 struct tracer {
 	const char		*name;
@@ -286,7 +288,13 @@ struct tracer {
 	struct tracer		*next;
 	struct tracer_flags	*flags;
 	int			print_max;
+	/* Dynamically toggled via "snapshot_enabled" debugfs file */
 	int			use_max_tr;
+	/*
+	 * If this value is 1, this tracer always uses max_tr and "use_max_tr"
+	 * can't be toggled.
+	 */
+	int			always_use_max_tr;
 };
 
 
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 99d20e9..37cdb75 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -614,6 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
 	.open           = irqsoff_trace_open,
 	.close          = irqsoff_trace_close,
 	.use_max_tr	= 1,
+	.always_use_max_tr	= 1,
 };
 # define register_irqsoff(trace) register_tracer(&trace)
 #else
@@ -647,6 +648,7 @@ static struct tracer preemptoff_tracer __read_mostly =
 	.open		= irqsoff_trace_open,
 	.close		= irqsoff_trace_close,
 	.use_max_tr	= 1,
+	.always_use_max_tr	= 1,
 };
 # define register_preemptoff(trace) register_tracer(&trace)
 #else
@@ -682,6 +684,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
 	.open		= irqsoff_trace_open,
 	.close		= irqsoff_trace_close,
 	.use_max_tr	= 1,
+	.always_use_max_tr	= 1,
 };
 
 # define register_preemptirqsoff(trace) register_tracer(&trace)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index ff791ea..b6e91a0 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -599,6 +599,7 @@ static struct tracer wakeup_tracer __read_mostly =
 	.open		= wakeup_trace_open,
 	.close		= wakeup_trace_close,
 	.use_max_tr	= 1,
+	.always_use_max_tr	= 1,
 };
 
 static struct tracer wakeup_rt_tracer __read_mostly =
@@ -620,6 +621,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
 	.open		= wakeup_trace_open,
 	.close		= wakeup_trace_close,
 	.use_max_tr	= 1,
+	.always_use_max_tr	= 1,
 };
 
 __init static int init_wakeup_tracer(void)


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-06-05 12:06 [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace Hiraku Toyooka
@ 2012-07-04 10:47 ` Hiraku Toyooka
  2012-07-05  1:01   ` Rob Landley
  2012-07-11 23:26 ` Steven Rostedt
  1 sibling, 1 reply; 8+ messages in thread
From: Hiraku Toyooka @ 2012-07-04 10:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Frederic Weisbecker, Ingo Molnar, Rob Landley, Masami Hiramatsu,
	linux-kernel, yrl.pp-manager.tt

Hello, Steven,

I've sent below RFC patch, but still have no responses. This patch can
be applied to current tip tree.

If you have time, could you give any comment about this patch?

Best regards,
Hiraku Toyooka


(2012/06/05 21:06), Hiraku Toyooka wrote:
> Hello,
> 
> This patch provides new debugfs interfaces for taking a snapshot
> in ftrace.
> 
> Ftrace is available as a flight recorder. When a kernel panic
> occurs, we can get trace data from the dumped kernel image.
> 
> But in case of some systems, non-critical errors (such as
> application's errors) are difficult to handle because of following
> reasons.
> 
>  - When the system has to continue to run, it can't stop tracing.
>  - Even then, failure analysis is necessary to prevent the same
>    errors.
> 
> So, I propose making the ftrace's snapshot feature available from
> userspace to solve above dilemma.
> (See also
> https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_toyooka.pdf)
> 
> This enables to preserve trace buffer at a particular point in time
> without stopping tracing.  When a snapshot is taken, ftrace swaps the
> current buffer with a spare buffer which is prepared in advance. This
> means that the tracing itself continues on the spare buffer.
> 
> Currently, ftrace has a snapshot feature available from kernel space
> and some tracers (e.g. irqsoff) are using it. This patch allows users
> to use the same snapshot feature via debugfs.
> 
> I added two debugfs files in "tracing" directory.
> 
>   snapshot_enabled:
> 
>         This is used to set or display whether the snapshot is
>         enabled. Echo 1 into this file to prepare a spare buffer
>         or 0 to shrink it. So, the memory for the spare buffer
>         will be consumed only when this knob is set.
> 
>   snapshot_pipe:
> 
>         This is used to take a snapshot and to read the output
>         of the snapshot (max_tr). Echo 1 into this file to take a
>         snapshot. Reads from this file is the same as the
>         "trace_pipe" file.
> 
> Here is an example of using the snapshot feature.
> 
>  # echo nop > current_tracer
>  # echo 1 > snapshot_enabled
>  # echo 1 > events/sched/enable
>  [...]
>  # echo 1 > snapshot_pipe
>  # cat snapshot_pipe
>             bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 p
> id=28 prio=0 success=1 target_cpu=006
>             bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 p
> id=32 prio=0 success=1 target_cpu=007
>             bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash pre
> v_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_
> prio=0
> [...]
> 
> 
> Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Rob Landley <rob@landley.net>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
>  Documentation/trace/ftrace.txt    |   47 +++++++++++
>  kernel/trace/trace.c              |  152 +++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace.h              |    8 ++
>  kernel/trace/trace_irqsoff.c      |    3 +
>  kernel/trace/trace_sched_wakeup.c |    2 
>  5 files changed, 210 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> index 6f51fed..df9fa13 100644
> --- a/Documentation/trace/ftrace.txt
> +++ b/Documentation/trace/ftrace.txt
> @@ -1842,6 +1842,53 @@ an error.
>   # cat buffer_size_kb
>  85
>  
> +
> +Snapshot
> +--------
> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
> +feature is available in all tracers except for the special
> +tracers which use a snapshot inside themselves(such as "irqsoff"
> +or "wakeup").
> +
> +This enables to preserve trace buffer at a particular point in
> +time without stopping tracing.  When a snapshot is taken, ftrace
> +swaps the current buffer with a spare buffer which is prepared
> +in advance. This means that the tracing itself continues on the
> +spare buffer.
> +
> +Following debugfs files in "tracing" directory are related with
> +this feature.
> +
> +  snapshot_enabled:
> +
> +	This is used to set or display whether the snapshot is
> +	enabled. Echo 1 into this file to prepare a spare buffer
> +	or 0 to shrink it. So, the memory for the spare buffer
> +	will be consumed only when this knob is set.
> +
> +  snapshot_pipe:
> +
> +	This is used to take a snapshot and to read the output
> +	of the snapshot. Echo 1 into this file to take a
> +	snapshot. Reads from this file is the same as the
> +	"trace_pipe" file (described above "The File System"
> +	section), so that both reads from the snapshot and
> +	tracing are executable in parallel.
> +
> +Here is an example of using the snapshot feature.
> +
> + # echo nop > current_tracer
> + # echo 1 > snapshot_enabled
> + # echo 1 > events/sched/enable
> + [...]
> + # echo 1 > snapshot_pipe
> + # cat snapshot_pipe
> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
> +            bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_prio=0
> +[...]
> +
> +
>  -----------
>  
>  More details can be found in the source code, in the
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 68032c6..8e722c3 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3266,7 +3266,8 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf,
>  	return cnt;
>  }
>  
> -static int tracing_open_pipe(struct inode *inode, struct file *filp)
> +static int __tracing_open_pipe(struct inode *inode, struct file *filp,
> +			       struct trace_array *tr)
>  {
>  	long cpu_file = (long) inode->i_private;
>  	struct trace_iterator *iter;
> @@ -3308,7 +3309,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>  		iter->iter_flags |= TRACE_FILE_LAT_FMT;
>  
>  	iter->cpu_file = cpu_file;
> -	iter->tr = &global_trace;
> +	iter->tr = tr;
>  	mutex_init(&iter->mutex);
>  	filp->private_data = iter;
>  
> @@ -3327,6 +3328,11 @@ fail:
>  	return ret;
>  }
>  
> +static int tracing_open_pipe(struct inode *inode, struct file *filp)
> +{
> +	return __tracing_open_pipe(inode, filp, &global_trace);
> +}
> +
>  static int tracing_release_pipe(struct inode *inode, struct file *file)
>  {
>  	struct trace_iterator *iter = file->private_data;
> @@ -4008,6 +4014,122 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
>  	return single_open(file, tracing_clock_show, NULL);
>  }
>  
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +static int tracing_open_snapshot_pipe(struct inode *inode, struct file *filp)
> +{
> +	return __tracing_open_pipe(inode, filp, &max_tr);
> +}
> +
> +static ssize_t
> +tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
> +		       size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val = 0;
> +	int ret;
> +
> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&trace_types_lock);
> +
> +	/* If current tracer's use_max_tr == 0, we prevent taking a snapshot */
> +	if (!current_trace->use_max_tr) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +	if (val) {
> +		unsigned long flags;
> +		local_irq_save(flags);
> +		update_max_tr(&global_trace, current, raw_smp_processor_id());
> +		local_irq_restore(flags);
> +	}
> +out:
> +	mutex_unlock(&trace_types_lock);
> +
> +	if (!ret) {
> +		*ppos += cnt;
> +		ret = cnt;
> +	}
> +
> +	return ret;
> +}
> +
> +static ssize_t
> +tracing_snapshot_ctrl_read(struct file *filp, char __user *ubuf,
> +			   size_t cnt, loff_t *ppos)
> +{
> +	char buf[64];
> +	int r;
> +
> +	r = sprintf(buf, "%d\n", current_trace->use_max_tr);
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +}
> +
> +static ssize_t
> +tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
> +			    size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val;
> +	int ret;
> +
> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
> +	if (ret)
> +		return ret;
> +
> +	val = !!val;
> +
> +	mutex_lock(&trace_types_lock);
> +	tracing_stop();
> +	arch_spin_lock(&ftrace_max_lock);
> +
> +	/* When always_use_max_tr == 1, we can't toggle use_max_tr. */
> +	if (current_trace->always_use_max_tr) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
> +	if (!(current_trace->use_max_tr ^ val))
> +		goto out;
> +
> +	if (val) {
> +		int cpu;
> +		for_each_tracing_cpu(cpu) {
> +			ret = ring_buffer_resize(max_tr.buffer,
> +						global_trace.data[cpu]->entries,
> +						cpu);
> +			if (ret < 0)
> +				break;
> +			max_tr.data[cpu]->entries =
> +				global_trace.data[cpu]->entries;
> +		}
> +		if (ret < 0) {
> +			ring_buffer_resize(max_tr.buffer, 1,
> +					   RING_BUFFER_ALL_CPUS);
> +			set_buffer_entries(&max_tr, 1);
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		current_trace->use_max_tr = 1;
> +	} else {
> +		/* shrink buffer for saving memory */
> +		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
> +		set_buffer_entries(&max_tr, 1);
> +		current_trace->use_max_tr = 0;
> +	}
> +out:
> +	arch_spin_unlock(&ftrace_max_lock);
> +	tracing_start();
> +	mutex_unlock(&trace_types_lock);
> +
> +	if (!ret) {
> +		*ppos += cnt;
> +		ret = cnt;
> +	}
> +	return ret;
> +}
> +#endif /* CONFIG_TRACER_MAX_TRACE */
> +
>  static const struct file_operations tracing_max_lat_fops = {
>  	.open		= tracing_open_generic,
>  	.read		= tracing_max_lat_read,
> @@ -4071,6 +4193,25 @@ static const struct file_operations trace_clock_fops = {
>  	.write		= tracing_clock_write,
>  };
>  
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +static const struct file_operations tracing_snapshot_pipe_fops = {
> +	.open           = tracing_open_snapshot_pipe,
> +	.poll           = tracing_poll_pipe,
> +	.read           = tracing_read_pipe,
> +	.write          = tracing_write_snapshot_pipe,
> +	.splice_read    = tracing_splice_read_pipe,
> +	.release        = tracing_release_pipe,
> +	.llseek         = no_llseek,
> +};
> +
> +static const struct file_operations tracing_snapshot_ctrl_fops = {
> +	.open           = tracing_open_generic,
> +	.read           = tracing_snapshot_ctrl_read,
> +	.write          = tracing_snapshot_ctrl_write,
> +	.llseek         = generic_file_llseek,
> +};
> +#endif /* CONFIG_TRACER_MAX_TRACE */
> +
>  struct ftrace_buffer_info {
>  	struct trace_array	*tr;
>  	void			*spare;
> @@ -4819,6 +4960,13 @@ static __init int tracer_init_debugfs(void)
>  #ifdef CONFIG_TRACER_MAX_TRACE
>  	trace_create_file("tracing_max_latency", 0644, d_tracer,
>  			&tracing_max_latency, &tracing_max_lat_fops);
> +
> +	trace_create_file("snapshot_pipe", 0644, d_tracer,
> +			  (void *) TRACE_PIPE_ALL_CPU,
> +			  &tracing_snapshot_pipe_fops);
> +
> +	trace_create_file("snapshot_enabled", 0644, d_tracer,
> +			  NULL, &tracing_snapshot_ctrl_fops);
>  #endif
>  
>  	trace_create_file("tracing_thresh", 0644, d_tracer,
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 5aec220..602657a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -254,6 +254,8 @@ struct tracer_flags {
>   * @print_line: callback that prints a trace
>   * @set_flag: signals one of your private flags changed (trace_options file)
>   * @flags: your private flags
> + * @use_max_tr: show whether the snapshot feature is available on this tracer
> + * @always_use_max_tr: show that max_tr is always used on this tracer
>   */
>  struct tracer {
>  	const char		*name;
> @@ -286,7 +288,13 @@ struct tracer {
>  	struct tracer		*next;
>  	struct tracer_flags	*flags;
>  	int			print_max;
> +	/* Dynamically toggled via "snapshot_enabled" debugfs file */
>  	int			use_max_tr;
> +	/*
> +	 * If this value is 1, this tracer always uses max_tr and "use_max_tr"
> +	 * can't be toggled.
> +	 */
> +	int			always_use_max_tr;
>  };
>  
>  
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 99d20e9..37cdb75 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -614,6 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
>  	.open           = irqsoff_trace_open,
>  	.close          = irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  # define register_irqsoff(trace) register_tracer(&trace)
>  #else
> @@ -647,6 +648,7 @@ static struct tracer preemptoff_tracer __read_mostly =
>  	.open		= irqsoff_trace_open,
>  	.close		= irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  # define register_preemptoff(trace) register_tracer(&trace)
>  #else
> @@ -682,6 +684,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
>  	.open		= irqsoff_trace_open,
>  	.close		= irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  # define register_preemptirqsoff(trace) register_tracer(&trace)
> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> index ff791ea..b6e91a0 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -599,6 +599,7 @@ static struct tracer wakeup_tracer __read_mostly =
>  	.open		= wakeup_trace_open,
>  	.close		= wakeup_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  static struct tracer wakeup_rt_tracer __read_mostly =
> @@ -620,6 +621,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
>  	.open		= wakeup_trace_open,
>  	.close		= wakeup_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  __init static int init_wakeup_tracer(void)
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-07-04 10:47 ` Hiraku Toyooka
@ 2012-07-05  1:01   ` Rob Landley
  2012-07-11  5:39     ` Hiraku Toyooka
  0 siblings, 1 reply; 8+ messages in thread
From: Rob Landley @ 2012-07-05  1:01 UTC (permalink / raw)
  To: Hiraku Toyooka
  Cc: Steven Rostedt, Frederic Weisbecker, Ingo Molnar,
	Masami Hiramatsu, linux-kernel, yrl.pp-manager.tt

On 07/04/2012 05:47 AM, Hiraku Toyooka wrote:
> Hello, Steven,
> 
> I've sent below RFC patch, but still have no responses. This patch can
> be applied to current tip tree.
> 
> If you have time, could you give any comment about this patch?

My familiarity with ftrace is "I saw a presentation on it at a
conference a couple years ago", so I'm not the guy to comment on the
advisability of the patch itself.

As for the documentation: seems reasonable, could use some english
polishing.

>> +Snapshot
>> +--------
>> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
>> +feature is available in all tracers except for the special
>> +tracers which use a snapshot inside themselves(such as "irqsoff"
>> +or "wakeup").

This is confusing, I'm guessing irqsoff and wakeup already have
persistent buffers without CONFIG_TRACER_MAX_TRACE? (So there is a
generic snapshot feature, but some tracers have their own internal
buffer and don't use the generic one?)

Is the fact that some tracers don't use this feature an important part
of the description of the feature? Is making them use common code a todo
item, or just a comment?

How about:

CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
internal snapshot implementation.)

>> +This enables to preserve trace buffer at a particular point in
>> +time without stopping tracing. When a snapshot is taken, ftrace
>> +swaps the current buffer with a spare buffer which is prepared
>> +in advance. This means that the tracing itself continues on the
>> +spare buffer.

Snapshots preserve a trace buffer at a particular point in time without
stopping tracing; ftrace swaps the current buffer with a spare buffer,
and tracking continues in the spare buffer.

>> +Following debugfs files in "tracing" directory are related with
>> +this feature.

The following debugfs files in "tracing" are related to this feature:

>> +  snapshot_enabled:
>> +
>> +    This is used to set or display whether the snapshot is
>> +    enabled. Echo 1 into this file to prepare a spare buffer
>> +    or 0 to shrink it. So, the memory for the spare buffer
>> +    will be consumed only when this knob is set.

Write 1 to this file to allocate a snapshot buffer, 0 to free it.

(Query: do you have to free the buffer after taking a snapshot below?)

>> +  snapshot_pipe:
>> +
>> +    This is used to take a snapshot and to read the output
>> +    of the snapshot. Echo 1 into this file to take a
>> +    snapshot. Reads from this file is the same as the
>> +    "trace_pipe" file (described above "The File System"
>> +    section), so that both reads from the snapshot and
>> +    tracing are executable in parallel.

Echo 1 into this file to take a snapshot, then read the snapshot from
the file in the same format as "trace_pipe" (described above in the
section "The File System").

Design questions left for the reader: why are allocating a snapshot
buffer and taking a snapshot separate actions? Why is there only _one_
snapshot buffer?

>> +Here is an example of using the snapshot feature.
>> +
>> + # echo nop > current_tracer
>> + # echo 1 > snapshot_enabled
>> + # echo 1 > events/sched/enable
>> + [...]
>> + # echo 1 > snapshot_pipe

Is the order significant here? Do you have to set up a snapshot buffer
before you start tracing? When you switch buffers, does the new buffer
start empty, or with a copy of the previous trace data?

I'm guessing the reason buffer switching was described above is to
explain that taking a snapshot blanks the current trace history since
you switch to an empty buffer?

>> + # cat snapshot_pipe
>> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup:
>> comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
>> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup:
>> comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
>> +            bash-3352  [001] d... 18440.883935: sched_switch:
>> prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==>
>> next_comm=migration/1 next_pid=8 next_prio=0
>> +[...]

How big is one of these buffers, anyway?  I take it the reason opening
snapshot_pipe doesn't just allocate a snapshot buffer and take the
snapshot then (to persist for the duration of the filehandle) is to
avoid pinning too much kernel memory?

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-07-05  1:01   ` Rob Landley
@ 2012-07-11  5:39     ` Hiraku Toyooka
  2012-07-11 18:00       ` Rob Landley
  0 siblings, 1 reply; 8+ messages in thread
From: Hiraku Toyooka @ 2012-07-11  5:39 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel, yrl.pp-manager.tt, ltc-kernel

Hello, Rob,

Thank you very much for your advice.

(2012/07/05 10:01), Rob Landley wrote:
> On 07/04/2012 05:47 AM, Hiraku Toyooka wrote:
>> Hello, Steven,
>>
>> I've sent below RFC patch, but still have no responses. This patch can
>> be applied to current tip tree.
>>
>> If you have time, could you give any comment about this patch?
> 
> My familiarity with ftrace is "I saw a presentation on it at a
> conference a couple years ago", so I'm not the guy to comment on the
> advisability of the patch itself.
> 
> As for the documentation: seems reasonable, could use some english
> polishing.
> 
>>> +Snapshot
>>> +--------
>>> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
>>> +feature is available in all tracers except for the special
>>> +tracers which use a snapshot inside themselves(such as "irqsoff"
>>> +or "wakeup").
> 
> This is confusing, I'm guessing irqsoff and wakeup already have
> persistent buffers without CONFIG_TRACER_MAX_TRACE? (So there is a
> generic snapshot feature, but some tracers have their own internal
> buffer and don't use the generic one?)
> 

No, CONFIG_TRACER_MAX_TRACE is originally for making the spare buffer
available. Both some special tracers (such as irqsoff) and generic
snapshot uses the common spare buffer. But purpose of each snapshot is
different. In the special tracers, snapshot is used for recording
information related to max latency of something (such as longest
interrupt-disabled area). This is automatically updated only when the
max latency is detected. In other words, those special tracers use the
same spare buffer in the different way. Thus, we can not enable generic
snapshot for those tracers.


> Is the fact that some tracers don't use this feature an important part
> of the description of the feature? Is making them use common code a todo
> item, or just a comment?
> 

Anyway, the fact is not important here.


> How about:
> 
> CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
> all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
> internal snapshot implementation.)
> 

Thanks, but I think the following one is more suitable.

(Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
implementation internally)


>>> +This enables to preserve trace buffer at a particular point in
>>> +time without stopping tracing. When a snapshot is taken, ftrace
>>> +swaps the current buffer with a spare buffer which is prepared
>>> +in advance. This means that the tracing itself continues on the
>>> +spare buffer.
> 
> Snapshots preserve a trace buffer at a particular point in time without
> stopping tracing; ftrace swaps the current buffer with a spare buffer,
> and tracking continues in the spare buffer.
> 
>>> +Following debugfs files in "tracing" directory are related with
>>> +this feature.
> 
> The following debugfs files in "tracing" are related to this feature:
> 
>>> +  snapshot_enabled:
>>> +
>>> +    This is used to set or display whether the snapshot is
>>> +    enabled. Echo 1 into this file to prepare a spare buffer
>>> +    or 0 to shrink it. So, the memory for the spare buffer
>>> +    will be consumed only when this knob is set.
> 
> Write 1 to this file to allocate a snapshot buffer, 0 to free it.
> 

I'll fix them.


> (Query: do you have to free the buffer after taking a snapshot below?)
> 

No, we don't always need to free the buffer, although we can free it
when the snapshot becomes unnecessary. We can also reuse the buffer if
we'd like to take the next snapshot.
(I'll add this description.)


>>> +  snapshot_pipe:
>>> +
>>> +    This is used to take a snapshot and to read the output
>>> +    of the snapshot. Echo 1 into this file to take a
>>> +    snapshot. Reads from this file is the same as the
>>> +    "trace_pipe" file (described above "The File System"
>>> +    section), so that both reads from the snapshot and
>>> +    tracing are executable in parallel.
> 
> Echo 1 into this file to take a snapshot, then read the snapshot from
> the file in the same format as "trace_pipe" (described above in the
> section "The File System").
> 

I'll fix that.

> Design questions left for the reader: why are allocating a snapshot
> buffer and taking a snapshot separate actions?

I'll add following description:
Allocating a spare buffer and taking a snapshot are separated because
latter one should be done successfully and immediately. If they are not
separated, we may fail to take a snapshot because of memory allocation
failure or we may lose older trace data while allocating memory.

 > Why is there only _one_ snapshot buffer?
 >

Because of easy reviewing, I've decided to implement step by step. So
this patch just provide "only one" spare buffer. However, I got some
feedback for multiple snapshot at LinuxCon Japan 2012. So I may extend
this feature.


>>> +Here is an example of using the snapshot feature.
>>> +
>>> + # echo nop > current_tracer
>>> + # echo 1 > snapshot_enabled
>>> + # echo 1 > events/sched/enable
>>> + [...]
>>> + # echo 1 > snapshot_pipe
> 
> Is the order significant here? 

If the current_tracer is a special tracer such as irqsoff,
snapshot_enabled can't be set to 1. So current_tracer has to be
configured to appropriate tracer in the first place.

> Do you have to set up a snapshot buffer before you start tracing? 

No. We can enable trace events or start tracing any time.

> When you switch buffers, does the new buffer
> start empty, or with a copy of the previous trace data?
>

Yes, after taking a snapshot, current buffer starts empty.


> I'm guessing the reason buffer switching was described above is to
> explain that taking a snapshot blanks the current trace history since
> you switch to an empty buffer?
> 

Yes. Actually, this snapshot feature only swaps the current buffer and
the spare buffer. Now I'm considering to call it "swap" instead of
"snapshot"...


>>> + # cat snapshot_pipe
>>> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup:
>>> comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
>>> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup:
>>> comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
>>> +            bash-3352  [001] d... 18440.883935: sched_switch:
>>> prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==>
>>> next_comm=migration/1 next_pid=8 next_prio=0
>>> +[...]
> 
> How big is one of these buffers, anyway?  I take it the reason opening
> snapshot_pipe doesn't just allocate a snapshot buffer and take the
> snapshot then (to persist for the duration of the filehandle) is to
> avoid pinning too much kernel memory?
> 

The size of two buffers is same because this is swap.
The reason of the separated operations is to swap two buffers
immediately without fail. if opening snapshot_pipe allocates a buffer,
we need a special userspace command to realize that.


Best regards,
Hiraku Toyooka


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-07-11  5:39     ` Hiraku Toyooka
@ 2012-07-11 18:00       ` Rob Landley
  2012-07-20  5:25         ` Hiraku Toyooka
  0 siblings, 1 reply; 8+ messages in thread
From: Rob Landley @ 2012-07-11 18:00 UTC (permalink / raw)
  To: Hiraku Toyooka; +Cc: linux-kernel, yrl.pp-manager.tt, ltc-kernel

On 07/11/2012 12:39 AM, Hiraku Toyooka wrote:
> Hello, Rob,
> 
> Thank you very much for your advice.
> 
> (2012/07/05 10:01), Rob Landley wrote:
>> On 07/04/2012 05:47 AM, Hiraku Toyooka wrote:
>>> Hello, Steven,
>>>
>>> I've sent below RFC patch, but still have no responses. This patch can
>>> be applied to current tip tree.
>>>
>>> If you have time, could you give any comment about this patch?
>>
>> My familiarity with ftrace is "I saw a presentation on it at a
>> conference a couple years ago", so I'm not the guy to comment on the
>> advisability of the patch itself.
>>
>> As for the documentation: seems reasonable, could use some english
>> polishing.
>>
>>>> +Snapshot
>>>> +--------
>>>> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
>>>> +feature is available in all tracers except for the special
>>>> +tracers which use a snapshot inside themselves(such as "irqsoff"
>>>> +or "wakeup").
>>
>> This is confusing, I'm guessing irqsoff and wakeup already have
>> persistent buffers without CONFIG_TRACER_MAX_TRACE? (So there is a
>> generic snapshot feature, but some tracers have their own internal
>> buffer and don't use the generic one?)
>>
> 
> No, CONFIG_TRACER_MAX_TRACE is originally for making the spare buffer
> available. Both some special tracers (such as irqsoff) and generic
> snapshot uses the common spare buffer. But purpose of each snapshot is
> different. In the special tracers, snapshot is used for recording
> information related to max latency of something (such as longest
> interrupt-disabled area). This is automatically updated only when the
> max latency is detected. In other words, those special tracers use the
> same spare buffer in the different way. Thus, we can not enable generic
> snapshot for those tracers.

Ok, those tracers are not _compatible_ with snapshot.

Good to know.

>> Is the fact that some tracers don't use this feature an important part
>> of the description of the feature? Is making them use common code a todo
>> item, or just a comment?
>>
> 
> Anyway, the fact is not important here.
> 
> 
>> How about:
>>
>> CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
>> all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
>> internal snapshot implementation.)
>>
> 
> Thanks, but I think the following one is more suitable.
> 
> (Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
> implementation internally)

This implies that setting flag is a NOP for them, rather than "if you
take a snapshot, they'll stomp all over the buffer".

>>>> +This enables to preserve trace buffer at a particular point in
>>>> +time without stopping tracing. When a snapshot is taken, ftrace
>>>> +swaps the current buffer with a spare buffer which is prepared
>>>> +in advance. This means that the tracing itself continues on the
>>>> +spare buffer.
>>
>> Snapshots preserve a trace buffer at a particular point in time without
>> stopping tracing; ftrace swaps the current buffer with a spare buffer,
>> and tracking continues in the spare buffer.
>>
>>>> +Following debugfs files in "tracing" directory are related with
>>>> +this feature.
>>
>> The following debugfs files in "tracing" are related to this feature:
>>
>>>> +  snapshot_enabled:
>>>> +
>>>> +    This is used to set or display whether the snapshot is
>>>> +    enabled. Echo 1 into this file to prepare a spare buffer
>>>> +    or 0 to shrink it. So, the memory for the spare buffer
>>>> +    will be consumed only when this knob is set.
>>
>> Write 1 to this file to allocate a snapshot buffer, 0 to free it.
>>
> 
> I'll fix them.
> 
> 
>> (Query: do you have to free the buffer after taking a snapshot below?)
>>
> 
> No, we don't always need to free the buffer, although we can free it
> when the snapshot becomes unnecessary. We can also reuse the buffer if
> we'd like to take the next snapshot.
> (I'll add this description.)

Actually I was worried about the lifetime rules for the buffer (when
does it need to be disposed of, and who is responsible for doing so?)
but it looks like ftrace only allows one trace to be going on in the
entire system at any given time, so all this context is kernel global
anyway...

>>>> +  snapshot_pipe:
>>>> +
>>>> +    This is used to take a snapshot and to read the output
>>>> +    of the snapshot. Echo 1 into this file to take a
>>>> +    snapshot. Reads from this file is the same as the
>>>> +    "trace_pipe" file (described above "The File System"
>>>> +    section), so that both reads from the snapshot and
>>>> +    tracing are executable in parallel.
>>
>> Echo 1 into this file to take a snapshot, then read the snapshot from
>> the file in the same format as "trace_pipe" (described above in the
>> section "The File System").
>>
> 
> I'll fix that.
> 
>> Design questions left for the reader: why are allocating a snapshot
>> buffer and taking a snapshot separate actions?
> 
> I'll add following description:
> Allocating a spare buffer and taking a snapshot are separated because
> latter one should be done successfully and immediately. If they are not
> separated, we may fail to take a snapshot because of memory allocation
> failure or we may lose older trace data while allocating memory.

Buffer allocation is a separate action because taking a snapshot is a
low-latency operation that should never fail. Got it.

But again, why couldn't open() do one and read() (from position 0) do
the other? And then if you wanted to take another snapshot you could
lseek() back to 0...

*shrug* I suppose the way you have it works. Just curious.

>> Why is there only _one_ snapshot buffer?
>>
> 
> Because of easy reviewing, I've decided to implement step by step. So
> this patch just provide "only one" spare buffer. However, I got some
> feedback for multiple snapshot at LinuxCon Japan 2012. So I may extend
> this feature.

The implementation can change all you like after the code goes in, but
you really have to get the API right because you're going to be stuck
supporting it <pinkypie>FOREVER</pinkypie>.

(The <tag> is a cultural reference from a local anime. Don't worry about
it.)

>>>> +Here is an example of using the snapshot feature.
>>>> +
>>>> + # echo nop > current_tracer
>>>> + # echo 1 > snapshot_enabled
>>>> + # echo 1 > events/sched/enable
>>>> + [...]
>>>> + # echo 1 > snapshot_pipe
>>
>> Is the order significant here? 
> 
> If the current_tracer is a special tracer such as irqsoff,
> snapshot_enabled can't be set to 1. So current_tracer has to be
> configured to appropriate tracer in the first place.

But they aren't compatible anyway? (So presumably this call failing is
part of the enforcement mechanism by which the incompatible combination
is disallowed?)

What happens if you do it the other way around? (echo 1 >
snapshot_enabled and then echo irqsoff > current_tracer)?

>> Do you have to set up a snapshot buffer before you start tracing? 
> 
> No. We can enable trace events or start tracing any time.
>
>> When you switch buffers, does the new buffer
>> start empty, or with a copy of the previous trace data?
>>
> 
> Yes, after taking a snapshot, current buffer starts empty.
> 
> 
>> I'm guessing the reason buffer switching was described above is to
>> explain that taking a snapshot blanks the current trace history since
>> you switch to an empty buffer?
>>
> 
> Yes. Actually, this snapshot feature only swaps the current buffer and
> the spare buffer. Now I'm considering to call it "swap" instead of
> "snapshot"...

I'm all for clarifying what it does, but we already have a subsystem
called "swap" (swap partitions, swapon/swapoff) so that name isn't
necessarily less confusing.

>>>> + # cat snapshot_pipe
>>>> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup:
>>>> comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
>>>> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup:
>>>> comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
>>>> +            bash-3352  [001] d... 18440.883935: sched_switch:
>>>> prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==>
>>>> next_comm=migration/1 next_pid=8 next_prio=0
>>>> +[...]
>>
>> How big is one of these buffers, anyway?  I take it the reason opening
>> snapshot_pipe doesn't just allocate a snapshot buffer and take the
>> snapshot then (to persist for the duration of the filehandle) is to
>> avoid pinning too much kernel memory?
>>
> 
> The size of two buffers is same because this is swap.

So the new one is also buffer_size_kb?

Out of curiosity, what happens if you go:

  echo nop > current_tracer
  echo 1 > snapshot_enabled
  echo 64 > buffer_size_kb

(Techncially current_tracer is still nop...)

> The reason of the separated operations is to swap two buffers
> immediately without fail. if opening snapshot_pipe allocates a buffer,
> we need a special userspace command to realize that.

Um, cat can open a file? I don't understand.

If you need a delay between open and read, you can do that from the
command line with something like:

  (read -n 1 < /dev/tty; cat > ~/walrus.txt) < /path/to/thingy

I dunno if that's a good idea, I'm just trying to understand the design
here...

Rob
-- 
GNU/Linux isn't: Linux=GPLv2, GNU=GPLv3+, they can't share code.
Either it's "mere aggregation", or a license violation.  Pick one.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-06-05 12:06 [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace Hiraku Toyooka
  2012-07-04 10:47 ` Hiraku Toyooka
@ 2012-07-11 23:26 ` Steven Rostedt
  2012-07-20  5:25   ` Hiraku Toyooka
  1 sibling, 1 reply; 8+ messages in thread
From: Steven Rostedt @ 2012-07-11 23:26 UTC (permalink / raw)
  To: Hiraku Toyooka
  Cc: yrl.pp-manager.tt, Frederic Weisbecker, Ingo Molnar, Rob Landley,
	Masami Hiramatsu, linux-doc, linux-kernel

Sorry for taking so long, I've been hammering away at other things. :-/


On Tue, 2012-06-05 at 21:06 +0900, Hiraku Toyooka wrote:
> Hello,
> 
> This patch provides new debugfs interfaces for taking a snapshot
> in ftrace.
> 
> Ftrace is available as a flight recorder. When a kernel panic
> occurs, we can get trace data from the dumped kernel image.
> 
> But in case of some systems, non-critical errors (such as
> application's errors) are difficult to handle because of following
> reasons.
> 
>  - When the system has to continue to run, it can't stop tracing.
>  - Even then, failure analysis is necessary to prevent the same
>    errors.
> 
> So, I propose making the ftrace's snapshot feature available from
> userspace to solve above dilemma.
> (See also
> https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_toyooka.pdf)
> 
> This enables to preserve trace buffer at a particular point in time
> without stopping tracing.  When a snapshot is taken, ftrace swaps the
> current buffer with a spare buffer which is prepared in advance. This
> means that the tracing itself continues on the spare buffer.
> 
> Currently, ftrace has a snapshot feature available from kernel space
> and some tracers (e.g. irqsoff) are using it. This patch allows users
> to use the same snapshot feature via debugfs.
> 
> I added two debugfs files in "tracing" directory.
> 
>   snapshot_enabled:
> 
>         This is used to set or display whether the snapshot is
>         enabled. Echo 1 into this file to prepare a spare buffer
>         or 0 to shrink it. So, the memory for the spare buffer
>         will be consumed only when this knob is set.
> 
>   snapshot_pipe:
> 
>         This is used to take a snapshot and to read the output
>         of the snapshot (max_tr). Echo 1 into this file to take a
>         snapshot. Reads from this file is the same as the
>         "trace_pipe" file.
> 
> Here is an example of using the snapshot feature.
> 
>  # echo nop > current_tracer
>  # echo 1 > snapshot_enabled
>  # echo 1 > events/sched/enable
>  [...]
>  # echo 1 > snapshot_pipe
>  # cat snapshot_pipe
>             bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 p
> id=28 prio=0 success=1 target_cpu=006
>             bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 p
> id=32 prio=0 success=1 target_cpu=007
>             bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash pre
> v_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_
> prio=0
> [...]
> 
> 
> Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Rob Landley <rob@landley.net>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> 
>  Documentation/trace/ftrace.txt    |   47 +++++++++++
>  kernel/trace/trace.c              |  152 +++++++++++++++++++++++++++++++++++++
>  kernel/trace/trace.h              |    8 ++
>  kernel/trace/trace_irqsoff.c      |    3 +
>  kernel/trace/trace_sched_wakeup.c |    2 
>  5 files changed, 210 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
> index 6f51fed..df9fa13 100644
> --- a/Documentation/trace/ftrace.txt
> +++ b/Documentation/trace/ftrace.txt
> @@ -1842,6 +1842,53 @@ an error.
>   # cat buffer_size_kb
>  85
>  
> +
> +Snapshot
> +--------
> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
> +feature is available in all tracers except for the special
> +tracers which use a snapshot inside themselves(such as "irqsoff"
> +or "wakeup").

I find this kind of ironic, that it is only defined when one of the
tracers that can't use it defines it.

Maybe we should make it a prompt config for this feature.

> +
> +This enables to preserve trace buffer at a particular point in
> +time without stopping tracing.  When a snapshot is taken, ftrace
> +swaps the current buffer with a spare buffer which is prepared
> +in advance. This means that the tracing itself continues on the
> +spare buffer.
> +
> +Following debugfs files in "tracing" directory are related with
> +this feature.
> +
> +  snapshot_enabled:
> +
> +	This is used to set or display whether the snapshot is
> +	enabled. Echo 1 into this file to prepare a spare buffer
> +	or 0 to shrink it. So, the memory for the spare buffer
> +	will be consumed only when this knob is set.
> +
> +  snapshot_pipe:
> +
> +	This is used to take a snapshot and to read the output
> +	of the snapshot. Echo 1 into this file to take a
> +	snapshot. Reads from this file is the same as the
> +	"trace_pipe" file (described above "The File System"
> +	section), so that both reads from the snapshot and
> +	tracing are executable in parallel.

I don't really like the name snapshot_pipe. What about just calling it
snapshot, and just document that it works like trace_pipe?

Also, rename snapshot_enabled, to snapshot_allocate. If someone echos 1
into snapshot, it would automatically allocate the buffer (and set
snapshot_allocate to 1). If you don't want the delay (for allocation),
then you can do the echo 1 into snapshot_allocate first, and it would
behave as it does here.


> +
> +Here is an example of using the snapshot feature.
> +
> + # echo nop > current_tracer
> + # echo 1 > snapshot_enabled
> + # echo 1 > events/sched/enable
> + [...]
> + # echo 1 > snapshot_pipe
> + # cat snapshot_pipe
> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
> +            bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_prio=0
> +[...]

BTW, why make it a pipe action anyway? As a snapshot doesn't have a
writer to it, doing just an iterate over the snapshot would make sense,
wouldn't it?

If you reply with a good rational for keeping the snapshot_pipe, then we
should have both snapshot and snapshot_pipe, where snapshot works like
trace and snapshot_pipe works like trace_pipe.

> +
> +
>  -----------
>  
>  More details can be found in the source code, in the
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 68032c6..8e722c3 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -3266,7 +3266,8 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf,
>  	return cnt;
>  }
>  
> -static int tracing_open_pipe(struct inode *inode, struct file *filp)
> +static int __tracing_open_pipe(struct inode *inode, struct file *filp,
> +			       struct trace_array *tr)
>  {
>  	long cpu_file = (long) inode->i_private;
>  	struct trace_iterator *iter;
> @@ -3308,7 +3309,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>  		iter->iter_flags |= TRACE_FILE_LAT_FMT;
>  
>  	iter->cpu_file = cpu_file;
> -	iter->tr = &global_trace;
> +	iter->tr = tr;
>  	mutex_init(&iter->mutex);
>  	filp->private_data = iter;
>  
> @@ -3327,6 +3328,11 @@ fail:
>  	return ret;
>  }
>  
> +static int tracing_open_pipe(struct inode *inode, struct file *filp)
> +{
> +	return __tracing_open_pipe(inode, filp, &global_trace);
> +}
> +
>  static int tracing_release_pipe(struct inode *inode, struct file *file)
>  {
>  	struct trace_iterator *iter = file->private_data;
> @@ -4008,6 +4014,122 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
>  	return single_open(file, tracing_clock_show, NULL);
>  }
>  
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +static int tracing_open_snapshot_pipe(struct inode *inode, struct file *filp)
> +{
> +	return __tracing_open_pipe(inode, filp, &max_tr);
> +}
> +
> +static ssize_t
> +tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
> +		       size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val = 0;
> +	int ret;
> +
> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
> +	if (ret)
> +		return ret;
> +
> +	mutex_lock(&trace_types_lock);
> +
> +	/* If current tracer's use_max_tr == 0, we prevent taking a snapshot */

Here we should just allocate it first.

> +	if (!current_trace->use_max_tr) {

I also have issues with the use of 'use_max_tr' here, but I'll explain
that below.

> +		ret = -EBUSY;
> +		goto out;
> +	}
> +	if (val) {
> +		unsigned long flags;
> +		local_irq_save(flags);

Interrupts will never be disabled here. Just use
'local_irq_disable/enable()', and remove flags.

> +		update_max_tr(&global_trace, current, raw_smp_processor_id());

Also, get rid of the 'raw_' that's for critical paths that can be broken
by the debug version of the normal user (like in function tracing and
callbacks from disabling interrupts).


> +		local_irq_restore(flags);
> +	}
> +out:
> +	mutex_unlock(&trace_types_lock);
> +
> +	if (!ret) {
> +		*ppos += cnt;
> +		ret = cnt;
> +	}
> +
> +	return ret;
> +}
> +
> +static ssize_t
> +tracing_snapshot_ctrl_read(struct file *filp, char __user *ubuf,
> +			   size_t cnt, loff_t *ppos)
> +{
> +	char buf[64];
> +	int r;
> +
> +	r = sprintf(buf, "%d\n", current_trace->use_max_tr);
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
> +}
> +
> +static ssize_t
> +tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
> +			    size_t cnt, loff_t *ppos)
> +{
> +	unsigned long val;
> +	int ret;
> +
> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
> +	if (ret)
> +		return ret;
> +
> +	val = !!val;
> +
> +	mutex_lock(&trace_types_lock);
> +	tracing_stop();
> +	arch_spin_lock(&ftrace_max_lock);
> +
> +	/* When always_use_max_tr == 1, we can't toggle use_max_tr. */
> +	if (current_trace->always_use_max_tr) {

I'll state my issue here. Don't rename use_max_tr to always_use_max_tr,
keep it as is and its use as is. Your other value should be
"allocated_snapshot", which can be set even for the use_max_tr user.


> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
> +	if (!(current_trace->use_max_tr ^ val))
> +		goto out;
> +
> +	if (val) {
> +		int cpu;
> +		for_each_tracing_cpu(cpu) {
> +			ret = ring_buffer_resize(max_tr.buffer,
> +						global_trace.data[cpu]->entries,
> +						cpu);
> +			if (ret < 0)
> +				break;
> +			max_tr.data[cpu]->entries =
> +				global_trace.data[cpu]->entries;
> +		}
> +		if (ret < 0) {
> +			ring_buffer_resize(max_tr.buffer, 1,
> +					   RING_BUFFER_ALL_CPUS);
> +			set_buffer_entries(&max_tr, 1);
> +			ret = -ENOMEM;
> +			goto out;
> +		}

The above code is basically duplicated by the
__tracing_resize_ring_buffer(). As this code is not that trivial, lets
make use of a helper function and keep the bugs in one location. Have
both this function and the resize function use the same code.

In fact, the __tracing_resize_ring_buffer() could be modified to do all
the work. It will either shrink or expand as necessary. This isn't a
critical section so calling ring_buffer_resize() even when there's
nothing to do should not be an issue.

In fact, I think there's a small bug in the code that you just
duplicated. Not your bug, but you copied it.


> +		current_trace->use_max_tr = 1;
> +	} else {
> +		/* shrink buffer for saving memory */
> +		ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
> +		set_buffer_entries(&max_tr, 1);
> +		current_trace->use_max_tr = 0;
> +	}
> +out:
> +	arch_spin_unlock(&ftrace_max_lock);
> +	tracing_start();
> +	mutex_unlock(&trace_types_lock);
> +
> +	if (!ret) {
> +		*ppos += cnt;
> +		ret = cnt;
> +	}
> +	return ret;
> +}
> +#endif /* CONFIG_TRACER_MAX_TRACE */
> +
>  static const struct file_operations tracing_max_lat_fops = {
>  	.open		= tracing_open_generic,
>  	.read		= tracing_max_lat_read,
> @@ -4071,6 +4193,25 @@ static const struct file_operations trace_clock_fops = {
>  	.write		= tracing_clock_write,
>  };
>  
> +#ifdef CONFIG_TRACER_MAX_TRACE
> +static const struct file_operations tracing_snapshot_pipe_fops = {
> +	.open           = tracing_open_snapshot_pipe,
> +	.poll           = tracing_poll_pipe,
> +	.read           = tracing_read_pipe,
> +	.write          = tracing_write_snapshot_pipe,
> +	.splice_read    = tracing_splice_read_pipe,
> +	.release        = tracing_release_pipe,
> +	.llseek         = no_llseek,
> +};
> +
> +static const struct file_operations tracing_snapshot_ctrl_fops = {
> +	.open           = tracing_open_generic,
> +	.read           = tracing_snapshot_ctrl_read,
> +	.write          = tracing_snapshot_ctrl_write,
> +	.llseek         = generic_file_llseek,
> +};
> +#endif /* CONFIG_TRACER_MAX_TRACE */
> +
>  struct ftrace_buffer_info {
>  	struct trace_array	*tr;
>  	void			*spare;
> @@ -4819,6 +4960,13 @@ static __init int tracer_init_debugfs(void)
>  #ifdef CONFIG_TRACER_MAX_TRACE
>  	trace_create_file("tracing_max_latency", 0644, d_tracer,
>  			&tracing_max_latency, &tracing_max_lat_fops);
> +
> +	trace_create_file("snapshot_pipe", 0644, d_tracer,
> +			  (void *) TRACE_PIPE_ALL_CPU,
> +			  &tracing_snapshot_pipe_fops);
> +
> +	trace_create_file("snapshot_enabled", 0644, d_tracer,
> +			  NULL, &tracing_snapshot_ctrl_fops);
>  #endif
>  
>  	trace_create_file("tracing_thresh", 0644, d_tracer,
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 5aec220..602657a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -254,6 +254,8 @@ struct tracer_flags {
>   * @print_line: callback that prints a trace
>   * @set_flag: signals one of your private flags changed (trace_options file)
>   * @flags: your private flags
> + * @use_max_tr: show whether the snapshot feature is available on this tracer
> + * @always_use_max_tr: show that max_tr is always used on this tracer
>   */
>  struct tracer {
>  	const char		*name;
> @@ -286,7 +288,13 @@ struct tracer {
>  	struct tracer		*next;
>  	struct tracer_flags	*flags;
>  	int			print_max;
> +	/* Dynamically toggled via "snapshot_enabled" debugfs file */
>  	int			use_max_tr;
> +	/*
> +	 * If this value is 1, this tracer always uses max_tr and "use_max_tr"
> +	 * can't be toggled.
> +	 */
> +	int			always_use_max_tr;

I already said how I dislike this. Leave use_max_tr alone, but add a
allocated_snapshot. Also, I hate the wasting of 4 bytes just to act like
a flag. We should probably make print_max, use_max_tr and
always_use_max_tr into 'bool's.

The print_max change should be a separate patch.

>  };
>  
> 
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index 99d20e9..37cdb75 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -614,6 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
>  	.open           = irqsoff_trace_open,
>  	.close          = irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,

Remove all these. Have the 'allocated_snapshot' get set when the tracer
is added, not here.

But on the whole, I like the idea of a snapshot (and this has actually
been on my todo list for some time, thanks for doing it for me ;-)

-- Steve


>  };
>  # define register_irqsoff(trace) register_tracer(&trace)
>  #else
> @@ -647,6 +648,7 @@ static struct tracer preemptoff_tracer __read_mostly =
>  	.open		= irqsoff_trace_open,
>  	.close		= irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  # define register_preemptoff(trace) register_tracer(&trace)
>  #else
> @@ -682,6 +684,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
>  	.open		= irqsoff_trace_open,
>  	.close		= irqsoff_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  # define register_preemptirqsoff(trace) register_tracer(&trace)
> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> index ff791ea..b6e91a0 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -599,6 +599,7 @@ static struct tracer wakeup_tracer __read_mostly =
>  	.open		= wakeup_trace_open,
>  	.close		= wakeup_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  static struct tracer wakeup_rt_tracer __read_mostly =
> @@ -620,6 +621,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
>  	.open		= wakeup_trace_open,
>  	.close		= wakeup_trace_close,
>  	.use_max_tr	= 1,
> +	.always_use_max_tr	= 1,
>  };
>  
>  __init static int init_wakeup_tracer(void)



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-07-11 18:00       ` Rob Landley
@ 2012-07-20  5:25         ` Hiraku Toyooka
  0 siblings, 0 replies; 8+ messages in thread
From: Hiraku Toyooka @ 2012-07-20  5:25 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel, yrl.pp-manager.tt

Hello,
Sorry for the late reply.

(2012/07/12 3:00), Rob Landley wrote:
>>> How about:
>>>
>>> CONFIG_TRACER_MAX_TRACE makes a generic snapshot feature available to
>>> all tracers. (Some tracers, such as "irqsoff" or "wakeup", use their own
>>> internal snapshot implementation.)
>>>
>> Thanks, but I think the following one is more suitable.
>>
>> (Some tracers, such as "irqsoff" or "wakeup", already use the snapshot
>> implementation internally)
> 
> This implies that setting flag is a NOP for them, rather than "if you
> take a snapshot, they'll stomp all over the buffer".
> 

How about:
(Some tracers which record max latency, such as "irqsoff" or "wakeup",
can't use this feature, since those are already using same buffer
internally.)

>>> (Query: do you have to free the buffer after taking a snapshot below?)
>>>
>> No, we don't always need to free the buffer, although we can free it
>> when the snapshot becomes unnecessary. We can also reuse the buffer if
>> we'd like to take the next snapshot.
>> (I'll add this description.)
> 
> Actually I was worried about the lifetime rules for the buffer (when
> does it need to be disposed of, and who is responsible for doing so?)
> but it looks like ftrace only allows one trace to be going on in the
> entire system at any given time, so all this context is kernel global
> anyway...
> 

Since current ftrace supposes single user, I think it's enough now.


> Buffer allocation is a separate action because taking a snapshot is a
> low-latency operation that should never fail. Got it.
> 
> But again, why couldn't open() do one and read() (from position 0) do
> the other? And then if you wanted to take another snapshot you could
> lseek() back to 0...
> 
> *shrug* I suppose the way you have it works. Just curious.
> 

Taking a snapshot by using read() will cause a problem. Users don't
always want to read a snapshot just after taking the snapshot. Users
may want to transfer the snapshot data to persistent storage after
usage of system resources(such as CPU or I/O) becomes low. So we should
separate taking a snapshot from reading data.


>>> Why is there only _one_ snapshot buffer?
>>>
>> Because of easy reviewing, I've decided to implement step by step. So
>> this patch just provide "only one" spare buffer. However, I got some
>> feedback for multiple snapshot at LinuxCon Japan 2012. So I may extend
>> this feature.
> 
> The implementation can change all you like after the code goes in, but
> you really have to get the API right because you're going to be stuck
> supporting it <pinkypie>FOREVER</pinkypie>.
> 

I see. I'm now thinking extensible API.


>>>>> +Here is an example of using the snapshot feature.
>>>>> +
>>>>> + # echo nop > current_tracer
>>>>> + # echo 1 > snapshot_enabled
>>>>> + # echo 1 > events/sched/enable
>>>>> + [...]
>>>>> + # echo 1 > snapshot_pipe
>>> Is the order significant here? 
>> If the current_tracer is a special tracer such as irqsoff,
>> snapshot_enabled can't be set to 1. So current_tracer has to be
>> configured to appropriate tracer in the first place.
> 
> But they aren't compatible anyway? (So presumably this call failing is
> part of the enforcement mechanism by which the incompatible combination
> is disallowed?)
> 

Yes, we should have the special tracers fail to allocate a spare buffer.

> What happens if you do it the other way around? (echo 1 >
> snapshot_enabled and then echo irqsoff > current_tracer)?
> 

Then, "echo irqsoff > current_tracer" succeed, and snapshot_enabled turn
into 0. (But in this patch, it remains 1. This is a bug.)

>>> I'm guessing the reason buffer switching was described above is to
>>> explain that taking a snapshot blanks the current trace history since
>>> you switch to an empty buffer?
>>>
>> Yes. Actually, this snapshot feature only swaps the current buffer and
>> the spare buffer. Now I'm considering to call it "swap" instead of
>> "snapshot"...
> 
> I'm all for clarifying what it does, but we already have a subsystem
> called "swap" (swap partitions, swapon/swapoff) so that name isn't
> necessarily less confusing.
> 

Hmm, a good name doesn't occur to me. I'll continue to call this feature
snapshot.

>> The size of two buffers is same because this is swap.
> 
> So the new one is also buffer_size_kb?
> 

Yes.

> Out of curiosity, what happens if you go:
> 
>   echo nop > current_tracer
>   echo 1 > snapshot_enabled
>   echo 64 > buffer_size_kb
> 
> (Techncially current_tracer is still nop...)
> 

Write to buffer_size_kb affects both buffers when the snapshot is
enabled. In that case, the spare buffer is allocated and becomes same
size as the current buffer after second line. The size of both buffers
becomes 64KB after third line.


>> The reason of the separated operations is to swap two buffers
>> immediately without fail. if opening snapshot_pipe allocates a buffer,
>> we need a special userspace command to realize that.
> 
> Um, cat can open a file? I don't understand.
> 
> If you need a delay between open and read, you can do that from the
> command line with something like:
> 
>   (read -n 1 < /dev/tty; cat > ~/walrus.txt) < /path/to/thingy
> 
> I dunno if that's a good idea, I'm just trying to understand the design
> here...

Your method may be possible, but I think that is somewhat complex...


Best regards,
Hiraku Toyooka




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace.
  2012-07-11 23:26 ` Steven Rostedt
@ 2012-07-20  5:25   ` Hiraku Toyooka
  0 siblings, 0 replies; 8+ messages in thread
From: Hiraku Toyooka @ 2012-07-20  5:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: yrl.pp-manager.tt, Frederic Weisbecker, Ingo Molnar, Rob Landley,
	Masami Hiramatsu, linux-doc, linux-kernel

Hello, Steven,
(Sorry for the late reply.)

Tnank you for your comments.

(2012/07/12 8:26), Steven Rostedt wrote:
>> +Snapshot
>> +--------
>> +If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
>> +feature is available in all tracers except for the special
>> +tracers which use a snapshot inside themselves(such as "irqsoff"
>> +or "wakeup").
> 
> I find this kind of ironic, that it is only defined when one of the
> tracers that can't use it defines it.
> 

Ah, I missed that.

> Maybe we should make it a prompt config for this feature.
> 

Yes, I'll make the new config like "CONFIG_TRACER_SNAPSHOT".


>> +  snapshot_enabled:
>> +
>> +	This is used to set or display whether the snapshot is
>> +	enabled. Echo 1 into this file to prepare a spare buffer
>> +	or 0 to shrink it. So, the memory for the spare buffer
>> +	will be consumed only when this knob is set.
>> +
>> +  snapshot_pipe:
>> +
>> +	This is used to take a snapshot and to read the output
>> +	of the snapshot. Echo 1 into this file to take a
>> +	snapshot. Reads from this file is the same as the
>> +	"trace_pipe" file (described above "The File System"
>> +	section), so that both reads from the snapshot and
>> +	tracing are executable in parallel.
> 
> I don't really like the name snapshot_pipe. What about just calling it
> snapshot, and just document that it works like trace_pipe?
> 

Agreed. I'll change the name to snapshot and modify document.


> Also, rename snapshot_enabled, to snapshot_allocate. If someone echos 1
> into snapshot, it would automatically allocate the buffer (and set
> snapshot_allocate to 1). If you don't want the delay (for allocation),
> then you can do the echo 1 into snapshot_allocate first, and it would
> behave as it does here.
> 

I'll change them to that way.

> 
>> +
>> +Here is an example of using the snapshot feature.
>> +
>> + # echo nop > current_tracer
>> + # echo 1 > snapshot_enabled
>> + # echo 1 > events/sched/enable
>> + [...]
>> + # echo 1 > snapshot_pipe
>> + # cat snapshot_pipe
>> +            bash-3352  [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
>> +            bash-3352  [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
>> +            bash-3352  [001] d... 18440.883935: sched_switch: prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_prio=0
>> +[...]
> 
> BTW, why make it a pipe action anyway? As a snapshot doesn't have a
> writer to it, doing just an iterate over the snapshot would make sense,
> wouldn't it?
> 

I thought I should reuse existing code as much as possible. So I'd like
to reuse the "trace" code at first. But opening "trace" stops tracing
until it is closed. Therefore, I reused "trace_pipe" code instead of
"trace".

However, it seems that I should have made new iteration code as you
pointed out. I will make it the "trace"-like action.

> If you reply with a good rational for keeping the snapshot_pipe, then we
> should have both snapshot and snapshot_pipe, where snapshot works like
> trace and snapshot_pipe works like trace_pipe.
> 

I think only "snapshot" file is enough for the present.


>> +static ssize_t
>> +tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
>> +		       size_t cnt, loff_t *ppos)
>> +{
>> +	unsigned long val = 0;
>> +	int ret;
>> +
>> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	mutex_lock(&trace_types_lock);
>> +
>> +	/* If current tracer's use_max_tr == 0, we prevent taking a snapshot */
> 
> Here we should just allocate it first.
> 

OK. I'll add that.

>> +	if (!current_trace->use_max_tr) {
> 
> I also have issues with the use of 'use_max_tr' here, but I'll explain
> that below.
> 
>> +		ret = -EBUSY;
>> +		goto out;
>> +	}
>> +	if (val) {
>> +		unsigned long flags;
>> +		local_irq_save(flags);
> 
> Interrupts will never be disabled here. Just use
> 'local_irq_disable/enable()', and remove flags.
> 

Yes. I'll fix it.

>> +		update_max_tr(&global_trace, current, raw_smp_processor_id());
> 
> Also, get rid of the 'raw_' that's for critical paths that can be broken
> by the debug version of the normal user (like in function tracing and
> callbacks from disabling interrupts).
> 

I'll fix it.


>> +static ssize_t
>> +tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
>> +			    size_t cnt, loff_t *ppos)
>> +{
>> +	unsigned long val;
>> +	int ret;
>> +
>> +	ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
>> +	if (ret)
>> +		return ret;
>> +
>> +	val = !!val;
>> +
>> +	mutex_lock(&trace_types_lock);
>> +	tracing_stop();
>> +	arch_spin_lock(&ftrace_max_lock);
>> +
>> +	/* When always_use_max_tr == 1, we can't toggle use_max_tr. */
>> +	if (current_trace->always_use_max_tr) {
> 
> I'll state my issue here. Don't rename use_max_tr to always_use_max_tr,
> keep it as is and its use as is. Your other value should be
> "allocated_snapshot", which can be set even for the use_max_tr user.
> 

Yes. I'll put use_max_tr back to its original.

> 
>> +		ret = -EBUSY;
>> +		goto out;
>> +	}
>> +
>> +	if (!(current_trace->use_max_tr ^ val))
>> +		goto out;
>> +
>> +	if (val) {
>> +		int cpu;
>> +		for_each_tracing_cpu(cpu) {
>> +			ret = ring_buffer_resize(max_tr.buffer,
>> +						global_trace.data[cpu]->entries,
>> +						cpu);
>> +			if (ret < 0)
>> +				break;
>> +			max_tr.data[cpu]->entries =
>> +				global_trace.data[cpu]->entries;
>> +		}
>> +		if (ret < 0) {
>> +			ring_buffer_resize(max_tr.buffer, 1,
>> +					   RING_BUFFER_ALL_CPUS);
>> +			set_buffer_entries(&max_tr, 1);
>> +			ret = -ENOMEM;
>> +			goto out;
>> +		}
> 
> The above code is basically duplicated by the
> __tracing_resize_ring_buffer(). As this code is not that trivial, lets
> make use of a helper function and keep the bugs in one location. Have
> both this function and the resize function use the same code.
> 
> In fact, the __tracing_resize_ring_buffer() could be modified to do all
> the work. It will either shrink or expand as necessary. This isn't a
> critical section so calling ring_buffer_resize() even when there's
> nothing to do should not be an issue.
> 

OK. I think I can make the common helper function.

> In fact, I think there's a small bug in the code that you just
> duplicated. Not your bug, but you copied it.
> 

Oh, I didn't notice that...
Is it related to return value?


>>  struct tracer {
>>  	const char		*name;
>> @@ -286,7 +288,13 @@ struct tracer {
>>  	struct tracer		*next;
>>  	struct tracer_flags	*flags;
>>  	int			print_max;
>> +	/* Dynamically toggled via "snapshot_enabled" debugfs file */
>>  	int			use_max_tr;
>> +	/*
>> +	 * If this value is 1, this tracer always uses max_tr and "use_max_tr"
>> +	 * can't be toggled.
>> +	 */
>> +	int			always_use_max_tr;
> 
> I already said how I dislike this. Leave use_max_tr alone, but add a
> allocated_snapshot. Also, I hate the wasting of 4 bytes just to act like
> a flag. We should probably make print_max, use_max_tr and
> always_use_max_tr into 'bool's.
> 
> The print_max change should be a separate patch.
> 

I see.
By the way, I noticed that struct tracer's values become invisible when
the current_tracer is changed. This may be somewhat problematic. I'm now
considering we should put the allocated_snapshot value into global
trace_flags as a flag and access this value by
options/snapshot_allocate. What do you think of this?


>>  };
>>  
>>
>> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
>> index 99d20e9..37cdb75 100644
>> --- a/kernel/trace/trace_irqsoff.c
>> +++ b/kernel/trace/trace_irqsoff.c
>> @@ -614,6 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
>>  	.open           = irqsoff_trace_open,
>>  	.close          = irqsoff_trace_close,
>>  	.use_max_tr	= 1,
>> +	.always_use_max_tr	= 1,
> 
> Remove all these. Have the 'allocated_snapshot' get set when the tracer
> is added, not here.
> 

OK, I'll remove them.


> But on the whole, I like the idea of a snapshot (and this has actually
> been on my todo list for some time, thanks for doing it for me ;-)
> 

Thank you for your review!

Best regards,
Hiraku Toyooka


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-07-20  5:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-05 12:06 [RFC PATCH -tip ] tracing: make a snapshot feature available from userspace Hiraku Toyooka
2012-07-04 10:47 ` Hiraku Toyooka
2012-07-05  1:01   ` Rob Landley
2012-07-11  5:39     ` Hiraku Toyooka
2012-07-11 18:00       ` Rob Landley
2012-07-20  5:25         ` Hiraku Toyooka
2012-07-11 23:26 ` Steven Rostedt
2012-07-20  5:25   ` Hiraku Toyooka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.