linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] TSC trace_clock
@ 2012-09-12  2:41 David Sharp
  2012-09-12  2:41 ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12  2:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

As discussed at Plumbers, here are my patches to add a TSC clock to ftrace.
Also noticeable is that the formatting patch improves the output of the
"counter" clock.

David Sharp (3):
  tracing,x86: add a TSC trace_clock; reset buffer on clock change
  tracing: reset ring buffer when changing trace_clocks
  tracing: format non-nanosec times from tsc clock without a decimal
    point.

 include/linux/ftrace_event.h |    6 +++
 include/linux/trace_clock.h  |    3 ++
 kernel/trace/trace.c         |   26 ++++++++++++--
 kernel/trace/trace.h         |    4 --
 kernel/trace/trace_clock.c   |   16 ++++++++
 kernel/trace/trace_output.c  |   82 +++++++++++++++++++++++++++++-------------
 6 files changed, 105 insertions(+), 32 deletions(-)

-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change
  2012-09-12  2:41 [PATCH 0/3] TSC trace_clock David Sharp
@ 2012-09-12  2:41 ` David Sharp
  2012-09-12  3:48   ` Steven Rostedt
  2012-09-12  2:41 ` [PATCH 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-09-12  2:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
---
 include/linux/trace_clock.h |    3 +++
 kernel/trace/trace.c        |    3 +++
 kernel/trace/trace_clock.c  |   16 ++++++++++++++++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..b86c7363 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -16,5 +16,8 @@ extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
 extern u64 notrace trace_clock_counter(void);
+#ifdef CONFIG_X86_TSC
+extern u64 notrace trace_clock_tsc(void);
+#endif
 
 #endif /* _LINUX_TRACE_CLOCK_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..dc1f1fa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,9 @@ static struct {
 	{ trace_clock_local,	"local" },
 	{ trace_clock_global,	"global" },
 	{ trace_clock_counter,	"counter" },
+#ifdef CONFIG_X86_TSC
+	{ trace_clock_tsc,	"tsc" },
+#endif
 };
 
 int trace_clock_id;
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 3947835..1770737 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -125,3 +125,19 @@ u64 notrace trace_clock_counter(void)
 {
 	return atomic64_add_return(1, &trace_counter);
 }
+
+#ifdef CONFIG_X86_TSC
+/*
+ * trace_clock_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_tsc(void)
+{
+	u64 ret;
+	rdtsc_barrier();
+	rdtscll(ret);
+
+	return ret;
+}
+#endif
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/3] tracing: reset ring buffer when changing trace_clocks
  2012-09-12  2:41 [PATCH 0/3] TSC trace_clock David Sharp
  2012-09-12  2:41 ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
@ 2012-09-12  2:41 ` David Sharp
  2012-09-12  2:41 ` [PATCH 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12  2:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.
Signed-off-by: David Sharp <dhsharp@google.com>
---
 kernel/trace/trace.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dc1f1fa..6911f35 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4014,6 +4014,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
 	if (max_tr.buffer)
 		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+	/*
+	 * New clock may not be consistent with the previous clock.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&global_trace);
+	if (max_tr.buffer)
+		tracing_reset_online_cpus(&max_tr);
+
 	mutex_unlock(&trace_types_lock);
 
 	*fpos += cnt;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-12  2:41 [PATCH 0/3] TSC trace_clock David Sharp
  2012-09-12  2:41 ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
  2012-09-12  2:41 ` [PATCH 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
@ 2012-09-12  2:41 ` David Sharp
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12  2:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
---
 include/linux/ftrace_event.h |    6 +++
 kernel/trace/trace.c         |   17 +++++++--
 kernel/trace/trace.h         |    4 --
 kernel/trace/trace_output.c  |   82 +++++++++++++++++++++++++++++-------------
 4 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6911f35..4b78ce2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,12 +476,13 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 #ifdef CONFIG_X86_TSC
-	{ trace_clock_tsc,	"tsc" },
+	{ trace_clock_tsc,	"tsc",		0 },
 #endif
 };
 
@@ -2427,6 +2428,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3326,6 +3331,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..c86d62a 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,57 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	int ret;
+	struct trace_seq *s = &iter->seq;
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	unsigned long mark_thresh;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		ret = trace_seq_printf(
+				s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
+				ns2usecs(iter->ts),
+				abs_ts / USEC_PER_MSEC,
+				abs_ts % USEC_PER_MSEC,
+				rel_ts / USEC_PER_MSEC,
+				rel_ts % USEC_PER_MSEC);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
+
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change
  2012-09-12  2:41 ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
@ 2012-09-12  3:48   ` Steven Rostedt
  2012-09-12 18:48     ` David Sharp
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-12  3:48 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Masami Hiramatsu

On Tue, 2012-09-11 at 19:41 -0700, David Sharp wrote:
> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp <dhsharp@google.com>
> ---
>  include/linux/trace_clock.h |    3 +++
>  kernel/trace/trace.c        |    3 +++
>  kernel/trace/trace_clock.c  |   16 ++++++++++++++++
>  3 files changed, 22 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..b86c7363 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -16,5 +16,8 @@ extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
>  extern u64 notrace trace_clock_counter(void);
> +#ifdef CONFIG_X86_TSC
> +extern u64 notrace trace_clock_tsc(void);
> +#endif
>  
>  #endif /* _LINUX_TRACE_CLOCK_H */
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..dc1f1fa 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,9 @@ static struct {
>  	{ trace_clock_local,	"local" },
>  	{ trace_clock_global,	"global" },
>  	{ trace_clock_counter,	"counter" },
> +#ifdef CONFIG_X86_TSC
> +	{ trace_clock_tsc,	"tsc" },
> +#endif
>  };

I really hate adding arch defs in generic code. Perhaps what we could do
is add a define here. Something like:

#ifndef ARCH_TRACE_CLOCKS
# define ARCH_TRACE_CLOCKS
#endif

[...]
	{ trace_clock_counter, "counter" },
	ARCH_TRACE_CLOCKS
};

and have ARCH_TRACE_CLOCKS defined somewhere in an arch specific header.
Not sure what header we could use though :-/

That is, in a header have:

#define ARCH_TRACE_CLOCKS \
	{ trace_clock_x86_tsc,	"tsc" },

and also define trace_clock_x86_tsc in arch/x86/kernel...

-- Steve


>  
>  int trace_clock_id;
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 3947835..1770737 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -125,3 +125,19 @@ u64 notrace trace_clock_counter(void)
>  {
>  	return atomic64_add_return(1, &trace_counter);
>  }
> +
> +#ifdef CONFIG_X86_TSC
> +/*
> + * trace_clock_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_tsc(void)
> +{
> +	u64 ret;
> +	rdtsc_barrier();
> +	rdtscll(ret);
> +
> +	return ret;
> +}
> +#endif



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change
  2012-09-12  3:48   ` Steven Rostedt
@ 2012-09-12 18:48     ` David Sharp
  0 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 18:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Masami Hiramatsu

On Tue, Sep 11, 2012 at 8:48 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-09-11 at 19:41 -0700, David Sharp wrote:
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 5c38c81..dc1f1fa 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -480,6 +480,9 @@ static struct {
>>       { trace_clock_local,    "local" },
>>       { trace_clock_global,   "global" },
>>       { trace_clock_counter,  "counter" },
>> +#ifdef CONFIG_X86_TSC
>> +     { trace_clock_tsc,      "tsc" },
>> +#endif
>>  };
>
> I really hate adding arch defs in generic code. Perhaps what we could do
> is add a define here. Something like:
>
> #ifndef ARCH_TRACE_CLOCKS
> # define ARCH_TRACE_CLOCKS
> #endif
>
> [...]
>         { trace_clock_counter, "counter" },
>         ARCH_TRACE_CLOCKS
> };
>
> and have ARCH_TRACE_CLOCKS defined somewhere in an arch specific header.
> Not sure what header we could use though :-/
>
> That is, in a header have:
>
> #define ARCH_TRACE_CLOCKS \
>         { trace_clock_x86_tsc,  "tsc" },
>
> and also define trace_clock_x86_tsc in arch/x86/kernel...

Good point. I'll figure out somewhere better to put it...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 0/3] TSC trace_clock
  2012-09-12  2:41 [PATCH 0/3] TSC trace_clock David Sharp
                   ` (2 preceding siblings ...)
  2012-09-12  2:41 ` [PATCH 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
@ 2012-09-12 23:31 ` David Sharp
  2012-09-12 23:31   ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
                     ` (3 more replies)
  3 siblings, 4 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 23:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

Updated to move arch-specific bits out of generic code. I did this by adding a
new file in asm-generic because I couldn't think of a better place to put it,
but I'm open to suggestions.

David Sharp (3):
  tracing,x86: add a TSC trace_clock
  tracing: reset ring buffer when changing trace_clocks
  tracing: format non-nanosec times from tsc clock without a decimal
    point.

 arch/x86/include/asm/trace_clock.h |   16 +++++++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/trace_clock.c      |   20 +++++++++
 include/asm-generic/trace_clock.h  |   15 +++++++
 include/linux/ftrace_event.h       |    6 +++
 include/linux/trace_clock.h        |    2 +
 kernel/trace/trace.c               |   24 +++++++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   82 +++++++++++++++++++++++++-----------
 9 files changed, 138 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 include/asm-generic/trace_clock.h

-- 
1.7.7.3


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
@ 2012-09-12 23:31   ` David Sharp
  2012-09-12 23:31   ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock David Sharp
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 23:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
---
 include/linux/trace_clock.h |    3 +++
 kernel/trace/trace.c        |    3 +++
 kernel/trace/trace_clock.c  |   16 ++++++++++++++++
 3 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..b86c7363 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -16,5 +16,8 @@ extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
 extern u64 notrace trace_clock_counter(void);
+#ifdef CONFIG_X86_TSC
+extern u64 notrace trace_clock_tsc(void);
+#endif
 
 #endif /* _LINUX_TRACE_CLOCK_H */
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..dc1f1fa 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,9 @@ static struct {
 	{ trace_clock_local,	"local" },
 	{ trace_clock_global,	"global" },
 	{ trace_clock_counter,	"counter" },
+#ifdef CONFIG_X86_TSC
+	{ trace_clock_tsc,	"tsc" },
+#endif
 };
 
 int trace_clock_id;
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 3947835..1770737 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -125,3 +125,19 @@ u64 notrace trace_clock_counter(void)
 {
 	return atomic64_add_return(1, &trace_counter);
 }
+
+#ifdef CONFIG_X86_TSC
+/*
+ * trace_clock_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_tsc(void)
+{
+	u64 ret;
+	rdtsc_barrier();
+	rdtscll(ret);
+
+	return ret;
+}
+#endif
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
  2012-09-12 23:31   ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
@ 2012-09-12 23:31   ` David Sharp
  2012-09-13  1:15     ` Masami Hiramatsu
  2012-09-13  1:23     ` Steven Rostedt
  2012-09-12 23:31   ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
  2012-09-12 23:32   ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
  3 siblings, 2 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 23:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.

Google-Bug-Id: 6980623
---
 arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
 include/asm-generic/trace_clock.h  |   15 +++++++++++++++
 include/linux/trace_clock.h        |    2 ++
 kernel/trace/trace.c               |    1 +
 6 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 include/asm-generic/trace_clock.h

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
new file mode 100644
index 0000000..0b1f391
--- /dev/null
+++ b/arch/x86/include/asm/trace_clock.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_TRACE_CLOCK_H
+#define _ASM_X86_TRACE_CLOCK_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_TSC
+
+extern u64 notrace trace_clock_x86_tsc(void);
+
+# define ARCH_TRACE_CLOCKS \
+	{ trace_clock_x86_tsc,	"tsc" },
+
+#endif
+
+#endif  /* _ASM_X86_TRACE_CLOCK_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8215e56..0ee9344 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
+obj-$(CONFIG_X86_TSC)		+= trace_clock.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
new file mode 100644
index 0000000..b8959f8
--- /dev/null
+++ b/arch/x86/kernel/trace_clock.c
@@ -0,0 +1,20 @@
+/*
+ * X86 trace clocks
+ */
+#include <asm/trace_clock.h>
+#include <asm/barrier.h>
+#include <asm/msr.h>
+
+/*
+ * trace_clock_x86_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_x86_tsc(void)
+{
+	u64 ret;
+	rdtsc_barrier();
+	rdtscll(ret);
+
+	return ret;
+}
diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
new file mode 100644
index 0000000..648cdcd
--- /dev/null
+++ b/include/asm-generic/trace_clock.h
@@ -0,0 +1,15 @@
+/*
+ * Arch-specific trace clocks.
+ */
+#ifndef _ASM_GENERIC_TRACE_CLOCK_H
+#define _ASM_GENERIC_TRACE_CLOCK_H
+
+/*
+ * Additional trace clocks added to the trace_clocks
+ * array in kernel/trace/trace.c
+ */
+#ifndef ARCH_TRACE_CLOCKS
+# define ARCH_TRACE_CLOCKS
+#endif
+
+#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..d563f37 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -12,6 +12,8 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 
+#include <asm/trace_clock.h>
+
 extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..92fb08e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,7 @@ static struct {
 	{ trace_clock_local,	"local" },
 	{ trace_clock_global,	"global" },
 	{ trace_clock_counter,	"counter" },
+	ARCH_TRACE_CLOCKS
 };
 
 int trace_clock_id;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
  2012-09-12 23:31   ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
  2012-09-12 23:31   ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock David Sharp
@ 2012-09-12 23:31   ` David Sharp
  2012-09-12 23:32   ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 23:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.
---
 kernel/trace/trace.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 92fb08e..4e26df3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4012,6 +4012,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
 	if (max_tr.buffer)
 		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+	/*
+	 * New clock may not be consistent with the previous clock.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&global_trace);
+	if (max_tr.buffer)
+		tracing_reset_online_cpus(&max_tr);
+
 	mutex_unlock(&trace_types_lock);
 
 	*fpos += cnt;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
                     ` (2 preceding siblings ...)
  2012-09-12 23:31   ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
@ 2012-09-12 23:32   ` David Sharp
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-12 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.

Google-Bug-Id: 6980623
---
 arch/x86/include/asm/trace_clock.h |    2 +-
 include/linux/ftrace_event.h       |    6 +++
 kernel/trace/trace.c               |   15 +++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   82 +++++++++++++++++++++++++-----------
 5 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 0b1f391..4af2eb7 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"tsc" },
+	{ trace_clock_x86_tsc,	"tsc",		.in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..3fe4c5b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..c86d62a 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,57 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	int ret;
+	struct trace_seq *s = &iter->seq;
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	unsigned long mark_thresh;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		ret = trace_seq_printf(
+				s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
+				ns2usecs(iter->ts),
+				abs_ts / USEC_PER_MSEC,
+				abs_ts % USEC_PER_MSEC,
+				rel_ts / USEC_PER_MSEC,
+				rel_ts % USEC_PER_MSEC);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
+
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-12 23:31   ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock David Sharp
@ 2012-09-13  1:15     ` Masami Hiramatsu
  2012-09-13  1:23     ` Steven Rostedt
  1 sibling, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2012-09-13  1:15 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Steven Rostedt

(2012/09/13 8:31), David Sharp wrote:
> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.

Yeah, this is what I need!

BTW, could you add your signed-off-by signature to your patches?

Thank you,

> 
> v2:
> Move arch-specific bits out of generic code.
> 
> Google-Bug-Id: 6980623
> ---
>  arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
>  arch/x86/kernel/Makefile           |    1 +
>  arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
>  include/asm-generic/trace_clock.h  |   15 +++++++++++++++
>  include/linux/trace_clock.h        |    2 ++
>  kernel/trace/trace.c               |    1 +
>  6 files changed, 55 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace_clock.h
>  create mode 100644 arch/x86/kernel/trace_clock.c
>  create mode 100644 include/asm-generic/trace_clock.h
> 
> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
> new file mode 100644
> index 0000000..0b1f391
> --- /dev/null
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_X86_TRACE_CLOCK_H
> +#define _ASM_X86_TRACE_CLOCK_H
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_TSC
> +
> +extern u64 notrace trace_clock_x86_tsc(void);
> +
> +# define ARCH_TRACE_CLOCKS \
> +	{ trace_clock_x86_tsc,	"tsc" },
> +
> +#endif
> +
> +#endif  /* _ASM_X86_TRACE_CLOCK_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8215e56..0ee9344 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>  obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
> +obj-$(CONFIG_X86_TSC)		+= trace_clock.o
>  obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
>  obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
>  obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
> diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
> new file mode 100644
> index 0000000..b8959f8
> --- /dev/null
> +++ b/arch/x86/kernel/trace_clock.c
> @@ -0,0 +1,20 @@
> +/*
> + * X86 trace clocks
> + */
> +#include <asm/trace_clock.h>
> +#include <asm/barrier.h>
> +#include <asm/msr.h>
> +
> +/*
> + * trace_clock_x86_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_x86_tsc(void)
> +{
> +	u64 ret;
> +	rdtsc_barrier();
> +	rdtscll(ret);
> +
> +	return ret;
> +}
> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
> new file mode 100644
> index 0000000..648cdcd
> --- /dev/null
> +++ b/include/asm-generic/trace_clock.h
> @@ -0,0 +1,15 @@
> +/*
> + * Arch-specific trace clocks.
> + */
> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
> +#define _ASM_GENERIC_TRACE_CLOCK_H
> +
> +/*
> + * Additional trace clocks added to the trace_clocks
> + * array in kernel/trace/trace.c
> + */
> +#ifndef ARCH_TRACE_CLOCKS
> +# define ARCH_TRACE_CLOCKS
> +#endif
> +
> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..d563f37 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -12,6 +12,8 @@
>  #include <linux/compiler.h>
>  #include <linux/types.h>
>  
> +#include <asm/trace_clock.h>
> +
>  extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..92fb08e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,7 @@ static struct {
>  	{ trace_clock_local,	"local" },
>  	{ trace_clock_global,	"global" },
>  	{ trace_clock_counter,	"counter" },
> +	ARCH_TRACE_CLOCKS
>  };
>  
>  int trace_clock_id;
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-12 23:31   ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock David Sharp
  2012-09-13  1:15     ` Masami Hiramatsu
@ 2012-09-13  1:23     ` Steven Rostedt
  2012-09-18 21:37       ` David Sharp
  2012-09-18 21:46       ` David Sharp
  1 sibling, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2012-09-13  1:23 UTC (permalink / raw)
  To: David Sharp
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

This change will require an ack from one of the x86 maintainers.

For reference to this patch, please see this thread:

 https://lkml.org/lkml/2012/9/11/572

On Wed, 2012-09-12 at 16:31 -0700, David Sharp wrote:
> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.
> 
> v2:
> Move arch-specific bits out of generic code.

As Masami pointed out, please add a SOB to your patches.

Thanks,

-- Steve

> 
> Google-Bug-Id: 6980623
> ---
>  arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
>  arch/x86/kernel/Makefile           |    1 +
>  arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
>  include/asm-generic/trace_clock.h  |   15 +++++++++++++++
>  include/linux/trace_clock.h        |    2 ++
>  kernel/trace/trace.c               |    1 +
>  6 files changed, 55 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace_clock.h
>  create mode 100644 arch/x86/kernel/trace_clock.c
>  create mode 100644 include/asm-generic/trace_clock.h
> 
> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
> new file mode 100644
> index 0000000..0b1f391
> --- /dev/null
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_X86_TRACE_CLOCK_H
> +#define _ASM_X86_TRACE_CLOCK_H
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_TSC
> +
> +extern u64 notrace trace_clock_x86_tsc(void);
> +
> +# define ARCH_TRACE_CLOCKS \
> +	{ trace_clock_x86_tsc,	"tsc" },
> +
> +#endif
> +
> +#endif  /* _ASM_X86_TRACE_CLOCK_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8215e56..0ee9344 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>  obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
> +obj-$(CONFIG_X86_TSC)		+= trace_clock.o
>  obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
>  obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
>  obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
> diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
> new file mode 100644
> index 0000000..b8959f8
> --- /dev/null
> +++ b/arch/x86/kernel/trace_clock.c
> @@ -0,0 +1,20 @@
> +/*
> + * X86 trace clocks
> + */
> +#include <asm/trace_clock.h>
> +#include <asm/barrier.h>
> +#include <asm/msr.h>
> +
> +/*
> + * trace_clock_x86_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_x86_tsc(void)
> +{
> +	u64 ret;
> +	rdtsc_barrier();
> +	rdtscll(ret);
> +
> +	return ret;
> +}
> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
> new file mode 100644
> index 0000000..648cdcd
> --- /dev/null
> +++ b/include/asm-generic/trace_clock.h
> @@ -0,0 +1,15 @@
> +/*
> + * Arch-specific trace clocks.
> + */
> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
> +#define _ASM_GENERIC_TRACE_CLOCK_H
> +
> +/*
> + * Additional trace clocks added to the trace_clocks
> + * array in kernel/trace/trace.c
> + */
> +#ifndef ARCH_TRACE_CLOCKS
> +# define ARCH_TRACE_CLOCKS
> +#endif
> +
> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..d563f37 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -12,6 +12,8 @@
>  #include <linux/compiler.h>
>  #include <linux/types.h>
>  
> +#include <asm/trace_clock.h>
> +
>  extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..92fb08e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,7 @@ static struct {
>  	{ trace_clock_local,	"local" },
>  	{ trace_clock_global,	"global" },
>  	{ trace_clock_counter,	"counter" },
> +	ARCH_TRACE_CLOCKS
>  };
>  
>  int trace_clock_id;



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-13  1:23     ` Steven Rostedt
@ 2012-09-18 21:37       ` David Sharp
  2012-09-18 21:37         ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
                           ` (3 more replies)
  2012-09-18 21:46       ` David Sharp
  1 sibling, 4 replies; 37+ messages in thread
From: David Sharp @ 2012-09-18 21:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Sharp, Steven Rostedt, Masami Hiramatsu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
---
 arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
 include/asm-generic/trace_clock.h  |   15 +++++++++++++++
 include/linux/trace_clock.h        |    2 ++
 kernel/trace/trace.c               |    1 +
 6 files changed, 55 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 include/asm-generic/trace_clock.h

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
new file mode 100644
index 0000000..0b1f391
--- /dev/null
+++ b/arch/x86/include/asm/trace_clock.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_TRACE_CLOCK_H
+#define _ASM_X86_TRACE_CLOCK_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_TSC
+
+extern u64 notrace trace_clock_x86_tsc(void);
+
+# define ARCH_TRACE_CLOCKS \
+	{ trace_clock_x86_tsc,	"tsc" },
+
+#endif
+
+#endif  /* _ASM_X86_TRACE_CLOCK_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8215e56..0ee9344 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
+obj-$(CONFIG_X86_TSC)		+= trace_clock.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
new file mode 100644
index 0000000..b8959f8
--- /dev/null
+++ b/arch/x86/kernel/trace_clock.c
@@ -0,0 +1,20 @@
+/*
+ * X86 trace clocks
+ */
+#include <asm/trace_clock.h>
+#include <asm/barrier.h>
+#include <asm/msr.h>
+
+/*
+ * trace_clock_x86_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_x86_tsc(void)
+{
+	u64 ret;
+	rdtsc_barrier();
+	rdtscll(ret);
+
+	return ret;
+}
diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
new file mode 100644
index 0000000..648cdcd
--- /dev/null
+++ b/include/asm-generic/trace_clock.h
@@ -0,0 +1,15 @@
+/*
+ * Arch-specific trace clocks.
+ */
+#ifndef _ASM_GENERIC_TRACE_CLOCK_H
+#define _ASM_GENERIC_TRACE_CLOCK_H
+
+/*
+ * Additional trace clocks added to the trace_clocks
+ * array in kernel/trace/trace.c
+ */
+#ifndef ARCH_TRACE_CLOCKS
+# define ARCH_TRACE_CLOCKS
+#endif
+
+#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..d563f37 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -12,6 +12,8 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 
+#include <asm/trace_clock.h>
+
 extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..92fb08e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,7 @@ static struct {
 	{ trace_clock_local,	"local" },
 	{ trace_clock_global,	"global" },
 	{ trace_clock_counter,	"counter" },
+	ARCH_TRACE_CLOCKS
 };
 
 int trace_clock_id;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks
  2012-09-18 21:37       ` David Sharp
@ 2012-09-18 21:37         ` David Sharp
  2012-09-18 21:37         ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-18 21:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/trace/trace.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 92fb08e..4e26df3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4012,6 +4012,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
 	if (max_tr.buffer)
 		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+	/*
+	 * New clock may not be consistent with the previous clock.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&global_trace);
+	if (max_tr.buffer)
+		tracing_reset_online_cpus(&max_tr);
+
 	mutex_unlock(&trace_types_lock);
 
 	*fpos += cnt;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-18 21:37       ` David Sharp
  2012-09-18 21:37         ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
@ 2012-09-18 21:37         ` David Sharp
  2012-09-19  2:10         ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
  2012-09-20  8:55         ` Ingo Molnar
  3 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-18 21:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 arch/x86/include/asm/trace_clock.h |    2 +-
 include/linux/ftrace_event.h       |    6 +++
 kernel/trace/trace.c               |   15 +++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   82 +++++++++++++++++++++++++-----------
 5 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 0b1f391..4af2eb7 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"tsc" },
+	{ trace_clock_x86_tsc,	"tsc",		.in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..3fe4c5b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..c86d62a 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,57 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	int ret;
+	struct trace_seq *s = &iter->seq;
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	unsigned long mark_thresh;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		ret = trace_seq_printf(
+				s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
+				ns2usecs(iter->ts),
+				abs_ts / USEC_PER_MSEC,
+				abs_ts % USEC_PER_MSEC,
+				rel_ts / USEC_PER_MSEC,
+				rel_ts % USEC_PER_MSEC);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
+
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-13  1:23     ` Steven Rostedt
  2012-09-18 21:37       ` David Sharp
@ 2012-09-18 21:46       ` David Sharp
  1 sibling, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-18 21:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Wed, Sep 12, 2012 at 6:23 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> This change will require an ack from one of the x86 maintainers.
>
> For reference to this patch, please see this thread:
>
>  https://lkml.org/lkml/2012/9/11/572
>
> On Wed, 2012-09-12 at 16:31 -0700, David Sharp wrote:
>> In order to promote interoperability between userspace tracers and ftrace,
>> add a trace_clock that reports raw TSC values which will then be recorded
>> in the ring buffer. Userspace tracers that also record TSCs are then on
>> exactly the same time base as the kernel and events can be unambiguously
>> interlaced.
>>
>> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
>> timestamp values.
>>
>> v2:
>> Move arch-specific bits out of generic code.
>
> As Masami pointed out, please add a SOB to your patches.

Sorry about that. I guess that's what happens when I rush before going
on vacation. Apparently I did `git format-patch --signoff` the first
time instead of in the commit.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-18 21:37       ` David Sharp
  2012-09-18 21:37         ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
  2012-09-18 21:37         ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
@ 2012-09-19  2:10         ` Steven Rostedt
  2012-09-20  8:55         ` Ingo Molnar
  3 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2012-09-19  2:10 UTC (permalink / raw)
  To: David Sharp
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Tue, 2012-09-18 at 14:37 -0700, David Sharp wrote:
> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.
> 
> v2:
> Move arch-specific bits out of generic code.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp <dhsharp@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>


> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@linux.intel.com>

Again this needs an Acked-by from one of the above.

-- Steve


> ---
>  arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
>  arch/x86/kernel/Makefile           |    1 +
>  arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
>  include/asm-generic/trace_clock.h  |   15 +++++++++++++++
>  include/linux/trace_clock.h        |    2 ++
>  kernel/trace/trace.c               |    1 +
>  6 files changed, 55 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace_clock.h
>  create mode 100644 arch/x86/kernel/trace_clock.c
>  create mode 100644 include/asm-generic/trace_clock.h
> 
> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
> new file mode 100644
> index 0000000..0b1f391
> --- /dev/null
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_X86_TRACE_CLOCK_H
> +#define _ASM_X86_TRACE_CLOCK_H
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_TSC
> +
> +extern u64 notrace trace_clock_x86_tsc(void);
> +
> +# define ARCH_TRACE_CLOCKS \
> +	{ trace_clock_x86_tsc,	"tsc" },
> +
> +#endif
> +
> +#endif  /* _ASM_X86_TRACE_CLOCK_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8215e56..0ee9344 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>  obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
> +obj-$(CONFIG_X86_TSC)		+= trace_clock.o
>  obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
>  obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
>  obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
> diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
> new file mode 100644
> index 0000000..b8959f8
> --- /dev/null
> +++ b/arch/x86/kernel/trace_clock.c
> @@ -0,0 +1,20 @@
> +/*
> + * X86 trace clocks
> + */
> +#include <asm/trace_clock.h>
> +#include <asm/barrier.h>
> +#include <asm/msr.h>
> +
> +/*
> + * trace_clock_x86_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_x86_tsc(void)
> +{
> +	u64 ret;
> +	rdtsc_barrier();
> +	rdtscll(ret);
> +
> +	return ret;
> +}
> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
> new file mode 100644
> index 0000000..648cdcd
> --- /dev/null
> +++ b/include/asm-generic/trace_clock.h
> @@ -0,0 +1,15 @@
> +/*
> + * Arch-specific trace clocks.
> + */
> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
> +#define _ASM_GENERIC_TRACE_CLOCK_H
> +
> +/*
> + * Additional trace clocks added to the trace_clocks
> + * array in kernel/trace/trace.c
> + */
> +#ifndef ARCH_TRACE_CLOCKS
> +# define ARCH_TRACE_CLOCKS
> +#endif
> +
> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..d563f37 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -12,6 +12,8 @@
>  #include <linux/compiler.h>
>  #include <linux/types.h>
>  
> +#include <asm/trace_clock.h>
> +
>  extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..92fb08e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,7 @@ static struct {
>  	{ trace_clock_local,	"local" },
>  	{ trace_clock_global,	"global" },
>  	{ trace_clock_counter,	"counter" },
> +	ARCH_TRACE_CLOCKS
>  };
>  
>  int trace_clock_id;



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-18 21:37       ` David Sharp
                           ` (2 preceding siblings ...)
  2012-09-19  2:10         ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
@ 2012-09-20  8:55         ` Ingo Molnar
  2012-09-20 13:34           ` Steven Rostedt
  2012-09-20 22:53           ` [PATCH v2 " David Sharp
  3 siblings, 2 replies; 37+ messages in thread
From: Ingo Molnar @ 2012-09-20  8:55 UTC (permalink / raw)
  To: David Sharp
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu, Thomas Gleixner,
	H. Peter Anvin


* David Sharp <dhsharp@google.com> wrote:

> In order to promote interoperability between userspace tracers and ftrace,
> add a trace_clock that reports raw TSC values which will then be recorded
> in the ring buffer. Userspace tracers that also record TSCs are then on
> exactly the same time base as the kernel and events can be unambiguously
> interlaced.
> 
> Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
> timestamp values.
> 
> v2:
> Move arch-specific bits out of generic code.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp <dhsharp@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "H. Peter Anvin" <hpa@linux.intel.com>
> ---
>  arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
>  arch/x86/kernel/Makefile           |    1 +
>  arch/x86/kernel/trace_clock.c      |   20 ++++++++++++++++++++
>  include/asm-generic/trace_clock.h  |   15 +++++++++++++++
>  include/linux/trace_clock.h        |    2 ++
>  kernel/trace/trace.c               |    1 +
>  6 files changed, 55 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/trace_clock.h
>  create mode 100644 arch/x86/kernel/trace_clock.c
>  create mode 100644 include/asm-generic/trace_clock.h
> 
> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
> new file mode 100644
> index 0000000..0b1f391
> --- /dev/null
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_X86_TRACE_CLOCK_H
> +#define _ASM_X86_TRACE_CLOCK_H
> +
> +#include <linux/compiler.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_X86_TSC
> +
> +extern u64 notrace trace_clock_x86_tsc(void);
> +
> +# define ARCH_TRACE_CLOCKS \
> +	{ trace_clock_x86_tsc,	"tsc" },

I'd name it "x86-tsc", to make sure the naming is unique. That 
will also enable the addition of other hw clocks, should the 
need arise.

> +
> +#endif
> +
> +#endif  /* _ASM_X86_TRACE_CLOCK_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8215e56..0ee9344 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
>  obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
>  obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
>  obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
> +obj-$(CONFIG_X86_TSC)		+= trace_clock.o
>  obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
>  obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
>  obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
> diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
> new file mode 100644
> index 0000000..b8959f8
> --- /dev/null
> +++ b/arch/x86/kernel/trace_clock.c
> @@ -0,0 +1,20 @@
> +/*
> + * X86 trace clocks
> + */
> +#include <asm/trace_clock.h>
> +#include <asm/barrier.h>
> +#include <asm/msr.h>
> +
> +/*
> + * trace_clock_x86_tsc(): A clock that is just the cycle counter.
> + *
> + * Unlike the other clocks, this is not in nanoseconds.
> + */
> +u64 notrace trace_clock_x86_tsc(void)
> +{
> +	u64 ret;
> +	rdtsc_barrier();

Missing newline.

> +	rdtscll(ret);
> +
> +	return ret;
> +}
> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
> new file mode 100644
> index 0000000..648cdcd
> --- /dev/null
> +++ b/include/asm-generic/trace_clock.h
> @@ -0,0 +1,15 @@
> +/*
> + * Arch-specific trace clocks.
> + */
> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
> +#define _ASM_GENERIC_TRACE_CLOCK_H

We typically put header guards to the first and last lines of 
the file, that way it can be ignored easily.

> +
> +/*
> + * Additional trace clocks added to the trace_clocks
> + * array in kernel/trace/trace.c

I'd add: "None if the architecture has not defined it."

> + */
> +#ifndef ARCH_TRACE_CLOCKS
> +# define ARCH_TRACE_CLOCKS
> +#endif
> +
> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
> diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
> index 4eb4902..d563f37 100644
> --- a/include/linux/trace_clock.h
> +++ b/include/linux/trace_clock.h
> @@ -12,6 +12,8 @@
>  #include <linux/compiler.h>
>  #include <linux/types.h>
>  
> +#include <asm/trace_clock.h>
> +
>  extern u64 notrace trace_clock_local(void);
>  extern u64 notrace trace_clock(void);
>  extern u64 notrace trace_clock_global(void);
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5c38c81..92fb08e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -480,6 +480,7 @@ static struct {
>  	{ trace_clock_local,	"local" },
>  	{ trace_clock_global,	"global" },
>  	{ trace_clock_counter,	"counter" },
> +	ARCH_TRACE_CLOCKS
>  };
>  

Looks useful and good otherwise:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-20  8:55         ` Ingo Molnar
@ 2012-09-20 13:34           ` Steven Rostedt
  2012-09-20 22:52             ` [PATCH v3 " David Sharp
  2012-09-20 22:53           ` [PATCH v2 " David Sharp
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-20 13:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Sharp, linux-kernel, Masami Hiramatsu, Thomas Gleixner,
	H. Peter Anvin

On Thu, 2012-09-20 at 10:55 +0200, Ingo Molnar wrote:

> Looks useful and good otherwise:
> 
> Acked-by: Ingo Molnar <mingo@kernel.org>
> 

David,

Please submit a v3 with Ingo's updates and I'll go ahead and start
testing it.

Thanks!

-- Steve



^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/3] tracing,x86: add a TSC trace_clock
  2012-09-20 13:34           ` Steven Rostedt
@ 2012-09-20 22:52             ` David Sharp
  2012-09-20 22:52               ` [PATCH v3 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
                                 ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: David Sharp @ 2012-09-20 22:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Sharp, Steven Rostedt, Masami Hiramatsu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/trace_clock.h |   16 ++++++++++++++++
 arch/x86/kernel/Makefile           |    1 +
 arch/x86/kernel/trace_clock.c      |   21 +++++++++++++++++++++
 include/asm-generic/trace_clock.h  |   16 ++++++++++++++++
 include/linux/trace_clock.h        |    2 ++
 kernel/trace/trace.c               |    1 +
 6 files changed, 57 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 include/asm-generic/trace_clock.h

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
new file mode 100644
index 0000000..7ee0d8c
--- /dev/null
+++ b/arch/x86/include/asm/trace_clock.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_X86_TRACE_CLOCK_H
+#define _ASM_X86_TRACE_CLOCK_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_X86_TSC
+
+extern u64 notrace trace_clock_x86_tsc(void);
+
+# define ARCH_TRACE_CLOCKS \
+	{ trace_clock_x86_tsc,	"x86-tsc" },
+
+#endif
+
+#endif  /* _ASM_X86_TRACE_CLOCK_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8215e56..0ee9344 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS)	+= reboot_fixups_32.o
 obj-$(CONFIG_DYNAMIC_FTRACE)	+= ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)	+= ftrace.o
+obj-$(CONFIG_X86_TSC)		+= trace_clock.o
 obj-$(CONFIG_KEXEC)		+= machine_kexec_$(BITS).o
 obj-$(CONFIG_KEXEC)		+= relocate_kernel_$(BITS).o crash.o
 obj-$(CONFIG_CRASH_DUMP)	+= crash_dump_$(BITS).o
diff --git a/arch/x86/kernel/trace_clock.c b/arch/x86/kernel/trace_clock.c
new file mode 100644
index 0000000..25b9937
--- /dev/null
+++ b/arch/x86/kernel/trace_clock.c
@@ -0,0 +1,21 @@
+/*
+ * X86 trace clocks
+ */
+#include <asm/trace_clock.h>
+#include <asm/barrier.h>
+#include <asm/msr.h>
+
+/*
+ * trace_clock_x86_tsc(): A clock that is just the cycle counter.
+ *
+ * Unlike the other clocks, this is not in nanoseconds.
+ */
+u64 notrace trace_clock_x86_tsc(void)
+{
+	u64 ret;
+
+	rdtsc_barrier();
+	rdtscll(ret);
+
+	return ret;
+}
diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
new file mode 100644
index 0000000..6726f1b
--- /dev/null
+++ b/include/asm-generic/trace_clock.h
@@ -0,0 +1,16 @@
+#ifndef _ASM_GENERIC_TRACE_CLOCK_H
+#define _ASM_GENERIC_TRACE_CLOCK_H
+/*
+ * Arch-specific trace clocks.
+ */
+
+/*
+ * Additional trace clocks added to the trace_clocks
+ * array in kernel/trace/trace.c
+ * None if the architecture has not defined it.
+ */
+#ifndef ARCH_TRACE_CLOCKS
+# define ARCH_TRACE_CLOCKS
+#endif
+
+#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 4eb4902..d563f37 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -12,6 +12,8 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 
+#include <asm/trace_clock.h>
+
 extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
 extern u64 notrace trace_clock_global(void);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5c38c81..92fb08e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -480,6 +480,7 @@ static struct {
 	{ trace_clock_local,	"local" },
 	{ trace_clock_global,	"global" },
 	{ trace_clock_counter,	"counter" },
+	ARCH_TRACE_CLOCKS
 };
 
 int trace_clock_id;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/3] tracing: reset ring buffer when changing trace_clocks
  2012-09-20 22:52             ` [PATCH v3 " David Sharp
@ 2012-09-20 22:52               ` David Sharp
  2012-09-20 22:52               ` [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
  2012-09-29  3:15               ` [PATCH v3 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
  2 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-20 22:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 kernel/trace/trace.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 92fb08e..4e26df3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4012,6 +4012,14 @@ static ssize_t tracing_clock_write(struct file *filp, const char __user *ubuf,
 	if (max_tr.buffer)
 		ring_buffer_set_clock(max_tr.buffer, trace_clocks[i].func);
 
+	/*
+	 * New clock may not be consistent with the previous clock.
+	 * Reset the buffer so that it doesn't have incomparable timestamps.
+	 */
+	tracing_reset_online_cpus(&global_trace);
+	if (max_tr.buffer)
+		tracing_reset_online_cpus(&max_tr);
+
 	mutex_unlock(&trace_types_lock);
 
 	*fpos += cnt;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-20 22:52             ` [PATCH v3 " David Sharp
  2012-09-20 22:52               ` [PATCH v3 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
@ 2012-09-20 22:52               ` David Sharp
  2012-09-25  3:27                 ` Steven Rostedt
  2012-09-29  3:15               ` [PATCH v3 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
  2 siblings, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-09-20 22:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 arch/x86/include/asm/trace_clock.h |    2 +-
 include/linux/ftrace_event.h       |    6 +++
 kernel/trace/trace.c               |   15 +++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   82 +++++++++++++++++++++++++-----------
 5 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"x86-tsc" },
+	{ trace_clock_x86_tsc,	"x86-tsc",	.in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..3fe4c5b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..c86d62a 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,57 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	int ret;
+	struct trace_seq *s = &iter->seq;
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	unsigned long mark_thresh;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		ret = trace_seq_printf(
+				s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
+				ns2usecs(iter->ts),
+				abs_ts / USEC_PER_MSEC,
+				abs_ts % USEC_PER_MSEC,
+				rel_ts / USEC_PER_MSEC,
+				rel_ts % USEC_PER_MSEC);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
+
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 1/3] tracing,x86: add a TSC trace_clock
  2012-09-20  8:55         ` Ingo Molnar
  2012-09-20 13:34           ` Steven Rostedt
@ 2012-09-20 22:53           ` David Sharp
  1 sibling, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-20 22:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu, Thomas Gleixner,
	H. Peter Anvin

On Thu, Sep 20, 2012 at 1:55 AM, Ingo Molnar <mingo@kernel.org> wrote:
> Looks useful and good otherwise:
>
> Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks! Updated with all your comments.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-20 22:52               ` [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
@ 2012-09-25  3:27                 ` Steven Rostedt
  2012-09-25 20:48                   ` David Sharp
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-25  3:27 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Masami Hiramatsu

On Thu, 2012-09-20 at 15:52 -0700, David Sharp wrote:
> With the addition of the "tsc" clock, formatting timestamps to look like
> fractional seconds is misleading. Mark clocks as either in nanoseconds or
> not, and format non-nanosecond timestamps as decimal integers.

I got this:

/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
/work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'

when building for x86_32.

> +	int ret;
> +	struct trace_seq *s = &iter->seq;
> +	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
> +	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
> +	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
> +	unsigned long long rel_ts = next_ts - iter->ts;
> +	unsigned long mark_thresh;
> +
> +	if (in_ns) {
> +		abs_ts = ns2usecs(abs_ts);
> +		rel_ts = ns2usecs(rel_ts);
> +		mark_thresh = preempt_mark_thresh_us;
> +	} else
> +		mark_thresh = preempt_mark_thresh_cycles;
> +
> +	if (verbose && in_ns) {
> +		ret = trace_seq_printf(
> +				s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
> +				ns2usecs(iter->ts),
> +				abs_ts / USEC_PER_MSEC,
> +				abs_ts % USEC_PER_MSEC,
> +				rel_ts / USEC_PER_MSEC,

You can't divide 64 bit numbers in the kernel. It breaks on 32bit archs.

-- Steve

> +				rel_ts % USEC_PER_MSEC);
> +	} else if (verbose && !in_ns) {
> +		ret = trace_seq_printf(
> +				s, "[%016llx] %lld (+%lld): ",
> +				iter->ts, abs_ts, rel_ts);
> +	} else { /* !verbose */
> +		ret = trace_seq_printf(
> +				s, " %4lld%s%c: ",
> +				abs_ts,
> +				in_ns ? "us" : "",
> +				rel_ts > mark_thresh ? '!' :
> +				  rel_ts > 1 ? '+' : ' ');
> +	}
> +	return ret;
>  }
>  
>  int trace_print_context(struct trace_iterator *iter)
>  {
>  	struct trace_seq *s = &iter->seq;
>  	struct trace_entry *entry = iter->ent;
> -	unsigned long long t = ns2usecs(iter->ts);
> -	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
> -	unsigned long secs = (unsigned long)t;
> +	unsigned long long t;
> +	unsigned long secs, usec_rem;
>  	char comm[TASK_COMM_LEN];
>  	int ret;
>  
> @@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
>  			return 0;
>  	}
>  
> -	return trace_seq_printf(s, " %5lu.%06lu: ",
> -				secs, usec_rem);
> +	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
> +		t = ns2usecs(iter->ts);
> +		usec_rem = do_div(t, USEC_PER_SEC);
> +		secs = (unsigned long)t;
> +		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
> +	} else
> +		return trace_seq_printf(s, "%12llu: ", iter->ts);
>  }
>  
>  int trace_print_lat_context(struct trace_iterator *iter)
> @@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
>  			   *next_entry = trace_find_next_entry(iter, NULL,
>  							       &next_ts);
>  	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
> -	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
> -	unsigned long rel_usecs;
> +
>  
>  	/* Restore the original ent_size */
>  	iter->ent_size = ent_size;
>  
>  	if (!next_entry)
>  		next_ts = iter->ts;
> -	rel_usecs = ns2usecs(next_ts - iter->ts);
>  
>  	if (verbose) {
>  		char comm[TASK_COMM_LEN];
>  
>  		trace_find_cmdline(entry->pid, comm);
>  
> -		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
> -				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
> -				       entry->pid, iter->cpu, entry->flags,
> -				       entry->preempt_count, iter->idx,
> -				       ns2usecs(iter->ts),
> -				       abs_usecs / USEC_PER_MSEC,
> -				       abs_usecs % USEC_PER_MSEC,
> -				       rel_usecs / USEC_PER_MSEC,
> -				       rel_usecs % USEC_PER_MSEC);
> +		ret = trace_seq_printf(
> +				s, "%16s %5d %3d %d %08x %08lx ",
> +				comm, entry->pid, iter->cpu, entry->flags,
> +				entry->preempt_count, iter->idx);
>  	} else {
>  		ret = lat_print_generic(s, entry, iter->cpu);
> -		if (ret)
> -			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
>  	}
>  
> +	if (ret)
> +		ret = lat_print_timestamp(iter, next_ts);
> +
>  	return ret;
>  }
>  



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25  3:27                 ` Steven Rostedt
@ 2012-09-25 20:48                   ` David Sharp
  2012-09-25 20:49                     ` [PATCH v4 " David Sharp
  0 siblings, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-09-25 20:48 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Masami Hiramatsu

On Mon, Sep 24, 2012 at 8:27 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2012-09-20 at 15:52 -0700, David Sharp wrote:
>> With the addition of the "tsc" clock, formatting timestamps to look like
>> fractional seconds is misleading. Mark clocks as either in nanoseconds or
>> not, and format non-nanosecond timestamps as decimal integers.
>
> I got this:
>
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
>
> when building for x86_32.
>
>> +     int ret;
>> +     struct trace_seq *s = &iter->seq;
>> +     unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
>> +     unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
>> +     unsigned long long abs_ts = iter->ts - iter->tr->time_start;
>> +     unsigned long long rel_ts = next_ts - iter->ts;
>> +     unsigned long mark_thresh;
>> +
>> +     if (in_ns) {
>> +             abs_ts = ns2usecs(abs_ts);
>> +             rel_ts = ns2usecs(rel_ts);
>> +             mark_thresh = preempt_mark_thresh_us;
>> +     } else
>> +             mark_thresh = preempt_mark_thresh_cycles;
>> +
>> +     if (verbose && in_ns) {
>> +             ret = trace_seq_printf(
>> +                             s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
>> +                             ns2usecs(iter->ts),
>> +                             abs_ts / USEC_PER_MSEC,
>> +                             abs_ts % USEC_PER_MSEC,
>> +                             rel_ts / USEC_PER_MSEC,
>
> You can't divide 64 bit numbers in the kernel. It breaks on 32bit archs.

I didn't realize that. Sorry. Not sure why I changed them to long long
in the first place.

Although, that does mean that it is currently overflowing every 4.29
seconds. I may as well fix that.

>
> -- Steve
>
>> +                             rel_ts % USEC_PER_MSEC);
>> +     } else if (verbose && !in_ns) {
>> +             ret = trace_seq_printf(
>> +                             s, "[%016llx] %lld (+%lld): ",
>> +                             iter->ts, abs_ts, rel_ts);
>> +     } else { /* !verbose */
>> +             ret = trace_seq_printf(
>> +                             s, " %4lld%s%c: ",
>> +                             abs_ts,
>> +                             in_ns ? "us" : "",
>> +                             rel_ts > mark_thresh ? '!' :
>> +                               rel_ts > 1 ? '+' : ' ');
>> +     }
>> +     return ret;
>>  }
>>
>>  int trace_print_context(struct trace_iterator *iter)
>>  {
>>       struct trace_seq *s = &iter->seq;
>>       struct trace_entry *entry = iter->ent;
>> -     unsigned long long t = ns2usecs(iter->ts);
>> -     unsigned long usec_rem = do_div(t, USEC_PER_SEC);
>> -     unsigned long secs = (unsigned long)t;
>> +     unsigned long long t;
>> +     unsigned long secs, usec_rem;
>>       char comm[TASK_COMM_LEN];
>>       int ret;
>>
>> @@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
>>                       return 0;
>>       }
>>
>> -     return trace_seq_printf(s, " %5lu.%06lu: ",
>> -                             secs, usec_rem);
>> +     if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
>> +             t = ns2usecs(iter->ts);
>> +             usec_rem = do_div(t, USEC_PER_SEC);
>> +             secs = (unsigned long)t;
>> +             return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
>> +     } else
>> +             return trace_seq_printf(s, "%12llu: ", iter->ts);
>>  }
>>
>>  int trace_print_lat_context(struct trace_iterator *iter)
>> @@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
>>                          *next_entry = trace_find_next_entry(iter, NULL,
>>                                                              &next_ts);
>>       unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
>> -     unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
>> -     unsigned long rel_usecs;
>> +
>>
>>       /* Restore the original ent_size */
>>       iter->ent_size = ent_size;
>>
>>       if (!next_entry)
>>               next_ts = iter->ts;
>> -     rel_usecs = ns2usecs(next_ts - iter->ts);
>>
>>       if (verbose) {
>>               char comm[TASK_COMM_LEN];
>>
>>               trace_find_cmdline(entry->pid, comm);
>>
>> -             ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
>> -                                    " %ld.%03ldms (+%ld.%03ldms): ", comm,
>> -                                    entry->pid, iter->cpu, entry->flags,
>> -                                    entry->preempt_count, iter->idx,
>> -                                    ns2usecs(iter->ts),
>> -                                    abs_usecs / USEC_PER_MSEC,
>> -                                    abs_usecs % USEC_PER_MSEC,
>> -                                    rel_usecs / USEC_PER_MSEC,
>> -                                    rel_usecs % USEC_PER_MSEC);
>> +             ret = trace_seq_printf(
>> +                             s, "%16s %5d %3d %d %08x %08lx ",
>> +                             comm, entry->pid, iter->cpu, entry->flags,
>> +                             entry->preempt_count, iter->idx);
>>       } else {
>>               ret = lat_print_generic(s, entry, iter->cpu);
>> -             if (ret)
>> -                     ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
>>       }
>>
>> +     if (ret)
>> +             ret = lat_print_timestamp(iter, next_ts);
>> +
>>       return ret;
>>  }
>>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 20:48                   ` David Sharp
@ 2012-09-25 20:49                     ` David Sharp
  2012-09-25 21:42                       ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-09-25 20:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 arch/x86/include/asm/trace_clock.h |    2 +-
 include/linux/ftrace_event.h       |    6 +++
 kernel/trace/trace.c               |   15 +++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   84 +++++++++++++++++++++++++-----------
 5 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"x86-tsc" },
+	{ trace_clock_x86_tsc,	"x86-tsc",	.in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..3fe4c5b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..ca640ff 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,59 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	int ret;
+	struct trace_seq *s = &iter->seq;
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	unsigned long mark_thresh;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		unsigned long abs_msec = abs_ts;
+		unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
+		unsigned long rel_msec = rel_ts;
+		unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
+				ns2usecs(iter->ts),
+				abs_msec, abs_usec,
+				rel_msec, rel_usec);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +679,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +699,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
+
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 20:49                     ` [PATCH v4 " David Sharp
@ 2012-09-25 21:42                       ` Steven Rostedt
  2012-09-25 22:29                         ` David Sharp
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-25 21:42 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Masami Hiramatsu

Sorry, I should have been more picky before. I haven't totally tested
this yet.

On Tue, 2012-09-25 at 13:49 -0700, David Sharp wrote:
> With the addition of the "tsc" clock, formatting timestamps to look like
> fractional seconds is misleading. Mark clocks as either in nanoseconds or
> not, and format non-nanosecond timestamps as decimal integers.
> 
> Tested:
> $ cd /sys/kernel/debug/tracing/
> $ cat trace_clock
> [local] global tsc
> $ echo sched_switch > set_event
> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
> $ cat trace
>           <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>            sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> $ echo 1 > options/latency-format
> $ cat trace
>   <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>    sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> $ echo tsc > trace_clock
> $ cat trace
> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
> $ echo 0 > options/latency-format
> $ cat trace
>           <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>            sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> echo 1 > options/latency-format
> $ cat trace
>   <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>    sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> 
> v2:
> Move arch-specific bits out of generic code.
> v4:
> Fix x86_32 build due to 64-bit division.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp <dhsharp@google.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
> ---
>  arch/x86/include/asm/trace_clock.h |    2 +-
>  include/linux/ftrace_event.h       |    6 +++
>  kernel/trace/trace.c               |   15 +++++-
>  kernel/trace/trace.h               |    4 --
>  kernel/trace/trace_output.c        |   84 +++++++++++++++++++++++++-----------
>  5 files changed, 78 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
> index 7ee0d8c..45e17f5 100644
> --- a/arch/x86/include/asm/trace_clock.h
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -9,7 +9,7 @@
>  extern u64 notrace trace_clock_x86_tsc(void);
>  
>  # define ARCH_TRACE_CLOCKS \
> -	{ trace_clock_x86_tsc,	"x86-tsc" },
> +	{ trace_clock_x86_tsc,	"x86-tsc",	.in_ns = 0 },
>  
>  #endif
>  
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 642928c..c760670 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -86,6 +86,12 @@ struct trace_iterator {
>  	cpumask_var_t		started;
>  };
>  
> +enum trace_iter_flags {
> +	TRACE_FILE_LAT_FMT	= 1,
> +	TRACE_FILE_ANNOTATE	= 2,
> +	TRACE_FILE_TIME_IN_NS	= 4,
> +};
> +
>  
>  struct trace_event;
>  
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4e26df3..3fe4c5b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -476,10 +476,11 @@ static const char *trace_options[] = {
>  static struct {
>  	u64 (*func)(void);
>  	const char *name;
> +	int in_ns; /* is this clock in nanoseconds? */

Add a few tabs between the ns; and /*


>  } trace_clocks[] = {
> -	{ trace_clock_local,	"local" },
> -	{ trace_clock_global,	"global" },
> -	{ trace_clock_counter,	"counter" },
> +	{ trace_clock_local,	"local",	1 },
> +	{ trace_clock_global,	"global",	1 },
> +	{ trace_clock_counter,	"counter",	0 },
>  	ARCH_TRACE_CLOCKS
>  };
>  
> @@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
>  	if (ring_buffer_overruns(iter->tr->buffer))
>  		iter->iter_flags |= TRACE_FILE_ANNOTATE;
>  
> +	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
> +	if (trace_clocks[trace_clock_id].in_ns)
> +		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
> +
>  	/* stop the trace while dumping */
>  	tracing_stop();
>  
> @@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>  	if (trace_flags & TRACE_ITER_LATENCY_FMT)
>  		iter->iter_flags |= TRACE_FILE_LAT_FMT;
>  
> +	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
> +	if (trace_clocks[trace_clock_id].in_ns)
> +		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
> +
>  	iter->cpu_file = cpu_file;
>  	iter->tr = &global_trace;
>  	mutex_init(&iter->mutex);
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 55e1f7f..84fefed 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
>  int register_tracer(struct tracer *type);
>  void unregister_tracer(struct tracer *type);
>  int is_tracing_stopped(void);
> -enum trace_file_type {
> -	TRACE_FILE_LAT_FMT	= 1,
> -	TRACE_FILE_ANNOTATE	= 2,
> -};
>  
>  extern cpumask_var_t __read_mostly tracing_buffer_mask;
>  
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 123b189..ca640ff 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -610,24 +610,59 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
>  	return trace_print_lat_fmt(s, entry);
>  }
>  
> -static unsigned long preempt_mark_thresh = 100;
> +static unsigned long preempt_mark_thresh_us = 100;
> +/* roughly the same at 2.0GHz: */
> +static unsigned long preempt_mark_thresh_cycles = 200000;
>  
>  static int
> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
> -		    unsigned long rel_usecs)
> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
>  {
> -	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
> -				rel_usecs > preempt_mark_thresh ? '!' :
> -				  rel_usecs > 1 ? '+' : ' ');
> +	int ret;
> +	struct trace_seq *s = &iter->seq;

Move the above down below, to give a nicer aesthetic look.

> +	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
> +	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
> +	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
> +	unsigned long long rel_ts = next_ts - iter->ts;
	  struct trace_seq *s = &iter->seq;
> +	unsigned long mark_thresh;
	  int ret;

> +
> +	if (in_ns) {
> +		abs_ts = ns2usecs(abs_ts);
> +		rel_ts = ns2usecs(rel_ts);
> +		mark_thresh = preempt_mark_thresh_us;
> +	} else
> +		mark_thresh = preempt_mark_thresh_cycles;
> +
> +	if (verbose && in_ns) {
> +		unsigned long abs_msec = abs_ts;
> +		unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
> +		unsigned long rel_msec = rel_ts;
> +		unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);

Either add a space here, or move the declarations to the top of the
function and keep the code part here.

> +		ret = trace_seq_printf(
> +				s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
> +				ns2usecs(iter->ts),
> +				abs_msec, abs_usec,
> +				rel_msec, rel_usec);
> +	} else if (verbose && !in_ns) {
> +		ret = trace_seq_printf(
> +				s, "[%016llx] %lld (+%lld): ",
> +				iter->ts, abs_ts, rel_ts);
> +	} else { /* !verbose */
> +		ret = trace_seq_printf(
> +				s, " %4lld%s%c: ",
> +				abs_ts,
> +				in_ns ? "us" : "",
> +				rel_ts > mark_thresh ? '!' :
> +				  rel_ts > 1 ? '+' : ' ');
> +	}
> +	return ret;
>  }
>  
>  int trace_print_context(struct trace_iterator *iter)
>  {
>  	struct trace_seq *s = &iter->seq;
>  	struct trace_entry *entry = iter->ent;
> -	unsigned long long t = ns2usecs(iter->ts);
> -	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
> -	unsigned long secs = (unsigned long)t;
> +	unsigned long long t;
> +	unsigned long secs, usec_rem;
>  	char comm[TASK_COMM_LEN];
>  	int ret;
>  
> @@ -644,8 +679,13 @@ int trace_print_context(struct trace_iterator *iter)
>  			return 0;
>  	}
>  
> -	return trace_seq_printf(s, " %5lu.%06lu: ",
> -				secs, usec_rem);
> +	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
> +		t = ns2usecs(iter->ts);
> +		usec_rem = do_div(t, USEC_PER_SEC);
> +		secs = (unsigned long)t;
> +		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
> +	} else
> +		return trace_seq_printf(s, "%12llu: ", iter->ts);
>  }
>  
>  int trace_print_lat_context(struct trace_iterator *iter)
> @@ -659,36 +699,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
>  			   *next_entry = trace_find_next_entry(iter, NULL,
>  							       &next_ts);
>  	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
> -	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
> -	unsigned long rel_usecs;
> +

Is this adding an extra newline?

-- Steve

>  
>  	/* Restore the original ent_size */
>  	iter->ent_size = ent_size;
>  
>  	if (!next_entry)
>  		next_ts = iter->ts;
> -	rel_usecs = ns2usecs(next_ts - iter->ts);
>  
>  	if (verbose) {
>  		char comm[TASK_COMM_LEN];
>  
>  		trace_find_cmdline(entry->pid, comm);
>  
> -		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
> -				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
> -				       entry->pid, iter->cpu, entry->flags,
> -				       entry->preempt_count, iter->idx,
> -				       ns2usecs(iter->ts),
> -				       abs_usecs / USEC_PER_MSEC,
> -				       abs_usecs % USEC_PER_MSEC,
> -				       rel_usecs / USEC_PER_MSEC,
> -				       rel_usecs % USEC_PER_MSEC);
> +		ret = trace_seq_printf(
> +				s, "%16s %5d %3d %d %08x %08lx ",
> +				comm, entry->pid, iter->cpu, entry->flags,
> +				entry->preempt_count, iter->idx);
>  	} else {
>  		ret = lat_print_generic(s, entry, iter->cpu);
> -		if (ret)
> -			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
>  	}
>  
> +	if (ret)
> +		ret = lat_print_timestamp(iter, next_ts);
> +
>  	return ret;
>  }
>  



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 21:42                       ` Steven Rostedt
@ 2012-09-25 22:29                         ` David Sharp
  2012-09-25 22:29                           ` David Sharp
  2012-09-25 23:36                           ` Steven Rostedt
  0 siblings, 2 replies; 37+ messages in thread
From: David Sharp @ 2012-09-25 22:29 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Masami Hiramatsu

On Tue, Sep 25, 2012 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> Sorry, I should have been more picky before. I haven't totally tested
> this yet.
>
> On Tue, 2012-09-25 at 13:49 -0700, David Sharp wrote:
>> With the addition of the "tsc" clock, formatting timestamps to look like
>> fractional seconds is misleading. Mark clocks as either in nanoseconds or
>> not, and format non-nanosecond timestamps as decimal integers.
>>
>> Tested:
>> $ cd /sys/kernel/debug/tracing/
>> $ cat trace_clock
>> [local] global tsc
>> $ echo sched_switch > set_event
>> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
>> $ cat trace
>>           <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>>            sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>>   ...
>> $ echo 1 > options/latency-format
>> $ cat trace
>>   <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>>    sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>>   ...
>> $ echo tsc > trace_clock
>> $ cat trace
>> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
>> $ echo 0 > options/latency-format
>> $ cat trace
>>           <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>>            sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>>   ...
>> echo 1 > options/latency-format
>> $ cat trace
>>   <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>>    sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>>   ...
>>
>> v2:
>> Move arch-specific bits out of generic code.
>> v4:
>> Fix x86_32 build due to 64-bit division.
>>
>> Google-Bug-Id: 6980623
>> Signed-off-by: David Sharp <dhsharp@google.com>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
>> ---
>>  arch/x86/include/asm/trace_clock.h |    2 +-
>>  include/linux/ftrace_event.h       |    6 +++
>>  kernel/trace/trace.c               |   15 +++++-
>>  kernel/trace/trace.h               |    4 --
>>  kernel/trace/trace_output.c        |   84 +++++++++++++++++++++++++-----------
>>  5 files changed, 78 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
>> index 7ee0d8c..45e17f5 100644
>> --- a/arch/x86/include/asm/trace_clock.h
>> +++ b/arch/x86/include/asm/trace_clock.h
>> @@ -9,7 +9,7 @@
>>  extern u64 notrace trace_clock_x86_tsc(void);
>>
>>  # define ARCH_TRACE_CLOCKS \
>> -     { trace_clock_x86_tsc,  "x86-tsc" },
>> +     { trace_clock_x86_tsc,  "x86-tsc",      .in_ns = 0 },
>>
>>  #endif
>>
>> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
>> index 642928c..c760670 100644
>> --- a/include/linux/ftrace_event.h
>> +++ b/include/linux/ftrace_event.h
>> @@ -86,6 +86,12 @@ struct trace_iterator {
>>       cpumask_var_t           started;
>>  };
>>
>> +enum trace_iter_flags {
>> +     TRACE_FILE_LAT_FMT      = 1,
>> +     TRACE_FILE_ANNOTATE     = 2,
>> +     TRACE_FILE_TIME_IN_NS   = 4,
>> +};
>> +
>>
>>  struct trace_event;
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 4e26df3..3fe4c5b 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -476,10 +476,11 @@ static const char *trace_options[] = {
>>  static struct {
>>       u64 (*func)(void);
>>       const char *name;
>> +     int in_ns; /* is this clock in nanoseconds? */
>
> Add a few tabs between the ns; and /*

Done.

>
>
>>  } trace_clocks[] = {
>> -     { trace_clock_local,    "local" },
>> -     { trace_clock_global,   "global" },
>> -     { trace_clock_counter,  "counter" },
>> +     { trace_clock_local,    "local",        1 },
>> +     { trace_clock_global,   "global",       1 },
>> +     { trace_clock_counter,  "counter",      0 },
>>       ARCH_TRACE_CLOCKS
>>  };
>>
>> @@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
>>       if (ring_buffer_overruns(iter->tr->buffer))
>>               iter->iter_flags |= TRACE_FILE_ANNOTATE;
>>
>> +     /* Output in nanoseconds only if we are using a clock in nanoseconds. */
>> +     if (trace_clocks[trace_clock_id].in_ns)
>> +             iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
>> +
>>       /* stop the trace while dumping */
>>       tracing_stop();
>>
>> @@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
>>       if (trace_flags & TRACE_ITER_LATENCY_FMT)
>>               iter->iter_flags |= TRACE_FILE_LAT_FMT;
>>
>> +     /* Output in nanoseconds only if we are using a clock in nanoseconds. */
>> +     if (trace_clocks[trace_clock_id].in_ns)
>> +             iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
>> +
>>       iter->cpu_file = cpu_file;
>>       iter->tr = &global_trace;
>>       mutex_init(&iter->mutex);
>> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
>> index 55e1f7f..84fefed 100644
>> --- a/kernel/trace/trace.h
>> +++ b/kernel/trace/trace.h
>> @@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
>>  int register_tracer(struct tracer *type);
>>  void unregister_tracer(struct tracer *type);
>>  int is_tracing_stopped(void);
>> -enum trace_file_type {
>> -     TRACE_FILE_LAT_FMT      = 1,
>> -     TRACE_FILE_ANNOTATE     = 2,
>> -};
>>
>>  extern cpumask_var_t __read_mostly tracing_buffer_mask;
>>
>> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
>> index 123b189..ca640ff 100644
>> --- a/kernel/trace/trace_output.c
>> +++ b/kernel/trace/trace_output.c
>> @@ -610,24 +610,59 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
>>       return trace_print_lat_fmt(s, entry);
>>  }
>>
>> -static unsigned long preempt_mark_thresh = 100;
>> +static unsigned long preempt_mark_thresh_us = 100;
>> +/* roughly the same at 2.0GHz: */
>> +static unsigned long preempt_mark_thresh_cycles = 200000;
>>
>>  static int
>> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
>> -                 unsigned long rel_usecs)
>> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
>>  {
>> -     return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
>> -                             rel_usecs > preempt_mark_thresh ? '!' :
>> -                               rel_usecs > 1 ? '+' : ' ');
>> +     int ret;
>> +     struct trace_seq *s = &iter->seq;
>
> Move the above down below, to give a nicer aesthetic look.

Well, that's a matter of opinion... you're the maintainer. :)  Done.

>
>> +     unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
>> +     unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
>> +     unsigned long long abs_ts = iter->ts - iter->tr->time_start;
>> +     unsigned long long rel_ts = next_ts - iter->ts;
>           struct trace_seq *s = &iter->seq;
>> +     unsigned long mark_thresh;
>           int ret;
>
>> +
>> +     if (in_ns) {
>> +             abs_ts = ns2usecs(abs_ts);
>> +             rel_ts = ns2usecs(rel_ts);
>> +             mark_thresh = preempt_mark_thresh_us;
>> +     } else
>> +             mark_thresh = preempt_mark_thresh_cycles;
>> +
>> +     if (verbose && in_ns) {
>> +             unsigned long abs_msec = abs_ts;
>> +             unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
>> +             unsigned long rel_msec = rel_ts;
>> +             unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
>
> Either add a space here, or move the declarations to the top of the
> function and keep the code part here.

Done.

>
>> +             ret = trace_seq_printf(
>> +                             s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
>> +                             ns2usecs(iter->ts),
>> +                             abs_msec, abs_usec,
>> +                             rel_msec, rel_usec);
>> +     } else if (verbose && !in_ns) {
>> +             ret = trace_seq_printf(
>> +                             s, "[%016llx] %lld (+%lld): ",
>> +                             iter->ts, abs_ts, rel_ts);
>> +     } else { /* !verbose */
>> +             ret = trace_seq_printf(
>> +                             s, " %4lld%s%c: ",
>> +                             abs_ts,
>> +                             in_ns ? "us" : "",
>> +                             rel_ts > mark_thresh ? '!' :
>> +                               rel_ts > 1 ? '+' : ' ');

I just noticed something about this: with x86-tsc clock, this will
always print a '+'. Does it matter? Also, is the 200k cycle threshold
for '!' okay? I guess the counter clock will always end up with rel_ts
== 1, so marks should never appear.

>> +     }
>> +     return ret;
>>  }
>>
>>  int trace_print_context(struct trace_iterator *iter)
>>  {
>>       struct trace_seq *s = &iter->seq;
>>       struct trace_entry *entry = iter->ent;
>> -     unsigned long long t = ns2usecs(iter->ts);
>> -     unsigned long usec_rem = do_div(t, USEC_PER_SEC);
>> -     unsigned long secs = (unsigned long)t;
>> +     unsigned long long t;
>> +     unsigned long secs, usec_rem;
>>       char comm[TASK_COMM_LEN];
>>       int ret;
>>
>> @@ -644,8 +679,13 @@ int trace_print_context(struct trace_iterator *iter)
>>                       return 0;
>>       }
>>
>> -     return trace_seq_printf(s, " %5lu.%06lu: ",
>> -                             secs, usec_rem);
>> +     if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
>> +             t = ns2usecs(iter->ts);
>> +             usec_rem = do_div(t, USEC_PER_SEC);
>> +             secs = (unsigned long)t;
>> +             return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
>> +     } else
>> +             return trace_seq_printf(s, "%12llu: ", iter->ts);
>>  }
>>
>>  int trace_print_lat_context(struct trace_iterator *iter)
>> @@ -659,36 +699,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
>>                          *next_entry = trace_find_next_entry(iter, NULL,
>>                                                              &next_ts);
>>       unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
>> -     unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
>> -     unsigned long rel_usecs;
>> +
>
> Is this adding an extra newline?

Yup. Fixed.

>
> -- Steve
>
>>
>>       /* Restore the original ent_size */
>>       iter->ent_size = ent_size;
>>
>>       if (!next_entry)
>>               next_ts = iter->ts;
>> -     rel_usecs = ns2usecs(next_ts - iter->ts);
>>
>>       if (verbose) {
>>               char comm[TASK_COMM_LEN];
>>
>>               trace_find_cmdline(entry->pid, comm);
>>
>> -             ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
>> -                                    " %ld.%03ldms (+%ld.%03ldms): ", comm,
>> -                                    entry->pid, iter->cpu, entry->flags,
>> -                                    entry->preempt_count, iter->idx,
>> -                                    ns2usecs(iter->ts),
>> -                                    abs_usecs / USEC_PER_MSEC,
>> -                                    abs_usecs % USEC_PER_MSEC,
>> -                                    rel_usecs / USEC_PER_MSEC,
>> -                                    rel_usecs % USEC_PER_MSEC);
>> +             ret = trace_seq_printf(
>> +                             s, "%16s %5d %3d %d %08x %08lx ",
>> +                             comm, entry->pid, iter->cpu, entry->flags,
>> +                             entry->preempt_count, iter->idx);
>>       } else {
>>               ret = lat_print_generic(s, entry, iter->cpu);
>> -             if (ret)
>> -                     ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
>>       }
>>
>> +     if (ret)
>> +             ret = lat_print_timestamp(iter, next_ts);
>> +
>>       return ret;
>>  }
>>
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 22:29                         ` David Sharp
@ 2012-09-25 22:29                           ` David Sharp
  2012-09-26 18:30                             ` Steven Rostedt
  2012-09-25 23:36                           ` Steven Rostedt
  1 sibling, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-09-25 22:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Sharp, Steven Rostedt, Masami Hiramatsu

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
          <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
           sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
          <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
           sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
echo 1 > options/latency-format
$ cat trace
  <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp <dhsharp@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
---
 arch/x86/include/asm/trace_clock.h |    2 +-
 include/linux/ftrace_event.h       |    6 +++
 kernel/trace/trace.c               |   15 +++++-
 kernel/trace/trace.h               |    4 --
 kernel/trace/trace_output.c        |   84 +++++++++++++++++++++++++-----------
 5 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-	{ trace_clock_x86_tsc,	"x86-tsc" },
+	{ trace_clock_x86_tsc,	"x86-tsc",	.in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
 	cpumask_var_t		started;
 };
 
+enum trace_iter_flags {
+	TRACE_FILE_LAT_FMT	= 1,
+	TRACE_FILE_ANNOTATE	= 2,
+	TRACE_FILE_TIME_IN_NS	= 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..cff3427 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
 	u64 (*func)(void);
 	const char *name;
+	int in_ns;		/* is this clock in nanoseconds? */
 } trace_clocks[] = {
-	{ trace_clock_local,	"local" },
-	{ trace_clock_global,	"global" },
-	{ trace_clock_counter,	"counter" },
+	{ trace_clock_local,	"local",	1 },
+	{ trace_clock_global,	"global",	1 },
+	{ trace_clock_counter,	"counter",	0 },
 	ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
 	if (ring_buffer_overruns(iter->tr->buffer))
 		iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	/* stop the trace while dumping */
 	tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
 	if (trace_flags & TRACE_ITER_LATENCY_FMT)
 		iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+	/* Output in nanoseconds only if we are using a clock in nanoseconds. */
+	if (trace_clocks[trace_clock_id].in_ns)
+		iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
 	iter->cpu_file = cpu_file;
 	iter->tr = &global_trace;
 	mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -409,10 +409,6 @@ void tracing_start_sched_switch_record(void);
 int register_tracer(struct tracer *type);
 void unregister_tracer(struct tracer *type);
 int is_tracing_stopped(void);
-enum trace_file_type {
-	TRACE_FILE_LAT_FMT	= 1,
-	TRACE_FILE_ANNOTATE	= 2,
-};
 
 extern cpumask_var_t __read_mostly tracing_buffer_mask;
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 123b189..a759896 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -610,24 +610,60 @@ lat_print_generic(struct trace_seq *s, struct trace_entry *entry, int cpu)
 	return trace_print_lat_fmt(s, entry);
 }
 
-static unsigned long preempt_mark_thresh = 100;
+static unsigned long preempt_mark_thresh_us = 100;
+/* roughly the same at 2.0GHz: */
+static unsigned long preempt_mark_thresh_cycles = 200000;
 
 static int
-lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
-		    unsigned long rel_usecs)
+lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
 {
-	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
-				rel_usecs > preempt_mark_thresh ? '!' :
-				  rel_usecs > 1 ? '+' : ' ');
+	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
+	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
+	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
+	unsigned long long rel_ts = next_ts - iter->ts;
+	struct trace_seq *s = &iter->seq;
+	unsigned long mark_thresh;
+	int ret;
+
+	if (in_ns) {
+		abs_ts = ns2usecs(abs_ts);
+		rel_ts = ns2usecs(rel_ts);
+		mark_thresh = preempt_mark_thresh_us;
+	} else
+		mark_thresh = preempt_mark_thresh_cycles;
+
+	if (verbose && in_ns) {
+		unsigned long abs_msec = abs_ts;
+		unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
+		unsigned long rel_msec = rel_ts;
+		unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
+
+		ret = trace_seq_printf(
+				s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
+				ns2usecs(iter->ts),
+				abs_msec, abs_usec,
+				rel_msec, rel_usec);
+	} else if (verbose && !in_ns) {
+		ret = trace_seq_printf(
+				s, "[%016llx] %lld (+%lld): ",
+				iter->ts, abs_ts, rel_ts);
+	} else { /* !verbose */
+		ret = trace_seq_printf(
+				s, " %4lld%s%c: ",
+				abs_ts,
+				in_ns ? "us" : "",
+				rel_ts > mark_thresh ? '!' :
+				  rel_ts > 1 ? '+' : ' ');
+	}
+	return ret;
 }
 
 int trace_print_context(struct trace_iterator *iter)
 {
 	struct trace_seq *s = &iter->seq;
 	struct trace_entry *entry = iter->ent;
-	unsigned long long t = ns2usecs(iter->ts);
-	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
-	unsigned long secs = (unsigned long)t;
+	unsigned long long t;
+	unsigned long secs, usec_rem;
 	char comm[TASK_COMM_LEN];
 	int ret;
 
@@ -644,8 +680,13 @@ int trace_print_context(struct trace_iterator *iter)
 			return 0;
 	}
 
-	return trace_seq_printf(s, " %5lu.%06lu: ",
-				secs, usec_rem);
+	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
+		t = ns2usecs(iter->ts);
+		usec_rem = do_div(t, USEC_PER_SEC);
+		secs = (unsigned long)t;
+		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
+	} else
+		return trace_seq_printf(s, "%12llu: ", iter->ts);
 }
 
 int trace_print_lat_context(struct trace_iterator *iter)
@@ -659,36 +700,29 @@ int trace_print_lat_context(struct trace_iterator *iter)
 			   *next_entry = trace_find_next_entry(iter, NULL,
 							       &next_ts);
 	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
-	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
-	unsigned long rel_usecs;
 
 	/* Restore the original ent_size */
 	iter->ent_size = ent_size;
 
 	if (!next_entry)
 		next_ts = iter->ts;
-	rel_usecs = ns2usecs(next_ts - iter->ts);
 
 	if (verbose) {
 		char comm[TASK_COMM_LEN];
 
 		trace_find_cmdline(entry->pid, comm);
 
-		ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx [%08llx]"
-				       " %ld.%03ldms (+%ld.%03ldms): ", comm,
-				       entry->pid, iter->cpu, entry->flags,
-				       entry->preempt_count, iter->idx,
-				       ns2usecs(iter->ts),
-				       abs_usecs / USEC_PER_MSEC,
-				       abs_usecs % USEC_PER_MSEC,
-				       rel_usecs / USEC_PER_MSEC,
-				       rel_usecs % USEC_PER_MSEC);
+		ret = trace_seq_printf(
+				s, "%16s %5d %3d %d %08x %08lx ",
+				comm, entry->pid, iter->cpu, entry->flags,
+				entry->preempt_count, iter->idx);
 	} else {
 		ret = lat_print_generic(s, entry, iter->cpu);
-		if (ret)
-			ret = lat_print_timestamp(s, abs_usecs, rel_usecs);
 	}
 
+	if (ret)
+		ret = lat_print_timestamp(iter, next_ts);
+
 	return ret;
 }
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 22:29                         ` David Sharp
  2012-09-25 22:29                           ` David Sharp
@ 2012-09-25 23:36                           ` Steven Rostedt
  2012-09-26  0:32                             ` David Sharp
  1 sibling, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-25 23:36 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Masami Hiramatsu

On Tue, 2012-09-25 at 15:29 -0700, David Sharp wrote:


> >> +             ret = trace_seq_printf(
> >> +                             s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
> >> +                             ns2usecs(iter->ts),
> >> +                             abs_msec, abs_usec,
> >> +                             rel_msec, rel_usec);
> >> +     } else if (verbose && !in_ns) {
> >> +             ret = trace_seq_printf(
> >> +                             s, "[%016llx] %lld (+%lld): ",
> >> +                             iter->ts, abs_ts, rel_ts);
> >> +     } else { /* !verbose */
> >> +             ret = trace_seq_printf(
> >> +                             s, " %4lld%s%c: ",
> >> +                             abs_ts,
> >> +                             in_ns ? "us" : "",
> >> +                             rel_ts > mark_thresh ? '!' :
> >> +                               rel_ts > 1 ? '+' : ' ');
> 
> I just noticed something about this: with x86-tsc clock, this will
> always print a '+'. Does it matter? Also, is the 200k cycle threshold
> for '!' okay? I guess the counter clock will always end up with rel_ts
> == 1, so marks should never appear.
> 

Actually, I'm thinking that counters should not add those annotations.
As it just doesn't make sense.

-- Steve



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 23:36                           ` Steven Rostedt
@ 2012-09-26  0:32                             ` David Sharp
  0 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-26  0:32 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Masami Hiramatsu

On Tue, Sep 25, 2012 at 4:36 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-09-25 at 15:29 -0700, David Sharp wrote:
>
>
>> >> +             ret = trace_seq_printf(
>> >> +                             s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
>> >> +                             ns2usecs(iter->ts),
>> >> +                             abs_msec, abs_usec,
>> >> +                             rel_msec, rel_usec);
>> >> +     } else if (verbose && !in_ns) {
>> >> +             ret = trace_seq_printf(
>> >> +                             s, "[%016llx] %lld (+%lld): ",
>> >> +                             iter->ts, abs_ts, rel_ts);
>> >> +     } else { /* !verbose */
>> >> +             ret = trace_seq_printf(
>> >> +                             s, " %4lld%s%c: ",
>> >> +                             abs_ts,
>> >> +                             in_ns ? "us" : "",
>> >> +                             rel_ts > mark_thresh ? '!' :
>> >> +                               rel_ts > 1 ? '+' : ' ');
>>
>> I just noticed something about this: with x86-tsc clock, this will
>> always print a '+'. Does it matter? Also, is the 200k cycle threshold
>> for '!' okay? I guess the counter clock will always end up with rel_ts
>> == 1, so marks should never appear.
>>
>
> Actually, I'm thinking that counters should not add those annotations.
> As it just doesn't make sense.

Right. But they won't appear anyway, since the delta will always be 1.

wait, by "counters" are you including TSC? Surely that makes sense,
since it is a measurement of time.

Eh... sorry I brought it up. I don't really want to change it. I never
use the latency tracer, so I mostly just don't want to break it.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-25 22:29                           ` David Sharp
@ 2012-09-26 18:30                             ` Steven Rostedt
  2012-09-26 21:03                               ` David Sharp
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-26 18:30 UTC (permalink / raw)
  To: David Sharp; +Cc: linux-kernel, Masami Hiramatsu

On Tue, 2012-09-25 at 15:29 -0700, David Sharp wrote:

>  static int
> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
> -		    unsigned long rel_usecs)
> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
>  {
> -	return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
> -				rel_usecs > preempt_mark_thresh ? '!' :
> -				  rel_usecs > 1 ? '+' : ' ');
> +	unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
> +	unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
> +	unsigned long long abs_ts = iter->ts - iter->tr->time_start;
> +	unsigned long long rel_ts = next_ts - iter->ts;
> +	struct trace_seq *s = &iter->seq;
> +	unsigned long mark_thresh;
> +	int ret;
> +
> +	if (in_ns) {
> +		abs_ts = ns2usecs(abs_ts);
> +		rel_ts = ns2usecs(rel_ts);
> +		mark_thresh = preempt_mark_thresh_us;
> +	} else
> +		mark_thresh = preempt_mark_thresh_cycles;
> +
> +	if (verbose && in_ns) {
> +		unsigned long abs_msec = abs_ts;
> +		unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
> +		unsigned long rel_msec = rel_ts;
> +		unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
> +
> +		ret = trace_seq_printf(
> +				s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
> +				ns2usecs(iter->ts),
> +				abs_msec, abs_usec,
> +				rel_msec, rel_usec);
> +	} else if (verbose && !in_ns) {
> +		ret = trace_seq_printf(
> +				s, "[%016llx] %lld (+%lld): ",
> +				iter->ts, abs_ts, rel_ts);
> +	} else { /* !verbose */
> +		ret = trace_seq_printf(
> +				s, " %4lld%s%c: ",
> +				abs_ts,
> +				in_ns ? "us" : "",
> +				rel_ts > mark_thresh ? '!' :
> +				  rel_ts > 1 ? '+' : ' ');

I still think the annotations are meaningless for counters. Even if the
counter is a timer like the tsc, as it does not coincide with real time
(us), I say just don't print it for counters.

This is not helpful:

  <idle>-0       0d..1  168+: trace_hardirqs_off_thunk <-apic_timer_interrupt
  <idle>-0       0d..1  672+: smp_apic_timer_interrupt <-apic_timer_interrupt
  <idle>-0       0d..1 1224+: irq_enter <-smp_apic_timer_interrupt
  <idle>-0       0d..1 1720+: rcu_irq_enter <-irq_enter
  <idle>-0       0d..1 2328+: rcu_idle_exit_common.isra.38 <-rcu_irq_enter
  <idle>-0       0d..1 3016+: local_bh_disable <-irq_enter
  <idle>-0       0d..1 3512+: __local_bh_disable <-irq_enter
  <idle>-0       0d.s1 4120+: tick_check_idle <-irq_enter
  <idle>-0       0d.s1 4632+: tick_check_oneshot_broadcast <-tick_check_idle
  <idle>-0       0d.s1 5296+: ktime_get <-tick_check_idle
  <idle>-0       0d.s1 6104+: tick_nohz_stop_idle <-tick_check_idle
  <idle>-0       0d.s1 6616+: update_ts_time_stats <-tick_nohz_stop_idle
  <idle>-0       0d.s1 7144+: nr_iowait_cpu <-update_ts_time_stats
  <idle>-0       0d.s1 7720+: ktime_get <-sched_clock_tick
  <idle>-0       0d.s1 8480+: touch_softlockup_watchdog <-tick_check_idle
  <idle>-0       0d.s1 9120+: tick_do_update_jiffies64 <-tick_check_idle
  <idle>-0       0d.s1 9648+: _raw_spin_lock <-tick_do_update_jiffies64


> +	}
> +	return ret;
>  }
>  
>  int trace_print_context(struct trace_iterator *iter)
>  {
>  	struct trace_seq *s = &iter->seq;
>  	struct trace_entry *entry = iter->ent;
> -	unsigned long long t = ns2usecs(iter->ts);
> -	unsigned long usec_rem = do_div(t, USEC_PER_SEC);
> -	unsigned long secs = (unsigned long)t;
> +	unsigned long long t;
> +	unsigned long secs, usec_rem;
>  	char comm[TASK_COMM_LEN];
>  	int ret;
>  
> @@ -644,8 +680,13 @@ int trace_print_context(struct trace_iterator *iter)
>  			return 0;
>  	}
>  
> -	return trace_seq_printf(s, " %5lu.%06lu: ",
> -				secs, usec_rem);
> +	if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
> +		t = ns2usecs(iter->ts);
> +		usec_rem = do_div(t, USEC_PER_SEC);
> +		secs = (unsigned long)t;
> +		return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);

You took away the space before %5.

> +	} else
> +		return trace_seq_printf(s, "%12llu: ", iter->ts);

here too.

We end up with this:

          <idle>-0     [002] d.h11968173392984: _raw_spin_lock_irqsave <-try_to_wake_up
          <idle>-0     [002] d.h11968173393704: add_preempt_count <-_raw_spin_lock_irqsave
          <idle>-0     [002] d.h21968173394568: task_waking_fair <-try_to_wake_up
          <idle>-0     [002] d.h21968173395488: select_task_rq_fair <-select_task_rq
          <idle>-0     [002] d.h21968173396240: __rcu_read_lock <-select_task_rq_fair
          <idle>-0     [002] d.h21968173397136: idle_cpu <-select_task_rq_fair
          <idle>-0     [002] d.h21968173397816: __rcu_read_unlock <-select_task_rq_fair
          <idle>-0     [002] d.h21968173398608: cpus_share_cache <-try_to_wake_up
          <idle>-0     [002] d.h21968173399272: _raw_spin_lock <-try_to_wake_up
          <idle>-0     [002] d.h21968173399888: add_preempt_count <-_raw_spin_lock
          <idle>-0     [002] d.h31968173400608: ttwu_do_activate.constprop.87 <-try_to_wake_up
          <idle>-0     [002] d.h31968173401272: activate_task <-ttwu_do_activate.constprop.87
          <idle>-0     [002] d.h31968173401912: enqueue_task <-ttwu_do_activate.constprop.87

-- Steve

>  }
>  
>  int trace_print_lat_context(struct trace_iterator *iter)
> @@ -659,36 +700,29 @@ int trace_print_lat_context(struct trace_iterator *iter)
>  			   *next_entry = trace_find_next_entry(iter, NULL,
>  							       &next_ts);
>  	unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
> -	unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
> -	unsigned long rel_usecs;
>  
>  	/* Restore the original ent_size */
>  	iter->ent_size = ent_size;
>  
>  	if (!next_entry)
>  		next_ts = iter->ts;
> -	rel_usecs = ns2usecs(next_ts - iter->ts);



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.
  2012-09-26 18:30                             ` Steven Rostedt
@ 2012-09-26 21:03                               ` David Sharp
  0 siblings, 0 replies; 37+ messages in thread
From: David Sharp @ 2012-09-26 21:03 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Masami Hiramatsu

On Wed, Sep 26, 2012 at 11:30 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, 2012-09-25 at 15:29 -0700, David Sharp wrote:
>
>>  static int
>> -lat_print_timestamp(struct trace_seq *s, u64 abs_usecs,
>> -                 unsigned long rel_usecs)
>> +lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
>>  {
>> -     return trace_seq_printf(s, " %4lldus%c: ", abs_usecs,
>> -                             rel_usecs > preempt_mark_thresh ? '!' :
>> -                               rel_usecs > 1 ? '+' : ' ');
>> +     unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
>> +     unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
>> +     unsigned long long abs_ts = iter->ts - iter->tr->time_start;
>> +     unsigned long long rel_ts = next_ts - iter->ts;
>> +     struct trace_seq *s = &iter->seq;
>> +     unsigned long mark_thresh;
>> +     int ret;
>> +
>> +     if (in_ns) {
>> +             abs_ts = ns2usecs(abs_ts);
>> +             rel_ts = ns2usecs(rel_ts);
>> +             mark_thresh = preempt_mark_thresh_us;
>> +     } else
>> +             mark_thresh = preempt_mark_thresh_cycles;
>> +
>> +     if (verbose && in_ns) {
>> +             unsigned long abs_msec = abs_ts;
>> +             unsigned long abs_usec = do_div(abs_msec, USEC_PER_MSEC);
>> +             unsigned long rel_msec = rel_ts;
>> +             unsigned long rel_usec = do_div(rel_msec, USEC_PER_MSEC);
>> +
>> +             ret = trace_seq_printf(
>> +                             s, "[%08llx] %ld.%03ldms (+%ld.%03ldms): ",
>> +                             ns2usecs(iter->ts),
>> +                             abs_msec, abs_usec,
>> +                             rel_msec, rel_usec);
>> +     } else if (verbose && !in_ns) {
>> +             ret = trace_seq_printf(
>> +                             s, "[%016llx] %lld (+%lld): ",
>> +                             iter->ts, abs_ts, rel_ts);
>> +     } else { /* !verbose */
>> +             ret = trace_seq_printf(
>> +                             s, " %4lld%s%c: ",
>> +                             abs_ts,
>> +                             in_ns ? "us" : "",
>> +                             rel_ts > mark_thresh ? '!' :
>> +                               rel_ts > 1 ? '+' : ' ');
>
> I still think the annotations are meaningless for counters. Even if the
> counter is a timer like the tsc, as it does not coincide with real time
> (us), I say just don't print it for counters.

Sounds fine to me, as that's the easiest modification anyway. Done.

>
> This is not helpful:
>
> <snip>

Agreed. The other way to fix it would have been to change the "+"
threshold from 1 to ~2000.

>
>
>> +     }
>> +     return ret;
>>  }
>>
>>  int trace_print_context(struct trace_iterator *iter)
>>  {
>>       struct trace_seq *s = &iter->seq;
>>       struct trace_entry *entry = iter->ent;
>> -     unsigned long long t = ns2usecs(iter->ts);
>> -     unsigned long usec_rem = do_div(t, USEC_PER_SEC);
>> -     unsigned long secs = (unsigned long)t;
>> +     unsigned long long t;
>> +     unsigned long secs, usec_rem;
>>       char comm[TASK_COMM_LEN];
>>       int ret;
>>
>> @@ -644,8 +680,13 @@ int trace_print_context(struct trace_iterator *iter)
>>                       return 0;
>>       }
>>
>> -     return trace_seq_printf(s, " %5lu.%06lu: ",
>> -                             secs, usec_rem);
>> +     if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
>> +             t = ns2usecs(iter->ts);
>> +             usec_rem = do_div(t, USEC_PER_SEC);
>> +             secs = (unsigned long)t;
>> +             return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
>
> You took away the space before %5.
>
>> +     } else
>> +             return trace_seq_printf(s, "%12llu: ", iter->ts);
>
> here too.

Don't know why I would have changed that... Fixed.

>>  }
>>
>>  int trace_print_lat_context(struct trace_iterator *iter)
>> @@ -659,36 +700,29 @@ int trace_print_lat_context(struct trace_iterator *iter)
>>                          *next_entry = trace_find_next_entry(iter, NULL,
>>                                                              &next_ts);
>>       unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
>> -     unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
>> -     unsigned long rel_usecs;
>>
>>       /* Restore the original ent_size */
>>       iter->ent_size = ent_size;
>>
>>       if (!next_entry)
>>               next_ts = iter->ts;
>> -     rel_usecs = ns2usecs(next_ts - iter->ts);
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/3] tracing,x86: add a TSC trace_clock
  2012-09-20 22:52             ` [PATCH v3 " David Sharp
  2012-09-20 22:52               ` [PATCH v3 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
  2012-09-20 22:52               ` [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
@ 2012-09-29  3:15               ` Steven Rostedt
  2012-10-02  0:33                 ` David Sharp
  2 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2012-09-29  3:15 UTC (permalink / raw)
  To: David Sharp
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Thu, 2012-09-20 at 15:52 -0700, David Sharp wrote:

> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
> new file mode 100644
> index 0000000..6726f1b
> --- /dev/null
> +++ b/include/asm-generic/trace_clock.h
> @@ -0,0 +1,16 @@
> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
> +#define _ASM_GENERIC_TRACE_CLOCK_H
> +/*
> + * Arch-specific trace clocks.
> + */
> +
> +/*
> + * Additional trace clocks added to the trace_clocks
> + * array in kernel/trace/trace.c
> + * None if the architecture has not defined it.
> + */
> +#ifndef ARCH_TRACE_CLOCKS
> +# define ARCH_TRACE_CLOCKS
> +#endif
> +
> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */

Unfortunately this isn't enough. You need to add a stub in all
arch/*/include/asm/ directories. That has:

--- trace_clock.h:
#ifndef _ASM_TRACE_CLOCK_H
#define _ASM_TRACE_CLOCK_H

#include <asm-generic/trace_clock.h>

#endif
---


Because running this against my cross-compile test, failed on all archs
but x86.

-- Steve



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/3] tracing,x86: add a TSC trace_clock
  2012-09-29  3:15               ` [PATCH v3 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
@ 2012-10-02  0:33                 ` David Sharp
  2012-10-02  0:43                   ` Steven Rostedt
  0 siblings, 1 reply; 37+ messages in thread
From: David Sharp @ 2012-10-02  0:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Fri, Sep 28, 2012 at 8:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2012-09-20 at 15:52 -0700, David Sharp wrote:
>
>> diff --git a/include/asm-generic/trace_clock.h b/include/asm-generic/trace_clock.h
>> new file mode 100644
>> index 0000000..6726f1b
>> --- /dev/null
>> +++ b/include/asm-generic/trace_clock.h
>> @@ -0,0 +1,16 @@
>> +#ifndef _ASM_GENERIC_TRACE_CLOCK_H
>> +#define _ASM_GENERIC_TRACE_CLOCK_H
>> +/*
>> + * Arch-specific trace clocks.
>> + */
>> +
>> +/*
>> + * Additional trace clocks added to the trace_clocks
>> + * array in kernel/trace/trace.c
>> + * None if the architecture has not defined it.
>> + */
>> +#ifndef ARCH_TRACE_CLOCKS
>> +# define ARCH_TRACE_CLOCKS
>> +#endif
>> +
>> +#endif  /* _ASM_GENERIC_TRACE_CLOCK_H */
>
> Unfortunately this isn't enough. You need to add a stub in all
> arch/*/include/asm/ directories. That has:

Oh really? Okay. Sorry, I really don't know the mechanics of the arch
trees. I'll work on that now, and then send you a new series since I
forgot to send out my last update.

>
> --- trace_clock.h:
> #ifndef _ASM_TRACE_CLOCK_H
> #define _ASM_TRACE_CLOCK_H
>
> #include <asm-generic/trace_clock.h>
>
> #endif
> ---
>
>
> Because running this against my cross-compile test, failed on all archs
> but x86.
>
> -- Steve
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 1/3] tracing,x86: add a TSC trace_clock
  2012-10-02  0:33                 ` David Sharp
@ 2012-10-02  0:43                   ` Steven Rostedt
  0 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2012-10-02  0:43 UTC (permalink / raw)
  To: David Sharp
  Cc: linux-kernel, Masami Hiramatsu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin

On Mon, 2012-10-01 at 17:33 -0700, David Sharp wrote:

> > Unfortunately this isn't enough. You need to add a stub in all
> > arch/*/include/asm/ directories. That has:
> 
> Oh really? Okay. Sorry, I really don't know the mechanics of the arch
> trees. I'll work on that now, and then send you a new series since I
> forgot to send out my last update.

It's a good idea to cross compile archs, especially if you make any
change that affects archs (like this one did).

Just use ktest.pl. You can download the cross tools at:

  http://kernel.org/pub/tools/crosstool/files/bin/x86_64/

And you can use the example config file in:

  tools/testing/ktest/examples/crosstests.conf

That's basically the test I use to test cross compiling of other archs.

-- Steve



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2012-10-02  0:43 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-12  2:41 [PATCH 0/3] TSC trace_clock David Sharp
2012-09-12  2:41 ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
2012-09-12  3:48   ` Steven Rostedt
2012-09-12 18:48     ` David Sharp
2012-09-12  2:41 ` [PATCH 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
2012-09-12  2:41 ` [PATCH 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
2012-09-12 23:31 ` [PATCH v2 0/3] TSC trace_clock David Sharp
2012-09-12 23:31   ` [PATCH 1/3] tracing,x86: add a TSC trace_clock; reset buffer on clock change David Sharp
2012-09-12 23:31   ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock David Sharp
2012-09-13  1:15     ` Masami Hiramatsu
2012-09-13  1:23     ` Steven Rostedt
2012-09-18 21:37       ` David Sharp
2012-09-18 21:37         ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
2012-09-18 21:37         ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
2012-09-19  2:10         ` [PATCH v2 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
2012-09-20  8:55         ` Ingo Molnar
2012-09-20 13:34           ` Steven Rostedt
2012-09-20 22:52             ` [PATCH v3 " David Sharp
2012-09-20 22:52               ` [PATCH v3 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
2012-09-20 22:52               ` [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp
2012-09-25  3:27                 ` Steven Rostedt
2012-09-25 20:48                   ` David Sharp
2012-09-25 20:49                     ` [PATCH v4 " David Sharp
2012-09-25 21:42                       ` Steven Rostedt
2012-09-25 22:29                         ` David Sharp
2012-09-25 22:29                           ` David Sharp
2012-09-26 18:30                             ` Steven Rostedt
2012-09-26 21:03                               ` David Sharp
2012-09-25 23:36                           ` Steven Rostedt
2012-09-26  0:32                             ` David Sharp
2012-09-29  3:15               ` [PATCH v3 1/3] tracing,x86: add a TSC trace_clock Steven Rostedt
2012-10-02  0:33                 ` David Sharp
2012-10-02  0:43                   ` Steven Rostedt
2012-09-20 22:53           ` [PATCH v2 " David Sharp
2012-09-18 21:46       ` David Sharp
2012-09-12 23:31   ` [PATCH v2 2/3] tracing: reset ring buffer when changing trace_clocks David Sharp
2012-09-12 23:32   ` [PATCH v2 3/3] tracing: format non-nanosec times from tsc clock without a decimal point David Sharp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).