linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] tracing: fix uptime overflow problem
@ 2014-06-28 11:09 Xie XiuQi
  2014-06-28 11:10 ` [PATCH 1/2] " Xie XiuQi
  2014-06-28 11:10 ` [PATCH 2/2] tracing: update Documentation/trace/ftrace.txt for uptime Xie XiuQi
  0 siblings, 2 replies; 14+ messages in thread
From: Xie XiuQi @ 2014-06-28 11:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: rostedt, mingo, tony.luck, fweisbec, m.chehab

The "uptime" tracer added in:
    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Tony provide full featured jiffies_to_nsecs() function, but
can't resolve another problem that jiffies_lock is not safe
in NMI context.

Now we use the lockless function __current_kernel_time() and
getboottime() to calculate the uptime.

The former discussion is here:
http://lkml.org/lkml/2014/4/8/525

Additional, I changed trace_clock_jiffies to trace_clock_uptime,
in order to better describe its function and updated the document.

Xie XiuQi (2):
  tracing: fix uptime overflow problem
  tracing: update Documentation/trace/ftrace.txt for uptime

 Documentation/trace/ftrace.txt |  3 +--
 include/linux/trace_clock.h    |  2 +-
 kernel/trace/trace.c           |  2 +-
 kernel/trace/trace_clock.c     | 15 +++++++++++----
 4 files changed, 14 insertions(+), 8 deletions(-)

-- 
2.0.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] tracing: fix uptime overflow problem
  2014-06-28 11:09 [PATCH 0/2] tracing: fix uptime overflow problem Xie XiuQi
@ 2014-06-28 11:10 ` Xie XiuQi
  2014-06-30 15:10   ` Steven Rostedt
  2014-06-28 11:10 ` [PATCH 2/2] tracing: update Documentation/trace/ftrace.txt for uptime Xie XiuQi
  1 sibling, 1 reply; 14+ messages in thread
From: Xie XiuQi @ 2014-06-28 11:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: rostedt, mingo, tony.luck, fweisbec, m.chehab

The "uptime" tracer added in:
    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Tony provide full featured jiffies_to_nsecs() function, but
can't resolve another problem that jiffies_lock is not safe
in NMI context.

Now we use the lockless function __current_kernel_time() and
getboottime() to calculate the uptime.

The former discussion is here:
http://lkml.org/lkml/2014/4/8/525

Additional, I changed trace_clock_jiffies to trace_clock_uptime,
in order to better describe its function.

Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: stable@vger.kernel.org	# 3.10+
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 include/linux/trace_clock.h |  2 +-
 kernel/trace/trace.c        |  2 +-
 kernel/trace/trace_clock.c  | 15 +++++++++++----
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/trace_clock.h b/include/linux/trace_clock.h
index 1d7ca27..2961ac7 100644
--- a/include/linux/trace_clock.h
+++ b/include/linux/trace_clock.h
@@ -16,7 +16,7 @@
 
 extern u64 notrace trace_clock_local(void);
 extern u64 notrace trace_clock(void);
-extern u64 notrace trace_clock_jiffies(void);
+extern u64 notrace trace_clock_uptime(void);
 extern u64 notrace trace_clock_global(void);
 extern u64 notrace trace_clock_counter(void);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 384ede3..867e849 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -809,7 +809,7 @@ static struct {
 	{ trace_clock_local,	"local",	1 },
 	{ trace_clock_global,	"global",	1 },
 	{ trace_clock_counter,	"counter",	0 },
-	{ trace_clock_jiffies,	"uptime",	1 },
+	{ trace_clock_uptime,	"uptime",	1 },
 	{ trace_clock,		"perf",		1 },
 	ARCH_TRACE_CLOCKS
 };
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348..59e4d4d 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -19,6 +19,7 @@
 #include <linux/percpu.h>
 #include <linux/sched.h>
 #include <linux/ktime.h>
+#include <linux/time.h>
 #include <linux/trace_clock.h>
 
 /*
@@ -58,14 +59,20 @@ u64 notrace trace_clock(void)
 }
 
 /*
- * trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * trace_clock_uptime(): Use lockless version __current_kernel_time,
+ * so it's safe in NMI context.
  */
-u64 notrace trace_clock_jiffies(void)
+u64 notrace trace_clock_uptime(void)
 {
-	u64 jiffy = jiffies - INITIAL_JIFFIES;
+	struct timespec uptime, now, boottime;
+
+	/* Does not take xtime_lock, so it's safe in NMI context. */
+	now = __current_kernel_time();
+	getboottime(&boottime);
+	uptime = timespec_sub(now, boottime);
 
 	/* Return nsecs */
-	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+	return timespec_to_ns(&uptime);
 }
 
 /*
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] tracing: update Documentation/trace/ftrace.txt for uptime
  2014-06-28 11:09 [PATCH 0/2] tracing: fix uptime overflow problem Xie XiuQi
  2014-06-28 11:10 ` [PATCH 1/2] " Xie XiuQi
@ 2014-06-28 11:10 ` Xie XiuQi
  1 sibling, 0 replies; 14+ messages in thread
From: Xie XiuQi @ 2014-06-28 11:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: rostedt, mingo, tony.luck, fweisbec, m.chehab

update uptime description.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 Documentation/trace/ftrace.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 2479b2a..7b2e906 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -331,8 +331,7 @@ of ftrace. Here is a list of some of the key files:
 		   know exactly the order events occurred with respect to
 		   each other on different CPUs.
 
-	  uptime: This uses the jiffies counter and the time stamp
-	  	  is relative to the time since boot up.
+	  uptime: This time stamp is relative to the time since boot up.
 
 	  perf: This makes ftrace use the same clock that perf uses.
 	  	Eventually perf will be able to read ftrace buffers
-- 
2.0.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] tracing: fix uptime overflow problem
  2014-06-28 11:10 ` [PATCH 1/2] " Xie XiuQi
@ 2014-06-30 15:10   ` Steven Rostedt
  2014-06-30 18:17     ` [PATCH] tracing: Fix wraparound problems in "uptime" tracer Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-06-30 15:10 UTC (permalink / raw)
  To: Xie XiuQi; +Cc: linux-kernel, mingo, tony.luck, fweisbec, m.chehab

On Sat, 28 Jun 2014 19:10:00 +0800
Xie XiuQi <xiexiuqi@huawei.com> wrote:

 
>  /*
> - * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * trace_clock_uptime(): Use lockless version __current_kernel_time,
> + * so it's safe in NMI context.
>   */
> -u64 notrace trace_clock_jiffies(void)
> +u64 notrace trace_clock_uptime(void)
>  {
> -	u64 jiffy = jiffies - INITIAL_JIFFIES;
> +	struct timespec uptime, now, boottime;
> +
> +	/* Does not take xtime_lock, so it's safe in NMI context. */
> +	now = __current_kernel_time();

The problem with this patch is that you have a race:

        ts.tv_sec = tk->xtime_sec;
        ts.tv_nsec = (long)(tk->xtime_nsec >> tk->shift);

There's a chance that gets done between the update of xtime_sec an
xtime_nsec, and make time go backwards.

This call is lockless but is not reliable.

-- Steve


> +	getboottime(&boottime);
> +	uptime = timespec_sub(now, boottime);
>  
>  	/* Return nsecs */
> -	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> +	return timespec_to_ns(&uptime);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] tracing: Fix wraparound problems in "uptime" tracer
  2014-06-30 15:10   ` Steven Rostedt
@ 2014-06-30 18:17     ` Tony Luck
  2014-06-30 18:40       ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Luck @ 2014-06-30 18:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, tony.luck, fweisbec,
	<m.chehab@samsung.com> Xie XiuQi

There seem to be no non-racy solutions ... I've been wondering
about giving up on a generic jiffies_to_nsec() function because
people might use it in cases where the races might be likley to
bite them.  For my need, I think that "perfect is the enemy of good":

1) The race window is only a few microseconds wide
2) It only exists on 32-bit kernels - which are dying out on server
   systems because they can't handle the amounts of memory on modern
   machines.
3) It opens every 49 days (on a HZ=1000 system)
4) I'm logging error events that happen at a "per-month" frequency (or lower)
5) If the race does happen - the visible result is that we have a
   bad time logged against an error event.

so what about this: ...

From: Tony Luck <tony.luck@intel.com>

The "uptime" tracer added in:
    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
	(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

We can't provide a full features jiffies_to_nsec() function in
any safe way (32-bit systems need locking to read the full 64-bit
jiffies value).  Just do the best we can here and recognise that
32-bit systems may seem some timestamp anomolies if jiffies64
was in the middle of rolling over a 2^32 boundary.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 kernel/timeconst.bc        |  6 ++++++
 kernel/trace/trace_clock.c | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
index 511bdf2cafda..a5fef7a7fb27 100644
--- a/kernel/timeconst.bc
+++ b/kernel/timeconst.bc
@@ -100,6 +100,12 @@ define timeconst(hz) {
 		print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
 		print "\n"
 
+		obase=10
+		cd=gcd(hz,1000000000)
+		print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+		print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+		print "\n"
+
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
 	halt
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..dc5b11b9f8a4 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,19 @@ u64 notrace trace_clock(void)
 
 /*
  * trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * This usage of jiffies_64 isn't safe on 32-bit, but we may be
+ * called from NMI context, and we have no safe way to get a timestamp.
  */
 u64 notrace trace_clock_jiffies(void)
 {
-	u64 jiffy = jiffies - INITIAL_JIFFIES;
+	u64 jiffy = jiffies_64 - INITIAL_JIFFIES;
 
 	/* Return nsecs */
-	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+#if !(NSEC_PER_SEC % HZ)
+	return (NSEC_PER_SEC / HZ) * jiffy;
+#else
+	return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;
+#endif
 }
 
 /*
-- 
1.8.4.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer
  2014-06-30 18:17     ` [PATCH] tracing: Fix wraparound problems in "uptime" tracer Tony Luck
@ 2014-06-30 18:40       ` Steven Rostedt
  2014-06-30 20:31         ` [PATCH-v2] " Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-06-30 18:40 UTC (permalink / raw)
  To: Tony Luck
  Cc: linux-kernel, mingo, fweisbec, <m.chehab@samsung.com> Xie XiuQi

On Mon, 30 Jun 2014 11:17:18 -0700
Tony Luck <tony.luck@intel.com> wrote:

> There seem to be no non-racy solutions ... I've been wondering
> about giving up on a generic jiffies_to_nsec() function because
> people might use it in cases where the races might be likley to
> bite them.  For my need, I think that "perfect is the enemy of good":
> 
> 1) The race window is only a few microseconds wide
> 2) It only exists on 32-bit kernels - which are dying out on server
>    systems because they can't handle the amounts of memory on modern
>    machines.
> 3) It opens every 49 days (on a HZ=1000 system)
> 4) I'm logging error events that happen at a "per-month" frequency (or lower)
> 5) If the race does happen - the visible result is that we have a
>    bad time logged against an error event.
> 
> so what about this: ...
> 
> From: Tony Luck <tony.luck@intel.com>
> 
> The "uptime" tracer added in:
>     commit 8aacf017b065a805d27467843490c976835eb4a5
>     tracing: Add "uptime" trace clock that uses jiffies
> has wraparound problems when the system has been up more
> than 1 hour 11 minutes and 34 seconds. It converts jiffies
> to nanoseconds using:
> 	(u64)jiffies_to_usecs(jiffy) * 1000ULL
> but since jiffies_to_usecs() only returns a 32-bit value, it
> truncates at 2^32 microseconds.  An additional problem on 32-bit
> systems is that the argument is "unsigned long", so fixing the
> return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
> system).
> 
> We can't provide a full features jiffies_to_nsec() function in
> any safe way (32-bit systems need locking to read the full 64-bit
> jiffies value).  Just do the best we can here and recognise that
> 32-bit systems may seem some timestamp anomolies if jiffies64
> was in the middle of rolling over a 2^32 boundary.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  kernel/timeconst.bc        |  6 ++++++
>  kernel/trace/trace_clock.c | 10 ++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
> index 511bdf2cafda..a5fef7a7fb27 100644
> --- a/kernel/timeconst.bc
> +++ b/kernel/timeconst.bc
> @@ -100,6 +100,12 @@ define timeconst(hz) {
>  		print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
>  		print "\n"
>  
> +		obase=10
> +		cd=gcd(hz,1000000000)
> +		print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
> +		print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
> +		print "\n"
> +
>  		print "#endif /* KERNEL_TIMECONST_H */\n"
>  	}
>  	halt
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 26dc348332b7..dc5b11b9f8a4 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -59,13 +59,19 @@ u64 notrace trace_clock(void)
>  
>  /*
>   * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * This usage of jiffies_64 isn't safe on 32-bit, but we may be
> + * called from NMI context, and we have no safe way to get a timestamp.
>   */
>  u64 notrace trace_clock_jiffies(void)
>  {
> -	u64 jiffy = jiffies - INITIAL_JIFFIES;
> +	u64 jiffy = jiffies_64 - INITIAL_JIFFIES;
>  
>  	/* Return nsecs */
> -	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> +#if !(NSEC_PER_SEC % HZ)
> +	return (NSEC_PER_SEC / HZ) * jiffy;
> +#else
> +	return (jiffy * HZ_TO_NSEC_NUM) / HZ_TO_NSEC_DEN;

Wont this break on 32 bit systems. That is, you can't divide 64bit
integers without using do_div().

-- Steve

> +#endif
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer
  2014-06-30 18:40       ` Steven Rostedt
@ 2014-06-30 20:31         ` Tony Luck
  2014-07-17 23:02           ` Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Luck @ 2014-06-30 20:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, mingo, tony.luck, fweisbec,
	<m.chehab@samsung.com> Xie XiuQi

The "uptime" tracer added in:
    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies
has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
	(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

We can't provide a full features jiffies_to_nsec() function in
any safe way (32-bit systems need locking to read the full 64-bit
jiffies value).  Just do the best we can here and recognise that
32-bit systems may seem some timestamp anomolies if jiffies64
was in the middle of rolling over a 2^32 boundary.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v1->v2: Use "do_div()" for 64-bit division

 kernel/timeconst.bc        |  6 ++++++
 kernel/trace/trace_clock.c | 12 ++++++++++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/kernel/timeconst.bc b/kernel/timeconst.bc
index 511bdf2cafda..a5fef7a7fb27 100644
--- a/kernel/timeconst.bc
+++ b/kernel/timeconst.bc
@@ -100,6 +100,12 @@ define timeconst(hz) {
 		print "#define USEC_TO_HZ_DEN\t\t", 1000000/cd, "\n"
 		print "\n"
 
+		obase=10
+		cd=gcd(hz,1000000000)
+		print "#define HZ_TO_NSEC_NUM\t\t", 1000000000/cd, "\n"
+		print "#define HZ_TO_NSEC_DEN\t\t", hz/cd, "\n"
+		print "\n"
+
 		print "#endif /* KERNEL_TIMECONST_H */\n"
 	}
 	halt
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..3fcf8f4807d5 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,21 @@ u64 notrace trace_clock(void)
 
 /*
  * trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * This usage of jiffies_64 isn't safe on 32-bit, but we may be
+ * called from NMI context, and we have no safe way to get a timestamp.
  */
 u64 notrace trace_clock_jiffies(void)
 {
-	u64 jiffy = jiffies - INITIAL_JIFFIES;
+	u64 jiffy = jiffies_64 - INITIAL_JIFFIES;
 
 	/* Return nsecs */
-	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+#if !(NSEC_PER_SEC % HZ)
+	return (NSEC_PER_SEC / HZ) * jiffy;
+#else
+	jiffy *= HZ_TO_NSEC_NUM;
+	do_div(jiffy, HZ_TO_NSEC_DEN);
+	return jiffy;
+#endif
 }
 
 /*
-- 
1.8.4.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer
  2014-06-30 20:31         ` [PATCH-v2] " Tony Luck
@ 2014-07-17 23:02           ` Tony Luck
  2014-07-18  2:08             ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Luck @ 2014-07-17 23:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel Mailing List, Ingo Molnar, Luck, Tony,
	Frédéric Weisbecker,
	<m.chehab@samsung.com> Xie XiuQi

On Mon, Jun 30, 2014 at 1:31 PM, Tony Luck <tony.luck@intel.com> wrote:
> +#else
> +       jiffy *= HZ_TO_NSEC_NUM;
> +       do_div(jiffy, HZ_TO_NSEC_DEN);
> +       return jiffy;
> +#endif

Well that didn't work either.  kernel/trace/trace_clock.c doesn't include
timeconst.h - so we fail the compile because HZ_TO_NSEC_{NUM,DEN}
aren't defined.  a #include looks dicey because timeconst.h is a generated
file up the the kernel/ level ... so getting Makefile dependencies right would
be ugly.

Do we really need to convert to nanoseconds? Couldn't we just return
jiffies:

u64 notrace trace_clock_jiffies(void)
{
       return jiffies_64 - INITIAL_JIFFIES;
}

and leave it as an exercise to userspace to interpret that?

-Tony

[Yes, this changes the UI . but there can't be any serious users
given that the existing UI only works for 1hr 11 minutes after each
boot]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-17 23:02           ` Tony Luck
@ 2014-07-18  2:08             ` Steven Rostedt
  2014-07-18 17:05               ` Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-07-18  2:08 UTC (permalink / raw)
  To: Tony Luck
  Cc: Linux Kernel Mailing List, Ingo Molnar, Luck, Tony,
	Frédéric Weisbecker,
	<m.chehab@samsung.com> Xie XiuQi

On Thu, 17 Jul 2014 16:02:26 -0700
Tony Luck <tony.luck@gmail.com> wrote:

> On Mon, Jun 30, 2014 at 1:31 PM, Tony Luck <tony.luck@intel.com> wrote:
> > +#else
> > +       jiffy *= HZ_TO_NSEC_NUM;
> > +       do_div(jiffy, HZ_TO_NSEC_DEN);
> > +       return jiffy;
> > +#endif
> 
> Well that didn't work either.  kernel/trace/trace_clock.c doesn't include
> timeconst.h - so we fail the compile because HZ_TO_NSEC_{NUM,DEN}
> aren't defined.  a #include looks dicey because timeconst.h is a generated
> file up the the kernel/ level ... so getting Makefile dependencies right would
> be ugly.
> 
> Do we really need to convert to nanoseconds? Couldn't we just return
> jiffies:

Sure, and we can make it a "counter". That is, the counters don't hide
1000 counts on output.

> 
> u64 notrace trace_clock_jiffies(void)
> {
>        return jiffies_64 - INITIAL_JIFFIES;
> }
> 
> and leave it as an exercise to userspace to interpret that?
> 
> -Tony
> 
> [Yes, this changes the UI . but there can't be any serious users
> given that the existing UI only works for 1hr 11 minutes after each
> boot]

I doubt it will break any tools, so it's fine.

-- Steve


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-18  2:08             ` Steven Rostedt
@ 2014-07-18 17:05               ` Tony Luck
  2014-07-18 17:36                 ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Luck @ 2014-07-18 17:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel Mailing List, Ingo Molnar,
	Frédéric Weisbecker,
	<m.chehab@samsung.com> Xie XiuQi

On Thu, Jul 17, 2014 at 7:08 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> Do we really need to convert to nanoseconds? Couldn't we just return
>> jiffies:
>
> Sure, and we can make it a "counter". That is, the counters don't hide
> 1000 counts on output.

Can you explain that a bit more ... how do we mark it as a "counter"?

>> u64 notrace trace_clock_jiffies(void)
>> {
>>        return jiffies_64 - INITIAL_JIFFIES;
>> }

Hmm it seems that we make it hard for applications to see what HZ
value the kernel is actually using. All user-facing interfaces should
convert to USER_HZ (100).  So we should say:

u64 notrace trace_clock_jiffies(void)
{
        return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
}

Do we need to change anything in Documentation/trace/ftrace.txt?

Currently it says:

          uptime: This uses the jiffies counter and the time stamp
                  is relative to the time since boot up.

-Tony

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH-v2] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-18 17:05               ` Tony Luck
@ 2014-07-18 17:36                 ` Steven Rostedt
  2014-07-18 18:43                   ` [PATCH] " Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-07-18 17:36 UTC (permalink / raw)
  To: Tony Luck
  Cc: Linux Kernel Mailing List, Ingo Molnar,
	Frédéric Weisbecker,
	<m.chehab@samsung.com> Xie XiuQi

On Fri, 18 Jul 2014 10:05:00 -0700
Tony Luck <tony.luck@gmail.com> wrote:

> On Thu, Jul 17, 2014 at 7:08 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >> Do we really need to convert to nanoseconds? Couldn't we just return
> >> jiffies:
> >
> > Sure, and we can make it a "counter". That is, the counters don't hide
> > 1000 counts on output.
> 
> Can you explain that a bit more ... how do we mark it as a "counter"?

My mistake, I should have said, mark it as not "in_ns"...

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index bda9621..291397e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -823,7 +823,7 @@ static struct {
 	{ trace_clock_local,	"local",	1 },
 	{ trace_clock_global,	"global",	1 },
 	{ trace_clock_counter,	"counter",	0 },
-	{ trace_clock_jiffies,	"uptime",	1 },
+	{ trace_clock_jiffies,	"uptime",	0 },
 	{ trace_clock,		"perf",		1 },
 	ARCH_TRACE_CLOCKS
 };

> 
> >> u64 notrace trace_clock_jiffies(void)
> >> {
> >>        return jiffies_64 - INITIAL_JIFFIES;
> >> }
> 
> Hmm it seems that we make it hard for applications to see what HZ
> value the kernel is actually using. All user-facing interfaces should
> convert to USER_HZ (100).  So we should say:

Yeah, I remember theses conversations in the past. Basically, HZ is an
internal kernel operation and userspace shouldn't know what it is,
otherwise you may have tools doing whacky things based on it.

Which is a good thing when we eventually get rid of the HZ idea.

> 
> u64 notrace trace_clock_jiffies(void)
> {
>         return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
> }
> 
> Do we need to change anything in Documentation/trace/ftrace.txt?
> 
> Currently it says:
> 
>           uptime: This uses the jiffies counter and the time stamp
>                   is relative to the time since boot up.

Yeah, and even if we change it to be converted, I believe that comment
is still correct. It does use the jiffies counter, and the time stamp
is still relative to system boot up.

-- Steve

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-18 17:36                 ` Steven Rostedt
@ 2014-07-18 18:43                   ` Tony Luck
  2014-07-18 18:47                     ` Steven Rostedt
  0 siblings, 1 reply; 14+ messages in thread
From: Tony Luck @ 2014-07-18 18:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel Mailing List, Ingo Molnar, fweisbec, m.chehab, Xie XiuQi

The "uptime" tracer added in:

    commit 8aacf017b065a805d27467843490c976835eb4a5
    tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
        (u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds.  An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 kernel/trace/trace.c       | 2 +-
 kernel/trace/trace_clock.c | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 384ede311717..eb7d56480545 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -809,7 +809,7 @@ static struct {
 	{ trace_clock_local,	"local",	1 },
 	{ trace_clock_global,	"global",	1 },
 	{ trace_clock_counter,	"counter",	0 },
-	{ trace_clock_jiffies,	"uptime",	1 },
+	{ trace_clock_jiffies,	"uptime",	0 },
 	{ trace_clock,		"perf",		1 },
 	ARCH_TRACE_CLOCKS
 };
diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
index 26dc348332b7..57b67b1f24d1 100644
--- a/kernel/trace/trace_clock.c
+++ b/kernel/trace/trace_clock.c
@@ -59,13 +59,14 @@ u64 notrace trace_clock(void)
 
 /*
  * trace_jiffy_clock(): Simply use jiffies as a clock counter.
+ * Note that this use of jiffies_64 is not completely safe on
+ * 32-bit systems. But the window is tiny, and the effect if
+ * we are affected is that we will have an obviously bogus
+ * timestamp on a trace event - i.e. not life threatening.
  */
 u64 notrace trace_clock_jiffies(void)
 {
-	u64 jiffy = jiffies - INITIAL_JIFFIES;
-
-	/* Return nsecs */
-	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
+	return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
 }
 
 /*
-- 
1.8.4.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-18 18:43                   ` [PATCH] " Tony Luck
@ 2014-07-18 18:47                     ` Steven Rostedt
  2014-07-18 18:54                       ` Tony Luck
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2014-07-18 18:47 UTC (permalink / raw)
  To: Tony Luck
  Cc: Linux Kernel Mailing List, Ingo Molnar, fweisbec, m.chehab, Xie XiuQi

On Fri, 18 Jul 2014 11:43:01 -0700
Tony Luck <tony.luck@intel.com> wrote:

> The "uptime" tracer added in:
> 
>     commit 8aacf017b065a805d27467843490c976835eb4a5
>     tracing: Add "uptime" trace clock that uses jiffies
> 
> has wraparound problems when the system has been up more
> than 1 hour 11 minutes and 34 seconds. It converts jiffies
> to nanoseconds using:
>         (u64)jiffies_to_usecs(jiffy) * 1000ULL
> but since jiffies_to_usecs() only returns a 32-bit value, it
> truncates at 2^32 microseconds.  An additional problem on 32-bit
> systems is that the argument is "unsigned long", so fixing the
> return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
> system).
> 
> Avoid these problems by using jiffies_64 as our basis, and
> not converting to nanoseconds (we do convert to clock_t because
> user facing API must not be dependent on internal kernel
> HZ values).
> 

Do you think this is worthy of stable and 3.16 material?

-- Steve

> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  kernel/trace/trace.c       | 2 +-
>  kernel/trace/trace_clock.c | 9 +++++----
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 384ede311717..eb7d56480545 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -809,7 +809,7 @@ static struct {
>  	{ trace_clock_local,	"local",	1 },
>  	{ trace_clock_global,	"global",	1 },
>  	{ trace_clock_counter,	"counter",	0 },
> -	{ trace_clock_jiffies,	"uptime",	1 },
> +	{ trace_clock_jiffies,	"uptime",	0 },
>  	{ trace_clock,		"perf",		1 },
>  	ARCH_TRACE_CLOCKS
>  };
> diff --git a/kernel/trace/trace_clock.c b/kernel/trace/trace_clock.c
> index 26dc348332b7..57b67b1f24d1 100644
> --- a/kernel/trace/trace_clock.c
> +++ b/kernel/trace/trace_clock.c
> @@ -59,13 +59,14 @@ u64 notrace trace_clock(void)
>  
>  /*
>   * trace_jiffy_clock(): Simply use jiffies as a clock counter.
> + * Note that this use of jiffies_64 is not completely safe on
> + * 32-bit systems. But the window is tiny, and the effect if
> + * we are affected is that we will have an obviously bogus
> + * timestamp on a trace event - i.e. not life threatening.
>   */
>  u64 notrace trace_clock_jiffies(void)
>  {
> -	u64 jiffy = jiffies - INITIAL_JIFFIES;
> -
> -	/* Return nsecs */
> -	return (u64)jiffies_to_usecs(jiffy) * 1000ULL;
> +	return jiffies_64_to_clock_t(jiffies_64 - INITIAL_JIFFIES);
>  }
>  
>  /*


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] tracing: Fix wraparound problems in "uptime" tracer
  2014-07-18 18:47                     ` Steven Rostedt
@ 2014-07-18 18:54                       ` Tony Luck
  0 siblings, 0 replies; 14+ messages in thread
From: Tony Luck @ 2014-07-18 18:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linux Kernel Mailing List, Ingo Molnar,
	Frédéric Weisbecker, m.chehab, Xie XiuQi

On Fri, Jul 18, 2014 at 11:47 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> Do you think this is worthy of stable and 3.16 material?

Perhaps.  Not sure that I need it (I'm stuck waiting for the extlog_trace pieces
which are now queued in "tip", but won't go in until 3.17).  But I suppose we
should minimize the window where people see the bogus wraparound nanoseconds.

-Tony

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-07-18 18:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-28 11:09 [PATCH 0/2] tracing: fix uptime overflow problem Xie XiuQi
2014-06-28 11:10 ` [PATCH 1/2] " Xie XiuQi
2014-06-30 15:10   ` Steven Rostedt
2014-06-30 18:17     ` [PATCH] tracing: Fix wraparound problems in "uptime" tracer Tony Luck
2014-06-30 18:40       ` Steven Rostedt
2014-06-30 20:31         ` [PATCH-v2] " Tony Luck
2014-07-17 23:02           ` Tony Luck
2014-07-18  2:08             ` Steven Rostedt
2014-07-18 17:05               ` Tony Luck
2014-07-18 17:36                 ` Steven Rostedt
2014-07-18 18:43                   ` [PATCH] " Tony Luck
2014-07-18 18:47                     ` Steven Rostedt
2014-07-18 18:54                       ` Tony Luck
2014-06-28 11:10 ` [PATCH 2/2] tracing: update Documentation/trace/ftrace.txt for uptime Xie XiuQi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).