All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] printk, Add a printk.clock kernel parameter
@ 2016-01-06 13:00 Prarit Bhargava
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 13:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Petr Mladek,
	Tejun Heo, Peter Hurley, Vasily Averin, Joe Perches

Over the past years I've seen many reports of bugs that include
time-stamped kernel logs (enabled when CONFIG_PRINTK_TIME=y or
print.time=1 is specified as a kernel parameter) that do not align
with either external time stamped logs or /var/log/messages.

For example,

[root@intel-wildcatpass-06 ~]# date; echo "Hello!" > /dev/kmsg ; date
Thu Dec 17 13:58:31 EST 2015
Thu Dec 17 13:58:31 EST 2015

which displays

[83973.768912] Hello!

on the serial console.

Running a script to convert this to "boot time",

[root@intel-wildcatpass-06 ~]# ./human.sh  | tail -1
[Thu Dec 17 13:59:57 2015] Hello!

which is already off by 1 minute and 26 seconds off after ~24 hours of
uptime.

This occurs because the time stamp is obtained from a call to
local_clock() which (on x86) is a direct call to the hardware.  These
hardware clock reads are not modified by the standard ntp or ptp protocol,
while the other timestamps are, and that results in situations where
/var/log/messages timestamps are further and further offset from the
hardware clock timestamps.

This patch introduces printk.clock=[local|boot|real|tai] allowing a
user to specify an adjusted clock to use with printk timestamps.  The
hardware clock, or the existing functionality, is preserved by default.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>

Prarit Bhargava (2):
  kernel, timekeeping, add trylock option to ktime_get_with_offset()
  printk, Add printk.clock kernel parameter

 include/linux/time64.h      |    2 +
 include/linux/timekeeping.h |   50 +++++++++++++++++++--
 kernel/printk/printk.c      |  105 +++++++++++++++++++++++++++++++++++++++----
 kernel/time/timekeeping.c   |   15 ++++++-
 4 files changed, 159 insertions(+), 13 deletions(-)

-- 
1.7.9.3


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 13:00 [PATCH 0/2] printk, Add a printk.clock kernel parameter Prarit Bhargava
@ 2016-01-06 13:00 ` Prarit Bhargava
  2016-01-06 16:04   ` Jiri Bohac
                     ` (2 more replies)
  2016-01-06 13:00 ` [PATCH 2/2] printk, Add printk.clock kernel parameter Prarit Bhargava
  2016-01-06 17:09 ` [PATCH 0/2] printk, Add a " Joe Perches
  2 siblings, 3 replies; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 13:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Thomas Gleixner, Ingo Molnar,
	Xunlei Pang, Peter Zijlstra, Baolin Wang, Arnd Bergmann

This is a timekeeping staging patch for the printk() timestamp
functionality that adds a trylock option for the timekeeping_lock() to
ktime_get_with_offset().  When trylock is 1, calls to
ktime_get_with_offset() will return return a ktime of 0 if the
timekeeping_lock is locked.

This patch adds ktime_try_real(), ktime_try_boot(), and ktime_try_tai() as
wrapper functions around ktime_get_with_offset() with trylock = 1, and
modifies other callers to call ktime_get_with_offset() with trylock = 0.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 include/linux/timekeeping.h |   50 +++++++++++++++++++++++++++++++++++++++----
 kernel/time/timekeeping.c   |   15 ++++++++++++-
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ec89d84..4f47352 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -166,7 +166,7 @@ enum tk_offsets {
 };
 
 extern ktime_t ktime_get(void);
-extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
+extern ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
@@ -176,7 +176,16 @@ extern u32 ktime_get_resolution_ns(void);
  */
 static inline ktime_t ktime_get_real(void)
 {
-	return ktime_get_with_offset(TK_OFFS_REAL);
+	return ktime_get_with_offset(TK_OFFS_REAL, 0);
+}
+
+/**
+ * ktime_try_real - same as ktime_get_real, except return 0 if timekeeping is
+ * locked.
+ */
+static inline ktime_t ktime_try_real(void)
+{
+	return ktime_get_with_offset(TK_OFFS_REAL, 1);
 }
 
 /**
@@ -187,7 +196,16 @@ static inline ktime_t ktime_get_real(void)
  */
 static inline ktime_t ktime_get_boottime(void)
 {
-	return ktime_get_with_offset(TK_OFFS_BOOT);
+	return ktime_get_with_offset(TK_OFFS_BOOT, 0);
+}
+
+/**
+ * ktime_try_boottime - same as ktime_get_bootime, except return 0 if
+ * timekeeping is locked.
+ */
+static inline ktime_t ktime_try_boottime(void)
+{
+	return ktime_get_with_offset(TK_OFFS_BOOT, 1);
 }
 
 /**
@@ -195,7 +213,16 @@ static inline ktime_t ktime_get_boottime(void)
  */
 static inline ktime_t ktime_get_clocktai(void)
 {
-	return ktime_get_with_offset(TK_OFFS_TAI);
+	return ktime_get_with_offset(TK_OFFS_TAI, 0);
+}
+
+/**
+ * ktime_try_clocktai - same as ktime_get_clocktai, except return 0 if
+ * timekeeping is locked.
+ */
+static inline ktime_t ktime_try_clocktai(void)
+{
+	return ktime_get_with_offset(TK_OFFS_TAI, 1);
 }
 
 /**
@@ -216,16 +243,31 @@ static inline u64 ktime_get_real_ns(void)
 	return ktime_to_ns(ktime_get_real());
 }
 
+static inline u64 ktime_try_real_ns(void)
+{
+	return ktime_to_ns(ktime_try_real());
+}
+
 static inline u64 ktime_get_boot_ns(void)
 {
 	return ktime_to_ns(ktime_get_boottime());
 }
 
+static inline u64 ktime_try_boot_ns(void)
+{
+	return ktime_to_ns(ktime_try_boottime());
+}
+
 static inline u64 ktime_get_tai_ns(void)
 {
 	return ktime_to_ns(ktime_get_clocktai());
 }
 
+static inline u64 ktime_try_tai_ns(void)
+{
+	return ktime_to_ns(ktime_try_clocktai());
+}
+
 static inline u64 ktime_get_raw_ns(void)
 {
 	return ktime_to_ns(ktime_get_raw());
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d563c19..6e2cbeb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -44,6 +44,8 @@ static struct {
 static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
+/* printk may call ktime_get_with_offset() before timekeeping is initialized. */
+static int timekeeping_initialized;
 /**
  * struct tk_fast - NMI safe timekeeper
  * @seq:	Sequence counter for protecting updates. The lowest bit
@@ -705,15 +707,22 @@ static ktime_t *offsets[TK_OFFS_MAX] = {
 	[TK_OFFS_TAI]	= &tk_core.timekeeper.offs_tai,
 };
 
-ktime_t ktime_get_with_offset(enum tk_offsets offs)
+ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned int seq;
 	ktime_t base, *offset = offsets[offs];
 	s64 nsecs;
+	unsigned long flags = 0;
+
+	if (unlikely(!timekeeping_initialized))
+		return ktime_set(0, 0);
 
 	WARN_ON(timekeeping_suspended);
 
+	if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
+		return ktime_set(KTIME_MAX, 0);
+
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = ktime_add(tk->tkr_mono.base, *offset);
@@ -721,6 +730,9 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
+	if (trylock)
+		raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
 	return ktime_add_ns(base, nsecs);
 
 }
@@ -1255,6 +1267,7 @@ void __init timekeeping_init(void)
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	timekeeping_initialized = 1;
 }
 
 /* time in seconds when suspend began for persistent clock */
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-06 13:00 [PATCH 0/2] printk, Add a printk.clock kernel parameter Prarit Bhargava
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
@ 2016-01-06 13:00 ` Prarit Bhargava
  2016-01-07 14:57   ` Petr Mladek
  2016-01-06 17:09 ` [PATCH 0/2] printk, Add a " Joe Perches
  2 siblings, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 13:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Petr Mladek,
	Tejun Heo, Peter Hurley, Vasily Averin, Joe Perches

The script used in the analysis below:

dmesg_with_human_timestamps () {
    $(type -P dmesg) "$@" | perl -w -e 'use strict;
        my ($uptime) = do { local @ARGV="/proc/uptime";<>}; ($uptime) = ($uptime =~ /^(\d+)\./);
        foreach my $line (<>) {
            printf( ($line=~/^\[\s*(\d+)\.\d+\](.+)/) ? ( "[%s]%s\n", scalar localtime(time - $uptime + $1), $2 ) : $line )
        }'
}

dmesg_with_human_timestamps

----8<----

Over the past years I've seen many reports of bugs that include
time-stamped kernel logs (enabled when CONFIG_PRINTK_TIME=y or
print.time=1 is specified as a kernel parameter) that do not align
with either external time stamped logs or /var/log/messages.

For example,

[root@intel-wildcatpass-06 ~]# date; echo "Hello!" > /dev/kmsg ; date
Thu Dec 17 13:58:31 EST 2015
Thu Dec 17 13:58:31 EST 2015

which displays

[83973.768912] Hello!

on the serial console.

Running a script to convert this to "boot time",

[root@intel-wildcatpass-06 ~]# ./human.sh  | tail -1
[Thu Dec 17 13:59:57 2015] Hello!

which is already off by 1 minute and 26 seconds off after ~24 hours of
uptime.

This occurs because the time stamp is obtained from a call to
local_clock() which (on x86) is a direct call to the hardware.  These
hardware clock reads are not modified by the standard ntp or ptp protocol,
while the other timestamps are, and that results in situations where
/var/log/messages timestamps are further and further offset from the
hardware clock timestamps.

This patch introduces printk.clock=[local|boot|real|tai] allowing a
user to specify an adjusted clock to use with printk timestamps.  The
hardware clock, or the existing functionality, is preserved by default.

If the timekeeper_lock (used to protect writes from reads) is under
contention, the last known good time is output again and "??" is used as
the last two digits of time.  This avoids contention issues with a CPU
panicking while another CPU is writing time data.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 include/linux/time64.h |    2 +
 kernel/printk/printk.c |  105 ++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/include/linux/time64.h b/include/linux/time64.h
index 367d5af..f4a35d9 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -39,6 +39,8 @@ struct itimerspec64 {
 #define TIME64_MAX			((s64)~((u64)1 << 63))
 #define KTIME_MAX			((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
+/* Maximum value returned by ktime_to_ns() */
+#define KTIME_MAX_NS 			((u64)((U64_MAX>>1)))
 
 #if __BITS_PER_LONG == 64
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2ce8826..ffc05ee 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -419,6 +419,83 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
 	return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
 }
 
+static u64 printk_try_real_ns(void)
+{
+	return ktime_try_real_ns();
+}
+
+static u64 printk_try_boot_ns(void)
+{
+	return ktime_try_boot_ns();
+}
+
+static u64 printk_try_tai_ns(void)
+{
+	return ktime_try_tai_ns();
+}
+
+/* default to local clock */
+static u64 last_time_stamp;
+static char *printk_clock = "local";
+static u64 (*printk_clock_fn)(void) = &local_clock;
+
+static u64 printk_get_timestamp(void)
+{
+	u64 ret = printk_clock_fn();
+
+	if (likely(ret != KTIME_MAX_NS))
+		last_time_stamp = ret;
+	return ret;
+}
+
+static int printk_clock_param_set(const char *val,
+				  const struct kernel_param *kp)
+{
+	char *printk_clock_new = strstrip((char *)val);
+	int ret;
+
+	/* check if change is needed */
+	if (!strcmp(printk_clock_new, printk_clock))
+		return 0;
+
+	if (!strncmp("local", printk_clock_new, 5)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &local_clock;
+	} else if (!strncmp("real", printk_clock_new, 4)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &printk_try_real_ns;
+	} else if (!strncmp("boot", printk_clock_new, 4)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &printk_try_boot_ns;
+	} else if (!strncmp("tai", printk_clock_new, 3)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &printk_try_tai_ns;
+	}
+
+	/* Did the clock change ? */
+	if (strcmp(printk_clock_new, printk_clock))
+		return -EINVAL;
+
+	pr_info("printk: timestamp set to %s clock.\n", printk_clock);
+	return 0;
+}
+
+static struct kernel_param_ops printk_clock_param_ops = {
+	.set =		printk_clock_param_set,
+	.get =		param_get_charp,
+};
+
+module_param_cb(clock, &printk_clock_param_ops, &printk_clock, 0644);
+MODULE_PARM_DESC(clock, "Clock used for printk timestamps.  Can be local (hardware/default), boot, real, or tai.\n");
+
 /* insert record into the buffer, discard old ones, update heads */
 static int log_store(int facility, int level,
 		     enum log_flags flags, u64 ts_nsec,
@@ -467,7 +544,7 @@ static int log_store(int facility, int level,
 	if (ts_nsec > 0)
 		msg->ts_nsec = ts_nsec;
 	else
-		msg->ts_nsec = local_clock();
+		msg->ts_nsec = printk_get_timestamp();
 	memset(log_dict(msg) + dict_len, 0, pad_len);
 	msg->len = size;
 
@@ -1039,18 +1116,30 @@ module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
 
 static size_t print_time(u64 ts, char *buf)
 {
-	unsigned long rem_nsec;
+	unsigned long rem;
+	size_t size;
 
 	if (!printk_time)
 		return 0;
 
-	rem_nsec = do_div(ts, 1000000000);
+	if (ts == KTIME_MAX_NS)
+		ts = last_time_stamp;
 
-	if (!buf)
-		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
+	if (buf) {
+		size = INT_MAX;
+		rem = do_div(ts, 1000000000) / 1000;
+	} else {
+		size = 0;
+		rem = 0;
+	}
 
-	return sprintf(buf, "[%5lu.%06lu] ",
-		       (unsigned long)ts, rem_nsec / 1000);
+	if (ts != KTIME_MAX_NS) {
+		return snprintf(buf, size, "[%5lu.%06lu] ",
+			       (unsigned long)ts, rem);
+	}
+	/* return last_time_stamp with ?? warning */
+	return snprintf(buf, size, "[%5lu.%04lu??] ",
+		       (unsigned long)ts, rem);
 }
 
 static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
@@ -1613,7 +1702,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
-		cont.ts_nsec = local_clock();
+		cont.ts_nsec = printk_get_timestamp();
 		cont.flags = 0;
 		cont.cons = 0;
 		cont.flushed = false;
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
@ 2016-01-06 16:04   ` Jiri Bohac
  2016-01-06 16:27     ` Prarit Bhargava
  2016-01-06 16:25   ` Petr Mladek
  2016-01-06 17:28   ` John Stultz
  2 siblings, 1 reply; 21+ messages in thread
From: Jiri Bohac @ 2016-01-06 16:04 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Ingo Molnar,
	Xunlei Pang, Peter Zijlstra, Baolin Wang, Arnd Bergmann

On Wed, Jan 06, 2016 at 08:00:33AM -0500, Prarit Bhargava wrote:
> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>  {
>  	struct timekeeper *tk = &tk_core.timekeeper;
>  	unsigned int seq;
>  	ktime_t base, *offset = offsets[offs];
>  	s64 nsecs;
> +	unsigned long flags = 0;
> +
> +	if (unlikely(!timekeeping_initialized))
> +		return ktime_set(0, 0);
>  
>  	WARN_ON(timekeeping_suspended);
>  
> +	if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> +		return ktime_set(KTIME_MAX, 0);
> +

Are you trying to avoid a deadlock caused by calling printk() with
timekeeper_lock locked?

I believe this is already unsafe, as explained in the commit log
of 6d9bcb62 (timekeeping: use printk_deferred when holding
timekeeping seqlock).

So directly calling ktime_get() from printk would just turn a
rare deadlock into a certain one - perhaps a good thing?


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
  2016-01-06 16:04   ` Jiri Bohac
@ 2016-01-06 16:25   ` Petr Mladek
  2016-01-06 17:28   ` John Stultz
  2 siblings, 0 replies; 21+ messages in thread
From: Petr Mladek @ 2016-01-06 16:25 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Ingo Molnar,
	Xunlei Pang, Peter Zijlstra, Baolin Wang, Arnd Bergmann

On Wed 2016-01-06 08:00:33, Prarit Bhargava wrote:
> This is a timekeeping staging patch for the printk() timestamp
> functionality that adds a trylock option for the timekeeping_lock() to
> ktime_get_with_offset().  When trylock is 1, calls to
> ktime_get_with_offset() will return return a ktime of 0 if the
> timekeeping_lock is locked.

If I get it correctly, it returns 0 when timekeeping is not
initialized. But it returns TIME_MAX_NS when the lock is taken.
Where TIME_MAX_NS is defined in the 2nd patch.

> This patch adds ktime_try_real(), ktime_try_boot(), and ktime_try_tai() as
> wrapper functions around ktime_get_with_offset() with trylock = 1, and
> modifies other callers to call ktime_get_with_offset() with trylock = 0.
> 
> ---
>  include/linux/timekeeping.h |   50 +++++++++++++++++++++++++++++++++++++++----
>  kernel/time/timekeeping.c   |   15 ++++++++++++-
>  2 files changed, 60 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> index ec89d84..4f47352 100644
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -166,7 +166,7 @@ enum tk_offsets {
>  };
>  
>  extern ktime_t ktime_get(void);
> -extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
> +extern ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock);
>  extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
>  extern ktime_t ktime_get_raw(void);
>  extern u32 ktime_get_resolution_ns(void);
> @@ -176,7 +176,16 @@ extern u32 ktime_get_resolution_ns(void);
>   */
>  static inline ktime_t ktime_get_real(void)
>  {
> -	return ktime_get_with_offset(TK_OFFS_REAL);
> +	return ktime_get_with_offset(TK_OFFS_REAL, 0);
> +}
> +
> +/**
> + * ktime_try_real - same as ktime_get_real, except return 0 if timekeeping is
> + * locked.
> + */
> +static inline ktime_t ktime_try_real(void)

I would call this ktime_try_get_real() to make it clear.


> +{
> +	return ktime_get_with_offset(TK_OFFS_REAL, 1);
>  }
>  
[...]

>  static inline u64 ktime_get_raw_ns(void)
>  {
>  	return ktime_to_ns(ktime_get_raw());
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index d563c19..6e2cbeb 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -44,6 +44,8 @@ static struct {
>  static DEFINE_RAW_SPINLOCK(timekeeper_lock);
>  static struct timekeeper shadow_timekeeper;
>  
> +/* printk may call ktime_get_with_offset() before timekeeping is initialized. */
> +static int timekeeping_initialized;
>  /**
>   * struct tk_fast - NMI safe timekeeper
>   * @seq:	Sequence counter for protecting updates. The lowest bit
> @@ -705,15 +707,22 @@ static ktime_t *offsets[TK_OFFS_MAX] = {
>  	[TK_OFFS_TAI]	= &tk_core.timekeeper.offs_tai,
>  };
>  
> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>  {
>  	struct timekeeper *tk = &tk_core.timekeeper;
>  	unsigned int seq;
>  	ktime_t base, *offset = offsets[offs];
>  	s64 nsecs;
> +	unsigned long flags = 0;
> +
> +	if (unlikely(!timekeeping_initialized))
> +		return ktime_set(0, 0);



>  	WARN_ON(timekeeping_suspended);
>  
> +	if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> +		return ktime_set(KTIME_MAX, 0);
> +

I guess that you want to avoid a deadlock with this. I mean that you
want to survive when you call, for example, ktime_try_tai_ns() from
inside timekeeping_set_tai_offset(). Am I right?

One problem is that it will fail even when the lock is taken from
another CPU and the deadlock is not real. It probably is not a big
issue for printk() because currently used local_clock() is far from perfect
but...

Another problem is that it will block writers. This might be solved
if you try only one while cycle instead of taking the lock.
I mean to do something like:

ktime_t ktime_get_with_offset(enum tk_offsets offs, int try_once)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	unsigned int seq;
	int retry;
	ktime_t base, *offset = offsets[offs];
	s64 nsecs;

	WARN_ON(timekeeping_suspended);

	do {
		seq = read_seqcount_begin(&tk_core.seq);
		base = ktime_add(tk->tkr_mono.base, *offset);
		nsecs = timekeeping_get_ns(&tk->tkr_mono);
		retry = read_seqcount_retry(&tk_core.seq, seq));
	} while (retry && !try_once);

	if (try_once && retry)
		return ktime_set(KTIME_MAX, 0);

	return ktime_add_ns(base, nsecs);

}

Another question is if you really need to distinguish between
non-initialized and locked state. You might always return zero
time if you do not know. It will things easier.


>  	do {
>  		seq = read_seqcount_begin(&tk_core.seq);
>  		base = ktime_add(tk->tkr_mono.base, *offset);
> @@ -721,6 +730,9 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
>  
>  	} while (read_seqcount_retry(&tk_core.seq, seq));
>  
> +	if (trylock)
> +		raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
> +
>  	return ktime_add_ns(base, nsecs);
>  

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 16:04   ` Jiri Bohac
@ 2016-01-06 16:27     ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 16:27 UTC (permalink / raw)
  To: Jiri Bohac
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Ingo Molnar,
	Xunlei Pang, Peter Zijlstra, Baolin Wang, Arnd Bergmann



On 01/06/2016 11:04 AM, Jiri Bohac wrote:
> On Wed, Jan 06, 2016 at 08:00:33AM -0500, Prarit Bhargava wrote:
>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>>  {
>>  	struct timekeeper *tk = &tk_core.timekeeper;
>>  	unsigned int seq;
>>  	ktime_t base, *offset = offsets[offs];
>>  	s64 nsecs;
>> +	unsigned long flags = 0;
>> +
>> +	if (unlikely(!timekeeping_initialized))
>> +		return ktime_set(0, 0);
>>  
>>  	WARN_ON(timekeeping_suspended);
>>  
>> +	if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>> +		return ktime_set(KTIME_MAX, 0);
>> +
> 
> Are you trying to avoid a deadlock caused by calling printk() with
> timekeeper_lock locked?

Not exactly.  When I initially sent this as a RFE to jstultz he pointed out that
if CPU A had acquired the timekeeper_lock (and therefore incremented tk_core.seq
for a write), and CPU B panicked, no output would occur because the reads of
tk_core.seq would spin indefinitely.

> 
> I believe this is already unsafe, as explained in the commit log
> of 6d9bcb62 (timekeeping: use printk_deferred when holding
> timekeeping seqlock).

Hmm ... John Stultz, any suggestions here?

P.

> 
> So directly calling ktime_get() from printk would just turn a
> rare deadlock into a certain one - perhaps a good thing?
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/2] printk, Add a printk.clock kernel parameter
  2016-01-06 13:00 [PATCH 0/2] printk, Add a printk.clock kernel parameter Prarit Bhargava
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
  2016-01-06 13:00 ` [PATCH 2/2] printk, Add printk.clock kernel parameter Prarit Bhargava
@ 2016-01-06 17:09 ` Joe Perches
  2 siblings, 0 replies; 21+ messages in thread
From: Joe Perches @ 2016-01-06 17:09 UTC (permalink / raw)
  To: Prarit Bhargava, linux-kernel
  Cc: John Stultz, Xunlei Pang, Thomas Gleixner, Baolin Wang,
	Andrew Morton, Greg Kroah-Hartman, Petr Mladek, Tejun Heo,
	Peter Hurley, Vasily Averin

On Wed, 2016-01-06 at 08:00 -0500, Prarit Bhargava wrote:
> Over the past years I've seen many reports of bugs that include
> time-stamped kernel logs (enabled when CONFIG_PRINTK_TIME=y or
> print.time=1 is specified as a kernel parameter) that do not align
> with either external time stamped logs or /var/log/messages.
[]
> This patch introduces printk.clock=[local|boot|real|tai] allowing a
> user to specify an adjusted clock to use with printk timestamps.  The
> hardware clock, or the existing functionality, is preserved by default.

printk.c is already way too large.

Perhaps it would be reasonable to create a new
kernel/printk/timestamp.c file with these changes.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
  2016-01-06 16:04   ` Jiri Bohac
  2016-01-06 16:25   ` Petr Mladek
@ 2016-01-06 17:28   ` John Stultz
  2016-01-06 17:33     ` John Stultz
  2016-01-06 17:34     ` Thomas Gleixner
  2 siblings, 2 replies; 21+ messages in thread
From: John Stultz @ 2016-01-06 17:28 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>  {
>         struct timekeeper *tk = &tk_core.timekeeper;
>         unsigned int seq;
>         ktime_t base, *offset = offsets[offs];
>         s64 nsecs;
> +       unsigned long flags = 0;
> +
> +       if (unlikely(!timekeeping_initialized))
> +               return ktime_set(0, 0);
>
>         WARN_ON(timekeeping_suspended);
>
> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> +               return ktime_set(KTIME_MAX, 0);

Wait.. this doesn't make sense. The timekeeper lock is only for reading.

What I was suggesting to you off line is to have something that avoids
spinning on the seqcounter should if a bug occurs and we IPI all the
cpus, that we don't deadlock or block any printk messages.


> +
>         do {
>                 seq = read_seqcount_begin(&tk_core.seq);
>                 base = ktime_add(tk->tkr_mono.base, *offset);
> @@ -721,6 +730,9 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
>
>         } while (read_seqcount_retry(&tk_core.seq, seq));

So instead of the do/while() loop above... Something closer to:

int __ktime_get_with_offset(enum tk_offsets offs, ktime_t* base, s64 *nsec)
{
       unsigned int seq;

       seq = read_seqcount_begin(&tk_core.seq);
       *base = ktime_add(tk->tkr_mono.base, *offset);
       *nsecs = timekeeping_get_ns(&tk->tkr_mono);

       return read_seqcount_retry(&tk_core.seq, seq);
}

Then ktime_get_with_offset() would just call:
      ...
      while(__ktime_get_with_offset(offs, &base, &nsecs))
                          /*spin*/;
      return ktime_add_ns(base,nsecs);

Then you add a simple:
int ktime_try_get_with_offset(enum tk_offsets offs, ktime_t* ret)
{
   ...
    if (__ktime_get_with_offset(offs, &base, &nsecs))
           return -EAGAIN;
    *ret = ktime_add_ns(base,nsecs);
     return 0;
}


thanks
-john

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 17:28   ` John Stultz
@ 2016-01-06 17:33     ` John Stultz
  2016-01-06 18:06       ` Prarit Bhargava
  2016-01-06 17:34     ` Thomas Gleixner
  1 sibling, 1 reply; 21+ messages in thread
From: John Stultz @ 2016-01-06 17:33 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, Jan 6, 2016 at 9:28 AM, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>>  {
>>         struct timekeeper *tk = &tk_core.timekeeper;
>>         unsigned int seq;
>>         ktime_t base, *offset = offsets[offs];
>>         s64 nsecs;
>> +       unsigned long flags = 0;
>> +
>> +       if (unlikely(!timekeeping_initialized))
>> +               return ktime_set(0, 0);
>>
>>         WARN_ON(timekeeping_suspended);
>>
>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>> +               return ktime_set(KTIME_MAX, 0);
>
> Wait.. this doesn't make sense. The timekeeper lock is only for reading.

Only for writing.. sorry.. still drinking my coffee.

> What I was suggesting to you off line is to have something that avoids
> spinning on the seqcounter should if a bug occurs and we IPI all the
> cpus, that we don't deadlock or block any printk messages.

And more clearly here, if a cpu takes a write on the seqcounter in
update_wall_time() and at that point another cpu hits a bug, and IPIs
the cpus, the system would deadlock. That's really what I want to
avoid.

thanks
-john

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 17:28   ` John Stultz
  2016-01-06 17:33     ` John Stultz
@ 2016-01-06 17:34     ` Thomas Gleixner
  2016-01-06 18:09       ` Prarit Bhargava
  2016-01-06 19:04       ` John Stultz
  1 sibling, 2 replies; 21+ messages in thread
From: Thomas Gleixner @ 2016-01-06 17:34 UTC (permalink / raw)
  To: John Stultz
  Cc: Prarit Bhargava, lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, 6 Jan 2016, John Stultz wrote:
> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> > -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> > +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
> >  {
> >         struct timekeeper *tk = &tk_core.timekeeper;
> >         unsigned int seq;
> >         ktime_t base, *offset = offsets[offs];
> >         s64 nsecs;
> > +       unsigned long flags = 0;
> > +
> > +       if (unlikely(!timekeeping_initialized))
> > +               return ktime_set(0, 0);
> >
> >         WARN_ON(timekeeping_suspended);
> >
> > +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> > +               return ktime_set(KTIME_MAX, 0);
> 
> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
> 
> What I was suggesting to you off line is to have something that avoids
> spinning on the seqcounter should if a bug occurs and we IPI all the
> cpus, that we don't deadlock or block any printk messages.

We could also extend the fast timekeeper with boot/real/tai extensions and use
that for printk. You can use ktime_get_mono_fast_ns() today.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 17:33     ` John Stultz
@ 2016-01-06 18:06       ` Prarit Bhargava
  2016-01-06 18:10         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 18:06 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann



On 01/06/2016 12:33 PM, John Stultz wrote:
> On Wed, Jan 6, 2016 at 9:28 AM, John Stultz <john.stultz@linaro.org> wrote:
>> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>>>  {
>>>         struct timekeeper *tk = &tk_core.timekeeper;
>>>         unsigned int seq;
>>>         ktime_t base, *offset = offsets[offs];
>>>         s64 nsecs;
>>> +       unsigned long flags = 0;
>>> +
>>> +       if (unlikely(!timekeeping_initialized))
>>> +               return ktime_set(0, 0);
>>>
>>>         WARN_ON(timekeeping_suspended);
>>>
>>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>>> +               return ktime_set(KTIME_MAX, 0);
>>
>> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
> 
> Only for writing.. sorry.. still drinking my coffee.
> 
>> What I was suggesting to you off line is to have something that avoids
>> spinning on the seqcounter should if a bug occurs and we IPI all the
>> cpus, that we don't deadlock or block any printk messages.
> 
> And more clearly here, if a cpu takes a write on the seqcounter in
> update_wall_time() and at that point another cpu hits a bug, and IPIs
> the cpus, the system would deadlock. That's really what I want to
> avoid.

Right -- but the only time that the seq_lock is taken for writing is when the
timekeeper_lock is acquired (including update_wall_time()).  This means that

if (!raw_spin_trylock_irqsave(&timekeeper_lock, flags))

is equivalent to

if (tk_core.seq & 1) // sequence_t is odd when writing

The problem with the latter is that it is possible that there is no
protection from a writer setting tk_core.seq odd AFTER I've read it,
and the protection for that AFAICT comes from the timekeeper_lock.

That means I need to check to see if the timekeeper_lock is locked.  And
the patch does exactly that -- checks to see if the lock is available, and
if not avoids spinning on the seq_lock.

P.



> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 17:34     ` Thomas Gleixner
@ 2016-01-06 18:09       ` Prarit Bhargava
  2016-01-06 18:12         ` Thomas Gleixner
  2016-01-06 19:04       ` John Stultz
  1 sibling, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 18:09 UTC (permalink / raw)
  To: Thomas Gleixner, John Stultz
  Cc: lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra, Baolin Wang,
	Arnd Bergmann



On 01/06/2016 12:34 PM, Thomas Gleixner wrote:
> On Wed, 6 Jan 2016, John Stultz wrote:
>> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>>>  {
>>>         struct timekeeper *tk = &tk_core.timekeeper;
>>>         unsigned int seq;
>>>         ktime_t base, *offset = offsets[offs];
>>>         s64 nsecs;
>>> +       unsigned long flags = 0;
>>> +
>>> +       if (unlikely(!timekeeping_initialized))
>>> +               return ktime_set(0, 0);
>>>
>>>         WARN_ON(timekeeping_suspended);
>>>
>>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>>> +               return ktime_set(KTIME_MAX, 0);
>>
>> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
>>
>> What I was suggesting to you off line is to have something that avoids
>> spinning on the seqcounter should if a bug occurs and we IPI all the
>> cpus, that we don't deadlock or block any printk messages.
> 
> We could also extend the fast timekeeper with boot/real/tai extensions and use
> that for printk. You can use ktime_get_mono_fast_ns() today.
> 

Thanks tglx -- I thought about doing that but was put off by the comments
in __ktime_get_fast_ns() which point out that we could see backwards time
stamps.  But I see your point -- I could do the same "last_time_stamp" check
and use "??" in the output.

That's a far better approach here, and unless John has any objections I'll
go with that.

P.


> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 18:06       ` Prarit Bhargava
@ 2016-01-06 18:10         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2016-01-06 18:10 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: John Stultz, lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, 6 Jan 2016, Prarit Bhargava wrote:
> On 01/06/2016 12:33 PM, John Stultz wrote:
> > On Wed, Jan 6, 2016 at 9:28 AM, John Stultz <john.stultz@linaro.org> wrote:
> >> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> >>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> >>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
> >>>  {
> >>>         struct timekeeper *tk = &tk_core.timekeeper;
> >>>         unsigned int seq;
> >>>         ktime_t base, *offset = offsets[offs];
> >>>         s64 nsecs;
> >>> +       unsigned long flags = 0;
> >>> +
> >>> +       if (unlikely(!timekeeping_initialized))
> >>> +               return ktime_set(0, 0);
> >>>
> >>>         WARN_ON(timekeeping_suspended);
> >>>
> >>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> >>> +               return ktime_set(KTIME_MAX, 0);
> >>
> >> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
> > 
> > Only for writing.. sorry.. still drinking my coffee.
> > 
> >> What I was suggesting to you off line is to have something that avoids
> >> spinning on the seqcounter should if a bug occurs and we IPI all the
> >> cpus, that we don't deadlock or block any printk messages.
> > 
> > And more clearly here, if a cpu takes a write on the seqcounter in
> > update_wall_time() and at that point another cpu hits a bug, and IPIs
> > the cpus, the system would deadlock. That's really what I want to
> > avoid.
> 
> Right -- but the only time that the seq_lock is taken for writing is when the
> timekeeper_lock is acquired (including update_wall_time()).  This means that
> 
> if (!raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> 
> is equivalent to
> 
> if (tk_core.seq & 1) // sequence_t is odd when writing
> 
> The problem with the latter is that it is possible that there is no
> protection from a writer setting tk_core.seq odd AFTER I've read it,
> and the protection for that AFAICT comes from the timekeeper_lock.
> 
> That means I need to check to see if the timekeeper_lock is locked.  And
> the patch does exactly that -- checks to see if the lock is available, and
> if not avoids spinning on the seq_lock.

And no, we don't want that in every code path.

We already have the concept of the fast timekeeper, which is lockless and NMI
safe. It's useable for tracing and perf, so it can be used for printk as well.

It supports clock monotonic today, which is good enough for printk, but it
could be extended to other clocks if really required.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 18:09       ` Prarit Bhargava
@ 2016-01-06 18:12         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2016-01-06 18:12 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: John Stultz, lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, 6 Jan 2016, Prarit Bhargava wrote:
> On 01/06/2016 12:34 PM, Thomas Gleixner wrote:
> > On Wed, 6 Jan 2016, John Stultz wrote:
> >> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> >>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
> >>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
> >>>  {
> >>>         struct timekeeper *tk = &tk_core.timekeeper;
> >>>         unsigned int seq;
> >>>         ktime_t base, *offset = offsets[offs];
> >>>         s64 nsecs;
> >>> +       unsigned long flags = 0;
> >>> +
> >>> +       if (unlikely(!timekeeping_initialized))
> >>> +               return ktime_set(0, 0);
> >>>
> >>>         WARN_ON(timekeeping_suspended);
> >>>
> >>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
> >>> +               return ktime_set(KTIME_MAX, 0);
> >>
> >> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
> >>
> >> What I was suggesting to you off line is to have something that avoids
> >> spinning on the seqcounter should if a bug occurs and we IPI all the
> >> cpus, that we don't deadlock or block any printk messages.
> > 
> > We could also extend the fast timekeeper with boot/real/tai extensions and use
> > that for printk. You can use ktime_get_mono_fast_ns() today.
> > 
> 
> Thanks tglx -- I thought about doing that but was put off by the comments
> in __ktime_get_fast_ns() which point out that we could see backwards time
> stamps.  But I see your point -- I could do the same "last_time_stamp" check
> and use "??" in the output.

We talk about single digit nanoseconds here and in the case of a crash/bug we
really do not care about that at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 17:34     ` Thomas Gleixner
  2016-01-06 18:09       ` Prarit Bhargava
@ 2016-01-06 19:04       ` John Stultz
  2016-01-06 19:06         ` Prarit Bhargava
  1 sibling, 1 reply; 21+ messages in thread
From: John Stultz @ 2016-01-06 19:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Prarit Bhargava, lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra,
	Baolin Wang, Arnd Bergmann

On Wed, Jan 6, 2016 at 9:34 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Wed, 6 Jan 2016, John Stultz wrote:
>> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>> > -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>> > +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>> >  {
>> >         struct timekeeper *tk = &tk_core.timekeeper;
>> >         unsigned int seq;
>> >         ktime_t base, *offset = offsets[offs];
>> >         s64 nsecs;
>> > +       unsigned long flags = 0;
>> > +
>> > +       if (unlikely(!timekeeping_initialized))
>> > +               return ktime_set(0, 0);
>> >
>> >         WARN_ON(timekeeping_suspended);
>> >
>> > +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>> > +               return ktime_set(KTIME_MAX, 0);
>>
>> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
>>
>> What I was suggesting to you off line is to have something that avoids
>> spinning on the seqcounter should if a bug occurs and we IPI all the
>> cpus, that we don't deadlock or block any printk messages.
>
> We could also extend the fast timekeeper with boot/real/tai extensions and use
> that for printk. You can use ktime_get_mono_fast_ns() today.

Ack. There'd be a chance for odd values around when the time is set,
but for debug printks I think its not critical.

thanks
-john

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset()
  2016-01-06 19:04       ` John Stultz
@ 2016-01-06 19:06         ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-06 19:06 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: lkml, Ingo Molnar, Xunlei Pang, Peter Zijlstra, Baolin Wang,
	Arnd Bergmann



On 01/06/2016 02:04 PM, John Stultz wrote:
> On Wed, Jan 6, 2016 at 9:34 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Wed, 6 Jan 2016, John Stultz wrote:
>>> On Wed, Jan 6, 2016 at 5:00 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>>> -ktime_t ktime_get_with_offset(enum tk_offsets offs)
>>>> +ktime_t ktime_get_with_offset(enum tk_offsets offs, int trylock)
>>>>  {
>>>>         struct timekeeper *tk = &tk_core.timekeeper;
>>>>         unsigned int seq;
>>>>         ktime_t base, *offset = offsets[offs];
>>>>         s64 nsecs;
>>>> +       unsigned long flags = 0;
>>>> +
>>>> +       if (unlikely(!timekeeping_initialized))
>>>> +               return ktime_set(0, 0);
>>>>
>>>>         WARN_ON(timekeeping_suspended);
>>>>
>>>> +       if (trylock && !raw_spin_trylock_irqsave(&timekeeper_lock, flags))
>>>> +               return ktime_set(KTIME_MAX, 0);
>>>
>>> Wait.. this doesn't make sense. The timekeeper lock is only for reading.
>>>
>>> What I was suggesting to you off line is to have something that avoids
>>> spinning on the seqcounter should if a bug occurs and we IPI all the
>>> cpus, that we don't deadlock or block any printk messages.
>>
>> We could also extend the fast timekeeper with boot/real/tai extensions and use
>> that for printk. You can use ktime_get_mono_fast_ns() today.
> 
> Ack. There'd be a chance for odd values around when the time is set,
> but for debug printks I think its not critical.

I'll convert to this in v2.

Thanks for the input everyone!

P.

> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-06 13:00 ` [PATCH 2/2] printk, Add printk.clock kernel parameter Prarit Bhargava
@ 2016-01-07 14:57   ` Petr Mladek
  2016-01-07 15:38     ` Prarit Bhargava
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2016-01-07 14:57 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Tejun Heo,
	Peter Hurley, Vasily Averin, Joe Perches

On Wed 2016-01-06 08:00:34, Prarit Bhargava wrote:
> This patch introduces printk.clock=[local|boot|real|tai] allowing a
> user to specify an adjusted clock to use with printk timestamps.  The
> hardware clock, or the existing functionality, is preserved by default.
>
> If the timekeeper_lock (used to protect writes from reads) is under
> contention, the last known good time is output again and "??" is used as
> the last two digits of time.  This avoids contention issues with a CPU
> panicking while another CPU is writing time data.
> 
> ---
>  include/linux/time64.h |    2 +
>  kernel/printk/printk.c |  105 ++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 99 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/time64.h b/include/linux/time64.h
> index 367d5af..f4a35d9 100644
> --- a/include/linux/time64.h
> +++ b/include/linux/time64.h
> @@ -39,6 +39,8 @@ struct itimerspec64 {
>  #define TIME64_MAX			((s64)~((u64)1 << 63))
>  #define KTIME_MAX			((s64)~((u64)1 << 63))
>  #define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
> +/* Maximum value returned by ktime_to_ns() */
> +#define KTIME_MAX_NS 			((u64)((U64_MAX>>1)))

Please, use KTIME_NS_MAX to use the same naming scheme as
KTIME_SEC_MAX.

Well, I would return 0 when the time is not known. See also below.

>  #if __BITS_PER_LONG == 64
>  
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 2ce8826..ffc05ee 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -419,6 +419,83 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
>  	return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
>  }
>  
> +static u64 printk_try_real_ns(void)
> +{
> +	return ktime_try_real_ns();
> +}
> +
> +static u64 printk_try_boot_ns(void)
> +{
> +	return ktime_try_boot_ns();
> +}
> +
> +static u64 printk_try_tai_ns(void)
> +{
> +	return ktime_try_tai_ns();
> +}

Do we need these wrappers? Why not using ktime_try_xxx_ns() directly?
You already use local_clock().


> +static int printk_clock_param_set(const char *val,
> +				  const struct kernel_param *kp)
> +{
> +	char *printk_clock_new = strstrip((char *)val);
> +	int ret;
> +
> +	/* check if change is needed */
> +	if (!strcmp(printk_clock_new, printk_clock))
> +		return 0;
> +
> +	if (!strncmp("local", printk_clock_new, 5)) {
> +		ret = param_set_charp(printk_clock_new, kp);
> +		if (ret)
> +			return ret;
> +		printk_clock_fn = &local_clock;
> +	} else if (!strncmp("real", printk_clock_new, 4)) {
> +		ret = param_set_charp(printk_clock_new, kp);
> +		if (ret)
> +			return ret;
> +		printk_clock_fn = &printk_try_real_ns;
> +	} else if (!strncmp("boot", printk_clock_new, 4)) {
> +		ret = param_set_charp(printk_clock_new, kp);
> +		if (ret)
> +			return ret;
> +		printk_clock_fn = &printk_try_boot_ns;
> +	} else if (!strncmp("tai", printk_clock_new, 3)) {
> +		ret = param_set_charp(printk_clock_new, kp);
> +		if (ret)
> +			return ret;
> +		printk_clock_fn = &printk_try_tai_ns;
> +	}
> +
> +	/* Did the clock change ? */
> +	if (strcmp(printk_clock_new, printk_clock))
> +		return -EINVAL;

I wonder if we need this extra check. It might be enough to
print the error message in the final "else" above.

> +	pr_info("printk: timestamp set to %s clock.\n", printk_clock);
> +	return 0;
> +}
> +
> +static struct kernel_param_ops printk_clock_param_ops = {
> +	.set =		printk_clock_param_set,
> +	.get =		param_get_charp,
> +};
> +
> +module_param_cb(clock, &printk_clock_param_ops, &printk_clock, 0644);
> +MODULE_PARM_DESC(clock, "Clock used for printk timestamps.  Can be local (hardware/default), boot, real, or tai.\n");

I have problems to parse this. It seems that the most common style
used for possible values is to putthem into brackets. I wonder
if this is better readable.

MODULE_PARM_DESC(clock, "Clock used for printk timestamps (local = default, boot, real, tai).");


> +
>  /* insert record into the buffer, discard old ones, update heads */
>  static int log_store(int facility, int level,
>  		     enum log_flags flags, u64 ts_nsec,
> @@ -467,7 +544,7 @@ static int log_store(int facility, int level,
>  	if (ts_nsec > 0)
>  		msg->ts_nsec = ts_nsec;
>  	else
> -		msg->ts_nsec = local_clock();
> +		msg->ts_nsec = printk_get_timestamp();

Hmm, one problem is that each clock returns another type of time.
"local" and "boot" clocks returns the number of ns since the boot.
While "real" and "tai" clocks returns the number of ns since 1.1.1970
or so.

The tools reading the timestamps are confused by this. For example,
I get this:

$> echo boot >/sys/module/printk/parameters/clock
$> echo local >/sys/module/printk/parameters/clock
$> echo real >/sys/module/printk/parameters/clock
$> echo tai >/sys/module/printk/parameters/clock
$> dmesg | tail -6
[    6.976593] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[    6.977168] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   77.500483] printk: timestamp set to boot clock.
[   81.419957] printk: timestamp set to local clock.
[1452177961.544909] printk: timestamp set to real clock.
[1452177965.224824] printk: timestamp set to tai clock.
$> dmesg -T | tail -6
[Thu Jan  7 15:44:41 2016] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[Thu Jan  7 15:44:41 2016] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Thu Jan  7 15:45:52 2016] printk: timestamp set to boot clock.
[Thu Jan  7 15:45:56 2016] printk: timestamp set to local clock.
[Fri Jan 13 06:30:36 2062] printk: timestamp set to real clock.
[Fri Jan 13 06:30:40 2062] printk: timestamp set to tai clock.

Please, note that the last messages looks as they are printed in
a far far future ;-)


>  	memset(log_dict(msg) + dict_len, 0, pad_len);
>  	msg->len = size;
>  
> @@ -1039,18 +1116,30 @@ module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>  
>  static size_t print_time(u64 ts, char *buf)
>  {
> -	unsigned long rem_nsec;
> +	unsigned long rem;
> +	size_t size;
>  
>  	if (!printk_time)
>  		return 0;
>  
> -	rem_nsec = do_div(ts, 1000000000);
> +	if (ts == KTIME_MAX_NS)
> +		ts = last_time_stamp;
>  
> -	if (!buf)
> -		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
> +	if (buf) {
> +		size = INT_MAX;
> +		rem = do_div(ts, 1000000000) / 1000;
> +	} else {
> +		size = 0;
> +		rem = 0;
> +	}
>  
> -	return sprintf(buf, "[%5lu.%06lu] ",
> -		       (unsigned long)ts, rem_nsec / 1000);
> +	if (ts != KTIME_MAX_NS) {
> +		return snprintf(buf, size, "[%5lu.%06lu] ",
> +			       (unsigned long)ts, rem);
> +	}
> +	/* return last_time_stamp with ?? warning */
> +	return snprintf(buf, size, "[%5lu.%04lu??] ",
> +		       (unsigned long)ts, rem);

I am not sure if this is worth the effort. All other tools parsing the
log will be confused by this. Also there are tools that read the ring
buffer dirrectly, e.g. "crash" reading a crashdump. They will need to
get updated to handle KTIME_MAX_NS correctly.

Instead, I would modify printk_get_timestamp() to return the last time
stamp if the current time is not known. Then log_buf will included
reasonable values.

IMHO, it is not a big deal if two messages have the same time stamp.
It already happens for messages that are printed before timekeeping
machine is initialized. They all have zero time. Also the time stamp
is created under the logbuf_lock(). It means that it is not the time
when the printk() was called but it is the time when the printk()
got access to the log_buf. By other words, the timestamp is not
perfect anyway.

Even better solution would be to always get some reasonable time.
Especially, when the message is printed long time after the previous
one.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-07 14:57   ` Petr Mladek
@ 2016-01-07 15:38     ` Prarit Bhargava
  2016-01-07 15:52       ` Petr Mladek
  0 siblings, 1 reply; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-07 15:38 UTC (permalink / raw)
  To: Petr Mladek
  Cc: linux-kernel, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Tejun Heo,
	Peter Hurley, Vasily Averin, Joe Perches



On 01/07/2016 09:57 AM, Petr Mladek wrote:
> On Wed 2016-01-06 08:00:34, Prarit Bhargava wrote:
>> This patch introduces printk.clock=[local|boot|real|tai] allowing a
>> user to specify an adjusted clock to use with printk timestamps.  The
>> hardware clock, or the existing functionality, is preserved by default.
>>
>> If the timekeeper_lock (used to protect writes from reads) is under
>> contention, the last known good time is output again and "??" is used as
>> the last two digits of time.  This avoids contention issues with a CPU
>> panicking while another CPU is writing time data.
>>
>> ---
>>  include/linux/time64.h |    2 +
>>  kernel/printk/printk.c |  105 ++++++++++++++++++++++++++++++++++++++++++++----
>>  2 files changed, 99 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/time64.h b/include/linux/time64.h
>> index 367d5af..f4a35d9 100644
>> --- a/include/linux/time64.h
>> +++ b/include/linux/time64.h
>> @@ -39,6 +39,8 @@ struct itimerspec64 {
>>  #define TIME64_MAX			((s64)~((u64)1 << 63))
>>  #define KTIME_MAX			((s64)~((u64)1 << 63))
>>  #define KTIME_SEC_MAX			(KTIME_MAX / NSEC_PER_SEC)
>> +/* Maximum value returned by ktime_to_ns() */
>> +#define KTIME_MAX_NS 			((u64)((U64_MAX>>1)))
> 
> Please, use KTIME_NS_MAX to use the same naming scheme as
> KTIME_SEC_MAX.
> 
> Well, I would return 0 when the time is not known. See also below.
> 
>>  #if __BITS_PER_LONG == 64
>>  
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 2ce8826..ffc05ee 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -419,6 +419,83 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
>>  	return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
>>  }
>>  
>> +static u64 printk_try_real_ns(void)
>> +{
>> +	return ktime_try_real_ns();
>> +}
>> +
>> +static u64 printk_try_boot_ns(void)
>> +{
>> +	return ktime_try_boot_ns();
>> +}
>> +
>> +static u64 printk_try_tai_ns(void)
>> +{
>> +	return ktime_try_tai_ns();
>> +}
> 
> Do we need these wrappers? Why not using ktime_try_xxx_ns() directly?
> You already use local_clock().

Unfortunately ktime_* are inlined functions, and I've mimicked the currently
coding style there.

In any case I'm doing a v2 with different functions.  I'll see if making this
stuff not-inlined is any better.

> 
> 
>> +static int printk_clock_param_set(const char *val,
>> +				  const struct kernel_param *kp)
>> +{
>> +	char *printk_clock_new = strstrip((char *)val);
>> +	int ret;
>> +
>> +	/* check if change is needed */
>> +	if (!strcmp(printk_clock_new, printk_clock))
>> +		return 0;
>> +
>> +	if (!strncmp("local", printk_clock_new, 5)) {
>> +		ret = param_set_charp(printk_clock_new, kp);
>> +		if (ret)
>> +			return ret;
>> +		printk_clock_fn = &local_clock;
>> +	} else if (!strncmp("real", printk_clock_new, 4)) {
>> +		ret = param_set_charp(printk_clock_new, kp);
>> +		if (ret)
>> +			return ret;
>> +		printk_clock_fn = &printk_try_real_ns;
>> +	} else if (!strncmp("boot", printk_clock_new, 4)) {
>> +		ret = param_set_charp(printk_clock_new, kp);
>> +		if (ret)
>> +			return ret;
>> +		printk_clock_fn = &printk_try_boot_ns;
>> +	} else if (!strncmp("tai", printk_clock_new, 3)) {
>> +		ret = param_set_charp(printk_clock_new, kp);
>> +		if (ret)
>> +			return ret;
>> +		printk_clock_fn = &printk_try_tai_ns;
>> +	}
>> +
>> +	/* Did the clock change ? */
>> +	if (strcmp(printk_clock_new, printk_clock))
>> +		return -EINVAL;
> 
> I wonder if we need this extra check. It might be enough to
> print the error message in the final "else" above.

I was trying to catch a local to local change or a real to real change.
But yeah ... I could sort of do that in the final else above.

> 
>> +	pr_info("printk: timestamp set to %s clock.\n", printk_clock);
>> +	return 0;
>> +}
>> +
>> +static struct kernel_param_ops printk_clock_param_ops = {
>> +	.set =		printk_clock_param_set,
>> +	.get =		param_get_charp,
>> +};
>> +
>> +module_param_cb(clock, &printk_clock_param_ops, &printk_clock, 0644);
>> +MODULE_PARM_DESC(clock, "Clock used for printk timestamps.  Can be local (hardware/default), boot, real, or tai.\n");
> 
> I have problems to parse this. It seems that the most common style
> used for possible values is to putthem into brackets. I wonder
> if this is better readable.
> 
> MODULE_PARM_DESC(clock, "Clock used for printk timestamps (local = default, boot, real, tai).");
> 

Ah, thanks! :)  I was wondering if there was a coding style preference there
when I wrote this up.

> 
>> +
>>  /* insert record into the buffer, discard old ones, update heads */
>>  static int log_store(int facility, int level,
>>  		     enum log_flags flags, u64 ts_nsec,
>> @@ -467,7 +544,7 @@ static int log_store(int facility, int level,
>>  	if (ts_nsec > 0)
>>  		msg->ts_nsec = ts_nsec;
>>  	else
>> -		msg->ts_nsec = local_clock();
>> +		msg->ts_nsec = printk_get_timestamp();
> 
> Hmm, one problem is that each clock returns another type of time.
> "local" and "boot" clocks returns the number of ns since the boot.
> While "real" and "tai" clocks returns the number of ns since 1.1.1970
> or so.
> 
> The tools reading the timestamps are confused by this. For example,
> I get this:
> 
> $> echo boot >/sys/module/printk/parameters/clock
> $> echo local >/sys/module/printk/parameters/clock
> $> echo real >/sys/module/printk/parameters/clock
> $> echo tai >/sys/module/printk/parameters/clock
> $> dmesg | tail -6
> [    6.976593] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [    6.977168] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [   77.500483] printk: timestamp set to boot clock.
> [   81.419957] printk: timestamp set to local clock.
> [1452177961.544909] printk: timestamp set to real clock.
> [1452177965.224824] printk: timestamp set to tai clock.
> $> dmesg -T | tail -6
> [Thu Jan  7 15:44:41 2016] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [Thu Jan  7 15:44:41 2016] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [Thu Jan  7 15:45:52 2016] printk: timestamp set to boot clock.
> [Thu Jan  7 15:45:56 2016] printk: timestamp set to local clock.
> [Fri Jan 13 06:30:36 2062] printk: timestamp set to real clock.
> [Fri Jan 13 06:30:40 2062] printk: timestamp set to tai clock.
> 
> Please, note that the last messages looks as they are printed in
> a far far future ;-)

Heh :)  I didn't even think of testing that.  I'll have to think about that a
bit more.  systemd logging must assume boot time/local time.  I'll ping those
guys with a patch if/when the next version of this gets accepted.

Thanks for catching this.

> 
> 
>>  	memset(log_dict(msg) + dict_len, 0, pad_len);
>>  	msg->len = size;
>>  
>> @@ -1039,18 +1116,30 @@ module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>>  
>>  static size_t print_time(u64 ts, char *buf)
>>  {
>> -	unsigned long rem_nsec;
>> +	unsigned long rem;
>> +	size_t size;
>>  
>>  	if (!printk_time)
>>  		return 0;
>>  
>> -	rem_nsec = do_div(ts, 1000000000);
>> +	if (ts == KTIME_MAX_NS)
>> +		ts = last_time_stamp;
>>  
>> -	if (!buf)
>> -		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
>> +	if (buf) {
>> +		size = INT_MAX;
>> +		rem = do_div(ts, 1000000000) / 1000;
>> +	} else {
>> +		size = 0;
>> +		rem = 0;
>> +	}
>>  
>> -	return sprintf(buf, "[%5lu.%06lu] ",
>> -		       (unsigned long)ts, rem_nsec / 1000);
>> +	if (ts != KTIME_MAX_NS) {
>> +		return snprintf(buf, size, "[%5lu.%06lu] ",
>> +			       (unsigned long)ts, rem);
>> +	}
>> +	/* return last_time_stamp with ?? warning */
>> +	return snprintf(buf, size, "[%5lu.%04lu??] ",
>> +		       (unsigned long)ts, rem);
> 
> I am not sure if this is worth the effort. All other tools parsing the
> log will be confused by this. Also there are tools that read the ring
> buffer dirrectly, e.g. "crash" reading a crashdump. They will need to
> get updated to handle KTIME_MAX_NS correctly.

Yep -- the thread contains suggestions on how to avoid all of this by using the
NMI safe ktime_* calls (and implementing new ones).  I'm going to do that which
should nicely clean all of this up.

Thanks for the review Petr,

P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-07 15:38     ` Prarit Bhargava
@ 2016-01-07 15:52       ` Petr Mladek
  2016-01-08  9:04         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Petr Mladek @ 2016-01-07 15:52 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Tejun Heo,
	Peter Hurley, Vasily Averin, Joe Perches

On Thu 2016-01-07 10:38:37, Prarit Bhargava wrote:
> 
> 
> On 01/07/2016 09:57 AM, Petr Mladek wrote:
> > On Wed 2016-01-06 08:00:34, Prarit Bhargava wrote:
> >> This patch introduces printk.clock=[local|boot|real|tai] allowing a
> >> user to specify an adjusted clock to use with printk timestamps.  The
> >> hardware clock, or the existing functionality, is preserved by default.
> >> +	pr_info("printk: timestamp set to %s clock.\n", printk_clock);
> >> +	return 0;
> >> +}
> >> +
> >> +static struct kernel_param_ops printk_clock_param_ops = {
> >> +	.set =		printk_clock_param_set,
> >> +	.get =		param_get_charp,
> >> +};
> >> +
> >> +module_param_cb(clock, &printk_clock_param_ops, &printk_clock, 0644);
> >> +MODULE_PARM_DESC(clock, "Clock used for printk timestamps.  Can be local (hardware/default), boot, real, or tai.\n");
> > 
> > I have problems to parse this. It seems that the most common style
> > used for possible values is to putthem into brackets. I wonder
> > if this is better readable.
> > 
> > MODULE_PARM_DESC(clock, "Clock used for printk timestamps (local = default, boot, real, tai).");
> > 
> 
> Ah, thanks! :)  I was wondering if there was a coding style preference there
> when I wrote this up.

Heh, I am not sure if there is any coding style for this. I just went
over the output from

      git grep MODULE_PARM_DESC

> > 
> >> +
> >>  /* insert record into the buffer, discard old ones, update heads */
> >>  static int log_store(int facility, int level,
> >>  		     enum log_flags flags, u64 ts_nsec,
> >> @@ -467,7 +544,7 @@ static int log_store(int facility, int level,
> >>  	if (ts_nsec > 0)
> >>  		msg->ts_nsec = ts_nsec;
> >>  	else
> >> -		msg->ts_nsec = local_clock();
> >> +		msg->ts_nsec = printk_get_timestamp();
> > 
> > Hmm, one problem is that each clock returns another type of time.
> > "local" and "boot" clocks returns the number of ns since the boot.
> > While "real" and "tai" clocks returns the number of ns since 1.1.1970
> > or so.
> > 
> > The tools reading the timestamps are confused by this. For example,
> > I get this:
> > 
> > $> echo boot >/sys/module/printk/parameters/clock
> > $> echo local >/sys/module/printk/parameters/clock
> > $> echo real >/sys/module/printk/parameters/clock
> > $> echo tai >/sys/module/printk/parameters/clock
> > $> dmesg | tail -6
> > [    6.976593] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> > [    6.977168] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > [   77.500483] printk: timestamp set to boot clock.
> > [   81.419957] printk: timestamp set to local clock.
> > [1452177961.544909] printk: timestamp set to real clock.
> > [1452177965.224824] printk: timestamp set to tai clock.
> > $> dmesg -T | tail -6
> > [Thu Jan  7 15:44:41 2016] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> > [Thu Jan  7 15:44:41 2016] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> > [Thu Jan  7 15:45:52 2016] printk: timestamp set to boot clock.
> > [Thu Jan  7 15:45:56 2016] printk: timestamp set to local clock.
> > [Fri Jan 13 06:30:36 2062] printk: timestamp set to real clock.
> > [Fri Jan 13 06:30:40 2062] printk: timestamp set to tai clock.
> > 
> > Please, note that the last messages looks as they are printed in
> > a far far future ;-)
> 
> Heh :)  I didn't even think of testing that.  I'll have to think about that a
> bit more.  systemd logging must assume boot time/local time.  I'll ping those
> guys with a patch if/when the next version of this gets accepted.

I am not sure how many tools would be affected by this. An easier
solution would be to normalize all times and always save the relative
time since the boot as it is done now.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-07 15:52       ` Petr Mladek
@ 2016-01-08  9:04         ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2016-01-08  9:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Prarit Bhargava, linux-kernel, John Stultz, Xunlei Pang,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Tejun Heo,
	Peter Hurley, Vasily Averin, Joe Perches

On Thu, 7 Jan 2016, Petr Mladek wrote:
> On Thu 2016-01-07 10:38:37, Prarit Bhargava wrote:
> > Heh :)  I didn't even think of testing that.  I'll have to think about that a
> > bit more.  systemd logging must assume boot time/local time.  I'll ping those
> > guys with a patch if/when the next version of this gets accepted.
> 
> I am not sure how many tools would be affected by this.

I don't see a point in a gazillion of different clocks for printk either.

> An easier solution would be to normalize all times and always save the
> relative time since the boot as it is done now.

Why? What's wrong with simply using the existing ktime_get_mono_fast_ns()
instead of local_clock() and be done with it?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/2] printk, Add printk.clock kernel parameter
  2016-01-13 12:34 [PATCH 0/2] printk, Add printk.clock kernel parameter [v2] Prarit Bhargava
@ 2016-01-13 12:34 ` Prarit Bhargava
  0 siblings, 0 replies; 21+ messages in thread
From: Prarit Bhargava @ 2016-01-13 12:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Xunlei Pang, Thomas Gleixner,
	Baolin Wang, Andrew Morton, Greg Kroah-Hartman, Petr Mladek,
	Tejun Heo, Peter Hurley, Vasily Averin, Joe Perches

The script used in the analysis below:

dmesg_with_human_timestamps () {
    $(type -P dmesg) "$@" | perl -w -e 'use strict;
        my ($uptime) = do { local @ARGV="/proc/uptime";<>}; ($uptime) = ($uptime =~ /^(\d+)\./);
        foreach my $line (<>) {
            printf( ($line=~/^\[\s*(\d+)\.\d+\](.+)/) ? ( "[%s]%s\n", scalar localtime(time - $uptime + $1), $2 ) : $line )
        }'
}

dmesg_with_human_timestamps

----8<----

Over the past years I've seen many reports of bugs that include
time-stamped kernel logs (enabled when CONFIG_PRINTK_TIME=y or
print.time=1 is specified as a kernel parameter) that do not align
with either external time stamped logs or /var/log/messages.

For example,

[root@intel-wildcatpass-06 ~]# date; echo "Hello!" > /dev/kmsg ; date
Thu Dec 17 13:58:31 EST 2015
Thu Dec 17 13:58:31 EST 2015

which displays

[83973.768912] Hello!

on the serial console.

Running a script to convert this to "boot time",

[root@intel-wildcatpass-06 ~]# ./human.sh  | tail -1
[Thu Dec 17 13:59:57 2015] Hello!

which is already off by 1 minute and 26 seconds off after ~24 hours of
uptime.

This occurs because the time stamp is obtained from a call to
local_clock() which (on x86) is a direct call to the hardware.  These
hardware clock reads are not modified by the standard ntp or ptp protocol,
while the other timestamps are, and that results in situations external
time sources are further and further offset from the kernel log
timestamps.

This patch introduces printk.clock=[local|boot|real|tai] allowing a
user to specify an adjusted clock to use with printk timestamps.  The
hardware clock, or the existing functionality, is preserved by default.

[v2]: use NMI safe timekeeping access functions

Cc: John Stultz <john.stultz@linaro.org>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Baolin Wang <baolin.wang@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 kernel/printk/printk.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2ce8826..671f0ec 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -419,6 +419,56 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
 	return msg_used_size(*text_len + *trunc_msg_len, 0, pad_len);
 }
 
+/* Default timestamp is local clock */
+static char *printk_clock = "local";
+static u64 (*printk_clock_fn)(void) = &local_clock;
+
+static int printk_clock_param_set(const char *val,
+				  const struct kernel_param *kp)
+{
+	char *printk_clock_new = strstrip((char *)val);
+	int ret;
+
+	/* check if change is needed */
+	if (!strcmp(printk_clock_new, printk_clock))
+		return 0;
+
+	if (!strncmp("local", printk_clock_new, 5)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &local_clock;
+	} else if (!strncmp("real", printk_clock_new, 4)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &ktime_get_real_fast_ns;
+	} else if (!strncmp("boot", printk_clock_new, 4)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &ktime_get_boot_fast_ns;
+	} else if (!strncmp("tai", printk_clock_new, 3)) {
+		ret = param_set_charp(printk_clock_new, kp);
+		if (ret)
+			return ret;
+		printk_clock_fn = &ktime_get_tai_fast_ns;
+	} else {
+		return -EINVAL;
+	}
+
+	pr_info("printk: timestamp set to %s clock.\n", printk_clock);
+	return 0;
+}
+
+static struct kernel_param_ops printk_clock_param_ops = {
+	.set =		printk_clock_param_set,
+	.get =		param_get_charp,
+};
+
+module_param_cb(clock, &printk_clock_param_ops, &printk_clock, 0644);
+MODULE_PARM_DESC(clock, "Clock used for printk timestamps.  Can be local (hardware/default), boot, real, or tai.\n");
+
 /* insert record into the buffer, discard old ones, update heads */
 static int log_store(int facility, int level,
 		     enum log_flags flags, u64 ts_nsec,
@@ -467,7 +517,7 @@ static int log_store(int facility, int level,
 	if (ts_nsec > 0)
 		msg->ts_nsec = ts_nsec;
 	else
-		msg->ts_nsec = local_clock();
+		msg->ts_nsec = printk_clock_fn();
 	memset(log_dict(msg) + dict_len, 0, pad_len);
 	msg->len = size;
 
@@ -1613,7 +1663,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
-		cont.ts_nsec = local_clock();
+		cont.ts_nsec = printk_clock_fn();
 		cont.flags = 0;
 		cont.cons = 0;
 		cont.flushed = false;
-- 
1.7.9.3

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-01-13 12:34 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-06 13:00 [PATCH 0/2] printk, Add a printk.clock kernel parameter Prarit Bhargava
2016-01-06 13:00 ` [PATCH 1/2] kernel, timekeeping, add trylock option to ktime_get_with_offset() Prarit Bhargava
2016-01-06 16:04   ` Jiri Bohac
2016-01-06 16:27     ` Prarit Bhargava
2016-01-06 16:25   ` Petr Mladek
2016-01-06 17:28   ` John Stultz
2016-01-06 17:33     ` John Stultz
2016-01-06 18:06       ` Prarit Bhargava
2016-01-06 18:10         ` Thomas Gleixner
2016-01-06 17:34     ` Thomas Gleixner
2016-01-06 18:09       ` Prarit Bhargava
2016-01-06 18:12         ` Thomas Gleixner
2016-01-06 19:04       ` John Stultz
2016-01-06 19:06         ` Prarit Bhargava
2016-01-06 13:00 ` [PATCH 2/2] printk, Add printk.clock kernel parameter Prarit Bhargava
2016-01-07 14:57   ` Petr Mladek
2016-01-07 15:38     ` Prarit Bhargava
2016-01-07 15:52       ` Petr Mladek
2016-01-08  9:04         ` Thomas Gleixner
2016-01-06 17:09 ` [PATCH 0/2] printk, Add a " Joe Perches
2016-01-13 12:34 [PATCH 0/2] printk, Add printk.clock kernel parameter [v2] Prarit Bhargava
2016-01-13 12:34 ` [PATCH 2/2] printk, Add printk.clock kernel parameter Prarit Bhargava

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.