All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps
@ 2017-11-15 18:15 Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 1/7] timekeeping: Do not unconditionally suspend NMI safe timekeepers Thomas Gleixner
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

Following up on yesterdays discussion about adding support for reliably
correlatable time stamps to the printk ringbuffer entries, I split up that
draft patch from yesterday, addressed review feedback and actually tested
it.

There are a few things which are visible changes:

1) The fine grained time stamps start later in the boot process because
   local_clock() is initialized before a high resolution clocksource is
   installed.

2) During suspend/resume between timekeeping_suspend() and
   timekeeping_resume() the NMI safe accessors return a stale timestamp,
   i.e. the timestamp which was captured in timekeeping_suspend()

Mitigation:

 #1 Can be mitigated by a horrible hack, which I added as last patch in the
    series. It uses local_clock() in early boot up to the point where a
    proper clocksource providing fine grained CLOCK_MONOTONIC is
    available. That's actually not that bad, because time won't drift apart.

 #2 Can be mitigated at least for clocksources which are safe to access past
    timekeeping_suspend(). There are patches which enable that for TSC when
    the TSC_NONSTOP_SUSPEND cpu feature bit is enabled.

    For clocksources which cannot be accessed past timekeeping_suspend()
    the workaround #1 can be used (see last patch^Whack). Though this might
    cause time inconsistencies (non monotonic time stamps).

Not sure if any of this is a real problem, because not all
local/sched_clock() implementations are functional past
timekeeping_suspend(). 

I left all 3 time stamps in the patch for now so people can actually play
with that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 1/7] timekeeping: Do not unconditionally suspend NMI safe timekeepers
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 2/7] x86/tsc: Set clocksource CLOCK_SOURCE_SUSPEND_ACCESS_OK Thomas Gleixner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: timekeeping--Do-not-unconditionally-suspend-fast-timekeepers.patch --]
[-- Type: text/plain, Size: 1197 bytes --]

If the current clocksource can be accessed safely after timekeeping is
suspended then there is no reason to suspend the NMI safe timekeepers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/clocksource.h |    3 ++-
 kernel/time/timekeeping.c   |    3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -119,7 +119,8 @@ struct clocksource {
 #define CLOCK_SOURCE_VALID_FOR_HRES		0x20
 #define CLOCK_SOURCE_UNSTABLE			0x40
 #define CLOCK_SOURCE_SUSPEND_NONSTOP		0x80
-#define CLOCK_SOURCE_RESELECT			0x100
+#define CLOCK_SOURCE_SUSPEND_ACCESS_OK		0x100
+#define CLOCK_SOURCE_RESELECT			0x200
 
 /* simplify initialization of mask field */
 #define CLOCKSOURCE_MASK(bits) GENMASK_ULL((bits) - 1, 0)
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -545,6 +545,9 @@ static void halt_fast_timekeeper(struct
 	static struct tk_read_base tkr_dummy;
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
+	if (tkr->clock->flags & CLOCK_SOURCE_SUSPEND_ACCESS_OK)
+		return;
+
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
 	cycles_at_suspend = tk_clock_read(tkr);
 	tkr_dummy.clock = &dummy_clock;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 2/7] x86/tsc: Set clocksource CLOCK_SOURCE_SUSPEND_ACCESS_OK
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 1/7] timekeeping: Do not unconditionally suspend NMI safe timekeepers Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps Thomas Gleixner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: x86-tsc--Set-clocksource-CLOCK_SOURCE_SUSPEND_ACCESS_OK.patch --]
[-- Type: text/plain, Size: 890 bytes --]

When TSC is non stop accross suspend then it can be accessed after
timekeeping_suspend(). Set the proper flag to indicate that.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: b/arch/x86/kernel/tsc.c
===================================================================
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1240,8 +1240,10 @@ static int __init init_tsc_clocksource(v
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_IS_CONTINUOUS;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
-		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
+	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3)) {
+		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP |
+					 CLOCK_SOURCE_SUSPEND_ACCESS_OK;
+	}
 
 	/*
 	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 1/7] timekeeping: Do not unconditionally suspend NMI safe timekeepers Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 2/7] x86/tsc: Set clocksource CLOCK_SOURCE_SUSPEND_ACCESS_OK Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-16  7:58   ` Sergey Senozhatsky
  2017-11-15 18:15 ` [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks Thomas Gleixner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: printk--Use-clock-MONOTONIC-for-timestamps.patch --]
[-- Type: text/plain, Size: 1731 bytes --]

local_clock() cannot be reliably correlated to CLOCK_MONOTONIC, which is
used by user space, e.g. systemd, to create correlation timestamps.

There are multiple reasons:

 - CLOCK_MONOTONIC is NTP adjusted, local_clock() not. Depending on the
   calibration accuracy and uptime significant drift can be observed.

 - CLOCK_MONOTONIC does not advance across suspend/resume for historical
   reasons. local_clock() might or might not depending on the properties of
   the underlying hardware counter.

Use the NMI safe accessor to clock MONOTONIC instead of local_clock(). The
access might be slower than local_clock(), but printk is not such a
performance critical hotpath that it matters.

Visible change:

The early boot timestamps are jiffies based longer than with local_clock()
depending on the platform. During suspend/resume the timestamp may become
stale when the underlying clocksource hardware is not flagged with
CLOCKSOURCE_SUSPEND_ACCESS_OK.

A horrible follow up patch demonstrates how that could be mitigated.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -624,7 +624,7 @@ static int log_store(int facility, int l
 	if (ts_nsec > 0)
 		msg->ts_nsec = ts_nsec;
 	else
-		msg->ts_nsec = local_clock();
+		msg->ts_nsec = ktime_get_mono_fast_ns();
 	memset(log_dict(msg) + dict_len, 0, pad_len);
 	msg->len = size;
 
@@ -1631,7 +1631,7 @@ static bool cont_add(int facility, int l
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
-		cont.ts_nsec = local_clock();
+		cont.ts_nsec = ktime_get_mono_fast_ns();
 		cont.flags = flags;
 	}
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
                   ` (2 preceding siblings ...)
  2017-11-15 18:15 ` [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-17 23:00   ` Steven Rostedt
  2017-11-15 18:15 ` [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET() Thomas Gleixner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: timekeeping--Add-NMI-safe-accessor-to-full-data.patch --]
[-- Type: text/plain, Size: 3452 bytes --]

Clock MONOTONIC timestamps are not allowing reliable correlation to other
timestamps (clock REALTIME/BOOTTIME). For post mortem analysis and other
purposes its helpful to store all of these time stamps in the printk ring
buffer entries.

Rework __ktime_get_real_fast() so it captures both MONOTONIC and REALTIME
atomically and provide an accessor which stores these plus BOOTTIME in a
new system_timestamp data structure.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timekeeping.h |   17 ++++++++++++++++-
 kernel/time/timekeeping.c   |   36 +++++++++++++++++++++++++-----------
 2 files changed, 41 insertions(+), 12 deletions(-)

--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -146,7 +146,19 @@ extern void timekeeping_inject_sleeptime
  * PPS accessor
  */
 extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
-				        struct timespec64 *ts_real);
+					struct timespec64 *ts_real);
+
+/*
+ * struct system_timestamps - Simultaneous mono/boot/real timestamps
+ * @mono:	Monotonic timestamp
+ * @boot:	Boottime timestamp
+ * @real:	Realtime timestamp
+ */
+struct system_timestamps {
+	u64		mono;
+	u64		boot;
+	u64		real;
+};
 
 /*
  * struct system_time_snapshot - simultaneous raw/real time capture with
@@ -206,6 +218,9 @@ extern int get_device_system_crosststamp
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
 
+/* NMI safe mono/boot/realtime timestamps */
+extern void ktime_get_fast_timestamps(struct system_timestamps *ts);
+
 /*
  * Persistent clock related interfaces
  */
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -496,29 +496,29 @@ u64 notrace ktime_get_boot_fast_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 
-
 /*
  * See comment for __ktime_get_fast_ns() vs. timestamp ordering
  */
-static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
+static notrace u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
 {
 	struct tk_read_base *tkr;
+	u64 basem, baser, delta;
 	unsigned int seq;
-	u64 now;
 
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base_real);
+		basem = ktime_to_ns(tkr->base);
+		baser = ktime_to_ns(tkr->base_real);
 
-		now += timekeeping_delta_to_ns(tkr,
-				clocksource_delta(
-					tk_clock_read(tkr),
-					tkr->cycle_last,
-					tkr->mask));
+		delta = timekeeping_delta_to_ns(tkr,
+				clocksource_delta(tk_clock_read(tkr),
+				tkr->cycle_last, tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
-	return now;
+	if (mono)
+		*mono = basem + delta;
+	return baser + delta;
 }
 
 /**
@@ -526,11 +526,25 @@ static __always_inline u64 __ktime_get_r
  */
 u64 ktime_get_real_fast_ns(void)
 {
-	return __ktime_get_real_fast_ns(&tk_fast_mono);
+	return __ktime_get_real_fast(&tk_fast_mono, NULL);
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);
 
 /**
+ * ktime_get_fast_timestamps: - NMI safe timestamps
+ * @ts:		Pointer to timestamp storage
+ *
+ * Stores clock monotonic, boottime and realtime time stamps
+ */
+void ktime_get_fast_timestamps(struct system_timestamps *ts)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+
+	ts->real = __ktime_get_real_fast(&tk_fast_mono, &ts->mono);
+	ts->boot = ts->mono + ktime_to_ns(tk->offs_boot);
+}
+
+/**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.
  * @tk: Timekeeper to snapshot.
  *

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET()
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
                   ` (3 preceding siblings ...)
  2017-11-15 18:15 ` [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-23 12:46   ` Petr Mladek
  2017-11-15 18:15 ` [RFC patch 6/7] printk: Store mono/boot/real time timestamps Thomas Gleixner
  2017-11-15 18:15 ` [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot Thomas Gleixner
  6 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches, Linus Torvalds

[-- Attachment #1: crash--Add-VMCOREINFO_FIELD_ANDOFFSET--.patch --]
[-- Type: text/plain, Size: 1353 bytes --]

Changing the time stamp storage in the printk ringbuffer entries would
break existing tools which rely on the availability of the 'ts_nsec' field
in vmcore info.

Provide a new macro which allows to store the offset of a member of struct
printk_log with a different name. This allows to change the underlying
storage and without breaking tools.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/crash_core.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -48,9 +48,11 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 #define VMCOREINFO_STRUCT_SIZE(name) \
 	vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
 			      (unsigned long)sizeof(struct name))
-#define VMCOREINFO_OFFSET(name, field) \
-	vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
+#define VMCOREINFO_FIELD_AND_OFFSET(name, fieldname, field)		\
+	vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #fieldname,	\
 			      (unsigned long)offsetof(struct name, field))
+#define VMCOREINFO_OFFSET(name, field) \
+	VMCOREINFO_FIELD_AND_OFFSET(name, field, field)
 #define VMCOREINFO_LENGTH(name, value) \
 	vmcoreinfo_append_str("LENGTH(%s)=%lu\n", #name, (unsigned long)value)
 #define VMCOREINFO_NUMBER(name) \

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 6/7] printk: Store mono/boot/real time timestamps
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
                   ` (4 preceding siblings ...)
  2017-11-15 18:15 ` [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET() Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-23 13:36   ` Petr Mladek
  2017-11-15 18:15 ` [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot Thomas Gleixner
  6 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: printk--Use-time-snapshot.patch --]
[-- Type: text/plain, Size: 5672 bytes --]

To allow reliable timestamp correlation, replace the clock MONOTONIC based
timestamp in the printk ringbuffer entries with a collection of MONOTONIC,
BOOTTIME and REALTIME timestamps.

This does not change the textual output, as this is a separate issue. For
tools which access a vmcore the new timestamp fields are separately
exported by name so the tools do not have to gain knowledge about struct
system_timestamps. The existing 'ts_nsec' timestamp, which is used by
existing tools is mapped to the clock MONOTONIC entry of the timestamp
collection.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/printk/printk.c |   55 ++++++++++++++++++++++++++++---------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -352,13 +352,13 @@ enum log_flags {
 };
 
 struct printk_log {
-	u64 ts_nsec;		/* timestamp in nanoseconds */
-	u16 len;		/* length of entire record */
-	u16 text_len;		/* length of text buffer */
-	u16 dict_len;		/* length of dictionary buffer */
-	u8 facility;		/* syslog facility */
-	u8 flags:5;		/* internal record flags */
-	u8 level:3;		/* syslog level */
+	struct system_timestamps ts;	/* Timestamps in nanoseconds */
+	u16 len;			/* length of entire record */
+	u16 text_len;			/* length of text buffer */
+	u16 dict_len;			/* length of dictionary buffer */
+	u8 facility;			/* syslog facility */
+	u8 flags:5;			/* internal record flags */
+	u8 level:3;			/* syslog level */
 }
 #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
 __packed __aligned(4)
@@ -577,8 +577,8 @@ static u32 truncate_msg(u16 *text_len, u
 }
 
 /* insert record into the buffer, discard old ones, update heads */
-static int log_store(int facility, int level,
-		     enum log_flags flags, u64 ts_nsec,
+static int log_store(int facility, int level, enum log_flags flags,
+		     struct system_timestamps *ts,
 		     const char *dict, u16 dict_len,
 		     const char *text, u16 text_len)
 {
@@ -621,10 +621,10 @@ static int log_store(int facility, int l
 	msg->facility = facility;
 	msg->level = level & 7;
 	msg->flags = flags & 0x1f;
-	if (ts_nsec > 0)
-		msg->ts_nsec = ts_nsec;
+	if (ts)
+		msg->ts = *ts;
 	else
-		msg->ts_nsec = ktime_get_mono_fast_ns();
+		ktime_get_fast_timestamps(&msg->ts);
 	memset(log_dict(msg) + dict_len, 0, pad_len);
 	msg->len = size;
 
@@ -687,7 +687,7 @@ static void append_char(char **pp, char
 static ssize_t msg_print_ext_header(char *buf, size_t size,
 				    struct printk_log *msg, u64 seq)
 {
-	u64 ts_usec = msg->ts_nsec;
+	u64 ts_usec = msg->ts.mono;
 
 	do_div(ts_usec, 1000);
 
@@ -1021,7 +1021,14 @@ void log_buf_vmcoreinfo_setup(void)
 	 * parse it and detect any changes to structure down the line.
 	 */
 	VMCOREINFO_STRUCT_SIZE(printk_log);
-	VMCOREINFO_OFFSET(printk_log, ts_nsec);
+	/*
+	 * The timestamps fields are exported explicitely so older tools
+	 * which only know "ts_nsec" still work and newer tools do not need
+	 * to know about the struct system_timestamps layout.
+	 */
+	VMCOREINFO_FIELD_AND_OFFSET(printk_log, ts_nsec, ts.mono);
+	VMCOREINFO_FIELD_AND_OFFSET(printk_log, ts_boot_nsec, ts.boot);
+	VMCOREINFO_FIELD_AND_OFFSET(printk_log, ts_real_nsec, ts.real);
 	VMCOREINFO_OFFSET(printk_log, len);
 	VMCOREINFO_OFFSET(printk_log, text_len);
 	VMCOREINFO_OFFSET(printk_log, dict_len);
@@ -1204,20 +1211,21 @@ static inline void boot_delay_msec(int l
 static bool printk_time = IS_ENABLED(CONFIG_PRINTK_TIME);
 module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
 
-static size_t print_time(u64 ts, char *buf)
+static size_t print_time(const struct system_timestamps *ts, char *buf)
 {
 	unsigned long rem_nsec;
+	u64 mono = ts->mono;
 
 	if (!printk_time)
 		return 0;
 
-	rem_nsec = do_div(ts, 1000000000);
+	rem_nsec = do_div(mono, 1000000000);
 
 	if (!buf)
-		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
+		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)mono);
 
 	return sprintf(buf, "[%5lu.%06lu] ",
-		       (unsigned long)ts, rem_nsec / 1000);
+		       (unsigned long)mono, rem_nsec / 1000);
 }
 
 static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
@@ -1239,7 +1247,7 @@ static size_t print_prefix(const struct
 		}
 	}
 
-	len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
+	len += print_time(&msg->ts, buf ? buf + len : NULL);
 	return len;
 }
 
@@ -1599,7 +1607,7 @@ static struct cont {
 	char buf[LOG_LINE_MAX];
 	size_t len;			/* length == 0 means unused buffer */
 	struct task_struct *owner;	/* task of first print*/
-	u64 ts_nsec;			/* time of first print */
+	struct system_timestamps ts;	/* time of first print */
 	u8 level;			/* log level of first message */
 	u8 facility;			/* log facility of first message */
 	enum log_flags flags;		/* prefix, newline flags */
@@ -1610,7 +1618,7 @@ static void cont_flush(void)
 	if (cont.len == 0)
 		return;
 
-	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
+	log_store(cont.facility, cont.level, cont.flags, &cont.ts,
 		  NULL, 0, cont.buf, cont.len);
 	cont.len = 0;
 }
@@ -1631,7 +1639,7 @@ static bool cont_add(int facility, int l
 		cont.facility = facility;
 		cont.level = level;
 		cont.owner = current;
-		cont.ts_nsec = ktime_get_mono_fast_ns();
+		ktime_get_fast_timestamps(&cont.ts);
 		cont.flags = flags;
 	}
 
@@ -1677,7 +1685,8 @@ static size_t log_output(int facility, i
 	}
 
 	/* Store it in the record log */
-	return log_store(facility, level, lflags, 0, dict, dictlen, text, text_len);
+	return log_store(facility, level, lflags, NULL, dict, dictlen, text,
+			 text_len);
 }
 
 asmlinkage int vprintk_emit(int facility, int level,

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
                   ` (5 preceding siblings ...)
  2017-11-15 18:15 ` [RFC patch 6/7] printk: Store mono/boot/real time timestamps Thomas Gleixner
@ 2017-11-15 18:15 ` Thomas Gleixner
  2017-11-23 12:58   ` Petr Mladek
  6 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:15 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

[-- Attachment #1: timekeeping--Hack-to-use-fine-grained-timestamps-during-boot.patch --]
[-- Type: text/plain, Size: 2968 bytes --]

For demonstration purposes only.

Add a disgusting hack to work around the fact that high resolution clock
MONOTONIC accessors are not available during early boot and return stale
time stamps accross suspend/resume when the current clocksource is not
flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.

Use local_clock() to provide timestamps in early boot and when the
clocksource is not accessible after timekeeping_suspend(). In the
suspend/resume case this might cause non monotonic timestamps.

Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |   25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -16,6 +16,7 @@
 #include <linux/mm.h>
 #include <linux/nmi.h>
 #include <linux/sched.h>
+#include <linux/sched/clock.h>
 #include <linux/sched/loadavg.h>
 #include <linux/syscore_ops.h>
 #include <linux/clocksource.h>
@@ -499,17 +500,19 @@ EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns
 /*
  * See comment for __ktime_get_fast_ns() vs. timestamp ordering
  */
-static notrace u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
+static bool __ktime_get_real_fast(struct tk_fast *tkf, u64 *real, u64 *mono)
 {
 	struct tk_read_base *tkr;
 	u64 basem, baser, delta;
 	unsigned int seq;
+	bool hres;
 
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
 		basem = ktime_to_ns(tkr->base);
 		baser = ktime_to_ns(tkr->base_real);
+		hres = tkr->clock->flags & CLOCK_SOURCE_VALID_FOR_HRES;
 
 		delta = timekeeping_delta_to_ns(tkr,
 				clocksource_delta(tk_clock_read(tkr),
@@ -518,15 +521,23 @@ static notrace u64 __ktime_get_real_fast
 
 	if (mono)
 		*mono = basem + delta;
-	return baser + delta;
+	*real = baser + delta;
+	return hres;
 }
 
 /**
  * ktime_get_real_fast_ns: - NMI safe and fast access to clock realtime.
+ *
+ * Returns the wall clock (UTC) timestamp. Caveats:
+ *  - Returns 0 before timekeeping is initialized
+ *  - Returns a stale value accross suspend/resume
  */
 u64 ktime_get_real_fast_ns(void)
 {
-	return __ktime_get_real_fast(&tk_fast_mono, NULL);
+	u64 real;
+
+	__ktime_get_real_fast(&tk_fast_mono, &real, NULL);
+	return real;
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);
 
@@ -535,13 +546,19 @@ EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns
  * @ts:		Pointer to timestamp storage
  *
  * Stores clock monotonic, boottime and realtime time stamps
+ *
+ * Note: This is a special implementation for printk. The early boot time
+ * stamps, i.e. before timekeeping is available are taken from local_clock().
  */
 void ktime_get_fast_timestamps(struct system_timestamps *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
+	bool hres;
 
-	ts->real = __ktime_get_real_fast(&tk_fast_mono, &ts->mono);
+	hres = __ktime_get_real_fast(&tk_fast_mono, &ts->real, &ts->mono);
 	ts->boot = ts->mono + ktime_to_ns(tk->offs_boot);
+	if (!hres)
+		ts->mono = local_clock();
 }
 
 /**

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps
  2017-11-15 18:15 ` [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps Thomas Gleixner
@ 2017-11-16  7:58   ` Sergey Senozhatsky
  0 siblings, 0 replies; 23+ messages in thread
From: Sergey Senozhatsky @ 2017-11-16  7:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

Hello Thomas,

On (11/15/17 19:15), Thomas Gleixner wrote:
> local_clock() cannot be reliably correlated to CLOCK_MONOTONIC, which is
> used by user space, e.g. systemd, to create correlation timestamps.
> 
> There are multiple reasons:
> 
>  - CLOCK_MONOTONIC is NTP adjusted, local_clock() not. Depending on the
>    calibration accuracy and uptime significant drift can be observed.
> 
>  - CLOCK_MONOTONIC does not advance across suspend/resume for historical
>    reasons. local_clock() might or might not depending on the properties of
>    the underlying hardware counter.
> 
> Use the NMI safe accessor to clock MONOTONIC instead of local_clock(). The
> access might be slower than local_clock(), but printk is not such a
> performance critical hotpath that it matters.
> 
> Visible change:
> 
> The early boot timestamps are jiffies based longer than with local_clock()
> depending on the platform. During suspend/resume the timestamp may become
> stale when the underlying clocksource hardware is not flagged with
> CLOCKSOURCE_SUSPEND_ACCESS_OK.
> 
> A horrible follow up patch demonstrates how that could be mitigated.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/printk/printk.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

a silly nitpick,

I suppose we can do

-#include <linux/sched/clock.h>
+#include <linux/timekeeping.h>

	-ss

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks
  2017-11-15 18:15 ` [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks Thomas Gleixner
@ 2017-11-17 23:00   ` Steven Rostedt
  2017-11-17 23:12     ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2017-11-17 23:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Joe Perches

On Wed, 15 Nov 2017 19:15:35 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> +static notrace u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
>  {
>  	struct tk_read_base *tkr;
> +	u64 basem, baser, delta;
>  	unsigned int seq;
> -	u64 now;
>  
>  	do {
>  		seq = raw_read_seqcount_latch(&tkf->seq);
>  		tkr = tkf->base + (seq & 0x01);
> -		now = ktime_to_ns(tkr->base_real);
> +		basem = ktime_to_ns(tkr->base);
> +		baser = ktime_to_ns(tkr->base_real);
>  
> -		now += timekeeping_delta_to_ns(tkr,
> -				clocksource_delta(
> -					tk_clock_read(tkr),
> -					tkr->cycle_last,
> -					tkr->mask));
> +		delta = timekeeping_delta_to_ns(tkr,
> +				clocksource_delta(tk_clock_read(tkr),
> +				tkr->cycle_last, tkr->mask));

Could you indent this better. It looks like timekeeping_delta_to_ns()
takes 4 arguments, when it really only takes two.

		delta = timekeeping_dela_to_ns(tkr,
				clocksource_delta(tk_clock_read(tkr),
					tkr->cycle_last, tkr->mask));

Would point out, at least, that it takes two arguments, and
clocksource_delta() takes three (not one).

-- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks
  2017-11-17 23:00   ` Steven Rostedt
@ 2017-11-17 23:12     ` Linus Torvalds
  2017-11-17 23:43       ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2017-11-17 23:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Prarit Bhargava, Mark Salyzyn,
	Petr Mladek, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Andrew Morton, Sergey Senozhatsky, Joe Perches

On Fri, Nov 17, 2017 at 3:00 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Could you indent this better. It looks like timekeeping_delta_to_ns()
> takes 4 arguments, when it really only takes two.

Actually, even better would be to simply not have that ugly three-line
expression from hell.

Maybe it could even be a helper inline function, somethin glike

   static inline u64 tk_delta_ns(struct tk_read_base *tkr)
   {
        u64 cycle = tk_clock_read(tkr);
        u64 delta= clocksource_delta(cycle, tkr->cycle_last, tkr->mask);
        return timekeeping_delta_to_ns(tkr, cycle);
    }

and that whole thing would just become

        now += tk_delta_ns(tkr);

which looks a whole lot more legible. And shouldn't generate any worse code.

(I might have screwed up something, I wrote that in the email client,
maybe I got something wrong.

And my name choice is probably garbage. But you get the idea.

            Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks
  2017-11-17 23:12     ` Linus Torvalds
@ 2017-11-17 23:43       ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-17 23:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steven Rostedt, LKML, Prarit Bhargava, Mark Salyzyn, Petr Mladek,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Joe Perches

On Fri, 17 Nov 2017, Linus Torvalds wrote:

> On Fri, Nov 17, 2017 at 3:00 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Could you indent this better. It looks like timekeeping_delta_to_ns()
> > takes 4 arguments, when it really only takes two.
> 
> Actually, even better would be to simply not have that ugly three-line
> expression from hell.

I pondered and almost did that when I was hacking that up, but then
lazyness won :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET()
  2017-11-15 18:15 ` [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET() Thomas Gleixner
@ 2017-11-23 12:46   ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2017-11-23 12:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches, Linus Torvalds

On Wed 2017-11-15 19:15:36, Thomas Gleixner wrote:
> Changing the time stamp storage in the printk ringbuffer entries would
> break existing tools which rely on the availability of the 'ts_nsec' field
> in vmcore info.
> 
> Provide a new macro which allows to store the offset of a member of struct
> printk_log with a different name. This allows to change the underlying
> storage and without breaking tools.

It seems that this cheating is not enough. I tried to use it in printk
(6th patch in this patch set) and the "crash" tool complained:

===== cut =====
WARNING: log buf data structure(s) have changed

log: invalid structure member offset: log_ts_nsec
     FILE: kernel.c  LINE: 5055  FUNCTION: dump_log_entry()
===== cut =====


The crash tool get this value using the following code:

  MEMBER_OFFSET_INIT(log_ts_nsec, "printk_log", "ts_nsec");
  + MEMBER_OFFSET()
    + datatype_info((X), (Y), NULL)

, where the last function does something like:

long
datatype_info(char *name, char *member, struct datatype_member *dm)
{
	struct gnu_request *req;

	req->command = GNU_GET_DATATYPE;
	req->name = name;
	req->member = member;

	gdb_interface(req);

I am not completely sure what it does. I guess that gdb
is able to get information about the real types either
from vmlinux binary or vmcore. It is not able to
find "ts_nsec" in the struct "printk_log" and fails.


I am afraid that we would need to fix crash for
the modified struct printk_log. Then this hack
will not be needed.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-15 18:15 ` [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot Thomas Gleixner
@ 2017-11-23 12:58   ` Petr Mladek
  2017-11-28 18:43     ` Prarit Bhargava
  0 siblings, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2017-11-23 12:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Linus Torvalds, Prarit Bhargava, Mark Salyzyn, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
> For demonstration purposes only.
> 
> Add a disgusting hack to work around the fact that high resolution clock
> MONOTONIC accessors are not available during early boot and return stale
> time stamps accross suspend/resume when the current clocksource is not
> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
> 
> Use local_clock() to provide timestamps in early boot and when the
> clocksource is not accessible after timekeeping_suspend(). In the
> suspend/resume case this might cause non monotonic timestamps.

I get the non-monotonic times even during boot:

[    0.026709] smp: Bringing up secondary CPUs ...
[    0.027973] x86: Booting SMP configuration:
[    0.028006] .... node  #0, CPUs:      #1
[    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
     ^^^^^^^^
[    0.032097] KVM setup async PF for cpu 1
[    0.032766] kvm-stealtime: cpu 1, msr 13b00dc40
[    0.036502]  #2
[    0.004000] kvm-clock: cpu 2, msr 1:3ff51081, secondary cpu clock
     ^^^^^^^^
[    0.040062] KVM setup async PF for cpu 2
[    0.040576] kvm-stealtime: cpu 2, msr 13b20dc40
[    0.041304]  #3
[    0.004000] kvm-clock: cpu 3, msr 1:3ff510c1, secondary cpu clock
     ^^^^^^^^
[    0.048051] KVM setup async PF for cpu 3
[    0.048554] kvm-stealtime: cpu 3, msr 13b40dc40


To be honest, I do not feel experienced enough to decide which
solution is acceptable. I would say that only few people care
about timestamps during boot. On the other hand, some tools
might get confused by non-monotonic timestamps. Therefore
I would try to avoid this patch.

But I might be completely wrong. After all, this feature is
done because people wanted to track the time more precisely.
Some of them wanted to have the most precise time even during
suspend and resume. On the other hand, it might be enough
if we allow to get the information from the other
stored timestamps.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 6/7] printk: Store mono/boot/real time timestamps
  2017-11-15 18:15 ` [RFC patch 6/7] printk: Store mono/boot/real time timestamps Thomas Gleixner
@ 2017-11-23 13:36   ` Petr Mladek
  0 siblings, 0 replies; 23+ messages in thread
From: Petr Mladek @ 2017-11-23 13:36 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner
  Cc: LKML, Prarit Bhargava, Mark Salyzyn, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	Steven Rostedt, Joe Perches

On Wed 2017-11-15 19:15:37, Thomas Gleixner wrote:
> To allow reliable timestamp correlation, replace the clock MONOTONIC based
> timestamp in the printk ringbuffer entries with a collection of MONOTONIC,
> BOOTTIME and REALTIME timestamps.
> 
> This does not change the textual output, as this is a separate issue. For
> tools which access a vmcore the new timestamp fields are separately
> exported by name so the tools do not have to gain knowledge about struct
> system_timestamps. The existing 'ts_nsec' timestamp, which is used by
> existing tools is mapped to the clock MONOTONIC entry of the timestamp
> collection.

OK, we do this to do not break existing users. But what are
the acceptable ways to make this information available?

IMHO, the right solution is that every reader would need to
explicitely ask for the extended format:

  + /dev/kmsg: ask for extended output via ioctl()

  + syslog: ask for extra timestamps by LOG_*_TIMESTAMP options
	    of the openlog() syscall. It would be similar
	    to LOG_PID.

  + console: extend to console= parameter format, e.g.
            console=name[,options][/t]

            , where /t would cause showing all the timestamps
	    in the following format

            [xxxxx.xxxxxx][b=xxxxx.xxxxxx][r=xxxxx.xxxxxxx] <message>

Would this be acceptable, please?
Do you have other/better ideas, please?

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-23 12:58   ` Petr Mladek
@ 2017-11-28 18:43     ` Prarit Bhargava
  2017-11-28 18:47       ` Thomas Gleixner
  2017-11-28 19:10       ` Mark Salyzyn
  0 siblings, 2 replies; 23+ messages in thread
From: Prarit Bhargava @ 2017-11-28 18:43 UTC (permalink / raw)
  To: Petr Mladek, Thomas Gleixner
  Cc: LKML, Linus Torvalds, Mark Salyzyn, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	Steven Rostedt, Joe Perches



On 11/23/2017 07:58 AM, Petr Mladek wrote:
> On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
>> For demonstration purposes only.
>>
>> Add a disgusting hack to work around the fact that high resolution clock
>> MONOTONIC accessors are not available during early boot and return stale
>> time stamps accross suspend/resume when the current clocksource is not
>> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
>>
>> Use local_clock() to provide timestamps in early boot and when the
>> clocksource is not accessible after timekeeping_suspend(). In the
>> suspend/resume case this might cause non monotonic timestamps.
> 
> I get the non-monotonic times even during boot:
> 
> [    0.026709] smp: Bringing up secondary CPUs ...
> [    0.027973] x86: Booting SMP configuration:
> [    0.028006] .... node  #0, CPUs:      #1
> [    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
>      ^^^^^^^^
> [    0.032097] KVM setup async PF for cpu 1
> [    0.032766] kvm-stealtime: cpu 1, msr 13b00dc40
> [    0.036502]  #2
> [    0.004000] kvm-clock: cpu 2, msr 1:3ff51081, secondary cpu clock
>      ^^^^^^^^
> [    0.040062] KVM setup async PF for cpu 2
> [    0.040576] kvm-stealtime: cpu 2, msr 13b20dc40
> [    0.041304]  #3
> [    0.004000] kvm-clock: cpu 3, msr 1:3ff510c1, secondary cpu clock
>      ^^^^^^^^
> [    0.048051] KVM setup async PF for cpu 3
> [    0.048554] kvm-stealtime: cpu 3, msr 13b40dc40
> 
> 
> To be honest, I do not feel experienced enough to decide which
> solution is acceptable. I would say that only few people care
> about timestamps during boot. On the other hand, some tools

It is extremely important to know what happened and how long it took.  I agree
with Petr, we should figure out a way to guarantee that the timestamp is monotonic.

P.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 18:43     ` Prarit Bhargava
@ 2017-11-28 18:47       ` Thomas Gleixner
  2017-12-08 11:23         ` Petr Mladek
  2017-11-28 19:10       ` Mark Salyzyn
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2017-11-28 18:47 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: Petr Mladek, LKML, Linus Torvalds, Mark Salyzyn, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

On Tue, 28 Nov 2017, Prarit Bhargava wrote:
> On 11/23/2017 07:58 AM, Petr Mladek wrote:
> > On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
> >> For demonstration purposes only.
> >>
> >> Add a disgusting hack to work around the fact that high resolution clock
> >> MONOTONIC accessors are not available during early boot and return stale
> >> time stamps accross suspend/resume when the current clocksource is not
> >> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
> >>
> >> Use local_clock() to provide timestamps in early boot and when the
> >> clocksource is not accessible after timekeeping_suspend(). In the
> >> suspend/resume case this might cause non monotonic timestamps.
> > 
> > I get the non-monotonic times even during boot:
> > 
> > [    0.026709] smp: Bringing up secondary CPUs ...
> > [    0.027973] x86: Booting SMP configuration:
> > [    0.028006] .... node  #0, CPUs:      #1
> > [    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
> >      ^^^^^^^^

Which is interesting as this should happen even w/o those patches. 

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 18:43     ` Prarit Bhargava
  2017-11-28 18:47       ` Thomas Gleixner
@ 2017-11-28 19:10       ` Mark Salyzyn
  2017-11-28 19:45         ` Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Mark Salyzyn @ 2017-11-28 19:10 UTC (permalink / raw)
  To: Prarit Bhargava, Petr Mladek, Thomas Gleixner
  Cc: LKML, Linus Torvalds, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Andrew Morton, Sergey Senozhatsky,
	Steven Rostedt, Joe Perches

On 11/28/2017 10:43 AM, Prarit Bhargava wrote:
>
> On 11/23/2017 07:58 AM, Petr Mladek wrote:
>> On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
>>> For demonstration purposes only.
>>>
>>> Add a disgusting hack to work around the fact that high resolution clock
>>> MONOTONIC accessors are not available during early boot and return stale
>>> time stamps accross suspend/resume when the current clocksource is not
>>> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
>>>
>>> Use local_clock() to provide timestamps in early boot and when the
>>> clocksource is not accessible after timekeeping_suspend(). In the
>>> suspend/resume case this might cause non monotonic timestamps.
>> I get the non-monotonic times even during boot:
>>
>> [    0.026709] smp: Bringing up secondary CPUs ...
>> [    0.027973] x86: Booting SMP configuration:
>> [    0.028006] .... node  #0, CPUs:      #1
>> [    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
>>       ^^^^^^^^
>> [    0.032097] KVM setup async PF for cpu 1
>> [    0.032766] kvm-stealtime: cpu 1, msr 13b00dc40
>> [    0.036502]  #2
>> [    0.004000] kvm-clock: cpu 2, msr 1:3ff51081, secondary cpu clock
>>       ^^^^^^^^
>> [    0.040062] KVM setup async PF for cpu 2
>> [    0.040576] kvm-stealtime: cpu 2, msr 13b20dc40
>> [    0.041304]  #3
>> [    0.004000] kvm-clock: cpu 3, msr 1:3ff510c1, secondary cpu clock
>>       ^^^^^^^^
>> [    0.048051] KVM setup async PF for cpu 3
>> [    0.048554] kvm-stealtime: cpu 3, msr 13b40dc40
>>
>>
>> To be honest, I do not feel experienced enough to decide which
>> solution is acceptable. I would say that only few people care
>> about timestamps during boot. On the other hand, some tools
> It is extremely important to know what happened and how long it took.  I agree
> with Petr, we should figure out a way to guarantee that the timestamp is monotonic.
>
> P.

There is no guarantee of sequential order of delivery for kernel prints.

[TL;DR]

The timestamp may very well be monotonic in the general case, and yet it 
may take longer to _compose_ or _deliver_ the message (multiprocessor, 
via vsprint, etc). To get sequential order the timestamp _has_ to be 
reliably monotonic and _collected_ at incident _before_ anything else, 
before message is composed, and then (land in-place, post-sorted, what 
evz?) be presented in time sorted order.

User space can also inject into the kernel print buffer, it certainly 
does not support at-incident timestamp collection as it is delayed 
already by a syscall or two. We would need to allow user space (via fast 
vdso) collect timestamp(s), then make it part of the payload of the 
report to even get close.

It is out-of-scope for this RFC to consider rewriting the timestamp 
acquisition and delivery infrastructure for kernel prints IMHO. For now 
we all have accepted they will be _somewhat_ sequential within the 
confines of races and composition.

Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 19:10       ` Mark Salyzyn
@ 2017-11-28 19:45         ` Steven Rostedt
  2017-11-28 20:29           ` Mark Salyzyn
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2017-11-28 19:45 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: Prarit Bhargava, Petr Mladek, Thomas Gleixner, LKML,
	Linus Torvalds, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Andrew Morton, Sergey Senozhatsky, Joe Perches

On Tue, 28 Nov 2017 11:10:02 -0800
Mark Salyzyn <salyzyn@android.com> wrote:

> There is no guarantee of sequential order of delivery for kernel prints.

But isn't the timestamp taken with the logbuf_lock, and then delivered
to the printk buffer? That would guarantee that all events will be in
order, and the timestamps be sequential. They are serialized by the
logbuf_lock.

-- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 19:45         ` Steven Rostedt
@ 2017-11-28 20:29           ` Mark Salyzyn
  2017-11-28 20:38             ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Salyzyn @ 2017-11-28 20:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Prarit Bhargava, Petr Mladek, Thomas Gleixner, LKML,
	Linus Torvalds, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Andrew Morton, Sergey Senozhatsky, Joe Perches

On 11/28/2017 11:45 AM, Steven Rostedt wrote:
> On Tue, 28 Nov 2017 11:10:02 -0800
> Mark Salyzyn <salyzyn@android.com> wrote:
>
>> There is no guarantee of sequential order of delivery for kernel prints.
> But isn't the timestamp taken with the logbuf_lock, and then delivered
> to the printk buffer? That would guarantee that all events will be in
> order, and the timestamps be sequential. They are serialized by the
> logbuf_lock.
>
> -- Steve

To get closer to a sequential order of event delivery by timestamp, you 
really should take a global timestamp (I know, local clock is _not_ a 
global timestamp) first thing and as close to the event as possible, 
before anything else, and outside of a lock which is merely a 
synchronization action.

A) event came first, got preempted by the scheduler before lock.

B) multiple events came in order while lock held, random selection of 
which one gets the lock once released.

C) scheduling differences between FIFO, BATCH, low priority, cgroup 
limits, interrupt, especially since an event is usually generated 
_after_ the fact related to an activity.

We do not have a guarantee of sequential order, outside the scope of 
this RFC. We accept this behavior as it is good enough for the most 
part, and we are all smart adults here, we can figure it out when things 
get strange.

-- Mark

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 20:29           ` Mark Salyzyn
@ 2017-11-28 20:38             ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2017-11-28 20:38 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: Steven Rostedt, Prarit Bhargava, Petr Mladek, Thomas Gleixner,
	LKML, Linus Torvalds, Ingo Molnar, H. Peter Anvin, Andrew Morton,
	Sergey Senozhatsky, Joe Perches

On Tue, Nov 28, 2017 at 12:29:38PM -0800, Mark Salyzyn wrote:
> To get closer to a sequential order of event delivery by timestamp, you
> really should take a global timestamp (I know, local clock is _not_ a global
> timestamp)

it is if you have a halfway sane machine

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-11-28 18:47       ` Thomas Gleixner
@ 2017-12-08 11:23         ` Petr Mladek
  2017-12-08 19:51           ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Petr Mladek @ 2017-12-08 11:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Prarit Bhargava, LKML, Linus Torvalds, Mark Salyzyn, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

On Tue 2017-11-28 19:47:09, Thomas Gleixner wrote:
> On Tue, 28 Nov 2017, Prarit Bhargava wrote:
> > On 11/23/2017 07:58 AM, Petr Mladek wrote:
> > > On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
> > >> For demonstration purposes only.
> > >>
> > >> Add a disgusting hack to work around the fact that high resolution clock
> > >> MONOTONIC accessors are not available during early boot and return stale
> > >> time stamps accross suspend/resume when the current clocksource is not
> > >> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
> > >>
> > >> Use local_clock() to provide timestamps in early boot and when the
> > >> clocksource is not accessible after timekeeping_suspend(). In the
> > >> suspend/resume case this might cause non monotonic timestamps.
> > > 
> > > I get the non-monotonic times even during boot:
> > > 
> > > [    0.026709] smp: Bringing up secondary CPUs ...
> > > [    0.027973] x86: Booting SMP configuration:
> > > [    0.028006] .... node  #0, CPUs:      #1
> > > [    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
> > >      ^^^^^^^^
> 
> Which is interesting as this should happen even w/o those patches. 

Good point. I really see this even without those patches.

I was confused because I saw the following with this patchset
but without this latest patch:

[    0.016000] smp: Bringing up secondary CPUs ...
[    0.016000] x86: Booting SMP configuration:
[    0.020000] .... node  #0, CPUs:      #1
[    0.020000] kvm-clock: cpu 1, msr 1:3ff52041, secondary cpu clock
[    0.024000] KVM setup async PF for cpu 1
[    0.024000] kvm-stealtime: cpu 1, msr 13fc8d940

It is monotonic. The only drawback is that the resolution of the
monotonic clock is low at this stage.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot
  2017-12-08 11:23         ` Petr Mladek
@ 2017-12-08 19:51           ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2017-12-08 19:51 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Prarit Bhargava, LKML, Linus Torvalds, Mark Salyzyn, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra, Andrew Morton,
	Sergey Senozhatsky, Steven Rostedt, Joe Perches

On Fri, 8 Dec 2017, Petr Mladek wrote:

> On Tue 2017-11-28 19:47:09, Thomas Gleixner wrote:
> > On Tue, 28 Nov 2017, Prarit Bhargava wrote:
> > > On 11/23/2017 07:58 AM, Petr Mladek wrote:
> > > > On Wed 2017-11-15 19:15:38, Thomas Gleixner wrote:
> > > >> For demonstration purposes only.
> > > >>
> > > >> Add a disgusting hack to work around the fact that high resolution clock
> > > >> MONOTONIC accessors are not available during early boot and return stale
> > > >> time stamps accross suspend/resume when the current clocksource is not
> > > >> flagged with CLOCK_SOURCE_SUSPEND_ACCESS_OK.
> > > >>
> > > >> Use local_clock() to provide timestamps in early boot and when the
> > > >> clocksource is not accessible after timekeeping_suspend(). In the
> > > >> suspend/resume case this might cause non monotonic timestamps.
> > > > 
> > > > I get the non-monotonic times even during boot:
> > > > 
> > > > [    0.026709] smp: Bringing up secondary CPUs ...
> > > > [    0.027973] x86: Booting SMP configuration:
> > > > [    0.028006] .... node  #0, CPUs:      #1
> > > > [    0.004000] kvm-clock: cpu 1, msr 1:3ff51041, secondary cpu clock
> > > >      ^^^^^^^^
> > 
> > Which is interesting as this should happen even w/o those patches. 
> 
> Good point. I really see this even without those patches.
> 
> I was confused because I saw the following with this patchset
> but without this latest patch:
> 
> [    0.016000] smp: Bringing up secondary CPUs ...
> [    0.016000] x86: Booting SMP configuration:
> [    0.020000] .... node  #0, CPUs:      #1
> [    0.020000] kvm-clock: cpu 1, msr 1:3ff52041, secondary cpu clock
> [    0.024000] KVM setup async PF for cpu 1
> [    0.024000] kvm-stealtime: cpu 1, msr 13fc8d940
> 
> It is monotonic. The only drawback is that the resolution of the
> monotonic clock is low at this stage.

The kvm-clock printout probably happens before it is completely initialized.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-12-08 19:51 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-15 18:15 [RFC patch 0/7] printk: Switch to CLOCK_MONOTONIC and store extra time stamps Thomas Gleixner
2017-11-15 18:15 ` [RFC patch 1/7] timekeeping: Do not unconditionally suspend NMI safe timekeepers Thomas Gleixner
2017-11-15 18:15 ` [RFC patch 2/7] x86/tsc: Set clocksource CLOCK_SOURCE_SUSPEND_ACCESS_OK Thomas Gleixner
2017-11-15 18:15 ` [RFC patch 3/7] printk: Use clock MONOTONIC for timestamps Thomas Gleixner
2017-11-16  7:58   ` Sergey Senozhatsky
2017-11-15 18:15 ` [RFC patch 4/7] timekeeping: Add NMI safe accessor to mono/boot/real clocks Thomas Gleixner
2017-11-17 23:00   ` Steven Rostedt
2017-11-17 23:12     ` Linus Torvalds
2017-11-17 23:43       ` Thomas Gleixner
2017-11-15 18:15 ` [RFC patch 5/7] crash: Add VMCOREINFO_FIELD_AND_OFFSET() Thomas Gleixner
2017-11-23 12:46   ` Petr Mladek
2017-11-15 18:15 ` [RFC patch 6/7] printk: Store mono/boot/real time timestamps Thomas Gleixner
2017-11-23 13:36   ` Petr Mladek
2017-11-15 18:15 ` [RFC patch 7/7] timekeeping: Hack to use fine grained timestamps during boot Thomas Gleixner
2017-11-23 12:58   ` Petr Mladek
2017-11-28 18:43     ` Prarit Bhargava
2017-11-28 18:47       ` Thomas Gleixner
2017-12-08 11:23         ` Petr Mladek
2017-12-08 19:51           ` Thomas Gleixner
2017-11-28 19:10       ` Mark Salyzyn
2017-11-28 19:45         ` Steven Rostedt
2017-11-28 20:29           ` Mark Salyzyn
2017-11-28 20:38             ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.