linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/2] timekeeping: NMI safe timekeeper enhancements
@ 2020-08-14 10:19 Thomas Gleixner
  2020-08-14 10:19 ` [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Thomas Gleixner @ 2020-08-14 10:19 UTC (permalink / raw)
  To: LKML
  Cc: Petr Mladek, Peter Zijlstra, Orson Zhai, Prarit Bhargava,
	Dave Young, Baoquan He, Vivek Goyal, Sergey Senozhatsky,
	Steven Rostedt, John Stultz, Stephen Boyd, kexec

printk intends to store various timestamps (MONOTONIC, REALTIME, BOOTTIME)
to make correlation of dmesg accross different machines easier.

The NMI safe timekeeper allows to retrieve these timestamps from any
context, but it lacks a few things:

  1) The nmi safe accessors are not providing time stamps until timekeeping
     is initialized during early boot.

     This can be mitigated by using sched clock up to the point where time-
     keeping becomes available. This has no side effects because clock
     monotonic takes sched clock into account at initialization time
     anyway. So no random time jumps are possible.

     If early sched clock is not available then there is no difference
     either, obviously. Both return 0.

  2) It requires a new accessor which allows to retrieve all three clock
     timestamps in one go.

     Trivial excercise. But there are a few twists:

     A) Access to boot time can be racy if the sleep time offset on resume
     	is injected after timekeeping resume. That's the case when the RTC
     	or whatever is used to calculate sleep time is not availble when
	the timekeeping core is resumed.

     B) Timestamps are frozen accross the very inner low level
     	suspend/resume path. Not a big problem, but might affect the
     	developer debug printks.

     A detailed description of these two points is in the changelog of
     patch 2.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
  2020-08-14 10:19 [patch 0/2] timekeeping: NMI safe timekeeper enhancements Thomas Gleixner
@ 2020-08-14 10:19 ` Thomas Gleixner
  2020-08-23  8:40   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
  2020-08-14 10:19 ` [patch 2/2] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper Thomas Gleixner
  2020-08-20  8:47 ` [patch 0/2] timekeeping: NMI safe timekeeper enhancements Petr Mladek
  2 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-08-14 10:19 UTC (permalink / raw)
  To: LKML
  Cc: Petr Mladek, Peter Zijlstra, Orson Zhai, Prarit Bhargava,
	Dave Young, Baoquan He, Vivek Goyal, Sergey Senozhatsky,
	Steven Rostedt, John Stultz, Stephen Boyd, kexec

[-- Attachment #1: timekeeping--Utilize-local_clock---for-NMI-safe-timekeeper-during-early-boot.patch --]
[-- Type: text/plain, Size: 3214 bytes --]

During early boot the NMI safe timekeeper returns 0 until the first
clocksource becomes available.

This prevents it from being used for printk or other facilities which today
use sched clock. sched clock can be available way before timekeeping is
initialized.

The obvious workaround for this is to utilize the early sched clock in the
default dummy clock read function until a clocksource becomes available.

After switching to the clocksource clock MONOTONIC and BOOTTIME will not
jump because the timekeeping_init() bases clock MONOTONIC on sched clock
and the offset between clock MONOTONIC and BOOTTIME is zero during boot.

Clock REALTIME cannot provide useful timestamps during early boot up to
the point where a persistent clock becomes available, which is either in
timekeeping_init() or later when the RTC driver which might depend on I2C
or other subsystems is initialized.

There is a minor difference to sched_clock() vs. suspend/resume. As the
timekeeper clock source might not be accessible during suspend, after
timekeeping_suspend() timestamps freeze up to the point where
timekeeping_resume() is invoked. OTOH this is true for some sched clock
implementations as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/timekeeping.c |   33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -53,6 +53,9 @@ static struct {
 static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
+/* flag for if timekeeping is suspended */
+int __read_mostly timekeeping_suspended;
+
 /**
  * struct tk_fast - NMI safe timekeeper
  * @seq:	Sequence counter for protecting updates. The lowest bit
@@ -72,26 +75,40 @@ static u64 cycles_at_suspend;
 
 static u64 dummy_clock_read(struct clocksource *cs)
 {
-	return cycles_at_suspend;
+	if (timekeeping_suspended)
+		return cycles_at_suspend;
+	return local_clock();
 }
 
 static struct clocksource dummy_clock = {
 	.read = dummy_clock_read,
 };
 
+/*
+ * Boot time initialization which allows local_clock() to be utilized
+ * during early boot when clocksources are not available. local_clock()
+ * returns nanoseconds already so no conversion is required, hence mult=1
+ * and shift=0. When the first proper clocksource is installed then
+ * the fast time keepers are updated with the correct values.
+ */
+#define FAST_TK_INIT						\
+	{							\
+		.clock		= &dummy_clock,			\
+		.mask		= CLOCKSOURCE_MASK(64),		\
+		.mult		= 1,				\
+		.shift		= 0,				\
+	}
+
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
-	.base[0] = { .clock = &dummy_clock, },
-	.base[1] = { .clock = &dummy_clock, },
+	.base[0] = FAST_TK_INIT,
+	.base[1] = FAST_TK_INIT,
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
-	.base[0] = { .clock = &dummy_clock, },
-	.base[1] = { .clock = &dummy_clock, },
+	.base[0] = FAST_TK_INIT,
+	.base[1] = FAST_TK_INIT,
 };
 
-/* flag for if timekeeping is suspended */
-int __read_mostly timekeeping_suspended;
-
 static inline void tk_normalize_xtime(struct timekeeper *tk)
 {
 	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [patch 2/2] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
  2020-08-14 10:19 [patch 0/2] timekeeping: NMI safe timekeeper enhancements Thomas Gleixner
  2020-08-14 10:19 ` [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot Thomas Gleixner
@ 2020-08-14 10:19 ` Thomas Gleixner
  2020-08-23  8:40   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
  2020-08-20  8:47 ` [patch 0/2] timekeeping: NMI safe timekeeper enhancements Petr Mladek
  2 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-08-14 10:19 UTC (permalink / raw)
  To: LKML
  Cc: Petr Mladek, Peter Zijlstra, Orson Zhai, Prarit Bhargava,
	Dave Young, Baoquan He, Vivek Goyal, Sergey Senozhatsky,
	Steven Rostedt, John Stultz, Stephen Boyd, kexec

[-- Attachment #1: printk_Change_timestamp_to_triplet_as_mono_boot_and_real.patch --]
[-- Type: text/plain, Size: 7226 bytes --]

printk wants to store various timestamps (MONOTONIC, REALTIME, BOOTTIME) to
make correlation of dmesg from several systems easier.

Provide an interface to retrieve all three timestamps in one go.

There are some caveats:

1) Boot time and late sleep time injection

  Boot time is a racy access on 32bit systems if the sleep time injection
  happens late during resume and not in timekeeping_resume(). That could be
  avoided by expanding struct tk_read_base with boot offset for 32bit and
  adding more overhead to the update. As this is a hard to observe once per
  resume event which can be filtered with reasonable effort using the
  accurate mono/real timestamps, it's probably not worth the trouble.

  Aside of that it might be possible on 32 and 64 bit to observe the
  following when the sleep time injection happens late:

  CPU 0				         CPU 1
  timekeeping_resume()
  ktime_get_fast_timestamps()
    mono, real = __ktime_get_real_fast()
  					 inject_sleep_time()
  					   update boot offset
  	boot = mono + bootoffset;
  
  That means that boot time already has the sleep time adjustment, but
  real time does not. On the next readout both are in sync again.
  
  Preventing this for 64bit is not really feasible without destroying the
  careful cache layout of the timekeeper because the sequence count and
  struct tk_read_base would then need two cache lines instead of one.

2) Suspend/resume timestamps

   Access to the time keeper clock source is disabled accross the innermost
   steps of suspend/resume. The accessors still work, but the timestamps
   are frozen until time keeping is resumed which happens very early.

   For regular suspend/resume there is no observable difference vs. sched
   clock, but it might affect some of the nasty low level debug printks.

   OTOH, access to sched clock is not guaranteed accross suspend/resume on
   all systems either so it depends on the hardware in use.

   If that turns out to be a real problem then this could be mitigated by
   using sched clock in a similar way as during early boot. But it's not as
   trivial as on early boot because it needs some careful protection
   against the clock monotonic timestamp jumping backwards on resume.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/timekeeping.h |   15 ++++++++
 kernel/time/timekeeping.c   |   76 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 80 insertions(+), 11 deletions(-)

--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -222,6 +222,18 @@ extern bool timekeeping_rtc_skipresume(v
 
 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
 
+/*
+ * struct ktime_timestanps - Simultaneous mono/boot/real timestamps
+ * @mono:	Monotonic timestamp
+ * @boot:	Boottime timestamp
+ * @real:	Realtime timestamp
+ */
+struct ktime_timestamps {
+	u64		mono;
+	u64		boot;
+	u64		real;
+};
+
 /**
  * struct system_time_snapshot - simultaneous raw/real time capture with
  *				 counter value
@@ -280,6 +292,9 @@ extern int get_device_system_crosststamp
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
 
+/* NMI safe mono/boot/realtime timestamps */
+extern void ktime_get_fast_timestamps(struct ktime_timestamps *snap);
+
 /*
  * Persistent clock related interfaces
  */
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -527,29 +527,29 @@ u64 notrace ktime_get_boot_fast_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 
-
 /*
  * See comment for __ktime_get_fast_ns() vs. timestamp ordering
  */
-static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
+static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
 {
 	struct tk_read_base *tkr;
+	u64 basem, baser, delta;
 	unsigned int seq;
-	u64 now;
 
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base_real);
+		basem = ktime_to_ns(tkr->base);
+		baser = ktime_to_ns(tkr->base_real);
 
-		now += timekeeping_delta_to_ns(tkr,
-				clocksource_delta(
-					tk_clock_read(tkr),
-					tkr->cycle_last,
-					tkr->mask));
+		delta = timekeeping_delta_to_ns(tkr,
+				clocksource_delta(tk_clock_read(tkr),
+				tkr->cycle_last, tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
-	return now;
+	if (mono)
+		*mono = basem + delta;
+	return baser + delta;
 }
 
 /**
@@ -557,11 +557,65 @@ static __always_inline u64 __ktime_get_r
  */
 u64 ktime_get_real_fast_ns(void)
 {
-	return __ktime_get_real_fast_ns(&tk_fast_mono);
+	return __ktime_get_real_fast(&tk_fast_mono, NULL);
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);
 
 /**
+ * ktime_get_fast_timestamps: - NMI safe timestamps
+ * @snapshot:	Pointer to timestamp storage
+ *
+ * Stores clock monotonic, boottime and realtime timestamps.
+ *
+ * Boot time is a racy access on 32bit systems if the sleep time injection
+ * happens late during resume and not in timekeeping_resume(). That could
+ * be avoided by expanding struct tk_read_base with boot offset for 32bit
+ * and adding more overhead to the update. As this is a hard to observe
+ * once per resume event which can be filtered with reasonable effort using
+ * the accurate mono/real timestamps, it's probably not worth the trouble.
+ *
+ * Aside of that it might be possible on 32 and 64 bit to observe the
+ * following when the sleep time injection happens late:
+ *
+ * CPU 0				CPU 1
+ * timekeeping_resume()
+ * ktime_get_fast_timestamps()
+ *	mono, real = __ktime_get_real_fast()
+ *					inject_sleep_time()
+ *					   update boot offset
+ *	boot = mono + bootoffset;
+ *
+ * That means that boot time already has the sleep time adjustment, but
+ * real time does not. On the next readout both are in sync again.
+ *
+ * Preventing this for 64bit is not really feasible without destroying the
+ * careful cache layout of the timekeeper because the sequence count and
+ * struct tk_read_base would then need two cache lines instead of one.
+ *
+ * Access to the time keeper clock source is disabled accross the innermost
+ * steps of suspend/resume. The accessors still work, but the timestamps
+ * are frozen until time keeping is resumed which happens very early.
+ *
+ * For regular suspend/resume there is no observable difference vs. sched
+ * clock, but it might affect some of the nasty low level debug printks.
+ *
+ * OTOH, access to sched clock is not guaranteed accross suspend/resume on
+ * all systems either so it depends on the hardware in use.
+ *
+ * If that turns out to be a real problem then this could be mitigated by
+ * using sched clock in a similar way as during early boot. But it's not as
+ * trivial as on early boot because it needs some careful protection
+ * against the clock monotonic timestamp jumping backwards on resume.
+ */
+void ktime_get_fast_timestamps(struct ktime_timestamps *snapshot)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+
+	snapshot->real = __ktime_get_real_fast(&tk_fast_mono, &snapshot->mono);
+	snapshot->boot = snapshot->mono + ktime_to_ns(data_race(tk->offs_boot));
+}
+
+/**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.
  * @tk: Timekeeper to snapshot.
  *


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/2] timekeeping: NMI safe timekeeper enhancements
  2020-08-14 10:19 [patch 0/2] timekeeping: NMI safe timekeeper enhancements Thomas Gleixner
  2020-08-14 10:19 ` [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot Thomas Gleixner
  2020-08-14 10:19 ` [patch 2/2] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper Thomas Gleixner
@ 2020-08-20  8:47 ` Petr Mladek
  2020-08-20 10:30   ` Thomas Gleixner
  2 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2020-08-20  8:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Orson Zhai, Prarit Bhargava, Dave Young,
	Baoquan He, Vivek Goyal, Sergey Senozhatsky, Steven Rostedt,
	John Stultz, Stephen Boyd, kexec

On Fri 2020-08-14 12:19:33, Thomas Gleixner wrote:
> printk intends to store various timestamps (MONOTONIC, REALTIME, BOOTTIME)
> to make correlation of dmesg accross different machines easier.
> 
> The NMI safe timekeeper allows to retrieve these timestamps from any
> context.

For both patches:

Tested-by: Petr Mladek <pmladek@suse.com>

I am not familiar with the timekeeping code so that I could not
provide a valuable review. Anyway, the patches seem to work
as expected.

The interface is perfectly fine for printk() needs.


I tested them with the patch below. The first timestamps appear
as early as before:

[    0.000000] [    0.000000][1970-01-01T00:00:00][    T0] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] [    0.000000][1970-01-01T00:00:00][    T0] kvm-clock: cpu 0, msr 129c01001, primary cpu clock
[    0.000000] [    0.000000][1970-01-01T00:00:00][    T0] kvm-clock: using sched offset of 56519600356309 cycles
[    0.000008] [    0.000008][1970-01-01T00:00:00][    T0] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000022] [    0.000022][1970-01-01T00:00:00][    T0] tsc: Detected 2112.000 MHz processor
[    0.000835] [    0.000835][1970-01-01T00:00:00][    T0] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved

The realtime timestamps get ready later as expected:

[    0.073075] [    0.073075][1970-01-01T00:00:00][    T0] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=3
[    0.075847] [    0.075847][1970-01-01T00:00:00][    T0] NR_IRQS: 524544, nr_irqs: 448, preallocated irqs: 16
[    0.076014] [    0.076014][2020-08-20T07:58:46][    T0] Console: colour dummy device 80x25
[    0.076014] [    0.076014][2020-08-20T07:58:46][    T0] printk: console [tty0] enabled

Also suspend resume seems to work as expected. I tested it the follwing way:

echo core > /sys/power/pm_test
echo reboot > /sys/power/disk
echo disk > /sys/power/state

and the result is:

[  224.422540] [  224.422540][2020-08-20T08:02:31][ T5124] Disabling non-boot CPUs ...
[  224.424171] [  224.424171][2020-08-20T08:02:31][   T15] IRQ fixup: irq 21 move in progress, old vector 36
[  224.425304] [  224.425304][2020-08-20T08:02:31][ T5124] smpboot: CPU 1 is now offline
[  224.426664] [  224.426664][2020-08-20T08:02:31][   T20] IRQ 21: no longer affine to CPU2
[  224.426685] [  224.426685][2020-08-20T08:02:31][   T20] IRQ 24: no longer affine to CPU2
[  224.426717] [  224.426717][2020-08-20T08:02:31][   T20] IRQ 27: no longer affine to CPU2
[  224.427765] [  224.427765][2020-08-20T08:02:31][ T5124] smpboot: CPU 2 is now offline
[  224.428016] [  224.428016][2020-08-20T08:02:31][ T5124] PM: hibernation: debug: Waiting for 5 seconds.
[  224.428218] [  229.618141][2020-08-20T08:02:36][ T5124] Enabling non-boot CPUs ...
   ^^^            ^^^                          ^^
[  224.428560] [  229.618483][2020-08-20T08:02:36][ T5124] x86: Booting SMP configuration:
[  224.428564] [  229.618486][2020-08-20T08:02:36][ T5124] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  224.428766] [  229.618688][2020-08-20T08:02:36][    T0] kvm-clock: cpu 1, msr 129c01041, secondary cpu clock
[  224.449192] [  229.639115][2020-08-20T08:02:36][   T14] kvm-guest: stealtime: cpu 1, msr 17fbf2080


And here is the patch that I used for testing:

From 39bdfebfa94fc55616fe23f2f0b80e06479b65e0 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Thu, 20 Aug 2020 10:18:57 +0200
Subject: [PATCH] printk: Test using all three timestamps: mono, boot, real

Test timekeeper enhancements proposed at
https://lore.kernel.org/r/20200814101933.574326079@linutronix.de

Allow to store and show all three timestamp (mono, boot, real).

It is not final solution. The prefix takes 83 characters that might
make the real text invisible or hard to read. Fragments of seconds
are lost in when printing real time timestamp. Offset is not provided
for crashdump tools.

Only syslog interface is supported. The log can be seen on consoles
or via:

    dmesg -S

Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 kernel/printk/printk.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9b75f6bfc333..329f3595b024 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -366,7 +366,7 @@ enum log_flags {
 };
 
 struct printk_log {
-	u64 ts_nsec;		/* timestamp in nanoseconds */
+	struct ktime_timestamps ts; /* timestamps */
 	u16 len;		/* length of entire record */
 	u16 text_len;		/* length of text buffer */
 	u16 dict_len;		/* length of dictionary buffer */
@@ -443,7 +443,7 @@ static u64 clear_seq;
 static u32 clear_idx;
 
 #ifdef CONFIG_PRINTK_CALLER
-#define PREFIX_MAX		48
+#define PREFIX_MAX		48+14+21
 #else
 #define PREFIX_MAX		32
 #endif
@@ -614,7 +614,7 @@ static u32 truncate_msg(u16 *text_len, u16 *trunc_msg_len,
 
 /* insert record into the buffer, discard old ones, update heads */
 static int log_store(u32 caller_id, int facility, int level,
-		     enum log_flags flags, u64 ts_nsec,
+		     enum log_flags flags, struct ktime_timestamps *ts,
 		     const char *dict, u16 dict_len,
 		     const char *text, u16 text_len)
 {
@@ -657,10 +657,10 @@ static int log_store(u32 caller_id, int facility, int level,
 	msg->facility = facility;
 	msg->level = level & 7;
 	msg->flags = flags & 0x1f;
-	if (ts_nsec > 0)
-		msg->ts_nsec = ts_nsec;
+	if (ts)
+		msg->ts = *ts;
 	else
-		msg->ts_nsec = local_clock();
+		ktime_get_fast_timestamps(&msg->ts);
 #ifdef CONFIG_PRINTK_CALLER
 	msg->caller_id = caller_id;
 #endif
@@ -726,7 +726,7 @@ static void append_char(char **pp, char *e, char c)
 static ssize_t msg_print_ext_header(char *buf, size_t size,
 				    struct printk_log *msg, u64 seq)
 {
-	u64 ts_usec = msg->ts_nsec;
+	u64 ts_usec = msg->ts.mono;
 	char caller[20];
 #ifdef CONFIG_PRINTK_CALLER
 	u32 id = msg->caller_id;
@@ -1090,8 +1090,8 @@ void log_buf_vmcoreinfo_setup(void)
 	 * parse it and detect any changes to structure down the line.
 	 */
 	VMCOREINFO_STRUCT_SIZE(printk_log);
-	VMCOREINFO_OFFSET(printk_log, ts_nsec);
-	VMCOREINFO_OFFSET(printk_log, len);
+/*	VMCOREINFO_OFFSET(printk_log, ts);
+ */	VMCOREINFO_OFFSET(printk_log, len);
 	VMCOREINFO_OFFSET(printk_log, text_len);
 	VMCOREINFO_OFFSET(printk_log, dict_len);
 #ifdef CONFIG_PRINTK_CALLER
@@ -1308,6 +1308,14 @@ static size_t print_time(u64 ts, char *buf)
 		       (unsigned long)ts, rem_nsec / 1000);
 }
 
+static size_t print_real_time(u64 ts, char *buf)
+{
+	u64 ts_nsec = ts / 1000000000;
+
+	return sprintf(buf, "[%ptT]", &ts_nsec);
+}
+
+
 #ifdef CONFIG_PRINTK_CALLER
 static size_t print_caller(u32 id, char *buf)
 {
@@ -1329,8 +1337,12 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog,
 	if (syslog)
 		len = print_syslog((msg->facility << 3) | msg->level, buf);
 
-	if (time)
-		len += print_time(msg->ts_nsec, buf + len);
+
+	if (time) {
+		len += print_time(msg->ts.mono, buf + len);
+		len += print_time(msg->ts.boot, buf + len);
+		len += print_real_time(msg->ts.real, buf + len);
+	}
 
 	len += print_caller(msg->caller_id, buf + len);
 
@@ -1855,7 +1867,7 @@ static struct cont {
 	char buf[LOG_LINE_MAX];
 	size_t len;			/* length == 0 means unused buffer */
 	u32 caller_id;			/* printk_caller_id() of first print */
-	u64 ts_nsec;			/* time of first print */
+	struct ktime_timestamps ts;	/* time of first print */
 	u8 level;			/* log level of first message */
 	u8 facility;			/* log facility of first message */
 	enum log_flags flags;		/* prefix, newline flags */
@@ -1867,7 +1879,7 @@ static void cont_flush(void)
 		return;
 
 	log_store(cont.caller_id, cont.facility, cont.level, cont.flags,
-		  cont.ts_nsec, NULL, 0, cont.buf, cont.len);
+		  &cont.ts, NULL, 0, cont.buf, cont.len);
 	cont.len = 0;
 }
 
@@ -1884,7 +1896,7 @@ static bool cont_add(u32 caller_id, int facility, int level,
 		cont.facility = facility;
 		cont.level = level;
 		cont.caller_id = caller_id;
-		cont.ts_nsec = local_clock();
+		ktime_get_fast_timestamps(&cont.ts);
 		cont.flags = flags;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [patch 0/2] timekeeping: NMI safe timekeeper enhancements
  2020-08-20  8:47 ` [patch 0/2] timekeeping: NMI safe timekeeper enhancements Petr Mladek
@ 2020-08-20 10:30   ` Thomas Gleixner
  2020-08-20 10:43     ` Petr Mladek
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2020-08-20 10:30 UTC (permalink / raw)
  To: Petr Mladek
  Cc: LKML, Peter Zijlstra, Orson Zhai, Prarit Bhargava, Dave Young,
	Baoquan He, Vivek Goyal, Sergey Senozhatsky, Steven Rostedt,
	John Stultz, Stephen Boyd, kexec

Petr,

On Thu, Aug 20 2020 at 10:47, Petr Mladek wrote:
> The interface is perfectly fine for printk() needs.

Good. So I suggest that I apply that on top of rc1 somewhere in tip and
tag the top commit. So you can pull that tag into your printk branch and
go wild.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [patch 0/2] timekeeping: NMI safe timekeeper enhancements
  2020-08-20 10:30   ` Thomas Gleixner
@ 2020-08-20 10:43     ` Petr Mladek
  2020-08-23  8:43       ` Thomas Gleixner
  0 siblings, 1 reply; 9+ messages in thread
From: Petr Mladek @ 2020-08-20 10:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Orson Zhai, Prarit Bhargava, Dave Young,
	Baoquan He, Vivek Goyal, Sergey Senozhatsky, Steven Rostedt,
	John Stultz, Stephen Boyd, kexec

On Thu 2020-08-20 12:30:55, Thomas Gleixner wrote:
> Petr,
> 
> On Thu, Aug 20 2020 at 10:47, Petr Mladek wrote:
> > The interface is perfectly fine for printk() needs.
> 
> Good. So I suggest that I apply that on top of rc1 somewhere in tip and
> tag the top commit. So you can pull that tag into your printk branch and
> go wild.

Sounds good to me.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: timers/core] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper
  2020-08-14 10:19 ` [patch 2/2] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper Thomas Gleixner
@ 2020-08-23  8:40   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-08-23  8:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Petr Mladek, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     e2d977c9f1abd1d199b412f8f83c1727808b794d
Gitweb:        https://git.kernel.org/tip/e2d977c9f1abd1d199b412f8f83c1727808b794d
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 14 Aug 2020 12:19:35 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sun, 23 Aug 2020 10:38:24 +02:00

timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper

printk wants to store various timestamps (MONOTONIC, REALTIME, BOOTTIME) to
make correlation of dmesg from several systems easier.

Provide an interface to retrieve all three timestamps in one go.

There are some caveats:

1) Boot time and late sleep time injection

  Boot time is a racy access on 32bit systems if the sleep time injection
  happens late during resume and not in timekeeping_resume(). That could be
  avoided by expanding struct tk_read_base with boot offset for 32bit and
  adding more overhead to the update. As this is a hard to observe once per
  resume event which can be filtered with reasonable effort using the
  accurate mono/real timestamps, it's probably not worth the trouble.

  Aside of that it might be possible on 32 and 64 bit to observe the
  following when the sleep time injection happens late:

  CPU 0				         CPU 1
  timekeeping_resume()
  ktime_get_fast_timestamps()
    mono, real = __ktime_get_real_fast()
  					 inject_sleep_time()
  					   update boot offset
  	boot = mono + bootoffset;
  
  That means that boot time already has the sleep time adjustment, but
  real time does not. On the next readout both are in sync again.
  
  Preventing this for 64bit is not really feasible without destroying the
  careful cache layout of the timekeeper because the sequence count and
  struct tk_read_base would then need two cache lines instead of one.

2) Suspend/resume timestamps

   Access to the time keeper clock source is disabled accross the innermost
   steps of suspend/resume. The accessors still work, but the timestamps
   are frozen until time keeping is resumed which happens very early.

   For regular suspend/resume there is no observable difference vs. sched
   clock, but it might affect some of the nasty low level debug printks.

   OTOH, access to sched clock is not guaranteed accross suspend/resume on
   all systems either so it depends on the hardware in use.

   If that turns out to be a real problem then this could be mitigated by
   using sched clock in a similar way as during early boot. But it's not as
   trivial as on early boot because it needs some careful protection
   against the clock monotonic timestamp jumping backwards on resume.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Petr Mladek <pmladek@suse.com>                                                                                                                                                                                                                                      
Link: https://lore.kernel.org/r/20200814115512.159981360@linutronix.de

---
 include/linux/timekeeping.h | 15 +++++++-
 kernel/time/timekeeping.c   | 76 ++++++++++++++++++++++++++++++------
 2 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index d5471d6..7f7e4a3 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -222,6 +222,18 @@ extern bool timekeeping_rtc_skipresume(void);
 
 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
 
+/*
+ * struct ktime_timestanps - Simultaneous mono/boot/real timestamps
+ * @mono:	Monotonic timestamp
+ * @boot:	Boottime timestamp
+ * @real:	Realtime timestamp
+ */
+struct ktime_timestamps {
+	u64		mono;
+	u64		boot;
+	u64		real;
+};
+
 /**
  * struct system_time_snapshot - simultaneous raw/real time capture with
  *				 counter value
@@ -280,6 +292,9 @@ extern int get_device_system_crosststamp(
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
 
+/* NMI safe mono/boot/realtime timestamps */
+extern void ktime_get_fast_timestamps(struct ktime_timestamps *snap);
+
 /*
  * Persistent clock related interfaces
  */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 57d064d..ba76576 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -530,29 +530,29 @@ u64 notrace ktime_get_boot_fast_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 
-
 /*
  * See comment for __ktime_get_fast_ns() vs. timestamp ordering
  */
-static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
+static __always_inline u64 __ktime_get_real_fast(struct tk_fast *tkf, u64 *mono)
 {
 	struct tk_read_base *tkr;
+	u64 basem, baser, delta;
 	unsigned int seq;
-	u64 now;
 
 	do {
 		seq = raw_read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
-		now = ktime_to_ns(tkr->base_real);
+		basem = ktime_to_ns(tkr->base);
+		baser = ktime_to_ns(tkr->base_real);
 
-		now += timekeeping_delta_to_ns(tkr,
-				clocksource_delta(
-					tk_clock_read(tkr),
-					tkr->cycle_last,
-					tkr->mask));
+		delta = timekeeping_delta_to_ns(tkr,
+				clocksource_delta(tk_clock_read(tkr),
+				tkr->cycle_last, tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
 
-	return now;
+	if (mono)
+		*mono = basem + delta;
+	return baser + delta;
 }
 
 /**
@@ -560,11 +560,65 @@ static __always_inline u64 __ktime_get_real_fast_ns(struct tk_fast *tkf)
  */
 u64 ktime_get_real_fast_ns(void)
 {
-	return __ktime_get_real_fast_ns(&tk_fast_mono);
+	return __ktime_get_real_fast(&tk_fast_mono, NULL);
 }
 EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);
 
 /**
+ * ktime_get_fast_timestamps: - NMI safe timestamps
+ * @snapshot:	Pointer to timestamp storage
+ *
+ * Stores clock monotonic, boottime and realtime timestamps.
+ *
+ * Boot time is a racy access on 32bit systems if the sleep time injection
+ * happens late during resume and not in timekeeping_resume(). That could
+ * be avoided by expanding struct tk_read_base with boot offset for 32bit
+ * and adding more overhead to the update. As this is a hard to observe
+ * once per resume event which can be filtered with reasonable effort using
+ * the accurate mono/real timestamps, it's probably not worth the trouble.
+ *
+ * Aside of that it might be possible on 32 and 64 bit to observe the
+ * following when the sleep time injection happens late:
+ *
+ * CPU 0				CPU 1
+ * timekeeping_resume()
+ * ktime_get_fast_timestamps()
+ *	mono, real = __ktime_get_real_fast()
+ *					inject_sleep_time()
+ *					   update boot offset
+ *	boot = mono + bootoffset;
+ *
+ * That means that boot time already has the sleep time adjustment, but
+ * real time does not. On the next readout both are in sync again.
+ *
+ * Preventing this for 64bit is not really feasible without destroying the
+ * careful cache layout of the timekeeper because the sequence count and
+ * struct tk_read_base would then need two cache lines instead of one.
+ *
+ * Access to the time keeper clock source is disabled accross the innermost
+ * steps of suspend/resume. The accessors still work, but the timestamps
+ * are frozen until time keeping is resumed which happens very early.
+ *
+ * For regular suspend/resume there is no observable difference vs. sched
+ * clock, but it might affect some of the nasty low level debug printks.
+ *
+ * OTOH, access to sched clock is not guaranteed accross suspend/resume on
+ * all systems either so it depends on the hardware in use.
+ *
+ * If that turns out to be a real problem then this could be mitigated by
+ * using sched clock in a similar way as during early boot. But it's not as
+ * trivial as on early boot because it needs some careful protection
+ * against the clock monotonic timestamp jumping backwards on resume.
+ */
+void ktime_get_fast_timestamps(struct ktime_timestamps *snapshot)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+
+	snapshot->real = __ktime_get_real_fast(&tk_fast_mono, &snapshot->mono);
+	snapshot->boot = snapshot->mono + ktime_to_ns(data_race(tk->offs_boot));
+}
+
+/**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.
  * @tk: Timekeeper to snapshot.
  *

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [tip: timers/core] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot
  2020-08-14 10:19 ` [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot Thomas Gleixner
@ 2020-08-23  8:40   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2020-08-23  8:40 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Petr Mladek, x86, LKML

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     71419b30cab099f7ca37e61bf41028d8b7d4984d
Gitweb:        https://git.kernel.org/tip/71419b30cab099f7ca37e61bf41028d8b7d4984d
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Fri, 14 Aug 2020 12:19:34 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sun, 23 Aug 2020 10:38:24 +02:00

timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot

During early boot the NMI safe timekeeper returns 0 until the first
clocksource becomes available.

This prevents it from being used for printk or other facilities which today
use sched clock. sched clock can be available way before timekeeping is
initialized.

The obvious workaround for this is to utilize the early sched clock in the
default dummy clock read function until a clocksource becomes available.

After switching to the clocksource clock MONOTONIC and BOOTTIME will not
jump because the timekeeping_init() bases clock MONOTONIC on sched clock
and the offset between clock MONOTONIC and BOOTTIME is zero during boot.

Clock REALTIME cannot provide useful timestamps during early boot up to
the point where a persistent clock becomes available, which is either in
timekeeping_init() or later when the RTC driver which might depend on I2C
or other subsystems is initialized.

There is a minor difference to sched_clock() vs. suspend/resume. As the
timekeeper clock source might not be accessible during suspend, after
timekeeping_suspend() timestamps freeze up to the point where
timekeeping_resume() is invoked. OTOH this is true for some sched clock
implementations as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Petr Mladek <pmladek@suse.com>                                                                                                                                                                                                                                      
Link: https://lore.kernel.org/r/20200814115512.041422402@linutronix.de

---
 kernel/time/timekeeping.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f38..57d064d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -54,6 +54,9 @@ static struct {
 
 static struct timekeeper shadow_timekeeper;
 
+/* flag for if timekeeping is suspended */
+int __read_mostly timekeeping_suspended;
+
 /**
  * struct tk_fast - NMI safe timekeeper
  * @seq:	Sequence counter for protecting updates. The lowest bit
@@ -73,28 +76,42 @@ static u64 cycles_at_suspend;
 
 static u64 dummy_clock_read(struct clocksource *cs)
 {
-	return cycles_at_suspend;
+	if (timekeeping_suspended)
+		return cycles_at_suspend;
+	return local_clock();
 }
 
 static struct clocksource dummy_clock = {
 	.read = dummy_clock_read,
 };
 
+/*
+ * Boot time initialization which allows local_clock() to be utilized
+ * during early boot when clocksources are not available. local_clock()
+ * returns nanoseconds already so no conversion is required, hence mult=1
+ * and shift=0. When the first proper clocksource is installed then
+ * the fast time keepers are updated with the correct values.
+ */
+#define FAST_TK_INIT						\
+	{							\
+		.clock		= &dummy_clock,			\
+		.mask		= CLOCKSOURCE_MASK(64),		\
+		.mult		= 1,				\
+		.shift		= 0,				\
+	}
+
 static struct tk_fast tk_fast_mono ____cacheline_aligned = {
 	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
-	.base[0] = { .clock = &dummy_clock, },
-	.base[1] = { .clock = &dummy_clock, },
+	.base[0] = FAST_TK_INIT,
+	.base[1] = FAST_TK_INIT,
 };
 
 static struct tk_fast tk_fast_raw  ____cacheline_aligned = {
 	.seq     = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
-	.base[0] = { .clock = &dummy_clock, },
-	.base[1] = { .clock = &dummy_clock, },
+	.base[0] = FAST_TK_INIT,
+	.base[1] = FAST_TK_INIT,
 };
 
-/* flag for if timekeeping is suspended */
-int __read_mostly timekeeping_suspended;
-
 static inline void tk_normalize_xtime(struct timekeeper *tk)
 {
 	while (tk->tkr_mono.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_mono.shift)) {

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [patch 0/2] timekeeping: NMI safe timekeeper enhancements
  2020-08-20 10:43     ` Petr Mladek
@ 2020-08-23  8:43       ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2020-08-23  8:43 UTC (permalink / raw)
  To: Petr Mladek
  Cc: LKML, Peter Zijlstra, Orson Zhai, Prarit Bhargava, Dave Young,
	Baoquan He, Vivek Goyal, Sergey Senozhatsky, Steven Rostedt,
	John Stultz, Stephen Boyd, kexec

Petr,

On Thu, Aug 20 2020 at 12:43, Petr Mladek wrote:
> On Thu 2020-08-20 12:30:55, Thomas Gleixner wrote:
>> Good. So I suggest that I apply that on top of rc1 somewhere in tip and
>> tag the top commit. So you can pull that tag into your printk branch and
>> go wild.
>
> Sounds good to me.

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timekeeping-for-printk-2020-08-23

Have fun!

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-23  8:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14 10:19 [patch 0/2] timekeeping: NMI safe timekeeper enhancements Thomas Gleixner
2020-08-14 10:19 ` [patch 1/2] timekeeping: Utilize local_clock() for NMI safe timekeeper during early boot Thomas Gleixner
2020-08-23  8:40   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2020-08-14 10:19 ` [patch 2/2] timekeeping: Provide multi-timestamp accessor to NMI safe timekeeper Thomas Gleixner
2020-08-23  8:40   ` [tip: timers/core] " tip-bot2 for Thomas Gleixner
2020-08-20  8:47 ` [patch 0/2] timekeeping: NMI safe timekeeper enhancements Petr Mladek
2020-08-20 10:30   ` Thomas Gleixner
2020-08-20 10:43     ` Petr Mladek
2020-08-23  8:43       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).