* [PATCH v6 0/4] Early boot time stamps for x86
@ 2017-08-30 18:03 Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Pavel Tatashin @ 2017-08-30 18:03 UTC (permalink / raw)
  To: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

changelog
---------
v5 - v6
	- Added a new patch:
		time: sync read_boot_clock64() with persistent clock
	  which adds the missing __init annotation and enables the fix for
	  the time discrepancy noted by Thomas Gleixner
	- Split "x86/time: read_boot_clock64() implementation" into a
	  separate patch
v4 - v5
	- Fixed compiler warnings on systems with stable clocks.
v3 - v4
	- Fixed tsc_early_fini() call to be in the 2nd patch as reported
	  by Dou Liyang
	- Improved comment before __use_sched_clock_early to explain why
	  we need both booleans.
	- Simplified valid_clock logic in read_boot_clock64().

v2 - v3
	- Addressed comment from Thomas Gleixner
	- Timestamps are available a little later in boot but still much
	  earlier than in mainline. This significantly simplified this
	  work.
v1 - v2
	In patch "x86/tsc: tsc early":
	- added tsc_adjusted_early()
	- fixed 32-bit compile error by using do_div()

This series adds early boot time stamp support for x86 machines.
SPARC patches for early boot time stamps are already integrated into
mainline Linux.

Sample output
-------------
Before:
https://hastebin.com/jadaqukubu.scala

After:
https://hastebin.com/nubipozacu.scala

For more examples of how early time stamps are used, see this work:
https://lwn.net/Articles/732233/

As the sample output shows, timestamps currently become available only
around the time the "Security Framework" is initialized, and by then 26s
of boot have already passed.

Pavel Tatashin (4):
  sched/clock: interface to allow timestamps early in boot
  time: sync read_boot_clock64() with persistent clock
  x86/time: read_boot_clock64() implementation
  x86/tsc: use tsc early

 arch/arm/kernel/time.c      |  2 +-
 arch/s390/kernel/time.c     |  2 +-
 arch/x86/include/asm/tsc.h  |  4 +++
 arch/x86/kernel/setup.c     | 10 +++++--
 arch/x86/kernel/time.c      | 31 ++++++++++++++++++++++
 arch/x86/kernel/tsc.c       | 47 +++++++++++++++++++++++++++++++++
 include/linux/sched/clock.h |  4 +++
 include/linux/timekeeping.h | 10 +++----
 kernel/sched/clock.c        | 63 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/timekeeping.c   |  8 ++++--
 10 files changed, 169 insertions(+), 12 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-08-30 18:03 [PATCH v6 0/4] Early boot time stamps for x86 Pavel Tatashin
@ 2017-08-30 18:03 ` Pavel Tatashin
  2017-09-27 12:58   ` Peter Zijlstra
  2017-09-27 14:45   ` Russell King - ARM Linux
  2017-08-30 18:03 ` [PATCH v6 2/4] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 26+ messages in thread
From: Pavel Tatashin @ 2017-08-30 18:03 UTC (permalink / raw)
  To: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

In Linux, printk() can output a timestamp next to every line. This is very
useful for tracking regressions and for finding places that can be optimized.
However, timestamps become available only later in boot. On smaller machines
the period without timestamps is insignificant, but on larger machines it can
last many seconds or even minutes into the boot process.

This patch adds an interface that lets platforms with an unstable sched
clock show timestamps early in boot. To get this functionality, a
platform must:

- Implement u64 sched_clock_early()
  A clock that returns monotonic time

- Call sched_clock_early_init()
  Tells sched clock that the early clock can be used

- Call sched_clock_early_fini()
  Tells sched clock that the early clock is finished, and that sched clock
  should hand over the operation to the permanent clock.
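
The contract above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct clock_model`, `early_init()`, `early_fini()` and `clock_read()` are hypothetical names, the two clocks are plain counters that the caller advances by hand, and the per-CPU bookkeeping of the real sched clock is omitted. It shows the key idea of the patch: at handover, the offset is set to `early - sched` so that the permanent clock continues where the early clock stopped.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the early-clock handover.
 * early_ns and sched_ns stand in for sched_clock_early() and
 * sched_clock(); the caller advances them by hand.
 */
struct clock_model {
	uint64_t early_ns;   /* current reading of the early clock */
	uint64_t sched_ns;   /* current reading of the permanent clock */
	uint64_t offset;     /* added to sched_ns after the handover */
	bool early_running;  /* between early_init() and early_fini() */
	bool running;        /* permanent clock has been started */
};

/* Platform announces that its early clock can be used. */
static void early_init(struct clock_model *m)
{
	m->early_running = true;
}

/* Platform retires the early clock; the permanent clock takes over. */
static void early_fini(struct clock_model *m)
{
	/* u64 wraparound keeps sched_ns + offset == early_ns even if sched_ns > early_ns. */
	m->offset = m->early_ns - m->sched_ns;
	m->early_running = false;
	m->running = true;
}

/* Simplified analogue of the unstable-clock path in sched_clock_cpu(). */
static uint64_t clock_read(const struct clock_model *m)
{
	if (m->early_running)
		return m->early_ns;
	if (!m->running)
		return 0;    /* no timestamps yet, as without this patch */
	return m->sched_ns + m->offset;
}
```

For instance, if the early clock reads 5000 ns and the permanent clock 100 ns when early_fini() runs, the offset becomes 4900, so readings continue from 5000 and then advance with the permanent clock.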

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 include/linux/sched/clock.h |  4 +++
 kernel/sched/clock.c        | 63 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index a55600ffdf4b..f8291fa28c0c 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -63,6 +63,10 @@ extern void sched_clock_tick_stable(void);
 extern void sched_clock_idle_sleep_event(void);
 extern void sched_clock_idle_wakeup_event(void);
 
+void sched_clock_early_init(void);
+void sched_clock_early_fini(void);
+u64 sched_clock_early(void);
+
 /*
  * As outlined in clock.c, provides a fast, high resolution, nanosecond
  * time source that is monotonic per cpu argument and has bounded drift
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index ca0f8fc945c6..2a41791b22fa 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -80,9 +80,17 @@ EXPORT_SYMBOL_GPL(sched_clock);
 
 __read_mostly int sched_clock_running;
 
+static bool __read_mostly sched_clock_early_running;
+
 void sched_clock_init(void)
 {
-	sched_clock_running = 1;
+	/*
+	 * Start the clock once the early clock is finished, or right away
+	 * if the early clock was not running.
+	 */
+	if (!sched_clock_early_running)
+		sched_clock_running = 1;
+
 }
 
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
@@ -96,6 +104,16 @@ void sched_clock_init(void)
 static DEFINE_STATIC_KEY_FALSE(__sched_clock_stable);
 static int __sched_clock_stable_early = 1;
 
+/*
+ * Because static branches cannot be altered before jump_label_init() is called,
+ * and early time stamps may be initialized before that, we start with the
+ * sched clock early static branch enabled and the global status disabled.
+ * Early in boot it is decided whether to enable the global status as well
+ * (sched_clock_early_running is set to true). Later, when the early clock is
+ * no longer needed, the static branch is disabled to keep the hot path fast.
+ */
+static DEFINE_STATIC_KEY_TRUE(__use_sched_clock_early);
+
 /*
  * We want: ktime_get_ns() + __gtod_offset == sched_clock() + __sched_clock_offset
  */
@@ -362,6 +380,11 @@ u64 sched_clock_cpu(int cpu)
 	if (sched_clock_stable())
 		return sched_clock() + __sched_clock_offset;
 
+	if (static_branch_unlikely(&__use_sched_clock_early)) {
+		if (sched_clock_early_running)
+			return sched_clock_early();
+	}
+
 	if (unlikely(!sched_clock_running))
 		return 0ull;
 
@@ -444,6 +467,44 @@ void sched_clock_idle_wakeup_event(void)
 }
 EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
 
+u64 __weak sched_clock_early(void)
+{
+	return 0;
+}
+
+/*
+ * Called when sched_clock_early() is about to be retired: notifies sched
+ * clock that sched_clock_early() must not be used after this call.
+ */
+void __init sched_clock_early_fini(void)
+{
+	struct sched_clock_data *scd = this_scd();
+	u64 now_early, now_sched;
+
+	now_early = sched_clock_early();
+	now_sched = sched_clock();
+
+	__gtod_offset = now_early - scd->tick_gtod;
+	__sched_clock_offset = now_early - now_sched;
+
+	sched_clock_early_running = false;
+	static_branch_disable(&__use_sched_clock_early);
+
+	/* Now that early clock is finished, start regular sched clock */
+	sched_clock_init();
+}
+
+/*
+ * Notifies sched clock that an early boot clocksource is available, meaning
+ * that the current platform has implemented sched_clock_early().
+ *
+ * The early clock runs until sched_clock_early_fini() is called.
+ */
+void __init sched_clock_early_init(void)
+{
+	sched_clock_early_running = true;
+}
+
 #else /* CONFIG_HAVE_UNSTABLE_SCHED_CLOCK */
 
 u64 sched_clock_cpu(int cpu)
-- 
2.14.1


* [PATCH v6 2/4] time: sync read_boot_clock64() with persistent clock
  2017-08-30 18:03 [PATCH v6 0/4] Early boot time stamps for x86 Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
@ 2017-08-30 18:03 ` Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 3/4] x86/time: read_boot_clock64() implementation Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 4/4] x86/tsc: use tsc early Pavel Tatashin
  3 siblings, 0 replies; 26+ messages in thread
From: Pavel Tatashin @ 2017-08-30 18:03 UTC (permalink / raw)
  To: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

read_boot_clock64() returns the boot start timestamp relative to the epoch.
Some arches may need to access the persistent clock interface in order to
calculate the epoch offset. However, the resolution of the persistent clock
might be low. Therefore, to avoid time discrepancies, a new argument, 'now',
is added to read_boot_clock64(). An arch may use it instead of accessing the
persistent clock again.

Also, mark read_boot_clock64() __init, since it is called only during boot.
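
The discrepancy that 'now' avoids can be shown with a toy model. In this sketch, the names are hypothetical and the one-second resolution is an assumption: timekeeping_init() and the arch hook each read a low-resolution persistent clock. Two reads a few milliseconds apart can fall on either side of a second boundary, so a boot start computed from a second read can differ by a full second from one computed from the snapshot that timekeeping was initialized with. Passing that single snapshot in 'now' removes the possibility.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical persistent clock that only reports whole seconds. */
static uint64_t persistent_read_ns(uint64_t true_ns)
{
	return true_ns / NSEC_PER_SEC * NSEC_PER_SEC;
}

/* Boot start as wall-clock time minus the time elapsed since boot. */
static uint64_t boot_start_ns(uint64_t wall_ns, uint64_t ns_since_boot)
{
	return wall_ns - ns_since_boot;
}
```

With the snapshot taken at a true time of 9.999 s and a re-read 2 ms later, the reads give 9 s and 10 s, and with 3 s elapsed since boot the boot start comes out as 6 s or 7 s depending on which read is used.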

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/arm/kernel/time.c      |  2 +-
 arch/s390/kernel/time.c     |  2 +-
 include/linux/timekeeping.h | 10 +++++-----
 kernel/time/timekeeping.c   |  8 ++++++--
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/time.c b/arch/arm/kernel/time.c
index 629f8e9981f1..5b259261a268 100644
--- a/arch/arm/kernel/time.c
+++ b/arch/arm/kernel/time.c
@@ -90,7 +90,7 @@ void read_persistent_clock64(struct timespec64 *ts)
 	__read_persistent_clock(ts);
 }
 
-void read_boot_clock64(struct timespec64 *ts)
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
 {
 	__read_boot_clock(ts);
 }
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 192efdfac918..fd3050e2e825 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -203,7 +203,7 @@ void read_persistent_clock64(struct timespec64 *ts)
 	tod_to_timeval(clock - TOD_UNIX_EPOCH, ts);
 }
 
-void read_boot_clock64(struct timespec64 *ts)
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
 {
 	__u64 clock;
 
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229ff6d1e..ffe5705bd064 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -340,11 +340,11 @@ extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
  */
 extern int persistent_clock_is_local;
 
-extern void read_persistent_clock(struct timespec *ts);
-extern void read_persistent_clock64(struct timespec64 *ts);
-extern void read_boot_clock64(struct timespec64 *ts);
-extern int update_persistent_clock(struct timespec now);
-extern int update_persistent_clock64(struct timespec64 now);
+void read_persistent_clock(struct timespec *ts);
+void read_persistent_clock64(struct timespec64 *ts);
+void read_boot_clock64(struct timespec64 *now, struct timespec64 *ts);
+int update_persistent_clock(struct timespec now);
+int update_persistent_clock64(struct timespec64 now);
 
 
 #endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa008de5..a74f4c3a46a4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1468,9 +1468,13 @@ void __weak read_persistent_clock64(struct timespec64 *ts64)
  * Function to read the exact time the system has been started.
  * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported.
  *
+ * Argument 'now' contains time from persistent clock to calculate offset from
+ * epoch. May contain zeros if the persistent clock is not available.
+ *
  *  XXX - Do be sure to remove it once all arches implement it.
  */
-void __weak read_boot_clock64(struct timespec64 *ts)
+void __weak __init read_boot_clock64(struct timespec64 *now,
+				     struct timespec64 *ts)
 {
 	ts->tv_sec = 0;
 	ts->tv_nsec = 0;
@@ -1501,7 +1505,7 @@ void __init timekeeping_init(void)
 	} else if (now.tv_sec || now.tv_nsec)
 		persistent_clock_exists = true;
 
-	read_boot_clock64(&boot);
+	read_boot_clock64(&now, &boot);
 	if (!timespec64_valid_strict(&boot)) {
 		pr_warn("WARNING: Boot clock returned invalid value!\n"
 			"         Check your CMOS/BIOS settings.\n");
-- 
2.14.1


* [PATCH v6 3/4] x86/time: read_boot_clock64() implementation
  2017-08-30 18:03 [PATCH v6 0/4] Early boot time stamps for x86 Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 2/4] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
@ 2017-08-30 18:03 ` Pavel Tatashin
  2017-08-30 18:03 ` [PATCH v6 4/4] x86/tsc: use tsc early Pavel Tatashin
  3 siblings, 0 replies; 26+ messages in thread
From: Pavel Tatashin @ 2017-08-30 18:03 UTC (permalink / raw)
  To: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

read_boot_clock64() returns the time at which the system started. Now that
sched_clock_early() is available on systems with unstable clocks, it is
possible to implement an x86-specific version of read_boot_clock64() that
takes advantage of this new interface.
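
The computation the x86 hook performs can be sketched in userspace as follows. Assumptions: `struct timespec` stands in for `struct timespec64`, `ts_valid()` is a simplified stand-in for timespec64_valid_strict() (the kernel helper also rejects seconds values too large for ktime_t), and `boot_time()` is a hypothetical name. The boot start is `now - ns_boot`, and zeros are returned whenever the early clock is unavailable or the result would not be positive, matching the fallback described in the patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Simplified stand-in for timespec64_valid_strict(). */
static bool ts_valid(const struct timespec *ts)
{
	return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/*
 * Model of the x86 read_boot_clock64(): 'now' is the persistent-clock
 * snapshot passed by the caller, ns_boot the early clock reading.
 * Returns {0, 0} when the result cannot be trusted.
 */
static struct timespec boot_time(const struct timespec *now, uint64_t ns_boot)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 0 };
	uint64_t ns_now;

	if (!ns_boot || !ts_valid(now))
		return ts;
	ns_now = ts_to_ns(now);
	if (ns_now <= ns_boot)
		return ts;

	ns_now -= ns_boot;   /* nanoseconds from the epoch to boot start */
	ts.tv_sec = (time_t)(ns_now / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns_now % NSEC_PER_SEC);
	return ts;
}
```

For example, with now = 1000 s + 500 ns and an early clock reading of 2 s, the boot start is 998 s + 500 ns.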

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/kernel/time.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index e0754cdbad37..fbad8bf2fa24 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -14,6 +14,7 @@
 #include <linux/i8253.h>
 #include <linux/time.h>
 #include <linux/export.h>
+#include <linux/sched/clock.h>
 
 #include <asm/vsyscall.h>
 #include <asm/x86_init.h>
@@ -95,3 +96,32 @@ void __init time_init(void)
 {
 	late_time_init = x86_late_time_init;
 }
+
+/*
+ * Called once during boot to initialize the boot time.
+ * This function returns a timestamp in timespec format, which is sec/nsec
+ * from epoch of when boot started.
+ * We use sched_clock_early(), which gives us nanoseconds since this clock was
+ * started, and that happens quite early in the boot process. To calculate the
+ * offset from epoch we use the information provided in 'now' by the caller.
+ *
+ * If sched_clock_early() is not available, or if there is any kind of error,
+ * e.g. time from epoch is smaller than boot time, we must return zeros in ts,
+ * and the caller will take care of the error by assuming that the time when
+ * this function was called is the beginning of boot time.
+ */
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
+{
+	u64 ns_boot = sched_clock_early();
+	bool valid_clock;
+	u64 ns_now;
+
+	ns_now = timespec64_to_ns(now);
+	valid_clock = ns_boot && timespec64_valid_strict(now) &&
+			(ns_now > ns_boot);
+
+	if (!valid_clock)
+		*ts = (struct timespec64){0, 0};
+	else
+		*ts = ns_to_timespec64(ns_now - ns_boot);
+}
-- 
2.14.1


* [PATCH v6 4/4] x86/tsc: use tsc early
  2017-08-30 18:03 [PATCH v6 0/4] Early boot time stamps for x86 Pavel Tatashin
                   ` (2 preceding siblings ...)
  2017-08-30 18:03 ` [PATCH v6 3/4] x86/time: read_boot_clock64() implementation Pavel Tatashin
@ 2017-08-30 18:03 ` Pavel Tatashin
       [not found]   ` <CALBSrqBKsojGGpe85GOg7jda-SJHLrR=pS-Pg-xa0SUg7j3OQA@mail.gmail.com>
  3 siblings, 1 reply; 26+ messages in thread
From: Pavel Tatashin @ 2017-08-30 18:03 UTC (permalink / raw)
  To: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

tsc_early_init():
Determines the offset, shift, and multiplier for the early clock based on the
TSC frequency, and notifies sched clock that the early clock is available by
calling sched_clock_early_init().

tsc_early_fini():
Implements the finishing part of the early TSC feature. It prints a message
with the offset, which can be useful for finding out how much time was spent
in POST and the boot manager (if the TSC starts from 0 during boot), and calls
sched_clock_early_fini() to let sched clock know that the early clock cannot
be used anymore.

sched_clock_early():
TSC-based implementation of the weak function defined in sched clock.

Call tsc_early_init() to initialize early boot time stamps on the supported
x86 platforms, and call tsc_early_fini() to finish this feature after
tsc_init() has initialized the permanent TSC.
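
The early clock converts TSC cycles to nanoseconds with a multiply and a shift instead of a division. Below is a userspace sketch of that idea: `calc_mult_shift()` follows the algorithm of the kernel's clocks_calc_mult_shift() with from = khz, to = NSEC_PER_MSEC and maxsec = 0 (as in tsc_early_init()), and `cyc_to_ns()` plays the role of mul_u64_u32_shr(). Both names are hypothetical, and the 128-bit product relies on the GCC/Clang `unsigned __int128` extension.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000U

/*
 * Sketch of clocks_calc_mult_shift(from = khz, to = NSEC_PER_MSEC, maxsec = 0):
 * pick the largest shift <= 32 whose rounded multiplier still fits in 32 bits,
 * so that ns ~= (cycles * mult) >> shift. khz must be non-zero (the caller in
 * setup.c returns early when tsc_khz is 0).
 */
static void calc_mult_shift(uint32_t khz, uint32_t *mult, uint32_t *shift)
{
	uint64_t tmp = 0;
	uint32_t sft;

	for (sft = 32; sft > 0; sft--) {
		tmp = ((uint64_t)NSEC_PER_MSEC << sft) + khz / 2;   /* round to nearest */
		tmp /= khz;
		if ((tmp >> 32) == 0)
			break;
	}
	*mult = (uint32_t)tmp;
	*shift = sft;
}

/* Userspace analogue of mul_u64_u32_shr(): 128-bit product, then shift. */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((unsigned __int128)cycles * mult) >> shift);
}
```

For a 2 GHz TSC (khz = 2000000) this yields mult = 2^31 and shift = 32, i.e. half a nanosecond per cycle. In the patch, tsc_early_init() then stores the negated first reading in cyc2ns_offset so that the early clock starts near zero, and tsc_early_fini() prints that offset.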

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/include/asm/tsc.h |  4 ++++
 arch/x86/kernel/setup.c    | 10 ++++++++--
 arch/x86/kernel/time.c     |  1 +
 arch/x86/kernel/tsc.c      | 47 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index f5e6f1c417df..6dc9618b24e3 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,11 +50,15 @@ extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
 extern void tsc_verify_tsc_adjust(bool resume);
 extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
+void tsc_early_init(unsigned int khz);
+void tsc_early_fini(void);
 #else
 static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
 static inline void tsc_verify_tsc_adjust(bool resume) { }
 static inline void check_tsc_sync_source(int cpu) { }
 static inline void check_tsc_sync_target(void) { }
+static inline void tsc_early_init(unsigned int khz) { }
+static inline void tsc_early_fini(void) { }
 #endif
 
 extern int notsc_setup(char *);
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..413434d98a23 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -812,7 +812,11 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 	return 0;
 }
 
-static void __init simple_udelay_calibration(void)
+/*
+ * Initialize early tsc to show early boot timestamps, and also loops_per_jiffy
+ * for udelay
+ */
+static void __init early_clock_calibration(void)
 {
 	unsigned int tsc_khz, cpu_khz;
 	unsigned long lpj;
@@ -827,6 +831,8 @@ static void __init simple_udelay_calibration(void)
 	if (!tsc_khz)
 		return;
 
+	tsc_early_init(tsc_khz);
+
 	lpj = tsc_khz * 1000;
 	do_div(lpj, HZ);
 	loops_per_jiffy = lpj;
@@ -1039,7 +1045,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_hypervisor_platform();
 
-	simple_udelay_calibration();
+	early_clock_calibration();
 
 	x86_init.resources.probe_roms();
 
diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index fbad8bf2fa24..44411d769b53 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -86,6 +86,7 @@ static __init void x86_late_time_init(void)
 {
 	x86_init.timers.timer_init();
 	tsc_init();
+	tsc_early_fini();
 }
 
 /*
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96bb0821..bd44c2dd4235 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1263,6 +1263,53 @@ static int __init init_tsc_clocksource(void)
  */
 device_initcall(init_tsc_clocksource);
 
+#ifdef CONFIG_X86_TSC
+
+static struct cyc2ns_data  cyc2ns_early;
+static bool sched_clock_early_enabled;
+
+u64 sched_clock_early(void)
+{
+	u64 ns;
+
+	if (!sched_clock_early_enabled)
+		return 0;
+	ns = mul_u64_u32_shr(rdtsc(), cyc2ns_early.cyc2ns_mul,
+			     cyc2ns_early.cyc2ns_shift);
+	return ns + cyc2ns_early.cyc2ns_offset;
+}
+
+/*
+ * Initialize clock for early time stamps
+ */
+void __init tsc_early_init(unsigned int khz)
+{
+	sched_clock_early_enabled = true;
+	clocks_calc_mult_shift(&cyc2ns_early.cyc2ns_mul,
+			       &cyc2ns_early.cyc2ns_shift,
+			       khz, NSEC_PER_MSEC, 0);
+	cyc2ns_early.cyc2ns_offset = -sched_clock_early();
+	sched_clock_early_init();
+}
+
+void __init tsc_early_fini(void)
+{
+	unsigned long long t;
+	unsigned long r;
+
+	/* We did not have early sched clock if multiplier is 0 */
+	if (cyc2ns_early.cyc2ns_mul == 0)
+		return;
+
+	t = -cyc2ns_early.cyc2ns_offset;
+	r = do_div(t, NSEC_PER_SEC);
+
+	sched_clock_early_fini();
+	pr_info("sched clock early is finished, offset [%llu.%09lus]\n", t, r);
+	sched_clock_early_enabled = false;
+}
+#endif /* CONFIG_X86_TSC */
+
 void __init tsc_init(void)
 {
 	u64 lpj, cyc;
-- 
2.14.1


* Re: [PATCH v6 4/4] x86/tsc: use tsc early
       [not found]   ` <CALBSrqBKsojGGpe85GOg7jda-SJHLrR=pS-Pg-xa0SUg7j3OQA@mail.gmail.com>
@ 2017-08-30 21:21     ` Fenghua Yu
  2017-08-30 21:32       ` Pasha Tatashin
  0 siblings, 1 reply; 26+ messages in thread
From: Fenghua Yu @ 2017-08-30 21:21 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd,
	pasha.tatashin, x86, linux-kernel, mingo, peterz, tglx, hpa,
	douly.fnst

On Wed, Aug 30, 2017 at 02:12:09PM -0700, Fenghua Yu wrote:
> +static struct cyc2ns_data  cyc2ns_early;
> +static bool sched_clock_early_enabled;

Should these two variables be "__initdata"?

> +u64 sched_clock_early(void)
This function is only called during boot time. Should it
be a "__init" function?

Thanks.

-Fenghua


* Re: [PATCH v6 4/4] x86/tsc: use tsc early
  2017-08-30 21:21     ` Fenghua Yu
@ 2017-08-30 21:32       ` Pasha Tatashin
  0 siblings, 0 replies; 26+ messages in thread
From: Pasha Tatashin @ 2017-08-30 21:32 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, peterz, tglx, hpa, douly.fnst

Hi Fenghua,

Thank you for looking at this. Unfortunately, I can't mark either of them
__init because sched_clock_early() is called from
	u64 sched_clock_cpu(int cpu)

which is around for the life of the system.

Thank you,
Pasha

On 08/30/2017 05:21 PM, Fenghua Yu wrote:
> On Wed, Aug 30, 2017 at 02:12:09PM -0700, Fenghua Yu wrote:
>> +static struct cyc2ns_data  cyc2ns_early;
>> +static bool sched_clock_early_enabled;
> 
> Should these two variables be "__initdata"?
> 
>> +u64 sched_clock_early(void)
> This function is only called during boot time. Should it
> be a "__init" function?
> 
> Thanks.
> 
> -Fenghua
> 


* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
@ 2017-09-27 12:58   ` Peter Zijlstra
  2017-09-27 13:10     ` Peter Zijlstra
  2017-10-09 16:34     ` Pavel Tatashin
  2017-09-27 14:45   ` Russell King - ARM Linux
  1 sibling, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-27 12:58 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa, douly.fnst

On Wed, Aug 30, 2017 at 02:03:22PM -0400, Pavel Tatashin wrote:
> In Linux printk() can output timestamps next to every line.  This is very
> useful for tracking regressions, and finding places that can be optimized.
> However, the timestamps are available only later in boot. On smaller
> machines it is insignificant amount of time, but on larger it can be many
> seconds or even minutes into the boot process.
> 
> This patch adds an interface for platforms with unstable sched clock to
> show timestamps early in boot. In order to get this functionality a
> platform must:
> 
> - Implement u64 sched_clock_early()
>   Clock that returns monotonic time
> 
> - Call sched_clock_early_init()
>   Tells sched clock that the early clock can be used
> 
> - Call sched_clock_early_fini()
>   Tells sched clock that the early clock is finished, and sched clock
>   should hand over the operation to permanent clock.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>

Urgh, that's horrific.

Can't we simply make sched_clock() go earlier? (we're violating "notsc"
in any case and really should kill that option).

Then we can do something like so on top...


---
 include/linux/sched/clock.h |  6 +++++-
 kernel/sched/clock.c        | 42 +++++++++++++++++++++++++++---------------
 2 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index a55600ffdf4b..986d14a208e7 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -20,9 +20,12 @@ extern u64 running_clock(void);
 extern u64 sched_clock_cpu(int cpu);
 
 
-extern void sched_clock_init(void);
 
 #ifndef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+static inline void sched_clock_init(void)
+{
+}
+
 static inline void sched_clock_tick(void)
 {
 }
@@ -49,6 +52,7 @@ static inline u64 local_clock(void)
 	return sched_clock();
 }
 #else
+extern void sched_clock_init(void);
 extern int sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index ca0f8fc945c6..47d13d37f2f1 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -80,11 +80,6 @@ EXPORT_SYMBOL_GPL(sched_clock);
 
 __read_mostly int sched_clock_running;
 
-void sched_clock_init(void)
-{
-	sched_clock_running = 1;
-}
-
 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
 /*
  * We must start with !__sched_clock_stable because the unstable -> stable
@@ -211,6 +206,31 @@ void clear_sched_clock_stable(void)
 		__clear_sched_clock_stable();
 }
 
+static void __sched_clock_gtod_offset(void)
+{
+	u64 gtod, clock;
+
+	local_irq_disable();
+	gtod = ktime_get_ns();
+	clock = sched_clock();
+	__gtod_offset = (clock + __sched_clock_offset) - gtod;
+	local_irq_enable();
+}
+
+void sched_clock_init(void)
+{
+	/*
+	 * Set __gtod_offset such that once we mark sched_clock_running,
+	 * sched_clock_tick() continues where sched_clock() left off.
+	 *
+	 * Even if TSC is buggered, we're still UP at this point so it
+	 * can't really be out of sync.
+	 */
+	__sched_clock_gtod_offset();
+	barrier();
+	sched_clock_running = 1;
+}
+
 /*
  * We run this as late_initcall() such that it runs after all built-in drivers,
  * notably: acpi_processor and intel_idle, which can mark the TSC as unstable.
@@ -363,7 +383,7 @@ u64 sched_clock_cpu(int cpu)
 		return sched_clock() + __sched_clock_offset;
 
 	if (unlikely(!sched_clock_running))
-		return 0ull;
+		return sched_clock();
 
 	preempt_disable_notrace();
 	scd = cpu_sdc(cpu);
@@ -397,7 +417,6 @@ void sched_clock_tick(void)
 
 void sched_clock_tick_stable(void)
 {
-	u64 gtod, clock;
 
 	if (!sched_clock_stable())
 		return;
@@ -409,11 +428,7 @@ void sched_clock_tick_stable(void)
 	 * good moment to update our __gtod_offset. Because once we find the
 	 * TSC to be unstable, any computation will be computing crap.
 	 */
-	local_irq_disable();
-	gtod = ktime_get_ns();
-	clock = sched_clock();
-	__gtod_offset = (clock + __sched_clock_offset) - gtod;
-	local_irq_enable();
+	__sched_clock_gtod_offset();
 }
 
 /*
@@ -448,9 +463,6 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
 
 u64 sched_clock_cpu(int cpu)
 {
-	if (unlikely(!sched_clock_running))
-		return 0;
-
 	return sched_clock();
 }
 


* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 12:58   ` Peter Zijlstra
@ 2017-09-27 13:10     ` Peter Zijlstra
  2017-09-27 13:16       ` Pasha Tatashin
  2017-10-09 16:34     ` Pavel Tatashin
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-27 13:10 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa, douly.fnst

On Wed, Sep 27, 2017 at 02:58:57PM +0200, Peter Zijlstra wrote:
> (we're violating "notsc" in any case and really should kill that
> option).

Something like so; in particular simple_udelay_calibration() will issue
RDTSC _way_ early, so there is absolutely no point in then pretending we
can't use RDTSC for sched_clock.

---

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96bb0821..1dd3849a42ca 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -37,13 +37,6 @@ EXPORT_SYMBOL(tsc_khz);
  */
 static int __read_mostly tsc_unstable;
 
-/* native_sched_clock() is called before tsc_init(), so
-   we must start with the TSC soft disabled to prevent
-   erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
-static int __read_mostly tsc_disabled = -1;
-
-static DEFINE_STATIC_KEY_FALSE(__use_tsc);
-
 int tsc_clocksource_reliable;
 
 static u32 art_to_tsc_numerator;
@@ -191,24 +184,7 @@ static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_
  */
 u64 native_sched_clock(void)
 {
-	if (static_branch_likely(&__use_tsc)) {
-		u64 tsc_now = rdtsc();
-
-		/* return the value in ns */
-		return cycles_2_ns(tsc_now);
-	}
-
-	/*
-	 * Fall back to jiffies if there's no TSC available:
-	 * ( But note that we still use it if the TSC is marked
-	 *   unstable. We do this because unlike Time Of Day,
-	 *   the scheduler clock tolerates small errors and it's
-	 *   very important for it to be as fast as the platform
-	 *   can achieve it. )
-	 */
-
-	/* No locking but a rare wrong value is not a big deal: */
-	return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ);
+	return cycles_2_ns(rdtsc());
 }
 
 /*
@@ -244,27 +220,6 @@ int check_tsc_unstable(void)
 }
 EXPORT_SYMBOL_GPL(check_tsc_unstable);
 
-#ifdef CONFIG_X86_TSC
-int __init notsc_setup(char *str)
-{
-	pr_warn("Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely\n");
-	tsc_disabled = 1;
-	return 1;
-}
-#else
-/*
- * disable flag for tsc. Takes effect by clearing the TSC cpu flag
- * in cpu/common.c
- */
-int __init notsc_setup(char *str)
-{
-	setup_clear_cpu_cap(X86_FEATURE_TSC);
-	return 1;
-}
-#endif
-
-__setup("notsc", notsc_setup);
-
 static int no_sched_irq_time;
 
 static int __init tsc_setup(char *str)
@@ -1229,7 +1184,7 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 
 static int __init init_tsc_clocksource(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_TSC) || tsc_disabled > 0 || !tsc_khz)
+	if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
 		return 0;
 
 	if (tsc_clocksource_reliable)
@@ -1311,14 +1266,6 @@ void __init tsc_init(void)
 		set_cyc2ns_scale(tsc_khz, cpu, cyc);
 	}
 
-	if (tsc_disabled > 0)
-		return;
-
-	/* now allow native_sched_clock() to use rdtsc */
-
-	tsc_disabled = 0;
-	static_branch_enable(&__use_tsc);
-
 	if (!no_sched_irq_time)
 		enable_sched_clock_irqtime();
 
@@ -1348,7 +1295,7 @@ unsigned long calibrate_delay_is_known(void)
 	int sibling, cpu = smp_processor_id();
 	struct cpumask *mask = topology_core_cpumask(cpu);
 
-	if (!tsc_disabled && !cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC))
+	if (!cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC))
 		return 0;
 
 	if (!mask)
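
For context, the jiffies fallback removed above converts ticks to nanoseconds with `(jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ)`, so it only advances in whole ticks, and not at all before the timer interrupt is running. A minimal sketch of that formula, assuming HZ = 250 and taking the tick count since boot directly (the hypothetical `jiffies_clock_ns()` stands in for the subtraction of INITIAL_JIFFIES):

```c
#include <stdint.h>

#define HZ 250   /* assumed tick rate for this sketch */

/* Jiffies-based fallback: whole ticks since boot converted to nanoseconds. */
static uint64_t jiffies_clock_ns(uint64_t ticks_since_boot)
{
	return ticks_since_boot * (1000000000ULL / HZ);
}
```

At HZ = 250 the resolution is 4 ms, far coarser than the sub-nanosecond steps of a TSC running at GHz rates.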


* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 13:10     ` Peter Zijlstra
@ 2017-09-27 13:16       ` Pasha Tatashin
  2017-09-27 13:52         ` Dou Liyang
  0 siblings, 1 reply; 26+ messages in thread
From: Pasha Tatashin @ 2017-09-27 13:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa, douly.fnst

Hi Peter,

I am totally happy with removing notsc. This certainly simplifies the 
sched_clock code. Are there any issues with removing existing kernel 
parameters that I should be aware of?

Thank you,
Pasha

On 09/27/2017 09:10 AM, Peter Zijlstra wrote:
> On Wed, Sep 27, 2017 at 02:58:57PM +0200, Peter Zijlstra wrote:
>> (we're violating "notsc" in any case and really should kill that
>> option).
> 
> Something like so; in particular simple_udelay_calibrate() will issue
> RDTSC _way_ early, so there is absolutely no point in then pretending we
> can't use RDTSC for sched_clock.
> 

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 13:16       ` Pasha Tatashin
@ 2017-09-27 13:52         ` Dou Liyang
  2017-09-27 17:13           ` Pasha Tatashin
  2017-09-27 18:05           ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Dou Liyang @ 2017-09-27 13:52 UTC (permalink / raw)
  To: Pasha Tatashin, Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa

Hi Pasha, Peter

At 09/27/2017 09:16 PM, Pasha Tatashin wrote:
> Hi Peter,
>
> I am totally happy with removing notsc. This certainly simplifies the
> sched_clock code. Are there any issues with removing existing kernel
> parameters that I should be aware of?
>

We do not want to do that, because we use "notsc" to support Dynamic
Reconfiguration[1].

AFAIK, this feature enables hot-adding a system board which contains CPUs
and memory. But the CPUs on different boards may have TSCs which are not
consistent with the TSCs of the existing CPUs. If we hot-add a board
directly, the machine may see inconsistent TSCs.

We make every effort to set the same TSC value as the existing one through
hardware and firmware, but it is hard. So we recommend specifying the
"notsc" option on the command line for users who want to use Dynamic
Reconfiguration.

[1] 
http://www.fujitsu.com/global/products/computing/servers/mission-critical/primequest/technology/availability/dynamic-reconfiguration.html
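Dou's concern can be made concrete with a toy model. The sketch below is a userspace illustration, not kernel code: it assumes each CPU's TSC reads as the same "true" time plus a constant per-CPU offset, and it only loosely echoes the idea behind the kernel's boot-time warp check (check_tsc_warp() in arch/x86/kernel/tsc_sync.c). The model_* function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: each CPU's TSC reads as a shared "true" time plus
 * a fixed per-CPU offset, e.g. a hot-added board whose counters were
 * started at a different moment.
 */
static uint64_t model_tsc_read(const int64_t *offset, int cpu, uint64_t t)
{
	int64_t v = (int64_t)t + offset[cpu];

	assert(v >= 0); /* keep the model's counters non-negative */
	return (uint64_t)v;
}

/*
 * Alternate reads between CPU 0 and CPU 1 in strict real-time order and
 * return the largest backward step ("warp") observed. 0 means the two
 * counters looked synchronized for this run.
 */
static uint64_t model_max_warp(const int64_t *offset, uint64_t steps)
{
	uint64_t last = model_tsc_read(offset, 0, 0);
	uint64_t max_warp = 0;

	for (uint64_t t = 1; t <= steps; t++) {
		uint64_t now = model_tsc_read(offset, (int)(t & 1), t);

		if (now < last && last - now > max_warp)
			max_warp = last - now;
		last = now;
	}
	return max_warp;
}
```

With offsets {0, 1000}, model_max_warp() reports a warp of 999: a task that reads the TSC on CPU 1 and then on CPU 0 sees time jump backwards.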

Thanks,

	dou

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
  2017-09-27 12:58   ` Peter Zijlstra
@ 2017-09-27 14:45   ` Russell King - ARM Linux
  2017-09-27 17:10     ` Pasha Tatashin
  2017-09-27 18:11     ` Peter Zijlstra
  1 sibling, 2 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2017-09-27 14:45 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, peterz, tglx, hpa, douly.fnst

On Wed, Aug 30, 2017 at 02:03:22PM -0400, Pavel Tatashin wrote:
> In Linux printk() can output timestamps next to every line.  This is very
> useful for tracking regressions, and finding places that can be optimized.
> However, the timestamps are available only later in boot. On smaller
> machines it is insignificant amount of time, but on larger it can be many
> seconds or even minutes into the boot process.

The sched_clock work I did for ARM could be set up really early at boot,
from setup_arch().  I tried to encourage platforms to do that, but all
my encouragement fell on deaf ears - most people set up the sched_clock
source alongside the time initialisation on ARM.

I don't think we need yet another "early" mechanism to solve this problem;
we just need people to use the existing mechanism to register their
sched_clock implementation earlier.
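Russell's point is that the existing generic mechanism can simply be registered earlier. A toy userspace model of that idea follows. All names are hypothetical; the real sched_clock_register() in kernel/time/sched_clock.c also takes the counter width in bits and handles wraparound, both omitted here, and the 100 Hz fallback tick is an assumption. Until a counter is registered the model returns tick-based time, so if the tick is not running yet every timestamp is 0.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define MODEL_HZ 100ULL /* assumed tick rate of the fallback clock */

/* Simulated hardware state; the caller advances these by hand. */
static uint64_t model_jiffies;
static uint64_t model_counter;

static uint64_t model_read_counter(void) { return model_counter; }

/* Before registration: tick-granular time (10 ms steps at 100 Hz). */
static uint64_t model_jiffy_clock_ns(void)
{
	return model_jiffies * (NSEC_PER_SEC / MODEL_HZ);
}

static uint64_t (*model_read)(void);
static uint64_t model_rate_hz;

/* Loosely analogous to registering a sched_clock source from setup_arch(). */
static void model_sched_clock_register(uint64_t (*read)(void), uint64_t rate_hz)
{
	assert(read && rate_hz > 0);
	model_read = read;
	model_rate_hz = rate_hz;
}

static uint64_t model_sched_clock(void)
{
	if (!model_read)
		return model_jiffy_clock_ns();
	/* Overflows for large counts; real code uses mult/shift and handles wrap. */
	return model_read() * NSEC_PER_SEC / model_rate_hz;
}
```

For example, after registering a 24 MHz counter, a count of 24000 reads as 1,000,000 ns (1 ms).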

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 14:45   ` Russell King - ARM Linux
@ 2017-09-27 17:10     ` Pasha Tatashin
  2017-09-27 18:11     ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Pasha Tatashin @ 2017-09-27 17:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, peterz, tglx, hpa, douly.fnst,
	Peter Zijlstra, Dou Liyang

Hi Russell,

This might be so for ARM, and in fact, if you look at my SPARC
implementation, I simply made the clock source initialize early, so the
regular sched_clock() is used. On SPARC we use either the %tick or %stick
register, with the frequency determined via OpenFirmware. But on x86 there
are a dozen ways clock sources are set up, and some of them become available
quite late in boot because of various dependencies. So my early clock
initialization for x86 (extendable to other platforms with unstable
clocks) makes it available as soon as the TSC is available, which is
determined by the existing kernel functionality in
simple_udelay_calibration().
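Once tsc_khz is known, converting raw TSC cycles to nanoseconds is a fixed-point multiply and shift. The sketch below illustrates that arithmetic in userspace; struct cyc2ns_sketch, cyc2ns_init() and cyc2ns_convert() are hypothetical names rather than the kernel's set_cyc2ns_scale() code, and the fixed shift of 22 is an assumption chosen so that mult fits in 32 bits for any tsc_khz >= 1000.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL
#define CYC2NS_SHIFT 22 /* assumed fixed shift; valid for tsc_khz >= 1000 */

struct cyc2ns_sketch {
	uint32_t mult;  /* nanoseconds per cycle, scaled by 2^shift */
	uint32_t shift;
};

/* tsc_khz cycles elapse per millisecond, so ns per cycle = 1e6 / tsc_khz. */
static void cyc2ns_init(struct cyc2ns_sketch *c, uint32_t tsc_khz)
{
	assert(tsc_khz >= 1000);
	c->shift = CYC2NS_SHIFT;
	c->mult = (uint32_t)((NSEC_PER_MSEC << CYC2NS_SHIFT) / tsc_khz);
}

/*
 * ns = (cycles * mult) >> shift, split into 32-bit halves so the
 * intermediate products stay within 64 bits for realistic uptimes.
 */
static uint64_t cyc2ns_convert(const struct cyc2ns_sketch *c, uint64_t cycles)
{
	uint64_t lo = cycles & 0xffffffffULL;
	uint64_t hi = cycles >> 32;
	uint64_t ns = (lo * c->mult) >> c->shift;

	ns += (hi * c->mult) << (32 - c->shift);
	return ns;
}
```

For a 2 GHz TSC (tsc_khz = 2000000), mult is 2^21 = 2097152, and 2,000,000,000 cycles convert to exactly 1,000,000,000 ns.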

My goal was not to introduce any regressions into the already complex (in
terms of number of branches and loads) sched_clock_cpu(), therefore I
added a new function and avoided any extra branches throughout the life
of the system. I could mitigate some of that by using static branches,
but IMO the current approach is better.
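The trade-off described here is between a separate early-only function and a branch inside the regular path. The userspace sketch below shows the branch-based alternative. A plain bool stands in for a static key (in the kernel, DEFINE_STATIC_KEY_FALSE() with static_branch_unlikely() would let the branch be patched at runtime instead of loading a flag), and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a static key; in the kernel the branch would be patched. */
static bool early_clock_active;

/* Hypothetical clock sources, faked with plain variables. */
static uint64_t fake_early_ns;
static uint64_t fake_regular_ns;

static uint64_t early_clock_read(void)   { return fake_early_ns; }
static uint64_t regular_clock_read(void) { return fake_regular_ns; }

/*
 * "Extra branch" variant: every call tests the flag, even long after
 * boot. A separate early-only function avoids adding this test to the
 * regular path.
 */
static uint64_t branchy_sched_clock(void)
{
	if (early_clock_active)
		return early_clock_read();
	return regular_clock_read();
}

static void early_clock_start(void)
{
	early_clock_active = true;
}

static void early_clock_finish(void)
{
	assert(early_clock_active); /* finish only after start */
	early_clock_active = false;
}
```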

Pasha

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 13:52         ` Dou Liyang
@ 2017-09-27 17:13           ` Pasha Tatashin
  2017-09-27 18:05           ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Pasha Tatashin @ 2017-09-27 17:13 UTC (permalink / raw)
  To: Dou Liyang, Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa

Hi Dou,

This makes sense. The current sched_clock_early() approach does not
break it, because with notsc the TSC is used early in boot and stopped
later. But notsc must stay.

Peter,

So we could either extend sched_clock() with another static branch for
the early clock, or use what I proposed. IMO the latter is better, but
either way works for me.

Thank you,
Pasha

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 13:52         ` Dou Liyang
  2017-09-27 17:13           ` Pasha Tatashin
@ 2017-09-27 18:05           ` Peter Zijlstra
  2017-09-27 18:09             ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-27 18:05 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Pasha Tatashin, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, tglx, hpa

On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
> We do not want to do that. Because, we use "notsc" to support Dynamic
> Reconfiguration[1].
> 
> AFAIK, this feature enables hot-add system board which contains CPUs
> and memories. But the CPUs in different board may have different TSCs
> which are not consistent with the TSC from the existing CPUs. If we hot-add
> a board directly, the machine may happen the inconsistency of
> TSC.
> 
> We make our effort to specify the same TSC value as existing one through
> hardware and firmware, but it is hard. So we recommend to specify
> "notsc" option in command line for users who want to use Dynamic
> Reconfiguration.

Oh gawd, that's horrific. And in my book a good reason to kill that
option.

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 18:05           ` Peter Zijlstra
@ 2017-09-27 18:09             ` Peter Zijlstra
  2017-09-28 10:03               ` Dou Liyang
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-27 18:09 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Pasha Tatashin, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, tglx, hpa

On Wed, Sep 27, 2017 at 08:05:48PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
> > We do not want to do that. Because, we use "notsc" to support Dynamic
> > Reconfiguration[1].
> > 
> > AFAIK, this feature enables hot-add system board which contains CPUs
> > and memories. But the CPUs in different board may have different TSCs
> > which are not consistent with the TSC from the existing CPUs. If we hot-add
> > a board directly, the machine may happen the inconsistency of
> > TSC.
> > 
> > We make our effort to specify the same TSC value as existing one through
> > hardware and firmware, but it is hard. So we recommend to specify
> > "notsc" option in command line for users who want to use Dynamic
> > Reconfiguration.
> 
> Oh gawd, that's horrific. And in my book a good reason to kill that
> option.

That is, even with unsynchronized TSC we're better off using RDTSC. The
whole mess in kernel/sched/clock.c is all about getting semi-sensible
results out of unsynchronized TSC.

There really is no reason to artificially kill TSC usage.
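A much-simplified userspace model of how a per-CPU clock built on an unsynchronized counter can still be kept sane, roughly in the spirit of sched_clock_local() in kernel/sched/clock.c: take a global reference time at each tick, add local counter progress since that tick, and clamp. The struct, the function names and the 4 ms tick are assumptions; the real code also handles wraparound and remote-CPU reads.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_TICK_NS 4000000ULL /* assumed 4 ms tick (HZ=250) */

struct model_cpu_clock {
	uint64_t tick_raw;  /* this CPU's raw counter at the last tick */
	uint64_t tick_ref;  /* global reference time at the last tick */
	uint64_t clock;     /* last value returned on this CPU */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Called on each tick with this CPU's raw count and the global time. */
static void model_clock_tick(struct model_cpu_clock *c, uint64_t raw,
			     uint64_t ref)
{
	assert(ref >= c->tick_ref); /* reference time must not go back */
	c->tick_raw = raw;
	c->tick_ref = ref;
}

/*
 * Reference time plus local progress since the tick, clamped so the
 * result never goes backwards on this CPU and never runs more than one
 * tick ahead of the reference.
 */
static uint64_t model_clock_read(struct model_cpu_clock *c, uint64_t raw)
{
	uint64_t delta = raw > c->tick_raw ? raw - c->tick_raw : 0;
	uint64_t val = c->tick_ref + delta;
	uint64_t lo = max_u64(c->clock, c->tick_ref);
	uint64_t hi = max_u64(c->clock, c->tick_ref + MODEL_TICK_NS);

	val = min_u64(max_u64(val, lo), hi);
	c->clock = val;
	return val;
}
```

For example, with a counter running 1000 units ahead of the reference, a read of 1000 + 10,000,000 right after a tick at reference time 0 returns 4,000,000: the jump is capped at one tick.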

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 14:45   ` Russell King - ARM Linux
  2017-09-27 17:10     ` Pasha Tatashin
@ 2017-09-27 18:11     ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-27 18:11 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Pavel Tatashin, schwidefsky, heiko.carstens, john.stultz, sboyd,
	x86, linux-kernel, mingo, tglx, hpa, douly.fnst

On Wed, Sep 27, 2017 at 03:45:06PM +0100, Russell King - ARM Linux wrote:
> On Wed, Aug 30, 2017 at 02:03:22PM -0400, Pavel Tatashin wrote:
> > In Linux printk() can output timestamps next to every line.  This is very
> > useful for tracking regressions, and finding places that can be optimized.
> > However, the timestamps are available only later in boot. On smaller
> > machines it is insignificant amount of time, but on larger it can be many
> > seconds or even minutes into the boot process.
> 
> The sched_clock work I did for ARM could be setup really early at boot,
> from setup_arch().  I tried to encourage platforms to do that, but all
> my encouragement fell on deaf ears - most people setup the sched_clock
> source along side the time initialisation on ARM.
> 
> I don't think we need yet another "early" mechanism to solve this problem,
> we just need people to use the existing mechanism to register their
> sched_clock implementation earlier.

x86 is a bit 'special' in the whole sched_clock department. But yes, we
should very much make the regular sched_clock() happen earlier.

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 18:09             ` Peter Zijlstra
@ 2017-09-28 10:03               ` Dou Liyang
  2017-09-28 11:58                 ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Dou Liyang @ 2017-09-28 10:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pasha Tatashin, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, tglx, hpa


Hi Peter,

At 09/28/2017 02:09 AM, Peter Zijlstra wrote:
> On Wed, Sep 27, 2017 at 08:05:48PM +0200, Peter Zijlstra wrote:
>> On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
>>> We do not want to do that. Because, we use "notsc" to support Dynamic
>>> Reconfiguration[1].
>>>
>>> AFAIK, this feature enables hot-add system board which contains CPUs
>>> and memories. But the CPUs in different board may have different TSCs
>>> which are not consistent with the TSC from the existing CPUs. If we hot-add
>>> a board directly, the machine may happen the inconsistency of
>>> TSC.
>>>
>>> We make our effort to specify the same TSC value as existing one through
>>> hardware and firmware, but it is hard. So we recommend to specify
>>> "notsc" option in command line for users who want to use Dynamic
>>> Reconfiguration.
>>
>> Oh gawd, that's horrific. And in my book a good reason to kill that
>> option.
>
> That is, even with unsynchronized TSC we're better off using RDTSC. The
> whole mess in kernel/sched/clock.c is all about getting semi sensible
> results out of unsynchronized TSC.
>

It would be best if we could support TSC sync capability on x86, but it
seems not easy.

Thanks,

	dou.

> There really is no reason to artificially kill TSC usage.
>
>
>

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-28 10:03               ` Dou Liyang
@ 2017-09-28 11:58                 ` Peter Zijlstra
  2017-09-28 12:12                   ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2017-09-28 11:58 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Pasha Tatashin, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, tglx, hpa

On Thu, Sep 28, 2017 at 06:03:05PM +0800, Dou Liyang wrote:
> At 09/28/2017 02:09 AM, Peter Zijlstra wrote:
> > On Wed, Sep 27, 2017 at 08:05:48PM +0200, Peter Zijlstra wrote:
> > > On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
> > > > We do not want to do that. Because, we use "notsc" to support Dynamic
> > > > Reconfiguration[1].
> > > > 
> > > > AFAIK, this feature enables hot-add system board which contains CPUs
> > > > and memories. But the CPUs in different board may have different TSCs
> > > > which are not consistent with the TSC from the existing CPUs. If we hot-add
> > > > a board directly, the machine may happen the inconsistency of
> > > > TSC.
> > > > 
> > > > We make our effort to specify the same TSC value as existing one through
> > > > hardware and firmware, but it is hard. So we recommend to specify
> > > > "notsc" option in command line for users who want to use Dynamic
> > > > Reconfiguration.
> > > 
> > > Oh gawd, that's horrific. And in my book a good reason to kill that
> > > option.
> > 
> > That is, even with unsynchronized TSC we're better off using RDTSC. The
> > whole mess in kernel/sched/clock.c is all about getting semi sensible
> > results out of unsynchronized TSC.
> > 
> 
> It will be best if we can support TSC sync capability in x86, but seems
> is not easy.

Sure, your hardware achieving sync would be best, but even if it does
not, we can still use TSC. Using notsc simply because you fail to sync
TSCs is quite crazy.

The thing is, we need to support unsync'ed TSC in any case, because
older chips (pre-Nehalem) didn't have synchronized TSCs, and
it still happens on recent chips if the BIOS mucks it up, which happens
surprisingly often :-(

I would suggest you try your reconfigurable setup with "tsc=unstable"
and see if that works for you. That marks the TSC unconditionally
unstable at boot and avoids any further wobbles once the TSC watchdog
notices (although that too _should_ more or less work).

I do however hope you have a custom clocksource driver placed at higher
priority than the HPET.
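The last remark is about clocksource ranking: once the TSC is marked unstable, timekeeping falls back to the best-rated remaining clocksource, typically the HPET on x86. A minimal model of "highest rating wins, unstable ones are skipped" follows; struct model_clocksource and model_select() are hypothetical, and the ratings in the usage note are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_clocksource {
	const char *name;
	int rating;     /* higher is preferred */
	bool unstable;  /* e.g. the TSC after "tsc=unstable" */
};

/* Pick the highest-rated clocksource that is not marked unstable. */
static const struct model_clocksource *
model_select(const struct model_clocksource *cs, size_t n)
{
	const struct model_clocksource *best = NULL;

	assert(cs || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cs[i].unstable)
			continue;
		if (!best || cs[i].rating > best->rating)
			best = &cs[i];
	}
	return best;
}
```

For example, with tsc rated 300, a hypothetical vendor driver at 275, hpet at 250 and acpi_pm at 200, marking the TSC unstable makes model_select() return the vendor entry rather than the HPET.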

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-28 11:58                 ` Peter Zijlstra
@ 2017-09-28 12:12                   ` Thomas Gleixner
  2017-09-28 13:11                     ` Pasha Tatashin
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2017-09-28 12:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dou Liyang, Pasha Tatashin, linux, schwidefsky, heiko.carstens,
	john.stultz, sboyd, x86, linux-kernel, mingo, hpa

On Thu, 28 Sep 2017, Peter Zijlstra wrote:

> On Thu, Sep 28, 2017 at 06:03:05PM +0800, Dou Liyang wrote:
> > At 09/28/2017 02:09 AM, Peter Zijlstra wrote:
> > > On Wed, Sep 27, 2017 at 08:05:48PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Sep 27, 2017 at 09:52:36PM +0800, Dou Liyang wrote:
> > > > > We do not want to do that. Because, we use "notsc" to support Dynamic
> > > > > Reconfiguration[1].
> > > > > 
> > > > > AFAIK, this feature enables hot-add system board which contains CPUs
> > > > > and memories. But the CPUs in different board may have different TSCs
> > > > > which are not consistent with the TSC from the existing CPUs. If we hot-add
> > > > > a board directly, the machine may happen the inconsistency of
> > > > > TSC.
> > > > > 
> > > > > We make our effort to specify the same TSC value as existing one through
> > > > > hardware and firmware, but it is hard. So we recommend to specify
> > > > > "notsc" option in command line for users who want to use Dynamic
> > > > > Reconfiguration.
> > > > 
> > > > Oh gawd, that's horrific. And in my book a good reason to kill that
> > > > option.
> > > 
> > > That is, even with unsynchronized TSC we're better off using RDTSC. The
> > > whole mess in kernel/sched/clock.c is all about getting semi sensible
> > > results out of unsynchronized TSC.
> > > 
> > 
> > It will be best if we can support TSC sync capability in x86, but seems
> > is not easy.
> 
> Sure, your hardware achieving sync would be best, but even if it does
> not, we can still use TSC. Using notsc simple because you fail to sync
> TSCs is quite crazy.
> 
> The thing is, we need to support unsync'ed TSC in any case, because
> older chips (pre Nehalem) didn't have synchronized TSC in any case, and
> it still happens on recent chips if the BIOS mucks it up, which happens
> surprisingly often :-(
> 
> I would suggest you try your reconfigurable setup with "tsc=unstable"
> and see if that works for you. That marks the TSC unconditionally
> unstable at boot and avoids any further wobbles once the TSC watchdog
> notices (although that too _should_ more or less work).

That should do the trick nicely and we might just end up converting notsc
to tsc=unstable silently so we can avoid the bike shed discussions about
removing it.
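This suggestion amounts to treating the legacy "notsc" spelling as an alias for "tsc=unstable". The sketch below models that with a toy command-line parser; the kernel registers per-option handlers with __setup() instead of tokenizing the command line itself, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag set by either spelling of the option. */
static bool model_tsc_unstable;

static void model_mark_tsc_unstable(void) { model_tsc_unstable = true; }

/* Handle one whitespace-free token from the command line. */
static void model_handle_param(const char *tok, size_t len)
{
	if (len == 5 && !strncmp(tok, "notsc", 5))
		model_mark_tsc_unstable();      /* legacy alias, same effect */
	else if (len == 12 && !strncmp(tok, "tsc=unstable", 12))
		model_mark_tsc_unstable();
}

static void model_parse_cmdline(const char *cmdline)
{
	const char *p = cmdline;

	assert(cmdline);
	while (*p) {
		size_t len;

		while (*p == ' ')
			p++;
		len = strcspn(p, " ");
		if (len)
			model_handle_param(p, len);
		p += len;
	}
}
```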

Thanks,

	tglx

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-28 12:12                   ` Thomas Gleixner
@ 2017-09-28 13:11                     ` Pasha Tatashin
  2017-09-29 15:00                       ` Dou Liyang
  0 siblings, 1 reply; 26+ messages in thread
From: Pasha Tatashin @ 2017-09-28 13:11 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Dou Liyang, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, hpa

>>> It will be best if we can support TSC sync capability in x86, but seems
>>> is not easy.
>>
>> Sure, your hardware achieving sync would be best, but even if it does
>> not, we can still use TSC. Using notsc simple because you fail to sync
>> TSCs is quite crazy.
>>
>> The thing is, we need to support unsync'ed TSC in any case, because
>> older chips (pre Nehalem) didn't have synchronized TSC in any case, and
>> it still happens on recent chips if the BIOS mucks it up, which happens
>> surprisingly often :-(
>>
>> I would suggest you try your reconfigurable setup with "tsc=unstable"
>> and see if that works for you. That marks the TSC unconditionally
>> unstable at boot and avoids any further wobbles once the TSC watchdog
>> notices (although that too _should_ more or less work).
> 
> That should do the trick nicely and we might just end up converting notsc
> to tsc=unstable silently so we can avoid the bike shed discussions about
> removing it.
> 

OK, I will start working on converting notsc to tsc=unstable, and modify my
patches to do what Peter suggested earlier. In the meantime, I'd like
to hear from Dou whether this setup works with Dynamic Reconfiguration.

Thank you,
Pasha

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-28 13:11                     ` Pasha Tatashin
@ 2017-09-29 15:00                       ` Dou Liyang
  2017-10-18 10:01                         ` Dou Liyang
  0 siblings, 1 reply; 26+ messages in thread
From: Dou Liyang @ 2017-09-29 15:00 UTC (permalink / raw)
  To: Pasha Tatashin, Thomas Gleixner, Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, hpa

Hi, Pasha

At 09/28/2017 09:11 PM, Pasha Tatashin wrote:
>>>> It will be best if we can support TSC sync capability in x86, but seems
>>>> is not easy.
>>>
>>> Sure, your hardware achieving sync would be best, but even if it does
>>> not, we can still use TSC. Using notsc simple because you fail to sync
>>> TSCs is quite crazy.
>>>
>>> The thing is, we need to support unsync'ed TSC in any case, because
>>> older chips (pre Nehalem) didn't have synchronized TSC in any case, and
>>> it still happens on recent chips if the BIOS mucks it up, which happens
>>> surprisingly often :-(
>>>
>>> I would suggest you try your reconfigurable setup with "tsc=unstable"
>>> and see if that works for you. That marks the TSC unconditionally
>>> unstable at boot and avoids any further wobbles once the TSC watchdog
>>> notices (although that too _should_ more or less work).
>>
>> That should do the trick nicely and we might just end up converting notsc
>> to tsc=unstable silently so we can avoid the bike shed discussions about
>> removing it.
>>
>
> Ok, I will start working on converting notsc to unstable, and modify my
> patches to do what Peter suggested earlier. In the mean time, I'd like
> to hear from Dou if this setup works with dynamic reconfig.
>

OK, I will do it. But October 1 is our national holiday, so I will be on
holiday, and I have just returned the test machine. :-(

I may reply to you in the middle of October.

Thanks,

	dou.

> Thank you,
> Pasha
>
>
>

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-27 12:58   ` Peter Zijlstra
  2017-09-27 13:10     ` Peter Zijlstra
@ 2017-10-09 16:34     ` Pavel Tatashin
  2017-10-18 21:01       ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Pavel Tatashin @ 2017-10-09 16:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, tglx, hpa, douly.fnst

> 
> Urgh, that's horrific.
> 
> Can't we simply make sched_clock() go earlier? (we're violating "notsc"
> in any case and really should kill that option).
> 
> Then we can do something like so on top...
> 

Hi Peter,

I've been thinking about your proposal, and I have one concern:
sched_clock() can be implemented in two ways, either via the
pv_time_ops.sched_clock vector when CONFIG_PARAVIRT is defined:

sched_clock()
    paravirt_sched_clock()
       PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);

or natively, via an alias:

sched_clock()
	native_sched_clock()

Using the sched_clock_early() approach makes early timestamps work in
both cases, once simple_udelay_calibration() has determined that the TSC
can be used. (As we agreed, I am going to change notsc to use the
tsc=unstable path.)

Using rdtsc directly may not be the most efficient clock for some
virtualization environments, but since this is for early boot only, and
not something that is going to be used after the machine has booted, it
is OK.
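The two call paths differ only in whether sched_clock() goes through a function pointer that a hypervisor guest may override. The userspace model below shows why a dedicated early function sidesteps that. The pvm_* names and fake time sources are hypothetical; only the shape of the pv_time_ops.sched_clock indirection is taken from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Fake time sources standing in for the TSC and a hypervisor clock. */
static uint64_t fake_tsc_ns;
static uint64_t fake_hypervisor_ns;

static uint64_t pvm_native_sched_clock(void) { return fake_tsc_ns; }
static uint64_t pvm_hv_sched_clock(void)     { return fake_hypervisor_ns; }

/* Loosely mirrors the pv_time_ops.sched_clock indirection. */
struct pvm_time_ops {
	uint64_t (*sched_clock)(void);
};

static struct pvm_time_ops pvm_ops = {
	.sched_clock = pvm_native_sched_clock,
};

/* Regular path: whatever the (possibly overridden) hook points to. */
static uint64_t pvm_sched_clock(void)
{
	assert(pvm_ops.sched_clock);
	return pvm_ops.sched_clock();
}

/* Early path: always reads the TSC-based clock, bypassing the hook. */
static uint64_t pvm_sched_clock_early(void)
{
	return pvm_native_sched_clock();
}
```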

Pavel

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-09-29 15:00                       ` Dou Liyang
@ 2017-10-18 10:01                         ` Dou Liyang
  2017-10-18 13:38                           ` Pavel Tatashin
  0 siblings, 1 reply; 26+ messages in thread
From: Dou Liyang @ 2017-10-18 10:01 UTC (permalink / raw)
  To: Pasha Tatashin, Thomas Gleixner, Peter Zijlstra
  Cc: linux, schwidefsky, heiko.carstens, john.stultz, sboyd, x86,
	linux-kernel, mingo, hpa

Hi Pasha,

Sorry for the late reply.

I have tested TSC sync on our machine with DR (Dynamic Reconfiguration):
   Linux kernel: Linux-4.14.0-rc5
   NUMA nodes: 4
   Used clock_gettime() for nanosecond accuracy.

Our reconfigurable setup works fine with "tsc=unstable".
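The message does not include the test program itself. A minimal sketch of this kind of check, assuming the goal is to confirm that CLOCK_MONOTONIC never steps backwards, is below. It is single-threaded, so it only observes whatever CPUs the one task happens to run on; a more thorough test would run one pinned thread per CPU (for example via sched_setaffinity()) and compare readings across them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static uint64_t ts_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * 1000000000ULL + (uint64_t)ts->tv_nsec;
}

/*
 * Read CLOCK_MONOTONIC `iters` times and count readings that are smaller
 * than the previous one. Returns that count, or -1 if the clock fails.
 */
static long count_backward_steps(long iters)
{
	struct timespec ts;
	uint64_t prev;
	long backward = 0;

	assert(iters > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	prev = ts_to_ns(&ts);

	for (long i = 0; i < iters; i++) {
		uint64_t now;

		if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
			return -1;
		now = ts_to_ns(&ts);
		if (now < prev)
			backward++;
		prev = now;
	}
	return backward;
}
```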

Thanks,
	dou.

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-10-18 10:01                         ` Dou Liyang
@ 2017-10-18 13:38                           ` Pavel Tatashin
  0 siblings, 0 replies; 26+ messages in thread
From: Pavel Tatashin @ 2017-10-18 13:38 UTC (permalink / raw)
  To: Dou Liyang
  Cc: Thomas Gleixner, Peter Zijlstra, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	hpa

On Wed, Oct 18, 2017 at 6:01 AM, Dou Liyang <douly.fnst@cn.fujitsu.com> wrote:
> Hi Pasha,
>
> Sorry to reply you so late.
>
> I have test the TSC sync in our machine with DR(Dynamic Reconfiguration)
>   Linux kernel: Linux-4.14.0-rc5
>   NUMA nodes: 4 node.
>   Use clock_gettime() to reach nano-second accuracy.
>
> It is OK that we setup our reconfigurable with "tsc=unstable".
>

Excellent, thank you very much for confirming this Dou!

Pavel

* Re: [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot
  2017-10-09 16:34     ` Pavel Tatashin
@ 2017-10-18 21:01       ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2017-10-18 21:01 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: Peter Zijlstra, linux, schwidefsky, heiko.carstens, john.stultz,
	sboyd, x86, linux-kernel, mingo, hpa, douly.fnst

On Mon, 9 Oct 2017, Pavel Tatashin wrote:
> > Urgh, that's horrific.
> > 
> > Can't we simply make sched_clock() go earlier? (we're violating "notsc"
> > in any case and really should kill that option).
> > 
> > Then we can do something like so on top...
> > 
> 
> Hi Peter,
> 
> I've been thinking about your proposal, and I have one concern: sched_clock()
> can be implemented two ways either via pv_time_ops.sched_clock vectors when
> CONFIG_PARAVIRT is defined
> 
> sched_clock()
>    paravirt_sched_clock()
>       PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
> 
> Or native via alias
> 
> sched_clock()
> 	native_sched_clock()
> 
> Using sched_clock_early() approach makes early time stamps work with both
> cases when it is determined that tsc can be used simple_udelay_calibration().
> (As we agreed I am going to change notsc to use tsc=unstable path.)
> 
> It may be not the most efficient clock for some virtualizations to use rdtsc
> directly, but since this is for early boot only, and not something that is
> going to be used after machine is booted it is OK.

The early boot time stamps are not really important for the production use
case where you care about performance.

So it's simple enough to refactor the code so it supports early time stamps
under the following conditions:

   1) TSC is stable and constant frequency

   2) TSC frequency is known early

   3) paravirt sched clock is not used

      - running on bare metal

      - forcing the paravirt stuff to not override pv_time_ops.sched_clock
        via a command line parameter

That's a valid restriction for the intended use case, i.e. timing of early
boot process.
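The three conditions can be read as a single predicate evaluated early in boot. The sketch below is hypothetical: struct early_tsc_env and early_tsc_timestamps_ok() are illustrative names, and how each field would actually be filled in (CPU feature flags, calibration, paravirt detection) is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of what is known very early in boot. */
struct early_tsc_env {
	bool tsc_stable;        /* TSC not marked unstable */
	bool constant_freq;     /* e.g. X86_FEATURE_CONSTANT_TSC set */
	unsigned long tsc_khz;  /* 0 if not yet calibrated */
	bool paravirt_clock;    /* a pv sched_clock would override native */
};

/* All three of the conditions listed above must hold. */
static bool early_tsc_timestamps_ok(const struct early_tsc_env *env)
{
	assert(env);
	if (!env->tsc_stable || !env->constant_freq)
		return false;   /* 1) stable, constant-frequency TSC */
	if (env->tsc_khz == 0)
		return false;   /* 2) frequency known early */
	if (env->paravirt_clock)
		return false;   /* 3) no paravirt sched_clock in use */
	return true;
}
```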

Thanks,

	tglx

Thread overview: 26+ messages
2017-08-30 18:03 [PATCH v6 0/4] Early boot time stamps for x86 Pavel Tatashin
2017-08-30 18:03 ` [PATCH v6 1/4] sched/clock: interface to allow timestamps early in boot Pavel Tatashin
2017-09-27 12:58   ` Peter Zijlstra
2017-09-27 13:10     ` Peter Zijlstra
2017-09-27 13:16       ` Pasha Tatashin
2017-09-27 13:52         ` Dou Liyang
2017-09-27 17:13           ` Pasha Tatashin
2017-09-27 18:05           ` Peter Zijlstra
2017-09-27 18:09             ` Peter Zijlstra
2017-09-28 10:03               ` Dou Liyang
2017-09-28 11:58                 ` Peter Zijlstra
2017-09-28 12:12                   ` Thomas Gleixner
2017-09-28 13:11                     ` Pasha Tatashin
2017-09-29 15:00                       ` Dou Liyang
2017-10-18 10:01                         ` Dou Liyang
2017-10-18 13:38                           ` Pavel Tatashin
2017-10-09 16:34     ` Pavel Tatashin
2017-10-18 21:01       ` Thomas Gleixner
2017-09-27 14:45   ` Russell King - ARM Linux
2017-09-27 17:10     ` Pasha Tatashin
2017-09-27 18:11     ` Peter Zijlstra
2017-08-30 18:03 ` [PATCH v6 2/4] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
2017-08-30 18:03 ` [PATCH v6 3/4] x86/time: read_boot_clock64() implementation Pavel Tatashin
2017-08-30 18:03 ` [PATCH v6 4/4] x86/tsc: use tsc early Pavel Tatashin
     [not found]   ` <CALBSrqBKsojGGpe85GOg7jda-SJHLrR=pS-Pg-xa0SUg7j3OQA@mail.gmail.com>
2017-08-30 21:21     ` Fenghua Yu
2017-08-30 21:32       ` Pasha Tatashin
