* [PATCH v10 0/7] Early boot time stamps for x86
@ 2018-02-09 21:11 Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 1/7] x86/tsc: remove tsc_disabled flag Pavel Tatashin
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

changelog
---------
v10 - v9
	- Added another patch to this series that removes the dependency
	  between the KVM clock and the memblock allocator. The benefit
	  is that all clocks can now be initialized even earlier.
v9 - v8
	- Addressed more comments from Dou Liyang

v8 - v7
	- Addressed comments from Dou Liyang:
	- Moved tsc_early_init() and tsc_early_fini() entirely into
	  tsc.c, and made them static.
	- Removed warning when notsc parameter is used.
	- Merged with:
	  https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

v7 - v6
	- Removed tsc_disabled flag; notsc is now equivalent to
	  tsc=unstable
	- Simplified changes to sched/clock.c by removing
	  sched_clock_early() and friends, as requested by Peter
	  Zijlstra. We now always use sched_clock()
	- Modified x86 sched_clock() to return either the early boot
	  clock or the regular clock.
	- Added another example of why early boot time stamps are
	  important

v6 - v5
	- Added a new patch:
		time: sync read_boot_clock64() with persistent clock
	  which fixes a missing __init annotation and enables the time
	  discrepancy fix that was noted by Thomas Gleixner
	- Split "x86/time: read_boot_clock64() implementation" into a
	  separate patch

v5 - v4
	- Fix compiler warnings on systems with stable clocks.

v4 - v3
	- Moved the tsc_early_fini() call into the 2nd patch, as
	  reported by Dou Liyang
	- Improved comment before __use_sched_clock_early to explain why
	  we need both booleans.
	- Simplified valid_clock logic in read_boot_clock64().

v3 - v2
	- Addressed comment from Thomas Gleixner
	- Timestamps are available a little later in boot but still much
	  earlier than in mainline. This significantly simplified this
	  work.

v2 - v1
	In patch "x86/tsc: tsc early":
	- added tsc_adjusted_early()
	- fixed 32-bit compile error by using do_div()

This series adds early boot time stamp support for x86 machines.
SPARC patches for early boot time stamps are already integrated into
mainline Linux.

Sample output
-------------
Before:
https://paste.ubuntu.com/26133428/

After:
https://paste.ubuntu.com/26133523/

For examples of how early time stamps are used, see the following work:
Example 1:
https://lwn.net/Articles/734374/
- Without early boot time stamps we would not know about the extra time
  that is spent zeroing struct pages early in boot, even with deferred
  page initialization enabled.

Example 2:
https://patchwork.kernel.org/patch/10021247/
- If early boot timestamps had been available, the engineer who introduced
  this bug would have noticed the extra time spent early in boot.

Pavel Tatashin (7):
  x86/tsc: remove tsc_disabled flag
  time: sync read_boot_clock64() with persistent clock
  x86/time: read_boot_clock64() implementation
  sched: early boot clock
  x86/paravirt: add active_sched_clock to pv_time_ops
  x86/tsc: use tsc early
  kvm/x86: remove kvm memblock dependency

 arch/arm/kernel/time.c                |   2 +-
 arch/s390/kernel/time.c               |   2 +-
 arch/x86/include/asm/paravirt.h       |   2 +-
 arch/x86/include/asm/paravirt_types.h |   1 +
 arch/x86/kernel/kvm.c                 |   1 +
 arch/x86/kernel/kvmclock.c            |  64 +++----------------
 arch/x86/kernel/paravirt.c            |   1 +
 arch/x86/kernel/setup.c               |   7 +-
 arch/x86/kernel/time.c                |  30 +++++++++
 arch/x86/kernel/tsc.c                 | 117 ++++++++++++++++++++++++++++------
 arch/x86/xen/time.c                   |   7 +-
 include/linux/timekeeping.h           |   3 +-
 kernel/sched/clock.c                  |  10 ++-
 kernel/time/timekeeping.c             |   8 ++-
 14 files changed, 164 insertions(+), 91 deletions(-)

-- 
2.16.1


* [PATCH v10 1/7] x86/tsc: remove tsc_disabled flag
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

tsc_disabled is set when notsc is passed as a kernel parameter. The
reason we have notsc is to avoid timing problems on multi-socket systems.
We already have a mechanism, however, to detect and resolve these issues
by invoking the tsc unstable path. Thus, make notsc behave the same as
tsc=unstable.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 arch/x86/kernel/tsc.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index fb4302738410..700a61cbafa5 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -38,11 +38,6 @@ EXPORT_SYMBOL(tsc_khz);
  */
 static int __read_mostly tsc_unstable;
 
-/* native_sched_clock() is called before tsc_init(), so
-   we must start with the TSC soft disabled to prevent
-   erroneous rdtsc usage on !boot_cpu_has(X86_FEATURE_TSC) processors */
-static int __read_mostly tsc_disabled = -1;
-
 static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
@@ -248,8 +243,7 @@ EXPORT_SYMBOL_GPL(check_tsc_unstable);
 #ifdef CONFIG_X86_TSC
 int __init notsc_setup(char *str)
 {
-	pr_warn("Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely\n");
-	tsc_disabled = 1;
+	mark_tsc_unstable("boot parameter notsc");
 	return 1;
 }
 #else
@@ -1269,7 +1263,7 @@ static void tsc_refine_calibration_work(struct work_struct *work)
 
 static int __init init_tsc_clocksource(void)
 {
-	if (!boot_cpu_has(X86_FEATURE_TSC) || tsc_disabled > 0 || !tsc_khz)
+	if (!boot_cpu_has(X86_FEATURE_TSC) || !tsc_khz)
 		return 0;
 
 	if (check_tsc_unstable())
@@ -1375,12 +1369,6 @@ void __init tsc_init(void)
 		set_cyc2ns_scale(tsc_khz, cpu, cyc);
 	}
 
-	if (tsc_disabled > 0)
-		return;
-
-	/* now allow native_sched_clock() to use rdtsc */
-
-	tsc_disabled = 0;
 	static_branch_enable(&__use_tsc);
 
 	if (!no_sched_irq_time)
@@ -1413,10 +1401,9 @@ void __init tsc_init(void)
 unsigned long calibrate_delay_is_known(void)
 {
 	int sibling, cpu = smp_processor_id();
-	int constant_tsc = cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC);
 	const struct cpumask *mask = topology_core_cpumask(cpu);
 
-	if (tsc_disabled || !constant_tsc || !mask)
+	if (!cpu_has(&cpu_data(cpu), X86_FEATURE_CONSTANT_TSC) || !mask)
 		return 0;
 
 	sibling = cpumask_any_but(mask, cpu);
-- 
2.16.1


* [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 1/7] x86/tsc: remove tsc_disabled flag Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 3/7] x86/time: read_boot_clock64() implementation Pavel Tatashin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

read_boot_clock64() returns a boot start timestamp from epoch. Some arches
may need to access the persistent clock interface in order to calculate
the epoch offset. The resolution of the persistent clock, however, might
be low. Therefore, in order to avoid time discrepancies, a new argument
'now' is added to the read_boot_clock64() parameters. An arch may decide
to use it instead of accessing the persistent clock again.

Also, change read_boot_clock64() to have an __init prototype, since it is
called only during boot.
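
As an informal worked example (made-up numbers): if the persistent clock
reports now = 1,518,210,671 s since the epoch and the boot clock reports
12 s since boot, the boot start timestamp is 1,518,210,671 - 12 =
1,518,210,659 s since the epoch. Passing 'now' down lets an arch perform
this subtraction without a second, possibly coarser, read of the
persistent clock.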

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/arm/kernel/time.c      | 2 +-
 arch/s390/kernel/time.c     | 2 +-
 include/linux/timekeeping.h | 3 +--
 kernel/time/timekeeping.c   | 8 ++++++--
 4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/time.c b/arch/arm/kernel/time.c
index 629f8e9981f1..5b259261a268 100644
--- a/arch/arm/kernel/time.c
+++ b/arch/arm/kernel/time.c
@@ -90,7 +90,7 @@ void read_persistent_clock64(struct timespec64 *ts)
 	__read_persistent_clock(ts);
 }
 
-void read_boot_clock64(struct timespec64 *ts)
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
 {
 	__read_boot_clock(ts);
 }
diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index cf561160ea88..9b0c69e3d795 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -221,7 +221,7 @@ void read_persistent_clock64(struct timespec64 *ts)
 	ext_to_timespec64(clk, ts);
 }
 
-void read_boot_clock64(struct timespec64 *ts)
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
 {
 	unsigned char clk[STORE_CLOCK_EXT_SIZE];
 	__u64 delta;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index b17bcce58bc4..b251fc226c00 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -206,8 +206,7 @@ extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
 extern int persistent_clock_is_local;
 
 extern void read_persistent_clock64(struct timespec64 *ts);
-extern void read_boot_clock64(struct timespec64 *ts);
+extern void read_boot_clock64(struct timespec64 *now, struct timespec64 *ts);
 extern int update_persistent_clock64(struct timespec64 now);
 
-
 #endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cd03317e7b57..6894b4841b2e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1501,9 +1501,13 @@ void __weak read_persistent_clock64(struct timespec64 *ts64)
  * Function to read the exact time the system has been started.
  * Returns a timespec64 with tv_sec=0 and tv_nsec=0 if unsupported.
  *
+ * Argument 'now' contains time from the persistent clock, used to calculate
+ * the offset from epoch. It may be zero if the persistent clock is unavailable.
+ *
  *  XXX - Do be sure to remove it once all arches implement it.
  */
-void __weak read_boot_clock64(struct timespec64 *ts)
+void __weak __init read_boot_clock64(struct timespec64 *now,
+				     struct timespec64 *ts)
 {
 	ts->tv_sec = 0;
 	ts->tv_nsec = 0;
@@ -1534,7 +1538,7 @@ void __init timekeeping_init(void)
 	} else if (now.tv_sec || now.tv_nsec)
 		persistent_clock_exists = true;
 
-	read_boot_clock64(&boot);
+	read_boot_clock64(&now, &boot);
 	if (!timespec64_valid_strict(&boot)) {
 		pr_warn("WARNING: Boot clock returned invalid value!\n"
 			"         Check your CMOS/BIOS settings.\n");
-- 
2.16.1


* [PATCH v10 3/7] x86/time: read_boot_clock64() implementation
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 1/7] x86/tsc: remove tsc_disabled flag Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 4/7] sched: early boot clock Pavel Tatashin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

read_boot_clock64() returns the time when the system was started. Now
that the early boot clock is going to be available on x86, it is possible
to implement an x86-specific version of read_boot_clock64() that takes
advantage of this new feature.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/kernel/time.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/x86/kernel/time.c b/arch/x86/kernel/time.c
index 774ebafa97c4..32dff35719d9 100644
--- a/arch/x86/kernel/time.c
+++ b/arch/x86/kernel/time.c
@@ -15,6 +15,7 @@
 #include <linux/i8253.h>
 #include <linux/time.h>
 #include <linux/export.h>
+#include <linux/sched/clock.h>
 
 #include <asm/vsyscall.h>
 #include <asm/x86_init.h>
@@ -104,3 +105,32 @@ void __init time_init(void)
 {
 	late_time_init = x86_late_time_init;
 }
+
+/*
+ * Called once during boot to initialize the boot time.
+ * This function returns a timestamp in timespec format, i.e. sec/nsec from
+ * epoch of when boot started.
+ * We use sched_clock_cpu(), which gives us nanoseconds from when this clock
+ * was started, which happens quite early during boot. To calculate the
+ * offset from epoch we use the 'now' value provided by the caller.
+ *
+ * If sched_clock_cpu() is not available, or if there is any kind of error,
+ * e.g. the time from epoch is smaller than the boot time, we must return
+ * zeros in ts, and the caller handles the error by assuming that the time
+ * when this function was called is the beginning of boot time.
+ */
+void __init read_boot_clock64(struct timespec64 *now, struct timespec64 *ts)
+{
+	u64 ns_boot = sched_clock_cpu(smp_processor_id());
+	bool valid_clock;
+	u64 ns_now;
+
+	ns_now = timespec64_to_ns(now);
+	valid_clock = ns_boot && timespec64_valid_strict(now) &&
+			(ns_now > ns_boot);
+
+	if (!valid_clock)
+		*ts = (struct timespec64){0, 0};
+	else
+		*ts = ns_to_timespec64(ns_now - ns_boot);
+}
-- 
2.16.1


* [PATCH v10 4/7] sched: early boot clock
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
                   ` (2 preceding siblings ...)
  2018-02-09 21:11 ` [PATCH v10 3/7] x86/time: read_boot_clock64() implementation Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 5/7] x86/paravirt: add active_sched_clock to pv_time_ops Pavel Tatashin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

Allow sched_clock() to be used before sched_clock_init() and
sched_clock_init_late() are called. This provides us with a way to get
early boot timestamps on machines with unstable clocks.
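
The __gtod_offset assignment added below can be read as a continuity
condition; an informal sketch of the derivation, using the identities in
kernel/sched/clock.c:

	early(t0) = sched_clock(t0) + __sched_clock_offset
	gtod(t0)  = ktime_get_ns(t0) + __gtod_offset

Requiring early(t0) == gtod(t0) at the switch instant t0, so that
timestamps do not jump across the transition, gives:

	__gtod_offset = sched_clock(t0) + __sched_clock_offset - ktime_get_ns(t0)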

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 kernel/sched/clock.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index e086babe6c61..8bc603951c2d 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -217,6 +217,11 @@ void clear_sched_clock_stable(void)
  */
 static int __init sched_clock_init_late(void)
 {
+	/* Transition to unstable clock from early clock */
+	local_irq_disable();
+	__gtod_offset = sched_clock() + __sched_clock_offset - ktime_get_ns();
+	local_irq_enable();
+
 	sched_clock_running = 2;
 	/*
 	 * Ensure that it is impossible to not do a static_key update.
@@ -362,8 +367,9 @@ u64 sched_clock_cpu(int cpu)
 	if (sched_clock_stable())
 		return sched_clock() + __sched_clock_offset;
 
-	if (unlikely(!sched_clock_running))
-		return 0ull;
+	/* Use early clock until sched_clock_init_late() */
+	if (unlikely(sched_clock_running < 2))
+		return sched_clock() + __sched_clock_offset;
 
 	preempt_disable_notrace();
 	scd = cpu_sdc(cpu);
-- 
2.16.1


* [PATCH v10 5/7] x86/paravirt: add active_sched_clock to pv_time_ops
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
                   ` (3 preceding siblings ...)
  2018-02-09 21:11 ` [PATCH v10 4/7] sched: early boot clock Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 6/7] x86/tsc: use tsc early Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 7/7] kvm/x86: remove kvm memblock dependency Pavel Tatashin
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

The early boot clock might differ from the clock that is used later on;
therefore, add a new field to pv_time_ops that holds the currently active
clock. If a platform supports an early boot clock, this field is changed
to use that clock early in boot, and is later replaced with the permanent
clock.
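
Conceptually the dispatch works as in the following simplified,
self-contained sketch (plain userspace C, not the kernel's actual pv_ops
plumbing; the stand-in clock functions are invented for illustration):

	#include <stdio.h>

	typedef unsigned long long (*sched_clock_fn)(void);

	static unsigned long long early_clock(void)  { return 1ULL; } /* stand-in */
	static unsigned long long native_clock(void) { return 2ULL; } /* stand-in */

	struct pv_time_ops {
		sched_clock_fn sched_clock;		/* the permanent clock */
		sched_clock_fn active_sched_clock;	/* what callers dispatch to */
	};

	static struct pv_time_ops pv_time_ops = {
		.sched_clock		= native_clock,
		.active_sched_clock	= native_clock,
	};

	int main(void)
	{
		/* Early in boot, a platform with an early clock installs it: */
		pv_time_ops.active_sched_clock = early_clock;
		printf("%llu\n", pv_time_ops.active_sched_clock());	/* 1 */

		/* Later, point back at the permanent clock: */
		pv_time_ops.active_sched_clock = pv_time_ops.sched_clock;
		printf("%llu\n", pv_time_ops.active_sched_clock());	/* 2 */
		return 0;
	}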

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/include/asm/paravirt_types.h | 1 +
 arch/x86/kernel/paravirt.c            | 1 +
 arch/x86/xen/time.c                   | 7 ++++---
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 6ec54d01972d..96518f027c15 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -98,6 +98,7 @@ struct pv_lazy_ops {
 struct pv_time_ops {
 	unsigned long long (*sched_clock)(void);
 	unsigned long long (*steal_clock)(int cpu);
+	unsigned long long (*active_sched_clock)(void);
 } __no_randomize_layout;
 
 struct pv_cpu_ops {
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 041096bdef86..91af9e6b913b 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -320,6 +320,7 @@ struct pv_init_ops pv_init_ops = {
 struct pv_time_ops pv_time_ops = {
 	.sched_clock = native_sched_clock,
 	.steal_clock = native_steal_clock,
+	.active_sched_clock = native_sched_clock,
 };
 
 __visible struct pv_irq_ops pv_irq_ops = {
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index 29163c43ebbd..d1a5f96893c8 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -505,8 +505,8 @@ static void __init xen_time_init(void)
 
 void __ref xen_init_time_ops(void)
 {
-	pv_time_ops = xen_time_ops;
-
+	pv_time_ops.sched_clock = xen_time_ops.sched_clock;
+	pv_time_ops.steal_clock = xen_time_ops.steal_clock;
 	x86_init.timers.timer_init = xen_time_init;
 	x86_init.timers.setup_percpu_clockev = x86_init_noop;
 	x86_cpuinit.setup_percpu_clockev = x86_init_noop;
@@ -547,7 +547,8 @@ void __init xen_hvm_init_time_ops(void)
 		return;
 	}
 
-	pv_time_ops = xen_time_ops;
+	pv_time_ops.sched_clock = xen_time_ops.sched_clock;
+	pv_time_ops.steal_clock = xen_time_ops.steal_clock;
 	x86_init.timers.setup_percpu_clockev = xen_time_init;
 	x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents;
 
-- 
2.16.1


* [PATCH v10 6/7] x86/tsc: use tsc early
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
                   ` (4 preceding siblings ...)
  2018-02-09 21:11 ` [PATCH v10 5/7] x86/paravirt: add active_sched_clock to pv_time_ops Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  2018-02-09 21:11 ` [PATCH v10 7/7] kvm/x86: remove kvm memblock dependency Pavel Tatashin
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

This patch adds an early clock feature to x86 platforms.

tsc_early_init():
Determines the offset, shift and multiplier for the early clock based on
the TSC frequency.

tsc_early_fini():
Implements the finish part of the early tsc feature; prints a message with
the offset, which can be useful to find out how much time was spent in
POST and the boot manager (if the TSC starts from 0 during boot).

sched_clock_early():
TSC-based implementation of the early clock; called from sched_clock().

start_early_clock():
Calls tsc_early_init() and makes sched_clock() use the early boot clock.

set_final_clock():
Sets the final clock, which is either platform specific or
native_sched_clock(). Also calls tsc_early_fini() if the early clock was
previously initialized.

Call start_early_clock() to start using the early boot time stamp
functionality on the supported x86 platforms, and call set_final_clock()
to finish this feature and switch back to the default clock. The supported
x86 systems are those where the TSC frequency is determined early in boot.
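
For reference, the cycles-to-nanoseconds conversion used by
sched_clock_early() boils down to ns = (cycles * mult) >> shift. A
standalone sketch with illustrative numbers (the real mult/shift come
from clocks_calc_mult_shift(), and the values below are only examples):

	#include <stdint.h>
	#include <stdio.h>

	/* Same math as mul_u64_u32_shr(), but with a plain 64-bit
	 * intermediate; this sketch overflows for very large cycle counts,
	 * whereas the kernel helper does not.
	 */
	static uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult,
				     uint32_t shift)
	{
		return (cycles * mult) >> shift;
	}

	int main(void)
	{
		/* Illustrative values for a 2500000 kHz (2.5 GHz) TSC:
		 * shift = 22, mult = (10^6 << 22) / 2500000 ~= 1677722,
		 * so one second worth of cycles converts to ~10^9 ns.
		 */
		uint32_t mult = 1677722, shift = 22;

		printf("%llu ns\n", (unsigned long long)
		       cycles_to_ns(2500000000ULL, mult, shift));
		return 0;
	}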

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/include/asm/paravirt.h |  2 +-
 arch/x86/kernel/tsc.c           | 98 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 97 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 892df375b615..17b04919c71a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -171,7 +171,7 @@ static inline int rdmsrl_safe(unsigned msr, unsigned long long *p)
 
 static inline unsigned long long paravirt_sched_clock(void)
 {
-	return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
+	return PVOP_CALL0(unsigned long long, pv_time_ops.active_sched_clock);
 }
 
 struct static_key;
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 700a61cbafa5..d492f5dcea67 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -182,6 +182,84 @@ static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_
 	local_irq_restore(flags);
 }
 
+static struct cyc2ns_data  cyc2ns_early;
+
+static u64 sched_clock_early(void)
+{
+	u64 ns = mul_u64_u32_shr(rdtsc(), cyc2ns_early.cyc2ns_mul,
+				 cyc2ns_early.cyc2ns_shift);
+	return ns + cyc2ns_early.cyc2ns_offset;
+}
+
+/*
+ * Initialize clock for early time stamps
+ */
+static void __init tsc_early_init(unsigned int khz)
+{
+	clocks_calc_mult_shift(&cyc2ns_early.cyc2ns_mul,
+			       &cyc2ns_early.cyc2ns_shift,
+			       khz, NSEC_PER_MSEC, 0);
+	cyc2ns_early.cyc2ns_offset = -sched_clock_early();
+}
+
+/*
+ * Finish clock for early time stamps, and hand over to permanent clock by
+ * setting __sched_clock_offset appropriately for continued time keeping.
+ */
+static void __init tsc_early_fini(void)
+{
+	unsigned long long t;
+	unsigned long r;
+
+	t = -cyc2ns_early.cyc2ns_offset;
+	r = do_div(t, NSEC_PER_SEC);
+
+	__sched_clock_offset = sched_clock_early() - sched_clock();
+	pr_info("early sched clock is finished, offset [%lld.%09lds]\n", t, r);
+}
+
+#ifdef CONFIG_PARAVIRT
+static inline void __init start_early_clock(void)
+{
+	tsc_early_init(tsc_khz);
+	pv_time_ops.active_sched_clock = sched_clock_early;
+}
+
+static inline void __init set_final_clock(void)
+{
+	pv_time_ops.active_sched_clock = pv_time_ops.sched_clock;
+
+	/* We did not have early sched clock if multiplier is 0 */
+	if (cyc2ns_early.cyc2ns_mul)
+		tsc_early_fini();
+}
+#else /* CONFIG_PARAVIRT */
+/*
+ * For the native clock we use two switches, static and dynamic: the static
+ * switch is initially true, so we check the dynamic switch, which is
+ * initially false. Later, when the early clock is disabled, we alter the
+ * static switch in order to avoid a branch check on every sched_clock() call.
+ */
+static bool __tsc_early;
+static DEFINE_STATIC_KEY_TRUE(__tsc_early_static);
+
+static inline void __init start_early_clock(void)
+{
+	tsc_early_init(tsc_khz);
+	__tsc_early = true;
+}
+
+static inline void __init set_final_clock(void)
+{
+	__tsc_early = false;
+	static_branch_disable(&__tsc_early_static);
+
+	/* We did not have early sched clock if multiplier is 0 */
+	if (cyc2ns_early.cyc2ns_mul)
+		tsc_early_fini();
+}
+#endif /* CONFIG_PARAVIRT */
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  */
@@ -194,6 +272,13 @@ u64 native_sched_clock(void)
 		return cycles_2_ns(tsc_now);
 	}
 
+#ifndef CONFIG_PARAVIRT
+	if (static_branch_unlikely(&__tsc_early_static)) {
+		if (__tsc_early)
+			return sched_clock_early();
+	}
+#endif /* !CONFIG_PARAVIRT */
+
 	/*
 	 * Fall back to jiffies if there's no TSC available:
 	 * ( But note that we still use it if the TSC is marked
@@ -1313,6 +1398,7 @@ void __init tsc_early_delay_calibrate(void)
 	lpj = tsc_khz * 1000;
 	do_div(lpj, HZ);
 	loops_per_jiffy = lpj;
+	start_early_clock();
 }
 
 void __init tsc_init(void)
@@ -1322,7 +1408,7 @@ void __init tsc_init(void)
 
 	if (!boot_cpu_has(X86_FEATURE_TSC)) {
 		setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-		return;
+		goto final_sched_clock;
 	}
 
 	cpu_khz = x86_platform.calibrate_cpu();
@@ -1341,7 +1427,7 @@ void __init tsc_init(void)
 	if (!tsc_khz) {
 		mark_tsc_unstable("could not calculate TSC khz");
 		setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
-		return;
+		goto final_sched_clock;
 	}
 
 	pr_info("Detected %lu.%03lu MHz processor\n",
@@ -1389,6 +1475,14 @@ void __init tsc_init(void)
 
 	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
 	detect_art();
+final_sched_clock:
+	/*
+	 * The final sched clock is either the platform specific clock when
+	 * CONFIG_PARAVIRT is defined, or native_sched_clock() with the static
+	 * branch for the early tsc clock disabled. We must call this function
+	 * even if start_early_clock() was never called.
+	 */
+	set_final_clock();
 }
 
 #ifdef CONFIG_SMP
-- 
2.16.1


* [PATCH v10 7/7] kvm/x86: remove kvm memblock dependency
  2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
                   ` (5 preceding siblings ...)
  2018-02-09 21:11 ` [PATCH v10 6/7] x86/tsc: use tsc early Pavel Tatashin
@ 2018-02-09 21:11 ` Pavel Tatashin
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-02-09 21:11 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit

The KVM clock is initialized later compared to other hypervisor clocks
because it has a dependency on the memblock allocator.

This patch removes that dependency by using memory from the BSS instead of
allocating it (see the sketch after the benefits list below).

The benefits:
- remove the ifdef from common code
- earlier availability of the TSC
- remove the dependency on memblock, and reduce code size
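
A minimal sketch of the allocation pattern this patch switches to
(condensed from the diff below; it uses kernel types and is not a
standalone program):

	/* Before: buffers were memblock_alloc()'d at runtime, with failure
	 * and SEV decryption handling. After: page-aligned buffers are
	 * reserved at build time in BSS, usable before the memblock
	 * allocator exists. Trade-off: the memory is always reserved for
	 * NR_CPUS, even when fewer CPUs are present.
	 */
	#define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
	static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);

	/* ... later, no allocation and no failure path: */
	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;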

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 arch/x86/kernel/kvm.c      |  1 +
 arch/x86/kernel/kvmclock.c | 64 +++++++---------------------------------------
 arch/x86/kernel/setup.c    |  7 +----
 3 files changed, 11 insertions(+), 61 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b40ffbf156c1..5fed337836b0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -582,6 +582,7 @@ const __initconst struct hypervisor_x86 x86_hyper_kvm = {
 	.name			= "KVM",
 	.detect			= kvm_detect,
 	.type			= X86_HYPER_KVM,
+	.init.init_platform	= kvmclock_init,
 	.init.guest_late_init	= kvm_guest_init,
 	.init.x2apic_available	= kvm_para_available,
 };
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 8b26c9e01cc4..11bae04a19a7 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -23,9 +23,9 @@
 #include <asm/apic.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
-#include <linux/memblock.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
+#include <linux/mm.h>
 
 #include <asm/mem_encrypt.h>
 #include <asm/x86_init.h>
@@ -44,6 +44,11 @@ static int parse_no_kvmclock(char *arg)
 }
 early_param("no-kvmclock", parse_no_kvmclock);
 
+/* Aligned to page size to match what's mapped via vsyscalls to userspace */
+#define HV_CLOCK_SIZE	(sizeof(struct pvclock_vsyscall_time_info) * NR_CPUS)
+#define WALL_CLOCK_SIZE	(sizeof(struct pvclock_wall_clock))
+static u8 hv_clock_mem[PAGE_ALIGN(HV_CLOCK_SIZE)] __aligned(PAGE_SIZE);
+static u8 wall_clock_mem[PAGE_ALIGN(WALL_CLOCK_SIZE)] __aligned(PAGE_SIZE);
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
 static struct pvclock_wall_clock *wall_clock;
@@ -244,43 +249,12 @@ static void kvm_shutdown(void)
 	native_machine_shutdown();
 }
 
-static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
-					     phys_addr_t align)
-{
-	phys_addr_t mem;
-
-	mem = memblock_alloc(size, align);
-	if (!mem)
-		return 0;
-
-	if (sev_active()) {
-		if (early_set_memory_decrypted((unsigned long)__va(mem), size))
-			goto e_free;
-	}
-
-	return mem;
-e_free:
-	memblock_free(mem, size);
-	return 0;
-}
-
-static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
-{
-	if (sev_active())
-		early_set_memory_encrypted((unsigned long)__va(addr), size);
-
-	memblock_free(addr, size);
-}
-
 void __init kvmclock_init(void)
 {
 	struct pvclock_vcpu_time_info *vcpu_time;
-	unsigned long mem, mem_wall_clock;
-	int size, cpu, wall_clock_size;
+	int cpu;
 	u8 flags;
 
-	size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
 	if (!kvm_para_available())
 		return;
 
@@ -290,28 +264,11 @@ void __init kvmclock_init(void)
 	} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
 		return;
 
-	wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
-	mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
-	if (!mem_wall_clock)
-		return;
-
-	wall_clock = __va(mem_wall_clock);
-	memset(wall_clock, 0, wall_clock_size);
-
-	mem = kvm_memblock_alloc(size, PAGE_SIZE);
-	if (!mem) {
-		kvm_memblock_free(mem_wall_clock, wall_clock_size);
-		wall_clock = NULL;
-		return;
-	}
-
-	hv_clock = __va(mem);
-	memset(hv_clock, 0, size);
+	wall_clock = (struct pvclock_wall_clock *)wall_clock_mem;
+	hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem;
 
 	if (kvm_register_clock("primary cpu clock")) {
 		hv_clock = NULL;
-		kvm_memblock_free(mem, size);
-		kvm_memblock_free(mem_wall_clock, wall_clock_size);
 		wall_clock = NULL;
 		return;
 	}
@@ -354,13 +311,10 @@ int __init kvm_setup_vsyscall_timeinfo(void)
 	int cpu;
 	u8 flags;
 	struct pvclock_vcpu_time_info *vcpu_time;
-	unsigned int size;
 
 	if (!hv_clock)
 		return 0;
 
-	size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
-
 	cpu = get_cpu();
 
 	vcpu_time = &hv_clock[cpu].pvti;
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1ae67e982af7..9a691306ce8b 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -48,7 +48,6 @@
 #include <linux/pci.h>
 #include <asm/pci-direct.h>
 #include <linux/init_ohci1394_dma.h>
-#include <linux/kvm_para.h>
 #include <linux/dma-contiguous.h>
 
 #include <linux/errno.h>
@@ -1007,6 +1006,7 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	init_hypervisor_platform();
 
+	tsc_early_delay_calibrate();
 	x86_init.resources.probe_roms();
 
 	/* after parse_early_param, so could debug it */
@@ -1192,11 +1192,6 @@ void __init setup_arch(char **cmdline_p)
 
 	memblock_find_dma_reserve();
 
-#ifdef CONFIG_KVM_GUEST
-	kvmclock_init();
-#endif
-
-	tsc_early_delay_calibrate();
 	if (!early_xdbc_setup_hardware())
 		early_xdbc_register_console();
 
-- 
2.16.1


* [PATCH v10 4/7] sched: early boot clock
  2018-06-15 17:41 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
@ 2018-06-15 17:42 ` Pavel Tatashin
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Tatashin @ 2018-06-15 17:42 UTC
  To: steven.sistare, daniel.m.jordan, linux, schwidefsky,
	heiko.carstens, john.stultz, sboyd, x86, linux-kernel, mingo,
	tglx, hpa, douly.fnst, peterz, prarit, feng.tang, pmladek,
	gnomes

Allow sched_clock() to be used before sched_clock_init() and
sched_clock_init_late() are called. This provides us with a way to get
early boot timestamps on machines with unstable clocks.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
 kernel/sched/clock.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 10c83e73837a..f034392b0f6c 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -205,6 +205,11 @@ void clear_sched_clock_stable(void)
  */
 static int __init sched_clock_init_late(void)
 {
+	/* Transition to unstable clock from early clock */
+	local_irq_disable();
+	__gtod_offset = sched_clock() + __sched_clock_offset - ktime_get_ns();
+	local_irq_enable();
+
 	sched_clock_running = 2;
 	/*
 	 * Ensure that it is impossible to not do a static_key update.
@@ -350,8 +355,9 @@ u64 sched_clock_cpu(int cpu)
 	if (sched_clock_stable())
 		return sched_clock() + __sched_clock_offset;
 
-	if (unlikely(!sched_clock_running))
-		return 0ull;
+	/* Use early clock until sched_clock_init_late() */
+	if (unlikely(sched_clock_running < 2))
+		return sched_clock() + __sched_clock_offset;
 
 	preempt_disable_notrace();
 	scd = cpu_sdc(cpu);
-- 
2.17.1



end of thread

Thread overview: 9+ messages
2018-02-09 21:11 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 1/7] x86/tsc: remove tsc_disabled flag Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 2/7] time: sync read_boot_clock64() with persistent clock Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 3/7] x86/time: read_boot_clock64() implementation Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 4/7] sched: early boot clock Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 5/7] x86/paravirt: add active_sched_clock to pv_time_ops Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 6/7] x86/tsc: use tsc early Pavel Tatashin
2018-02-09 21:11 ` [PATCH v10 7/7] kvm/x86: remove kvm memblock dependency Pavel Tatashin
2018-06-15 17:41 [PATCH v10 0/7] Early boot time stamps for x86 Pavel Tatashin
2018-06-15 17:42 ` [PATCH v10 4/7] sched: early boot clock Pavel Tatashin
