linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/3] Early boot time stamps for arm64
@ 2018-12-26 16:45 Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 1/3] arm_arch_timer: add macro for timer nbits Pavel Tatashin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Pavel Tatashin @ 2018-12-26 16:45 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm, rppt, mhocko, ard.biesheuvel,
	andrew.murray, james.morse, marc.zyngier, sboyd,
	linux-arm-kernel, linux-kernel, pasha.tatashin

Changelog:
v2 - v3
	Addressed comments from Will Deacon: split into three
	patches, removed hardcoded assumption from vdso, don't
	assign arch_timer_read_counter during early boot.
v1 - v2
	Addressed comments from Marc Zyngier

I made early boot time stamps available for SPARC and x86.

x86:
https://lore.kernel.org/lkml/20180719205545.16512-1-pasha.tatashin@oracle.com

sparc:
https://www.spinics.net/lists/sparclinux/msg18063.html

As discussed at Plumbers, I would like to add the same for arm64. The
implementation does not have to be perfect, and only needs to work when
the early clock is easy to determine. arm64 defines an architected
counter, which makes this easy, but on some platforms the frequency
register is broken; when the frequency is not known, simply don't
initialize the clock early.

dmesg before:
https://paste.ubuntu.com/p/3pJ5kgJHyN

dmesg after:
https://paste.ubuntu.com/p/RHKGVd9nSM

As seen from the above, with the baseline kernel smp_init() appears to finish after 0.47s:
[    0.464585] SMP: Total of 8 processors activated.

But in reality 3.2s of boot time is missing from the log, which is quite
long considering this is only a 60G domain.


Pavel Tatashin (3):
  arm_arch_timer: add macro for timer nbits
  arm64: vdso: Use ARCH_TIMER_NBITS to calculate mask
  arm64: Early boot time stamps

 arch/arm64/kernel/asm-offsets.c       |  1 +
 arch/arm64/kernel/setup.c             | 25 +++++++++++++++++++++++++
 arch/arm64/kernel/vdso/gettimeofday.S |  3 +--
 drivers/clocksource/arm_arch_timer.c  |  8 ++++----
 include/clocksource/arm_arch_timer.h  |  3 +++
 5 files changed, 34 insertions(+), 6 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 1/3] arm_arch_timer: add macro for timer nbits
  2018-12-26 16:45 [PATCH v3 0/3] Early boot time stamps for arm64 Pavel Tatashin
@ 2018-12-26 16:45 ` Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 2/3] arm64: vdso: Use ARCH_TIMER_NBITS to calculate mask Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 3/3] arm64: Early boot time stamps Pavel Tatashin
  2 siblings, 0 replies; 10+ messages in thread
From: Pavel Tatashin @ 2018-12-26 16:45 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm, rppt, mhocko, ard.biesheuvel,
	andrew.murray, james.morse, marc.zyngier, sboyd,
	linux-arm-kernel, linux-kernel, pasha.tatashin

The minimum timer width the ARM architecture defines is 56 bits. Add a new
macro, ARCH_TIMER_NBITS, and use it in code instead of hardcoding 56.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 drivers/clocksource/arm_arch_timer.c | 8 ++++----
 include/clocksource/arm_arch_timer.h | 3 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a7d4dc00b6e..e4843ad48bd3 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -175,13 +175,13 @@ static struct clocksource clocksource_counter = {
 	.name	= "arch_sys_counter",
 	.rating	= 400,
 	.read	= arch_counter_read,
-	.mask	= CLOCKSOURCE_MASK(56),
+	.mask	= CLOCKSOURCE_MASK(ARCH_TIMER_NBITS),
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static struct cyclecounter cyclecounter __ro_after_init = {
 	.read	= arch_counter_read_cc,
-	.mask	= CLOCKSOURCE_MASK(56),
+	.mask	= CLOCKSOURCE_MASK(ARCH_TIMER_NBITS),
 };
 
 struct ate_acpi_oem_info {
@@ -963,8 +963,8 @@ static void __init arch_counter_register(unsigned type)
 	timecounter_init(&arch_timer_kvm_info.timecounter,
 			 &cyclecounter, start_count);
 
-	/* 56 bits minimum, so we assume worst case rollover */
-	sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate);
+	sched_clock_register(arch_timer_read_counter, ARCH_TIMER_NBITS,
+			     arch_timer_rate);
 }
 
 static void arch_timer_stop(struct clock_event_device *clk)
diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
index 349e5957c949..c485512e1d01 100644
--- a/include/clocksource/arm_arch_timer.h
+++ b/include/clocksource/arm_arch_timer.h
@@ -71,6 +71,9 @@ enum arch_timer_spi_nr {
 #define ARCH_TIMER_EVT_STREAM_FREQ				\
 	(USEC_PER_SEC / ARCH_TIMER_EVT_STREAM_PERIOD_US)
 
+/* 56 bits minimum, so we assume worst case rollover */
+#define	ARCH_TIMER_NBITS		56
+
 struct arch_timer_kvm_info {
 	struct timecounter timecounter;
 	int virtual_irq;
-- 
2.20.1



* [PATCH v3 2/3] arm64: vdso: Use ARCH_TIMER_NBITS to calculate mask
  2018-12-26 16:45 [PATCH v3 0/3] Early boot time stamps for arm64 Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 1/3] arm_arch_timer: add macro for timer nbits Pavel Tatashin
@ 2018-12-26 16:45 ` Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 3/3] arm64: Early boot time stamps Pavel Tatashin
  2 siblings, 0 replies; 10+ messages in thread
From: Pavel Tatashin @ 2018-12-26 16:45 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm, rppt, mhocko, ard.biesheuvel,
	andrew.murray, james.morse, marc.zyngier, sboyd,
	linux-arm-kernel, linux-kernel, pasha.tatashin

The get_clock_shifted_nsec macro in vdso/gettimeofday.S is another place
where the clocksource precision is hardcoded. Use ARCH_TIMER_NBITS to
generate the mask value instead.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/asm-offsets.c       | 1 +
 arch/arm64/kernel/vdso/gettimeofday.S | 3 +--
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 65b8afc84466..32e1b935dd6c 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -107,6 +107,7 @@ int main(void)
   DEFINE(CLOCK_MONOTONIC_COARSE,CLOCK_MONOTONIC_COARSE);
   DEFINE(CLOCK_COARSE_RES,	LOW_RES_NSEC);
   DEFINE(NSEC_PER_SEC,		NSEC_PER_SEC);
+  DEFINE(CLOCKSOURCE_MASK,	CLOCKSOURCE_MASK(ARCH_TIMER_NBITS));
   BLANK();
   DEFINE(VDSO_CS_CYCLE_LAST,	offsetof(struct vdso_data, cs_cycle_last));
   DEFINE(VDSO_RAW_TIME_SEC,	offsetof(struct vdso_data, raw_time_sec));
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index c39872a7b03c..5a7580d0118e 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -69,8 +69,7 @@ x_tmp		.req	x8
 	mrs	x_tmp, cntvct_el0
 	/* Calculate cycle delta and convert to ns. */
 	sub	\res, x_tmp, \cycle_last
-	/* We can only guarantee 56 bits of precision. */
-	movn	x_tmp, #0xff00, lsl #48
+	mov     x_tmp, #CLOCKSOURCE_MASK
 	and	\res, x_tmp, \res
 	mul	\res, \res, \mult
 	.endm
-- 
2.20.1



* [PATCH v3 3/3] arm64: Early boot time stamps
  2018-12-26 16:45 [PATCH v3 0/3] Early boot time stamps for arm64 Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 1/3] arm_arch_timer: add macro for timer nbits Pavel Tatashin
  2018-12-26 16:45 ` [PATCH v3 2/3] arm64: vdso: Use ARCH_TIMER_NBITS to calculate mask Pavel Tatashin
@ 2018-12-26 16:45 ` Pavel Tatashin
  2019-01-03 10:51   ` Marc Zyngier
  2 siblings, 1 reply; 10+ messages in thread
From: Pavel Tatashin @ 2018-12-26 16:45 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, akpm, rppt, mhocko, ard.biesheuvel,
	andrew.murray, james.morse, marc.zyngier, sboyd,
	linux-arm-kernel, linux-kernel, pasha.tatashin

Allow printk time stamps/sched_clock() to be available from early
boot.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/kernel/setup.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 4b0e1231625c..28126facc4ed 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -40,6 +40,7 @@
 #include <linux/efi.h>
 #include <linux/psci.h>
 #include <linux/sched/task.h>
+#include <linux/sched_clock.h>
 #include <linux/mm.h>
 
 #include <asm/acpi.h>
@@ -279,8 +280,32 @@ arch_initcall(reserve_memblock_reserved_regions);
 
 u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
 
+/*
+ * Make time stamps available early in boot, which is useful for identifying
+ * boot time issues.
+ */
+static __init void sched_clock_early_init(void)
+{
+	u64 (*read_time)(void) = arch_counter_get_cntvct;
+	u64 freq = arch_timer_get_cntfrq();
+
+	/*
+	 * The arm64 boot protocol mandates that CNTFRQ_EL0 reflects
+	 * the timer frequency. To avoid breakage on misconfigured
+	 * systems, do not register the early sched_clock if the
+	 * programmed value is zero. Other random values will just
+	 * result in random output.
+	 */
+	if (!freq)
+		return;
+
+	sched_clock_register(read_time, ARCH_TIMER_NBITS, freq);
+}
+
 void __init setup_arch(char **cmdline_p)
 {
+	sched_clock_early_init();
+
 	init_mm.start_code = (unsigned long) _text;
 	init_mm.end_code   = (unsigned long) _etext;
 	init_mm.end_data   = (unsigned long) _edata;
-- 
2.20.1



* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2018-12-26 16:45 ` [PATCH v3 3/3] arm64: Early boot time stamps Pavel Tatashin
@ 2019-01-03 10:51   ` Marc Zyngier
  2019-01-03 19:58     ` Pavel Tatashin
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-03 10:51 UTC (permalink / raw)
  To: Pavel Tatashin, catalin.marinas, will.deacon, akpm, rppt, mhocko,
	ard.biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, linux-kernel

Hi Pavel,

On 26/12/2018 16:45, Pavel Tatashin wrote:
> Allow printk time stamps/sched_clock() to be available from the early
> boot.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> ---
>  arch/arm64/kernel/setup.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index 4b0e1231625c..28126facc4ed 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -40,6 +40,7 @@
>  #include <linux/efi.h>
>  #include <linux/psci.h>
>  #include <linux/sched/task.h>
> +#include <linux/sched_clock.h>
>  #include <linux/mm.h>
>  
>  #include <asm/acpi.h>
> @@ -279,8 +280,32 @@ arch_initcall(reserve_memblock_reserved_regions);
>  
>  u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
>  
> +/*
> + * Get time stamps available early in boot, useful to identify boot time issues
> + * from the early boot.
> + */
> +static __init void sched_clock_early_init(void)
> +{
> +	u64 (*read_time)(void) = arch_counter_get_cntvct;
> +	u64 freq = arch_timer_get_cntfrq();
> +
> +	/*
> +	 * The arm64 boot protocol mandates that CNTFRQ_EL0 reflects
> +	 * the timer frequency. To avoid breakage on misconfigured
> +	 * systems, do not register the early sched_clock if the
> +	 * programmed value is zero. Other random values will just
> +	 * result in random output.
> +	 */
> +	if (!freq)
> +		return;
> +
> +	sched_clock_register(read_time, ARCH_TIMER_NBITS, freq);
> +}
> +
>  void __init setup_arch(char **cmdline_p)
>  {
> +	sched_clock_early_init();
> +
>  	init_mm.start_code = (unsigned long) _text;
>  	init_mm.end_code   = (unsigned long) _etext;
>  	init_mm.end_data   = (unsigned long) _edata;
> 

I still think this approach is flawed. You provide the kernel with a
potentially broken sched_clock that may jump back and forth until the
workaround kicks in. Nobody expects this.

Instead, I'd suggest you allow for something other than local_clock()
to be used for the time stamping until a properly working sched_clock
gets registered.

This way, you'll only impact the timestamps when running on a broken system.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2019-01-03 10:51   ` Marc Zyngier
@ 2019-01-03 19:58     ` Pavel Tatashin
  2019-01-04 15:39       ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Tatashin @ 2019-01-03 19:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: catalin.marinas, Will Deacon, Andrew Morton, rppt, Michal Hocko,
	Ard Biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, LKML

> I still think this approach is flawed. You provide the kernel with a
> potentially broken sched_clock that may jump back and forth until the
> workaround kicks in. Nobody expects this.
>
> Instead, I'd suggest you allow for something other than local_clock()
> to be used for the time stamping until a properly working sched_clock
> gets registered.
>
> This way, you'll only impact the timestamps when running on a broken system.

I think, given that on other platforms sched_clock() is already used
early, it is not a good idea to invent a different clock just for time
stamps.

We could limit the arm64 approach to chips where cntvct_el0 is known to
work: i.e. the frequency is known and the clock is stable, meaning it
cannot go backward. Perhaps we would start the early clock a little
later, but at least it would be available on the sane chips. The only
question is where during boot this becomes known.

Another approach is to modify sched_clock() in
kernel/time/sched_clock.c to never return a backward value during boot.

1. Rename the current implementation of sched_clock() to sched_clock_raw().
2. The new sched_clock() would look like this:

u64 sched_clock(void)
{
	if (static_branch_unlikely(&early_unstable_clock))
		return sched_clock_unstable();

	return sched_clock_raw();
}

3. sched_clock_unstable() would look like this:

u64 sched_clock_unstable(void)
{
	static u64 old_clock;
	u64 new_clock, old_clock_read;

again:
	new_clock = sched_clock_raw();
	old_clock_read = READ_ONCE(old_clock);

	/* It is ok if time does not progress, but don't allow it to go backward */
	if (new_clock < old_clock_read)
		return old_clock_read;

	/* update the old_clock value */
	if (cmpxchg64(&old_clock, old_clock_read, new_clock) != old_clock_read)
		goto again;

	return new_clock;
}

Pasha


* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2019-01-03 19:58     ` Pavel Tatashin
@ 2019-01-04 15:39       ` Marc Zyngier
  2019-01-04 16:23         ` Pavel Tatashin
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-04 15:39 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: catalin.marinas, Will Deacon, Andrew Morton, rppt, Michal Hocko,
	Ard Biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, LKML

On Thu, 03 Jan 2019 19:58:25 +0000,
Pavel Tatashin <pasha.tatashin@soleen.com> wrote:
> 
> > I still think this approach is flawed. You provide the kernel with a
> > potentially broken sched_clock that may jump back and forth until the
> > workaround kicks in. Nobody expects this.
> >
> > Instead, I'd suggest you allow for something other than local_clock()
> > to be used for the time stamping until a properly working sched_clock
> > gets registered.
> >
> > This way, you'll only impact the timestamps when running on a broken system.
> 
> I think, given that on other platforms sched_clock() is already used
> early, it is not a good idea to invent a different clock just for time
> stamps.

Square pegs vs round holes. Mimicking other architectures isn't always
the right thing to do when faced with a different problem. We put a
lot of effort in working around timer errata for a good reason, and
feeding the rest of the system bogus timing information doesn't sound
great.

> We could limit arm64 approach only for chips where cntvct_el0 is
> working: i.e. frequency is known, and the clock is stable, meaning
> cannot go backward. Perhaps we would start early clock a little later,
> but at least it will be available for the sane chips. The only
> question, where during boot time this is known.

How do you propose we do that? Defective timers can be a property of
the implementation, of the integration, or both. In any case, it
requires firmware support (DT, ACPI). All that is only available quite
late, and moving it earlier is not easily doable.

> Another approach is to modify sched_clock() in
> kernel/time/sched_clock.c to never return backward value during boot.
> 
> 1. Rename  current implementation of sched_clock() to sched_clock_raw()
> 2. New sched_clock() would look like this:
> 
> u64 sched_clock(void)
> {
>    if (static_branch(early_unstable_clock))
>       return sched_clock_unstable();
>    else
>       return sched_clock_raw();
> }
> 
> 3. sched_clock_unstable() would look like this:
> 
> u64 sched_clock_unstable(void)
> {
> again:
>   static u64 old_clock;
>   u64 new_clock = sched_clock_raw();
>   static u64 old_clock_read =   READ_ONCE(old_clock);
>   /* It is ok if time does not progress, but don't allow to go backward */
>   if (new_clock < old_clock_read)
>     return old_clock_read;
>    /* update the old_clock value */
>    if (cmpxchg64(&old_clock, old_clock_read, new_clock) != old_clock_read)
>       goto again;
>    return new_clock;
> }

You now have an "unstable" clock that is only allowed to move forward,
until you switch to the real one. And at handover time, anything can
happen.

It is one thing to allow for the time stamping to be imprecise. But
imposing the same behaviour on other parts of the kernel that have so
far relied on a strictly monotonic sched_clock feels like a bad idea.

What I'm proposing is that we allow architectures to override the hard
tie between local_clock/sched_clock and kernel log time stamping, with
the default being of course what we have today. This gives a clean
separation between the two when the architecture needs to delay the
availability of sched_clock until implementation requirements are
discovered. It also keeps sched_clock simple and efficient.

To illustrate what I'm trying to argue for, I've pushed out a couple
of proof of concept patches here[1]. I've briefly tested them in a
guest, and things seem to work OK.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=arm64/tsclock

-- 
Jazz is not dead, it just smells funny.


* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2019-01-04 15:39       ` Marc Zyngier
@ 2019-01-04 16:23         ` Pavel Tatashin
  2019-01-04 16:49           ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Tatashin @ 2019-01-04 16:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: catalin.marinas, Will Deacon, Andrew Morton, rppt, Michal Hocko,
	Ard Biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, LKML

Hi Marc,

Thank you for taking a look at this; please see my replies below.

> > I think, given that on other platforms sched_clock() is already used
> > early, it is not a good idea to invent a different clock just for time
> > stamps.
>
> Square pegs vs round holes. Mimicking other architectures isn't always
> the right thing to do when faced with a different problem. We put a
> lot of effort in working around timer errata for a good reason, and
> feeding the rest of the system bogus timing information doesn't sound
> great.
>
> > We could limit arm64 approach only for chips where cntvct_el0 is
> > working: i.e. frequency is known, and the clock is stable, meaning
> > cannot go backward. Perhaps we would start early clock a little later,
> > but at least it will be available for the sane chips. The only
> > question, where during boot time this is known.
>
> How do you propose we do that? Defective timers can be a property of
> the implementation, of the integration, or both. In any case, it
> requires firmware support (DT, ACPI). All that is only available quite
> late, and moving it earlier is not easily doable.

OK, but could we at least whitelist something early, with the
expectation that future chips won't be bogus?

> > Another approach is to modify sched_clock() in
> > kernel/time/sched_clock.c to never return backward value during boot.
> >
> > 1. Rename  current implementation of sched_clock() to sched_clock_raw()
> > 2. New sched_clock() would look like this:
> >
> > u64 sched_clock(void)
> > {
> >    if (static_branch(early_unstable_clock))
> >       return sched_clock_unstable();
> >    else
> >       return sched_clock_raw();
> > }
> >
> > 3. sched_clock_unstable() would look like this:
> >
> > u64 sched_clock_unstable(void)
> > {
> > again:
> >   static u64 old_clock;
> >   u64 new_clock = sched_clock_raw();
> >   static u64 old_clock_read =   READ_ONCE(old_clock);
> >   /* It is ok if time does not progress, but don't allow to go backward */
> >   if (new_clock < old_clock_read)
> >     return old_clock_read;
> >    /* update the old_clock value */
> >    if (cmpxchg64(&old_clock, old_clock_read, new_clock) != old_clock_read)
> >       goto again;
> >    return new_clock;
> > }
>
> You now have an "unstable" clock that is only allowed to move forward,
> until you switch to the real one. And at handover time, anything can
> happen.
>
> It is one thing to allow for the time stamping to be imprecise. But
> imposing the same behaviour on other parts of the kernel that have so
> far relied on a strictly monotonic sched_clock feels like a bad idea.

sched_clock() will still be strictly monotonic. During switch-over we
will guarantee to continue from where the early clock left off.

>
> What I'm proposing is that we allow architectures to override the hard
> tie between local_clock/sched_clock and kernel log time stamping, with
> the default being of course what we have today. This gives a clean
> separation between the two when the architecture needs to delay the
> availability of sched_clock until implementation requirements are
> discovered. It also keeps sched_clock simple and efficient.
>
> To illustrate what I'm trying to argue for, I've pushed out a couple
> of proof of concept patches here[1]. I've briefly tested them in a
> guest, and things seem to work OK.

What I am worried about is that decoupling time stamps from
sched_clock() will cause uptime and other commands that show boot time
not to correlate with the timestamps in dmesg. For them to correlate,
we would still have to switch timestamp_clock() back to local_clock()
once we are done with early boot, which brings us back to using a
temporarily unstable clock as I proposed above, just without adding an
architectural hook for it. Again, we would need to solve the problem
of time continuity during the switch-over, which is not a hard problem
to solve: we already do it in sched_clock.c, and every time the
clocksource changes.

During the early boot time stamps project for x86 we were extra
careful to make sure that they stayed the same.

Thank you,
Pasha


* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2019-01-04 16:23         ` Pavel Tatashin
@ 2019-01-04 16:49           ` Marc Zyngier
  2019-01-04 20:54             ` Pavel Tatashin
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2019-01-04 16:49 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: catalin.marinas, Will Deacon, Andrew Morton, rppt, Michal Hocko,
	Ard Biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, LKML

On 04/01/2019 16:23, Pavel Tatashin wrote:

Hi Pavel,

>>> We could limit arm64 approach only for chips where cntvct_el0 is
>>> working: i.e. frequency is known, and the clock is stable, meaning
>>> cannot go backward. Perhaps we would start early clock a little later,
>>> but at least it will be available for the sane chips. The only
>>> question, where during boot time this is known.
>>
>> How do you propose we do that? Defective timers can be a property of
>> the implementation, of the integration, or both. In any case, it
>> requires firmware support (DT, ACPI). All that is only available quite
>> late, and moving it earlier is not easily doable.
> 
> OK, but could we at least whitelist something early, with the
> expectation that future chips won't be bogus?

Just as I wish we had universal world peace. Timer integration is
probably the most broken thing in the whole ARM ecosystem (clock
domains, Gray code and general incompetence do get in the way). And as I
said above, detecting a broken implementation usually relies on some
firmware indication, which is only available at a later time (and I'm
trying really hard to keep the errata handling in the timer code).

>>> Another approach is to modify sched_clock() in
>>> kernel/time/sched_clock.c to never return backward value during boot.
>>>
>>> 1. Rename  current implementation of sched_clock() to sched_clock_raw()
>>> 2. New sched_clock() would look like this:
>>>
>>> u64 sched_clock(void)
>>> {
>>>    if (static_branch(early_unstable_clock))
>>>       return sched_clock_unstable();
>>>    else
>>>       return sched_clock_raw();
>>> }
>>>
>>> 3. sched_clock_unstable() would look like this:
>>>
>>> u64 sched_clock_unstable(void)
>>> {
>>> again:
>>>   static u64 old_clock;
>>>   u64 new_clock = sched_clock_raw();
>>>   static u64 old_clock_read =   READ_ONCE(old_clock);
>>>   /* It is ok if time does not progress, but don't allow to go backward */
>>>   if (new_clock < old_clock_read)
>>>     return old_clock_read;
>>>    /* update the old_clock value */
>>>    if (cmpxchg64(&old_clock, old_clock_read, new_clock) != old_clock_read)
>>>       goto again;
>>>    return new_clock;
>>> }
>>
>> You now have an "unstable" clock that is only allowed to move forward,
>> until you switch to the real one. And at handover time, anything can
>> happen.
>>
>> It is one thing to allow for the time stamping to be imprecise. But
>> imposing the same behaviour on other parts of the kernel that have so
>> far relied on a strictly monotonic sched_clock feels like a bad idea.
> 
> sched_clock() will still be strictly monotonic. During switch over we
> > will guarantee to continue from where the early clock left off.

Not quite. There is at least one broken integration that results in
large, spurious jumps ahead. If one of these jumps happens during the
"unstable" phase, we'll only return old_clock. At some point, we switch
early_unstable_clock to be false, as we've now properly initialized the
timer and found the appropriate workaround. We'll now return a much
smaller value. sched_clock continuity doesn't seem to apply here, as
you're not registering a new sched_clock (or at least that's not how I
understand your code above).

>> What I'm proposing is that we allow architectures to override the hard
>> tie between local_clock/sched_clock and kernel log time stamping, with
>> the default being of course what we have today. This gives a clean
>> separation between the two when the architecture needs to delay the
>> availability of sched_clock until implementation requirements are
>> discovered. It also keeps sched_clock simple and efficient.
>>
>> To illustrate what I'm trying to argue for, I've pushed out a couple
>> of proof of concept patches here[1]. I've briefly tested them in a
>> guest, and things seem to work OK.
> 
> What I am worried is that decoupling time stamps from the
> sched_clock() will cause uptime and other commands that show boot time
> not to correlate with timestamps in dmesg with these changes. For them
> to correlate we would still have to have a switch back to
> local_clock() in timestamp_clock() after we are done with early boot,
> which brings us back to using a temporarily unstable clock that I
> proposed above but without adding an architectural hook for it. Again,
> we would need to solve the problem of time continuity during switch
> over, which is not a hard problem to solve, as we do it already in
> sched_clock.c, and everytime clocksource changes.
> 
> During early boot time stamps project for x86 we were extra careful to
> make sure that they stay the same.

I can see two ways to achieve this requirement:

- we allow timestamp_clock to fall-back to sched_clock once it becomes
non-zero. It has the drawback of resetting the time stamping in the
middle of the boot, which isn't great.

- we allow sched_clock to inherit the timestamp_clock value instead of
starting at zero like it does now. Not sure if that breaks anything, but
that's worth trying (it should be a matter of setting new_epoch to zero
in sched_clock_register).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3 3/3] arm64: Early boot time stamps
  2019-01-04 16:49           ` Marc Zyngier
@ 2019-01-04 20:54             ` Pavel Tatashin
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Tatashin @ 2019-01-04 20:54 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: catalin.marinas, Will Deacon, Andrew Morton, rppt, Michal Hocko,
	Ard Biesheuvel, andrew.murray, james.morse, sboyd,
	linux-arm-kernel, LKML

> > sched_clock() will still be strictly monotonic. During switch over we
> > will guarantee to continue from where the early clock left off.
>
> Not quite. There is at least one broken integration that results in
> large, spurious jumps ahead. If one of these jumps happens during the
> "unstable" phase, we'll only return old_clock. At some point, we switch
> early_unstable_clock to be false, as we've now properly initialized the
> timer and found the appropriate workaround. We'll now return a much
> smaller value. sched_clock continuity doesn't seem to apply here, as
> you're not registering a new sched_clock (or at least that's not how I
> understand your code above).
>
> >> What I'm proposing is that we allow architectures to override the hard
> >> tie between local_clock/sched_clock and kernel log time stamping, with
> >> the default being of course what we have today. This gives a clean
> >> separation between the two when the architecture needs to delay the
> >> availability of sched_clock until implementation requirements are
> >> discovered. It also keeps sched_clock simple and efficient.
> >>
> >> To illustrate what I'm trying to argue for, I've pushed out a couple
> >> of proof of concept patches here[1]. I've briefly tested them in a
> >> guest, and things seem to work OK.
> >
> > What I am worried is that decoupling time stamps from the
> > sched_clock() will cause uptime and other commands that show boot time
> > not to correlate with timestamps in dmesg with these changes. For them
> > to correlate we would still have to have a switch back to
> > local_clock() in timestamp_clock() after we are done with early boot,
> > which brings us back to using a temporarily unstable clock that I
> > proposed above but without adding an architectural hook for it. Again,
> > we would need to solve the problem of time continuity during switch
> > over, which is not a hard problem to solve, as we do it already in
> > sched_clock.c, and everytime clocksource changes.
> >
> > During early boot time stamps project for x86 we were extra careful to
> > make sure that they stay the same.
>
> I can see two ways to achieve this requirement:
>
> - we allow timestamp_clock to fall-back to sched_clock once it becomes
> non-zero. It has the drawback of resetting the time stamping in the
> middle of the boot, which isn't great.

Right, I'd like those timestamps to be continuous.

>
> - we allow sched_clock to inherit the timestamp_clock value instead of
> starting at zero like it does now. Not sure if that breaks anything, but
> that's worth trying (it should be a matter of setting new_epoch to zero
> in sched_clock_register).

This is what I am proposing above with my approach: inherit the last
value of the unstable sched_clock before switching to the permanent
one. Please see [1] for how I implemented it, and we can discuss which
is better, using a timestamp hook in printk or what I am suggesting.

[1] https://github.com/soleen/time_arm64/commits/time
"sched_clock: generic unstable clock" is a new patch; the other patches
are the ones sent out in this series. Because we use the last
sched_clock() value when calculating the epoch in
sched_clock_register(), we guarantee monotonicity during the clock
change.

Thank you,
Pasha


end of thread, other threads:[~2019-01-04 20:54 UTC | newest]

Thread overview: 10+ messages
2018-12-26 16:45 [PATCH v3 0/3] Early boot time stamps for arm64 Pavel Tatashin
2018-12-26 16:45 ` [PATCH v3 1/3] arm_arch_timer: add macro for timer nbits Pavel Tatashin
2018-12-26 16:45 ` [PATCH v3 2/3] arm64: vdso: Use ARCH_TIMER_NBITS to calculate mask Pavel Tatashin
2018-12-26 16:45 ` [PATCH v3 3/3] arm64: Early boot time stamps Pavel Tatashin
2019-01-03 10:51   ` Marc Zyngier
2019-01-03 19:58     ` Pavel Tatashin
2019-01-04 15:39       ` Marc Zyngier
2019-01-04 16:23         ` Pavel Tatashin
2019-01-04 16:49           ` Marc Zyngier
2019-01-04 20:54             ` Pavel Tatashin
