All of lore.kernel.org
 help / color / mirror / Atom feed
* On NTP, RTCs and accurately setting their time
@ 2017-09-20 11:21 Russell King - ARM Linux
  2017-09-20 13:23 ` Russell King - ARM Linux
  2017-09-20 16:22 ` Jason Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-20 11:21 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

It's common for systems to be time synchronised using programs such as
chrony or ntpd.  In such situations, when we are properly synchronised,
the kernel writes the current time to the RTC every 11 seconds.

However, assumptions are made about the RTC:

1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
   time at around 500ms into the second.

2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
   then we want to set the _next_ second.

This leads to RTCs being set with an offset time, where the offset
depends on the RTC.  For example, the PCF8523 ends up reliably set
around 970ms in the future:

RTC Time: 19-09-2017 14:27:51
System Time was:     14:27:50.031

What this means is that when the RTC ticked from 50 to 51 seconds,
system time was at 50.031s.  Hence, the RTC in this case is 969ms
in the future.

For Armada 388, the situation is different:

RTC Time: 19-09-2017 14:48:17
System Time was:     14:48:16.521

Here, the RTC is being set 479ms in the future.

The SNVS RTC in imx6 is different again, although I have no definitive
figures yet.

Why does this matter - if you introduce a 1s offset in system time
(eg, you reboot a machine running NTP) then it takes at least four hours
after the reboot for ntpd to settle down, with the PPM swinging faster
and slower than normal as it compensates for the offset.  This means if
you are using a Linux system for time measurement purposes, it is
useless for those four hours.

Being able to accurately save and restore system time helps to reduce
the disruptive effect of a reboot.

Currently, there are no controls in the kernel over this mechanism - if
you're running a multi-platform kernel which has support for writing the
RTC if ntp-sync'd, then you're stuck with the kernel writing the RTC
every 11 seconds.  Not only is this detrimental due to the enforced RTC
dependent offset, but it also means that if you want to trim your RTC
to a synchronised source of time (for which the kernel must not write
the RTC, but you do want it to ntp sync), the only way to do it is to
either modify the kernel, or disable CONFIG_RTC_SYSTOHC in the multi-
platform kernel.


The kernel can do better - elimination of the rounding-up in systohc.c
gives a better result for PCF8523:

RTC Time: 19-09-2017 16:30:38
System Time was:     16:30:38.034

and the remaining offset can be reduced by adjusting the 500ms offset
in ntp.c to 470ms.  This is specific to the offset that PCF8523 wants.
We know that MC146818 RTCs found in PCs want a 500ms offset (without
the addition of one second that systohc.c does.)  The Armada 388 RTC
wants to be set on the exact second.

We need some way to cater for these differing requirements (eg, RTC
drivers provide some properties to rtclib and the ntp code to describe
how long it takes for a written time to "take"), or we decide (as I
think was decided in the past) that the kernel should not be setting
the RTC, but userspace should be responsible for performing that
function.  Either way, we need to know about these RTC specific
properties in order to set their time accurately.

One of the issues here, however, is that RTC datasheets do not give this
information - this can only be found out by experimentation and
measurement.

Currently I'm using the hacky patch below to be able to (a) disable the
regular RTC write during runtime, so I can measure the RTC drift and
trim it, and (b) provide a knob to adjust how far past the second the
RTC will receive its write.  For a properly trimmed PCF8523 (measured
over about 12 hours to have 0.1ppm drift - which is better than its
associated crystal is rated for), setting this knob to 470ms results
in the following (from two samples this morning):

RTC Time: 20-09-2017 10:41:30
System Time was:     10:41:30.000
RTC Time: 20-09-2017 11:17:38
System Time was:     11:17:38.000

There's probably some noise in the setting of this due to the workqueue,
but getting it within 10ms is definitely an improvement over being
almost a second out.

So, the question is... how should these differences in rtc requirements
be handled?

 drivers/rtc/systohc.c |  6 +++---
 kernel/sysctl.c       | 21 +++++++++++++++++++++
 kernel/time/ntp.c     |  9 ++++++---
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06b..df804597de71 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -26,10 +26,7 @@ int rtc_set_ntp_time(struct timespec64 now)
 	struct rtc_time tm;
 	int err = -ENODEV;
 
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
+	rtc_time64_to_tm(now.tv_sec, &tm);
 
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
 	if (rtc) {
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6648fbbb8157..c2ce802f7ab9 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -98,6 +98,9 @@
 
 #if defined(CONFIG_SYSCTL)
 
+extern unsigned int sysctl_ntp_rtc_offset;
+extern unsigned int sysctl_ntp_rtc_sync;
+
 /* External variables not in a header file. */
 extern int suid_dumpable;
 #ifdef CONFIG_COREDUMP
@@ -303,6 +306,24 @@ static int max_extfrag_threshold = 1000;
 #endif
 
 static struct ctl_table kern_table[] = {
+#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
+	{
+		.procname	= "ntp_rtc_offset",
+		.data		= &sysctl_ntp_rtc_offset,
+		.maxlen		= sizeof(sysctl_ntp_rtc_offset),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec,
+	},
+	{
+		.procname	= "ntp_rtc_sync",
+		.data		= &sysctl_ntp_rtc_sync,
+		.maxlen		= sizeof(sysctl_ntp_rtc_sync),
+		.mode		= 0644,
+		.proc_handler	= proc_douintvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{
 		.procname	= "sched_child_runs_first",
 		.data		= &sysctl_sched_child_runs_first,
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc53140..674c45d30561 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -508,6 +508,9 @@ int __weak update_persistent_clock64(struct timespec64 now64)
 #endif
 
 #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
+unsigned long sysctl_ntp_rtc_offset = NSEC_PER_SEC / 2;
+unsigned int sysctl_ntp_rtc_sync = true;
+
 static void sync_cmos_clock(struct work_struct *work);
 
 static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
@@ -526,7 +529,7 @@ static void sync_cmos_clock(struct work_struct *work)
 	 * may not expire at the correct time.  Thus, we adjust...
 	 * We want the clock to be within a couple of ticks from the target.
 	 */
-	if (!ntp_synced()) {
+	if (!ntp_synced() || !sysctl_ntp_rtc_sync) {
 		/*
 		 * Not synced, exit, do not restart a timer (if one is
 		 * running, let it run out).
@@ -535,7 +538,7 @@ static void sync_cmos_clock(struct work_struct *work)
 	}
 
 	getnstimeofday64(&now);
-	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
+	if (abs(now.tv_nsec - sysctl_ntp_rtc_offset) <= tick_nsec * 5) {
 		struct timespec64 adjust = now;
 
 		fail = -ENODEV;
@@ -551,7 +554,7 @@ static void sync_cmos_clock(struct work_struct *work)
 #endif
 	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
+	next.tv_nsec = sysctl_ntp_rtc_offset - now.tv_nsec - (TICK_NSEC / 2);
 	if (next.tv_nsec <= 0)
 		next.tv_nsec += NSEC_PER_SEC;
 


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync@8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 11:21 On NTP, RTCs and accurately setting their time Russell King - ARM Linux
@ 2017-09-20 13:23 ` Russell King - ARM Linux
  2017-09-20 16:22 ` Jason Gunthorpe
  1 sibling, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-20 13:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote:
> Hi,
> 
> It's common for systems to be time synchronised using programs such as
> chrony or ntpd.  In such situations, when we are properly synchronised,
> the kernel writes the current time to the RTC every 11 seconds.

That should be 11 minutes.

> 
> However, assumptions are made about the RTC:
> 
> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
>    time at around 500ms into the second.
> 
> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
>    then we want to set the _next_ second.
> 
> This leads to RTCs being set with an offset time, where the offset
> depends on the RTC.  For example, the PCF8523 ends up reliably set
> around 970ms in the future:
> 
> RTC Time: 19-09-2017 14:27:51
> System Time was:     14:27:50.031
> 
> What this means is that when the RTC ticked from 50 to 51 seconds,
> system time was at 50.031s.  Hence, the RTC in this case is 969ms
> in the future.
> 
> For Armada 388, the situation is different:
> 
> RTC Time: 19-09-2017 14:48:17
> System Time was:     14:48:16.521
> 
> Here, the RTC is being set 479ms in the future.
> 
> The SNVS RTC in imx6 is different again, although I have no definitive
> figures yet.
> 
> Why does this matter - if you introduce a 1s offset in system time
> (eg, you reboot a machine running NTP) then it takes at least four hours
> after the reboot for ntpd to settle down, with the PPM swinging faster
> and slower than normal as it compensates for the offset.  This means if
> you are using a Linux system for time measurement purposes, it is
> useless for those four hours.
> 
> Being able to accurately save and restore system time helps to reduce
> the disruptive effect of a reboot.
> 
> Currently, there are no controls in the kernel over this mechanism - if
> you're running a multi-platform kernel which has support for writing the
> RTC if ntp-sync'd, then you're stuck with the kernel writing the RTC
> every 11 seconds.  Not only is this detrimental due to the enforced RTC

and here too.

> dependent offset, but it also means that if you want to trim your RTC
> to a synchronised source of time (for which the kernel must not write
> the RTC, but you do want it to ntp sync), the only way to do it is to
> either modify the kernel, or disable CONFIG_RTC_SYSTOHC in the multi-
> platform kernel.
> 
> 
> The kernel can do better - elimination of the rounding-up in systohc.c
> gives a better result for PCF8523:
> 
> RTC Time: 19-09-2017 16:30:38
> System Time was:     16:30:38.034
> 
> and the remaining offset can be reduced by adjusting the 500ms offset
> in ntp.c to 470ms.  This is specific to the offset that PCF8523 wants.
> We know that MC146818 RTCs found in PCs want a 500ms offset (without
> the addition of one second that systohc.c does.)  The Armada 388 RTC
> wants to be set on the exact second.
> 
> We need some way to cater for these differing requirements (eg, RTC
> drivers provide some properties to rtclib and the ntp code to describe
> how long it takes for a written time to "take"), or we decide (as I
> think was decided in the past) that the kernel should not be setting
> the RTC, but userspace should be responsible for performing that
> function.  Either way, we need to know about these RTC specific
> properties in order to set their time accurately.
> 
> One of the issues here, however, is that RTC datasheets do not give this
> information - this can only be found out by experimentation and
> measurement.
> 
> Currently I'm using the hacky patch below to be able to (a) disable the
> regular RTC write during runtime, so I can measure the RTC drift and
> trim it, and (b) provide a knob to adjust how far past the second the
> RTC will receive its write.  For a properly trimmed PCF8523 (measured
> over about 12 hours to have 0.1ppm drift - which is better than its
> associated crystal is rated for), setting this knob to 470ms results
> in the following (from two samples this morning):
> 
> RTC Time: 20-09-2017 10:41:30
> System Time was:     10:41:30.000
> RTC Time: 20-09-2017 11:17:38
> System Time was:     11:17:38.000
> 
> There's probably some noise in the setting of this due to the workqueue,
> but getting it within 10ms is definitely an improvement over being
> almost a second out.
> 
> So, the question is... how should these differences in rtc requirements
> be handled?
> 
>  drivers/rtc/systohc.c |  6 +++---
>  kernel/sysctl.c       | 21 +++++++++++++++++++++
>  kernel/time/ntp.c     |  9 ++++++---
>  3 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
> index b4a68ffcd06b..df804597de71 100644
> --- a/drivers/rtc/systohc.c
> +++ b/drivers/rtc/systohc.c
> @@ -26,10 +26,7 @@ int rtc_set_ntp_time(struct timespec64 now)
>  	struct rtc_time tm;
>  	int err = -ENODEV;
>  
> -	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
> -		rtc_time64_to_tm(now.tv_sec, &tm);
> -	else
> -		rtc_time64_to_tm(now.tv_sec + 1, &tm);
> +	rtc_time64_to_tm(now.tv_sec, &tm);
>  
>  	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
>  	if (rtc) {
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 6648fbbb8157..c2ce802f7ab9 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -98,6 +98,9 @@
>  
>  #if defined(CONFIG_SYSCTL)
>  
> +extern unsigned int sysctl_ntp_rtc_offset;
> +extern unsigned int sysctl_ntp_rtc_sync;
> +
>  /* External variables not in a header file. */
>  extern int suid_dumpable;
>  #ifdef CONFIG_COREDUMP
> @@ -303,6 +306,24 @@ static int max_extfrag_threshold = 1000;
>  #endif
>  
>  static struct ctl_table kern_table[] = {
> +#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
> +	{
> +		.procname	= "ntp_rtc_offset",
> +		.data		= &sysctl_ntp_rtc_offset,
> +		.maxlen		= sizeof(sysctl_ntp_rtc_offset),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec,
> +	},
> +	{
> +		.procname	= "ntp_rtc_sync",
> +		.data		= &sysctl_ntp_rtc_sync,
> +		.maxlen		= sizeof(sysctl_ntp_rtc_sync),
> +		.mode		= 0644,
> +		.proc_handler	= proc_douintvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +#endif
>  	{
>  		.procname	= "sched_child_runs_first",
>  		.data		= &sysctl_sched_child_runs_first,
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index edf19cc53140..674c45d30561 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -508,6 +508,9 @@ int __weak update_persistent_clock64(struct timespec64 now64)
>  #endif
>  
>  #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
> +unsigned long sysctl_ntp_rtc_offset = NSEC_PER_SEC / 2;
> +unsigned int sysctl_ntp_rtc_sync = true;
> +
>  static void sync_cmos_clock(struct work_struct *work);
>  
>  static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
> @@ -526,7 +529,7 @@ static void sync_cmos_clock(struct work_struct *work)
>  	 * may not expire at the correct time.  Thus, we adjust...
>  	 * We want the clock to be within a couple of ticks from the target.
>  	 */
> -	if (!ntp_synced()) {
> +	if (!ntp_synced() || !sysctl_ntp_rtc_sync) {
>  		/*
>  		 * Not synced, exit, do not restart a timer (if one is
>  		 * running, let it run out).
> @@ -535,7 +538,7 @@ static void sync_cmos_clock(struct work_struct *work)
>  	}
>  
>  	getnstimeofday64(&now);
> -	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
> +	if (abs(now.tv_nsec - sysctl_ntp_rtc_offset) <= tick_nsec * 5) {
>  		struct timespec64 adjust = now;
>  
>  		fail = -ENODEV;
> @@ -551,7 +554,7 @@ static void sync_cmos_clock(struct work_struct *work)
>  #endif
>  	}
>  
> -	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
> +	next.tv_nsec = sysctl_ntp_rtc_offset - now.tv_nsec - (TICK_NSEC / 2);
>  	if (next.tv_nsec <= 0)
>  		next.tv_nsec += NSEC_PER_SEC;
>  
> 
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
> According to speedtest.net: 8.21Mbps down 510kbps up
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 11:21 On NTP, RTCs and accurately setting their time Russell King - ARM Linux
  2017-09-20 13:23 ` Russell King - ARM Linux
@ 2017-09-20 16:22 ` Jason Gunthorpe
  2017-09-20 16:51   ` Russell King - ARM Linux
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-20 16:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote:

> However, assumptions are made about the RTC:
> 
> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
>    time at around 500ms into the second.
> 
> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
>    then we want to set the _next_ second.

I looked at these issues when I did the sys to HC patches and I
concluced the first problem was that the RTC read functions generally
did not return sub second resolution, either in sense of directly
returning ts_nsec, or the sense of delaying the read until a clock
tick over event.

Eg if I repeatedly booted my various embedded ARM/PPC systems, and
recorded the time offset from NTP, I got a fairly random distribution
of offsets.

Given that we couldn't seem to read back sub second resolution on my
systems, storing it more accurately did not seem important.

I think we also did some experiments with a few of the RTCs we were
using and some of them did not adjust the seconds clock phase on
write, so they seemed incapable of storing sub second data anyhow.

So.. My feeling was that we'd need driver support in each RTC driver
to enable sub section resolution.

Do you know differently?

Our pragmatic solution in our products was to have the initial time
sync from NTP step the clock even if the offset is small.

> So, the question is... how should these differences in rtc requirements
> be handled?

I think patch wise, this is something I would rather see handled
internally via the drivers and perhaps with input from DT, not via
sysctl knobs.

The HW driver should know how to read and write with sub second
resolution. If it works best with a certain value in the ts_nsec
field, then it should set something inside rtc_chip that causes the
systohc code to try and call it with that tv_nsec.

It would probably also make sense to add new ntp ops for sub second
get/set that includes the full timespec. This way the RTC driver can
provide the right adjustments and we can get rid of the +1 in
rtc_set_ntp_time and the +0.5 in rtc_hctosys for sub sec aware
drivers..

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 16:22 ` Jason Gunthorpe
@ 2017-09-20 16:51   ` Russell King - ARM Linux
  2017-09-20 17:16     ` Jason Gunthorpe
  2017-09-20 22:45     ` Jason Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-20 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 10:22:08AM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote:
> 
> > However, assumptions are made about the RTC:
> > 
> > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
> >    time at around 500ms into the second.
> > 
> > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
> >    then we want to set the _next_ second.
> 
> I looked at these issues when I did the sys to HC patches and I
> concluced the first problem was that the RTC read functions generally
> did not return sub second resolution, either in sense of directly
> returning ts_nsec, or the sense of delaying the read until a clock
> tick over event.

The boot time problem can be resolved by using hwclock to set the
system time in userspace - there are distros that do exactly that.
For example, debian has a udev rule:

# Set the System Time from the Hardware Clock and set the kernel's timezone
# value to the local timezone when the kernel clock module is loaded.

KERNEL=="rtc0", RUN+="/lib/udev/hwclock-set $root/$name"

and that script uses hwclock to set the system time from the RTC.
hwclock doesn't just read the RTC and set the system time, it tries
to read the RTC at a whole second, and apply any known RTC drift
correction.

So I wouldn't worry about the kernel's RTC read being inaccurate,
that's been worked around in distros for years.

Embedded distros may not care so much about this (since they care
more about boot speed) except if they're reliant on having correct
time, in which case they may /choose/ to use hwclock in a blocking
manner to ensure that system time is accurately set from the RTC.

However, all that effort is for nowt if the RTC isn't set correctly
in the first place.

> I think we also did some experiments with a few of the RTCs we were
> using and some of them did not adjust the seconds clock phase on
> write, so they seemed incapable of storing sub second data anyhow.

I can imagine that there are RTCs out there which do not reset the
seconds pre-scaler, but just because there are some like that is no
reason to cripple those which are more inteligent about it.

> So.. My feeling was that we'd need driver support in each RTC driver
> to enable sub section resolution.
> 
> Do you know differently?

See above...

> Our pragmatic solution in our products was to have the initial time
> sync from NTP step the clock even if the offset is small.

That's all very well, but consider when your time source is GPS and
you're not on a network connection that would allow ntpdate to do
any better (eg, a 3G mobile data card...)  I have exactly that setup
with a remote monitoring system.  If I reboot the gateway with the
3G and GPS, NTP makes big PPM adjustments while trying to correct for
the 970ms time shift, and that madness is transmitted by ntpd to the
stratum 2 servers.

> > So, the question is... how should these differences in rtc requirements
> > be handled?
> 
> I think patch wise, this is something I would rather see handled
> internally via the drivers and perhaps with input from DT, not via
> sysctl knobs.

I sort-of agree as far as the time offset information goes, but there's
a complication that we only open the RTC to set the time at the point in
time that we want to set it - while the RTC is closed, the RTC driver
module could be removed and replaced by a different RTC driver which
replaces the existing device.

Don't think that's not possible, there are boards out there with multiple
RTCs on them, so its entirely possible that the wrong RTC could be
selected with random module probe ordering and need manual resolution.
For example, the system I mention above has the built-in iMX6 SVNS RTC
which gets powered down when power is removed, and a PCF8523 RTC which
doesn't... and there's variants of the board where the PCF8523 isn't
fitted, so users are reliant on the iMX6 RTC for cross-reboot time.


The patch is not meant to be a final solution, so criticising it for
using sysctls is not appropriate - it is firstly to prove the point that
we are in fact able to correctly set RTCs, and secondly (as I explained
in the bits you cut) to provide a knob to turn off the kernel's automatic
RTC setting so it's possible to accurately trim the RTC.

Many RTCs contain registers that allow fine trimming of their tick rate,
and if you care about the time keeping of your RTC, it's something that
needs to be calibrated against a known correct time source.  Having the
kernel repeatedly set the RTC every 11 minutes while you're trying to
measure the RTCs drift over a period of 12 hours is not on - and in
order to make such a measurement, you want the machine to be NTP
synchronised.  That's the exact situation that triggers kernels to
periodically write the time to the RTC.

So, we _do_ need a knob to turn that kernel timekeeping facility on and
off in addition to the "are we NTP sync'd" status.

> The HW driver should know how to read and write with sub second
> resolution. If it works best with a certain value in the ts_nsec
> field, then it should set something inside rtc_chip that causes the
> systohc code to try and call it with that tv_nsec.

The problem is deeper than the systohc.c code - the timing of the call
made into the systohc.c code is decided by kernel/time/ntp.c, and
currently is within a tick of 500ms past the second.

> It would probably also make sense to add new ntp ops for sub second
> get/set that includes the full timespec. This way the RTC driver can
> provide the right adjustments and we can get rid of the +1 in
> rtc_set_ntp_time and the +0.5 in rtc_hctosys for sub sec aware
> drivers..

Do we have any sub-second aware drivers?  None of my RTCs are, and I
don't think it's fair for RTC drivers to sleep when getting a request
to set the time.

The userspace API doesn't do that, and the workqueue involved in
setting NTP time is probably run using shared system_power_efficient_wq
resources, so blocking there will be detrimental to other works queued
on that.

Maybe the NTP code should call into RTCs to enable/disable the time
setting feature and leave RTC drivers to do that, but that seems to me
like a lot of code in each driver as well as something of a layering
issue.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 16:51   ` Russell King - ARM Linux
@ 2017-09-20 17:16     ` Jason Gunthorpe
  2017-09-20 22:45     ` Jason Gunthorpe
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-20 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 20, 2017 at 10:22:08AM -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 20, 2017 at 12:21:52PM +0100, Russell King - ARM Linux wrote:
> > 
> > > However, assumptions are made about the RTC:
> > > 
> > > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
> > >    time at around 500ms into the second.
> > > 
> > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
> > >    then we want to set the _next_ second.
> > 
> > I looked at these issues when I did the sys to HC patches and I
> > concluced the first problem was that the RTC read functions generally
> > did not return sub second resolution, either in sense of directly
> > returning ts_nsec, or the sense of delaying the read until a clock
> > tick over event.
> 
> The boot time problem can be resolved by using hwclock to set the
> system time in userspace - there are distros that do exactly that.
> For example, debian has a udev rule:

Okay, that part I didn't know, thanks.

> > I think patch wise, this is something I would rather see handled
> > internally via the drivers and perhaps with input from DT, not via
> > sysctl knobs.
> 
> I sort-of agree as far as the time offset information goes, but there's
> a complication that we only open the RTC to set the time at the point in
> time that we want to set it - while the RTC is closed, the RTC driver
> module could be removed and replaced by a different RTC driver which
> replaces the existing device.

Yes, the ntp code would have to open the rtc initially and record the
offset. Each time it goes to write it would have to check the offset
it got against what the device wants and, if necessary, reschedule
the update if the ts_nsec is not correct. That should handle
infrequent dynamic changes..

> So, we _do_ need a knob to turn that kernel timekeeping facility on and
> off in addition to the "are we NTP sync'd" status.

Sure, that makes sense as a sysctl

> > The HW driver should know how to read and write with sub second
> > resolution. If it works best with a certain value in the ts_nsec
> > field, then it should set something inside rtc_chip that causes the
> > systohc code to try and call it with that tv_nsec.
> 
> The problem is deeper than the systohc.c code - the timing of the call
> made into the systohc.c code is decided by kernel/time/ntp.c, and
> currently is within a tick of 500ms past the second.

Right, I was thinking about the entire systohc process, including the
part in ntp.c

> Do we have any sub-second aware drivers?  None of my RTCs are, and I
> don't think it's fair for RTC drivers to sleep when getting a request
> to set the time.

I would call any RTC that can change the phase of the seconds clock as
sub-second capable. Currently, I think that only the CMOS driver is
really a sub-second aware driver, and the rest of the RTC stack has been
sort of hardcoded around what it does..

My thinking was adding a new sub-second entry point would let us keep
the existing assumptions above while having a cleaner entry point that
allows fixing the RTC drivers gradually. Eg you have tested PCF, so it
could use the subsecond entry and define the proper ts_nsec value/etc.

> The userspace API doesn't do that, and the workqueue involved in
> setting NTP time is probably run using shared system_power_efficient_wq
> resources, so blocking there will be detrimental to other works queued
> on that.

I think the NTP path should be non blocking, and rely on the ntp.c
code calling into the driver with the desired tv_nsec.

However, it also makes sense to provide a blocking sub-second user
space API that would have the required sleep to make setting the time
work properly. This way we can hide the desired tv_nsec implementation
detail from userspace completely.

eg I assume hwclock also has the built in 0.5s assumption when trying
to write to the RTC?

The same would be true for read, I think it would be cleaner to have a
kernel uapi that reads the time with subsecond accuracy than to have
hwclock implement a spinning loop in userspace. That gives drivers
more options in how they measure the second tick over. The common code
would just provide a simple loop like we see in hwclock.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 16:51   ` Russell King - ARM Linux
  2017-09-20 17:16     ` Jason Gunthorpe
@ 2017-09-20 22:45     ` Jason Gunthorpe
  2017-09-21  7:59       ` Russell King - ARM Linux
                         ` (3 more replies)
  1 sibling, 4 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-20 22:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote:
> I sort-of agree as far as the time offset information goes, but there's
> a complication that we only open the RTC to set the time at the
> point in

Hi Russell,

What do you think of this untested approach in the below patch?

Upon more careful inspection I think I found a way to make the
rounding in rtc_set_ntp_time compatible with a wide range of rtc
devices, so the subsecond capable ops I suggested do not seem
necessary.

>From 3455f4d225b01b6d3e85df372c9724a45d065b22 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Wed, 20 Sep 2017 16:43:10 -0600
Subject: [PATCH] rtc: Allow rtc drivers to specify the tv_nsec value for ntp

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change ntp to take a value from struct rtc_device that specifies what the
target tv_nsec value should be. This allows each driver to fine tune
the tv_nsec based on its own requirements.

This changes how rtc_set_ntp_time computes the rounding, instead of using
0.5s as a hard line, we compute a +-1ms band around the target tv_nsec
value and 'snap' the incoming timespec to the tv_nsec inside the
band (or defer this update).

The calculation of the sleep time for ntp is also revised to use modern
helper functions, and to more directly and safely compute a relative
jiffies delay that will result in the correct tv_nsec. If this fails it
does a short sleep and tries again rather than trying to program the RTC
with a poor tv_nsec value.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |  6 +++++
 drivers/rtc/systohc.c | 67 +++++++++++++++++++++++++++++++++++++++++----------
 include/linux/rtc.h   |  4 ++-
 kernel/time/ntp.c     | 62 ++++++++++++++++++++++++++++++-----------------
 4 files changed, 103 insertions(+), 36 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 2ed970d61da140..ed1a4e0f8742ba 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -161,6 +161,12 @@ static struct rtc_device *rtc_allocate_device(void)
 
 	device_initialize(&rtc->dev);
 
+	/* Drivers can revise this default after allocating the device. It
+	 * should be the value of wallclock tv_nsec that the driver needs in
+	 * order to synchronize the second tick over during set.
+	 */
+	rtc->time_set_nsec =  NSEC_PER_SEC / 2;
+
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.class = rtc_class;
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06bb8..f756dc1804829b 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -7,9 +7,43 @@
 #include <linux/rtc.h>
 #include <linux/time.h>
 
+/* True if now.tv_nsec is close enough to time_sec_nsec that we can call the
+ * drivers's set_time_ns(). Arbitrarily choose 1ms as the allowable error
+ * margin.
+ *
+ * This also rounds tv_sec to the second that covers the tv_nsec tick we are
+ * targeting.
+ */
+#define TIME_SET_NSEC_FUZZ (1000*1000)
+static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
+				  struct timespec64 *now)
+{
+	long diff;
+
+	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
+	if (diff < TIME_SET_NSEC_FUZZ) {
+		if (rtc->time_set_nsec + diff >= NSEC_PER_SEC) {
+			now->tv_sec -= 1;
+			now->tv_nsec = NSEC_PER_SEC-1;
+		}
+		return true;
+	}
+
+	diff = (rtc->time_set_nsec - now->tv_nsec) % NSEC_PER_SEC;
+	if (diff < TIME_SET_NSEC_FUZZ) {
+		if (now->tv_nsec + diff >= NSEC_PER_SEC) {
+			now->tv_sec += 1;
+			now->tv_nsec = 0;
+		}
+		return true;
+	}
+	return false;
+}
+
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @target_nsec: Output value indicating what now->tv_nsec
  *
  * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
@@ -20,28 +54,35 @@
  *
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now)
+int rtc_set_ntp_time(struct timespec64 now, long *target_nsec)
 {
 	struct rtc_device *rtc;
 	struct rtc_time tm;
 	int err = -ENODEV;
 
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
-
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
-	if (rtc) {
+	if (!rtc)
+		goto out_err;
+
+	/* The ntp code must call this with the correct value in tv_nsec, if
+	 * it does not we update target_nsec and return EPROTO to make the ntp
+	 * code try again later.
+	 */
+	*target_nsec = rtc->time_set_nsec;
+	if (!rtc_tv_nsec_ok(rtc, &now)) {
+		err = -EPROTO;
+		goto out_close;
+	}
+
+	if (rtc->ops && (rtc->ops->set_time || rtc->ops->set_mmss64 ||
+			 rtc->ops->set_mmss)) {
 		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
 		 * not set_mmss. */
-		if (rtc->ops &&
-		    (rtc->ops->set_time ||
-		     rtc->ops->set_mmss64 ||
-		     rtc->ops->set_mmss))
-			err = rtc_set_time(rtc, &tm);
-		rtc_class_close(rtc);
+		err = rtc_set_time(rtc, &tm);
 	}
 
+out_close:
+	rtc_class_close(rtc);
+out_err:
 	return err;
 }
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 0a0f0d14a5fba5..7f9858ef03111a 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -137,6 +137,8 @@ struct rtc_device {
 	/* Some hardware can't support UIE mode */
 	int uie_unsupported;
 
+	long time_set_nsec;
+
 	bool registered;
 
 	struct nvmem_config *nvmem_config;
@@ -174,7 +176,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now);
+extern int rtc_set_ntp_time(struct timespec64 now, long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
 			struct rtc_wkalrm *alrm);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc5314043..61fdf758dcb754 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -514,17 +514,19 @@ static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
 
 static void sync_cmos_clock(struct work_struct *work)
 {
-	struct timespec64 now;
-	struct timespec64 next;
+	struct timespec64 now, next, delta;
 	int fail = 1;
+	long target_nsec = NSEC_PER_SEC / 2;
 
 	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
-	 * We want the clock to be within a couple of ticks from the target.
+	 * If we have an externally synchronized Linux clock, then update CMOS
+	 * clock accordingly every ~11 minutes.  Histiocally Set_rtc_mmss()
+	 * has to be called as close as possible to 500 ms (target_nsec)
+	 * before the new second starts, but new RTC drivers can select a
+	 * different value.  This code is run on a timer.  If the clock is
+	 * set, that timer may not expire at the correct time.  Thus, we
+	 * adjust...  We want the clock to be within a couple of ticks from
+	 * the target.
 	 */
 	if (!ntp_synced()) {
 		/*
@@ -547,25 +549,41 @@ static void sync_cmos_clock(struct work_struct *work)
 
 #ifdef CONFIG_RTC_SYSTOHC
 		if (fail == -ENODEV)
-			fail = rtc_set_ntp_time(adjust);
+			fail = rtc_set_ntp_time(adjust, &target_nsec);
 #endif
 	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
-	if (next.tv_nsec <= 0)
-		next.tv_nsec += NSEC_PER_SEC;
+	do {
+		/*
+		 * Compute the next wall clock time to try and set the
+		 * clock
+		 */
+		next = now;
+		if (!fail || fail == -ENODEV)
+			timespec64_add_ns(&next, 659 * NSEC_PER_SEC);
+		else
+			/* Update failed, try again in about 10 seconds */
+			timespec64_add_ns(&next, 10 * NSEC_PER_SEC);
 
-	if (!fail || fail == -ENODEV)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
+		/*
+		 * The next call to sync_cmos_clock needs to have have a wall
+		 * clock tv_nsec value equal to target_nsec.
+		 */
+		if (next.tv_nsec > target_nsec)
+			next.tv_sec++;
+		next.tv_nsec = target_nsec;
 
-	if (next.tv_nsec >= NSEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_nsec -= NSEC_PER_SEC;
-	}
-	queue_delayed_work(system_power_efficient_wq,
-			   &sync_cmos_work, timespec64_to_jiffies(&next));
+		/*
+		 * Convert to a relative delay. If time set took a really long
+		 * time, or the wall clock was changed, this might become
+		 * negative, so try again.
+		 */
+		getnstimeofday64(&now);
+		delta = timespec64_sub(next, now);
+	} while (delta.tv_sec <= 0);
+
+	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work,
+			   timespec64_to_jiffies(&delta));
 }
 
 void ntp_notify_cmos_timer(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 22:45     ` Jason Gunthorpe
@ 2017-09-21  7:59       ` Russell King - ARM Linux
  2017-09-21  9:32         ` Russell King - ARM Linux
  2017-09-21  9:46       ` Russell King - ARM Linux
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21  7:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote:
> > I sort-of agree as far as the time offset information goes, but there's
> > a complication that we only open the RTC to set the time at the
> > point in
> 
> Hi Russell,
> 
> What do you think of this untested approach in the below patch?
> 
> Upon more careful inspection I think I found a way to make the
> rounding in rtc_set_ntp_time compatible with a wide range of rtc
> devices, so the subsecond capable ops I suggested do not seem
> necessary.

Looks like a possibility, but...

> +/* True if now.tv_nsec is close enough to time_sec_nsec that we can call the
> + * drivers's set_time_ns(). Arbitrarily choose 1ms as the allowable error
> + * margin.
> + *
> + * This also rounds tv_sec to the second that covers the tv_nsec tick we are
> + * targeting.
> + */
> +#define TIME_SET_NSEC_FUZZ (1000*1000)
> +static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> +				  struct timespec64 *now)
> +{
> +	long diff;
> +
> +	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> +	if (diff < TIME_SET_NSEC_FUZZ) {
> +		if (rtc->time_set_nsec + diff >= NSEC_PER_SEC) {
> +			now->tv_sec -= 1;
> +			now->tv_nsec = NSEC_PER_SEC-1;
> +		}
> +		return true;
> +	}
> +
> +	diff = (rtc->time_set_nsec - now->tv_nsec) % NSEC_PER_SEC;
> +	if (diff < TIME_SET_NSEC_FUZZ) {
> +		if (now->tv_nsec + diff >= NSEC_PER_SEC) {
> +			now->tv_sec += 1;
> +			now->tv_nsec = 0;
> +		}
> +		return true;
> +	}

I don't think this is correct - and I think it's overly expensive.
I threw this into a test program to prove the point, and it confirmed
my suspicions - the above will always return true.

Here's the test program output:

now=0.990000000, offset=         0: diff= 990000000 true 2: 0.990000000
now=0.999000000, offset=         0: diff= 999000000 true 2: 0.999000000
now=0.999900000, offset=         0: diff= 999900000 true 2: 0.999900000
now=1.000000000, offset=         0: diff=         0 true 1: 1.000000000
now=1.000100000, offset=         0: diff=    100000 true 1: 1.000100000
now=1.001000000, offset=         0: diff=   1000000 true 2: 1.001000000
now=1.010000000, offset=         0: diff=  10000000 true 2: 1.010000000
now=0.990000000, offset= 470000000: diff= 520000000 true 2: 0.990000000
now=0.999000000, offset= 470000000: diff= 529000000 true 2: 0.999000000
now=0.999900000, offset= 470000000: diff= 529900000 true 2: 0.999900000
now=1.000000000, offset= 470000000: diff=-470000000 true 1: 1.000000000
now=1.000100000, offset= 470000000: diff=-469900000 true 1: 1.000100000
now=1.001000000, offset= 470000000: diff=-469000000 true 1: 1.001000000
now=1.010000000, offset= 470000000: diff=-460000000 true 1: 1.010000000
now=0.990000000, offset=1000000000: diff= -10000000 true 1: 0.990000000
now=0.999000000, offset=1000000000: diff=  -1000000 true 1: 0.999000000
now=0.999900000, offset=1000000000: diff=   -100000 true 1: 0.999900000
now=1.000000000, offset=1000000000: diff=         0 true 1: 0.999999999
now=1.000100000, offset=1000000000: diff=-999900000 true 1: 1.000100000
now=1.001000000, offset=1000000000: diff=-999000000 true 1: 1.001000000
now=1.010000000, offset=1000000000: diff=-990000000 true 1: 1.010000000
now=0.990000000, offset=1470000000: diff=-480000000 true 1: 0.990000000
now=0.999000000, offset=1470000000: diff=-471000000 true 1: 0.999000000
now=0.999900000, offset=1470000000: diff=-470100000 true 1: 0.999900000
now=1.000000000, offset=1470000000: diff=-470000000 true 1: 0.999999999
now=1.000100000, offset=1470000000: diff=-469900000 true 1: 0.999999999
now=1.001000000, offset=1470000000: diff=-469000000 true 1: 0.999999999
now=1.010000000, offset=1470000000: diff=-460000000 true 1: 0.999999999

which should be self explanatory.  Another point to note is that
the computation of the time to be set is also incorrect - for
example, in the case of offset=0, we can see if we're slightly
early, we set tv_sec=0 rather than tv_sec=1.

In the offset=1sec case, it also isn't working as we'd expect -
this case should be providing tv_sec for the preceding second, but
it doesn't.

I don't yet have a proposal to fix this...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21  7:59       ` Russell King - ARM Linux
@ 2017-09-21  9:32         ` Russell King - ARM Linux
  2017-09-21 11:30           ` Russell King - ARM Linux
  2017-09-21 22:05           ` Jason Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21  9:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 08:59:07AM +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> > On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote:
> > > I sort-of agree as far as the time offset information goes, but there's
> > > a complication that we only open the RTC to set the time at the
> > > point in
> > 
> > Hi Russell,
> > 
> > What do you think of this untested approach in the below patch?
> > 
> > Upon more careful inspection I think I found a way to make the
> > rounding in rtc_set_ntp_time compatible with a wide range of rtc
> > devices, so the subsecond capable ops I suggested do not seem
> > necessary.
> 
> Looks like a possibility, but...
> 
> > +/* True if now.tv_nsec is close enough to time_sec_nsec that we can call the
> > + * drivers's set_time_ns(). Arbitrarily choose 1ms as the allowable error
> > + * margin.
> > + *
> > + * This also rounds tv_sec to the second that covers the tv_nsec tick we are
> > + * targeting.
> > + */
> > +#define TIME_SET_NSEC_FUZZ (1000*1000)
> > +static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> > +				  struct timespec64 *now)
> > +{
> > +	long diff;
> > +
> > +	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> > +	if (diff < TIME_SET_NSEC_FUZZ) {
> > +		if (rtc->time_set_nsec + diff >= NSEC_PER_SEC) {
> > +			now->tv_sec -= 1;
> > +			now->tv_nsec = NSEC_PER_SEC-1;
> > +		}
> > +		return true;
> > +	}
> > +
> > +	diff = (rtc->time_set_nsec - now->tv_nsec) % NSEC_PER_SEC;
> > +	if (diff < TIME_SET_NSEC_FUZZ) {
> > +		if (now->tv_nsec + diff >= NSEC_PER_SEC) {
> > +			now->tv_sec += 1;
> > +			now->tv_nsec = 0;
> > +		}
> > +		return true;
> > +	}
> 
> I don't think this is correct - and I think it's overly expensive.
> I threw this into a test program to prove the point, and it confirmed
> my suspicions - the above will always return true.
> 
> Here's the test program output:
> 
> now=0.990000000, offset=         0: diff= 990000000 true 2: 0.990000000
> now=0.999000000, offset=         0: diff= 999000000 true 2: 0.999000000
> now=0.999900000, offset=         0: diff= 999900000 true 2: 0.999900000
> now=1.000000000, offset=         0: diff=         0 true 1: 1.000000000
> now=1.000100000, offset=         0: diff=    100000 true 1: 1.000100000
> now=1.001000000, offset=         0: diff=   1000000 true 2: 1.001000000
> now=1.010000000, offset=         0: diff=  10000000 true 2: 1.010000000
> now=0.990000000, offset= 470000000: diff= 520000000 true 2: 0.990000000
> now=0.999000000, offset= 470000000: diff= 529000000 true 2: 0.999000000
> now=0.999900000, offset= 470000000: diff= 529900000 true 2: 0.999900000
> now=1.000000000, offset= 470000000: diff=-470000000 true 1: 1.000000000
> now=1.000100000, offset= 470000000: diff=-469900000 true 1: 1.000100000
> now=1.001000000, offset= 470000000: diff=-469000000 true 1: 1.001000000
> now=1.010000000, offset= 470000000: diff=-460000000 true 1: 1.010000000
> now=0.990000000, offset=1000000000: diff= -10000000 true 1: 0.990000000
> now=0.999000000, offset=1000000000: diff=  -1000000 true 1: 0.999000000
> now=0.999900000, offset=1000000000: diff=   -100000 true 1: 0.999900000
> now=1.000000000, offset=1000000000: diff=         0 true 1: 0.999999999
> now=1.000100000, offset=1000000000: diff=-999900000 true 1: 1.000100000
> now=1.001000000, offset=1000000000: diff=-999000000 true 1: 1.001000000
> now=1.010000000, offset=1000000000: diff=-990000000 true 1: 1.010000000
> now=0.990000000, offset=1470000000: diff=-480000000 true 1: 0.990000000
> now=0.999000000, offset=1470000000: diff=-471000000 true 1: 0.999000000
> now=0.999900000, offset=1470000000: diff=-470100000 true 1: 0.999900000
> now=1.000000000, offset=1470000000: diff=-470000000 true 1: 0.999999999
> now=1.000100000, offset=1470000000: diff=-469900000 true 1: 0.999999999
> now=1.001000000, offset=1470000000: diff=-469000000 true 1: 0.999999999
> now=1.010000000, offset=1470000000: diff=-460000000 true 1: 0.999999999
> 
> which should be self explanatory.  Another point to note is that
> the computation of the time to be set is also incorrect - for
> example, in the case of offset=0, we can see if we're slightly
> early, we set tv_sec=0 rather than tv_sec=1.
> 
> In the offset=1sec case, it also isn't working as we'd expect -
> this case should be providing tv_sec for the preceding second, but
> it doesn't.
> 
> I don't yet have a proposal to fix this...

How about this:

static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
				  struct timespec64 *now)
{
	long diff;

	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
	/* diff must be within [-NSEC_PER_SEC/2..NSEC_PER_SEC/2) */
	if (diff >= NSEC_PER_SEC / 2)
		diff -= NSEC_PER_SEC;
	else if (diff < -NSEC_PER_SEC / 2)
		diff += NSEC_PER_SEC;

	if (abs(diff) < TIME_SET_NSEC_FUZZ) {
		/* Correct the time for the rtc->time_set_nsec and difference */
		diff += rtc->time_set_nsec;
		if (diff > 0)
			timespec64_sub_ns(now, diff);
		else if (diff < 0)
			timespec64_add_ns(now, -diff);
		return true;
	}
	return false;
}

This always returns now->tv_nsec = 0, but with now->tv_sec appropriately
adjusted.  The return value is true if we're within 1msec of the required
nanoseconds, false otherwise.

Here's the results of my test program, which looks a lot better to me:

now=0.469000000, offset=-1000000000: diff= 469000000 false
now=0.469000001, offset=-1000000000: diff= 469000001 false
now=0.470000000, offset=-1000000000: diff= 470000000 false
now=0.470999999, offset=-1000000000: diff= 470999999 false
now=0.471000000, offset=-1000000000: diff= 471000000 false
now=0.499999999, offset=-1000000000: diff= 499999999 false
now=0.500000000, offset=-1000000000: diff=-500000000 false
now=0.990000000, offset=-1000000000: diff= -10000000 false
now=0.999000000, offset=-1000000000: diff=  -1000000 false
now=0.999900000, offset=-1000000000: diff=   -100000 true: 2.000000000
now=1.000000000, offset=-1000000000: diff=         0 true: 2.000000000
now=1.000100000, offset=-1000000000: diff=    100000 true: 2.000000000
now=1.001000000, offset=-1000000000: diff=   1000000 false
now=1.010000000, offset=-1000000000: diff=  10000000 false
now=1.469000000, offset=-1000000000: diff= 469000000 false
now=1.469000001, offset=-1000000000: diff= 469000001 false
now=1.470000000, offset=-1000000000: diff= 470000000 false
now=1.470999999, offset=-1000000000: diff= 470999999 false
now=1.471000000, offset=-1000000000: diff= 471000000 false
now=1.499999999, offset=-1000000000: diff= 499999999 false
now=1.500000000, offset=-1000000000: diff=-500000000 false

now=0.469000000, offset= -530000000: diff=  -1000000 false
now=0.469000001, offset= -530000000: diff=   -999999 true: 1.000000000
now=0.470000000, offset= -530000000: diff=         0 true: 1.000000000
now=0.470999999, offset= -530000000: diff=    999999 true: 1.000000000
now=0.471000000, offset= -530000000: diff=   1000000 false
now=0.499999999, offset= -530000000: diff=  29999999 false
now=0.500000000, offset= -530000000: diff=  30000000 false
now=0.990000000, offset= -530000000: diff=-480000000 false
now=0.999000000, offset= -530000000: diff=-471000000 false
now=0.999900000, offset= -530000000: diff=-470100000 false
now=1.000000000, offset= -530000000: diff=-470000000 false
now=1.000100000, offset= -530000000: diff=-469900000 false
now=1.001000000, offset= -530000000: diff=-469000000 false
now=1.010000000, offset= -530000000: diff=-460000000 false
now=1.469000000, offset= -530000000: diff=  -1000000 false
now=1.469000001, offset= -530000000: diff=   -999999 true: 2.000000000
now=1.470000000, offset= -530000000: diff=         0 true: 2.000000000
now=1.470999999, offset= -530000000: diff=    999999 true: 2.000000000
now=1.471000000, offset= -530000000: diff=   1000000 false
now=1.499999999, offset= -530000000: diff=  29999999 false
now=1.500000000, offset= -530000000: diff=  30000000 false

now=0.469000000, offset=          0: diff= 469000000 false
now=0.469000001, offset=          0: diff= 469000001 false
now=0.470000000, offset=          0: diff= 470000000 false
now=0.470999999, offset=          0: diff= 470999999 false
now=0.471000000, offset=          0: diff= 471000000 false
now=0.499999999, offset=          0: diff= 499999999 false
now=0.500000000, offset=          0: diff=-500000000 false
now=0.990000000, offset=          0: diff= -10000000 false
now=0.999000000, offset=          0: diff=  -1000000 false
now=0.999900000, offset=          0: diff=   -100000 true: 1.000000000
now=1.000000000, offset=          0: diff=         0 true: 1.000000000
now=1.000100000, offset=          0: diff=    100000 true: 1.000000000
now=1.001000000, offset=          0: diff=   1000000 false
now=1.010000000, offset=          0: diff=  10000000 false
now=1.469000000, offset=          0: diff= 469000000 false
now=1.469000001, offset=          0: diff= 469000001 false
now=1.470000000, offset=          0: diff= 470000000 false
now=1.470999999, offset=          0: diff= 470999999 false
now=1.471000000, offset=          0: diff= 471000000 false
now=1.499999999, offset=          0: diff= 499999999 false
now=1.500000000, offset=          0: diff=-500000000 false

now=0.469000000, offset=  470000000: diff=  -1000000 false
now=0.469000001, offset=  470000000: diff=   -999999 true: 0.000000000
now=0.470000000, offset=  470000000: diff=         0 true: 0.000000000
now=0.470999999, offset=  470000000: diff=    999999 true: 0.000000000
now=0.471000000, offset=  470000000: diff=   1000000 false
now=0.499999999, offset=  470000000: diff=  29999999 false
now=0.500000000, offset=  470000000: diff=  30000000 false
now=0.990000000, offset=  470000000: diff=-480000000 false
now=0.999000000, offset=  470000000: diff=-471000000 false
now=0.999900000, offset=  470000000: diff=-470100000 false
now=1.000000000, offset=  470000000: diff=-470000000 false
now=1.000100000, offset=  470000000: diff=-469900000 false
now=1.001000000, offset=  470000000: diff=-469000000 false
now=1.010000000, offset=  470000000: diff=-460000000 false
now=1.469000000, offset=  470000000: diff=  -1000000 false
now=1.469000001, offset=  470000000: diff=   -999999 true: 1.000000000
now=1.470000000, offset=  470000000: diff=         0 true: 1.000000000
now=1.470999999, offset=  470000000: diff=    999999 true: 1.000000000
now=1.471000000, offset=  470000000: diff=   1000000 false
now=1.499999999, offset=  470000000: diff=  29999999 false
now=1.500000000, offset=  470000000: diff=  30000000 false

now=0.469000000, offset= 1000000000: diff= 469000000 false
now=0.469000001, offset= 1000000000: diff= 469000001 false
now=0.470000000, offset= 1000000000: diff= 470000000 false
now=0.470999999, offset= 1000000000: diff= 470999999 false
now=0.471000000, offset= 1000000000: diff= 471000000 false
now=0.499999999, offset= 1000000000: diff= 499999999 false
now=0.500000000, offset= 1000000000: diff=-500000000 false
now=0.990000000, offset= 1000000000: diff= -10000000 false
now=0.999000000, offset= 1000000000: diff=  -1000000 false
now=0.999900000, offset= 1000000000: diff=   -100000 true: 0.000000000
now=1.000000000, offset= 1000000000: diff=         0 true: 0.000000000
now=1.000100000, offset= 1000000000: diff=    100000 true: 0.000000000
now=1.001000000, offset= 1000000000: diff=   1000000 false
now=1.010000000, offset= 1000000000: diff=  10000000 false
now=1.469000000, offset= 1000000000: diff= 469000000 false
now=1.469000001, offset= 1000000000: diff= 469000001 false
now=1.470000000, offset= 1000000000: diff= 470000000 false
now=1.470999999, offset= 1000000000: diff= 470999999 false
now=1.471000000, offset= 1000000000: diff= 471000000 false
now=1.499999999, offset= 1000000000: diff= 499999999 false
now=1.500000000, offset= 1000000000: diff=-500000000 false

now=0.469000000, offset= 1470000000: diff=  -1000000 false
now=0.469000001, offset= 1470000000: diff=   -999999 true: -1.000000000
now=0.470000000, offset= 1470000000: diff=         0 true: -1.000000000
now=0.470999999, offset= 1470000000: diff=    999999 true: -1.000000000
now=0.471000000, offset= 1470000000: diff=   1000000 false
now=0.499999999, offset= 1470000000: diff=  29999999 false
now=0.500000000, offset= 1470000000: diff=  30000000 false
now=0.990000000, offset= 1470000000: diff=-480000000 false
now=0.999000000, offset= 1470000000: diff=-471000000 false
now=0.999900000, offset= 1470000000: diff=-470100000 false
now=1.000000000, offset= 1470000000: diff=-470000000 false
now=1.000100000, offset= 1470000000: diff=-469900000 false
now=1.001000000, offset= 1470000000: diff=-469000000 false
now=1.010000000, offset= 1470000000: diff=-460000000 false
now=1.469000000, offset= 1470000000: diff=  -1000000 false
now=1.469000001, offset= 1470000000: diff=   -999999 true: 0.000000000
now=1.470000000, offset= 1470000000: diff=         0 true: 0.000000000
now=1.470999999, offset= 1470000000: diff=    999999 true: 0.000000000
now=1.471000000, offset= 1470000000: diff=   1000000 false
now=1.499999999, offset= 1470000000: diff=  29999999 false
now=1.500000000, offset= 1470000000: diff=  30000000 false

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 22:45     ` Jason Gunthorpe
  2017-09-21  7:59       ` Russell King - ARM Linux
@ 2017-09-21  9:46       ` Russell King - ARM Linux
  2017-09-21 11:29       ` Russell King - ARM Linux
  2017-09-21 12:28       ` Russell King - ARM Linux
  3 siblings, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21  9:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
>  /**
>   * rtc_set_ntp_time - Save NTP synchronized time to the RTC
>   * @now: Current time of day
> + * @target_nsec: Output value indicating what now->tv_nsec

I'd suggest:

 * @target_nsec: pointer for desired now->tv_nsec value

>   *
>   * Replacement for the NTP platform function update_persistent_clock64
>   * that stores time for later retrieval by rtc_hctosys.
> @@ -20,28 +54,35 @@
>   *
>   * If temporary failure is indicated the caller should try again 'soon'
>   */
> -int rtc_set_ntp_time(struct timespec64 now)
> +int rtc_set_ntp_time(struct timespec64 now, long *target_nsec)
>  {
>  	struct rtc_device *rtc;
>  	struct rtc_time tm;
>  	int err = -ENODEV;
>  
> -	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
> -		rtc_time64_to_tm(now.tv_sec, &tm);
> -	else
> -		rtc_time64_to_tm(now.tv_sec + 1, &tm);
> -
>  	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
> -	if (rtc) {
> +	if (!rtc)
> +		goto out_err;
> +
> +	/* The ntp code must call this with the correct value in tv_nsec, if
> +	 * it does not we update target_nsec and return EPROTO to make the ntp
> +	 * code try again later.
> +	 */
> +	*target_nsec = rtc->time_set_nsec;

Here, we want just the positive nanoseconds part of the desired offset
(iow, the value we want to see in now.tv_nsec):

	nsec = rtc->time_set_nsec % NSEC_PER_SEC;
	if (nsec < 0)
		nsec += NSEC_PER_SEC;
	*target_nsec = nsec;

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync@8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 22:45     ` Jason Gunthorpe
  2017-09-21  7:59       ` Russell King - ARM Linux
  2017-09-21  9:46       ` Russell King - ARM Linux
@ 2017-09-21 11:29       ` Russell King - ARM Linux
  2017-09-21 12:28       ` Russell King - ARM Linux
  3 siblings, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21 11:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> +		if (!fail || fail == -ENODEV)
> +			timespec64_add_ns(&next, 659 * NSEC_PER_SEC);
> +		else
> +			/* Update failed, try again in about 10 seconds */
> +			timespec64_add_ns(&next, 10 * NSEC_PER_SEC);

Trying to build this, the compiler complains about integer overflow here.
You need to cast these to 64-bit values.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21  9:32         ` Russell King - ARM Linux
@ 2017-09-21 11:30           ` Russell King - ARM Linux
  2017-09-21 22:05           ` Jason Gunthorpe
  1 sibling, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21 11:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 10:32:19AM +0100, Russell King - ARM Linux wrote:
> On Thu, Sep 21, 2017 at 08:59:07AM +0100, Russell King - ARM Linux wrote:
> > On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> > > On Wed, Sep 20, 2017 at 05:51:41PM +0100, Russell King - ARM Linux wrote:
> > > > I sort-of agree as far as the time offset information goes, but there's
> > > > a complication that we only open the RTC to set the time at the
> > > > point in
> > > 
> > > Hi Russell,
> > > 
> > > What do you think of this untested approach in the below patch?
> > > 
> > > Upon more careful inspection I think I found a way to make the
> > > rounding in rtc_set_ntp_time compatible with a wide range of rtc
> > > devices, so the subsecond capable ops I suggested do not seem
> > > necessary.
> > 
> > Looks like a possibility, but...
> > 
> > > +/* True if now.tv_nsec is close enough to time_sec_nsec that we can call the
> > > + * drivers's set_time_ns(). Arbitrarily choose 1ms as the allowable error
> > > + * margin.
> > > + *
> > > + * This also rounds tv_sec to the second that covers the tv_nsec tick we are
> > > + * targeting.
> > > + */
> > > +#define TIME_SET_NSEC_FUZZ (1000*1000)
> > > +static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> > > +				  struct timespec64 *now)
> > > +{
> > > +	long diff;
> > > +
> > > +	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> > > +	if (diff < TIME_SET_NSEC_FUZZ) {
> > > +		if (rtc->time_set_nsec + diff >= NSEC_PER_SEC) {
> > > +			now->tv_sec -= 1;
> > > +			now->tv_nsec = NSEC_PER_SEC-1;
> > > +		}
> > > +		return true;
> > > +	}
> > > +
> > > +	diff = (rtc->time_set_nsec - now->tv_nsec) % NSEC_PER_SEC;
> > > +	if (diff < TIME_SET_NSEC_FUZZ) {
> > > +		if (now->tv_nsec + diff >= NSEC_PER_SEC) {
> > > +			now->tv_sec += 1;
> > > +			now->tv_nsec = 0;
> > > +		}
> > > +		return true;
> > > +	}
> > 
> > I don't think this is correct - and I think it's overly expensive.
> > I threw this into a test program to prove the point, and it confirmed
> > my suspicions - the above will always return true.
> > 
> > Here's the test program output:
> > 
> > now=0.990000000, offset=         0: diff= 990000000 true 2: 0.990000000
> > now=0.999000000, offset=         0: diff= 999000000 true 2: 0.999000000
> > now=0.999900000, offset=         0: diff= 999900000 true 2: 0.999900000
> > now=1.000000000, offset=         0: diff=         0 true 1: 1.000000000
> > now=1.000100000, offset=         0: diff=    100000 true 1: 1.000100000
> > now=1.001000000, offset=         0: diff=   1000000 true 2: 1.001000000
> > now=1.010000000, offset=         0: diff=  10000000 true 2: 1.010000000
> > now=0.990000000, offset= 470000000: diff= 520000000 true 2: 0.990000000
> > now=0.999000000, offset= 470000000: diff= 529000000 true 2: 0.999000000
> > now=0.999900000, offset= 470000000: diff= 529900000 true 2: 0.999900000
> > now=1.000000000, offset= 470000000: diff=-470000000 true 1: 1.000000000
> > now=1.000100000, offset= 470000000: diff=-469900000 true 1: 1.000100000
> > now=1.001000000, offset= 470000000: diff=-469000000 true 1: 1.001000000
> > now=1.010000000, offset= 470000000: diff=-460000000 true 1: 1.010000000
> > now=0.990000000, offset=1000000000: diff= -10000000 true 1: 0.990000000
> > now=0.999000000, offset=1000000000: diff=  -1000000 true 1: 0.999000000
> > now=0.999900000, offset=1000000000: diff=   -100000 true 1: 0.999900000
> > now=1.000000000, offset=1000000000: diff=         0 true 1: 0.999999999
> > now=1.000100000, offset=1000000000: diff=-999900000 true 1: 1.000100000
> > now=1.001000000, offset=1000000000: diff=-999000000 true 1: 1.001000000
> > now=1.010000000, offset=1000000000: diff=-990000000 true 1: 1.010000000
> > now=0.990000000, offset=1470000000: diff=-480000000 true 1: 0.990000000
> > now=0.999000000, offset=1470000000: diff=-471000000 true 1: 0.999000000
> > now=0.999900000, offset=1470000000: diff=-470100000 true 1: 0.999900000
> > now=1.000000000, offset=1470000000: diff=-470000000 true 1: 0.999999999
> > now=1.000100000, offset=1470000000: diff=-469900000 true 1: 0.999999999
> > now=1.001000000, offset=1470000000: diff=-469000000 true 1: 0.999999999
> > now=1.010000000, offset=1470000000: diff=-460000000 true 1: 0.999999999
> > 
> > which should be self explanatory.  Another point to note is that
> > the computation of the time to be set is also incorrect - for
> > example, in the case of offset=0, we can see if we're slightly
> > early, we set tv_sec=0 rather than tv_sec=1.
> > 
> > In the offset=1sec case, it also isn't working as we'd expect -
> > this case should be providing tv_sec for the preceding second, but
> > it doesn't.
> > 
> > I don't yet have a proposal to fix this...
> 
> How about this:
> 
> static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> 				  struct timespec64 *now)
> {
> 	long diff;
> 
> 	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> 	/* diff must be within [-NSEC_PER_SEC/2..NSEC_PER_SEC/2) */
> 	if (diff >= NSEC_PER_SEC / 2)
> 		diff -= NSEC_PER_SEC;
> 	else if (diff < -NSEC_PER_SEC / 2)
> 		diff += NSEC_PER_SEC;
> 
> 	if (abs(diff) < TIME_SET_NSEC_FUZZ) {
> 		/* Correct the time for the rtc->time_set_nsec and difference */
> 		diff += rtc->time_set_nsec;
> 		if (diff > 0)
> 			timespec64_sub_ns(now, diff);

The only issue here is that we don't have this function...

> 		else if (diff < 0)
> 			timespec64_add_ns(now, -diff);
> 		return true;
> 	}
> 	return false;
> }
> 
> This always returns now->tv_nsec = 0, but with now->tv_sec appropriately
> adjusted.  The return value is true if we're within 1msec of the required
> nanoseconds, false otherwise.
> 
> Here's the results of my test program, which looks a lot better to me:
> 
> now=0.469000000, offset=-1000000000: diff= 469000000 false
> now=0.469000001, offset=-1000000000: diff= 469000001 false
> now=0.470000000, offset=-1000000000: diff= 470000000 false
> now=0.470999999, offset=-1000000000: diff= 470999999 false
> now=0.471000000, offset=-1000000000: diff= 471000000 false
> now=0.499999999, offset=-1000000000: diff= 499999999 false
> now=0.500000000, offset=-1000000000: diff=-500000000 false
> now=0.990000000, offset=-1000000000: diff= -10000000 false
> now=0.999000000, offset=-1000000000: diff=  -1000000 false
> now=0.999900000, offset=-1000000000: diff=   -100000 true: 2.000000000
> now=1.000000000, offset=-1000000000: diff=         0 true: 2.000000000
> now=1.000100000, offset=-1000000000: diff=    100000 true: 2.000000000
> now=1.001000000, offset=-1000000000: diff=   1000000 false
> now=1.010000000, offset=-1000000000: diff=  10000000 false
> now=1.469000000, offset=-1000000000: diff= 469000000 false
> now=1.469000001, offset=-1000000000: diff= 469000001 false
> now=1.470000000, offset=-1000000000: diff= 470000000 false
> now=1.470999999, offset=-1000000000: diff= 470999999 false
> now=1.471000000, offset=-1000000000: diff= 471000000 false
> now=1.499999999, offset=-1000000000: diff= 499999999 false
> now=1.500000000, offset=-1000000000: diff=-500000000 false
> 
> now=0.469000000, offset= -530000000: diff=  -1000000 false
> now=0.469000001, offset= -530000000: diff=   -999999 true: 1.000000000
> now=0.470000000, offset= -530000000: diff=         0 true: 1.000000000
> now=0.470999999, offset= -530000000: diff=    999999 true: 1.000000000
> now=0.471000000, offset= -530000000: diff=   1000000 false
> now=0.499999999, offset= -530000000: diff=  29999999 false
> now=0.500000000, offset= -530000000: diff=  30000000 false
> now=0.990000000, offset= -530000000: diff=-480000000 false
> now=0.999000000, offset= -530000000: diff=-471000000 false
> now=0.999900000, offset= -530000000: diff=-470100000 false
> now=1.000000000, offset= -530000000: diff=-470000000 false
> now=1.000100000, offset= -530000000: diff=-469900000 false
> now=1.001000000, offset= -530000000: diff=-469000000 false
> now=1.010000000, offset= -530000000: diff=-460000000 false
> now=1.469000000, offset= -530000000: diff=  -1000000 false
> now=1.469000001, offset= -530000000: diff=   -999999 true: 2.000000000
> now=1.470000000, offset= -530000000: diff=         0 true: 2.000000000
> now=1.470999999, offset= -530000000: diff=    999999 true: 2.000000000
> now=1.471000000, offset= -530000000: diff=   1000000 false
> now=1.499999999, offset= -530000000: diff=  29999999 false
> now=1.500000000, offset= -530000000: diff=  30000000 false
> 
> now=0.469000000, offset=          0: diff= 469000000 false
> now=0.469000001, offset=          0: diff= 469000001 false
> now=0.470000000, offset=          0: diff= 470000000 false
> now=0.470999999, offset=          0: diff= 470999999 false
> now=0.471000000, offset=          0: diff= 471000000 false
> now=0.499999999, offset=          0: diff= 499999999 false
> now=0.500000000, offset=          0: diff=-500000000 false
> now=0.990000000, offset=          0: diff= -10000000 false
> now=0.999000000, offset=          0: diff=  -1000000 false
> now=0.999900000, offset=          0: diff=   -100000 true: 1.000000000
> now=1.000000000, offset=          0: diff=         0 true: 1.000000000
> now=1.000100000, offset=          0: diff=    100000 true: 1.000000000
> now=1.001000000, offset=          0: diff=   1000000 false
> now=1.010000000, offset=          0: diff=  10000000 false
> now=1.469000000, offset=          0: diff= 469000000 false
> now=1.469000001, offset=          0: diff= 469000001 false
> now=1.470000000, offset=          0: diff= 470000000 false
> now=1.470999999, offset=          0: diff= 470999999 false
> now=1.471000000, offset=          0: diff= 471000000 false
> now=1.499999999, offset=          0: diff= 499999999 false
> now=1.500000000, offset=          0: diff=-500000000 false
> 
> now=0.469000000, offset=  470000000: diff=  -1000000 false
> now=0.469000001, offset=  470000000: diff=   -999999 true: 0.000000000
> now=0.470000000, offset=  470000000: diff=         0 true: 0.000000000
> now=0.470999999, offset=  470000000: diff=    999999 true: 0.000000000
> now=0.471000000, offset=  470000000: diff=   1000000 false
> now=0.499999999, offset=  470000000: diff=  29999999 false
> now=0.500000000, offset=  470000000: diff=  30000000 false
> now=0.990000000, offset=  470000000: diff=-480000000 false
> now=0.999000000, offset=  470000000: diff=-471000000 false
> now=0.999900000, offset=  470000000: diff=-470100000 false
> now=1.000000000, offset=  470000000: diff=-470000000 false
> now=1.000100000, offset=  470000000: diff=-469900000 false
> now=1.001000000, offset=  470000000: diff=-469000000 false
> now=1.010000000, offset=  470000000: diff=-460000000 false
> now=1.469000000, offset=  470000000: diff=  -1000000 false
> now=1.469000001, offset=  470000000: diff=   -999999 true: 1.000000000
> now=1.470000000, offset=  470000000: diff=         0 true: 1.000000000
> now=1.470999999, offset=  470000000: diff=    999999 true: 1.000000000
> now=1.471000000, offset=  470000000: diff=   1000000 false
> now=1.499999999, offset=  470000000: diff=  29999999 false
> now=1.500000000, offset=  470000000: diff=  30000000 false
> 
> now=0.469000000, offset= 1000000000: diff= 469000000 false
> now=0.469000001, offset= 1000000000: diff= 469000001 false
> now=0.470000000, offset= 1000000000: diff= 470000000 false
> now=0.470999999, offset= 1000000000: diff= 470999999 false
> now=0.471000000, offset= 1000000000: diff= 471000000 false
> now=0.499999999, offset= 1000000000: diff= 499999999 false
> now=0.500000000, offset= 1000000000: diff=-500000000 false
> now=0.990000000, offset= 1000000000: diff= -10000000 false
> now=0.999000000, offset= 1000000000: diff=  -1000000 false
> now=0.999900000, offset= 1000000000: diff=   -100000 true: 0.000000000
> now=1.000000000, offset= 1000000000: diff=         0 true: 0.000000000
> now=1.000100000, offset= 1000000000: diff=    100000 true: 0.000000000
> now=1.001000000, offset= 1000000000: diff=   1000000 false
> now=1.010000000, offset= 1000000000: diff=  10000000 false
> now=1.469000000, offset= 1000000000: diff= 469000000 false
> now=1.469000001, offset= 1000000000: diff= 469000001 false
> now=1.470000000, offset= 1000000000: diff= 470000000 false
> now=1.470999999, offset= 1000000000: diff= 470999999 false
> now=1.471000000, offset= 1000000000: diff= 471000000 false
> now=1.499999999, offset= 1000000000: diff= 499999999 false
> now=1.500000000, offset= 1000000000: diff=-500000000 false
> 
> now=0.469000000, offset= 1470000000: diff=  -1000000 false
> now=0.469000001, offset= 1470000000: diff=   -999999 true: -1.000000000
> now=0.470000000, offset= 1470000000: diff=         0 true: -1.000000000
> now=0.470999999, offset= 1470000000: diff=    999999 true: -1.000000000
> now=0.471000000, offset= 1470000000: diff=   1000000 false
> now=0.499999999, offset= 1470000000: diff=  29999999 false
> now=0.500000000, offset= 1470000000: diff=  30000000 false
> now=0.990000000, offset= 1470000000: diff=-480000000 false
> now=0.999000000, offset= 1470000000: diff=-471000000 false
> now=0.999900000, offset= 1470000000: diff=-470100000 false
> now=1.000000000, offset= 1470000000: diff=-470000000 false
> now=1.000100000, offset= 1470000000: diff=-469900000 false
> now=1.001000000, offset= 1470000000: diff=-469000000 false
> now=1.010000000, offset= 1470000000: diff=-460000000 false
> now=1.469000000, offset= 1470000000: diff=  -1000000 false
> now=1.469000001, offset= 1470000000: diff=   -999999 true: 0.000000000
> now=1.470000000, offset= 1470000000: diff=         0 true: 0.000000000
> now=1.470999999, offset= 1470000000: diff=    999999 true: 0.000000000
> now=1.471000000, offset= 1470000000: diff=   1000000 false
> now=1.499999999, offset= 1470000000: diff=  29999999 false
> now=1.500000000, offset= 1470000000: diff=  30000000 false
> 
> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
> According to speedtest.net: 8.21Mbps down 510kbps up
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-20 22:45     ` Jason Gunthorpe
                         ` (2 preceding siblings ...)
  2017-09-21 11:29       ` Russell King - ARM Linux
@ 2017-09-21 12:28       ` Russell King - ARM Linux
  2017-09-26 11:58         ` Alexandre Belloni
  3 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21 12:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> What do you think of this untested approach in the below patch?

There's another issue.

Most drivers use rtc_device_register() or devm_rtc_device_register()
rather than rtc_allocate_device() (which is static.)  This does not
give RTC drivers a chance to set rtc->time_set_nsec before the
RTC is registered with the kernel.

Setting it after the device has been registered is racy.  So, having
this member in struct rtc_device and assuming that drivers will override
the value doesn't work.

It doesn't make sense to put it in rtc_class_ops as it isn't an
operation, but we could add a callback in there which is used during
initialisation but before registration which could be used to set this
member.

Thoughts?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21  9:32         ` Russell King - ARM Linux
  2017-09-21 11:30           ` Russell King - ARM Linux
@ 2017-09-21 22:05           ` Jason Gunthorpe
  2017-09-21 22:45             ` Russell King - ARM Linux
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-21 22:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 10:32:19AM +0100, Russell King - ARM Linux
wrote:

I've combined several of your messages together into this reply...

> static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
> 				  struct timespec64 *now)
> {
> 	long diff;
> 
> 	diff = (now->tv_nsec - rtc->time_set_nsec) % NSEC_PER_SEC;
> 	/* diff must be within [-NSEC_PER_SEC/2..NSEC_PER_SEC/2) */
> 	if (diff >= NSEC_PER_SEC / 2)
> 		diff -= NSEC_PER_SEC;
> 	else if (diff < -NSEC_PER_SEC / 2)
> 		diff += NSEC_PER_SEC;

I like your concept of allowing a negative time_set_nsec, I was
focused on a positive value and botched the first draft by not casting
to unsigned for the maths.. It seems much better because it
incorporates the rounding calculation into the common code instead of
shifting it to the driver.

I revised this idea a few times and came up with something a bit
simpler to understand by changing from a time_set_nsec to its inverse
the set_offset_nsec, that makes the maths simpler and we can avoid
troublesome signed modulos by using time maths functions instead of
open coding them.

See full revised patch below, this time I did test the
rtc_tv_nsec_ok() using your test vectors and it seems OK. Still have
not tested the whole patch...

So with v2 of this patch, the driver would specify positive 1s, and
then the set will be called when tv_nsec == 0, but the tv_sec will
be +1.

Similarly the existing case of 0.5s offset will work more like
before, where it will be called with tv_nsec == .5s and the tv_sec
will be +1 - before I think this happened reliably 'by accident' due
to the rounding.

> Most drivers use rtc_device_register() or devm_rtc_device_register()
> rather than rtc_allocate_device() (which is static.)  This does not
> give RTC drivers a chance to set rtc->time_set_nsec before the RTC
> is registered with the kernel.

You are correct about the race, of course. I left it like that
deliberately, I'm not sure it is worth solving since I felt nothing
should be setting the RTC within a few CPU instructions of registering
it.

As you say, adding a pre-registration init op will solve the issue if
you think it is worth solving.

Jason

>From 566712cb27596836469637b0e50e7fd973368df8 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Thu, 21 Sep 2017 15:55:33 -0600
Subject: [PATCH v2] rtc: Allow rtc drivers to specify the tv_nsec value for ntp

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change how this works by introducing the driver specific concept of
a required delay between current wall clock time and the target time
to set (with a 0 tv_nsecs). When this delay is set to 0.5 seconds then
the behaviour will be the same as today, at wall clock time 0.5 sec the
RTC set will be called to set 1.0 sec.

Each RTC driver should set the set_offset_nsec according to its needs.

This also polices the incoming set time ts_nsec from NTP, if it is not
within +-1ms of the required time then the set is deferred. The target
time to set is 'snapped' to the nearest value.

The calculation of the sleep time for ntp is also revised to use modern
helper functions, and to more directly and safely compute a relative
jiffies delay that will result in the correct tv_nsec. If for some reason
the timer does not meet the requirement then it does a short sleep
and tries again.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |  6 ++++
 drivers/rtc/systohc.c | 85 ++++++++++++++++++++++++++++++++++++++++++---------
 include/linux/rtc.h   | 10 +++++-
 kernel/time/ntp.c     | 62 ++++++++++++++++++++++++-------------
 4 files changed, 125 insertions(+), 38 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 2ed970d61da140..eef4123a573504 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -161,6 +161,12 @@ static struct rtc_device *rtc_allocate_device(void)
 
 	device_initialize(&rtc->dev);
 
+	/* Drivers can revise this default after allocating the device. It
+	 * should be the value of wallclock tv_nsec that the driver needs in
+	 * order to synchronize the second tick over during set.
+	 */
+	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
+
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.class = rtc_class;
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06bb8..a7d0bbe2577110 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -7,9 +7,42 @@
 #include <linux/rtc.h>
 #include <linux/time.h>
 
+/* Determine if we can call to driver to set the time. Drivers can only be
+ * called to set a second aligned time value, and the field set_offset_nsec
+ * specifies how far away from the second aligned time to call the driver.
+ *
+ * This also compute 'to_set' which is the time we are trying to set, and has
+ * a zero in tv_nsecs, such that:
+ *    to_set - set_delay_nsec == now +/- FUZZ
+ *
+ */
+#define TIME_SET_NSEC_FUZZ (1000 * 1000)
+static inline bool rtc_tv_nsec_ok(struct rtc_device *rtc,
+				  struct timespec64 *to_set,
+				  const struct timespec64 *now)
+{
+	struct timespec64 delay = {.tv_sec = 0,
+				   .tv_nsec = rtc->set_offset_nsec};
+
+	*to_set = timespec64_add(*now, delay);
+
+	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+		to_set->tv_nsec = 0;
+		return true;
+	}
+
+	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+		to_set->tv_sec++;
+		to_set->tv_nsec = 0;
+		return true;
+	}
+	return false;
+}
+
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @target_nsec: pointer for desired now->tv_nsec value
  *
  * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
@@ -18,30 +51,52 @@
  * possible at all, and various other -errno for specific temporary failure
  * cases.
  *
+ * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
+ (
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now)
+int rtc_set_ntp_time(struct timespec64 now, long *target_nsec)
 {
 	struct rtc_device *rtc;
 	struct rtc_time tm;
+	struct timespec64 to_set;
 	int err = -ENODEV;
-
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
+	bool ok;
 
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
-	if (rtc) {
-		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
-		 * not set_mmss. */
-		if (rtc->ops &&
-		    (rtc->ops->set_time ||
-		     rtc->ops->set_mmss64 ||
-		     rtc->ops->set_mmss))
-			err = rtc_set_time(rtc, &tm);
-		rtc_class_close(rtc);
+	if (!rtc)
+		goto out_err;
+
+	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
+			  !rtc->ops->set_mmss))
+		goto out_close;
+
+	/* Compute the value of tv_nsec we require the caller to supply in
+	 * now.tv_nsec.  This is the value such that (now +
+	 * set_offset_nsec).tv_nsec == 0.
+	 */
+	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
+	*target_nsec = to_set.tv_nsec;
+
+	/* The ntp code must call this with the correct value in tv_nsec, if
+	 * it does not we update target_nsec and return EPROTO to make the ntp
+	 * code try again later.
+	 */
+	ok = rtc_tv_nsec_ok(rtc, &to_set, &now);
+	if (!ok) {
+		err = -EPROTO;
+		goto out_close;
 	}
 
+	rtc_time64_to_tm(to_set.tv_sec, &tm);
+
+	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
+	 * set_mmss.
+	 */
+	err = rtc_set_time(rtc, &tm);
+
+out_close:
+	rtc_class_close(rtc);
+out_err:
 	return err;
 }
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 0a0f0d14a5fba5..7d526550a700f8 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -137,6 +137,14 @@ struct rtc_device {
 	/* Some hardware can't support UIE mode */
 	int uie_unsupported;
 
+	/* Number of nsec it takes to set the RTC clock. This influences when
+	 * the set ops are called. An offset:
+	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
+	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
+	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
+	 */
+	long set_offset_nsec;
+
 	bool registered;
 
 	struct nvmem_config *nvmem_config;
@@ -174,7 +182,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now);
+extern int rtc_set_ntp_time(struct timespec64 now, long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
 			struct rtc_wkalrm *alrm);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc5314043..6d2f521dd97748 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -514,17 +514,19 @@ static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
 
 static void sync_cmos_clock(struct work_struct *work)
 {
-	struct timespec64 now;
-	struct timespec64 next;
+	struct timespec64 now, next, delta;
 	int fail = 1;
+	long target_nsec = NSEC_PER_SEC / 2;
 
 	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
-	 * We want the clock to be within a couple of ticks from the target.
+	 * If we have an externally synchronized Linux clock, then update CMOS
+	 * clock accordingly every ~11 minutes.  Histiocally Set_rtc_mmss()
+	 * has to be called as close as possible to 500 ms (target_nsec)
+	 * before the new second starts, but new RTC drivers can select a
+	 * different value.  This code is run on a timer.  If the clock is
+	 * set, that timer may not expire at the correct time.  Thus, we
+	 * adjust...  We want the clock to be within a couple of ticks from
+	 * the target.
 	 */
 	if (!ntp_synced()) {
 		/*
@@ -547,25 +549,41 @@ static void sync_cmos_clock(struct work_struct *work)
 
 #ifdef CONFIG_RTC_SYSTOHC
 		if (fail == -ENODEV)
-			fail = rtc_set_ntp_time(adjust);
+			fail = rtc_set_ntp_time(adjust, &target_nsec);
 #endif
 	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
-	if (next.tv_nsec <= 0)
-		next.tv_nsec += NSEC_PER_SEC;
+	do {
+		/*
+		 * Compute the next wall clock time to try and set the
+		 * clock
+		 */
+		next = now;
+		if (!fail || fail == -ENODEV)
+			timespec64_add_ns(&next, 659ULL * NSEC_PER_SEC);
+		else
+			/* Update failed, try again in about 10 seconds */
+			timespec64_add_ns(&next, 10ULL * NSEC_PER_SEC);
 
-	if (!fail || fail == -ENODEV)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
+		/*
+		 * The next call to sync_cmos_clock needs to have have a wall
+		 * clock tv_nsec value equal to target_nsec.
+		 */
+		if (next.tv_nsec > target_nsec)
+			next.tv_sec++;
+		next.tv_nsec = target_nsec;
 
-	if (next.tv_nsec >= NSEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_nsec -= NSEC_PER_SEC;
-	}
-	queue_delayed_work(system_power_efficient_wq,
-			   &sync_cmos_work, timespec64_to_jiffies(&next));
+		/*
+		 * Convert to a relative delay. If time set took a really long
+		 * time, or the wall clock was changed, this might become
+		 * negative, so try again.
+		 */
+		getnstimeofday64(&now);
+		delta = timespec64_sub(next, now);
+	} while (delta.tv_sec <= 0);
+
+	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work,
+			   timespec64_to_jiffies(&delta));
 }
 
 void ntp_notify_cmos_timer(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21 22:05           ` Jason Gunthorpe
@ 2017-09-21 22:45             ` Russell King - ARM Linux
  2017-09-21 23:20               ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-21 22:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 04:05:10PM -0600, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2017 at 10:32:19AM +0100, Russell King - ARM Linux
> wrote:
> 
> I've combined several of your messages together into this reply...

Sorry about that, I'm currently ill and sleep deprived, so it's been
a struggle to pick up on everything together.

> So with v2 of this patch, the driver would specify positive 1s, and
> then the set will be called when tv_nsec == 0, but the tv_sec will
> be +1.
> 
> Similarly the existing case of 0.5s offset will work more like
> before, where it will be called with tv_nsec == .5s and the tv_sec
> will be +1 - before I think this happened reliably 'by accident' due
> to the rounding.

It seems to mean (from reading the code) that I'd need to specify
-470ms for the PCF8523, if I'm understand what you're saying and
the code correctly.

I should point out, however, that I'm running a modified version
of your previous patch with 470ms specified - I added a debug
printk() for the failure cases, and I seem to be getting quite a
number:

[  416.966909] rtc_tv_nsec_ok: fail: ts=1506001533.510194975
[  427.974142] rtc_tv_nsec_ok: fail: ts=1506001544.518073728
[  768.946856] rtc_tv_nsec_ok: fail: ts=1506001885.510091123
[  779.954207] rtc_tv_nsec_ok: fail: ts=1506001896.518068045
...
[31391.230806] rtc_tv_nsec_ok: fail: ts=1506032509.510048412
[31402.238221] rtc_tv_nsec_ok: fail: ts=1506032520.518080991

176 so far in that window, which equates to an average failure rate
of one failure every 3m5s.  This is on a kernel with:

CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ=250

>From what I can see, we have failures that are 11s apart and 340s
apart.  If we assume that we had a successful write at 1506001555s,
then that means we only waited 329s before we next tried to set the
RTC, not the 11 minutes that we should have.  Not sure yet what's
going on there, but it seems to suggest that we're struggling to
set the RTC, taking multiple attempts to achieve it.

Also... don't we want to move the call to rtc_set_ntp_time() from out
of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
check, otherwise how could it be called except at about .5s?

I'm actually not sure if we are ending up setting the time - my test
program reports:

RTC Time: 21-09-2017 22:40:45
System Time was:     22:40:44.997

so the RTC has drifted off by 3ms wrt system time, and that suggests
it hasn't been set.  It's difficult to know.  Given that tick_nsec is
likely to be 4ms (I've not checked) it would suggest that we're never
successfully setting the RTC - since either the test in ntp.c fails
or the test in systohc.c fails.  Both can't succeed given these
constraints.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync@8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21 22:45             ` Russell King - ARM Linux
@ 2017-09-21 23:20               ` Jason Gunthorpe
  2017-09-22  9:57                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-21 23:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 11:45:41PM +0100, Russell King - ARM Linux wrote:
> > I've combined several of your messages together into this reply...
> 
> Sorry about that, I'm currently ill and sleep deprived, so it's been
> a struggle to pick up on everything together.

No worries, your replys were easy to follow, thanks

> > So with v2 of this patch, the driver would specify positive 1s, and
> > then the set will be called when tv_nsec == 0, but the tv_sec will
> > be +1.
> > 
> > Similarly the existing case of 0.5s offset will work more like
> > before, where it will be called with tv_nsec == .5s and the tv_sec
> > will be +1 - before I think this happened reliably 'by accident' due
> > to the rounding.
> 
> It seems to mean (from reading the code) that I'd need to specify
> -470ms for the PCF8523, if I'm understand what you're saying and
> the code correctly.

Okay.. but I guessed a negative number is probably not what most RTCs would
want, eg a positive number would compensate for delays executing an
I2C command, assuming the RTC zeros its divider when it sets the time.

> I should point out, however, that I'm running a modified version
> of your previous patch with 470ms specified - I added a debug
> printk() for the failure cases, and I seem to be getting quite a
> number:
> 
> [  416.966909] rtc_tv_nsec_ok: fail: ts=1506001533.510194975

> then that means we only waited 329s before we next tried to set the
> RTC, not the 11 minutes that we should have.  Not sure yet what's
> going on there, but it seems to suggest that we're struggling to
> set the RTC, taking multiple attempts to achieve it.

Ah..

> Also... don't we want to move the call to rtc_set_ntp_time() from out
> of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
> check, otherwise how could it be called except at about .5s?

Yes, definitely. If we do not call through to rtc_set_ntp_time then it
does not update target_nsec, and we go back to the 0.5 offset not the
offset you want, which will certainly cause random rtc_tv_nsec_ok
failures.

My bad to not notice that.. But I'm not sure what the revision should
be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
but I'm not sure where tick_nsec comes from or if we should push it
down in to TIME_SET_NSEC_FUZZ, or something else.

> I'm actually not sure if we are ending up setting the time - my test
> program reports:

Probably not, given the above!

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21 23:20               ` Jason Gunthorpe
@ 2017-09-22  9:57                 ` Russell King - ARM Linux
  2017-09-22 12:24                   ` Russell King - ARM Linux
  2017-09-22 16:48                   ` Jason Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-22  9:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 21, 2017 at 05:20:40PM -0600, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2017 at 11:45:41PM +0100, Russell King - ARM Linux wrote:
> > > I've combined several of your messages together into this reply...
> > 
> > Sorry about that, I'm currently ill and sleep deprived, so it's been
> > a struggle to pick up on everything together.
> 
> No worries, your replys were easy to follow, thanks

Feeling better today, thankfully.

> > > So with v2 of this patch, the driver would specify positive 1s, and
> > > then the set will be called when tv_nsec == 0, but the tv_sec will
> > > be +1.
> > > 
> > > Similarly the existing case of 0.5s offset will work more like
> > > before, where it will be called with tv_nsec == .5s and the tv_sec
> > > will be +1 - before I think this happened reliably 'by accident' due
> > > to the rounding.
> > 
> > It seems to mean (from reading the code) that I'd need to specify
> > -470ms for the PCF8523, if I'm understand what you're saying and
> > the code correctly.
> 
> Okay.. but I guessed a negative number is probably not what most RTCs would
> want, eg a positive number would compensate for delays executing an
> I2C command, assuming the RTC zeros its divider when it sets the time.

Okay, I thought I'd investigate a few other systems:

With the mainline code, which sets the RTC to tv_sec + 1 when tv_nsec
>= 500ms or tv_sec when tv_nsec < 500ms, at about tv_nsec = 500ms:

Dove - drifting about 4-5ms per minute:
RTC Time: 22-09-2017 09:06:38
System Time was:     09:06:37.464
RTC Time: 22-09-2017 09:07:38
System Time was:     09:07:37.508
RTC Time: 22-09-2017 09:08:39
System Time was:     09:08:38.504
...
RTC Time: 22-09-2017 09:17:48
System Time was:     09:17:47.464
RTC Time: 22-09-2017 09:18:48
System Time was:     09:18:47.500

I suspect this wants the time to be set at or around tv_nsec = 0.


Armada 388 - drifting@about 2-3ms per minute:
RTC Time: 22-09-2017 08:22:25
System Time was:     08:22:25.498
...
RTC Time: 22-09-2017 08:30:42
System Time was:     08:30:42.516
RTC Time: 22-09-2017 08:31:43
System Time was:     08:31:43.518
RTC Time: 22-09-2017 08:32:45
System Time was:     08:32:44.520
RTC Time: 22-09-2017 08:33:46
System Time was:     08:33:45.522
...
RTC Time: 22-09-2017 08:41:54
System Time was:     08:41:53.540
RTC Time: 22-09-2017 08:42:55
System Time was:     08:42:54.542
RTC Time: 22-09-2017 08:43:56
System Time was:     08:43:55.520
RTC Time: 22-09-2017 08:44:57
System Time was:     08:44:56.522

This looks the same as the Dove case - although there's some flucuation
in the second offset, probably caused by the 500ms threshold in
rtc_set_ntp_time().

> > Also... don't we want to move the call to rtc_set_ntp_time() from out
> > of the "if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {"
> > check, otherwise how could it be called except at about .5s?
> 
> Yes, definitely. If we do not call through to rtc_set_ntp_time then it
> does not update target_nsec, and we go back to the 0.5 offset not the
> offset you want, which will certainly cause random rtc_tv_nsec_ok
> failures.
> 
> My bad to not notice that.. But I'm not sure what the revision should
> be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
> but I'm not sure where tick_nsec comes from or if we should push it
> down in to TIME_SET_NSEC_FUZZ, or something else.

I think tick_nsec is the length of one tick in nanoseconds, adjusted
by NTP (see ntp_update_frequency() where it's calculated.)

It's probably easiest to think about it in terms of the old timekeeping
model, where we had a periodic tick whose period was fixed and defined
the point at which time incremented.  tick_nsec would be the amount of
time for one period.

I think that's still relevant in a modern kernel, especially when it's
using a clock event in periodic mode, as that's when we get the regular
interrupts which guarantee a scheduling point (other scheduling points
are available!) and hence when the workqueue can be run.

The only places it's used is by the NTP code, and also by a few
architectures to that still have gettimeoffset() functionality (pre-
clocksource) as it can be used to derive the scaling factor to convert
timer count ticks to wall time.

I also don't see an obvious rearrangement of the code to fix this -
the problem is we need to call update_persistent_clock64() to know
whether we should call rtc_set_ntp_time(), but we can only call
update_persistent_clock64() on tv_nsec=500ms.

It looks like the users of update_persistent_clock*() are:
arch/mips/kernel/time.c
arch/powerpc/kernel/time.c
arch/sh/kernel/time.c
arch/x86/kernel/rtc.c

I thought SH was introduced after rtclib was born?

All of these have some kind of architecture specific multiplexing of
the update_persistent_clock() call - switch statements depending on
rtc type, function pointers set according to the rtc type, etc.  They
all look like legacy code that no one wants to update to rtclib.

Maybe the solution here is to split two forms of ntp RTC synchronisation
so:
- If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
  update_persistent_clock*() methods called.
- If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
- If you enable both, then both get called at their appropriate times.

This would certainly simplify sync_cmos_clock().

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22  9:57                 ` Russell King - ARM Linux
@ 2017-09-22 12:24                   ` Russell King - ARM Linux
  2017-09-22 16:28                     ` Russell King - ARM Linux
  2017-09-22 16:48                   ` Jason Gunthorpe
  1 sibling, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-22 12:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:
> Maybe the solution here is to split two forms of ntp RTC synchronisation
> so:
> - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
>   update_persistent_clock*() methods called.
> - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> - If you enable both, then both get called at their appropriate times.
> 
> This would certainly simplify sync_cmos_clock().

I've just tried that out, and it doesn't work very well (not wrapping
the kernel messages):

[   18.827144] sync_rtc_clock: adjust=1506081346.710933001 target=470000000 next=1506081357.470000000 now=1506081346.710953334 delta=10.759046666 jiffies=2690
[   29.662077] sync_rtc_clock: adjust=1506081357.546474534 target=470000000 next=1506081368.470000000 now=1506081357.546487201 delta=10.923512799 jiffies=2731
[   40.671669] sync_rtc_clock: adjust=1506081368.556684059 target=470000000 next=1506081379.470000000 now=1506081368.556696727 delta=10.913303273 jiffies=2729
[   51.675016] sync_rtc_clock: adjust=1506081379.560649235 target=470000000 next=1506081390.470000000 now=1506081379.560662236 delta=10.909337764 jiffies=2728
[   62.682385] sync_rtc_clock: adjust=1506081390.568637970 target=470000000 next=1506081401.470000000 now=1506081390.568648637 delta=10.901351363 jiffies=2726
[   73.689831] sync_rtc_clock: adjust=1506081401.576696708 target=470000000 next=1506081412.470000000 now=1506081401.576708376 delta=10.893291624 jiffies=2724
[   84.697233] sync_rtc_clock: adjust=1506081412.584695110 target=470000000 next=1506081423.470000000 now=1506081412.584723778 delta=10.885276222 jiffies=2722
[   95.704624] sync_rtc_clock: adjust=1506081423.592724847 target=470000000 next=1506081434.470000000 now=1506081423.592736847 delta=10.877263153 jiffies=2720
[  106.711941] sync_rtc_clock: adjust=1506081434.600651911 target=470000000 next=1506081445.470000000 now=1506081434.600665579 delta=10.869334421 jiffies=2718
[  117.719334] sync_rtc_clock: adjust=1506081445.608659647 target=470000000 next=1506081456.470000000 now=1506081445.608673981 delta=10.861326019 jiffies=2716
[  128.726723] sync_rtc_clock: adjust=1506081456.616661715 target=470000000 next=1506081467.470000000 now=1506081456.616676049 delta=10.853323951 jiffies=2714
[  139.734103] sync_rtc_clock: adjust=1506081467.624659450 target=470000000 next=1506081478.470000000 now=1506081467.624672451 delta=10.845327549 jiffies=2712
[  150.741489] sync_rtc_clock: adjust=1506081478.632661185 target=470000000 next=1506081489.470000000 now=1506081478.632673519 delta=10.837326481 jiffies=2710
[  161.748870] sync_rtc_clock: adjust=1506081489.640655587 target=470000000 next=1506081500.470000000 now=1506081489.640668254 delta=10.829331746 jiffies=2708
[  172.756352] sync_rtc_clock: adjust=1506081500.648758661 target=470000000 next=1506081511.470000000 now=1506081500.648772995 delta=10.821227005 jiffies=2706
[  183.763642] sync_rtc_clock: adjust=1506081511.656658057 target=470000000 next=1506081522.470000000 now=1506081511.656671391 delta=10.813328609 jiffies=2704
[  194.771043] sync_rtc_clock: adjust=1506081522.664678127 target=470000000 next=1506081533.470000000 now=1506081522.664689794 delta=10.805310206 jiffies=2702
[  205.778402] sync_rtc_clock: adjust=1506081533.672649194 target=470000000 next=1506081544.470000000 now=1506081533.672662528 delta=10.797337472 jiffies=2700
[  216.785780] sync_rtc_clock: adjust=1506081544.680643928 target=470000000 next=1506081555.470000000 now=1506081544.680656929 delta=10.789343071 jiffies=2698
[  227.792447] sync_rtc_clock: adjust=1506081555.688650369 target=470000000 next=1506081566.470000000 now=1506081555.688664037 delta=10.781335963 jiffies=2696
[  238.798968] sync_rtc_clock: adjust=1506081566.696674887 target=470000000 next=1506081577.470000000 now=1506081566.696687889 delta=10.773312111 jiffies=2694
[  249.805483] sync_rtc_clock: adjust=1506081577.704652923 target=470000000 next=1506081588.470000000 now=1506081577.704666259 delta=10.765333741 jiffies=2692
[  260.812078] sync_rtc_clock: adjust=1506081588.712679979 target=470000000 next=1506081599.470000000 now=1506081588.712693647 delta=10.757306353 jiffies=2690
[  271.818656] sync_rtc_clock: adjust=1506081599.720654429 target=470000000 next=1506081610.470000000 now=1506081599.720668431 delta=10.749331569 jiffies=2688
[  282.825311] sync_rtc_clock: adjust=1506081610.728677710 target=470000000 next=1506081621.470000000 now=1506081610.728690045 delta=10.741309955 jiffies=2686
[  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004 jiffies=2748
[  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244 jiffies=2746
[  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571 jiffies=2744

As previous mentioned, this has HZ=250, so one jiffy is about 4ms.  What
this shows is that we're failing every single time to hit the desired
window - the closest we got in the 5 minutes of uptime to 470ms was
480653995ns.

The comments in linux/workqueue.h refer to a gmane link -
http://thread.gmane.org/gmane.linux.kernel/1480396 but this seems to be
dead - "Archived At Nothing found - bye".  Looks like using gmane URLs
in kernel code is not a good idea, they don't seem to be persistent!

It's interesting to watch what's happening there - despite reducing the
number of jiffies each time, we still seem to be woken later and later
each time, with an approximate increase of 8ms or two jiffies each time.
Eg:

[  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000
 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004
 jiffies=2748
[  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000
 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244
 jiffies=2746
[  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000
 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571
 jiffies=2744

If we do the math, then:

 2748 jiffies is 10.992s, which is 10.989333004 rounded up to a jiffy.
 10.992 + 1506081621.480666996 gives 1506081632.472666996, but we're
 next called at 1506081632.488651088, which is almost 4 jiffies later
 than requested.

 2746 jiffies is 10.984s, which is 10.981336244 rounded up to a jiffy.
 10.984 + 1506081632.488663756 gives 1506081643.472663756, but we're
 next called at 1506081643.496641761, which is almost 6 jiffies later
 than requested.

I'm not sure where those extra jiffies are coming from... but maybe
using the power efficient wq is not the best for accuracy?

Also notice that the target time seems to always be about 2.6ms late.
based on the requested jiffies, which in itself would mean we miss
the 1ms window in the systohc.c code.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22 12:24                   ` Russell King - ARM Linux
@ 2017-09-22 16:28                     ` Russell King - ARM Linux
  0 siblings, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-22 16:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 01:24:20PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:
> > Maybe the solution here is to split two forms of ntp RTC synchronisation
> > so:
> > - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
> >   update_persistent_clock*() methods called.
> > - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> > - If you enable both, then both get called at their appropriate times.
> > 
> > This would certainly simplify sync_cmos_clock().
> 
> I've just tried that out, and it doesn't work very well (not wrapping
> the kernel messages):
> 
> [   18.827144] sync_rtc_clock: adjust=1506081346.710933001 target=470000000 next=1506081357.470000000 now=1506081346.710953334 delta=10.759046666 jiffies=2690
> [   29.662077] sync_rtc_clock: adjust=1506081357.546474534 target=470000000 next=1506081368.470000000 now=1506081357.546487201 delta=10.923512799 jiffies=2731
> [   40.671669] sync_rtc_clock: adjust=1506081368.556684059 target=470000000 next=1506081379.470000000 now=1506081368.556696727 delta=10.913303273 jiffies=2729
> [   51.675016] sync_rtc_clock: adjust=1506081379.560649235 target=470000000 next=1506081390.470000000 now=1506081379.560662236 delta=10.909337764 jiffies=2728
> [   62.682385] sync_rtc_clock: adjust=1506081390.568637970 target=470000000 next=1506081401.470000000 now=1506081390.568648637 delta=10.901351363 jiffies=2726
> [   73.689831] sync_rtc_clock: adjust=1506081401.576696708 target=470000000 next=1506081412.470000000 now=1506081401.576708376 delta=10.893291624 jiffies=2724
> [   84.697233] sync_rtc_clock: adjust=1506081412.584695110 target=470000000 next=1506081423.470000000 now=1506081412.584723778 delta=10.885276222 jiffies=2722
> [   95.704624] sync_rtc_clock: adjust=1506081423.592724847 target=470000000 next=1506081434.470000000 now=1506081423.592736847 delta=10.877263153 jiffies=2720
> [  106.711941] sync_rtc_clock: adjust=1506081434.600651911 target=470000000 next=1506081445.470000000 now=1506081434.600665579 delta=10.869334421 jiffies=2718
> [  117.719334] sync_rtc_clock: adjust=1506081445.608659647 target=470000000 next=1506081456.470000000 now=1506081445.608673981 delta=10.861326019 jiffies=2716
> [  128.726723] sync_rtc_clock: adjust=1506081456.616661715 target=470000000 next=1506081467.470000000 now=1506081456.616676049 delta=10.853323951 jiffies=2714
> [  139.734103] sync_rtc_clock: adjust=1506081467.624659450 target=470000000 next=1506081478.470000000 now=1506081467.624672451 delta=10.845327549 jiffies=2712
> [  150.741489] sync_rtc_clock: adjust=1506081478.632661185 target=470000000 next=1506081489.470000000 now=1506081478.632673519 delta=10.837326481 jiffies=2710
> [  161.748870] sync_rtc_clock: adjust=1506081489.640655587 target=470000000 next=1506081500.470000000 now=1506081489.640668254 delta=10.829331746 jiffies=2708
> [  172.756352] sync_rtc_clock: adjust=1506081500.648758661 target=470000000 next=1506081511.470000000 now=1506081500.648772995 delta=10.821227005 jiffies=2706
> [  183.763642] sync_rtc_clock: adjust=1506081511.656658057 target=470000000 next=1506081522.470000000 now=1506081511.656671391 delta=10.813328609 jiffies=2704
> [  194.771043] sync_rtc_clock: adjust=1506081522.664678127 target=470000000 next=1506081533.470000000 now=1506081522.664689794 delta=10.805310206 jiffies=2702
> [  205.778402] sync_rtc_clock: adjust=1506081533.672649194 target=470000000 next=1506081544.470000000 now=1506081533.672662528 delta=10.797337472 jiffies=2700
> [  216.785780] sync_rtc_clock: adjust=1506081544.680643928 target=470000000 next=1506081555.470000000 now=1506081544.680656929 delta=10.789343071 jiffies=2698
> [  227.792447] sync_rtc_clock: adjust=1506081555.688650369 target=470000000 next=1506081566.470000000 now=1506081555.688664037 delta=10.781335963 jiffies=2696
> [  238.798968] sync_rtc_clock: adjust=1506081566.696674887 target=470000000 next=1506081577.470000000 now=1506081566.696687889 delta=10.773312111 jiffies=2694
> [  249.805483] sync_rtc_clock: adjust=1506081577.704652923 target=470000000 next=1506081588.470000000 now=1506081577.704666259 delta=10.765333741 jiffies=2692
> [  260.812078] sync_rtc_clock: adjust=1506081588.712679979 target=470000000 next=1506081599.470000000 now=1506081588.712693647 delta=10.757306353 jiffies=2690
> [  271.818656] sync_rtc_clock: adjust=1506081599.720654429 target=470000000 next=1506081610.470000000 now=1506081599.720668431 delta=10.749331569 jiffies=2688
> [  282.825311] sync_rtc_clock: adjust=1506081610.728677710 target=470000000 next=1506081621.470000000 now=1506081610.728690045 delta=10.741309955 jiffies=2686
> [  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000 next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004 jiffies=2748
> [  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000 next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244 jiffies=2746
> [  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000 next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571 jiffies=2744
> 
> As previous mentioned, this has HZ=250, so one jiffy is about 4ms.  What
> this shows is that we're failing every single time to hit the desired
> window - the closest we got in the 5 minutes of uptime to 470ms was
> 480653995ns.
> 
> The comments in linux/workqueue.h refer to a gmane link -
> http://thread.gmane.org/gmane.linux.kernel/1480396 but this seems to be
> dead - "Archived At Nothing found - bye".  Looks like using gmane URLs
> in kernel code is not a good idea, they don't seem to be persistent!
> 
> It's interesting to watch what's happening there - despite reducing the
> number of jiffies each time, we still seem to be woken later and later
> each time, with an approximate increase of 8ms or two jiffies each time.
> Eg:
> 
> [  293.575979] sync_rtc_clock: adjust=1506081621.480653995 target=470000000
>  next=1506081632.470000000 now=1506081621.480666996 delta=10.989333004
>  jiffies=2748
> [  304.582667] sync_rtc_clock: adjust=1506081632.488651088 target=470000000
>  next=1506081643.470000000 now=1506081632.488663756 delta=10.981336244
>  jiffies=2746
> [  315.589372] sync_rtc_clock: adjust=1506081643.496641761 target=470000000
>  next=1506081654.470000000 now=1506081643.496653429 delta=10.973346571
>  jiffies=2744
> 
> If we do the math, then:
> 
>  2748 jiffies is 10.992s, which is 10.989333004 rounded up to a jiffy.
>  10.992 + 1506081621.480666996 gives 1506081632.472666996, but we're
>  next called at 1506081632.488651088, which is almost 4 jiffies later
>  than requested.
> 
>  2746 jiffies is 10.984s, which is 10.981336244 rounded up to a jiffy.
>  10.984 + 1506081632.488663756 gives 1506081643.472663756, but we're
>  next called at 1506081643.496641761, which is almost 6 jiffies later
>  than requested.
> 
> I'm not sure where those extra jiffies are coming from... but maybe
> using the power efficient wq is not the best for accuracy?
> 
> Also notice that the target time seems to always be about 2.6ms late.
> based on the requested jiffies, which in itself would mean we miss
> the 1ms window in the systohc.c code.

Okay, the problem seems to be the new timer wheel code - see
https://lwn.net/Articles/646950/

This means that timers set way in the future (like more than a few
milliseconds) can be delayed - they no longer expire "on the specified
jiffies".  Something to bear in mind for the future, as this affects
everything that uses normal timers.

I think the reason this doesn't show up for the original code is
because of your change from a 0-1s retry to a 10s retry - the further
the expiry into the future, the more delayed the timers can get.

I guess we could go back to the 0-1s retry and keep using the delayed
work queues, but I wonder if normal timers will get even less accurate
in the future (if tglx does more work there.)

The suggestion from Arjan van de Ven is to use hrtimers for this - but
we still need the workqueue (as we may sleep) so we need a hrtimer-
delayed workqueue.  On the face of it, that seems a contradiction in
terms to use a hrtimer with a workqueue.

I do have an implementation for the hrtimer suggestion, which I'll try
to sort out during the remainder of the day, once I've ironed out a
few remaining quirks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22  9:57                 ` Russell King - ARM Linux
  2017-09-22 12:24                   ` Russell King - ARM Linux
@ 2017-09-22 16:48                   ` Jason Gunthorpe
  2017-09-22 17:20                     ` Russell King - ARM Linux
  1 sibling, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-22 16:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:

> > Okay.. but I guessed a negative number is probably not what most RTCs would
> > want, eg a positive number would compensate for delays executing an
> > I2C command, assuming the RTC zeros its divider when it sets the time.
> 
> Okay, I thought I'd investigate a few other systems:
> 
> With the mainline code, which sets the RTC to tv_sec + 1 when tv_nsec
> >= 500ms or tv_sec when tv_nsec < 500ms,@about tv_nsec = 500ms:

The more I think about this, the more I think the above rounding is a bug
in rtc_set_ntp_time. It does not make any sense to target 0.5 for the
update and then, essentially, randomly add one second depending on
where the time tick happens.

Based on your other study of the WQ behavior I would anticipate the
time tick is consistently > .5s, so historically the rtc_set_ntp_time
always adds +1s.

Unfortunately, the classic pre-rtclib mechanism in somthing like x86
mach_set_rtc_mmss just truncates, not rounds. Both cannot be right
when talking about the same RTC chip.

So, my guess, is that the ntp stuff was written and tested against the
old x86 code, eg the round down behavior.

This suggests the CMOS mechanism expects to set time 1.0 seconds at
instant 1.5 s. ie when the RTC executes the time set it sets the seconds
scaler to 0.5, not zero. With the v2 patch this would be a -0.5s
set_offset_nsec value.

> RTC Time: 22-09-2017 09:18:48
> System Time was:     09:18:47.500
> 
> I suspect this wants the time to be set at or around tv_nsec = 0.

Right, in that case I would imagine setting set_offset_nsec to
something on the order of the I2C execution latency.

> > My bad to not notice that.. But I'm not sure what the revision should
> > be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
> > but I'm not sure where tick_nsec comes from or if we should push it
> > down in to TIME_SET_NSEC_FUZZ, or something else.
> 
> I think tick_nsec is the length of one tick in nanoseconds, adjusted
> by NTP (see ntp_update_frequency() where it's calculated.)

Yes, having some time to look at it now, tick_ns*5 is about 5
jiffies, your system is using 4ms jiffies, so the existing code has a
TIME_SET_NSEC_FUZZ of +-20ms from the target time. Guessing someone
already figured out the WQ is very inaccurate and delt with it by
using tick_ns*5. I copied that into patch v3 below

Perhaps this should ultimately be moved to a hrtimer or something.

> The only places it's used is by the NTP code, and also by a few
> architectures to that still have gettimeoffset() functionality (pre-
> clocksource) as it can be used to derive the scaling factor to convert
> timer count ticks to wall time.

This may also be a source of some of the WQ error, my patches have
done this:

	getnstimeofday64(&now);
	delta = timespec64_sub(next, now);
        timespec64_to_jiffies(&delta));

'delta' is wall clock ns, not jiffies ns. Depending on how the
timekeeping works, it may need a scale factor when converting to
jiffies? Particularly when talking about the 11 min sleep

However the original code does not seem to have anything like that..

> Maybe the solution here is to split two forms of ntp RTC synchronisation
> so:
> - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
>   update_persistent_clock*() methods called.
> - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> - If you enable both, then both get called at their appropriate times.
> 
> This would certainly simplify sync_cmos_clock().

Yes, splitting makes sense to me, but calling both rtc_set_ntp_time
and update_persistent_clock doesn't seem like a good idea. At least
x86 looks like it will provide both mechanism?, We don't want to
set the RTC twice..

Patch v3 reworks that code to be clearer as you suggested, but keeps
the semantic that if cmos is supported rtc does not run.

Still untested, sorry about that..

v3:
- Revise TIME_SET_NSEC_FUZZ to be 5 jiffies, like sync_cmos_clock had
- Split sync_cmos_clock into CMOS and RTC code paths, retain the two
  different kinds of rounding being used.
- Go back to a 1 s update interval

Jason

>From 2ff270d4984daf100112c33689db428bebb73d6f Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Fri, 22 Sep 2017 10:43:11 -0600
Subject: [PATCH v3] rtc: Allow rtc drivers to specify the tv_nsec value for ntp

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change how this works by introducing the driver specific concept of
a required delay between current wall clock time and the target time
to set (with a 0 tv_nsecs). When this delay is set to 0.5 seconds then
the behaviour will be the same as today, at wall clock time 0.5 sec the
RTC set will be called to set 1.0 sec.

Each RTC driver should set the set_offset_nsec according to its needs.

The calculation of the sleep time and 'fuzz' for ntp is also revised
to use modern helper functions, and to more directly and safely
compute a relative jiffies delay that will result in the correct
tv_nsec. If for some reason the timer does not meet the requirement
then it does a shorter sleep and tries again.

Since cmos and RTC now have very different handling they are split
into two dedicated code paths, sharing the support code.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |   6 ++
 drivers/rtc/systohc.c |  53 +++++++++++-----
 include/linux/rtc.h   |  43 ++++++++++++-
 kernel/time/ntp.c     | 171 +++++++++++++++++++++++++++++++++-----------------
 4 files changed, 201 insertions(+), 72 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 2ed970d61da140..eef4123a573504 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -161,6 +161,12 @@ static struct rtc_device *rtc_allocate_device(void)
 
 	device_initialize(&rtc->dev);
 
+	/* Drivers can revise this default after allocating the device. It
+	 * should be the value of wallclock tv_nsec that the driver needs in
+	 * order to synchronize the second tick over during set.
+	 */
+	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
+
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.class = rtc_class;
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06bb8..5e50c9db57344a 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -10,6 +10,7 @@
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @target_nsec: pointer for desired now->tv_nsec value
  *
  * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
@@ -18,30 +19,52 @@
  * possible at all, and various other -errno for specific temporary failure
  * cases.
  *
+ * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
+ (
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now)
+int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
 {
 	struct rtc_device *rtc;
 	struct rtc_time tm;
+	struct timespec64 to_set;
 	int err = -ENODEV;
-
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
+	bool ok;
 
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
-	if (rtc) {
-		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
-		 * not set_mmss. */
-		if (rtc->ops &&
-		    (rtc->ops->set_time ||
-		     rtc->ops->set_mmss64 ||
-		     rtc->ops->set_mmss))
-			err = rtc_set_time(rtc, &tm);
-		rtc_class_close(rtc);
+	if (!rtc)
+		goto out_err;
+
+	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
+			  !rtc->ops->set_mmss))
+		goto out_close;
+
+	/* Compute the value of tv_nsec we require the caller to supply in
+	 * now.tv_nsec.  This is the value such that (now +
+	 * set_offset_nsec).tv_nsec == 0.
+	 */
+	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
+	*target_nsec = to_set.tv_nsec;
+
+	/* The ntp code must call this with the correct value in tv_nsec, if
+	 * it does not we update target_nsec and return EPROTO to make the ntp
+	 * code try again later.
+	 */
+	ok = rtc_tv_nsec_ok(rtc, &to_set, &now);
+	if (!ok) {
+		err = -EPROTO;
+		goto out_close;
 	}
 
+	rtc_time64_to_tm(to_set.tv_sec, &tm);
+
+	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
+	 * set_mmss.
+	 */
+	err = rtc_set_time(rtc, &tm);
+
+out_close:
+	rtc_class_close(rtc);
+out_err:
 	return err;
 }
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 0a0f0d14a5fba5..eb767f0bfc13fa 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -137,6 +137,14 @@ struct rtc_device {
 	/* Some hardware can't support UIE mode */
 	int uie_unsupported;
 
+	/* Number of nsec it takes to set the RTC clock. This influences when
+	 * the set ops are called. An offset:
+	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
+	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
+	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
+	 */
+	long set_offset_nsec;
+
 	bool registered;
 
 	struct nvmem_config *nvmem_config;
@@ -174,7 +182,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now);
+extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
 			struct rtc_wkalrm *alrm);
@@ -223,6 +231,39 @@ static inline bool is_leap_year(unsigned int year)
 	return (!(year % 4) && (year % 100)) || !(year % 400);
 }
 
+/* Determine if we can call to driver to set the time. Drivers can only be
+ * called to set a second aligned time value, and the field set_offset_nsec
+ * specifies how far away from the second aligned time to call the driver.
+ *
+ * This also computes 'to_set' which is the time we are trying to set, and has
+ * a zero in tv_nsecs, such that:
+ *    to_set - set_delay_nsec == now +/- FUZZ
+ *
+ */
+static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
+				  struct timespec64 *to_set,
+				  const struct timespec64 *now)
+{
+	/* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
+	const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
+	struct timespec64 delay = {.tv_sec = 0,
+				   .tv_nsec = set_offset_nsec};
+
+	*to_set = timespec64_add(*now, delay);
+
+	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+		to_set->tv_nsec = 0;
+		return true;
+	}
+
+	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+		to_set->tv_sec++;
+		to_set->tv_nsec = 0;
+		return true;
+	}
+	return false;
+}
+
 #define rtc_register_device(device) \
 	__rtc_register_device(THIS_MODULE, device)
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc5314043..90c348781995b5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -492,12 +492,77 @@ int second_overflow(time64_t secs)
 	return leap;
 }
 
-#ifdef CONFIG_GENERIC_CMOS_UPDATE
-int __weak update_persistent_clock(struct timespec now)
+static void sync_hw_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
+
+static void sched_sync_hw_clock(struct timespec64 now,
+				unsigned long target_nsec, bool fail)
 {
-	return -ENODEV;
+	struct timespec64 next, delta;
+
+	do {
+		/*
+		 * Compute the next wall clock time to try and set the
+		 * clock
+		 */
+		next = now;
+		if (!fail)
+			timespec64_add_ns(&next, 659ULL * NSEC_PER_SEC);
+		else
+			/* Update failed, try again in about 10 seconds */
+			timespec64_add_ns(&next, 10ULL * NSEC_PER_SEC);
+
+		/*
+		 * The next call to sync_cmos_clock needs to have have a wall
+		 * clock tv_nsec value equal to target_nsec.
+		 */
+		if (next.tv_nsec > target_nsec)
+			next.tv_sec++;
+		next.tv_nsec = target_nsec;
+
+		/*
+		 * Convert to a relative delay. If time set took a really long
+		 * time, or the wall clock was changed, this might become
+		 * negative, so try again.
+		 */
+		getnstimeofday64(&now);
+		delta = timespec64_sub(next, now);
+	} while (delta.tv_sec <= 0);
+
+	queue_delayed_work(system_power_efficient_wq, &sync_work,
+			   timespec64_to_jiffies(&delta));
 }
 
+#if defined(CONFIG_RTC_SYSTOHC)
+static void sync_rtc_clock(void)
+{
+	unsigned long target_nsec;
+	struct timespec64 adjust, now;
+	int rc;
+
+	getnstimeofday64(&now);
+
+	now = adjust;
+	if (persistent_clock_is_local)
+		adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
+
+	/*
+	 * The current RTC in use will provide the target_nsec it wants to be
+	 * called at, and does rtc_tv_nsec_ok internally.
+	 */
+	rc = rtc_set_ntp_time(adjust, &target_nsec);
+
+	sched_sync_hw_clock(now, target_nsec, rc && rc != -ENODEV);
+}
+#else
+static void sync_rtc_clock(void)
+{
+}
+#endif
+
+#if defined(CONFIG_GENERIC_CMOS_UPDATE)
+int __weak update_persistent_clock(struct timespec now) { return -ENODEV; }
+
 int __weak update_persistent_clock64(struct timespec64 now64)
 {
 	struct timespec now;
@@ -505,78 +570,72 @@ int __weak update_persistent_clock64(struct timespec64 now64)
 	now = timespec64_to_timespec(now64);
 	return update_persistent_clock(now);
 }
-#endif
-
-#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
-static void sync_cmos_clock(struct work_struct *work);
-
-static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
 
-static void sync_cmos_clock(struct work_struct *work)
+static bool sync_cmos_clock(void)
 {
+	static bool no_cmos;
 	struct timespec64 now;
-	struct timespec64 next;
-	int fail = 1;
+	struct timespec64 adjust;
+	int rc = -EPROTO;
+	long target_nsec = NSEC_PER_SEC / 2;
+
+	if (no_cmos)
+		return false;
 
 	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire@the correct time.  Thus, we adjust...
-	 * We want the clock to be within a couple of ticks from the target.
+	 * Historically update_persistent_clock64() has to be called as close
+	 * as possible to 500 ms (target_nsec) before the new second starts.
+	 * The CMOS code 'rounds down' from this value, so we use a negative
+	 * input to rtc_tv_nsec_ok to compute adjust.
+	 *
+	 * Architectures are strongly encouraged to use rtclib and not
+	 * implement this legacy API.
 	 */
-	if (!ntp_synced()) {
-		/*
-		 * Not synced, exit, do not restart a timer (if one is
-		 * running, let it run out).
-		 */
-		return;
-	}
-
 	getnstimeofday64(&now);
-	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
-		struct timespec64 adjust = now;
-
-		fail = -ENODEV;
+	if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
 		if (persistent_clock_is_local)
 			adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
-#ifdef CONFIG_GENERIC_CMOS_UPDATE
-		fail = update_persistent_clock64(adjust);
-#endif
+		rc = update_persistent_clock64(adjust);
+		/*
+		 * The machine does not support update_persistent_clock64 even
+		 * though it defines it.
+		 */
+		if (rc == -ENODEV) {
+			no_cmos = true;
+			return false;
+		}
+	}
 
-#ifdef CONFIG_RTC_SYSTOHC
-		if (fail == -ENODEV)
-			fail = rtc_set_ntp_time(adjust);
+	sched_sync_hw_clock(now, target_nsec, rc);
+	return true;
+}
+#else
+static bool sync_cmos_clock(void)
+{
+	return false;
+}
 #endif
-	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
-	if (next.tv_nsec <= 0)
-		next.tv_nsec += NSEC_PER_SEC;
+static void sync_hw_clock(struct work_struct *work)
+{
+	if (!ntp_synced())
+		return;
 
-	if (!fail || fail == -ENODEV)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
+	if (sync_cmos_clock())
+		return;
 
-	if (next.tv_nsec >= NSEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_nsec -= NSEC_PER_SEC;
-	}
-	queue_delayed_work(system_power_efficient_wq,
-			   &sync_cmos_work, timespec64_to_jiffies(&next));
+	sync_rtc_clock();
 }
 
 void ntp_notify_cmos_timer(void)
 {
-	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
-}
-
-#else
-void ntp_notify_cmos_timer(void) { }
-#endif
+	if (!ntp_synced())
+		return;
 
+	if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
+	    IS_ENABLED(CONFIG_RTC_SYSTOHC))
+		queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+}
 
 /*
  * Propagate a new txc->status value into the NTP state:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22 16:48                   ` Jason Gunthorpe
@ 2017-09-22 17:20                     ` Russell King - ARM Linux
  2017-09-22 18:24                       ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-22 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 10:48:50AM -0600, Jason Gunthorpe wrote:
> On Fri, Sep 22, 2017 at 10:57:13AM +0100, Russell King - ARM Linux wrote:
> 
> > > Okay.. but I guessed a negative number is probably not what most RTCs would
> > > want, eg a positive number would compensate for delays executing an
> > > I2C command, assuming the RTC zeros its divider when it sets the time.
> > 
> > Okay, I thought I'd investigate a few other systems:
> > 
> > With the mainline code, which sets the RTC to tv_sec + 1 when tv_nsec
> > >= 500ms or tv_sec when tv_nsec < 500ms,@about tv_nsec = 500ms:
> 
> The more I think about this, the more I think the above rounding is a bug
> in rtc_set_ntp_time. It does not make any sense to target 0.5 for the
> update and then, essentially, randomly add one second depending on
> where the time tick happens.
> 
> Based on your other study of the WQ behavior I would anticipate the
> time tick is consistently > .5s, so historically the rtc_set_ntp_time
> always adds +1s.
> 
> Unfortunately, the classic pre-rtclib mechanism in somthing like x86
> mach_set_rtc_mmss just truncates, not rounds. Both cannot be right
> when talking about the same RTC chip.
> 
> So, my guess, is that the ntp stuff was written and tested against the
> old x86 code, eg the round down behavior.
> 
> This suggests the CMOS mechanism expects to set time 1.0 seconds at
> instant 1.5 s. ie when the RTC executes the time set it sets the seconds
> scaler to 0.5, not zero. With the v2 patch this would be a -0.5s
> set_offset_nsec value.

See the comment in arch/x86/kernel/rtc.c:

/*
 * In order to set the CMOS clock precisely, set_rtc_mmss has to be
 * called 500 ms after the second nowtime has started, because when
 * nowtime is written into the registers of the CMOS clock, it will
 * jump to the next second precisely 500 ms later. Check the Motorola
 * MC146818A or Dallas DS12887 data sheet for details.
 */

> > > My bad to not notice that.. But I'm not sure what the revision should
> > > be.. I think this is using tick_nsec similarly to TIME_SET_NSEC_FUZZ,
> > > but I'm not sure where tick_nsec comes from or if we should push it
> > > down in to TIME_SET_NSEC_FUZZ, or something else.
> > 
> > I think tick_nsec is the length of one tick in nanoseconds, adjusted
> > by NTP (see ntp_update_frequency() where it's calculated.)
> 
> Yes, having some time to look at it now, tick_ns*5 is about 5
> jiffies, your system is using 4ms jiffies, so the existing code has a
> TIME_SET_NSEC_FUZZ of +-20ms from the target time. Guessing someone
> already figured out the WQ is very inaccurate and delt with it by
> using tick_ns*5. I copied that into patch v3 below

It's much worse than that - I'm seeing the 10s timer delayed by between
1 and 63 jiffies, which is 4 to 252ms - over a quarter of a second.  The
further the timer is in the future, the more inaccurate it is.

This, I think, is why the code reschedules the timer in the failure case
using as small a delay as possible, not 10s.

> Perhaps this should ultimately be moved to a hrtimer or something.
> 
> > The only places it's used is by the NTP code, and also by a few
> > architectures to that still have gettimeoffset() functionality (pre-
> > clocksource) as it can be used to derive the scaling factor to convert
> > timer count ticks to wall time.
> 
> This may also be a source of some of the WQ error, my patches have
> done this:
> 
> 	getnstimeofday64(&now);
> 	delta = timespec64_sub(next, now);
>         timespec64_to_jiffies(&delta));
> 
> 'delta' is wall clock ns, not jiffies ns. Depending on how the
> timekeeping works, it may need a scale factor when converting to
> jiffies? Particularly when talking about the 11 min sleep
> 
> However the original code does not seem to have anything like that..
> 
> > Maybe the solution here is to split two forms of ntp RTC synchronisation
> > so:
> > - If you enable CONFIG_GENERIC_CMOS_UPDATE, then you get the
> >   update_persistent_clock*() methods called.
> > - If you enable CONFIG_RTC_SYSTOHC, rtc_set_ntp_time() gets called.
> > - If you enable both, then both get called at their appropriate times.
> > 
> > This would certainly simplify sync_cmos_clock().
> 
> Yes, splitting makes sense to me, but calling both rtc_set_ntp_time
> and update_persistent_clock doesn't seem like a good idea. At least
> x86 looks like it will provide both mechanism?, We don't want to
> set the RTC twice..
> 
> Patch v3 reworks that code to be clearer as you suggested, but keeps
> the semantic that if cmos is supported rtc does not run.
> 
> Still untested, sorry about that..
> 
> v3:
> - Revise TIME_SET_NSEC_FUZZ to be 5 jiffies, like sync_cmos_clock had
> - Split sync_cmos_clock into CMOS and RTC code paths, retain the two
>   different kinds of rounding being used.
> - Go back to a 1 s update interval
> 
> Jason
> 
> >From 2ff270d4984daf100112c33689db428bebb73d6f Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Date: Fri, 22 Sep 2017 10:43:11 -0600
> Subject: [PATCH v3] rtc: Allow rtc drivers to specify the tv_nsec value for ntp
> 
> ntp is currently hardwired to try and call the rtc set when wall clock
> tv_nsec is 0.5 seconds. This historical behaviour works well with certain
> PC RTCs, but is not universal to all rtc hardware.
> 
> Change how this works by introducing the driver specific concept of
> a required delay between current wall clock time and the target time
> to set (with a 0 tv_nsecs). When this delay is set to 0.5 seconds then
> the behaviour will be the same as today, at wall clock time 0.5 sec the
> RTC set will be called to set 1.0 sec.
> 
> Each RTC driver should set the set_offset_nsec according to its needs.
> 
> The calculation of the sleep time and 'fuzz' for ntp is also revised
> to use modern helper functions, and to more directly and safely
> compute a relative jiffies delay that will result in the correct
> tv_nsec. If for some reason the timer does not meet the requirement
> then it does a shorter sleep and tries again.
> 
> Since cmos and RTC now have very different handling they are split
> into two dedicated code paths, sharing the support code.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/rtc/class.c   |   6 ++
>  drivers/rtc/systohc.c |  53 +++++++++++-----
>  include/linux/rtc.h   |  43 ++++++++++++-
>  kernel/time/ntp.c     | 171 +++++++++++++++++++++++++++++++++-----------------
>  4 files changed, 201 insertions(+), 72 deletions(-)
> 
> diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
> index 2ed970d61da140..eef4123a573504 100644
> --- a/drivers/rtc/class.c
> +++ b/drivers/rtc/class.c
> @@ -161,6 +161,12 @@ static struct rtc_device *rtc_allocate_device(void)
>  
>  	device_initialize(&rtc->dev);
>  
> +	/* Drivers can revise this default after allocating the device. It
> +	 * should be the value of wallclock tv_nsec that the driver needs in
> +	 * order to synchronize the second tick over during set.
> +	 */
> +	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
> +
>  	rtc->irq_freq = 1;
>  	rtc->max_user_freq = 64;
>  	rtc->dev.class = rtc_class;
> diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
> index b4a68ffcd06bb8..5e50c9db57344a 100644
> --- a/drivers/rtc/systohc.c
> +++ b/drivers/rtc/systohc.c
> @@ -10,6 +10,7 @@
>  /**
>   * rtc_set_ntp_time - Save NTP synchronized time to the RTC
>   * @now: Current time of day
> + * @target_nsec: pointer for desired now->tv_nsec value
>   *
>   * Replacement for the NTP platform function update_persistent_clock64
>   * that stores time for later retrieval by rtc_hctosys.
> @@ -18,30 +19,52 @@
>   * possible at all, and various other -errno for specific temporary failure
>   * cases.
>   *
> + * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
> + (
>   * If temporary failure is indicated the caller should try again 'soon'
>   */
> -int rtc_set_ntp_time(struct timespec64 now)
> +int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
>  {
>  	struct rtc_device *rtc;
>  	struct rtc_time tm;
> +	struct timespec64 to_set;
>  	int err = -ENODEV;
> -
> -	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
> -		rtc_time64_to_tm(now.tv_sec, &tm);
> -	else
> -		rtc_time64_to_tm(now.tv_sec + 1, &tm);
> +	bool ok;
>  
>  	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
> -	if (rtc) {
> -		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
> -		 * not set_mmss. */
> -		if (rtc->ops &&
> -		    (rtc->ops->set_time ||
> -		     rtc->ops->set_mmss64 ||
> -		     rtc->ops->set_mmss))
> -			err = rtc_set_time(rtc, &tm);
> -		rtc_class_close(rtc);
> +	if (!rtc)
> +		goto out_err;
> +
> +	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
> +			  !rtc->ops->set_mmss))
> +		goto out_close;
> +
> +	/* Compute the value of tv_nsec we require the caller to supply in
> +	 * now.tv_nsec.  This is the value such that (now +
> +	 * set_offset_nsec).tv_nsec == 0.
> +	 */
> +	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
> +	*target_nsec = to_set.tv_nsec;
> +
> +	/* The ntp code must call this with the correct value in tv_nsec, if
> +	 * it does not we update target_nsec and return EPROTO to make the ntp
> +	 * code try again later.
> +	 */
> +	ok = rtc_tv_nsec_ok(rtc, &to_set, &now);

This doesn't match the prototype.

> +	if (!ok) {
> +		err = -EPROTO;
> +		goto out_close;
>  	}
>  
> +	rtc_time64_to_tm(to_set.tv_sec, &tm);
> +
> +	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
> +	 * set_mmss.
> +	 */
> +	err = rtc_set_time(rtc, &tm);
> +
> +out_close:
> +	rtc_class_close(rtc);
> +out_err:
>  	return err;
>  }
> diff --git a/include/linux/rtc.h b/include/linux/rtc.h
> index 0a0f0d14a5fba5..eb767f0bfc13fa 100644
> --- a/include/linux/rtc.h
> +++ b/include/linux/rtc.h
> @@ -137,6 +137,14 @@ struct rtc_device {
>  	/* Some hardware can't support UIE mode */
>  	int uie_unsupported;
>  
> +	/* Number of nsec it takes to set the RTC clock. This influences when
> +	 * the set ops are called. An offset:
> +	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
> +	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
> +	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
> +	 */
> +	long set_offset_nsec;
> +
>  	bool registered;
>  
>  	struct nvmem_config *nvmem_config;
> @@ -174,7 +182,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
>  
>  extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
>  extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
> -extern int rtc_set_ntp_time(struct timespec64 now);
> +extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
>  int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
>  extern int rtc_read_alarm(struct rtc_device *rtc,
>  			struct rtc_wkalrm *alrm);
> @@ -223,6 +231,39 @@ static inline bool is_leap_year(unsigned int year)
>  	return (!(year % 4) && (year % 100)) || !(year % 400);
>  }
>  
> +/* Determine if we can call to driver to set the time. Drivers can only be
> + * called to set a second aligned time value, and the field set_offset_nsec
> + * specifies how far away from the second aligned time to call the driver.
> + *
> + * This also computes 'to_set' which is the time we are trying to set, and has
> + * a zero in tv_nsecs, such that:
> + *    to_set - set_delay_nsec == now +/- FUZZ
> + *
> + */
> +static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
> +				  struct timespec64 *to_set,
> +				  const struct timespec64 *now)
> +{
> +	/* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
> +	const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
> +	struct timespec64 delay = {.tv_sec = 0,
> +				   .tv_nsec = set_offset_nsec};
> +
> +	*to_set = timespec64_add(*now, delay);
> +
> +	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
> +		to_set->tv_nsec = 0;
> +		return true;
> +	}
> +
> +	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
> +		to_set->tv_sec++;
> +		to_set->tv_nsec = 0;
> +		return true;
> +	}
> +	return false;
> +}
> +
>  #define rtc_register_device(device) \
>  	__rtc_register_device(THIS_MODULE, device)
>  
> diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
> index edf19cc5314043..90c348781995b5 100644
> --- a/kernel/time/ntp.c
> +++ b/kernel/time/ntp.c
> @@ -492,12 +492,77 @@ int second_overflow(time64_t secs)
>  	return leap;
>  }
>  
> -#ifdef CONFIG_GENERIC_CMOS_UPDATE
> -int __weak update_persistent_clock(struct timespec now)
> +static void sync_hw_clock(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
> +
> +static void sched_sync_hw_clock(struct timespec64 now,
> +				unsigned long target_nsec, bool fail)
>  {
> -	return -ENODEV;
> +	struct timespec64 next, delta;
> +
> +	do {
> +		/*
> +		 * Compute the next wall clock time to try and set the
> +		 * clock
> +		 */
> +		next = now;
> +		if (!fail)
> +			timespec64_add_ns(&next, 659ULL * NSEC_PER_SEC);
> +		else
> +			/* Update failed, try again in about 10 seconds */
> +			timespec64_add_ns(&next, 10ULL * NSEC_PER_SEC);
> +
> +		/*
> +		 * The next call to sync_cmos_clock needs to have have a wall
> +		 * clock tv_nsec value equal to target_nsec.
> +		 */
> +		if (next.tv_nsec > target_nsec)
> +			next.tv_sec++;
> +		next.tv_nsec = target_nsec;
> +
> +		/*
> +		 * Convert to a relative delay. If time set took a really long
> +		 * time, or the wall clock was changed, this might become
> +		 * negative, so try again.
> +		 */
> +		getnstimeofday64(&now);
> +		delta = timespec64_sub(next, now);
> +	} while (delta.tv_sec <= 0);

If we re-read "now"@the start, do we need the complexity of this loop?
it isn't critical that this gets run exactly 11 minutes after the previous
attempt, so we ought to keep this code simple.  At that point, we might
as well keep the whole thing simple - we don't need all the complexity
that the timespec64 math helpers above bring with it.

	getnstimeofday64(&next);
	if (!fail)
		next.tv_sec = 659;
	else
		next.tv_sec = 0;

	/* 0 < target_nsec < NSEC_PER_SEC */
	next.tv_nsec = target_nsec - next.tv_nsec;
	if (next.tv_nsec <= 0)
		next.tv_nsec += NSEC_PER_SEC;
	if (next.tv_nsec >= NSEC_PER_SEC) {
		next.tv_sec++;
		next.tv_nsec -= NSEC_PER_SEC;
	}

This is basically what the current mainline code is doing, and there's
not much need to move away from it, because target_nsec needs to still
satisfy the constraint mentioned in the comment above.

The weirdness with the <= 0 and >= NSEC_PER_SEC also caters for the
tv_sec = tv_nsec = 0 case, bumping into the next second for a zero
timeout.

> +
> +	queue_delayed_work(system_power_efficient_wq, &sync_work,
> +			   timespec64_to_jiffies(&delta));
>  }
>  
> +#if defined(CONFIG_RTC_SYSTOHC)
> +static void sync_rtc_clock(void)
> +{
> +	unsigned long target_nsec;
> +	struct timespec64 adjust, now;
> +	int rc;
> +
> +	getnstimeofday64(&now);
> +
> +	now = adjust;
> +	if (persistent_clock_is_local)
> +		adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
> +
> +	/*
> +	 * The current RTC in use will provide the target_nsec it wants to be
> +	 * called at, and does rtc_tv_nsec_ok internally.
> +	 */
> +	rc = rtc_set_ntp_time(adjust, &target_nsec);
> +
> +	sched_sync_hw_clock(now, target_nsec, rc && rc != -ENODEV);
> +}
> +#else
> +static void sync_rtc_clock(void)
> +{
> +}
> +#endif
> +
> +#if defined(CONFIG_GENERIC_CMOS_UPDATE)
> +int __weak update_persistent_clock(struct timespec now) { return -ENODEV; }
> +
>  int __weak update_persistent_clock64(struct timespec64 now64)
>  {
>  	struct timespec now;
> @@ -505,78 +570,72 @@ int __weak update_persistent_clock64(struct timespec64 now64)
>  	now = timespec64_to_timespec(now64);
>  	return update_persistent_clock(now);
>  }
> -#endif
> -
> -#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
> -static void sync_cmos_clock(struct work_struct *work);
> -
> -static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
>  
> -static void sync_cmos_clock(struct work_struct *work)
> +static bool sync_cmos_clock(void)
>  {
> +	static bool no_cmos;
>  	struct timespec64 now;
> -	struct timespec64 next;
> -	int fail = 1;
> +	struct timespec64 adjust;
> +	int rc = -EPROTO;
> +	long target_nsec = NSEC_PER_SEC / 2;
> +
> +	if (no_cmos)
> +		return false;
>  
>  	/*
> -	 * If we have an externally synchronized Linux clock, then update
> -	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
> -	 * called as close as possible to 500 ms before the new second starts.
> -	 * This code is run on a timer.  If the clock is set, that timer
> -	 * may not expire at the correct time.  Thus, we adjust...
> -	 * We want the clock to be within a couple of ticks from the target.
> +	 * Historically update_persistent_clock64() has to be called as close
> +	 * as possible to 500 ms (target_nsec) before the new second starts.
> +	 * The CMOS code 'rounds down' from this value, so we use a negative
> +	 * input to rtc_tv_nsec_ok to compute adjust.
> +	 *
> +	 * Architectures are strongly encouraged to use rtclib and not
> +	 * implement this legacy API.
>  	 */
> -	if (!ntp_synced()) {
> -		/*
> -		 * Not synced, exit, do not restart a timer (if one is
> -		 * running, let it run out).
> -		 */
> -		return;
> -	}
> -
>  	getnstimeofday64(&now);
> -	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
> -		struct timespec64 adjust = now;
> -
> -		fail = -ENODEV;
> +	if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
>  		if (persistent_clock_is_local)
>  			adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
> -#ifdef CONFIG_GENERIC_CMOS_UPDATE
> -		fail = update_persistent_clock64(adjust);
> -#endif
> +		rc = update_persistent_clock64(adjust);
> +		/*
> +		 * The machine does not support update_persistent_clock64 even
> +		 * though it defines it.
> +		 */
> +		if (rc == -ENODEV) {
> +			no_cmos = true;
> +			return false;
> +		}
> +	}
>  
> -#ifdef CONFIG_RTC_SYSTOHC
> -		if (fail == -ENODEV)
> -			fail = rtc_set_ntp_time(adjust);
> +	sched_sync_hw_clock(now, target_nsec, rc);
> +	return true;
> +}
> +#else
> +static bool sync_cmos_clock(void)
> +{
> +	return false;
> +}
>  #endif
> -	}
>  
> -	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
> -	if (next.tv_nsec <= 0)
> -		next.tv_nsec += NSEC_PER_SEC;
> +static void sync_hw_clock(struct work_struct *work)
> +{
> +	if (!ntp_synced())
> +		return;
>  
> -	if (!fail || fail == -ENODEV)
> -		next.tv_sec = 659;
> -	else
> -		next.tv_sec = 0;
> +	if (sync_cmos_clock())
> +		return;
>  
> -	if (next.tv_nsec >= NSEC_PER_SEC) {
> -		next.tv_sec++;
> -		next.tv_nsec -= NSEC_PER_SEC;
> -	}
> -	queue_delayed_work(system_power_efficient_wq,
> -			   &sync_cmos_work, timespec64_to_jiffies(&next));
> +	sync_rtc_clock();
>  }
>  
>  void ntp_notify_cmos_timer(void)
>  {
> -	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
> -}
> -
> -#else
> -void ntp_notify_cmos_timer(void) { }
> -#endif
> +	if (!ntp_synced())
> +		return;
>  
> +	if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
> +	    IS_ENABLED(CONFIG_RTC_SYSTOHC))
> +		queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
> +}
>  
>  /*
>   * Propagate a new txc->status value into the NTP state:
> -- 
> 2.7.4
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22 17:20                     ` Russell King - ARM Linux
@ 2017-09-22 18:24                       ` Jason Gunthorpe
  2017-09-23 18:23                         ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-22 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 06:20:22PM +0100, Russell King - ARM Linux wrote:

> > This suggests the CMOS mechanism expects to set time 1.0 seconds at
> > instant 1.5 s. ie when the RTC executes the time set it sets the seconds
> > scaler to 0.5, not zero. With the v2 patch this would be a -0.5s
> > set_offset_nsec value.
> 
> See the comment in arch/x86/kernel/rtc.c:
> 
> /*
>  * In order to set the CMOS clock precisely, set_rtc_mmss has to be
>  * called 500 ms after the second nowtime has started, because when
>  * nowtime is written into the registers of the CMOS clock, it will
>  * jump to the next second precisely 500 ms later. Check the Motorola
>  * MC146818A or Dallas DS12887 data sheet for details.
>  */

Right, I've seen that before, so this is all on the right track, and
rtc_set_ntp_time is broken to do that old rounding when calling those
chips..

We can probably even go ahead and patch those the RTC driver for the
above chips to use set_offset_nsec = -0.5s

I wonder if the default for set_offset_nsec should then be set to 0?
Unclear to me if rtc_set_ntp_time ever actually worked at sub second
resolution with any RTC ..

> This, I think, is why the code reschedules the timer in the failure case
> using as small a delay as possible, not 10s.

That makes sense.

> > +	/* The ntp code must call this with the correct value in tv_nsec, if
> > +	 * it does not we update target_nsec and return EPROTO to make the ntp
> > +	 * code try again later.
> > +	 */
> > +	ok = rtc_tv_nsec_ok(rtc, &to_set, &now);
> 
> This doesn't match the prototype.

Arg, the compiler warning got lost during all my edits..

> > +static void sched_sync_hw_clock(struct timespec64 now,
> > +				unsigned long target_nsec, bool fail)
> >  {
> > -	return -ENODEV;
> > +	struct timespec64 next, delta;
> > +
> > +	do {
> > +		/*
> > +		 * Compute the next wall clock time to try and set the
> > +		 * clock
> > +		 */
> > +		next = now;
> > +		if (!fail)
> > +			timespec64_add_ns(&next, 659ULL * NSEC_PER_SEC);
> > +		else
> > +			/* Update failed, try again in about 10 seconds */
> > +			timespec64_add_ns(&next, 10ULL * NSEC_PER_SEC);
> > +
> > +		/*
> > +		 * The next call to sync_cmos_clock needs to have have a wall
> > +		 * clock tv_nsec value equal to target_nsec.
> > +		 */
> > +		if (next.tv_nsec > target_nsec)
> > +			next.tv_sec++;
> > +		next.tv_nsec = target_nsec;
> > +
> > +		/*
> > +		 * Convert to a relative delay. If time set took a really long
> > +		 * time, or the wall clock was changed, this might become
> > +		 * negative, so try again.
> > +		 */
> > +		getnstimeofday64(&now);
> > +		delta = timespec64_sub(next, now);
> > +	} while (delta.tv_sec <= 0);
> 
> If we re-read "now" at the start, do we need the complexity of this
> loop?

Nope, probably not.

> attempt, so we ought to keep this code simple.  At that point, we might
> as well keep the whole thing simple - we don't need all the complexity
> that the timespec64 math helpers above bring with it.

Okay, this looks fine to me, thanks.

For reference, here is the revised patch

>From 067bae114369a17d2e90b4522434dd568fe62558 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Fri, 22 Sep 2017 12:18:03 -0600
Subject: [PATCH v4] rtc: Allow rtc drivers to specify the tv_nsec value for ntp

ntp is currently hardwired to try and call the rtc set when wall clock
tv_nsec is 0.5 seconds. This historical behaviour works well with certain
PC RTCs, but is not universal to all rtc hardware.

Change how this works by introducing the driver specific concept of
set_offset_nsec, the delay between current wall clock time and the target
time to set (with a 0 tv_nsecs).

For x86-style CMOS set_offset_nsec should be -0.5 s which causes the last
second to be written 0.5 s after it has started.

For compat with the old rtc_set_ntp_time, the value is defaulted to
+ 0.5 s, which causes the next second to be written 0.5s before it starts,
as things were before this patch.

Testing shows many non-x86 RTCs would like set_offset_nsec ~= 0,
so ultimately each RTC driver should set the set_offset_nsec according
to its needs, and non x86 architectures should stop using
update_persistent_clock64 in order to access this feature.
Future patches will revise the drivers as needed.

Since CMOS and RTC now have very different handling they are split
into two dedicated code paths, sharing the support code.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/rtc/class.c   |   6 ++
 drivers/rtc/systohc.c |  53 +++++++++++-----
 include/linux/rtc.h   |  43 ++++++++++++-
 kernel/time/ntp.c     | 164 +++++++++++++++++++++++++++++++++-----------------
 4 files changed, 194 insertions(+), 72 deletions(-)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 2ed970d61da140..eef4123a573504 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -161,6 +161,12 @@ static struct rtc_device *rtc_allocate_device(void)
 
 	device_initialize(&rtc->dev);
 
+	/* Drivers can revise this default after allocating the device. It
+	 * should be the value of wallclock tv_nsec that the driver needs in
+	 * order to synchronize the second tick over during set.
+	 */
+	rtc->set_offset_nsec =  NSEC_PER_SEC / 2;
+
 	rtc->irq_freq = 1;
 	rtc->max_user_freq = 64;
 	rtc->dev.class = rtc_class;
diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c
index b4a68ffcd06bb8..0c177647ea6c71 100644
--- a/drivers/rtc/systohc.c
+++ b/drivers/rtc/systohc.c
@@ -10,6 +10,7 @@
 /**
  * rtc_set_ntp_time - Save NTP synchronized time to the RTC
  * @now: Current time of day
+ * @target_nsec: pointer for desired now->tv_nsec value
  *
  * Replacement for the NTP platform function update_persistent_clock64
  * that stores time for later retrieval by rtc_hctosys.
@@ -18,30 +19,52 @@
  * possible at all, and various other -errno for specific temporary failure
  * cases.
  *
+ * -EPROTO is returned if now.tv_nsec is not close enough to *target_nsec.
+ (
  * If temporary failure is indicated the caller should try again 'soon'
  */
-int rtc_set_ntp_time(struct timespec64 now)
+int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec)
 {
 	struct rtc_device *rtc;
 	struct rtc_time tm;
+	struct timespec64 to_set;
 	int err = -ENODEV;
-
-	if (now.tv_nsec < (NSEC_PER_SEC >> 1))
-		rtc_time64_to_tm(now.tv_sec, &tm);
-	else
-		rtc_time64_to_tm(now.tv_sec + 1, &tm);
+	bool ok;
 
 	rtc = rtc_class_open(CONFIG_RTC_SYSTOHC_DEVICE);
-	if (rtc) {
-		/* rtc_hctosys exclusively uses UTC, so we call set_time here,
-		 * not set_mmss. */
-		if (rtc->ops &&
-		    (rtc->ops->set_time ||
-		     rtc->ops->set_mmss64 ||
-		     rtc->ops->set_mmss))
-			err = rtc_set_time(rtc, &tm);
-		rtc_class_close(rtc);
+	if (!rtc)
+		goto out_err;
+
+	if (!rtc->ops || (!rtc->ops->set_time && !rtc->ops->set_mmss64 &&
+			  !rtc->ops->set_mmss))
+		goto out_close;
+
+	/* Compute the value of tv_nsec we require the caller to supply in
+	 * now.tv_nsec.  This is the value such that (now +
+	 * set_offset_nsec).tv_nsec == 0.
+	 */
+	set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec);
+	*target_nsec = to_set.tv_nsec;
+
+	/* The ntp code must call this with the correct value in tv_nsec, if
+	 * it does not we update target_nsec and return EPROTO to make the ntp
+	 * code try again later.
+	 */
+	ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now);
+	if (!ok) {
+		err = -EPROTO;
+		goto out_close;
 	}
 
+	rtc_time64_to_tm(to_set.tv_sec, &tm);
+
+	/* rtc_hctosys exclusively uses UTC, so we call set_time here, not
+	 * set_mmss.
+	 */
+	err = rtc_set_time(rtc, &tm);
+
+out_close:
+	rtc_class_close(rtc);
+out_err:
 	return err;
 }
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 0a0f0d14a5fba5..eb767f0bfc13fa 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -137,6 +137,14 @@ struct rtc_device {
 	/* Some hardware can't support UIE mode */
 	int uie_unsupported;
 
+	/* Number of nsec it takes to set the RTC clock. This influences when
+	 * the set ops are called. An offset:
+	 *   - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s
+	 *   - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s
+	 *   - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s
+	 */
+	long set_offset_nsec;
+
 	bool registered;
 
 	struct nvmem_config *nvmem_config;
@@ -174,7 +182,7 @@ extern void devm_rtc_device_unregister(struct device *dev,
 
 extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm);
 extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
-extern int rtc_set_ntp_time(struct timespec64 now);
+extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec);
 int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
 extern int rtc_read_alarm(struct rtc_device *rtc,
 			struct rtc_wkalrm *alrm);
@@ -223,6 +231,39 @@ static inline bool is_leap_year(unsigned int year)
 	return (!(year % 4) && (year % 100)) || !(year % 400);
 }
 
+/* Determine if we can call to driver to set the time. Drivers can only be
+ * called to set a second aligned time value, and the field set_offset_nsec
+ * specifies how far away from the second aligned time to call the driver.
+ *
+ * This also computes 'to_set' which is the time we are trying to set, and has
+ * a zero in tv_nsecs, such that:
+ *    to_set - set_delay_nsec == now +/- FUZZ
+ *
+ */
+static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec,
+				  struct timespec64 *to_set,
+				  const struct timespec64 *now)
+{
+	/* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */
+	const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5;
+	struct timespec64 delay = {.tv_sec = 0,
+				   .tv_nsec = set_offset_nsec};
+
+	*to_set = timespec64_add(*now, delay);
+
+	if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) {
+		to_set->tv_nsec = 0;
+		return true;
+	}
+
+	if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) {
+		to_set->tv_sec++;
+		to_set->tv_nsec = 0;
+		return true;
+	}
+	return false;
+}
+
 #define rtc_register_device(device) \
 	__rtc_register_device(THIS_MODULE, device)
 
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index edf19cc5314043..c209fc1d455668 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -492,11 +492,70 @@ int second_overflow(time64_t secs)
 	return leap;
 }
 
-#ifdef CONFIG_GENERIC_CMOS_UPDATE
-int __weak update_persistent_clock(struct timespec now)
+static void sync_hw_clock(struct work_struct *work);
+static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
+
+static void sched_sync_hw_clock(struct timespec64 now,
+				unsigned long target_nsec, bool fail)
+
+{
+	struct timespec64 next;
+
+	getnstimeofday64(&next);
+	if (!fail)
+		next.tv_sec = 659;
+	else {
+		/*
+		 * Try again as soon as possible. Delaying long periods
+		 * decreases the accuracy of the work queue timer. Due to this
+		 * the algorithm is very likely to require a short-sleep retry
+		 * after the above long sleep to synchronize ts_nsec.
+		 */
+		next.tv_sec = 0;
+	}
+
+	/* Compute the needed delay that will get to tv_nsec == target_nsec */
+	next.tv_nsec = target_nsec - next.tv_nsec;
+	if (next.tv_nsec <= 0)
+		next.tv_nsec += NSEC_PER_SEC;
+	if (next.tv_nsec >= NSEC_PER_SEC) {
+		next.tv_sec++;
+		next.tv_nsec -= NSEC_PER_SEC;
+	}
+
+	queue_delayed_work(system_power_efficient_wq, &sync_work,
+			   timespec64_to_jiffies(&next));
+}
+
+#if defined(CONFIG_RTC_SYSTOHC)
+static void sync_rtc_clock(void)
+{
+	unsigned long target_nsec;
+	struct timespec64 adjust, now;
+	int rc;
+
+	getnstimeofday64(&now);
+
+	now = adjust;
+	if (persistent_clock_is_local)
+		adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
+
+	/*
+	 * The current RTC in use will provide the target_nsec it wants to be
+	 * called at, and does rtc_tv_nsec_ok internally.
+	 */
+	rc = rtc_set_ntp_time(adjust, &target_nsec);
+
+	sched_sync_hw_clock(now, target_nsec, rc && rc != -ENODEV);
+}
+#else
+static void sync_rtc_clock(void)
 {
-	return -ENODEV;
 }
+#endif
+
+#if defined(CONFIG_GENERIC_CMOS_UPDATE)
+int __weak update_persistent_clock(struct timespec now) { return -ENODEV; }
 
 int __weak update_persistent_clock64(struct timespec64 now64)
 {
@@ -505,78 +564,71 @@ int __weak update_persistent_clock64(struct timespec64 now64)
 	now = timespec64_to_timespec(now64);
 	return update_persistent_clock(now);
 }
-#endif
-
-#if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
-static void sync_cmos_clock(struct work_struct *work);
-
-static DECLARE_DELAYED_WORK(sync_cmos_work, sync_cmos_clock);
 
-static void sync_cmos_clock(struct work_struct *work)
+static bool sync_cmos_clock(void)
 {
+	static bool no_cmos;
 	struct timespec64 now;
-	struct timespec64 next;
-	int fail = 1;
+	struct timespec64 adjust;
+	int rc = -EPROTO;
+	long target_nsec = NSEC_PER_SEC / 2;
+
+	if (no_cmos)
+		return false;
 
 	/*
-	 * If we have an externally synchronized Linux clock, then update
-	 * CMOS clock accordingly every ~11 minutes. Set_rtc_mmss() has to be
-	 * called as close as possible to 500 ms before the new second starts.
-	 * This code is run on a timer.  If the clock is set, that timer
-	 * may not expire at the correct time.  Thus, we adjust...
-	 * We want the clock to be within a couple of ticks from the target.
+	 * Historically update_persistent_clock64() has followed x86
+	 * semantics, which match the MC146818A/etc RTC. This RTC will store
+	 * 'adjust' and then in .5s it will advance once second.
+	 *
+	 * Architectures are strongly encouraged to use rtclib and not
+	 * implement this legacy API.
 	 */
-	if (!ntp_synced()) {
-		/*
-		 * Not synced, exit, do not restart a timer (if one is
-		 * running, let it run out).
-		 */
-		return;
-	}
-
 	getnstimeofday64(&now);
-	if (abs(now.tv_nsec - (NSEC_PER_SEC / 2)) <= tick_nsec * 5) {
-		struct timespec64 adjust = now;
-
-		fail = -ENODEV;
+	if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) {
 		if (persistent_clock_is_local)
 			adjust.tv_sec -= (sys_tz.tz_minuteswest * 60);
-#ifdef CONFIG_GENERIC_CMOS_UPDATE
-		fail = update_persistent_clock64(adjust);
-#endif
+		rc = update_persistent_clock64(adjust);
+		/*
+		 * The machine does not support update_persistent_clock64 even
+		 * though it defines it.
+		 */
+		if (rc == -ENODEV) {
+			no_cmos = true;
+			return false;
+		}
+	}
 
-#ifdef CONFIG_RTC_SYSTOHC
-		if (fail == -ENODEV)
-			fail = rtc_set_ntp_time(adjust);
+	sched_sync_hw_clock(now, target_nsec, rc);
+	return true;
+}
+#else
+static bool sync_cmos_clock(void)
+{
+	return false;
+}
 #endif
-	}
 
-	next.tv_nsec = (NSEC_PER_SEC / 2) - now.tv_nsec - (TICK_NSEC / 2);
-	if (next.tv_nsec <= 0)
-		next.tv_nsec += NSEC_PER_SEC;
+static void sync_hw_clock(struct work_struct *work)
+{
+	if (!ntp_synced())
+		return;
 
-	if (!fail || fail == -ENODEV)
-		next.tv_sec = 659;
-	else
-		next.tv_sec = 0;
+	if (sync_cmos_clock())
+		return;
 
-	if (next.tv_nsec >= NSEC_PER_SEC) {
-		next.tv_sec++;
-		next.tv_nsec -= NSEC_PER_SEC;
-	}
-	queue_delayed_work(system_power_efficient_wq,
-			   &sync_cmos_work, timespec64_to_jiffies(&next));
+	sync_rtc_clock();
 }
 
 void ntp_notify_cmos_timer(void)
 {
-	queue_delayed_work(system_power_efficient_wq, &sync_cmos_work, 0);
-}
-
-#else
-void ntp_notify_cmos_timer(void) { }
-#endif
+	if (!ntp_synced())
+		return;
 
+	if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
+	    IS_ENABLED(CONFIG_RTC_SYSTOHC))
+		queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+}
 
 /*
  * Propagate a new txc->status value into the NTP state:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-22 18:24                       ` Jason Gunthorpe
@ 2017-09-23 18:23                         ` Russell King - ARM Linux
  2017-09-23 19:05                           ` Russell King - ARM Linux
  2017-09-24  1:30                           ` Jason Gunthorpe
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-23 18:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Sep 22, 2017 at 12:24:24PM -0600, Jason Gunthorpe wrote:
> For reference, here is the revised patch

Thanks.  I'm running with this patch, and moved on to a different
machine (clearfog, Armada 388) with some of my other tweaks in
place (like debug printks).

It doesn't seem to manage to set the RTC:

[  164.508327] sched_sync_hw_clock: adjust=0.504426705 now=1506185035.504429664 next=0.995570336 target=500000000 fail=1
[  165.532499] sched_sync_hw_clock: adjust=0.528428314 now=1506185036.528431353 next=0.971568647 target=500000000 fail=1
[  166.524666] sched_sync_hw_clock: adjust=0.520430346 now=1506185037.520433426 next=0.979566574 target=500000000 fail=1
[  167.516828] sched_sync_hw_clock: adjust=0.512428087 now=1506185038.512431646 next=0.987568354 target=500000000 fail=1
[  168.508991] sched_sync_hw_clock: adjust=0.504429462 now=1506185039.504432861 next=0.995567139 target=500000000 fail=1
[  169.533162] sched_sync_hw_clock: adjust=0.528433808 now=1506185040.528437087 next=0.971562913 target=500000000 fail=1
[  170.525319] sched_sync_hw_clock: adjust=0.520430693 now=1506185041.520433773 next=0.979566227 target=500000000 fail=1
[  171.517485] sched_sync_hw_clock: adjust=0.512431067 now=1506185042.512436146 next=0.987563854 target=500000000 fail=1
[  172.509639] sched_sync_hw_clock: adjust=0.504428078 now=1506185043.504432197 next=0.995567803 target=500000000 fail=1
[  173.533807] sched_sync_hw_clock: adjust=0.528431790 now=1506185044.528435910 next=0.971564090 target=500000000 fail=1

Notice the weidness of "adjust" which is the "now" argument to
sched_sync_hw_clock() (now= is the getnstimeofday64() value inside
the function.)

> +#if defined(CONFIG_RTC_SYSTOHC)
> +static void sync_rtc_clock(void)
> +{
> +	unsigned long target_nsec;
> +	struct timespec64 adjust, now;
> +	int rc;
> +
> +	getnstimeofday64(&now);
> +
> +	now = adjust;

This should be:
	adjust = now;

With that fixed, things start behaving:

[ 1610.640275] sched_sync_hw_clock: adjust=1506186910.320475949 now=1506186910.320481069 next=0.179518931 target=500000000 fail=1
[ 1610.824271] sched_sync_hw_clock: adjust=1506186910.504470719 now=1506186910.504489040 next=659.995510960 target=500000000 fail=0

and:

# ./test-rtc
Takes average of 64364ns to read RTC
Took 63923ns to read RTC on second change
RTC Time: 23-09-2017 17:15:32
System Time was:     17:15:31.505
# perl -e 'print gmtime( 1506186910.504470719)."\n";'
Sat Sep 23 17:15:10 2017

The update to the clock looks like it took 19us, so nice and quick.  We
were about 504ms past the second, which is reflected in the read-back.
Setting the offset to zero gives:

[ 2283.310136] sched_sync_hw_clock: adjust=1506187583.024478934 now=1506187583.024484335 next=0.975515665 target=0 fail=1
[ 2284.302088] sched_sync_hw_clock: adjust=1506187584.016471005 now=1506187584.016490006 next=659.983509994 target=0 fail=0

and:

RTC Time: 23-09-2017 17:26:28
System Time was:     17:26:28.017

So we're getting it almost right - except for the sloppyness of the
workqueue being run (which is mostly dependent on the sloppiness of
the timer.)

[ 2954.059895] sched_sync_hw_clock: adjust=1506188253.808470501 now=1506188253.808475661 next=0.191524339 target=0 fail=1
[ 2954.255898] sched_sync_hw_clock: adjust=1506188254.004471056 now=1506188254.004490017 next=659.995509983 target=0 fail=0
[ 3625.769764] sched_sync_hw_clock: adjust=1506188925.552468680 now=1506188925.552473640 next=0.447526360 target=0 fail=1
[ 3626.249738] sched_sync_hw_clock: adjust=1506188926.032470718 now=1506188926.032474358 next=0.967525642 target=0 fail=1
[ 3627.241688] sched_sync_hw_clock: adjust=1506188927.024472221 now=1506188927.024475501 next=0.975524499 target=0 fail=1
[ 3628.233656] sched_sync_hw_clock: adjust=1506188928.016475925 now=1506188928.016493206 next=659.983506794 target=0 fail=0
[ 4297.479450] sched_sync_hw_clock: adjust=1506189597.296468095 now=1506189597.296473256 next=0.703526744 target=0 fail=1
[ 4298.215411] sched_sync_hw_clock: adjust=1506189598.032469007 now=1506189598.032472527 next=0.967527473 target=0 fail=1
[ 4299.207366] sched_sync_hw_clock: adjust=1506189599.024475501 now=1506189599.024478982 next=0.975521018 target=0 fail=1
[ 4300.199324] sched_sync_hw_clock: adjust=1506189600.016469555 now=1506189600.016487676 next=659.983512324 target=0 fail=0
[ 4969.189207] sched_sync_hw_clock: adjust=1506190269.040474937 now=1506190269.040479897 next=0.959520103 target=0 fail=1
[ 4970.181150] sched_sync_hw_clock: adjust=1506190270.032471016 now=1506190270.032474976 next=0.967525024 target=0 fail=1
[ 4971.173104] sched_sync_hw_clock: adjust=1506190271.024476656 now=1506190271.024480136 next=0.975519864 target=0 fail=1
[ 4972.165063] sched_sync_hw_clock: adjust=1506190272.016471455 now=1506190272.016489416 next=659.983510584 target=0 fail=0

It looks like we struggle to hit that window, even though it's +/-
5 ticks (@250Hz, that's still 20ms, and we can see from the above
we're twice that.)

So, I think the old timers are just too sloppy in modern kernels.
If we want to do this, we need to use a hrtimer to trigger a
workqueue - in other words, open-coding a hrtimer-delayed workqueue.
That's something I've already tried, and it seems to work a lot
better.  I'll adapt it to your current patch.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-23 18:23                         ` Russell King - ARM Linux
@ 2017-09-23 19:05                           ` Russell King - ARM Linux
  2017-09-24  1:30                           ` Jason Gunthorpe
  1 sibling, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2017-09-23 19:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Sep 23, 2017 at 07:23:24PM +0100, Russell King - ARM Linux wrote:
> So we're getting it almost right - except for the sloppyness of the
> workqueue being run (which is mostly dependent on the sloppiness of
> the timer.)
> 
> [ 2954.059895] sched_sync_hw_clock: adjust=1506188253.808470501 now=1506188253.808475661 next=0.191524339 target=0 fail=1
> [ 2954.255898] sched_sync_hw_clock: adjust=1506188254.004471056 now=1506188254.004490017 next=659.995509983 target=0 fail=0
> [ 3625.769764] sched_sync_hw_clock: adjust=1506188925.552468680 now=1506188925.552473640 next=0.447526360 target=0 fail=1
> [ 3626.249738] sched_sync_hw_clock: adjust=1506188926.032470718 now=1506188926.032474358 next=0.967525642 target=0 fail=1
> [ 3627.241688] sched_sync_hw_clock: adjust=1506188927.024472221 now=1506188927.024475501 next=0.975524499 target=0 fail=1
> [ 3628.233656] sched_sync_hw_clock: adjust=1506188928.016475925 now=1506188928.016493206 next=659.983506794 target=0 fail=0
> [ 4297.479450] sched_sync_hw_clock: adjust=1506189597.296468095 now=1506189597.296473256 next=0.703526744 target=0 fail=1
> [ 4298.215411] sched_sync_hw_clock: adjust=1506189598.032469007 now=1506189598.032472527 next=0.967527473 target=0 fail=1
> [ 4299.207366] sched_sync_hw_clock: adjust=1506189599.024475501 now=1506189599.024478982 next=0.975521018 target=0 fail=1
> [ 4300.199324] sched_sync_hw_clock: adjust=1506189600.016469555 now=1506189600.016487676 next=659.983512324 target=0 fail=0
> [ 4969.189207] sched_sync_hw_clock: adjust=1506190269.040474937 now=1506190269.040479897 next=0.959520103 target=0 fail=1
> [ 4970.181150] sched_sync_hw_clock: adjust=1506190270.032471016 now=1506190270.032474976 next=0.967525024 target=0 fail=1
> [ 4971.173104] sched_sync_hw_clock: adjust=1506190271.024476656 now=1506190271.024480136 next=0.975519864 target=0 fail=1
> [ 4972.165063] sched_sync_hw_clock: adjust=1506190272.016471455 now=1506190272.016489416 next=659.983510584 target=0 fail=0
> 
> It looks like we struggle to hit that window, even though it's +/-
> 5 ticks (@250Hz, that's still 20ms, and we can see from the above
> we're twice that.)
> 
> So, I think the old timers are just too sloppy in modern kernels.
> If we want to do this, we need to use a hrtimer to trigger a
> workqueue - in other words, open-coding a hrtimer-delayed workqueue.
> That's something I've already tried, and it seems to work a lot
> better.  I'll adapt it to your current patch.

Same system, but using hrtimer instead:

[    6.256727] sched_sync_hw_clock: adjust=1506191994.278198320 now=1506191994.278210760 next=0.721789240 target=0 fail=1
[    6.978658] sched_sync_hw_clock: adjust=1506191995.000428349 now=1506191995.000635599 next=659.999364401 target=0 fail=0
[  666.863114] sched_sync_hw_clock: adjust=1506192655.001876494 now=1506192655.001897657 next=659.998102343 target=0 fail=0
[ 1326.830734] sched_sync_hw_clock: adjust=1506193315.001819282 now=1506193315.001840282 next=659.998159718 target=0 fail=0

# ./test-rtc
Takes average of 64608ns to read RTC
Took 63920ns to read RTC on second change
RTC Time: 23-09-2017 19:01:59
System Time was:     19:01:59.002

which looks way better.  Here's the hrtimer conversion I tested there:

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 32268e338015..7ab4f1628967 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -492,10 +492,58 @@ int second_overflow(time64_t secs)
 	return leap;
 }
 
+/*
+ * We need to use a hrtimer to queue our work - normal delayed works use
+ * the timer_list timers, which are too inaccurate: for a 10s delay,
+ * they can be delayed between 1 and 63 jiffies.
+ */
+struct hrtimer_delayed_work {
+	struct work_struct work;
+	struct hrtimer timer;
+};
+
+static enum hrtimer_restart hdw_queue_work(struct hrtimer *hr)
+{
+	struct hrtimer_delayed_work *w;
+
+	w = container_of(hr, struct hrtimer_delayed_work, timer);
+	queue_work(system_power_efficient_wq, &w->work);
+
+	return HRTIMER_NORESTART;
+}
+
+static bool hdw_queued(struct hrtimer_delayed_work *hdw)
+{
+	return hrtimer_is_queued(&hdw->timer) != 0;
+}
+
+static void hdw_queue(struct hrtimer_delayed_work *hdw,
+		      const struct timespec64 *t)
+{
+	ktime_t ns = t ? timespec64_to_ktime(*t) : 0;
+
+	hrtimer_start(&hdw->timer, ns, HRTIMER_MODE_REL);
+}
+
+static bool hdw_initialised(struct hrtimer_delayed_work *hdw)
+{
+	return hdw->timer.function != NULL;
+}
+
+static void hdw_init(struct hrtimer_delayed_work *hdw)
+{
+	hrtimer_init(&hdw->timer, CLOCK_REALTIME, HRTIMER_MODE_REL);
+	hdw->timer.function = hdw_queue_work;
+}
+
+
 unsigned int sysctl_ntp_rtc_sync = true;
 
 static void sync_hw_clock(struct work_struct *work);
-static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock);
+
+static struct hrtimer_delayed_work sync_work = {
+	.work = __WORK_INITIALIZER(sync_work.work, sync_hw_clock),
+};
 
 static void sched_sync_hw_clock(struct timespec64 now,
 				unsigned long target_nsec, bool fail)
@@ -526,8 +574,7 @@ static void sched_sync_hw_clock(struct timespec64 now,
 		next.tv_nsec -= NSEC_PER_SEC;
 	}
 
-	queue_delayed_work(system_power_efficient_wq, &sync_work,
-			   timespec64_to_jiffies(&next));
+	hdw_queue(&sync_work, &next);
 
 	printk(KERN_DEBUG "%s: adjust=%lld.%09ld now=%lld.%09ld next=%lld.%09ld target=%ld fail=%d\n",
 		__func__,
@@ -635,9 +682,12 @@ void ntp_notify_cmos_timer(void)
 	if (!ntp_synced() || !sysctl_ntp_rtc_sync)
 		return;
 
-	if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
-	    IS_ENABLED(CONFIG_RTC_SYSTOHC))
-		queue_delayed_work(system_power_efficient_wq, &sync_work, 0);
+	if (!hdw_initialised(&sync_work))
+		hdw_init(&sync_work);
+
+	if ((IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) ||
+	     IS_ENABLED(CONFIG_RTC_SYSTOHC)) && !hdw_queued(&sync_work))
+		hdw_queue(&sync_work, NULL);
 }
 
 /*


-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-23 18:23                         ` Russell King - ARM Linux
  2017-09-23 19:05                           ` Russell King - ARM Linux
@ 2017-09-24  1:30                           ` Jason Gunthorpe
  1 sibling, 0 replies; 25+ messages in thread
From: Jason Gunthorpe @ 2017-09-24  1:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Sep 23, 2017 at 07:23:24PM +0100, Russell King - ARM Linux wrote:

> So we're getting it almost right - except for the sloppyness of the
> workqueue being run (which is mostly dependent on the sloppiness of
> the timer.)

Looks really good. I made a note of your fix, but I probably will not
be able to pick this up again until Friday.

I think what is left to do:
- Test on x86 with CMOS, using rtclib and update_persistent_clock. I
  may be able to put together a system for this, but I don't have
  anything ready to go right now that is not kvm based. :\
- Add patches in a series to set set_offset_nsec for RTCs we have
  learned about.
- HR timer stuff, I'm inclined to put that in a dedicated patch in
  a series?

Anything else? Thanks for your patience with my untested patches :\

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* On NTP, RTCs and accurately setting their time
  2017-09-21 12:28       ` Russell King - ARM Linux
@ 2017-09-26 11:58         ` Alexandre Belloni
  0 siblings, 0 replies; 25+ messages in thread
From: Alexandre Belloni @ 2017-09-26 11:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 21/09/2017 at 13:28:53 +0100, Russell King - ARM Linux wrote:
> On Wed, Sep 20, 2017 at 04:45:22PM -0600, Jason Gunthorpe wrote:
> > What do you think of this untested approach in the below patch?
> 
> There's another issue.
> 
> Most drivers use rtc_device_register() or devm_rtc_device_register()
> rather than rtc_allocate_device() (which is static.)  This does not

It is static on purpose, drivers must use devm_rtc_allocate_device()
if they need to allocate the rtc before registering it. I'm in the
process of converting most drivers, including rtc_cmos as this solve
other race issues.

> give RTC drivers a chance to set rtc->time_set_nsec before the
> RTC is registered with the kernel.
> 
> Setting it after the device has been registered is racy.  So, having
> this member in struct rtc_device and assuming that drivers will override
> the value doesn't work.
> 
> It doesn't make sense to put it in rtc_class_ops as it isn't an
> operation, but we could add a callback in there which is used during
> initialisation but before registration which could be used to set this
> member.
> 

I'd prefer having drivers that need to set the value to something other
than 0 to convert to devm_rtc_allocate_device.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-09-26 11:58 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20 11:21 On NTP, RTCs and accurately setting their time Russell King - ARM Linux
2017-09-20 13:23 ` Russell King - ARM Linux
2017-09-20 16:22 ` Jason Gunthorpe
2017-09-20 16:51   ` Russell King - ARM Linux
2017-09-20 17:16     ` Jason Gunthorpe
2017-09-20 22:45     ` Jason Gunthorpe
2017-09-21  7:59       ` Russell King - ARM Linux
2017-09-21  9:32         ` Russell King - ARM Linux
2017-09-21 11:30           ` Russell King - ARM Linux
2017-09-21 22:05           ` Jason Gunthorpe
2017-09-21 22:45             ` Russell King - ARM Linux
2017-09-21 23:20               ` Jason Gunthorpe
2017-09-22  9:57                 ` Russell King - ARM Linux
2017-09-22 12:24                   ` Russell King - ARM Linux
2017-09-22 16:28                     ` Russell King - ARM Linux
2017-09-22 16:48                   ` Jason Gunthorpe
2017-09-22 17:20                     ` Russell King - ARM Linux
2017-09-22 18:24                       ` Jason Gunthorpe
2017-09-23 18:23                         ` Russell King - ARM Linux
2017-09-23 19:05                           ` Russell King - ARM Linux
2017-09-24  1:30                           ` Jason Gunthorpe
2017-09-21  9:46       ` Russell King - ARM Linux
2017-09-21 11:29       ` Russell King - ARM Linux
2017-09-21 12:28       ` Russell King - ARM Linux
2017-09-26 11:58         ` Alexandre Belloni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.