* [PATCH] rtc: adapt allowed RTC update error @ 2020-12-01 14:38 Miroslav Lichvar 2020-12-01 16:12 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Miroslav Lichvar @ 2020-12-01 14:38 UTC (permalink / raw) To: linux-kernel Cc: Miroslav Lichvar, Thomas Gleixner, John Stultz, Prarit Bhargava, Jason Gunthorpe When the system clock is marked as synchronized via adjtimex(), the kernel is expected to copy the system time to the RTC every 11 minutes. There are reports that it doesn't always work reliably. It seems the current requirement for the RTC update to happen within 5 ticks of the target time in some cases can consistently fail for hours or even days. It is better to set the RTC with a larger error than let it drift for too long. Add a static variable to rtc_tv_nsec_ok() to count the checks. With each failed check, relax the requirement by one jiffie, and reset the counter when it finally succeeds. This should allow the RTC update to happen in a minute at most. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> --- include/linux/rtc.h | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 22d1575e4991..8d105f10ef6a 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) { - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; + unsigned long time_set_nsec_fuzz; + static unsigned int attempt; *to_set = timespec64_add(*now, delay); - if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) { + /* + * Determine allowed error in tv_nsec. Start at 5 jiffies and add a + * jiffie with each failed attempt to make sure the RTC will be set at + * some point, even if the update cannot be scheduled very accurately. + */ + time_set_nsec_fuzz = (5 + attempt++) * TICK_NSEC; + + if (to_set->tv_nsec < time_set_nsec_fuzz) { to_set->tv_nsec = 0; + attempt = 0; return true; } - if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) { to_set->tv_sec++; to_set->tv_nsec = 0; + attempt = 0; return true; } return false; -- 2.26.2 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-01 14:38 [PATCH] rtc: adapt allowed RTC update error Miroslav Lichvar @ 2020-12-01 16:12 ` Jason Gunthorpe 2020-12-01 17:14 ` Miroslav Lichvar 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-01 16:12 UTC (permalink / raw) To: Miroslav Lichvar Cc: linux-kernel, Thomas Gleixner, John Stultz, Prarit Bhargava On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote: > When the system clock is marked as synchronized via adjtimex(), the > kernel is expected to copy the system time to the RTC every 11 minutes. > > There are reports that it doesn't always work reliably. It seems the > current requirement for the RTC update to happen within 5 ticks of the > target time in some cases can consistently fail for hours or even days. > > It is better to set the RTC with a larger error than let it drift for > too long. > > Add a static variable to rtc_tv_nsec_ok() to count the checks. With each > failed check, relax the requirement by one jiffie, and reset the counter > when it finally succeeds. This should allow the RTC update to happen in > a minute at most. > > Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: John Stultz <john.stultz@linaro.org> > Cc: Prarit Bhargava <prarit@redhat.com> > Cc: Jason Gunthorpe <jgg@ziepe.ca> > include/linux/rtc.h | 17 +++++++++++++---- > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/include/linux/rtc.h b/include/linux/rtc.h > index 22d1575e4991..8d105f10ef6a 100644 > +++ b/include/linux/rtc.h > @@ -218,21 +218,30 @@ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, > struct timespec64 *to_set, > const struct timespec64 *now) > { > - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ > - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; > struct timespec64 delay = {.tv_sec = 0, > .tv_nsec = set_offset_nsec}; > + unsigned long time_set_nsec_fuzz; > + static unsigned int attempt; Adding a static value instide a static inline should not be done I'm not sure using a static like this is the best idea anyhow, if you want something like this it should be per-rtc, not global Did you look at why time has become so in-accurate in your system? 5 jiffies is usually a pretty long time? Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-01 16:12 ` Jason Gunthorpe @ 2020-12-01 17:14 ` Miroslav Lichvar 2020-12-01 17:35 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Miroslav Lichvar @ 2020-12-01 17:14 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-kernel, Thomas Gleixner, John Stultz, Prarit Bhargava On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote: > On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote: > > + unsigned long time_set_nsec_fuzz; > > + static unsigned int attempt; > > Adding a static value instide a static inline should not be done Well, grepping through the other header files in include/linux, this would not be the first case. > I'm not sure using a static like this is the best idea anyhow, if you > want something like this it should be per-rtc, not global If there are multiple RTCs, are they all updated in this 11-minute sync? > Did you look at why time has become so in-accurate in your system? 5 > jiffies is usually a pretty long time? I found no good explanation. It seems to depend on what system is doing, if it's idle, etc. I suspect it's a property of the workqueues that they cannot generally guarantee the jobs to run exactly on time. I tried switching to the non-power-efficient and high priority workqueues and it didn't seem to make a big difference. -- Miroslav Lichvar ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-01 17:14 ` Miroslav Lichvar @ 2020-12-01 17:35 ` Jason Gunthorpe 2020-12-02 10:01 ` [PATCHv2] " Miroslav Lichvar 2020-12-02 13:44 ` [PATCH] " Thomas Gleixner 0 siblings, 2 replies; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-01 17:35 UTC (permalink / raw) To: Miroslav Lichvar Cc: linux-kernel, Thomas Gleixner, John Stultz, Prarit Bhargava On Tue, Dec 01, 2020 at 06:14:20PM +0100, Miroslav Lichvar wrote: > On Tue, Dec 01, 2020 at 12:12:24PM -0400, Jason Gunthorpe wrote: > > On Tue, Dec 01, 2020 at 03:38:35PM +0100, Miroslav Lichvar wrote: > > > + unsigned long time_set_nsec_fuzz; > > > + static unsigned int attempt; > > > > Adding a static value instide a static inline should not be done > > Well, grepping through the other header files in include/linux, this > would not be the first case. Nevertheless, it should be avoided and has surprising behaviors. > > I'm not sure using a static like this is the best idea anyhow, if you > > want something like this it should be per-rtc, not global > > If there are multiple RTCs, are they all updated in this 11-minute > sync? Yes, but they have different offsets and thus different timers, IIRC. Though only one gets to be the CONFIG_RTC_SYSTOHC_DEVICE If you are going to put some static it is clearer to put it along side the sync_rtc_clock() > I found no good explanation. It seems to depend on what system is > doing, if it's idle, etc. I suspect it's a property of the workqueues > that they cannot generally guarantee the jobs to run exactly on time. > I tried switching to the non-power-efficient and high priority > workqueues and it didn't seem to make a big difference. When I wrote it originally the workqueues got increasingly inaccurate the longer the duration, so it does a very short sleep to get back on track. If you are missing that time target it must be constantly scheduling new delayed work and none of it hits the target for long, long time periods? This seems like a more fundamental problem someplace else in the kernel. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCHv2] rtc: adapt allowed RTC update error 2020-12-01 17:35 ` Jason Gunthorpe @ 2020-12-02 10:01 ` Miroslav Lichvar 2020-12-02 13:44 ` [PATCH] " Thomas Gleixner 1 sibling, 0 replies; 38+ messages in thread From: Miroslav Lichvar @ 2020-12-02 10:01 UTC (permalink / raw) To: linux-kernel Cc: Miroslav Lichvar, Thomas Gleixner, John Stultz, Prarit Bhargava, Jason Gunthorpe When the system clock is marked as synchronized via adjtimex(), the kernel is expected to copy the system time to the RTC every 11 minutes. There are reports that it doesn't always work reliably. It seems the current requirement for the RTC update to happen within 5 ticks of the target time in some cases can consistently fail for hours or even days. It is better to set the RTC with a larger error than let it drift for too long. Instead of increasing the constant again, use a static variable to count the checks and with each failed check increase the allowed error by one jiffie. Reset the counter when the check finally succeeds. This will allow the RTC update to keep good accuracy if it can happen in the first few attempts and it will not take more than a minute if the timing is consistently bad for any reason. Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> --- Notes: v2: - moved the static variable to callers in ntp.c drivers/rtc/systohc.c | 6 ++++-- include/linux/rtc.h | 14 +++++++++----- kernel/time/ntp.c | 9 +++++++-- 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/drivers/rtc/systohc.c b/drivers/rtc/systohc.c index 8b70f0520e13..0777f590cdae 100644 --- a/drivers/rtc/systohc.c +++ b/drivers/rtc/systohc.c @@ -5,6 +5,7 @@ /** * rtc_set_ntp_time - Save NTP synchronized time to the RTC * @now: Current time of day + * @attempt: Number of previous failures used to adjust allowed error * @target_nsec: pointer for desired now->tv_nsec value * * Replacement for the NTP platform function update_persistent_clock64 @@ -18,7 +19,8 @@ * * If temporary failure is indicated the caller should try again 'soon' */ -int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) +int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt, + unsigned long *target_nsec) { struct rtc_device *rtc; struct rtc_time tm; @@ -44,7 +46,7 @@ int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) * it does not we update target_nsec and return EPROTO to make the ntp * code try again later. */ - ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now); + ok = rtc_tv_nsec_ok(rtc->set_offset_nsec, attempt, &to_set, &now); if (!ok) { err = -EPROTO; goto out_close; diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 22d1575e4991..9f3326b43620 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -165,7 +165,8 @@ int __rtc_register_device(struct module *owner, struct rtc_device *rtc); extern int rtc_read_time(struct rtc_device *rtc, struct rtc_time *tm); extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm); -extern int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec); +extern int rtc_set_ntp_time(struct timespec64 now, unsigned int attempt, + unsigned long *target_nsec); int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm); extern int rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alrm); @@ -213,24 +214,27 @@ static inline bool is_leap_year(unsigned int year) * a zero in tv_nsecs, such that: * to_set - set_delay_nsec == now +/- FUZZ * + * The allowed error starts at 5 jiffies on the first attempt and is increased + * with each failed attempt to make sure the RTC will be set at some point, + * even if the timing is consistently inaccurate. */ static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, + unsigned int attempt, struct timespec64 *to_set, const struct timespec64 *now) { - /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ - const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; + unsigned long time_set_nsec_fuzz = (5 + attempt) * TICK_NSEC; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; *to_set = timespec64_add(*now, delay); - if (to_set->tv_nsec < TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec < time_set_nsec_fuzz) { to_set->tv_nsec = 0; return true; } - if (to_set->tv_nsec > NSEC_PER_SEC - TIME_SET_NSEC_FUZZ) { + if (to_set->tv_nsec > NSEC_PER_SEC - time_set_nsec_fuzz) { to_set->tv_sec++; to_set->tv_nsec = 0; return true; diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 069ca78fb0bf..893bc7ed7845 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -531,6 +531,7 @@ static void sched_sync_hw_clock(struct timespec64 now, static void sync_rtc_clock(void) { + static unsigned int attempt; unsigned long target_nsec; struct timespec64 adjust, now; int rc; @@ -548,9 +549,11 @@ static void sync_rtc_clock(void) * The current RTC in use will provide the target_nsec it wants to be * called at, and does rtc_tv_nsec_ok internally. */ - rc = rtc_set_ntp_time(adjust, &target_nsec); + rc = rtc_set_ntp_time(adjust, attempt++, &target_nsec); if (rc == -ENODEV) return; + if (rc != -EPROTO) + attempt = 0; sched_sync_hw_clock(now, target_nsec, rc); } @@ -564,6 +567,7 @@ int __weak update_persistent_clock64(struct timespec64 now64) static bool sync_cmos_clock(void) { + static unsigned int attempt; static bool no_cmos; struct timespec64 now; struct timespec64 adjust; @@ -585,7 +589,8 @@ static bool sync_cmos_clock(void) * implement this legacy API. */ ktime_get_real_ts64(&now); - if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) { + if (rtc_tv_nsec_ok(-1 * target_nsec, attempt++, &adjust, &now)) { + attempt = 0; if (persistent_clock_is_local) adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); rc = update_persistent_clock64(adjust); -- 2.26.2 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-01 17:35 ` Jason Gunthorpe 2020-12-02 10:01 ` [PATCHv2] " Miroslav Lichvar @ 2020-12-02 13:44 ` Thomas Gleixner 2020-12-02 15:07 ` Miroslav Lichvar 2020-12-02 16:27 ` Jason Gunthorpe 1 sibling, 2 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-02 13:44 UTC (permalink / raw) To: Jason Gunthorpe, Miroslav Lichvar Cc: linux-kernel, John Stultz, Prarit Bhargava On Tue, Dec 01 2020 at 13:35, Jason Gunthorpe wrote: > On Tue, Dec 01, 2020 at 06:14:20PM +0100, Miroslav Lichvar wrote: >> I found no good explanation. It seems to depend on what system is >> doing, if it's idle, etc. I suspect it's a property of the workqueues >> that they cannot generally guarantee the jobs to run exactly on time. >> I tried switching to the non-power-efficient and high priority >> workqueues and it didn't seem to make a big difference. > > When I wrote it originally the workqueues got increasingly inaccurate > the longer the duration, so it does a very short sleep to get back on > track. > > If you are missing that time target it must be constantly scheduling > new delayed work and none of it hits the target for long, long time > periods? delayed work is based on the timer wheel which is inaccurate by design. Looking at the whole machinery: sync_rtc_clock() ktime_get_real_ts64(&now); rtc_set_ntp_time(now, &target_nsec) set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec); // rtc->set_offset_nsec = NSEC_PER_SEC / 2 *target_nsec = to_set.tv_nsec; sched_sync_hw_clock(now, target_nsec, rc) <- now is unused here ktime_get_real_ts64(&next); if (!fail) next.tv_sec = 659; else next.tv_sec = 0; next.tv_nsec = target_nsec - next.tv_nsec; ... queue_delayed_work(system_power_efficient_wq, &sync_work, timespec64_to_jiffies(&next)); Let's look at the 659 seconds case first. Depending on the HZ value the granularity of the timer wheel bucket in which that timer ends up is: HZ Granularity 1000 32s 250 16s 100 40s That's been true for the old timer wheel implementation as well, just the granularity values were slighty different. The probability to run at the expected time is pretty low. The target time would need to be exactly aligned with the wheel level period. Now for the 0 second X nanoseconds case. That's affected by the bucket granularities as well depending on the resulting nanoseconds value: * HZ 1000 * Level Offset Granularity Range * 0 0 1 ms 0 ms - 63 ms * 1 64 8 ms 64 ms - 511 ms * 2 128 64 ms 512 ms - 4095 ms (512ms - ~4s) * * HZ 250 * Level Offset Granularity Range * 0 0 4 ms 0 ms - 255 ms * 1 64 32 ms 256 ms - 2047 ms (256ms - ~2s) * * HZ 100 * Level Offset Granularity Range * 0 0 10 ms 0 ms - 630 ms * 1 64 80 ms 640 ms - 5110 ms (640ms - ~5s) So if this ends up in level #1 then again the chance is pretty low that the expiry time is aligned to the the level period. If this works then it only works by chance. > This seems like a more fundamental problem someplace else in the > kernel. I don't think so. :) Something like the completely untested below should make this reliable and only needs to retry when the work is running late (busy machine), but the wakeup will be on time or at max 1 jiffie off when high resolution timers are not available or disabled. Thanks, tglx --- kernel/time/ntp.c | 65 +++++++++++++++++++++++------------------------------- 1 file changed, 28 insertions(+), 37 deletions(-) --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -495,64 +495,53 @@ int second_overflow(time64_t secs) } static void sync_hw_clock(struct work_struct *work); -static DECLARE_DELAYED_WORK(sync_work, sync_hw_clock); - -static void sched_sync_hw_clock(struct timespec64 now, - unsigned long target_nsec, bool fail) +static DECLARE_WORK(sync_work, sync_hw_clock); +static struct hrtimer sync_hrtimer; +#define SYNC_PERIOD_NS (11UL * 60 * NSEC_PER_SEC) +static enum hrtimer_restart sync_timer_callback(struct hrtimer *timer) { - struct timespec64 next; + queue_work(system_power_efficient_wq, &sync_work); - ktime_get_real_ts64(&next); - if (!fail) - next.tv_sec = 659; - else { - /* - * Try again as soon as possible. Delaying long periods - * decreases the accuracy of the work queue timer. Due to this - * the algorithm is very likely to require a short-sleep retry - * after the above long sleep to synchronize ts_nsec. - */ - next.tv_sec = 0; - } + return HRTIMER_NORESTART; +} - /* Compute the needed delay that will get to tv_nsec == target_nsec */ - next.tv_nsec = target_nsec - next.tv_nsec; - if (next.tv_nsec <= 0) - next.tv_nsec += NSEC_PER_SEC; - if (next.tv_nsec >= NSEC_PER_SEC) { - next.tv_sec++; - next.tv_nsec -= NSEC_PER_SEC; - } +static void sched_sync_hw_clock(unsigned long offset_nsec, bool retry) +{ + ktime_t exp = ktime_set(ktime_get_real_seconds(), 0); + + if (retry) + exp = ktime_add_ns(exp, 2 * NSEC_PER_SEC - offset_nsec); + else + exp = ktime_add_ns(exp, SYNC_PERIOD_NS - offset_nsec); - queue_delayed_work(system_power_efficient_wq, &sync_work, - timespec64_to_jiffies(&next)); + hrtimer_start(&sync_hrtimer, exp, HRTIMER_MODE_ABS); } static void sync_rtc_clock(void) { - unsigned long target_nsec; - struct timespec64 adjust, now; + unsigned long offset_nsec; + struct timespec64 adjust; int rc; if (!IS_ENABLED(CONFIG_RTC_SYSTOHC)) return; - ktime_get_real_ts64(&now); + ktime_get_real_ts64(&adjust); - adjust = now; if (persistent_clock_is_local) adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); /* - * The current RTC in use will provide the target_nsec it wants to be - * called at, and does rtc_tv_nsec_ok internally. + * The current RTC in use will provide the nanoseconds offset prior + * to a full second it wants to be called at, and invokes + * rtc_tv_nsec_ok() internally. */ - rc = rtc_set_ntp_time(adjust, &target_nsec); + rc = rtc_set_ntp_time(adjust, &offset_nsec); if (rc == -ENODEV) return; - sched_sync_hw_clock(now, target_nsec, rc); + sched_sync_hw_clock(offset_nsec, rc != 0); } #ifdef CONFIG_GENERIC_CMOS_UPDATE @@ -599,7 +588,7 @@ static bool sync_cmos_clock(void) } } - sched_sync_hw_clock(now, target_nsec, rc); + sched_sync_hw_clock(target_nsec, rc != 0); return true; } @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void) if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || IS_ENABLED(CONFIG_RTC_SYSTOHC)) - queue_delayed_work(system_power_efficient_wq, &sync_work, 0); + queue_work(system_power_efficient_wq, &sync_work); } /* @@ -1044,4 +1033,6 @@ static int __init ntp_tick_adj_setup(cha void __init ntp_init(void) { ntp_clear(); + hrtimer_init(&sync_hrtimer, CLOCK_REALTIME, HRTIMER_MODE_ABS); + sync_hrtimer.function = sync_timer_callback; } ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 13:44 ` [PATCH] " Thomas Gleixner @ 2020-12-02 15:07 ` Miroslav Lichvar 2020-12-02 15:36 ` Miroslav Lichvar 2020-12-02 16:27 ` Jason Gunthorpe 1 sibling, 1 reply; 38+ messages in thread From: Miroslav Lichvar @ 2020-12-02 15:07 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > Something like the completely untested below should make this reliable > and only needs to retry when the work is running late (busy machine), > but the wakeup will be on time or at max 1 jiffie off when high > resolution timers are not available or disabled. It seems to work nicely. In my test most of the updates succeeded on the first attempt hitting the right tick, the rest succeeding on the second attempt. Only when the clock was set to run 10% faster, it needed few more attempts to converge to the target time. Thanks, -- Miroslav Lichvar ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 15:07 ` Miroslav Lichvar @ 2020-12-02 15:36 ` Miroslav Lichvar 2020-12-02 18:36 ` Thomas Gleixner 0 siblings, 1 reply; 38+ messages in thread From: Miroslav Lichvar @ 2020-12-02 15:36 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote: > On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > > Something like the completely untested below should make this reliable > > and only needs to retry when the work is running late (busy machine), > > but the wakeup will be on time or at max 1 jiffie off when high > > resolution timers are not available or disabled. > > It seems to work nicely. In my test most of the updates succeeded on > the first attempt hitting the right tick, the rest succeeding on the > second attempt. Only when the clock was set to run 10% faster, it > needed few more attempts to converge to the target time. I noticed an observable change wrt adjtimex() calls though. It seems it now reschedules the RTC update, i.e. there can be more than one update per 11 minutes. Was this intended? > @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void) > > if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || > IS_ENABLED(CONFIG_RTC_SYSTOHC)) > - queue_delayed_work(system_power_efficient_wq, &sync_work, 0); > + queue_work(system_power_efficient_wq, &sync_work); > } -- Miroslav Lichvar ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 15:36 ` Miroslav Lichvar @ 2020-12-02 18:36 ` Thomas Gleixner 0 siblings, 0 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-02 18:36 UTC (permalink / raw) To: Miroslav Lichvar Cc: Jason Gunthorpe, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02 2020 at 16:36, Miroslav Lichvar wrote: > On Wed, Dec 02, 2020 at 04:07:28PM +0100, Miroslav Lichvar wrote: >> On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: >> > Something like the completely untested below should make this reliable >> > and only needs to retry when the work is running late (busy machine), >> > but the wakeup will be on time or at max 1 jiffie off when high >> > resolution timers are not available or disabled. >> >> It seems to work nicely. In my test most of the updates succeeded on >> the first attempt hitting the right tick, the rest succeeding on the >> second attempt. Only when the clock was set to run 10% faster, it >> needed few more attempts to converge to the target time. > > I noticed an observable change wrt adjtimex() calls though. It seems > it now reschedules the RTC update, i.e. there can be more than one > update per 11 minutes. Was this intended? No. Let me fix that. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 13:44 ` [PATCH] " Thomas Gleixner 2020-12-02 15:07 ` Miroslav Lichvar @ 2020-12-02 16:27 ` Jason Gunthorpe 2020-12-02 19:21 ` Thomas Gleixner 1 sibling, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-02 16:27 UTC (permalink / raw) To: Thomas Gleixner Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > So if this ends up in level #1 then again the chance is pretty low that > the expiry time is aligned to the the level period. If this works then > it only works by chance. Yes, I always thought it was timer wheel related, but at least in my testing long ago it seemed reliable at the short timeout. Maybe I had a lucky hz value. > > This seems like a more fundamental problem someplace else in the > > kernel. > > I don't think so. :) > > Something like the completely untested below should make this reliable > and only needs to retry when the work is running late (busy machine), > but the wakeup will be on time or at max 1 jiffie off when high > resolution timers are not available or disabled. This seems like the right approach > @@ -629,7 +618,7 @@ void ntp_notify_cmos_timer(void) > > if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || > IS_ENABLED(CONFIG_RTC_SYSTOHC)) > - queue_delayed_work(system_power_efficient_wq, &sync_work, 0); > + queue_work(system_power_efficient_wq, &sync_work); As Miroslav noted, probably the right thing to do here is to reset the hrtimer and remove the sync_work? I think this code was to expedite an RTC sync when NTP fixes the clock on boot. IIRC this was made somewhat messy due to the dual path with rtclib and old cmos. It would be very nice to kill the cmos path, things are better now.. But PPC still has a long way to go: arch/powerpc/platforms/52xx/efika.c: .set_rtc_time = rtas_set_rtc_time, arch/powerpc/platforms/8xx/mpc86xads_setup.c: .set_rtc_time = mpc8xx_set_rtc_time, arch/powerpc/platforms/8xx/tqm8xx_setup.c: .set_rtc_time = mpc8xx_set_rtc_time, arch/powerpc/platforms/cell/setup.c: .set_rtc_time = rtas_set_rtc_time, arch/powerpc/platforms/chrp/setup.c: ppc_md.set_rtc_time = rtas_set_rtc_time; arch/powerpc/platforms/chrp/setup.c: .set_rtc_time = chrp_set_rtc_time, arch/powerpc/platforms/maple/setup.c: .set_rtc_time = maple_set_rtc_time, arch/powerpc/platforms/powermac/setup.c: .set_rtc_time = pmac_set_rtc_time, arch/powerpc/platforms/pseries/setup.c: .set_rtc_time = rtas_set_rtc_time, Also x86 needs a touch, it already has RTC lib, no idea why it also provides this old path too I wonder if the cmos path could be killed off under the dead HW principle? Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 16:27 ` Jason Gunthorpe @ 2020-12-02 19:21 ` Thomas Gleixner 2020-12-02 20:54 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2020-12-02 19:21 UTC (permalink / raw) To: Jason Gunthorpe Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02 2020 at 12:27, Jason Gunthorpe wrote: > On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: >> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || >> IS_ENABLED(CONFIG_RTC_SYSTOHC)) >> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0); >> + queue_work(system_power_efficient_wq, &sync_work); > > As Miroslav noted, probably the right thing to do here is to reset the > hrtimer and remove the sync_work? I think this code was to expedite an > RTC sync when NTP fixes the clock on boot. This has two purposes: 1) Initiating the update on boot once ntp is synced. 2) Reinitiating the sync after ntp lost sync and the work did not reschedule itself because it observed !ntp_synced(). In both cases it's highly unlikely that the write actually happens when the work is queued because do_adjtimex() would have to be exactly around the valid update window. So it will not write immediately. It will run through at least one retry. I don't think the timer should be canceled if the ntp_synced() state did not change. Otherwise every do_adtimex() call will cancel/restart it, which does not make sense. Lemme stare at it some more. > IIRC this was made somewhat messy due to the dual path with rtclib and > old cmos. :) > It would be very nice to kill the cmos path, things are better > now.. But PPC still has a long way to go: > > arch/powerpc/platforms/52xx/efika.c: .set_rtc_time = rtas_set_rtc_time, > arch/powerpc/platforms/8xx/mpc86xads_setup.c: .set_rtc_time = mpc8xx_set_rtc_time, > arch/powerpc/platforms/8xx/tqm8xx_setup.c: .set_rtc_time = mpc8xx_set_rtc_time, > arch/powerpc/platforms/cell/setup.c: .set_rtc_time = rtas_set_rtc_time, > arch/powerpc/platforms/chrp/setup.c: ppc_md.set_rtc_time = rtas_set_rtc_time; > arch/powerpc/platforms/chrp/setup.c: .set_rtc_time = chrp_set_rtc_time, > arch/powerpc/platforms/maple/setup.c: .set_rtc_time = maple_set_rtc_time, > arch/powerpc/platforms/powermac/setup.c: .set_rtc_time = pmac_set_rtc_time, > arch/powerpc/platforms/pseries/setup.c: .set_rtc_time = rtas_set_rtc_time, > > Also x86 needs a touch, it already has RTC lib, no idea why it also > provides this old path too Because nobody had the stomach and/or cycles to touch it :) > I wonder if the cmos path could be killed off under the dead HW > principle? Unfortunately that code path is not that dead on x86. You need to fix all the (ab)users first. :) Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 19:21 ` Thomas Gleixner @ 2020-12-02 20:54 ` Jason Gunthorpe 2020-12-02 22:08 ` Thomas Gleixner 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-02 20:54 UTC (permalink / raw) To: Thomas Gleixner Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02, 2020 at 08:21:00PM +0100, Thomas Gleixner wrote: > On Wed, Dec 02 2020 at 12:27, Jason Gunthorpe wrote: > > On Wed, Dec 02, 2020 at 02:44:53PM +0100, Thomas Gleixner wrote: > >> if (IS_ENABLED(CONFIG_GENERIC_CMOS_UPDATE) || > >> IS_ENABLED(CONFIG_RTC_SYSTOHC)) > >> - queue_delayed_work(system_power_efficient_wq, &sync_work, 0); > >> + queue_work(system_power_efficient_wq, &sync_work); > > > > As Miroslav noted, probably the right thing to do here is to reset the > > hrtimer and remove the sync_work? I think this code was to expedite an > > RTC sync when NTP fixes the clock on boot. > > This has two purposes: > > 1) Initiating the update on boot once ntp is synced. > > 2) Reinitiating the sync after ntp lost sync and the work did not > reschedule itself because it observed !ntp_synced(). > > In both cases it's highly unlikely that the write actually happens when > the work is queued because do_adjtimex() would have to be exactly around > the valid update window. Yes > So it will not write immediately. It will run through at least one > retry. Right, bascially this is scheduling a WQ to do sched_sync_hw_clock() which will only call hrtimer_start() - seems like jsut calling hrtimer_start instead of queue_work above would be equivilant > I don't think the timer should be canceled if the ntp_synced() state did > not change. Otherwise every do_adtimex() call will cancel/restart > it, which does not make sense. Lemme stare at it some more. That makes sense, being conditional on the STA_UNSYNC prior to doing any hrtimer_start seems OK? > > Also x86 needs a touch, it already has RTC lib, no idea why it also > > provides this old path too > > Because nobody had the stomach and/or cycles to touch it :) Hahaha yes.. I vaugely remember looking at this once.. Lets see: arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock; arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop; arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; All returns -ENODEV/EINVAL arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss, This is already rtclib under drivers/rtc/rtc-mc146818-lib.c I suppose the issue here is the rtclib driver only binds via PNP and very old x86 systems won't have the PNP tables? It seems doable to check for a PNP device after late init and manually create a platform_device for the RTC arch/x86/platform/intel-mid/intel_mid_vrtc.c: x86_platform.set_wallclock = vrtc_set_mmss; This is also already in rtclib under rtc-mrst.c, and this is already wired to create the rtc platform device during init So it is very close now to be able to delete all this for x86. Do you know of something I've missed? > > I wonder if the cmos path could be killed off under the dead HW > > principle? > > Unfortunately that code path is not that dead on x86. You need to fix > all the (ab)users first. :) Assuming x86 can be resolved as above, that leaves two 20 year old MIPS platforms and the PPC list from before. ARM is gone compared to last time I looked! Progress :) Thanks, Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 20:54 ` Jason Gunthorpe @ 2020-12-02 22:08 ` Thomas Gleixner 2020-12-02 23:03 ` Jason Gunthorpe 2020-12-03 1:14 ` Thomas Gleixner 0 siblings, 2 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-02 22:08 UTC (permalink / raw) To: Jason Gunthorpe Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02 2020 at 16:54, Jason Gunthorpe wrote: > On Wed, Dec 02, 2020 at 08:21:00PM +0100, Thomas Gleixner wrote: >> So it will not write immediately. It will run through at least one >> retry. > > Right, bascially this is scheduling a WQ to do sched_sync_hw_clock() > which will only call hrtimer_start() - seems like jsut calling > hrtimer_start instead of queue_work above would be equivilant Something like that. >> I don't think the timer should be canceled if the ntp_synced() state did >> not change. Otherwise every do_adtimex() call will cancel/restart >> it, which does not make sense. Lemme stare at it some more. > > That makes sense, being conditional on the STA_UNSYNC prior to doing > any hrtimer_start seems OK? Yeah. >> > Also x86 needs a touch, it already has RTC lib, no idea why it also >> > provides this old path too >> >> Because nobody had the stomach and/or cycles to touch it :) > > Hahaha yes.. I vaugely remember looking at this once.. > > Lets see: > > arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock; > arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop; > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; > All returns -ENODEV/EINVAL You forgot to stare at the .get_wallclock() functions. That's the more interesting part, i.e. what's behind read_persistent_clock64(). > arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss, > This is already rtclib under drivers/rtc/rtc-mc146818-lib.c That's the shared library function for setting the darn thing. > I suppose the issue here is the rtclib driver only binds via PNP and > very old x86 systems won't have the PNP tables? It seems doable to > check for a PNP device after late init and manually create a > platform_device for the RTC old crap, broken BIOSes and virt. Welcome to my wonderful world :) > arch/x86/platform/intel-mid/intel_mid_vrtc.c: x86_platform.set_wallclock = vrtc_set_mmss; > This is also already in rtclib under rtc-mrst.c, and this is already > wired to create the rtc platform device during init > > So it is very close now to be able to delete all this for x86. Do you > know of something I've missed? Just the above :) >> > I wonder if the cmos path could be killed off under the dead HW >> > principle? >> >> Unfortunately that code path is not that dead on x86. You need to fix >> all the (ab)users first. :) > > Assuming x86 can be resolved as above, that leaves two 20 year old MIPS > platforms and the PPC list from before. ARM is gone compared to last > time I looked! Progress :) Yeah. We're zooming in .... Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 22:08 ` Thomas Gleixner @ 2020-12-02 23:03 ` Jason Gunthorpe 2020-12-03 1:14 ` Thomas Gleixner 1 sibling, 0 replies; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-02 23:03 UTC (permalink / raw) To: Thomas Gleixner Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava On Wed, Dec 02, 2020 at 11:08:38PM +0100, Thomas Gleixner wrote: > > > > arch/x86/kernel/kvmclock.c: x86_platform.set_wallclock = kvm_set_wallclock; > > arch/x86/kernel/x86_init.c: x86_platform.set_wallclock = set_rtc_noop; > > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; > > arch/x86/xen/time.c: x86_platform.set_wallclock = xen_set_wallclock; > > All returns -ENODEV/EINVAL > > You forgot to stare at the .get_wallclock() functions. That's the more > interesting part, i.e. what's behind read_persistent_clock64(). Small steps! I was only looking at deleting the legacy CONFIG_GENERIC_CMOS_UPDATE from x86 which only controls the update_persistent_clock64() Yes, there is a similar redundancy with rtclib on the read_persistant_clock side, and that does looks much further.. > > arch/x86/kernel/x86_init.c: .set_wallclock = mach_set_rtc_mmss, > > This is already rtclib under drivers/rtc/rtc-mc146818-lib.c > > That's the shared library function for setting the darn thing. Yes, but if a PNP entry is present then rtc-cmos will load and call that function through the rtclib path instead of the update_persistent_clock64() path Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-02 22:08 ` Thomas Gleixner 2020-12-02 23:03 ` Jason Gunthorpe @ 2020-12-03 1:14 ` Thomas Gleixner 2020-12-03 2:04 ` Jason Gunthorpe 2020-12-03 2:10 ` Alexandre Belloni 1 sibling, 2 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-03 1:14 UTC (permalink / raw) To: Jason Gunthorpe Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, Alexandre Belloni, linux-rtc, Peter Zijlstra Cc+ RTC folks. On Wed, Dec 02 2020 at 23:08, Thomas Gleixner wrote: > On Wed, Dec 02 2020 at 16:54, Jason Gunthorpe wrote: >>> I don't think the timer should be canceled if the ntp_synced() state did >>> not change. Otherwise every do_adtimex() call will cancel/restart >>> it, which does not make sense. Lemme stare at it some more. >> >> That makes sense, being conditional on the STA_UNSYNC prior to doing >> any hrtimer_start seems OK? > > Yeah. And I was staring at it some more. TBH, rtc_tv_nsec_ok() is pretty much voodoo and wishful thinking. The point is that the original MC146818 has an update cycle once per second. That's what mc146818_is_updating() is checking. Lets start right here. mc146818_get_time() is bonkers to begin with. It does: if (mc146818_is_updating()) mdelay(20); lock(&rtc_lock); read_stuff(); unlock(&rtc_lock); That's crap, really. The MC146818 does a periodic update of the time data once per second and it advertises it via the UIP bit in Register A. The UIP bit is set 244us _before_ the update starts and depending on the time base is takes either 248us or 1984us (32kHz) until the bit goes low again. So let's look at the time line assuming a 32kHz time base: Time 0.000s ~0.998s 1.0s | | _____ UIP ____________________| |_______ Let's look at the code again and annotate it with time stamps: 0.997 read() if (mc146818_is_updating()) lock(rtc); read(regA); unlock(rtc); return regA & UIP; <- Lets assume FALSE 0.998 -> whatever happens (SMI, NMI, interrupt, preemption ...) 0.999 lock(rtc) read_data() <- Result is undefined because RTC is in the middle of the update unlock(rtc); Seriously... The comment above the if(...updating()) is full of wishful thinking: /* * read RTC once any update in progress is done. The update * can take just over 2ms. We wait 20ms. There is no need to * to poll-wait (up to 1s - eeccch) for the falling edge of RTC_UIP. * If you need to know *exactly* when a second has started, enable * periodic update complete interrupts, (via ioctl) and then * immediately read /dev/rtc which will block until you get the IRQ. * Once the read clears, read the RTC time (again via ioctl). Easy. */ - The update can take exactly up to 1984us, which is not just over 2ms. Also the update starts 248us _after_ UIP goes high. - Wait 20ms? What for? To make sure that the RTC has advanced by 2ms? - Poll wait for the falling edge of UIP takes up to 1s? I must have missed something in basic math. If UIP is high then waiting for the falling edge takes at max 2ms. I know what the comment tries to say, but on the first read I had one of those forbidden-by-CoC moments. - The need to know *exactly* part of the comment is even more amazing. Q: Which system guarantees that the interrupt: - will not happen _before_ read(/dev/rtc) after the ioctl() enabled it? - will be delivered without delay ? - will make sure that the task in the blocking read will be scheduled immediately and then do the ioctl based readout ? A: None Q: What is *exact* about this? A: Nothing at all. So the right thing to do here is: read() do { lock(rtc) start = ktime_get(); if (read(regA) & UIP) { unlock(rtc); wait(2ms); continue; } read_data(); end = ktime_get(); unlock(rtc); while (end - start > 2ms); and return _all_ three values (start, end, rtcdata) so clueful userspace can make sense of it. Returning nonsense as pointed off above is a non option. Hmm? Now to the write side. That's really wishful thinking. Let's look at the code: write() lock(rtc); write(regB, read(reagB) | SET); write_data(); write(regB, read(reagB) & ~SET); unlock(rtc); lock/unlock are irrelevant as they just serialize concurrent access to the RTC. The magic comment in ntp.c says that RTC will update the written value 0.5 seconds after writing it. That's pure fiction... I have no idea where this comes from, but any spec out there says about this: SET - When the SET bit is a "0", the update cycle functions normally by advancing the counts once-per-second. When the SET bit is written to a "1", any update cycle in progress is aborted and the program may initialize the time and calendar bytes without an update occuring in the midst of initializing. ..... So yes, the comment is partially correct _if_ and only _if_ the time base which schedules the update is exactly the same time base as the RTC time base and the time on which the update is scheduled is perfectly in sync with the RTC time base. Plus the update must be completed _before_ then next update cycle of the RTC starts which is what the 0.5 sec offset is trying to solve. Emphasis on _trying_. But with NTP that's not the case at all. The clocksource which is adjusted by NTP (usually TSC on x86, but that does not matter) is not at all guaranteed to run from the same crystal as the RTC. On most systems the RTC has a seperate crystal which feeds it across poweroff. Even if it _is_ using the same crystal, then the NTP adjustments are going to skew the clocksource frequency against the underlying crystal which feeds the RTC and the clocksource. Q: So how can any assumption about update cycle correlatation between NTP synced CLOCK_REALTIME and the RTC notion of clock realtime be correct? A: Not at all. Aside of that the magic correction of the time which is written to the RTC is completely bogus. Lets start with the interface and the two callers of it: static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) The callers are: sync_cmos_clock() /* The legacy RTC cruft */ struct timespec64 now; struct timespec64 adjust; long target_nsec = NSEC_PER_SEC / 2; ktime_get_real_ts64(&now); if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) { if (persistent_clock_is_local) adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); rc = update_persistent_clock64(adjust); } sync_rtc_clock() unsigned long target_nsec; <- Signed unsigned .... struct timespec64 adjust, now; ktime_get_real_ts64(&now); adjust = now; <- Why the difference to the above? if (persistent_clock_is_local) <- Again, why is the ordering different? adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); rc = rtc_set_ntp_time(adjust, &target_nsec) // int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) struct timespec64 to_set; set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec); *target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec because the timespec is normalized ergo == rtc->set_offset_nsec unless the set_offset_nsec would be negative which makes at all. if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now)) update_rtc(...); So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says: drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */ drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2; but no driver ever bothered to change that value. Also the idea of having this offset as type s64 is beyond my understanding. Why the heck would any RTC require to set an offset which is _after_ the second transition. That aside. Looking at the above two variants let's go into rtc_tv_nsec_ok() static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, struct timespec64 *to_set, const struct timespec64 *now) /* Allowed error in tv_nsec, arbitarily set to 5 jiffies in ns. */ const unsigned long TIME_SET_NSEC_FUZZ = TICK_NSEC * 5; struct timespec64 delay = {.tv_sec = 0, .tv_nsec = set_offset_nsec}; *to_set = timespec64_add(*now, delay); The scheduled point for both legacy CMOS and RTC modern is at the point of the second minus 0.5 seconds (lets ignore that set_offset_nsec might be different for this). So let's assume the update was scheduled at 659s 500ms independent of legacy or modern. Now legacy does the following: struct timespec64 delay = { .tv_sec = 0, tv_nsec = -5e8 } which is an not normalized timespec to begin with but *to_set = timespec64_add(*now , delay); can deal with that. So the result of this computation is: now - delay IOW, 0.5 seconds before now: 659s 0ms Now the same magic for the 'modern' RTC will do: struct timespec64 delay = { .tv_sec = 0, tv_nsec = 5e8 } so the result of the add is: now + delay IOW, 0.5 seconds _after_ now: 700s 0ms Can you spot the subtle difference? That said, can somebody answer the one million dollar question which problem is solved by all of this magic nonsense? If anyone involved seriously believes that any of this solves a real world problem, then please come forth an make your case. If not, all of this illusionary attempts to be "correct" can be removed for good and the whole thing reduced to a update_rtc_plus_minus_a_second() mechanism, which is exactly what we have today just without the code which pretends to be *exact* or whatever. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 1:14 ` Thomas Gleixner @ 2020-12-03 2:04 ` Jason Gunthorpe 2020-12-03 2:10 ` Alexandre Belloni 1 sibling, 0 replies; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-03 2:04 UTC (permalink / raw) To: Thomas Gleixner Cc: Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, Alexandre Belloni, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 02:14:12AM +0100, Thomas Gleixner wrote: > If anyone involved seriously believes that any of this solves a real > world problem, then please come forth an make your case. The original commit 0f295b0650c9 ("rtc: Allow rtc drivers to specify the tv_nsec value for ntp") was tested by myself and RMK on various ARM systems and did work as advertised. Here is the giant thread, RMK's post explains the problem and gives his measurements of several different RTCs: https://lore.kernel.org/linux-arm-kernel/20170920112152.GL20805@n2100.armlinux.org.uk/ And the patch that resulted: https://lore.kernel.org/linux-arm-kernel/20171013175433.GA22062@obsidianresearch.com/ There is a lot of detail in there.. Keep in mind none of this was for the mc146818 style RTCs. I can't recall any more why no drivers use the set_offset_nsec. I'm surprised, maybe I forgot to send the patch for the RTCs I tested or maybe it got dropped someplace.. It certainly was needed for some maxim I2C chips. The thread shows rmk had even written a hrtimer patch to go with this, but it also got lost for some reason. Maybe all the arguing killed further effort? Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 1:14 ` Thomas Gleixner 2020-12-03 2:04 ` Jason Gunthorpe @ 2020-12-03 2:10 ` Alexandre Belloni 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 15:52 ` Jason Gunthorpe 1 sibling, 2 replies; 38+ messages in thread From: Alexandre Belloni @ 2020-12-03 2:10 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Hello Thomas, I'll take more time to reply more in depth to the whole email but... On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: > Aside of that the magic correction of the time which is written to the > RTC is completely bogus. Lets start with the interface and the two > callers of it: > > static inline bool rtc_tv_nsec_ok(s64 set_offset_nsec, > struct timespec64 *to_set, > const struct timespec64 *now) > > The callers are: > > sync_cmos_clock() /* The legacy RTC cruft */ > struct timespec64 now; > struct timespec64 adjust; > long target_nsec = NSEC_PER_SEC / 2; > > ktime_get_real_ts64(&now); > if (rtc_tv_nsec_ok(-1 * target_nsec, &adjust, &now)) { > if (persistent_clock_is_local) > adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); > rc = update_persistent_clock64(adjust); > } > > sync_rtc_clock() > unsigned long target_nsec; <- Signed unsigned .... > struct timespec64 adjust, now; > > ktime_get_real_ts64(&now); > > adjust = now; <- Why the difference to the above? > > if (persistent_clock_is_local) <- Again, why is the ordering different? > adjust.tv_sec -= (sys_tz.tz_minuteswest * 60); > > rc = rtc_set_ntp_time(adjust, &target_nsec) > // int rtc_set_ntp_time(struct timespec64 now, unsigned long *target_nsec) > > struct timespec64 to_set; > > set_normalized_timespec64(&to_set, 0, -rtc->set_offset_nsec); > *target_nsec = to_set.tv_nsec; <- target_nsec = rtc->set_offset_nsec > because the timespec is normalized > ergo == rtc->set_offset_nsec > unless the set_offset_nsec would > be negative which makes at all. > > if (rtc_tv_nsec_ok(rtc->set_offset_nsec, &to_set, &now)) > update_rtc(...); > > So sync_cmos_clock hands in -(NSEC_PER_SEC/2) and the rtc cruft hands in > NSEC_PER_SEC/2 by default. The comment in drivers/rtc/class.c says: > > drivers/rtc/class.c- /* Drivers can revise this default after allocating the device. */ > drivers/rtc/class.c: rtc->set_offset_nsec = NSEC_PER_SEC / 2; > > but no driver ever bothered to change that value. Also the idea of > having this offset as type s64 is beyond my understanding. Why the heck > would any RTC require to set an offset which is _after_ the second > transition. > This (no driver making use of set_offset_nsec) happened because it got applied without me agreeing to the change. I did complain at the time and RMK walked away. [...] > That said, can somebody answer the one million dollar question which > problem is solved by all of this magic nonsense? > The goal was to remove the 500ms offset for all the RTCs but the MC146818 because there are RTC that will reset properly their counter when setting the time, meaning they can be set very precisely. IIRC, used in conjunction with rtc_hctosys which also adds inconditionnaly 500ms this can ends up with the system time being one second away from the wall clock time which NTP will take quite some time to remove. > If anyone involved seriously believes that any of this solves a real > world problem, then please come forth an make your case. > > If not, all of this illusionary attempts to be "correct" can be removed > for good and the whole thing reduced to a > > update_rtc_plus_minus_a_second() > > mechanism, which is exactly what we have today just without the code > which pretends to be *exact* or whatever. > Coincidentally, I was going to revert those patches for v5.11. Also, honestly, I still don't understand why the kernel needs to set the RTC while userspace is very well equipped to do that. chrony is able to set the RTC time and it can do so precisely. It can even compute how that RTC is time drifting and that value can already be used to adjust the RTC crystal. From my tests, with some efforts, userspace can set the RTC time with a 20µs precision, even on low end systems. To do so, it doesn't need set_offset_nsec. I also don't like hctosys, it is currently wrong but I can see use cases and now systemd relies on its presence so my plan is to fix it. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 2:10 ` Alexandre Belloni @ 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 16:16 ` Jason Gunthorpe 2020-12-03 17:29 ` Alexandre Belloni 2020-12-03 15:52 ` Jason Gunthorpe 1 sibling, 2 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-03 15:39 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Alexandre, On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: >> That said, can somebody answer the one million dollar question which >> problem is solved by all of this magic nonsense? >> > The goal was to remove the 500ms offset for all the RTCs but the > MC146818 because there are RTC that will reset properly their counter > when setting the time, meaning they can be set very precisely. The MC setting is halfways precise. The write resets the divider chain and when the reset is removed then the next UIP will happen after the magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is halfways correct assumed that there is no interference between ktime_get_real() and the actual write which is a silly assumption as the code is fully preemptible. > IIRC, used in conjunction with rtc_hctosys which also adds > inconditionnaly 500ms this can ends up with the system time > being one second away from the wall clock time which NTP will take quite > some time to remove. The rtc_cmos() driver has a fun comment in cmos_set_time().... The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in behaviour. > Coincidentally, I was going to revert those patches for v5.11. Which will break the rtc_cmos() driver in a different way. We should really fix that proper and just have the 500ms offset for rtc_cmos, aka. MC146818. When other drivers want a different offset, then they still can do so. The direct /dev/rtc settime ioctl is not using that logic anyway. Thats the business of the user space application to get that straight which is scheduling lottery as well. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:39 ` Thomas Gleixner @ 2020-12-03 16:16 ` Jason Gunthorpe 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 17:29 ` Alexandre Belloni 1 sibling, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-03 16:16 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote: > The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I > pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and > rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in > behaviour. I understood this is because the two APIs work differently, rmk explained this as: > 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the > time at around 500ms into the second. > > 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, > then we want to set the _next_ second. ie one path is supposed to round down and one path is supposed to round up, so you get to that 1s difference.. IIRC this is also connected to why the offset is signed.. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 16:16 ` Jason Gunthorpe @ 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 21:31 ` Thomas Gleixner 2020-12-03 22:00 ` Alexandre Belloni 0 siblings, 2 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-03 21:05 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote: > >> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I >> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and >> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in >> behaviour. > > I understood this is because the two APIs work differently, rmk > explained this as: > >> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the >> time at around 500ms into the second. >> >> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms, >> then we want to set the _next_ second. > > ie one path is supposed to round down and one path is supposed to > round up, so you get to that 1s difference.. > > IIRC this is also connected to why the offset is signed.. The problem is that it is device specific and therefore having the offset parameter is a good starting point. Lets look at the two scenarios: 1) Direct accessible RTC: tsched t1 t2 write(newsec) RTC increments seconds For rtc_cmos/MC1... tinc = t2 - t1 = 500ms There are RTCs which reset the thing on write so tinc = t2 - t1 = 1000ms No idea what other variants are out there, but the principle is the same for all of them. Lets assume that the event is accurate for now and ignore the fuzz logic, i.e. tsched == t1 tsched must be scheduled to happen tinc before wallclock increments seconds so that the RTC increments seconds at the same time. That means newsec = t1.tv_sec. So now the fuzz logic for the legacy cmos path does: newtime = t1 - tinc; if (newtime.tv_nsec < FUZZ) newsec = newtime.tv_sec; else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) newsec = newtime.tv_sec + 1; else goto fail; The first condition handles the case where t1 >= tsched and the second one where t1 < tsched. We need the same logic for rtc_cmos() when the update goes through the RTC path, which is broken today. See below. 2) I2C/SPI ... tsched t0 t1 t2 transfer(newsec) RTC update (newsec) RTC increments seconds Lets assume that ttransfer = t1 - t0 is known. tinc is the same as above = t2 - t1 Again, lets assume that the event is accurate for now and ignore the fuzz logic, i.e. tsched == t0 So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and increments seconds. In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec So now the fuzz logic for this is: newtime = t0 + ttransfer; if (newtime.tv_nsec < FUZZ) newsec = newtime.tv_sec; else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) newsec = newtime.tv_sec + 1; else goto fail; Again the first condition handles the case where t1 >= tsched and the second one where t1 < tsched. So now we have two options to fix this: 1) Use a negative sync_offset for devices which need #1 above (rtc_cmos & similar) That requires setting tsched to t2 - abs(sync_offset) 2) Use always a positive sync_offset and a flag which tells rtc_tv_nsec_ok() whether it needs to add or subtract. #1 is good enough. All it takes is a comment at the timer start code why abs() is required. Let me hack that up along with the hrtimer muck. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:05 ` Thomas Gleixner @ 2020-12-03 21:31 ` Thomas Gleixner 2020-12-03 22:36 ` Jason Gunthorpe 2020-12-03 22:00 ` Alexandre Belloni 1 sibling, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2020-12-03 21:31 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > So now we have two options to fix this: > > 1) Use a negative sync_offset for devices which need #1 above > (rtc_cmos & similar) > > That requires setting tsched to t2 - abs(sync_offset) > > 2) Use always a positive sync_offset and a flag which tells > rtc_tv_nsec_ok() whether it needs to add or subtract. > > #1 is good enough. All it takes is a comment at the timer start code why > abs() is required. > > Let me hack that up along with the hrtimer muck. That comment in rtc.h makes me cry: /* Number of nsec it takes to set the RTC clock. This influences when * the set ops are called. An offset: * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s */ Setting the wall clock time 10.0 at 10.5 is only possible for time traveling RTCs. It magically works, but come on ... Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:31 ` Thomas Gleixner @ 2020-12-03 22:36 ` Jason Gunthorpe 2020-12-04 13:02 ` Thomas Gleixner 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-03 22:36 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: > > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: > > So now we have two options to fix this: > > > > 1) Use a negative sync_offset for devices which need #1 above > > (rtc_cmos & similar) > > > > That requires setting tsched to t2 - abs(sync_offset) > > > > 2) Use always a positive sync_offset and a flag which tells > > rtc_tv_nsec_ok() whether it needs to add or subtract. > > > > #1 is good enough. All it takes is a comment at the timer start code why > > abs() is required. > > > > Let me hack that up along with the hrtimer muck. > > That comment in rtc.h makes me cry: > > /* Number of nsec it takes to set the RTC clock. This influences when > * the set ops are called. An offset: > * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s > * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s > * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s > */ > > Setting the wall clock time 10.0 at 10.5 is only possible for time > traveling RTCs. It magically works, but come on ... No tardis required. You can think of storing to a RTC as including a millisecond component, so the three examples are: 10.0 stores 9.5, 10.0 stores 8.5, 10.0 stores 10.5. It was probably included due to cmos, either as a misunderstanding what it does, or it does actually store 10.5 when you store 10.0.. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 22:36 ` Jason Gunthorpe @ 2020-12-04 13:02 ` Thomas Gleixner 2020-12-04 14:08 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2020-12-04 13:02 UTC (permalink / raw) To: Jason Gunthorpe Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 18:36, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 10:31:02PM +0100, Thomas Gleixner wrote: >> On Thu, Dec 03 2020 at 22:05, Thomas Gleixner wrote: >> > On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote: >> > So now we have two options to fix this: >> > >> > 1) Use a negative sync_offset for devices which need #1 above >> > (rtc_cmos & similar) >> > >> > That requires setting tsched to t2 - abs(sync_offset) >> > >> > 2) Use always a positive sync_offset and a flag which tells >> > rtc_tv_nsec_ok() whether it needs to add or subtract. >> > >> > #1 is good enough. All it takes is a comment at the timer start code why >> > abs() is required. >> > >> > Let me hack that up along with the hrtimer muck. >> >> That comment in rtc.h makes me cry: >> >> /* Number of nsec it takes to set the RTC clock. This influences when >> * the set ops are called. An offset: >> * - of 0.5 s will call RTC set for wall clock time 10.0 s at 9.5 s >> * - of 1.5 s will call RTC set for wall clock time 10.0 s at 8.5 s >> * - of -0.5 s will call RTC set for wall clock time 10.0 s at 10.5 s >> */ >> >> Setting the wall clock time 10.0 at 10.5 is only possible for time >> traveling RTCs. It magically works, but come on ... > > No tardis required. You can think of storing to a RTC as including a > millisecond component, so the three examples are: 10.0 stores 9.5, > 10.0 stores 8.5, 10.0 stores 10.5. > > It was probably included due to cmos, either as a misunderstanding > what it does, or it does actually store 10.5 when you store 10.0.. Yes, it kinda stores 10.5 because after the write the next seconds increment happens 500ms later. But none of this magic is actually required because the behaviour of most RTCs is that the next seconds increment happens exactly 1000ms _after_ the write. Which means _all_ of these offsets are positive: tsched twrite tnextsec For CMOS tsched == twrite and tnextsec - twrite = 500ms For I2C tsched = tnextsec - 1000ms - ttransport which means the formula is the same for all of them tRTCinc = tnextsec - twrite tsched = tnextsec - tRTCinc - ttransport And this covers also the (probably unlikely) case where the I2C RTC has a tRTCinc != 1000ms. Imagine a i2c based MC14xxx which would have: offset = 500ms + ttransport No magic sign calculation required if you look at it from the actual timeline and account the time between write and next second increment correctly. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 13:02 ` Thomas Gleixner @ 2020-12-04 14:08 ` Jason Gunthorpe 2020-12-04 14:37 ` Alexandre Belloni 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-04 14:08 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexandre Belloni, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > No magic sign calculation required if you look at it from the actual > timeline and account the time between write and next second increment > correctly. Yes, it is equivalent to break things into two values, and does look to be more understandable as one can read at least one of the values from a datasheet and the other could be estimated by timing a read clock Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:08 ` Jason Gunthorpe @ 2020-12-04 14:37 ` Alexandre Belloni 2020-12-04 14:46 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-04 14:37 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote: > On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > > > No magic sign calculation required if you look at it from the actual > > timeline and account the time between write and next second increment > > correctly. > > Yes, it is equivalent to break things into two values, and does look > to be more understandable as one can read at least one of the values > from a datasheet and the other could be estimated by timing a read > clock > If you want to read an RTC accurately, you don't want to time a read, what you want is to time an alarm. This is a common misconception and is, again, why hctosys in its current state is not useful. And because people using systohc are definitively using hctosys, this will still result in an up to 500ms error in the current time. As said, the price to pay for a proper solution will be an up to one second delay when booting which is not acceptable for most users. Is "fixing" systohc worth the effort and the maintenance cost? -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:37 ` Alexandre Belloni @ 2020-12-04 14:46 ` Jason Gunthorpe 2020-12-04 15:08 ` Alexandre Belloni 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-04 14:46 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 03:37:35PM +0100, Alexandre Belloni wrote: > On 04/12/2020 10:08:19-0400, Jason Gunthorpe wrote: > > On Fri, Dec 04, 2020 at 02:02:57PM +0100, Thomas Gleixner wrote: > > > > > No magic sign calculation required if you look at it from the actual > > > timeline and account the time between write and next second increment > > > correctly. > > > > Yes, it is equivalent to break things into two values, and does look > > to be more understandable as one can read at least one of the values > > from a datasheet and the other could be estimated by timing a read > > clock > > > > If you want to read an RTC accurately, you don't want to time a read, > what you want is to time an alarm. This is a common misconception and > is, again, why hctosys in its current state is not useful. I mean literatally time the excution of something like a straight read. This will give some estimate of the bus latency and it should linearly relate to the bus latency for a write. The driver could time configuring an alarm as well, if it likes. > And because people using systohc are definitively using hctosys, this > will still result in an up to 500ms error in the current time. > As said, the price to pay for a proper solution will be an up to one > second delay when booting which is not acceptable for most users. IIRC I had fixed this in some embedded system long ago by having hctosys reading seconds time during boot, then in parallel using the 'up to one second' method to get the sub-second resolution. This means there was a sub second non-monotonicity in the realtime clock, but the system was designed to tolerate this as it also ran a modified NTP which would unconditionally step the clock on first sync if it was over .1s out of alignment. The goal was to bring the system to correct time as quickly as possible in as many cases as possible, not to maintain the monotonicity of the realtime clock. > Is "fixing" systohc worth the effort and the maintenance cost? As I said before, if there is no desire to address the read side then the whole thing should be abandoned. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 14:46 ` Jason Gunthorpe @ 2020-12-04 15:08 ` Alexandre Belloni 2020-12-04 15:57 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-04 15:08 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:46:59-0400, Jason Gunthorpe wrote: > > If you want to read an RTC accurately, you don't want to time a read, > > what you want is to time an alarm. This is a common misconception and > > is, again, why hctosys in its current state is not useful. > > I mean literatally time the excution of something like a straight > read. This will give some estimate of the bus latency and it should > linearly relate to the bus latency for a write. > It doesn't, some rtc will require writing dozen registers to set the time and reading only 3 to get the time, the only accurate way is to really time setting the time. You set the RTC time once, set up an alarm for the next second, once you get the alarm, you get system time and you now how far you are off. Notice that systohc could do that if you wanted to be accurate and then the whole issue with mc146818 is gone and this nicely solves it for all the RTCs at once. > The driver could time configuring an alarm as well, if it likes. The driver should definitively not have to do the timing. the core, maybe but I won't go over the 165 drivers to add timing. > > > And because people using systohc are definitively using hctosys, this > > will still result in an up to 500ms error in the current time. > > As said, the price to pay for a proper solution will be an up to one > > second delay when booting which is not acceptable for most users. > > IIRC I had fixed this in some embedded system long ago by having > hctosys reading seconds time during boot, then in parallel using the > 'up to one second' method to get the sub-second resolution. > > This means there was a sub second non-monotonicity in the realtime > clock, but the system was designed to tolerate this as it also ran a > modified NTP which would unconditionally step the clock on first sync > if it was over .1s out of alignment. > > The goal was to bring the system to correct time as quickly as > possible in as many cases as possible, not to maintain the > monotonicity of the realtime clock. > I'm really curious, in your use case, couldn't you have read the RTC from userspace and set the system time from there, right before starting NTP and other applications? Doing so, you would have probably been able to ensure monotonicity. > > Is "fixing" systohc worth the effort and the maintenance cost? > > As I said before, if there is no desire to address the read side then > the whole thing should be abandoned. > What was your plan back in 2017? -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 15:08 ` Alexandre Belloni @ 2020-12-04 15:57 ` Jason Gunthorpe 2020-12-04 16:35 ` Alexandre Belloni 0 siblings, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-04 15:57 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote: > > I mean literatally time the excution of something like a straight > > read. This will give some estimate of the bus latency and it should > > linearly relate to the bus latency for a write. > > > It doesn't, some rtc will require writing dozen registers to set the > time and reading only 3 to get the time, the only accurate way is to > really time setting the time. You set the RTC time once, set up an alarm for > the next second, once you get the alarm, you get system time and you now > how far you are off. This is why I said linearly related. Yes the relation is per-driver, based on the ops required, but in principle it can get close. > Notice that systohc could do that if you wanted to be accurate and then > the whole issue with mc146818 is gone and this nicely solves it for all > the RTCs at once. Another good solution is to have systohc progam the time and then immediately read it back (eg with an alarm). The difference between the read back and the system RTC is the single value to plug into the adjustment offset and naturally includes all the time factors Thomas identified, including the additional factor of 'latency to read the time' > > The goal was to bring the system to correct time as quickly as > > possible in as many cases as possible, not to maintain the > > monotonicity of the realtime clock. > > I'm really curious, in your use case, couldn't you have read the RTC > from userspace and set the system time from there, right before starting > NTP and other applications? This was RAM constrainted embedded, a few hundred bytes of kernel code wins over 100k of userspace > Doing so, you would have probably been able to ensure monotonicity. No, this case also wasn't willing to wait the 1s to load the time. It had to go parallel with the rest of the boot up. This was enterprise gear, boot time to operational counts against the achievable availability rating. From an upstream perspective this was hacky because - We historically try not to create non-monotinicity in CLOCK_REALTIME because too much stuff wrongly works on CLOCK_REALTIME when they really need CLOCK_MONOTONIC - Having the kernel async set the clock is racy with NTP also trying to set the clock But in a closed system the two above were delt with reliably. > > As I said before, if there is no desire to address the read side then > > the whole thing should be abandoned. > > What was your plan back in 2017? Well I was helping RMK because we had similar issues, but I saw some path to achive the low offset round trip, and view this as laudable for the embedded world. I see I also changed jobs right around then which probably explains why things stopped at this one patch. The fact nobody else picked it up probably says people really just don't care about realtime accuracy. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 15:57 ` Jason Gunthorpe @ 2020-12-04 16:35 ` Alexandre Belloni 0 siblings, 0 replies; 38+ messages in thread From: Alexandre Belloni @ 2020-12-04 16:35 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 11:57:08-0400, Jason Gunthorpe wrote: > On Fri, Dec 04, 2020 at 04:08:57PM +0100, Alexandre Belloni wrote: > > > > I mean literatally time the excution of something like a straight > > > read. This will give some estimate of the bus latency and it should > > > linearly relate to the bus latency for a write. > > > > > > It doesn't, some rtc will require writing dozen registers to set the > > time and reading only 3 to get the time, the only accurate way is to > > really time setting the time. You set the RTC time once, set up an alarm for > > the next second, once you get the alarm, you get system time and you now > > how far you are off. > > This is why I said linearly related. Yes the relation is per-driver, > based on the ops required, but in principle it can get close. > > > Notice that systohc could do that if you wanted to be accurate and then > > the whole issue with mc146818 is gone and this nicely solves it for all > > the RTCs at once. > > Another good solution is to have systohc progam the time and then > immediately read it back (eg with an alarm). This is what I was suggesting, with an alarm. > The difference between > the read back and the system RTC is the single value to plug into the > adjustment offset and naturally includes all the time factors Thomas > identified, including the additional factor of 'latency to read the > time' There is no 'latency to read the time' because you don't have to read the time. You already know what the time will be when the alarm fires. That is when you read the system time and you can figure out what the offset is. But you never need to read the time. [...] > I see I also changed jobs right around then which probably explains > why things stopped at this one patch. The fact nobody else picked it > up probably says people really just don't care about realtime > accuracy. Or those that do care do it from userspace. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 21:31 ` Thomas Gleixner @ 2020-12-03 22:00 ` Alexandre Belloni 2020-12-04 9:34 ` Thomas Gleixner 1 sibling, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-03 22:00 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: > 2) I2C/SPI ... > > tsched t0 t1 t2 > transfer(newsec) RTC update (newsec) RTC increments seconds > > Lets assume that ttransfer = t1 - t0 is known. Note that ttransfer is one of the reason why setting set_offset_nsec from the RTC driver is not a good idea. The same RTC may be on busses with different rates and there is no way to know that. I think that was one of my objections at the time. ttransfer is not a function of the RTC model but rather of how it is integrated in the system. > > tinc is the same as above = t2 - t1 > > Again, lets assume that the event is accurate for now and ignore the fuzz > logic, i.e. tsched == t0 > > So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and > increments seconds. > I had a rough week and I'm probably not awake enough to follow completely but is that thinking correct? For the mc146818, you have t1 - t0 which is probably negligible and t2 - T& == 500 ms For most of the other RTCs, you have t1 - t0 is somewhat important, probably around 100 to 150µs and t2 - t1 is 1s. I would think that what is needed is tsched has to be t1-t0 before wallclock reaches t1. In that case t2 doesn't matter, it will always be 1s after t1. > In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec > > So now the fuzz logic for this is: > > newtime = t0 + ttransfer; > > if (newtime.tv_nsec < FUZZ) > newsec = newtime.tv_sec; > else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ) > newsec = newtime.tv_sec + 1; > else > goto fail; > > Again the first condition handles the case where t1 >= tsched and the > second one where t1 < tsched. > > So now we have two options to fix this: > > 1) Use a negative sync_offset for devices which need #1 above > (rtc_cmos & similar) > > That requires setting tsched to t2 - abs(sync_offset) > > 2) Use always a positive sync_offset and a flag which tells > rtc_tv_nsec_ok() whether it needs to add or subtract. > > #1 is good enough. All it takes is a comment at the timer start code why > abs() is required. > > Let me hack that up along with the hrtimer muck. > > Thanks, > > tglx -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 22:00 ` Alexandre Belloni @ 2020-12-04 9:34 ` Thomas Gleixner 2020-12-04 9:51 ` Alexandre Belloni 0 siblings, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2020-12-04 9:34 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote: > On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: >> 2) I2C/SPI ... >> >> tsched t0 t1 t2 >> transfer(newsec) RTC update (newsec) RTC increments seconds >> >> Lets assume that ttransfer = t1 - t0 is known. > > Note that ttransfer is one of the reason why setting set_offset_nsec > from the RTC driver is not a good idea. The same RTC may be on busses > with different rates and there is no way to know that. I think that was > one of my objections at the time. > > ttransfer is not a function of the RTC model but rather of how it is > integrated in the system. Yes, but it's the right place to store that information. It's a fundamental problem of the RTC driver because that's the one which has to be able to tell the caller about it. The caller has absolutely no way to figure it out because it does not even know what type of RTC is there. So either the RTC knows the requirements for tsched, e.g. the MC14xxx datasheet, or it can retrieve that information from DT or by querying the underlying bus mechanics for the xfer time estimate or just by timing an xfer for reference. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 9:34 ` Thomas Gleixner @ 2020-12-04 9:51 ` Alexandre Belloni 2020-12-04 10:44 ` Thomas Gleixner 0 siblings, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-04 9:51 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote: > On Thu, Dec 03 2020 at 23:00, Alexandre Belloni wrote: > > On 03/12/2020 22:05:09+0100, Thomas Gleixner wrote: > >> 2) I2C/SPI ... > >> > >> tsched t0 t1 t2 > >> transfer(newsec) RTC update (newsec) RTC increments seconds > >> > >> Lets assume that ttransfer = t1 - t0 is known. > > > > Note that ttransfer is one of the reason why setting set_offset_nsec > > from the RTC driver is not a good idea. The same RTC may be on busses > > with different rates and there is no way to know that. I think that was > > one of my objections at the time. > > > > ttransfer is not a function of the RTC model but rather of how it is > > integrated in the system. > > Yes, but it's the right place to store that information. > > It's a fundamental problem of the RTC driver because that's the one > which has to be able to tell the caller about it. The caller has > absolutely no way to figure it out because it does not even know what > type of RTC is there. > > So either the RTC knows the requirements for tsched, e.g. the MC14xxx > datasheet, or it can retrieve that information from DT or by querying > the underlying bus mechanics for the xfer time estimate or just by > timing an xfer for reference. > What I do from userspace is that the caller is the one estimating the transfer time and this works very well. I really think that the ntp code could do just the same. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-04 9:51 ` Alexandre Belloni @ 2020-12-04 10:44 ` Thomas Gleixner 0 siblings, 0 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-04 10:44 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Fri, Dec 04 2020 at 10:51, Alexandre Belloni wrote: > On 04/12/2020 10:34:13+0100, Thomas Gleixner wrote: >> So either the RTC knows the requirements for tsched, e.g. the MC14xxx >> datasheet, or it can retrieve that information from DT or by querying >> the underlying bus mechanics for the xfer time estimate or just by >> timing an xfer for reference. >> > > What I do from userspace is that the caller is the one estimating the > transfer time and this works very well. I really think that the ntp code > could do just the same. For MC14xxx type RTCs tsched is defined by a constant, so heuristics are really horrible because you have to poll the RTC to get it correct. What's the point if the driver can just provide the value from the data sheet? For RTC's behind a bus the driver its pretty simple to let the driver tell at RTC registration time that the transfer time is unknown. So you don't have to add the estimation procedure to each driver. You simply can add it to the core in one place and expose that info to user space as well as a starting point. Sticking that into the NTP code is really the wrong place. That's like asking the users of a timer device to calibrate it before usage. The requirements for writing a RTC are not a problem of the caller, they are fundamental properties of the RTC itself. So why on earth are you asking every user to implement heuristics to figure these out themself? Having it as property of the RTC device gives at least a halfways correct value for the periodic kernel side update and if user space want's to do better then it still can do so. Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 16:16 ` Jason Gunthorpe @ 2020-12-03 17:29 ` Alexandre Belloni 2020-12-03 19:52 ` Thomas Gleixner 1 sibling, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-03 17:29 UTC (permalink / raw) To: Thomas Gleixner Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote: > Alexandre, > > On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: > > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: > >> That said, can somebody answer the one million dollar question which > >> problem is solved by all of this magic nonsense? > >> > > The goal was to remove the 500ms offset for all the RTCs but the > > MC146818 because there are RTC that will reset properly their counter > > when setting the time, meaning they can be set very precisely. > > The MC setting is halfways precise. The write resets the divider chain > and when the reset is removed then the next UIP will happen after the > magic 0.5 seconds. So yes, writing it 500ms _before_ the next second is > halfways correct assumed that there is no interference between > ktime_get_real() and the actual write which is a silly assumption as the > code is fully preemptible. > > > IIRC, used in conjunction with rtc_hctosys which also adds > > inconditionnaly 500ms this can ends up with the system time > > being one second away from the wall clock time which NTP will take quite > > some time to remove. > > The rtc_cmos() driver has a fun comment in cmos_set_time().... > > The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I > pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and > rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in > behaviour. > > > Coincidentally, I was going to revert those patches for v5.11. > > Which will break the rtc_cmos() driver in a different way. We should > really fix that proper and just have the 500ms offset for rtc_cmos, > aka. MC146818. When other drivers want a different offset, then they > still can do so. > My point was to get back to the previous situation where only rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and set_offset_nsec. > The direct /dev/rtc settime ioctl is not using that logic anyway. Thats > the business of the user space application to get that straight which is > scheduling lottery as well. I still don't see how userspace is worse than systohc in that regard and why we need to do that in the kernel. Especially since hctosys is doing a very bad job trying to read the time from the RTC. You may as well not bother with the 500ms and just set the time to the current or next second. And what about the non configurable 659 period, isn't that policy that should be left to userspace configuration? I'm still convinced that set_offset_nsec is not needed to set the time accurately and I still want to remove it. Also, this may be a good time to move systohc.c to kernel/time/ntp.c as this is definitively some NTP specific code being an RTC consumer, very much like alarmtimer.c -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 17:29 ` Alexandre Belloni @ 2020-12-03 19:52 ` Thomas Gleixner 0 siblings, 0 replies; 38+ messages in thread From: Thomas Gleixner @ 2020-12-03 19:52 UTC (permalink / raw) To: Alexandre Belloni Cc: Jason Gunthorpe, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra Alexandre, On Thu, Dec 03 2020 at 18:29, Alexandre Belloni wrote: > On 03/12/2020 16:39:21+0100, Thomas Gleixner wrote: >> On Thu, Dec 03 2020 at 03:10, Alexandre Belloni wrote: >> > On 03/12/2020 02:14:12+0100, Thomas Gleixner wrote: >> > Coincidentally, I was going to revert those patches for v5.11. >> >> Which will break the rtc_cmos() driver in a different way. We should >> really fix that proper and just have the 500ms offset for rtc_cmos, >> aka. MC146818. When other drivers want a different offset, then they >> still can do so. >> > > My point was to get back to the previous situation where only > rtc_cmos was supposed to work properly by removing rtc_tv_nsec_ok and > set_offset_nsec. If you remove the offset, then rtc_cmos() is off by ~500ms. > I'm still convinced that set_offset_nsec is not needed to set the time > accurately and I still want to remove it. Also, this may be a good time > to move systohc.c to kernel/time/ntp.c as this is definitively some NTP > specific code being an RTC consumer, very much like alarmtimer.c No objections from my side. The main pain point is that the periodic update is seen as ABI by some folks. The fact that you can disable it in Kconfig does not matter. You can disable other stuff like posix timers which is ABI as well. So removing it is not really an option. Keeping it broken or break it differently is neither... Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 2:10 ` Alexandre Belloni 2020-12-03 15:39 ` Thomas Gleixner @ 2020-12-03 15:52 ` Jason Gunthorpe 2020-12-03 16:07 ` Alexandre Belloni 1 sibling, 1 reply; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-03 15:52 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > IIRC, used in conjunction with rtc_hctosys which also adds > inconditionnaly 500ms this can ends up with the system time > being one second away from the wall clock time which NTP will take quite > some time to remove. I can't remember the details, but this was not the intention. As long as systohc and hctosys exist then the design goal of rtclib should be to provide sub-second accuracy on the round trip of time through the RTC. Otherwise what is the point? Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 15:52 ` Jason Gunthorpe @ 2020-12-03 16:07 ` Alexandre Belloni 2020-12-03 20:10 ` Jason Gunthorpe 0 siblings, 1 reply; 38+ messages in thread From: Alexandre Belloni @ 2020-12-03 16:07 UTC (permalink / raw) To: Jason Gunthorpe Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote: > On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > > > IIRC, used in conjunction with rtc_hctosys which also adds > > inconditionnaly 500ms this can ends up with the system time > > being one second away from the wall clock time which NTP will take quite > > some time to remove. > > I can't remember the details, but this was not the intention. > > As long as systohc and hctosys exist then the design goal of rtclib > should be to provide sub-second accuracy on the round trip of time > through the RTC. > > Otherwise what is the point? > hctosys never had a sub second accuracy because there is no way to accurately read the rtc time without being ready to wait up to a second. I have patches doing exactly that but I'm pretty sure nobody wants to pay the price and have a kernel that boots significantly slower. Especially since that would basically affect everyone since systemd requires that hctosys is enabled. -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] rtc: adapt allowed RTC update error 2020-12-03 16:07 ` Alexandre Belloni @ 2020-12-03 20:10 ` Jason Gunthorpe 0 siblings, 0 replies; 38+ messages in thread From: Jason Gunthorpe @ 2020-12-03 20:10 UTC (permalink / raw) To: Alexandre Belloni Cc: Thomas Gleixner, Miroslav Lichvar, linux-kernel, John Stultz, Prarit Bhargava, Alessandro Zummo, linux-rtc, Peter Zijlstra On Thu, Dec 03, 2020 at 05:07:53PM +0100, Alexandre Belloni wrote: > On 03/12/2020 11:52:49-0400, Jason Gunthorpe wrote: > > On Thu, Dec 03, 2020 at 03:10:47AM +0100, Alexandre Belloni wrote: > > > > > IIRC, used in conjunction with rtc_hctosys which also adds > > > inconditionnaly 500ms this can ends up with the system time > > > being one second away from the wall clock time which NTP will take quite > > > some time to remove. > > > > I can't remember the details, but this was not the intention. > > > > As long as systohc and hctosys exist then the design goal of rtclib > > should be to provide sub-second accuracy on the round trip of time > > through the RTC. > > > > Otherwise what is the point? > > hctosys never had a sub second accuracy because there is no way to > accurately read the rtc time without being ready to wait up to a > second. Yes, I know. I was talking about a goal.. If that is not the goal then just delete it all and update the docs that userspace needs to deal with all of this and the kernel stuff isn't supposed to be useful. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2020-12-04 16:36 UTC | newest] Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-01 14:38 [PATCH] rtc: adapt allowed RTC update error Miroslav Lichvar 2020-12-01 16:12 ` Jason Gunthorpe 2020-12-01 17:14 ` Miroslav Lichvar 2020-12-01 17:35 ` Jason Gunthorpe 2020-12-02 10:01 ` [PATCHv2] " Miroslav Lichvar 2020-12-02 13:44 ` [PATCH] " Thomas Gleixner 2020-12-02 15:07 ` Miroslav Lichvar 2020-12-02 15:36 ` Miroslav Lichvar 2020-12-02 18:36 ` Thomas Gleixner 2020-12-02 16:27 ` Jason Gunthorpe 2020-12-02 19:21 ` Thomas Gleixner 2020-12-02 20:54 ` Jason Gunthorpe 2020-12-02 22:08 ` Thomas Gleixner 2020-12-02 23:03 ` Jason Gunthorpe 2020-12-03 1:14 ` Thomas Gleixner 2020-12-03 2:04 ` Jason Gunthorpe 2020-12-03 2:10 ` Alexandre Belloni 2020-12-03 15:39 ` Thomas Gleixner 2020-12-03 16:16 ` Jason Gunthorpe 2020-12-03 21:05 ` Thomas Gleixner 2020-12-03 21:31 ` Thomas Gleixner 2020-12-03 22:36 ` Jason Gunthorpe 2020-12-04 13:02 ` Thomas Gleixner 2020-12-04 14:08 ` Jason Gunthorpe 2020-12-04 14:37 ` Alexandre Belloni 2020-12-04 14:46 ` Jason Gunthorpe 2020-12-04 15:08 ` Alexandre Belloni 2020-12-04 15:57 ` Jason Gunthorpe 2020-12-04 16:35 ` Alexandre Belloni 2020-12-03 22:00 ` Alexandre Belloni 2020-12-04 9:34 ` Thomas Gleixner 2020-12-04 9:51 ` Alexandre Belloni 2020-12-04 10:44 ` Thomas Gleixner 2020-12-03 17:29 ` Alexandre Belloni 2020-12-03 19:52 ` Thomas Gleixner 2020-12-03 15:52 ` Jason Gunthorpe 2020-12-03 16:07 ` Alexandre Belloni 2020-12-03 20:10 ` Jason Gunthorpe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).