From: Nicolai Stange
To: Nicolai Stange
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", x86@kernel.org,
	John Stultz, Borislav Petkov, Paolo Bonzini, Viresh Kumar,
	Hidehiro Kawai, "Peter Zijlstra (Intel)", "Christopher S. Hall",
	Adrian Hunter, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] kernel/time/clockevents: compensate for monotonic clock's dynamic frequency
References: <20160710193047.18320-1-nicstange@gmail.com>
	<20160710193047.18320-5-nicstange@gmail.com>
Date: Mon, 11 Jul 2016 08:32:08 +0200
In-Reply-To: <20160710193047.18320-5-nicstange@gmail.com>
	(Nicolai Stange's message of "Sun, 10 Jul 2016 21:30:47 +0200")
Message-ID: <87inwcd6s7.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.94 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Nicolai Stange writes:

> With NOHZ_FULL and a single well-isolated, CPU-consuming task, one
> would expect approximately one clockevent interrupt per second. However,
> on my Intel Haswell, where the monotonic clock's clocksource is the TSC
> and the clockevent device is the TSC deadline timer, it turns out that
> every second there are two such interrupts: the first one always arrives
> ~50us before the scheduled deadline as programmed by
> tick_nohz_stop_sched_tick() through the hrtimer API. The
> __hrtimer_run_queues() called in this interrupt detects that the queued
> tick_sched_timer hasn't expired yet and does nothing except reprogram
> the clockevent device to fire again shortly after.
>
> These too-early programmed deadlines are explained as follows:
> clockevents_program_event() programs the clockevent device to fire after
>
>   f_event * delta_t_progr
>
> clockevent device cycles, where f_event is the clockevent device's
> hardware frequency and delta_t_progr is the requested time interval.
> After that many clockevent device cycles have elapsed, the device
> underlying the monotonic clock, that is, the raw monotonic clock, has
> seen f_raw / f_event times as many cycles. The ktime_get() called from
> __hrtimer_run_queues() interprets those cycles as running at the
> frequency of the monotonic clock. Summarizing:
>
>   delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
>                = f_raw / f_mono * delta_t_progr
>
> with f_mono being the monotonic clock's frequency and delta_t_perc being
> the elapsed time interval as perceived by __hrtimer_run_queues().
>
> Now, f_mono is not a constant, but is dynamically adjusted in
> timekeeping_adjust() in order to compensate for the NTP error. With the
> large delta_t_progr values of 10^9 ns seen with NOHZ_FULL, the error
> becomes significant and results in the double timer interrupts described
> above.
>
> Compensate for this error by multiplying delta_t_progr by f_mono / f_raw
> in clockevents_program_event() before actually programming the clockevent
> device.
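Just to put a number on "significant" here (back-of-the-envelope only, and
assuming the offset is entirely due to the NTP frequency adjustment): an
expiry arriving ~50us early over a programmed interval of
delta_t_progr = 10^9 ns means

  |f_raw / f_mono - 1| ~= 50000 ns / 10^9 ns = 5 * 10^-5

i.e. a deviation of roughly 50 ppm between the adjusted monotonic frequency
and the raw one -- well within the +/-500 ppm NTP is allowed to slew, so an
adjustment of that size is nothing unusual.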
>
> Namely, introduce a helper, timekeeping_mono_interval_to_raw(), which
> converts a given time interval from the monotonic clock's perception to
> that of the raw monotonic clock by multiplying the value by f_mono / f_raw.
> Call that helper from clockevents_program_event() in order to obtain a
> suitable time interval to program the clockevent device with.
>
> Signed-off-by: Nicolai Stange
> ---
>  kernel/time/clockevents.c |  1 +
>  kernel/time/timekeeping.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/time/timekeeping.h |  1 +
>  3 files changed, 61 insertions(+)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index a9b76a4..4bccf04 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -329,6 +329,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
>  		return dev->set_next_ktime(expires, dev);
>  
>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
> +	delta = timekeeping_mono_interval_to_raw(delta);
>  	if (delta <= 0)
>  		return force ? clockevents_program_min_delta(dev) : -ETIME;
>  
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index dcd5ce6..51dfbbb 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -23,6 +23,7 @@
>  #include
>  #include
>  #include
> +#include
>  
>  #include "tick-internal.h"
>  #include "ntp_internal.h"
> @@ -2133,6 +2134,64 @@ out:
>  }
>  
>  /**
> + * timekeeping_mono_interval_to_raw - Convert mono interval to raw's perception.
> + * @interval:	Time interval as measured by the mono clock.
> + *
> + * Converts the given time interval as measured by the monotonic clock to
> + * what would have been measured by the raw monotonic clock in the meantime.
> + * The monotonic clock's frequency gets dynamically adjusted every now and
> + * then in order to compensate for the deviation from NTP time. OTOH, the
> + * clockevent devices are not affected by this adjustment, i.e. they keep
> + * ticking at some fixed hardware frequency which may be assumed to have a
> + * constant ratio to the fixed raw monotonic clock's frequency. This function
> + * provides a means to convert time intervals from the dynamic frequency
> + * monotonic clock to the fixed frequency hardware world.
> + *
> + * If interval < 0, zero is returned. If an overflow happens during the
> + * calculation, KTIME_MAX is returned.
> + */
> +s64 timekeeping_mono_interval_to_raw(s64 interval)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	u32 raw_mult = tk->tkr_raw.mult, mono_mult = tk->tkr_mono.mult;
> +	u64 raw, tmp;
> +
> +	/* The overflow checks below can't deal with negative intervals. */
> +	if (interval <= 0)
> +		return 0;
> +
> +	/*
> +	 * Calculate
> +	 *   raw = f_mono / f_raw * interval
> +	 *       = (raw_mult / 2^raw_shift) / (mono_mult / 2^mono_shift)
> +	 *         * interval
> +	 * where f_mono and f_raw denote the frequencies of the monotonic
> +	 * and raw clock respectively.
> +	 *
> +	 * Note that the monotonic and raw clocks' shifts are equal and
> +	 * fixed, that is they cancel.
> +	 */
> +
> +	/* First, calculate interval * raw_mult while checking for overflow. */

After thinking further about this, I realized that

  (raw_mult - mono_mult) * interval

is *much* less likely to overflow. So I'll send a v3 later today doing

  raw = interval + (((raw_mult - mono_mult) * interval) / mono_mult)

instead.
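For my own sanity, here is a quick userspace-only sketch comparing the two
computations. Everything below is illustrative only: the function names and
mult values are made up, the KTIME_MAX clamping from the patch is omitted,
and this is not the actual v3.

#include <stdint.h>
#include <stdio.h>

/*
 * v2 style: scale the full interval by raw_mult / mono_mult (rounded).
 * The intermediate product interval * raw_mult is what may overflow and
 * what the split 64x32 multiplication in the patch guards against.
 */
static uint64_t mono_to_raw_scale(uint64_t interval, uint32_t raw_mult,
				  uint32_t mono_mult)
{
	return (interval * raw_mult + mono_mult / 2) / mono_mult;
}

/*
 * Proposed v3 style: only the (small) difference between the mults gets
 * multiplied with the interval, so the intermediate product stays far
 * away from the 64 bit limit for any sane NTP adjustment.
 */
static uint64_t mono_to_raw_diff(uint64_t interval, uint32_t raw_mult,
				 uint32_t mono_mult)
{
	if (raw_mult >= mono_mult)
		return interval +
		       (uint64_t)(raw_mult - mono_mult) * interval / mono_mult;
	return interval -
	       (uint64_t)(mono_mult - raw_mult) * interval / mono_mult;
}

int main(void)
{
	/* Made-up mults differing by ~50 ppm; one NOHZ_FULL second. */
	uint32_t raw_mult = 4194304, mono_mult = 4194094;
	uint64_t interval = 1000000000ULL;

	printf("scaled ratio:    %llu ns\n",
	       (unsigned long long)mono_to_raw_scale(interval, raw_mult, mono_mult));
	printf("mult difference: %llu ns\n",
	       (unsigned long long)mono_to_raw_diff(interval, raw_mult, mono_mult));
	return 0;
}

With these numbers both variants print the same result, roughly the
programmed interval plus 50us, so the difference-based formula gives the
intended correction while keeping the intermediate product small.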
> +	raw = ((u64)interval >> 32) * raw_mult; /* Upper half of interval */
> +	if (raw >> 32)
> +		return KTIME_MAX;
> +	raw <<= 32;
> +	tmp = ((u64)interval & U32_MAX) * raw_mult; /* Lower half of interval */
> +	if (U64_MAX - raw < tmp)
> +		return KTIME_MAX;
> +	raw += tmp;
> +
> +	/* Finally, do raw /= mono_mult with proper rounding. */
> +	if (U64_MAX - raw < mono_mult / 2)
> +		return KTIME_MAX;
> +	raw += mono_mult / 2;
> +	do_div(raw, mono_mult);
> +
> +	return (s64)raw;
> +}
> +
> +/**
>   * getboottime64 - Return the real time of system boot.
>   * @ts:		pointer to the timespec64 to be set
>   *
> diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h
> index 704f595..40a0fa9 100644
> --- a/kernel/time/timekeeping.h
> +++ b/kernel/time/timekeeping.h
> @@ -18,6 +18,7 @@ extern void timekeeping_resume(void);
>  
>  extern void do_timer(unsigned long ticks);
>  extern void update_wall_time(void);
> +extern s64 timekeeping_mono_interval_to_raw(s64 interval);
>  
>  extern seqlock_t jiffies_lock;