From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752207AbcGUSIc (ORCPT ); Thu, 21 Jul 2016 14:08:32 -0400 Received: from mail-oi0-f50.google.com ([209.85.218.50]:34601 "EHLO mail-oi0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751592AbcGUSI3 (ORCPT ); Thu, 21 Jul 2016 14:08:29 -0400 MIME-Version: 1.0 In-Reply-To: <20160713130017.8202-2-nicstange@gmail.com> References: <20160713130017.8202-1-nicstange@gmail.com> <20160713130017.8202-2-nicstange@gmail.com> From: John Stultz Date: Thu, 21 Jul 2016 11:08:27 -0700 Message-ID: Subject: Re: [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion To: Nicolai Stange Cc: Thomas Gleixner , lkml Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange wrote: > With NOHZ_FULL and one single well-isolated, CPU consumptive task, one > would expect approximately one clockevent interrupt per second. However, on > my Intel Haswell where the monotonic clock is the TSC monotonic clock and > the clockevent device is the TSC deadline device, it turns out that every > second, there are two such interrupts: the first one arrives always > approximately ~50us before the scheduled deadline as programmed by > tick_nohz_stop_sched_tick() through the hrtimer API. The > __hrtimer_run_queues() called in this interrupt detects that the queued > tick_sched_timer hasn't expired yet and simply does nothing except > reprogramming the clock event device to fire shortly after again. > > These too early programmed deadlines are explained as follows: > clockevents_program_event() programs the clockevent device to fire > after > f_event * delta_t_progr > clockevent device cycles where f_event is the clockevent device's hardware > frequency and delta_t_progr is the requested time interval. After that many > clockevent device cycles have elapsed, the device underlying the monotonic > clock, that is the monotonic raw clock has seen f_raw / f_event as many > cycles. > The ktime_get() called from __hrtimer_run_queues() interprets those > cycles to run at the frequency of the monotonic clock. Summarizing: > delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr > = f_raw / f_mono * delta_t_progr > with f_mono being the monotonic clock's frequency and delta_t_perc being > the elapsed time interval as perceived by __hrtimer_run_queues(). > > Now, f_mono is not a constant, but is dynamically adjusted in > timekeeping_adjust() in order to compensate for the NTP error. With the > large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made > becomes significant and results in the double timer interrupts described > above. > > Compensate for this error by multiplying the clockevent device's f_event > by f_mono/f_raw. > > Namely: > - Introduce a ->mult_mono member to the struct clock_event_device. It's > value is supposed to be equal to ->mult * f_mono/f_raw. > - Introduce the timekeeping_get_mono_mult() helper which provides > the clockevent core with access to the timekeeping's current f_mono > and f_raw. > - Introduce the helper __clockevents_adjust_freq() which > sets a clockevent device's ->mult_mono member as appropriate. It is > implemented with the help of the new __clockevents_calc_adjust_freq(). > - Call __clockevents_adjust_freq() at clockevent device registration time > as well as at frequency updates through clockevents_update_freq(). > - Finally, use the ->mult_mono rather than ->mult in the ns to cycle > conversion made in clockevents_program_event(). > > Note that future adjustments of the monotonic clock are not taken into > account yet. Furthemore, this patch assumes that after a clockevent > device's registration, its ->mult changes only through calls to > clockevents_update_freq(). Sorry for being a little slow to review here. Been swamped. I was about to queue this but had a few nits that need addressing. > Signed-off-by: Nicolai Stange > --- > include/linux/clockchips.h | 1 + > kernel/time/clockevents.c | 49 ++++++++++++++++++++++++++++++++++++++++++++- > kernel/time/tick-internal.h | 1 + > kernel/time/timekeeping.c | 8 ++++++++ > 4 files changed, 58 insertions(+), 1 deletion(-) > > diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h > index 0d442e3..2863742 100644 > --- a/include/linux/clockchips.h > +++ b/include/linux/clockchips.h > @@ -104,6 +104,7 @@ struct clock_event_device { > u64 max_delta_ns; > u64 min_delta_ns; > u32 mult; > + u32 mult_mono; So in this context(for me at least), mult and mult_mono are a bit confusing. I tend to think of it as mult and mult_raw, but in this case mult is the "raw" unmodified value and mult_mono is the adjusted one. I'd probably suggest mult_adjusted or some other name to make it more clear how it differs from the clockevent mult. > > +void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw) > +{ > + struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono; > + > + *mult_cs_mono = tkr_mono->mult; > + *mult_cs_raw = tkr_mono->clock->mult; > +} So.. you probably should have some locking here. Or at least a big comment making it clear why locking isn't necessary. thanks -john