linux-kernel.vger.kernel.org archive mirror
From: Nicolai Stange <nicstange@gmail.com>
To: Nicolai Stange <nicstange@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, John Stultz <john.stultz@linaro.org>,
	Borislav Petkov <bp@suse.de>, Paolo Bonzini <pbonzini@redhat.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	"Peter Zijlstra \(Intel\)" <peterz@infradead.org>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] kernel/time/clockevents: compensate for monotonic clock's dynamic frequency
Date: Mon, 11 Jul 2016 08:32:08 +0200
Message-ID: <87inwcd6s7.fsf@gmail.com>
In-Reply-To: <20160710193047.18320-5-nicstange@gmail.com> (Nicolai Stange's message of "Sun, 10 Jul 2016 21:30:47 +0200")

Nicolai Stange <nicstange@gmail.com> writes:

> With NOHZ_FULL and a single well-isolated, CPU-bound task, one
> would expect approximately one clockevent interrupt per second. However, on
> my Intel Haswell, where the monotonic clock is backed by the TSC and
> the clockevent device is the TSC deadline device, it turns out that there
> are two such interrupts every second: the first one always arrives
> approximately 50us before the scheduled deadline as programmed by
> tick_nohz_stop_sched_tick() through the hrtimer API. The
> __hrtimer_run_queues() called in this interrupt detects that the queued
> tick_sched_timer hasn't expired yet and simply does nothing except
> reprogram the clock event device to fire again shortly afterwards.
>
> These deadlines, programmed too early, are explained as follows:
> clockevents_program_event() programs the clockevent device to fire
> after
>   f_event * delta_t_progr
> clockevent device cycles, where f_event is the clockevent device's hardware
> frequency and delta_t_progr is the requested time interval. After that many
> clockevent device cycles have elapsed, the device underlying the monotonic
> clock, that is, the monotonic raw clock, has seen f_raw / f_event times as
> many cycles.
> The ktime_get() called from __hrtimer_run_queues() interprets those
> cycles to run at the frequency of the monotonic clock. Summarizing:
>   delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
>                = f_raw / f_mono * delta_t_progr
> with f_mono being the monotonic clock's frequency and delta_t_perc being
> the elapsed time interval as perceived by __hrtimer_run_queues().
>
> Now, f_mono is not a constant, but is dynamically adjusted in
> timekeeping_adjust() in order to compensate for the NTP error. With the
> large delta_t_progr values of 10^9ns seen under NOHZ_FULL, the resulting
> error becomes significant and causes the double timer interrupts described
> above.
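
To put a number on the error described above, here is a rough user-space
model. It is only a sketch, not kernel code: perceived_ns() and early_by_ns()
are hypothetical names, the mult values are made up, and the plain 64-bit
multiplication assumes that progr_ns * mono_mult does not overflow.

```c
#include <stdint.h>

/*
 * Hypothetical model of
 *   delta_t_perc = f_raw / f_mono * delta_t_progr
 * Since a clock's perceived frequency is proportional to 1 / mult,
 * f_raw / f_mono equals mono_mult / raw_mult (the shifts are equal).
 * Assumes progr_ns * mono_mult fits into a signed 64-bit value.
 */
static int64_t perceived_ns(int64_t progr_ns, uint32_t raw_mult,
			    uint32_t mono_mult)
{
	if (progr_ns <= 0 || raw_mult == 0)
		return 0;

	return (progr_ns * (int64_t)mono_mult) / (int64_t)raw_mult;
}

/* How far ahead of the requested deadline the interrupt appears to fire. */
static int64_t early_by_ns(int64_t progr_ns, uint32_t raw_mult,
			   uint32_t mono_mult)
{
	return progr_ns - perceived_ns(progr_ns, raw_mult, mono_mult);
}
```

With the made-up values raw_mult = 20000 and mono_mult = 19999, that is a
50ppm adjustment, a programmed interval of 10^9ns is perceived as 999950000ns,
so the interrupt looks 50us early. That is the order of magnitude observed
above.
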
>
> Compensate for this error by multiplying delta_t_progr with f_mono / f_raw
> in clockevents_program_event() before actually programming the clockevent
> device.
>
> Namely, introduce a helper, timekeeping_mono_interval_to_raw(), which
> converts a given time interval from the monotonic clock's perception to
> that of the raw monotonic clock by multiplying the value by f_mono / f_raw.
> Call that helper from clockevents_program_event() in order to obtain a
> suitable time interval to program the clockevent device with.
>
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
> ---
>  kernel/time/clockevents.c |  1 +
>  kernel/time/timekeeping.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/time/timekeeping.h |  1 +
>  3 files changed, 61 insertions(+)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index a9b76a4..4bccf04 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -329,6 +329,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
>  		return dev->set_next_ktime(expires, dev);
>  
>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
> +	delta = timekeeping_mono_interval_to_raw(delta);
>  	if (delta <= 0)
>  		return force ? clockevents_program_min_delta(dev) : -ETIME;
>  
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index dcd5ce6..51dfbbb 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -23,6 +23,7 @@
>  #include <linux/stop_machine.h>
>  #include <linux/pvclock_gtod.h>
>  #include <linux/compiler.h>
> +#include <asm/div64.h>
>  
>  #include "tick-internal.h"
>  #include "ntp_internal.h"
> @@ -2133,6 +2134,64 @@ out:
>  }
>  
>  /**
> + * timekeeping_mono_interval_to_raw - Convert mono interval to raw's perception.
> + * @interval: Time interval as measured by the mono clock.
> + *
> + * Converts the given time interval as measured by the monotonic clock to
> + * what would have been measured by the raw monotonic clock in the meanwhile.
> + * The monotonic clock's frequency gets dynamically adjusted every now and then
> + * in order to compensate for the differences to NTP. OTOH, the clockevents
> + * devices are not affected by this adjustment, i.e. they keep ticking at some
> + * fixed hardware frequency which may be assumed to have a constant ratio to
> + * the fixed raw monotonic clock's frequency. This function provides a means
> + * to convert time intervals from the dynamic frequency monotonic clock to
> + * the fixed frequency hardware world.
> + *
> + * If interval <= 0, zero is returned. If an overflow happens during the
> + * calculation, KTIME_MAX is returned.
> + */
> +s64 timekeeping_mono_interval_to_raw(s64 interval)
> +{
> +	struct timekeeper *tk = &tk_core.timekeeper;
> +	u32 raw_mult = tk->tkr_raw.mult, mono_mult = tk->tkr_mono.mult;
> +	u64 raw, tmp;
> +
> +	/* The overflow checks below can't deal with negative intervals. */
> +	if (interval <= 0)
> +		return 0;
> +
> +	/*
> +	 * Calculate
> +	 *   raw = f_mono / f_raw * interval
> +	 *       = (raw_mult / 2^raw_shift) / (mono_mult / 2^mono_shift)
> +	 *            * interval
> +	 * where f_mono and f_raw denote the frequencies of the monotonic
> +	 * and raw clock respectively.
> +	 *
> +	 * Note that the monotonic and raw clocks' shifts are equal and fixed,
> +	 * that is they cancel.
> +	 */
> +
> +	/* First, calculate interval * raw_mult while checking for overflow. */

After thinking about this further, I realized that
  (raw_mult - mono_mult) * interval
is *much* less likely to overflow.

So, I'll send a v3 doing

  raw = interval + (((raw_mult - mono_mult) * interval) / mono_mult)

during the course of the day.
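
For illustration only, the difference-based formula could look roughly like
the sketch below. This is a user-space model under my own assumptions, not the
v3 patch: the function name and parameters are hypothetical, the mults are
passed in explicitly rather than read from the timekeeper, the division
truncates instead of rounding, and INT64_MAX stands in for KTIME_MAX on
overflow.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of
 *   raw = interval + ((raw_mult - mono_mult) * interval) / mono_mult
 * The difference of the two mults is small compared to the mults
 * themselves, so the product overflows only for very large intervals.
 */
static int64_t mono_interval_to_raw(int64_t interval, uint32_t raw_mult,
				    uint32_t mono_mult)
{
	int64_t diff = (int64_t)raw_mult - (int64_t)mono_mult;
	int64_t absdiff = diff < 0 ? -diff : diff;
	int64_t corr;

	if (interval <= 0 || mono_mult == 0)
		return 0;

	/* Would interval * diff overflow? Saturate (assumption). */
	if (absdiff != 0 && interval > INT64_MAX / absdiff)
		return INT64_MAX;

	/* Signed correction term; C division truncates toward zero. */
	corr = (interval * diff) / (int64_t)mono_mult;

	/* The final addition can only overflow for a positive correction. */
	if (corr > 0 && interval > INT64_MAX - corr)
		return INT64_MAX;

	return interval + corr;
}
```

With equal mults the interval is returned unchanged, and with the made-up
values raw_mult = 1000, mono_mult = 1001 an interval of 1001000ns maps to
1000000ns.
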



> +	raw = ((u64)interval >> 32) * raw_mult; /* Upper half of interval */
> +	if (raw >> 32)
> +		return KTIME_MAX;
> +	raw <<= 32;
> +	tmp = ((u64)interval & U32_MAX) * raw_mult; /* Lower half of interval */
> +	if (U64_MAX - raw < tmp)
> +		return KTIME_MAX;
> +	raw += tmp;
> +
> +	/* Finally, do raw /= mono_mult with proper rounding. */
> +	if (U64_MAX - raw < mono_mult / 2)
> +		return KTIME_MAX;
> +	raw += mono_mult / 2;
> +	do_div(raw, mono_mult);
> +
> +	return (s64)raw;
> +}
> +
> +/**
>   * getboottime64 - Return the real time of system boot.
>   * @ts:		pointer to the timespec64 to be set
>   *
> diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h
> index 704f595..40a0fa9 100644
> --- a/kernel/time/timekeeping.h
> +++ b/kernel/time/timekeeping.h
> @@ -18,6 +18,7 @@ extern void timekeeping_resume(void);
>  
>  extern void do_timer(unsigned long ticks);
>  extern void update_wall_time(void);
> +extern s64 timekeeping_mono_interval_to_raw(s64 interval);
>  
>  extern seqlock_t jiffies_lock;


Thread overview:
2016-07-10 19:30 [PATCH v2 0/4] avoid double timer interrupt with nohz and Intel TSC Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 1/4] arch, x86, tsc deadline clockevent dev: reduce frequency roundoff error Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 2/4] arch, x86, tsc deadline clockevent dev: reduce TSC_DIVISOR to 2 Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 3/4] arch, x86, tsc: inform TSC deadline clockevent device about recalibration Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 4/4] kernel/time/clockevents: compensate for monotonic clock's dynamic frequency Nicolai Stange
2016-07-11  6:32   ` Nicolai Stange [this message]
2016-07-11  8:32     ` Thomas Gleixner
2016-07-12 11:10       ` Nicolai Stange
2016-07-12 15:04         ` Thomas Gleixner
2016-07-13 13:08           ` Nicolai Stange
