linux-kernel.vger.kernel.org archive mirror
From: Nicolai Stange <nicstange@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, John Stultz <john.stultz@linaro.org>,
	Borislav Petkov <bp@suse.de>, Paolo Bonzini <pbonzini@redhat.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	linux-kernel@vger.kernel.org,
	Nicolai Stange <nicstange@gmail.com>
Subject: [PATCH v2 4/4] kernel/time/clockevents: compensate for monotonic clock's dynamic frequency
Date: Sun, 10 Jul 2016 21:30:47 +0200
Message-ID: <20160710193047.18320-5-nicstange@gmail.com>
In-Reply-To: <20160710193047.18320-1-nicstange@gmail.com>

With NOHZ_FULL and a single, well-isolated, CPU-bound task, one would
expect approximately one clockevent interrupt per second. However, on my
Intel Haswell, where the monotonic clock is driven by the TSC and the
clockevent device is the TSC deadline device, it turns out that there
are two such interrupts every second: the first one always arrives
approximately 50us before the deadline programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called from this interrupt detects that the
queued tick_sched_timer hasn't expired yet and does nothing except
reprogram the clockevent device to fire again shortly afterwards.

These prematurely programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
  f_event * delta_t_progr
clockevent device cycles, where f_event is the clockevent device's
hardware frequency and delta_t_progr is the requested time interval.
By the time that many clockevent device cycles have elapsed, the device
underlying the monotonic clock, that is the monotonic raw clock, has
seen f_raw / f_event times as many cycles, f_raw being its frequency.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles as running at the frequency of the monotonic clock. In summary:
  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
               = f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().
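A numerical sanity check of the derivation above, done in floating point
for readability rather than with the kernel's fixed-point mult/shift
arithmetic. The function name is hypothetical and the frequencies in the
usage note (a 24 MHz clockevent device, a 2.4 GHz raw clock) are made-up
values; note that f_event cancels out, exactly as in the formula.

```c
#include <assert.h>

/*
 * Hypothetical illustration, frequencies in cycles per second.
 * The device is programmed for f_event * delta_t_progr cycles; meanwhile
 * the raw clock advances f_raw / f_event times as many cycles, which
 * ktime_get() converts back to nanoseconds at the adjusted frequency
 * f_mono.
 */
static double perceived_interval_ns(double f_event, double f_raw,
				    double f_mono, double delta_t_progr_ns)
{
	double event_cycles, raw_cycles;

	assert(f_event > 0 && f_raw > 0 && f_mono > 0);
	event_cycles = f_event * delta_t_progr_ns / 1e9;
	raw_cycles = event_cycles * f_raw / f_event;
	return raw_cycles / f_mono * 1e9;
}
```

With f_mono equal to f_raw, the perceived interval matches the programmed
one; raising f_mono by 50 ppm makes a programmed 10^9 ns appear as
roughly 999950002 ns, i.e. the timer looks unexpired when the interrupt
arrives.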

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large values of delta_t_progr used with NOHZ_FULL, on the order of
10^9 ns, the resulting error becomes significant and causes the double
timer interrupts described above.
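To put numbers on "significant": writing f_mono = f_raw * (1 + ppm/10^6),
the formula above gives an early arrival of
delta_t_progr * ppm / (10^6 + ppm). The helper below is a hypothetical
illustration, and the 50 ppm offset in the usage note is an assumed value,
not a measurement. It merely shows that an offset of that size would by
itself account for roughly the ~50us observed at a one-second interval,
while a 1 ms periodic tick would be off by only about 50 ns.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how much earlier than requested (in ns, as
 * perceived by ktime_get()) the interrupt fires when f_mono exceeds
 * f_raw by ppm parts per million. From
 *   delta_t_perc = f_raw / f_mono * delta_t_progr
 * the shortfall is delta_t_progr * ppm / (10^6 + ppm), truncated here.
 */
static int64_t early_by_ns(int64_t delta_t_progr_ns, int64_t ppm)
{
	assert(delta_t_progr_ns >= 0 && ppm >= 0);
	return delta_t_progr_ns * ppm / (1000000 + ppm);
}
```

For example, early_by_ns(1000000000, 50) evaluates to 49997, whereas
early_by_ns(1000000, 50) evaluates to 49.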

Compensate for this error by multiplying delta_t_progr by f_mono / f_raw
in clockevents_program_event() before actually programming the
clockevent device.
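The effect of the compensation can be checked the same way. This is a
sketch with made-up frequencies and floating point, and both function
names are hypothetical: scaling the requested interval by f_mono / f_raw
before programming makes delta_t_perc equal to the requested interval.

```c
#include <assert.h>

/* Hypothetical: interval to program so that ktime_get() perceives delta_ns. */
static double compensated_progr_ns(double f_raw, double f_mono, double delta_ns)
{
	assert(f_raw > 0 && f_mono > 0);
	return delta_ns * f_mono / f_raw;
}

/* delta_t_perc = f_raw / f_mono * delta_t_progr, as derived above. */
static double perceived_ns(double f_raw, double f_mono, double progr_ns)
{
	assert(f_raw > 0 && f_mono > 0);
	return f_raw / f_mono * progr_ns;
}
```

With f_mono 50 ppm above f_raw, programming 10^9 ns directly is perceived
about 50us short, while programming compensated_progr_ns(f_raw, f_mono,
1e9) is perceived as 10^9 ns up to floating-point rounding.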

To this end, introduce a helper, timekeeping_mono_interval_to_raw(),
which converts a given time interval from the monotonic clock's
perception to that of the raw monotonic clock by multiplying it by
f_mono / f_raw. Call that helper from clockevents_program_event() in
order to obtain a suitable time interval to program the clockevent
device with.
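The helper expresses f_mono / f_raw through the timekeeper's mult values.
Assuming, as in the kernel's clocksource_cyc2ns(), that a clock converts
cycles to nanoseconds as (cycles * mult) >> shift, its frequency in
cycles per nanosecond is 2^shift / mult; with equal shifts, f_mono / f_raw
reduces to raw_mult / mono_mult. The function names, the shift of 24 and
the 839-unit mult offset (about 50 ppm) in the usage note are made-up
illustration values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Same form as clocksource_cyc2ns(): ns = (cycles * mult) >> shift.
 * No overflow protection; fine for the small values used here.
 */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Hypothetical helper: f_mono / f_raw via the mult values. Since
 * f = 2^shift / mult and both clocks share one shift, the ratio is
 * raw_mult / mono_mult.
 */
static double mono_to_raw_ratio(uint32_t raw_mult, uint32_t mono_mult)
{
	assert(mono_mult != 0);
	return (double)raw_mult / mono_mult;
}
```

With raw_mult = 1 << 24 and mono_mult = (1 << 24) - 839, 10^9 raw cycles
read as exactly 10^9 ns on the raw clock but roughly 50us less on the
monotonic clock, and the ratio stretches a 10^9 ns request by about the
same amount.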

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 kernel/time/clockevents.c |  1 +
 kernel/time/timekeeping.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 kernel/time/timekeeping.h |  1 +
 3 files changed, 61 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a9b76a4..4bccf04 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -329,6 +329,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 		return dev->set_next_ktime(expires, dev);
 
 	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
+	delta = timekeeping_mono_interval_to_raw(delta);
 	if (delta <= 0)
 		return force ? clockevents_program_min_delta(dev) : -ETIME;
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index dcd5ce6..51dfbbb 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -23,6 +23,7 @@
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <asm/div64.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -2133,6 +2134,64 @@ out:
 }
 
 /**
+ * timekeeping_mono_interval_to_raw - Convert mono interval to raw's perception.
+ * @interval: Time interval as measured by the mono clock.
+ *
+ * Converts the given time interval as measured by the monotonic clock to
+ * what would have been measured by the raw monotonic clock in the meanwhile.
+ * The monotonic clock's frequency gets dynamically adjusted every now and then
+ * in order to compensate for the deviation from NTP time. OTOH, clockevent
+ * devices are not affected by this adjustment, i.e. they keep ticking at some
+ * fixed hardware frequency which may be assumed to have a constant ratio to
+ * the fixed raw monotonic clock's frequency. This function provides a means
+ * to convert time intervals from the dynamic frequency monotonic clock to
+ * the fixed frequency hardware world.
+ *
+ * If interval <= 0, zero is returned. If an overflow happens during the
+ * calculation, KTIME_MAX is returned.
+ */
+s64 timekeeping_mono_interval_to_raw(s64 interval)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	u32 raw_mult = tk->tkr_raw.mult, mono_mult = tk->tkr_mono.mult;
+	u64 raw, tmp;
+
+	/* The overflow checks below can't deal with negative intervals. */
+	if (interval <= 0)
+		return 0;
+
+	/*
+	 * Calculate
+	 *   raw = f_mono / f_raw * interval
+	 *       = (raw_mult / 2^raw_shift) / (mono_mult / 2^mono_shift)
+	 *            * interval
+	 * where f_mono and f_raw denote the frequencies of the monotonic
+	 * and raw clock respectively.
+	 *
+	 * Note that the monotonic and raw clocks' shifts are equal and fixed,
+	 * that is they cancel.
+	 */
+
+	/* First, calculate interval * raw_mult while checking for overflow. */
+	raw = ((u64)interval >> 32) * raw_mult; /* Upper half of interval */
+	if (raw >> 32)
+		return KTIME_MAX;
+	raw <<= 32;
+	tmp = ((u64)interval & U32_MAX) * raw_mult; /* Lower half of interval */
+	if (U64_MAX - raw < tmp)
+		return KTIME_MAX;
+	raw += tmp;
+
+	/* Finally, do raw /= mono_mult with proper rounding. */
+	if (U64_MAX - raw < mono_mult / 2)
+		return KTIME_MAX;
+	raw += mono_mult / 2;
+	do_div(raw, mono_mult);
+
+	return (s64)raw;
+}
+
+/**
  * getboottime64 - Return the real time of system boot.
  * @ts:		pointer to the timespec64 to be set
  *
diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h
index 704f595..40a0fa9 100644
--- a/kernel/time/timekeeping.h
+++ b/kernel/time/timekeeping.h
@@ -18,6 +18,7 @@ extern void timekeeping_resume(void);
 
 extern void do_timer(unsigned long ticks);
 extern void update_wall_time(void);
+extern s64 timekeeping_mono_interval_to_raw(s64 interval);
 
 extern seqlock_t jiffies_lock;
 
-- 
2.9.0
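For experimenting with the helper's fixed-point arithmetic outside the
kernel, a userspace port might look as follows. This is a sketch, not
part of the patch: the timekeeper lookup is replaced by explicit
raw_mult/mono_mult parameters, do_div() by plain 64-bit division, and
KTIME_MAX by a local stand-in. One addition, a final clamp so the s64
cast cannot wrap to a negative value, is marked in the code.

```c
#include <assert.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX	/* stand-in for the kernel definition */

/* Userspace port of timekeeping_mono_interval_to_raw()'s arithmetic. */
static int64_t mono_interval_to_raw(int64_t interval, uint32_t raw_mult,
				    uint32_t mono_mult)
{
	uint64_t raw, tmp;

	assert(mono_mult != 0);
	if (interval <= 0)
		return 0;

	/* interval * raw_mult, split into 32-bit halves to detect overflow. */
	raw = ((uint64_t)interval >> 32) * raw_mult;
	if (raw >> 32)
		return KTIME_MAX;
	raw <<= 32;
	tmp = ((uint64_t)interval & UINT32_MAX) * raw_mult;
	if (UINT64_MAX - raw < tmp)
		return KTIME_MAX;
	raw += tmp;

	/* Divide by mono_mult, rounding to nearest. */
	if (UINT64_MAX - raw < mono_mult / 2)
		return KTIME_MAX;
	raw += mono_mult / 2;
	raw /= mono_mult;

	/* Not in the patch: keep the s64 cast from wrapping negative. */
	if (raw > (uint64_t)INT64_MAX)
		return KTIME_MAX;

	return (int64_t)raw;
}
```

With equal mult values the interval comes back unchanged; with
mono_mult = (1 << 24) - 839 and raw_mult = 1 << 24, a 10^9 ns interval
grows by about 50011 ns.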

Thread overview: 10+ messages
2016-07-10 19:30 [PATCH v2 0/4] avoid double timer interrupt with nohz and Intel TSC Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 1/4] arch, x86, tsc deadline clockevent dev: reduce frequency roundoff error Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 2/4] arch, x86, tsc deadline clockevent dev: reduce TSC_DIVISOR to 2 Nicolai Stange
2016-07-10 19:30 ` [PATCH v2 3/4] arch, x86, tsc: inform TSC deadline clockevent device about recalibration Nicolai Stange
2016-07-10 19:30 ` Nicolai Stange [this message]
2016-07-11  6:32   ` [PATCH v2 4/4] kernel/time/clockevents: compensate for monotonic clock's dynamic frequency Nicolai Stange
2016-07-11  8:32     ` Thomas Gleixner
2016-07-12 11:10       ` Nicolai Stange
2016-07-12 15:04         ` Thomas Gleixner
2016-07-13 13:08           ` Nicolai Stange
