From: Nicolai Stange <nicstange@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>,
	linux-kernel@vger.kernel.org,
	Nicolai Stange <nicstange@gmail.com>
Subject: [RFC v6 21/23] clockevents: initial support for mono to raw time conversion
Date: Fri,  9 Sep 2016 22:18:10 +0200
Message-ID: <20160909201812.32396-6-nicstange@gmail.com>
In-Reply-To: <20160909200033.32103-1-nicstange@gmail.com>

With NOHZ_FULL and a single well-isolated, CPU-bound task, one
would expect approximately one clockevent interrupt per second. However, on
my Intel Haswell, where the monotonic clock is the TSC monotonic clock and
the clockevent device is the TSC deadline device, it turns out that there
are two such interrupts every second: the first one always arrives
approximately 50us before the scheduled deadline as programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called in this interrupt detects that the queued
tick_sched_timer hasn't expired yet and does nothing except
reprogram the clock event device to fire again shortly afterwards.

These prematurely programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
  f_event * delta_t_progr
clockevent device cycles, where f_event is the clockevent device's hardware
frequency and delta_t_progr is the requested time interval. After that many
clockevent device cycles have elapsed, the device underlying the monotonic
clock, that is, the monotonic raw clock, has seen f_raw / f_event times as
many cycles, with f_raw being the raw clock's frequency.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles as running at the frequency of the monotonic clock. Summarizing:
  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
               = f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().
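
To get a feeling for the magnitudes, the relation above can be evaluated
for made-up numbers. The following is only a user-space sketch: the helper
names perceived_ns() and early_by_ns() are hypothetical and not part of
this patch, and the frequencies used with them below are illustrative
assumptions, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * delta_t_perc = f_raw / f_mono * delta_t_progr, in integer arithmetic.
 * Frequencies are in Hz, intervals in ns. Hypothetical helper, not
 * kernel code.
 */
static uint64_t perceived_ns(uint64_t f_raw, uint64_t f_mono,
			     uint64_t delta_t_progr)
{
	assert(f_mono != 0);
	/* The product fits in 64 bits for the ~1e9 magnitudes used here. */
	return delta_t_progr * f_raw / f_mono;
}

/* How many ns before the intended deadline the interrupt seems to arrive. */
static int64_t early_by_ns(uint64_t f_raw, uint64_t f_mono,
			   uint64_t delta_t_progr)
{
	return (int64_t)delta_t_progr -
	       (int64_t)perceived_ns(f_raw, f_mono, delta_t_progr);
}
```

Assuming f_raw = 1000000000 Hz and an f_mono that is 50 ppm higher
(1000050000 Hz), early_by_ns() yields 49998 for delta_t_progr = 10^9ns,
which is the same order of magnitude as the ~50us observed above.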

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With
delta_t_progr being as large as 10^9ns under NOHZ_FULL, the resulting
error becomes significant and causes the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw.

Namely:
- Introduce a ->mult_adjusted member to the struct clock_event_device. Its
  value is supposed to be equal to ->mult * f_mono/f_raw for devices
  which don't have the CLOCK_EVT_FEAT_NO_ADJUST flag set, equal to ->mult
  otherwise.
- Introduce the timekeeping_get_mono_mult() helper which provides
  the clockevent core with access to the timekeeping's current f_mono
  and f_raw.
- Introduce the helper __clockevents_adjust_freq() which
  sets a clockevent device's ->mult_adjusted member as appropriate. It is
  implemented with the help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
  as well as at frequency updates through clockevents_update_freq().
- Use ->mult_adjusted rather than ->mult in the ns to cycle
  conversion made in clockevents_program_event() as well as in the
  cycle to ns conversion in cev_delta2ns().
- Finally, move ->mult out of struct clock_event_device's first cacheline.

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthermore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 include/linux/clockchips.h  |  6 ++--
 kernel/time/clockevents.c   | 79 ++++++++++++++++++++++++++++++++++++++-------
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   | 14 ++++++++
 4 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 2ff15f03..28f9263 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -82,7 +82,7 @@ enum clock_event_state {
  * @max_delta_ns:	maximum delta value in ns
  * @max_delta_ticks:	maximum delta value in ticks
  * @min_delta_ticks_adjusted:	minimum delta value, increased as needed
- * @mult:		nanosecond to cycles multiplier
+ * @mult_adjusted:	adjusted multiplier compensating for NTP adjustments
  * @shift:		nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
  * @features:		features
@@ -94,6 +94,7 @@ enum clock_event_state {
  * @tick_resume:	resume clkevt device
  * @broadcast:		function to broadcast events
  * @min_delta_ticks:	minimum delta value in ticks stored for reconfiguration
+ * @mult:		ns to cycles multiplier stored for reconfiguration
  * @name:		ptr to clock event name
  * @rating:		variable to rate clock event devices
  * @irq:		IRQ number (only for non CPU local devices)
@@ -110,7 +111,7 @@ struct clock_event_device {
 	u64			max_delta_ns;
 	unsigned long		max_delta_ticks;
 	unsigned long		min_delta_ticks_adjusted;
-	u32			mult;
+	u32			mult_adjusted;
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
@@ -126,6 +127,7 @@ struct clock_event_device {
 	void			(*suspend)(struct clock_event_device *);
 	void			(*resume)(struct clock_event_device *);
 	unsigned long		min_delta_ticks;
+	u32			mult;
 
 	const char		*name;
 	int			rating;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 394b8dc..67d572e 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -34,17 +34,19 @@ struct ce_unbind {
 	int res;
 };
 
+static void __clockevents_adjust_freq(struct clock_event_device *dev);
+
 static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
 			bool ismax)
 {
 	u64 clc = (u64) latch << evt->shift;
 	u64 rnd;
 
-	if (unlikely(!evt->mult)) {
-		evt->mult = 1;
+	if (unlikely(!evt->mult_adjusted)) {
+		evt->mult_adjusted = 1;
 		WARN_ON(1);
 	}
-	rnd = (u64) evt->mult - 1;
+	rnd = (u64) evt->mult_adjusted - 1;
 
 	/*
 	 * Upper bound sanity check. If the backwards conversion is
@@ -73,10 +75,10 @@ static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
 	 * Also omit the add if it would overflow the u64 boundary.
 	 */
 	if ((~0ULL - clc > rnd) &&
-	    (!ismax || evt->mult <= (1ULL << evt->shift)))
+	    (!ismax || evt->mult_adjusted <= (1ULL << evt->shift)))
 		clc += rnd;
 
-	do_div(clc, evt->mult);
+	do_div(clc, evt->mult_adjusted);
 
 	/* Deltas less than 1usec are pointless noise */
 	return clc > 1000 ? clc : 1000;
@@ -165,8 +167,8 @@ void clockevents_switch_state(struct clock_event_device *dev,
 		 * on it, so fix it up and emit a warning:
 		 */
 		if (clockevent_state_oneshot(dev)) {
-			if (unlikely(!dev->mult)) {
-				dev->mult = 1;
+			if (unlikely(!dev->mult_adjusted)) {
+				dev->mult_adjusted = 1;
 				WARN_ON(1);
 			}
 		}
@@ -229,8 +231,9 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
 	if (min_delta_ns > MIN_DELTA_LIMIT)
 		min_delta_ns = MIN_DELTA_LIMIT;
 
-	dev->min_delta_ticks_adjusted = (unsigned long)((min_delta_ns *
-						dev->mult) >> dev->shift);
+	dev->min_delta_ticks_adjusted =
+		(unsigned long)((min_delta_ns * dev->mult_adjusted) >>
+				dev->shift);
 	dev->min_delta_ticks_adjusted = max(dev->min_delta_ticks_adjusted,
 						dev->min_delta_ticks);
 
@@ -330,7 +333,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 
 	delta = min(delta, (int64_t) dev->max_delta_ns);
 
-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+	clc = ((unsigned long long) delta * dev->mult_adjusted) >> dev->shift;
 
 	clc = min_t(unsigned long, clc, dev->max_delta_ticks);
 	clc = max_t(unsigned long, clc, dev->min_delta_ticks_adjusted);
@@ -497,7 +500,8 @@ static void __clockevents_update_bounds(struct clock_event_device *dev)
 	 */
 	dev->min_delta_ticks_adjusted =
 		max(dev->min_delta_ticks,
-			(unsigned long)((1000ULL * dev->mult) >> dev->shift));
+			(unsigned long)((1000ULL * dev->mult_adjusted) >>
+					dev->shift));
 }
 
 /**
@@ -516,6 +520,7 @@ void clockevents_register_device(struct clock_event_device *dev)
 		dev->cpumask = cpumask_of(smp_processor_id());
 	}
 
+	__clockevents_adjust_freq(dev);
 	__clockevents_update_bounds(dev);
 
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
@@ -570,9 +575,61 @@ void clockevents_config_and_register(struct clock_event_device *dev,
 }
 EXPORT_SYMBOL_GPL(clockevents_config_and_register);
 
+static u32 __clockevents_calc_adjust_freq(u32 mult_ce_raw, u32 mult_cs_mono,
+					u32 mult_cs_raw)
+{
+	u64 adj;
+	int sign;
+
+	if (mult_cs_raw >= mult_cs_mono) {
+		sign = 0;
+		adj = mult_cs_raw - mult_cs_mono;
+	} else {
+		sign = 1;
+		adj = mult_cs_mono - mult_cs_raw;
+	}
+
+	adj *= mult_ce_raw;
+	adj += mult_cs_mono / 2;
+	do_div(adj, mult_cs_mono);
+
+	if (!sign) {
+		/*
+		 * Never increase mult by more than 12.5%,
+		 * c.f. __clockevents_update_bounds().
+		 */
+		adj = min_t(u64, adj, mult_ce_raw / 8);
+		if (U32_MAX - mult_ce_raw < adj)
+			return U32_MAX;
+		return mult_ce_raw + (u32)adj;
+	}
+	if (adj >= mult_ce_raw)
+		return 1;
+	return mult_ce_raw - (u32)adj;
+}
+
+static void __clockevents_adjust_freq(struct clock_event_device *dev)
+{
+	u32 mult_cs_mono, mult_cs_raw;
+
+	if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+		return;
+
+	if (dev->features & CLOCK_EVT_FEAT_NO_ADJUST) {
+		dev->mult_adjusted = dev->mult;
+		return;
+	}
+
+	timekeeping_get_mono_mult(&mult_cs_mono, &mult_cs_raw);
+	dev->mult_adjusted = __clockevents_calc_adjust_freq(dev->mult,
+							mult_cs_mono,
+							mult_cs_raw);
+}
+
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
 	clockevents_config(dev, freq);
+	__clockevents_adjust_freq(dev);
 	__clockevents_update_bounds(dev);
 
 	if (clockevent_state_oneshot(dev))
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f738251..0b29d23 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
 				     ktime_t expires, bool force);
 extern void clockevents_handle_noop(struct clock_event_device *dev);
 extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
 extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);
 
 /* Broadcasting support */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e07fb09..7ddca9e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -329,6 +329,20 @@ static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
 	return timekeeping_delta_to_ns(tkr, delta);
 }
 
+void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
+{
+	unsigned int seq;
+	struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
+
+	/* The seqlock protects us from a racing change_clocksource(). */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+
+		*mult_cs_mono = tkr_mono->mult;
+		*mult_cs_raw = tkr_mono->clock->mult;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr: Timekeeping readout base from which we take the update
-- 
2.9.3

