All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v3 0/3] adapt clockevents frequencies to mono clock
@ 2016-07-13 13:00 Nicolai Stange
  2016-07-13 13:00 ` [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion Nicolai Stange
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Nicolai Stange @ 2016-07-13 13:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: John Stultz, linux-kernel, Nicolai Stange

This series is a split-off of the arguable and non-x86 related parts
from the "avoid double timer interrupt with nohz and Intel TSC" v2 series
(http://lkml.kernel.org/g/20160710193047.18320-1-nicstange@gmail.com).

The goal is to make the clockevents core take the dynamic frequency
adjustments of the monotonic clock into account.

My first attempt, [4/4] ("kernel/time/clockevents: compensate for
monotonic clock's dynamic frequency), raised concerns with regard to
performance (way too much "math in the CE programming path"). Thomas
Gleixner asked me to provide an initial patch doing the necessary
adjustments on the clockevents devices' frequencies instead.

Here it is.

Known issues:
- The way the export of the mono and raw clock's ->mult from timekeeping
  to clockevents is done is ugly. I'd rather make tk_core non-static. But
  for this POC, it's fine.

- The patchset assumes that a clockevent device's ->mult is changed after
  registration only through calls to clockevents_update_freq().
  For a handful of non-x86 drivers this isn't the case.

- ->min_delta_ns and ->max_delta_ns vs ->mult_mono:
  In clockevents_program_event(), we had
   	delta = min(delta, (int64_t) dev->max_delta_ns);
	delta = max(delta, (int64_t) dev->min_delta_ns);
	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
  The dev->mult is replaced with the dynamically adjusted dev->mult_mono
  by this series. That's problematic since as I understand it, especially
  ->max_delta_ns is a hard limit preventing the clockevent devices counter
  to be programmed with values larger than its width allows for.
  If ->mult_mono happens to be only slightly larger than ->mult, the
  comparison of delta against the ->mult based ->max_delta_ns can pass
  although the final clc might actually be larger than allowed.
  I think what we really want to have at this place is a check of clc
  against the already present ->min_delta_cycles and ->max_delta_cycles.

  The problem with this approach is that many drivers (~40) initialize
  ->min_delta_ns and ->max_delta_ns (typically with clockevent_delta2ns())
  but not the ->*_delta_cycles members. My suggestion at this point would
  be to convert them. This makes ->max_delta_ns obsolete right away.

  ->min_delta_ns is still needed in order to set the ->next_event in
  clockevents_program_min_delta() though. My claim is that
  ->min_delta_ns can be safely replaced with 0 in
  clockevents_program_min_delta(), i.e. that
     dev->next_event = ktime_add_ns(ktime_get(), delta);
  can be replaced with
     dev->next_event = ktime_get();
  Reasoning:
  1. ->next_event is consumed only from __clockevents_update_freq():
     clockevents_program_event(dev, dev->next_event, false);
  2. Either way dev->next_event is set from
     clockevents_program_min_delta(), the above clockevents_program_event()
     will hit the min limit and thus, reprogram with the min delta again.

  Another place where ->min_delta_ns is used is
  clockevents_increase_min_delta(). But this one should be invoked seldomly
  and it would probably be OK to do derive the needed value from
  ->min_delta_cycles by means of the math-intensive clockevent_delta2ns().

  Summarizing: if all clockevent drivers are converted to set
  ->min_delta_cycles and ->max_delta_cycles, then it might be possible to
  get rid of ->min_delta_ns and ->max_delta_ns.

Applicable to linux-next 20160712. The patches depend on each other in the
order given.

Nicolai Stange (3):
  kernel/time/clockevents: initial support for mono to raw time
    conversion
  kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic
  kernel/time/timekeeping: inform clockevents about freq adjustments

 include/linux/clockchips.h   |  1 +
 kernel/time/clockevents.c    | 81 +++++++++++++++++++++++++++++++++++++++++---
 kernel/time/tick-broadcast.c |  5 +--
 kernel/time/tick-internal.h  |  2 ++
 kernel/time/timekeeping.c    | 11 ++++++
 5 files changed, 94 insertions(+), 6 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion
  2016-07-13 13:00 [RFC v3 0/3] adapt clockevents frequencies to mono clock Nicolai Stange
@ 2016-07-13 13:00 ` Nicolai Stange
  2016-07-21 18:08   ` John Stultz
  2016-07-13 13:00 ` [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic Nicolai Stange
  2016-07-13 13:00 ` [RFC v3 3/3] kernel/time/timekeeping: inform clockevents about freq adjustments Nicolai Stange
  2 siblings, 1 reply; 9+ messages in thread
From: Nicolai Stange @ 2016-07-13 13:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: John Stultz, linux-kernel, Nicolai Stange

With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
would expect approximately one clockevent interrupt per second. However, on
my Intel Haswell where the monotonic clock is the TSC monotonic clock and
the clockevent device is the TSC deadline device, it turns out that every
second, there are two such interrupts: the first one arrives always
approximately ~50us before the scheduled deadline as programmed by
tick_nohz_stop_sched_tick() through the hrtimer API. The
__hrtimer_run_queues() called in this interrupt detects that the queued
tick_sched_timer hasn't expired yet and simply does nothing except
reprogramming the clock event device to fire shortly after again.

These too early programmed deadlines are explained as follows:
clockevents_program_event() programs the clockevent device to fire
after
  f_event * delta_t_progr
clockevent device cycles where f_event is the clockevent device's hardware
frequency and delta_t_progr is the requested time interval. After that many
clockevent device cycles have elapsed, the device underlying the monotonic
clock, that is the monotonic raw clock has seen f_raw / f_event as many
cycles.
The ktime_get() called from __hrtimer_run_queues() interprets those
cycles to run at the frequency of the monotonic clock. Summarizing:
  delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
               = f_raw / f_mono * delta_t_progr
with f_mono being the monotonic clock's frequency and delta_t_perc being
the elapsed time interval as perceived by __hrtimer_run_queues().

Now, f_mono is not a constant, but is dynamically adjusted in
timekeeping_adjust() in order to compensate for the NTP error. With the
large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
becomes significant and results in the double timer interrupts described
above.

Compensate for this error by multiplying the clockevent device's f_event
by f_mono/f_raw.

Namely:
- Introduce a ->mult_mono member to the struct clock_event_device. It's
  value is supposed to be equal to ->mult * f_mono/f_raw.
- Introduce the timekeeping_get_mono_mult() helper which provides
  the clockevent core with access to the timekeeping's current f_mono
  and f_raw.
- Introduce the helper __clockevents_adjust_freq() which
  sets a clockevent device's ->mult_mono member as appropriate. It is
  implemented with the help of the new __clockevents_calc_adjust_freq().
- Call __clockevents_adjust_freq() at clockevent device registration time
  as well as at frequency updates through clockevents_update_freq().
- Finally, use the ->mult_mono rather than ->mult in the ns to cycle
  conversion made in clockevents_program_event().

Note that future adjustments of the monotonic clock are not taken into
account yet. Furthemore, this patch assumes that after a clockevent
device's registration, its ->mult changes only through calls to
clockevents_update_freq().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 include/linux/clockchips.h  |  1 +
 kernel/time/clockevents.c   | 49 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   |  8 ++++++++
 4 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 0d442e3..2863742 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -104,6 +104,7 @@ struct clock_event_device {
 	u64			max_delta_ns;
 	u64			min_delta_ns;
 	u32			mult;
+	u32			mult_mono;
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index a9b76a4..ba7fea4 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -33,6 +33,8 @@ struct ce_unbind {
 	int res;
 };
 
+static void __clockevents_adjust_freq(struct clock_event_device *dev);
+
 static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
 			bool ismax)
 {
@@ -166,6 +168,7 @@ void clockevents_switch_state(struct clock_event_device *dev,
 		if (clockevent_state_oneshot(dev)) {
 			if (unlikely(!dev->mult)) {
 				dev->mult = 1;
+				dev->mult_mono = 1;
 				WARN_ON(1);
 			}
 		}
@@ -335,7 +338,7 @@ int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
 	delta = min(delta, (int64_t) dev->max_delta_ns);
 	delta = max(delta, (int64_t) dev->min_delta_ns);
 
-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+	clc = ((unsigned long long) delta * dev->mult_mono) >> dev->shift;
 	rc = dev->set_next_event((unsigned long) clc, dev);
 
 	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
@@ -458,6 +461,8 @@ void clockevents_register_device(struct clock_event_device *dev)
 		dev->cpumask = cpumask_of(smp_processor_id());
 	}
 
+	__clockevents_adjust_freq(dev);
+
 	raw_spin_lock_irqsave(&clockevents_lock, flags);
 
 	list_add(&dev->list, &clockevent_devices);
@@ -512,9 +517,51 @@ void clockevents_config_and_register(struct clock_event_device *dev,
 }
 EXPORT_SYMBOL_GPL(clockevents_config_and_register);
 
+static u32 __clockevents_calc_adjust_freq(u32 mult_ce_raw, u32 mult_cs_mono,
+					u32 mult_cs_raw)
+{
+	u64 adj;
+	int sign;
+
+	if (mult_cs_raw >= mult_cs_mono) {
+		sign = 0;
+		adj = mult_cs_raw - mult_cs_mono;
+	} else {
+		sign = 1;
+		adj = mult_cs_mono - mult_cs_raw;
+	}
+
+	adj *= mult_ce_raw;
+	adj += mult_cs_mono / 2;
+	do_div(adj, mult_cs_mono);
+
+	if (!sign) {
+		if (U32_MAX - mult_ce_raw < adj)
+			return U32_MAX;
+		return mult_ce_raw + (u32)adj;
+	}
+	if (adj >= mult_ce_raw)
+		return 1;
+	return mult_ce_raw - (u32)adj;
+}
+
+void __clockevents_adjust_freq(struct clock_event_device *dev)
+{
+	u32 mult_cs_mono, mult_cs_raw;
+
+	if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+		return;
+
+	timekeeping_get_mono_mult(&mult_cs_mono, &mult_cs_raw);
+	dev->mult_mono = __clockevents_calc_adjust_freq(dev->mult,
+							mult_cs_mono,
+							mult_cs_raw);
+}
+
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
 	clockevents_config(dev, freq);
+	__clockevents_adjust_freq(dev);
 
 	if (clockevent_state_oneshot(dev))
 		return clockevents_program_event(dev, dev->next_event, false);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index f738251..0b29d23 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
 				     ktime_t expires, bool force);
 extern void clockevents_handle_noop(struct clock_event_device *dev);
 extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
 extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);
 
 /* Broadcasting support */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index dcd5ce6..a011ae1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -330,6 +330,14 @@ static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
 	return timekeeping_delta_to_ns(tkr, delta);
 }
 
+void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
+{
+	struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
+
+	*mult_cs_mono = tkr_mono->mult;
+	*mult_cs_raw = tkr_mono->clock->mult;
+}
+
 /**
  * update_fast_timekeeper - Update the fast and NMI safe monotonic timekeeper.
  * @tkr: Timekeeping readout base from which we take the update
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic
  2016-07-13 13:00 [RFC v3 0/3] adapt clockevents frequencies to mono clock Nicolai Stange
  2016-07-13 13:00 ` [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion Nicolai Stange
@ 2016-07-13 13:00 ` Nicolai Stange
  2016-07-21 18:16   ` John Stultz
  2016-07-13 13:00 ` [RFC v3 3/3] kernel/time/timekeeping: inform clockevents about freq adjustments Nicolai Stange
  2 siblings, 1 reply; 9+ messages in thread
From: Nicolai Stange @ 2016-07-13 13:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: John Stultz, linux-kernel, Nicolai Stange

In order to avoid races between setting a struct clock_event_device's
->mult_mono in clockevents_update_freq() and yet to be implemented updates
triggered from the timekeeping core, the setting of ->mult and ->mult_mono
should be made atomic.

Protect the update in clockevents_update_freq() by locking the
clockevents_lock spinlock. Frequency updates are expected to be done
seldomly and thus, taking this subsystem lock should not have any impact
on performance.

Use a raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore() pair for
locking/unlocking the clockevents_lock spinlock.
Purge the now redundant local_irq_save()/local_irq_restore() pair from
clockevents_update_freq(). Since the call to tick_broadcast_update_freq()
isn't done with interrupts disabled anymore,  its
raw_spin_lock()/raw_spin_unlock() pair must be converted to
raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 kernel/time/clockevents.c    | 7 ++++---
 kernel/time/tick-broadcast.c | 5 +++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index ba7fea4..ec01375 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -589,11 +589,12 @@ int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 	unsigned long flags;
 	int ret;
 
-	local_irq_save(flags);
 	ret = tick_broadcast_update_freq(dev, freq);
-	if (ret == -ENODEV)
+	if (ret == -ENODEV) {
+		raw_spin_lock_irqsave(&clockevents_lock, flags);
 		ret = __clockevents_update_freq(dev, freq);
-	local_irq_restore(flags);
+		raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+	}
 	return ret;
 }
 
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f6aae79..9c94c41 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -125,11 +125,12 @@ int tick_is_broadcast_device(struct clock_event_device *dev)
 int tick_broadcast_update_freq(struct clock_event_device *dev, u32 freq)
 {
 	int ret = -ENODEV;
+	unsigned long flags;
 
 	if (tick_is_broadcast_device(dev)) {
-		raw_spin_lock(&tick_broadcast_lock);
+		raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 		ret = __clockevents_update_freq(dev, freq);
-		raw_spin_unlock(&tick_broadcast_lock);
+		raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 	}
 	return ret;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC v3 3/3] kernel/time/timekeeping: inform clockevents about freq adjustments
  2016-07-13 13:00 [RFC v3 0/3] adapt clockevents frequencies to mono clock Nicolai Stange
  2016-07-13 13:00 ` [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion Nicolai Stange
  2016-07-13 13:00 ` [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic Nicolai Stange
@ 2016-07-13 13:00 ` Nicolai Stange
  2 siblings, 0 replies; 9+ messages in thread
From: Nicolai Stange @ 2016-07-13 13:00 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: John Stultz, linux-kernel, Nicolai Stange

Upon adjustments of the monotonic clock's frequencies from the
timekeeping core, the clockevents devices' ->mult_mono should be changed
accordingly, too.

Introduce clockevents_adjust_all_freqs() which traverses all registered
clockevent devices and recalculates their ->mult_mono based on the
monotonic clock's current frequency.

Call clockevents_adjust_all_freqs() from timekeeping_apply_adjustment().

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 kernel/time/clockevents.c   | 25 +++++++++++++++++++++++++
 kernel/time/tick-internal.h |  1 +
 kernel/time/timekeeping.c   |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index ec01375..f3fb1e5 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -558,6 +558,31 @@ void __clockevents_adjust_freq(struct clock_event_device *dev)
 							mult_cs_raw);
 }
 
+void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw)
+{
+	u32 last_mult_raw = 0, last_mult_mono = 0;
+	u32 mult_raw;
+	unsigned long flags;
+	struct clock_event_device *dev;
+
+	raw_spin_lock_irqsave(&clockevents_lock, flags);
+	list_for_each_entry(dev, &clockevent_devices, list) {
+		if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
+			continue;
+
+		mult_raw = dev->mult;
+		if (mult_raw != last_mult_raw) {
+			last_mult_raw = mult_raw;
+			last_mult_mono =
+				__clockevents_calc_adjust_freq(mult_raw,
+							mult_cs_mono,
+							mult_cs_raw);
+		}
+		dev->mult_mono = last_mult_mono;
+	}
+	raw_spin_unlock_irqrestore(&clockevents_lock, flags);
+}
+
 int __clockevents_update_freq(struct clock_event_device *dev, u32 freq)
 {
 	clockevents_config(dev, freq);
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 0b29d23..9162671 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -56,6 +56,7 @@ extern int clockevents_program_event(struct clock_event_device *dev,
 				     ktime_t expires, bool force);
 extern void clockevents_handle_noop(struct clock_event_device *dev);
 extern int __clockevents_update_freq(struct clock_event_device *dev, u32 freq);
+extern void clockevents_adjust_all_freqs(u32 mult_cs_mono, u32 mult_cs_raw);
 extern void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw);
 extern ssize_t sysfs_get_uname(const char *buf, char *dst, size_t cnt);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a011ae1..e98f67c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1852,6 +1852,9 @@ static __always_inline void timekeeping_apply_adjustment(struct timekeeper *tk,
 	tk->xtime_interval += interval;
 	tk->tkr_mono.xtime_nsec -= offset;
 	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
+
+	clockevents_adjust_all_freqs(tk->tkr_mono.mult,
+				tk->tkr_mono.clock->mult);
 }
 
 /*
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion
  2016-07-13 13:00 ` [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion Nicolai Stange
@ 2016-07-21 18:08   ` John Stultz
  2016-07-21 19:11     ` Nicolai Stange
  0 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2016-07-21 18:08 UTC (permalink / raw)
  To: Nicolai Stange; +Cc: Thomas Gleixner, lkml

On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
> With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
> would expect approximately one clockevent interrupt per second. However, on
> my Intel Haswell where the monotonic clock is the TSC monotonic clock and
> the clockevent device is the TSC deadline device, it turns out that every
> second, there are two such interrupts: the first one arrives always
> approximately ~50us before the scheduled deadline as programmed by
> tick_nohz_stop_sched_tick() through the hrtimer API. The
> __hrtimer_run_queues() called in this interrupt detects that the queued
> tick_sched_timer hasn't expired yet and simply does nothing except
> reprogramming the clock event device to fire shortly after again.
>
> These too early programmed deadlines are explained as follows:
> clockevents_program_event() programs the clockevent device to fire
> after
>   f_event * delta_t_progr
> clockevent device cycles where f_event is the clockevent device's hardware
> frequency and delta_t_progr is the requested time interval. After that many
> clockevent device cycles have elapsed, the device underlying the monotonic
> clock, that is the monotonic raw clock has seen f_raw / f_event as many
> cycles.
> The ktime_get() called from __hrtimer_run_queues() interprets those
> cycles to run at the frequency of the monotonic clock. Summarizing:
>   delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
>                = f_raw / f_mono * delta_t_progr
> with f_mono being the monotonic clock's frequency and delta_t_perc being
> the elapsed time interval as perceived by __hrtimer_run_queues().
>
> Now, f_mono is not a constant, but is dynamically adjusted in
> timekeeping_adjust() in order to compensate for the NTP error. With the
> large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
> becomes significant and results in the double timer interrupts described
> above.
>
> Compensate for this error by multiplying the clockevent device's f_event
> by f_mono/f_raw.
>
> Namely:
> - Introduce a ->mult_mono member to the struct clock_event_device. It's
>   value is supposed to be equal to ->mult * f_mono/f_raw.
> - Introduce the timekeeping_get_mono_mult() helper which provides
>   the clockevent core with access to the timekeeping's current f_mono
>   and f_raw.
> - Introduce the helper __clockevents_adjust_freq() which
>   sets a clockevent device's ->mult_mono member as appropriate. It is
>   implemented with the help of the new __clockevents_calc_adjust_freq().
> - Call __clockevents_adjust_freq() at clockevent device registration time
>   as well as at frequency updates through clockevents_update_freq().
> - Finally, use the ->mult_mono rather than ->mult in the ns to cycle
>   conversion made in clockevents_program_event().
>
> Note that future adjustments of the monotonic clock are not taken into
> account yet. Furthemore, this patch assumes that after a clockevent
> device's registration, its ->mult changes only through calls to
> clockevents_update_freq().

Sorry for being a little slow to review here. Been swamped.

I was about to queue this but had a few nits that need addressing.


> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
> ---
>  include/linux/clockchips.h  |  1 +
>  kernel/time/clockevents.c   | 49 ++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/time/tick-internal.h |  1 +
>  kernel/time/timekeeping.c   |  8 ++++++++
>  4 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
> index 0d442e3..2863742 100644
> --- a/include/linux/clockchips.h
> +++ b/include/linux/clockchips.h
> @@ -104,6 +104,7 @@ struct clock_event_device {
>         u64                     max_delta_ns;
>         u64                     min_delta_ns;
>         u32                     mult;
> +       u32                     mult_mono;

So in this context(for me at least), mult and mult_mono are a bit
confusing.  I tend to think of it as mult and mult_raw, but in this
case mult is the "raw" unmodified value and mult_mono is the adjusted
one.

I'd probably suggest mult_adjusted or some other name to make it more
clear how it differs from the clockevent mult.

>
> +void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
> +{
> +       struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
> +
> +       *mult_cs_mono = tkr_mono->mult;
> +       *mult_cs_raw = tkr_mono->clock->mult;
> +}

So.. you probably should have some locking here. Or at least a big
comment making it clear why locking isn't necessary.

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic
  2016-07-13 13:00 ` [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic Nicolai Stange
@ 2016-07-21 18:16   ` John Stultz
  2016-07-21 19:24     ` Nicolai Stange
  0 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2016-07-21 18:16 UTC (permalink / raw)
  To: Nicolai Stange; +Cc: Thomas Gleixner, lkml

On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
> In order to avoid races between setting a struct clock_event_device's
> ->mult_mono in clockevents_update_freq() and yet to be implemented updates
> triggered from the timekeeping core, the setting of ->mult and ->mult_mono
> should be made atomic.
>
> Protect the update in clockevents_update_freq() by locking the
> clockevents_lock spinlock. Frequency updates are expected to be done
> seldomly and thus, taking this subsystem lock should not have any impact
> on performance.
>
> Use a raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore() pair for
> locking/unlocking the clockevents_lock spinlock.
> Purge the now redundant local_irq_save()/local_irq_restore() pair from
> clockevents_update_freq(). Since the call to tick_broadcast_update_freq()
> isn't done with interrupts disabled anymore,  its
> raw_spin_lock()/raw_spin_unlock() pair must be converted to
> raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore().
>
> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
> ---
>  kernel/time/clockevents.c    | 7 ++++---
>  kernel/time/tick-broadcast.c | 5 +++--
>  2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index ba7fea4..ec01375 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -589,11 +589,12 @@ int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
>         unsigned long flags;
>         int ret;
>
> -       local_irq_save(flags);
>         ret = tick_broadcast_update_freq(dev, freq);
> -       if (ret == -ENODEV)
> +       if (ret == -ENODEV) {
> +               raw_spin_lock_irqsave(&clockevents_lock, flags);
>                 ret = __clockevents_update_freq(dev, freq);
> -       local_irq_restore(flags);
> +               raw_spin_unlock_irqrestore(&clockevents_lock, flags);
> +       }
>         return ret;
>  }
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index f6aae79..9c94c41 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -125,11 +125,12 @@ int tick_is_broadcast_device(struct clock_event_device *dev)
>  int tick_broadcast_update_freq(struct clock_event_device *dev, u32 freq)
>  {
>         int ret = -ENODEV;
> +       unsigned long flags;
>
>         if (tick_is_broadcast_device(dev)) {
> -               raw_spin_lock(&tick_broadcast_lock);
> +               raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
>                 ret = __clockevents_update_freq(dev, freq);
> -               raw_spin_unlock(&tick_broadcast_lock);
> +               raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
>         }


So not necessarily part of your change, but this makes using
tick_broadcast_update_freq() seem strange.

We call it and if dev is a broadcast_device we call
__clockevents_update_freq(), and if not, it fails and we then just
call __clockevents_update_freq() again?

Why bother calling tick_broadcast_update_freq here, and instead just
call __clockevents_update_freq() directly the first time?

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion
  2016-07-21 18:08   ` John Stultz
@ 2016-07-21 19:11     ` Nicolai Stange
  0 siblings, 0 replies; 9+ messages in thread
From: Nicolai Stange @ 2016-07-21 19:11 UTC (permalink / raw)
  To: John Stultz; +Cc: Nicolai Stange, Thomas Gleixner, lkml

John Stultz <john.stultz@linaro.org> writes:

> On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
>> With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
>> would expect approximately one clockevent interrupt per second. However, on
>> my Intel Haswell where the monotonic clock is the TSC monotonic clock and
>> the clockevent device is the TSC deadline device, it turns out that every
>> second, there are two such interrupts: the first one arrives always
>> approximately ~50us before the scheduled deadline as programmed by
>> tick_nohz_stop_sched_tick() through the hrtimer API. The
>> __hrtimer_run_queues() called in this interrupt detects that the queued
>> tick_sched_timer hasn't expired yet and simply does nothing except
>> reprogramming the clock event device to fire shortly after again.
>>
>> These too early programmed deadlines are explained as follows:
>> clockevents_program_event() programs the clockevent device to fire
>> after
>>   f_event * delta_t_progr
>> clockevent device cycles where f_event is the clockevent device's hardware
>> frequency and delta_t_progr is the requested time interval. After that many
>> clockevent device cycles have elapsed, the device underlying the monotonic
>> clock, that is the monotonic raw clock has seen f_raw / f_event as many
>> cycles.
>> The ktime_get() called from __hrtimer_run_queues() interprets those
>> cycles to run at the frequency of the monotonic clock. Summarizing:
>>   delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
>>                = f_raw / f_mono * delta_t_progr
>> with f_mono being the monotonic clock's frequency and delta_t_perc being
>> the elapsed time interval as perceived by __hrtimer_run_queues().
>>
>> Now, f_mono is not a constant, but is dynamically adjusted in
>> timekeeping_adjust() in order to compensate for the NTP error. With the
>> large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
>> becomes significant and results in the double timer interrupts described
>> above.
>>
>> Compensate for this error by multiplying the clockevent device's f_event
>> by f_mono/f_raw.
>>
>> Namely:
>> - Introduce a ->mult_mono member to the struct clock_event_device. It's
>>   value is supposed to be equal to ->mult * f_mono/f_raw.
>> - Introduce the timekeeping_get_mono_mult() helper which provides
>>   the clockevent core with access to the timekeeping's current f_mono
>>   and f_raw.
>> - Introduce the helper __clockevents_adjust_freq() which
>>   sets a clockevent device's ->mult_mono member as appropriate. It is
>>   implemented with the help of the new __clockevents_calc_adjust_freq().
>> - Call __clockevents_adjust_freq() at clockevent device registration time
>>   as well as at frequency updates through clockevents_update_freq().
>> - Finally, use the ->mult_mono rather than ->mult in the ns to cycle
>>   conversion made in clockevents_program_event().
>>
>> Note that future adjustments of the monotonic clock are not taken into
>> account yet. Furthemore, this patch assumes that after a clockevent
>> device's registration, its ->mult changes only through calls to
>> clockevents_update_freq().
>
> Sorry for being a little slow to review here. Been swamped.

No need for any hurry here, I don't expect this to make it into 4.8
anyway.


> I was about to queue this but had a few nits that need addressing.

There are even more "known issues" listed in the cover letter.
Given the huge amount of patches potentially required to get this into a
good shape, the question whether you want to have this at all came up
(c.f. http://lkml.kernel.org/g/alpine.DEB.2.11.1607121651580.4083@nanos).

So, once you give it a go, ideally accompanied with some comments on
which of the known issues to address and which ones to ignore, I'll send
another RFC series consisting of way more (mostly trivial) patches.


>
>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>> ---
>>  include/linux/clockchips.h  |  1 +
>>  kernel/time/clockevents.c   | 49 ++++++++++++++++++++++++++++++++++++++++++++-
>>  kernel/time/tick-internal.h |  1 +
>>  kernel/time/timekeeping.c   |  8 ++++++++
>>  4 files changed, 58 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
>> index 0d442e3..2863742 100644
>> --- a/include/linux/clockchips.h
>> +++ b/include/linux/clockchips.h
>> @@ -104,6 +104,7 @@ struct clock_event_device {
>>         u64                     max_delta_ns;
>>         u64                     min_delta_ns;
>>         u32                     mult;
>> +       u32                     mult_mono;
>
> So in this context(for me at least), mult and mult_mono are a bit
> confusing.  I tend to think of it as mult and mult_raw, but in this
> case mult is the "raw" unmodified value and mult_mono is the adjusted
> one.
>
> I'd probably suggest mult_adjusted or some other name to make it more
> clear how it differs from the clockevent mult.
>

Totally agreed. I didn't want to rename ->mult in order to avoid
touching clockevents driver initizalization code all over the tree.
But "mult_adjusted" is really more intuitive.


>>
>> +void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
>> +{
>> +       struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
>> +
>> +       *mult_cs_mono = tkr_mono->mult;
>> +       *mult_cs_raw = tkr_mono->clock->mult;
>> +}
>
> So.. you probably should have some locking here. Or at least a big
> comment making it clear why locking isn't necessary.

Agreed. To my taste, this timekeeping_get_mono_mult is an ugly hack
anyway and I'd like to make tk_core non-static instead (c.f. cover
letter). I'm lacking the needed guts though.

Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic
  2016-07-21 18:16   ` John Stultz
@ 2016-07-21 19:24     ` Nicolai Stange
  2016-07-21 19:31       ` John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Nicolai Stange @ 2016-07-21 19:24 UTC (permalink / raw)
  To: John Stultz; +Cc: Nicolai Stange, Thomas Gleixner, lkml

John Stultz <john.stultz@linaro.org> writes:

> On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
>> In order to avoid races between setting a struct clock_event_device's
>> ->mult_mono in clockevents_update_freq() and yet to be implemented updates
>> triggered from the timekeeping core, the setting of ->mult and ->mult_mono
>> should be made atomic.
>>
>> Protect the update in clockevents_update_freq() by locking the
>> clockevents_lock spinlock. Frequency updates are expected to be done
>> seldomly and thus, taking this subsystem lock should not have any impact
>> on performance.
>>
>> Use a raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore() pair for
>> locking/unlocking the clockevents_lock spinlock.
>> Purge the now redundant local_irq_save()/local_irq_restore() pair from
>> clockevents_update_freq(). Since the call to tick_broadcast_update_freq()
>> isn't done with interrupts disabled anymore,  its
>> raw_spin_lock()/raw_spin_unlock() pair must be converted to
>> raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore().
>>
>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>> ---
>>  kernel/time/clockevents.c    | 7 ++++---
>>  kernel/time/tick-broadcast.c | 5 +++--
>>  2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>> index ba7fea4..ec01375 100644
>> --- a/kernel/time/clockevents.c
>> +++ b/kernel/time/clockevents.c
>> @@ -589,11 +589,12 @@ int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
>>         unsigned long flags;
>>         int ret;
>>
>> -       local_irq_save(flags);
>>         ret = tick_broadcast_update_freq(dev, freq);
>> -       if (ret == -ENODEV)
>> +       if (ret == -ENODEV) {
>> +               raw_spin_lock_irqsave(&clockevents_lock, flags);
>>                 ret = __clockevents_update_freq(dev, freq);
>> -       local_irq_restore(flags);
>> +               raw_spin_unlock_irqrestore(&clockevents_lock, flags);
>> +       }
>>         return ret;
>>  }
>>
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index f6aae79..9c94c41 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -125,11 +125,12 @@ int tick_is_broadcast_device(struct clock_event_device *dev)
>>  int tick_broadcast_update_freq(struct clock_event_device *dev, u32 freq)
>>  {
>>         int ret = -ENODEV;
>> +       unsigned long flags;
>>
>>         if (tick_is_broadcast_device(dev)) {
>> -               raw_spin_lock(&tick_broadcast_lock);
>> +               raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
>>                 ret = __clockevents_update_freq(dev, freq);
>> -               raw_spin_unlock(&tick_broadcast_lock);
>> +               raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
>>         }
>
>
> So not necessarily part of your change, but this makes using
> tick_broadcast_update_freq() seem strange.
>
> We call it and if dev is a broadcast_device we call
> __clockevents_update_freq(), and if not, it fails and we then just
> call __clockevents_update_freq() again?

Yes, but the first call is made under a different lock than the second
one.

>
> Why bother calling tick_broadcast_update_freq here, and instead just
> call __clockevents_update_freq() directly the first time?


Thanks,

Nicolai

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic
  2016-07-21 19:24     ` Nicolai Stange
@ 2016-07-21 19:31       ` John Stultz
  0 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2016-07-21 19:31 UTC (permalink / raw)
  To: Nicolai Stange; +Cc: Thomas Gleixner, lkml

On Thu, Jul 21, 2016 at 12:24 PM, Nicolai Stange <nicstange@gmail.com> wrote:
> John Stultz <john.stultz@linaro.org> writes:
>
>> On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
>>> In order to avoid races between setting a struct clock_event_device's
>>> ->mult_mono in clockevents_update_freq() and yet to be implemented updates
>>> triggered from the timekeeping core, the setting of ->mult and ->mult_mono
>>> should be made atomic.
>>>
>>> Protect the update in clockevents_update_freq() by locking the
>>> clockevents_lock spinlock. Frequency updates are expected to be done
>>> seldomly and thus, taking this subsystem lock should not have any impact
>>> on performance.
>>>
>>> Use a raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore() pair for
>>> locking/unlocking the clockevents_lock spinlock.
>>> Purge the now redundant local_irq_save()/local_irq_restore() pair from
>>> clockevents_update_freq(). Since the call to tick_broadcast_update_freq()
>>> isn't done with interrupts disabled anymore,  its
>>> raw_spin_lock()/raw_spin_unlock() pair must be converted to
>>> raw_spin_lock_irq_save()/raw_spin_unlock_irq_restore().
>>>
>>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>>> ---
>>>  kernel/time/clockevents.c    | 7 ++++---
>>>  kernel/time/tick-broadcast.c | 5 +++--
>>>  2 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
>>> index ba7fea4..ec01375 100644
>>> --- a/kernel/time/clockevents.c
>>> +++ b/kernel/time/clockevents.c
>>> @@ -589,11 +589,12 @@ int clockevents_update_freq(struct clock_event_device *dev, u32 freq)
>>>         unsigned long flags;
>>>         int ret;
>>>
>>> -       local_irq_save(flags);
>>>         ret = tick_broadcast_update_freq(dev, freq);
>>> -       if (ret == -ENODEV)
>>> +       if (ret == -ENODEV) {
>>> +               raw_spin_lock_irqsave(&clockevents_lock, flags);
>>>                 ret = __clockevents_update_freq(dev, freq);
>>> -       local_irq_restore(flags);
>>> +               raw_spin_unlock_irqrestore(&clockevents_lock, flags);
>>> +       }
>>>         return ret;
>>>  }
>>>
>>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>>> index f6aae79..9c94c41 100644
>>> --- a/kernel/time/tick-broadcast.c
>>> +++ b/kernel/time/tick-broadcast.c
>>> @@ -125,11 +125,12 @@ int tick_is_broadcast_device(struct clock_event_device *dev)
>>>  int tick_broadcast_update_freq(struct clock_event_device *dev, u32 freq)
>>>  {
>>>         int ret = -ENODEV;
>>> +       unsigned long flags;
>>>
>>>         if (tick_is_broadcast_device(dev)) {
>>> -               raw_spin_lock(&tick_broadcast_lock);
>>> +               raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
>>>                 ret = __clockevents_update_freq(dev, freq);
>>> -               raw_spin_unlock(&tick_broadcast_lock);
>>> +               raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
>>>         }
>>
>>
>> So not necessarily part of your change, but this makes using
>> tick_broadcast_update_freq() seem strange.
>>
>> We call it and if dev is a broadcast_device we call
>> __clockevents_update_freq(), and if not, it fails and we then just
>> call __clockevents_update_freq() again?
>
> Yes, but the first call is made under a different lock than the second
> one.

Ah. Thanks, that bit didn't stick out to me.

-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-07-21 19:31 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-13 13:00 [RFC v3 0/3] adapt clockevents frequencies to mono clock Nicolai Stange
2016-07-13 13:00 ` [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion Nicolai Stange
2016-07-21 18:08   ` John Stultz
2016-07-21 19:11     ` Nicolai Stange
2016-07-13 13:00 ` [RFC v3 2/3] kernel/time/clockevents: make setting of ->mult and ->mult_mono atomic Nicolai Stange
2016-07-21 18:16   ` John Stultz
2016-07-21 19:24     ` Nicolai Stange
2016-07-21 19:31       ` John Stultz
2016-07-13 13:00 ` [RFC v3 3/3] kernel/time/timekeeping: inform clockevents about freq adjustments Nicolai Stange

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.