From: Nicolai Stange <nicstange@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>,
	linux-kernel@vger.kernel.org,
	Nicolai Stange <nicstange@gmail.com>
Subject: [RFC v6 16/23] clockevents: clockevents_program_min_delta(): don't set ->next_event
Date: Fri,  9 Sep 2016 22:18:05 +0200
Message-ID: <20160909201812.32396-1-nicstange@gmail.com>
In-Reply-To: <20160909200033.32103-1-nicstange@gmail.com>

Currently, clockevents_program_min_delta() sets a clockevent device's
->next_event to the point in time where the minimum delta would actually
expire:

  delta = dev->min_delta_ns;
  dev->next_event = ktime_add_ns(ktime_get(), delta);

For reference, this has been the case ever since
clockevents_program_min_delta() was introduced with
  commit d1748302f70b ("clockevents: Make minimum delay adjustments
                        configurable").

clockevents_program_min_delta() is called from clockevents_program_event()
only. More specifically, it is called if the latter's force argument is set
and, neglecting the case of device programming failure for the moment, if
the requested expiry is in the past.

In contrast, if the expiry requested from clockevents_program_event()
is in the future, but less than ->min_delta_ns away, then
- ->next_event gets set to that expiry verbatim
- but the clockevent device gets silently programmed to fire only after
  ->min_delta_ns.

Thus, in the extreme cases of expires == ktime_get() and
expires == ktime_get() + 1, the respective values of ->next_event would
differ by ->min_delta_ns while the clockevent device would actually get
programmed to fire at (almost) the same times (with force being set,
of course).
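
To make the discontinuity concrete, here is a minimal userspace sketch of
the forced programming path. It is not kernel code: struct model_dev,
model_program_event() and the old_behaviour switch are hypothetical names
made up for illustration, time is passed in as plain nanosecond integers
instead of being read via ktime_get(), and device programming failures
are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant clock_event_device fields. */
struct model_dev {
	int64_t min_delta_ns;	/* smallest delta the device accepts */
	int64_t next_event;	/* what the core records as ->next_event */
	int64_t fires_at;	/* when the modelled device actually fires */
};

/*
 * Model of clockevents_program_event() with force set.
 * old_behaviour == true: the min delta path overwrites ->next_event.
 * old_behaviour == false: ->next_event always is the requested expiry.
 */
static void model_program_event(struct model_dev *dev, int64_t now,
				int64_t expires, bool old_behaviour)
{
	int64_t delta = expires - now;

	dev->next_event = expires;

	if (delta <= 0) {
		/* Expiry not in the future: minimum delta path. */
		if (old_behaviour)
			dev->next_event = now + dev->min_delta_ns;
		dev->fires_at = now + dev->min_delta_ns;
		return;
	}

	/* Expiry in the future: silently clamp the delta from below. */
	if (delta < dev->min_delta_ns)
		delta = dev->min_delta_ns;
	dev->fires_at = now + delta;
}
```

With min_delta_ns = 1000 and now = 5000, the old behaviour records
->next_event = 6000 for expires = 5000 but ->next_event = 5001 for
expires = 5001, although the modelled device fires at 6000 in both cases.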

While this discontinuity of ->next_event at expires == ktime_get() is not
a problem by itself, the mere use of ->min_delta_ns in the event
programming path hinders upcoming changes making the clockevent core
NTP correction aware: both ->mult and ->min_delta_ns would need to be
updated as well as consumed atomically, and we'd rather avoid any
locking here.

Thus, let clockevents_program_event() unconditionally set ->next_event to
the expiry time actually requested by its caller, i.e. don't set
->next_event from clockevents_program_min_delta().

A few notes on why this change is safe with the current consumers of
->next_event:
1.
Note that a clockevents_program_event() with a requested expiry in the
past and force being set basically means: "fire ASAP". Now, consider an
event programmed this way being handed to clockevents_program_event()
once again, i.e. a

  clockevents_program_event(dev, dev->next_event, false)

as done in __clockevents_update_freq().
With this change applied, clockevents_program_event() would now properly
detect the expiry being in the past and, due to the force argument being
unset, wouldn't actually do anything.
Before this change OTOH, there would be the (very unlikely) possibility
that the requested event is still somewhere in the future and
clockevents_program_event() would silently delay the event expiration by
another ->min_delta_ns.

2.
The periodic tick handlers on oneshot-only devices use ->next_event
to calculate the followup expiry time.
tick_handle_periodic() spins on reprogramming the clockevent device
until some expiry in the future has been reached:

  ktime_t next = dev->next_event;
  ...
  for(;;) {
    next = ktime_add(next, tick_period);
    if (!clockevents_program_event(dev, next, false))
      return;
    ...
  }

Thus, tick_handle_periodic() isn't affected by this change.
For tick_handle_periodic_broadcast(), the situation is different since

  commit 2951d5c031a3 ("tick: broadcast: Prevent livelock from event
                        handler")

though: a loop similar to the one from tick_handle_periodic() above got
replaced by a single

  ktime_t next = ktime_add(dev->next_event, tick_period);
  clockevents_program_event(dev, next, true);

In the case that dev->next_event + tick_period happens to be less than
ktime_get() + ->min_delta_ns, without this change applied, ->next_event
would get recovered to some point in the future after a single
tick_handle_periodic_broadcast() event.
In contrast, with this patch applied, it could potentially take some
number of tick_handle_periodic_broadcast() events, each separated by
->min_delta_ns only, until ->next_event is able to catch up with the
current ktime_get(). However, if this turns out to become a problem,
the reprogramming loop in tick_handle_periodic_broadcast() can probably
be restored easily.
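
As a rough illustration of that catch-up behaviour, the following
userspace sketch counts how many forced reprogrammings it takes until the
requested expiry is again at least ->min_delta_ns ahead. It is a
simplified model, not kernel code: model_catch_up_events() is a
hypothetical helper, the handler is assumed to run with zero latency
exactly when the device fires, and tick_period is assumed to be larger
than min_delta_ns.

```c
#include <stdint.h>

/*
 * Model of repeated
 *   next = dev->next_event + tick_period;
 *   clockevents_program_event(dev, next, true);
 * with this patch applied, i.e. ->next_event is always set to next.
 *
 * Returns the number of handler invocations needed until the requested
 * expiry is at least min_delta_ns ahead of now, or 0 if the model's
 * assumption tick_period > min_delta_ns does not hold.
 */
static unsigned int model_catch_up_events(int64_t now, int64_t next_event,
					  int64_t tick_period,
					  int64_t min_delta_ns)
{
	unsigned int events = 0;

	if (tick_period <= min_delta_ns)
		return 0;

	for (;;) {
		int64_t next = next_event + tick_period;

		events++;
		next_event = next;

		/* Far enough ahead: the device fires at next as requested. */
		if (next - now >= min_delta_ns)
			return events;

		/* Otherwise the device fires after min_delta_ns only. */
		now += min_delta_ns;
	}
}
```

In this model, each such event advances ->next_event by tick_period while
time advances by min_delta_ns only, so the lag shrinks by
tick_period - min_delta_ns per event.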

3.
In kernel/time/tick-broadcast.c, the broadcast receiving clockevent
devices' ->next_event is read multiple times in order to determine who's
next or who must be pinged. These uses all continue to work. Moreover,
clockevent devices getting programmed to something less than
ktime_get() + ->min_delta_ns
might not be the best candidates for a transition into C3 anyway.

4.
Finally, a "sleep length" is calculated at the very end of
tick_nohz_stop_sched_tick() as follows:

  ts->sleep_length = ktime_sub(dev->next_event, now);

AFAICS, this can already be negative without this change applied: in
NOHZ_MODE_HIGHRES mode there can be some overdue hrtimers whose removal is
blocked because tick_nohz_stop_sched_tick() gets called with interrupts
disabled. Unfortunately, the only user, the menu cpuidle governor,
can't cope with negative sleep lengths as it casts the return value
of the tick_nohz_get_sleep_length() getter to an unsigned int.
This change can very well make things worse here. A followup patch
will force this ->sleep_length to >= 0.
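
The effect of such a cast can be sketched in plain userspace C. The two
helpers below are hypothetical and only mimic a signed 64 bit nanosecond
value ending up in an unsigned int. The conversion to microseconds is an
assumption made for the sake of the example, as is the clamping, which
merely illustrates the idea of forcing the value to >= 0 and is not the
followup patch itself.

```c
#include <stdint.h>

/* A negative sleep length wraps around to a huge unsigned value. */
static unsigned int model_sleep_us_unclamped(int64_t sleep_length_ns)
{
	return (unsigned int)(sleep_length_ns / 1000);
}

/* Forcing the sleep length to >= 0 before the cast avoids the wrap. */
static unsigned int model_sleep_us_clamped(int64_t sleep_length_ns)
{
	if (sleep_length_ns < 0)
		sleep_length_ns = 0;
	return (unsigned int)(sleep_length_ns / 1000);
}
```

For a sleep length of -5000 ns, the unclamped variant yields
(unsigned int)-5, i.e. a value close to UINT_MAX, while the clamped
variant yields 0.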

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
---
 kernel/time/clockevents.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index f41f584..8983fee 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -252,7 +252,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
 
 	for (i = 0;;) {
 		delta = dev->min_delta_ns;
-		dev->next_event = ktime_add_ns(ktime_get(), delta);
 
 		if (clockevent_state_shutdown(dev))
 			return 0;
@@ -289,7 +288,6 @@ static int clockevents_program_min_delta(struct clock_event_device *dev)
 	int64_t delta;
 
 	delta = dev->min_delta_ns;
-	dev->next_event = ktime_add_ns(ktime_get(), delta);
 
 	if (clockevent_state_shutdown(dev))
 		return 0;
-- 
2.9.3