From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753633AbcGUTLr (ORCPT <rfc822;w@1wt.eu>);
	Thu, 21 Jul 2016 15:11:47 -0400
Received: from mail-wm0-f66.google.com ([74.125.82.66]:34255 "EHLO
	mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752175AbcGUTLp (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 21 Jul 2016 15:11:45 -0400
From: Nicolai Stange <nicstange@gmail.com>
To: John Stultz <john.stultz@linaro.org>
Cc: Nicolai Stange <nicstange@gmail.com>, Thomas Gleixner <tglx@linutronix.de>,
        lkml <linux-kernel@vger.kernel.org>
Subject: Re: [RFC v3 1/3] kernel/time/clockevents: initial support for mono to raw time conversion
References: <20160713130017.8202-1-nicstange@gmail.com>
	<20160713130017.8202-2-nicstange@gmail.com>
	<CALAqxLUw4soqVKQaaojVqF=FwDjDZxUEAqQHEA7nY0xe21BJ3A@mail.gmail.com>
Date: Thu, 21 Jul 2016 21:11:41 +0200
In-Reply-To: <CALAqxLUw4soqVKQaaojVqF=FwDjDZxUEAqQHEA7nY0xe21BJ3A@mail.gmail.com>
	(John Stultz's message of "Thu, 21 Jul 2016 11:08:27 -0700")
Message-ID: <87shv2reky.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.95 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

John Stultz <john.stultz@linaro.org> writes:

> On Wed, Jul 13, 2016 at 6:00 AM, Nicolai Stange <nicstange@gmail.com> wrote:
>> With NOHZ_FULL and one single well-isolated, CPU consumptive task, one
>> would expect approximately one clockevent interrupt per second. However, on
>> my Intel Haswell where the monotonic clock is the TSC monotonic clock and
>> the clockevent device is the TSC deadline device, it turns out that every
>> second, there are two such interrupts: the first one always arrives
>> approximately 50us before the scheduled deadline as programmed by
>> tick_nohz_stop_sched_tick() through the hrtimer API. The
>> __hrtimer_run_queues() called in this interrupt detects that the queued
>> tick_sched_timer hasn't expired yet and simply does nothing except
>> reprogramming the clock event device to fire shortly after again.
>>
>> These too early programmed deadlines are explained as follows:
>> clockevents_program_event() programs the clockevent device to fire
>> after
>>   f_event * delta_t_progr
>> clockevent device cycles where f_event is the clockevent device's hardware
>> frequency and delta_t_progr is the requested time interval. After that many
>> clockevent device cycles have elapsed, the device underlying the monotonic
>> clock, that is, the monotonic raw clock, has seen f_raw / f_event times as
>> many cycles.
>> The ktime_get() called from __hrtimer_run_queues() interprets those
>> cycles to run at the frequency of the monotonic clock. Summarizing:
>>   delta_t_perc = 1/f_mono * f_raw/f_event * f_event * delta_t_progr
>>                = f_raw / f_mono * delta_t_progr
>> with f_mono being the monotonic clock's frequency and delta_t_perc being
>> the elapsed time interval as perceived by __hrtimer_run_queues().
>>
>> Now, f_mono is not a constant, but is dynamically adjusted in
>> timekeeping_adjust() in order to compensate for the NTP error. With the
>> large values of delta_t_progr of 10^9ns with NOHZ_FULL, the error made
>> becomes significant and results in the double timer interrupts described
>> above.
>>
>> Compensate for this error by multiplying the clockevent device's f_event
>> by f_mono/f_raw.
>>
>> Namely:
>> - Introduce a ->mult_mono member to the struct clock_event_device. Its
>>   value is supposed to be equal to ->mult * f_mono/f_raw.
>> - Introduce the timekeeping_get_mono_mult() helper which provides
>>   the clockevent core with access to the timekeeping's current f_mono
>>   and f_raw.
>> - Introduce the helper __clockevents_adjust_freq() which
>>   sets a clockevent device's ->mult_mono member as appropriate. It is
>>   implemented with the help of the new __clockevents_calc_adjust_freq().
>> - Call __clockevents_adjust_freq() at clockevent device registration time
>>   as well as at frequency updates through clockevents_update_freq().
>> - Finally, use the ->mult_mono rather than ->mult in the ns to cycle
>>   conversion made in clockevents_program_event().
>>
>> Note that future adjustments of the monotonic clock are not taken into
>> account yet. Furthermore, this patch assumes that after a clockevent
>> device's registration, its ->mult changes only through calls to
>> clockevents_update_freq().
>
> Sorry for being a little slow to review here. Been swamped.

No need to hurry here; I don't expect this to make it into 4.8
anyway.


> I was about to queue this but had a few nits that need addressing.

There are even more "known issues" listed in the cover letter.
Given the large number of patches potentially required to get this into
good shape, the question of whether you want to have this at all came up
(cf. http://lkml.kernel.org/g/alpine.DEB.2.11.1607121651580.4083@nanos).

So, once you give the go-ahead, ideally accompanied by some comments on
which of the known issues to address and which ones to ignore, I'll send
another RFC series consisting of many more (mostly trivial) patches.


>
>> Signed-off-by: Nicolai Stange <nicstange@gmail.com>
>> ---
>>  include/linux/clockchips.h  |  1 +
>>  kernel/time/clockevents.c   | 49 ++++++++++++++++++++++++++++++++++++++++++++-
>>  kernel/time/tick-internal.h |  1 +
>>  kernel/time/timekeeping.c   |  8 ++++++++
>>  4 files changed, 58 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
>> index 0d442e3..2863742 100644
>> --- a/include/linux/clockchips.h
>> +++ b/include/linux/clockchips.h
>> @@ -104,6 +104,7 @@ struct clock_event_device {
>>         u64                     max_delta_ns;
>>         u64                     min_delta_ns;
>>         u32                     mult;
>> +       u32                     mult_mono;
>
> So in this context(for me at least), mult and mult_mono are a bit
> confusing.  I tend to think of it as mult and mult_raw, but in this
> case mult is the "raw" unmodified value and mult_mono is the adjusted
> one.
>
> I'd probably suggest mult_adjusted or some other name to make it more
> clear how it differs from the clockevent mult.
>

Totally agreed. I didn't want to rename ->mult in order to avoid
touching clockevent driver initialization code all over the tree.
But "mult_adjusted" is really more intuitive.
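
For illustration only, here is a minimal userspace sketch of the
->mult * f_mono/f_raw scaling described in the patch. It is not the code
from the series: calc_mult_adjusted() is a hypothetical name, and the
sketch assumes that both clocksource mults share the same shift, so that
f_mono/f_raw equals mult_cs_raw/mult_cs_mono (a clocksource mult converts
cycles to ns and is thus inversely proportional to the assumed frequency).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patch: scale a clockevent
 * ns -> cycles multiplier by f_mono/f_raw == mult_cs_raw/mult_cs_mono.
 */
static uint32_t calc_mult_adjusted(uint32_t mult_ce, uint32_t mult_cs_mono,
				   uint32_t mult_cs_raw)
{
	uint64_t adj;

	/* Nothing sensible to scale by; fall back to the unadjusted mult. */
	if (!mult_cs_mono)
		return mult_ce;

	/* A 32x32 bit product plus half the divisor fits into 64 bits. */
	adj = (uint64_t)mult_ce * mult_cs_raw + mult_cs_mono / 2;
	adj /= mult_cs_mono;

	/* Saturate rather than wrap if the result does not fit into u32. */
	return adj > UINT32_MAX ? UINT32_MAX : (uint32_t)adj;
}
```

With the made-up values mult_cs_raw = 8000000 and mult_cs_mono = 7999600
(the monotonic clock advancing about 50ppm slower than raw, which over a
1s interval corresponds to the ~50us early wakeup), a clockevent mult of
1000000 becomes 1000050.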


>>
>> +void timekeeping_get_mono_mult(u32 *mult_cs_mono, u32 *mult_cs_raw)
>> +{
>> +       struct tk_read_base *tkr_mono = &tk_core.timekeeper.tkr_mono;
>> +
>> +       *mult_cs_mono = tkr_mono->mult;
>> +       *mult_cs_raw = tkr_mono->clock->mult;
>> +}
>
> So.. you probably should have some locking here. Or at least a big
> comment making it clear why locking isn't necessary.

Agreed. To my taste, this timekeeping_get_mono_mult() is an ugly hack
anyway and I'd like to make tk_core non-static instead (cf. cover
letter). I'm lacking the needed guts though.
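
As for the locking itself: one option, sketched here in userspace C11
rather than as kernel code, would be a sequence-counter read loop so that
both multipliers are taken from the same update. My understanding is that
the timekeeping core already follows this pattern with
read_seqcount_begin()/read_seqcount_retry() on tk_core.seq; tk_seq,
tk_mults_write() and tk_mults_read() below are hypothetical stand-ins,
and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for tk_core.seq and the two multipliers. */
static atomic_uint tk_seq;
static _Atomic uint32_t tk_mult_mono;
static _Atomic uint32_t tk_mult_raw;

/* Writer side: an odd sequence count marks an update in progress. */
static void tk_mults_write(uint32_t mono, uint32_t raw)
{
	unsigned int seq = atomic_load_explicit(&tk_seq, memory_order_relaxed);

	atomic_store_explicit(&tk_seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&tk_mult_mono, mono, memory_order_relaxed);
	atomic_store_explicit(&tk_mult_raw, raw, memory_order_relaxed);
	atomic_store_explicit(&tk_seq, seq + 2, memory_order_release);
}

/* Reader side: retry until both values stem from the same update. */
static void tk_mults_read(uint32_t *mono, uint32_t *raw)
{
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&tk_seq, memory_order_acquire);
		*mono = atomic_load_explicit(&tk_mult_mono, memory_order_relaxed);
		*raw = atomic_load_explicit(&tk_mult_raw, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 atomic_load_explicit(&tk_seq, memory_order_relaxed) != seq);
}
```

The reader never blocks the writer; it merely repeats the two loads if an
update raced with them.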

Thanks,

Nicolai