From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753271AbbIJRmI (ORCPT <rfc822;w@1wt.eu>);
	Thu, 10 Sep 2015 13:42:08 -0400
Received: from mail-ig0-f178.google.com ([209.85.213.178]:35984 "EHLO
	mail-ig0-f178.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750970AbbIJRmF (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 10 Sep 2015 13:42:05 -0400
MIME-Version: 1.0
In-Reply-To: <20150910120220.GV24711@localhost>
References: <1441840051-20244-1-git-send-email-john.stultz@linaro.org>
	<1441840051-20244-2-git-send-email-john.stultz@linaro.org>
	<20150910120220.GV24711@localhost>
Date: Thu, 10 Sep 2015 10:42:03 -0700
Message-ID: <CALAqxLWdLEBXHc2B5KTyKYUHBat5iNPPZ+wM7=DNevq64yewuQ@mail.gmail.com>
Subject: Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate
 adjtimex() tick adjustments
From: John Stultz <john.stultz@linaro.org>
To: Miroslav Lichvar <mlichvar@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
        =?UTF-8?Q?Nuno_Gon=C3=A7alves?= <nunojpg@gmail.com>,
        Prarit Bhargava <prarit@redhat.com>,
        Richard Cochran <richardcochran@gmail.com>,
        Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
        Shuah Khan <shuahkh@osg.samsung.com>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Wed, Sep 09, 2015 at 04:07:31PM -0700, John Stultz wrote:
>> Recently an issue was reported that was difficult to detect except
>> by tweaking the adjtimex tick value, and noticing how quickly the
>> adjustment took to be made:
>>       https://lkml.org/lkml/2015/9/1/488
>>
>> Thus this patch introduces a new test which manipulates the adjtimex
>> tick value and validates the results are what we expect.
>
>> +     if (llabs(eppm - ppm) > 10) {
>> +             printf("        [FAILED]\n");
>> +             return -1;
>> +     }
>> +     printf("        [OK]\n");
>> +     return  0;
>
> This seems to work nicely with the tsc and hpet clocksources, but for
> some reason 10 ppm is not enough with the acpi_pm clocksource on both
> machines I tried this on. They both show -99988 ppm for the first
> test. When I modify the program to go through errors I get:
>
> Estimating tick (act: 9000 usec, -100000 ppm): 9001 usec, -99988 ppm    [FAILED]
> Estimating tick (act: 9250 usec, -75000 ppm): 9251 usec, -74991 ppm     [OK]
> Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49994 ppm     [OK]
> Estimating tick (act: 9750 usec, -25000 ppm): 9751 usec, -24997 ppm     [OK]
> Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm     [OK]
> Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24996 ppm     [OK]
> Estimating tick (act: 10500 usec, 50000 ppm): 10499 usec, 49993 ppm     [OK]
> Estimating tick (act: 10750 usec, 75000 ppm): 10749 usec, 74990 ppm     [OK]
>
> The precision of the clock is better than microsecond, so that
> wouldn't explain a 12 ppm error over the 15 second interval. I guess
> it's due to a larger xtime_remainder, which basically is a hidden
> frequency offset added (and not multiplied) to the NTP frequency
> offset. Would that explain it?

I think it's due to the ntp_error being large enough prior to (or
during) the freq transition that we're still applying a single unit
freq adjustment for that error. But I'm guessing that on the acpi_pm
clocksource the shift is low enough that a single unit adjustment is
coarse enough to affect the ppm, since I see the same consistently
measured ppm result if I increase both the settling time and the
measurement sleep times. If I left it for a long, long time, the single
unit correction would likely null the error out and we'd get the
desired result, but I don't think the test has time for that.
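
To put rough numbers on "coarse enough to affect the ppm": moving the
clocksource multiplier by one unit changes the rate by 1/mult of
nominal, so a smaller shift (and thus a smaller mult) makes each unit
step larger. The sketch below is only an illustration of that
arithmetic. The helper names are made up for this mail, and the shift
values used in the usage note are hypothetical, not what the kernel
actually selects for acpi_pm.

```c
#include <stdint.h>

/*
 * Nominal cycles-to-nanoseconds multiplier for a counter running at
 * freq_hz, given a shift: ns = (cycles * mult) >> shift.
 * Hypothetical helper; the caller must pick a shift small enough that
 * the result fits in 32 bits.
 */
static uint32_t nominal_mult(uint64_t freq_hz, unsigned int shift)
{
	return (uint32_t)(((uint64_t)1000000000 << shift) / freq_hz);
}

/*
 * Frequency change, in ppm, caused by adjusting mult by a single unit.
 * One unit out of mult is a relative change of 1/mult.
 */
static double mult_unit_ppm(uint32_t mult)
{
	return 1e6 / (double)mult;
}
```

For the 3579545 Hz ACPI PM timer, an assumed shift of 8 gives a mult of
71517 and a unit step of roughly 14 ppm, while an assumed shift of 22
gives a unit step well under 0.001 ppm. Whether the real acpi_pm values
land anywhere near the coarse end is exactly the part I'm guessing at.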

The short term answer is likely to up the acceptable range for passing
the test.
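
As a rough sketch of what that could look like, the check from the
patch can take the tolerance as a parameter instead of hard-coding 10.
The helper names and the nominal tick constant are assumptions for
illustration (the 10000 usec nominal tick matches the output quoted
above), and I haven't settled on what the new limit should be.

```c
#include <stdlib.h>

/* Nominal tick length in usec; assumes USER_HZ == 100 as in the output above. */
#define NOMINAL_TICK_USEC 10000

/* Expected frequency offset, in ppm, for a given adjtimex tick value. */
static long long tick_to_ppm(long tick_usec)
{
	return ((long long)tick_usec - NOMINAL_TICK_USEC) * 1000000LL /
		NOMINAL_TICK_USEC;
}

/* Returns 1 if the estimated ppm is within tol ppm of the expected ppm. */
static int ppm_within_tolerance(long long eppm, long long ppm, long long tol)
{
	return llabs(eppm - ppm) <= tol;
}
```

With the numbers from the acpi_pm run, tick_to_ppm(9000) is -100000,
and an estimate of -99988 fails with a 10 ppm limit but would pass
with, say, a 20 ppm limit.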

Long term, we can look at further improving the error accumulation.

I'm thinking your earlier approach of doing the more expensive
division instead of the approximation over a series of ticks might
reduce the error generated during that transition.  So that might be
one approach.

Pondering a bit on this, I'm thinking that while it's ideally nice to
keep the ntp_error true to the difference between where the system
time is and where it's been told to be, I'm not sure that full history
makes total sense. If ntpd has specified a different frequency, it may
not make sense to try to correct the accumulated error from the past,
since at that point, if ntpd has looked at where we are and is
specifying a new freq, in some ways it's accounting for the current
uncorrected error. So we might just consider clearing the ntp_error
after the approximation is finished. Though I probably need to think
on this approach a bit more.

Your thoughts?

thanks
-john