In-Reply-To: <20150910120220.GV24711@localhost>
References: <1441840051-20244-1-git-send-email-john.stultz@linaro.org> <1441840051-20244-2-git-send-email-john.stultz@linaro.org> <20150910120220.GV24711@localhost>
Date: Thu, 10 Sep 2015 10:42:03 -0700
Subject: Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
From: John Stultz
To: Miroslav Lichvar
Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran, Ingo Molnar, Thomas Gleixner, Shuah Khan
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar wrote:
> On Wed, Sep 09, 2015 at 04:07:31PM -0700, John Stultz wrote:
>> Recently an issue was reported that was difficult to detect except
>> by tweaking the adjtimex tick value, and noticing how quickly the
>> adjustment took to be made:
>> https://lkml.org/lkml/2015/9/1/488
>>
>> Thus this patch introduces a new test which manipulates the adjtimex
>> tick value and validates the results are what we expect.
>
>> +	if (llabs(eppm - ppm) > 10) {
>> +		printf(" [FAILED]\n");
>> +		return -1;
>> +	}
>> +	printf(" [OK]\n");
>> +	return 0;
>
> This seems to work nicely with the tsc and hpet clocksources, but for
> some reason 10 ppm is not enough with the acpi_pm clocksource on both
> machines I tried this on. They both show -99988 ppm for the first
> test.
> When I modify the program to go through errors I get:
>
> Estimating tick (act: 9000 usec, -100000 ppm): 9001 usec, -99988 ppm    [FAILED]
> Estimating tick (act: 9250 usec, -75000 ppm): 9251 usec, -74991 ppm     [OK]
> Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49994 ppm     [OK]
> Estimating tick (act: 9750 usec, -25000 ppm): 9751 usec, -24997 ppm     [OK]
> Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm             [OK]
> Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24996 ppm     [OK]
> Estimating tick (act: 10500 usec, 50000 ppm): 10499 usec, 49993 ppm     [OK]
> Estimating tick (act: 10750 usec, 75000 ppm): 10749 usec, 74990 ppm     [OK]
>
> The precision of the clock is better than microsecond, so that
> wouldn't explain a 12 ppm error over the 15 second interval. I guess
> it's due to a larger xtime_remainder, which basically is a hidden
> frequency offset added (and not multiplied) to the NTP frequency
> offset. Would that explain it?

I think it's due to the ntp_error being large enough before (or during)
the frequency transition that we're still applying a single-unit
frequency adjustment to correct that error. I'm guessing that on the
acpi_pm clocksource the shift is low enough that a single-unit
adjustment is coarse enough to affect the measured ppm, since I see the
same consistently measured ppm result even if I increase both the
settling time and the measurement sleep times. If I left it running for
a long, long time, the single-unit correction would likely null the
error out and we'd get the desired result, but I don't think the test
has time for that.

The short-term answer is likely to widen the acceptable range for
passing the test.

Long term, we can look at further improving the error accumulation. I'm
thinking your earlier approach of doing the more expensive division,
instead of the approximation over a series of ticks, might reduce the
error generated during that transition. So that might be one approach.
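For reference, the expected ("act:") values in the quoted output follow from treating tick as microseconds per 10 ms USER_HZ tick, so each microsecond of tick corresponds to 100 ppm. The C sketch here is only an illustration under that assumption: tick_to_ppm(), ppm_within(), approx_mult() and ppm_per_mult_unit() are hypothetical helpers, not functions from the kselftest or the kernel. It also shows why a coarse multiplier matters: with the clocksource conversion ns = (cycles * mult) >> shift, moving mult by one unit changes the rate by roughly 1e6 / mult ppm, so a smaller shift (and hence a smaller mult) means coarser single-unit steps. approx_mult() only roughly follows how a mult is derived for a chosen shift; it makes no claim about the actual mult/shift the kernel picks for acpi_pm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nominal adjtimex() tick length in usec, assuming USER_HZ == 100. */
#define NOMINAL_TICK_USEC 10000LL
#define NSEC_PER_SEC 1000000000ULL

/* Frequency offset implied by a tick value:
 * (tick - 10000) / 10000 * 1e6 ppm, i.e. (tick - 10000) * 100. */
static long long tick_to_ppm(long long tick_usec)
{
	return (tick_usec - NOMINAL_TICK_USEC) * 1000000LL / NOMINAL_TICK_USEC;
}

/* Same pass/fail rule as the quoted test, with the bound as a parameter:
 * returns 1 if the measured offset is within tolerance_ppm of expected. */
static int ppm_within(long long measured_ppm, long long expected_ppm,
		      long long tolerance_ppm)
{
	return llabs(measured_ppm - expected_ppm) <= tolerance_ppm;
}

/* Hypothetical helper: approximate multiplier for a given rate and shift,
 * so that ns ~= (cycles * mult) >> shift, rounded to nearest. */
static uint32_t approx_mult(uint64_t freq_hz, unsigned int shift)
{
	uint64_t mult;

	assert(freq_hz > 0 && shift < 32);
	mult = ((NSEC_PER_SEC << shift) + freq_hz / 2) / freq_hz;
	assert(mult > 0 && mult <= UINT32_MAX);
	return (uint32_t)mult;
}

/* Approximate rate change, in ppm, from moving mult by a single unit. */
static double ppm_per_mult_unit(uint32_t mult)
{
	assert(mult > 0);
	return 1e6 / (double)mult;
}
```

With these helpers, ppm_within(-99988, -100000, 10) is false while a 20 ppm bound accepts it, which matches the short-term option of widening the pass range; ppm_per_mult_unit(approx_mult(freq, shift)) gives the step size for a given (hypothetical) rate and shift.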
Pondering this a bit more, I'm thinking that while it's ideally nice to
keep the ntp_error true to the difference between where the system time
is and where it's been told to be, I'm not sure that full history makes
total sense. If ntpd has specified a different frequency, it may not
make sense to try to correct the accumulated error from the past: at
that point, if ntpd has looked at where we are and is specifying a new
frequency, in some ways it's already accounting for the current
uncorrected error. So we might consider just clearing the ntp_error
after the approximation is finished. Though I probably need to think on
this approach a bit more.

Your thoughts?

thanks
-john
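The trade-off in the last paragraph (carry the accumulated error across a frequency change, or clear it) can be illustrated with a toy model. This is a hypothetical sketch, not the kernel's timekeeping code: struct toy_clock, toy_tick(), toy_set_freq() and toy_measure() are invented names, the per-tick increment base stands in for the frequency requested by ntpd, error stands in for ntp_error, and the correction is a fixed single unit chosen from the sign of the error, as described above.

```c
#include <assert.h>

/* Toy model only: each tick the clock advances by base + corr units,
 * where corr is a single-unit correction picked from the error sign. */
struct toy_clock {
	long long base;   /* ideal per-tick increment for the requested freq */
	long long error;  /* accumulated (ideal - actual), akin to ntp_error */
};

/* Advance one tick and return how far the clock moved. */
static long long toy_tick(struct toy_clock *c)
{
	long long corr = (c->error > 0) - (c->error < 0);
	long long step = c->base + corr;

	c->error += c->base - step;  /* ideal increment is exactly base */
	return step;
}

/* Frequency change; clear_error models dropping the old error history. */
static void toy_set_freq(struct toy_clock *c, long long new_base,
			 int clear_error)
{
	c->base = new_base;
	if (clear_error)
		c->error = 0;
}

/* Average per-tick step over n ticks, i.e. what a test would measure. */
static double toy_measure(struct toy_clock *c, int n)
{
	long long total = 0;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		total += toy_tick(c);
	return (double)total / n;
}
```

For example, with 50 units of leftover error that are kept, the first 50 ticks after a frequency change each run one unit fast, so a short measurement window sees a biased rate; with clear_error set, the measured rate matches the new base immediately.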