[PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
@ 2015-09-09 23:07 John Stultz
  2015-09-09 23:07 ` [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments John Stultz
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: John Stultz @ 2015-09-09 23:07 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Nuno Gonçalves, Miroslav Lichvar,
	Prarit Bhargava, Richard Cochran, Ingo Molnar, Thomas Gleixner,
	stable

The internal clocksteering done for fine-grained error correction
uses a logarithmic approximation, so any time adjtimex() adjusts
the clock steering, timekeeping_freqadjust() quickly approximates
the correct clock frequency over a series of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit dc491596f639438 (Rework frequency adjustments to work
better w/ nohz), used the abs() function with a s64 error value
to calculate the size of the approximated adjustment to be made.

Per include/linux/kernel.h: "abs() should not be used for 64-bit
types (s64, u64, long long) - use abs64()".

Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large adjustments
are made).

This patch fixes the issue by using abs64() instead.

Cc: Nuno Gonçalves <nunojpg@gmail.com>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v3.17+
Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Tested-by: Nuno Goncalves <nunojpg@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f6ee2e6..3739ac6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	negative = (tick_error < 0);

 	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
+	tick_error = abs64(tick_error);
 	for (adj = 0; tick_error > interval; adj++)
 		tick_error >>= 1;

-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-09 23:07 [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() John Stultz
@ 2015-09-09 23:07 ` John Stultz
  2015-09-10 12:02   ` Miroslav Lichvar
  2015-09-13  8:32 ` [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() Ingo Molnar
  2015-09-13 11:07 ` [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s " tip-bot for John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2015-09-09 23:07 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Nuno Gonçalves, Miroslav Lichvar,
	Prarit Bhargava, Richard Cochran, Ingo Molnar, Thomas Gleixner,
	Shuah Khan

Recently an issue was reported that was difficult to detect except
by tweaking the adjtimex tick value, and noticing how quickly the
adjustment took to be made:
	https://lkml.org/lkml/2015/9/1/488

Thus this patch introduces a new test which manipulates the adjtimex
tick value and validates the results are what we expect.

Cc: Nuno Gonçalves <nunojpg@gmail.com>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2: Use sysconf(_SC_CLK_TCK) to properly get USER_HZ value.

 tools/testing/selftests/timers/Makefile  |   3 +-
 tools/testing/selftests/timers/adjtick.c | 199 +++++++++++++++++++++++++++++++
 2 files changed, 201 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timers/adjtick.c

diff --git a/tools/testing/selftests/timers/Makefile b/tools/testing/selftests/timers/Makefile
index 89a3f44..4a1be1b 100644
--- a/tools/testing/selftests/timers/Makefile
+++ b/tools/testing/selftests/timers/Makefile
@@ -8,7 +8,7 @@ LDFLAGS += -lrt -lpthread
 TEST_PROGS = posix_timers nanosleep nsleep-lat set-timer-lat mqueue-lat \
 	     inconsistency-check raw_skew threadtest rtctest
 
-TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex change_skew \
+TEST_PROGS_EXTENDED = alarmtimer-suspend valid-adjtimex adjtick change_skew \
 		      skew_consistency clocksource-switch leap-a-day \
 		      leapcrash set-tai set-2038
 
@@ -24,6 +24,7 @@ include ../lib.mk
 run_destructive_tests: run_tests
 	./alarmtimer-suspend
 	./valid-adjtimex
+	./adjtick
 	./change_skew
 	./skew_consistency
 	./clocksource-switch
diff --git a/tools/testing/selftests/timers/adjtick.c b/tools/testing/selftests/timers/adjtick.c
new file mode 100644
index 0000000..c6efcec
--- /dev/null
+++ b/tools/testing/selftests/timers/adjtick.c
@@ -0,0 +1,199 @@
+/* adjtimex() tick adjustment test
+ *		by:   John Stultz <john.stultz@linaro.org>
+ *		(C) Copyright Linaro Limited 2015
+ *		Licensed under the GPLv2
+ *
+ *  To build:
+ *	$ gcc adjtick.c -o adjtick -lrt
+ *
+ *   This program is free software: you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, either version 2 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This program is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *   GNU General Public License for more details.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <sys/time.h>
+#include <sys/timex.h>
+#include <time.h>
+#ifdef KTEST
+#include "../kselftest.h"
+#else
+static inline int ksft_exit_pass(void)
+{
+	exit(0);
+}
+static inline int ksft_exit_fail(void)
+{
+	exit(1);
+}
+#endif
+
+
+#define CLOCK_MONOTONIC_RAW		4
+#define NSEC_PER_SEC 1000000000LL
+#define USEC_PER_SEC 1000000
+#define MILLION 1000000
+
+long systick;
+
+long long llabs(long long val)
+{
+	if (val < 0)
+		val = -val;
+	return val;
+}
+
+unsigned long long ts_to_nsec(struct timespec ts)
+{
+	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
+}
+
+struct timespec nsec_to_ts(long long ns)
+{
+	struct timespec ts;
+
+	ts.tv_sec = ns/NSEC_PER_SEC;
+	ts.tv_nsec = ns%NSEC_PER_SEC;
+	return ts;
+}
+
+long long diff_timespec(struct timespec start, struct timespec end)
+{
+	long long start_ns, end_ns;
+
+	start_ns = ts_to_nsec(start);
+	end_ns = ts_to_nsec(end);
+	return end_ns - start_ns;
+}
+
+void get_monotonic_and_raw(struct timespec *mon, struct timespec *raw)
+{
+	struct timespec start, mid, end;
+	long long diff = 0, tmp;
+	int i;
+
+	clock_gettime(CLOCK_MONOTONIC, mon);
+	clock_gettime(CLOCK_MONOTONIC_RAW, raw);
+
+	/* Try to get a more tightly bound pairing */
+	for (i = 0; i < 3; i++) {
+		long long newdiff;
+
+		clock_gettime(CLOCK_MONOTONIC, &start);
+		clock_gettime(CLOCK_MONOTONIC_RAW, &mid);
+		clock_gettime(CLOCK_MONOTONIC, &end);
+
+		newdiff = diff_timespec(start, end);
+		if (diff == 0 || newdiff < diff) {
+			diff = newdiff;
+			*raw = mid;
+			tmp = (ts_to_nsec(start) + ts_to_nsec(end))/2;
+			*mon = nsec_to_ts(tmp);
+		}
+	}
+}
+
+long long get_ppm_drift(void)
+{
+	struct timespec mon_start, raw_start, mon_end, raw_end;
+	long long delta1, delta2, eppm;
+
+	get_monotonic_and_raw(&mon_start, &raw_start);
+
+	sleep(15);
+
+	get_monotonic_and_raw(&mon_end, &raw_end);
+
+	delta1 = diff_timespec(mon_start, mon_end);
+	delta2 = diff_timespec(raw_start, raw_end);
+
+	eppm = (delta1*MILLION)/delta2 - MILLION;
+	return eppm;
+}
+
+int check_tick_adj(long tickval)
+{
+	long long eppm, ppm;
+	struct timex tx1;
+
+	tx1.modes = ADJ_TICK;
+	tx1.modes |= ADJ_OFFSET;
+	tx1.modes |= ADJ_FREQUENCY;
+	tx1.offset = 0;
+	tx1.freq = 0;
+	tx1.tick = tickval;
+	adjtimex(&tx1);
+	sleep(1);
+
+	ppm = ((long long)tickval * MILLION)/systick - MILLION;
+	printf("Estimating tick (act: %ld usec, %lld ppm): ", tickval, ppm);
+
+	eppm = get_ppm_drift();
+	printf("%lld usec, %lld ppm", systick + (systick * eppm / MILLION), eppm);
+
+	tx1.modes = 0;
+	adjtimex(&tx1);
+	if (tx1.offset || tx1.freq || tx1.tick != tickval) {
+		printf("WARNING: Unexpected adjtimex return values, make sure ntpd is not running. ");
+		return -1;
+	}
+
+	if (llabs(eppm - ppm) > 10) {
+		printf("	[FAILED]\n");
+		return -1;
+	}
+	printf("	[OK]\n");
+	return  0;
+}
+
+int main(int argv, char **argc)
+{
+	struct timespec raw;
+	long tick, max, interval, err;
+	struct timex tx1;
+
+	err = 0;
+	setbuf(stdout, NULL);
+
+	if (clock_gettime(CLOCK_MONOTONIC_RAW, &raw)) {
+		printf("ERR: NO CLOCK_MONOTONIC_RAW\n");
+		return -1;
+	}
+
+
+	systick = sysconf(_SC_CLK_TCK);
+	systick = USEC_PER_SEC/sysconf(_SC_CLK_TCK);
+	printf("systick: %ld\n", systick);
+
+	max = systick/10; /* +/- 10% */
+	interval = max/4; /* in 4 steps each side */
+
+	for (tick = (systick - max); tick < (systick + max);
+	     tick += interval) {
+		if (check_tick_adj(tick)) {
+			err = 1;
+			break;
+		}
+	}
+
+	/* Reset things to zero */
+	tx1.modes = ADJ_TICK;
+	tx1.modes |= ADJ_OFFSET;
+	tx1.modes |= ADJ_FREQUENCY;
+	tx1.offset = 0;
+	tx1.freq = 0;
+	tx1.tick = systick;
+	adjtimex(&tx1);
+
+	if (err)
+		return ksft_exit_fail();
+	return ksft_exit_pass();
+}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-09 23:07 ` [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments John Stultz
@ 2015-09-10 12:02   ` Miroslav Lichvar
  2015-09-10 17:42     ` John Stultz
  0 siblings, 1 reply; 14+ messages in thread
From: Miroslav Lichvar @ 2015-09-10 12:02 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Wed, Sep 09, 2015 at 04:07:31PM -0700, John Stultz wrote:
> Recently an issue was reported that was difficult to detect except
> by tweaking the adjtimex tick value, and noticing how quickly the
> adjustment took to be made:
> 	https://lkml.org/lkml/2015/9/1/488
> 
> Thus this patch introduces a new test which manipulates the adjtimex
> tick value and validates the results are what we expect.

> +	if (llabs(eppm - ppm) > 10) {
> +		printf("	[FAILED]\n");
> +		return -1;
> +	}
> +	printf("	[OK]\n");
> +	return  0;

This seems to work nicely with the tsc and hpet clocksources, but for
some reason 10 ppm is not enough with the acpi_pm clocksource on both
machines I tried this on. They both show -99988 ppm for the first
test. When I modify the program to go through errors I get:

Estimating tick (act: 9000 usec, -100000 ppm): 9001 usec, -99988 ppm    [FAILED]
Estimating tick (act: 9250 usec, -75000 ppm): 9251 usec, -74991 ppm     [OK]
Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49994 ppm     [OK]
Estimating tick (act: 9750 usec, -25000 ppm): 9751 usec, -24997 ppm     [OK]
Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm     [OK]
Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24996 ppm     [OK]
Estimating tick (act: 10500 usec, 50000 ppm): 10499 usec, 49993 ppm     [OK]
Estimating tick (act: 10750 usec, 75000 ppm): 10749 usec, 74990 ppm     [OK]

The precision of the clock is better than microsecond, so that
wouldn't explain a 12 ppm error over the 15 second interval. I guess
it's due to a larger xtime_remainder, which basically is a hidden
frequency offset added (and not multiplied) to the NTP frequency
offset. Would that explain it?

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-10 12:02   ` Miroslav Lichvar
@ 2015-09-10 17:42     ` John Stultz
  2015-09-10 18:14       ` John Stultz
  0 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2015-09-10 17:42 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Wed, Sep 09, 2015 at 04:07:31PM -0700, John Stultz wrote:
>> Recently an issue was reported that was difficult to detect except
>> by tweaking the adjtimex tick value, and noticing how quickly the
>> adjustment took to be made:
>>       https://lkml.org/lkml/2015/9/1/488
>>
>> Thus this patch introduces a new test which manipulates the adjtimex
>> tick value and validates the results are what we expect.
>
>> +     if (llabs(eppm - ppm) > 10) {
>> +             printf("        [FAILED]\n");
>> +             return -1;
>> +     }
>> +     printf("        [OK]\n");
>> +     return  0;
>
> This seems to work nicely with the tsc and hpet clocksources, but for
> some reason 10 ppm is not enough with the acpi_pm clocksource on both
> machines I tried this on. They both show -99988 ppm for the first
> test. When I modify the program to go through errors I get:
>
> Estimating tick (act: 9000 usec, -100000 ppm): 9001 usec, -99988 ppm    [FAILED]
> Estimating tick (act: 9250 usec, -75000 ppm): 9251 usec, -74991 ppm     [OK]
> Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49994 ppm     [OK]
> Estimating tick (act: 9750 usec, -25000 ppm): 9751 usec, -24997 ppm     [OK]
> Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm     [OK]
> Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24996 ppm     [OK]
> Estimating tick (act: 10500 usec, 50000 ppm): 10499 usec, 49993 ppm     [OK]
> Estimating tick (act: 10750 usec, 75000 ppm): 10749 usec, 74990 ppm     [OK]
>
> The precision of the clock is better than microsecond, so that
> wouldn't explain a 12 ppm error over the 15 second interval. I guess
> it's due to a larger xtime_remainder, which basically is a hidden
> frequency offset added (and not multiplied) to the NTP frequency
> offset. Would that explain it?

I think its due to the ntp_error being large enough prior (or during
the freq transition) that we're still applying a single unit freq
adjustment for that error. But I'm guessing on the acpi_pm clocksource
the shift is low enough that a single unit adjustment is coarse enough
to affect the ppm, since I see the same consistently measured ppm
result if I both increase the settling time measurement sleep times.
If I left it for a long long time, the single unit correction would
likely null the error out and we'd get the desired result, but I don't
think the test has time for that.

The short term answer is to likely up the acceptable range for passing
the test.

Long term, we can look at further improving the error accumulation.

I'm thinking your earlier approach of doing the more expensive
division instead of the approximation over a series of ticks might
reduce the error generated during that transition.  So that might be
one approach.

Pondering a bit on this, I'm thinking while its ideally nice to keep
the ntp_error true to the difference between where the system time is
and where its been told to be, I'm not if that full history makes
total sense. As if ntpd has specified a different frequency, it may
not make since to try to correct the accumulated error from the past.
Since at that point, if ntpd has looked at where we are and is
specifying a new freq, in some ways its accounting for the current
uncorrected error. So we might just consider clearing the ntp_error
after the approximation is finished. Though I probably need to think
on this approach a bit more.

Your thoughts?

thanks
-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-10 17:42     ` John Stultz
@ 2015-09-10 18:14       ` John Stultz
  2015-09-14 14:48         ` Miroslav Lichvar
  0 siblings, 1 reply; 14+ messages in thread
From: John Stultz @ 2015-09-10 18:14 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Thu, Sep 10, 2015 at 10:42 AM, John Stultz <john.stultz@linaro.org> wrote:
> On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> On Wed, Sep 09, 2015 at 04:07:31PM -0700, John Stultz wrote:
>>> Recently an issue was reported that was difficult to detect except
>>> by tweaking the adjtimex tick value, and noticing how quickly the
>>> adjustment took to be made:
>>>       https://lkml.org/lkml/2015/9/1/488
>>>
>>> Thus this patch introduces a new test which manipulates the adjtimex
>>> tick value and validates the results are what we expect.
>>
>>> +     if (llabs(eppm - ppm) > 10) {
>>> +             printf("        [FAILED]\n");
>>> +             return -1;
>>> +     }
>>> +     printf("        [OK]\n");
>>> +     return  0;
>>
>> This seems to work nicely with the tsc and hpet clocksources, but for
>> some reason 10 ppm is not enough with the acpi_pm clocksource on both
>> machines I tried this on. They both show -99988 ppm for the first
>> test. When I modify the program to go through errors I get:
>>
>> Estimating tick (act: 9000 usec, -100000 ppm): 9001 usec, -99988 ppm    [FAILED]
>> Estimating tick (act: 9250 usec, -75000 ppm): 9251 usec, -74991 ppm     [OK]
>> Estimating tick (act: 9500 usec, -50000 ppm): 9501 usec, -49994 ppm     [OK]
>> Estimating tick (act: 9750 usec, -25000 ppm): 9751 usec, -24997 ppm     [OK]
>> Estimating tick (act: 10000 usec, 0 ppm): 10000 usec, 0 ppm     [OK]
>> Estimating tick (act: 10250 usec, 25000 ppm): 10249 usec, 24996 ppm     [OK]
>> Estimating tick (act: 10500 usec, 50000 ppm): 10499 usec, 49993 ppm     [OK]
>> Estimating tick (act: 10750 usec, 75000 ppm): 10749 usec, 74990 ppm     [OK]
>>
>> The precision of the clock is better than microsecond, so that
>> wouldn't explain a 12 ppm error over the 15 second interval. I guess
>> it's due to a larger xtime_remainder, which basically is a hidden
>> frequency offset added (and not multiplied) to the NTP frequency
>> offset. Would that explain it?
>
> I think its due to the ntp_error being large enough prior (or during
> the freq transition) that we're still applying a single unit freq
> adjustment for that error. But I'm guessing on the acpi_pm clocksource
> the shift is low enough that a single unit adjustment is coarse enough
> to affect the ppm, since I see the same consistently measured ppm
> result if I both increase the settling time measurement sleep times.
> If I left it for a long long time, the single unit correction would
> likely null the error out and we'd get the desired result, but I don't
> think the test has time for that.
>
> The short term answer is to likely up the acceptable range for passing
> the test.

So bumping the fail level to > 100ppm avoids false positives due to
long-term error correction with coarse clocksources, but still is
tight enough to catch the dampened approximation issue caused by the
abs(s64) problem.

Any objection to moving to that? It is still a 0.01% error bound.

thanks
-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
  2015-09-09 23:07 [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() John Stultz
  2015-09-09 23:07 ` [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments John Stultz
@ 2015-09-13  8:32 ` Ingo Molnar
  2015-09-14 23:24   ` John Stultz
  2015-09-13 11:07 ` [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s " tip-bot for John Stultz
  2 siblings, 1 reply; 14+ messages in thread
From: Ingo Molnar @ 2015-09-13  8:32 UTC (permalink / raw)
  To: John Stultz, Linus Torvalds, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner
  Cc: LKML, Nuno Gonçalves, Miroslav Lichvar, Prarit Bhargava,
	Richard Cochran, Thomas Gleixner, stable


* John Stultz <john.stultz@linaro.org> wrote:

> The internal clocksteering done for fine-grained error correction
> uses a logarithmic approximation, so any time adjtimex() adjusts
> the clock steering, timekeeping_freqadjust() quickly approximates
> the correct clock frequency over a series of ticks.
> 
> Unfortunately, the logic in timekeeping_freqadjust(), introduced
> in commit dc491596f639438 (Rework frequency adjustments to work
> better w/ nohz), used the abs() function with a s64 error value
> to calculate the size of the approximated adjustment to be made.
> 
> Per include/linux/kernel.h: "abs() should not be used for 64-bit
> types (s64, u64, long long) - use abs64()".
> 
> Thus on 32-bit platforms, this resulted in the clocksteering to
> take a quite dampended random walk trying to converge on the
> proper frequency, which caused the adjustments to be made much
> slower then intended (most easily observed when large adjustments
> are made).
> 
> This patch fixes the issue by using abs64() instead.

>  	/* Sort out the magnitude of the correction */
> -	tick_error = abs(tick_error);
> +	tick_error = abs64(tick_error);

Ugh, and we had this bug for almost two years!

Would it be possible to make abs() warn about 64-bit types during build time,
to prevent such mishaps?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s incorrect use of abs() instead of abs64()
  2015-09-09 23:07 [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() John Stultz
  2015-09-09 23:07 ` [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments John Stultz
  2015-09-13  8:32 ` [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() Ingo Molnar
@ 2015-09-13 11:07 ` tip-bot for John Stultz
  2015-10-05 15:10   ` Nuno Gonçalves
  2 siblings, 1 reply; 14+ messages in thread
From: tip-bot for John Stultz @ 2015-09-13 11:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: prarit, torvalds, nunojpg, linux-kernel, tglx, richardcochran,
	mlichvar, peterz, mingo, john.stultz, hpa

Commit-ID:  2619d7e9c92d524cb155ec89fd72875321512e5b
Gitweb:     http://git.kernel.org/tip/2619d7e9c92d524cb155ec89fd72875321512e5b
Author:     John Stultz <john.stultz@linaro.org>
AuthorDate: Wed, 9 Sep 2015 16:07:30 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sun, 13 Sep 2015 10:30:47 +0200

time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()

The internal clocksteering done for fine-grained error
correction uses a logarithmic approximation, so any time
adjtimex() adjusts the clock steering, timekeeping_freqadjust()
quickly approximates the correct clock frequency over a series
of ticks.

Unfortunately, the logic in timekeeping_freqadjust(), introduced
in commit:

  dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")

used the abs() function with a s64 error value to calculate the
size of the approximated adjustment to be made.

Per include/linux/kernel.h:

  "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".

Thus on 32-bit platforms, this resulted in the clocksteering to
take a quite dampended random walk trying to converge on the
proper frequency, which caused the adjustments to be made much
slower then intended (most easily observed when large
adjustments are made).

This patch fixes the issue by using abs64() instead.

Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
Tested-by: Nuno Goncalves <nunojpg@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/time/timekeeping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f6ee2e6..3739ac6 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	negative = (tick_error < 0);
 
 	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
+	tick_error = abs64(tick_error);
 	for (adj = 0; tick_error > interval; adj++)
 		tick_error >>= 1;
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-10 18:14       ` John Stultz
@ 2015-09-14 14:48         ` Miroslav Lichvar
  2015-10-02 20:25           ` John Stultz
  2015-10-02 20:49           ` John Stultz
  0 siblings, 2 replies; 14+ messages in thread
From: Miroslav Lichvar @ 2015-09-14 14:48 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Thu, Sep 10, 2015 at 11:14:25AM -0700, John Stultz wrote:
> On Thu, Sep 10, 2015 at 10:42 AM, John Stultz <john.stultz@linaro.org> wrote:
> > On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> >> The precision of the clock is better than microsecond, so that
> >> wouldn't explain a 12 ppm error over the 15 second interval. I guess
> >> it's due to a larger xtime_remainder, which basically is a hidden
> >> frequency offset added (and not multiplied) to the NTP frequency
> >> offset. Would that explain it?
> >
> > I think its due to the ntp_error being large enough prior (or during
> > the freq transition) that we're still applying a single unit freq
> > adjustment for that error. But I'm guessing on the acpi_pm clocksource
> > the shift is low enough that a single unit adjustment is coarse enough
> > to affect the ppm, since I see the same consistently measured ppm
> > result if I both increase the settling time measurement sleep times.
> > If I left it for a long long time, the single unit correction would
> > likely null the error out and we'd get the desired result, but I don't
> > think the test has time for that.

I ran few tests and it doesn't seem to be a problem with large
ntp_error or an extremely slow adjustment of the multiplier for the
new frequency.

I think it really is the xtime_remainder correction. It is a fixed
offset added to the ntp error on each tick to compensate for the
cycle_interval rounding error. With the acpi_pm clocksource and 1000Hz
update rate xtime_remainder is -127 ns, which effectively speeds up
the clock by 127 ppm. When NTP slows the clock down by 10%, the
correction is not decreased by 10% and we can observe the clock is
running faster by 12.7 ppm than expected.

Is there a cheap way to calculate this?
xtime_remainder * (ntp_tick >> ntp_error_shift) / NTP_INTERVAL_LENGTH

> So bumping the fail level to > 100ppm avoids false positives due to
> long-term error correction with coarse clocksources, but still is
> tight enough to catch the dampened approximation issue caused by the
> abs(s64) problem.
> 
> Any objection to moving to that? It is still a 0.01% error bound.

No objection from me as long as we understand where that error is
coming from.

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
  2015-09-13  8:32 ` [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() Ingo Molnar
@ 2015-09-14 23:24   ` John Stultz
  0 siblings, 0 replies; 14+ messages in thread
From: John Stultz @ 2015-09-14 23:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	LKML, Nuno Gonçalves, Miroslav Lichvar, Prarit Bhargava,
	Richard Cochran, stable, Joe Perches

[-- Attachment #1: Type: text/plain, Size: 2083 bytes --]

On Sun, Sep 13, 2015 at 1:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> The internal clocksteering done for fine-grained error correction
>> uses a logarithmic approximation, so any time adjtimex() adjusts
>> the clock steering, timekeeping_freqadjust() quickly approximates
>> the correct clock frequency over a series of ticks.
>>
>> Unfortunately, the logic in timekeeping_freqadjust(), introduced
>> in commit dc491596f639438 (Rework frequency adjustments to work
>> better w/ nohz), used the abs() function with a s64 error value
>> to calculate the size of the approximated adjustment to be made.
>>
>> Per include/linux/kernel.h: "abs() should not be used for 64-bit
>> types (s64, u64, long long) - use abs64()".
>>
>> Thus on 32-bit platforms, this resulted in the clocksteering to
>> take a quite dampended random walk trying to converge on the
>> proper frequency, which caused the adjustments to be made much
>> slower then intended (most easily observed when large adjustments
>> are made).
>>
>> This patch fixes the issue by using abs64() instead.
>
>>       /* Sort out the magnitude of the correction */
>> -     tick_error = abs(tick_error);
>> +     tick_error = abs64(tick_error);
>
> Ugh, and we had this bug for almost two years!

Well. I sat on the patch for quite awhile, so the author date isn't
really representative. It was only included in mainline in 3.17, so
its been in use for a little over a year.  But yea, still.

> Would it be possible to make abs() warn about 64-bit types during build time,
> to prevent such mishaps?

Yea. I was surprised this wasn't something the compiler would catch previously.

So is BUILD_BUG_ON() the best option for this? Its catching a whole
bunch of other related issues (frighteningly, more then Joe's cocci
script). The down-side is BUILD_BUG_ON causes build errors, not
warnings, and its output isn't super easy to parse on first view.

Potential BUILD_BUG_ON patch attached. I'll also try to spin some
patches to fix the issues this one catches.

thanks
-john

[-- Attachment #2: abs-build-bug.patch --]
[-- Type: text/x-patch, Size: 687 bytes --]

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410..753be99 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -8,6 +8,7 @@
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <linux/bitops.h>
+#include <linux/bug.h>
 #include <linux/log2.h>
 #include <linux/typecheck.h>
 #include <linux/printk.h>
@@ -208,6 +209,9 @@ extern int _cond_resched(void);
  */
 #define abs(x) ({						\
 		long ret;					\
+		BUILD_BUG_ON_MSG(				\
+			sizeof(typeof(x)) > sizeof(long),	\
+			"abs() should not be used for 64-bit types - use abs64()");\
 		if (sizeof(x) == sizeof(long)) {		\
 			long __x = (x);				\
 			ret = (__x < 0) ? -__x : __x;		\

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-14 14:48         ` Miroslav Lichvar
@ 2015-10-02 20:25           ` John Stultz
  2015-10-02 20:27             ` John Stultz
  2015-10-02 20:49           ` John Stultz
  1 sibling, 1 reply; 14+ messages in thread
From: John Stultz @ 2015-10-02 20:25 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Mon, Sep 14, 2015 at 7:48 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Sep 10, 2015 at 11:14:25AM -0700, John Stultz wrote:
>> On Thu, Sep 10, 2015 at 10:42 AM, John Stultz <john.stultz@linaro.org> wrote:
>> > On Thu, Sep 10, 2015 at 5:02 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> >> The precision of the clock is better than microsecond, so that
>> >> wouldn't explain a 12 ppm error over the 15 second interval. I guess
>> >> it's due to a larger xtime_remainder, which basically is a hidden
>> >> frequency offset added (and not multiplied) to the NTP frequency
>> >> offset. Would that explain it?
>> >
>> > I think its due to the ntp_error being large enough prior (or during
>> > the freq transition) that we're still applying a single unit freq
>> > adjustment for that error. But I'm guessing on the acpi_pm clocksource
>> > the shift is low enough that a single unit adjustment is coarse enough
>> > to affect the ppm, since I see the same consistently measured ppm
>> > result if I both increase the settling time measurement sleep times.
>> > If I left it for a long long time, the single unit correction would
>> > likely null the error out and we'd get the desired result, but I don't
>> > think the test has time for that.
>
> I ran few tests and it doesn't seem to be a problem with large
> ntp_error or an extremely slow adjustment of the multiplier for the
> new frequency.
>
> I think it really is the xtime_remainder correction. It is a fixed
> offset added to the ntp error on each tick to compensate for the
> cycle_interval rounding error. With the acpi_pm clocksource and 1000Hz
> update rate xtime_remainder is -127 ns, which effectively speeds up
> the clock by 127 ppm. When NTP slows the clock down by 10%, the
> correction is not decreased by 10% and we can observe the clock is
> running faster by 12.7 ppm than expected.

Sorry for taking so long to get back to you here. Had a conference
(and related prep) that pulled me away.

So yea.. I've spend some more time looking at this, and your argument
above looks pretty convincing and my theory didn't prove out (clearing
out the ntp_error value on frequency changes doesn't avoid the issue).

> Is there a cheap way to calculate this?
> xtime_remainder * (ntp_tick >> ntp_error_shift) / NTP_INTERVAL_LENGTH



Hrm.. So
   xtime_remainder = (NTP_INTERVAL_LENGTH <<
tk->tkr_mono.clock->shift) - (tk->cycle_interval *
tk->tkr_mono.clock->mult)

   for simplificiation:

And we want to scale it as you pointed out above (though slightly
fixed here) by:
         (tk->ntp_tick >> tk->ntp_error_shift) / (NTP_INTERVAL_LENGTH
<< tk->tkr_mono.clock->shift)


So this comes  out to:


(tk->ntp_tick ) -  (tk->ntp_tick ) *  (tk->cycle_interval *
tk->tkr_mono.clock->mult) / (NTP_INTERVAL_LENGTH <<
tk->tkr_mono.clock->shift)
tk->ntp_error_shift



would:

xtime_remainder = (tk->ntp_tick >> ntp_error_shift) - tk->xtime_interval

After we've adjusted xtime_interval give us the equivalent?


>> So bumping the fail level to > 100ppm avoids false positives due to
>> long-term error correction with coarse clocksources, but still is
>> tight enough to catch the dampened approximation issue caused by the
>> abs(s64) problem.
>>
>> Any objection to moving to that? It is still a 0.01% error bound.
>
> No objection from me as long as we understand where that error is
> coming from.

Yea. I think we agree. I did find one bug with the test (we can't
clear an existing offset if STA_PLL isn't set), so I'll be
resubmitting it here in a bit.

thanks
-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-10-02 20:25           ` John Stultz
@ 2015-10-02 20:27             ` John Stultz
  0 siblings, 0 replies; 14+ messages in thread
From: John Stultz @ 2015-10-02 20:27 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Fri, Oct 2, 2015 at 1:25 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Mon, Sep 14, 2015 at 7:48 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> Is there a cheap way to calculate this?
>> xtime_remainder * (ntp_tick >> ntp_error_shift) / NTP_INTERVAL_LENGTH
>
>
>
> Hrm.. So
>    xtime_remainder = (NTP_INTERVAL_LENGTH <<
> tk->tkr_mono.clock->shift) - (tk->cycle_interval *
> tk->tkr_mono.clock->mult)
>
>    for simplificiation:
>
> And we want to scale it as you pointed out above (though slightly
> fixed here) by:
>          (tk->ntp_tick >> tk->ntp_error_shift) / (NTP_INTERVAL_LENGTH
> << tk->tkr_mono.clock->shift)
>
>
> So this comes  out to:
>
>
> (tk->ntp_tick ) -  (tk->ntp_tick ) *  (tk->cycle_interval *
> tk->tkr_mono.clock->mult) / (NTP_INTERVAL_LENGTH <<
> tk->tkr_mono.clock->shift)
> tk->ntp_error_shift
>
>
>
> would:
>
> xtime_remainder = (tk->ntp_tick >> ntp_error_shift) - tk->xtime_interval
>
> After we've adjusted xtime_interval give us the equivalent?

Bah.. Ignore the above here. I was still working out a calculation and
accidentally hit send.

-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments
  2015-09-14 14:48         ` Miroslav Lichvar
  2015-10-02 20:25           ` John Stultz
@ 2015-10-02 20:49           ` John Stultz
  1 sibling, 0 replies; 14+ messages in thread
From: John Stultz @ 2015-10-02 20:49 UTC (permalink / raw)
  To: Miroslav Lichvar
  Cc: LKML, Nuno Gonçalves, Prarit Bhargava, Richard Cochran,
	Ingo Molnar, Thomas Gleixner, Shuah Khan

On Mon, Sep 14, 2015 at 7:48 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> Is there a cheap way to calculate this?
> xtime_remainder * (ntp_tick >> ntp_error_shift) / NTP_INTERVAL_LENGTH

So as I was trying to figure out before I prematurely hit send...

I was thinking...

 xtime_remainder = (NTP_INTERVAL_LENGTH << tk->tkr_mono.clock->shift)
- (tk->cycle_interval * tk->tkr_mono.clock->mult)

or for simplification:

xtime_remainder = NTP_INTERVAL_SHIFTED - ORIG_MULT_INTERVAL_SHIFTED

And we want to scale it as you pointed out above (though slightly
fixed here) by:
(tk->ntp_tick >> tk->ntp_error_shift) / (NTP_INTERVAL_LENGTH <<
tk->tkr_mono.clock->shift)

Which we'll simplify a touch as:
(tk->ntp_tick >> tk->ntp_error_shift) / NTP_INTERVAL_SHIFTED

So multiplying these comes out to:

(tk->ntp_tick >> tk->ntp_error_shift) - ((tk->ntp_tick >>
tk->ntp_error_shift)  * ORIG_MULT_INTERVAL_SHIFTED /
NTP_INTERVAL_SHIFTED)

Which simplifies a bit to:

(tk->ntp_tick - (tk->ntp_tick *  ORIG_MULT_INTERVAL_SHIFTED /
NTP_INTERVAL_SHIFTED)) >> tk->ntp_error_shift

(tk->ntp_tick * ( 1 - ORIG_MULT_INTERVAL_SHIFTED /
NTP_INTERVAL_SHIFTED)) >> tk->ntp_error_shift;

So this looks like it would be cheap to calculate,  but since we're
doing integer math, the problem is that  ORIG_MULT_INTERVAL_SHIFTED /
NTP_INTERVAL_SHIFTED isn't an integer value, so if we want any
precision we'd still have the costly tk->ntp_tick/NTP_INTERVAL_SHIFTED
to do each time.

So.. that's a long winded way to say I can't think of a cheap way.... :P

thanks
-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s incorrect use of abs() instead of abs64()
  2015-09-13 11:07 ` [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s " tip-bot for John Stultz
@ 2015-10-05 15:10   ` Nuno Gonçalves
  2015-10-06  8:05     ` Greg KH
  0 siblings, 1 reply; 14+ messages in thread
From: Nuno Gonçalves @ 2015-10-05 15:10 UTC (permalink / raw)
  To: stable
  Cc: linux-tip-commits, mingo, hpa, john.stultz, mlichvar,
	richardcochran, peterz, tglx, linux-kernel, prarit, torvalds

On Sun, Sep 13, 2015 at 12:07 PM, tip-bot for John Stultz
<tipbot@zytor.com> wrote:
> Commit-ID:  2619d7e9c92d524cb155ec89fd72875321512e5b
> Gitweb:     http://git.kernel.org/tip/2619d7e9c92d524cb155ec89fd72875321512e5b
> Author:     John Stultz <john.stultz@linaro.org>
> AuthorDate: Wed, 9 Sep 2015 16:07:30 -0700
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Sun, 13 Sep 2015 10:30:47 +0200
>
> time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
>
> The internal clocksteering done for fine-grained error
> correction uses a logarithmic approximation, so any time
> adjtimex() adjusts the clock steering, timekeeping_freqadjust()
> quickly approximates the correct clock frequency over a series
> of ticks.
>
> Unfortunately, the logic in timekeeping_freqadjust(), introduced
> in commit:
>
>   dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
>
> used the abs() function with a s64 error value to calculate the
> size of the approximated adjustment to be made.
>
> Per include/linux/kernel.h:
>
>   "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
>
> Thus on 32-bit platforms, this resulted in the clocksteering to
> take a quite dampended random walk trying to converge on the
> proper frequency, which caused the adjustments to be made much
> slower then intended (most easily observed when large
> adjustments are made).
>
> This patch fixes the issue by using abs64() instead.
>
> Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
> Tested-by: Nuno Goncalves <nunojpg@gmail.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Cc: <stable@vger.kernel.org> # v3.17+
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/time/timekeeping.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index f6ee2e6..3739ac6 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
>         negative = (tick_error < 0);
>
>         /* Sort out the magnitude of the correction */
> -       tick_error = abs(tick_error);
> +       tick_error = abs64(tick_error);
>         for (adj = 0; tick_error > interval; adj++)
>                 tick_error >>= 1;
>

I think this got lost on its way to the linux-stable queue (or I don't
understand the stable rules).

Thanks,
Nuno

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s incorrect use of abs() instead of abs64()
  2015-10-05 15:10   ` Nuno Gonçalves
@ 2015-10-06  8:05     ` Greg KH
  0 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2015-10-06  8:05 UTC (permalink / raw)
  To: Nuno Gonçalves
  Cc: stable, linux-tip-commits, mingo, hpa, john.stultz, mlichvar,
	richardcochran, peterz, tglx, linux-kernel, prarit, torvalds

On Mon, Oct 05, 2015 at 04:10:28PM +0100, Nuno Gonçalves wrote:
> On Sun, Sep 13, 2015 at 12:07 PM, tip-bot for John Stultz
> <tipbot@zytor.com> wrote:
> > Commit-ID:  2619d7e9c92d524cb155ec89fd72875321512e5b
> > Gitweb:     http://git.kernel.org/tip/2619d7e9c92d524cb155ec89fd72875321512e5b
> > Author:     John Stultz <john.stultz@linaro.org>
> > AuthorDate: Wed, 9 Sep 2015 16:07:30 -0700
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Sun, 13 Sep 2015 10:30:47 +0200
> >
> > time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64()
> >
> > The internal clocksteering done for fine-grained error
> > correction uses a logarithmic approximation, so any time
> > adjtimex() adjusts the clock steering, timekeeping_freqadjust()
> > quickly approximates the correct clock frequency over a series
> > of ticks.
> >
> > Unfortunately, the logic in timekeeping_freqadjust(), introduced
> > in commit:
> >
> >   dc491596f639 ("timekeeping: Rework frequency adjustments to work better w/ nohz")
> >
> > used the abs() function with a s64 error value to calculate the
> > size of the approximated adjustment to be made.
> >
> > Per include/linux/kernel.h:
> >
> >   "abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()".
> >
> > Thus on 32-bit platforms, this resulted in the clocksteering to
> > take a quite dampended random walk trying to converge on the
> > proper frequency, which caused the adjustments to be made much
> > slower then intended (most easily observed when large
> > adjustments are made).
> >
> > This patch fixes the issue by using abs64() instead.
> >
> > Reported-by: Nuno Gonçalves <nunojpg@gmail.com>
> > Tested-by: Nuno Goncalves <nunojpg@gmail.com>
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > Cc: <stable@vger.kernel.org> # v3.17+
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Miroslav Lichvar <mlichvar@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Prarit Bhargava <prarit@redhat.com>
> > Cc: Richard Cochran <richardcochran@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Link: http://lkml.kernel.org/r/1441840051-20244-1-git-send-email-john.stultz@linaro.org
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  kernel/time/timekeeping.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index f6ee2e6..3739ac6 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -1614,7 +1614,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
> >         negative = (tick_error < 0);
> >
> >         /* Sort out the magnitude of the correction */
> > -       tick_error = abs(tick_error);
> > +       tick_error = abs64(tick_error);
> >         for (adj = 0; tick_error > interval; adj++)
> >                 tick_error >>= 1;
> >
> 
> I think this got lost on its way to the linux-stable queue (or I don't
> understand the stable rules).

It's still in the "todo" queue, along with 159 other patches that I have
yet to get to :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-10-06  8:06 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-09 23:07 [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() John Stultz
2015-09-09 23:07 ` [PATCH 2/2 (v2)] kselftest: timers: Add adjtick test to validate adjtimex() tick adjustments John Stultz
2015-09-10 12:02   ` Miroslav Lichvar
2015-09-10 17:42     ` John Stultz
2015-09-10 18:14       ` John Stultz
2015-09-14 14:48         ` Miroslav Lichvar
2015-10-02 20:25           ` John Stultz
2015-10-02 20:27             ` John Stultz
2015-10-02 20:49           ` John Stultz
2015-09-13  8:32 ` [PATCH 1/2] time: Fix timekeeping_freqadjust()'s incorrect use of abs() instead of abs64() Ingo Molnar
2015-09-14 23:24   ` John Stultz
2015-09-13 11:07 ` [tip:timers/urgent] time: Fix timekeeping_freqadjust()' s " tip-bot for John Stultz
2015-10-05 15:10   ` Nuno Gonçalves
2015-10-06  8:05     ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).