linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] clocksource, Add warning to clocksource_delta() validation code
@ 2014-10-17 13:57 Prarit Bhargava
  2014-10-17 17:23 ` Prarit Bhargava
  2014-10-17 18:17 ` John Stultz
  0 siblings, 2 replies; 6+ messages in thread
From: Prarit Bhargava @ 2014-10-17 13:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava, John Stultz, Thomas Gleixner

A bug report came in against an older kernel which output "backward time"
messages and the report noted that the upstream kernel worked.  After some
investigation it turned out that one of the sockets was bad on the system
and the "backward time" messages were caused by a real, but intermittent,
hardware failure.

Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
cycle_last validation to core code") modifies the x86 clocksource such that
if a negative delta between two reads of time is calculated the
clocksource_delta() code will return 0.  There is no warning when this
occurs and there really should be one in order to catch not only hardware
issues like the issue above, but potential coding issues as the code is
modified.  This patch introduces a WARN() which will also dump a stack
trace to the console so the exact code path can be evaluated.

I tested this by booting on the broken hardware and left the system idle
until a negative clocksource_delta() event occurred.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 kernel/time/timekeeping_internal.h |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
index 4ea005a..abe6bc8 100644
--- a/kernel/time/timekeeping_internal.h
+++ b/kernel/time/timekeeping_internal.h
@@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
 {
 	cycle_t ret = (now - last) & mask;
 
-	return (s64) ret > 0 ? ret : 0;
+	if ((s64)ret > 0)
+		return ret;
+
+	WARN(1, "Clocksource calculated negative delta, %lld.  last = %llu, now = %llu, mask = %llx\n",
+	     (s64)ret, last, now, mask);
+	return 0;
 }
 #else
 static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
-- 
1.7.9.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource, Add warning to clocksource_delta() validation code
  2014-10-17 13:57 [PATCH] clocksource, Add warning to clocksource_delta() validation code Prarit Bhargava
@ 2014-10-17 17:23 ` Prarit Bhargava
  2014-10-17 18:17 ` John Stultz
  1 sibling, 0 replies; 6+ messages in thread
From: Prarit Bhargava @ 2014-10-17 17:23 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: linux-kernel, John Stultz, Thomas Gleixner



On 10/17/2014 09:57 AM, Prarit Bhargava wrote:
> A bug report came in against an older kernel which output "backward time"
> messages and the report noted that the upstream kernel worked.  After some
> investigation it turned out that one of the sockets was bad on the system
> and the "backward time" messages were caused by a real, but intermittent,
> hardware failure.
> 
> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
> cycle_last validation to core code") modifies the x86 clocksource such that
> if a negative delta between two reads of time is calculated the
> clocksource_delta() code will return 0.  There is no warning when this
> occurs and there really should be one in order to catch not only hardware
> issues like the issue above, but potential coding issues as the code is
> modified.  This patch introduces a WARN() which will also dump a stack
> trace to the console so the exact code path can be evaluated.
> 
> I tested this by booting on the broken hardware and left the system idle
> until a negative clocksource_delta() event occurred.
> 
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>

Sorry everyone -- I accidentally sent this out.  It is still untested and needs
some bake time.

Please ignore this patch.

P.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource, Add warning to clocksource_delta() validation code
  2014-10-17 13:57 [PATCH] clocksource, Add warning to clocksource_delta() validation code Prarit Bhargava
  2014-10-17 17:23 ` Prarit Bhargava
@ 2014-10-17 18:17 ` John Stultz
  2014-10-17 18:23   ` Prarit Bhargava
  1 sibling, 1 reply; 6+ messages in thread
From: John Stultz @ 2014-10-17 18:17 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: lkml, Thomas Gleixner

On Fri, Oct 17, 2014 at 6:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> A bug report came in against an older kernel which output "backward time"
> messages and the report noted that the upstream kernel worked.  After some
> investigation it turned out that one of the sockets was bad on the system
> and the "backward time" messages were caused by a real, but intermittent,
> hardware failure.
>
> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
> cycle_last validation to core code") modifies the x86 clocksource such that
> if a negative delta between two reads of time is calculated the
> clocksource_delta() code will return 0.  There is no warning when this
> occurs and there really should be one in order to catch not only hardware
> issues like the issue above, but potential coding issues as the code is
> modified.  This patch introduces a WARN() which will also dump a stack
> trace to the console so the exact code path can be evaluated.
>
> I tested this by booting on the broken hardware and left the system idle
> until a negative clocksource_delta() event occurred.
>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> ---
>  kernel/time/timekeeping_internal.h |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
> index 4ea005a..abe6bc8 100644
> --- a/kernel/time/timekeeping_internal.h
> +++ b/kernel/time/timekeeping_internal.h
> @@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
>  {
>         cycle_t ret = (now - last) & mask;
>
> -       return (s64) ret > 0 ? ret : 0;
> +       if ((s64)ret > 0)
> +               return ret;
> +
> +       WARN(1, "Clocksource calculated negative delta, %lld.  last = %llu, now = %llu, mask = %llx\n",
> +            (s64)ret, last, now, mask);
> +       return 0;


I realize you followed up that this wasn't finished, but just as some
feedback, there's a number of types of hardware where there may be a
very slight skew between cpu TSC, and this will briefly trigger right
after each timekeeping update if a system is reading the clock
frequently (think of the case where the update happens on the cpu
thats just a little bit ahead, while a timestamping loop is running on
a cpu that is a little bit behind).

So this warning would trigger quite often and be a bit alarming when
its really expected (well... more like unfortunate but handled)
behavior.

thanks
-john

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource, Add warning to clocksource_delta() validation code
  2014-10-17 18:17 ` John Stultz
@ 2014-10-17 18:23   ` Prarit Bhargava
  2014-10-17 18:27     ` John Stultz
  0 siblings, 1 reply; 6+ messages in thread
From: Prarit Bhargava @ 2014-10-17 18:23 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Thomas Gleixner



On 10/17/2014 02:17 PM, John Stultz wrote:
> On Fri, Oct 17, 2014 at 6:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>> A bug report came in against an older kernel which output "backward time"
>> messages and the report noted that the upstream kernel worked.  After some
>> investigation it turned out that one of the sockets was bad on the system
>> and the "backward time" messages were caused by a real, but intermittent,
>> hardware failure.
>>
>> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
>> cycle_last validation to core code") modifies the x86 clocksource such that
>> if a negative delta between two reads of time is calculated the
>> clocksource_delta() code will return 0.  There is no warning when this
>> occurs and there really should be one in order to catch not only hardware
>> issues like the issue above, but potential coding issues as the code is
>> modified.  This patch introduces a WARN() which will also dump a stack
>> trace to the console so the exact code path can be evaluated.
>>
>> I tested this by booting on the broken hardware and left the system idle
>> until a negative clocksource_delta() event occurred.
>>
>> Cc: John Stultz <john.stultz@linaro.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>> ---
>>  kernel/time/timekeeping_internal.h |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
>> index 4ea005a..abe6bc8 100644
>> --- a/kernel/time/timekeeping_internal.h
>> +++ b/kernel/time/timekeeping_internal.h
>> @@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
>>  {
>>         cycle_t ret = (now - last) & mask;
>>
>> -       return (s64) ret > 0 ? ret : 0;
>> +       if ((s64)ret > 0)
>> +               return ret;
>> +
>> +       WARN(1, "Clocksource calculated negative delta, %lld.  last = %llu, now = %llu, mask = %llx\n",
>> +            (s64)ret, last, now, mask);
>> +       return 0;
> 
> 
> I realize you followed up that this wasn't finished, but just as some
> feedback, there's a number of types of hardware where there may be a
> very slight skew between cpu TSC, and this will briefly trigger right
> after each timekeeping update if a system is reading the clock
> frequently (think of the case where the update happens on the cpu
> thats just a little bit ahead, while a timestamping loop is running on
> a cpu that is a little bit behind).

Ah, interesting.  Okay ... drop this patch then.  Thanks for the info John.

P.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource, Add warning to clocksource_delta() validation code
  2014-10-17 18:23   ` Prarit Bhargava
@ 2014-10-17 18:27     ` John Stultz
  2014-10-18 11:06       ` Prarit Bhargava
  0 siblings, 1 reply; 6+ messages in thread
From: John Stultz @ 2014-10-17 18:27 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: lkml, Thomas Gleixner

On Fri, Oct 17, 2014 at 11:23 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>
>
> On 10/17/2014 02:17 PM, John Stultz wrote:
>> On Fri, Oct 17, 2014 at 6:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>> A bug report came in against an older kernel which output "backward time"
>>> messages and the report noted that the upstream kernel worked.  After some
>>> investigation it turned out that one of the sockets was bad on the system
>>> and the "backward time" messages were caused by a real, but intermittent,
>>> hardware failure.
>>>
>>> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
>>> cycle_last validation to core code") modifies the x86 clocksource such that
>>> if a negative delta between two reads of time is calculated the
>>> clocksource_delta() code will return 0.  There is no warning when this
>>> occurs and there really should be one in order to catch not only hardware
>>> issues like the issue above, but potential coding issues as the code is
>>> modified.  This patch introduces a WARN() which will also dump a stack
>>> trace to the console so the exact code path can be evaluated.
>>>
>>> I tested this by booting on the broken hardware and left the system idle
>>> until a negative clocksource_delta() event occurred.
>>>
>>> Cc: John Stultz <john.stultz@linaro.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>> ---
>>>  kernel/time/timekeeping_internal.h |    7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
>>> index 4ea005a..abe6bc8 100644
>>> --- a/kernel/time/timekeeping_internal.h
>>> +++ b/kernel/time/timekeeping_internal.h
>>> @@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
>>>  {
>>>         cycle_t ret = (now - last) & mask;
>>>
>>> -       return (s64) ret > 0 ? ret : 0;
>>> +       if ((s64)ret > 0)
>>> +               return ret;
>>> +
>>> +       WARN(1, "Clocksource calculated negative delta, %lld.  last = %llu, now = %llu, mask = %llx\n",
>>> +            (s64)ret, last, now, mask);
>>> +       return 0;
>>
>>
>> I realize you followed up that this wasn't finished, but just as some
>> feedback, there's a number of types of hardware where there may be a
>> very slight skew between cpu TSC, and this will briefly trigger right
>> after each timekeeping update if a system is reading the clock
>> frequently (think of the case where the update happens on the cpu
>> thats just a little bit ahead, while a timestamping loop is running on
>> a cpu that is a little bit behind).
>
> Ah, interesting.  Okay ... drop this patch then.  Thanks for the info John.

If you're wanting something that aids with debugging, maybe some sort
calmly stated warn-once in the dmesg might be ok, that or some other
flag exported via a debugging interface.

thanks
-john

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] clocksource, Add warning to clocksource_delta() validation code
  2014-10-17 18:27     ` John Stultz
@ 2014-10-18 11:06       ` Prarit Bhargava
  0 siblings, 0 replies; 6+ messages in thread
From: Prarit Bhargava @ 2014-10-18 11:06 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Thomas Gleixner



On 10/17/2014 02:27 PM, John Stultz wrote:
> On Fri, Oct 17, 2014 at 11:23 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>
>>
>> On 10/17/2014 02:17 PM, John Stultz wrote:
>>> On Fri, Oct 17, 2014 at 6:57 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>>>> A bug report came in against an older kernel which output "backward time"
>>>> messages and the report noted that the upstream kernel worked.  After some
>>>> investigation it turned out that one of the sockets was bad on the system
>>>> and the "backward time" messages were caused by a real, but intermittent,
>>>> hardware failure.
>>>>
>>>> Commit 09ec54429c6d10f87d1f084de53ae2c1c3a81108 ("clocksource: Move
>>>> cycle_last validation to core code") modifies the x86 clocksource such that
>>>> if a negative delta between two reads of time is calculated the
>>>> clocksource_delta() code will return 0.  There is no warning when this
>>>> occurs and there really should be one in order to catch not only hardware
>>>> issues like the issue above, but potential coding issues as the code is
>>>> modified.  This patch introduces a WARN() which will also dump a stack
>>>> trace to the console so the exact code path can be evaluated.
>>>>
>>>> I tested this by booting on the broken hardware and left the system idle
>>>> until a negative clocksource_delta() event occurred.
>>>>
>>>> Cc: John Stultz <john.stultz@linaro.org>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
>>>> ---
>>>>  kernel/time/timekeeping_internal.h |    7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/time/timekeeping_internal.h b/kernel/time/timekeeping_internal.h
>>>> index 4ea005a..abe6bc8 100644
>>>> --- a/kernel/time/timekeeping_internal.h
>>>> +++ b/kernel/time/timekeeping_internal.h
>>>> @@ -17,7 +17,12 @@ static inline cycle_t clocksource_delta(cycle_t now, cycle_t last, cycle_t mask)
>>>>  {
>>>>         cycle_t ret = (now - last) & mask;
>>>>
>>>> -       return (s64) ret > 0 ? ret : 0;
>>>> +       if ((s64)ret > 0)
>>>> +               return ret;
>>>> +
>>>> +       WARN(1, "Clocksource calculated negative delta, %lld.  last = %llu, now = %llu, mask = %llx\n",
>>>> +            (s64)ret, last, now, mask);
>>>> +       return 0;
>>>
>>>
>>> I realize you followed up that this wasn't finished, but just as some
>>> feedback, there's a number of types of hardware where there may be a
>>> very slight skew between cpu TSC, and this will briefly trigger right
>>> after each timekeeping update if a system is reading the clock
>>> frequently (think of the case where the update happens on the cpu
>>> thats just a little bit ahead, while a timestamping loop is running on
>>> a cpu that is a little bit behind).
>>
>> Ah, interesting.  Okay ... drop this patch then.  Thanks for the info John.
> 
> If you're wanting something that aids with debugging, maybe some sort
> calmly stated warn-once in the dmesg might be ok, that or some other
> flag exported via a debugging interface.

IMO with the clock code I'd prefer it to be 100% accurate.  There is nothing
more annoying than going through the rigors of debugging and discovering that it
is a hardware issue of some sort... This can be dropped IMO.  It's really not
that important.

P.
> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-10-18 11:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-17 13:57 [PATCH] clocksource, Add warning to clocksource_delta() validation code Prarit Bhargava
2014-10-17 17:23 ` Prarit Bhargava
2014-10-17 18:17 ` John Stultz
2014-10-17 18:23   ` Prarit Bhargava
2014-10-17 18:27     ` John Stultz
2014-10-18 11:06       ` Prarit Bhargava

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).