linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* PROBLEM: monotonic clock going backwards on ppc64
@ 2019-02-26 16:13 Jakub Drnec
  2019-02-28  6:48 ` Daniel Axtens
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jakub Drnec @ 2019-02-26 16:13 UTC (permalink / raw)
  To: linuxppc-dev

Hi all,

I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)

[1.] One line summary: monotonic clock can be made to decrease on ppc64
[2.] Full description:
Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
than booting with rtc at >2038 and then running ntp.
When this happens, anything with timers (e.g. openjdk) breaks rather badly.

[3.] Keywords: gettimeofday, ppc64, vdso
[4.] Kernel information
[4.1.] Kernel version: any (tested on 4.19)
[4.2.] Kernel .config file: any
[5.] Most recent kernel version which did not have the bug: not a regression
[6.] Output of Oops..: not applicable
[7.] Example program which triggers the problem
--- testcase.c
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <unistd.h>

long get_time() {
  struct timespec tp;
  if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
    perror("clock_gettime failed");
    exit(1);
  }
  long result = tp.tv_sec + tp.tv_nsec / 1000000000;
  return result;
}

int main() {
  printf("monitoring monotonic clock...\n");
  long last = get_time();
  while(1) {
    long now = get_time();
    if (now < last) {
      printf("clock went backwards by %ld seconds!\n",
        last - now);
    }
    last = now;
    sleep(1);
  }
  return 0;
}
---
when running
# date -s 2040-1-1
# date -s 2037-1-1
program outputs: clock went backwards by 4294967295 seconds!

[8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
[X.] Other notes, patches, fixes, workarounds:
The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
(possibly because some values used in the calculation are only 32 bit?)
Slightly silly workaround:
nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
Now it always goes through the syscall fallback which does not have the same problem.

Regards,
Jakub Drnec

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
@ 2019-02-28  6:48 ` Daniel Axtens
  2019-02-28  7:04 ` Mathieu Malaterre
  2019-03-01 13:24 ` Michael Ellerman
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel Axtens @ 2019-02-28  6:48 UTC (permalink / raw)
  To: Jakub Drnec, linuxppc-dev

Hi,

> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)

Yes

> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.

I can confirm this on a physical 64bit book3e system and powerpc/next
with the provided reproducer.

> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround: 
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.

I'll look at the asm but I'm no expert here so we might need to defer to
someone with real expertise. Falling back to the syscall sadly ruins
the performance gain of having a vDSO.

Regards,
Daniel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
  2019-02-28  6:48 ` Daniel Axtens
@ 2019-02-28  7:04 ` Mathieu Malaterre
  2019-02-28 11:17   ` Daniel Axtens
  2019-03-01 13:33   ` Jakub Drnec
  2019-03-01 13:24 ` Michael Ellerman
  2 siblings, 2 replies; 7+ messages in thread
From: Mathieu Malaterre @ 2019-02-28  7:04 UTC (permalink / raw)
  To: Jakub Drnec; +Cc: linuxppc-dev

On Tue, Feb 26, 2019 at 9:36 PM Jakub Drnec <jaydee@email.cz> wrote:
>
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.

Isn't it the expected behavior. Here is what I see for powermac:

$ git show 22db552b50fa
...
    This changes the logic to cast to an unsigned 32-bit number first for
    the Macintosh time and then convert that to the Unix time, which then
    gives us a time in the documented 1904..2040 year range. I decided not
    to use the longer 1970..2106 range that other drivers use, for
    consistency with the literal interpretation of the register, but that
    could be easily changed if we decide we want to support any Mac after
    2040.
...

> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
>
> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> long get_time() {
>   struct timespec tp;
>   if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
>     perror("clock_gettime failed");
>     exit(1);
>   }
>   long result = tp.tv_sec + tp.tv_nsec / 1000000000;
>   return result;
> }
>
> int main() {
>   printf("monitoring monotonic clock...\n");
>   long last = get_time();
>   while(1) {
>     long now = get_time();
>     if (now < last) {
>       printf("clock went backwards by %ld seconds!\n",
>         last - now);
>     }
>     last = now;
>     sleep(1);
>   }
>   return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
>
> Regards,
> Jakub Drnec

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-02-28  7:04 ` Mathieu Malaterre
@ 2019-02-28 11:17   ` Daniel Axtens
  2019-03-01 13:33   ` Jakub Drnec
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Axtens @ 2019-02-28 11:17 UTC (permalink / raw)
  To: Mathieu Malaterre, Jakub Drnec; +Cc: linuxppc-dev

Mathieu Malaterre <malat@debian.org> writes:

> On Tue, Feb 26, 2019 at 9:36 PM Jakub Drnec <jaydee@email.cz> wrote:
>>
>> Hi all,
>>
>> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
>>
>> [1.] One line summary: monotonic clock can be made to decrease on ppc64
>> [2.] Full description:
>> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
>> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
>> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
>> than booting with rtc at >2038 and then running ntp.
>
> Isn't it the expected behavior. Here is what I see for powermac:
>
> $ git show 22db552b50fa
> ...
>     This changes the logic to cast to an unsigned 32-bit number first for
>     the Macintosh time and then convert that to the Unix time, which then
>     gives us a time in the documented 1904..2040 year range. I decided not
>     to use the longer 1970..2106 range that other drivers use, for
>     consistency with the literal interpretation of the register, but that
>     could be easily changed if we decide we want to support any Mac after
>     2040.
> ...
>

My interpretation of that commit is that it relates to the kernel
reading the hardware RTC on a powermac, but this issue relates to
userspace fetching the time from the vDSO. I'm also not running on a
powermac, so my hardware should be able to deal with times > 2040...


Regards,
Daniel

>> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
>>
>> [3.] Keywords: gettimeofday, ppc64, vdso
>> [4.] Kernel information
>> [4.1.] Kernel version: any (tested on 4.19)
>> [4.2.] Kernel .config file: any
>> [5.] Most recent kernel version which did not have the bug: not a regression
>> [6.] Output of Oops..: not applicable
>> [7.] Example program which triggers the problem
>> --- testcase.c
>> #include <stdio.h>
>> #include <time.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> long get_time() {
>>   struct timespec tp;
>>   if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
>>     perror("clock_gettime failed");
>>     exit(1);
>>   }
>>   long result = tp.tv_sec + tp.tv_nsec / 1000000000;
>>   return result;
>> }
>>
>> int main() {
>>   printf("monitoring monotonic clock...\n");
>>   long last = get_time();
>>   while(1) {
>>     long now = get_time();
>>     if (now < last) {
>>       printf("clock went backwards by %ld seconds!\n",
>>         last - now);
>>     }
>>     last = now;
>>     sleep(1);
>>   }
>>   return 0;
>> }
>> ---
>> when running
>> # date -s 2040-1-1
>> # date -s 2037-1-1
>> program outputs: clock went backwards by 4294967295 seconds!
>>
>> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
>> [X.] Other notes, patches, fixes, workarounds:
>> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
>> (possibly because some values used in the calculation are only 32 bit?)
>> Slightly silly workaround:
>> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
>> Now it always goes through the syscall fallback which does not have the same problem.
>>
>> Regards,
>> Jakub Drnec

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
  2019-02-28  6:48 ` Daniel Axtens
  2019-02-28  7:04 ` Mathieu Malaterre
@ 2019-03-01 13:24 ` Michael Ellerman
  2019-03-01 23:59   ` Thomas Gleixner
  2 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2019-03-01 13:24 UTC (permalink / raw)
  To: Jakub Drnec, linuxppc-dev; +Cc: Stephen Boyd, Thomas Gleixner, John Stultz

Hi Jakub,

[Cc += Timekeeping maintainers]

"Jakub Drnec" <jaydee@email.cz> writes:
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.

Thanks for the report.

> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.

You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
32-bit value, so that seems like it's going to have problems.

If I do `date -s 2037-1-1` I see:

[   26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
[   26.042633] update_vsyscall: vdso_data->wtom_clock_sec    -2114341175

Which looks sane.

But then 2040-1-1 shows:

[   32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
[   32.632642] update_vsyscall: vdso_data->wtom_clock_sec     2086018128

ie. the larger negative offset has overflowed and become positive.

But then when we go back to 2037 we get a negative offset again and
monotonic time appears to go backward and things are unhappy.

I don't know this code well, but the patch below *appears* to work. I'll
have a closer look on Monday.

cheers


diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..139133ec21d5 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,7 +82,7 @@ struct vdso_data {
 	__u32 icache_block_size;		/* L1 i-cache block size     */
 	__u32 dcache_log_block_size;		/* L1 d-cache log block size */
 	__u32 icache_log_block_size;		/* L1 i-cache log block size */
-	__s32 wtom_clock_sec;			/* Wall to monotonic clock */
+	__s64 wtom_clock_sec;			/* Wall to monotonic clock */
 	__s32 wtom_clock_nsec;
 	struct timespec stamp_xtime;	/* xtime as at tb_orig_stamp */
 	__u32 stamp_sec_fraction;	/* fractional seconds of stamp_xtime */
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a4ed9edfd5f0..1f324c28705b 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -92,7 +92,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 	 * At this point, r4,r5 contain our sec/nsec values.
 	 */
 
-	lwa	r6,WTOM_CLOCK_SEC(r3)
+	ld	r6,WTOM_CLOCK_SEC(r3)
 	lwa	r9,WTOM_CLOCK_NSEC(r3)
 
 	/* We now have our result in r6,r9. We create a fake dependency
@@ -125,7 +125,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 	bne     cr6,75f
 
 	/* CLOCK_MONOTONIC_COARSE */
-	lwa     r6,WTOM_CLOCK_SEC(r3)
+	ld	r6,WTOM_CLOCK_SEC(r3)
 	lwa     r9,WTOM_CLOCK_NSEC(r3)
 
 	/* check if counter has updated */


> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> long get_time() {
>   struct timespec tp;
>   if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
>     perror("clock_gettime failed");
>     exit(1);
>   }
>   long result = tp.tv_sec + tp.tv_nsec / 1000000000;
>   return result;
> }
>
> int main() {
>   printf("monitoring monotonic clock...\n");
>   long last = get_time();
>   while(1) {
>     long now = get_time();
>     if (now < last) {
>       printf("clock went backwards by %ld seconds!\n",
>         last - now);
>     }
>     last = now;
>     sleep(1);
>   }
>   return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround: 
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
>
> Regards,
> Jakub Drnec

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-02-28  7:04 ` Mathieu Malaterre
  2019-02-28 11:17   ` Daniel Axtens
@ 2019-03-01 13:33   ` Jakub Drnec
  1 sibling, 0 replies; 7+ messages in thread
From: Jakub Drnec @ 2019-03-01 13:33 UTC (permalink / raw)
  To: linuxppc-dev

Hi,

>> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
>> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
>> than booting with rtc at >2038 and then running ntp.

> Isn't it the expected behavior. Here is what I see for powermac:
>...    gives us a time in the documented 1904..2040 year range. I decided not

This (or some related issue involving monotonic time going wonky) surfaced on a system running 4.5 kernel with sane clock values running ntpd. It may not be limited to setting the clock beyond 2038, that is just a convenient way to reproduce something visibly wrong.

If the "ntpd-related monotonic" issue is separate from what I reported, I did not find anything in the commit history that looks likely to have fixed that, so I think that is still a thing.

Another way of looking at the issue is that when you set the clock beyond 2038 (or the mystery ntpd thing happens), vdso and syscall versions of clock_gettime start reporting wildly different values for CLOCK_MONOTONIC. If someone manages to somehow interlace calls to those two assuming they are consistent, crazy bugs will ensue...

Regards,
Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: PROBLEM: monotonic clock going backwards on ppc64
  2019-03-01 13:24 ` Michael Ellerman
@ 2019-03-01 23:59   ` Thomas Gleixner
  0 siblings, 0 replies; 7+ messages in thread
From: Thomas Gleixner @ 2019-03-01 23:59 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: John Stultz, linuxppc-dev, Jakub Drnec, Stephen Boyd

On Sat, 2 Mar 2019, Michael Ellerman wrote:
> > The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> 
> You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
> 32-bit value, so that seems like it's going to have problems.
> 
> If I do `date -s 2037-1-1` I see:
> 
> [   26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
> [   26.042633] update_vsyscall: vdso_data->wtom_clock_sec    -2114341175
> 
> Which looks sane.
> 
> But then 2040-1-1 shows:
> 
> [   32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
> [   32.632642] update_vsyscall: vdso_data->wtom_clock_sec     2086018128
> 
> ie. the larger negative offset has overflowed and become positive.
> 
> But then when we go back to 2037 we get a negative offset again and
> monotonic time appears to go backward and things are unhappy.
> 
> I don't know this code well, but the patch below *appears* to work. I'll
> have a closer look on Monday.

Looks about right. The core hands in the 64bit offset already.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-03-02  0:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
2019-02-28  6:48 ` Daniel Axtens
2019-02-28  7:04 ` Mathieu Malaterre
2019-02-28 11:17   ` Daniel Axtens
2019-03-01 13:33   ` Jakub Drnec
2019-03-01 13:24 ` Michael Ellerman
2019-03-01 23:59   ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).