* Re: PROBLEM: monotonic clock going backwards on ppc64
2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
@ 2019-02-28 6:48 ` Daniel Axtens
2019-02-28 7:04 ` Mathieu Malaterre
2019-03-01 13:24 ` Michael Ellerman
2 siblings, 0 replies; 7+ messages in thread
From: Daniel Axtens @ 2019-02-28 6:48 UTC (permalink / raw)
To: Jakub Drnec, linuxppc-dev
Hi,
> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
Yes
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
I can confirm this on a physical 64bit book3e system and powerpc/next
with the provided reproducer.
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
I'll look at the asm but I'm no expert here so we might need to defer to
someone with real expertise. Falling back to the syscall sadly ruins
the performance gain of having a vDSO.
Regards,
Daniel
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PROBLEM: monotonic clock going backwards on ppc64
2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
2019-02-28 6:48 ` Daniel Axtens
@ 2019-02-28 7:04 ` Mathieu Malaterre
2019-02-28 11:17 ` Daniel Axtens
2019-03-01 13:33 ` Jakub Drnec
2019-03-01 13:24 ` Michael Ellerman
2 siblings, 2 replies; 7+ messages in thread
From: Mathieu Malaterre @ 2019-02-28 7:04 UTC (permalink / raw)
To: Jakub Drnec; +Cc: linuxppc-dev
On Tue, Feb 26, 2019 at 9:36 PM Jakub Drnec <jaydee@email.cz> wrote:
>
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.
Isn't it the expected behavior. Here is what I see for powermac:
$ git show 22db552b50fa
...
This changes the logic to cast to an unsigned 32-bit number first for
the Macintosh time and then convert that to the Unix time, which then
gives us a time in the documented 1904..2040 year range. I decided not
to use the longer 1970..2106 range that other drivers use, for
consistency with the literal interpretation of the register, but that
could be easily changed if we decide we want to support any Mac after
2040.
...
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
>
> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> long get_time() {
> struct timespec tp;
> if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
> perror("clock_gettime failed");
> exit(1);
> }
> long result = tp.tv_sec + tp.tv_nsec / 1000000000;
> return result;
> }
>
> int main() {
> printf("monitoring monotonic clock...\n");
> long last = get_time();
> while(1) {
> long now = get_time();
> if (now < last) {
> printf("clock went backwards by %ld seconds!\n",
> last - now);
> }
> last = now;
> sleep(1);
> }
> return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
>
> Regards,
> Jakub Drnec
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PROBLEM: monotonic clock going backwards on ppc64
2019-02-26 16:13 PROBLEM: monotonic clock going backwards on ppc64 Jakub Drnec
2019-02-28 6:48 ` Daniel Axtens
2019-02-28 7:04 ` Mathieu Malaterre
@ 2019-03-01 13:24 ` Michael Ellerman
2019-03-01 23:59 ` Thomas Gleixner
2 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2019-03-01 13:24 UTC (permalink / raw)
To: Jakub Drnec, linuxppc-dev; +Cc: Stephen Boyd, Thomas Gleixner, John Stultz
Hi Jakub,
[Cc += Timekeeping maintainers]
"Jakub Drnec" <jaydee@email.cz> writes:
> Hi all,
>
> I think I observed a potential problem, is this the correct place to report it? (CC me, not on list)
>
> [1.] One line summary: monotonic clock can be made to decrease on ppc64
> [2.] Full description:
> Setting the realtime clock can sometimes make the monotonic clock go back by over a hundred years.
> Decreasing the realtime clock across the y2k38 threshold is one reliable way to reproduce.
> Allegedly this can also happen just by running ntpd, I have not managed to reproduce that other
> than booting with rtc at >2038 and then running ntp.
> When this happens, anything with timers (e.g. openjdk) breaks rather badly.
Thanks for the report.
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
You're right, the wall-to-monotonic offset (wtom_clock_sec) is a signed
32-bit value, so that seems like it's going to have problems.
If I do `date -s 2037-1-1` I see:
[ 26.024061] update_vsyscall: tk->wall_to_monotonic.tv_sec -2114341175
[ 26.042633] update_vsyscall: vdso_data->wtom_clock_sec -2114341175
Which looks sane.
But then 2040-1-1 shows:
[ 32.617020] update_vsyscall: tk->wall_to_monotonic.tv_sec -2208949168
[ 32.632642] update_vsyscall: vdso_data->wtom_clock_sec 2086018128
ie. the larger negative offset has overflowed and become positive.
But then when we go back to 2037 we get a negative offset again and
monotonic time appears to go backward and things are unhappy.
I don't know this code well, but the patch below *appears* to work. I'll
have a closer look on Monday.
cheers
diff --git a/arch/powerpc/include/asm/vdso_datapage.h b/arch/powerpc/include/asm/vdso_datapage.h
index 1afe90ade595..139133ec21d5 100644
--- a/arch/powerpc/include/asm/vdso_datapage.h
+++ b/arch/powerpc/include/asm/vdso_datapage.h
@@ -82,7 +82,7 @@ struct vdso_data {
__u32 icache_block_size; /* L1 i-cache block size */
__u32 dcache_log_block_size; /* L1 d-cache log block size */
__u32 icache_log_block_size; /* L1 i-cache log block size */
- __s32 wtom_clock_sec; /* Wall to monotonic clock */
+ __s64 wtom_clock_sec; /* Wall to monotonic clock */
__s32 wtom_clock_nsec;
struct timespec stamp_xtime; /* xtime as at tb_orig_stamp */
__u32 stamp_sec_fraction; /* fractional seconds of stamp_xtime */
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a4ed9edfd5f0..1f324c28705b 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -92,7 +92,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
* At this point, r4,r5 contain our sec/nsec values.
*/
- lwa r6,WTOM_CLOCK_SEC(r3)
+ ld r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
/* We now have our result in r6,r9. We create a fake dependency
@@ -125,7 +125,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
bne cr6,75f
/* CLOCK_MONOTONIC_COARSE */
- lwa r6,WTOM_CLOCK_SEC(r3)
+ ld r6,WTOM_CLOCK_SEC(r3)
lwa r9,WTOM_CLOCK_NSEC(r3)
/* check if counter has updated */
> [3.] Keywords: gettimeofday, ppc64, vdso
> [4.] Kernel information
> [4.1.] Kernel version: any (tested on 4.19)
> [4.2.] Kernel .config file: any
> [5.] Most recent kernel version which did not have the bug: not a regression
> [6.] Output of Oops..: not applicable
> [7.] Example program which triggers the problem
> --- testcase.c
> #include <stdio.h>
> #include <time.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> long get_time() {
> struct timespec tp;
> if (clock_gettime(CLOCK_MONOTONIC, &tp) != 0) {
> perror("clock_gettime failed");
> exit(1);
> }
> long result = tp.tv_sec + tp.tv_nsec / 1000000000;
> return result;
> }
>
> int main() {
> printf("monitoring monotonic clock...\n");
> long last = get_time();
> while(1) {
> long now = get_time();
> if (now < last) {
> printf("clock went backwards by %ld seconds!\n",
> last - now);
> }
> last = now;
> sleep(1);
> }
> return 0;
> }
> ---
> when running
> # date -s 2040-1-1
> # date -s 2037-1-1
> program outputs: clock went backwards by 4294967295 seconds!
>
> [8.] Environment: any ppc64, currently reproducing on qemu-system-ppc64le running debian unstable
> [X.] Other notes, patches, fixes, workarounds:
> The problem seems to be in vDSO code in arch/powerpc/kernel/vdso64/gettimeofday.S.
> (possibly because some values used in the calculation are only 32 bit?)
> Slightly silly workaround:
> nuke the "cmpwi cr1,r3,CLOCK_MONOTONIC" in __kernel_clock_gettime
> Now it always goes through the syscall fallback which does not have the same problem.
>
> Regards,
> Jakub Drnec
^ permalink raw reply related [flat|nested] 7+ messages in thread