linux-kernel.vger.kernel.org archive mirror
* [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
@ 2013-06-03 13:34 Tobias Waldekranz
  2013-06-03 14:34 ` Thomas Gleixner
  0 siblings, 1 reply; 8+ messages in thread
From: Tobias Waldekranz @ 2013-06-03 13:34 UTC
  To: tglx; +Cc: linux-kernel

In ktime_get_update_offsets, calculate the current time in the same
way as in ktime_get.

On 32-bit systems, the current time is truncated by the call to
ktime_set. The following subtraction of offs_real then results in an
inaccurate time once the current number of seconds since the epoch can no
longer fit in 31 bits (2038-01-19 03:14:07 UTC). This will send
hrtimer_interrupt into an infinite loop on some architectures (arm),
or emit an oops on others (x86).

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 kernel/time/timekeeping.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 98cd470..b484ab2 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1600,8 +1600,8 @@ ktime_t ktime_get_update_offsets(ktime_t *offs_real, ktime_t *offs_boot,
 	do {
 		seq = read_seqcount_begin(&timekeeper_seq);

-		secs = tk->xtime_sec;
-		nsecs = timekeeping_get_ns(tk);
+		secs = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
+		nsecs = timekeeping_get_ns(tk) + tk->wall_to_monotonic.tv_nsec;

 		*offs_real = tk->offs_real;
 		*offs_boot = tk->offs_boot;
@@ -1609,7 +1609,6 @@ ktime_t ktime_get_update_offsets(ktime_t *offs_real, ktime_t *offs_boot,
 	} while (read_seqcount_retry(&timekeeper_seq, seq));

 	now = ktime_add_ns(ktime_set(secs, 0), nsecs);
-	now = ktime_sub(now, *offs_real);
 	return now;
 }
 #endif
--
1.7.10.4
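
The truncation described in the commit message can be modelled in user
space. The following is only a simplified sketch, not kernel code: it
assumes that the seconds argument of ktime_set behaves like a signed
32-bit value on the affected systems and that the resulting time is held
as 64-bit nanoseconds. The helper names (truncate_to_s32, mono_ns_old,
mono_ns_new) are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Model a 64-bit seconds count squeezed through a signed 32-bit argument. */
static int64_t truncate_to_s32(uint64_t secs)
{
	uint32_t low = (uint32_t)secs;			/* keep the low 32 bits */

	if (low & 0x80000000u)
		return (int64_t)low - 4294967296LL;	/* top bit set: negative */
	return (int64_t)low;
}

/* Pre-patch order: truncate wall-clock seconds, then subtract the offset. */
static int64_t mono_ns_old(uint64_t wall_sec, int64_t offs_real_sec)
{
	/* Model assumption: the offset itself still fits in 31 bits. */
	assert(offs_real_sec >= 0 && offs_real_sec <= INT32_MAX);

	return truncate_to_s32(wall_sec) * NSEC_PER_SEC -
	       offs_real_sec * NSEC_PER_SEC;
}

/* Patched order: add wall_to_monotonic first, so only a small value is truncated. */
static int64_t mono_ns_new(uint64_t wall_sec, int64_t wall_to_mono_sec)
{
	uint64_t secs = wall_sec + (uint64_t)wall_to_mono_sec;

	return truncate_to_s32(secs) * NSEC_PER_SEC;
}
```

With a wall clock of 2^31 seconds and an uptime of 100 seconds, the old
order yields a large negative value in this model, while the new order
yields 100 seconds. One second earlier, both orders agree.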


* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-03 13:34 [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems Tobias Waldekranz
@ 2013-06-03 14:34 ` Thomas Gleixner
  2013-06-03 19:04   ` John Stultz
  2013-06-04  6:59   ` Tobias Waldekranz
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2013-06-03 14:34 UTC
  To: Tobias Waldekranz; +Cc: LKML, John Stultz, Ingo Molnar, Peter Zijlstra

On Mon, 3 Jun 2013, Tobias Waldekranz wrote:
> In ktime_get_update_offsets, calculate the current time in the same
> way as in ktime_get.
> 
> On 32-bit systems, the current time is truncated by the call to
> ktime_set. The following subtraction of offs_real then results in an
> inaccurate time once the current number of seconds since the epoch can no
> longer fit in 31 bits (2038-01-19 03:14:07 UTC). This will send
> hrtimer_interrupt into an infinite loop on some architectures (arm),
> or emit an oops on others (x86).

If we really want to survive 2038, then we need to get rid of the
timespec-based representation of time in the kernel altogether and
switch all related code over to a scalar 64-bit nsec storage.

Just "fixing" some random parts of the code in a "make it work
somehow" way is a pointless exercise IMO.

We already had long discussions about how the timekeeping code should
be restructured to address that and other problems at least on the
kernel side and switching everything to scalar storage is definitely
the way to go.

Though even if we fix that we still need to twist our brains around
the timespec/timeval based user space interfaces. That's going to be
the way more interesting challenge.

Thanks,

	tglx
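
The difference between a split seconds/nanoseconds representation and a
scalar nanosecond count can be sketched as below. This is only an
illustration under assumed names: struct split_time, split_to_ns and
ns_to_split are hypothetical and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical split representation, here with 64-bit seconds. */
struct split_time {
	int64_t sec;
	int32_t nsec;	/* normalized to [0, NSEC_PER_SEC) */
};

/* Split -> scalar. Overflows int64_t only for |sec| beyond roughly 9.2e9. */
static int64_t split_to_ns(struct split_time t)
{
	assert(t.nsec >= 0 && t.nsec < NSEC_PER_SEC);
	return t.sec * NSEC_PER_SEC + t.nsec;
}

/* Scalar -> split, keeping nsec non-negative for times before the epoch. */
static struct split_time ns_to_split(int64_t ns)
{
	struct split_time t;
	int64_t rem = ns % NSEC_PER_SEC;

	t.sec = ns / NSEC_PER_SEC;
	if (rem < 0) {		/* C division truncates toward zero */
		rem += NSEC_PER_SEC;
		t.sec -= 1;
	}
	t.nsec = (int32_t)rem;
	return t;
}
```

In scalar form, adding or comparing two times is a single 64-bit
operation. The split form needs the carry and normalization handling
shown in ns_to_split.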


* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-03 14:34 ` Thomas Gleixner
@ 2013-06-03 19:04   ` John Stultz
  2013-06-07 21:53     ` Thomas Gleixner
  2013-06-04  6:59   ` Tobias Waldekranz
  1 sibling, 1 reply; 8+ messages in thread
From: John Stultz @ 2013-06-03 19:04 UTC
  To: Thomas Gleixner; +Cc: Tobias Waldekranz, LKML, Ingo Molnar, Peter Zijlstra

On 06/03/2013 07:34 AM, Thomas Gleixner wrote:
> On Mon, 3 Jun 2013, Tobias Waldekranz wrote:
>> In ktime_get_update_offsets, calculate the current time in the same
>> way as in ktime_get.
>>
>> On 32-bit systems, the current time is truncated by the call to
>> ktime_set. The following subtraction of offs_real then results in an
>> inaccurate time once the current number of seconds since the epoch can no
>> longer fit in 31 bits (2038-01-19 03:14:07 UTC). This will send
>> hrtimer_interrupt into an infinite loop on some architectures (arm),
>> or emit an oops on others (x86).
> If we really want to survive 2038, then we need to get rid of the
> timespec-based representation of time in the kernel altogether and
> switch all related code over to a scalar 64-bit nsec storage.
>
> Just "fixing" some random parts of the code in a "make it work
> somehow" way is a pointless exercise IMO.
>
> We already had long discussions about how the timekeeping code should
> be restructured to address that and other problems at least on the
> kernel side and switching everything to scalar storage is definitely
> the way to go.
>
> Though even if we fix that we still need to twist our brains around
> the timespec/timeval based user space interfaces. That's going to be
> the way more interesting challenge.

So yeah, there are a couple of approaches for userland that probably need
more discussion:

1) Create a new ABI for 32-bit platforms that have a 64-bit time_t
     - I know x32 was talking about this, but I don't actually see that
       code upstream, so maybe there was an issue that blocked this?
     - In talking with some folks, there was some question about how to
       handle multiple compat types, so that a 64-bit OS could support
       both old and new 32-bit ABIs. I suspect there's some approach that
       would work here, but I haven't done any research.

2) Add new time64_t/timespec64 structures, and add new 64-bit versions of
   syscalls for any syscall that takes a timespec/time_t
     - This is a ton of work, and lots of new syscalls. Yuck.

3) Redefine time_t to be unsigned (possibly as part of an ABI bump?).
   This is attractive, as it requires the least change to the kernel
   interfaces, and in many cases existing userland won't care (there's no
   userland that's setting timers for dates prior to 1970). We'd just need
   to update the libc ASCII time formatting, and basically give up dates
   prior to 1970. Of course, this is probably too optimistic, as existing
   userland code that does (time_a < time_b) would have issues comparing
   dates before and after the 2038 overflow. However, those apps will be
   broken no matter what, so I'm starting to think this approach is likely
   to be the most reasonable.


I'm curious if there are any other ideas that folks are considering?

thanks
-john
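
The comparison problem mentioned under option 3, and the extra range an
unsigned 32-bit time_t would buy, can be sketched as follows. This is a
rough illustration only: the function names are hypothetical, and the
year estimate assumes a mean Gregorian year of 31556952 seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Today's 32-bit ABI: time_t is a signed 32-bit count of seconds. */
static bool earlier_signed(int32_t a, int32_t b)
{
	return a < b;
}

/* Option 3: the same 32 bits interpreted as an unsigned count. */
static bool earlier_unsigned(uint32_t a, uint32_t b)
{
	return a < b;
}

/* Approximate last representable year for a seconds counter with value_bits bits. */
static int last_year(unsigned value_bits)
{
	assert(value_bits >= 1 && value_bits <= 63);

	uint64_t max_secs = (UINT64_C(1) << value_bits) - 1;

	return 1970 + (int)(max_secs / 31556952u);	/* mean Gregorian year */
}
```

The last second before the overflow has the bit pattern 0x7fffffff and
the first one after it has 0x80000000. Compared as signed values the
later time sorts first, while compared as unsigned values the order is
preserved. In this estimate, 31 value bits end in 2038 and 32 value bits
end in 2106.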



* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-03 14:34 ` Thomas Gleixner
  2013-06-03 19:04   ` John Stultz
@ 2013-06-04  6:59   ` Tobias Waldekranz
  2013-06-07 20:57     ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Tobias Waldekranz @ 2013-06-04  6:59 UTC
  To: Thomas Gleixner; +Cc: LKML, John Stultz, Ingo Molnar, Peter Zijlstra

On Mon, Jun 03, 2013 at 04:34:25PM +0200, Thomas Gleixner wrote:
> On Mon, 3 Jun 2013, Tobias Waldekranz wrote:
> > In ktime_get_update_offsets, calculate the current time in the same
> > way as in ktime_get.
> > 
> > On 32-bit systems, the current time is truncated by the call to
> > ktime_set. The following subtraction of offs_real then results in an
> > inaccurate time once the current number of seconds since the epoch can no
> > longer fit in 31 bits (2038-01-19 03:14:07 UTC). This will send
> > hrtimer_interrupt into an infinite loop on some architectures (arm),
> > or emit an oops on others (x86).
> 
> If we really want to survive 2038, then we need to get rid of the
> timespec-based representation of time in the kernel altogether and
> switch all related code over to a scalar 64-bit nsec storage.
> 
Agreed.

> Just "fixing" some random parts of the code in a "make it work
> somehow" way is a pointless exercise IMO.
> 
Now hold on, it is hardly random. On an ARM system, the kernel will
completely hang. I would think that many users would like to avoid
that. In addition this behavior is rather new, hrtimer_interrupt used
to source its time from ktime_get which avoids this issue. The change
was introduced in:

5baefd6d84163443215f4a99f6a20f054ef11236

I understand that you would like a solution to the broader issue. But
for some users (embedded especially) having a system that continues to
operate 25 years from now is an issue today.

As for "make it work somehow", modifying the current time calculation
to work in the same way as in ktime_get does seem to be a reasonable
way to go IMO.
 
> We already had long discussions about how the timekeeping code should
> be restructured to address that and other problems at least on the
> kernel side and switching everything to scalar storage is definitely
> the way to go.
> 
> Though even if we fix that we still need to twist our brains around
> the timespec/timeval based user space interfaces. That's going to be
> the way more interesting challenge.
> 
> Thanks,
> 
> 	tglx

Thanks,
	wkz


* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-04  6:59   ` Tobias Waldekranz
@ 2013-06-07 20:57     ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2013-06-07 20:57 UTC
  To: Tobias Waldekranz; +Cc: LKML, John Stultz, Ingo Molnar, Peter Zijlstra

Tobias,

On Tue, 4 Jun 2013, Tobias Waldekranz wrote:
> On Mon, Jun 03, 2013 at 04:34:25PM +0200, Thomas Gleixner wrote:
> > Just "fixing" some random parts of the code in a "make it work
> > somehow" way is a pointless exercise IMO.
> > 
> Now hold on, it is hardly random. On an ARM system, the kernel will
> completely hang. I would think that many users would like to avoid
> that. In addition this behavior is rather new, hrtimer_interrupt used
> to source its time from ktime_get which avoids this issue. The change
> was introduced in:
> 
> 5baefd6d84163443215f4a99f6a20f054ef11236
> 
> I understand that you would like a solution to the broader issue. But
> for some users (embedded especially) having a system that continues to
> operate 25 years from now is an issue today.
> 
> As for "make it work somehow", modifying the current time calculation
> to work in the same way as in ktime_get does seem to be a reasonable
> way to go IMO.

No, it's not. You are "fixing" something which is not fixable by
definition. There is no rule to prevent similar borkage tomorrow.

You are just band-aiding a singular instance of a massive problem
which has an already known root cause.

If you really care about your system working 25 years from now with
the kernel of today, then you should rather sit down and fix it properly.

Your "fix" merely allows the system to boot, but it's broken beyond
repair aside from that. So what's the point?

If we do not tackle the underlying issues, then your machine will be
rendered completely useless with or without your patch. It's that
simple. So don't try to sell me a bandaid hack as a reasonable way to
go.

Thanks,

	tglx




* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-03 19:04   ` John Stultz
@ 2013-06-07 21:53     ` Thomas Gleixner
  2013-06-20 12:34       ` Ingo Molnar
  2013-08-24 23:47       ` Michael Gilbert
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2013-06-07 21:53 UTC
  To: John Stultz; +Cc: Tobias Waldekranz, LKML, Ingo Molnar, Peter Zijlstra

On Mon, 3 Jun 2013, John Stultz wrote:
> On 06/03/2013 07:34 AM, Thomas Gleixner wrote:
> > Though even if we fix that we still need to twist our brains around
> > the timespec/timeval based user space interfaces. That's going to be
> > the way more interesting challenge.
> 
> I'm curious if there are any other ideas that folks are considering?

Honestly, we have almost 25 years ahead of us to solve that. So why
hurry? If Tobias thinks that his embedded system of today needs to
survive 2038 without updating the kernel and all of userspace, then
all I can do is wish him good luck. Albeit we should not waste 25
years and run into another Y2K horror. :)

The only solid solution is to implement a new set of syscalls (and
there are not that many which are affected by this). The new syscalls
should use a nanosecond-based scalar time value and get rid of the
timespec / timeval / time_t nonsense altogether. That reduces the
number of new syscalls significantly.

That time value should be 64-bit, though people might argue that we are
creating a new issue for the year 2554, i.e. 541 years from now. I
don't think we really need to worry about that. We have to leave our
grand-grand-grand...grandchildren (~20 generations from now) a few
unsolved problems!

The evil plan to make this happen looks like this:

    1) Convert the core code to u64 with a timespec-based shadow
       infrastructure to avoid performance regressions in the first
       place.

    2) Add new u64 based syscalls

    3) Disable the timespec based shadow infrastructure five years
       from now to force all lazy buggers who ignored the new syscalls
       to fix their crap.

    4) Deprecate the old syscalls 10 years from now

    5) Remove the old syscalls 100 years from now so Linus won't hunt
       us for breaking userspace :)

Thanks,

	tglx
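
The year 2554 figure for an unsigned 64-bit nanosecond counter can be
checked with a little integer arithmetic. This is a rough sketch: the
function name rollover_year is made up for this illustration, and the
estimate assumes a mean Gregorian year of 31556952 seconds counted from
1970.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC		UINT64_C(1000000000)
#define SECS_PER_AVG_YEAR	UINT64_C(31556952)

/* Approximate calendar year in which a nanosecond counter reaches max_ns. */
static unsigned rollover_year(uint64_t max_ns)
{
	assert(max_ns >= NSEC_PER_SEC);
	return 1970 + (unsigned)(max_ns / NSEC_PER_SEC / SECS_PER_AVG_YEAR);
}
```

Under these assumptions rollover_year(UINT64_MAX) gives 2554, matching
the figure above, and a signed 64-bit counter, rollover_year(INT64_MAX),
would run out around 2262.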


* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-07 21:53     ` Thomas Gleixner
@ 2013-06-20 12:34       ` Ingo Molnar
  2013-08-24 23:47       ` Michael Gilbert
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2013-06-20 12:34 UTC
  To: Thomas Gleixner; +Cc: John Stultz, Tobias Waldekranz, LKML, Peter Zijlstra


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Mon, 3 Jun 2013, John Stultz wrote:
> > On 06/03/2013 07:34 AM, Thomas Gleixner wrote:
> > > Though even if we fix that we still need to twist our brains around
> > > the timespec/timeval based user space interfaces. That's going to be
> > > the way more interesting challenge.
> > 
> > I'm curious if there are any other ideas that folks are considering?
> 
> Honestly, we have almost 25 years ahead of us to solve that. So why
> hurry? If Tobias thinks that his embedded system of today needs to
> survive 2038 without updating the kernel and all of userspace, then
> all I can do is wish him good luck. Albeit we should not waste 25
> years and run into another Y2K horror. :)
> 
> The only solid solution is to implement a new set of syscalls (and
> there are not that many which are affected by this). The new syscalls
> should use a nanosecond-based scalar time value and get rid of the
> timespec / timeval / time_t nonsense altogether. That reduces the
> number of new syscalls significantly.
>
> That time value should be 64-bit, though people might argue that we are
> creating a new issue for the year 2554, i.e. 541 years from now. I
> don't think we really need to worry about that. We have to leave our
> grand-grand-grand...grandchildren (~20 generations from now) a few
> unsolved problems!
> 
> The evil plan to make this happen looks like this:
> 
>     1) Convert the core code to u64 with a timespec-based shadow
>        infrastructure to avoid performance regressions in the first
>        place.
> 
>     2) Add new u64 based syscalls
> 
>     3) Disable the timespec based shadow infrastructure five years
>        from now to force all lazy buggers who ignored the new syscalls
>        to fix their crap.
> 
>     4) Deprecate the old syscalls 10 years from now
> 
>     5) Remove the old syscalls 100 years from now so Linus won't hunt
>        us for breaking userspace :)

50 years from now should be enough for most of us - beyond that there will 
be no hunting, only haunting ... ;-)

Thanks,

	Ingo


* Re: [PATCH] timekeeping: handle epoch roll-over (2038) on 32-bit systems
  2013-06-07 21:53     ` Thomas Gleixner
  2013-06-20 12:34       ` Ingo Molnar
@ 2013-08-24 23:47       ` Michael Gilbert
  1 sibling, 0 replies; 8+ messages in thread
From: Michael Gilbert @ 2013-08-24 23:47 UTC
  To: linux-kernel

Thomas Gleixner writes:
> That time value should be 64-bit, though people might argue that we are
> creating a new issue for the year 2554, i.e. 541 years from now. I
> don't think we really need to worry about that. We have to leave our
> grand-grand-grand...grandchildren (~20 generations from now) a few
> unsolved problems!

Or at the measly cost of 8 additional bytes, solve the problem well and good 
for the entirety of the human race :)

128 (unsigned) bits defers the rollover problem for 1e-9*(2**128)/3600/24/365 
= 1e22 years, or 770 billion times longer than the current age of the 
universe.

That of course hinges on a 128-bit integer C standard within the next 25 
years ;)

Best wishes,
Mike
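
The 128-bit estimate above can be reproduced with the same formula in
floating point. This is only a sketch: the function name
years_until_rollover is made up for this illustration, and it uses
365-day years exactly as the formula in the message does.

```c
#include <assert.h>

/* Years until an unsigned nanosecond counter of the given width wraps. */
static double years_until_rollover(unsigned bits)
{
	double ns = 1.0;

	assert(bits > 0 && bits <= 1000);	/* stay within double range */

	for (unsigned i = 0; i < bits; i++)
		ns *= 2.0;	/* powers of two are exact in a double */

	return ns * 1e-9 / 3600.0 / 24.0 / 365.0;
}
```

For 128 bits this comes out at roughly 1.08e22 years. Dividing by an
assumed age of the universe of about 14 billion years gives a factor of
roughly 770 billion, in line with the figure quoted above. For 64 bits
the same formula gives about 585 years.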



