From: Thomas Gleixner <tglx@linutronix.de>
To: Arnd Bergmann <arnd@arndb.de>
Cc: y2038@lists.linaro.org, Baolin Wang <baolin.wang@linaro.org>,
	pang.xunlei@linaro.org, Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	cl@linux.com, heenasirwani@gmail.com, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, mpe@ellerman.id.au,
	rafael.j.wysocki@intel.com, ahh@google.com,
	Frederic Weisbecker <fweisbec@gmail.com>,
	pjt@google.com, riel@redhat.com, richardcochran@gmail.com,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	John Stultz <john.stultz@linaro.org>,
	rth@twiddle.net, gregkh@linuxfoundation.org,
	LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	linux390@de.ibm.com, linuxppc-dev@lists.ozlabs.org,
	Ingo Molnar <mingo@kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
Date: Wed, 22 Apr 2015 10:45:23 +0200 (CEST)
Message-ID: <alpine.DEB.2.11.1504220946460.13914@nanos>
In-Reply-To: <alpine.DEB.2.11.1504212204130.13914@nanos>

On Tue, 21 Apr 2015, Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Arnd Bergmann wrote:
> > I know there are concerns about this, in particular because C11 and
> > POSIX both require tv_nsec to be 'long', unlike timeval->tv_usec,
> > which is a 'suseconds_t' and can be defined as 'long long'.
> >
> > a)
> > 
> > struct timespec {
> > 	time_t tv_sec;
> > 	long long tv_nsec; /* or typedef long long snseconds_t */
> > };
> > 
> > This is not directly compatible with C11 or POSIX.1-2008, but it
> > matches what we do inside of 64-bit kernels, so probably has the
> > highest chance of working correctly in practice
> 
> After reading Linus' rant in the x32 thread again (thanks for the
> reminder), and looking at b/c/d - which rate between ugly and butt
> ugly - I think we should go for a) and screw POSIX and C11 as those
> committee dinosaurs seem to completely ignore the 2038 problem on
> 32bit machines. At least I have not found any hint that these folks
> care at all. So why should we comply with something which is
> completely useless?
> 
> That also makes the question about the upper 32bits check moot, so
> it's the simplest and clearest of the possible solutions.

Second thoughts after some sleep.

So the outcome of this is going to be that user space libraries will
not expose the syscall variant of

    struct syscall_timespec64 {
            s64 tv_sec;
            s64 tv_nsec;
    };

to applications. The libs will translate it to the spec-conforming

    struct timespec {
            time_t tv_sec;
            long   tv_nsec;
    };

anyway. That means we have two translation steps on 32bit systems:

  1) user space timespec -> syscall timespec64

  2) syscall timespec64 -> scalar nsec s64 (ktime_t)

and the other way round. The kernel internal representation is simply
s64 (nsec) based all over the place.
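To make the two steps concrete, here is a minimal user-space sketch. It
is not kernel code: struct syscall_timespec64 and the helper names
ts_to_ts64(), ts64_to_ns() and ns_to_ts64() are made up for
illustration, int64_t stands in for s64, and overflow of the scalar is
not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical stand-in for the syscall-level structure in the mail. */
struct syscall_timespec64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Step 1: spec-conforming timespec -> syscall timespec64 (widening copy). */
static struct syscall_timespec64 ts_to_ts64(const struct timespec *ts)
{
	struct syscall_timespec64 out = { ts->tv_sec, ts->tv_nsec };
	return out;
}

/* Step 2: syscall timespec64 -> scalar nanoseconds (ktime_t-like). */
static int64_t ts64_to_ns(const struct syscall_timespec64 *ts)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
	return ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* The other way round: scalar -> timespec64 costs a 64-bit division. */
static struct syscall_timespec64 ns_to_ts64(int64_t ns)
{
	assert(ns >= 0);	/* sketch only handles non-negative times */
	struct syscall_timespec64 out = { ns / NSEC_PER_SEC, ns % NSEC_PER_SEC };
	return out;
}
```

With a scalar syscall interface only one of these conversions would be
left, and it would live in user space.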

So we could save one translation step if we implement new syscalls
which have a scalar nsec interface instead of the timespec/timeval
cruft and let user space do the translation to whatever it wants.

So

sys_clock_nanosleep(const clockid_t which_clock, int flags,
		    const struct timespec __user *expires,
		    struct timespec __user *remainder)

would get the new syscall variant:

sys_clock_nanosleep_ns(const clockid_t which_clock, int flags,
		       const s64 expires, s64 __user *remainder)

I personally would welcome such an interface as it makes user space
programming simpler. Just (re)arming a periodic nanosleep based on
absolute expiry time is horribly stupid today:

	struct timespec expires;
	....
	while (...) {
		expires.tv_nsec += period.tv_nsec;
		expires.tv_sec += period.tv_sec;
		normalize_timespec(&expires);
		sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);
	}

So with a scalar interface this would reduce to:

	s64 expires;
	....
	while (...) {
		expires += period;
		/* expires is passed by value in the proposed signature */
		sys_clock_nanosleep_ns(CLOCK_ID, ABS, expires, NULL);
	}

Compared with the timespec variant, this reduces both text and storage
size and avoids the two translation steps (one translation step on
64bit).
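For comparison, both loops can be written today against the existing
POSIX clock_nanosleep(). This is only a sketch. clock_nanosleep_ns() is
a hypothetical user-space wrapper that emulates the proposed scalar
call by converting back to a timespec, so it shows the shape of the
caller and not the saved translation. run_periodic_ts() and
run_periodic_ns() are names made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Carry excess nanoseconds into seconds (assumes tv_nsec >= 0). */
static void normalize_timespec(struct timespec *ts)
{
	while (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}

/* Today's variant: absolute expiry kept as a timespec. */
static int run_periodic_ts(struct timespec period, int iterations)
{
	struct timespec expires;
	int err;

	if (clock_gettime(CLOCK_MONOTONIC, &expires))
		return -1;
	for (int i = 0; i < iterations; i++) {
		expires.tv_nsec += period.tv_nsec;
		expires.tv_sec += period.tv_sec;
		normalize_timespec(&expires);
		do {	/* clock_nanosleep() returns the error number directly */
			err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					      &expires, NULL);
		} while (err == EINTR);
		if (err)
			return -1;
	}
	return 0;
}

/* Hypothetical emulation of the proposed scalar interface. */
static int clock_nanosleep_ns(clockid_t clk, int flags, int64_t expires_ns)
{
	struct timespec ts = {
		.tv_sec = (time_t)(expires_ns / NSEC_PER_SEC),
		.tv_nsec = (long)(expires_ns % NSEC_PER_SEC),
	};
	return clock_nanosleep(clk, flags, &ts, NULL);
}

/* Scalar variant: the rearm step is a single 64-bit addition. */
static int run_periodic_ns(int64_t period_ns, int iterations)
{
	struct timespec now;
	int err;

	assert(period_ns > 0);
	if (clock_gettime(CLOCK_MONOTONIC, &now))
		return -1;
	int64_t expires = (int64_t)now.tv_sec * NSEC_PER_SEC + now.tv_nsec;
	for (int i = 0; i < iterations; i++) {
		expires += period_ns;
		do {
			err = clock_nanosleep_ns(CLOCK_MONOTONIC, TIMER_ABSTIME,
						 expires);
		} while (err == EINTR);
		if (err)
			return -1;
	}
	return 0;
}
```

Something like run_periodic_ns(1000000, 3) would wake up three times at
1 ms intervals. The caller no longer needs normalize_timespec() at all.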

I know that this is non-portable, but OTOH if I look at the
non-portable mechanisms which are used by databases, Java VMs and
other apps which exist to squeeze the last cycles out of the system,
there is certainly some value to that.

The portable/spec-conforming apps can still use the timespec/timeval
mechanisms, with user space doing the translation.

There is one caveat though: sys_clock_gettime and sys_gettimeofday
will still need a syscall_timespec64 variant. We have no double
translation steps there because we maintain the timespec
representation in the timekeeping code for performance reasons to
avoid the division in the syscall interface. But everything else can
do nicely without the timespec cruft.
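A toy illustration of why keeping a split representation avoids the
division on the read path. This is not the kernel's timekeeping code.
struct toy_timekeeper, toy_advance() and toy_gettime() are invented for
the sketch, which assumes time only moves forward.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Toy timekeeper: seconds plus a sub-second nanosecond part. */
struct toy_timekeeper {
	int64_t sec;
	int64_t nsec;	/* kept in [0, NSEC_PER_SEC) */
};

/* Advance by delta_ns >= 0; carry by subtraction, not by division. */
static void toy_advance(struct toy_timekeeper *tk, int64_t delta_ns)
{
	assert(delta_ns >= 0);
	tk->nsec += delta_ns;
	while (tk->nsec >= NSEC_PER_SEC) {
		tk->nsec -= NSEC_PER_SEC;
		tk->sec++;
	}
}

/* Reading is a plain copy: no 64-bit division is needed here. */
static void toy_gettime(const struct toy_timekeeper *tk,
			int64_t *sec, int64_t *nsec)
{
	*sec = tk->sec;
	*nsec = tk->nsec;
}
```

If only a scalar nanosecond count were stored, every read that has to
return a timespec would need a divide by NSEC_PER_SEC, which is
expensive on 32bit.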

We really should talk to libc folks and high performance users about
this before blindly adding a gazillion new timespec64-based
interfaces.

Thoughts?

Thanks,

	tglx
