From: Arnd Bergmann <arnd@arndb.de>
To: y2038@lists.linaro.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	pang.xunlei@linaro.org, Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	cl@linux.com, Ingo Molnar <mingo@kernel.org>,
	heenasirwani@gmail.com, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, mpe@ellerman.id.au,
	rafael.j.wysocki@intel.com, ahh@google.com,
	Frederic Weisbecker <fweisbec@gmail.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	pjt@google.com, riel@redhat.com, richardcochran@gmail.com,
	Tejun Heo <tj@kernel.org>, John Stultz <john.stultz@linaro.org>,
	rth@twiddle.net, Baolin Wang <baolin.wang@linaro.org>,
	gregkh@linuxfoundation.org, LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	linux390@de.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [Y2038] [PATCH 04/11] posix timers:Introduce the 64bit methods with timespec64 type for k_clock structure
Date: Wed, 22 Apr 2015 13:07:44 +0200
Message-ID: <2233518.Z2Q4dpO62C@wuerfel>
In-Reply-To: <alpine.DEB.2.11.1504220946460.13914@nanos>

On Wednesday 22 April 2015 10:45:23 Thomas Gleixner wrote:
> On Tue, 21 Apr 2015, Thomas Gleixner wrote:

> So we could save one translation step if we implement new syscalls
> which have a scalar nsec interface instead of the timespec/timeval
> cruft and let user space do the translation to whatever it wants.
> 
> So
> 
> sys_clock_nanosleep(const clockid_t which_clock, int flags,
> 		    const struct timespec __user *expires,
> 		    struct timespec __user *remainder)
> 
> would get the new syscall variant:
> 
> sys_clock_nanosleep_ns(const clockid_t which_clock, int flags,
> 		       const s64 expires, s64 __user *remainder)

As you might expect, there are a number of complications with this
approach:

- John Stultz likes to point out that it's easier to do one change
  at a time, so extending the interface to 64-bit is less likely to
  break things than a more fundamental change. I think it's useful
  to drop a lot of the syscalls when a more modern version is around
  (e.g. let libc implement usleep and nanosleep through
  clock_nanosleep), but to keep the remaining syscalls as close to
  the known-working 64-bit versions as we can.
- The inode timestamp related syscalls (stat, utimes and variants
  thereof) require the full range of time64_t and cannot use ktime_t.
- Converting between timespec types of different size is cheap,
  converting timespec to ktime_t is still relatively cheap, but
  converting ktime_t to timespec is rather expensive: without
  64-bit arithmetic it takes at least eight 32-bit multiplies,
  plus a few shifts and additions.
- ioctls that pass a timespec need to keep doing so; otherwise user
  space would need source-level changes rather than just a recompile.
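
To illustrate the conversion-cost point above, here is a minimal
sketch in user-space C. It assumes a scalar time value is simply a
signed 64-bit nanosecond count; struct ts64 and both helper names are
made up for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Illustrative 64-bit timespec; not the kernel's struct timespec64. */
struct ts64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/* timespec -> scalar nanoseconds: one multiply and one add. */
static int64_t ts64_to_ns(struct ts64 ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
	return ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Scalar nanoseconds -> timespec: needs a 64-bit division and a
 * remainder, which is the costly direction on 32-bit machines.
 */
static struct ts64 ns_to_ts64(int64_t ns)
{
	struct ts64 ts;

	ts.tv_sec = ns / NSEC_PER_SEC;
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	if (ts.tv_nsec < 0) {
		/* C division truncates toward zero; keep tv_nsec in range. */
		ts.tv_sec--;
		ts.tv_nsec += NSEC_PER_SEC;
	}
	return ts;
}
```

The sketch ignores overflow: a signed 64-bit nanosecond count only
covers roughly 292 years either side of its epoch, which is also why
the inode timestamp calls cannot use such a scalar.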

> I personally would welcome such an interface as it makes user space
> programming simpler. Just (re)arming a periodic nanosleep based on
> absolute expiry time is horribly stupid today:
> 
> 	struct timespec expires;
> 	....
> 	while (1) {
> 		expires.tv_nsec += period.tv_nsec;
> 		expires.tv_sec += period.tv_sec;
> 		normalize_timespec(&expires);
> 		sys_clock_nanosleep(CLOCK_ID, ABS, &expires, NULL);
> 	}
> 
> So with a scalar interface this would reduce to:
> 
> 	s64 expires;
> 	....
> 	while (1) {
> 		expires += period;
> 		sys_clock_nanosleep_ns(CLOCK_ID, ABS, expires, NULL);
> 	}
> 
> There is a difference both in text and storage size plus the avoidance
> of the two translation steps (one translation step on 64bit).

We should probably look at it separately for each syscall. It's
quite possible that we find a number of them for which it helps
and others for which it hurts, so we need to see the big picture.
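
For comparison, the periodic loop quoted above can be sketched against
the existing POSIX interface roughly as follows. clock_gettime() and
clock_nanosleep() with TIMER_ABSTIME are real POSIX calls; the helpers
timespec_add(), run_periodic() and clock_nanosleep_ns_emul() are
hypothetical names for this example, and the last one is only a
user-space stand-in for the proposed scalar syscall, which does not
exist.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add a period to an expiry time, carrying tv_nsec overflow into tv_sec. */
static void timespec_add(struct timespec *expires, const struct timespec *period)
{
	assert(period->tv_nsec >= 0 && period->tv_nsec < NSEC_PER_SEC);
	expires->tv_sec += period->tv_sec;
	expires->tv_nsec += period->tv_nsec;
	if (expires->tv_nsec >= NSEC_PER_SEC) {
		expires->tv_nsec -= NSEC_PER_SEC;
		expires->tv_sec++;
	}
}

/*
 * Sleep for `count` periods using absolute expiry times. Returns 0 on
 * success, -1 if the clock cannot be read, or the error number from
 * clock_nanosleep() (for example EINTR).
 */
static int run_periodic(const struct timespec *period, int count)
{
	struct timespec expires;

	if (clock_gettime(CLOCK_MONOTONIC, &expires) != 0)
		return -1;
	for (int i = 0; i < count; i++) {
		timespec_add(&expires, period);
		int err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
					  &expires, NULL);
		if (err != 0)
			return err;
	}
	return 0;
}

/*
 * Hypothetical emulation of a scalar interface: takes an absolute
 * expiry in nanoseconds and pays for the division in user space.
 */
static int clock_nanosleep_ns_emul(clockid_t clk, int64_t expires_ns)
{
	assert(expires_ns >= 0);
	struct timespec ts = {
		.tv_sec = (time_t)(expires_ns / NSEC_PER_SEC),
		.tv_nsec = (long)(expires_ns % NSEC_PER_SEC),
	};
	return clock_nanosleep(clk, TIMER_ABSTIME, &ts, NULL);
}
```

With the emulation the caller's loop shrinks to "expires += period",
but the conversion cost just moves into the wrapper; only a real
scalar syscall would remove it.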

There are also a few other calls that will never need 64-bit
time_t because the range is limited by the need to only ever
pass relative timeouts (select, poll, io_getevents, recvmmsg,
clock_getres, rt_sigtimedwait, sched_rr_get_interval, getrusage,
waitid, semtimedop, sysinfo), so we could actually leave them
using a 32-bit structure and have the libc do the conversion.
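
A rough sketch of what such a libc-side conversion could look like for
a relative timeout follows. Both struct layouts and the function name
are assumptions for illustration, and clamping an oversized timeout to
the largest 32-bit value is just one possible policy, not something
any libc is known to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit timespec as seen by the application. */
struct timespec64_user {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Hypothetical 32-bit layout kept at the syscall boundary. */
struct timespec32 {
	int32_t tv_sec;
	int32_t tv_nsec;
};

/*
 * Narrow a relative timeout to the 32-bit layout. Timeouts longer
 * than INT32_MAX seconds (about 68 years) are clamped, so the call
 * may return early instead of failing.
 */
static struct timespec32 narrow_relative_timeout(struct timespec64_user in)
{
	struct timespec32 out;

	assert(in.tv_sec >= 0 && in.tv_nsec >= 0 && in.tv_nsec < 1000000000);
	if (in.tv_sec > INT32_MAX) {
		out.tv_sec = INT32_MAX;
		out.tv_nsec = 999999999;
	} else {
		out.tv_sec = (int32_t)in.tv_sec;
		out.tv_nsec = (int32_t)in.tv_nsec;
	}
	return out;
}
```

For calls in the list that return a time to user space rather than
take one, the conversion would go the other way and is a plain
widening with no range check.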

> I know that this is non portable, but OTOH if I look at the non
> portable mechanisms which are used by data bases, java VMs and other
> apps which exist to squeeze the last cycles out of the system, there
> is certainly some value to that.
> 
> The portable/spec conforming apps can still use the user space
> assisted translated timespec/timeval mechanisms.
> 
> There is one caveat though: sys_clock_gettime and sys_gettimeofday
> will still need a syscall_timespec64 variant. We have no double
> translation steps there because we maintain the timespec
> representation in the timekeeping code for performance reasons to
> avoid the division in the syscall interface. But everything else can
> do nicely without the timespec cruft.
> 
> We really should talk to libc folks and high performance users about
> this before blindly adding a gazillion of new timespec64 based
> interfaces.

I've started a list of affected syscalls at
https://docs.google.com/spreadsheets/d/1HCYwHXxs48TsTb6IGUduNjQnmfRvMPzCN6T_0YiQwis/edit?usp=sharing

I'm still adding more calls and descriptions; let me know if you want
edit permissions.

	Arnd
