linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/3] y2038: introduce struct __kernel_old_timeval
@ 2017-11-27 17:00 Arnd Bergmann
  2017-11-27 17:00 ` [PATCH 2/3] y2038: elfcore: use __kernel_old_timeval for process times Arnd Bergmann
  2017-11-27 17:00 ` [PATCH 3/3] y2038: rusage: " Arnd Bergmann
  0 siblings, 2 replies; 7+ messages in thread
From: Arnd Bergmann @ 2017-11-27 17:00 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: y2038, libc-alpha, linux-kernel, linux-arch, linux-api,
	Albert ARIBAUD, Arnd Bergmann, Stephen Boyd, Al Viro,
	Deepa Dinamani, Ingo Molnar, Andreas Dilger

Dealing with 'struct timeval' users in the y2038 series is a bit tricky:

We have two definitions of timeval that are visible to user space,
one comes from glibc (or some other C library), the other comes from
linux/time.h. The kernel copy is what we want to be used for a number of
structures defined by the kernel itself, e.g. elf_prstatus (used it core
dumps), sysinfo and rusage (used in system calls).  These generally tend
to be used for passing time intervals rather than absolute (epoch-based)
times, so they do not suffer from the y2038 overflow. Some of them
could be changed to use 64-bit timestamps by creating new system calls,
others like the core files cannot easily be changed.

An application using these interfaces likely also uses gettimeofday()
or other interfaces that use absolute times, and pass 'struct timeval'
pointers directly into kernel interfaces, so glibc must redefine their
timeval based on a 64-bit time_t when they introduce their y2038-safe
interfaces.

The only reasonable way forward I see is to remove the 'timeval'
definion from the kernel's uapi headers, and change the interfaces that
we do not want to (or cannot) duplicate for 64-bit times to use a new
__kernel_old_timeval definition instead. This type should be avoided
for all new interfaces (those can use 64-bit nanoseconds, or the 64-bit
version of timespec instead), and should be used with great care when
converting existing interfaces from timeval, to be sure they don't suffer
from the y2038 overflow, and only with consensus for the particular user
that using __kernel_old_timeval is better than moving to a 64-bit based
interface. The structure name is intentionally chosen to not conflict
with user space types, and to be ugly enough to discourage its use.

Note that ioctl based interfaces that pass a bare 'timeval' pointer
cannot change to '__kernel_old_timeval' because the user space source
code refers to 'timeval' instead, and we don't want to modify the user
space sources if possible. However, any application that relies on a
structure to contain an embedded 'timeval' (e.g. by passing a pointer
to the member into a function call that expects a timeval pointer) is
broken when that structure gets converted to __kernel_old_timeval. I
don't see any way around that, and we have to rely on the compiler to
produce a warning or compile failure that will alert users when they
recompile their sources against a new libc.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 include/linux/time32.h    |  1 +
 include/uapi/linux/time.h | 12 ++++++++++++
 kernel/time/time.c        | 12 ++++++++++++
 3 files changed, 25 insertions(+)

diff --git a/include/linux/time32.h b/include/linux/time32.h
index 100411c979be..378c75d9a83c 100644
--- a/include/linux/time32.h
+++ b/include/linux/time32.h
@@ -205,5 +205,6 @@ static inline s64 timeval_to_ns(const struct timeval *tv)
  * Returns the timeval representation of the nsec parameter.
  */
 extern struct timeval ns_to_timeval(const s64 nsec);
+extern struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec);
 
 #endif
diff --git a/include/uapi/linux/time.h b/include/uapi/linux/time.h
index 0ad4510884b0..30aa734135ad 100644
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -50,6 +50,18 @@ struct __kernel_timespec {
 #endif
 
 /*
+ * legacy timeval structure, only embedded in structures that
+ * traditionally used 'timeval' to pass time intervals (not absolute
+ * times). Do not add new users. If user space fails to compile
+ * here, this is probably because it is not y2038 safe and needs to
+ * be changed to use another interface.
+ */
+struct __kernel_old_timeval {
+	__kernel_long_t tv_sec;			/* seconds */
+	__kernel_long_t tv_usec;		/* seconds */
+};
+
+/*
  * The IDs of the various system clocks (for POSIX.1b interval timers):
  */
 #define CLOCK_REALTIME			0
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 518b56b17147..92002257f083 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -486,6 +486,18 @@ struct timeval ns_to_timeval(const s64 nsec)
 }
 EXPORT_SYMBOL(ns_to_timeval);
 
+struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec)
+{
+	struct timespec64 ts = ns_to_timespec64(nsec);
+	struct __kernel_old_timeval tv;
+
+	tv.tv_sec = ts.tv_sec;
+	tv.tv_usec = (suseconds_t) ts.tv_nsec / 1000;
+
+	return tv;
+}
+EXPORT_SYMBOL(ns_to_kernel_old_timeval);
+
 /**
  * set_normalized_timespec - set timespec sec and nsec parts and normalize
  *
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] y2038: elfcore: use __kernel_old_timeval for process times
  2017-11-27 17:00 [PATCH 1/3] y2038: introduce struct __kernel_old_timeval Arnd Bergmann
@ 2017-11-27 17:00 ` Arnd Bergmann
  2017-11-27 17:00 ` [PATCH 3/3] y2038: rusage: " Arnd Bergmann
  1 sibling, 0 replies; 7+ messages in thread
From: Arnd Bergmann @ 2017-11-27 17:00 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: y2038, libc-alpha, linux-kernel, linux-arch, linux-api,
	Albert ARIBAUD, Arnd Bergmann, Ralf Baechle,
	James E.J. Bottomley, Helge Deller, Alexander Viro,
	Greg Kroah-Hartman, Ingo Molnar, Frederic Weisbecker,
	Andrew Morton, Denys Vlasenko, Nicolas Pitre, Dave Martin,
	Mickael GUENE, linux-mips, linux-parisc, linux-fsdevel

We store elapsed time for a crashed process in struct elf_prstatus using
'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
incompatible with the kernel's idea of timeval since the structure layout
no longer matches on 32-bit architectures.

This changes the definition of the elf_prstatus structure to use
__kernel_old_timeval instead, which is hardcoded to the currently used
binary layout. There is no risk of overflow in y2038 though, because
the time values are all relative times, and can store up to 68 years
of process elapsed time.

There is a risk of applications breaking at build time when they
use the new kernel headers and expect the type to be exactly 'timeval'
rather than a structure that has the same fields as before. Those
applications have to be modified to deal with 64-bit time_t anyway.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/mips/kernel/binfmt_elfn32.c  |  4 ++--
 arch/mips/kernel/binfmt_elfo32.c  |  4 ++--
 arch/parisc/kernel/binfmt_elf32.c |  4 ++--
 fs/binfmt_elf.c                   | 12 ++++++------
 fs/binfmt_elf_fdpic.c             | 12 ++++++------
 fs/compat_binfmt_elf.c            |  4 ++--
 include/uapi/linux/elfcore.h      |  8 ++++----
 7 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 89b234844534..2fe2d5573289 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -100,7 +100,7 @@ jiffies_to_compat_timeval(unsigned long jiffies, struct compat_timeval *value)
 #undef TASK_SIZE
 #define TASK_SIZE TASK_SIZE32
 
-#undef ns_to_timeval
-#define ns_to_timeval ns_to_compat_timeval
+#undef ns_to_kernel_old_timeval
+#define ns_to_kernel_old_timeval ns_to_compat_timeval
 
 #include "../../../fs/binfmt_elf.c"
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index a88c59db3d48..d3c37583ef91 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -103,7 +103,7 @@ jiffies_to_compat_timeval(unsigned long jiffies, struct compat_timeval *value)
 #undef TASK_SIZE
 #define TASK_SIZE TASK_SIZE32
 
-#undef ns_to_timeval
-#define ns_to_timeval ns_to_compat_timeval
+#undef ns_to_kernel_old_timeval
+#define ns_to_kernel_old_timeval ns_to_compat_timeval
 
 #include "../../../fs/binfmt_elf.c"
diff --git a/arch/parisc/kernel/binfmt_elf32.c b/arch/parisc/kernel/binfmt_elf32.c
index 20dfa081ed0b..ad3ea00c64f7 100644
--- a/arch/parisc/kernel/binfmt_elf32.c
+++ b/arch/parisc/kernel/binfmt_elf32.c
@@ -92,7 +92,7 @@ struct elf_prpsinfo32
 	current->thread.map_base = DEFAULT_MAP_BASE32; \
 	current->thread.task_size = DEFAULT_TASK_SIZE32 \
 
-#undef ns_to_timeval
-#define ns_to_timeval ns_to_compat_timeval
+#undef ns_to_kernel_old_timeval
+#define ns_to_kernel_old_timeval ns_to_compat_timeval
 
 #include "../../../fs/binfmt_elf.c"
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 83732fef510d..7ae716db7d99 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1500,18 +1500,18 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * group-wide total, not its individual thread total.
 		 */
 		thread_group_cputime(p, &cputime);
-		prstatus->pr_utime = ns_to_timeval(cputime.utime);
-		prstatus->pr_stime = ns_to_timeval(cputime.stime);
+		prstatus->pr_utime = ns_to_kernel_old_timeval(cputime.utime);
+		prstatus->pr_stime = ns_to_kernel_old_timeval(cputime.stime);
 	} else {
 		u64 utime, stime;
 
 		task_cputime(p, &utime, &stime);
-		prstatus->pr_utime = ns_to_timeval(utime);
-		prstatus->pr_stime = ns_to_timeval(stime);
+		prstatus->pr_utime = ns_to_kernel_old_timeval(utime);
+		prstatus->pr_stime = ns_to_kernel_old_timeval(stime);
 	}
 
-	prstatus->pr_cutime = ns_to_timeval(p->signal->cutime);
-	prstatus->pr_cstime = ns_to_timeval(p->signal->cstime);
+	prstatus->pr_cutime = ns_to_kernel_old_timeval(p->signal->cutime);
+	prstatus->pr_cstime = ns_to_kernel_old_timeval(p->signal->cstime);
 }
 
 static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 429326b6e2e7..89717459224b 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1362,17 +1362,17 @@ static void fill_prstatus(struct elf_prstatus *prstatus,
 		 * group-wide total, not its individual thread total.
 		 */
 		thread_group_cputime(p, &cputime);
-		prstatus->pr_utime = ns_to_timeval(cputime.utime);
-		prstatus->pr_stime = ns_to_timeval(cputime.stime);
+		prstatus->pr_utime = ns_to_kernel_old_timeval(cputime.utime);
+		prstatus->pr_stime = ns_to_kernel_old_timeval(cputime.stime);
 	} else {
 		u64 utime, stime;
 
 		task_cputime(p, &utime, &stime);
-		prstatus->pr_utime = ns_to_timeval(utime);
-		prstatus->pr_stime = ns_to_timeval(stime);
+		prstatus->pr_utime = ns_to_kernel_old_timeval(utime);
+		prstatus->pr_stime = ns_to_kernel_old_timeval(stime);
 	}
-	prstatus->pr_cutime = ns_to_timeval(p->signal->cutime);
-	prstatus->pr_cstime = ns_to_timeval(p->signal->cstime);
+	prstatus->pr_cutime = ns_to_kernel_old_timeval(p->signal->cutime);
+	prstatus->pr_cstime = ns_to_kernel_old_timeval(p->signal->cstime);
 
 	prstatus->pr_exec_fdpic_loadmap = p->mm->context.exec_fdpic_loadmap;
 	prstatus->pr_interp_fdpic_loadmap = p->mm->context.interp_fdpic_loadmap;
diff --git a/fs/compat_binfmt_elf.c b/fs/compat_binfmt_elf.c
index 504b3c3539dc..5df608af1306 100644
--- a/fs/compat_binfmt_elf.c
+++ b/fs/compat_binfmt_elf.c
@@ -51,8 +51,8 @@
 #define elf_prstatus	compat_elf_prstatus
 #define elf_prpsinfo	compat_elf_prpsinfo
 
-#undef ns_to_timeval
-#define ns_to_timeval ns_to_compat_timeval
+#undef ns_to_kernel_old_timeval
+#define ns_to_kernel_old_timeval ns_to_compat_timeval
 
 /*
  * To use this file, asm/elf.h must define compat_elf_check_arch.
diff --git a/include/uapi/linux/elfcore.h b/include/uapi/linux/elfcore.h
index 0b2c9e16e345..baf03562306d 100644
--- a/include/uapi/linux/elfcore.h
+++ b/include/uapi/linux/elfcore.h
@@ -53,10 +53,10 @@ struct elf_prstatus
 	pid_t	pr_ppid;
 	pid_t	pr_pgrp;
 	pid_t	pr_sid;
-	struct timeval pr_utime;	/* User time */
-	struct timeval pr_stime;	/* System time */
-	struct timeval pr_cutime;	/* Cumulative user time */
-	struct timeval pr_cstime;	/* Cumulative system time */
+	struct __kernel_old_timeval pr_utime;	/* User time */
+	struct __kernel_old_timeval pr_stime;	/* System time */
+	struct __kernel_old_timeval pr_cutime;	/* Cumulative user time */
+	struct __kernel_old_timeval pr_cstime;	/* Cumulative system time */
 #if 0
 	long	pr_instr;		/* Current instruction */
 #endif
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] y2038: rusage: use __kernel_old_timeval for process times
  2017-11-27 17:00 [PATCH 1/3] y2038: introduce struct __kernel_old_timeval Arnd Bergmann
  2017-11-27 17:00 ` [PATCH 2/3] y2038: elfcore: use __kernel_old_timeval for process times Arnd Bergmann
@ 2017-11-27 17:00 ` Arnd Bergmann
  2017-11-27 17:52   ` Paul Eggert
  1 sibling, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2017-11-27 17:00 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner
  Cc: y2038, libc-alpha, linux-kernel, linux-arch, linux-api,
	Albert ARIBAUD, Arnd Bergmann, Richard Henderson,
	Ivan Kokshaysky, Matt Turner, Al Viro, Ingo Molnar,
	Frederic Weisbecker, Deepa Dinamani, Greg Kroah-Hartman,
	Eric W. Biederman, Oleg Nesterov, Andrew Morton, Kirill Tkhai,
	linux-alpha

'struct rusage' contains the run times of a process in 'timeval' format
and is accessed through the wait4() and getrusage() system calls. This
is not a problem for y2038 safety by itself, but causes an issue when
the C library starts using 64-bit time_t on 32-bit architectures because
the structure layout becomes incompatible.

There are three possible ways of dealing with this:

a) deprecate the wait4() and getrusage() system calls, and create
   a set of kernel interfaces based around a newly defined structure that
   could solve multiple problems at once, e.g. provide more fine-grained
   timestamps. The C library could then implement the posix interfaces
   on top of the new system calls.

b) Extend the approach taken by the x32 ABI, and use the 64-bit
   native structure layout for rusage on all architectures with new
   system calls that is otherwise compatible. A possible problem here
   is that we end up with incompatible definitions of rusage between
   /usr/include/linux/resource.h and /usr/include/bits/resource.h

c) Change the definition of struct rusage to be independent of
   time_t. This is the easiest change, as it does not involve new system
   call entry points, but it has the risk of introducing compile-time
   incompatibilities with user space sources that rely on the type
   of ru_utime and ru_stime.

I'm picking approch c) for its simplicity, but I'd like to hear from
others whether they would prefer a different approach.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/alpha/kernel/osf_sys.c   | 2 +-
 include/uapi/linux/resource.h | 4 ++--
 kernel/sys.c                  | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index fa1a392ca9a2..445ded2ea471 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -970,7 +970,7 @@ put_tv32(struct timeval32 __user *o, struct timespec64 *i)
 }
 
 static inline long
-put_tv_to_tv32(struct timeval32 __user *o, struct timeval *i)
+put_tv_to_tv32(struct timeval32 __user *o, struct __kernel_old_timeval *i)
 {
 	return copy_to_user(o, &(struct timeval32){
 				.tv_sec = i->tv_sec,
diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h
index cc00fd079631..74ef57b38f9f 100644
--- a/include/uapi/linux/resource.h
+++ b/include/uapi/linux/resource.h
@@ -22,8 +22,8 @@
 #define	RUSAGE_THREAD	1		/* only the calling thread */
 
 struct	rusage {
-	struct timeval ru_utime;	/* user time used */
-	struct timeval ru_stime;	/* system time used */
+	struct __kernel_old_timeval ru_utime;	/* user time used */
+	struct __kernel_old_timeval ru_stime;	/* system time used */
 	__kernel_long_t	ru_maxrss;	/* maximum resident set size */
 	__kernel_long_t	ru_ixrss;	/* integral shared memory size */
 	__kernel_long_t	ru_idrss;	/* integral unshared data size */
diff --git a/kernel/sys.c b/kernel/sys.c
index 83ffd7dccf23..c459e294aa9e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1717,8 +1717,8 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
 	unlock_task_sighand(p, &flags);
 
 out:
-	r->ru_utime = ns_to_timeval(utime);
-	r->ru_stime = ns_to_timeval(stime);
+	r->ru_utime = ns_to_kernel_old_timeval(utime);
+	r->ru_stime = ns_to_kernel_old_timeval(stime);
 
 	if (who != RUSAGE_CHILDREN) {
 		struct mm_struct *mm = get_task_mm(p);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] y2038: rusage: use __kernel_old_timeval for process times
  2017-11-27 17:00 ` [PATCH 3/3] y2038: rusage: " Arnd Bergmann
@ 2017-11-27 17:52   ` Paul Eggert
  2017-11-27 18:49     ` Eric W. Biederman
  0 siblings, 1 reply; 7+ messages in thread
From: Paul Eggert @ 2017-11-27 17:52 UTC (permalink / raw)
  To: Arnd Bergmann, John Stultz, Thomas Gleixner
  Cc: y2038, libc-alpha, linux-kernel, linux-arch, linux-api,
	Albert ARIBAUD, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Al Viro, Ingo Molnar, Frederic Weisbecker, Deepa Dinamani,
	Greg Kroah-Hartman, Eric W. Biederman, Oleg Nesterov,
	Andrew Morton, Kirill Tkhai, linux-alpha

On 11/27/2017 09:00 AM, Arnd Bergmann wrote:
> b) Extend the approach taken by the x32 ABI, and use the 64-bit
>     native structure layout for rusage on all architectures with new
>     system calls that is otherwise compatible. A possible problem here
>     is that we end up with incompatible definitions of rusage between
>     /usr/include/linux/resource.h and /usr/include/bits/resource.h
>
> c) Change the definition of struct rusage to be independent of
>     time_t. This is the easiest change, as it does not involve new system
>     call entry points, but it has the risk of introducing compile-time
>     incompatibilities with user space sources that rely on the type
>     of ru_utime and ru_stime.
>
> I'm picking approch c) for its simplicity, but I'd like to hear from
> others whether they would prefer a different approach.

(c) would break programs like GNU Emacs, which copy ru_utime and 
ru_stime members into struct timeval variables.

All in all, (b) sounds like it would be better for programs using glibc, 
as it's more compatible with what POSIX apps expect. Though I'm not sure 
what problems are meant by "possible ... incompatible definitions"; 
perhaps you could elaborate.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] y2038: rusage: use __kernel_old_timeval for process times
  2017-11-27 17:52   ` Paul Eggert
@ 2017-11-27 18:49     ` Eric W. Biederman
  2017-11-27 20:41       ` Arnd Bergmann
  0 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2017-11-27 18:49 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Arnd Bergmann, John Stultz, Thomas Gleixner, y2038, libc-alpha,
	linux-kernel, linux-arch, linux-api, Albert ARIBAUD,
	Richard Henderson, Ivan Kokshaysky, Matt Turner, Al Viro,
	Ingo Molnar, Frederic Weisbecker, Deepa Dinamani,
	Greg Kroah-Hartman, Oleg Nesterov, Andrew Morton, Kirill Tkhai,
	linux-alpha

Paul Eggert <eggert@cs.ucla.edu> writes:

> On 11/27/2017 09:00 AM, Arnd Bergmann wrote:
>> b) Extend the approach taken by the x32 ABI, and use the 64-bit
>>     native structure layout for rusage on all architectures with new
>>     system calls that is otherwise compatible. A possible problem here
>>     is that we end up with incompatible definitions of rusage between
>>     /usr/include/linux/resource.h and /usr/include/bits/resource.h
>>
>> c) Change the definition of struct rusage to be independent of
>>     time_t. This is the easiest change, as it does not involve new system
>>     call entry points, but it has the risk of introducing compile-time
>>     incompatibilities with user space sources that rely on the type
>>     of ru_utime and ru_stime.
>>
>> I'm picking approch c) for its simplicity, but I'd like to hear from
>> others whether they would prefer a different approach.
>
> (c) would break programs like GNU Emacs, which copy ru_utime and ru_stime
> members into struct timeval variables.
>
> All in all, (b) sounds like it would be better for programs using glibc, as it's
> more compatible with what POSIX apps expect. Though I'm not sure what problems
> are meant by "possible ... incompatible definitions"; perhaps you could
> elaborate.

getrusage is posix and I believe the use of struct timeval is posix as
well.

So getrusage(3) the libc definition and that defintion must struct
timeval or the implementation will be non-conforming and it won't be
just emacs we need to worry about.

The practical question is what do we provide to userspace so that it can
implement a conforming getrusage?

A 32bit time_t based struct timeval is good for durations up to 136 years
or so.  Which strongly suggests the range is large enough, except for
some crazy massively multi-threaded application.  And anything off the
charts cpu hungry at this point I expect will be 64bit.

It is possible to get a 128 way system with one thread on each core and
consume 100% of the core for a bit over a year to max out getrusage.  So
I do think in the long run we care about increasing the size of time_t
here.  Last I checked applications doing things like that were 64bit in
the year 2000.

Given that userspace is going to be seeing the larger struct rusage in
any event my inclination for long term maintainability would be to
introduce the new syscall and have the current one called oldgetrusage
on 32bit architectures.  Then we won't have to worry about what weird
things glibc will do when translating the data, and we can handle
applications with crazy (but possible) runtimes.  Which inclines me to
(b) as well.

As for (a) does anyone have a need for process acounting at nsec
granularity?  Unless we can get that for free that just seems like
overpromising and a waist to have so much fine granularity.

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] y2038: rusage: use __kernel_old_timeval for process times
  2017-11-27 18:49     ` Eric W. Biederman
@ 2017-11-27 20:41       ` Arnd Bergmann
  2017-11-28 10:32         ` Arnd Bergmann
  0 siblings, 1 reply; 7+ messages in thread
From: Arnd Bergmann @ 2017-11-27 20:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul Eggert, John Stultz, Thomas Gleixner, y2038 Mailman List,
	GNU C Library, Linux Kernel Mailing List, linux-arch, Linux API,
	Albert ARIBAUD, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Al Viro, Ingo Molnar, Frederic Weisbecker, Deepa Dinamani,
	Greg Kroah-Hartman, Oleg Nesterov, Andrew Morton, Kirill Tkhai,
	linux-alpha

On Mon, Nov 27, 2017 at 7:49 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> On 11/27/2017 09:00 AM, Arnd Bergmann wrote:
>>> b) Extend the approach taken by the x32 ABI, and use the 64-bit
>>>     native structure layout for rusage on all architectures with new
>>>     system calls that is otherwise compatible. A possible problem here
>>>     is that we end up with incompatible definitions of rusage between
>>>     /usr/include/linux/resource.h and /usr/include/bits/resource.h
>>>
>>> c) Change the definition of struct rusage to be independent of
>>>     time_t. This is the easiest change, as it does not involve new system
>>>     call entry points, but it has the risk of introducing compile-time
>>>     incompatibilities with user space sources that rely on the type
>>>     of ru_utime and ru_stime.
>>>
>>> I'm picking approch c) for its simplicity, but I'd like to hear from
>>> others whether they would prefer a different approach.
>>
>> (c) would break programs like GNU Emacs, which copy ru_utime and ru_stime
>> members into struct timeval variables.

Right. I think I originally had the workaround to have glibc convert
between its own structure and the kernel structure in mind, but then
ended up not including that in the text above. I was going back and
forth on whether it would be needed or not.

>> All in all, (b) sounds like it would be better for programs using glibc, as it's
>> more compatible with what POSIX apps expect. Though I'm not sure what problems
>> are meant by "possible ... incompatible definitions"; perhaps you could
>> elaborate.

I meant that you might have an application that includes
linux/resource.h instead of sys/resource.h but calls the glibc
function, or one that includes sys/resource.h and invokes the
system call directly.

> getrusage is posix and I believe the use of struct timeval is posix as
> well.
>
> So getrusage(3) the libc definition and that defintion must struct
> timeval or the implementation will be non-conforming and it won't be
> just emacs we need to worry about.
>
> The practical question is what do we provide to userspace so that it can
> implement a conforming getrusage?
>
> A 32bit time_t based struct timeval is good for durations up to 136 years
> or so.  Which strongly suggests the range is large enough, except for
> some crazy massively multi-threaded application.  And anything off the
> charts cpu hungry at this point I expect will be 64bit.
>
> It is possible to get a 128 way system with one thread on each core and
> consume 100% of the core for a bit over a year to max out getrusage.  So
> I do think in the long run we care about increasing the size of time_t
> here.  Last I checked applications doing things like that were 64bit in
> the year 2000.

Agreed, this was also a calculation I did.

> Given that userspace is going to be seeing the larger struct rusage in
> any event my inclination for long term maintainability would be to
> introduce the new syscall and have the current one called oldgetrusage
> on 32bit architectures.  Then we won't have to worry about what weird
> things glibc will do when translating the data, and we can handle
> applications with crazy (but possible) runtimes.  Which inclines me to
> (b) as well.

This would actually be the same thing we do for most other syscalls,
regarding the naming, it would become compat_sys_getrusage()
and share the implementation between native 32-bit mode and
compat mode on 64-bit architectures, while sys_getrusage becomes
the function that deals with the 64-bit layout, and would have the
same binary format on both 32-bit and 64-bit native ABIs.

Unfortunately, this opens a new question, as the structure is currently
defined by glibc as:

/* Structure which says how much of each resource has been used.  */

/* The purpose of all the unions is to have the kernel-compatible layout
   while keeping the API type as 'long int', and among machines where
   __syscall_slong_t is not 'long int', this only does the right thing
   for little-endian ones, like x32.  */
struct rusage
  {
    /* Total amount of user time used.  */
    struct timeval ru_utime;
    /* Total amount of system time used.  */
    struct timeval ru_stime;
    /* Maximum resident set size (in kilobytes).  */
    __extension__ union
      {
        long int ru_maxrss;
        __syscall_slong_t __ru_maxrss_word;
      };
    /* Amount of sharing of text segment memory
       with other processes (kilobyte-seconds).  */
    /* Maximum resident set size (in kilobytes).  */
    __extension__ union
      {
        long int ru_ixrss;
        __syscall_slong_t __ru_ixrss_word;
      };
   ...
};

Here, I guess we have to replace __syscall_slong_t with an 'rusage'
specific type that has the same length as time_t, but is independent
of __syscall_slong_t, which is still 32-bit for most 32-bit architectures.

How would we do the big-endian version of that though?

One argument for using c) plus the emulation in glibc is that glibc
has to do emulation anyway, to allow running user space with 64-bit
time_t on older kernels that don't have the new getrusage system
call.

> As for (a) does anyone have a need for process acounting at nsec
> granularity?  Unless we can get that for free that just seems like
> overpromising and a waist to have so much fine granularity.

The kernel does everything in nanoseconds, so we always spend
a few cycles (a lot of cycles on some of the very low-end architectures)
on dividing it by 1000. Moving the division operation to user space
is essentially free, and using the nanoseconds instead of microseconds
might be slightly cheaper. I don't think anyone really needs it though.

      Arnd

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] y2038: rusage: use __kernel_old_timeval for process times
  2017-11-27 20:41       ` Arnd Bergmann
@ 2017-11-28 10:32         ` Arnd Bergmann
  0 siblings, 0 replies; 7+ messages in thread
From: Arnd Bergmann @ 2017-11-28 10:32 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Paul Eggert, John Stultz, Thomas Gleixner, y2038 Mailman List,
	GNU C Library, Linux Kernel Mailing List, linux-arch, Linux API,
	Albert ARIBAUD, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Al Viro, Ingo Molnar, Frederic Weisbecker, Deepa Dinamani,
	Greg Kroah-Hartman, Oleg Nesterov, Andrew Morton, Kirill Tkhai,
	linux-alpha

On Mon, Nov 27, 2017 at 9:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> One argument for using c) plus the emulation in glibc is that glibc
> has to do emulation anyway, to allow running user space with 64-bit
> time_t on older kernels that don't have the new getrusage system
> call.

To clarify my point here, if we stay with approach c), I think it should work
directly with the Albert's proposed patch "Y2038: add function
__getrusage_t64" [1], and we can remove the  "  // TODO: use
64-bit-time syscall if available" there. If we pick approach b), we
still need the same glibc patch, but would also implement the
interface to the new system call. As Eric said, this would be slightly
cleaner, but not really help us since 32-bit fields in rusage are
sufficient on the kernel interface side.

       Arnd

[1] https://patchwork.ozlabs.org/patch/811246/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-11-28 12:09 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-27 17:00 [PATCH 1/3] y2038: introduce struct __kernel_old_timeval Arnd Bergmann
2017-11-27 17:00 ` [PATCH 2/3] y2038: elfcore: use __kernel_old_timeval for process times Arnd Bergmann
2017-11-27 17:00 ` [PATCH 3/3] y2038: rusage: " Arnd Bergmann
2017-11-27 17:52   ` Paul Eggert
2017-11-27 18:49     ` Eric W. Biederman
2017-11-27 20:41       ` Arnd Bergmann
2017-11-28 10:32         ` Arnd Bergmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).