linux-kernel.vger.kernel.org archive mirror
* [PATCHv4 00/28] kernel: Introduce Time Namespace
@ 2019-06-12 19:25 Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 01/28] ns: " Dmitry Safonov
                   ` (27 more replies)
  0 siblings, 28 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

Discussions around a time namespace have been going on for a long time.
The first attempt to implement it was made in 2006 by Jeff Dike. Since
then, the topic has come up on and off in various discussions.

There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.

“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but
their starting points are not defined and differ from system to
system. When a container is migrated from one node to another, all
clocks have to be restored into consistent states; in other words, they
have to continue running from the same points where they were
dumped.

The main idea of this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests
time of a clock, a namespace offset is added to the current value of
this clock and the sum is returned.
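
As a rough illustration of that arithmetic, here is a userspace sketch.
It is not code from this series: timens_apply_offset() and the
normalization convention (0 <= tv_nsec < NSEC_PER_SEC) are assumptions
made for the example.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Hypothetical helper: return host + off. Both arguments are assumed
 * to be normalized, i.e. 0 <= tv_nsec < NSEC_PER_SEC; a negative offset
 * is expressed through a negative tv_sec.
 */
static struct timespec timens_apply_offset(struct timespec host,
					   struct timespec off)
{
	struct timespec ns_time;

	assert(host.tv_nsec >= 0 && host.tv_nsec < NSEC_PER_SEC);
	assert(off.tv_nsec >= 0 && off.tv_nsec < NSEC_PER_SEC);

	ns_time.tv_sec = host.tv_sec + off.tv_sec;
	ns_time.tv_nsec = host.tv_nsec + off.tv_nsec;

	/* The nanosecond sum is below 2 * NSEC_PER_SEC, so one carry suffices. */
	if (ns_time.tv_nsec >= NSEC_PER_SEC) {
		ns_time.tv_sec++;
		ns_time.tv_nsec -= NSEC_PER_SEC;
	}
	return ns_time;
}
```

With a host reading of 100.9s and an offset of -50s + 0.2s, the helper
returns 51.1s.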

All offsets are placed on a separate page; this allows us to map it as
part of VVAR into user processes and use the offsets from VDSO calls.

Currently, offsets are implemented for the CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks.
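
To make the "one page of offsets" idea concrete, here is a userspace
sketch. The struct layout, its name and alloc_offsets_page() are
hypothetical and only mirror the two clocks listed above; the real
structure is defined by the patches themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical layout: one offset per namespaced clock. */
struct timens_offsets_sketch {
	struct timespec monotonic;
	struct timespec boottime;
};

/* Assumption: 4096 bytes is the smallest page size we care about. */
static_assert(sizeof(struct timens_offsets_sketch) <= 4096,
	      "offsets must fit into a single page");

/* Allocate one zeroed, page-aligned page for the offsets; NULL on failure. */
static struct timens_offsets_sketch *alloc_offsets_page(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *page = NULL;

	if (page_size <= 0 ||
	    (size_t)page_size < sizeof(struct timens_offsets_sketch))
		return NULL;
	if (posix_memalign(&page, (size_t)page_size, (size_t)page_size) != 0)
		return NULL;

	/* All-zero offsets mean "same time as the host". */
	memset(page, 0, (size_t)page_size);
	return page;
}
```

Keeping the offsets on their own page is what lets a single mapping
expose them to userspace without leaking unrelated kernel data.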

v4 Changes:

* CLONE_NEWTIME is an unshare()-only flag now (CLONE_PIDFD took the previous value)
* Addressing Jann Horn's feedback - we don't allow CLONE_THREAD or
  CLONE_VM together with CLONE_NEWTIME (thanks for spotting!)
* Addressing issues found by Thomas - removed unmaintainable CLOCK_TIMENS
  and introduced another callback in k_clock to get ktime directly instead
  of getting a timespec and converting it (Patch 03)
* Renaming timens_offsets members to omit _offset postfix
  (thanks Cyrill for the suggestion)
* Suggestions, renaming and making code more maintainable from Thomas's
  feedback (thanks much!)
* Fixing out-of-bounds and other issues in procfs file (kudos Jann Horn)
* vdso_fault() can be called on a remote task by /proc/$pid/mem or
  process_vm_readv() - addressed by adding a slow-path with searching
  for owner's namespace (thanks for spotting this unobvious issue, Jann)
* Other nits by Jann Horn

v3: Major changes:

* Simplify two VDSO images by using static_branch() in vclock_gettime()
  Removes unwanted conflicts with generic VDSO movement patches and
  simplifies things by dropping too invasive linker magic.
  As an alternative to static_branch() we tested an attempt to introduce
  home-made dynamic patching called retcalls:
  https://github.com/0x7f454c46/linux/commit/4cc0180f6d65
  Considering some theoretical problems with toolchains, we decided to go
  with the long-established, well-tested nop-patching in static_branch().
  It did, however, require providing a backend for relative code.

* address Thomas' comments.
* add sanity checks for offsets:
  - the current clock time in a namespace has to be in [0, KTIME_MAX / 2).
    KTIME_MAX is divided by two here to be sure that the KTIME_MAX limit
    is still unreachable.
Link: https://lkml.org/lkml/2018/9/19/950
Link: https://lkml.org/lkml/2019/2/5/867
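
The range check on offsets mentioned above could look roughly like the
sketch below. It is an illustration only: timens_offset_in_range() is a
made-up name, times are plain nanosecond counts, and KTIME_MAX_SKETCH
stands in for the kernel's KTIME_MAX under the assumption that it is the
largest signed 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's KTIME_MAX (assumed to be INT64_MAX ns). */
#define KTIME_MAX_SKETCH INT64_MAX

/*
 * Hypothetical check: is the namespace clock value now_ns + offset_ns
 * inside [0, KTIME_MAX / 2)?
 */
static bool timens_offset_in_range(int64_t now_ns, int64_t offset_ns)
{
	const int64_t limit = KTIME_MAX_SKETCH / 2;

	assert(now_ns >= 0);

	/*
	 * Compare against the bounds without computing the sum itself,
	 * which could overflow int64_t.
	 */
	return offset_ns >= -now_ns && offset_ns < limit - now_ns;
}
```

Rearranging the inequality avoids signed overflow, which would otherwise
be undefined behaviour for very large offsets.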

v2: There are two major changes:

* Two versions of the VDSO library to avoid a performance penalty for
  host tasks outside time namespace (as suggested by Andy and Thomas).

  As discussed on the timens RFC, adding a new conditional branch
  `if (inside_time_ns)` to the VDSO for all processes is undesirable.
  It would add a penalty for everybody, as the branch predictor may
  mispredict the jump. Also, instruction cache lines are wasted on cmp/jmp.

  Those side effects of introducing the time namespace are very much
  unwanted, given how much work has been spent on micro-optimising the
  VDSO code.

  Addressing those problems, there are two versions of VDSO's .so:
  for host tasks (without any penalty) and for processes inside of time
  namespace with clk_to_ns() that subtracts offsets from host's time.


* Allow setting clock offsets for a namespace only before any processes
  appear in it.

  A time namespace is now similar to a pid namespace in the way it is
  created: the unshare(CLONE_NEWTIME) system call creates a new time
  namespace but doesn't move the current process into it. All children of
  the process will then be born in the new time namespace, or a process can
  use the setns() system call to join a namespace.

  This scheme allows creating a new time namespace, setting clock offsets
  and then populating the namespace with processes.
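
From userspace, the creation scheme described above might be exercised
like this. It is a sketch under assumptions: run_child_in_new_timens()
and child_reads_monotonic() are names invented for the example, the
CLONE_NEWTIME value is the one proposed in patch 01, and the step that
writes the offsets is left out on purpose. On a kernel without time
namespace support, or without the needed privileges, the unshare call is
expected to fail and the helper returns -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080	/* value proposed by patch 01 */
#endif

/* Example child body: succeed if the monotonic clock can be read. */
static int child_reads_monotonic(void)
{
	struct timespec ts;

	return clock_gettime(CLOCK_MONOTONIC, &ts);
}

/*
 * Create a new time namespace, then fork a child that is born in it.
 * Returns 0 if the child ran and reported success, -1 otherwise.
 */
static int run_child_in_new_timens(int (*child_fn)(void))
{
	pid_t pid;
	int status;

	assert(child_fn != NULL);

	/* Raw syscall, so the sketch does not depend on a libc wrapper. */
	if (syscall(SYS_unshare, CLONE_NEWTIME) != 0)
		return -1;

	/* The namespace is still empty here: offsets would be set now. */

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_fn() == 0 ? 0 : 1);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The caller itself stays in its original time namespace; only the forked
child sees the new one.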

Our performance measurements show that the cost of VDSO's clock_gettime()
in a child time namespace is about 8% with a hot CPU cache and about 90%
with a cold CPU cache. There is no performance regression for host
processes outside a time namespace in those tests.

We wrote two small benchmarks. The first one, gettime_perf.c, calls
clock_gettime() in a loop for 3 seconds. It shows us performance with
a hot CPU cache (the more clock_gettime() cycles, the better):

        | before     | CONFIG_TIME_NS=n | host        | inside timens
--------|------------|------------------|-------------|-------------
cycles  | 139887013  | 139453003        | 139899785   | 128792458
diff (%)| 100        | 99.7             | 100         | 92

The second one, gettime_perf_cold.c, calls rdtsc, clock_gettime(), rdtsc
and reports the difference between the second and the first rdtsc value.
We run this binary in a loop 1000 times, collect the 1000 values and
calculate their mode. It should show us performance with a cold CPU cache
(the fewer tsc per cycle, the better):

        | before     | CONFIG_TIME_NS=n | host        | inside timens
--------|------------|------------------|-------------|-------------
tsc     | 6748       | 6718             | 6862        | 12682
diff (%)| 100        | 99.6             | 101.7       | 188

The numbers were gathered on an Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz.

Cc: Adrian Reber <adrian@lisas.de>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: containers@lists.linux-foundation.org
Cc: criu@openvz.org
Cc: linux-api@vger.kernel.org
Cc: x86@kernel.org

v3: https://lkml.kernel.org/r/20190425161416.26600-1-dima@arista.com
v2: https://lore.kernel.org/lkml/20190206001107.16488-1-dima@arista.com/
RFC: https://lkml.kernel.org/r/20180919205037.9574-1-dima@arista.com/

Andrei Vagin (17):
  ns: Introduce Time Namespace
  timens: Add timens_offsets
  posix-clocks: add another call back to return clock time in ktime_t
  timens: Introduce CLOCK_MONOTONIC offsets
  timens: Introduce CLOCK_BOOTTIME offset
  timerfd/timens: Take into account ns clock offsets
  posix-timers/timens: Take into account clock offsets
  timens/kernel: Take into account timens clock offsets in
    clock_nanosleep
  x86/vdso: Add offsets page in vvar
  vdso: introduce timens_static_branch
  timens/fs/proc: Introduce /proc/pid/timens_offsets
  selftest/timens: Add a test for timerfd
  selftest/timens: Add a test for clock_nanosleep()
  selftest/timens: Add timer offsets test
  x86/vdso: Align VDSO functions by CPU L1 cache line
  selftests: Add a simple perf test for clock_gettime()
  selftest/timens: Check that a right vdso is mapped after fork and exec

Dmitry Safonov (11):
  timens: Shift /proc/uptime
  x86/vdso2c: Correct err messages on file opening
  x86/vdso2c: Convert iterator to unsigned
  x86/vdso/Makefile: Add vobjs32
  x86/vdso: Restrict splitting VVAR VMA
  x86/vdso: Rename vdso_image {.data=>.text}
  x86/vdso: Allocate timens vdso
  x86/vdso: Switch image on setns()/unshare()/clone()
  timens: Add align for timens_offsets
  selftest/timens: Add Time Namespace test for supported clocks
  selftest/timens: Add procfs selftest

 MAINTAINERS                                   |   3 +
 arch/Kconfig                                  |   5 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/Makefile                  |  16 +-
 arch/x86/entry/vdso/vclock_gettime.c          |  48 +++
 arch/x86/entry/vdso/vdso-layout.lds.S         |  10 +-
 arch/x86/entry/vdso/vdso2c.c                  |   7 +-
 arch/x86/entry/vdso/vdso2c.h                  |  24 +-
 arch/x86/entry/vdso/vma.c                     | 189 ++++++++-
 arch/x86/include/asm/jump_label.h             |  14 +
 arch/x86/include/asm/vdso.h                   |  14 +-
 fs/proc/base.c                                |  95 +++++
 fs/proc/namespaces.c                          |   4 +
 fs/proc/uptime.c                              |   3 +
 fs/timerfd.c                                  |   3 +
 include/linux/hrtimer.h                       |   2 +-
 include/linux/jump_label.h                    |   5 +
 include/linux/nsproxy.h                       |   2 +
 include/linux/posix-timers.h                  |   3 +
 include/linux/proc_ns.h                       |   2 +
 include/linux/time_namespace.h                | 115 ++++++
 include/linux/timens_offsets.h                |  18 +
 include/linux/user_namespace.h                |   1 +
 include/uapi/linux/sched.h                    |   5 +
 init/Kconfig                                  |   8 +
 kernel/Makefile                               |   1 +
 kernel/fork.c                                 |  29 +-
 kernel/nsproxy.c                              |  41 +-
 kernel/time/alarmtimer.c                      |  27 +-
 kernel/time/hrtimer.c                         |   8 +-
 kernel/time/posix-clock.c                     |   8 +-
 kernel/time/posix-cpu-timers.c                |  32 +-
 kernel/time/posix-stubs.c                     |  15 +-
 kernel/time/posix-timers.c                    |  87 +++--
 kernel/time/posix-timers.h                    |   7 +-
 kernel/time_namespace.c                       | 367 ++++++++++++++++++
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/timens/.gitignore     |   8 +
 tools/testing/selftests/timens/Makefile       |  12 +
 .../selftests/timens/clock_nanosleep.c        | 100 +++++
 tools/testing/selftests/timens/config         |   1 +
 tools/testing/selftests/timens/exec.c         |  91 +++++
 tools/testing/selftests/timens/gettime_perf.c |  74 ++++
 .../selftests/timens/gettime_perf_cold.c      |  63 +++
 tools/testing/selftests/timens/log.h          |  26 ++
 tools/testing/selftests/timens/procfs.c       | 142 +++++++
 tools/testing/selftests/timens/timens.c       | 188 +++++++++
 tools/testing/selftests/timens/timens.h       |  63 +++
 tools/testing/selftests/timens/timer.c        | 116 ++++++
 tools/testing/selftests/timens/timerfd.c      | 127 ++++++
 50 files changed, 2132 insertions(+), 99 deletions(-)
 create mode 100644 include/linux/time_namespace.h
 create mode 100644 include/linux/timens_offsets.h
 create mode 100644 kernel/time_namespace.c
 create mode 100644 tools/testing/selftests/timens/.gitignore
 create mode 100644 tools/testing/selftests/timens/Makefile
 create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
 create mode 100644 tools/testing/selftests/timens/config
 create mode 100644 tools/testing/selftests/timens/exec.c
 create mode 100644 tools/testing/selftests/timens/gettime_perf.c
 create mode 100644 tools/testing/selftests/timens/gettime_perf_cold.c
 create mode 100644 tools/testing/selftests/timens/log.h
 create mode 100644 tools/testing/selftests/timens/procfs.c
 create mode 100644 tools/testing/selftests/timens/timens.c
 create mode 100644 tools/testing/selftests/timens/timens.h
 create mode 100644 tools/testing/selftests/timens/timer.c
 create mode 100644 tools/testing/selftests/timens/timerfd.c

-- 
2.22.0



* [PATCHv4 01/28] ns: Introduce Time Namespace
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 02/28] timens: Add timens_offsets Dmitry Safonov
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

From: Andrei Vagin <avagin@openvz.org>

Time Namespace isolates clock values.

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to change the date
and time in a container (CLOCK_REALTIME).

But in the context of the checkpoint/restore functionality, the monotonic
and boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, we have to
guarantee that they never go backward. In an ideal case, the behavior of
these clocks should be the same as when a whole system is
suspended. All this means that we need to be able to set the
CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be done by adding
per-namespace offsets for clocks.

A time namespace is similar to a pid namespace in the way it is
created: the unshare(CLONE_NEWTIME) system call creates a new time
namespace but doesn't move the current process into it. All children of
the process will then be born in the new time namespace, or a process can
use the setns() system call to join a namespace.

This scheme allows setting clock offsets for a namespace before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the
highest bit of CSIGNAL. It means that we can use it with the unshare
system call only. Right now, this works for us, because time namespace
offsets can be set only when a new time namespace is not populated. In the
future, we will have the clone3 system call [1], which will allow using
the CSIGNAL mask for clone flags.

[1]: https://lkml.kernel.org/r/20190604160944.4058-1-christian@brauner.io
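
To see why a flag inside CSIGNAL cannot be passed to clone(), consider
this small sketch. The constant values are taken from
include/uapi/linux/sched.h as modified by this patch; split_clone_flags()
and struct split_flags are invented for the illustration and only mimic
the masking that copy_process() does.

```c
#include <assert.h>

#ifndef CSIGNAL
#define CSIGNAL		0x000000ff	/* signal mask to be sent at exit */
#endif
#ifndef CLONE_VM
#define CLONE_VM	0x00000100
#endif
#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME	0x00000080	/* value proposed by this patch */
#endif

/* CLONE_NEWTIME sits entirely inside the exit-signal mask. */
static_assert((CLONE_NEWTIME & CSIGNAL) == CLONE_NEWTIME,
	      "CLONE_NEWTIME overlaps CSIGNAL");

struct split_flags {
	unsigned long clone_flags;	/* flags with the signal bits removed */
	int exit_signal;		/* low byte, read as a signal number */
};

/* Hypothetical helper: separate the exit signal from the clone flags. */
static struct split_flags split_clone_flags(unsigned long flags)
{
	struct split_flags out;

	out.exit_signal = (int)(flags & CSIGNAL);
	out.clone_flags = flags & ~(unsigned long)CSIGNAL;
	return out;
}
```

A clone() caller that sets bit 0x80 therefore only changes the exit
signal number, while unshare() takes no exit signal and can give that
bit its own meaning.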

Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 MAINTAINERS                    |   2 +
 fs/proc/namespaces.c           |   4 +
 include/linux/nsproxy.h        |   2 +
 include/linux/proc_ns.h        |   2 +
 include/linux/time_namespace.h |  69 +++++++++++
 include/linux/user_namespace.h |   1 +
 include/uapi/linux/sched.h     |   5 +
 init/Kconfig                   |   7 ++
 kernel/Makefile                |   1 +
 kernel/fork.c                  |  29 ++++-
 kernel/nsproxy.c               |  41 +++++--
 kernel/time_namespace.c        | 215 +++++++++++++++++++++++++++++++++
 12 files changed, 367 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/time_namespace.h
 create mode 100644 kernel/time_namespace.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 57f496cff999..323ab92b963b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12589,6 +12589,8 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/core
 S:	Maintained
 F:	fs/timerfd.c
 F:	include/linux/timer*
+F:	include/linux/time_namespace.h
+F:	kernel/time_namespace.c
 F:	kernel/time/*timer*
 
 POWER MANAGEMENT CORE
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index dd2b35f78b09..8b5c720fe5d7 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -33,6 +33,10 @@ static const struct proc_ns_operations *ns_entries[] = {
 #ifdef CONFIG_CGROUPS
 	&cgroupns_operations,
 #endif
+#ifdef CONFIG_TIME_NS
+	&timens_operations,
+	&timens_for_children_operations,
+#endif
 };
 
 static const char *proc_ns_get_link(struct dentry *dentry,
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 2ae1b1a4d84d..074f395b9ad2 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -35,6 +35,8 @@ struct nsproxy {
 	struct mnt_namespace *mnt_ns;
 	struct pid_namespace *pid_ns_for_children;
 	struct net 	     *net_ns;
+	struct time_namespace *time_ns;
+	struct time_namespace *time_ns_for_children;
 	struct cgroup_namespace *cgroup_ns;
 };
 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb6215905..3e6f332da465 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -32,6 +32,8 @@ extern const struct proc_ns_operations pidns_for_children_operations;
 extern const struct proc_ns_operations userns_operations;
 extern const struct proc_ns_operations mntns_operations;
 extern const struct proc_ns_operations cgroupns_operations;
+extern const struct proc_ns_operations timens_operations;
+extern const struct proc_ns_operations timens_for_children_operations;
 
 /*
  * We always define these enumerators
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
new file mode 100644
index 000000000000..9507ed7072fe
--- /dev/null
+++ b/include/linux/time_namespace.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TIMENS_H
+#define _LINUX_TIMENS_H
+
+
+#include <linux/sched.h>
+#include <linux/kref.h>
+#include <linux/nsproxy.h>
+#include <linux/ns_common.h>
+#include <linux/err.h>
+
+struct user_namespace;
+extern struct user_namespace init_user_ns;
+
+struct time_namespace {
+	struct kref kref;
+	struct user_namespace *user_ns;
+	struct ucounts *ucounts;
+	struct ns_common ns;
+	struct timens_offsets *offsets;
+	bool   initialized;
+} __randomize_layout;
+extern struct time_namespace init_time_ns;
+
+#ifdef CONFIG_TIME_NS
+static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
+{
+	kref_get(&ns->kref);
+	return ns;
+}
+
+extern struct time_namespace *copy_time_ns(unsigned long flags,
+	struct user_namespace *user_ns, struct time_namespace *old_ns);
+extern void free_time_ns(struct kref *kref);
+extern int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk);
+
+static inline void put_time_ns(struct time_namespace *ns)
+{
+	kref_put(&ns->kref, free_time_ns);
+}
+
+
+#else
+static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
+{
+	return NULL;
+}
+
+static inline void put_time_ns(struct time_namespace *ns)
+{
+}
+
+static inline struct time_namespace *copy_time_ns(unsigned long flags,
+	struct user_namespace *user_ns, struct time_namespace *old_ns)
+{
+	if (flags & CLONE_NEWTIME)
+		return ERR_PTR(-EINVAL);
+
+	return old_ns;
+}
+
+static inline int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
+{
+	return 0;
+}
+
+#endif
+
+#endif /* _LINUX_TIMENS_H */
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index d6b74b91096b..bf84f93dc411 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -45,6 +45,7 @@ enum ucount_type {
 	UCOUNT_NET_NAMESPACES,
 	UCOUNT_MNT_NAMESPACES,
 	UCOUNT_CGROUP_NAMESPACES,
+	UCOUNT_TIME_NAMESPACES,
 #ifdef CONFIG_INOTIFY_USER
 	UCOUNT_INOTIFY_INSTANCES,
 	UCOUNT_INOTIFY_WATCHES,
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index ed4ee170bee2..6c4d7a5d5253 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -31,6 +31,11 @@
 #define CLONE_NEWNET		0x40000000	/* New network namespace */
 #define CLONE_IO		0x80000000	/* Clone io context */
 
+/*
+ * cloning flags which can be used with the unshare syscall only:
+ */
+#define CLONE_NEWTIME	0x00000080	/* New time namespace */
+
 /*
  * Scheduling policies
  */
diff --git a/init/Kconfig b/init/Kconfig
index 0e2344389501..098fe185360c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -991,6 +991,13 @@ config UTS_NS
 	  In this namespace tasks see different info provided with the
 	  uname() system call
 
+config TIME_NS
+	bool "TIME namespace"
+	default y
+	help
+	  In this namespace boottime and monotonic clocks can be set.
+	  The time will keep going with the same pace.
+
 config IPC_NS
 	bool "IPC namespace"
 	depends on (SYSVIPC || POSIX_MQUEUE)
diff --git a/kernel/Makefile b/kernel/Makefile
index a8d923b5481b..a8654c9aa2f5 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
 obj-$(CONFIG_UTS_NS) += utsname.o
+obj-$(CONFIG_TIME_NS) += time_namespace.o
 obj-$(CONFIG_USER_NS) += user_namespace.o
 obj-$(CONFIG_PID_NS) += pid_namespace.o
 obj-$(CONFIG_IKCONFIG) += configs.o
diff --git a/kernel/fork.c b/kernel/fork.c
index 75675b9bf6df..09dacb708b4f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1774,6 +1774,15 @@ static __latent_entropy struct task_struct *copy_process(
 	int pidfd = -1, retval;
 	struct task_struct *p;
 	struct multiprocess_signals delayed;
+	struct nsproxy *nsp = current->nsproxy;
+	int exit_signal;
+
+	/*
+	 * Exclude clone flags which intersect with CSIGNAL and can be used
+	 * with the unshare syscall only.
+	 */
+	exit_signal = clone_flags & CSIGNAL;
+	clone_flags &= ~CSIGNAL;
 
 	/*
 	 * Don't allow sharing the root directory with processes in a different
@@ -1816,8 +1825,17 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	if (clone_flags & CLONE_THREAD) {
 		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
-		    (task_active_pid_ns(current) !=
-				current->nsproxy->pid_ns_for_children))
+		    (task_active_pid_ns(current) != nsp->pid_ns_for_children))
+			return ERR_PTR(-EINVAL);
+	}
+
+	/*
+	 * If the new process will be in a different time namespace
+	 * do not allow it to share VM or a thread group with the forking task.
+	 */
+	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
+		if ((clone_flags & CLONE_NEWTIME) ||
+		    (nsp->time_ns != nsp->time_ns_for_children))
 			return ERR_PTR(-EINVAL);
 	}
 
@@ -2106,7 +2124,7 @@ static __latent_entropy struct task_struct *copy_process(
 		if (clone_flags & CLONE_PARENT)
 			p->exit_signal = current->group_leader->exit_signal;
 		else
-			p->exit_signal = (clone_flags & CSIGNAL);
+			p->exit_signal = exit_signal;
 		p->group_leader = p;
 		p->tgid = p->pid;
 	}
@@ -2568,7 +2586,8 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	if (unshare_flags & ~(CLONE_THREAD|CLONE_FS|CLONE_NEWNS|CLONE_SIGHAND|
 				CLONE_VM|CLONE_FILES|CLONE_SYSVSEM|
 				CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWNET|
-				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP))
+				CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWCGROUP|
+				CLONE_NEWTIME))
 		return -EINVAL;
 	/*
 	 * Not implemented, but pretend it works if there is nothing
@@ -2579,6 +2598,8 @@ static int check_unshare_flags(unsigned long unshare_flags)
 	if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
 		if (!thread_group_empty(current))
 			return -EINVAL;
+		if (unshare_flags & CLONE_NEWTIME)
+			return -EINVAL;
 	}
 	if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
 		if (refcount_read(&current->sighand->count) > 1)
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index c815f58e6bc0..ed9882108cd2 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -18,6 +18,7 @@
 #include <linux/pid_namespace.h>
 #include <net/net_namespace.h>
 #include <linux/ipc_namespace.h>
+#include <linux/time_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/file.h>
 #include <linux/syscalls.h>
@@ -40,6 +41,10 @@ struct nsproxy init_nsproxy = {
 #ifdef CONFIG_CGROUPS
 	.cgroup_ns		= &init_cgroup_ns,
 #endif
+#ifdef CONFIG_TIME_NS
+	.time_ns		= &init_time_ns,
+	.time_ns_for_children	= &init_time_ns,
+#endif
 };
 
 static inline struct nsproxy *create_nsproxy(void)
@@ -106,8 +111,18 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
 		goto out_net;
 	}
 
+	new_nsp->time_ns_for_children = copy_time_ns(flags, user_ns,
+					tsk->nsproxy->time_ns_for_children);
+	if (IS_ERR(new_nsp->time_ns_for_children)) {
+		err = PTR_ERR(new_nsp->time_ns_for_children);
+		goto out_time;
+	}
+	new_nsp->time_ns = get_time_ns(tsk->nsproxy->time_ns);
+
 	return new_nsp;
 
+out_time:
+	put_net(new_nsp->net_ns);
 out_net:
 	put_cgroup_ns(new_nsp->cgroup_ns);
 out_cgroup:
@@ -136,15 +151,16 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	struct nsproxy *old_ns = tsk->nsproxy;
 	struct user_namespace *user_ns = task_cred_xxx(tsk, user_ns);
 	struct nsproxy *new_ns;
+	int ret;
 
 	if (likely(!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
 			      CLONE_NEWPID | CLONE_NEWNET |
-			      CLONE_NEWCGROUP)))) {
-		get_nsproxy(old_ns);
-		return 0;
-	}
-
-	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
+			      CLONE_NEWCGROUP | CLONE_NEWTIME)))) {
+		if (likely(old_ns->time_ns_for_children == old_ns->time_ns)) {
+			get_nsproxy(old_ns);
+			return 0;
+		}
+	} else if (!ns_capable(user_ns, CAP_SYS_ADMIN))
 		return -EPERM;
 
 	/*
@@ -162,6 +178,12 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	if (IS_ERR(new_ns))
 		return  PTR_ERR(new_ns);
 
+	ret = timens_on_fork(new_ns, tsk);
+	if (ret) {
+		free_nsproxy(new_ns);
+		return ret;
+	}
+
 	tsk->nsproxy = new_ns;
 	return 0;
 }
@@ -176,6 +198,10 @@ void free_nsproxy(struct nsproxy *ns)
 		put_ipc_ns(ns->ipc_ns);
 	if (ns->pid_ns_for_children)
 		put_pid_ns(ns->pid_ns_for_children);
+	if (ns->time_ns)
+		put_time_ns(ns->time_ns);
+	if (ns->time_ns_for_children)
+		put_time_ns(ns->time_ns_for_children);
 	put_cgroup_ns(ns->cgroup_ns);
 	put_net(ns->net_ns);
 	kmem_cache_free(nsproxy_cachep, ns);
@@ -192,7 +218,8 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags,
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP)))
+			       CLONE_NEWNET | CLONE_NEWPID | CLONE_NEWCGROUP |
+			       CLONE_NEWTIME)))
 		return 0;
 
 	user_ns = new_cred ? new_cred->user_ns : current_user_ns();
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
new file mode 100644
index 000000000000..8c600df9771d
--- /dev/null
+++ b/kernel/time_namespace.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Author: Andrei Vagin <avagin@openvz.org>
+ * Author: Dmitry Safonov <dima@arista.com>
+ */
+
+#include <linux/export.h>
+#include <linux/time.h>
+#include <linux/time_namespace.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/cred.h>
+#include <linux/user_namespace.h>
+#include <linux/proc_ns.h>
+#include <linux/sched/task.h>
+
+static struct ucounts *inc_time_namespaces(struct user_namespace *ns)
+{
+	return inc_ucount(ns, current_euid(), UCOUNT_TIME_NAMESPACES);
+}
+
+static void dec_time_namespaces(struct ucounts *ucounts)
+{
+	dec_ucount(ucounts, UCOUNT_TIME_NAMESPACES);
+}
+
+static struct time_namespace *create_time_ns(void)
+{
+	struct time_namespace *time_ns;
+
+	time_ns = kmalloc(sizeof(struct time_namespace), GFP_KERNEL);
+	if (time_ns) {
+		kref_init(&time_ns->kref);
+		time_ns->initialized = false;
+	}
+	return time_ns;
+}
+
+/*
+ * Clone a new ns copying @old_ns, setting refcount to 1
+ * @old_ns: namespace to clone
+ * Return the new ns or ERR_PTR.
+ */
+static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
+					  struct time_namespace *old_ns)
+{
+	struct time_namespace *ns;
+	struct ucounts *ucounts;
+	int err;
+
+	err = -ENOSPC;
+	ucounts = inc_time_namespaces(user_ns);
+	if (!ucounts)
+		goto fail;
+
+	err = -ENOMEM;
+	ns = create_time_ns();
+	if (!ns)
+		goto fail_dec;
+
+	err = ns_alloc_inum(&ns->ns);
+	if (err)
+		goto fail_free;
+
+	ns->ucounts = ucounts;
+	ns->ns.ops = &timens_operations;
+	ns->user_ns = get_user_ns(user_ns);
+	return ns;
+
+fail_free:
+	kfree(ns);
+fail_dec:
+	dec_time_namespaces(ucounts);
+fail:
+	return ERR_PTR(err);
+}
+
+/*
+ * Add a reference to old_ns, or clone it if @flags specify CLONE_NEWTIME.
+ * In latter case, changes to the time of this process won't be seen by parent,
+ * and vice versa.
+ */
+struct time_namespace *copy_time_ns(unsigned long flags,
+	struct user_namespace *user_ns, struct time_namespace *old_ns)
+{
+	if (!(flags & CLONE_NEWTIME))
+		return get_time_ns(old_ns);
+
+	return clone_time_ns(user_ns, old_ns);
+}
+
+void free_time_ns(struct kref *kref)
+{
+	struct time_namespace *ns;
+
+	ns = container_of(kref, struct time_namespace, kref);
+	dec_time_namespaces(ns->ucounts);
+	put_user_ns(ns->user_ns);
+	ns_free_inum(&ns->ns);
+	kfree(ns);
+}
+
+static struct time_namespace *to_time_ns(struct ns_common *ns)
+{
+	return container_of(ns, struct time_namespace, ns);
+}
+
+static struct ns_common *timens_get(struct task_struct *task)
+{
+	struct time_namespace *ns = NULL;
+	struct nsproxy *nsproxy;
+
+	task_lock(task);
+	nsproxy = task->nsproxy;
+	if (nsproxy) {
+		ns = nsproxy->time_ns;
+		get_time_ns(ns);
+	}
+	task_unlock(task);
+
+	return ns ? &ns->ns : NULL;
+}
+
+static struct ns_common *timens_for_children_get(struct task_struct *task)
+{
+	struct time_namespace *ns = NULL;
+	struct nsproxy *nsproxy;
+
+	task_lock(task);
+	nsproxy = task->nsproxy;
+	if (nsproxy) {
+		ns = nsproxy->time_ns_for_children;
+		get_time_ns(ns);
+	}
+	task_unlock(task);
+
+	return ns ? &ns->ns : NULL;
+}
+
+static void timens_put(struct ns_common *ns)
+{
+	put_time_ns(to_time_ns(ns));
+}
+
+static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
+{
+	struct time_namespace *ns = to_time_ns(new);
+
+	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
+	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
+		return -EPERM;
+
+	get_time_ns(ns);
+	get_time_ns(ns);
+	put_time_ns(nsproxy->time_ns);
+	put_time_ns(nsproxy->time_ns_for_children);
+	nsproxy->time_ns = ns;
+	nsproxy->time_ns_for_children = ns;
+	ns->initialized = true;
+	return 0;
+}
+
+int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
+{
+	struct ns_common *nsc = &nsproxy->time_ns_for_children->ns;
+	struct time_namespace *ns = to_time_ns(nsc);
+
+	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
+		return 0;
+
+	get_time_ns(ns);
+	put_time_ns(nsproxy->time_ns);
+	nsproxy->time_ns = ns;
+	ns->initialized = true;
+
+	return 0;
+}
+
+static struct user_namespace *timens_owner(struct ns_common *ns)
+{
+	return to_time_ns(ns)->user_ns;
+}
+
+const struct proc_ns_operations timens_operations = {
+	.name		= "time",
+	.type		= CLONE_NEWTIME,
+	.get		= timens_get,
+	.put		= timens_put,
+	.install	= timens_install,
+	.owner		= timens_owner,
+};
+
+const struct proc_ns_operations timens_for_children_operations = {
+	.name		= "time_for_children",
+	.type		= CLONE_NEWTIME,
+	.get		= timens_for_children_get,
+	.put		= timens_put,
+	.install	= timens_install,
+	.owner		= timens_owner,
+};
+
+struct time_namespace init_time_ns = {
+	.kref = KREF_INIT(3),
+	.user_ns = &init_user_ns,
+	.ns.inum = PROC_TIME_INIT_INO,
+#ifdef CONFIG_TIME_NS
+	.ns.ops = &timens_operations,
+#endif
+};
+
+static int __init time_ns_init(void)
+{
+	return 0;
+}
+subsys_initcall(time_ns_init);
-- 
2.22.0



* [PATCHv4 02/28] timens: Add timens_offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 01/28] ns: " Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:11   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t Dmitry Safonov
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

From: Andrei Vagin <avagin@openvz.org>

Introduce per-namespace clock offsets. They hold the adjustments needed
to convert clock values between a time namespace and the host.

Allocate one page for each time namespace; it will be premapped into
userspace among the vvar pages.
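
As a rough userspace illustration of the allocation and error unwinding
described above (a sketch only: ns_sketch, offsets_sketch and
SKETCH_PAGE_SIZE are hypothetical names, and aligned_alloc() stands in
for alloc_page()), each later failure releases what was acquired before
it, in reverse order:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct timens_offsets. */
struct offsets_sketch {
	long long placeholder;
};

/* Hypothetical stand-in for struct time_namespace. */
struct ns_sketch {
	struct offsets_sketch *offsets;
};

/* Compile-time size check, playing the role of BUILD_BUG_ON(). */
_Static_assert(sizeof(struct offsets_sketch) <= SKETCH_PAGE_SIZE,
	       "offsets must fit in one page");

/* fail_late != 0 simulates a failure after the page was allocated. */
static struct ns_sketch *ns_sketch_create(int fail_late)
{
	struct ns_sketch *ns;

	ns = malloc(sizeof(*ns));
	if (!ns)
		goto fail;

	/* One zeroed, page-aligned page per namespace. */
	ns->offsets = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	if (!ns->offsets)
		goto fail_free;
	memset(ns->offsets, 0, SKETCH_PAGE_SIZE);

	if (fail_late)
		goto fail_page;

	return ns;

fail_page:
	free(ns->offsets);
fail_free:
	free(ns);
fail:
	return NULL;
}

static void ns_sketch_destroy(struct ns_sketch *ns)
{
	free(ns->offsets);
	free(ns);
}
```

The labels mirror the fail_page/fail_free ordering in the diff below:
a failure after the page exists must free the page first, then the
namespace structure.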

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 MAINTAINERS                    |  1 +
 include/linux/time_namespace.h |  1 +
 include/linux/timens_offsets.h |  8 ++++++++
 kernel/time_namespace.c        | 14 ++++++++++++--
 4 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/timens_offsets.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 323ab92b963b..bf55aec42f2d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12590,6 +12590,7 @@ S:	Maintained
 F:	fs/timerfd.c
 F:	include/linux/timer*
 F:	include/linux/time_namespace.h
+F:	include/linux/timens_offsets.h
 F:	kernel/time_namespace.c
 F:	kernel/time/*timer*
 
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 9507ed7072fe..b6985aa87479 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -8,6 +8,7 @@
 #include <linux/nsproxy.h>
 #include <linux/ns_common.h>
 #include <linux/err.h>
+#include <linux/timens_offsets.h>
 
 struct user_namespace;
 extern struct user_namespace init_user_ns;
diff --git a/include/linux/timens_offsets.h b/include/linux/timens_offsets.h
new file mode 100644
index 000000000000..7d7cb68ea778
--- /dev/null
+++ b/include/linux/timens_offsets.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_TIME_OFFSETS_H
+#define _LINUX_TIME_OFFSETS_H
+
+struct timens_offsets {
+};
+
+#endif
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index 8c600df9771d..4828447721ec 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -13,6 +13,7 @@
 #include <linux/user_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/sched/task.h>
+#include <linux/mm.h>
 
 static struct ucounts *inc_time_namespaces(struct user_namespace *ns)
 {
@@ -46,6 +47,7 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 {
 	struct time_namespace *ns;
 	struct ucounts *ucounts;
+	struct page *page;
 	int err;
 
 	err = -ENOSPC;
@@ -58,15 +60,22 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 	if (!ns)
 		goto fail_dec;
 
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
+		goto fail_free;
+	ns->offsets = page_address(page);
+	BUILD_BUG_ON(sizeof(*ns->offsets) > PAGE_SIZE);
+
 	err = ns_alloc_inum(&ns->ns);
 	if (err)
-		goto fail_free;
+		goto fail_page;
 
 	ns->ucounts = ucounts;
 	ns->ns.ops = &timens_operations;
 	ns->user_ns = get_user_ns(user_ns);
 	return ns;
-
+fail_page:
+	free_page((unsigned long)ns->offsets);
 fail_free:
 	kfree(ns);
 fail_dec:
@@ -94,6 +103,7 @@ void free_time_ns(struct kref *kref)
 	struct time_namespace *ns;
 
 	ns = container_of(kref, struct time_namespace, kref);
+	free_page((unsigned long)ns->offsets);
 	dec_time_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
 	ns_free_inum(&ns->ns);
-- 
2.22.0



* [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 01/28] ns: " Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 02/28] timens: Add timens_offsets Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:32   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 04/28] timens: Introduce CLOCK_MONOTONIC offsets Dmitry Safonov
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

The call site in common_timer_get() already has a comment:
        /*
         * The timespec64 based conversion is suboptimal, but it's not
         * worth to implement yet another callback.
         */
        kc->clock_get(timr->it_clock, &ts64);
        now = timespec64_to_ktime(ts64);

Now we are going to add time namespaces, and we need to be able to get:
* the clock value in the task's time namespace, to return it from the
  clock_gettime() syscall;
* the clock value in the root time namespace, to use it in
  common_timer_get().

This is one more reason to add a separate callback that returns the
clock value as ktime_t.
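
The split can be pictured with a small userspace sketch. Everything in
it is hypothetical (clock_ops_sketch, the fixed fake clock and the
100-second offset are made up for illustration, and this patch itself
does not add any offsets yet): one callback reports the value a task
would see, the other reports the root-namespace value as scalar
nanoseconds for timer arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

typedef int64_t ktime_sketch_t;	/* nanoseconds, like the kernel's ktime_t */

struct clock_ops_sketch {
	/* Value as seen by the calling task (namespace view). */
	int (*get_timespec)(struct timespec *tp);
	/* Value in the root namespace, already in scalar nanoseconds. */
	ktime_sketch_t (*get_ktime)(void);
};

/* Fixed fake clock and a hypothetical namespace offset. */
static const ktime_sketch_t fake_host_now = 5 * NSEC_PER_SEC + 250;
static const ktime_sketch_t fake_ns_offset = 100 * NSEC_PER_SEC;

static int fake_get_timespec(struct timespec *tp)
{
	ktime_sketch_t t = fake_host_now + fake_ns_offset;

	tp->tv_sec = t / NSEC_PER_SEC;
	tp->tv_nsec = t % NSEC_PER_SEC;
	return 0;
}

static ktime_sketch_t fake_get_ktime(void)
{
	return fake_host_now;
}

static const struct clock_ops_sketch fake_clock = {
	.get_timespec	= fake_get_timespec,
	.get_ktime	= fake_get_ktime,
};

/* Remaining time of a timer whose expiry is kept in host time. */
static ktime_sketch_t remaining_sketch(const struct clock_ops_sketch *kc,
				       ktime_sketch_t host_expiry)
{
	ktime_sketch_t now = kc->get_ktime();

	return host_expiry > now ? host_expiry - now : 0;
}
```

With the scalar callback, the remaining-time computation needs no
timespec conversion and is unaffected by any namespace offset.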

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/posix-timers.h   |  3 ++
 kernel/time/alarmtimer.c       | 24 ++++++++++---
 kernel/time/posix-clock.c      |  8 ++---
 kernel/time/posix-cpu-timers.c | 32 +++++++++---------
 kernel/time/posix-timers.c     | 61 ++++++++++++++++++++++------------
 kernel/time/posix-timers.h     |  7 ++--
 6 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index b20798fc5191..fe13ab265213 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -127,4 +127,7 @@ void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new);
 
 void posixtimer_rearm(struct kernel_siginfo *info);
+
+int posix_get_timespec(clockid_t which_clock, struct timespec64 *tp);
+int posix_get_boottime_timespec(const clockid_t which_clock, struct timespec64 *tp);
 #endif
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 0519a8805aab..68a163c8b4f2 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -43,6 +43,8 @@ static struct alarm_base {
 	spinlock_t		lock;
 	struct timerqueue_head	timerqueue;
 	ktime_t			(*gettime)(void);
+	int			(*get_timespec)(const clockid_t which_clock,
+						struct timespec64 *tp);
 	clockid_t		base_clockid;
 } alarm_bases[ALARM_NUMTYPE];
 
@@ -645,21 +647,30 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
 }
 
 /**
- * alarm_clock_get - posix clock_get interface
+ * alarm_clock_get_timespec - posix clock_get_timespec interface
  * @which_clock: clockid
  * @tp: timespec to fill.
  *
  * Provides the underlying alarm base time.
  */
-static int alarm_clock_get(clockid_t which_clock, struct timespec64 *tp)
+static int alarm_clock_get_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
 
 	if (!alarmtimer_get_rtcdev())
 		return -EINVAL;
 
-	*tp = ktime_to_timespec64(base->gettime());
-	return 0;
+	return base->get_timespec(base->base_clockid, tp);
+}
+
+static ktime_t alarm_clock_get_ktime(clockid_t which_clock)
+{
+	struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
+
+	if (!alarmtimer_get_rtcdev())
+		return -EINVAL;
+
+	return base->gettime();
 }
 
 /**
@@ -825,7 +836,8 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
 
 const struct k_clock alarm_clock = {
 	.clock_getres		= alarm_clock_getres,
-	.clock_get		= alarm_clock_get,
+	.clock_get_ktime	= alarm_clock_get_ktime,
+	.clock_get_timespec	= alarm_clock_get_timespec,
 	.timer_create		= alarm_timer_create,
 	.timer_set		= common_timer_set,
 	.timer_del		= common_timer_del,
@@ -870,8 +882,10 @@ static int __init alarmtimer_init(void)
 	/* Initialize alarm bases */
 	alarm_bases[ALARM_REALTIME].base_clockid = CLOCK_REALTIME;
 	alarm_bases[ALARM_REALTIME].gettime = &ktime_get_real;
+	alarm_bases[ALARM_REALTIME].get_timespec = posix_get_timespec;
 	alarm_bases[ALARM_BOOTTIME].base_clockid = CLOCK_BOOTTIME;
 	alarm_bases[ALARM_BOOTTIME].gettime = &ktime_get_boottime;
+	alarm_bases[ALARM_BOOTTIME].get_timespec = posix_get_boottime_timespec;
 	for (i = 0; i < ALARM_NUMTYPE; i++) {
 		timerqueue_init_head(&alarm_bases[i].timerqueue);
 		spin_lock_init(&alarm_bases[i].lock);
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ec960bb939fd..c8f9c9b1cd82 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -315,8 +315,8 @@ static int pc_clock_settime(clockid_t id, const struct timespec64 *ts)
 }
 
 const struct k_clock clock_posix_dynamic = {
-	.clock_getres	= pc_clock_getres,
-	.clock_set	= pc_clock_settime,
-	.clock_get	= pc_clock_gettime,
-	.clock_adj	= pc_clock_adjtime,
+	.clock_getres		= pc_clock_getres,
+	.clock_set		= pc_clock_settime,
+	.clock_get_timespec	= pc_clock_gettime,
+	.clock_adj		= pc_clock_adjtime,
 };
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0a426f4e3125..dccf7dfcd36a 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1417,26 +1417,26 @@ static int thread_cpu_timer_create(struct k_itimer *timer)
 }
 
 const struct k_clock clock_posix_cpu = {
-	.clock_getres	= posix_cpu_clock_getres,
-	.clock_set	= posix_cpu_clock_set,
-	.clock_get	= posix_cpu_clock_get,
-	.timer_create	= posix_cpu_timer_create,
-	.nsleep		= posix_cpu_nsleep,
-	.timer_set	= posix_cpu_timer_set,
-	.timer_del	= posix_cpu_timer_del,
-	.timer_get	= posix_cpu_timer_get,
-	.timer_rearm	= posix_cpu_timer_rearm,
+	.clock_getres		= posix_cpu_clock_getres,
+	.clock_set		= posix_cpu_clock_set,
+	.clock_get_timespec	= posix_cpu_clock_get,
+	.timer_create		= posix_cpu_timer_create,
+	.nsleep			= posix_cpu_nsleep,
+	.timer_set		= posix_cpu_timer_set,
+	.timer_del		= posix_cpu_timer_del,
+	.timer_get		= posix_cpu_timer_get,
+	.timer_rearm		= posix_cpu_timer_rearm,
 };
 
 const struct k_clock clock_process = {
-	.clock_getres	= process_cpu_clock_getres,
-	.clock_get	= process_cpu_clock_get,
-	.timer_create	= process_cpu_timer_create,
-	.nsleep		= process_cpu_nsleep,
+	.clock_getres		= process_cpu_clock_getres,
+	.clock_get_timespec	= process_cpu_clock_get,
+	.timer_create		= process_cpu_timer_create,
+	.nsleep			= process_cpu_nsleep,
 };
 
 const struct k_clock clock_thread = {
-	.clock_getres	= thread_cpu_clock_getres,
-	.clock_get	= thread_cpu_clock_get,
-	.timer_create	= thread_cpu_timer_create,
+	.clock_getres		= thread_cpu_clock_getres,
+	.clock_get_timespec	= thread_cpu_clock_get,
+	.timer_create		= thread_cpu_timer_create,
 };
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 29176635991f..b6fcf6935204 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -165,12 +165,18 @@ static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
 }
 
 /* Get clock_realtime */
-static int posix_clock_realtime_get(clockid_t which_clock, struct timespec64 *tp)
+static int
+posix_clock_realtime_get_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_real_ts64(tp);
 	return 0;
 }
 
+static ktime_t posix_clock_realtime_get_ktime(clockid_t which_clock)
+{
+	return ktime_get_real();
+}
+
 /* Set clock_realtime */
 static int posix_clock_realtime_set(const clockid_t which_clock,
 				    const struct timespec64 *tp)
@@ -187,12 +193,17 @@ static int posix_clock_realtime_adj(const clockid_t which_clock,
 /*
  * Get monotonic time for posix timers
  */
-static int posix_ktime_get_ts(clockid_t which_clock, struct timespec64 *tp)
+int posix_get_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_ts64(tp);
 	return 0;
 }
 
+static ktime_t posix_get_ktime(clockid_t which_clock)
+{
+	return ktime_get();
+}
+
 /*
  * Get monotonic-raw time for posix timers
  */
@@ -222,18 +233,28 @@ static int posix_get_coarse_res(const clockid_t which_clock, struct timespec64 *
 	return 0;
 }
 
-static int posix_get_boottime(const clockid_t which_clock, struct timespec64 *tp)
+int posix_get_boottime_timespec(const clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_boottime_ts64(tp);
 	return 0;
 }
 
-static int posix_get_tai(clockid_t which_clock, struct timespec64 *tp)
+static ktime_t posix_get_boottime_ktime(const clockid_t which_clock)
+{
+	return ktime_get_boottime();
+}
+
+static int posix_get_tai_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_clocktai_ts64(tp);
 	return 0;
 }
 
+static ktime_t posix_get_tai_ktime(clockid_t which_clock)
+{
+	return ktime_get_clocktai();
+}
+
 static int posix_get_hrtimer_res(clockid_t which_clock, struct timespec64 *tp)
 {
 	tp->tv_sec = 0;
@@ -645,7 +666,6 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
 {
 	const struct k_clock *kc = timr->kclock;
 	ktime_t now, remaining, iv;
-	struct timespec64 ts64;
 	bool sig_none;
 
 	sig_none = timr->it_sigev_notify == SIGEV_NONE;
@@ -663,12 +683,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting)
 			return;
 	}
 
-	/*
-	 * The timespec64 based conversion is suboptimal, but it's not
-	 * worth to implement yet another callback.
-	 */
-	kc->clock_get(timr->it_clock, &ts64);
-	now = timespec64_to_ktime(ts64);
+	now = kc->clock_get_ktime(timr->it_clock);
 
 	/*
 	 * When a requeue is pending or this is a SIGEV_NONE timer move the
@@ -781,7 +796,7 @@ static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
 	 * Posix magic: Relative CLOCK_REALTIME timers are not affected by
 	 * clock modifications, so they become CLOCK_MONOTONIC based under the
 	 * hood. See hrtimer_init(). Update timr->kclock, so the generic
-	 * functions which use timr->kclock->clock_get() work.
+	 * functions which use timr->kclock->clock_get_*() work.
 	 *
 	 * Note: it_clock stays unmodified, because the next timer_set() might
 	 * use ABSTIME, so it needs to switch back.
@@ -1039,7 +1054,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
 	if (!kc)
 		return -EINVAL;
 
-	error = kc->clock_get(which_clock, &kernel_tp);
+	error = kc->clock_get_timespec(which_clock, &kernel_tp);
 
 	if (!error && put_timespec64(&kernel_tp, tp))
 		error = -EFAULT;
@@ -1121,7 +1136,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
 	if (!kc)
 		return -EINVAL;
 
-	err = kc->clock_get(which_clock, &ts);
+	err = kc->clock_get_timespec(which_clock, &ts);
 
 	if (!err && put_old_timespec32(&ts, tp))
 		err = -EFAULT;
@@ -1233,7 +1248,8 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 
 static const struct k_clock clock_realtime = {
 	.clock_getres		= posix_get_hrtimer_res,
-	.clock_get		= posix_clock_realtime_get,
+	.clock_get_timespec	= posix_clock_realtime_get_timespec,
+	.clock_get_ktime	= posix_clock_realtime_get_ktime,
 	.clock_set		= posix_clock_realtime_set,
 	.clock_adj		= posix_clock_realtime_adj,
 	.nsleep			= common_nsleep,
@@ -1250,7 +1266,8 @@ static const struct k_clock clock_realtime = {
 
 static const struct k_clock clock_monotonic = {
 	.clock_getres		= posix_get_hrtimer_res,
-	.clock_get		= posix_ktime_get_ts,
+	.clock_get_timespec	= posix_get_timespec,
+	.clock_get_ktime	= posix_get_ktime,
 	.nsleep			= common_nsleep,
 	.timer_create		= common_timer_create,
 	.timer_set		= common_timer_set,
@@ -1265,22 +1282,23 @@ static const struct k_clock clock_monotonic = {
 
 static const struct k_clock clock_monotonic_raw = {
 	.clock_getres		= posix_get_hrtimer_res,
-	.clock_get		= posix_get_monotonic_raw,
+	.clock_get_timespec	= posix_get_monotonic_raw,
 };
 
 static const struct k_clock clock_realtime_coarse = {
 	.clock_getres		= posix_get_coarse_res,
-	.clock_get		= posix_get_realtime_coarse,
+	.clock_get_timespec	= posix_get_realtime_coarse,
 };
 
 static const struct k_clock clock_monotonic_coarse = {
 	.clock_getres		= posix_get_coarse_res,
-	.clock_get		= posix_get_monotonic_coarse,
+	.clock_get_timespec	= posix_get_monotonic_coarse,
 };
 
 static const struct k_clock clock_tai = {
 	.clock_getres		= posix_get_hrtimer_res,
-	.clock_get		= posix_get_tai,
+	.clock_get_ktime	= posix_get_tai_ktime,
+	.clock_get_timespec	= posix_get_tai_timespec,
 	.nsleep			= common_nsleep,
 	.timer_create		= common_timer_create,
 	.timer_set		= common_timer_set,
@@ -1295,7 +1313,8 @@ static const struct k_clock clock_tai = {
 
 static const struct k_clock clock_boottime = {
 	.clock_getres		= posix_get_hrtimer_res,
-	.clock_get		= posix_get_boottime,
+	.clock_get_ktime	= posix_get_boottime_ktime,
+	.clock_get_timespec	= posix_get_boottime_timespec,
 	.nsleep			= common_nsleep,
 	.timer_create		= common_timer_create,
 	.timer_set		= common_timer_set,
diff --git a/kernel/time/posix-timers.h b/kernel/time/posix-timers.h
index de5daa6d975a..ff227cd1bdbb 100644
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -6,8 +6,11 @@ struct k_clock {
 				struct timespec64 *tp);
 	int	(*clock_set)(const clockid_t which_clock,
 			     const struct timespec64 *tp);
-	int	(*clock_get)(const clockid_t which_clock,
-			     struct timespec64 *tp);
+	/* return the clock value in the current time namespace. */
+	int	(*clock_get_timespec)(const clockid_t which_clock,
+				      struct timespec64 *tp);
+	/* return the clock value in the root time namespace. */
+	ktime_t	(*clock_get_ktime)(const clockid_t which_clock);
 	int	(*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
 	int	(*timer_create)(struct k_itimer *timer);
 	int	(*nsleep)(const clockid_t which_clock, int flags,
-- 
2.22.0



* [PATCHv4 04/28] timens: Introduce CLOCK_MONOTONIC offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (2 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 05/28] timens: Introduce CLOCK_BOOTTIME offset Dmitry Safonov
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

From: Andrei Vagin <avagin@openvz.org>

Add monotonic time virtualisation for time namespaces.
Add a timespec for the monotonic clock to timens_offsets and wire it
into the clock_gettime() syscall.
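
A minimal userspace sketch of the offset addition, assuming normalized
timespecs (0 <= tv_nsec < 1e9, with negative values carried in tv_sec).
ts_add_sketch() and add_monotonic_sketch() are hypothetical helpers that
only imitate what timespec64_add() and timens_add_monotonic() do; a NULL
offset models the root namespace:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add two normalized timespecs and carry nanosecond overflow. */
static struct timespec ts_add_sketch(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= NSEC_PER_SEC) {
		r.tv_sec++;
		r.tv_nsec -= NSEC_PER_SEC;
	}
	return r;
}

/* Apply a namespace offset, if there is one, to a host clock reading. */
static void add_monotonic_sketch(struct timespec *ts,
				 const struct timespec *offset)
{
	if (offset)
		*ts = ts_add_sketch(*ts, *offset);
}
```

For example, a host reading of 10.9 s with an offset of -2.8 s, stored
as { -3, 200000000 }, yields 8.1 s inside the namespace.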

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/time_namespace.h | 8 ++++++++
 include/linux/timens_offsets.h | 1 +
 kernel/time/posix-stubs.c      | 2 ++
 kernel/time/posix-timers.c     | 4 ++++
 4 files changed, 15 insertions(+)

diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index b6985aa87479..81d0c989df3c 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -40,6 +40,13 @@ static inline void put_time_ns(struct time_namespace *ns)
 	kref_put(&ns->kref, free_time_ns);
 }
 
+static inline void timens_add_monotonic(struct timespec64 *ts)
+{
+        struct timens_offsets *ns_offsets = current->nsproxy->time_ns->offsets;
+
+        if (ns_offsets)
+                *ts = timespec64_add(*ts, ns_offsets->monotonic);
+}
 
 #else
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
@@ -65,6 +72,7 @@ static inline int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *ts
 	return 0;
 }
 
+static inline void timens_add_monotonic(struct timespec64 *ts) {}
 #endif
 
 #endif /* _LINUX_TIMENS_H */
diff --git a/include/linux/timens_offsets.h b/include/linux/timens_offsets.h
index 7d7cb68ea778..eaac2c82be5c 100644
--- a/include/linux/timens_offsets.h
+++ b/include/linux/timens_offsets.h
@@ -3,6 +3,7 @@
 #define _LINUX_TIME_OFFSETS_H
 
 struct timens_offsets {
+	struct timespec64 monotonic;
 };
 
 #endif
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 67df65f887ac..17c67e0aecd8 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -14,6 +14,7 @@
 #include <linux/ktime.h>
 #include <linux/timekeeping.h>
 #include <linux/posix-timers.h>
+#include <linux/time_namespace.h>
 #include <linux/compat.h>
 
 #ifdef CONFIG_ARCH_HAS_SYSCALL_WRAPPER
@@ -77,6 +78,7 @@ int do_clock_gettime(clockid_t which_clock, struct timespec64 *tp)
 		break;
 	case CLOCK_MONOTONIC:
 		ktime_get_ts64(tp);
+		timens_add_monotonic(tp);
 		break;
 	case CLOCK_BOOTTIME:
 		ktime_get_boottime_ts64(tp);
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index b6fcf6935204..52098f6ad596 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -30,6 +30,7 @@
 #include <linux/hashtable.h>
 #include <linux/compat.h>
 #include <linux/nospec.h>
+#include <linux/time_namespace.h>
 
 #include "timekeeping.h"
 #include "posix-timers.h"
@@ -196,6 +197,7 @@ static int posix_clock_realtime_adj(const clockid_t which_clock,
 int posix_get_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_ts64(tp);
+	timens_add_monotonic(tp);
 	return 0;
 }
 
@@ -210,6 +212,7 @@ static ktime_t posix_get_ktime(clockid_t which_clock)
 static int posix_get_monotonic_raw(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_raw_ts64(tp);
+	timens_add_monotonic(tp);
 	return 0;
 }
 
@@ -224,6 +227,7 @@ static int posix_get_monotonic_coarse(clockid_t which_clock,
 						struct timespec64 *tp)
 {
 	ktime_get_coarse_ts64(tp);
+	timens_add_monotonic(tp);
 	return 0;
 }
 
-- 
2.22.0



* [PATCHv4 05/28] timens: Introduce CLOCK_BOOTTIME offset
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (3 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 04/28] timens: Introduce CLOCK_MONOTONIC offsets Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets Dmitry Safonov
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Add boottime virtualisation for time namespaces.
Add a timespec for the boottime clock to timens_offsets and wire it
into the clock_gettime() syscall.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/time_namespace.h | 9 +++++++++
 include/linux/timens_offsets.h | 1 +
 kernel/time/alarmtimer.c       | 1 +
 kernel/time/posix-stubs.c      | 1 +
 kernel/time/posix-timers.c     | 1 +
 5 files changed, 13 insertions(+)

diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 81d0c989df3c..1dda8af6b9fe 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -48,6 +48,14 @@ static inline void timens_add_monotonic(struct timespec64 *ts)
                 *ts = timespec64_add(*ts, ns_offsets->monotonic);
 }
 
+static inline void timens_add_boottime(struct timespec64 *ts)
+{
+        struct timens_offsets *ns_offsets = current->nsproxy->time_ns->offsets;
+
+        if (ns_offsets)
+                *ts = timespec64_add(*ts, ns_offsets->boottime);
+}
+
 #else
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
 {
@@ -73,6 +81,7 @@ static inline int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *ts
 }
 
 static inline void timens_add_monotonic(struct timespec64 *ts) {}
+static inline void timens_add_boottime(struct timespec64 *ts) {}
 #endif
 
 #endif /* _LINUX_TIMENS_H */
diff --git a/include/linux/timens_offsets.h b/include/linux/timens_offsets.h
index eaac2c82be5c..e93aabaa5e45 100644
--- a/include/linux/timens_offsets.h
+++ b/include/linux/timens_offsets.h
@@ -4,6 +4,7 @@
 
 struct timens_offsets {
 	struct timespec64 monotonic;
+	struct timespec64 boottime;
 };
 
 #endif
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 68a163c8b4f2..6346e6ee0d32 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -26,6 +26,7 @@
 #include <linux/freezer.h>
 #include <linux/compat.h>
 #include <linux/module.h>
+#include <linux/time_namespace.h>
 
 #include "posix-timers.h"
 
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index 17c67e0aecd8..edaf075d1ee4 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -82,6 +82,7 @@ int do_clock_gettime(clockid_t which_clock, struct timespec64 *tp)
 		break;
 	case CLOCK_BOOTTIME:
 		ktime_get_boottime_ts64(tp);
+		timens_add_boottime(tp);
 		break;
 	default:
 		return -EINVAL;
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 52098f6ad596..573942ae2629 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -240,6 +240,7 @@ static int posix_get_coarse_res(const clockid_t which_clock, struct timespec64 *
 int posix_get_boottime_timespec(const clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_boottime_ts64(tp);
+	timens_add_boottime(tp);
 	return 0;
 }
 
-- 
2.22.0



* [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (4 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 05/28] timens: Introduce CLOCK_BOOTTIME offset Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:37   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 07/28] posix-timers/timens: Take into account " Dmitry Safonov
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Make timerfd respect timens offsets.
Provide a helper, timens_ktime_to_host(), that other kernel subsystems
can use to convert a namespace time to host time.
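
The conversion and its clamping can be sketched in userspace with plain
64-bit nanosecond values. ktime_to_host_sketch() and KTIME_MAX_SKETCH
are hypothetical names; the sketch assumes a non-negative absolute time
and phrases the overflow check so that no intermediate value can wrap:

```c
#include <assert.h>
#include <stdint.h>

#define KTIME_MAX_SKETCH INT64_MAX

/*
 * Convert an absolute time given in namespace terms to host terms by
 * subtracting the namespace offset, clamping to [0, KTIME_MAX_SKETCH].
 */
static int64_t ktime_to_host_sketch(int64_t tim, int64_t koff)
{
	assert(tim >= 0);	/* absolute expiry times are non-negative */

	/* Positive offset larger than the time: result would be negative. */
	if (tim < koff)
		return 0;

	/* Negative offset: tim - koff would exceed the maximum. */
	if (koff < 0 && tim > KTIME_MAX_SKETCH + koff)
		return KTIME_MAX_SKETCH;

	return tim - koff;
}
```

With an offset of 30, a namespace time of 100 maps to host time 70,
while a namespace time of 10 lies before the host clock's zero point
and is clamped to 0.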

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 fs/timerfd.c                   |  3 +++
 include/linux/time_namespace.h | 18 ++++++++++++++++++
 kernel/time_namespace.c        | 27 +++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/fs/timerfd.c b/fs/timerfd.c
index 6a6fc8aa1de7..9b0c2f65e7e8 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/compat.h>
 #include <linux/rcupdate.h>
+#include <linux/time_namespace.h>
 
 struct timerfd_ctx {
 	union {
@@ -196,6 +197,8 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
 	}
 
 	if (texp != 0) {
+		if (flags & TFD_TIMER_ABSTIME)
+			texp = timens_ktime_to_host(clockid, texp);
 		if (isalarm(ctx)) {
 			if (flags & TFD_TIMER_ABSTIME)
 				alarm_start(&ctx->t.alarm, texp);
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 1dda8af6b9fe..d32b55fad953 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -56,6 +56,19 @@ static inline void timens_add_boottime(struct timespec64 *ts)
                 *ts = timespec64_add(*ts, ns_offsets->boottime);
 }
 
+ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
+				struct timens_offsets *offsets);
+static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
+{
+	struct timens_offsets *offsets = current->nsproxy->time_ns->offsets;
+
+	if (!offsets) /* fast-path for the root time namespace */
+		return tim;
+
+	return do_timens_ktime_to_host(clockid, tim, offsets);
+}
+
+
 #else
 static inline struct time_namespace *get_time_ns(struct time_namespace *ns)
 {
@@ -82,6 +95,11 @@ static inline int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *ts
 
 static inline void timens_add_monotonic(struct timespec64 *ts) {}
 static inline void timens_add_boottime(struct timespec64 *ts) {}
+
+static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
+{
+	return tim;
+}
 #endif
 
 #endif /* _LINUX_TIMENS_H */
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index 4828447721ec..b3cffdf2635c 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -15,6 +15,33 @@
 #include <linux/sched/task.h>
 #include <linux/mm.h>
 
+ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)
+{
+	ktime_t koff;
+
+	switch (clockid) {
+	case CLOCK_MONOTONIC:
+		koff = timespec64_to_ktime(ns_offsets->monotonic);
+		break;
+	case CLOCK_BOOTTIME:
+	case CLOCK_BOOTTIME_ALARM:
+		koff = timespec64_to_ktime(ns_offsets->boottime);
+		break;
+	default:
+		return tim;
+	}
+
+	/* tim - koff has to be in [0, KTIME_MAX] */
+	if (tim < koff)
+		tim = 0;
+	else if (KTIME_MAX - tim < -koff)
+		tim = KTIME_MAX;
+	else
+		tim = ktime_sub(tim, koff);
+
+	return tim;
+}
+
 static struct ucounts *inc_time_namespaces(struct user_namespace *ns)
 {
 	return inc_ucount(ns, current_euid(), UCOUNT_TIME_NAMESPACES);
-- 
2.22.0



* [PATCHv4 07/28] posix-timers/timens: Take into account clock offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (5 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:42   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep Dmitry Safonov
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

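Wire the timer_settime() syscall into time namespace virtualization:
absolute expiry values (TIMER_ABSTIME) are converted from namespace time
to host time before the timer is armed; relative values are unchanged.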
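
The effect on an absolute expiry can be sketched in plain userspace C.
This is only an illustration, not the kernel code: the sketch_* names,
the int64_t nanosecond value standing in for ktime_t, and the two-field
offsets struct are assumptions made for the example. It mirrors the
clamping done by do_timens_ktime_to_host() and shows that relative
values pass through untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a ktime_t-like value is a signed 64-bit ns count. */
#define SKETCH_KTIME_MAX	INT64_MAX
#define SKETCH_TIMER_ABSTIME	0x01

enum sketch_clock { SKETCH_REALTIME, SKETCH_MONOTONIC, SKETCH_BOOTTIME };

struct sketch_offsets {
	int64_t monotonic_ns;	/* namespace time minus host time */
	int64_t boottime_ns;
};

/* Convert an absolute namespace time to host time, clamped to [0, KTIME_MAX]. */
static int64_t sketch_to_host(enum sketch_clock clk, int64_t tim,
			      const struct sketch_offsets *off)
{
	int64_t koff;

	assert(tim >= 0);

	switch (clk) {
	case SKETCH_MONOTONIC:
		koff = off->monotonic_ns;
		break;
	case SKETCH_BOOTTIME:
		koff = off->boottime_ns;
		break;
	default:
		return tim;	/* other clocks are not shifted */
	}

	if (tim < koff)
		return 0;			/* already expired on the host */
	if (koff < 0 && tim > SKETCH_KTIME_MAX + koff)
		return SKETCH_KTIME_MAX;	/* subtraction would overflow */
	return tim - koff;
}

/* Only TIMER_ABSTIME values are expressed in namespace time. */
static int64_t sketch_timer_expiry(enum sketch_clock clk, int flags,
				   int64_t value,
				   const struct sketch_offsets *off)
{
	if (flags & SKETCH_TIMER_ABSTIME)
		return sketch_to_host(clk, value, off);
	return value;
}
```

With a monotonic offset of 1000 ns, an absolute expiry of 1500 ns in the
namespace maps to 500 ns on the host, while a relative value of 1500 ns
stays 1500 ns.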

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 kernel/time/posix-timers.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 573942ae2629..dba77ee48e74 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -857,6 +857,8 @@ int common_timer_set(struct k_itimer *timr, int flags,
 
 	timr->it_interval = timespec64_to_ktime(new_setting->it_interval);
 	expires = timespec64_to_ktime(new_setting->it_value);
+	if (flags & TIMER_ABSTIME)
+		expires = timens_ktime_to_host(timr->it_clock, expires);
 	sigev_none = timr->it_sigev_notify == SIGEV_NONE;
 
 	kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
-- 
2.22.0



* [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (6 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 07/28] posix-timers/timens: Take into account " Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:49   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 09/28] timens: Shift /proc/uptime Dmitry Safonov
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

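Wire up clock_nanosleep() to timens offsets: an absolute (TIMER_ABSTIME)
wake-up time is converted from namespace time to host time, while
relative sleeps are left as they are. hrtimer_nanosleep() now takes the
expiry as ktime_t so that callers can apply the conversion first.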
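
From userspace, the case this patch has to handle looks roughly like the
sketch below: a task reads CLOCK_MONOTONIC (which inside a time namespace
already includes the offset), builds an absolute deadline from it, and
sleeps with TIMER_ABSTIME. The helper name sleep_until_monotonic() is
hypothetical; clock_gettime() and clock_nanosleep() are the standard
POSIX calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/*
 * Sleep until an absolute CLOCK_MONOTONIC deadline that lies delay_ms
 * milliseconds in the future. Returns the elapsed time in nanoseconds,
 * or -1 on error.
 */
static int64_t sleep_until_monotonic(long delay_ms)
{
	struct timespec start, deadline, end;
	int ret;

	assert(delay_ms >= 0);

	if (clock_gettime(CLOCK_MONOTONIC, &start))
		return -1;

	/* The deadline is expressed in the caller's (namespace) time. */
	deadline = start;
	deadline.tv_sec += delay_ms / 1000;
	deadline.tv_nsec += (delay_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= NSEC_PER_SEC) {
		deadline.tv_nsec -= NSEC_PER_SEC;
		deadline.tv_sec++;
	}

	/* clock_nanosleep() returns the error number directly, not via errno. */
	do {
		ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				      &deadline, NULL);
	} while (ret == EINTR);
	if (ret)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &end))
		return -1;

	return (int64_t)(end.tv_sec - start.tv_sec) * NSEC_PER_SEC +
	       (end.tv_nsec - start.tv_nsec);
}
```

Because the deadline is expressed in namespace time, the kernel has to
subtract the namespace offset before arming the hrtimer; otherwise the
task would sleep for the wrong duration. A relative sleep needs no such
translation.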

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/hrtimer.h    |  2 +-
 kernel/time/alarmtimer.c   |  2 ++
 kernel/time/hrtimer.c      |  8 ++++----
 kernel/time/posix-stubs.c  | 12 ++++++++++--
 kernel/time/posix-timers.c | 19 ++++++++++++++++---
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 2e8957eac4d4..5a3b3e17d0e8 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -473,7 +473,7 @@ static inline u64 hrtimer_forward_now(struct hrtimer *timer,
 /* Precise sleep: */
 
 extern int nanosleep_copyout(struct restart_block *, struct timespec64 *);
-extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
+extern long hrtimer_nanosleep(ktime_t rqtp,
 			      const enum hrtimer_mode mode,
 			      const clockid_t clockid);
 
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 6346e6ee0d32..f1f42df179d0 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -819,6 +819,8 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
 		ktime_t now = alarm_bases[type].gettime();
 
 		exp = ktime_add_safe(now, exp);
+	} else {
+		exp = timens_ktime_to_host(which_clock, exp);
 	}
 
 	ret = alarmtimer_do_nsleep(&alarm, exp, type);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 41dfff23c1f9..b245f6ff9c8f 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1716,7 +1716,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
 	return ret;
 }
 
-long hrtimer_nanosleep(const struct timespec64 *rqtp,
+long hrtimer_nanosleep(ktime_t rqtp,
 		       const enum hrtimer_mode mode, const clockid_t clockid)
 {
 	struct restart_block *restart;
@@ -1729,7 +1729,7 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
 		slack = 0;
 
 	hrtimer_init_on_stack(&t.timer, clockid, mode);
-	hrtimer_set_expires_range_ns(&t.timer, timespec64_to_ktime(*rqtp), slack);
+	hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
 	ret = do_nanosleep(&t, mode);
 	if (ret != -ERESTART_RESTARTBLOCK)
 		goto out;
@@ -1764,7 +1764,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
 
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
-	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
 }
 
 #endif
@@ -1784,7 +1784,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp,
 
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
-	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
 }
 #endif
 
diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
index edaf075d1ee4..4ee0dc180866 100644
--- a/kernel/time/posix-stubs.c
+++ b/kernel/time/posix-stubs.c
@@ -129,6 +129,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		struct __kernel_timespec __user *, rmtp)
 {
 	struct timespec64 t;
+	ktime_t texp;
 
 	switch (which_clock) {
 	case CLOCK_REALTIME:
@@ -147,7 +148,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
 		rmtp = NULL;
 	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
 	current->restart_block.nanosleep.rmtp = rmtp;
-	return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
+	texp = timespec64_to_ktime(t);
+	if (flags & TIMER_ABSTIME)
+		texp = timens_ktime_to_host(which_clock, texp);
+	return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
 				 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
 				 which_clock);
 }
@@ -215,6 +219,7 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 		struct old_timespec32 __user *, rmtp)
 {
 	struct timespec64 t;
+	ktime_t texp;
 
 	switch (which_clock) {
 	case CLOCK_REALTIME:
@@ -233,7 +238,10 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
 		rmtp = NULL;
 	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
 	current->restart_block.nanosleep.compat_rmtp = rmtp;
-	return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
+	texp = timespec64_to_ktime(t);
+	if (flags & TIMER_ABSTIME)
+		texp = timens_ktime_to_host(which_clock, texp);
+	return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
 				 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
 				 which_clock);
 }
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index dba77ee48e74..bb457962fc7c 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1194,7 +1194,20 @@ SYSCALL_DEFINE2(clock_getres_time32, clockid_t, which_clock,
 static int common_nsleep(const clockid_t which_clock, int flags,
 			 const struct timespec64 *rqtp)
 {
-	return hrtimer_nanosleep(rqtp, flags & TIMER_ABSTIME ?
+	ktime_t texp = timespec64_to_ktime(*rqtp);
+	return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
+				 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
+				 which_clock);
+}
+
+static int common_nsleep_timens(const clockid_t which_clock, int flags,
+			 const struct timespec64 *rqtp)
+{
+	ktime_t texp = timespec64_to_ktime(*rqtp);
+	if (flags & TIMER_ABSTIME)
+		texp = timens_ktime_to_host(which_clock, texp);
+
+	return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
 				 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
 				 which_clock);
 }
@@ -1275,7 +1288,7 @@ static const struct k_clock clock_monotonic = {
 	.clock_getres		= posix_get_hrtimer_res,
 	.clock_get_timespec	= posix_get_timespec,
 	.clock_get_ktime	= posix_get_ktime,
-	.nsleep			= common_nsleep,
+	.nsleep			= common_nsleep_timens,
 	.timer_create		= common_timer_create,
 	.timer_set		= common_timer_set,
 	.timer_get		= common_timer_get,
@@ -1322,7 +1335,7 @@ static const struct k_clock clock_boottime = {
 	.clock_getres		= posix_get_hrtimer_res,
 	.clock_get_ktime	= posix_get_boottime_ktime,
 	.clock_get_timespec	= posix_get_boottime_timespec,
-	.nsleep			= common_nsleep,
+	.nsleep			= common_nsleep_timens,
 	.timer_create		= common_timer_create,
 	.timer_set		= common_timer_set,
 	.timer_get		= common_timer_get,
-- 
2.22.0



* [PATCHv4 09/28] timens: Shift /proc/uptime
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (7 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:50   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 10/28] x86/vdso2c: Correct err messages on file opening Dmitry Safonov
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

Respect the boottime offset of the time namespace in /proc/uptime.
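
A minimal userspace sketch of what the shifted value looks like, assuming
the namespace offset is stored as a normalized timespec (tv_nsec in
[0, NSEC_PER_SEC)) that is simply added to the host boottime. The
uptime_ts type and the sketch_* helpers are made up for illustration;
the "%lu.%02lu" format is the one used by uptime_proc_show().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct timespec64. */
struct uptime_ts {
	int64_t tv_sec;
	long tv_nsec;
};

/* Add two normalized timespecs, carrying nanoseconds into seconds. */
static struct uptime_ts sketch_ts_add(struct uptime_ts a, struct uptime_ts b)
{
	struct uptime_ts sum;

	assert(a.tv_nsec >= 0 && a.tv_nsec < NSEC_PER_SEC);
	assert(b.tv_nsec >= 0 && b.tv_nsec < NSEC_PER_SEC);

	sum.tv_sec = a.tv_sec + b.tv_sec;
	sum.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (sum.tv_nsec >= NSEC_PER_SEC) {
		sum.tv_nsec -= NSEC_PER_SEC;
		sum.tv_sec++;
	}
	return sum;
}

/* Format the first field of /proc/uptime as seen inside the namespace. */
static int sketch_format_uptime(char *buf, size_t len,
				struct uptime_ts host_boottime,
				struct uptime_ts ns_offset)
{
	struct uptime_ts up = sketch_ts_add(host_boottime, ns_offset);

	return snprintf(buf, len, "%lu.%02lu",
			(unsigned long)up.tv_sec,
			(unsigned long)(up.tv_nsec / (NSEC_PER_SEC / 100)));
}
```

For a host boottime of 100.25 s and an offset of -40.5 s (stored as
tv_sec = -41, tv_nsec = 500000000), the namespace sees "59.75".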

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 fs/proc/uptime.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c
index a4c2791ab70b..5a1b228964fb 100644
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/time.h>
+#include <linux/time_namespace.h>
 #include <linux/kernel_stat.h>
 
 static int uptime_proc_show(struct seq_file *m, void *v)
@@ -20,6 +21,8 @@ static int uptime_proc_show(struct seq_file *m, void *v)
 		nsec += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
 
 	ktime_get_boottime_ts64(&uptime);
+	timens_add_boottime(&uptime);
+
 	idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
 	idle.tv_nsec = rem;
 	seq_printf(m, "%lu.%02lu %lu.%02lu\n",
-- 
2.22.0



* [PATCHv4 10/28] x86/vdso2c: Correct err messages on file opening
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (8 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 09/28] timens: Shift /proc/uptime Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 11/28] x86/vdso2c: Convert iterator to unsigned Dmitry Safonov
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

err() message in main() is misleading: it should print `outfilename`,
which is argv[3], not argv[2].

Correct error messages to be more precise about what failed and for
which file.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vdso2c.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index 3a4d8d4d39f8..ce67370d14e5 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -184,7 +184,7 @@ static void map_input(const char *name, void **addr, size_t *len, int prot)
 
 	int fd = open(name, O_RDONLY);
 	if (fd == -1)
-		err(1, "%s", name);
+		err(1, "open(%s)", name);
 
 	tmp_len = lseek(fd, 0, SEEK_END);
 	if (tmp_len == (off_t)-1)
@@ -237,7 +237,7 @@ int main(int argc, char **argv)
 	outfilename = argv[3];
 	outfile = fopen(outfilename, "w");
 	if (!outfile)
-		err(1, "%s", argv[2]);
+		err(1, "fopen(%s)", outfilename);
 
 	go(raw_addr, raw_len, stripped_addr, stripped_len, outfile, name);
 
-- 
2.22.0



* [PATCHv4 11/28] x86/vdso2c: Convert iterator to unsigned
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (9 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 10/28] x86/vdso2c: Correct err messages on file opening Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 12/28] x86/vdso/Makefile: Add vobjs32 Dmitry Safonov
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

The loop iterators are only used with unsigned values, so make i and k
unsigned too (j already is). Clean up and prettify the code a bit.

Introduce syms_nr for readability and as preparation for allocating an
array of vDSO entries, which will be needed for creating two vdso .so's:
one for host tasks and another for processes inside a time namespace.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vdso2c.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index a20b134de2a8..80be339ee93e 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -13,7 +13,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	unsigned long load_size = -1;  /* Work around bogus warning */
 	unsigned long mapping_size;
 	ELF(Ehdr) *hdr = (ELF(Ehdr) *)raw_addr;
-	int i;
+	unsigned int i, syms_nr;
 	unsigned long j;
 	ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
 		*alt_sec = NULL;
@@ -86,11 +86,10 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	strtab_hdr = raw_addr + GET_LE(&hdr->e_shoff) +
 		GET_LE(&hdr->e_shentsize) * GET_LE(&symtab_hdr->sh_link);
 
+	syms_nr = GET_LE(&symtab_hdr->sh_size) / GET_LE(&symtab_hdr->sh_entsize);
 	/* Walk the symbol table */
-	for (i = 0;
-	     i < GET_LE(&symtab_hdr->sh_size) / GET_LE(&symtab_hdr->sh_entsize);
-	     i++) {
-		int k;
+	for (i = 0; i < syms_nr; i++) {
+		unsigned int k;
 		ELF(Sym) *sym = raw_addr + GET_LE(&symtab_hdr->sh_offset) +
 			GET_LE(&symtab_hdr->sh_entsize) * i;
 		const char *sym_name = raw_addr +
-- 
2.22.0



* [PATCHv4 12/28] x86/vdso/Makefile: Add vobjs32
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (10 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 11/28] x86/vdso2c: Convert iterator to unsigned Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 13/28] x86/vdso: Restrict splitting VVAR VMA Dmitry Safonov
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

Collect the ia32/i386 objects in a vobjs32 list, the same way as is done
for the 64-bit vdso objects. This prepares the ground for avoiding code
duplication when the timens vdso is introduced.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/Makefile | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 42fe42e82baf..b58d34120fd8 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -18,6 +18,8 @@ VDSO32-$(CONFIG_IA32_EMULATION)	:= y
 
 # files to link into the vdso
 vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
+vobjs32-y := vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
+vobjs32-y += vdso32/vclock_gettime.o
 
 # files to link into kernel
 obj-y				+= vma.o
@@ -31,10 +33,12 @@ vdso_img-$(VDSO32-y)		+= 32
 obj-$(VDSO32-y)			+= vdso32-setup.o
 
 vobjs := $(foreach F,$(vobjs-y),$(obj)/$F)
+vobjs32 := $(foreach F,$(vobjs32-y),$(obj)/$F)
 
 $(obj)/vdso.o: $(obj)/vdso.so
 
 targets += vdso.lds $(vobjs-y)
+targets += vdso32/vdso32.lds $(vobjs32-y)
 
 # Build the vDSO image C files and link them in.
 vdso_img_objs := $(vdso_img-y:%=vdso-image-%.o)
@@ -125,10 +129,6 @@ $(obj)/vdsox32.so.dbg: $(obj)/vdsox32.lds $(vobjx32s) FORCE
 CPPFLAGS_vdso32.lds = $(CPPFLAGS_vdso.lds)
 VDSO_LDFLAGS_vdso32.lds = -m elf_i386 -soname linux-gate.so.1
 
-targets += vdso32/vdso32.lds
-targets += vdso32/note.o vdso32/system_call.o vdso32/sigreturn.o
-targets += vdso32/vclock_gettime.o
-
 KBUILD_AFLAGS_32 := $(filter-out -m64,$(KBUILD_AFLAGS)) -DBUILD_VDSO
 $(obj)/vdso32.so.dbg: KBUILD_AFLAGS = $(KBUILD_AFLAGS_32)
 $(obj)/vdso32.so.dbg: asflags-$(CONFIG_X86_64) += -m32
@@ -153,12 +153,7 @@ endif
 
 $(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
 
-$(obj)/vdso32.so.dbg: FORCE \
-		      $(obj)/vdso32/vdso32.lds \
-		      $(obj)/vdso32/vclock_gettime.o \
-		      $(obj)/vdso32/note.o \
-		      $(obj)/vdso32/system_call.o \
-		      $(obj)/vdso32/sigreturn.o
+$(obj)/vdso32.so.dbg: $(obj)/vdso32/vdso32.lds $(vobjs32) FORCE
 	$(call if_changed,vdso)
 
 #
-- 
2.22.0



* [PATCHv4 13/28] x86/vdso: Restrict splitting VVAR VMA
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (11 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 12/28] x86/vdso/Makefile: Add vobjs32 Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 14/28] x86/vdso: Rename vdso_image {.data=>.text} Dmitry Safonov
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

Although the time namespace can work with a split VVAR VMA, it seems
worthwhile to forbid splitting VVAR: this results in a stricter ABI and
reduces the number of corner cases to consider while working further on
the VDSO.

I don't think there is any use case for a partial mremap() of vvar, but
if there is one, this patch can easily be reverted.
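
The check can be illustrated outside the kernel as follows. The sketch_*
types are hypothetical stand-ins for vm_area_struct and vdso_image; the
only assumption carried over is that sym_vvar_start is a negative offset
relative to the start of the vdso text, so its negation is the full size
of the vvar area.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the relevant fields of vm_area_struct. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Hypothetical stand-in for the relevant field of vdso_image. */
struct sketch_image {
	long sym_vvar_start;	/* negative: vvar sits below the vdso text */
};

/* Accept a moved vvar mapping only if it still covers the whole area. */
static int sketch_vvar_mremap(const struct sketch_image *image,
			      const struct sketch_vma *new_vma)
{
	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;

	assert(image->sym_vvar_start < 0);

	if (new_size != (unsigned long)-image->sym_vvar_start)
		return -EINVAL;

	return 0;
}
```

With a four-page vvar area, moving all four pages is accepted, while a
new VMA that covers only two of them is rejected with -EINVAL.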

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vma.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 8db1f594e8b1..d2b421233ba5 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -84,6 +84,18 @@ static int vdso_mremap(const struct vm_special_mapping *sm,
 	return 0;
 }
 
+static int vvar_mremap(const struct vm_special_mapping *sm,
+		struct vm_area_struct *new_vma)
+{
+	unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
+	const struct vdso_image *image = new_vma->vm_mm->context.vdso_image;
+
+	if (new_size != -image->sym_vvar_start)
+		return -EINVAL;
+
+	return 0;
+}
+
 static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 		      struct vm_area_struct *vma, struct vm_fault *vmf)
 {
@@ -136,6 +148,7 @@ static const struct vm_special_mapping vdso_mapping = {
 static const struct vm_special_mapping vvar_mapping = {
 	.name = "[vvar]",
 	.fault = vvar_fault,
+	.mremap = vvar_mremap,
 };
 
 /*
-- 
2.22.0



* [PATCHv4 14/28] x86/vdso: Rename vdso_image {.data=>.text}
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (12 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 13/28] x86/vdso: Restrict splitting VVAR VMA Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 15/28] x86/vdso: Add offsets page in vvar Dmitry Safonov
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

To avoid any confusion with VVAR.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vdso2c.h | 2 +-
 arch/x86/entry/vdso/vma.c    | 6 +++---
 arch/x86/include/asm/vdso.h  | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 80be339ee93e..7556bb70ed8b 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -158,7 +158,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	fprintf(outfile, "\n};\n\n");
 
 	fprintf(outfile, "const struct vdso_image %s = {\n", image_name);
-	fprintf(outfile, "\t.data = raw_data,\n");
+	fprintf(outfile, "\t.text = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
 	if (alt_sec) {
 		fprintf(outfile, "\t.alt = %lu,\n",
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index d2b421233ba5..c30a33b2963b 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -32,8 +32,8 @@ void __init init_vdso_image(const struct vdso_image *image)
 {
 	BUG_ON(image->size % PAGE_SIZE != 0);
 
-	apply_alternatives((struct alt_instr *)(image->data + image->alt),
-			   (struct alt_instr *)(image->data + image->alt +
+	apply_alternatives((struct alt_instr *)(image->text + image->alt),
+			   (struct alt_instr *)(image->text + image->alt +
 						image->alt_len));
 }
 
@@ -47,7 +47,7 @@ static vm_fault_t vdso_fault(const struct vm_special_mapping *sm,
 	if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
 		return VM_FAULT_SIGBUS;
 
-	vmf->page = virt_to_page(image->data + (vmf->pgoff << PAGE_SHIFT));
+	vmf->page = virt_to_page(image->text + (vmf->pgoff << PAGE_SHIFT));
 	get_page(vmf->page);
 	return 0;
 }
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 230474e2ddb5..dffdc12cc7d6 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -11,7 +11,7 @@
 #include <linux/mm_types.h>
 
 struct vdso_image {
-	void *data;
+	void *text;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
 	unsigned long alt, alt_len;
-- 
2.22.0



* [PATCHv4 15/28] x86/vdso: Add offsets page in vvar
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (13 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 14/28] x86/vdso: Rename vdso_image {.data=>.text} Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 13:58   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 16/28] x86/vdso: Allocate timens vdso Dmitry Safonov
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

From: Andrei Vagin <avagin@openvz.org>

As modern applications fetch time from the VDSO without entering the
kernel, the offsets have to be made available to userspace code inside a
time namespace.

A page for timens offsets is allocated on time namespace construction.
Map that page into VVAR for tasks inside a timens, and the zero page for
host processes.
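
As a rough sketch of the resulting layout, the offsets below are computed
the same way as in vdso-layout.lds.S in this patch, relative to the start
of the vdso text. The struct and function names are invented for the
example and a 4096-byte page is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096L

/* Offsets of the special pages relative to the start of the vdso text. */
struct sketch_vvar_layout {
	long vvar_start;
	long pvclock_page;
	long hvclock_page;
	long timens_page;
};

/* time_ns is non-zero when CONFIG_TIME_NS would be enabled. */
static struct sketch_vvar_layout sketch_layout(int time_ns)
{
	long timens_sz = time_ns ? SKETCH_PAGE_SIZE : 0;
	struct sketch_vvar_layout l;

	l.vvar_start = -(3 * SKETCH_PAGE_SIZE + timens_sz);
	l.pvclock_page = l.vvar_start + SKETCH_PAGE_SIZE;
	l.hvclock_page = l.vvar_start + 2 * SKETCH_PAGE_SIZE;
	l.timens_page = l.vvar_start + 3 * SKETCH_PAGE_SIZE;

	/* The offsets page never overlaps the vdso text itself. */
	assert(l.timens_page <= 0);
	return l;
}
```

With CONFIG_TIME_NS the vvar area grows from three to four pages, and the
offsets page is the one directly below the vdso text.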

As VDSO code is already optimized as much as possible in terms of speed,
any new if-condition in VDSO code is undesirable; the goal is to provide
two .so(s), as was originally suggested by Andy and Thomas:
- for host tasks with optimized-out clk_to_ns() without any penalty
- for processes inside timens with clk_to_ns()
For this purpose, define clk_to_ns() under CONFIG_TIME_NS.

To eliminate any performance regression, follow-up patches that add
support for patching the vdso will call clk_to_ns() under a
static_branch.

VDSO mappings are platform-specific, so add a Kconfig dependency on the
architecture.
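
What clk_to_ns() is meant to compute can be sketched in userspace as
below. The enum, the vdso_ts type and the explicit page pointer are
assumptions made to keep the example self-contained; in the patch the
offsets are read through the timens_page symbol. Each case ends with its
own break so that the boottime offset is actually applied.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical clock ids; the kernel uses the CLOCK_* constants. */
enum sketch_clk {
	SK_REALTIME,
	SK_MONOTONIC,
	SK_MONOTONIC_RAW,
	SK_MONOTONIC_COARSE,
	SK_BOOTTIME,
};

struct vdso_ts {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical view of the offsets page; all zeroes for host tasks. */
struct vdso_timens_page {
	struct vdso_ts monotonic;
	struct vdso_ts boottime;
};

static void sketch_clk_to_ns(enum sketch_clk clk, struct vdso_ts *ts,
			     const struct vdso_timens_page *page)
{
	const struct vdso_ts *off;

	switch (clk) {
	case SK_MONOTONIC:
	case SK_MONOTONIC_COARSE:
	case SK_MONOTONIC_RAW:
		off = &page->monotonic;
		break;
	case SK_BOOTTIME:
		off = &page->boottime;
		break;
	default:
		return;		/* other clocks are not shifted */
	}

	/* Offsets are assumed to be normalized, so only a carry can occur. */
	assert(off->tv_nsec >= 0 && off->tv_nsec < NSEC_PER_SEC);

	ts->tv_sec += off->tv_sec;
	ts->tv_nsec += off->tv_nsec;
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	}
}
```

With an all-zero page, as mapped for host tasks, the timestamp is
returned unchanged, which is why the zero page is a valid stand-in
outside a time namespace.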

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/Kconfig                          |  5 ++++
 arch/x86/Kconfig                      |  1 +
 arch/x86/entry/vdso/vclock_gettime.c  | 43 +++++++++++++++++++++++++++
 arch/x86/entry/vdso/vdso-layout.lds.S |  9 +++++-
 arch/x86/entry/vdso/vdso2c.c          |  3 ++
 arch/x86/entry/vdso/vma.c             | 12 ++++++++
 arch/x86/include/asm/vdso.h           |  1 +
 init/Kconfig                          |  1 +
 8 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index c47b328eada0..503a4113dc6c 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -734,6 +734,11 @@ config HAVE_ARCH_NVRAM_OPS
 config ISA_BUS_API
 	def_bool ISA
 
+config ARCH_HAS_VDSO_TIME_NS
+	bool
+	help
+	 VDSO can add time-ns offsets without entering kernel.
+
 #
 # ABI hall of shame
 #
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2bbbd4d1ba31..da70b320eb09 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -79,6 +79,7 @@ config X86
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_HAS_VDSO_TIME_NS
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select ARCH_MIGHT_HAVE_ACPI_PDC		if ACPI
diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index 0f82a70c7682..e2d93628c0dd 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -21,6 +21,7 @@
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
+#include <linux/timens_offsets.h>
 
 #define gtod (&VVAR(vsyscall_gtod_data))
 
@@ -38,6 +39,11 @@ extern u8 hvclock_page[PAGE_SIZE]
 	__attribute__((visibility("hidden")));
 #endif
 
+#ifdef CONFIG_TIME_NS
+extern u8 timens_page
+	__attribute__((visibility("hidden")));
+#endif
+
 #ifndef BUILD_VDSO32
 
 notrace static long vdso_fallback_gettime(long clock, struct timespec *ts)
@@ -139,6 +145,39 @@ notrace static inline u64 vgetcyc(int mode)
 	return U64_MAX;
 }
 
+#ifdef CONFIG_TIME_NS
+notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts)
+{
+	struct timens_offsets *timens = (struct timens_offsets *) &timens_page;
+	struct timespec64 *offset64;
+
+	switch (clk) {
+	case CLOCK_MONOTONIC:
+	case CLOCK_MONOTONIC_COARSE:
+	case CLOCK_MONOTONIC_RAW:
+		offset64 = &timens->monotonic;
+		break;
+	case CLOCK_BOOTTIME:
+		offset64 = &timens->boottime;
+		break;
+	default:
+		return;
+	}
+
+	ts->tv_nsec += offset64->tv_nsec;
+	ts->tv_sec += offset64->tv_sec;
+	if (ts->tv_nsec >= NSEC_PER_SEC) {
+		ts->tv_nsec -= NSEC_PER_SEC;
+		ts->tv_sec++;
+	}
+	if (ts->tv_nsec < 0) {
+		ts->tv_nsec += NSEC_PER_SEC;
+		ts->tv_sec--;
+	}
+}
+#else
+notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts) {}
+#endif
+
 notrace static int do_hres(clockid_t clk, struct timespec *ts)
 {
 	struct vgtod_ts *base = &gtod->basetime[clk];
@@ -165,6 +204,8 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
 	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
+	clk_to_ns(clk, ts);
+
 	return 0;
 }
 
@@ -178,6 +219,8 @@ notrace static void do_coarse(clockid_t clk, struct timespec *ts)
 		ts->tv_sec = base->sec;
 		ts->tv_nsec = base->nsec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
+
+	clk_to_ns(clk, ts);
 }
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index 93c6dc7812d0..ba216527e59f 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -7,6 +7,12 @@
  * This script controls its layout.
  */
 
+#ifdef CONFIG_TIME_NS
+# define TIMENS_SZ	PAGE_SIZE
+#else
+# define TIMENS_SZ	0
+#endif
+
 SECTIONS
 {
 	/*
@@ -16,7 +22,7 @@ SECTIONS
 	 * segment.
 	 */
 
-	vvar_start = . - 3 * PAGE_SIZE;
+	vvar_start = . - (3 * PAGE_SIZE + TIMENS_SZ);
 	vvar_page = vvar_start;
 
 	/* Place all vvars at the offsets in asm/vvar.h. */
@@ -28,6 +34,7 @@ SECTIONS
 
 	pvclock_page = vvar_start + PAGE_SIZE;
 	hvclock_page = vvar_start + 2 * PAGE_SIZE;
+	timens_page = vvar_start + 3 * PAGE_SIZE;
 
 	. = SIZEOF_HEADERS;
 
diff --git a/arch/x86/entry/vdso/vdso2c.c b/arch/x86/entry/vdso/vdso2c.c
index ce67370d14e5..7380908045c7 100644
--- a/arch/x86/entry/vdso/vdso2c.c
+++ b/arch/x86/entry/vdso/vdso2c.c
@@ -75,12 +75,14 @@ enum {
 	sym_vvar_page,
 	sym_pvclock_page,
 	sym_hvclock_page,
+	sym_timens_page,
 };
 
 const int special_pages[] = {
 	sym_vvar_page,
 	sym_pvclock_page,
 	sym_hvclock_page,
+	sym_timens_page,
 };
 
 struct vdso_sym {
@@ -93,6 +95,7 @@ struct vdso_sym required_syms[] = {
 	[sym_vvar_page] = {"vvar_page", true},
 	[sym_pvclock_page] = {"pvclock_page", true},
 	[sym_hvclock_page] = {"hvclock_page", true},
+	[sym_timens_page] = {"timens_page", true},
 	{"VDSO32_NOTE_MASK", true},
 	{"__kernel_vsyscall", true},
 	{"__kernel_sigreturn", true},
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index c30a33b2963b..8a7f4cfe1fad 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -14,6 +14,7 @@
 #include <linux/elf.h>
 #include <linux/cpu.h>
 #include <linux/ptrace.h>
+#include <linux/time_namespace.h>
 #include <asm/pvclock.h>
 #include <asm/vgtod.h>
 #include <asm/proto.h>
@@ -23,6 +24,7 @@
 #include <asm/desc.h>
 #include <asm/cpufeature.h>
 #include <asm/mshyperv.h>
+#include <asm/page.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -135,6 +137,16 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
 			return vmf_insert_pfn(vma, vmf->address,
 					vmalloc_to_pfn(tsc_pg));
+	} else if (sym_offset == image->sym_timens_page) {
+		struct time_namespace *ns = current->nsproxy->time_ns;
+		unsigned long pfn;
+
+		if (!ns->offsets)
+			pfn = page_to_pfn(ZERO_PAGE(0));
+		else
+			pfn = page_to_pfn(virt_to_page(ns->offsets));
+
+		return vmf_insert_pfn(vma, vmf->address, pfn);
 	}
 
 	return VM_FAULT_SIGBUS;
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index dffdc12cc7d6..9d420c545607 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -21,6 +21,7 @@ struct vdso_image {
 	long sym_vvar_page;
 	long sym_pvclock_page;
 	long sym_hvclock_page;
+	long sym_timens_page;
 	long sym_VDSO32_NOTE_MASK;
 	long sym___kernel_sigreturn;
 	long sym___kernel_rt_sigreturn;
diff --git a/init/Kconfig b/init/Kconfig
index 098fe185360c..3d9497241394 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -993,6 +993,7 @@ config UTS_NS
 
 config TIME_NS
 	bool "TIME namespace"
+	depends on ARCH_HAS_VDSO_TIME_NS
 	default y
 	help
 	  In this namespace boottime and monotonic clocks can be set.
-- 
2.22.0



* [PATCHv4 16/28] x86/vdso: Allocate timens vdso
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (14 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 15/28] x86/vdso: Add offsets page in vvar Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone() Dmitry Safonov
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

As discussed on the timens RFC, adding a new conditional branch
`if (inside_time_ns)` to the VDSO for all processes is undesirable.
It adds a penalty for everybody, as the branch predictor may mispredict
the jump. Instruction cache lines are also wasted on cmp/jmp.

Those effects of introducing time namespace are very much unwanted,
bearing in mind how much work has been spent on micro-optimising the
vdso code.

The proposal is to allocate a second copy of the vdso code, with the
timens code dynamically patched out (disabled by static_branch) at boot
time.

Allocate another vdso and copy original code.
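
For illustration only, the "two copies of the text" idea can be modelled
in userspace C. This is a hedged sketch, not the kernel code: struct
image_model, image_model_init() and image_model_text() are hypothetical
names, and malloc() stands in for vmalloc_32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct vdso_image. */
struct image_model {
	const unsigned char *text;	/* original code, used by host tasks */
	unsigned char *text_timens;	/* private copy for timens tasks */
	size_t size;
};

/* Copy the original text once; returns 0 on success, -1 on failure. */
static int image_model_init(struct image_model *img)
{
	assert(img && img->text && img->size > 0);

	img->text_timens = malloc(img->size);	/* stands in for vmalloc_32() */
	if (!img->text_timens)
		return -1;

	memcpy(img->text_timens, img->text, img->size);
	return 0;
}

/* Pick the bytes backing a fault at 'offset', or NULL if out of range. */
static const unsigned char *image_model_text(const struct image_model *img,
					     bool in_timens, size_t offset)
{
	if (offset >= img->size)
		return NULL;
	/* Fall back to the original text if the copy was never allocated. */
	if (in_timens && img->text_timens)
		return img->text_timens + offset;
	return img->text + offset;
}
```

The fallback to the original text mirrors the
`in_timens && image->text_timens` check in vdso_fault() in this patch.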

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vdso2c.h |   2 +-
 arch/x86/entry/vdso/vma.c    | 113 +++++++++++++++++++++++++++++++++--
 arch/x86/include/asm/vdso.h  |   9 +--
 3 files changed, 114 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 7556bb70ed8b..885b988aea19 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -157,7 +157,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	}
 	fprintf(outfile, "\n};\n\n");
 
-	fprintf(outfile, "const struct vdso_image %s = {\n", image_name);
+	fprintf(outfile, "struct vdso_image %s __ro_after_init = {\n", image_name);
 	fprintf(outfile, "\t.text = raw_data,\n");
 	fprintf(outfile, "\t.size = %lu,\n", mapping_size);
 	if (alt_sec) {
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 8a7f4cfe1fad..cc06c6b70167 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -30,26 +30,128 @@
 unsigned int __read_mostly vdso64_enabled = 1;
 #endif
 
-void __init init_vdso_image(const struct vdso_image *image)
+void __init init_vdso_image(struct vdso_image *image)
 {
 	BUG_ON(image->size % PAGE_SIZE != 0);
 
 	apply_alternatives((struct alt_instr *)(image->text + image->alt),
 			   (struct alt_instr *)(image->text + image->alt +
 						image->alt_len));
+#ifdef CONFIG_TIME_NS
+	image->text_timens = vmalloc_32(image->size);
+	if (WARN_ON(image->text_timens == NULL))
+		return;
+
+	memcpy(image->text_timens, image->text, image->size);
+#endif
 }
 
 struct linux_binprm;
 
+#ifdef CONFIG_TIME_NS
+static inline struct timens_offsets *current_timens_offsets(void)
+{
+	return current->nsproxy->time_ns->offsets;
+}
+
+static int vdso_check_timens(struct vm_area_struct *vma, bool *in_timens)
+{
+	struct task_struct *tsk;
+
+	if (likely(vma->vm_mm == current->mm)) {
+		*in_timens = !!current_timens_offsets();
+		return 0;
+	}
+
+	/*
+	 * .fault() handler can be called over remote process through
+	 * interfaces like /proc/$pid/mem or process_vm_{readv,writev}()
+	 * Considering such access to vdso as a slow-path.
+	 */
+
+#ifdef CONFIG_MEMCG
+	rcu_read_lock();
+
+	tsk = rcu_dereference(vma->vm_mm->owner);
+	if (tsk) {
+		task_lock(tsk);
+		/*
+		 * Shouldn't happen: nsproxy is unset in exit_mm().
+		 * Before that exit_mm() holds mmap_sem to set (mm = NULL).
+		 * It's impossible to have a fault in task without mm
+		 * and mmap_sem is taken during the fault.
+		 */
+		if (WARN_ON_ONCE(tsk->nsproxy == NULL)) {
+			task_unlock(tsk);
+			rcu_read_unlock();
+			return -EIO;
+		}
+		*in_timens = !!tsk->nsproxy->time_ns->offsets;
+		task_unlock(tsk);
+		rcu_read_unlock();
+		return 0;
+	}
+	rcu_read_unlock();
+#endif
+
+	read_lock(&tasklist_lock);
+	for_each_process(tsk) {
+		struct task_struct *c;
+
+		if (tsk->flags & PF_KTHREAD)
+			continue;
+		for_each_thread(tsk, c) {
+			if (c->mm == vma->vm_mm)
+				goto found;
+			if (c->mm)
+				break;
+		}
+	}
+	read_unlock(&tasklist_lock);
+	return -ESRCH;
+
+found:
+	task_lock(tsk);
+	read_unlock(&tasklist_lock);
+	*in_timens = !!tsk->nsproxy->time_ns->offsets;
+	task_unlock(tsk);
+
+	return 0;
+}
+#else /* CONFIG_TIME_NS */
+static inline int vdso_check_timens(struct vm_area_struct *vma, bool *in_timens)
+{
+	*in_timens = false;
+	return 0;
+}
+static inline struct timens_offsets *current_timens_offsets(void)
+{
+	return NULL;
+}
+#endif /* CONFIG_TIME_NS */
+
 static vm_fault_t vdso_fault(const struct vm_special_mapping *sm,
 		      struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	const struct vdso_image *image = vma->vm_mm->context.vdso_image;
+	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
+	bool in_timens;
+	int err;
 
 	if (!image || (vmf->pgoff << PAGE_SHIFT) >= image->size)
 		return VM_FAULT_SIGBUS;
 
-	vmf->page = virt_to_page(image->text + (vmf->pgoff << PAGE_SHIFT));
+	err = vdso_check_timens(vma, &in_timens);
+	if (err)
+		return VM_FAULT_SIGBUS;
+
+	WARN_ON_ONCE(in_timens && !image->text_timens);
+
+	if (in_timens && image->text_timens)
+		vmf->page = vmalloc_to_page(image->text_timens + offset);
+	else
+		vmf->page = virt_to_page(image->text + offset);
+
 	get_page(vmf->page);
 	return 0;
 }
@@ -138,13 +240,14 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 			return vmf_insert_pfn(vma, vmf->address,
 					vmalloc_to_pfn(tsc_pg));
 	} else if (sym_offset == image->sym_timens_page) {
-		struct time_namespace *ns = current->nsproxy->time_ns;
+		/* We can fault only in current context for VM_PFNMAP mapping */
+		struct timens_offsets *offsets = current_timens_offsets();
 		unsigned long pfn;
 
-		if (!ns->offsets)
+		if (!offsets)
 			pfn = page_to_pfn(ZERO_PAGE(0));
 		else
-			pfn = page_to_pfn(virt_to_page(ns->offsets));
+			pfn = page_to_pfn(virt_to_page(offsets));
 
 		return vmf_insert_pfn(vma, vmf->address, pfn);
 	}
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 9d420c545607..03f468c63a24 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -12,6 +12,7 @@
 
 struct vdso_image {
 	void *text;
+	void *text_timens;
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
 	unsigned long alt, alt_len;
@@ -30,18 +31,18 @@ struct vdso_image {
 };
 
 #ifdef CONFIG_X86_64
-extern const struct vdso_image vdso_image_64;
+extern struct vdso_image vdso_image_64;
 #endif
 
 #ifdef CONFIG_X86_X32
-extern const struct vdso_image vdso_image_x32;
+extern struct vdso_image vdso_image_x32;
 #endif
 
 #if defined CONFIG_X86_32 || defined CONFIG_COMPAT
-extern const struct vdso_image vdso_image_32;
+extern struct vdso_image vdso_image_32;
 #endif
 
-extern void __init init_vdso_image(const struct vdso_image *image);
+extern void __init init_vdso_image(struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
 
-- 
2.22.0



* [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone()
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (15 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 16/28] x86/vdso: Allocate timens vdso Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 14:05   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 18/28] vdso: introduce timens_static_branch Dmitry Safonov
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

As discussed on the timens RFC, adding a new conditional branch
`if (inside_time_ns)` to the VDSO for all processes is undesirable.
It adds a penalty for everybody, as the branch predictor may mispredict
the jump. Instruction cache lines are also wasted on cmp/jmp.

Those effects of introducing time namespace are very much unwanted,
bearing in mind how much work has been spent on micro-optimising the
vdso code.

To address those problems, there are two versions of the VDSO's .so:
one for host tasks (without any penalty) and one for processes inside a
time namespace, with clk_to_ns() that applies the namespace offsets to
the host's time.

Whenever a user does setns()/unshare() or clone() with CLONE_TIMENS,
change the VDSO image in mm and zap the existing VVAR/VDSO page tables.
They will be re-faulted with the corresponding image and VVAR offsets.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vma.c   | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/vdso.h |  1 +
 kernel/time_namespace.c     | 11 +++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index cc06c6b70167..3ed5bf4932af 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -25,6 +25,7 @@
 #include <asm/cpufeature.h>
 #include <asm/mshyperv.h>
 #include <asm/page.h>
+#include <asm/tlb.h>
 
 #if defined(CONFIG_X86_64)
 unsigned int __read_mostly vdso64_enabled = 1;
@@ -266,6 +267,33 @@ static const struct vm_special_mapping vvar_mapping = {
 	.mremap = vvar_mremap,
 };
 
+#ifdef CONFIG_TIME_NS
+int vdso_join_timens(struct task_struct *task)
+{
+	struct mm_struct *mm = task->mm;
+	struct vm_area_struct *vma;
+
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		unsigned long size = vma->vm_end - vma->vm_start;
+
+		if (vma_is_special_mapping(vma, &vvar_mapping) ||
+		    vma_is_special_mapping(vma, &vdso_mapping))
+			zap_page_range(vma, vma->vm_start, size);
+	}
+
+	up_write(&mm->mmap_sem);
+	return 0;
+}
+#else /* CONFIG_TIME_NS */
+int vdso_join_timens(struct task_struct *task)
+{
+	return -ENXIO;
+}
+#endif
+
 /*
  * Add vdso and vvar mappings to current process.
  * @image          - blob to map
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index 03f468c63a24..ccf89dedd04f 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -45,6 +45,7 @@ extern struct vdso_image vdso_image_32;
 extern void __init init_vdso_image(struct vdso_image *image);
 
 extern int map_vdso_once(const struct vdso_image *image, unsigned long addr);
+extern int vdso_join_timens(struct task_struct *task);
 
 #endif /* __ASSEMBLER__ */
 
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index b3cffdf2635c..2a2cab14ac29 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -14,6 +14,7 @@
 #include <linux/proc_ns.h>
 #include <linux/sched/task.h>
 #include <linux/mm.h>
+#include <asm/vdso.h>
 
 ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)
 {
@@ -182,11 +183,16 @@ static void timens_put(struct ns_common *ns)
 static int timens_install(struct nsproxy *nsproxy, struct ns_common *new)
 {
 	struct time_namespace *ns = to_time_ns(new);
+	int ret;
 
 	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
 	    !ns_capable(current_user_ns(), CAP_SYS_ADMIN))
 		return -EPERM;
 
+	ret = vdso_join_timens(current);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
@@ -201,10 +207,15 @@ int timens_on_fork(struct nsproxy *nsproxy, struct task_struct *tsk)
 {
 	struct ns_common *nsc = &nsproxy->time_ns_for_children->ns;
 	struct time_namespace *ns = to_time_ns(nsc);
+	int ret;
 
 	if (nsproxy->time_ns == nsproxy->time_ns_for_children)
 		return 0;
 
+	ret = vdso_join_timens(tsk);
+	if (ret)
+		return ret;
+
 	get_time_ns(ns);
 	put_time_ns(nsproxy->time_ns);
 	nsproxy->time_ns = ns;
-- 
2.22.0



* [PATCHv4 18/28] vdso: introduce timens_static_branch
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (16 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone() Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 19/28] timens: Add align for timens_offsets Dmitry Safonov
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

As discussed on the timens RFC, adding a new conditional branch
`if (inside_time_ns)` to the VDSO for all processes is undesirable.

To address that, there are two versions of the VDSO's .so:
one for host tasks (without any penalty) and one for processes inside a
time namespace, with clk_to_ns() that applies the namespace offsets to
the host's time.

This patch introduces timens_static_branch(), which is similar to
static_branch_unlikely().

The timens code in vdso looks like this:

       if (timens_static_branch()) {
               clk_to_ns(clk, ts);
       }

The version of vdso which is compiled from sources will never execute
clk_to_ns(). And then we can patch the 'no-op' in the straight-line
codepath with a 'jump' instruction to the out-of-line true branch and
get the timens version of the vdso library.
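
As a hedged illustration of that patching step, the userspace sketch
below rewrites a 5-byte no-op into "JMP rel32". It is not the kernel
implementation: patch_nop_to_jmp() and jmp_rel32() are hypothetical
helpers, and offsets are taken from the start of the buffer rather than
from the jump table entry as the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define NOP_SIZE 5	/* size of the no-op being replaced (assumed 5 bytes) */

/*
 * Overwrite the 5-byte no-op at text[code_off] with "JMP rel32" to
 * text[target_off]. Offsets are relative to the start of 'text' here,
 * a simplification of the entry-relative offsets used in the patch.
 */
static void patch_nop_to_jmp(uint8_t *text, uint16_t code_off,
			     uint16_t target_off)
{
	/* rel32 is relative to the address of the next instruction. */
	int32_t rel = (int32_t)target_off - ((int32_t)code_off + NOP_SIZE);
	uint32_t bits = (uint32_t)rel;

	assert(text);

	text[code_off] = 0xe9;			/* JMP rel32 opcode */
	text[code_off + 1] = bits & 0xff;	/* little-endian, 4 bytes only */
	text[code_off + 2] = (bits >> 8) & 0xff;
	text[code_off + 3] = (bits >> 16) & 0xff;
	text[code_off + 4] = (bits >> 24) & 0xff;
}

/* Decode the rel32 operand of the JMP at text[code_off]. */
static int32_t jmp_rel32(const uint8_t *text, uint16_t code_off)
{
	uint32_t bits = (uint32_t)text[code_off + 1] |
			((uint32_t)text[code_off + 2] << 8) |
			((uint32_t)text[code_off + 3] << 16) |
			((uint32_t)text[code_off + 4] << 24);

	return (int32_t)bits;
}
```

Writing exactly four operand bytes matters: the instruction after the
patched site must stay untouched.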

While cooking the patch, an alternative approach was considered:
to omit no-ops and memcpy() the following asm ret sequence in place of
a function call: https://github.com/0x7f454c46/linux/commit/4cc0180f6d65
Bearing in mind possible issues with different toolchains, the usual
static_branch() approach was chosen.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/vclock_gettime.c  |  9 +++++--
 arch/x86/entry/vdso/vdso-layout.lds.S |  1 +
 arch/x86/entry/vdso/vdso2c.h          | 11 +++++++-
 arch/x86/entry/vdso/vma.c             | 37 ++++++++++++++++++++++-----
 arch/x86/include/asm/jump_label.h     | 14 ++++++++++
 arch/x86/include/asm/vdso.h           |  1 +
 include/linux/jump_label.h            |  5 ++++
 7 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/vdso/vclock_gettime.c b/arch/x86/entry/vdso/vclock_gettime.c
index e2d93628c0dd..21b7153cf2b0 100644
--- a/arch/x86/entry/vdso/vclock_gettime.c
+++ b/arch/x86/entry/vdso/vclock_gettime.c
@@ -18,6 +18,7 @@
 #include <asm/msr.h>
 #include <asm/pvclock.h>
 #include <asm/mshyperv.h>
+#include <asm/jump_label.h>
 #include <linux/math64.h>
 #include <linux/time.h>
 #include <linux/kernel.h>
@@ -174,8 +175,10 @@ notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts
 		ts->tv_sec--;
 	}
 }
+#define _timens_static_branch_unlikely timens_static_branch_unlikely
 #else
 notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts) {}
+notrace static __always_inline bool _timens_static_branch_unlikely(void) { return false; }
 #endif
 
 notrace static int do_hres(clockid_t clk, struct timespec *ts)
@@ -204,7 +207,8 @@ notrace static int do_hres(clockid_t clk, struct timespec *ts)
 	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
-	clk_to_ns(clk, ts);
+	if (_timens_static_branch_unlikely())
+		clk_to_ns(clk, ts);
 
 	return 0;
 }
@@ -220,7 +224,8 @@ notrace static void do_coarse(clockid_t clk, struct timespec *ts)
 		ts->tv_nsec = base->nsec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
 
-	clk_to_ns(clk, ts);
+	if (_timens_static_branch_unlikely())
+		clk_to_ns(clk, ts);
 }
 
 notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
diff --git a/arch/x86/entry/vdso/vdso-layout.lds.S b/arch/x86/entry/vdso/vdso-layout.lds.S
index ba216527e59f..69dbe4821aa5 100644
--- a/arch/x86/entry/vdso/vdso-layout.lds.S
+++ b/arch/x86/entry/vdso/vdso-layout.lds.S
@@ -45,6 +45,7 @@ SECTIONS
 	.gnu.version	: { *(.gnu.version) }
 	.gnu.version_d	: { *(.gnu.version_d) }
 	.gnu.version_r	: { *(.gnu.version_r) }
+	__jump_table	: { *(__jump_table) }	:text
 
 	.dynamic	: { *(.dynamic) }		:text	:dynamic
 
diff --git a/arch/x86/entry/vdso/vdso2c.h b/arch/x86/entry/vdso/vdso2c.h
index 885b988aea19..392031258315 100644
--- a/arch/x86/entry/vdso/vdso2c.h
+++ b/arch/x86/entry/vdso/vdso2c.h
@@ -16,7 +16,7 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 	unsigned int i, syms_nr;
 	unsigned long j;
 	ELF(Shdr) *symtab_hdr = NULL, *strtab_hdr, *secstrings_hdr,
-		*alt_sec = NULL;
+		*alt_sec = NULL, *jump_table_sec = NULL;
 	ELF(Dyn) *dyn = 0, *dyn_end = 0;
 	const char *secstrings;
 	INT_BITS syms[NSYMS] = {};
@@ -78,6 +78,9 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		if (!strcmp(secstrings + GET_LE(&sh->sh_name),
 			    ".altinstructions"))
 			alt_sec = sh;
+		if (!strcmp(secstrings + GET_LE(&sh->sh_name),
+			    "__jump_table"))
+			jump_table_sec  = sh;
 	}
 
 	if (!symtab_hdr)
@@ -166,6 +169,12 @@ static void BITSFUNC(go)(void *raw_addr, size_t raw_len,
 		fprintf(outfile, "\t.alt_len = %lu,\n",
 			(unsigned long)GET_LE(&alt_sec->sh_size));
 	}
+	if (jump_table_sec) {
+		fprintf(outfile, "\t.jump_table = %lu,\n",
+			(unsigned long)GET_LE(&jump_table_sec->sh_offset));
+		fprintf(outfile, "\t.jump_table_len = %lu,\n",
+			(unsigned long)GET_LE(&jump_table_sec->sh_size));
+	}
 	for (i = 0; i < NSYMS; i++) {
 		if (required_syms[i].export && syms[i])
 			fprintf(outfile, "\t.sym_%s = %" PRIi64 ",\n",
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 3ed5bf4932af..12cf73aa0aec 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -15,6 +15,7 @@
 #include <linux/cpu.h>
 #include <linux/ptrace.h>
 #include <linux/time_namespace.h>
+#include <linux/jump_label.h>
 #include <asm/pvclock.h>
 #include <asm/vgtod.h>
 #include <asm/proto.h>
@@ -31,20 +32,44 @@
 unsigned int __read_mostly vdso64_enabled = 1;
 #endif
 
-void __init init_vdso_image(struct vdso_image *image)
+#ifdef CONFIG_TIME_NS
+static __init void apply_jump_tables(struct vdso_jump_entry *ent, unsigned long nr)
 {
-	BUG_ON(image->size % PAGE_SIZE != 0);
+	while (nr--) {
+		void *code_addr	= (void *)ent + ent->code;
+		long target_addr = (long) ent->target - (ent->code + JUMP_LABEL_NOP_SIZE);
+		((char *)code_addr)[0] = 0xe9; /* JMP rel32 */
+		*((s32 *)(code_addr + 1)) = (s32)target_addr; /* rel32: 4 bytes */
+
+		ent++;
+	}
+}
+
+static __init void init_timens(struct vdso_image *image)
+{
+	struct vdso_jump_entry *entries;
 
-	apply_alternatives((struct alt_instr *)(image->text + image->alt),
-			   (struct alt_instr *)(image->text + image->alt +
-						image->alt_len));
-#ifdef CONFIG_TIME_NS
 	image->text_timens = vmalloc_32(image->size);
 	if (WARN_ON(image->text_timens == NULL))
 		return;
 
 	memcpy(image->text_timens, image->text, image->size);
+
+	entries = image->text_timens + image->jump_table;
+	apply_jump_tables(entries, image->jump_table_len / sizeof(struct vdso_jump_entry));
+}
+#else
+static inline void init_timens(struct vdso_image *image) {}
 #endif
+
+void __init init_vdso_image(struct vdso_image *image)
+{
+	BUG_ON(image->size % PAGE_SIZE != 0);
+
+	apply_alternatives((struct alt_instr *)(image->text + image->alt),
+			   (struct alt_instr *)(image->text + image->alt +
+						image->alt_len));
+	init_timens(image);
 }
 
 struct linux_binprm;
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 65191ce8e1cf..1784aa49cc82 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -51,6 +51,20 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 	return true;
 }
 
+static __always_inline bool timens_static_branch_unlikely(void)
+{
+	asm_volatile_goto("1:\n\t"
+		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+		 ".pushsection __jump_table,  \"aw\"\n\t"
+		 "2: .word 1b - 2b, %l[l_yes] - 2b\n\t"
+		 ".popsection\n\t"
+		 : :  :  : l_yes);
+
+	return false;
+l_yes:
+	return true;
+}
+
 #else	/* __ASSEMBLY__ */
 
 .macro STATIC_JUMP_IF_TRUE target, key, def
diff --git a/arch/x86/include/asm/vdso.h b/arch/x86/include/asm/vdso.h
index ccf89dedd04f..5e83bd3cda22 100644
--- a/arch/x86/include/asm/vdso.h
+++ b/arch/x86/include/asm/vdso.h
@@ -16,6 +16,7 @@ struct vdso_image {
 	unsigned long size;   /* Always a multiple of PAGE_SIZE */
 
 	unsigned long alt, alt_len;
+	unsigned long jump_table, jump_table_len;
 
 	long sym_vvar_start;  /* Negative offset to the vvar area */
 
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 3e113a1fa0f1..69854a05d2f2 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -125,6 +125,11 @@ struct jump_entry {
 	long key;	// key may be far away from the core kernel under KASLR
 };
 
+struct vdso_jump_entry {
+	u16 code;
+	u16 target;
+};
+
 static inline unsigned long jump_entry_code(const struct jump_entry *entry)
 {
 	return (unsigned long)&entry->code + entry->code;
-- 
2.22.0



* [PATCHv4 19/28] timens: Add align for timens_offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (17 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 18/28] vdso: introduce timens_static_branch Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 20/28] timens/fs/proc: Introduce /proc/pid/timens_offsets Dmitry Safonov
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86

Align offsets so that time namespace will work for ia32 applications on
x86_64 host.
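
The layout problem can be reproduced in userspace. The sketch below is
an assumption-laden model, not kernel code: struct ts_ia32 imitates the
ia32 layout of timespec64 (8-byte tv_sec with 4-byte alignment, 4-byte
tv_nsec) using GCC/Clang attributes, and all struct names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Native x86_64 layout: 8-byte tv_sec, 8-byte tv_nsec, 16 bytes total. */
struct ts_x86_64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Model of the ia32 layout: 8-byte tv_sec with align(4), 4-byte tv_nsec. */
struct ts_ia32 {
	int64_t tv_sec;
	int32_t tv_nsec;
} __attribute__((packed, aligned(4)));

/* Without re-alignment the second member lands at offset 12 on ia32. */
struct offsets_ia32_plain {
	struct ts_ia32 monotonic;
	struct ts_ia32 boottime;
};

/* With aligned(8) on each member both ABIs agree on offsets 0 and 16. */
struct offsets_ia32_aligned {
	struct ts_ia32 monotonic __attribute__((aligned(8)));
	struct ts_ia32 boottime __attribute__((aligned(8)));
};

struct offsets_x86_64 {
	struct ts_x86_64 monotonic __attribute__((aligned(8)));
	struct ts_x86_64 boottime __attribute__((aligned(8)));
};

static_assert(offsetof(struct offsets_ia32_aligned, boottime) ==
	      offsetof(struct offsets_x86_64, boottime),
	      "both ABIs must find boottime at the same offset");
```

Since one page is shared by both VDSO flavours, the member offsets have
to match regardless of which ABI reads it.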

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/timens_offsets.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/timens_offsets.h b/include/linux/timens_offsets.h
index e93aabaa5e45..05da1b0563ce 100644
--- a/include/linux/timens_offsets.h
+++ b/include/linux/timens_offsets.h
@@ -2,9 +2,17 @@
 #ifndef _LINUX_TIME_OFFSETS_H
 #define _LINUX_TIME_OFFSETS_H
 
+/*
+ * Time offsets need to be aligned as they are placed on the VVAR page,
+ * which is used by both x86_64 and ia32 VDSO code.
+ * On ia32 offset::tv_sec (u64) has align(4), so re-align the offsets
+ * to the same positions as the 64-bit offsets.
+ * On 64-bit big-endian systems the VDSO should convert timespec64
+ * to timespec because of the padding between the fields.
+ */
 struct timens_offsets {
-	struct timespec64 monotonic;
-	struct timespec64 boottime;
+	struct timespec64 monotonic __aligned(8);
+	struct timespec64 boottime __aligned(8);
 };
 
 #endif
-- 
2.22.0



* [PATCHv4 20/28] timens/fs/proc: Introduce /proc/pid/timens_offsets
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (18 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 19/28] timens: Add align for timens_offsets Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 21/28] selftest/timens: Add Time Namespace test for supported clocks Dmitry Safonov
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Add an API to set time namespace offsets for child processes, e.g.:
echo "clockid off_sec off_nsec" > /proc/self/timens_offsets
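
A minimal userspace sketch of parsing one such line follows. It only
approximates the parsing in timens_offsets_write(): struct offset_line
and parse_offset_line() are hypothetical names, and the rejection of a
negative nanosecond value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical userspace mirror of one parsed line. */
struct offset_line {
	int clockid;
	long long sec;
	long nsec;
};

/* Parse "clockid off_sec off_nsec"; returns 0 on success, -1 on bad input. */
static int parse_offset_line(const char *line, struct offset_line *out)
{
	assert(line && out);

	if (sscanf(line, "%d %lld %ld", &out->clockid, &out->sec, &out->nsec) != 3)
		return -1;
	/* The nanosecond part must be a normalised fraction of a second. */
	if (out->nsec < 0 || out->nsec >= NSEC_PER_SEC)
		return -1;
	return 0;
}
```

On Linux, clock id 1 is CLOCK_MONOTONIC and 7 is CLOCK_BOOTTIME, so a
line such as "1 864000 0" would shift the monotonic clock by ten days.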

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 fs/proc/base.c                 |  95 ++++++++++++++++++++++++++++++
 include/linux/time_namespace.h |  10 ++++
 kernel/time_namespace.c        | 104 +++++++++++++++++++++++++++++++++
 3 files changed, 209 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9c8ca6cd3ce4..6a96b0543f69 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -94,6 +94,7 @@
 #include <linux/sched/debug.h>
 #include <linux/sched/stat.h>
 #include <linux/posix-timers.h>
+#include <linux/time_namespace.h>
 #include <trace/events/oom.h>
 #include "internal.h"
 #include "fd.h"
@@ -1516,6 +1517,97 @@ static const struct file_operations proc_pid_sched_autogroup_operations = {
 
 #endif /* CONFIG_SCHED_AUTOGROUP */
 
+#ifdef CONFIG_TIME_NS
+static int timens_offsets_show(struct seq_file *m, void *v)
+{
+	struct task_struct *p;
+
+	p = get_proc_task(file_inode(m->file));
+	if (!p)
+		return -ESRCH;
+	proc_timens_show_offsets(p, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static ssize_t
+timens_offsets_write(struct file *file, const char __user *buf,
+	    size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct proc_timens_offset offsets[2];
+	char *kbuf = NULL, *pos, *next_line;
+	struct task_struct *p;
+	int ret, noffsets;
+
+	/* Only allow < page size writes at the beginning of the file */
+	if ((*ppos != 0) || (count >= PAGE_SIZE))
+		return -EINVAL;
+
+	/* Slurp in the user data */
+	kbuf = memdup_user_nul(buf, count);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	/* Parse the user data */
+	ret = -EINVAL;
+	noffsets = 0;
+	for (pos = kbuf; pos; pos = next_line) {
+		struct proc_timens_offset *off = &offsets[noffsets];
+		int err;
+
+		/* Find the end of line and ensure we don't look past it */
+		next_line = strchr(pos, '\n');
+		if (next_line) {
+			*next_line = '\0';
+			next_line++;
+			if (*next_line == '\0')
+				next_line = NULL;
+		}
+
+		err = sscanf(pos, "%u %lld %lu", &off->clockid,
+				&off->val.tv_sec, &off->val.tv_nsec);
+		if (err != 3 || off->val.tv_nsec >= NSEC_PER_SEC)
+			goto out;
+		noffsets++;
+		if (noffsets == ARRAY_SIZE(offsets)) {
+			if (next_line)
+				count = next_line - kbuf;
+			break;
+		}
+	}
+
+	ret = -ESRCH;
+	p = get_proc_task(inode);
+	if (!p)
+		goto out;
+	ret = proc_timens_set_offset(file, p, offsets, noffsets);
+	put_task_struct(p);
+	if (ret)
+		goto out;
+
+	ret = count;
+out:
+	kfree(kbuf);
+	return ret;
+}
+
+static int timens_offsets_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, timens_offsets_show, inode);
+}
+
+static const struct file_operations proc_timens_offsets_operations = {
+	.open		= timens_offsets_open,
+	.read		= seq_read,
+	.write		= timens_offsets_write,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+#endif /* CONFIG_TIME_NS */
+
 static ssize_t comm_write(struct file *file, const char __user *buf,
 				size_t count, loff_t *offset)
 {
@@ -2982,6 +3074,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup",  S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
+#endif
+#ifdef CONFIG_TIME_NS
+	REG("timens_offsets",  S_IRUGO|S_IWUSR, proc_timens_offsets_operations),
 #endif
 	REG("comm",      S_IRUGO|S_IWUSR, proc_pid_set_comm_operations),
 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index d32b55fad953..8cd16dfea42d 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -40,6 +40,16 @@ static inline void put_time_ns(struct time_namespace *ns)
 	kref_put(&ns->kref, free_time_ns);
 }
 
+extern void proc_timens_show_offsets(struct task_struct *p, struct seq_file *m);
+
+struct proc_timens_offset {
+	int clockid;
+	struct timespec64 val;
+};
+
+extern int proc_timens_set_offset(struct file *file, struct task_struct *p,
+				struct proc_timens_offset *offsets, int n);
+
 static inline void timens_add_monotonic(struct timespec64 *ts)
 {
         struct timens_offsets *ns_offsets = current->nsproxy->time_ns->offsets;
diff --git a/kernel/time_namespace.c b/kernel/time_namespace.c
index 2a2cab14ac29..a32adeabf9f0 100644
--- a/kernel/time_namespace.c
+++ b/kernel/time_namespace.c
@@ -13,6 +13,7 @@
 #include <linux/user_namespace.h>
 #include <linux/proc_ns.h>
 #include <linux/sched/task.h>
+#include <linux/seq_file.h>
 #include <linux/mm.h>
 #include <asm/vdso.h>
 
@@ -229,6 +230,109 @@ static struct user_namespace *timens_owner(struct ns_common *ns)
 	return to_time_ns(ns)->user_ns;
 }
 
+static void show_offset(struct seq_file *m, int clockid, struct timespec64 *ts)
+{
+	seq_printf(m, "%d %lld %ld\n", clockid, ts->tv_sec, ts->tv_nsec);
+}
+
+void proc_timens_show_offsets(struct task_struct *p, struct seq_file *m)
+{
+	struct ns_common *ns;
+	struct time_namespace *time_ns;
+	struct timens_offsets *ns_offsets;
+
+	ns = timens_for_children_get(p);
+	if (!ns)
+		return;
+	time_ns = to_time_ns(ns);
+
+	if (!time_ns->offsets) {
+		put_time_ns(time_ns);
+		return;
+	}
+	ns_offsets = time_ns->offsets;
+
+	show_offset(m, CLOCK_MONOTONIC, &ns_offsets->monotonic);
+	show_offset(m, CLOCK_BOOTTIME, &ns_offsets->boottime);
+	put_time_ns(time_ns);
+}
+
+int proc_timens_set_offset(struct file *file, struct task_struct *p,
+			   struct proc_timens_offset *offsets, int noffsets)
+{
+	struct ns_common *ns;
+	struct time_namespace *time_ns;
+	struct timens_offsets *ns_offsets;
+	struct timespec64 *offset;
+	struct timespec64 tp;
+	int i, err;
+
+	ns = timens_for_children_get(p);
+	if (!ns)
+		return -ESRCH;
+	time_ns = to_time_ns(ns);
+
+	if (!time_ns->offsets || time_ns->initialized ||
+	    !file_ns_capable(file, time_ns->user_ns, CAP_SYS_TIME)) {
+		put_time_ns(time_ns);
+		return -EPERM;
+	}
+	ns_offsets = time_ns->offsets;
+
+	for (i = 0; i < noffsets; i++) {
+		struct proc_timens_offset *off = &offsets[i];
+
+		switch (off->clockid) {
+		case CLOCK_MONOTONIC:
+			ktime_get_ts64(&tp);
+			break;
+		case CLOCK_BOOTTIME:
+			ktime_get_boottime_ts64(&tp);
+			break;
+		default:
+			err = -EINVAL;
+			goto out;
+		}
+
+		err = -ERANGE;
+
+		if (off->val.tv_sec > KTIME_SEC_MAX || off->val.tv_sec < -KTIME_SEC_MAX)
+			goto out;
+
+		tp = timespec64_add(tp, off->val);
+		/*
+		 * KTIME_SEC_MAX is divided by 2 to be sure that KTIME_MAX is
+		 * still unreachable.
+		 */
+		if (tp.tv_sec < 0 || tp.tv_sec > KTIME_SEC_MAX / 2)
+			goto out;
+	}
+
+	err = 0;
+	/* don't report errors after this line */
+	for (i = 0; i < noffsets; i++) {
+		struct proc_timens_offset *off = &offsets[i];
+
+		switch (off->clockid) {
+		case CLOCK_MONOTONIC:
+			offset = &ns_offsets->monotonic;
+			break;
+		case CLOCK_BOOTTIME:
+			offset = &ns_offsets->boottime;
+			break;
+		default:
+			goto out;
+		}
+
+		*offset = off->val;
+	}
+
+out:
+	put_time_ns(time_ns);
+
+	return err;
+}
+
 const struct proc_ns_operations timens_operations = {
 	.name		= "time",
 	.type		= CLONE_NEWTIME,
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv4 21/28] selftest/timens: Add Time Namespace test for supported clocks
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (19 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 20/28] timens/fs/proc: Introduce /proc/pid/timens_offsets Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 22/28] selftest/timens: Add a test for timerfd Dmitry Safonov
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

A test to check that all supported clocks work on the host and inside
a new time namespace. Use both ways to get time: through the VDSO and
by entering the kernel with an explicit syscall.

Introduce a new timens directory in the selftests framework for
this and the following timens tests.
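
For illustration only (not part of the patch): the two ways of reading a
clock described above could be sketched roughly as follows. read_clock()
and ts_diff_ns() are hypothetical helper names, and the sketch assumes a
64-bit Linux target where SYS_clock_gettime takes a struct timespec.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Read a clock either through the libc wrapper (normally served by the
 * VDSO without entering the kernel) or through an explicit system call.
 * Returns 0 on success and -1 on failure, with errno set.
 */
static int read_clock(clockid_t id, struct timespec *ts, bool raw_syscall)
{
	assert(ts != NULL);

	if (raw_syscall)
		return syscall(SYS_clock_gettime, id, ts) ? -1 : 0;

	return clock_gettime(id, ts) ? -1 : 0;
}

/* Difference b - a in nanoseconds. */
static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
	return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}
```

Both paths should report the same clock, so two back-to-back reads of
CLOCK_MONOTONIC, one per path, are expected to differ only by the time
spent between the calls.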

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/Makefile          |   1 +
 tools/testing/selftests/timens/.gitignore |   1 +
 tools/testing/selftests/timens/Makefile   |   5 +
 tools/testing/selftests/timens/config     |   1 +
 tools/testing/selftests/timens/log.h      |  26 +++
 tools/testing/selftests/timens/timens.c   | 188 ++++++++++++++++++++++
 tools/testing/selftests/timens/timens.h   |  63 ++++++++
 7 files changed, 285 insertions(+)
 create mode 100644 tools/testing/selftests/timens/.gitignore
 create mode 100644 tools/testing/selftests/timens/Makefile
 create mode 100644 tools/testing/selftests/timens/config
 create mode 100644 tools/testing/selftests/timens/log.h
 create mode 100644 tools/testing/selftests/timens/timens.c
 create mode 100644 tools/testing/selftests/timens/timens.h

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9781ca79794a..f71a59632192 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -47,6 +47,7 @@ TARGETS += splice
 TARGETS += static_keys
 TARGETS += sync
 TARGETS += sysctl
+TARGETS += timens
 ifneq (1, $(quicktest))
 TARGETS += timers
 endif
diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
new file mode 100644
index 000000000000..27a693229ce1
--- /dev/null
+++ b/tools/testing/selftests/timens/.gitignore
@@ -0,0 +1 @@
+timens
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
new file mode 100644
index 000000000000..b877efb78974
--- /dev/null
+++ b/tools/testing/selftests/timens/Makefile
@@ -0,0 +1,5 @@
+TEST_GEN_PROGS := timens
+
+CFLAGS := -Wall -Werror
+
+include ../lib.mk
diff --git a/tools/testing/selftests/timens/config b/tools/testing/selftests/timens/config
new file mode 100644
index 000000000000..4480620f6f49
--- /dev/null
+++ b/tools/testing/selftests/timens/config
@@ -0,0 +1 @@
+CONFIG_TIME_NS=y
diff --git a/tools/testing/selftests/timens/log.h b/tools/testing/selftests/timens/log.h
new file mode 100644
index 000000000000..db64df2a8483
--- /dev/null
+++ b/tools/testing/selftests/timens/log.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __SELFTEST_TIMENS_LOG_H__
+#define __SELFTEST_TIMENS_LOG_H__
+
+#define pr_msg(fmt, lvl, ...)						\
+	ksft_print_msg("[%s] (%s:%d)\t" fmt "\n",			\
+			lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...)	func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)						\
+	({								\
+		ksft_test_result_error(fmt "\n", ##__VA_ARGS__);		\
+		-1;							\
+	})
+
+#define pr_fail(fmt, ...)					\
+	({							\
+		ksft_test_result_fail(fmt, ##__VA_ARGS__);	\
+		-1;						\
+	})
+
+#define pr_perror(fmt, ...)	pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+#endif
diff --git a/tools/testing/selftests/timens/timens.c b/tools/testing/selftests/timens/timens.c
new file mode 100644
index 000000000000..407e7a97882f
--- /dev/null
+++ b/tools/testing/selftests/timens/timens.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+#include <time.h>
+#include <string.h>
+
+#include "log.h"
+#include "timens.h"
+
+/*
+ * The test is not expected to run for a whole day, so add 10 days to the
+ * child's time and check that the parent's time stays within the same day.
+ */
+#define DAY_IN_SEC			(60*60*24)
+#define TEN_DAYS_IN_SEC			(10*DAY_IN_SEC)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+#define CLOCK_TYPES							\
+	ct(CLOCK_BOOTTIME, -1),						\
+	ct(CLOCK_BOOTTIME_ALARM, 1),					\
+	ct(CLOCK_MONOTONIC, -1),					\
+	ct(CLOCK_MONOTONIC_COARSE, 1),					\
+	ct(CLOCK_MONOTONIC_RAW, 1),					\
+
+
+struct test_clock {
+	clockid_t id;
+	char *name;
+	/*
+	 * off_id is -1 if the clock has its own offset; otherwise it is the
+	 * index of the clocks[] entry that holds the offset for this clock.
+	 */
+	int off_id;
+	time_t offset;
+};
+
+#define ct(clock, off_id)	{ clock, #clock, off_id }
+static struct test_clock clocks[] = {
+	CLOCK_TYPES
+};
+#undef ct
+
+static int child_ns, parent_ns = -1;
+
+static int switch_ns(int fd)
+{
+	if (setns(fd, CLONE_NEWTIME)) {
+		pr_perror("setns()");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int init_namespaces(void)
+{
+	char path[] = "/proc/self/ns/time_for_children";
+	struct stat st1, st2;
+
+	if (parent_ns == -1) {
+		parent_ns = open(path, O_RDONLY);
+		if (parent_ns < 0)
+			return pr_perror("Unable to open %s", path);
+	}
+
+	if (fstat(parent_ns, &st1))
+		return pr_perror("Unable to stat the parent timens");
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("Can't unshare() timens");
+
+	child_ns = open(path, O_RDONLY);
+	if (child_ns < 0)
+		return pr_perror("Unable to open %s", path);
+
+	if (fstat(child_ns, &st2))
+		return pr_perror("Unable to stat the timens");
+
+	if (st1.st_ino == st2.st_ino)
+		return pr_err("The same child_ns after CLONE_NEWTIME");
+
+	return 0;
+}
+
+static int test_gettime(clockid_t clock_index, bool raw_syscall, time_t offset)
+{
+	struct timespec child_ts_new, parent_ts_old, cur_ts;
+	char *entry = raw_syscall ? "syscall" : "vdso";
+	double precision = 0.0;
+
+	switch (clocks[clock_index].id) {
+	case CLOCK_MONOTONIC_COARSE:
+	case CLOCK_MONOTONIC_RAW:
+		precision = -2.0;
+		break;
+	}
+
+	if (switch_ns(parent_ns))
+		return pr_err("switch_ns(%d)", parent_ns);
+
+	if (_gettime(clocks[clock_index].id, &parent_ts_old, raw_syscall))
+		return -1;
+
+	child_ts_new.tv_nsec = parent_ts_old.tv_nsec;
+	child_ts_new.tv_sec = parent_ts_old.tv_sec + offset;
+
+	if (switch_ns(child_ns))
+		return pr_err("switch_ns(%d)", child_ns);
+
+	if (_gettime(clocks[clock_index].id, &cur_ts, raw_syscall))
+		return -1;
+
+	if (difftime(cur_ts.tv_sec, child_ts_new.tv_sec) < precision) {
+		ksft_test_result_fail(
+			"Child's %s (%s) time has not changed: %lu -> %lu [%lu]\n",
+			clocks[clock_index].name, entry, parent_ts_old.tv_sec,
+			child_ts_new.tv_sec, cur_ts.tv_sec);
+		return -1;
+	}
+
+	if (switch_ns(parent_ns))
+		return pr_err("switch_ns(%d)", parent_ns);
+
+	if (_gettime(clocks[clock_index].id, &cur_ts, raw_syscall))
+		return -1;
+
+	if (difftime(cur_ts.tv_sec, parent_ts_old.tv_sec) > DAY_IN_SEC) {
+		ksft_test_result_fail(
+			"Parent's %s (%s) time has changed: %lu -> %lu [%lu]\n",
+			clocks[clock_index].name, entry, parent_ts_old.tv_sec,
+			child_ts_new.tv_sec, cur_ts.tv_sec);
+		/* Let's play nice and put it closer to original */
+		clock_settime(clocks[clock_index].id, &cur_ts);
+		return -1;
+	}
+
+	ksft_test_result_pass("Passed for %s (%s)\n",
+				clocks[clock_index].name, entry);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	unsigned int i;
+	time_t offset;
+	int ret = 0;
+
+	nscheck();
+
+
+	if (init_namespaces())
+		return 1;
+
+	/* Offsets have to be set before tasks enter the namespace. */
+	for (i = 0; i < ARRAY_SIZE(clocks); i++) {
+		if (clocks[i].off_id != -1)
+			continue;
+		offset = TEN_DAYS_IN_SEC + i * 1000;
+		clocks[i].offset = offset;
+		if (_settime(clocks[i].id, offset))
+			return 1;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(clocks); i++) {
+		if (clocks[i].off_id != -1)
+			offset = clocks[clocks[i].off_id].offset;
+		else
+			offset = clocks[i].offset;
+		ret |= test_gettime(i, true, offset);
+		ret |= test_gettime(i, false, offset);
+	}
+
+	if (ret)
+		ksft_exit_fail();
+
+	ksft_exit_pass();
+	return !!ret;
+}
diff --git a/tools/testing/selftests/timens/timens.h b/tools/testing/selftests/timens/timens.h
new file mode 100644
index 000000000000..77c127384810
--- /dev/null
+++ b/tools/testing/selftests/timens/timens.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __TIMENS_H__
+#define __TIMENS_H__
+
+#include <fcntl.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#ifndef CLONE_NEWTIME
+# define CLONE_NEWTIME	0x00000080
+#endif
+
+static inline int _settime(clockid_t clk_id, time_t offset)
+{
+	int fd, len;
+	char buf[4096];
+
+	if (clk_id == CLOCK_MONOTONIC_COARSE || clk_id == CLOCK_MONOTONIC_RAW)
+		clk_id = CLOCK_MONOTONIC;
+
+	len = snprintf(buf, sizeof(buf), "%d %ld 0", clk_id, offset);
+
+	fd = open("/proc/self/timens_offsets", O_WRONLY);
+	if (fd < 0)
+		return pr_perror("/proc/self/timens_offsets");
+
+	if (write(fd, buf, len) != len)
+		return pr_perror("/proc/self/timens_offsets");
+
+	close(fd);
+
+	return 0;
+}
+
+static inline int _gettime(clockid_t clk_id, struct timespec *res, bool raw_syscall)
+{
+	int err;
+
+	if (!raw_syscall) {
+		if (clock_gettime(clk_id, res)) {
+			pr_perror("clock_gettime(%d)", (int)clk_id);
+			return -1;
+		}
+		return 0;
+	}
+
+	err = syscall(SYS_clock_gettime, clk_id, res);
+	if (err)
+		pr_perror("syscall(SYS_clock_gettime(%d))", (int)clk_id);
+
+	return err;
+}
+
+static inline void nscheck(void)
+{
+	if (access("/proc/self/ns/time", F_OK) < 0)
+		ksft_exit_skip("Time namespaces are not supported\n");
+}
+
+#endif
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv4 22/28] selftest/timens: Add a test for timerfd
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (20 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 21/28] selftest/timens: Add Time Namespace test for supported clocks Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 23/28] selftest/timens: Add a test for clock_nanosleep() Dmitry Safonov
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Check that timerfd_create() takes into account clock offsets.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore |   1 +
 tools/testing/selftests/timens/Makefile   |   2 +-
 tools/testing/selftests/timens/timerfd.c  | 127 ++++++++++++++++++++++
 3 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/timerfd.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index 27a693229ce1..b609f6ee9fb9 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1 +1,2 @@
 timens
+timerfd
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index b877efb78974..66b90cd28e5c 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens
+TEST_GEN_PROGS := timens timerfd
 
 CFLAGS := -Wall -Werror
 
diff --git a/tools/testing/selftests/timens/timerfd.c b/tools/testing/selftests/timens/timerfd.c
new file mode 100644
index 000000000000..c9816db4fe79
--- /dev/null
+++ b/tools/testing/selftests/timens/timerfd.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <sys/timerfd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdint.h>
+
+#include "log.h"
+#include "timens.h"
+
+static int tclock_gettime(clockid_t clockid, struct timespec *now)
+{
+	if (clockid == CLOCK_BOOTTIME_ALARM)
+		clockid = CLOCK_BOOTTIME;
+	return clock_gettime(clockid, now);
+}
+
+int run_test(int clockid, struct timespec now)
+{
+	struct itimerspec new_value;
+	long long elapsed;
+	int fd, i;
+
+	if (tclock_gettime(clockid, &now))
+		return pr_perror("clock_gettime");
+
+	for (i = 0; i < 2; i++) {
+		int flags = 0;
+
+		new_value.it_value.tv_sec = 3600;
+		new_value.it_value.tv_nsec = 0;
+		new_value.it_interval.tv_sec = 1;
+		new_value.it_interval.tv_nsec = 0;
+
+		if (i == 1) {
+			new_value.it_value.tv_sec += now.tv_sec;
+			new_value.it_value.tv_nsec += now.tv_nsec;
+		}
+
+		fd = timerfd_create(clockid, 0);
+		if (fd == -1)
+			return pr_perror("timerfd_create");
+
+		if (i == 1)
+			flags |= TFD_TIMER_ABSTIME;
+
+		if (timerfd_settime(fd, flags, &new_value, NULL))
+			return pr_perror("timerfd_settime");
+
+		if (timerfd_gettime(fd, &new_value))
+			return pr_perror("timerfd_gettime");
+
+		elapsed = new_value.it_value.tv_sec;
+		if (llabs(elapsed - 3600) > 60) {
+			ksft_test_result_fail("clockid: %d elapsed: %lld\n",
+					      clockid, elapsed);
+			return 1;
+		}
+
+		close(fd);
+	}
+
+	ksft_test_result_pass("clockid=%d\n", clockid);
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret, status, len, fd;
+	char buf[4096];
+	pid_t pid;
+	struct timespec btime_now, mtime_now;
+
+	nscheck();
+
+	clock_gettime(CLOCK_MONOTONIC, &mtime_now);
+	clock_gettime(CLOCK_BOOTTIME, &btime_now);
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("unshare");
+
+	len = snprintf(buf, sizeof(buf), "%d %d 0\n%d %d 0",
+			CLOCK_MONOTONIC, 70 * 24 * 3600,
+			CLOCK_BOOTTIME, 9 * 24 * 3600);
+	fd = open("/proc/self/timens_offsets", O_WRONLY);
+	if (fd < 0)
+		return pr_perror("/proc/self/timens_offsets");
+
+	if (write(fd, buf, len) != len)
+		return pr_perror("/proc/self/timens_offsets");
+
+	close(fd);
+	mtime_now.tv_sec += 70 * 24 * 3600;
+	btime_now.tv_sec += 9 * 24 * 3600;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_perror("Unable to fork");
+	if (pid == 0) {
+		ret = 0;
+		ret |= run_test(CLOCK_BOOTTIME, btime_now);
+		ret |= run_test(CLOCK_MONOTONIC, mtime_now);
+		ret |= run_test(CLOCK_BOOTTIME_ALARM, btime_now);
+
+		if (ret)
+			ksft_exit_fail();
+		ksft_exit_pass();
+		return ret;
+	}
+
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Unable to wait the child process");
+
+	if (WIFEXITED(status))
+		return WEXITSTATUS(status);
+
+	return 1;
+}
+
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv4 23/28] selftest/timens: Add a test for clock_nanosleep()
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (21 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 22/28] selftest/timens: Add a test for timerfd Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 24/28] selftest/timens: Add procfs selftest Dmitry Safonov
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Check that clock_nanosleep() takes into account clock offsets.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore     |   1 +
 tools/testing/selftests/timens/Makefile       |   2 +-
 .../selftests/timens/clock_nanosleep.c        | 100 ++++++++++++++++++
 3 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index b609f6ee9fb9..9b6c8ddac2c8 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1,2 +1,3 @@
+clock_nanosleep
 timens
 timerfd
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index 66b90cd28e5c..76a1dc891184 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd
+TEST_GEN_PROGS := timens timerfd clock_nanosleep
 
 CFLAGS := -Wall -Werror
 
diff --git a/tools/testing/selftests/timens/clock_nanosleep.c b/tools/testing/selftests/timens/clock_nanosleep.c
new file mode 100644
index 000000000000..dfd4e3429c75
--- /dev/null
+++ b/tools/testing/selftests/timens/clock_nanosleep.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <sys/timerfd.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdint.h>
+
+#include "log.h"
+#include "timens.h"
+
+static long long get_elapsed_time(int clockid, struct timespec *start)
+{
+	struct timespec curr;
+	long long secs, nsecs;
+
+	if (clock_gettime(clockid, &curr) == -1)
+		return pr_perror("clock_gettime");
+
+	secs = curr.tv_sec - start->tv_sec;
+	nsecs = curr.tv_nsec - start->tv_nsec;
+	if (nsecs < 0) {
+		secs--;
+		nsecs += 1000000000;
+	}
+	if (nsecs >= 1000000000) {
+		secs++;
+		nsecs -= 1000000000;
+	}
+	return secs * 1000 + nsecs / 1000000;
+}
+
+int run_test(int clockid)
+{
+	long long elapsed;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		struct timespec now = {};
+		struct timespec start;
+
+		if (clock_gettime(clockid, &start) == -1)
+			return pr_perror("clock_gettime");
+
+
+		if (i == 1) {
+			now.tv_sec = start.tv_sec;
+			now.tv_nsec = start.tv_nsec;
+		}
+
+		now.tv_sec += 2;
+		clock_nanosleep(clockid, i ? TIMER_ABSTIME : 0, &now, NULL);
+
+		elapsed = get_elapsed_time(clockid, &start);
+		if (elapsed < 1900 || elapsed > 2100) {
+			pr_fail("clockid: %d abs: %d elapsed: %lld\n",
+				clockid, i, elapsed);
+			return 1;
+		}
+		ksft_test_result_pass("clockid: %d abs:%d\n", clockid, i);
+	}
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret, nsfd;
+
+	nscheck();
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("unshare");
+
+	if (_settime(CLOCK_MONOTONIC, 7 * 24 * 3600))
+		return 1;
+	if (_settime(CLOCK_BOOTTIME, 9 * 24 * 3600))
+		return 1;
+
+	nsfd = open("/proc/self/ns/time_for_children", O_RDONLY);
+	if (nsfd < 0)
+		return pr_perror("Unable to open time_for_children");
+
+	if (setns(nsfd, CLONE_NEWTIME))
+		return pr_perror("Unable to set timens");
+
+	ret = 0;
+	ret |= run_test(CLOCK_MONOTONIC);
+	ret |= run_test(CLOCK_BOOTTIME_ALARM);
+
+	if (ret)
+		ksft_exit_fail();
+	ksft_exit_pass();
+	return ret;
+}
+
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv4 24/28] selftest/timens: Add procfs selftest
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (22 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 23/28] selftest/timens: Add a test for clock_nanosleep() Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 25/28] selftest/timens: Add timer offsets test Dmitry Safonov
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

Check that /proc/uptime is correct inside a new time namespace.
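
For illustration only (not part of the patch): /proc/uptime starts with the
uptime as "<seconds>.<hundredths>", so the fractional part has to be scaled
to get nanoseconds. A minimal parsing sketch, where parse_uptime() and
read_uptime() are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Parse the first field of a /proc/uptime line ("<sec>.<centisec> ...")
 * into a timespec. Returns 0 on success, -1 on malformed input.
 */
static int parse_uptime(const char *line, struct timespec *uptime)
{
	unsigned long sec, csec;

	assert(line != NULL && uptime != NULL);

	if (sscanf(line, "%lu.%2lu", &sec, &csec) != 2)
		return -1;

	uptime->tv_sec = sec;
	uptime->tv_nsec = csec * 10000000L;	/* 1 centisecond = 10^7 ns */
	return 0;
}

/* Read and parse /proc/uptime; returns 0 on success, -1 on failure. */
static int read_uptime(struct timespec *uptime)
{
	char line[128];
	FILE *f = fopen("/proc/uptime", "r");
	int ret = -1;

	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = parse_uptime(line, uptime);
	fclose(f);
	return ret;
}
```

Inside a time namespace with a CLOCK_BOOTTIME offset, the value read this
way is expected to be the host uptime plus that offset.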

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore |   1 +
 tools/testing/selftests/timens/Makefile   |   2 +-
 tools/testing/selftests/timens/procfs.c   | 142 ++++++++++++++++++++++
 3 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/procfs.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index 9b6c8ddac2c8..94ffdd9cead7 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1,3 +1,4 @@
 clock_nanosleep
+procfs
 timens
 timerfd
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index 76a1dc891184..f96f50d1fef8 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd clock_nanosleep
+TEST_GEN_PROGS := timens timerfd clock_nanosleep procfs
 
 CFLAGS := -Wall -Werror
 
diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c
new file mode 100644
index 000000000000..89a24c134510
--- /dev/null
+++ b/tools/testing/selftests/timens/procfs.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <math.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+#include <time.h>
+
+#include "log.h"
+#include "timens.h"
+
+/*
+ * The test should finish within MAX_TEST_TIME_SEC, so add 10 days to the
+ * child's boottime and allow that much slack when checking /proc/uptime.
+ */
+#define MAX_TEST_TIME_SEC		(60*5)
+#define DAY_IN_SEC			(60*60*24)
+#define TEN_DAYS_IN_SEC			(10*DAY_IN_SEC)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+static int child_ns, parent_ns;
+
+static int switch_ns(int fd)
+{
+	if (setns(fd, CLONE_NEWTIME))
+		return pr_perror("setns()");
+
+	return 0;
+}
+
+static int init_namespaces(void)
+{
+	char path[] = "/proc/self/ns/time_for_children";
+	struct stat st1, st2;
+
+	parent_ns = open(path, O_RDONLY);
+	if (parent_ns < 0)
+		return pr_perror("Unable to open %s", path);
+
+	if (fstat(parent_ns, &st1))
+		return pr_perror("Unable to stat the parent timens");
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("Can't unshare() timens");
+
+	child_ns = open(path, O_RDONLY);
+	if (child_ns < 0)
+		return pr_perror("Unable to open %s", path);
+
+	if (fstat(child_ns, &st2))
+		return pr_perror("Unable to stat the timens");
+
+	if (st1.st_ino == st2.st_ino)
+		return pr_err("The same child_ns after CLONE_NEWTIME");
+
+	if (_settime(CLOCK_BOOTTIME, TEN_DAYS_IN_SEC))
+		return -1;
+
+	return 0;
+}
+
+static int read_proc_uptime(struct timespec *uptime)
+{
+	unsigned long up_sec, up_nsec;
+	FILE *proc;
+
+	proc = fopen("/proc/uptime", "r");
+	if (proc == NULL) {
+		pr_perror("Unable to open /proc/uptime");
+		return -1;
+	}
+
+	if (fscanf(proc, "%lu.%02lu", &up_sec, &up_nsec) != 2) {
+		if (errno) {
+			pr_perror("fscanf");
+			return -errno;
+		}
+		pr_err("failed to parse /proc/uptime");
+		return -1;
+	}
+	fclose(proc);
+
+	uptime->tv_sec = up_sec;
+	uptime->tv_nsec = up_nsec * 10000000; /* hundredths of a second to ns */
+	return 0;
+}
+
+static int check_uptime(void)
+{
+	struct timespec uptime_new, uptime_old;
+	time_t uptime_expected;
+	double prec = MAX_TEST_TIME_SEC;
+
+	if (switch_ns(parent_ns))
+		return pr_err("switch_ns(%d)", parent_ns);
+
+	if (read_proc_uptime(&uptime_old))
+		return 1;
+
+	if (switch_ns(child_ns))
+		return pr_err("switch_ns(%d)", child_ns);
+
+	if (read_proc_uptime(&uptime_new))
+		return 1;
+
+	uptime_expected = uptime_old.tv_sec + TEN_DAYS_IN_SEC;
+	if (fabs(difftime(uptime_new.tv_sec, uptime_expected)) > prec) {
+		pr_fail("uptime in /proc/uptime: old %ld, new %ld [%ld]",
+			uptime_old.tv_sec, uptime_new.tv_sec,
+			uptime_old.tv_sec + TEN_DAYS_IN_SEC);
+		return 1;
+	}
+
+	ksft_test_result_pass("Passed for /proc/uptime\n");
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret = 0;
+
+	nscheck();
+
+	if (init_namespaces())
+		return 1;
+
+	ret |= check_uptime();
+
+	if (ret)
+		ksft_exit_fail();
+	ksft_exit_pass();
+	return ret;
+}
-- 
2.22.0


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv4 25/28] selftest/timens: Add timer offsets test
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (23 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 24/28] selftest/timens: Add procfs selftest Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line Dmitry Safonov
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Thomas Gleixner, Vincenzo Frascino, containers, criu, linux-api,
	x86, Andrei Vagin

From: Andrei Vagin <avagin@openvz.org>

Check that timer_create() takes into account clock offsets.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore |   1 +
 tools/testing/selftests/timens/Makefile   |   3 +-
 tools/testing/selftests/timens/timer.c    | 116 ++++++++++++++++++++++
 3 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/timer.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index 94ffdd9cead7..3b7eda8f35ce 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1,4 +1,5 @@
 clock_nanosleep
 procfs
 timens
+timer
 timerfd
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index f96f50d1fef8..ae1ffd24cc43 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,5 +1,6 @@
-TEST_GEN_PROGS := timens timerfd clock_nanosleep procfs
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs
 
 CFLAGS := -Wall -Werror
+LDFLAGS := -lrt
 
 include ../lib.mk
diff --git a/tools/testing/selftests/timens/timer.c b/tools/testing/selftests/timens/timer.c
new file mode 100644
index 000000000000..6e33cd54d397
--- /dev/null
+++ b/tools/testing/selftests/timens/timer.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sched.h>
+
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <signal.h>
+#include <time.h>
+
+#include "log.h"
+#include "timens.h"
+
+int run_test(int clockid, struct timespec now)
+{
+	struct itimerspec new_value;
+	long long elapsed;
+	timer_t fd;
+	int i;
+
+	for (i = 0; i < 2; i++) {
+		struct sigevent sevp = {.sigev_notify = SIGEV_NONE};
+		int flags = 0;
+
+		new_value.it_value.tv_sec = 3600;
+		new_value.it_value.tv_nsec = 0;
+		new_value.it_interval.tv_sec = 1;
+		new_value.it_interval.tv_nsec = 0;
+
+		if (i == 1) {
+			new_value.it_value.tv_sec += now.tv_sec;
+			new_value.it_value.tv_nsec += now.tv_nsec;
+		}
+
+		if (timer_create(clockid, &sevp, &fd) == -1)
+			return pr_perror("timer_create");
+
+		if (i == 1)
+			flags |= TIMER_ABSTIME;
+		if (timer_settime(fd, flags, &new_value, NULL) == -1)
+			return pr_perror("timer_settime");
+
+		if (timer_gettime(fd, &new_value) == -1)
+			return pr_perror("timer_gettime");
+
+		elapsed = new_value.it_value.tv_sec;
+		if (llabs(elapsed - 3600) > 60) {
+			ksft_test_result_fail("clockid: %d elapsed: %lld\n",
+					      clockid, elapsed);
+			return 1;
+		}
+	}
+
+	ksft_test_result_pass("clockid=%d\n", clockid);
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret, status, len, fd;
+	char buf[4096];
+	pid_t pid;
+	struct timespec btime_now, mtime_now;
+
+	nscheck();
+
+	clock_gettime(CLOCK_MONOTONIC, &mtime_now);
+	clock_gettime(CLOCK_BOOTTIME, &btime_now);
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("unshare");
+
+	len = snprintf(buf, sizeof(buf), "%d %d 0\n%d %d 0",
+			CLOCK_MONOTONIC, 70 * 24 * 3600,
+			CLOCK_BOOTTIME, 9 * 24 * 3600);
+	fd = open("/proc/self/timens_offsets", O_WRONLY);
+	if (fd < 0)
+		return pr_perror("/proc/self/timens_offsets");
+
+	if (write(fd, buf, len) != len)
+		return pr_perror("/proc/self/timens_offsets");
+
+	close(fd);
+	mtime_now.tv_sec += 70 * 24 * 3600;
+	btime_now.tv_sec += 9 * 24 * 3600;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_perror("Unable to fork");
+	if (pid == 0) {
+		ret = 0;
+		ret |= run_test(CLOCK_BOOTTIME, btime_now);
+		ret |= run_test(CLOCK_MONOTONIC, mtime_now);
+		ret |= run_test(CLOCK_BOOTTIME_ALARM, btime_now);
+
+		if (ret)
+			ksft_exit_fail();
+		ksft_exit_pass();
+		return ret;
+	}
+
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Unable to wait the child process");
+
+	if (WIFEXITED(status))
+		return WEXITSTATUS(status);
+
+	return 1;
+}
+
-- 
2.22.0


* [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (24 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 25/28] selftest/timens: Add timer offsets test Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-14 14:13   ` Thomas Gleixner
  2019-06-12 19:26 ` [PATCHv4 27/28] selftests: Add a simple perf test for clock_gettime() Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 28/28] selftest/timens: Check that a right vdso is mapped after fork and exec Dmitry Safonov
  27 siblings, 1 reply; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Performance testing of the VDSO patches showed a noticeable 20% regression
in the gettime_perf selftest with a cold cache.
As it turns out, before time namespaces were introduced the VDSO functions
happened to be aligned to cache lines. The new code that applies the timens
offset inside a namespace shifted them slightly, so the VDSO functions
are no longer aligned to cache lines.

Align the VDSO functions with a gcc option to fix the performance drop.
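
To illustrate what "aligned to a cache line" means for a function, here is
a minimal user-space sketch. It is not part of the patch. It assumes a
64-byte L1 line (a shift of 6) and uses the GCC/Clang `aligned` function
attribute instead of a compiler option; the names demo_gettime() and
is_cache_aligned() are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte L1 cache line, i.e. a shift of 6. */
#define DEMO_L1_CACHE_SHIFT 6
#define DEMO_L1_CACHE_BYTES (1u << DEMO_L1_CACHE_SHIFT)

/* Hypothetical hot function, forced onto a cache-line boundary. */
__attribute__((aligned(DEMO_L1_CACHE_BYTES), noinline))
int demo_gettime(int x)
{
	return x + 1;
}

/* Returns 1 if the function's entry point starts a cache line. */
int is_cache_aligned(int (*fn)(int))
{
	assert(fn);
	return ((uintptr_t)fn % DEMO_L1_CACHE_BYTES) == 0;
}
```

As far as the GCC manual describes it, -falign-functions=n takes a byte
count, so a shift of 6 and a line size of 64 are not interchangeable when
passed to that option.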

Copying the resulting numbers from the cover letter:

Hot CPU cache (gettime_perf.c, more cycles is better):
        | before     | CONFIG_TIME_NS=n | host        | inside timens
--------|------------|------------------|-------------|-------------
cycles  | 139887013  | 139453003        | 139899785   | 128792458
diff (%)| 100        | 99.7             | 100         | 92

Cold cache (gettime_perf_cold.c, fewer tsc ticks per run is better):
        | before     | CONFIG_TIME_NS=n | host        | inside timens
--------|------------|------------------|-------------|-------------
tsc     | 6748       | 6718             | 6862        | 12682
diff (%)| 100        | 99.6             | 101.7       | 188

Measured on Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
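
The "diff (%)" rows read as each column divided by the "before" column.
That is an interpretation of the tables, not something stated above. A
small sketch of the arithmetic, with a made-up helper name:

```c
#include <assert.h>

/*
 * Express a measurement as a percentage of the baseline ("before")
 * column. Hypothetical helper, only to show how the diff (%) rows
 * appear to be derived.
 */
double percent_of_baseline(unsigned long long value,
			   unsigned long long baseline)
{
	assert(baseline != 0);
	return 100.0 * (double)value / (double)baseline;
}
```

With the numbers above, 128792458 / 139887013 gives roughly 92.07 (about
8% fewer iterations in the hot-cache run inside a timens), and
12682 / 6748 gives roughly 187.9 (about 88% more tsc ticks in the
cold-cache run inside a timens).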

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/entry/vdso/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b58d34120fd8..c7bfd62d1fc3 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -4,6 +4,7 @@
 #
 
 KBUILD_CFLAGS += $(DISABLE_LTO)
+KBUILD_CFLAGS += -falign-functions=$(CONFIG_X86_L1_CACHE_SHIFT)
 KASAN_SANITIZE			:= n
 UBSAN_SANITIZE			:= n
 OBJECT_FILES_NON_STANDARD	:= y
-- 
2.22.0


* [PATCHv4 27/28] selftests: Add a simple perf test for clock_gettime()
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (25 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  2019-06-12 19:26 ` [PATCHv4 28/28] selftest/timens: Check that a right vdso is mapped after fork and exec Dmitry Safonov
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore     |  2 +
 tools/testing/selftests/timens/Makefile       |  8 +-
 tools/testing/selftests/timens/gettime_perf.c | 74 +++++++++++++++++++
 .../selftests/timens/gettime_perf_cold.c      | 63 ++++++++++++++++
 4 files changed, 146 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/gettime_perf.c
 create mode 100644 tools/testing/selftests/timens/gettime_perf_cold.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index 3b7eda8f35ce..16292e4d08a5 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1,4 +1,6 @@
 clock_nanosleep
+gettime_perf
+gettime_perf_cold
 procfs
 timens
 timer
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index ae1ffd24cc43..ef65bf96b55c 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,10 @@
-TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs gettime_perf
+
+uname_M := $(shell uname -m 2>/dev/null || echo not)
+ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
+ifeq ($(ARCH),x86_64)
+TEST_GEN_PROGS += gettime_perf_cold
+endif
 
 CFLAGS := -Wall -Werror
 LDFLAGS := -lrt
diff --git a/tools/testing/selftests/timens/gettime_perf.c b/tools/testing/selftests/timens/gettime_perf.c
new file mode 100644
index 000000000000..510d77a941d9
--- /dev/null
+++ b/tools/testing/selftests/timens/gettime_perf.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <time.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+
+#include "log.h"
+#include "timens.h"
+
+//#define TEST_SYSCALL
+
+static void test(clockid_t clockid, char *clockstr, bool in_ns)
+{
+	struct timespec tp, start;
+	long i = 0;
+	const int timeout = 3;
+
+#ifndef TEST_SYSCALL
+	clock_gettime(clockid, &start);
+#else
+	syscall(__NR_clock_gettime, clockid, &start);
+#endif
+	tp = start;
+	for (tp = start; start.tv_sec + timeout > tp.tv_sec ||
+			 (start.tv_sec + timeout == tp.tv_sec &&
+			  start.tv_nsec > tp.tv_nsec); i++) {
+#ifndef TEST_SYSCALL
+		clock_gettime(clockid, &tp);
+#else
+		syscall(__NR_clock_gettime, clockid, &tp);
+#endif
+	}
+
+	ksft_test_result_pass("%s:\tclock: %10s\tcycles:\t%10ld\n",
+			      in_ns ? "ns" : "host", clockstr, i);
+}
+
+int main(int argc, char *argv[])
+{
+	time_t offset = 10;
+	int nsfd;
+
+	test(CLOCK_MONOTONIC, "monotonic", false);
+	test(CLOCK_BOOTTIME, "boottime", false);
+
+	nscheck();
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("Can't unshare() timens");
+
+	nsfd = open("/proc/self/ns/time_for_children", O_RDONLY);
+	if (nsfd < 0)
+		return pr_perror("Can't open a time namespace");
+
+	if (_settime(CLOCK_MONOTONIC, offset))
+		return 1;
+	if (_settime(CLOCK_BOOTTIME, offset))
+		return 1;
+
+	if (setns(nsfd, CLONE_NEWTIME))
+		return pr_perror("setns");
+
+	test(CLOCK_MONOTONIC, "monotonic", true);
+	test(CLOCK_BOOTTIME, "boottime", true);
+
+	ksft_exit_pass();
+	return 0;
+}
diff --git a/tools/testing/selftests/timens/gettime_perf_cold.c b/tools/testing/selftests/timens/gettime_perf_cold.c
new file mode 100644
index 000000000000..f72db8a4c903
--- /dev/null
+++ b/tools/testing/selftests/timens/gettime_perf_cold.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <time.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <string.h>
+
+#include "log.h"
+#include "timens.h"
+
+static __inline__ unsigned long long rdtsc(void)
+{
+	unsigned hi, lo;
+
+	__asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
+	return ((unsigned long long) lo) | (((unsigned long long)hi) << 32);
+}
+
+static void test(clockid_t clockid, char *clockstr)
+{
+	struct timespec tp;
+	long long s, e;
+
+	s = rdtsc();
+	clock_gettime(clockid, &tp);
+	e = rdtsc();
+	printf("%lld\n", e - s);
+	return;
+}
+
+int main(int argc, char **argv)
+{
+	time_t offset = 10;
+	int nsfd;
+
+	if (argc == 1) {
+		test(CLOCK_MONOTONIC, "monotonic");
+		return 0;
+	}
+	nscheck();
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("Can't unshare() timens");
+
+	nsfd = open("/proc/self/ns/time_for_children", O_RDONLY);
+	if (nsfd < 0)
+		return pr_perror("Can't open a time namespace");
+
+	if (_settime(CLOCK_MONOTONIC, offset))
+		return 1;
+
+	if (setns(nsfd, CLONE_NEWTIME))
+		return pr_perror("setns");
+
+	test(CLOCK_MONOTONIC, "monotonic");
+	return 0;
+}
-- 
2.22.0


* [PATCHv4 28/28] selftest/timens: Check that a right vdso is mapped after fork and exec
  2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
                   ` (26 preceding siblings ...)
  2019-06-12 19:26 ` [PATCHv4 27/28] selftests: Add a simple perf test for clock_gettime() Dmitry Safonov
@ 2019-06-12 19:26 ` Dmitry Safonov
  27 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-12 19:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrei Vagin, Dmitry Safonov, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino,
	containers, criu, linux-api, x86

From: Andrei Vagin <avagin@gmail.com>

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 tools/testing/selftests/timens/.gitignore |  1 +
 tools/testing/selftests/timens/Makefile   |  2 +-
 tools/testing/selftests/timens/exec.c     | 91 +++++++++++++++++++++++
 3 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/timens/exec.c

diff --git a/tools/testing/selftests/timens/.gitignore b/tools/testing/selftests/timens/.gitignore
index 16292e4d08a5..789f21e81028 100644
--- a/tools/testing/selftests/timens/.gitignore
+++ b/tools/testing/selftests/timens/.gitignore
@@ -1,4 +1,5 @@
 clock_nanosleep
+exec
 gettime_perf
 gettime_perf_cold
 procfs
diff --git a/tools/testing/selftests/timens/Makefile b/tools/testing/selftests/timens/Makefile
index ef65bf96b55c..9e0edf354906 100644
--- a/tools/testing/selftests/timens/Makefile
+++ b/tools/testing/selftests/timens/Makefile
@@ -1,4 +1,4 @@
-TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs gettime_perf
+TEST_GEN_PROGS := timens timerfd timer clock_nanosleep procfs gettime_perf exec
 
 uname_M := $(shell uname -m 2>/dev/null || echo not)
 ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
diff --git a/tools/testing/selftests/timens/exec.c b/tools/testing/selftests/timens/exec.c
new file mode 100644
index 000000000000..b3a05c41e202
--- /dev/null
+++ b/tools/testing/selftests/timens/exec.c
@@ -0,0 +1,91 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdbool.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <time.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "log.h"
+#include "timens.h"
+
+#define OFFSET (36000)
+
+int main(int argc, char *argv[])
+{
+	struct timespec now, tst;
+	int status, i;
+	pid_t pid;
+
+	if (argc > 1) {
+		if (sscanf(argv[1], "%ld", &now.tv_sec) != 1)
+			return pr_perror("sscanf");
+
+		for (i = 0; i < 2; i++) {
+			_gettime(CLOCK_MONOTONIC, &tst, i);
+			if (abs(tst.tv_sec - now.tv_sec) > 5)
+				return pr_fail("%ld %ld\n", now.tv_sec, tst.tv_sec);
+		}
+	}
+
+	nscheck();
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+
+	if (unshare(CLONE_NEWTIME))
+		return pr_perror("Can't unshare() timens");
+
+	if (_settime(CLOCK_MONOTONIC, OFFSET))
+		return 1;
+
+	for (i = 0; i < 2; i++) {
+		_gettime(CLOCK_MONOTONIC, &tst, i);
+		if (abs(tst.tv_sec - now.tv_sec) > 5)
+			return pr_fail("%ld %ld\n",
+					now.tv_sec, tst.tv_sec);
+	}
+
+	if (argc > 1)
+		return 0;
+
+	pid = fork();
+	if (pid < 0)
+		return pr_perror("fork");
+
+	if (pid == 0) {
+		char now_str[64];
+		char *cargv[] = {"exec", now_str, NULL};
+		char *cenv[] = {NULL};
+
+		/* Check that a child process is in the new timens. */
+		for (i = 0; i < 2; i++) {
+			_gettime(CLOCK_MONOTONIC, &tst, i);
+			if (abs(tst.tv_sec - now.tv_sec - OFFSET) > 5)
+				return pr_fail("%ld %ld\n",
+						now.tv_sec + OFFSET, tst.tv_sec);
+		}
+
+		/* Check that a proper vdso will be mapped after execve. */
+		snprintf(now_str, sizeof(now_str), "%ld", now.tv_sec + OFFSET);
+		execve("/proc/self/exe", cargv, cenv);
+		return pr_perror("execve");
+	}
+
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("waitpid");
+
+	if (status)
+		ksft_exit_fail();
+
+	ksft_test_result_pass("exec\n");
+	ksft_exit_pass();
+	return 0;
+}
-- 
2.22.0


* Re: [PATCHv4 02/28] timens: Add timens_offsets
  2019-06-12 19:26 ` [PATCHv4 02/28] timens: Add timens_offsets Dmitry Safonov
@ 2019-06-14 13:11   ` Thomas Gleixner
  2019-06-14 14:32     ` Dmitry Safonov
  2019-07-29 22:26     ` Dmitry Safonov
  0 siblings, 2 replies; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:11 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

> From: Andrei Vagin <avagin@openvz.org>
> 
> Introduce offsets for time namespace. They will contain an adjustment
> needed to convert clocks to/from host's.
> 
> Allocate one page for each time namespace that will be premapped into
> userspace among vvar pages.
> index 000000000000..7d7cb68ea778
> --- /dev/null
> +++ b/include/linux/timens_offsets.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_TIME_OFFSETS_H
> +#define _LINUX_TIME_OFFSETS_H
> +
> +struct timens_offsets {
> +};

That empty struct which is nowhere used looks odd. Can you move that to the
patch which actually makes use of it?

Thanks,

	tglx

* Re: [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t
  2019-06-12 19:26 ` [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t Dmitry Safonov
@ 2019-06-14 13:32   ` Thomas Gleixner
  2019-06-14 14:39     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:32 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

Dmitry,

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

> From: Andrei Vagin <avagin@gmail.com>
> 
> The callsite in common_timer_get() has already a comment:
>         /*
>          * The timespec64 based conversion is suboptimal, but it's not
>          * worth to implement yet another callback.
>          */
>         kc->clock_get(timr->it_clock, &ts64);
>         now = timespec64_to_ktime(ts64);
> 
> Now we are going to add time namespaces and we need to be able to get:

Please avoid 'we' and try to describe the changes in a neutral technical
form, e.g.:

 The upcoming support for time namespaces requires access to:

> * clock value in a task time namespace to return it from the clock_gettime
>   syscall.

  - The time in a task's time namespace for sys_clock_gettime()

> * clock value in the root time namespace to use it in
>   common_timer_get().

  - The time in the root name space for common_timer_get()

> It looks like another reason why we need a separate callback to return
> clock value in ktime_t.

 That adds a valid reason to finally implement a separate callback which
 returns the time in ktime_t format.

Hmm?

> +int posix_get_timespec(clockid_t which_clock, struct timespec64 *tp);
> +int posix_get_boottime_timespec(const clockid_t which_clock, struct timespec64 *tp);
>  #endif
> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
> index 0519a8805aab..68a163c8b4f2 100644
> --- a/kernel/time/alarmtimer.c
> +++ b/kernel/time/alarmtimer.c
> @@ -43,6 +43,8 @@ static struct alarm_base {
>  	spinlock_t		lock;
>  	struct timerqueue_head	timerqueue;
>  	ktime_t			(*gettime)(void);
> +	int			(*get_timespec)(const clockid_t which_clock,
> +						struct timespec64 *tp);
>  	clockid_t		base_clockid;
>  } alarm_bases[ALARM_NUMTYPE];
>  
> @@ -645,21 +647,30 @@ static int alarm_clock_getres(const clockid_t which_clock, struct timespec64 *tp
>  }
>  
>  /**
> - * alarm_clock_get - posix clock_get interface
> + * alarm_clock_get_timespec - posix clock_get_timespec interface
>   * @which_clock: clockid
>   * @tp: timespec to fill.
>   *
>   * Provides the underlying alarm base time.
>   */
> -static int alarm_clock_get(clockid_t which_clock, struct timespec64 *tp)
> +static int alarm_clock_get_timespec(clockid_t which_clock, struct timespec64 *tp)
>  {
>  	struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
>  
>  	if (!alarmtimer_get_rtcdev())
>  		return -EINVAL;
>  
> -	*tp = ktime_to_timespec64(base->gettime());
> -	return 0;
> +	return base->get_timespec(base->base_clockid, tp);
> +}
> +
> +static ktime_t alarm_clock_get_ktime(clockid_t which_clock)

Please add kernel doc for this function. It does not make sense to have one
documented and the other not.

> +{
> +	struct alarm_base *base = &alarm_bases[clock2alarm(which_clock)];
> +
> +	if (!alarmtimer_get_rtcdev())
> +		return -EINVAL;
> +
> +	return base->gettime();
>  }
  
> --- a/kernel/time/posix-timers.h
> +++ b/kernel/time/posix-timers.h
> @@ -6,8 +6,11 @@ struct k_clock {
>  				struct timespec64 *tp);
>  	int	(*clock_set)(const clockid_t which_clock,
>  			     const struct timespec64 *tp);
> -	int	(*clock_get)(const clockid_t which_clock,
> -			     struct timespec64 *tp);
> +	/* return the clock value in the current time namespace. */
> +	int	(*clock_get_timespec)(const clockid_t which_clock,
> +				      struct timespec64 *tp);
> +	/* return the clock value in the root time namespace. */
> +	ktime_t	(*clock_get_ktime)(const clockid_t which_clock);
>  	int	(*clock_adj)(const clockid_t which_clock, struct __kernel_timex *tx);
>  	int	(*timer_create)(struct k_itimer *timer);
>  	int	(*nsleep)(const clockid_t which_clock, int flags,

TBH, this patch is way too big. It changes too many things at once. Can you
please structure it this way:

 1) Rename k_clock::clock_get to k_clock::clock_get_timespec and fix up all
    struct initializers

 2) Rename the clock_get_timespec functions per instance

 3) Add the new callback

 4) Add the new functions per instance and add them to the corresponding
    struct initializers

 5) Use the new callback

Thanks,

	tglx

* Re: [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets
  2019-06-12 19:26 ` [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets Dmitry Safonov
@ 2019-06-14 13:37   ` Thomas Gleixner
  2019-06-16 17:43     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:37 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> ---
>  fs/timerfd.c                   |  3 +++
>  include/linux/time_namespace.h | 18 ++++++++++++++++++
>  kernel/time_namespace.c        | 27 +++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+)

Again, please split that into:

   1) Introduce the new function

   2) Make use of it

> diff --git a/fs/timerfd.c b/fs/timerfd.c
> index 6a6fc8aa1de7..9b0c2f65e7e8 100644
> --- a/fs/timerfd.c
> +++ b/fs/timerfd.c
> @@ -26,6 +26,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/compat.h>
>  #include <linux/rcupdate.h>
> +#include <linux/time_namespace.h>
>  
>  struct timerfd_ctx {
>  	union {
> @@ -196,6 +197,8 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
>  	}
>  
>  	if (texp != 0) {
> +		if (flags & TFD_TIMER_ABSTIME)
> +			texp = timens_ktime_to_host(clockid, texp);
>  		if (isalarm(ctx)) {
>  			if (flags & TFD_TIMER_ABSTIME)
>  				alarm_start(&ctx->t.alarm, texp);
> diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
> index 1dda8af6b9fe..d32b55fad953 100644
> --- a/include/linux/time_namespace.h
> +++ b/include/linux/time_namespace.h
> @@ -56,6 +56,19 @@ static inline void timens_add_boottime(struct timespec64 *ts)
>                  *ts = timespec64_add(*ts, ns_offsets->boottime);
>  }
>  
> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
> +				struct timens_offsets *offsets);
> +static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
> +{
> +	struct timens_offsets *offsets = current->nsproxy->time_ns->offsets;
> +
> +	if (!offsets) /* fast-path for the root time namespace */

Can you please avoid tail comments? They break the reading flow. Aside from
that, I don't see the value of documenting the obvious.

> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)

Please line break the arguments

ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
				struct timens_offsets *ns_offsets)

> +{
> +	ktime_t koff;
> +
> +	switch (clockid) {
> +	case CLOCK_MONOTONIC:
> +		koff = timespec64_to_ktime(ns_offsets->monotonic);
> +		break;
> +	case CLOCK_BOOTTIME:
> +	case CLOCK_BOOTTIME_ALARM:
> +		koff = timespec64_to_ktime(ns_offsets->boottime);
> +		break;
> +	default:
> +		return tim;
> +	}
> +
> +	/* tim - off has to be in [0, KTIME_MAX) */

Please elaborate on why the conditions below can happen at all.

> +	if (tim < koff)
> +		tim = 0;
> +	else if (KTIME_MAX - tim < -koff)
> +		tim = KTIME_MAX;
> +	else
> +		tim = ktime_sub(tim, koff);

Thanks,

	tglx
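
One reading of the clamping in the quoted hunk, sketched in user space
under the assumption that ktime_t behaves like a signed 64-bit nanosecond
count: an absolute expiry that lies before the namespace offset would map
to a negative host time, and a large negative offset could push the result
past the maximum. The names demo_ktime_to_host() and DEMO_KTIME_MAX are
made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_KTIME_MAX INT64_MAX

/*
 * Convert an absolute expiry from namespace time to host time by
 * subtracting the namespace offset, saturating to [0, DEMO_KTIME_MAX].
 */
int64_t demo_ktime_to_host(int64_t tim, int64_t koff)
{
	assert(tim >= 0);

	/* Expiry is earlier than host time zero: treat it as already due. */
	if (tim < koff)
		return 0;
	/* Negative offset large enough that tim - koff would overflow. */
	if (koff < 0 && tim > DEMO_KTIME_MAX + koff)
		return DEMO_KTIME_MAX;
	return tim - koff;
}
```

For example, an expiry of 10 with an offset of 30 clamps to 0, while an
expiry of DEMO_KTIME_MAX - 5 with an offset of -10 clamps to
DEMO_KTIME_MAX.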

* Re: [PATCHv4 07/28] posix-timers/timens: Take into account clock offsets
  2019-06-12 19:26 ` [PATCHv4 07/28] posix-timers/timens: Take into account " Dmitry Safonov
@ 2019-06-14 13:42   ` Thomas Gleixner
  2019-06-16 17:45     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:42 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

> Subject: posix-timers/timens: Take into account clock offsets

Please avoid that '/timens' appendix. It's not really a new subsystem or
subfunction of posix-timers.

posix-timers: Add time namespace support to common_timer_set()

> From: Andrei Vagin <avagin@gmail.com>
> 
> Wire timer_settime() syscall into time namespace virtualization.

Please explain why this only affects common_timer_set() and not any other
incarnation along with an explanation why only ABSTIME timers need to be
converted.

Thanks,

	tglx
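
A sketch of the arithmetic behind the ABSTIME question, as an assumption
about the intent rather than a description of the kernel code: a relative
timeout is added to the host clock and never sees the offset, while an
absolute value is expressed in namespace time and has the offset
subtracted. demo_host_expiry() and DEMO_TIMER_ABSTIME are hypothetical
names, and overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TIMER_ABSTIME 0x01

/*
 * Compute the host-clock expiry of a timer, all values in nanoseconds.
 * now_host is the host clock reading, ns_offset the namespace offset,
 * so the namespace sees now_host + ns_offset.
 */
int64_t demo_host_expiry(int flags, int64_t value,
			 int64_t now_host, int64_t ns_offset)
{
	assert(value >= 0);

	/* Absolute: value is in namespace time, convert it to host time. */
	if (flags & DEMO_TIMER_ABSTIME)
		return value - ns_offset;
	/* Relative: a delta is the same in both time bases. */
	return now_host + value;
}
```

With now_host = 1000 and ns_offset = 500, a relative timeout of 100 and an
absolute expiry of 1600 (namespace "now" plus 100) both yield a host
expiry of 1100.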

* Re: [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep
  2019-06-12 19:26 ` [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep Dmitry Safonov
@ 2019-06-14 13:49   ` Thomas Gleixner
  0 siblings, 0 replies; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:49 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

Again the subsystem prefix is something pulled out of thin air.

> From: Andrei Vagin <avagin@gmail.com>
> 
> Wire up clock_nanosleep() to timens offsets.
> 
> Signed-off-by: Andrei Vagin <avagin@openvz.org>
> Co-developed-by: Dmitry Safonov <dima@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  include/linux/hrtimer.h    |  2 +-
>  kernel/time/alarmtimer.c   |  2 ++
>  kernel/time/hrtimer.c      |  8 ++++----
>  kernel/time/posix-stubs.c  | 12 ++++++++++--
>  kernel/time/posix-timers.c | 19 ++++++++++++++++---
>  5 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index 2e8957eac4d4..5a3b3e17d0e8 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -473,7 +473,7 @@ static inline u64 hrtimer_forward_now(struct hrtimer *timer,
>  /* Precise sleep: */
>  
>  extern int nanosleep_copyout(struct restart_block *, struct timespec64 *);
> -extern long hrtimer_nanosleep(const struct timespec64 *rqtp,
> +extern long hrtimer_nanosleep(ktime_t rqtp,
>  			      const enum hrtimer_mode mode,
>  			      const clockid_t clockid);
>  
> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
> index 6346e6ee0d32..f1f42df179d0 100644
> --- a/kernel/time/alarmtimer.c
> +++ b/kernel/time/alarmtimer.c
> @@ -819,6 +819,8 @@ static int alarm_timer_nsleep(const clockid_t which_clock, int flags,
>  		ktime_t now = alarm_bases[type].gettime();
>  
>  		exp = ktime_add_safe(now, exp);
> +	} else {
> +		exp = timens_ktime_to_host(which_clock, exp);
>  	}

This one is independent of the hrtimer part. Please split it out into

 Subject: alarmtimer: Make nanosleep time namespace aware

 
>  	ret = alarmtimer_do_nsleep(&alarm, exp, type);
> diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
> index 41dfff23c1f9..b245f6ff9c8f 100644
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -1716,7 +1716,7 @@ static long __sched hrtimer_nanosleep_restart(struct restart_block *restart)
>  	return ret;
>  }
>  
> -long hrtimer_nanosleep(const struct timespec64 *rqtp,
> +long hrtimer_nanosleep(ktime_t rqtp,
>  		       const enum hrtimer_mode mode, const clockid_t clockid)

This signature change wants to be separate.

  Subject: hrtimers: Prepare hrtimer_nanosleep() for time namespaces

>  {
>  	struct restart_block *restart;
> @@ -1729,7 +1729,7 @@ long hrtimer_nanosleep(const struct timespec64 *rqtp,
>  		slack = 0;
>  
>  	hrtimer_init_on_stack(&t.timer, clockid, mode);
> -	hrtimer_set_expires_range_ns(&t.timer, timespec64_to_ktime(*rqtp), slack);
> +	hrtimer_set_expires_range_ns(&t.timer, rqtp, slack);
>  	ret = do_nanosleep(&t, mode);
>  	if (ret != -ERESTART_RESTARTBLOCK)
>  		goto out;
> @@ -1764,7 +1764,7 @@ SYSCALL_DEFINE2(nanosleep, struct __kernel_timespec __user *, rqtp,
>  
>  	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
>  	current->restart_block.nanosleep.rmtp = rmtp;
> -	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> +	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
>  }
>  
>  #endif
> @@ -1784,7 +1784,7 @@ SYSCALL_DEFINE2(nanosleep_time32, struct old_timespec32 __user *, rqtp,
>  
>  	current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
>  	current->restart_block.nanosleep.compat_rmtp = rmtp;
> -	return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
> +	return hrtimer_nanosleep(timespec64_to_ktime(tu), HRTIMER_MODE_REL, CLOCK_MONOTONIC);
>  }
>  #endif
>  
> diff --git a/kernel/time/posix-stubs.c b/kernel/time/posix-stubs.c
> index edaf075d1ee4..4ee0dc180866 100644
> --- a/kernel/time/posix-stubs.c
> +++ b/kernel/time/posix-stubs.c
> @@ -129,6 +129,7 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
>  		struct __kernel_timespec __user *, rmtp)
>  {
>  	struct timespec64 t;
> +	ktime_t texp;
>  
>  	switch (which_clock) {
>  	case CLOCK_REALTIME:
> @@ -147,7 +148,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
>  		rmtp = NULL;
>  	current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
>  	current->restart_block.nanosleep.rmtp = rmtp;
> -	return hrtimer_nanosleep(&t, flags & TIMER_ABSTIME ?
> +	texp = timespec64_to_ktime(t);
> +	if (flags & TIMER_ABSTIME)
> +		texp = timens_ktime_to_host(which_clock, texp);

And then add actual name space support with:

  Subject: posix-timers: Make clock_nanosleep() time namespace aware

> +	return hrtimer_nanosleep(texp, flags & TIMER_ABSTIME ?
>  				 HRTIMER_MODE_ABS : HRTIMER_MODE_REL,
>  				 which_clock);
>  }

Thanks,

	tglx

* Re: [PATCHv4 09/28] timens: Shift /proc/uptime
  2019-06-12 19:26 ` [PATCHv4 09/28] timens: Shift /proc/uptime Dmitry Safonov
@ 2019-06-14 13:50   ` Thomas Gleixner
  2019-06-16 17:48     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:50 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

Again, please use the usual prefix and do not bolt everything onto
timens. 'timens:' is the proper prefix only for the actual time namespace
core code.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 15/28] x86/vdso: Add offsets page in vvar
  2019-06-12 19:26 ` [PATCHv4 15/28] x86/vdso: Add offsets page in vvar Dmitry Safonov
@ 2019-06-14 13:58   ` Thomas Gleixner
  2019-06-16 17:49     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 13:58 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86,
	Andrei Vagin

On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>  
> +#ifdef CONFIG_TIME_NS
> +notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts)
> +{
> +	struct timens_offsets *timens = (struct timens_offsets *) &timens_page;
> +	struct timespec64 *offset64;
> +
> +	switch (clk) {
> +	case CLOCK_MONOTONIC:
> +	case CLOCK_MONOTONIC_COARSE:
> +	case CLOCK_MONOTONIC_RAW:
> +		offset64 = &timens->monotonic;
> +		break;
> +	case CLOCK_BOOTTIME:
> +		offset64 = &timens->boottime;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	ts->tv_nsec += offset64->tv_nsec;
> +	ts->tv_sec += offset64->tv_sec;
> +	if (ts->tv_nsec >= NSEC_PER_SEC) {
> +		ts->tv_nsec -= NSEC_PER_SEC;
> +		ts->tv_sec++;
> +	}
> +	if (ts->tv_nsec < 0) {
> +		ts->tv_nsec += NSEC_PER_SEC;
> +		ts->tv_sec--;
> +	}

I had to think twice why adding the offset (which can be negative) can
never result in negative time being returned. A comment explaining this
would be appreciated.
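
For illustration only, a minimal userspace sketch of the normalization
quoted above. The struct and function names are hypothetical stand-ins, not
kernel API, and the sketch assumes the offset's tv_nsec lies strictly
between -NSEC_PER_SEC and NSEC_PER_SEC so that a single carry or borrow is
enough:

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC INT64_C(1000000000)

/* Hypothetical stand-in for the kernel's timespec types. */
struct ts_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Add a namespace offset to a clock reading and renormalize.
 * Assumes ts->tv_nsec is in [0, NSEC_PER_SEC) and off->tv_nsec is in
 * (-NSEC_PER_SEC, NSEC_PER_SEC), so at most one carry or borrow occurs.
 */
static void ts_add_offset(struct ts_sketch *ts, const struct ts_sketch *off)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC);
	assert(off->tv_nsec > -NSEC_PER_SEC && off->tv_nsec < NSEC_PER_SEC);

	ts->tv_sec += off->tv_sec;
	ts->tv_nsec += off->tv_nsec;
	if (ts->tv_nsec >= NSEC_PER_SEC) {
		/* carry one second out of the nanosecond field */
		ts->tv_nsec -= NSEC_PER_SEC;
		ts->tv_sec++;
	} else if (ts->tv_nsec < 0) {
		/* borrow one second into the nanosecond field */
		ts->tv_nsec += NSEC_PER_SEC;
		ts->tv_sec--;
	}
}
```

The normalization only keeps tv_nsec in range. Whether tv_sec stays
non-negative depends on how the offsets are chosen, which this sketch does
not enforce.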

As I'm planning to merge Vincenzo's VDSO consolidation into 5.3, can you
please start working on top of his series, which should hopefully be
available as the final v7 next week?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone()
  2019-06-12 19:26 ` [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone() Dmitry Safonov
@ 2019-06-14 14:05   ` Thomas Gleixner
  2019-06-16 17:51     ` Dmitry Safonov
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 14:05 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Dmitry Safonov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86,
	Andrei Vagin

On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>  
> +#ifdef CONFIG_TIME_NS
> +int vdso_join_timens(struct task_struct *task)
> +{
> +	struct mm_struct *mm = task->mm;
> +	struct vm_area_struct *vma;
> +
> +	if (down_write_killable(&mm->mmap_sem))
> +		return -EINTR;
> +
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		unsigned long size = vma->vm_end - vma->vm_start;
> +
> +		if (vma_is_special_mapping(vma, &vvar_mapping) ||
> +		    vma_is_special_mapping(vma, &vdso_mapping))
> +			zap_page_range(vma, vma->vm_start, size);
> +	}
> +
> +	up_write(&mm->mmap_sem);
> +	return 0;
> +}
> +#else /* CONFIG_TIME_NS */
> +int vdso_join_timens(struct task_struct *task)
> +{
> +	return -ENXIO;
> +}

Is that else path really required? The callsite is only compiled when
CONFIG_TIME_NS is enabled, right?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
  2019-06-12 19:26 ` [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line Dmitry Safonov
@ 2019-06-14 14:13   ` Thomas Gleixner
  2019-06-23  5:26     ` Andrei Vagin
  0 siblings, 1 reply; 47+ messages in thread
From: Thomas Gleixner @ 2019-06-14 14:13 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

On Wed, 12 Jun 2019, Dmitry Safonov wrote:

> From: Andrei Vagin <avagin@gmail.com>
> 
> After performance testing of the VDSO patches, a noticeable 20% regression
> was found in the gettime_perf selftest with a cold cache.
> As it turns out, before the introduction of time namespaces, VDSO functions
> were well aligned to cache lines, but adding new code to adjust the
> timens offset inside a namespace created a small shift, and the vdso
> functions became unaligned to cache lines.
> 
> Add alignment to vdso functions with a gcc option to fix the performance drop.
> 
> Copying the resulting numbers from the cover letter:
> 
> Hot CPU cache (more gettime_perf.c cycles - the better):
>         | before     | CONFIG_TIME_NS=n | host        | inside timens
> --------|------------|------------------|-------------|-------------
> cycles  | 139887013  | 139453003        | 139899785   | 128792458
> diff (%)| 100        | 99.7             | 100         | 92

Why is CONFIG_TIME_NS=n behaving worse than current mainline and
worse than 'host' mode?

> Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
>         | before     | CONFIG_TIME_NS=n | host        | inside timens
> --------|------------|------------------|-------------|-------------
> tsc     | 6748       | 6718             | 6862        | 12682
> diff (%)| 100        | 99.6             | 101.7       | 188

Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host' mode
drops.

Either I'm misreading the numbers or missing something or I'm just confused
as usual :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 02/28] timens: Add timens_offsets
  2019-06-14 13:11   ` Thomas Gleixner
@ 2019-06-14 14:32     ` Dmitry Safonov
  2019-07-29 22:26     ` Dmitry Safonov
  1 sibling, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-14 14:32 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On 6/14/19 2:11 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> From: Andrei Vagin <avagin@openvz.org>
>>
>> Introduce offsets for time namespace. They will contain an adjustment
>> needed to convert clocks to/from host's.
>>
>> Allocate one page for each time namespace that will be premapped into
>> userspace among vvar pages.
>> index 000000000000..7d7cb68ea778
>> --- /dev/null
>> +++ b/include/linux/timens_offsets.h
>> @@ -0,0 +1,8 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_TIME_OFFSETS_H
>> +#define _LINUX_TIME_OFFSETS_H
>> +
>> +struct timens_offsets {
>> +};
> 
> That empty struct which is nowhere used looks odd. Can you move that to the
> patch which actually makes use of it?

Sure, makes sense.

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t
  2019-06-14 13:32   ` Thomas Gleixner
@ 2019-06-14 14:39     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-14 14:39 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

Hi Thomas,

Thanks much for the review,

On 6/14/19 2:32 PM, Thomas Gleixner wrote:
> Dmitry,
> 
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> From: Andrei Vagin <avagin@gmail.com>
>>
>> The callsite in common_timer_get() has already a comment:
>>         /*
>>          * The timespec64 based conversion is suboptimal, but it's not
>>          * worth to implement yet another callback.
>>          */
>>         kc->clock_get(timr->it_clock, &ts64);
>>         now = timespec64_to_ktime(ts64);
>>
>> Now we are going to add time namespaces and we need to be able to get:
> 
> Please avoid 'we' and try to describe the changes in a neutral technical
> form, e.g.:
> 
>  The upcoming support for time namespaces requires to have access to:
> 
>> * clock value in a task time namespace to return it from the clock_gettime
>>   syscall.
> 
>   - The time in a tasks time namespace for sys_clock_gettime()
> 
>> * clock value in the root time namespace to use it in
>>   common_timer_get().
> 
>   - The time in the root name space for common_timer_get()
> 
>> It looks like another reason why we need a separate callback to return
>> clock value in ktime_t.
> 
>  That adds a valid reason to finally implement a separate callback which
>  returns the time in ktime_t format.
> 
> Hmm?

Agreed, the patch has grown bigger than intended and the message could
have been better in a technical sense. I will split it, add kernel-doc and
fix the commit message(s).

[..]
> TBH, this patch is way too big. It changes too many things at once. Can you
> please structure it this way:
> 
>  1) Rename k_clock::clock_get to k_clock::clock_get_timespec and fix up all
>     struct initializers
> 
>  2) Rename the clock_get_timespec functions per instance
> 
>  3) Add the new callback
> 
>  4) Add the new functions per instance and add them to the corresponding
>     struct initializers
> 
>  5) Use the new callback
> 
Thanks,
          Dima

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets
  2019-06-14 13:37   ` Thomas Gleixner
@ 2019-06-16 17:43     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-16 17:43 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On 6/14/19 2:37 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>> ---
>>  fs/timerfd.c                   |  3 +++
>>  include/linux/time_namespace.h | 18 ++++++++++++++++++
>>  kernel/time_namespace.c        | 27 +++++++++++++++++++++++++++
>>  3 files changed, 48 insertions(+)
> 
> Again, please split that into:
> 
>    1) Introduce the new function
> 
>    2) Make use of it

Will do

> 
>> diff --git a/fs/timerfd.c b/fs/timerfd.c
>> index 6a6fc8aa1de7..9b0c2f65e7e8 100644
>> --- a/fs/timerfd.c
>> +++ b/fs/timerfd.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/syscalls.h>
>>  #include <linux/compat.h>
>>  #include <linux/rcupdate.h>
>> +#include <linux/time_namespace.h>
>>  
>>  struct timerfd_ctx {
>>  	union {
>> @@ -196,6 +197,8 @@ static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
>>  	}
>>  
>>  	if (texp != 0) {
>> +		if (flags & TFD_TIMER_ABSTIME)
>> +			texp = timens_ktime_to_host(clockid, texp);
>>  		if (isalarm(ctx)) {
>>  			if (flags & TFD_TIMER_ABSTIME)
>>  				alarm_start(&ctx->t.alarm, texp);
>> diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
>> index 1dda8af6b9fe..d32b55fad953 100644
>> --- a/include/linux/time_namespace.h
>> +++ b/include/linux/time_namespace.h
>> @@ -56,6 +56,19 @@ static inline void timens_add_boottime(struct timespec64 *ts)
>>                  *ts = timespec64_add(*ts, ns_offsets->boottime);
>>  }
>>  
>> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
>> +				struct timens_offsets *offsets);
>> +static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
>> +{
>> +	struct timens_offsets *offsets = current->nsproxy->time_ns->offsets;
>> +
>> +	if (!offsets) /* fast-path for the root time namespace */
> 
> Can you please avoid tail comments? They break the reading flow. Aside from
> that, I don't see the value of documenting the obvious.
> 
>> +ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim, struct timens_offsets *ns_offsets)
> 
> Please line break the arguments
> 
> ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
> 				struct timens_offsets *ns_offsets)

Sure

> 
>> +{
>> +	ktime_t koff;
>> +
>> +	switch (clockid) {
>> +	case CLOCK_MONOTONIC:
>> +		koff = timespec64_to_ktime(ns_offsets->monotonic);
>> +		break;
>> +	case CLOCK_BOOTTIME:
>> +	case CLOCK_BOOTTIME_ALARM:
>> +		koff = timespec64_to_ktime(ns_offsets->boottime);
>> +		break;
>> +	default:
>> +		return tim;
>> +	}
>> +
>> +	/* tim - off has to be in [0, KTIME_MAX) */
> 
> Please elaborate on why the conditions below can happen at all.
> 
>> +	if (tim < koff)
>> +		tim = 0;
>> +	else if (KTIME_MAX - tim < -koff)
>> +		tim = KTIME_MAX;
>> +	else
>> +		tim = ktime_sub(tim, koff);

Thanks,
          Dmitry
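
For illustration only, a minimal userspace sketch of the clamping in
do_timens_ktime_to_host() quoted above. It uses a plain signed 64-bit
nanosecond count as a stand-in for ktime_t; the names ns_t, NS_MAX and
ns_to_host() are hypothetical, and the sketch assumes the namespace-relative
expiry is non-negative:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ktime_t: signed 64-bit nanoseconds. */
typedef int64_t ns_t;
#define NS_MAX INT64_MAX

/*
 * Convert an absolute expiry from namespace time to host time by
 * subtracting the namespace offset, saturating at 0 and NS_MAX instead
 * of wrapping around.
 */
static ns_t ns_to_host(ns_t tim, ns_t off)
{
	assert(tim >= 0);

	/* Positive offset larger than the expiry: result would be negative. */
	if (off > 0 && tim < off)
		return 0;
	/* Negative offset: tim - off would exceed NS_MAX. */
	if (off < 0 && tim > NS_MAX + off)
		return NS_MAX;
	return tim - off;
}
```

With these definitions, ns_to_host(100, 250) saturates to 0 and
ns_to_host(NS_MAX - 10, -100) saturates to NS_MAX, while values in between
are shifted by the offset.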

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 07/28] posix-timers/timens: Take into account clock offsets
  2019-06-14 13:42   ` Thomas Gleixner
@ 2019-06-16 17:45     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-16 17:45 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Eric W. Biederman, H. Peter Anvin, Ingo Molnar,
	Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On 6/14/19 2:42 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> Subject: posix-timers/timens: Take into account clock offsets
> 
> Please avoid that '/timens' appendix. It's not really a new subsystem or
> subfunction of posix-timers.
> 
> posix-timers: Add time namespace support to common_timer_set()

Ok

> 
>> From: Andrei Vagin <avagin@gmail.com>
>>
>> Wire timer_settime() syscall into time namespace virtualization.
> 
> Please explain why this only affects common_timer_set() and not any other
> incarnation along with an explanation why only ABSTIME timers need to be
> converted.

Will do.

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 09/28] timens: Shift /proc/uptime
  2019-06-14 13:50   ` Thomas Gleixner
@ 2019-06-16 17:48     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-16 17:48 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

On 6/14/19 2:50 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
> Again, please use the usual prefix and do not bolt everything onto
> timens. 'timens:' is the proper prefix only for the actual time namespace
> core code.

Yep, will do.

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 15/28] x86/vdso: Add offsets page in vvar
  2019-06-14 13:58   ` Thomas Gleixner
@ 2019-06-16 17:49     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-16 17:49 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86,
	Andrei Vagin

On 6/14/19 2:58 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>>  
>> +#ifdef CONFIG_TIME_NS
>> +notrace static __always_inline void clk_to_ns(clockid_t clk, struct timespec *ts)
>> +{
>> +	struct timens_offsets *timens = (struct timens_offsets *) &timens_page;
>> +	struct timespec64 *offset64;
>> +
>> +	switch (clk) {
>> +	case CLOCK_MONOTONIC:
>> +	case CLOCK_MONOTONIC_COARSE:
>> +	case CLOCK_MONOTONIC_RAW:
>> +		offset64 = &timens->monotonic;
>> +		break;
>> +	case CLOCK_BOOTTIME:
>> +		offset64 = &timens->boottime;
>> +		break;
>> +	default:
>> +		return;
>> +	}
>> +
>> +	ts->tv_nsec += offset64->tv_nsec;
>> +	ts->tv_sec += offset64->tv_sec;
>> +	if (ts->tv_nsec >= NSEC_PER_SEC) {
>> +		ts->tv_nsec -= NSEC_PER_SEC;
>> +		ts->tv_sec++;
>> +	}
>> +	if (ts->tv_nsec < 0) {
>> +		ts->tv_nsec += NSEC_PER_SEC;
>> +		ts->tv_sec--;
>> +	}
> 
> I had to think twice why adding the offset (which can be negative) can
> never result in negative time being returned. A comment explaining this
> would be appreciated.
> 
> As I'm planning to merge Vincenzo's VDSO consolidation into 5.3, can you
> please start working on top of his series, which should hopefully be
> available as the final v7 next week?

Yes, will rebase on top of his series.

Thanks much,
          Dmitry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone()
  2019-06-14 14:05   ` Thomas Gleixner
@ 2019-06-16 17:51     ` Dmitry Safonov
  0 siblings, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-06-16 17:51 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Adrian Reber, Andrei Vagin, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86,
	Andrei Vagin

On 6/14/19 3:05 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
>>  
>> +#ifdef CONFIG_TIME_NS
>> +int vdso_join_timens(struct task_struct *task)
>> +{
>> +	struct mm_struct *mm = task->mm;
>> +	struct vm_area_struct *vma;
>> +
>> +	if (down_write_killable(&mm->mmap_sem))
>> +		return -EINTR;
>> +
>> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +		unsigned long size = vma->vm_end - vma->vm_start;
>> +
>> +		if (vma_is_special_mapping(vma, &vvar_mapping) ||
>> +		    vma_is_special_mapping(vma, &vdso_mapping))
>> +			zap_page_range(vma, vma->vm_start, size);
>> +	}
>> +
>> +	up_write(&mm->mmap_sem);
>> +	return 0;
>> +}
>> +#else /* CONFIG_TIME_NS */
>> +int vdso_join_timens(struct task_struct *task)
>> +{
>> +	return -ENXIO;
>> +}
> 
> Is that else path really required? The callsite is only compiled when
> CONFIG_TIME_NS is enabled, right?

Oh, yes - will drop this.

Thanks,
          Dmitry

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line
  2019-06-14 14:13   ` Thomas Gleixner
@ 2019-06-23  5:26     ` Andrei Vagin
  0 siblings, 0 replies; 47+ messages in thread
From: Andrei Vagin @ 2019-06-23  5:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Safonov, linux-kernel, Adrian Reber, Andrei Vagin,
	Andy Lutomirski, Arnd Bergmann, Christian Brauner,
	Cyrill Gorcunov, Dmitry Safonov, Eric W. Biederman,
	H. Peter Anvin, Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov,
	Pavel Emelyanov, Shuah Khan, Vincenzo Frascino, containers, criu,
	linux-api, x86

On Fri, Jun 14, 2019 at 04:13:31PM +0200, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
> > From: Andrei Vagin <avagin@gmail.com>
> > 
> > After performance testing of the VDSO patches, a noticeable 20% regression
> > was found in the gettime_perf selftest with a cold cache.
> > As it turns out, before the introduction of time namespaces, VDSO functions
> > were well aligned to cache lines, but adding new code to adjust the
> > timens offset inside a namespace created a small shift, and the vdso
> > functions became unaligned to cache lines.
> > 
> > Add alignment to vdso functions with a gcc option to fix the performance drop.
> > 
> > Copying the resulting numbers from the cover letter:
> > 
> > Hot CPU cache (more gettime_perf.c cycles - the better):
> >         | before     | CONFIG_TIME_NS=n | host        | inside timens
> > --------|------------|------------------|-------------|-------------
> > cycles  | 139887013  | 139453003        | 139899785   | 128792458
> > diff (%)| 100        | 99.7             | 100         | 92
> 
> Why is CONFIG_TIME_NS=n behaving worse than current mainline and
> worse than 'host' mode?

We should have specified the precision of these numbers. It is coarser than
this 0.3%, so at the time I decided that there was nothing to worry about. I
did these measurements a few months ago for the second version of this
series. I repeated the measurements for this set of patches:

        | before    | CONFIG_TIME_NS=n | host      | inside timens
--------------------------------------------------------------
        | 144645498 | 142916801        | 140364862 | 132378440
        | 143440633 | 141545739        | 140540053 | 132714190
        | 144876395 | 144650599        | 140026814 | 131843318
        | 143984551 | 144595770        | 140359260 | 131683544
        | 144875682 | 143799788        | 140692618 | 131300332
--------------------------------------------------------------
avg     | 144364551 | 143501739        | 140396721 | 131983964
diff %  | 100       | 99.4             | 97.2      | 91.4
-------------------------------------------------------------
stdev % | 0.4       | 0.9              | 0.1       | 0.4

> 
> > Cold cache (lesser tsc per gettime_perf_cold.c cycle - the better):
> >         | before     | CONFIG_TIME_NS=n | host        | inside timens
> > --------|------------|------------------|-------------|-------------
> > tsc     | 6748       | 6718             | 6862        | 12682
> > diff (%)| 100        | 99.6             | 101.7       | 188
> 
> Weird, now CONFIG_TIME_NS=n is better than current mainline and 'host' mode
> drops.

The precision of these numbers is much lower than that of the previous set.
These numbers are from the second version of this series, so I decided to
repeat the measurements for this version. When I ran the test, I found
some degradation compared with v5.0. I bisected it and found
that the problem is in 2b539aefe9e4 ("mm/resource: Let
walk_system_ram_range() search child resources"). At this point, I
realized that my test isn't quite right. On each iteration, the test
starts a new process, then does start=rdtsc();clock_gettime();end=rdtsc()
and prints (end-start). The problem is that when clock_gettime() is
called for the first time, the vdso pages are not yet mapped into the
process address space, so the test measures how fast the vdso pages are
mapped into the process address space. I modified the test; it now uses the
clflush instruction to drop the CPU caches. Here are the results:

           | before    | CONFIG_TIME_NS=n | host      | inside timens
--------------------------------------------------------------
tsc        | 434       | 433              | 437       | 477
stdev(tsc) | 5         | 5                | 5         | 3
diff (%)   | 1         | 1                | 100.1     | 109

Here is the source code for the modified test:
https://github.com/avagin/linux-task-diag/blob/wip/timens-rfc-v4/tools/testing/selftests/timens/gettime_perf_cold.c

This test does 10K iterations. At first glance, the numbers look
noisy, so I sort them and take only the 8K numbers in the middle:

$ ./gettime_perf_cold > raw
$ cat raw | sort -n | tail -n 9000 | head -n 8000 > results
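
For illustration only, the same trimming can be sketched in C. The helper
names are hypothetical, and the sketch assumes the reported tsc value is
the mean of the kept samples, which is not stated above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two uint64_t values for qsort(). */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Sort the samples in place, drop the `trim` lowest and `trim` highest
 * values, and return the mean of what is left.
 */
static double trimmed_mean(uint64_t *samples, size_t n, size_t trim)
{
	double sum = 0.0;
	size_t i;

	assert(n > 2 * trim);
	qsort(samples, n, sizeof(*samples), cmp_u64);
	for (i = trim; i < n - trim; i++)
		sum += (double)samples[i];
	return sum / (double)(n - 2 * trim);
}
```

For 10K samples, trimmed_mean(samples, 10000, 1000) keeps the same 8K
middle values as the sort/tail/head pipeline.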

> 
> Either I'm misreading the numbers or missing something or I'm just confused
> as usual :)
> 
> Thanks,
>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
> 	tglx

* Re: [PATCHv4 02/28] timens: Add timens_offsets
  2019-06-14 13:11   ` Thomas Gleixner
  2019-06-14 14:32     ` Dmitry Safonov
@ 2019-07-29 22:26     ` Dmitry Safonov
  1 sibling, 0 replies; 47+ messages in thread
From: Dmitry Safonov @ 2019-07-29 22:26 UTC (permalink / raw)
  To: Thomas Gleixner, Dmitry Safonov
  Cc: linux-kernel, Andrei Vagin, Adrian Reber, Andy Lutomirski,
	Arnd Bergmann, Christian Brauner, Cyrill Gorcunov,
	Eric W. Biederman, H. Peter Anvin, Ingo Molnar, Jann Horn,
	Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan,
	Vincenzo Frascino, containers, criu, linux-api, x86

Hi Thomas,

On 6/14/19 2:11 PM, Thomas Gleixner wrote:
> On Wed, 12 Jun 2019, Dmitry Safonov wrote:
> 
>> From: Andrei Vagin <avagin@openvz.org>
>>
>> Introduce offsets for time namespace. They will contain an adjustment
>> needed to convert clocks to/from host's.
>>
>> Allocate one page for each time namespace that will be premapped into
>> userspace among vvar pages.
>> index 000000000000..7d7cb68ea778
>> --- /dev/null
>> +++ b/include/linux/timens_offsets.h
>> @@ -0,0 +1,8 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_TIME_OFFSETS_H
>> +#define _LINUX_TIME_OFFSETS_H
>> +
>> +struct timens_offsets {
>> +};
> 
> That empty struct which is nowhere used looks odd. Can you move that to the
> patch which actually makes use of it?

I tried moving the structure into the patch
[PATCHv4 05/28] timens: Introduce CLOCK_BOOTTIME offset
but that resulted in an ugly patch.

Then I tried making it an opaque type here [to keep
allocation/freeing in one commit] and changing it to a full structure in
the following CLOCK_BOOTTIME patch, but that wasn't any prettier.
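
For readers unfamiliar with the opaque-type approach mentioned above, here
is a minimal userspace sketch in C. It is not code from the patch set: the
helper names, the field, and the use of calloc() are assumptions made
purely for illustration (the series itself allocates a page per time
namespace).

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Stage 1: what an earlier patch could expose. The type is incomplete
 * (opaque): callers may hold and pass pointers to it, but cannot see its
 * size or its fields.
 */
struct timens_offsets;

/* Hypothetical helpers, named here for illustration only. */
struct timens_offsets *timens_offsets_alloc(void);
void timens_offsets_free(struct timens_offsets *offs);
long long timens_offsets_boottime(const struct timens_offsets *offs);

/*
 * Stage 2: what a later patch would add. The full definition appears
 * once there is a real member to store. The field name is hypothetical.
 */
struct timens_offsets {
	long long boottime_sec;
};

struct timens_offsets *timens_offsets_alloc(void)
{
	/* calloc() zero-fills, so a new namespace starts with no offset. */
	return calloc(1, sizeof(struct timens_offsets));
}

void timens_offsets_free(struct timens_offsets *offs)
{
	free(offs);
}

long long timens_offsets_boottime(const struct timens_offsets *offs)
{
	assert(offs != NULL);
	return offs->boottime_sec;
}
```

One consequence visible in the sketch: sizeof cannot be applied to an
incomplete type, so the allocator has to be compiled where the full
definition is visible.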

So we [with Andrei] have addressed all of your comments except this one
in v5.

Just to let you know: we haven't silently ignored your review, but
found that it might be prettier to keep the patch as-is.
I'll move it in v6 if that still makes sense after v5.

Thanks much for your time,
          Dmitry

end of thread, other threads:[~2019-07-29 22:26 UTC | newest]

Thread overview: 47+ messages
2019-06-12 19:25 [PATCHv4 00/28] kernel: Introduce Time Namespace Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 01/28] ns: " Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 02/28] timens: Add timens_offsets Dmitry Safonov
2019-06-14 13:11   ` Thomas Gleixner
2019-06-14 14:32     ` Dmitry Safonov
2019-07-29 22:26     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 03/28] posix-clocks: add another call back to return clock time in ktime_t Dmitry Safonov
2019-06-14 13:32   ` Thomas Gleixner
2019-06-14 14:39     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 04/28] timens: Introduce CLOCK_MONOTONIC offsets Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 05/28] timens: Introduce CLOCK_BOOTTIME offset Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 06/28] timerfd/timens: Take into account ns clock offsets Dmitry Safonov
2019-06-14 13:37   ` Thomas Gleixner
2019-06-16 17:43     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 07/28] posix-timers/timens: Take into account " Dmitry Safonov
2019-06-14 13:42   ` Thomas Gleixner
2019-06-16 17:45     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 08/28] timens/kernel: Take into account timens clock offsets in clock_nanosleep Dmitry Safonov
2019-06-14 13:49   ` Thomas Gleixner
2019-06-12 19:26 ` [PATCHv4 09/28] timens: Shift /proc/uptime Dmitry Safonov
2019-06-14 13:50   ` Thomas Gleixner
2019-06-16 17:48     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 10/28] x86/vdso2c: Correct err messages on file opening Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 11/28] x86/vdso2c: Convert iterator to unsigned Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 12/28] x86/vdso/Makefile: Add vobjs32 Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 13/28] x86/vdso: Restrict splitting VVAR VMA Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 14/28] x86/vdso: Rename vdso_image {.data=>.text} Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 15/28] x86/vdso: Add offsets page in vvar Dmitry Safonov
2019-06-14 13:58   ` Thomas Gleixner
2019-06-16 17:49     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 16/28] x86/vdso: Allocate timens vdso Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 17/28] x86/vdso: Switch image on setns()/unshare()/clone() Dmitry Safonov
2019-06-14 14:05   ` Thomas Gleixner
2019-06-16 17:51     ` Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 18/28] vdso: introduce timens_static_branch Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 19/28] timens: Add align for timens_offsets Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 20/28] timens/fs/proc: Introduce /proc/pid/timens_offsets Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 21/28] selftest/timens: Add Time Namespace test for supported clocks Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 22/28] selftest/timens: Add a test for timerfd Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 23/28] selftest/timens: Add a test for clock_nanosleep() Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 24/28] selftest/timens: Add procfs selftest Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 25/28] selftest/timens: Add timer offsets test Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 26/28] x86/vdso: Align VDSO functions by CPU L1 cache line Dmitry Safonov
2019-06-14 14:13   ` Thomas Gleixner
2019-06-23  5:26     ` Andrei Vagin
2019-06-12 19:26 ` [PATCHv4 27/28] selftests: Add a simple perf test for clock_gettime() Dmitry Safonov
2019-06-12 19:26 ` [PATCHv4 28/28] selftest/timens: Check that a right vdso is mapped after fork and exec Dmitry Safonov
