Linux-kselftest Archive on lore.kernel.org
From: dima at arista.com (Dmitry Safonov)
Subject: [RFC 00/20] ns: Introduce Time Namespace
Date: Wed, 19 Sep 2018 21:50:17 +0100
Message-ID: <20180919205037.9574-1-dima@arista.com>

Discussions around time virtualization have been going on for a long
time. The first attempt to implement a time namespace was made in 2006
by Jeff Dike. Since then, the topic has come up on and off in various
discussions.

There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.

“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but
their start points are not defined and differ for each running system.
When a container is migrated from one node to another, all clocks have
to be restored into consistent states; in other words, they have to
continue running from the same points where they were dumped.
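
To see why the start points matter, the three clocks can be sampled on
any two hosts and compared. The following is only a minimal userspace
sketch; read_clocks(), print_clocks() and the IDX_* index names are made
up for this illustration and are not part of the patch set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Index order used by this sketch only (hypothetical convention). */
enum { IDX_REALTIME, IDX_MONOTONIC, IDX_BOOTTIME, NR_CLOCKS };

static const clockid_t clock_ids[NR_CLOCKS] = {
	CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME,
};

static const char *const clock_names[NR_CLOCKS] = {
	"CLOCK_REALTIME", "CLOCK_MONOTONIC", "CLOCK_BOOTTIME",
};

/* Fill out[] with the current value of each clock; 0 on success, -1 on error. */
int read_clocks(struct timespec out[NR_CLOCKS])
{
	assert(out != NULL);
	for (int i = 0; i < NR_CLOCKS; i++)
		if (clock_gettime(clock_ids[i], &out[i]) != 0)
			return -1;
	return 0;
}

/* Print one line per clock as "name seconds.nanoseconds". */
void print_clocks(const struct timespec ts[NR_CLOCKS])
{
	for (int i = 0; i < NR_CLOCKS; i++)
		printf("%-16s %lld.%09ld\n", clock_names[i],
		       (long long)ts[i].tv_sec, ts[i].tv_nsec);
}
```

Calling read_clocks() followed by print_clocks() on two different nodes
will normally show similar CLOCK_REALTIME values but unrelated
CLOCK_MONOTONIC and CLOCK_BOOTTIME values, which is the mismatch a
restored container would observe.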

The main idea behind this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests the
time of a clock, the namespace offset is added to the current value of
this clock on the host and the sum is returned.
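
The arithmetic can be illustrated in userspace. This is a sketch under
assumptions, not the code from the patches: the helper names
(ts_normalize, ts_add, ts_sub, timens_offset_for_restore,
timens_apply_offset) are hypothetical, and it assumes the offset is
chosen at restore time as the dumped clock value minus the destination
host's clock value.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Bring tv_nsec back into [0, NSEC_PER_SEC), carrying into tv_sec. */
struct timespec ts_normalize(struct timespec ts)
{
	while (ts.tv_nsec >= NSEC_PER_SEC) {
		ts.tv_nsec -= NSEC_PER_SEC;
		ts.tv_sec++;
	}
	while (ts.tv_nsec < 0) {
		ts.tv_nsec += NSEC_PER_SEC;
		ts.tv_sec--;
	}
	return ts;
}

/* a + b; both inputs are expected to be normalized already. */
struct timespec ts_add(struct timespec a, struct timespec b)
{
	assert(a.tv_nsec >= 0 && a.tv_nsec < NSEC_PER_SEC);
	assert(b.tv_nsec >= 0 && b.tv_nsec < NSEC_PER_SEC);
	struct timespec r = { a.tv_sec + b.tv_sec, a.tv_nsec + b.tv_nsec };
	return ts_normalize(r);
}

/* a - b; the result may have a negative tv_sec. */
struct timespec ts_sub(struct timespec a, struct timespec b)
{
	struct timespec r = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
	return ts_normalize(r);
}

/* Assumed restore rule: offset = value at dump - host value at restore. */
struct timespec timens_offset_for_restore(struct timespec dumped,
					   struct timespec host_now)
{
	return ts_sub(dumped, host_now);
}

/* What a process in the namespace would see: host value + offset. */
struct timespec timens_apply_offset(struct timespec host,
				    struct timespec offset)
{
	return ts_add(host, offset);
}
```

For example, a clock dumped at 1000.5 s and restored on a host whose
clock reads 20.7 s gives an offset of 979.8 s; 5.2 s later the host
reads 25.9 s and the namespace reads 1005.7 s, so the clock continues
from the point where it was dumped.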

All offsets are placed on a separate page; this allows us to map it as
part of vvar into user processes and use the offsets from vdso calls.
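
As a rough picture of what "a separate page" implies for the data
layout, the sketch below page-aligns a structure holding the two
offsets. The structure name, its fields and the 4 KiB page size are
assumptions made for this illustration; the real layout is defined by
the patches (include/linux/timens_offsets.h) and may differ.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB pages, as on x86 */

/* Hypothetical layout: one offset per supported clock, alone on its page. */
struct timens_offsets_sketch {
	alignas(SKETCH_PAGE_SIZE) struct timespec monotonic;
	struct timespec boottime;
};

/* The alignment pads the structure out to exactly one page. */
static_assert(sizeof(struct timens_offsets_sketch) == SKETCH_PAGE_SIZE,
	      "offsets must occupy exactly one page");
static_assert(alignof(struct timens_offsets_sketch) == SKETCH_PAGE_SIZE,
	      "offsets must start on a page boundary");

/* True if p points at the start of a page. */
int is_page_aligned(const void *p)
{
	return ((uintptr_t)p % SKETCH_PAGE_SIZE) == 0;
}

/* True if every offset is zero, i.e. the namespace sees host time. */
int timens_offsets_are_zero(const struct timens_offsets_sketch *o)
{
	return o->monotonic.tv_sec == 0 && o->monotonic.tv_nsec == 0 &&
	       o->boottime.tv_sec == 0 && o->boottime.tv_nsec == 0;
}
```

Keeping the offsets alone on a page means the page can be mapped
read-only into a process without exposing unrelated kernel data, and a
check like timens_offsets_are_zero() hints at how a fast path could
skip the addition when no offset is set.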

Currently, offsets are implemented for the CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks.

Questions to discuss:

* Clone flags exhaustion. Currently there is only one unused clone flag
bit left, and it may be worth using it to extend the arguments of the
clone system call.

* Realtime clock implementation details:
  Is having a simple offset enough?
  What should happen when the date and time are changed on the host?
  Is there a need to adjust vfs modification and creation times?
  How should the adjtime() syscall be implemented?

Cc: Dmitry Safonov <0x7f454c46 at gmail.com>
Cc: Adrian Reber <adrian at lisas.de>
Cc: Andrei Vagin <avagin at openvz.org>
Cc: Andy Lutomirski <luto at kernel.org>
Cc: Christian Brauner <christian.brauner at ubuntu.com>
Cc: Cyrill Gorcunov <gorcunov at openvz.org>
Cc: "Eric W. Biederman" <ebiederm at xmission.com>
Cc: "H. Peter Anvin" <hpa at zytor.com> 
Cc: Ingo Molnar <mingo at redhat.com>
Cc: Jeff Dike <jdike at addtoit.com>
Cc: Oleg Nesterov <oleg at redhat.com>
Cc: Pavel Emelyanov <xemul at virtuozzo.com>
Cc: Shuah Khan <shuah at kernel.org>
Cc: Thomas Gleixner <tglx at linutronix.de>
Cc: containers at lists.linux-foundation.org
Cc: criu at openvz.org
Cc: linux-api at vger.kernel.org
Cc: x86 at kernel.org

Andrei Vagin (12):
  ns: Introduce Time Namespace
  timens: Add timens_offsets
  timens: Introduce CLOCK_MONOTONIC offsets
  timens: Introduce CLOCK_BOOTTIME offset
  timerfd/timens: Take into account ns clock offsets
  kernel: Take into account timens clock offsets in clock_nanosleep
  x86/vdso/timens: Add offsets page in vvar
  x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
  posix-timers/timens: Take into account clock offsets
  selftest/timens: Add test for timerfd
  selftest/timens: Add test for clock_nanosleep
  timens/selftest: Add timer offsets test

Dmitry Safonov (8):
  timens: Shift /proc/uptime
  x86/vdso: Restrict splitting vvar vma
  x86/vdso: Purge timens page on setns()/unshare()/clone()
  x86/vdso: Look for vvar vma to purge timens page
  timens: Add align for timens_offsets
  timens: Optimize zero-offsets
  selftest: Add Time Namespace test for supported clocks
  timens/selftest: Add procfs selftest

 arch/Kconfig                                     |   5 +
 arch/x86/Kconfig                                 |   1 +
 arch/x86/entry/vdso/vclock_gettime.c             |  52 +++++
 arch/x86/entry/vdso/vdso-layout.lds.S            |   9 +-
 arch/x86/entry/vdso/vdso2c.c                     |   3 +
 arch/x86/entry/vdso/vma.c                        |  67 +++++++
 arch/x86/include/asm/vdso.h                      |   2 +
 fs/proc/namespaces.c                             |   3 +
 fs/proc/uptime.c                                 |   3 +
 fs/timerfd.c                                     |  16 +-
 include/linux/nsproxy.h                          |   1 +
 include/linux/proc_ns.h                          |   1 +
 include/linux/time_namespace.h                   |  72 +++++++
 include/linux/timens_offsets.h                   |  25 +++
 include/linux/user_namespace.h                   |   1 +
 include/uapi/linux/sched.h                       |   1 +
 init/Kconfig                                     |   8 +
 kernel/Makefile                                  |   1 +
 kernel/fork.c                                    |   3 +-
 kernel/nsproxy.c                                 |  19 +-
 kernel/time/hrtimer.c                            |   8 +
 kernel/time/posix-timers.c                       |  89 ++++++++-
 kernel/time/posix-timers.h                       |   2 +
 kernel/time_namespace.c                          | 230 +++++++++++++++++++++++
 tools/testing/selftests/timens/.gitignore        |   5 +
 tools/testing/selftests/timens/Makefile          |   6 +
 tools/testing/selftests/timens/clock_nanosleep.c |  98 ++++++++++
 tools/testing/selftests/timens/config            |   1 +
 tools/testing/selftests/timens/log.h             |  21 +++
 tools/testing/selftests/timens/procfs.c          | 145 ++++++++++++++
 tools/testing/selftests/timens/timens.c          | 196 +++++++++++++++++++
 tools/testing/selftests/timens/timer.c           |  95 ++++++++++
 tools/testing/selftests/timens/timerfd.c         |  96 ++++++++++
 33 files changed, 1272 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/time_namespace.h
 create mode 100644 include/linux/timens_offsets.h
 create mode 100644 kernel/time_namespace.c
 create mode 100644 tools/testing/selftests/timens/.gitignore
 create mode 100644 tools/testing/selftests/timens/Makefile
 create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
 create mode 100644 tools/testing/selftests/timens/config
 create mode 100644 tools/testing/selftests/timens/log.h
 create mode 100644 tools/testing/selftests/timens/procfs.c
 create mode 100644 tools/testing/selftests/timens/timens.c
 create mode 100644 tools/testing/selftests/timens/timer.c
 create mode 100644 tools/testing/selftests/timens/timerfd.c

-- 
2.13.6
