linux-kselftest.vger.kernel.org archive mirror
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: [RFC 00/20] ns: Introduce Time Namespace
Date: Tue, 25 Sep 2018 00:02:32 +0200
Message-ID: <874leezh8n.fsf@xmission.com>
Message-ID: <20180924220232.9pPQ5qJOc4m49hy1E7y7wRzLWdr3yfYNisem6WC8V2M@z>
In-Reply-To: <20180924205119.GA14833@outlook.office365.com> (Andrey Vagin's message of "Mon, 24 Sep 2018 20:51:33 +0000")

Andrey Vagin <avagin at virtuozzo.com> writes:

> On Fri, Sep 21, 2018 at 02:27:29PM +0200, Eric W. Biederman wrote:
>> Dmitry Safonov <dima at arista.com> writes:
>> 
>> > Discussions around time virtualization have been going on for a long time.
>> > The first attempt to implement time namespace was in 2006 by Jeff Dike.
>> > Since then, the topic has come up on and off in various discussions.
>> >
>> > There are two main use cases for time namespaces:
>> > 1. change date and time inside a container;
>> > 2. adjust clocks for a container restored from a checkpoint.
>> >
>> > “It seems like this might be one of the last major obstacles keeping
>> > migration from being used in production systems, given that not all
>> > containers and connections can be migrated as long as a time dependency
>> > is capable of messing it up.” (by github.com/dav-ell)
>> >
>> > The kernel provides access to several clocks: CLOCK_REALTIME,
>> > CLOCK_MONOTONIC, CLOCK_BOOTTIME. The last two clocks are monotonic, but the
>> > start points for them are not defined and are different for each running
>> > system. When a container is migrated from one node to another, all
>> > clocks have to be restored into consistent states; in other words, they
>> > have to continue running from the same points where they have been
>> > dumped.
>> >
>> > The main idea behind this patch set is adding per-namespace offsets for
>> > system clocks. When a process in a non-root time namespace requests
>> > time of a clock, a namespace offset is added to the current value of
>> > this clock on a host and the sum is returned.
>> >
>> > All offsets are placed on a separate page, which allows us to map it as
>> > part of vvar into user processes and use the offsets from vdso calls.
>> >
>> > Now offsets are implemented for CLOCK_MONOTONIC and CLOCK_BOOTTIME
>> > clocks.
>> >
>> > Questions to discuss:
>> >
>> > * Clone flags exhaustion. Currently there is only one unused clone flag
>> > bit left, and it may be worth using it to extend the arguments of the clone
>> > system call.
>> >
>> > * Realtime clock implementation details:
>> >   Is having a simple offset enough?
>> >   What to do when date and time is changed on the host?
>> >   Is there a need to adjust vfs modification and creation times? 
>> >   Implementation for adjtime() syscall.
>> 
>> Overall I support this effort.  In my quick skim this code looked good.
>
> Hi Eric,
>
> Thank you for the feedback.
>
>> 
>> My feeling is that we need to be able to support running ntpd and
>> support one namespace doing Google's smoothing of leap seconds while
>> another namespace takes the leap second.
>> 
>> What I was imagining when I was last thinking about this was one
>> instance of struct timekeeper aka tk_core per time namespace.  That
>> structure already keeps offsets for all of the various clocks from
>> the kernel internal time sources.  What would be needed would be to
>> pass in an appropriate time namespace pointer.
>> 
>> I could be completely wrong as I have not taken the time to completely
>> trace through the code.  Have you looked at pushing the time namespace
>> down as far as tk_core?
>> 
>> What I think would be the big advantage (besides ntp working) is that
>> the bulk of the code could be reused.  Allowing testing of the kernel's
>> time code by setting up a new time namespace.  So a person in production
>> could set up a time namespace with the time set ahead a little bit and
>> be able to verify that the kernel handles the upcoming leap second
>> properly.
>>
>
> It is an interesting idea, but I have a few questions:
>
> 1. Does it mean that timekeeping_update() will be called for each
> namespace? This function is called periodically; it updates times in the
> timekeeper structure, updates vsyscall_gtod_data, etc. What would the
> overhead of this be?

I don't know if periodically is a proper characterization.  There may be
a code path that does that.  But from what I can see timekeeping_update
is the guts of settimeofday (and a few related functions).

So it appears to make sense for timekeeping_update to be per namespace.

Hmm.  Looking at what is updated in the vsyscall_gtod_data it does
look like you would have to periodically update things, but I don't know
how big that period would be.  As long as the period is reasonably large,
or the time namespaces were sufficiently desynchronized it should not
be a problem.  But that is the class of problem that could make
my ideal impractical if there is measurable overhead.

Where were you seeing timekeeping_update being called periodically?
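
The overhead question can be made concrete with a toy model. The sketch
below is not kernel code: GtodPage, NamespaceTimekeeper and
timekeeping_update_all are hypothetical names, and the sequence counter
is only loosely modeled on the one the vdso data uses. It shows that,
with one timekeeper per namespace, every update pass has to rewrite one
page per namespace, so the cost grows linearly with the number of time
namespaces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GtodPage:
    """Hypothetical stand-in for one per-namespace vsyscall data page."""
    seq: int = 0       # even: stable, odd: update in progress
    wall_ns: int = 0   # time as seen from inside the namespace


@dataclass
class NamespaceTimekeeper:
    """Hypothetical per-namespace timekeeper: one offset, one page."""
    wall_offset_ns: int = 0
    page: GtodPage = field(default_factory=GtodPage)


def timekeeping_update_all(timekeepers: List[NamespaceTimekeeper],
                           host_ns: int) -> int:
    """Refresh every namespace's page; return the number of pages written."""
    writes = 0
    for tk in timekeepers:
        tk.page.seq += 1                              # mark page as in flux
        tk.page.wall_ns = host_ns + tk.wall_offset_ns
        tk.page.seq += 1                              # mark page as stable
        writes += 1
    return writes


if __name__ == "__main__":
    tks = [NamespaceTimekeeper(), NamespaceTimekeeper(wall_offset_ns=10**9)]
    print(timekeeping_update_all(tks, host_ns=500), "pages written")
```

Whether that linear cost matters depends on how often the update runs,
which is exactly the open question above.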

> 2. What will we do with vdso? It looks like we will have to have a
> separate vsyscall_gtod_data for each ns and update each of them
> separately.

Yes.  But you don't have to introduce another variable; just make
certain vsyscall_gtod_data is a page-aligned thing per time namespace.

If I read the summary of the existing patchset something very similar
is already going on.

Each process would only map one.  And unshare of the time namespace
would need to act like the pid namespace or be limited to only being
allowed when there is only a single task using the mm.
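
For comparison, the per-namespace offset scheme described in the quoted
cover letter can be sketched in userspace as follows. This is a minimal
illustration under stated assumptions, not the patch set's
implementation: TimeNamespace, host_clock_ns and offsets_for_restore are
hypothetical names, and time.monotonic_ns() stands in for the host's
clocks. A read inside the namespace returns the host value plus the
namespace offset, and on restore the offset is chosen so that the clock
continues from the value at which it was dumped.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

CLOCK_MONOTONIC = "CLOCK_MONOTONIC"
CLOCK_BOOTTIME = "CLOCK_BOOTTIME"


def host_clock_ns(clock: str) -> int:
    """Stand-in for the host clock; a real host keeps one value per clock."""
    return time.monotonic_ns()


@dataclass
class TimeNamespace:
    """Hypothetical model: one offset in nanoseconds per clock id."""
    offsets_ns: Dict[str, int] = field(default_factory=dict)

    def clock_gettime_ns(self, clock: str,
                         host: Callable[[str], int] = host_clock_ns) -> int:
        # Host value plus namespace offset; no offset means the host view.
        return host(clock) + self.offsets_ns.get(clock, 0)


def offsets_for_restore(dumped_ns: Dict[str, int],
                        host: Callable[[str], int] = host_clock_ns
                        ) -> Dict[str, int]:
    """Pick offsets so each clock resumes from its dumped value."""
    return {clock: value - host(clock) for clock, value in dumped_ns.items()}


if __name__ == "__main__":
    dumped = {CLOCK_MONOTONIC: 5_000_000_000, CLOCK_BOOTTIME: 7_000_000_000}
    ns = TimeNamespace(offsets_for_restore(dumped))
    print(ns.clock_gettime_ns(CLOCK_MONOTONIC))
```

In this model the namespace adds one addition per clock read and no
periodic work, which is the trade-off against a full timekeeper per
namespace.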

Eric
