Linux-api Archive on lore.kernel.org
From: Michael Kerrisk <mtk.manpages@gmail.com>
To: Dmitry Safonov <dima@arista.com>, Andrei Vagin <avagin@openvz.org>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Dmitry Safonov <0x7f454c46@gmail.com>,
	Adrian Reber <adrian@lisas.de>, Andy Lutomirski <luto@kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jann Horn <jannh@google.com>, Jeff Dike <jdike@addtoit.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Shuah Khan <shuah@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	containers <containers@lists.linux-foundation.org>,
	criu@openvz.org, Linux API <linux-api@vger.kernel.org>,
	x86@kernel.org, Andrei Vagin <avagin@gmail.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Time Namespaces: CLONE_NEWTIME and clone3()?
Date: Mon, 17 Feb 2020 15:20:55 +0100
Message-ID: <CAHO5Pa2_7BzZPCXjFj4f=YoX28M4q2Au=h6GrzN-EjRffMo1iw@mail.gmail.com>
In-Reply-To: <20191112012724.250792-4-dima@arista.com>

Hello Dmitry, Andrei,

Is the CLONE_NEWTIME flag intended to be usable with clone3()? The
mail quoted below implies (in my reading) that this should be possible
once clone3() is available, which it is by now. (See also [1].)

If the answer is yes (that is, CLONE_NEWTIME should be usable with
clone3()), then I have a bug report and a question.

I successfully used CLONE_NEWTIME with unshare(). But if I try to use
CLONE_NEWTIME with clone3(), it fails with EINVAL because of the
following check in clone3_args_valid():

        /*
         * - make the CLONE_DETACHED bit reuseable for clone3
         * - make the CSIGNAL bits reuseable for clone3
         */
        if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
                return false;

The problem is that CLONE_NEWTIME matches one of the bits in the
CSIGNAL mask. If the intention is to allow CLONE_NEWTIME with
clone3(), then either the bit needs to be redefined, or the error
checking in clone3_args_valid() needs to be reworked.
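
To make the overlap concrete, the following userspace sketch copies the
flag values from include/uapi/linux/sched.h and restates the quoted test
as a hypothetical helper, clone3_rejects_flags(). The second helper,
clone3_rejects_flags_reworked(), is only an assumption about what a
reworked check might look like (exempting the CLONE_NEWTIME bit from the
CSIGNAL test); it is an illustration, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from include/uapi/linux/sched.h. */
#define CSIGNAL        0x000000ffULL  /* exit-signal field (low 8 bits) */
#define CLONE_NEWTIME  0x00000080ULL  /* new time namespace */
#define CLONE_NEWNS    0x00020000ULL  /* new mount namespace */
#define CLONE_DETACHED 0x00400000ULL  /* unused, ignored */

/* CLONE_NEWTIME is bit 7, the highest bit of the CSIGNAL field. */
static_assert((CLONE_NEWTIME & CSIGNAL) == CLONE_NEWTIME,
              "CLONE_NEWTIME lies inside CSIGNAL");

/* Mirrors the quoted check in clone3_args_valid(): true means
 * clone3() fails with EINVAL. */
static bool clone3_rejects_flags(uint64_t flags)
{
    return (flags & (CLONE_DETACHED | CSIGNAL)) != 0;
}

/* Hypothetical rework: keep rejecting CLONE_DETACHED and the other
 * CSIGNAL bits, but let the CLONE_NEWTIME bit through. */
static bool clone3_rejects_flags_reworked(uint64_t flags)
{
    return (flags & (CLONE_DETACHED | (CSIGNAL & ~CLONE_NEWTIME))) != 0;
}
```

With these definitions, clone3_rejects_flags(CLONE_NEWTIME) is true,
matching the EINVAL described above, while the reworked variant accepts
CLONE_NEWTIME but still rejects an exit signal such as SIGCHLD (17)
passed in the flags.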

And my question: if CLONE_NEWTIME is intended to be usable with
clone3(), how should that work? clone3(CLONE_NEWTIME) creates a child
process in a new time namespace but, as I understand it, the offsets in
/proc/PID/timens_offsets must be written before the first process is
created in (or joins, via setns()) the new namespace. What am I missing?

Thanks,

Michael

[1] The message for commit 769071ac9f20b6a447410c7eaa55d1a5233ef40c
implies rather more strongly that clone3() should be able to use
CLONE_NEWTIME, but perhaps that is a result of Thomas's fix-up:

[[
    All available clone flags have been used, so CLONE_NEWTIME uses the highest
    bit of CSIGNAL. It means that it can be used only with the unshare() and
    the clone3() system calls.

    [ tglx: Adjusted paragraph about clone3() to reality and massaged the
            changelog a bit. ]
]]

On Tue, Nov 12, 2019 at 2:31 AM Dmitry Safonov <dima@arista.com> wrote:
>
> From: Andrei Vagin <avagin@openvz.org>
>
> A time namespace isolates clock values.
>
> The kernel provides access to several clocks: CLOCK_REALTIME,
> CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.
>
> CLOCK_REALTIME
>       System-wide clock that measures real (i.e., wall-clock) time.
>
> CLOCK_MONOTONIC
>       Clock that cannot be set and represents monotonic time since
>       some unspecified starting point.
>
> CLOCK_BOOTTIME
>       Identical to CLOCK_MONOTONIC, except it also includes any time
>       that the system is suspended.
>
> For many users, the time namespace means the ability to change the date
> and time in a container (CLOCK_REALTIME).
>
> But in the context of checkpoint/restore functionality, the monotonic and
> boottime clocks become interesting. Both clocks are monotonic with
> unspecified starting points. These clocks are widely used to measure time
> slices and set timers. After restoring or migrating processes, we have to
> guarantee that they never go backward. Ideally, these clocks should
> behave as they would if the whole system had been suspended. All this
> means that we need to be able to set the CLOCK_MONOTONIC and
> CLOCK_BOOTTIME clocks, which can be done by adding per-namespace
> offsets for clocks.
>
> A time namespace is similar to a pid namespace in the way it is
> created: the unshare(CLONE_NEWTIME) system call creates a new time
> namespace but doesn't move the current process into it. Then all
> children of the process will be born in the new time namespace, or a
> process can use the setns() system call to join a namespace.
>
> This scheme allows clock offsets to be set for a namespace before any
> processes appear in it.
>
> All available clone flags have been used, so CLONE_NEWTIME uses the
> highest bit of CSIGNAL. This means that it can be used only with the
> unshare() system call. Right now, this works for us, because time
> namespace offsets can be set only while a new time namespace is not yet
> populated. In the future, we will have the clone3() system call [1],
> which will allow the CSIGNAL mask to be used for clone flags.
>
> [1]: https://lkml.kernel.org/r/20190604160944.4058-1-christian@brauner.io
>
> Link: https://criu.org/Time_namespace
> Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> Co-developed-by: Dmitry Safonov <dima@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
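
The clocks listed in the quoted patch description can be read with
clock_gettime(). The sketch below, with hypothetical helper names, reads
CLOCK_MONOTONIC and then CLOCK_BOOTTIME and checks the relation described
there: CLOCK_BOOTTIME counts suspended time on top of CLOCK_MONOTONIC. It
assumes the caller is in the initial time namespace, since per-namespace
offsets for the two clocks are set independently.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <time.h>

/* Read a clock as nanoseconds; returns -1 on failure. */
static long long clock_ns(clockid_t clk)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts) == -1)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* CLOCK_BOOTTIME is CLOCK_MONOTONIC plus time spent in suspend, so in
 * the initial time namespace a BOOTTIME reading taken after a
 * MONOTONIC reading is never smaller. */
static bool boottime_not_behind_monotonic(void)
{
    long long mono = clock_ns(CLOCK_MONOTONIC);
    long long boot = clock_ns(CLOCK_BOOTTIME);
    return mono >= 0 && boot >= mono;
}
```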

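The per-namespace offsets described in the quoted patch can be pictured
as simple addition: a process in the namespace sees the host's clock
value plus the namespace's offset. The sketch below is a hypothetical
userspace model of that arithmetic, not the kernel's implementation, and
it assumes both tv_nsec fields are already normalized to [0, 1e9).

```c
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical model: clock value seen inside a time namespace =
 * host clock value + per-namespace offset. Both tv_nsec fields are
 * assumed to lie in [0, NSEC_PER_SEC); offset.tv_sec may be negative. */
static struct timespec timens_apply_offset(struct timespec host,
                                           struct timespec offset)
{
    struct timespec t;
    t.tv_sec = host.tv_sec + offset.tv_sec;
    t.tv_nsec = host.tv_nsec + offset.tv_nsec;
    if (t.tv_nsec >= NSEC_PER_SEC) {   /* carry nanoseconds into seconds */
        t.tv_sec += 1;
        t.tv_nsec -= NSEC_PER_SEC;
    }
    return t;
}
```

For example, a restore tool that wants CLOCK_MONOTONIC to resume from the
value it had at checkpoint time could choose an offset equal to the
checkpointed value minus the current host value; that offset may well be
negative.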

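The creation sequence described in the quoted patch (unshare(), set the
offsets, then let children be born in the namespace) might look like the
following sketch. It assumes it runs with sufficient privilege (e.g. as
root), that /proc/PID/timens_offsets accepts lines of the form
"<clock-id> <secs> <nanosecs>" with a numeric clock ID, and it defines
CLONE_NEWTIME itself in case the installed headers predate it. Both
function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef CLONE_NEWTIME
#define CLONE_NEWTIME 0x00000080   /* value from include/uapi/linux/sched.h */
#endif

/* Format one "<clock-id> <secs> <nanosecs>" line using a numeric clock ID.
 * Returns the line length, or -1 if it does not fit in buf. */
static int format_timens_offset(char *buf, size_t len, clockid_t clk,
                                long long secs, long nsecs)
{
    int n = snprintf(buf, len, "%d %lld %ld\n", (int)clk, secs, nsecs);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical helper: create a time namespace for future children, give
 * it a CLOCK_MONOTONIC offset while it is still empty, then fork a child
 * that is born inside it. Returns 0 on success, -1 with errno set. */
static int run_child_in_new_timens(long long mono_offset_secs)
{
    char line[64];
    int len = format_timens_offset(line, sizeof line, CLOCK_MONOTONIC,
                                   mono_offset_secs, 0);
    if (len < 0) {
        errno = EOVERFLOW;
        return -1;
    }

    /* The caller stays in its current time namespace. */
    if (unshare(CLONE_NEWTIME) == -1)
        return -1;

    /* After unshare(), this file describes the children's namespace;
     * offsets can only be written before any process has entered it. */
    int fd = open("/proc/self/timens_offsets", O_WRONLY);
    if (fd == -1)
        return -1;
    ssize_t w = write(fd, line, (size_t)len);
    int saved_errno = errno;
    close(fd);
    if (w != len) {
        errno = (w == -1) ? saved_errno : EIO;
        return -1;
    }

    fflush(stdout);                /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                /* child: first process in the namespace */
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
            printf("child CLOCK_MONOTONIC: %lld s\n", (long long)ts.tv_sec);
        fflush(stdout);
        _exit(0);
    }
    return waitpid(pid, NULL, 0) == -1 ? -1 : 0;
}
```

A caller would invoke run_child_in_new_timens(86400) and check for -1;
the child's CLOCK_MONOTONIC reading should then be about one day ahead of
the parent's. Because the offset write has to precede the fork, the
sketch depends on exactly the ordering that the clone3(CLONE_NEWTIME)
question above is about.
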
-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

Thread overview: 56+ messages
2019-11-12  1:26 [PATCHv8 00/34] kernel: Introduce Time Namespace Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 01/34] lib/vdso: Add unlikely() hint into vdso_read_begin() Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 02/34] lib/vdso: make do_hres and do_coarse as __always_inline Dmitry Safonov
2020-01-10  9:45     ` Vincenzo Frascino
2020-01-10 11:42       ` Thomas Gleixner
2020-01-10 11:47           ` Vincenzo Frascino
2020-01-10 12:02             ` Thomas Gleixner
2020-01-10 12:18                 ` Vincenzo Frascino
2020-01-13  5:27               ` Andrei Vagin
2019-11-12  1:26 ` [PATCHv8 03/34] ns: Introduce Time Namespace Dmitry Safonov
2020-01-27 14:12     ` Dmitry Vyukov
2020-01-27 14:19         ` Dmitry Safonov
2020-02-17 14:20   ` Michael Kerrisk [this message]
2020-02-17 14:59     ` Time Namespaces: CLONE_NEWTIME and clone3()? Christian Brauner
2020-02-17 21:47       ` Michael Kerrisk (man-pages)
2020-02-17 23:03         ` Christian Brauner
2020-02-17 23:29           ` Thomas Gleixner
2020-02-18  2:37             ` Eric W. Biederman
2020-02-18 17:11           ` Adrian Reber
2020-02-18 17:26             ` Christian Brauner
2019-11-12  1:26 ` [PATCHv8 04/34] time: Add timens_offsets to be used for tasks in timens Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 05/34] posix-clocks: Rename the clock_get() callback to clock_get_timespec() Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 06/34] posix-clocks: Rename .clock_get_timespec() callbacks accordingly Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 07/34] alarmtimer: Rename gettime() callback to get_ktime() Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 08/34] alarmtimer: Provide get_timespec() callback Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 09/34] posix-clocks: Introduce clock_get_ktime() callback Dmitry Safonov
2019-11-12  1:26 ` [PATCHv8 10/34] posix-timers: Use clock_get_ktime() in common_timer_get() Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 11/34] posix-clocks: Wire up clock_gettime() with timens offsets Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 12/34] kernel: Add do_timens_ktime_to_host() helper Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 13/34] timerfd: Make timerfd_settime() time namespace aware Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 14/34] posix-timers: Make timer_settime() " Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 15/34] alarmtimer: Make nanosleep " Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 16/34] hrtimers: Prepare hrtimer_nanosleep() for time namespaces Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 17/34] posix-timers: Make clock_nanosleep() time namespace aware Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 18/34] fs/proc: Respect boottime inside time namespace for /proc/uptime Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 19/34] x86/vdso: Restrict splitting VVAR VMA Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 20/34] lib/vdso: Prepare for time namespace support Dmitry Safonov
2020-01-12 10:32     ` Thomas Gleixner
2019-11-12  1:27 ` [PATCHv8 21/34] x86/vdso: Provide vdso_data offset on vvar_page Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 22/34] x86/vdso: Add timens page Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 23/34] time: Allocate per-timens vvar page Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 24/34] x86/vdso: Handle faults on timens page Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 25/34] x86/vdso: On timens page fault prefault also VVAR page Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 26/34] x86/vdso: Zap vvar pages on switch a time namspace Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 27/34] fs/proc: Introduce /proc/pid/timens_offsets Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 28/34] selftests/timens: Add Time Namespace test for supported clocks Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 29/34] selftests/timens: Add a test for timerfd Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 30/34] selftests/timens: Add a test for clock_nanosleep() Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 31/34] selftests/timens: Add procfs selftest Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 32/34] selftests/timens: Add timer offsets test Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 33/34] selftests/timens: Add a simple perf test for clock_gettime() Dmitry Safonov
2019-11-12  1:27 ` [PATCHv8 34/34] selftests/timens: Check for right timens offsets after fork and exec Dmitry Safonov
2019-11-21 18:05 ` [PATCHv8 00/34] kernel: Introduce Time Namespace Andrei Vagin
2019-12-11 20:38   ` Dmitry Safonov
2020-01-09 21:09     ` Thomas Gleixner
2020-01-10  9:52       ` Vincenzo Frascino
