Date: Tue, 18 Feb 2020 00:03:31 +0100
From: Christian Brauner
To: "Michael Kerrisk (man-pages)"
Cc: Dmitry Safonov, Andrei Vagin, Linux Kernel, Dmitry Safonov <0x7f454c46@gmail.com>, Adrian Reber, Andy Lutomirski, Arnd Bergmann, Cyrill Gorcunov, "Eric W. Biederman", "H. Peter Anvin", Ingo Molnar, Jann Horn, Jeff Dike, Oleg Nesterov, Pavel Emelyanov, Shuah Khan, Thomas Gleixner, Vincenzo Frascino, containers, criu@openvz.org, Linux API, x86@kernel.org, Andrei Vagin
Subject: Re: Time Namespaces: CLONE_NEWTIME and clone3()?
Message-ID: <20200217230331.he6p5bs766zp6smx@wittgenstein>
References: <20191112012724.250792-1-dima@arista.com> <20191112012724.250792-4-dima@arista.com> <20200217145908.7epzz5nescccwvzv@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Feb 17, 2020 at 10:47:53PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Christian,
>
> On Mon, 17 Feb 2020 at 16:15, Christian Brauner wrote:
> >
> > On Mon, Feb 17, 2020 at 03:20:55PM +0100, Michael Kerrisk wrote:
> > > Hello Dmitry, Andrei,
> > >
> > > Is the CLONE_NEWTIME flag intended to be usable with clone3()? The
> > > mail quoted below implies (in my reading) that this should be possible
> > > once clone3() is available, which it is by now. (See also [1].)
> > >
> > > If the answer is yes, CLONE_NEWTIME should be usable with clone3(),
> > > then I have a bug report and a question.
> > >
> > > I successfully used CLONE_NEWTIME with unshare(). But if I try to use
> > > CLONE_NEWSIGNAL with clone3(), it errors out with EINVAL, because of
> >
> > s/CLONE_NEWSIGNAL/CLONE_NEWTIME/
> >
> > > the following check in clone3_args_valid():
> > >
> > >         /*
> > >          * - make the CLONE_DETACHED bit reuseable for clone3
> > >          * - make the CSIGNAL bits reuseable for clone3
> > >          */
> > >         if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
> > >                 return false;
> > >
> > > The problem is that CLONE_NEWTIME matches one of the bits in the
> > > CSIGNAL mask. If the intention is to allow CLONE_NEWTIME with
> > > clone3(), then either the bit needs to be redefined, or the error
> > > checking in clone3_args_valid() needs to be reworked.
> >
> > If this is intended to be useable with clone3() the check should be
> > adapted to allow for CLONE_NEWTIME. (I asked about this a while ago I
> > think.)
> > But below rather sounds like it should simply be an unshare() flag. The
> > code seems to set frozen_offsets to true right after copy_namespaces()
> > in timens_on_fork(new_ns, tsk) and so the offsets can't be changed
> > anymore unless I'm reading this wrong.
> > One alternative seems to be to make timens_offsets writable once after
> > fork and before exec, I guess - though that's probably not going to work
> > with the vdso judging from timens_on_fork().
> >
> > The other alternative is that Andrei and Dmitry send me a patch to
> > enable CLONE_NEWTIME with clone3() by exposing struct timens_offsets (or
> > a version of it) in the uapi and extend struct clone_args to include a
> > pointer to a struct timens_offsets that is _only_ set when CLONE_NEWTIME
> > is set.
> > Though the unshare() way sounds way less invasive and simpler.
>
> Actually, I think the alternative you propose just here is better. I
> imagine there are times when one will want to create multiple
> namespaces with a single call to clone3(), including a time namespace.
> I think this should be allowed by the API. And, otherwise, clone3()
> becomes something of a second-class citizen for creating namespaces.
> (I don't really get the "less invasive" argument. Implementing this is
> just a piece of kernel code to make user-space's life a bit simpler
> and more consistent.)

I don't particularly mind either way. If there are actual users that need
to set it at clone3() time then we can extend it. So I'd like to hear
what Adrian, Dmitry, and Thomas think since they are well-versed in how
this will be used in the wild. I'm wary of exposing a whole new uapi
struct and extending clone3() without any real use-case but I'm happy to
if there is!

Christian
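
For reference, the flag collision discussed in the quoted report can be reproduced in a few lines of userspace C. This is only a sketch: the EX_ constants are local names that mirror what I understand the values in include/uapi/linux/sched.h to be (CSIGNAL 0xff, CLONE_NEWTIME 0x80, CLONE_DETACHED 0x00400000), and ex_flags_valid_relaxed() is a hypothetical way of "adapting the check", not a patch anyone has agreed on in this thread.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Assumed values mirroring include/uapi/linux/sched.h; EX_ names are local to this sketch. */
#define EX_CSIGNAL        0x000000ffULL /* low byte: exit signal in legacy clone() */
#define EX_CLONE_NEWTIME  0x00000080ULL /* new time namespace */
#define EX_CLONE_DETACHED 0x00400000ULL /* unused legacy bit */

/* CLONE_NEWTIME sits inside the low byte covered by CSIGNAL. */
static_assert((EX_CLONE_NEWTIME & EX_CSIGNAL) != 0,
              "CLONE_NEWTIME overlaps the CSIGNAL mask");

/* Same shape as the quoted check: any CSIGNAL or CLONE_DETACHED bit is refused. */
static bool ex_flags_valid_current(uint64_t flags)
{
	return !(flags & (EX_CLONE_DETACHED | EX_CSIGNAL));
}

/* Hypothetical adaptation: carve CLONE_NEWTIME out of the rejected mask. */
static bool ex_flags_valid_relaxed(uint64_t flags)
{
	return !(flags & (EX_CLONE_DETACHED | (EX_CSIGNAL & ~EX_CLONE_NEWTIME)));
}
```

With these definitions, ex_flags_valid_current(EX_CLONE_NEWTIME) is false, which corresponds to the EINVAL seen from clone3(), while ex_flags_valid_relaxed() accepts that flag and still refuses the remaining CSIGNAL bits and CLONE_DETACHED. It says nothing about how the offsets would be passed in, which is the open question above.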