From: David Rheinsberg
Date: Sun, 12 Apr 2020 15:44:44 +0200
Subject: Re: [PATCH 2/8] loopfs: implement loopfs
To: Christian Brauner
Cc: Jens Axboe, Greg Kroah-Hartman, lkml, linux-block@vger.kernel.org,
    linux-api@vger.kernel.org, Jonathan Corbet, Serge Hallyn,
    "Rafael J. Wysocki", Tejun Heo, "David S. Miller", Saravana Kannan,
    Jan Kara, David Howells, Seth Forshee, Tom Gundersen,
    Christian Kellner, Dmitry Vyukov, Stéphane Graber,
    linux-doc@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20200412120300.vuigwofazxfbxluu@wittgenstein>
References: <20200408152151.5780-1-christian.brauner@ubuntu.com>
    <20200408152151.5780-3-christian.brauner@ubuntu.com>
    <20200409082659.exequ3evhlv33csr@wittgenstein>
    <20200412120300.vuigwofazxfbxluu@wittgenstein>
X-Mailing-List: linux-block@vger.kernel.org

Hi

On Sun, Apr 12, 2020 at 2:03 PM Christian Brauner wrote:
[...]
> On Sun, Apr 12, 2020 at 12:38:54PM +0200, David Rheinsberg wrote:
> > which scenario the limit would be useful.
> > Anyone can create a user-ns,
> > create a new loopfs mount, and just happily create more loop-devices.
> > So what is so special that you want to restrict the devices on a
> > _single_ mount instance?
>
> To share that instance across namespaces. You can e.g. create the
> mount instance in one mount namespace owned by userns1, create a second
> user namespace usern2 with the same mapping which is blocked from
> creating additional user namespaces either by seccomp or by
> /proc/sys/user/max_user_namespaces or lsms what have you. Because it
> doesn't own the mount namespace the loopfs mount it is in it can't
> remount it and can't exceed the local limit.

Right. But now you re-use the userns-limit to also limit loopfs (or
other userns restrictions to limit loopfs access). Existing safe
setups allow contained processes to create their own user-namespace.
With your patchset merged, every such existing contained system with
userns-access gets access to a kernel API that allows them unbound
kernel memory allocations. I don't think you can tell every existing
system to not enable CONFIG_LOOP_FS. Or to make sure to install
seccomp filters before updating their kernels. Right? These setups
already exist, and they happily use distribution kernels.
I think there is no way around `struct user_struct`, `struct ucount`,
or whatever you like.

> > Furthermore, how do you intend to limit user-space from creating an
> > unbound amount of loop devices? Unless I am mistaken, with your
> > proposal *any* process can create a new loopfs with a basically
> > unlimited amount of loop-devices, thus easily triggering unbound
> > kernel allocations. I think this needs to be accounted. The classic
> > way is to put a per-uid limit into `struct user_struct` (done by
> > pipes, mlock, epoll, mq, etc.). An alternative is `struct ucount`,
> > which allows hierarchical management (inotify uses that, as an
> > example).
>
> Yeah, I know. We can certainly do this.

My point is, I think we have to.
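
To make the hierarchical accounting idea concrete, here is a minimal
userspace sketch in C. It is only loosely modelled on how a
`struct ucount`-style counter could charge an allocation to a namespace
and all of its ancestors. The names `counter_node`, `counter_init`,
`counter_charge` and `counter_uncharge` are hypothetical and are not
kernel API. The sketch is also not thread-safe, whereas real kernel
accounting would need atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one level in a namespace hierarchy. */
struct counter_node {
	struct counter_node *parent; /* NULL for the root level */
	long count;                  /* loop devices currently charged here */
	long limit;                  /* maximum allowed at this level */
};

void counter_init(struct counter_node *n, struct counter_node *parent,
		  long limit)
{
	assert(limit >= 0);
	n->parent = parent;
	n->count = 0;
	n->limit = limit;
}

/*
 * Charge one device to `leaf` and to every ancestor. If any level is
 * already at its limit, undo the partial charges and report failure,
 * so a child can never exceed what its ancestors allow.
 */
bool counter_charge(struct counter_node *leaf)
{
	struct counter_node *n, *undo;

	for (n = leaf; n; n = n->parent) {
		if (n->count >= n->limit)
			goto rollback;
		n->count++;
	}
	return true;

rollback:
	/* Only the levels below the failing one were incremented. */
	for (undo = leaf; undo != n; undo = undo->parent)
		undo->count--;
	return false;
}

/* Release one previously charged device at every level. */
void counter_uncharge(struct counter_node *leaf)
{
	struct counter_node *n;

	for (n = leaf; n; n = n->parent) {
		assert(n->count > 0);
		n->count--;
	}
}
```

With a root limit of 3 and two children limited to 2 each, the children
together can hold at most 3 devices, no matter how many additional
child levels are created below the root. That is the property a
per-mount limit alone does not give you.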

[...]
> > With your proposed loop-fs we could achieve something close to it:
> > Mount a private loopfs, create a loop-device, and rely on automatic
> > cleanup when the mount-namespace is destroyed.
>
> With loopfs you can do this with the old or new mount api and you don't
> need to have loopfs mounted for that at all. Here's a sample program
> that works right now with the old mount api:

Yeah, loopfs would certainly allow this, and I would be perfectly
happy with this API. I think it is overly heavy for the use-case we
have, but I do acknowledge that there are other use-cases as well.
But I think your claim that "you don't need to have loopfs mounted" is
misleading. loopfs must be mounted for the entirety of the program.
Instead, you don't have to have it linked in your mount-namespace,
since you can immediately detach it. And with the new mount-APIs, you
don't even need it linked initially, as you can create a detached
mount right away.

Thanks
David