From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lennart Poettering
Date: Tue, 31 Mar 2020 15:24:51 +0000
Subject: Re: Upcoming: Notifications, FS notifications and fsinfo()
Message-Id: <20200331152451.GG27959@gardel-login>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
List-Id:
References: <1445647.1585576702@warthog.procyon.org.uk> <20200330211700.g7evnuvvjenq3fzm@wittgenstein> <20200331083430.kserp35qabnxvths@ws.net.home> <20200331122554.GA27469@gardel-login>
In-Reply-To:
To: Miklos Szeredi
Cc: Karel Zak , Christian Brauner , David Howells , Linus Torvalds , Al Viro , dray@redhat.com, Miklos Szeredi , Steven Whitehouse , Jeff Layton , Ian Kent , andres@anarazel.de, keyrings@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Aleksa Sarai

On Tue, 31.03.20 17:10, Miklos Szeredi (miklos@szeredi.hu) wrote:

> On Tue, Mar 31, 2020 at 2:25 PM Lennart Poettering wrote:
> >
> > On Tue, 31.03.20 10:56, Miklos Szeredi (miklos@szeredi.hu) wrote:
> >
> > > On Tue, Mar 31, 2020 at 10:34 AM Karel Zak wrote:
> > > >
> > > > On Tue, Mar 31, 2020 at 07:11:11AM +0200, Miklos Szeredi wrote:
> > > > > On Mon, Mar 30, 2020 at 11:17 PM Christian Brauner
> > > > > wrote:
> > > > >
> > > > > > Fwiw, putting down my kernel hat and speaking as someone who maintains
> > > > > > two container runtimes and various other low-level bits and pieces in
> > > > > > userspace who'd make heavy use of this stuff I would prefer the fd-based
> > > > > > fsinfo() approach especially in the light of across namespace
> > > > > > operations, querying all properties of a mount atomically all-at-once,
> > > > >
> > > > > fsinfo(2) doesn't meet the atomically all-at-once requirement.
> > > >
> > > > I guess your /proc based idea have exactly the same problem...
> > >
> > > Yes, that's exactly what I wanted to demonstrate: there's no
> > > fundamental difference between the two API's in this respect.
> > >
> > > > I see two possible ways:
> > > >
> > > > - after open("/mnt", O_PATH) create copy-on-write object in kernel to
> > > >   represent mount node -- kernel will able to modify it, but userspace
> > > >   will get unchanged data from the FD until to close()
> > > >
> > > > - improve fsinfo() to provide set (list) of the attributes by one call
> > >
> > > I think we are approaching this from the wrong end. Let's just
> > > ignore all of the proposed interfaces for now and only concentrate on
> > > what this will be used for.
> > >
> > > Start with a set of use cases by all interested parties. E.g.
> > >
> > > - systemd wants to keep track attached mounts in a namespace, as well
> > >   as new detached mounts created by fsmount()
> > >
> > > - systemd need to keep information (such as parent, children, mount
> > >   flags, fs options, etc) up to date on any change of topology or
> > >   attributes.
> >
> > - We also have code that recursively remounts r/o or unmounts some
> >   directory tree (with filters),
>
> Recursive remount-ro is clear. What is not clear is whether you need
> to do this for hidden mounts (not possible from userspace without a
> way to disable mount following on path lookup). Would it make sense
> to add a kernel API for recursive setting of mount flags?

I would be very happy about an explicit kernel API for recursively
toggling MS_RDONLY. But for many use cases in systemd we need the
ability to filter some subdirs and leave them as is, so while helpful
we'd have to keep the userspace code we currently have anyway.

> What exactly is this unmount with filters? Can you give examples?

Hmm, actually it's only the r/o remount that has filters, not the
unmount. Sorry for the confusion.

And the r/o remount with filters just means: "remount everything below
X read-only except for X/Y and X/Z/A"...
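
To make the filter semantics concrete, here is a minimal sketch of the
selection step only. It is not systemd's actual code. The function names
(is_below, select_readonly_targets, read_mount_points) are made up for
illustration, and it assumes that an excepted path also shields every
mount below it and that the excepted paths are mount points themselves.

```python
import os


def is_below(path, root):
    """True if path equals root or lies underneath it, compared per component."""
    path = os.path.normpath(path)
    root = os.path.normpath(root)
    if root == "/":
        return True
    return path == root or path.startswith(root + "/")


def select_readonly_targets(mount_points, root, keep):
    """Pick the mount points at or below root that are not at or below
    any path in keep. Assumption: an excepted path shields its subtree."""
    targets = set()
    for mp in mount_points:
        if not is_below(mp, root):
            continue  # outside the tree we were asked to remount
        if any(is_below(mp, k) for k in keep):
            continue  # filtered out: leave this mount as is
        targets.add(os.path.normpath(mp))
    return sorted(targets)


def read_mount_points(mountinfo="/proc/self/mountinfo"):
    """Field 5 of each mountinfo line is the mount point.
    Octal escapes such as \\040 are not decoded in this sketch."""
    with open(mountinfo) as f:
        return [line.split()[4] for line in f if line.strip()]


# Hypothetical mount table, standing in for read_mount_points().
mounts = ["/", "/X", "/X/Y", "/X/Y/sub", "/X/Z", "/X/Z/A", "/X/Zebra", "/other"]
selected = select_readonly_targets(mounts, "/X", ["/X/Y", "/X/Z/A"])
print(selected)
```

Each selected mount point would then be remounted individually, e.g.
with mount(2) and MS_REMOUNT|MS_BIND|MS_RDONLY. Note that the mount
table can change between reading it and remounting, which is the same
atomicity problem discussed above.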
Lennart

--
Lennart Poettering, Berlin