From: Miklos Szeredi
Date: Thu, 19 Mar 2020 13:36:58 +0100
Subject: Re: [PATCH 00/13] VFS: Filesystem information [ver #19]
To: David Howells
Cc: Linus Torvalds, Al Viro, Linux NFS list, Andreas Dilger, Anna Schumaker,
 "Theodore Ts'o", Linux API, linux-ext4@vger.kernel.org, Trond Myklebust,
 Ian Kent, Miklos Szeredi, Christian Brauner, Jann Horn, "Darrick J. Wong",
 Karel Zak, Jeff Layton, linux-fsdevel@vger.kernel.org, LSM,
 linux-kernel@vger.kernel.org
In-Reply-To: <3085880.1584614257@warthog.procyon.org.uk>
References: <158454408854.2864823.5910520544515668590.stgit@warthog.procyon.org.uk>
 <3085880.1584614257@warthog.procyon.org.uk>

On Thu, Mar 19, 2020 at 11:37 AM David Howells wrote:
>
> Miklos Szeredi wrote:
>
> > > (2) It's more efficient as we can return specific binary data rather than
> > >     making huge text dumps.  Granted, sysfs and procfs could present the
> > >     same data, though as lots of little files which have to be
> > >     individually opened, read, closed and parsed.
> >
> > Asked this a number of times, but you haven't answered yet: what
> > application would require such a high efficiency?
>
> Low efficiency means more time doing this when that time could be spent doing
> other things - or even putting the CPU in a powersaving state.  Using an
> open/read/close render-to-text-and-parse interface *will* be slower and less
> efficient as there are more things you have to do to use it.
>
> Then consider doing a walk over all the mounts in the case where there are
> 10000 of them - we have issues with /proc/mounts for such.  fsinfo() will end
> up doing a lot less work.

The current /proc/mounts problems arise from the fact that mount info can
only be queried for the whole namespace, and hence changes related to a
single mount will require rescanning the complete mount list.  If mount info
can be queried for individual mounts, then the need to scan the complete
list will be rare.  That's *the* point of this change.

> > > (3) We wouldn't have the overhead of open and close (even adding a
> > >     self-contained readfile() syscall has to do that internally
> >
> > Busted: add f_op->readfile() and be done with all that.  For example
> > DEFINE_SHOW_ATTRIBUTE() could be trivially moved to that interface.
>
> Look at your example.  "f_op->".  That's "file->f_op->" I presume.
>
> You would have to make it "i_op->" to avoid the open and the close - and for
> things like procfs and sysfs, that's probably entirely reasonable - but bear
> in mind that you still have to apply all the LSM file security controls, just
> in case the backing filesystem is, say, ext4 rather than procfs.
>
> > We could optimize existing proc, sys, etc. interfaces, but it's not
> > been an issue, apparently.
>
> You can't get rid of or change many of the existing interfaces.  A lot of
> them are effectively indirect system calls and are, as such, part of the
> fixed UAPI.  You'd have to add a parallel optimised set.

Sure.  We already have the single_open() internal API, which is basically a
->readfile() wrapper.
Moving this up to the f_op level (no, it's not an i_op, and yes, we do need
a struct file, but it can simply be allocated on the stack) is a trivial
optimization that would let a readfile(2) syscall access that level.  No new
complexity in that case.

The same generally goes for seq_file: a seq_readfile() would be trivial to
implement without messing with the current implementation or any existing
APIs.

> > > (6) Don't have to create/delete a bunch of sysfs/procfs nodes each time a
> > >     mount happens or is removed - and since systemd makes much use of
> > >     mount namespaces and mount propagation, this will create a lot of
> > >     nodes.
> >
> > Not true.
>
> This may not be true if you roll your own special filesystem.  It *is* true
> if you do it in procfs or sysfs.  The files don't exist if you don't create
> nodes or attribute tables for them.

That's one of the reasons why I opted to roll my own.  But the ideas therein
could be applied to kernfs, if found to be generally useful.  Nothing magic
about that.

> > > The argument for doing this through procfs/sysfs/somemagicfs is that
> > > someone using a shell can just query the magic files using ordinary text
> > > tools, such as cat - and that has merit - but it doesn't solve the
> > > query-by-pathname problem.
> > >
> > > The suggested way around the query-by-pathname problem is to open the
> > > target file O_PATH and then look in a magic directory under procfs
> > > corresponding to the fd number to see a set of attribute files[*] laid
> > > out.  Bash, however, can't open by O_PATH or O_NOFOLLOW as things
> > > stand...
> >
> > Bash doesn't have fsinfo(2) either, so that's not really a good argument.
>
> I never claimed that fsinfo() could be accessed directly from the shell.  For
> your proposal, you claimed "immediately usable from all programming
> languages, including scripts".

You are right.  Note, however, that only special files need the O_PATH
handling; regular files and directories can be opened by the shell without
side effects.
In any case, I think neither of us can convince the other, so I guess it's
up to Al and Linus to make a decision.

Thanks,
Miklos