Date: Thu, 27 Feb 2020 16:14:21 +0100
From: Karel Zak
To: Miklos Szeredi
Cc: Ian Kent, Miklos Szeredi, James Bottomley, Steven Whitehouse,
    David Howells, viro, Christian Brauner, Jann Horn, "Darrick J. Wong",
    Linux API, linux-fsdevel, lkml, Lennart Poettering,
    Zbigniew Jędrzejewski-Szmek, util-linux@vger.kernel.org
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
Message-ID: <20200227151421.3u74ijhqt6ekbiss@ws.net.home>

On Thu, Feb 27, 2020 at 02:45:27PM +0100, Miklos Szeredi wrote:
> > So the problem I want to see fixed is the effect of very large
> > mount tables on other user space applications, particularly the
> > effect when a large number of mounts or umounts are performed.

Yes, right now you have to generate (in the kernel) and parse (in
userspace) the whole mount table to get information about just one
mount table entry. This is typical for umount or systemd.
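
To make that cost concrete, here is a minimal sketch in Python. It is
illustrative only and is not libmount code; the helper names and the
sample table are made up for this example. It looks up one mountpoint the
way userspace has to today: split every line of a
/proc/self/mountinfo-style table (field layout as documented in proc(5))
and keep the last match, because a later entry for the same target
overmounts an earlier one.

```python
import re


def unescape(field):
    # mountinfo escapes space, tab, newline and backslash as \ooo (octal).
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line):
    # Fields before " - ": mount ID, parent ID, major:minor, fs root,
    # mount point, per-mount options, then zero or more optional fields.
    # Fields after " - ": fs type, mount source, superblock options.
    left, _, right = line.rstrip("\n").partition(" - ")
    mount_id, parent_id, _devno, fsroot, target, vfs_opts = left.split(" ")[:6]
    fstype, source, fs_opts = (right.split(" ", 2) + ["", ""])[:3]
    return {
        "mount_id": int(mount_id),
        "parent_id": int(parent_id),
        "fsroot": unescape(fsroot),
        "target": unescape(target),
        "vfs_opts": vfs_opts,
        "fstype": fstype,
        "source": unescape(source),
        "fs_opts": fs_opts,
    }


def find_mount(target, table_text):
    # Every line is parsed even though only one entry is wanted.
    found = None
    for line in table_text.splitlines():
        if not line.strip():
            continue
        entry = parse_mountinfo_line(line)
        if entry["target"] == target:
            found = entry  # a later entry is mounted on top of an earlier one
    return found


# Hypothetical sample table; real input would be /proc/self/mountinfo.
SAMPLE = """\
22 27 0:21 / /proc rw,nosuid shared:5 - proc proc rw
36 27 8:1 / /mnt/data rw,noatime shared:1 - ext4 /dev/sda1 rw,errors=remount-ro
41 27 0:40 / /mnt/my\\040disk rw - tmpfs tmpfs rw,size=1024k
52 36 0:45 / /mnt/data rw,relatime - tmpfs tmpfs rw
"""
```

On a Linux system the same helper could be fed the real table, for
example find_mount("/", open("/proc/self/mountinfo").read()). The work
grows with the size of the whole table even for a single lookup, and
that is before counting the kernel-side cost of generating the file.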

> > >  - add a notification mechanism
> > >  - lookup a mount based on path
> > >  - and a way to selectively query mount/superblock information
> > based on path ...

For umount-like use-cases we need a mountpoint/ to mount entry
conversion; I guess something like open(mountpoint/) + fsinfo()
should be good enough.

For systemd we need the same, but triggered by a notification. The ideal
solution is to get a mount entry ID or FD from the notification and later
use this ID or FD to ask for details about the mount entry (probably with
fsinfo() again). The notification has to be usable in an epoll() set.

I guess this would solve 99% of our performance issues.

> > So that means mount table info. needs to be maintained, whether that
> > can be achieved using sysfs I don't know. Creating and maintaining
> > the sysfs tree would be a big challenge I think.

It will still be necessary to get the complete mount table sometimes, but
not in performance-sensitive scenarios.

I'm not sure about sysfs/; you need to somehow resolve namespaces, the
order of the mount entries (which one is the last one), etc. IMHO
translating a mountpoint path to a sysfs/ path will be complicated.

> > But before trying to work out how to use a notification mechanism
> > just having a way to get the info provided by the proc tables using
> > a path alone should give initial immediate improvement in libmount.
>
> Adding Karel, Lennart, Zbigniew and util-linux@vger...
>
> At a quick glance at libmount and systemd code, it appears that just
> switching out the implementation in libmount will not be enough:
> systemd is calling functions like mnt_table_parse_*() when it receives
> a notification that the mount table changed.

We're ready to change this stuff in systemd if there is something
better (something per-mount-entry).

My plan is to add a new API to libmount to query information about one
mount entry (but I have had no time to play with fsinfo yet).

> What is the end purpose of parsing the mount tables?  Can systemd guys
> comment on that?

If mount/umount is triggered by systemd then it needs verification of
success and the final version of the mount options. It also reads
information from libmount to get userspace mount options (e.g.
_netdev -- libmount uses mount source, target and fsroot to join
kernel and userspace stuff).

And don't forget that mount units are part of systemd dependencies, so
umount/mount is an important event for systemd and it needs details about
the changes (what, where, ... etc.)

    Karel

-- 
 Karel Zak
 http://karelzak.blogspot.com