Date: Fri, 30 Jul 2021 00:13:31 +0000
From: Al Viro
To: Josef Bacik
Cc: "J. Bruce Fields", NeilBrown, Christoph Hellwig, Chuck Lever,
    Chris Mason, David Sterba, linux-fsdevel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
References: <162742539595.32498.13687924366155737575.stgit@noble.brown>
    <20210728193536.GD3152@fieldses.org>

On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:

> I don't think anybody has that many file systems. For btrfs it's a single
> file system. Think of syncfs, it's going to walk through all of the super
> blocks on the system calling ->sync_fs on each subvol superblock. Now this
> isn't a huge deal, we could just have some flag that says "I'm not real" or
> even just have anonymous superblocks that don't get added to the global
> super_blocks list, and that would address my main pain points.

Umm... Aren't the snapshots read-only by definition?

> The second part is inode reclaim. Again this particular problem could be
> avoided if we had an anonymous superblock that wasn't actually used, but the
> inode lru is per superblock. Now with reclaim instead of walking all the
> inodes, you're walking a bunch of super blocks and then walking the list of
> inodes within those super blocks. You're burning CPU cycles because now
> instead of getting big chunks of inodes to dispose, it's spread out across
> many super blocks.
>
> The other weird thing is the way we apply pressure to shrinker systems. We
> essentially say "try to evict X objects from your list", which means in this
> case with lots of subvolumes we'd be evicting waaaaay more inodes than you
> were before, likely impacting performance where you have workloads that have
> lots of files open across many subvolumes (which is what FB does with it's
> containers).
>
> If we want a anonymous superblock per subvolume then the only way it'll work
> is if it's not actually tied into anything, and we still use the primary
> super block for the whole file system. And if that's what we're going to do
> what's the point of the super block exactly? This approach that Neil's come
> up with seems like a reasonable solution to me. Christoph gets his
> separation and /proc/self/mountinfo, and we avoid the scalability headache
> of a billion super blocks. Thanks,

AFAICS, we also get arseloads of weird corner cases - in particular, Neil's
suggestions re visibility in /proc/mounts look rather arbitrary.

Al, really disliking the entire series...