Date: Wed, 28 Jul 2021 21:29:31 -0400
From: Zygo Blaxell
To: "J. Bruce Fields"
Cc: Neal Gompa, NeilBrown, Wang Yugui, Christoph Hellwig, Josef Bacik,
	Chuck Lever, Chris Mason, David Sterba, Alexander Viro,
	linux-fsdevel, linux-nfs@vger.kernel.org, Btrfs BTRFS
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
Message-ID: <20210729012931.GK10170@hungrycats.org>
In-Reply-To: <20210728191431.GA3152@fieldses.org>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Wed, Jul 28, 2021 at 03:14:31PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 28, 2021 at 08:26:12AM -0400, Neal Gompa wrote:
> > I think this is behavior people generally expect, but I wonder what
> > the consequences of this would be with huge numbers of subvolumes. If
> > there are hundreds or thousands of them (which is quite possible on
> > SUSE systems, for example, with its auto-snapshotting regime), this
> > would be a mess, wouldn't it?
>
> I'm surprised that btrfs is special here.  Doesn't anyone have thousands
> of lvm snapshots?  Or is it that they do but they're not normally
> mounted?

Unprivileged users can't create lvm snapshots as easily or quickly as
using mkdir (well, ok, mkdir and fssync).  lvm doesn't scale very well
past a few dozen snapshots of the same original volume, and performance
degrades linearly in the number of snapshots if the original LV is
modified.  btrfs is the opposite: users can create and delete as many
snapshots as they like, at a cost more expensive than mkdir but less
expensive than 'cp -a', and users only pay IO costs for writes to the
subvols they modify.
So some btrfs users use snapshots in places where more traditional
tools like 'cp -a' or 'git checkout' are used on other filesystems.

For example, a build system might make a snapshot of a git working tree
containing a checked out and built baseline revision, and then it might
do a loop where it makes a snapshot, applies one patch from an
integration branch in the snapshot directory, and incrementally builds
there.  The next revision makes a snapshot of its parent revision's
subvol and builds the next patch.  If there are merges in the
integration branch, then the builder can go back to parent revisions,
create a new snapshot, apply the patch, and build in a snapshot on both
sides of the merge.  After testing picks a winner, the builder can
simply delete all the snapshots except the one for the version that won
testing (there is no requirement to commit the snapshot to the origin LV
as in lvm; either can be destroyed without requiring action to preserve
the other).

You can do a similar thing with overlayfs, but it runs into problems
with all the mount points.  In btrfs, the mount points are persistent
because they're built into the filesystem.  With overlayfs, you have
to save and restore them so they persist across reboots (unless that
feature has been added since I last looked).

I'm looking at a few machines here, and if all the subvols are visible
to 'df', its output would be somewhere around 3-5 MB.  That's too
much--we'd have to hack up df to not show the same btrfs twice...as well
as every monitoring tool that reports free space...which sounds similar
to the problems we're trying to avoid.

Ideally there would be a way to turn this on or off.  It is creating a
set of new problems that is the complement of the set we're trying to
fix in this change.

> --b.
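
For readers who have not seen the snapshot-per-patch build workflow
described above, here is a rough sketch of the linear case (no merges).
It only plans the commands and does not run them.  The function names,
the "build-<commit>" directory naming, the use of 'git cherry-pick' to
apply each patch, and 'make' as the build step are all assumptions made
for illustration; they are not taken from any particular build system.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

Command = List[str]


def plan_snapshot_builds(baseline: str, commits: Sequence[str],
                         root: str) -> List[Command]:
    """Plan a snapshot-per-patch incremental build (linear history only).

    Each commit gets its own subvolume, snapshotted from its parent's
    subvolume, so each build starts from the parent's build products.
    """
    commands: List[Command] = []
    parent = PurePosixPath(baseline)
    for commit in commits:
        # Hypothetical naming scheme for the per-revision subvolumes.
        child = PurePosixPath(root) / f"build-{commit}"
        # Writable snapshot of the parent subvolume: source, then destination.
        commands.append(
            ["btrfs", "subvolume", "snapshot", str(parent), str(child)])
        # Apply one patch inside the snapshot.  Assumes the commit object
        # is already present in the snapshotted repository.
        commands.append(["git", "-C", str(child), "cherry-pick", commit])
        # Incremental build on top of the parent's build products.
        commands.append(["make", "-C", str(child)])
        parent = child
    return commands


def plan_cleanup(snapshots: Sequence[str], winner: str) -> List[Command]:
    """Plan deletion of every snapshot except the one that won testing."""
    return [["btrfs", "subvolume", "delete", path]
            for path in snapshots if path != winner]
```

A caller could pass each planned command to subprocess.run(cmd,
check=True).  Handling merges would mean choosing the parent subvol per
commit instead of always using the previous one.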