Date: Thu, 20 Jan 2022 19:07:45 -0500
From: Zygo Blaxell
To: dsterba@suse.cz, Remi Gauvin, linux-btrfs
Subject: Re: Case for "datacow-forced" option
In-Reply-To: <20220111160112.GP14046@twin.jikos.cz>
X-Mailing-List: linux-btrfs@vger.kernel.org

On Tue, Jan 11, 2022 at 05:01:12PM +0100, David Sterba wrote:
> On Fri, Jan 07, 2022 at 08:30:46PM -0500, Remi Gauvin wrote:
> > I notice some software is silently creating files with +C attribute
> > without user input. (Systemd journals, libvert qcow files, etc.)... I
> > can appreciate the goal of a performance boost, but I can only see this
> > as disaster for users of btrfs RAID, which will lead to inconsistent
> > mirrors on unclean shutdown, (and no way to fix, other than full balance.)
> >
> > I think a datacow-forced option would be a good idea to prevent
> > accidental creation of critical files with nocow attribute.
>
> Settings like that start some kind of "policy wars" and list of
> exceptions, ie.
> who decides what the filesystem is allowed to do. A
> mount option like you suggest would never allow to create a nocow file,
> but having some scratch nocow files with better performance would be
> nice to have. A global forced option would prevent accidental nocow
> files while you as user would consider them important.
>
> I'd rather see that fixed or made configurable on the side of
> applications, the filesystem is really just providing features and
> options and limits the policies and forced options to the users.
>
> IIRC the systemd journals got +C because the write pattern is 'append'
> that over time creates highly fragmented files. For VM images it's a
> performance optimization at the cost of no checksums. Both performance
> vs reliability trade off, that somebody made on behalf of users. But not
> to satisfaction to all, wich I understand but don't agree that the
> filesystem should be the level where this gets resolved.

Policy controls that were previously handled in lower storage layers are
becoming btrfs's responsibility as it replaces those layers. Datasums
and RAID were first, but encryption and integrity are coming.

In other words, btrfs already started the policy war by invading the
territory of legacy policy regimes and establishing its own new and
different policy regime. Now that the policy war has started, we'd like
proper tools to fight and win.

For example, if we put ext4 on a dm-integrity device, applications don't
get to disable data integrity on individual files--every file on the
filesystem gets block-level integrity. Similar things happen with
encryption and mirroring: every file on the filesystem is encrypted and
every file gets mirrored, because the whole filesystem is stored in
something encrypted or mirrored, and no second option is available from
the filesystem.
If we want to have a mix of different policies, we can create a bunch of
ext4 filesystems with different policies set on the backing devices and
mount them all at the appropriate points. It can be set up so that
unprivileged users can only create files on the encrypted filesystems to
prevent data leaks, only a privileged user can change those rules, and
even for privileged users rule changes are a little non-trivial (e.g. new
filesystems have to be created and mounted to implement a new policy
option combination).

It's awesome that btrfs _can_ enable or disable data integrity
selectively at the individual inode level without special privileges,
but it's not a novel capability, only a novel policy. Unlike other Linux
filesystem + storage stack setups, btrfs allows a user to enable and
disable storage policy controls without permission and with reckless
abandon.

If btrfs is to replace the other layers, then it needs to reimplement
the control knobs (or explicit _lack_ of control knobs) of the things
it's replacing, or functionality is lost compared to the legacy systems.
In some cases failure to verify, encrypt, or mirror data can result in
catastrophic failures, and having controls available to unprivileged
users that disable these features at inode level would be a bug.

It's much nicer to say "we don't ever allow nodatasum files on our
filesystems because our hosting providers have terrible taste in SSD
models" and be done with the issue forever. The alternative is to play
whack-a-mole every time some new app gets installed or upgraded, doesn't
follow policy rules, and upstream won't fix it. We'd have apps explode
during completely normal disk failure events because the app turned off
datacow (and with it datasum and self-heal) and a bad sector or two
silently destroyed the files.
It became enough of a problem that I now patch the kernel to silently
clear the forbidden bits from application requests to set attributes,
which removes the need to frequently audit the filesystems for new files
with these attributes.

> If fragmentation is problem, eventual runs of the defrag ioctl on the
> files can make the problem bearable.
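
For anyone without such a kernel patch, the audit mentioned above can be
approximated from userspace. What follows is only a rough sketch, not the
patch and not a polished tool: it walks a tree and reports files and
directories that carry the +C (FS_NOCOW_FL) attribute, reading the flags
with the FS_IOC_GETFLAGS ioctl. The function names (strip_forbidden,
get_flags, find_nocow) are made up for illustration, and the ioctl number
assumes the generic _IOR encoding used on x86 and arm. strip_forbidden
only shows the bit arithmetic of clearing a forbidden flag from a
request; the actual kernel change is not shown in this mail.

```python
import fcntl
import os
import stat
import struct

# Values from <linux/fs.h>. The request number assumes the generic
# _IOR('f', 1, long) encoding (x86, arm); some architectures such as
# powerpc and mips encode the direction bits differently.
FS_NOCOW_FL = 0x00800000  # the "+C" attribute shown by lsattr
FS_IOC_GETFLAGS = (2 << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1


def strip_forbidden(requested, forbidden=FS_NOCOW_FL):
    """Return the requested flag word with the forbidden bits cleared."""
    return requested & ~forbidden


def get_flags(path):
    """Return the inode flags of path, or None if they cannot be read."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOFOLLOW)
    except OSError:
        return None
    try:
        # The kernel stores an int-sized flag word at the start of the buffer.
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, bytes(struct.calcsize("l")))
        return struct.unpack_from("I", buf)[0]
    except OSError:
        return None  # e.g. the filesystem does not implement the ioctl
    finally:
        os.close(fd)


def find_nocow(root):
    """Return paths of regular files and directories under root with +C set."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # vanished while we were walking
            if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
                continue  # skip symlinks, devices, sockets, FIFOs
            flags = get_flags(path)
            if flags is not None and flags & FS_NOCOW_FL:
                hits.append(path)
    return hits
```

Something like find_nocow("/var/log/journal") (the path is only an
example) would then list the offenders. This has to be re-run after every
install or upgrade, which is exactly the whack-a-mole that a
filesystem-level control would avoid.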