linux-btrfs.vger.kernel.org archive mirror
* Case for "datacow-forced" option
From: Remi Gauvin @ 2022-01-08  1:30 UTC
  To: linux-btrfs

I notice some software is silently creating files with +C attribute
without user input (systemd journals, libvirt qcow files, etc.). I
can appreciate the goal of a performance boost, but I can only see this
as a disaster for users of btrfs RAID, which will lead to inconsistent
mirrors on unclean shutdown, (and no way to fix, other than full balance.)

I think a datacow-forced option would be a good idea to prevent
accidental creation of critical files with nocow attribute.


* Re: Case for "datacow-forced" option
From: Chris Murphy @ 2022-01-09 16:37 UTC
  To: Remi Gauvin; +Cc: linux-btrfs

On Fri, Jan 7, 2022 at 6:30 PM Remi Gauvin <remi@georgianit.com> wrote:
>
> I notice some software is silently creating files with +C attribute
> without user input (systemd journals, libvirt qcow files, etc.). I
> can appreciate the goal of a performance boost, but I can only see this
> as a disaster for users of btrfs RAID, which will lead to inconsistent
> mirrors on unclean shutdown, (and no way to fix, other than full balance.)
>
> I think a datacow-forced option would be a good idea to prevent
> accidental creation of critical files with nocow attribute.

libvirt sets file attribute C on the directory specified at the time
of pool creation. You can unset it and it won't be reset.

systemd defaults to nodatacow on Btrfs, but you can `touch
/etc/tmpfiles.d/journal-nocow.conf` and then also recursively remove
file attribute C from /var/log/journal.

I know that zoned disallows nodatacow, but I have no idea what happens
to files being copied over with file attribute C. Does the entire copy
fail, or is there just an error when trying to set chattr +C on the file
while it's still zero length?



-- 
Chris Murphy


* Re: Case for "datacow-forced" option
From: David Sterba @ 2022-01-11 16:01 UTC
  To: Remi Gauvin; +Cc: linux-btrfs

On Fri, Jan 07, 2022 at 08:30:46PM -0500, Remi Gauvin wrote:
> I notice some software is silently creating files with +C attribute
> without user input (systemd journals, libvirt qcow files, etc.). I
> can appreciate the goal of a performance boost, but I can only see this
> as a disaster for users of btrfs RAID, which will lead to inconsistent
> mirrors on unclean shutdown, (and no way to fix, other than full balance.)
> 
> I think a datacow-forced option would be a good idea to prevent
> accidental creation of critical files with nocow attribute.

Settings like that start some kind of "policy wars" and list of
exceptions, i.e. who decides what the filesystem is allowed to do. A
mount option like you suggest would never allow creating a nocow file,
but having some scratch nocow files with better performance would be
nice to have. A global forced option would prevent accidental nocow
files while you as user would consider them important.

I'd rather see that fixed or made configurable on the side of
applications; the filesystem really just provides features and
options, and leaves the policies and forced options to the users.

IIRC the systemd journals got +C because the write pattern is 'append',
which over time creates highly fragmented files. For VM images it's a
performance optimization at the cost of no checksums. Both are performance
vs reliability trade-offs that somebody made on behalf of users, but not
to the satisfaction of all, which I understand, but I don't agree that the
filesystem should be the level where this gets resolved.

If fragmentation is a problem, occasional runs of the defrag ioctl on the
files can make the problem bearable.


* Re: Case for "datacow-forced" option
From: David Sterba @ 2022-01-11 16:02 UTC
  To: Chris Murphy; +Cc: Remi Gauvin, linux-btrfs

On Sun, Jan 09, 2022 at 09:37:52AM -0700, Chris Murphy wrote:
> On Fri, Jan 7, 2022 at 6:30 PM Remi Gauvin <remi@georgianit.com> wrote:
> I know that zoned disallows nodatacow, but I have no idea what happens
> to files being copied over with file attribute C. Does the entire copy
> fail, or is there just an error when trying to set chattr +C on the file
> while it's still zero length?

If it's a plain copy, there's no difference, but if it's based on reflink
then the new file will be created without the +C flag and the reflink won't
work, leaving a 0 size file.
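
A minimal sketch of how a copy tool could avoid leaving that 0 size file
behind: try the reflink first and fall back to a plain data copy when the
clone is refused. The function name is hypothetical, and the numeric FICLONE
value assumes the common ioctl encoding (x86, arm64); it is an illustration,
not how any particular tool is implemented.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: the common ioctl number encoding (x86, arm64).
FICLONE = 0x40049409


def copy_with_reflink_fallback(src, dst):
    """Try to reflink src to dst; fall back to a plain data copy.

    Returns "reflink" or "copy" so the caller can see which path was taken.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # For FICLONE the ioctl argument is the source file descriptor.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # The clone was refused (unsupported filesystem, mismatched
            # attributes, different devices, ...). dst exists but is empty
            # at this point, so fill it with an ordinary copy.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"
```

The return value only reports which path was taken; either way the
destination ends up with the full contents of the source.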


* Re: Case for "datacow-forced" option
From: Zygo Blaxell @ 2022-01-21  0:07 UTC
  To: dsterba, Remi Gauvin, linux-btrfs

On Tue, Jan 11, 2022 at 05:01:12PM +0100, David Sterba wrote:
> On Fri, Jan 07, 2022 at 08:30:46PM -0500, Remi Gauvin wrote:
> > I notice some software is silently creating files with +C attribute
> > without user input (systemd journals, libvirt qcow files, etc.). I
> > can appreciate the goal of a performance boost, but I can only see this
> > as a disaster for users of btrfs RAID, which will lead to inconsistent
> > mirrors on unclean shutdown, (and no way to fix, other than full balance.)
> > 
> > I think a datacow-forced option would be a good idea to prevent
> > accidental creation of critical files with nocow attribute.
> 
> Settings like that start some kind of "policy wars" and list of
> exceptions, i.e. who decides what the filesystem is allowed to do. A
> mount option like you suggest would never allow creating a nocow file,
> but having some scratch nocow files with better performance would be
> nice to have. A global forced option would prevent accidental nocow
> files while you as user would consider them important.
> 
> I'd rather see that fixed or made configurable on the side of
> applications; the filesystem really just provides features and
> options, and leaves the policies and forced options to the users.
> 
> IIRC the systemd journals got +C because the write pattern is 'append',
> which over time creates highly fragmented files. For VM images it's a
> performance optimization at the cost of no checksums. Both are performance
> vs reliability trade-offs that somebody made on behalf of users, but not
> to the satisfaction of all, which I understand, but I don't agree that the
> filesystem should be the level where this gets resolved.

Policy controls that were previously handled in lower storage layers
are becoming btrfs's responsibility as it replaces those layers.
datasums and RAID were first, but encryption and integrity are coming.

In other words, btrfs already started the policy war by invading the
territory of legacy policy regimes and establishing its own new and
different policy regime.  Now that the policy war has started, we'd like
proper tools to fight and win.

e.g. if we put ext4 on a dm-integrity device, applications don't get to
disable data integrity on individual files--every file on the filesystem
gets block-level integrity.  Similar things happen with encryption and
mirroring: every file on the filesystem is encrypted and every file gets
mirrored, because the whole filesystem is stored in something encrypted
or mirrored, and no second option is available from the filesystem.

If we want to have a mix of different policies, we can create a bunch
of ext4 filesystems with different policies set on the backing devices
and mount them all at the appropriate points.  It can be set up so that
unprivileged users can only create files on the encrypted filesystems
to prevent data leaks, only a privileged user can change those rules,
and even for privileged users rule changes are a little non-trivial
(e.g. new filesystems have to be created and mounted to implement a new
policy option combination).

It's awesome that btrfs _can_ enable or disable data integrity selectively
at the individual inode level without special privileges, but it's not a
novel capability, only a novel policy.  Unlike other Linux filesystem +
storage stack setups, btrfs allows a user to enable and disable storage
policy controls without permission and with reckless abandon.

If btrfs is to replace the other layers, then it needs to reimplement
the control knobs (or explicit _lack_ of control knobs) of the things
it's replacing, or functionality is lost compared to the legacy systems.
In some cases failure to verify, encrypt, or mirror data can result in
catastrophic failures, and having controls available to unprivileged
users that disable these features at inode level would be a bug.

It's much nicer to say "we don't ever allow nodatasum files on our
filesystems because our hosting providers have terrible taste in SSD
models" and be done with the issue forever.  The alternative is to play
whack-a-mole every day when some new app gets installed or upgraded,
it doesn't follow policy rules, and upstream won't fix it.

We'd have apps explode during completely normal disk failure events
because the app turned off datacow (and with it datasum and self-heal)
and a bad sector or two silently destroyed the files.  It became enough
of a problem that I now patch the kernel to silently clear bits from
application requests to set forbidden attributes, removing the need to
frequently audit the filesystems for new files with these attributes.
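
For those without a patched kernel, the audit itself can be sketched in a
few lines. This is an illustrative sketch only: the function names are
hypothetical, the flag and ioctl values are taken from <linux/fs.h>, and the
ioctl number assumes the common encoding (x86, arm64). It reads the same
inode flag word that lsattr shows and reports paths with the C (NOCOW)
attribute.

```python
import fcntl
import os
import stat
import struct

FS_NOCOW_FL = 0x00800000  # <linux/fs.h>: the flag that chattr +C sets
# FS_IOC_GETFLAGS is _IOR('f', 1, long).
# Assumption: the common ioctl number encoding (x86, arm64).
FS_IOC_GETFLAGS = (2 << 30) | (struct.calcsize("l") << 16) | (ord("f") << 8) | 1


def read_inode_flags(path):
    """Return the inode flag word for path via FS_IOC_GETFLAGS."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOFOLLOW)
    try:
        buf = bytearray(struct.calcsize("l"))
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, buf)
        # The kernel stores an int at the start of the buffer.
        return struct.unpack_from("i", buf)[0]
    finally:
        os.close(fd)


def find_nocow(root, get_flags=read_inode_flags):
    """Yield files and directories below root that have the NOCOW flag set.

    get_flags is injectable so the walk can be exercised without btrfs.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
                if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
                    continue  # skip symlinks, devices, sockets, fifos
                if get_flags(path) & FS_NOCOW_FL:
                    yield path
            except OSError:
                # Unreadable, vanished, or no inode flag support here.
                continue
```

Something like `for p in find_nocow("/var/log/journal"): print(p)` would list
the offenders. Directories are included because, as noted earlier in the
thread, attribute C set on a directory is what libvirt uses for its pools.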

> If fragmentation is a problem, occasional runs of the defrag ioctl on the
> files can make the problem bearable.
