All of lore.kernel.org
 help / color / mirror / Atom feed
* reflink copying does not check/set No_COW attribute and fail
@ 2021-06-04 14:33 Tom Yan
  2021-06-04 14:37 ` Tom Yan
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Yan @ 2021-06-04 14:33 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I've just bumped into a problem that I am not sure what the expected
behavior should be, but there seems to be something flawed.

Say I have a file that was created with the No_COW attributed
(inherited from the directory / subvolume / mount option). Then if I
try to do a reflink copy, the copying will fail with "Invalid
argument" if the copy has no one to inherit the No_COW attribute from.

For example:
[tom@archlinux mnt]$ sudo btrfs subvol list .
ID 256 gen 11 top level 5 path a
ID 257 gen 9 top level 5 path b
[tom@archlinux mnt]$ lsattr
---------------------- ./a
---------------C------ ./b
[tom@archlinux mnt]$ lsattr b/
---------------C------ b/test
[tom@archlinux mnt]$ du -h b/test
512M    b/test
[tom@archlinux mnt]$ lsattr a/
[tom@archlinux mnt]$ cp --reflink=always b/test a/
cp: failed to clone 'a/test' from 'b/test': Invalid argument
[tom@archlinux mnt]$ lsattr a/
---------------------- a/test
[tom@archlinux mnt]$ du a/test
0    a/test
[tom@archlinux mnt]$ du --apparent-size a/test
0    a/test
[tom@archlinux mnt]$ rm a/test
[tom@archlinux mnt]$ sudo chattr +C a/
[tom@archlinux mnt]$ cp --reflink=always b/test a/
[tom@archlinux mnt]$ lsattr a/
---------------C------ a/test
[tom@archlinux mnt]$ cmp b/test a/test
[tom@archlinux mnt]$

I'm not entirely sure if a reflink copy is supposed to work for a
source file that was created with No_COW, but apparently it is. The
problem is just that the reflink copy also needs to have the attribute
set, yet it cannot inherit from the source automatically.

I wonder if this is a kernel-side problem or something that coreutils
missed? It also seems wrong that when it fails there will be an empty
destination file created.

Kernel version: Linux archlinux 5.12.8-arch1-1 #1 SMP PREEMPT Fri, 28
May 2021 15:10:20 +0000 x86_64 GNU/Linux
Coreutils version: 8.32

Regards,
Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reflink copying does not check/set No_COW attribute and fail
  2021-06-04 14:33 reflink copying does not check/set No_COW attribute and fail Tom Yan
@ 2021-06-04 14:37 ` Tom Yan
  2021-06-04 20:16   ` Zygo Blaxell
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Yan @ 2021-06-04 14:37 UTC (permalink / raw)
  To: linux-btrfs; +Cc: bug-coreutils

Also cc'ing bug-coreutils@gnu.org.

On Fri, 4 Jun 2021 at 22:33, Tom Yan <tom.ty89@gmail.com> wrote:
>
> Hi all,
>
> I've just bumped into a problem that I am not sure what the expected
> behavior should be, but there seems to be something flawed.
>
> Say I have a file that was created with the No_COW attributed
> (inherited from the directory / subvolume / mount option). Then if I
> try to do a reflink copy, the copying will fail with "Invalid
> argument" if the copy has no one to inherit the No_COW attribute from.
>
> For example:
> [tom@archlinux mnt]$ sudo btrfs subvol list .
> ID 256 gen 11 top level 5 path a
> ID 257 gen 9 top level 5 path b
> [tom@archlinux mnt]$ lsattr
> ---------------------- ./a
> ---------------C------ ./b
> [tom@archlinux mnt]$ lsattr b/
> ---------------C------ b/test
> [tom@archlinux mnt]$ du -h b/test
> 512M    b/test
> [tom@archlinux mnt]$ lsattr a/
> [tom@archlinux mnt]$ cp --reflink=always b/test a/
> cp: failed to clone 'a/test' from 'b/test': Invalid argument
> [tom@archlinux mnt]$ lsattr a/
> ---------------------- a/test
> [tom@archlinux mnt]$ du a/test
> 0    a/test
> [tom@archlinux mnt]$ du --apparent-size a/test
> 0    a/test
> [tom@archlinux mnt]$ rm a/test
> [tom@archlinux mnt]$ sudo chattr +C a/
> [tom@archlinux mnt]$ cp --reflink=always b/test a/
> [tom@archlinux mnt]$ lsattr a/
> ---------------C------ a/test
> [tom@archlinux mnt]$ cmp b/test a/test
> [tom@archlinux mnt]$
>
> I'm not entirely sure if a reflink copy is supposed to work for a
> source file that was created with No_COW, but apparently it is. The
> problem is just that the reflink copy also needs to have the attribute
> set, yet it cannot inherit from the source automatically.
>
> I wonder if this is a kernel-side problem or something that coreutils
> missed? It also seems wrong that when it fails there will be an empty
> destination file created.
>
> Kernel version: Linux archlinux 5.12.8-arch1-1 #1 SMP PREEMPT Fri, 28
> May 2021 15:10:20 +0000 x86_64 GNU/Linux
> Coreutils version: 8.32
>
> Regards,
> Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reflink copying does not check/set No_COW attribute and fail
  2021-06-04 14:37 ` Tom Yan
@ 2021-06-04 20:16   ` Zygo Blaxell
  2021-06-05  5:56     ` Tom Yan
  0 siblings, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2021-06-04 20:16 UTC (permalink / raw)
  To: Tom Yan; +Cc: linux-btrfs, bug-coreutils

On Fri, Jun 04, 2021 at 10:37:35PM +0800, Tom Yan wrote:
> Also cc'ing bug-coreutils@gnu.org.
> 
> On Fri, 4 Jun 2021 at 22:33, Tom Yan <tom.ty89@gmail.com> wrote:
> >
> > Hi all,
> >
> > I've just bumped into a problem that I am not sure what the expected
> > behavior should be, but there seems to be something flawed.
> >
> > Say I have a file that was created with the No_COW attributed
> > (inherited from the directory / subvolume / mount option). Then if I
> > try to do a reflink copy, the copying will fail with "Invalid
> > argument" if the copy has no one to inherit the No_COW attribute from.

Correct.  nodatacow implies nodatasum, and you cannot reflink an extent
from a nodatasum inode into a datasum inode.

The result of allowing this would be a file that has some extents
that have csums, and some that do not.  Making this work would make
reading from such a file worse (i.e. make it slower, or fail to detect
corruption in metadata).  It's possible to solve some of those problems
(or at least contain them in inodes that are known to have mixed csum
and non-csum data), but first someone would have to make the case that
this is worth the effort to support.

> > For example:
> > [tom@archlinux mnt]$ sudo btrfs subvol list .
> > ID 256 gen 11 top level 5 path a
> > ID 257 gen 9 top level 5 path b
> > [tom@archlinux mnt]$ lsattr
> > ---------------------- ./a
> > ---------------C------ ./b
> > [tom@archlinux mnt]$ lsattr b/
> > ---------------C------ b/test
> > [tom@archlinux mnt]$ du -h b/test
> > 512M    b/test
> > [tom@archlinux mnt]$ lsattr a/
> > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > cp: failed to clone 'a/test' from 'b/test': Invalid argument
> > [tom@archlinux mnt]$ lsattr a/
> > ---------------------- a/test
> > [tom@archlinux mnt]$ du a/test
> > 0    a/test
> > [tom@archlinux mnt]$ du --apparent-size a/test
> > 0    a/test
> > [tom@archlinux mnt]$ rm a/test
> > [tom@archlinux mnt]$ sudo chattr +C a/
> > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > [tom@archlinux mnt]$ lsattr a/
> > ---------------C------ a/test
> > [tom@archlinux mnt]$ cmp b/test a/test
> > [tom@archlinux mnt]$
> >
> > I'm not entirely sure if a reflink copy is supposed to work for a
> > source file that was created with No_COW, but apparently it is. 

Snapshots are also allowed for nodatacow files.  The extents that are
shared become implicitly datacow until they are not shared any more.

Snapshots are deferred reflink copies, so it would be difficult to
allow one and not the other.  Disallowing both seems overly restrictive
(e.g. with such a restriction, it would not be possible to use 'btrfs
send' or make a snapshot on a subvol that contains any nodatacow file).

btrfs did disallow both operations for swap files, so it could be possible
to disallow both reflinks and snapshots for nodatacow files, but AFAIK
nobody wants that (some people even want the swapfile restrictions to
go away someday).

> > The
> > problem is just that the reflink copy also needs to have the attribute
> > set, yet it cannot inherit from the source automatically.

reflink can only reflink copy from one nodatasum file to another nodatasum
file, or from one datasum file to another datasum file.

An empty inode can be changed from datacow to nodatacow (or vice versa)
using the fsattr ioctl, which simultaneously changes the file from
datasum to nodatasum if the filesystem was not mounted with the nodatasum
mount option.

There is a possible kernel enhancement here:  when an empty inode is the
dst of a reflink, automatically change the reflink dst inode's nodatasum
flag to match the reflink src inode's nodatasum flag.  If the dst inode
is not empty and the inode datasum flags do not match, then reject the
reflink with EINVAL as before.

It's not clear whether this should apply only to nodatasum or also to
nodatacow.  reflink doesn't need src and dst agreement on nodatacow,
so the dst inode could be a nodatasum+datacow file.  Unfortunately all
the userspace tools including coreutils can only see the nodatacow
inode bit, not the nodatasum bit, so the lack of csums on the dst file
would be invisible.  The kernel cannot know the user's intent from the
available information.

It's not clear that we want the kernel to be implicitly changing
inode attribute bits like this--especially bits that disable integrity
features like nodatasum.  There is precedent for changing fsattrs with
the no-compress inode flag, but that flag doesn't disable csums, and
this one would.

One could also make the opposite case:  it should always be an error to
do anything that would put data in a datasum file without csums, the
existing behavior is correct, and should not be changed.  The problem
with this argument is that users can't see the datasum inode bits,
so it's not clear that the EINVAL is a data protection mechanism.

> > I wonder if this is a kernel-side problem or something that coreutils
> > missed? It also seems wrong that when it fails there will be an empty
> > destination file created.

Normally coreutils will fall back to simple copy if --reflink=auto
is used.  --reflink=always is the user's explicit request for "reflink
or nothing, please."  The user correctly got nothing, as requested.

On other filesystems, reflink on a nodatacow file might make a simple
copy in the kernel--in which case you are no better off than if you had
used --reflink=auto.

coreutils could propagate the source inode nodatacow fsattribute to
the destination inode if it intends to use reflink to copy the data.
That would be the userspace equivalent of the kernel enhancement I
suggested above.  It would probably match user expectations better--no
hidden surprises for non-coreutils use cases, and all the affected inode
attribute bits are necessarily visible in userspace.

fsattr propagation could be quite complicated for coreutils to implement
correctly in general, as some fsattrs should not be propagated this way,
and other filesystems may have different restrictions.  Some fsattrs must
be set before the data is written (e.g. -c for compression), others must
be set after the data is written (e.g. -i for immutable), and some are
a matter of user intent (e.g. should a simple copy be compressed if the
source is not?  Depends on the intended use of the copy).

On other filesystems this userspace behavior might trigger the opposite
of the intended kernel behavior, causing reflink to always fall back to
simple copy because the dst inode's nodatacow attribute is set.

Ideally btrfs will not force coreutils to do one thing on btrfs and the
opposite thing on other filesystems, so it might be worthwhile to hack
around this in the kernel as proposed above.  There is precedent for
that--btrfs falls back to simple copy in reflinks of inline extents,
more or less for the sole purpose of making cp --reflink=always not fail
so randomly.

> > Kernel version: Linux archlinux 5.12.8-arch1-1 #1 SMP PREEMPT Fri, 28
> > May 2021 15:10:20 +0000 x86_64 GNU/Linux
> > Coreutils version: 8.32
> >
> > Regards,
> > Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reflink copying does not check/set No_COW attribute and fail
  2021-06-04 20:16   ` Zygo Blaxell
@ 2021-06-05  5:56     ` Tom Yan
  2021-06-05 10:35       ` Forza
  2021-06-06  5:42       ` Zygo Blaxell
  0 siblings, 2 replies; 9+ messages in thread
From: Tom Yan @ 2021-06-05  5:56 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: linux-btrfs, bug-coreutils

As far as I'm concerned, inheriting an attribute from the source inode
isn't a "surprising" behavior. Rather it seems pretty "natural" to me.
And I don't think whether the attribute is "dangerous" changes that,
because if you consider it "dangerous", shouldn't you "watch out"
anyway when you try to make a clone of a source with such an
attribute?

If we see it from the way that, the kernel does not make the
destination inherit nodatasum just to make reflink succeed as much as
possible, but rather it just by design inherit nodatacow (for the
reason of being NOT surprising), then there's no concern in whether
they should "decoupled" when we implement the inheritance. (Like we
can't set only nodatasum with `chattr either. It's simply out of the
scope then.)

I don't know if we can do that based on whether the reflink mode is
always. Though we can fallback to "normal" copy when the source has
nodatasum (and/or nodatacow), personally I don't find it less
surprising than inheriting nodatacow all the time.

By the way, what will `chattr -C` do exactly if the file/inode had
nodatacow? Is the behavior different when it is / there is a reflink?

On Sat, 5 Jun 2021 at 04:16, Zygo Blaxell
<ce3g8jdj@umail.furryterror.org> wrote:
>
> On Fri, Jun 04, 2021 at 10:37:35PM +0800, Tom Yan wrote:
> > Also cc'ing bug-coreutils@gnu.org.
> >
> > On Fri, 4 Jun 2021 at 22:33, Tom Yan <tom.ty89@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > I've just bumped into a problem that I am not sure what the expected
> > > behavior should be, but there seems to be something flawed.
> > >
> > > Say I have a file that was created with the No_COW attributed
> > > (inherited from the directory / subvolume / mount option). Then if I
> > > try to do a reflink copy, the copying will fail with "Invalid
> > > argument" if the copy has no one to inherit the No_COW attribute from.
>
> Correct.  nodatacow implies nodatasum, and you cannot reflink an extent
> from a nodatasum inode into a datasum inode.
>
> The result of allowing this would be a file that has some extents
> that have csums, and some that do not.  Making this work would make
> reading from such a file worse (i.e. make it slower, or fail to detect
> corruption in metadata).  It's possible to solve some of those problems
> (or at least contain them in inodes that are known to have mixed csum
> and non-csum data), but first someone would have to make the case that
> this is worth the effort to support.
>
> > > For example:
> > > [tom@archlinux mnt]$ sudo btrfs subvol list .
> > > ID 256 gen 11 top level 5 path a
> > > ID 257 gen 9 top level 5 path b
> > > [tom@archlinux mnt]$ lsattr
> > > ---------------------- ./a
> > > ---------------C------ ./b
> > > [tom@archlinux mnt]$ lsattr b/
> > > ---------------C------ b/test
> > > [tom@archlinux mnt]$ du -h b/test
> > > 512M    b/test
> > > [tom@archlinux mnt]$ lsattr a/
> > > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > > cp: failed to clone 'a/test' from 'b/test': Invalid argument
> > > [tom@archlinux mnt]$ lsattr a/
> > > ---------------------- a/test
> > > [tom@archlinux mnt]$ du a/test
> > > 0    a/test
> > > [tom@archlinux mnt]$ du --apparent-size a/test
> > > 0    a/test
> > > [tom@archlinux mnt]$ rm a/test
> > > [tom@archlinux mnt]$ sudo chattr +C a/
> > > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > > [tom@archlinux mnt]$ lsattr a/
> > > ---------------C------ a/test
> > > [tom@archlinux mnt]$ cmp b/test a/test
> > > [tom@archlinux mnt]$
> > >
> > > I'm not entirely sure if a reflink copy is supposed to work for a
> > > source file that was created with No_COW, but apparently it is.
>
> Snapshots are also allowed for nodatacow files.  The extents that are
> shared become implicitly datacow until they are not shared any more.
>
> Snapshots are deferred reflink copies, so it would be difficult to
> allow one and not the other.  Disallowing both seems overly restrictive
> (e.g. with such a restriction, it would not be possible to use 'btrfs
> send' or make a snapshot on a subvol that contains any nodatacow file).
>
> btrfs did disallow both operations for swap files, so it could be possible
> to disallow both reflinks and snapshots for nodatacow files, but AFAIK
> nobody wants that (some people even want the swapfile restrictions to
> go away someday).
>
> > > The
> > > problem is just that the reflink copy also needs to have the attribute
> > > set, yet it cannot inherit from the source automatically.
>
> reflink can only reflink copy from one nodatasum file to another nodatasum
> file, or from one datasum file to another datasum file.
>
> An empty inode can be changed from datacow to nodatacow (or vice versa)
> using the fsattr ioctl, which simultaneously changes the file from
> datasum to nodatasum if the filesystem was not mounted with the nodatasum
> mount option.
>
> There is a possible kernel enhancement here:  when an empty inode is the
> dst of a reflink, automatically change the reflink dst inode's nodatasum
> flag to match the reflink src inode's nodatasum flag.  If the dst inode
> is not empty and the inode datasum flags do not match, then reject the
> reflink with EINVAL as before.
>
> It's not clear whether this should apply only to nodatasum or also to
> nodatacow.  reflink doesn't need src and dst agreement on nodatacow,
> so the dst inode could be a nodatasum+datacow file.  Unfortunately all
> the userspace tools including coreutils can only see the nodatacow
> inode bit, not the nodatasum bit, so the lack of csums on the dst file
> would be invisible.  The kernel cannot know the user's intent from the
> available information.
>
> It's not clear that we want the kernel to be implicitly changing
> inode attribute bits like this--especially bits that disable integrity
> features like nodatasum.  There is precedent for changing fsattrs with
> the no-compress inode flag, but that flag doesn't disable csums, and
> this one would.
>
> One could also make the opposite case:  it should always be an error to
> do anything that would put data in a datasum file without csums, the
> existing behavior is correct, and should not be changed.  The problem
> with this argument is that users can't see the datasum inode bits,
> so it's not clear that the EINVAL is a data protection mechanism.
>
> > > I wonder if this is a kernel-side problem or something that coreutils
> > > missed? It also seems wrong that when it fails there will be an empty
> > > destination file created.
>
> Normally coreutils will fall back to simple copy if --reflink=auto
> is used.  --reflink=always is the user's explicit request for "reflink
> or nothing, please."  The user correctly got nothing, as requested.
>
> On other filesystems, reflink on a nodatacow file might make a simple
> copy in the kernel--in which case you are no better off than if you had
> used --reflink=auto.
>
> coreutils could propagate the source inode nodatacow fsattribute to
> the destination inode if it intends to use reflink to copy the data.
> That would be the userspace equivalent of the kernel enhancement I
> suggested above.  It would probably match user expectations better--no
> hidden surprises for non-coreutils use cases, and all the affected inode
> attribute bits are necessarily visible in userspace.
>
> fsattr propagation could be quite complicated for coreutils to implement
> correctly in general, as some fsattrs should not be propagated this way,
> and other filesystems may have different restrictions.  Some fsattrs must
> be set before the data is written (e.g. -c for compression), others must
> be set after the data is written (e.g. -i for immutable), and some are
> a matter of user intent (e.g. should a simple copy be compressed if the
> source is not?  Depends on the intended use of the copy).
>
> On other filesystems this userspace behavior might trigger the opposite
> of the intended kernel behavior, causing reflink to always fall back to
> simple copy because the dst inode's nodatacow attribute is set.
>
> Ideally btrfs will not force coreutils to do one thing on btrfs and the
> opposite thing on other filesystems, so it might be worthwhile to hack
> around this in the kernel as proposed above.  There is precedent for
> that--btrfs falls back to simple copy in reflinks of inline extents,
> more or less for the sole purpose of making cp --reflink=always not fail
> so randomly.
>
> > > Kernel version: Linux archlinux 5.12.8-arch1-1 #1 SMP PREEMPT Fri, 28
> > > May 2021 15:10:20 +0000 x86_64 GNU/Linux
> > > Coreutils version: 8.32
> > >
> > > Regards,
> > > Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reflink copying does not check/set No_COW attribute and fail
  2021-06-05  5:56     ` Tom Yan
@ 2021-06-05 10:35       ` Forza
  2021-06-06  5:42       ` Zygo Blaxell
  1 sibling, 0 replies; 9+ messages in thread
From: Forza @ 2021-06-05 10:35 UTC (permalink / raw)
  To: Tom Yan, Zygo Blaxell; +Cc: linux-btrfs, bug-coreutils



On 2021-06-05 07:56, Tom Yan wrote:
> As far as I'm concerned, inheriting an attribute from the source inode
> isn't a "surprising" behavior. Rather it seems pretty "natural" to me.
> And I don't think whether the attribute is "dangerous" changes that,
> because if you consider it "dangerous", shouldn't you "watch out"
> anyway when you try to make a clone of a source with such an
> attribute?

I'd agree here. 'cp -a' does mean preserve all attrributes. It is up the 
user to think about consequences copying nodatacow files.

> 
> If we see it from the way that, the kernel does not make the
> destination inherit nodatasum just to make reflink succeed as much as
> possible, but rather it just by design inherit nodatacow (for the
> reason of being NOT surprising), then there's no concern in whether
> they should "decoupled" when we implement the inheritance. (Like we
> can't set only nodatasum with `chattr either. It's simply out of the
> scope then.)
> 
> I don't know if we can do that based on whether the reflink mode is
> always. Though we can fallback to "normal" copy when the source has
> nodatasum (and/or nodatacow), personally I don't find it less
> surprising than inheriting nodatacow all the time.
> 
> By the way, what will `chattr -C` do exactly if the file/inode had
> nodatacow? Is the behavior different when it is / there is a reflink?

You cannot disable nodatacow on a file with existing contents.

There is already a thread from May 2020 on coreutils mailing list about 
the order of copying attributes to solve the issue of nodatacow etc.

Basically, 'cp -a' needs to set some file attributes before adding data 
to them.

https://lists.gnu.org/archive/html/coreutils/2020-05/msg00011.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: reflink copying does not check/set No_COW attribute and fail
  2021-06-05  5:56     ` Tom Yan
  2021-06-05 10:35       ` Forza
@ 2021-06-06  5:42       ` Zygo Blaxell
  2021-06-07  5:47         ` bug#48833: " Paul Eggert
  1 sibling, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2021-06-06  5:42 UTC (permalink / raw)
  To: Tom Yan; +Cc: linux-btrfs, bug-coreutils

On Sat, Jun 05, 2021 at 01:56:49PM +0800, Tom Yan wrote:
> As far as I'm concerned, inheriting an attribute from the source inode
> isn't a "surprising" behavior. Rather it seems pretty "natural" to me.
> And I don't think whether the attribute is "dangerous" changes that,
> because if you consider it "dangerous", shouldn't you "watch out"
> anyway when you try to make a clone of a source with such an
> attribute?

It's important not to conflate the two contexts.  For cp -a, copying
attributes from the src inode might be obvious.  For any other application
(including cp with different arguments), it might be the worst possible
thing to do, especially if we've been doing the opposite for over
a decade.

People run btrfs on crappy SSD and MMC devices, where filesystem csums
are the only way to know the disk is failing (SMART is nonexistent,
and the firmware doesn't bother with any form of working ECC or CRC, so
the first indication of any drive failure is when it starts corrupting
data).  Disabling the datasum bit on any inode is eventually equivalent
to injecting a stream of random undetected corruption into the file when
the drive inevitably fails.  For users with such drives, a mount option
or filesystem feature flag to disable chattr +C globally would make sense,
as they never want any file to be nodatasum for any reason.

If cp -a implements the inode attribute propagation (or inheritance), then
only users of cp -a are impacted.  They are more likely to be aware that
they may be creating new files with reduced-integrity storage attributes.

> If we see it from the way that, the kernel does not make the
> destination inherit nodatasum just to make reflink succeed as much as
> possible, but rather it just by design inherit nodatacow (for the
> reason of being NOT surprising), then there's no concern in whether
> they should "decoupled" when we implement the inheritance. (Like we
> can't set only nodatasum with `chattr either. It's simply out of the
> scope then.)

Thw window for design of these features mostly closed in 2008.  If we
limit the scope as you suggest, we stuck with a lot of the current
behavior now because it is existing kernel API.

If we also fixed some of the other design issues, like the invisible
datasum attribute, we could make the case that at least these automatic
implied inode changes would be _visible to users_.  At the moment,
it requires special tools to inspect the filesystem to see that a file
has its nodatasum bit set, and we probably don't want to start silently
flipping any more inode bits users can't even see.  We already did that
with prealloc and no-compression attributes, and users find it difficult
to understand the resulting changes in btrfs behavior.  There are some
proposals in the works to make all the inode flags visible as xattr
properties, and if those are done then we can say "well at least you
can see that we now change the datasum attribute sometimes."  But it's
a very weak argument.

The kernel has no way to know it's running 'cp -a' or that the intent
is to copy an existing object.  A user could be constructing a new
object using a combination of writes and reflinks from other files
(e.g. a disk image compiler) and intend for the result to have csums
even if the reflinked source files did not.

The kernel only knows that it has been told to create a file with XZ
attributes, and later the user requests cloning data from a file with
XYZ attributes.  If the user didn't intend for the dst file to have
XYZ attributes (and why would she, when she created the file with XZ
attributes?), then it could be a serious regression if the file suddenly
ends up with unintended Y attributes on new kernels.  Without easy
userspace visibility of the nodatasum attribute, it would be difficult
to even _detect_ this change until after silent data corruption gets
bad enough to be noticed.

Contrast with the coreutils package, which has changed its behavior
several times since reflink was first introduced in 2009 to adapt to
changing user expectations.  Nobody expects coreutils to behave quite
the same way from one year to the next, so there is room for innovation
there--and users who don't want the innovations can fork coreutils or
use some other tools.  One such innovation could be to simply detect
this case and make the appropriate inode changes before copying.

On the other hand, maybe nobody has any existing software deployed that
depends on the old kernel behavior, so we can change it after all.  In the
dangerous case (cheap SSD), a sysadmin wouldn't have any nodatacow files
anywhere on the filesystem, so they wouldn't be making reflink copies
of them because none exist.  They could be making a copy of a nodatacow
file from another filesystem, but in that case they would be using normal
copy, since reflink doesn't work across filesystems.  Maybe we can find
all the users who are running reflinking disk image compilers and they
can update their code to work around the proposed new kernel behavior.

> I don't know if we can do that based on whether the reflink mode is
> always. Though we can fallback to "normal" copy when the source has
> nodatasum (and/or nodatacow), personally I don't find it less
> surprising than inheriting nodatacow all the time.

Note that the kernel never sees the reflink argument from cp.  It is
entirely implemented inside the 'cp' command.  In --reflink=auto mode,
cp will first try clone_range, and if the kernel rejects that with an
EINVAL error, then cp will try again with traditional read/write code.
This is after cp has already created the dst file.  The kernel does
not provide a "create a new file with the attributes of an old file"
call--cp has to create the file with open(), and then do a series of
chmod, chown, and setfattr calls in the right order to replicate the
attributes of the old file.

The kernel doesn't guarantee that reflinks are possible in all cases, and
generally does not provide any fallback.  There are multiple possible
responses to a reflink failure and the kernel cannot implement all
of them.  It is up to userspace (i.e. the cp command) to decide what a
correct fallback behavior should be, and then implement it.

> By the way, what will `chattr -C` do exactly if the file/inode had
> nodatacow? Is the behavior different when it is / there is a reflink?

If the file is empty, you can chattr +C or -C.  If the file is not
empty, chattr fails with an error.

All the data extents in a file have to have the same csum status (csum
present or csum absent) as the inode.  It is not allowed to change the
C attribute when data extents exist because the C attribute indirectly
affects whether the extents have data csums or not, and if the inode was
changed it would disagree with existing data extents.  If no data extents
exist (i.e. zero-length file), then the inode attribute can be changed,
because there is nothing present that could disagree with the inode.

It doesn't matter whether reflinks were previously made or not.  In btrfs
there is no difference between a reflink and any other write, in the
same way that there is no difference between a directory entry created
by hardlink versus any other file creation.  Normal writes create a
new data extent, then reflink it to exactly the spot in file where the
data was written.  clone_range creates a reflink to some data extent(s)
that already exist.  Like hardlinks, when a reflink copy is made, it is
not always possible to tell which was the original and which was the copy.

> On Sat, 5 Jun 2021 at 04:16, Zygo Blaxell
> <ce3g8jdj@umail.furryterror.org> wrote:
> >
> > On Fri, Jun 04, 2021 at 10:37:35PM +0800, Tom Yan wrote:
> > > Also cc'ing bug-coreutils@gnu.org.
> > >
> > > On Fri, 4 Jun 2021 at 22:33, Tom Yan <tom.ty89@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I've just bumped into a problem that I am not sure what the expected
> > > > behavior should be, but there seems to be something flawed.
> > > >
> > > > Say I have a file that was created with the No_COW attributed
> > > > (inherited from the directory / subvolume / mount option). Then if I
> > > > try to do a reflink copy, the copying will fail with "Invalid
> > > > argument" if the copy has no one to inherit the No_COW attribute from.
> >
> > Correct.  nodatacow implies nodatasum, and you cannot reflink an extent
> > from a nodatasum inode into a datasum inode.
> >
> > The result of allowing this would be a file that has some extents
> > that have csums, and some that do not.  Making this work would make
> > reading from such a file worse (i.e. make it slower, or fail to detect
> > corruption in metadata).  It's possible to solve some of those problems
> > (or at least contain them in inodes that are known to have mixed csum
> > and non-csum data), but first someone would have to make the case that
> > this is worth the effort to support.
> >
> > > > For example:
> > > > [tom@archlinux mnt]$ sudo btrfs subvol list .
> > > > ID 256 gen 11 top level 5 path a
> > > > ID 257 gen 9 top level 5 path b
> > > > [tom@archlinux mnt]$ lsattr
> > > > ---------------------- ./a
> > > > ---------------C------ ./b
> > > > [tom@archlinux mnt]$ lsattr b/
> > > > ---------------C------ b/test
> > > > [tom@archlinux mnt]$ du -h b/test
> > > > 512M    b/test
> > > > [tom@archlinux mnt]$ lsattr a/
> > > > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > > > cp: failed to clone 'a/test' from 'b/test': Invalid argument
> > > > [tom@archlinux mnt]$ lsattr a/
> > > > ---------------------- a/test
> > > > [tom@archlinux mnt]$ du a/test
> > > > 0    a/test
> > > > [tom@archlinux mnt]$ du --apparent-size a/test
> > > > 0    a/test
> > > > [tom@archlinux mnt]$ rm a/test
> > > > [tom@archlinux mnt]$ sudo chattr +C a/
> > > > [tom@archlinux mnt]$ cp --reflink=always b/test a/
> > > > [tom@archlinux mnt]$ lsattr a/
> > > > ---------------C------ a/test
> > > > [tom@archlinux mnt]$ cmp b/test a/test
> > > > [tom@archlinux mnt]$
> > > >
> > > > I'm not entirely sure if a reflink copy is supposed to work for a
> > > > source file that was created with No_COW, but apparently it is.
> >
> > Snapshots are also allowed for nodatacow files.  The extents that are
> > shared become implicitly datacow until they are not shared any more.
> >
> > Snapshots are deferred reflink copies, so it would be difficult to
> > allow one and not the other.  Disallowing both seems overly restrictive
> > (e.g. with such a restriction, it would not be possible to use 'btrfs
> > send' or make a snapshot on a subvol that contains any nodatacow file).
> >
> > btrfs did disallow both operations for swap files, so it could be possible
> > to disallow both reflinks and snapshots for nodatacow files, but AFAIK
> > nobody wants that (some people even want the swapfile restrictions to
> > go away someday).
> >
> > > > The
> > > > problem is just that the reflink copy also needs to have the attribute
> > > > set, yet it cannot inherit from the source automatically.
> >
> > reflink can only reflink copy from one nodatasum file to another nodatasum
> > file, or from one datasum file to another datasum file.
> >
> > An empty inode can be changed from datacow to nodatacow (or vice versa)
> > using the fsattr ioctl, which simultaneously changes the file from
> > datasum to nodatasum if the filesystem was not mounted with the nodatasum
> > mount option.
> >
> > There is a possible kernel enhancement here:  when an empty inode is the
> > dst of a reflink, automatically change the reflink dst inode's nodatasum
> > flag to match the reflink src inode's nodatasum flag.  If the dst inode
> > is not empty and the inode datasum flags do not match, then reject the
> > reflink with EINVAL as before.
> >
> > It's not clear whether this should apply only to nodatasum or also to
> > nodatacow.  reflink doesn't need src and dst agreement on nodatacow,
> > so the dst inode could be a nodatasum+datacow file.  Unfortunately all
> > the userspace tools including coreutils can only see the nodatacow
> > inode bit, not the nodatasum bit, so the lack of csums on the dst file
> > would be invisible.  The kernel cannot know the user's intent from the
> > available information.
> >
> > It's not clear that we want the kernel to be implicitly changing
> > inode attribute bits like this--especially bits that disable integrity
> > features like nodatasum.  There is precedent for changing fsattrs with
> > the no-compress inode flag, but that flag doesn't disable csums, and
> > this one would.
> >
> > One could also make the opposite case:  it should always be an error to
> > do anything that would put data in a datasum file without csums, the
> > existing behavior is correct, and should not be changed.  The problem
> > with this argument is that users can't see the datasum inode bits,
> > so it's not clear that the EINVAL is a data protection mechanism.
> >
> > > > I wonder if this is a kernel-side problem or something that coreutils
> > > > missed? It also seems wrong that when it fails there will be an empty
> > > > destination file created.
> >
> > Normally coreutils will fall back to simple copy if --reflink=auto
> > is used.  --reflink=always is the user's explicit request for "reflink
> > or nothing, please."  The user correctly got nothing, as requested.
> >
> > On other filesystems, reflink on a nodatacow file might make a simple
> > copy in the kernel--in which case you are no better off than if you had
> > used --reflink=auto.
> >
> > coreutils could propagate the source inode nodatacow fsattribute to
> > the destination inode if it intends to use reflink to copy the data.
> > That would be the userspace equivalent of the kernel enhancement I
> > suggested above.  It would probably match user expectations better--no
> > hidden surprises for non-coreutils use cases, and all the affected inode
> > attribute bits are necessarily visible in userspace.
> >
> > fsattr propagation could be quite complicated for coreutils to implement
> > correctly in general, as some fsattrs should not be propagated this way,
> > and other filesystems may have different restrictions.  Some fsattrs must
> > be set before the data is written (e.g. -c for compression), others must
> > be set after the data is written (e.g. -i for immutable), and some are
> > a matter of user intent (e.g. should a simple copy be compressed if the
> > source is not?  Depends on the intended use of the copy).
> >
> > On other filesystems this userspace behavior might trigger the opposite
> > of the intended kernel behavior, causing reflink to always fall back to
> > simple copy because the dst inode's nodatacow attribute is set.
> >
> > Ideally btrfs will not force coreutils to do one thing on btrfs and the
> > opposite thing on other filesystems, so it might be worthwhile to hack
> > around this in the kernel as proposed above.  There is precedent for
> > that--btrfs falls back to simple copy in reflinks of inline extents,
> > more or less for the sole purpose of making cp --reflink=always not fail
> > so randomly.
> >
> > > > Kernel version: Linux archlinux 5.12.8-arch1-1 #1 SMP PREEMPT Fri, 28
> > > > May 2021 15:10:20 +0000 x86_64 GNU/Linux
> > > > Coreutils version: 8.32
> > > >
> > > > Regards,
> > > > Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#48833: reflink copying does not check/set No_COW attribute and fail
  2021-06-06  5:42       ` Zygo Blaxell
@ 2021-06-07  5:47         ` Paul Eggert
  2021-06-08  2:41           ` Zygo Blaxell
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2021-06-07  5:47 UTC (permalink / raw)
  To: Zygo Blaxell, Tom Yan; +Cc: 48833, linux-btrfs

On 6/5/21 10:42 PM, Zygo Blaxell wrote:
> If cp -a implements the inode attribute propagation (or inheritance), then
> only users of cp -a are impacted.  They are more likely to be aware that
> they may be creating new files with reduced-integrity storage attributes.

True, although I think this aspect of attribute-copying will typically 
come as a surprise even to "cp -a" users.

> If the file is empty, you can chattr +C or -C.  If the file is not
> empty, chattr fails with an error.

Although coreutils 'cp -a' currently truncates any already-existing 
output file (by opening it with O_TRUNC), it then calls copy_file_range 
before calling fsetxattr on the destination. Presumably cp should do the 
equivalent of chattr +C before doing the copy_file_range stuff. (Perhaps 
you've already mentioned this point; if so, my apologies for the 
duplication.)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#48833: reflink copying does not check/set No_COW attribute and fail
  2021-06-07  5:47         ` bug#48833: " Paul Eggert
@ 2021-06-08  2:41           ` Zygo Blaxell
  2021-06-27 10:56             ` A L
  0 siblings, 1 reply; 9+ messages in thread
From: Zygo Blaxell @ 2021-06-08  2:41 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Tom Yan, 48833, linux-btrfs

On Sun, Jun 06, 2021 at 10:47:05PM -0700, Paul Eggert wrote:
> On 6/5/21 10:42 PM, Zygo Blaxell wrote:
> > If cp -a implements the inode attribute propagation (or inheritance), then
> > only users of cp -a are impacted.  They are more likely to be aware that
> > they may be creating new files with reduced-integrity storage attributes.
> 
> True, although I think this aspect of attribute-copying will typically come
> as a surprise even to "cp -a" users.

Existing users might be surprised when "cp -a" starts replicating storage
attributes when it did not do so before, but I suspect most future cp
users would expect "cp -a" to preserve storage-policy attributes the same
way it currently preserves ownership, permissions, timestamps, extended
attributes, and security context--a list that initially contained only
the ownership, permissions, and timestamps in the past, the others were
added over time.  If not by default, then at least have the ability to
do it when requested with a "--preserve=datacow" switch.

We could say "cp --reflink=always implies --preserve=datacow because it
might not work otherwise", which solves the immediate issue as presented,
but I don't think we _want_ to say that because it has a potentially
bad surprising case when attribute changes are unexpected (it's the same
reason that we would not want to implement it that way in the kernel).
As a cp user, I would prefer to make the choice to copy the storage
attributes with --preserve or -a, and after that choice is made, then
--reflink=always will work because cp is setting attributes in dst to
match src as I requested it to do.  If I had made the opposite choice
(didn't use -a or --preserve, or did use --no-preserve=datacow), then
cp shall not copy the storage attributes, the dst inodes have attributes
inherited from dst's parent, --reflink=always fails when there are
mismatches as it does now, and --reflink=auto or =none copies may have
different storage attributes from the src (with possibly stronger or
weaker integrity).

We could say "'cp -a --reflink=always' implies --preserve=datacow, but
'--reflink=always' and '-a' by themselves do not."  It seems complex
to describe, but maybe it does surprise the fewest people in the most
beneficial way:  plain 'cp -a' users keep exactly the old behavior,
while 'cp -a --reflink=always' users get successful copies in one case
where they currently get unexpected failures.  'cp -r' doesn't imply
--preserve=all, so 'cp -r --reflink=always' would need to be accompanied
by '--preserve=datacow' or '--preserve=all' to get the attribute-copying.

The cp doc could be clearer that filesystems that support reflink
don't guarantee every file can be reflinked to every other file.
reflink is expected to fail in a growing number of cases over time,
as more filesystem features are created that are incompatible with it
(e.g. encryption, where reflinks between files with different owners could
be unimplementable).  I've seen a number of users get burned by making big
--reflink=always copies and not checking the results for errors, assuming
that only lack of space for metadata could cause a reflink copy to fail.
There are good reasons why --reflink=auto exists and is the default,
and users ignore them at their peril.

The really awful thing about all this is that it's not datacow, the
thing visible with chattr +/-C and implemented by other filesystems,
that is the problem.  The problem is the datasum bit hidden behind the
datacow bit on btrfs.  datasum can still be disabled even when datacow
is enabled, and datasum requires privileges to detect (unprivileged
users can only observe the datasum bit's effect on reflink copies to
files with the opposite datasum setting).  The proper option name should
be --preserve=datasum, but cp can only examine or change the datacow bit.

> > If the file is empty, you can chattr +C or -C.  If the file is not
> > empty, chattr fails with an error.
> 
> Although coreutils 'cp -a' currently truncates any already-existing output
> file (by opening it with O_TRUNC), it then calls copy_file_range before
> calling fsetxattr on the destination. Presumably cp should do the equivalent
> of chattr +C before doing the copy_file_range stuff. (Perhaps you've already
> mentioned this point; if so, my apologies for the duplication.)

The important thing is that got across, whether you extracted it from what
I wrote, or reconstructed it from context.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: bug#48833: reflink copying does not check/set No_COW attribute and fail
  2021-06-08  2:41           ` Zygo Blaxell
@ 2021-06-27 10:56             ` A L
  0 siblings, 0 replies; 9+ messages in thread
From: A L @ 2021-06-27 10:56 UTC (permalink / raw)
  To: Zygo Blaxell, Paul Eggert; +Cc: 48833, linux-btrfs, Tom Yan


On 2021-06-08 04:41, Zygo Blaxell wrote:
 > On Sun, Jun 06, 2021 at 10:47:05PM -0700, Paul Eggert wrote:
 >> On 6/5/21 10:42 PM, Zygo Blaxell wrote:
 >>> If cp -a implements the inode attribute propagation (or 
inheritance), then
 >>> only users of cp -a are impacted.  They are more likely to be aware 
that
 >>> they may be creating new files with reduced-integrity storage 
attributes.
 >>
 >> True, although I think this aspect of attribute-copying will 
typically come
 >> as a surprise even to "cp -a" users.
 >
 > Existing users might be surprised when "cp -a" starts replicating storage
 > attributes when it did not do so before, but I suspect most future cp
 > users would expect "cp -a" to preserve storage-policy attributes the same
 > way it currently preserves ownership, permissions, timestamps, extended
 > attributes, and security context--a list that initially contained only
 > the ownership, permissions, and timestamps in the past, the others were
 > added over time.  If not by default, then at least have the ability to
 > do it when requested with a "--preserve=datacow" switch.

...

 > The cp doc could be clearer that filesystems that support reflink
 > don't guarantee every file can be reflinked to every other file.
 > reflink is expected to fail in a growing number of cases over time,
 > as more filesystem features are created that are incompatible with it
 > (e.g. encryption, where reflinks between files with different owners 
could
 > be unimplementable).  I've seen a number of users get burned by 
making big
 > --reflink=always copies and not checking the results for errors, assuming
 > that only lack of space for metadata could cause a reflink copy to fail.
 > There are good reasons why --reflink=auto exists and is the default,
 > and users ignore them at their peril.
 >

Hello everyone,
I made a similar thread[1] about a year ago on the coreutils 
mailing-list and I think it is also relevant to this bug-report.

It is true as Zygo mentions, that reflinking nocow and cow files does 
not work, and cannot work due to the nature of how nocow works.

What I would like to add to this bug-report is what elaborated on in the 
other thread, that we can move forward with preserving all attributes by 
setting them in the correct order. I show in the message that reflinking 
works between two nocow files and that ‘cp -a’ could preserve nocow and 
other attributes if ‘cp -a’ sets those attributes in correct order.

As a normal end-user, IMHO, ‘cp -a’ should preserve all attributes where 
possible, which is also what the manual[2] currently states:

‘--archive’
Preserve as much as possible of the structure and attributes of the 
original files in the copy (but do not attempt to preserve internal 
directory structure; i.e., ‘ls -U’ may list the entries in a copied 
directory in a different order). Try to preserve SELinux security 
context and extended attributes (xattr), but ignore any failure to do 
that and print no corresponding diagnostic. Equivalent to -dR 
--preserve=all with the reduced diagnostics.


Only when using --reflink=always, we should fail if the target cannot 
support reflinks.

Thanks!

~A


[1] https://lists.gnu.org/archive/html/coreutils/2021-06/msg00005.html that
[2] 
https://www.gnu.org/software/coreutils/manual/html_node/cp-invocation.html#cp-invocation


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-27 10:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04 14:33 reflink copying does not check/set No_COW attribute and fail Tom Yan
2021-06-04 14:37 ` Tom Yan
2021-06-04 20:16   ` Zygo Blaxell
2021-06-05  5:56     ` Tom Yan
2021-06-05 10:35       ` Forza
2021-06-06  5:42       ` Zygo Blaxell
2021-06-07  5:47         ` bug#48833: " Paul Eggert
2021-06-08  2:41           ` Zygo Blaxell
2021-06-27 10:56             ` A L

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.