Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)

* Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
@ 2011-04-14 10:26 Markus Trippelsdorf
  2011-04-14 12:06 ` Markus Trippelsdorf
                   ` (2 more replies)
  0 siblings, 3 replies; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 10:26 UTC (permalink / raw)
  To: coreutils; +Cc: xfs

I trashed my system this morning when I installed coreutils-8.11.

What happened is that coreutils compiles and links correctly, but then
the following command (during the installation phase):

./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64 basename cat chcon chgrp chmod chown cksum comm cp csplit cut date dd dir dircolors dirname du echo env expand expr factor false fmt fold head id join link ln logname ls md5sum mkdir mkfifo mknod mktemp mv nl nproc nohup od paste pathchk pr printenv printf ptx pwd readlink rm rmdir runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum shred shuf sleep sort split stat sum sync tac tail tee test timeout touch tr true truncate tsort tty uname unexpand uniq unlink vdir wc whoami yes arch '/var/tmp/portage/sys-apps/coreutils-8.11/image//usr/bin'

apparently produces files which have the length of the originals but are
full of zeros (and these were then installed to my live system, thereby
trashing it).
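
One way to spot this symptom in a staging directory, before anything
reaches a live system, is to check each file for non-NUL bytes. The
following is only a minimal sketch: `is_all_zeros` and `report_zero_files`
are hypothetical helper names, not part of coreutils or portage.

```shell
# Hypothetical helper: succeed if FILE is non-empty and holds only NUL bytes.
is_all_zeros() {
  [ -s "$1" ] || return 1                  # empty or missing files do not count
  nonzero=$(tr -d '\000' < "$1" | wc -c)   # bytes left after dropping NULs
  [ "$nonzero" -eq 0 ]
}

# Hypothetical helper: print every regular file under DIR that is all zeros.
report_zero_files() {
  find "$1" -type f | while IFS= read -r f; do
    if is_all_zeros "$f"; then
      printf 'all zeros: %s\n' "$f"
    fi
  done
}
```

Running `report_zero_files` on the image directory before the merge step
would list any binaries that were copied as zeros.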

Now all the above is automated, because I use gentoo.
But when I run the command above (ginstall) later again by hand,
everything is copied just fine and the resulting binaries are all
usable.

The partition in question uses xfs.

# xfs_info /var
meta-data=/dev/sda1              isize=256    agcount=4, agsize=12800000 blks
         =                       sectsz=4096  attr=2
data     =                       bsize=4096   blocks=51200000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=25000, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

This is a hard drive with 4 KiB sectors (sectsz=4096).

I'm running the latest vanilla git kernel (2.6.39-rc3-00087-gda768a4).

Now my question is, could this be caused by the recent FIEMAP changes in
coreutils?
-- 
Markus

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 10:26 Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Markus Trippelsdorf
@ 2011-04-14 12:06 ` Markus Trippelsdorf
  2011-04-14 14:02   ` Markus Trippelsdorf
  2011-04-14 14:39 ` Eric Sandeen
       [not found] ` <20110414102608.GA1678-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
  2 siblings, 1 reply; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 12:06 UTC (permalink / raw)
  To: coreutils; +Cc: xfs

On 2011.04.14 at 12:26 +0200, Markus Trippelsdorf wrote:
> I trashed my system this morning when I installed coreutils-8.11.
> 
> What happened is that coreutils compiles and links correctly, but then
> the following command (during the installation phase):
> 
> ./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64 basename cat chcon chgrp chmod chown cksum comm cp csplit cut date dd dir dircolors dirname du echo env expand expr factor false fmt fold head id join link ln logname ls md5sum mkdir mkfifo mknod mktemp mv nl nproc nohup od paste pathchk pr printenv printf ptx pwd readlink rm rmdir runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum shred shuf sleep sort split stat sum sync tac tail tee test timeout touch tr true truncate tsort tty uname unexpand uniq unlink vdir wc whoami yes arch '/var/tmp/portage/sys-apps/coreutils-8.11/image//usr/bin'
> 
> apparently produces files which have the length of the originals but are
> full of zeros. (and these were then installed to my live system, thereby
> trashing it).
> 
> Now all the above is automated, because I use gentoo.
> But when I run the command above (ginstall) later again by hand,
> everything is copied just fine and the resulting binaries are all
> usable.
> 
> The partition in question uses xfs.
> 
> # xfs_info /var
> meta-data=/dev/sda1              isize=256    agcount=4, agsize=12800000 blks
>          =                       sectsz=4096  attr=2
> data     =                       bsize=4096   blocks=51200000, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=25000, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> This is a hard drive with 4 KiB sectors (sectsz=4096).
> 
> I'm running the latest vanilla git kernel (2.6.39-rc3-00087-gda768a4).
> 
> Now my question is, could this be caused by the recent FIEMAP changes in
> coreutils?

Apparently yes:

Here is a "make check" failure example:

FAIL: cp/fiemap-empty (exit: 1)
===============================
...
+ fallocate -l 10MiB -n unwritten.withdata
+ dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=unwritten.withdata
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.00219578 s, 2.3 MB/s
+ cp unwritten.withdata cp.test
++ stat -c %s unwritten.withdata
++ stat -c %s cp.test
+ test 5120 = 5120
+ cmp unwritten.withdata cp.test
unwritten.withdata cp.test differ: char 1, line 1
+ fail=1

-- 
Markus

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 12:06 ` Markus Trippelsdorf
@ 2011-04-14 14:02   ` Markus Trippelsdorf
       [not found]     ` <20110414140222.GB1679-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
  0 siblings, 1 reply; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 14:02 UTC (permalink / raw)
  To: coreutils; +Cc: xfs

On 2011.04.14 at 14:53 +0100, Pádraig Brady wrote:
> On 14/04/11 14:48, Markus Trippelsdorf wrote:
> > On 2011.04.14 at 14:34 +0100, Pádraig Brady wrote:
> >> Hi Markus,
> >>
> >> I noticed your fiemap issues here:
> >> http://oss.sgi.com/pipermail/xfs/2011-April/050102.html
> >>
> >> FAIL: cp/fiemap-empty (exit: 1)
> >> ===============================
> >> ...
> >> + fallocate -l 10MiB -n unwritten.withdata
> >> + dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=unwritten.withdata
> >> 10+0 records in
> >> 10+0 records out
> >> 5120 bytes (5.1 kB) copied, 0.00219578 s, 2.3 MB/s
> >> + cp unwritten.withdata cp.test
> >> ++ stat -c %s unwritten.withdata
> >> ++ stat -c %s cp.test
> >> + test 5120 = 5120
> >> + cmp unwritten.withdata cp.test
> >> unwritten.withdata cp.test differ: char 1, line 1
> >> + fail=1
> >>
> >> cp was changed in 8.11 to not bother reading
> >> an extent if it is marked as UNWRITTEN.
> >> The comment in fiemap.h says that this means that the
> >> space is allocated, but zero.
> >>
> >> We tested on XFS, on F15 x86_64, which is earlier
> >> than your 2.6.39-rc3 and didn't notice this issue.
> >>
> >> I'm guessing that XFS is reporting the extent
> >> as UNWRITTEN, even though there is data in it now,
> >> and that it might sort itself out after a while,
> >> or after a sync I suppose (note we also stopped
> >> using sync before fiemap for 2.6.39).
> >>
> >> It would help a lot if you could insert this command
> >> into the test above (just before the failing cp)
> >> and show the test output again:
> >>
> >>   filefrag -v unwritten.withdata
> > 
> > Hi Pádraig,
> > 
> > here you go:
> > + filefrag -v unwritten.withdata
> > Filesystem type is: ef53
> > File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
> >  ext logical physical expected length flags
> >    0       0   274432            2560 unwritten,eof
> > unwritten.withdata: 1 extent found
> > 
> > Please notice that this also happens with ext4 on the same kernel. 
> > Btrfs is fine.
> 
> That looks like a bug in XFS :(
> I presume if you change `filefrag -v` to `filefrag -vs` that
> the output will change, and the test will pass.
> I'm a bit surprised that ext4 shows the same thing
> as there was supposedly a patch for this issue already
> applied for 2.6.39.
> 
> It would be great if we got these fixed up before
> 2.6.39 was released, so that the checks in coreutils 8.11
> were valid.

You're right, `filefrag -vs` fixes the issue on both xfs and ext4.
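
Since `filefrag -s` syncs the file before requesting the extent map, a
possible stopgap on affected kernels might be to flush dirty data before
copying. This is an untested assumption drawn from the observation above,
not a fix confirmed in this thread, and `copy_after_sync` is a hypothetical
helper name:

```shell
# Hypothetical stopgap: flush dirty data, then copy SRC to DEST.
# Assumption: once the data is on disk, the extent flags the kernel
# reports for SRC describe what has actually been written.
copy_after_sync() {
  src=$1
  dest=$2
  sync                    # flushes everything; older sync(1) ignores file arguments
  cp -- "$src" "$dest"
}
```

A global `sync` is heavy-handed, so this only makes sense as a temporary
measure while the underlying behaviour is investigated.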

-- 
Markus

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 10:26 Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Markus Trippelsdorf
  2011-04-14 12:06 ` Markus Trippelsdorf
@ 2011-04-14 14:39 ` Eric Sandeen
       [not found] ` <20110414102608.GA1678-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
  2 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 14:39 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: coreutils, xfs

On 4/14/11 5:26 AM, Markus Trippelsdorf wrote:
> I trashed my system this morning when I installed coreutils-8.11.
> 
> What happened is that coreutils compiles and links correctly, but
> then the following command (during the installation phase):
> 
> ./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64
> basename cat chcon chgrp chmod chown cksum comm cp csplit cut date dd
> dir dircolors dirname du echo env expand expr factor false fmt fold
> head id join link ln logname ls md5sum mkdir mkfifo mknod mktemp mv
> nl nproc nohup od paste pathchk pr printenv printf ptx pwd readlink
> rm rmdir runcon seq sha1sum sha224sum sha256sum sha384sum sha512sum
> shred shuf sleep sort split stat sum sync tac tail tee test timeout
> touch tr true truncate tsort tty uname unexpand uniq unlink vdir wc
> whoami yes arch
> '/var/tmp/portage/sys-apps/coreutils-8.11/image//usr/bin'
> 
> apparently produces files which have the length of the originals but
> are full of zeros. (and these were then installed to my live system,
> thereby trashing it).
> 
> Now all the above is automated, because I use gentoo. But when I run
> the command above (ginstall) later again by hand, everything is
> copied just fine and the resulting binaries are all usable.
> 
> The partition in question uses xfs.
> 
> # xfs_info /var
> meta-data=/dev/sda1              isize=256    agcount=4, agsize=12800000 blks
>          =                       sectsz=4096  attr=2
> data     =                       bsize=4096   blocks=51200000, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=25000, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> This is a hard drive with 4 KiB sectors (sectsz=4096).
> 
> I'm running the latest vanilla git kernel
> (2.6.39-rc3-00087-gda768a4).
> 
> Now my question is, could this be caused by the recent FIEMAP changes
> in coreutils?


Well damn.  Looking into it ...

-Eric

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 14:02   ` Markus Trippelsdorf
@ 2011-04-14 14:59         ` Pádraig Brady
  0 siblings, 0 replies; 117+ messages in thread
From: Pádraig Brady @ 2011-04-14 14:59 UTC (permalink / raw)
  To: xfs-oss, linux-ext4; +Cc: coreutils, Markus Trippelsdorf

On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>> Hi Pádraig,
>>>
>>> here you go:
>>> + filefrag -v unwritten.withdata
>>> Filesystem type is: ef53
>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>  ext logical physical expected length flags
>>>    0       0   274432            2560 unwritten,eof
>>> unwritten.withdata: 1 extent found
>>>
>>> Please notice that this also happens with ext4 on the same kernel. 
>>> Btrfs is fine.
>>
> `filefrag -vs` fixes the issue on both xfs and ext4.

So in summary, currently (on 2.6.39-rc3), the following
will (usually?) report a single unwritten extent
on both ext4 and xfs:

  fallocate -l 10MiB -n k
  dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
  filefrag -v k # grep for an extent without unwritten || fail
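
The "grep for an extent without unwritten" step above could be sketched as
a small filter over `filefrag -v` output. This assumes that extent rows
start with a numeric extent index, as in the output quoted earlier in this
thread; other e2fsprogs versions may format rows differently, and
`has_written_extent` is a hypothetical helper name.

```shell
# Hypothetical helper: read `filefrag -v` output on stdin and succeed only
# if at least one extent row does not carry the "unwritten" flag.
has_written_extent() {
  awk '
    # Extent rows begin with the extent index, e.g. "   0       0   274432 ..."
    /^[ \t]*[0-9]+[: \t]/ {
      if ($0 !~ /unwritten/) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}
```

With this in place, the last step of the reproducer would read
`filefrag -v k | has_written_extent || echo fail`.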

This particular issue has been discussed so far at:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
Note that it was stated there that ext4 had this
fixed as of 2.6.39-rc1, so maybe there is something lurking?

cheers,
Pádraig.

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 14:59         ` Pádraig Brady
@ 2011-04-14 15:50             ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 15:50 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: linux-ext4, coreutils, Markus Trippelsdorf, xfs-oss

On 4/14/11 9:59 AM, Pádraig Brady wrote:
> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>> Hi Pádraig,
>>>>
>>>> here you go:
>>>> + filefrag -v unwritten.withdata
>>>> Filesystem type is: ef53
>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>  ext logical physical expected length flags
>>>>    0       0   274432            2560 unwritten,eof
>>>> unwritten.withdata: 1 extent found
>>>>
>>>> Please notice that this also happens with ext4 on the same kernel. 
>>>> Btrfs is fine.
>>>
>> `filefrag -vs` fixes the issue on both xfs and ext4.
> 
> So in summary, currently on (2.6.39-rc3), the following
> will (usually?) report a single unwritten extent,
> on both ext4 and xfs
> 
>   fallocate -l 10MiB -n k
>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>   filefrag -v k # grep for an extent without unwritten || fail

right, that's what I see too in testing.

But would the coreutils install have done a preallocation of the destination file?

Otherwise this looks like a different bug...

> This particular issue has been discussed so far at:
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> Note there it was stated there that ext4 had this
> fixed as of 2.6.39-rc1, so maybe there is something lurking?

ext4 got a fix, but not xfs, I guess.  My poor brain can't remember; I think I started looking into it, but it's clearly still broken.

Still, I don't know for sure what happened to Markus - did something preallocate, in his case?

-Eric
 
> cheers,
> Pádraig.
> 

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:50             ` Eric Sandeen
@ 2011-04-14 15:52                 ` Pádraig Brady
  -1 siblings, 0 replies; 117+ messages in thread
From: Pádraig Brady @ 2011-04-14 15:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-ext4, coreutils, Markus Trippelsdorf, xfs-oss

On 14/04/11 16:50, Eric Sandeen wrote:
> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>> Hi Pádraig,
>>>>>
>>>>> here you go:
>>>>> + filefrag -v unwritten.withdata
>>>>> Filesystem type is: ef53
>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>  ext logical physical expected length flags
>>>>>    0       0   274432            2560 unwritten,eof
>>>>> unwritten.withdata: 1 extent found
>>>>>
>>>>> Please notice that this also happens with ext4 on the same kernel. 
>>>>> Btrfs is fine.
>>>>
>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>
>> So in summary, currently on (2.6.39-rc3), the following
>> will (usually?) report a single unwritten extent,
>> on both ext4 and xfs
>>
>>   fallocate -l 10MiB -n k
>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>   filefrag -v k # grep for an extent without unwritten || fail
> 
> right, that's what I see too in testing.
> 
> But would the coreutils install have done a preallocation of the destination file?
> 
> Otherwise this looks like a different bug...
> 
>> This particular issue has been discussed so far at:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>> Note there it was stated there that ext4 had this
>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
> 
> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> 
> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?

Well, that preallocate test is failing for him
when the source file is on either ext4 or xfs.
He noticed the issue initially on XFS when copying
non-preallocated files, so XFS probably just has
the general issue of needing a sync before fiemap,
whereas ext4 just has this preallocate one
(though I've not seen it myself).

cheers,
Pádraig.

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:52                 ` Pádraig Brady
@ 2011-04-14 15:56                   ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 15:56 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: xfs-oss, linux-ext4, coreutils, Markus Trippelsdorf

On 4/14/11 10:52 AM, Pádraig Brady wrote:
> On 14/04/11 16:50, Eric Sandeen wrote:
>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>> Hi Pádraig,
>>>>>>
>>>>>> here you go:
>>>>>> + filefrag -v unwritten.withdata
>>>>>> Filesystem type is: ef53
>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>  ext logical physical expected length flags
>>>>>>    0       0   274432            2560 unwritten,eof
>>>>>> unwritten.withdata: 1 extent found
>>>>>>
>>>>>> Please notice that this also happens with ext4 on the same kernel. 
>>>>>> Btrfs is fine.
>>>>>
>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>
>>> So in summary, currently on (2.6.39-rc3), the following
>>> will (usually?) report a single unwritten extent,
>>> on both ext4 and xfs
>>>
>>>   fallocate -l 10MiB -n k
>>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>   filefrag -v k # grep for an extent without unwritten || fail
>>
>> right, that's what I see too in testing.
>>
>> But would the coreutils install have done a preallocation of the destination file?
>>
>> Otherwise this looks like a different bug...
>>
>>> This particular issue has been discussed so far at:
>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>> Note there it was stated there that ext4 had this
>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>
>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>>
>> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> 
> Well, that preallocate test is failing for him
> when the source file is on either ext4 or xfs.
> He noticed the issue initially on XFS when copying
> non-preallocated files, so XFS probably just has
> the general issue of needing a sync before fiemap,
> whereas ext4 just has this preallocate one
> (though I've not seen it myself).
> 
> cheers,
> Pádraig.
> 

well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.

So I still don't know what Markus hit...

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:56                   ` Eric Sandeen
@ 2011-04-14 16:03                     ` Markus Trippelsdorf
  -1 siblings, 0 replies; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 16:03 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Pádraig Brady, xfs-oss, linux-ext4, coreutils

On 2011.04.14 at 10:56 -0500, Eric Sandeen wrote:
> On 4/14/11 10:52 AM, Pádraig Brady wrote:
> > On 14/04/11 16:50, Eric Sandeen wrote:
> >> On 4/14/11 9:59 AM, Pádraig Brady wrote:
> >>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
> >>>>>> Hi Pádraig,
> >>>>>>
> >>>>>> here you go:
> >>>>>> + filefrag -v unwritten.withdata                                                                                                                     
> >>>>>> Filesystem type is: ef53                                                                                                                             
> >>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)                                                                                   
> >>>>>>  ext logical physical expected length flags                                                                                                          
> >>>>>>    0       0   274432            2560 unwritten,eof                                                                                                  
> >>>>>> unwritten.withdata: 1 extent found
> >>>>>>
> >>>>>> Please notice that this also happens with ext4 on the same kernel. 
> >>>>>> Btrfs is fine.
> >>>>>
> >>>> `filefrag -vs` fixes the issue on both xfs and ext4.
> >>>
> >>> So in summary, currently on (2.6.39-rc3), the following
> >>> will (usually?) report a single unwritten extent,
> >>> on both ext4 and xfs
> >>>
> >>>   fallocate -l 10MiB -n k
> >>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> >>>   filefrag -v k # grep for an extent without unwritten || fail
> >>
> >> right, that's what I see too in testing.
> >>
> >> But would the coreutils install have done a preallocation of the destination file?
> >>
> >> Otherwise this looks like a different bug...
> >>
> >>> This particular issue has been discussed so far at:
> >>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> >>> Note there it was stated there that ext4 had this
> >>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
> >>
> >> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> >>
> >> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> > 
> > Well that preallocate test is failing for him
> > when the source file is either on ext4 or xfs.
> > He noticed the issue initially on XFS when copying
> > none preallocated files, so XFS probably just has
> > the general issue of needing a sync before fiemap,
> > where as EXT4 just has this preallocate one
> > (though I've not seen it myself).
> > 
> > cheers,
> > Pádraig.
> > 
> 
> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
> 
> So I still don't know what Markus hit...

Maybe it's delalloc:

x4 /tmp # dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
10+0 records in
10+0 records out
5120 bytes (5.1 kB) copied, 0.0021822 s, 2.3 MB/s
x4 /tmp # filefrag -v k 
Filesystem type is: 58465342
File size of k is 5120 (2 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0        0              16 unknown,delalloc,eof
k: 1 extent found
x4 /tmp # sync
x4 /tmp # filefrag -v k 
Filesystem type is: 58465342
File size of k is 5120 (2 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0 26960045              16 eof
k: 1 extent found
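
To illustrate why the pre-sync output matters to a FIEMAP-based copy, here is a minimal C sketch of how a copier could classify the extent flags shown above. The FIEMAP_EXTENT_* constants come from <linux/fiemap.h>; the enum, the function names and the policy itself are assumptions made up for this example, not what coreutils actually implements. It also assumes that the "eof" shown by this filefrag corresponds to FIEMAP_EXTENT_LAST.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/fiemap.h>

/* Hypothetical policy names, invented for this sketch. */
enum copy_action {
    COPY_READ_DATA,     /* read() the range and write it to the destination */
    COPY_SKIP_AS_ZEROS  /* treat the range like a hole */
};

/*
 * Decide what to do with one extent returned by FS_IOC_FIEMAP.
 * mapped_with_sync is nonzero when the mapping was requested with
 * FIEMAP_FLAG_SYNC, i.e. dirty data was flushed before mapping.
 */
static enum copy_action classify_extent(uint32_t fe_flags, int mapped_with_sync)
{
    /* Delalloc or unknown location: the data may live only in the page cache. */
    if (fe_flags & (FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN))
        return COPY_READ_DATA;

    /* Unwritten is only trusted to mean "no data" after a flush. */
    if (fe_flags & FIEMAP_EXTENT_UNWRITTEN)
        return mapped_with_sync ? COPY_SKIP_AS_ZEROS : COPY_READ_DATA;

    return COPY_READ_DATA;
}

/* Usage with the flag combinations seen in this thread. */
void classify_extent_examples(void)
{
    uint32_t before_sync = FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_DELALLOC
                           | FIEMAP_EXTENT_LAST;
    uint32_t after_sync = FIEMAP_EXTENT_LAST;
    uint32_t prealloc = FIEMAP_EXTENT_UNWRITTEN | FIEMAP_EXTENT_LAST;

    assert(classify_extent(before_sync, 0) == COPY_READ_DATA);
    assert(classify_extent(after_sync, 0) == COPY_READ_DATA);
    assert(classify_extent(prealloc, 0) == COPY_READ_DATA);
    assert(classify_extent(prealloc, 1) == COPY_SKIP_AS_ZEROS);
}
```

Under this policy neither of the two outputs above would be skipped, so if cp really produced zeros here, something else must be going on.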

-- 
Markus

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:56                   ` Eric Sandeen
@ 2011-04-14 16:04                     ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-14 16:04 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Pádraig Brady, xfs-oss, linux-ext4, coreutils, Markus Trippelsdorf

2011/4/14 Eric Sandeen <sandeen@sandeen.net>:
> On 4/14/11 10:52 AM, Pádraig Brady wrote:
>> On 14/04/11 16:50, Eric Sandeen wrote:
>>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>>> Hi Pádraig,
>>>>>>>
>>>>>>> here you go:
>>>>>>> + filefrag -v unwritten.withdata
>>>>>>> Filesystem type is: ef53
>>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>>  ext logical physical expected length flags
>>>>>>>    0       0   274432            2560 unwritten,eof
>>>>>>> unwritten.withdata: 1 extent found
>>>>>>>
>>>>>>> Please notice that this also happens with ext4 on the same kernel.
>>>>>>> Btrfs is fine.
>>>>>>
>>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>>
>>>> So in summary, currently on (2.6.39-rc3), the following
>>>> will (usually?) report a single unwritten extent,
>>>> on both ext4 and xfs
>>>>
>>>>   fallocate -l 10MiB -n k
>>>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>>   filefrag -v k # grep for an extent without unwritten || fail
>>>
>>> right, that's what I see too in testing.
>>>
>>> But would the coreutils install have done a preallocation of the destination file?
>>>
>>> Otherwise this looks like a different bug...
>>>
>>>> This particular issue has been discussed so far at:
>>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>>> Note there it was stated there that ext4 had this
>>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>>
>>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>>>
>>> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
>>
>> Well that preallocate test is failing for him
>> when the source file is either on ext4 or xfs.
>> He noticed the issue initially on XFS when copying
>> none preallocated files, so XFS probably just has
>> the general issue of needing a sync before fiemap,
>> where as EXT4 just has this preallocate one
>> (though I've not seen it myself).
>>
>> cheers,
>> Pádraig.
>>
>
> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
>
> So I still don't know what Markus hit...
Sorry, my patch ignored fallocate.  The situation is like this:
A user allocates space for a file with fallocate and writes a small
part of it, and the written data is not flushed.
The extent stays a single unwritten extent, both on disk and in memory,
with delayed allocation.

In ext4, ext4_ext_walk_space() considers an extent nonexistent only if
there are no extents at all on disk.  So ext4_ext_walk_space()
thinks there is an extent, and ext4_ext_fiemap_cb() thus ignores the pagecache.

I think ext4_ext_walk_space() should treat an unwritten extent as nonexistent.
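
To make the proposal concrete, here is a toy user-space model of the decision described above. It is only a sketch: the struct, its fields and all function names are invented for illustration and do not reproduce the kernel code; the only thing taken from the description is that the callback looks at the pagecache solely when the walk says the extent does not exist.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what is known about a block range. */
struct range_state {
    bool on_disk_extent;   /* an extent covers the range in the on-disk tree */
    bool uninitialized;    /* that extent is preallocated but unwritten */
    bool dirty_pagecache;  /* data was written but not yet flushed */
};

/* Behaviour as described: any on-disk extent counts as existing. */
static bool exists_current(const struct range_state *r)
{
    return r->on_disk_extent;
}

/* Proposed behaviour: an unwritten extent is treated as not existing. */
static bool exists_proposed(const struct range_state *r)
{
    return r->on_disk_extent && !r->uninitialized;
}

/* Would the mapping tell user space that the range holds data? */
static bool fiemap_reports_data(const struct range_state *r, bool exists)
{
    if (exists)
        return !r->uninitialized;  /* only the on-disk extent is consulted */
    return r->dirty_pagecache;     /* otherwise look for delalloc pages */
}

void walk_space_model_examples(void)
{
    /* fallocate, then a small unflushed write: the case described above. */
    struct range_state r = { .on_disk_extent = true, .uninitialized = true,
                             .dirty_pagecache = true };

    assert(!fiemap_reports_data(&r, exists_current(&r)));  /* data hidden */
    assert(fiemap_reports_data(&r, exists_proposed(&r)));  /* data visible */
}
```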

Yongqiang.
>
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:04                     ` Yongqiang Yang
@ 2011-04-14 16:10                       ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-14 16:10 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Pádraig Brady, xfs-oss, linux-ext4, coreutils, Markus Trippelsdorf

Hi,

I am away from my work computer.  Maybe the fix below could solve the problem.

fs/ext4/extents.c
static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
                 } else if (block >= le32_to_cpu(ex->ee_block)) {
                         /*
                          * some part of requested space is covered
                          * by found extent
                          */
                         start = block;
                         end = le32_to_cpu(ex->ee_block)
                                 + ext4_ext_get_actual_len(ex);
                         if (block + num < end)
                                 end = block + num;
-                        exists = 1;
+                        if (!ext4_ext_is_uninitialized(ex))
+                                exists = 1;
                 } else {
                         BUG();
                 }


On Fri, Apr 15, 2011 at 12:04 AM, Yongqiang Yang <xiaoqiangnk@gmail.com> wrote:
> 2011/4/14 Eric Sandeen <sandeen@sandeen.net>:
>> On 4/14/11 10:52 AM, Pádraig Brady wrote:
>>> On 14/04/11 16:50, Eric Sandeen wrote:
>>>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>>>> Hi Pádraig,
>>>>>>>>
>>>>>>>> here you go:
>>>>>>>> + filefrag -v unwritten.withdata
>>>>>>>> Filesystem type is: ef53
>>>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>>>  ext logical physical expected length flags
>>>>>>>>    0       0   274432            2560 unwritten,eof
>>>>>>>> unwritten.withdata: 1 extent found
>>>>>>>>
>>>>>>>> Please notice that this also happens with ext4 on the same kernel.
>>>>>>>> Btrfs is fine.
>>>>>>>
>>>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>>>
>>>>> So in summary, currently on (2.6.39-rc3), the following
>>>>> will (usually?) report a single unwritten extent,
>>>>> on both ext4 and xfs
>>>>>
>>>>>   fallocate -l 10MiB -n k
>>>>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>>>   filefrag -v k # grep for an extent without unwritten || fail
>>>>
>>>> right, that's what I see too in testing.
>>>>
>>>> But would the coreutils install have done a preallocation of the destination file?
>>>>
>>>> Otherwise this looks like a different bug...
>>>>
>>>>> This particular issue has been discussed so far at:
>>>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>>>> Note there it was stated there that ext4 had this
>>>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>>>
>>>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>>>>
>>>> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
>>>
>>> Well that preallocate test is failing for him
>>> when the source file is either on ext4 or xfs.
>>> He noticed the issue initially on XFS when copying
>>> none preallocated files, so XFS probably just has
>>> the general issue of needing a sync before fiemap,
>>> where as EXT4 just has this preallocate one
>>> (though I've not seen it myself).
>>>
>>> cheers,
>>> Pádraig.
>>>
>>
>> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
>>
>> So I still don't know what Markus hit...
> Sorry for that my patch ignored fallocate.  The situation is like this:
> An user allocated space for a file with fallocate, and write a small
> part of it. and the written is not flushed.
> The extent stays one unwritten extent in disk and memory with delayed
> allocation.
>
> In ext4 ext4_ext_walk_space() thinks an extent does not exist only if
> there is no any extents on disk.  So ext4_ext_walk_space()
> thinks there is a extent and   ext4_ext_fiemap_cb() thus ignores pagecache.
>
> I think ext4_ext_walk_space() should take unwritten extent not exist.
>
> Yongqiang.
>>
>> -Eric
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Best Wishes
> Yongqiang Yang
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:03                     ` Markus Trippelsdorf
@ 2011-04-14 16:14                       ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 16:14 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Pádraig Brady, xfs-oss, linux-ext4, coreutils

On 4/14/11 11:03 AM, Markus Trippelsdorf wrote:
> On 2011.04.14 at 10:56 -0500, Eric Sandeen wrote:

...

>> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
>>
>> So I still don't know what Markus hit...
> 
> Maybe it's delalloc:
> 
> x4 /tmp # dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k                                                                               
> 10+0 records in
> 10+0 records out
> 5120 bytes (5.1 kB) copied, 0.0021822 s, 2.3 MB/s
> x4 /tmp # filefrag -v k 
> Filesystem type is: 58465342
> File size of k is 5120 (2 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0       0        0              16 unknown,delalloc,eof
> k: 1 extent found
> x4 /tmp # sync
> x4 /tmp # filefrag -v k 
> Filesystem type is: 58465342
> File size of k is 5120 (2 blocks, blocksize 4096)
>  ext logical physical expected length flags
>    0       0 26960045              16 eof
> k: 1 extent found
> 

Well, filefrag still reports a valid range, logical blocks 0 to 16, so cp should have something to copy...
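
As a quick sanity check of that arithmetic, here is a small C sketch that turns an extent's block range into the number of bytes a copy would have to read, clamped to the file size. The function names are made up for this example, and it assumes the "length" column is in units of the reported block size, which this filefrag version may not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of real file data covered by an extent, clamped to the file size. */
static uint64_t extent_bytes_to_copy(uint64_t logical_block,
                                     uint64_t length_blocks,
                                     uint64_t block_size,
                                     uint64_t file_size)
{
    uint64_t start = logical_block * block_size;
    uint64_t end = start + length_blocks * block_size;

    if (start >= file_size)
        return 0;           /* extent lies entirely past EOF */
    if (end > file_size)
        end = file_size;    /* extent runs past EOF: stop at EOF */
    return end - start;
}

void example_extent_bytes(void)
{
    /* Values from the filefrag output above: logical 0, length 16,
       blocksize 4096, file size 5120. */
    assert(extent_bytes_to_copy(0, 16, 4096, 5120) == 5120);
}
```

So with or without the sync, the reported extent covers all 5120 bytes of the file.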

-Eric

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:03                     ` Markus Trippelsdorf
@ 2011-04-14 16:21                         ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-14 16:21 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: linux-ext4, Eric Sandeen, coreutils, xfs-oss

On Fri, Apr 15, 2011 at 12:03 AM, Markus Trippelsdorf wrote:
> On 2011.04.14 at 10:56 -0500, Eric Sandeen wrote:
>> On 4/14/11 10:52 AM, Pádraig Brady wrote:
>> > On 14/04/11 16:50, Eric Sandeen wrote:
>> >> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>> >>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>> >>>>>> Hi Pádraig,
>> >>>>>>
>> >>>>>> here you go:
>> >>>>>> + filefrag -v unwritten.withdata
>> >>>>>> Filesystem type is: ef53
>> >>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>> >>>>>>  ext logical physical expected length flags
>> >>>>>>    0       0   274432            2560 unwritten,eof
>> >>>>>> unwritten.withdata: 1 extent found
>> >>>>>>
>> >>>>>> Please notice that this also happens with ext4 on the same kernel.
>> >>>>>> Btrfs is fine.
>> >>>>>
>> >>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>> >>>
>> >>> So in summary, currently on (2.6.39-rc3), the following
>> >>> will (usually?) report a single unwritten extent,
>> >>> on both ext4 and xfs
>> >>>
>> >>>   fallocate -l 10MiB -n k
>> >>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>> >>>   filefrag -v k # grep for an extent without unwritten || fail
>> >>
>> >> right, that's what I see too in testing.
>> >>
>> >> But would the coreutils install have done a preallocation of the destination file?
>> >>
>> >> Otherwise this looks like a different bug...
>> >>
>> >>> This particular issue has been discussed so far at:
>> >>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>> >>> Note that it was stated there that ext4 had this
>> >>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>> >>
>> >> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>> >>
>> >> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
>> >
>> > Well that preallocate test is failing for him
>> > when the source file is either on ext4 or xfs.
>> > He noticed the issue initially on XFS when copying
>> > non-preallocated files, so XFS probably just has
>> > the general issue of needing a sync before fiemap,
>> > whereas EXT4 just has this preallocate one
>> > (though I've not seen it myself).
>> >
>> > cheers,
>> > Pádraig.
>> >
>>
>> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
>>
>> So I still don't know what Markus hit...
>
> Maybe it's delalloc:
>
> x4 /tmp # dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> 10+0 records in
> 10+0 records out
> 5120 bytes (5.1 kB) copied, 0.0021822 s, 2.3 MB/s
> x4 /tmp # filefrag -v k
> Filesystem type is: 58465342
> File size of k is 5120 (2 blocks, blocksize 4096)
>  ext logical physical expected length flags
>   0       0        0              16 unknown,delalloc,eof
> k: 1 extent found
> x4 /tmp # sync
> x4 /tmp # filefrag -v k
> Filesystem type is: 58465342
> File size of k is 5120 (2 blocks, blocksize 4096)
>  ext logical physical expected length flags
>   0       0 26960045              16 eof
> k: 1 extent found
There is no preallocation in this case. The problem arises when fallocate
and delalloc work together.
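
A minimal sketch of that interaction from a copier's point of view, assuming Linux and the flag constants in <linux/fiemap.h>. The helper name `may_treat_as_zeros` and its `synced` parameter are hypothetical, chosen only for illustration; this is not the coreutils logic. The idea is that an extent flagged unwritten can be assumed to read as zeros only if dirty data was flushed before the map was taken, and that a delalloc extent always has to be read.

```cpp
// Illustrative sketch only; not taken from coreutils or the kernel.
// Assumes Linux, where <linux/fiemap.h> defines the FIEMAP_EXTENT_* flags.
#include <cstdint>
#include <linux/fiemap.h>

// Hypothetical helper: may a copier skip reading this extent and treat
// it as zeros?  `synced` says whether the extent map was requested after
// dirty data had been flushed (for example with FIEMAP_FLAG_SYNC).
constexpr bool may_treat_as_zeros(std::uint32_t fe_flags, bool synced)
{
  // Only a flushed, unwritten (preallocated) extent is assumed to be
  // all zeros.  A delalloc extent holds data that so far lives only in
  // the page cache, so it is never skipped.
  return synced
         && (fe_flags & FIEMAP_EXTENT_UNWRITTEN) != 0
         && (fe_flags & FIEMAP_EXTENT_DELALLOC) == 0;
}
```

For an extent reported as unwritten without a prior sync, the helper returns false, so a cautious copier would read the range instead of assuming zeros. Concurrent writers can still race with the map, which this sketch ignores.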

>
> --
> Markus
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:21                         ` Yongqiang Yang
@ 2011-04-14 16:28                             ` Markus Trippelsdorf
  -1 siblings, 0 replies; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 16:28 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Pádraig Brady, linux-ext4, Eric Sandeen, coreutils, xfs-oss

On 2011.04.15 at 00:21 +0800, Yongqiang Yang wrote:
> On Fri, Apr 15, 2011 at 12:03 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> > On 2011.04.14 at 10:56 -0500, Eric Sandeen wrote:
> >> On 4/14/11 10:52 AM, Pádraig Brady wrote:
> >> > On 14/04/11 16:50, Eric Sandeen wrote:
> >> >> On 4/14/11 9:59 AM, Pádraig Brady wrote:
> >> >>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
> >> >>>>>> Hi Pádraig,
> >> >>>>>>
> >> >>>>>> here you go:
> >> >>>>>> + filefrag -v unwritten.withdata
> >> >>>>>> Filesystem type is: ef53
> >> >>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
> >> >>>>>>  ext logical physical expected length flags
> >> >>>>>>    0       0   274432            2560 unwritten,eof
> >> >>>>>> unwritten.withdata: 1 extent found
> >> >>>>>>
> >> >>>>>> Please notice that this also happens with ext4 on the same kernel.
> >> >>>>>> Btrfs is fine.
> >> >>>>>
> >> >>>> `filefrag -vs` fixes the issue on both xfs and ext4.
> >> >>>
> >> >>> So in summary, currently on (2.6.39-rc3), the following
> >> >>> will (usually?) report a single unwritten extent,
> >> >>> on both ext4 and xfs
> >> >>>
> >> >>>   fallocate -l 10MiB -n k
> >> >>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> >> >>>   filefrag -v k # grep for an extent without unwritten || fail
> >> >>
> >> >> right, that's what I see too in testing.
> >> >>
> >> >> But would the coreutils install have done a preallocation of the destination file?
> >> >>
> >> >> Otherwise this looks like a different bug...
> >> >>
> >> >>> This particular issue has been discussed so far at:
> >> >>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> >> >>> Note that it was stated there that ext4 had this
> >> >>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
> >> >>
> >> >> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> >> >>
> >> >> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> >> >
> >> > Well that preallocate test is failing for him
> >> > when the source file is either on ext4 or xfs.
> >> > He noticed the issue initially on XFS when copying
> >> > non-preallocated files, so XFS probably just has
> >> > the general issue of needing a sync before fiemap,
> >> > whereas EXT4 just has this preallocate one
> >> > (though I've not seen it myself).
> >> >
> >> > cheers,
> >> > Pádraig.
> >> >
> >>
> >> well, if I simply take the preallocation step out of the testcase, it works fine on xfs without a sync.
> >>
> >> So I still don't know what Markus hit...
> >
> > Maybe it's delalloc:
> >
> > x4 /tmp # dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> > 10+0 records in
> > 10+0 records out
> > 5120 bytes (5.1 kB) copied, 0.0021822 s, 2.3 MB/s
> > x4 /tmp # filefrag -v k
> > Filesystem type is: 58465342
> > File size of k is 5120 (2 blocks, blocksize 4096)
> >  ext logical physical expected length flags
> >   0       0        0              16 unknown,delalloc,eof
> > k: 1 extent found
> > x4 /tmp # sync
> > x4 /tmp # filefrag -v k
> > Filesystem type is: 58465342
> > File size of k is 5120 (2 blocks, blocksize 4096)
> >  ext logical physical expected length flags
> >   0       0 26960045              16 eof
> > k: 1 extent found
> There is no preallocation in this case. The problem arises when fallocate
> and delalloc work together.

Yes, but we're still trying to find out what caused the zeros in the
binaries that coreutils installed on my system.

Now the failure only happens when I use "gold" as my linker. With GNU ld
everything is OK. But I thought this must be a timing issue, because
gold is faster and the binaries in coreutils-8.11/src are all fine.

-- 
Markus

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:28                             ` Markus Trippelsdorf
@ 2011-04-14 16:31                               ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 16:31 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Yongqiang Yang, Pádraig Brady, xfs-oss, linux-ext4, coreutils

On 4/14/11 11:28 AM, Markus Trippelsdorf wrote:

<snip>

> Yes, but we're still trying to find out what caused the zeros in the
> binaries that coreutils installed on my system.
> 
> Now the failure only happens when I use "gold" as my linker. With GNU ld
> everything is OK. But I thought this must be a timing issue, because
> gold is faster and the binaries in coreutils-8.11/src are all fine.

maybe xfs_bmap (or filefrag) of the binaries with both linkers would be instructive; are they laid out significantly differently?

does gold preallocate?

-Eric

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:31                               ` Eric Sandeen
@ 2011-04-14 16:48                                 ` Markus Trippelsdorf
  -1 siblings, 0 replies; 117+ messages in thread
From: Markus Trippelsdorf @ 2011-04-14 16:48 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Yongqiang Yang, Pádraig Brady, xfs-oss, linux-ext4, coreutils

On 2011.04.14 at 11:31 -0500, Eric Sandeen wrote:
> On 4/14/11 11:28 AM, Markus Trippelsdorf wrote:
> 
> <snip>
> 
> > Yes, but we're still trying to find out what caused the zeros in the
> > binaries that coreutils installed on my system.
> > 
> > Now the failure only happens when I use "gold" as my linker. With GNU ld
> > everything is OK. But I thought this must be a timing issue, because
> > gold is faster and the binaries in coreutils-8.11/src are all fine.
> 
> maybe xfs_bmap (or filefrag) of the binaries with both linkers would be instructive; are they laid out significantly differently?
> 
> does gold preallocate?

Just checked and yes it does. That should explain the issue I was
seeing.

bool
Output_file::map_no_anonymous()
{
  const int o = this->o_;

  // If the output file is not a regular file, don't try to mmap it;
  // instead, we'll mmap a block of memory (an anonymous buffer), and
  // then later write the buffer to the file.
  void* base;
  struct stat statbuf;
  if (o == STDOUT_FILENO || o == STDERR_FILENO
      || ::fstat(o, &statbuf) != 0
      || !S_ISREG(statbuf.st_mode)
      || this->is_temporary_)
    return false;

  // Ensure that we have disk space available for the file.  If we
  // don't do this, it is possible that we will call munmap, close,
  // and exit with dirty buffers still in the cache with no assigned
  // disk blocks.  If the disk is out of space at that point, the
  // output file will wind up incomplete, but we will have already
  // exited.  The alternative to fallocate would be to use fdatasync,
  // but that would be a more significant performance hit.
  if (::posix_fallocate(o, 0, this->file_size_) < 0)
    gold_fatal(_("%s: %s"), this->name_, strerror(errno));

  // Map the file into memory.
  base = ::mmap(NULL, this->file_size_, PROT_READ | PROT_WRITE,
		MAP_SHARED, o, 0);

  // The mmap call might fail because of file system issues: the file
  // system might not support mmap at all, or it might not support
  // mmap with PROT_WRITE.
  if (base == MAP_FAILED)
    return false;

  this->map_is_anonymous_ = false;
  this->base_ = static_cast<unsigned char*>(base);
  return true;
}
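
For illustration, here is a minimal standalone sketch of the same pattern (preallocate, then store through a shared mapping), assuming a POSIX system. The function name `write_preallocated` and its `flush` parameter are hypothetical and are not part of gold. It also shows the fdatasync alternative that the comment in the quoted function mentions. Note that POSIX specifies that posix_fallocate() returns the error number directly and does not set errno, so the sketch checks for a non-zero result rather than a negative one.

```cpp
// Illustrative sketch only; this is not gold's code.
// Assumes a POSIX system with posix_fallocate(), mmap() and fdatasync().
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: write `data` to `path` by preallocating the file
// and storing through a shared mapping.  When `flush` is true, also force
// the data to disk before returning.  Returns 0 or an errno value.
int write_preallocated(const char* path, const std::string& data, bool flush)
{
  assert(!data.empty());  // mmap() rejects a zero length
  int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return errno;

  // Reserve disk blocks; this also extends the file to data.size().
  // posix_fallocate() returns the error number itself, not -1.
  int err = ::posix_fallocate(fd, 0, static_cast<off_t>(data.size()));
  if (err == 0)
    {
      void* base = ::mmap(nullptr, data.size(), PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
      if (base == MAP_FAILED)
        err = errno;
      else
        {
          std::memcpy(base, data.data(), data.size());
          if (::munmap(base, data.size()) != 0)
            err = errno;
        }
    }

  // Without a flush the stores may still sit in the page cache while the
  // preallocated extent on disk is marked unwritten.
  if (err == 0 && flush && ::fdatasync(fd) != 0)
    err = errno;
  if (::close(fd) != 0 && err == 0)
    err = errno;
  return err;
}
```

Calling it with `flush` set to false reproduces the state discussed in this thread: the data is readable through the page cache, but an extent map taken without a sync may still describe the range as unwritten.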


-- 
Markus

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:48                                 ` Markus Trippelsdorf
@ 2011-04-14 16:49                                   ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-14 16:49 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: Yongqiang Yang, Pádraig Brady, xfs-oss, linux-ext4, coreutils

On 4/14/11 11:48 AM, Markus Trippelsdorf wrote:
> On 2011.04.14 at 11:31 -0500, Eric Sandeen wrote:
>> On 4/14/11 11:28 AM, Markus Trippelsdorf wrote:
>>
>> <snip>
>>
>>> Yes, but we're still trying to find out what caused the zeros in the
>>> binaries that coreutils installed on my system.
>>>
>>> Now the failure only happens when I use "gold" as my linker. With GNU ld
>>> everything is OK. But I thought this must be a timing issue, because
>>> gold is faster and the binaries in coreutils-8.11/src are all fine.
>>
>> maybe xfs_bmap (or filefrag) of the binaries with both linkers would be instructive; are they laid out significantly differently?
>>
>> does gold preallocate?
> 
> Just checked and yes it does. That should explain the issue I was
> seeing.

Well, mystery solved there, at least!

Now for the fixing part :)

Thanks for checking, at least my view of the world is still intact ;)

-Eric
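
One possible direction for the userspace side, sketched under assumptions: request the extent map with FIEMAP_FLAG_SYNC so that the kernel flushes the file first, which appears to be what `filefrag -vs` does in the reports above. The function name `get_extents` and its parameters are hypothetical. The ioctl number and structures come from <linux/fs.h> and <linux/fiemap.h>. This is not the actual coreutils change.

```cpp
// Illustrative sketch only; not the coreutils implementation.
// Assumes Linux: FS_IOC_FIEMAP is defined in <linux/fs.h>, the structures
// and flags in <linux/fiemap.h>.
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <vector>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

// Hypothetical helper: fetch up to `max_extents` extents of `fd` into
// `out`.  With `sync` set, FIEMAP_FLAG_SYNC asks the kernel to flush the
// file before mapping it.  Returns 0 on success or an errno value;
// `out` is left untouched on failure.
int get_extents(int fd, bool sync, unsigned max_extents,
                std::vector<fiemap_extent>& out)
{
  assert(max_extents > 0);
  const std::size_t bytes =
    sizeof(struct fiemap) + max_extents * sizeof(struct fiemap_extent);
  struct fiemap* fm = static_cast<struct fiemap*>(std::calloc(1, bytes));
  if (fm == nullptr)
    return ENOMEM;

  fm->fm_start = 0;                    // map from the start of the file
  fm->fm_length = FIEMAP_MAX_OFFSET;   // through the end of the file
  fm->fm_flags = sync ? FIEMAP_FLAG_SYNC : 0;
  fm->fm_extent_count = max_extents;   // room available in fm_extents[]

  int err = 0;
  if (::ioctl(fd, FS_IOC_FIEMAP, fm) != 0)
    err = errno;
  else
    out.assign(fm->fm_extents, fm->fm_extents + fm->fm_mapped_extents);
  std::free(fm);
  return err;
}
```

A file with more than `max_extents` extents would need repeated calls with an advancing `fm_start`, which is omitted here. Filesystems without FIEMAP support return an error, and a copier would then fall back to a plain read/write loop.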

> bool
> Output_file::map_no_anonymous()
> {
>   const int o = this->o_;
> 
>   // If the output file is not a regular file, don't try to mmap it;
>   // instead, we'll mmap a block of memory (an anonymous buffer), and
>   // then later write the buffer to the file.
>   void* base;
>   struct stat statbuf;
>   if (o == STDOUT_FILENO || o == STDERR_FILENO
>       || ::fstat(o, &statbuf) != 0
>       || !S_ISREG(statbuf.st_mode)
>       || this->is_temporary_)
>     return false;
> 
>   // Ensure that we have disk space available for the file.  If we
>   // don't do this, it is possible that we will call munmap, close,
>   // and exit with dirty buffers still in the cache with no assigned
>   // disk blocks.  If the disk is out of space at that point, the
>   // output file will wind up incomplete, but we will have already
>   // exited.  The alternative to fallocate would be to use fdatasync,
>   // but that would be a more significant performance hit.
>   if (::posix_fallocate(o, 0, this->file_size_) < 0)
>     gold_fatal(_("%s: %s"), this->name_, strerror(errno));
> 
>   // Map the file into memory.
>   base = ::mmap(NULL, this->file_size_, PROT_READ | PROT_WRITE,
> 		MAP_SHARED, o, 0);
> 
>   // The mmap call might fail because of file system issues: the file
>   // system might not support mmap at all, or it might not support
>   // mmap with PROT_WRITE.
>   if (base == MAP_FAILED)
>     return false;
> 
>   this->map_is_anonymous_ = false;
>   this->base_ = static_cast<unsigned char*>(base);
>   return true;
> }
> 
> 


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:52                 ` Pádraig Brady
@ 2011-04-14 17:27                     ` Jim Meyering
  -1 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-14 17:27 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: linux-ext4, Eric Sandeen, coreutils, Markus Trippelsdorf, xfs-oss

Pádraig Brady wrote:

> On 14/04/11 16:50, Eric Sandeen wrote:
>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>> Hi Pádraig,
>>>>>>
>>>>>> here you go:
>>>>>> + filefrag -v unwritten.withdata
>>>>>> Filesystem type is: ef53
>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>  ext logical physical expected length flags
>>>>>>    0       0   274432            2560 unwritten,eof
>>>>>> unwritten.withdata: 1 extent found
>>>>>>
>>>>>> Please notice that this also happens with ext4 on the same kernel.
>>>>>> Btrfs is fine.
>>>>>
>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>
>>> So in summary, currently on (2.6.39-rc3), the following
>>> will (usually?) report a single unwritten extent,
>>> on both ext4 and xfs
>>>
>>>   fallocate -l 10MiB -n k
>>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>   filefrag -v k # grep for an extent without unwritten || fail
>>
>> right, that's what I see too in testing.
>>
>> But would the coreutils install have done a preallocation of the
>> destination file?
>>
>> Otherwise this looks like a different bug...
>>
>>> This particular issue has been discussed so far at:
>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>> Note that it was stated there that ext4 had this
>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>
>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember,
>> I think I started looking into it, but it's clearly still broken.
>>
>> Still, I don't know for sure what happened to Markus - did something
>> preallocate, in his case?
>
> Well that preallocate test is failing for him
> when the source file is either on ext4 or xfs.
> He noticed the issue initially on XFS when copying
> non-preallocated files, so XFS probably just has
> the general issue of needing a sync before fiemap,
> whereas EXT4 just has this preallocate one
> (though I've not seen it myself).

FYI, I see the same failure now using ext3 (but not w/ext4)
with rawhide's 2.6.39-0.rc2.git0.0.fc16.x86_64:

  + df -t ext3 .
  + require_root_
  + uid_is_privileged_
  ++ id -u
  + my_uid=0
  + case $my_uid in
  + NON_ROOT_USERNAME=nobody
  ++ id -g nobody
  + NON_ROOT_GROUP=99
  + cwd=/t/m/ext3/tmp/coreutils-8.11.1-5995ed-dirty/tests/gt-sparse-fiemap.Qhjo
  + skip=0
  + dd if=/dev/zero of=blob bs=32k count=1000
  1000+0 records in
  1000+0 records out
  32768000 bytes (33 MB) copied, 1.02932 s, 31.8 MB/s
  + mkdir mnt
  + mkfs -t ext4 -F blob
  mke2fs 1.41.14 (22-Dec-2010)
  Filesystem label=
  OS type: Linux
  Block size=1024 (log=0)
  Fragment size=1024 (log=0)
  Stride=0 blocks, Stripe width=0 blocks
  8000 inodes, 32000 blocks
  1600 blocks (5.00%) reserved for the super user
  First data block=1
  Maximum filesystem blocks=32768000
  4 block groups
  8192 blocks per group, 8192 fragments per group
  2000 inodes per group
  Superblock backups stored on blocks:
          8193, 24577

  Writing inode tables: done
  Creating journal (1024 blocks): done
  Writing superblocks and filesystem accounting information: done

  This filesystem will be automatically checked every 21 mounts or
  180 days, whichever comes first.  Use tune2fs -c or -i to override.
  + mount -oloop blob mnt
  + cd mnt
  + echo test
  + test -s f
  + test 0 = 1
  ++ seq 1 2 21
  + for i in '$(seq 1 2 21)'
  + for j in 1 2 31 100
  + perl -e 'BEGIN { $n = 1 * 1024; *F = *STDOUT }' -e 'for (1..1) { sysseek (*F, $n, 1)' -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'
  + cp --sparse=always j1 j2
  + cmp j1 j2
  + filefrag -vs j1
  + grep -F extent
  + filefrag -v j1
  + filefrag -vs j2
  + f ff1
  + perl /t/m/ext3/tmp/coreutils-8.11.1-5995ed-dirty/tests/filefrag-extent-compare
  + sed 's/ [a-z,][a-z,]*$//' ff1
  + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
  + f ff2
  + sed 's/ [a-z,][a-z,]*$//' ff2
  + awk '/^ *[0-9]/ {printf "%d %d ", $2 ,NF < 5 ? $NF : $5 } END {print ""}'
  + test 0 = 1
  + for j in 1 2 31 100
  + perl -e 'BEGIN { $n = 1 * 1024; *F = *STDOUT }' -e 'for (1..2) { sysseek (*F, $n, 1)' -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'
  + cp --sparse=always j1 j2
  + cmp j1 j2
  j1 j2 differ: char 1, line 1     <<<<================
  + fail=1


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 17:27                     ` Jim Meyering
@ 2011-04-14 19:13                       ` Pádraig Brady
  -1 siblings, 0 replies; 117+ messages in thread
From: Pádraig Brady @ 2011-04-14 19:13 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Eric Sandeen, linux-ext4, coreutils, Markus Trippelsdorf, xfs-oss

On 14/04/11 18:27, Jim Meyering wrote:
> 
> FYI, I see the same failure now using ext3 (but not w/ext4)
> with rawhide's 2.6.39-0.rc2.git0.0.fc16.x86_64:
> 
>   + df -t ext3 .

Not with ext3 ...

>   + mkfs -t ext4 -F blob

... but with ext4 loop back

>   + perl -e 'BEGIN { $n = 1 * 1024; *F = *STDOUT }' -e 'for (1..1) { sysseek (*F, $n, 1)' -e '&& syswrite (*F, chr($_)x$n) or die "$!"}'

>   + cp --sparse=always j1 j2
>   + cmp j1 j2
>   j1 j2 differ: char 1, line 1     <<<<================

But there was no preallocation done above.
So this was the original sync issue, whose fix doesn't seem to be working :(
Is there a chance the rawhide kernel hasn't included that change?
Unlikely as it's 2.6.39-rc2.

cheers,
Pádraig.

p.s. I will do some checking with ext3 to ensure everything is OK
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 17:27                     ` Jim Meyering
@ 2011-04-14 19:39                         ` Jim Meyering
  -1 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-14 19:39 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: linux-ext4-u79uwXL29TY76Z2rM5mHXA, Eric Sandeen,
	coreutils-mXXj517/zsQ, Markus Trippelsdorf, xfs-oss

Jim Meyering wrote:
...
> FYI, I see the same failure now using ext3 (but not w/ext4)
> with rawhide's 2.6.39-0.rc2.git0.0.fc16.x86_64:

Correction.
The failure below is not on ext3, but on a loopback-mounted
ext4 file system.

The test detected that the current FS was ext3 -- and it's known
that this test is not useful on ext3 -- and since I was running it
as root, it creates a loopback ext4 file system on which to run the test.

>   + df -t ext3 .
>   + require_root_
>   + uid_is_privileged_
>   ++ id -u
>   + my_uid=0

However, other strange things are happening here.
I'm investigating.


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 15:50             ` Eric Sandeen
  (?)
  (?)
@ 2011-04-14 22:59             ` Dave Chinner
  2011-04-14 23:29                 ` Pádraig Brady
  -1 siblings, 1 reply; 117+ messages in thread
From: Dave Chinner @ 2011-04-14 22:59 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: linux-ext4, Pádraig Brady, coreutils, Markus Trippelsdorf, xfs-oss

On Thu, Apr 14, 2011 at 10:50:10AM -0500, Eric Sandeen wrote:
> On 4/14/11 9:59 AM, Pádraig Brady wrote:
> > On 14/04/11 15:02, Markus Trippelsdorf wrote:
> >>>> Hi Pádraig,
> >>>>
> >>>> here you go:
> >>>> + filefrag -v unwritten.withdata
> >>>> Filesystem type is: ef53
> >>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
> >>>>  ext logical physical expected length flags
> >>>>    0       0   274432            2560 unwritten,eof
> >>>> unwritten.withdata: 1 extent found
> >>>>
> >>>> Please notice that this also happens with ext4 on the same kernel.
> >>>> Btrfs is fine.
> >>>
> >> `filefrag -vs` fixes the issue on both xfs and ext4.
> > 
> > So in summary, currently on (2.6.39-rc3), the following
> > will (usually?) report a single unwritten extent,
> > on both ext4 and xfs
> > 
> >   fallocate -l 10MiB -n k
> >   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> >   filefrag -v k # grep for an extent without unwritten || fail
> 
> right, that's what I see too in testing.
> 
> But would the coreutils install have done a preallocation of the destination file?
> 
> Otherwise this looks like a different bug...
> 
> > This particular issue has been discussed so far at:
> > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> > Note there it was stated there that ext4 had this
> > fixed as of 2.6.39-rc1, so maybe there is something lurking?
> 
> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> 
> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?

Unwritten extent mapping behaves in an unexpected way due to
buffered writeback not occurring immediately. Extent conversion
doesn't occur until the data is on disk, and for buffered IO you
need an fdatasync to ensure that has occurred.

That is: 

$ xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c "bmap -vp" /mnt/test/foo
wrote 5120/5120 bytes at offset 0
5 KiB, 2 ops; 0.0000 sec (62.600 MiB/sec and 25641.0256 ops/sec)
/mnt/test/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..20479]:      268984..289463    0 (268984..289463) 20480 10000

Data has not been written yet, so it is still unwritten. The same
test with a fsync shows:

$ sudo xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c fsync -c "bmap -vp" /mnt/test/foo
wrote 5120/5120 bytes at offset 0
5 KiB, 2 ops; 0.0000 sec (87.193 MiB/sec and 35714.2857 ops/sec)
/mnt/test/foo:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..15]:         268984..268999    0 (268984..268999)    16 00000
   1: [16..20479]:     269000..289463    0 (269000..289463) 20464 10000

Everything is fine.

So this seems like an application error to me. If you are going to
use fiemap to determine what ranges to copy, then you have to
fdatasync the source file first to guarantee that preallocated
extents have been converted to written state before mapping the
file....
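
To make the two bmap listings above concrete, here is a minimal Python sketch of how a copier that treats unwritten extents as holes would interpret them. It assumes the bmap ranges are in 512-byte units and that FLAGS 10000 marks an unwritten extent; the `Extent` tuple and `ranges_to_copy` function are hypothetical names used only for illustration and are not coreutils code.

```python
from typing import List, NamedTuple, Tuple

SECTOR = 512  # assumption: bmap FILE-OFFSET ranges are in 512-byte units


class Extent(NamedTuple):
    first: int        # first unit of the extent (inclusive)
    last: int         # last unit of the extent (inclusive)
    unwritten: bool   # assumption: FLAGS 10000 means "unwritten"


def ranges_to_copy(extents: List[Extent]) -> List[Tuple[int, int]]:
    """Return (byte offset, byte length) for every extent reported as written.

    Unwritten extents are skipped, which is exactly why a stale map that
    still shows written data as "unwritten" produces a copy full of zeros.
    """
    out = []
    for e in extents:
        if e.unwritten:
            continue
        out.append((e.first * SECTOR, (e.last - e.first + 1) * SECTOR))
    return out


# The map before fsync: one unwritten extent covering the whole file.
before_sync = [Extent(0, 20479, True)]
# The map after fsync: the first 16 units have been converted to written.
after_sync = [Extent(0, 15, False), Extent(16, 20479, True)]

print(ranges_to_copy(before_sync))  # []
print(ranges_to_copy(after_sync))   # [(0, 8192)]
```

With the unsynced map the copier finds nothing to read, so the 5120 bytes written by pwrite never reach the destination; with the synced map the first 8192 bytes are copied.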

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 22:59             ` Dave Chinner
@ 2011-04-14 23:29                 ` Pádraig Brady
  0 siblings, 0 replies; 117+ messages in thread
From: Pádraig Brady @ 2011-04-14 23:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric Sandeen, linux-ext4, coreutils, Markus Trippelsdorf, xfs-oss

On 14/04/11 23:59, Dave Chinner wrote:
> On Thu, Apr 14, 2011 at 10:50:10AM -0500, Eric Sandeen wrote:
>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>> Hi Pádraig,
>>>>>>
>>>>>> here you go:
>>>>>> + filefrag -v unwritten.withdata
>>>>>> Filesystem type is: ef53
>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>  ext logical physical expected length flags
>>>>>>    0       0   274432            2560 unwritten,eof
>>>>>> unwritten.withdata: 1 extent found
>>>>>>
>>>>>> Please notice that this also happens with ext4 on the same kernel.
>>>>>> Btrfs is fine.
>>>>>
>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>
>>> So in summary, currently on (2.6.39-rc3), the following
>>> will (usually?) report a single unwritten extent,
>>> on both ext4 and xfs
>>>
>>>   fallocate -l 10MiB -n k
>>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>   filefrag -v k # grep for an extent without unwritten || fail
>>
>> right, that's what I see too in testing.
>>
>> But would the coreutils install have done a preallocation of the destination file?
>>
>> Otherwise this looks like a different bug...
>>
>>> This particular issue has been discussed so far at:
>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>> Note there it was stated there that ext4 had this
>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>
>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>>
>> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> 
> Unwritten extent mapping behaves in an unexpected way due to
> buffered writeback not occurring immediately. Extent conversion
> doesn't occur until the data is on disk, and for buffered IO you
> need an fdatasync to ensure that has occurred.
> 
> That is: 
> 
> $ xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c "bmap -vp" /mnt/test/foo
> wrote 5120/5120 bytes at offset 0
> 5 KiB, 2 ops; 0.0000 sec (62.600 MiB/sec and 25641.0256 ops/sec)
> /mnt/test/foo:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..20479]:      268984..289463    0 (268984..289463) 20480 10000
> 
> Data has not been written yet, so it is still unwritten. The same
> test with a fsync shows:
> 
> $ sudo xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c fsync -c "bmap -vp" /mnt/test/foo
> wrote 5120/5120 bytes at offset 0
> 5 KiB, 2 ops; 0.0000 sec (87.193 MiB/sec and 35714.2857 ops/sec)
> /mnt/test/foo:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..15]:         268984..268999    0 (268984..268999)    16 00000
>    1: [16..20479]:     269000..289463    0 (269000..289463) 20464 10000
> 
> Everything is fine.
> 
> So this seems like an application error to me. If you are going to
> use fiemap to determine what ranges to copy, then you have to
> fdatasync the source file first to guarantee that preallocated
> extents have been converted to written state before mapping the
> file....

Well, IMHO there should be a difference between
knowing where you are going to write and actually writing to disk.
I.e., one shouldn't need to write the whole way to the device
before returning a valid fiemap.  If a particular file system
implementation needs to sync to return a valid fiemap,
then that sync should be implicit.

cheers,
Pádraig.

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 23:29                 ` Pádraig Brady
@ 2011-04-15  0:09                   ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-15  0:09 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Eric Sandeen, linux-ext4, coreutils, Markus Trippelsdorf, xfs-oss

On Fri, Apr 15, 2011 at 12:29:46AM +0100, Pádraig Brady wrote:
> On 14/04/11 23:59, Dave Chinner wrote:
> > On Thu, Apr 14, 2011 at 10:50:10AM -0500, Eric Sandeen wrote:
> >> On 4/14/11 9:59 AM, Pádraig Brady wrote:
> >>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
> >>>>>> Hi Pádraig,
> >>>>>>
> >>>>>> here you go:
> >>>>>> + filefrag -v unwritten.withdata
> >>>>>> Filesystem type is: ef53
> >>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
> >>>>>>  ext logical physical expected length flags
> >>>>>>    0       0   274432            2560 unwritten,eof
> >>>>>> unwritten.withdata: 1 extent found
> >>>>>>
> >>>>>> Please notice that this also happens with ext4 on the same kernel.
> >>>>>> Btrfs is fine.
> >>>>>
> >>>> `filefrag -vs` fixes the issue on both xfs and ext4.
> >>>
> >>> So in summary, currently on (2.6.39-rc3), the following
> >>> will (usually?) report a single unwritten extent,
> >>> on both ext4 and xfs
> >>>
> >>>   fallocate -l 10MiB -n k
> >>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> >>>   filefrag -v k # grep for an extent without unwritten || fail
> >>
> >> right, that's what I see too in testing.
> >>
> >> But would the coreutils install have done a preallocation of the destination file?
> >>
> >> Otherwise this looks like a different bug...
> >>
> >>> This particular issue has been discussed so far at:
> >>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> >>> Note there it was stated there that ext4 had this
> >>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
> >>
> >> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> >>
> >> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> > 
> > Unwritten extent mapping behaves in an unexpected way due to
> > buffered writeback not occurring immediately. Extent conversion
> > doesn't occur until the data is on disk, and for buffered IO you
> > need an fdatasync to ensure that has occurred.
> > 
> > That is: 
> > 
> > $ xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c "bmap -vp" /mnt/test/foo
> > wrote 5120/5120 bytes at offset 0
> > 5 KiB, 2 ops; 0.0000 sec (62.600 MiB/sec and 25641.0256 ops/sec)
> > /mnt/test/foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..20479]:      268984..289463    0 (268984..289463) 20480 10000
> > 
> > Data has not been written yet, so it is still unwritten. The same
> > test with a fsync shows:
> > 
> > $ sudo xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c fsync -c "bmap -vp" /mnt/test/foo
> > wrote 5120/5120 bytes at offset 0
> > 5 KiB, 2 ops; 0.0000 sec (87.193 MiB/sec and 35714.2857 ops/sec)
> > /mnt/test/foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..15]:         268984..268999    0 (268984..268999)    16 00000
> >    1: [16..20479]:     269000..289463    0 (269000..289463) 20464 10000
> > 
> > Everything is fine.
> > 
> > So this seems like an application error to me. If you are going to
> > use fiemap to determine what ranges to copy, then you have to
> > fdatasync the source file first to guarantee that preallocated
> > extents have been converted to written state before mapping the
> > file....
> 
> Well IMHO there should be a difference between
> knowing where you are going to write, and actually writing to disk.
> I.E. one shouldn't need to write the whole way to the device
> before returning a valid fiemap.  If a particular file system
> implementation needs to sync to return a valid fiemap,
> then it should be implicit.

No, this was explicitly laid out in the fiemap interface discussions
- it's up to the application to decide if it needs to do a sync
first. That's what the FIEMAP_FLAG_SYNC control flag is for.
This forces the fiemap call to do a fsync _before_ getting the
mapping. If you want to know what the exact layout of the file is,
then you must use this flag.
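
As a rough illustration of the flag described above, the following Python sketch builds a FIEMAP request with FIEMAP_FLAG_SYNC set and decodes the reply. The constants and struct layouts are taken from linux/fiemap.h as best understood here and should be checked against the header on the target system; `build_request`, `parse_response` and `fiemap` are hypothetical helper names. It issues a single ioctl, so it returns at most `max_extents` extents; a complete tool would loop until it sees FIEMAP_EXTENT_LAST.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # sync the file before mapping it
FIEMAP_EXTENT_LAST = 0x1          # last extent in the file
FIEMAP_EXTENT_UNWRITTEN = 0x800   # allocated, but reads back as zeros

# struct fiemap header: fm_start, fm_length, fm_flags,
# fm_mapped_extents, fm_extent_count, fm_reserved (32 bytes)
HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
# fe_reserved64[2], fe_flags, fe_reserved[3] (56 bytes)
EXTENT = struct.Struct("=QQQQQIIII")


def build_request(start, length, max_extents, sync=True):
    """Return a mutable buffer holding the header plus room for the reply."""
    flags = FIEMAP_FLAG_SYNC if sync else 0
    header = HEADER.pack(start, length, flags, 0, max_extents, 0)
    return bytearray(header + bytes(EXTENT.size * max_extents))


def parse_response(buf):
    """Decode (logical offset, length, flags) for each mapped extent."""
    mapped = HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = EXTENT.unpack_from(buf, HEADER.size + i * EXTENT.size)
        logical, _physical, length = fields[0], fields[1], fields[2]
        extents.append((logical, length, fields[5]))
    return extents


def fiemap(fd, max_extents=64):
    """Map the whole file behind fd, asking the kernel to sync it first."""
    buf = build_request(0, 0xFFFFFFFFFFFFFFFF, max_extents)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # the kernel fills buf in place
    return parse_response(buf)
```

An extent whose flags include FIEMAP_EXTENT_UNWRITTEN after a synced call can be treated as reading back zeros at the time of the call; without the sync, the same flag may still be set on ranges that hold dirty page-cache data.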

Even so, it is recognised that this is racy - any use of the block
map has a time-of-read-to-time-of-use race condition that means you
have to _verify_ the copy after it completes. FYI, that's what
xfs_fsr does when copying based on extent maps - if the inode has
changed in _any way_ during the copy, it aborts the copy of that
file.

i.e. using fiemap for copying is at best a *hint* about the regions
that need copying, and it is in no way a guarantee that you'll get
all the information you need to make an accurate copy even if you do
use the synchronous variant.
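
The verify-after-copy idea can be sketched as follows in Python. This is only an illustration of the principle, not how xfs_fsr or cp is implemented: `snapshot` and `copy_checked` are hypothetical names, the choice of stat fields is an assumption, and the `copy` argument stands in for whatever extent-based copier is being checked.

```python
import filecmp
import os
import shutil


def snapshot(path):
    """Stat fields used here as a cheap 'did the inode change?' signal."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def copy_checked(src, dst, copy=shutil.copyfile):
    """Copy src to dst, then discard the result if it cannot be trusted.

    The copy is rejected when the source inode changed while copying,
    or when a byte-for-byte comparison of the two files fails.
    """
    before = snapshot(src)
    copy(src, dst)
    changed = snapshot(src) != before
    if changed or not filecmp.cmp(src, dst, shallow=False):
        os.unlink(dst)
        raise RuntimeError(f"{src!r} changed or was miscopied; copy discarded")
```

The byte comparison reads the whole source again, which gives up most of the I/O saved by skipping holes, so a real tool would have to weigh that cost against relying on the inode check alone.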

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
@ 2011-04-15  0:09                   ` Dave Chinner
  0 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-15  0:09 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: linux-ext4, Eric Sandeen, coreutils, Markus Trippelsdorf, xfs-oss

On Fri, Apr 15, 2011 at 12:29:46AM +0100, Pádraig Brady wrote:
> On 14/04/11 23:59, Dave Chinner wrote:
> > On Thu, Apr 14, 2011 at 10:50:10AM -0500, Eric Sandeen wrote:
> >> On 4/14/11 9:59 AM, Pádraig Brady wrote:
> >>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
> >>>>>> Hi Pádraig,
> >>>>>>
> >>>>>> here you go:
> >>>>>> + filefrag -v unwritten.withdata
> >>>>>> Filesystem type is: ef53
> >>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
> >>>>>>  ext logical physical expected length flags
> >>>>>>    0       0   274432            2560 unwritten,eof
> >>>>>> unwritten.withdata: 1 extent found
> >>>>>>
> >>>>>> Please notice that this also happens with ext4 on the same kernel.
> >>>>>> Btrfs is fine.
> >>>>>
> >>>> `filefrag -vs` fixes the issue on both xfs and ext4.
> >>>
> >>> So in summary, currently on (2.6.39-rc3), the following
> >>> will (usually?) report a single unwritten extent,
> >>> on both ext4 and xfs
> >>>
> >>>   fallocate -l 10MiB -n k
> >>>   dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
> >>>   filefrag -v k # grep for an extent without unwritten || fail
> >>
> >> right, that's what I see too in testing.
> >>
> >> But would the coreutils install have done a preallocation of the destination file?
> >>
> >> Otherwise this looks like a different bug...
> >>
> >>> This particular issue has been discussed so far at:
> >>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
> >>> Note there it was stated there that ext4 had this
> >>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
> >>
> >> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
> >>
> >> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
> > 
> > Unwritten extent mapping behaves in an unexpected way due to
> > buffered writeback not occurring immediately. Extent conversion
> > doesn't occur until the data is on disk, and for buffered IO you
> > need an fdatasync to ensure that has occurred.
> > 
> > That is: 
> > 
> > $ xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c "bmap -vp" /mnt/test/foo
> > wrote 5120/5120 bytes at offset 0
> > 5 KiB, 2 ops; 0.0000 sec (62.600 MiB/sec and 25641.0256 ops/sec)
> > /mnt/test/foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..20479]:      268984..289463    0 (268984..289463) 20480 10000
> > 
> > Data has not been written yet, so it is still unwritten. The same
> > test with a fsync shows:
> > 
> > $ sudo xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c fsync -c "bmap -vp" /mnt/test/foo
> > wrote 5120/5120 bytes at offset 0
> > 5 KiB, 2 ops; 0.0000 sec (87.193 MiB/sec and 35714.2857 ops/sec)
> > /mnt/test/foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..15]:         268984..268999    0 (268984..268999)    16 00000
> >    1: [16..20479]:     269000..289463    0 (269000..289463) 20464 10000
> > 
> > Everything is fine.
> > 
> > So this seems like an application error to me. If you are going to
> > use fiemap to determine what ranges to copy, then you have to
> > fdatasync the source file first to guarantee that preallocated
> > extents have been converted to written state before mapping the
> > file....
> 
> Well IMHO there should be a difference between
> knowing where you are going to write, and actually writing to disk.
> I.E. one shouldn't need to write the whole way to the device
> before returning a valid fiemap.  If a particular file system
> implementation needs to sync to return a valid fiemap,
> then it should be implicit.

No, this was explicitly laid out in the fiemap interface discussions
- it's up to the application to decide if it needs to do a sync
first. That's what the FIEMAP_FLAG_SYNC control flag is for.
This forces the fiemap call to do a fsync _before_ getting the
mapping. If you want to know what the exact layout of the file is,
then you must use this flag.
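
To make the flag concrete, here is a minimal sketch of a FIEMAP call in
Python. The constants and struct layouts are the ones in <linux/fs.h>
and <linux/fiemap.h>; the helper names (build_request, parse_extents,
fiemap) are invented for illustration and belong to no library. Error
handling, and looping for files with more extents than the buffer
holds, are left out.

```python
import fcntl
import struct

# Values from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B           # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x0001            # sync the file before mapping it
FIEMAP_EXTENT_LAST = 0x0001          # last extent in the file
FIEMAP_EXTENT_UNWRITTEN = 0x0800     # allocated, but reads back as zeros

FIEMAP_HDR = struct.Struct("=QQIIII")     # struct fiemap, without extents
FIEMAP_EXT = struct.Struct("=QQQ2QI3I")   # struct fiemap_extent


def build_request(max_extents, sync=True):
    """Return a zeroed request buffer with room for max_extents extents."""
    flags = FIEMAP_FLAG_SYNC if sync else 0
    buf = bytearray(FIEMAP_HDR.size + max_extents * FIEMAP_EXT.size)
    # fm_start=0, fm_length=~0 (whole file), fm_flags,
    # fm_mapped_extents=0, fm_extent_count=max_extents, fm_reserved=0
    FIEMAP_HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, flags, 0,
                         max_extents, 0)
    return buf


def parse_extents(buf):
    """Decode (logical, physical, length, flags) tuples from a reply."""
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]   # fm_mapped_extents
    extents = []
    for i in range(mapped):
        f = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def fiemap(fd, max_extents=32, sync=True):
    """Map fd's extents; raises OSError where FIEMAP is unsupported."""
    buf = build_request(max_extents, sync)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)   # the kernel fills buf in place
    return parse_extents(buf)
```

An extent whose flags include FIEMAP_EXTENT_UNWRITTEN is preallocated
space. With sync=False, data that is still only in the page cache can
leave such an extent flagged unwritten, which is what the first xfs_io
run quoted above shows.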

Even so, it is recognised that this is racy - any use of the block
map has a time-of-read-to-time-of-use race condition that means you
have to _verify_ the copy after it completes. FYI, that's what
xfs_fsr does when copying based on extent maps - if the inode has
changed in _any way_ during the copy, it aborts the copy of that
file.

i.e. using fiemap for copying is at best a *hint* about the regions
that need copying, and it is in no way a guarantee that you'll get
all the information you need to make an accurate copy even if you do
use the synchronous variant.
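
The check-copy-check idea can be sketched as follows. This is Python,
not the C that xfs_fsr uses, and it only approximates "the inode has
changed" by comparing a few stat fields; snapshot and copy_verified are
hypothetical names, and the copy argument stands in for whatever
extent-based copy routine is being guarded.

```python
import os
import shutil


def snapshot(path):
    """Capture inode fields that change when a file is modified."""
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def copy_verified(src, dst, copy=shutil.copyfile):
    """Copy src to dst; return False (removing dst) if src changed meanwhile."""
    before = snapshot(src)
    copy(src, dst)
    if snapshot(src) != before:
        os.unlink(dst)   # any extent map read by `copy` may have been stale
        return False
    return True
```

This only detects modifications made between the two snapshots. Data
written before the first snapshot and not yet flushed is invisible to
it, so it does not replace syncing before the mapping is read.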

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15  0:09                   ` Dave Chinner
@ 2011-04-15  5:01                     ` Andreas Dilger
  -1 siblings, 0 replies; 117+ messages in thread
From: Andreas Dilger @ 2011-04-15  5:01 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Pádraig Brady, Eric Sandeen, linux-ext4, coreutils,
	Markus Trippelsdorf, xfs-oss

On 2011-04-14, at 6:09 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Apr 15, 2011 at 12:29:46AM +0100, Pádraig Brady wrote:
>> On 14/04/11 23:59, Dave Chinner wrote:
>>> On Thu, Apr 14, 2011 at 10:50:10AM -0500, Eric Sandeen wrote:
>>>> On 4/14/11 9:59 AM, Pádraig Brady wrote:
>>>>> On 14/04/11 15:02, Markus Trippelsdorf wrote:
>>>>>>>> Hi Pádraig,
>>>>>>>> 
>>>>>>>> here you go:
>>>>>>>> + filefrag -v unwritten.withdata
>>>>>>>> Filesystem type is: ef53
>>>>>>>> File size of unwritten.withdata is 5120 (2 blocks, blocksize 4096)
>>>>>>>> ext logical physical expected length flags
>>>>>>>>   0       0   274432            2560 unwritten,eof
>>>>>>>> unwritten.withdata: 1 extent found
>>>>>>>> 
>>>>>>>> Please notice that this also happens with ext4 on the same kernel. 
>>>>>>>> Btrfs is fine.
>>>>>>> 
>>>>>> `filefrag -vs` fixes the issue on both xfs and ext4.
>>>>> 
>>>>> So in summary, currently on (2.6.39-rc3), the following
>>>>> will (usually?) report a single unwritten extent,
>>>>> on both ext4 and xfs
>>>>> 
>>>>>  fallocate -l 10MiB -n k
>>>>>  dd count=10 if=/dev/urandom conv=notrunc iflag=fullblock of=k
>>>>>  filefrag -v k # grep for an extent without unwritten || fail
>>>> 
>>>> right, that's what I see too in testing.
>>>> 
>>>> But would the coreutils install have done a preallocation of the destination file?
>>>> 
>>>> Otherwise this looks like a different bug...
>>>> 
>>>>> This particular issue has been discussed so far at:
>>>>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=8411
>>>>> Note there it was stated there that ext4 had this
>>>>> fixed as of 2.6.39-rc1, so maybe there is something lurking?
>>>> 
>>>> ext4 got a fix, but not xfs, I guess.  My poor brain can't remember, I think I started looking into it, but it's clearly still broken.
>>>> 
>>>> Still, I don't know for sure what happened to Markus - did something preallocate, in his case?
>>> 
>>> Unwritten extent mapping behaves in an unexpected way due to
>>> buffered writeback not occurring immediately. Extent conversion
>>> doesn't occur until the data is on disk, and for buffered IO you
>>> need an fdatasync to ensure that has occurred.
>>> 
>>> That is: 
>>> 
>>> $ xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c "bmap -vp" /mnt/test/foo
>>> wrote 5120/5120 bytes at offset 0
>>> 5 KiB, 2 ops; 0.0000 sec (62.600 MiB/sec and 25641.0256 ops/sec)
>>> /mnt/test/foo:
>>> EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>>>   0: [0..20479]:      268984..289463    0 (268984..289463) 20480 10000
>>> 
>>> Data has not been written yet, so it is still unwritten. The same
>>> test with a fsync shows:
>>> 
>>> $ sudo xfs_io -f -c "resvsp 0 10m" -c "pwrite 0 5120" -c fsync -c "bmap -vp" /mnt/test/foo
>>> wrote 5120/5120 bytes at offset 0
>>> 5 KiB, 2 ops; 0.0000 sec (87.193 MiB/sec and 35714.2857 ops/sec)
>>> /mnt/test/foo:
>>> EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>>>   0: [0..15]:         268984..268999    0 (268984..268999)    16 00000
>>>   1: [16..20479]:     269000..289463    0 (269000..289463) 20464 10000
>>> 
>>> Everything is fine.
>>> 
>>> So this seems like an application error to me. If you are going to
>>> use fiemap to determine what ranges to copy, then you have to
>>> fdatasync the source file first to guarantee that preallocated
>>> extents have been converted to written state before mapping the
>>> file....
>> 
>> Well IMHO there should be a difference between
>> knowing where you are going to write, and actually writing to disk.
>> I.E. one shouldn't need to write the whole way to the device
>> before returning a valid fiemap.  If a particular file system
>> implementation needs to sync to return a valid fiemap,
>> then it should be implicit.
> 
> No, this was explicitly laid out in the fiemap interface discussions
> - it's up to the application to decide if it needs to do a sync
> first. That's what the FIEMAP_FLAG_SYNC control flag is for.
> This forces the fiemap call to do a fsync _before_ getting the
> mapping. If you want to know what the exact layout of the file is,
> then you must use this flag.
> 
> Even so, it is recognised that this is racy - any use of the block
> map has a time-of-read-to-time-of-use race condition that means you
> have to _verify_ the copy after it completes. FYI, that's what
> xfs_fsr does when copying based on extent maps - if the inode has
> changed in _any way_ during the copy, it aborts the copy of that
> file.
> 
> i.e. using fiemap for copying is at best a *hint* about the regions
> that need copying, and it is in no way a guarantee that you'll get
> all the information you need to make an accurate copy even if you do
> use the synchronous variant.

I would tend to agree with Pádraig. If there is data in the mapping (regardless of whether it is on disk or not), the FIEMAP should return this to the caller. The SYNC flag is only intended to flush the data to disk for tools that are doing direct-to-disk operations on the data. 

Otherwise the UNMAPPED flag is useless, since even with "check, copy, check" there is no guarantee that the inode is changed _during_ the copy operation. The data could have been written into the cache _before_ the FIEMAP and remain unchanged, and in your case there would be no way to know that any data was ever written to the file without a SYNC on every single file before FIEMAP.

Cheers, Andreas

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15  0:09                   ` Dave Chinner
@ 2011-04-15  8:53                     ` Jim Meyering
  -1 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-15  8:53 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric Sandeen, xfs-oss, coreutils, linux-ext4, Markus Trippelsdorf

Dave Chinner wrote:
> On Fri, Apr 15, 2011 at 12:29:46AM +0100, Pádraig Brady wrote:
...
>> Well IMHO there should be a difference between
>> knowing where you are going to write, and actually writing to disk.
>> I.E. one shouldn't need to write the whole way to the device
>> before returning a valid fiemap.  If a particular file system
>> implementation needs to sync to return a valid fiemap,
>> then it should be implicit.
>
> No, this was explicitly laid out in the fiemap interface discussions
> - it's up to the application to decide if it needs to do a sync
> first. That's what the FIEMAP_FLAG_SYNC control flag is for.
> This forces the fiemap call to do a fsync _before_ getting the
> mapping. If you want to know what the exact layout of the file is,
> then you must use this flag.
>
> Even so, it is recognised that this is racy - any use of the block
> map has a time-of-read-to-time-of-use race condition that means you
> have to _verify_ the copy after it completes. FYI, that's what
> xfs_fsr does when copying based on extent maps - if the inode has
> changed in _any way_ during the copy, it aborts the copy of that
> file.
>
> i.e. using fiemap for copying is at best a *hint* about the regions
> that need copying, and it is in no way a guarantee that you'll get
> all the information you need to make an accurate copy even if you do
> use the synchronous variant.

Hi Dave,

Can you or anyone else point to authoritative documentation
(or even a summary of those "discussions") of FIEMAP semantics?
I'm hoping the semantics are the same for all file system types.

I had understood that cp's use of FIEMAP_FLAG_SYNC was not only
unnecessary, but even undesirable, given a new-enough kernel.
That's why coreutils-8.11 resorts to using the workaround of
FIEMAP_FLAG_SYNC only when uname says the kernel is 2.6.[0..38].
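
As an illustration only, the version test described above might look
like this in Python. It is not the coreutils code, which is C;
needs_fiemap_sync is a hypothetical name, and taking the sync path when
the release string cannot be parsed is an assumption of this sketch.

```python
import os
import re


def needs_fiemap_sync(release=None):
    """True if the kernel release is in the range 2.6.0 .. 2.6.38."""
    if release is None:
        release = os.uname().release
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        return True   # unknown format: take the safe, slower path
    major, minor, patch = (int(g) if g else 0 for g in m.groups())
    return (major, minor) == (2, 6) and patch <= 38
```

For example, needs_fiemap_sync("2.6.38-gentoo") is true, while
needs_fiemap_sync("2.6.39-rc3") is false, which is the kernel on which
the unwritten-extent reports in this thread were made.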

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15  8:53                     ` Jim Meyering
@ 2011-04-15 17:16                       ` Christoph Hellwig
  -1 siblings, 0 replies; 117+ messages in thread
From: Christoph Hellwig @ 2011-04-15 17:16 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Dave Chinner, Pádraig Brady, linux-ext4, Eric Sandeen, coreutils,
	Markus Trippelsdorf, xfs-oss

On Fri, Apr 15, 2011 at 10:53:48AM +0200, Jim Meyering wrote:
> Can you or anyone else point to authoritative documentation
> (or even a summary of those "discussions") of FIEMAP semantics?

A large part of the problem is that there's none, and the semantics
are a hodge-podge of random flags that mostly make little sense
outside of fs developers debugging the layout that we created.  The
closest to a description is Documentation/filesystems/fiemap.txt
in the kernel tree, but that doesn't document most of the delicate
points.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15 17:16                       ` Christoph Hellwig
@ 2011-04-15 17:24                           ` Eric Blake
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Blake @ 2011-04-15 17:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, Dave Chinner, xfs-oss, coreutils, linux-ext4,
	Markus Trippelsdorf

On 04/15/2011 11:16 AM, Christoph Hellwig wrote:
> On Fri, Apr 15, 2011 at 10:53:48AM +0200, Jim Meyering wrote:
>> Can you or anyone else point to authoritative documentation
>> (or even a summary of those "discussions") of FIEMAP semantics?
> 
> A large part of the problem is that there's none, and the semantics
> are a hodge-podge of random flags that mostly make little sense
> outside of fs developers debugging the layout that we created.  The
> closest to a description is Documentation/filesystems/fiemap.txt
> in the kernel tree, but that doesn't document most of the delicate
> points.

Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
to define and easier to use) interface for discovering the regions of a
file that only contain NUL bytes?  FIEMAP is a great interface at
exposing everything possible, but coreutils doesn't necessarily need all
that complexity just for determining when it is safe to skip over a
portion of a file that is either unallocated or unchanged from the
initial 0 contents.
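
To show how small the calling code would be, here is a sketch of a
sparse-aware copy built on the Solaris-style interface. It is written
in Python, whose os module exposes SEEK_DATA and SEEK_HOLE on platforms
that provide them; data_segments and sparse_copy are illustrative names
only, and the sketch assumes nobody modifies the source file while it
runs.

```python
import errno
import os


def data_segments(fd):
    """Yield (start, end) byte ranges of fd that may hold non-NUL data."""
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:   # nothing but a hole up to EOF
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)   # EOF counts as a hole
        yield (start, end)
        pos = end


def sparse_copy(src_fd, dst_fd, bufsize=1 << 20):
    """Copy only the data segments, then extend dst to the source size."""
    for start, end in data_segments(src_fd):
        pos = start
        while pos < end:
            chunk = os.pread(src_fd, min(bufsize, end - pos), pos)
            if not chunk:                # source shrank underneath us
                break
            written = 0
            while written < len(chunk):  # pwrite may write less than asked
                written += os.pwrite(dst_fd, chunk[written:], pos + written)
            pos += len(chunk)
    os.ftruncate(dst_fd, os.fstat(src_fd).st_size)
```

A file system that cannot report holes may simply describe the whole
file as one data segment, in which case this degrades to an ordinary
full copy.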

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15 17:24                           ` Eric Blake
@ 2011-04-15 17:26                             ` Christoph Hellwig
  -1 siblings, 0 replies; 117+ messages in thread
From: Christoph Hellwig @ 2011-04-15 17:26 UTC (permalink / raw)
  To: Eric Blake
  Cc: Christoph Hellwig, Jim Meyering, Eric Sandeen, Dave Chinner,
	xfs-oss, coreutils, linux-ext4, Markus Trippelsdorf

On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote:
> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
> to define and easier to use) interface for discovering the regions of a
> file that only contain NUL bytes?

Yes, I've already suggested that both in this thread and on IRC.

For efficient copies it's the only usable interface.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15 17:26                             ` Christoph Hellwig
@ 2011-04-15 22:28                                 ` Andreas Dilger
  -1 siblings, 0 replies; 117+ messages in thread
From: Andreas Dilger @ 2011-04-15 22:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Eric Sandeen, xfs-oss, coreutils, linux-ext4,
	Markus Trippelsdorf

On 2011-04-15, at 11:26 AM, Christoph Hellwig wrote:
> On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote:
>> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
>> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
>> to define and easier to use) interface for discovering the regions of a
>> file that only contain NUL bytes?
> 
> Yes, I've already suggested that both in this thread and on IRC.
> 
> For efficient copies it's the only usable interface.

I suspect that these bugs would have still existed whether the interface is SEEK_HOLE/SEEK_DATA, or FIEMAP.  The main problem is that the delalloc pages were not accounted for correctly during layout traversal.

For ext4 I think it is sufficient to add another case to ext4_ext_fiemap_cb() to check the pagecache for unmapped pages when it finds an uninitialized extent on disk.  This will be very similar to the fix for finding holes in the on-disk mapping.

Cheers, Andreas

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15 22:28                                 ` Andreas Dilger
@ 2011-04-16  0:25                                   ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-16  0:25 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Christoph Hellwig, Eric Blake, Jim Meyering, Eric Sandeen,
	xfs-oss, coreutils, linux-ext4, Markus Trippelsdorf

On Fri, Apr 15, 2011 at 04:28:37PM -0600, Andreas Dilger wrote:
> On 2011-04-15, at 11:26 AM, Christoph Hellwig wrote:
> > On Fri, Apr 15, 2011 at 11:24:19AM -0600, Eric Blake wrote:
> >> Would it be worth borrowing from Solaris' semantics and adding SEEK_HOLE
> >> and SEEK_DATA to lseek(2), as a higher level (less-detailed, but easier
> >> to define and easier to use) interface for discovering the regions of a
> >> file that only contain NUL bytes?
> > 
> > Yes, I've already suggested that both in this thread and on IRC.
> > 
> > For efficient copies it's the only usable interface.
> 
> I suspect that these bugs would have still existed whether the
> interface is SEEK_HOLE/SEEK_DATA, or FIEMAP.  The main problem is
> that the delalloc pages were not accounted for correctly during
> layout traversal..

It's not delalloc that is the problem - XFS accounts for them just
fine in the extent map when asked. However, XFS does speculative
delayed allocation over regions that contain no data, so if the
core-utils folk are assuming that delalloc extents contain data and
need to be copied, they're in for a nasty surprise.

However, every example I've seen in this thread has had to do with
unwritten extents not changing state when data is written into the
page cache. i.e. people are struggling with the expected behaviour
of unwritten extents.

That is, unwritten extents remain unwritten extents until data has
been _physically_ written to them. If there is data in the page
cache over the unwritten extent, it is still an unwritten extent.
If the system crashes while in this state, then the extent _must_
remain an unwritten extent after recovery, otherwise it exposes
stale data.

Further, using FIEMAP to determine where the data is that needs
copying is extremely fragile. What happens when FIEMAP grows a
different type of extent that contains data? cp will break, because
it doesn't think it needs to copy data in extents of an unknown
type. Or it will break because it thinks it needs to copy it and
there's something in it that should not be copied.

Also, cp should not be trying to replicate the physical layout of
the file when copying it - that's for the filesystem to decide and
having userspace try to do this is a sure recipe for causing severe
filesystem fragmentation. The filesystems already do an excellent
job of optimising allocation - userspace should not be trying to
second guess what is optimal layout for the filesystem.

Fundamentally, what the core-utils guys want is FIEMAP to tell them
where data is in the file, regardless of whether it is in memory or
on disk. That is not what FIEMAP is intended for, but it matches
SEEK_HOLE/SEEK_DATA precisely.

SEEK_HOLE/SEEK_DATA have very well understood semantics and are
designed specifically for optimising access to sparse files. This
interface abstracts all the details of how different filesystems
store their data so the application doesn't need to care about it.
The API is so, so much simpler to use and understand, too. And if the
filesystem has data in cache over an unwritten extent, then by
definition it's still data to be returned by SEEK_DATA. If it fails
to return the range as such then the implementation is broken.
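
[Editorial sketch, not part of the original message.] To illustrate why
the SEEK_HOLE/SEEK_DATA interface is simpler for a sparse-aware copy,
here is a minimal C sketch under some assumptions: the helper names
sparse_copy() and copy_range() are hypothetical, the kernel and
filesystem are assumed to accept the SEEK_DATA and SEEK_HOLE whence
values, and the fallback #define values are the Linux ones. It is not
the coreutils implementation.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc: expose SEEK_DATA / SEEK_HOLE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Copy bytes [start, end) from src to dst at the same offsets. */
static int copy_range(int src, int dst, off_t start, off_t end)
{
    char buf[65536];

    while (start < end) {
        size_t want = sizeof buf;
        if ((off_t)want > end - start)
            want = (size_t)(end - start);

        ssize_t n = pread(src, buf, want, start);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* file shrank while we were copying */

        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), start + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        start += n;
    }
    return 0;
}

/* Hypothetical helper: copy only the data regions of src into dst,
 * leaving holes in dst wherever src reports a hole. Returns 0 or -1. */
int sparse_copy(int src, int dst)
{
    struct stat st;
    off_t pos = 0;

    assert(src >= 0 && dst >= 0);
    if (fstat(src, &st) < 0)
        return -1;

    while (pos < st.st_size) {
        off_t data = lseek(src, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* nothing but a hole up to end of file */
            return -1;          /* e.g. EINVAL where SEEK_DATA is unsupported */
        }
        off_t hole = lseek(src, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (copy_range(src, dst, data, hole) < 0)
            return -1;
        pos = hole;
    }
    /* Set the final length so a trailing hole is preserved. */
    return ftruncate(dst, st.st_size);
}
```

The caller never inspects extent types or flags; whether data lives in
the page cache or on disk is the filesystem's problem, which is the
point made above.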

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-15  5:01                     ` Andreas Dilger
@ 2011-04-16  0:50                       ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-16  0:50 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Pádraig Brady, Eric Sandeen, linux-ext4, coreutils,
	Markus Trippelsdorf, xfs-oss

On Thu, Apr 14, 2011 at 11:01:04PM -0600, Andreas Dilger wrote:
> On 2011-04-14, at 6:09 PM, Dave Chinner <david@fromorbit.com>
> wrote:
> > No, this was explicitly laid out in the fiemap interface
> > discussions - it's up to the application to decide if it needs
> > to do a sync first. That's what the FIEMAP_FLAG_SYNC control
> > flag is for.  This forces the fiemap call to do a fsync _before_
> > getting the mapping. If you want to know what the exact layout of the
> > file is, then you must use this flag.
> > 
> > Even so, it is recognised that this is racy - any use of the
> > block map has a time-of-read-to-time-of-use race condition that
> > means you have to _verify_ the copy after it completes. FYI,
> > that's what xfs_fsr does when copying based on extent maps - if
> > the inode has changed in _any way_ during the copy, it aborts
> > the copy of that file.
> > 
> > i.e. using fiemap for copying is at best a *hint* about the
> > regions that need copying, and it is in no way a guarantee that
> > you'll get all the information you need to make accurate copy
> > even if you do use the synchronous variant.
> 
> I would tend to agree with Pádraig. If there is data in the
> mapping (regardless of whether it is on disk or not), the FIEMAP
> should return this to the caller.  The SYNC flag is only intended
> to flush the data to disk for tools that are doing
> direct-to-disk operations on the data.

What you are suggesting is that FIEMAP needs to be page cache
coherent, and that is far, far away from the intended use of the
interface. Even considering that you need to look for active pages
in the page cache when mapping extents says to me that you are
doing something very wrong.

Unwritten extents remain unwritten until the data is physically
written to them. Therefore, to change their state, you need to sync
the data covering the range.  _Lying_ about whether an extent is in
the unwritten state is a really bad precedent to set, especially as
it is then guaranteed to change state when a crash occurs (Why did
recovery zero out my file? FIEMAP said it contained data before my
system crashed!).

Don't try to mangle the API semantics every time someone doesn't
understand how to use FIEMAP reliably. If you need the extent list
returned by FIEMAP to match what is in the page cache *regardless of

> Otherwise the UNMAPPED flag is useless, since even with "check,
> copy, check" there is no guarantee that the inode is changed
> _during_ the copy operation. It could have been written into the
> cache _before_ the FIEMAP and remain unchanged and in your case
> there would be no way to know any data was ever written to the
> file without SYNC on every single file before FIEMAP.

I can't find any UNMAPPED flag in the FIEMAP interface, so I have no
idea what you are referring to here.
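
[Editorial sketch, not part of the original message.] The "check, copy,
check" pattern quoted above, which the thread attributes to xfs_fsr,
can be approximated in userspace by comparing inode state before and
after the copy. The following C sketch is an assumption-laden
illustration: struct file_stamp, stamp_take() and stamp_changed() are
hypothetical names, and comparing size, mtime and ctime from fstat() is
a stand-in, not the check xfs_fsr actually performs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for st_mtim / st_ctim */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Snapshot of the inode fields we use to detect modification. */
struct file_stamp {
    off_t size;
    struct timespec modified;   /* st_mtim: last data change   */
    struct timespec changed;    /* st_ctim: last inode change  */
};

/* Record the current state of fd. Returns 0, or -1 if fstat() fails. */
int stamp_take(int fd, struct file_stamp *out)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    out->size = st.st_size;
    out->modified = st.st_mtim;
    out->changed = st.st_ctim;
    return 0;
}

/* Nonzero if the two snapshots differ in any recorded field. */
int stamp_changed(const struct file_stamp *a, const struct file_stamp *b)
{
    return a->size != b->size
        || a->modified.tv_sec != b->modified.tv_sec
        || a->modified.tv_nsec != b->modified.tv_nsec
        || a->changed.tv_sec != b->changed.tv_sec
        || a->changed.tv_nsec != b->changed.tv_nsec;
}
```

A copier would call stamp_take() before reading the extent map and
again after the copy, and discard the copy if stamp_changed() reports a
difference. As the quoted objection notes, this cannot detect data that
was already dirty in the page cache before the first snapshot.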

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-16  0:50                       ` Dave Chinner
@ 2011-04-16  5:11                         ` Andreas Dilger
  -1 siblings, 0 replies; 117+ messages in thread
From: Andreas Dilger @ 2011-04-16  5:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Eric Sandeen, xfs-oss, coreutils, linux-ext4,
	Markus Trippelsdorf

On 2011-04-15, at 6:50 PM, Dave Chinner <david@fromorbit.com> wrote:
> What you are suggesting is that FIEMAP needs to be page cache
> coherent, and that is far, far away from the intended use of the
> interface. Even considering that you need to look for active pages
> in the page cache when mapping extents says to me that you are
> doing something very wrong.
> 
> Unwritten extents remain unwritten until the data is physically
> written to them. Therefore, to change their state, you need to sync
> the data covering the range.

In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is fine. 
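
[Editorial sketch, not part of the original message.] For readers
unfamiliar with the ioctl, this is roughly what a FIEMAP request with
FIEMAP_FLAG_SYNC looks like from userspace. It is a minimal C sketch:
map_extents_synced() and MAX_EXTENTS are hypothetical names chosen for
illustration, only the first batch of extents is fetched, and
filesystems without FIEMAP support will make the ioctl fail (typically
with EOPNOTSUPP).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>           /* FS_IOC_FIEMAP */

#define MAX_EXTENTS 32          /* illustrative batch size, not a kernel limit */

/* Hypothetical helper: ask the filesystem to flush dirty data for `path`
 * and then report its extent map. Returns the number of extents in the
 * first batch (at most MAX_EXTENTS), or -1 with errno set. The number of
 * those extents still flagged FIEMAP_EXTENT_UNWRITTEN is stored in
 * *unwritten. */
int map_extents_synced(const char *path, unsigned *unwritten)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t sz = sizeof(struct fiemap)
              + MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, sz);
    if (fm == NULL) {
        close(fd);
        errno = ENOMEM;
        return -1;
    }

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;     /* sync the file before mapping */
    fm->fm_extent_count = MAX_EXTENTS;

    int n = -1;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0) {
        *unwritten = 0;
        for (unsigned i = 0; i < fm->fm_mapped_extents; i++) {
            if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
                (*unwritten)++;
        }
        n = (int)fm->fm_mapped_extents;
    }

    int saved = errno;                   /* keep the ioctl's errno for the caller */
    free(fm);
    close(fd);
    errno = saved;
    return n;
}
```

The cost of the flag is that dirty data for the file is written back
before the map is returned, which matters for large files that are
still delay-allocated.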

Cheers, Andreas

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-16  0:50                       ` Dave Chinner
@ 2011-04-16  6:05                         ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-16  6:05 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, Eric Sandeen, xfs-oss, coreutils, linux-ext4,
	Markus Trippelsdorf

On Sat, Apr 16, 2011 at 8:50 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Apr 14, 2011 at 11:01:04PM -0600, Andreas Dilger wrote:
>> On 2011-04-14, at 6:09 PM, Dave Chinner <david@fromorbit.com>
>> wrote:
>> > No, this was explicitly laid out in the fiemap interface
>> > discussions - it's up to the application to decide if it needs
>> > to do a sync first. That's what the FIEMAP_FLAG_SYNC control
>> > flag is for.  This forces the fiemap call to do a fsync _before_
>> > getting the mapping. If you want to know what the exact layout of the
>> > file is, then you must use this flag.
>> >
>> > Even so, it is recognised that this is racy - any use of the
>> > block map has a time-of-read-to-time-of-use race condition that
>> > means you have to _verify_ the copy after it completes. FYI,
>> > that's what xfs_fsr does when copying based on extent maps - if
>> > the inode has changed in _any way_ during the copy, it aborts
>> > the copy of that file.
>> >
>> > i.e. using fiemap for copying is at best a *hint* about the
>> > regions that need copying, and it is in no way a guarantee that
>> > you'll get all the information you need to make accurate copy
>> > even if you do use the synchronous variant.
>>
>> I would tend to agree with Pádraig. If there is data in the
>> mapping (regardless of whether it is on disk or not), the FIEMAP
>> should return this to the caller.  The SYNC flag is only intended
>> to flush the data to disk for tools that are doing
>> direct-to-disk operations on the data.
>
> What you are suggesting is that FIEMAP needs to be page cache
> coherent, and that is far, far away from the intended use of the
> interface. Even considering that you need to look for active pages
> in the page cache when mapping extents says to me that you are
> doing something very wrong.
>
> Unwritten extents remain unwritten until the data is physically
> written to them. Therefore, to change their state, you need to sync
No, buffered writes change their state without sync.

> the data covering the range.  _Lying_ about whether an extent is in
> the unwritten state is a really bad precedent to set, especially as
> it is then guaranteed to change state when a crash occurs (Why did
> recovery zero out my file? FIEMAP said it contained data before my
> system crashed!).

All filesystems have metadata in memory which is not flushed to
permanent storage, e.g. an extent that exists in memory while neither
it nor its corresponding data has been flushed to permanent storage.
So what you describe above can only be achieved by a sync before
FIEMAP.  Otherwise, if a crash occurs, FIEMAP cannot find the data
that was there before the system crashed.

Without delayed allocation, there is no difference between the
preallocation case (fallocate) and the normal case.

> Don't try to mangle the API semantics every time someone doesn't
> understand how to use FIEMAP reliably. If you need the extent list
> returned by FIEMAP to match what is in the page cache *regardless of
>
>> Otherwise the UNMAPPED flag is useless, since even with "check,
>> copy, check" there is no guarantee that the inode is changed
>> _during_ the copy operation. It could have been written into the
>> cache _before_ the FIEMAP and remain unchanged and in your case
>> there would be no way to know any data was ever written to the
>> file without SYNC on every single file before FIEMAP.
>
> I can't find any UNMAPPED flag in the FIEMAP interface, so I have no
> idea what you are referring to here.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-16  5:11                         ` Andreas Dilger
@ 2011-04-16 12:21                           ` Theodore Tso
  -1 siblings, 0 replies; 117+ messages in thread
From: Theodore Tso @ 2011-04-16 12:21 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Dave Chinner, Pádraig Brady, Eric Sandeen, linux-ext4,
	coreutils, Markus Trippelsdorf, xfs-oss

On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:

> In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is fine. 

Except that if someone is copying a large delay allocated file, it will cause
the file to be immediately snapped to disk, which might not be the greatest
thing in the world.  Christoph is right, SEEK_HOLE and SEEK_DATA are
a much better API for what cp would like to do.  Unfortunately it hasn't
been implemented yet in the VFS...
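
[Editorial sketch, not part of the original message.] Because support
for SEEK_DATA depends on the running kernel, a copier that wants to use
it needs a runtime probe and a fallback to a plain full copy. The C
sketch below shows one way to do that; have_seek_data() is a
hypothetical helper name, the fallback #define values are the Linux
ones, and treating EINVAL as "not supported" is an assumption based on
how lseek(2) reports an unrecognised whence value.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc: expose SEEK_DATA / SEEK_HOLE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* Linux values, in case the headers hide them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical probe. Returns:
 *    1  SEEK_DATA is usable on fd
 *    0  the whence value is rejected; fall back to copying every byte
 *   -1  fd is not seekable or another error occurred
 * The file offset of fd is restored before returning. */
int have_seek_data(int fd)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* fails with ESPIPE on pipes */
    if (saved < 0)
        return -1;

    int result;
    if (lseek(fd, 0, SEEK_DATA) >= 0 || errno == ENXIO)
        result = 1;     /* ENXIO: supported, but the file has no data */
    else if (errno == EINVAL)
        result = 0;     /* whence not recognised here */
    else
        result = -1;

    if (lseek(fd, saved, SEEK_SET) < 0)
        return -1;
    return result;
}
```

A return value of 0 should select the ordinary read/write loop, so that
a missing implementation costs performance rather than correctness.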

- Ted

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-16  6:05                         ` Yongqiang Yang
@ 2011-04-18  0:35                           ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-18  0:35 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Andreas Dilger, Pádraig Brady, Eric Sandeen, linux-ext4,
	coreutils, Markus Trippelsdorf, xfs-oss

On Sat, Apr 16, 2011 at 02:05:51PM +0800, Yongqiang Yang wrote:
> On Sat, Apr 16, 2011 at 8:50 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Thu, Apr 14, 2011 at 11:01:04PM -0600, Andreas Dilger wrote:
> >> On 2011-04-14, at 6:09 PM, Dave Chinner <david@fromorbit.com>
> >> wrote:
> >> > No, this was explicitly laid out in the fiemap interface
> >> > discussions - it's up to the application to decide if it needs
> >> > to do a sync first. That's what the FIEMAP_FLAG_SYNC control
> >> > flag is for.  This forces the fiemap call to do a fsync _before_
> >> > getting the mapping. If you want to know what the exact layout of the
> >> > file is, then you must use this flag.
> >> >
> >> > Even so, it is recognised that this is racy - any use of the
> >> > block map has a time-of-read-to-time-of-use race condition that
> >> > means you have to _verify_ the copy after it completes. FYI,
> >> > that's what xfs_fsr does when copying based on extent maps - if
> >> > the inode has changed in _any way_ during the copy, it aborts
> >> > the copy of that file.
> >> >
> >> > i.e. using fiemap for copying is at best a *hint* about the
> >> > regions that need copying, and it is in no way a guarantee that
> >> > you'll get all the information you need to make accurate copy
> >> > even if you do use the synchronous variant.
> >>
> >> I would tend to agree with Pádraig. If there is data in the
> >> mapping (regardless of whether it is on disk or not), the FIEMAP
> >> should return this to the caller.  The SYNC flag is only intended
> >> to flush the data to disk for tools that are doing
> >> direct-to-disk operations on the data.
> >
> > What you are suggesting is that FIEMAP needs to be page cache
> > coherent, and that is far, far away from the intended use of the
> > > interface. Even considering that you need to look for active pages
> > > in the page cache when mapping extents says to me that you are
> > doing something very wrong.
> >
> > Unwritten extents remain unwritten until the data is physically
> > written to them. Therefore, to change their state, you need to sync
> No, buffered writes change their state without sync.

They shouldn't.

> > the data covering the range.  _Lying_ about whether an extent is in
> > the unwritten state is a really bad precedent to set, especially as
> > it is then guaranteed to change state when a crash occurs (Why did
> > recovery zero out my file? FIEMAP said it contained data before my
> > system crashed!).
> 
> All filesystems have metadata in memory which is not flushed to
> permanent storage. e.g. if a extent exists in memory, but itself and
> corresponding data are not flushed to permanent storage.

Sure, but in the case of unwritten extents, XFS does not change the
metadata state in memory until *after the physical IO is completed*.
I'm pretty sure that btrfs is the same.

IOWs, despite the fact that a buffered write has occurred, no
metadata has changed state in memory, and the extents are still
unwritten in both memory and on disk....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-16 12:21                           ` Theodore Tso
@ 2011-04-18  0:40                             ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-18  0:40 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andreas Dilger, Pádraig Brady, Eric Sandeen, linux-ext4,
	coreutils, Markus Trippelsdorf, xfs-oss

On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
> 
> On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
> 
> > In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is fine. 
> 
> Except that if someone is copying a large delay allocated file, it will cause 
> the file to be immediately snapped to disk, which might not be the greatest
> thing in the world. 

Obvious workaround - if the initial fiemap call shows unwritten
extents, redo it with the sync flag set. Though that assumes that
you can trust things like delalloc extents to only cover the range
that valid data exists in. Which, of course, you can't assume,
either. :/
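
A minimal sketch of the retry described above, for illustration only; it
is not code from coreutils or from this thread.  It assumes Linux, builds
the FS_IOC_FIEMAP request by hand from the struct layout in
<linux/fiemap.h>, and maps at most max_extents extents in one call.  The
helper names fiemap() and extents_for_copy() are hypothetical.

```python
import fcntl
import struct

# Constants from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B          # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1              # sync the file before mapping
FIEMAP_EXTENT_UNWRITTEN = 0x800     # preallocated, reads back as zeros

HDR = struct.Struct("=QQIIII")      # struct fiemap header, 32 bytes
EXT = struct.Struct("=QQQ2QI3I")    # struct fiemap_extent, 56 bytes


def fiemap(fd, flags=0, max_extents=32):
    """Return [(logical, physical, length, flags), ...] for the file.

    Sketch only: a file with more than max_extents extents would need
    further calls starting after the last extent returned.
    """
    buf = bytearray(HDR.size + max_extents * EXT.size)
    # fm_start=0, fm_length=~0 (whole file), fm_flags, fm_mapped_extents,
    # fm_extent_count, fm_reserved
    HDR.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, flags, 0, max_extents, 0)
    fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)      # kernel fills buf in place
    mapped = HDR.unpack_from(buf, 0)[3]      # fm_mapped_extents
    extents = []
    for i in range(mapped):
        f = EXT.unpack_from(buf, HDR.size + i * EXT.size)
        extents.append((f[0], f[1], f[2], f[5]))
    return extents


def extents_for_copy(fd, mapper=fiemap):
    """Map without syncing; redo with FIEMAP_FLAG_SYNC only if some
    extent is reported as unwritten."""
    extents = mapper(fd, 0)
    if any(e[3] & FIEMAP_EXTENT_UNWRITTEN for e in extents):
        extents = mapper(fd, FIEMAP_FLAG_SYNC)
    return extents
```

The mapper argument exists only so the retry logic can be exercised
without a filesystem that supports FIEMAP.  The result is still a hint
and does not address the delalloc caveat.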

> Christoph is right, SEEK_HOLE and SEEK_DATA are
> a much better API for what cp would like to do.  Unfortunately it hasn't
> been implemented yet in the VFS...

Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-18  0:40                             ` Dave Chinner
@ 2011-04-18  2:45                               ` Andreas Dilger
  -1 siblings, 0 replies; 117+ messages in thread
From: Andreas Dilger @ 2011-04-18  2:45 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Theodore Tso, Eric Sandeen, xfs-oss, coreutils, linux-ext4,
	Pádraig Brady, Markus Trippelsdorf

On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
>> 
>> On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
>>> In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is fine. 
>> 
>> Except that if someone is copying a large delay allocated file, it will cause 
>> the file to be immediately snapped to disk, which might not be the greatest
>> thing in the world. 
> 
> Obvious workaround - if the initial fiemap call shows unwritten
> extents, redo it with the sync flag set. Though that assumes that
> you can trust things like delalloc extents to only cover the range
> that valid data exists in. Which, of course, you can't assume,
> either. :/

Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do anything if there is unwritten data, which is the only case we are concerned with at this point.  In any case, this is a simple solution for coreutils until such a time that a more complex solution is added in the kernel (if ever).

>> Christoph is right, SEEK_HOLE and SEEK_DATA are
>> a much better API for what cp would like to do.  Unfortunately it hasn't
>> been implemented yet in the VFS...
> 
> Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.

I don't see how this will change the problem in any meaningful way. There will still need to be code that is traversing the on-disk mapping, and also keeping it coherent with unwritten data in the page cache.

Since FIEMAP already exists for most Linux filesystems, it probably makes sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk mapping in the first place.

I agree that SEEK_{HOLE,DATA} is an easier programming interface, and probably what cp, tar, etc should use, once it is implemented. 

Cheers, Andreas

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-18  2:45                               ` Andreas Dilger
@ 2011-04-19  1:58                                 ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-19  1:58 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Dave Chinner, Theodore Tso, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
>> On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
>>> On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
>>>> In that case, it means cp should just always use FIEMAP_FLAG_SYNC,
>>>> which is fine.
>>>
>>> Except that if someone is copying a large delay allocated file, it
>>> will cause the file to be immediately snapped to disk, which might
>>> not be the greatest thing in the world.
>>
>> Obvious workaround - if the initial fiemap call shows unwritten
>> extents, redo it with the sync flag set. Though that assumes that
>> you can trust things like delalloc extents to only cover the range
>> that valid data exists in. Which, of course, you can't assume,
>> either. :/
>
> Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do
> anything if there is unwritten data, which is the only case we are concerned
> with at this point.  In any case, this is a simple solution for coreutils
> until such a time that a more complex solution is added in the kernel (if
> ever).
>
>>> Christoph is right, SEEK_HOLE and SEEK_DATA are
>>> a much better API for what cp would like to do.  Unfortunately it hasn't
>>> been implemented yet in the VFS...
>>
>> Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
>
> I don't see how this will change the problem in any meaningful way. There
> will still need to be code that is traversing the on-disk mapping, and also
> keeping it coherent with unwritten data in the page cache.

It seems that we are confusing the page cache with the disk.
The unwritten flag returned from FIEMAP indicates that blocks on disk
have not been written, but it does not say whether there is data in the
page cache.  So FIEMAP itself just tells the user the on-disk map.
However, there is an exception for delayed allocation, where FIEMAP
tells users that the data is in the page cache.

Maybe FIEMAP should return everything it knows about an unwritten
extent: if unwritten data exists in the page cache, FIEMAP should let
users know that the data is in the page cache and that space on disk has
been preallocated, but that the data has not been flushed to disk.
Delayed allocation already works like this.  User-space applications can
then decide what to do.  Taking cp as an example, it would copy from the
page cache rather than ignore it.


We need a precise definition of FIEMAP: does it tell users the map on
disk, or the map of both disk and page cache?

If the former, then FIEMAP should not consider delayed allocation.
Otherwise, FIEMAP should return everything it knows in the unwritten
case, as it does for delayed allocation.

> Since FIEMAP already exists for most Linux filesystems, it probably makes
> sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk
> mapping in the first place.
>
> I agree that SEEK_{HOLE,DATA} is an easier programming interface, and
> probably what cp, tar, etc should use, once it is implemented.
>
> Cheers, Andreas

-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  1:58                                 ` Yongqiang Yang
@ 2011-04-19  2:59                                     ` Ted Ts'o
  -1 siblings, 0 replies; 117+ messages in thread
From: Ted Ts'o @ 2011-04-19  2:59 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Andreas Dilger, Dave Chinner, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should
> > only do anything if there is unwritten data, which is the only
> > case we are concerned with at this point.  In any case, this is a
> > simple solution for coreutils until such a time that a more
> > complex solution is added in the kernel (if ever).

I would recommend that coreutils check i_blocks and i_size and only
try using fiemap (with FIEMAP_FLAG_SYNC) if the file appears to be
sparse.  That's because FIEMAP_FLAG_SYNC will effectively do the
equivalent of an fsync() system call.  Otherwise, in the case of a
freshly untar'ed directory hierarchy which is then copied using "cp
-r", cp would end up calling fsync() for each file in the directory,
with the disastrous performance result that one might expect.

If cp only tries the fiemap optimization on files that appear to be
sparse, it should avoid this problem.
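
As an illustration of that check (a sketch under assumptions, not the
coreutils implementation): from user space, i_blocks and i_size are
visible as st_blocks and st_size, and on Linux st_blocks is counted in
512-byte units regardless of the filesystem block size.  The function
names are hypothetical.

```python
import os


def looks_sparse(st):
    """True if fewer bytes are allocated than the file's length.

    st is an os.stat_result, or any object with st_size and st_blocks.
    """
    return st.st_blocks * 512 < st.st_size


def should_try_fiemap(path):
    """Only consider the extent-map path for files that appear sparse."""
    return looks_sparse(os.stat(path))
```

This is only a heuristic: the block count a filesystem reports for a file
that has just been written can vary, so treat the answer as a hint for
choosing a copy strategy rather than as a fact about the file.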

> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
> >
> > I don't see how this will change the problem in any meaningful way. There
> > will still need to be code that is traversing the on-disk mapping, and also
> > keeping it coherent with unwritten data in the page cache.

The advantage of SEEK_HOLE/SEEK_DATA is that we don't need to force an
fsync() of the data.
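
For illustration, a minimal sketch of how a copy tool might walk a file
with that interface once it is available.  It assumes a platform where
Python exposes os.SEEK_DATA and os.SEEK_HOLE; the helper name
data_segments() is hypothetical, and on a filesystem with no hole
support the whole file simply comes back as one data segment.

```python
import errno
import os


def data_segments(fd):
    """Yield (start, end) byte ranges that lseek() reports as data.

    Note that this moves the file offset of fd.
    """
    size = os.fstat(fd).st_size
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:   # nothing but a hole up to EOF
                return
            raise
        # End of file counts as a hole, so this always finds an end.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        pos = end
```

A caller would read and write only the yielded ranges and seek over the
gaps in the destination, with no explicit sync and no extent flags to
interpret.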

> It seems that we are being messed up by page cache and disk.
> Unwritten flag returned from FIEMAP indicates blocks on disk are not
> written, but it does not say if there is data in page cache.  So
> FIEMAP itself just tells user the map on disk.  However there is an
> exception for delayed allocation,  FIEMAP tells users the data is in
> page cache.
> 
> Maybe FIEMAP should return all known messages for unwritten extent, if
> unwritten data exists in page cache, FIEMAP should let users know that
> data is in page cache and space on disk has been preallocated, but
> data has not been flushed into disk.  Actually, delayed allocation has
> done like this. Then user-space applications can determine how to do.
> Taking cp as an example, it will copy from page cache rather ignore
> it.
>
> We need a definite definition for FIEMAP, in other words, it tells
> users map on disk or both disk and page cache.
> 
> If the former one is taken, then FIEMAP should not consider delayed
> allocation.  otherwise, FIEMAP should return all known messages for
> unwritten case like delayed allocation.

The fact that the FIEMAP interface definition includes a delayed
allocation bit could be a strong indication that, unlike XFS's bmap
interface, this interface is supposed to return information taking
into account both on-disk and page cache state.  If this is the case,
then even though there might be a single on-disk (uninitialized)
extent, if there are pages in the page cache that have not yet been
written out, but which are described by that on-disk extent, then
instead of returning a single struct fiemap_extent for that on-disk
extent, the fiemap ioctl would need to return multiple struct
fiemap_extents, where some would have the FIEMAP_EXTENT_UNWRITTEN bit,
and others would not (since data has been written to the page cache,
even if it hasn't been flushed to disk yet).
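
To make those semantics concrete, here is a purely hypothetical
user-space model of the splitting (it is not kernel code and does not
describe what any filesystem does today).  The function name
split_by_dirty() and the tuple layout are assumptions made for the
example.

```python
FIEMAP_EXTENT_UNWRITTEN = 0x800   # value from <linux/fiemap.h>


def split_by_dirty(start, length, dirty):
    """Split one unwritten extent by the ranges cached in memory.

    start, length: the on-disk unwritten extent, in bytes.
    dirty: sorted, non-overlapping (offset, length) ranges inside the
           extent that hold data in the page cache.
    Returns [(offset, length, flags), ...] as a page-cache-coherent
    FIEMAP would report them.
    """
    out = []
    pos = start
    end = start + length
    for d_off, d_len in dirty:
        if d_off > pos:
            # Gap before the cached data: still unwritten.
            out.append((pos, d_off - pos, FIEMAP_EXTENT_UNWRITTEN))
        out.append((d_off, d_len, 0))   # cached data: reported as written
        pos = d_off + d_len
    if pos < end:
        out.append((pos, end - pos, FIEMAP_EXTENT_UNWRITTEN))
    return out
```

For a 1 MiB preallocated extent whose first 512 KiB have been written
through the page cache, this yields one written piece followed by one
unwritten piece, instead of a single unwritten extent.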

But yes, if we're going to make the case that the FIEMAP interface is
only intended to reflect the on-disk information, then the DELALLOC
bit shouldn't be returned at all, and we should deprecate it.
Anything else leads us to an inconsistent interface.

> > Since FIEMAP already exists for most Linux filesystems, it probably makes
> > sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk
> > mapping in the first place.

Not if it means forcing an FIEMAP_FLAG_SYNC, which implies an fsync().
If the only way to get consistent data across ext4, btrfs, xfs,
etc. is to force userspace to issue a FIEMAP_FLAG_SYNC, then we need
to have a separate interface of SEEK_HOLE/SEEK_DATA that doesn't
require flushing data to the disk first.

Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
it's the only way to guarantee correct behaviour for XFS.  But I would
really rather that not be the long-term way we leave things!

						- Ted

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  2:59                                     ` Ted Ts'o
@ 2011-04-19  3:05                                         ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-19  3:05 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Andreas Dilger, Dave Chinner, Yongqiang Yang, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On 4/18/11 9:59 PM, Ted Ts'o wrote:

...

> Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
> it's the only way to guarantee correct behaviour for XFS.  But I would
> really rather that not be the long-term way we leave things!

XFS ... or ext4:

# xfs_io -Ff -c "falloc 0 1m" -c "pwrite 0 512k" testfile; /root/fiemap-test testfile
wrote 524288/524288 bytes at offset 0
512 KiB, 128 ops; 0.0000 sec (161.342 MiB/sec and 41303.6463 ops/sec)
start 0 length -1 flags 0x0 count 32
ext:   0 logical: [       0..     255] phys:    34048..   34303 flags: 0x801 tot: 256

# uname -r
2.6.39-0.rc3.git2.0.fc16.x86_64

Above is on ext4.  It behaves exactly like XFS in my testing; data in the page cache does not cause fiemap to return anything other than "unwritten" for preallocated extents.
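
For reference, a small sketch that decodes the flags value printed
above.  The bit values are the FIEMAP_EXTENT_* constants from
<linux/fiemap.h>; the helper name decode_extent_flags() is hypothetical
and is not part of the fiemap-test tool used above.

```python
# FIEMAP_EXTENT_* bits from <linux/fiemap.h>, prefix dropped.
FIEMAP_EXTENT_FLAGS = {
    0x0001: "LAST",
    0x0002: "UNKNOWN",
    0x0004: "DELALLOC",
    0x0008: "ENCODED",
    0x0080: "DATA_ENCRYPTED",
    0x0100: "NOT_ALIGNED",
    0x0200: "DATA_INLINE",
    0x0400: "DATA_TAIL",
    0x0800: "UNWRITTEN",
    0x1000: "MERGED",
}


def decode_extent_flags(flags):
    """Return the names of the bits set in fe_flags, lowest bit first."""
    return [name for bit, name in FIEMAP_EXTENT_FLAGS.items() if flags & bit]


print(decode_extent_flags(0x801))   # ['LAST', 'UNWRITTEN']
```

So 0x801 is a single, final extent marked unwritten.  Assuming the
ranges are printed in 4 KiB blocks, it covers the whole 1 MiB that was
preallocated, including the 512 KiB just written.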

-Eric

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  2:59                                     ` Ted Ts'o
@ 2011-04-19  3:30                                       ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-19  3:30 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Andreas Dilger, Dave Chinner, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 10:59 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
>> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should
>> > only do anything if there is unwritten data, which is the only
>> > case we are concerned with at this point.  In any case, this is a
>> > simple solution for coreutils until such a time that a more
>> > complex solution is added in the kernel (if ever).
>
> I would recommend that coreutils check i_blocks and i_size and only
> try using fiemap (with FIEMAP_FLAG_SYNC) if the file appears to be
> sparse.  That's because FIEMAP_FLAG_SYNC will do the effective
> equivalent of an fsync() system call.  Otherwise, in the case of a
> freshly untar'ed directory hierarchy which is then copied using "cp
> -r", cp would end up calling fsync() for each file in the directory,
> with the disastrous performance result that one might expect.
>
> If cp only tries the fiemap optimization on files that appear to be
> sparse, it should avoid this problem.
>
>> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
>> >
>> > I don't see how this will change the problem in any meaningful way. There
>> > will still need to be code that is traversing the on-disk mapping, and also
>> > keeping it coherent with unwritten data in the page cache.
>
> The advantage of SEEK_HOLE/SEEK_DATA is that we don't need to force an
> fsync() of the data.
>
>> It seems that we are being messed up by page cache and disk.
>> Unwritten flag returned from FIEMAP indicates blocks on disk are not
>> written, but it does not say if there is data in page cache.  So
>> FIEMAP itself just tells user the map on disk.  However there is an
>> exception for delayed allocation,  FIEMAP tells users the data is in
>> page cache.
>>
>> Maybe FIEMAP should return all known information for an unwritten
>> extent: if unwritten data exists in the page cache, FIEMAP should let
>> users know that data is in the page cache and space on disk has been
>> preallocated, but the data has not been flushed to disk.  Actually,
>> delayed allocation already works like this. Then user-space
>> applications can decide what to do.  Taking cp as an example, it would
>> copy from the page cache rather than ignore it.
>>
>> We need a definite definition for FIEMAP, in other words, it tells
>> users map on disk or both disk and page cache.
>>
>> If the former one is taken, then FIEMAP should not consider delayed
>> allocation.  otherwise, FIEMAP should return all known messages for
>> unwritten case like delayed allocation.
>
> The fact that the FIEMAP interface definition includes a delayed
> allocation bit could be a strong indication that unlike XFS's bmap
> interface, this interface is supposed to return information
> taking into account both on-disk and page cache information.  If this
> is the case, then even though there might be a single on-disk
> (uninitialized) extent, if there are pages in the page cache that have
> not been written out yet, but which are described by that on-disk
> extent, then instead of returning a single struct fiemap_extent for
> that on-disk extent, the fiemap ioctl would need to return multiple
> struct fiemap_extents, where some would have the FIEMAP_UNWRITTEN bit,
> and others would not (since data has been written to the page cache,
> even if it hasn't been flushed to disk yet).

Maybe we can add a SPLIT flag, like MERGE for ext3, which is set if
there are pages in the page cache that have not been written out but
are described by an unwritten extent on disk, and which do not cover
the whole extent.

Thus, an extent returned by FIEMAP may have the UNWRITTEN, NOBYPASS and SPLIT flags.

I noticed that there was a NOBYPASS flag in the initial FIEMAP, which
indicated that data had not been written out to disk.  But it no longer
exists in the current implementation.

>
> But yes, if we're going to make the case that the FIEMAP interface is
> only intended to reflect the on-disk information, then the DELALLOC
> bit shouldn't be returned at all, and we should deprecate it.
> Anything else leads us to an inconsistent interface.
>
>> > Since FIEMAP already exists for most Linux filesystems, it probably makes
>> > sense to implement SEEK_{HOLE,DATA} by calling FIEMAP to get the disk
>> > mapping in the first place.
>
> Not if it means forcing an FIEMAP_FLAG_SYNC, which implies an fsync().
> If the only way to get consistent data across ext4, btrfs, xfs,
> etc. is to force userspace to issue a FIEMAP_FLAG_SYNC, then we need
> to have a separate interface of SEEK_HOLE/SEEK_DATA that doesn't
> require flushing data to the disk first.
>
> Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
> it's the only way to guarantee correct behaviour for XFS.  But I would
> really rather that not be the long-term way we leave things!
>
>                                                - Ted
>



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  1:58                                 ` Yongqiang Yang
@ 2011-04-19  3:44                                   ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-19  3:44 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Andreas Dilger, Theodore Tso, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> > On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
> >
> > On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
> >
> > In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is
> > fine.
> >
> > Except that if someone is copying a large delay allocated file, it will
> > cause the file to be immediately snapped to disk, which might not be the
> > greatest thing in the world.
> >
> > Obvious workaround - if the initial fiemap call shows unwritten
> > extents, redo it with the sync flag set. Though that assumes that
> > you can trust things like delalloc extents to only cover the range
> > that valid data exists in. Which, of course, you can't assume,
> > either. :/
> >
> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do
> > anything if there is unwritten data, which is the only case we are concerned
> > with at this point.  In any case, this is a simple solution for coreutils
> > until such a time that a more complex solution is added in the kernel (if
> > ever).
> >
> > Christoph is right, SEEK_HOLE and SEEK_DATA are a much better API for
> > what cp would like to do.  Unfortunately it hasn't been implemented yet
> > in the VFS...
> >
> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
> >
> > I don't see how this will change the problem in any meaningful way. There
> > will still need to be code that is traversing the on-disk mapping, and also
> > keeping it coherent with unwritten data in the page cache.
> 
> It seems that we are being messed up by page cache and disk.
> Unwritten flag returned from FIEMAP indicates blocks on disk are not
> written, but it does not say if there is data in page cache.  So
> FIEMAP itself just tells user the map on disk.  However there is an
> exception for delayed allocation,  FIEMAP tells users the data is in
> page cache.

No, FIEMAP does not tell the user there is data in the page cache.
It tells the user there is a delayed allocation extent. For XFS, a
delayed allocation extent can cover a range _greater_ than there is
data in the page cache - we do allocation alignment, speculative
allocation and other tricks to avoid fragmentation via
delayed allocation. When XFS says there is a delalloc extent, it is
simply showing the in-memory representation of the extent. If you
want to know where the data in the page cache actually is, you need
to sync the file to disk to get those ranges converted to real
extents. This is how xfs_bmap has worked for 15 years....
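
The "redo it with the sync flag set" workaround quoted earlier in this
message can be sketched as below. This is an illustration under stated
assumptions, not the coreutils implementation: the helper names
any_extent_has and map_file_extents are hypothetical, the caller supplies
a buffer with room for `count` extents, and files with more extents than
that would need a loop that this sketch leaves out.

```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: true if any returned extent carries one of `flags`. */
static bool any_extent_has(const struct fiemap *fm, unsigned int flags)
{
    assert(fm != NULL);
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        if (fm->fm_extents[i].fe_flags & flags)
            return true;
    return false;
}

/* Map the whole file into `fm`, which must have room for `count` extents.
 * The first pass does not sync. If it reports delalloc or unwritten
 * extents, a second pass sets FIEMAP_FLAG_SYNC so that dirty data is
 * written back before the extents are reported.
 * Returns 0 on success, -1 if the ioctl fails (errno is set). */
static int map_file_extents(int fd, struct fiemap *fm, unsigned int count)
{
    for (int pass = 0; pass < 2; pass++) {
        memset(fm, 0, sizeof(*fm) + count * sizeof(struct fiemap_extent));
        fm->fm_start = 0;
        fm->fm_length = FIEMAP_MAX_OFFSET;
        fm->fm_flags = pass ? FIEMAP_FLAG_SYNC : 0;
        fm->fm_extent_count = count;
        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
            return -1;
        if (!any_extent_has(fm, FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNWRITTEN))
            break;
    }
    return 0;
}
```

The buffer would be allocated as sizeof(struct fiemap) plus count times
sizeof(struct fiemap_extent). Extents still flagged unwritten after the
synced pass are preallocated space with no data behind them, provided
nothing else is writing to the file at the same time.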

> Maybe FIEMAP should return all known information for an unwritten
> extent: if unwritten data exists in the page cache, FIEMAP should let
> users know that data is in the page cache and space on disk has been
> preallocated, but the data has not been flushed to disk.  Actually,
> delayed allocation already works like this. Then user-space
> applications can decide what to do.  Taking cp as an example, it would
> copy from the page cache rather than ignore it.

Once again, FIEMAP is for showing the filesystem's current extent
state, not the page cache state. Ext4 may implement FIEMAP by doing
page cache walks, but that is a filesystem specific implementation
detail.

> We need a definite definition for FIEMAP, in other words, it tells
> users map on disk or both disk and page cache.

We already have a definition - and it has nothing to do with the
page cache state.

> If the former one is taken, then FIEMAP should not consider
> delayed allocation.

Not at all. The delayed allocation extent is a first class extent
type in XFS and it is reported directly from the extent list. Your
viewpoint is very ext4-specific and ignores the fact that other
filesystems were doing this sort of mapping long before even ext3
(let alone ext4) was a glint in the designer's eye....

> otherwise, FIEMAP should return all known messages for unwritten case
> like delayed allocation.

See my previous comments about extents being unwritten until data is
physically written to them.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  2:59                                     ` Ted Ts'o
@ 2011-04-19  4:14                                       ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-19  4:14 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Yongqiang Yang, Andreas Dilger, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Mon, Apr 18, 2011 at 10:59:49PM -0400, Ted Ts'o wrote:
> On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
> > On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> > > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should
> > > only do anything if there is unwritten data, which is the only
> > > case we are concerned with at this point.  In any case, this is a
> > > simple solution for coreutils until such a time that a more
> > > complex solution is added in the kernel (if ever).
> 
> I would recommend that coreutils check i_blocks and i_size and only
> try using fiemap (with FIEMAP_FLAG_SYNC) if the file appears to be
> sparse.  That's because FIEMAP_FLAG_SYNC will do the effective
> equivalent of an fsync() system call.  Otherwise, in the case of a
> freshly untar'ed directory hierarchy which is then copied using "cp
> -r", cp would end up calling fsync() for each file in the directory,
> with the disastrous performance result that one might expect.
> 
> If cp only tries the fiemap optimization on files that appear to be
> sparse, it should avoid this problem.
> 
> > > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
> > >
> > > I don't see how this will change the problem in any meaningful way. There
> > > will still need to be code that is traversing the on-disk mapping, and also
> > > keeping it coherent with unwritten data in the page cache.
> 
> The advantage of SEEK_HOLE/SEEK_DATA is that we don't need to force an
> fsync() of the data.
> 
> > It seems that we are being messed up by page cache and disk.
> > Unwritten flag returned from FIEMAP indicates blocks on disk are not
> > written, but it does not say if there is data in page cache.  So
> > FIEMAP itself just tells user the map on disk.  However there is an
> > exception for delayed allocation,  FIEMAP tells users the data is in
> > page cache.
> > 
> > Maybe FIEMAP should return all known information for an unwritten
> > extent: if unwritten data exists in the page cache, FIEMAP should let
> > users know that data is in the page cache and space on disk has been
> > preallocated, but the data has not been flushed to disk.  Actually,
> > delayed allocation already works like this. Then user-space
> > applications can decide what to do.  Taking cp as an example, it would
> > copy from the page cache rather than ignore it.
> >
> > We need a definite definition for FIEMAP, in other words, it tells
> > users map on disk or both disk and page cache.
> > 
> > If the former one is taken, then FIEMAP should not consider delayed
> > allocation.  otherwise, FIEMAP should return all known messages for
> > unwritten case like delayed allocation.
> 
> The fact that the FIEMAP interface definition includes a delayed
> allocation bit could be a strong indication that unlike XFS's bmap
> interface, this interface is supposed to return information
> taking into account both on-disk and page cache information.

As I said in a previous email, XFS uses delalloc as a first class
extent and reporting them does not require looking at the page
cache. Therefore whatever historical behaviour xfs_bmap used is
irrelevant - supporting delalloc extents was a 2 or 3 line change
and in no way was intended to report anything other than the current
extents. Even at that time, "dirty page cache ranges" != delalloc
extents, and this appears to be the way ext4 has _implemented_
reporting of delalloc extents. 

Indeed, I was the one that suggested it be supported because it is
useful to know the delalloc state _for debugging purposes_.  Now you
are trying to redefine what a delalloc extent is to match the ext4
implementation, and then extend that same reasoning to change what
an unwritten extent means to match how _you think_ the ext4
implementation works(*).

And besides, if I use your same logical progression you've
applied to FIEMAP via the ext4 delalloc extent implementation, using
the XFS delalloc extent implementation in no way implies page cache
coherency for FIEMAP. :)

FIEMAP is for reporting extent state. What that means is filesystem
specific, and requires knowledge of the filesystem to use
effectively. If you want to report coherent state for working out
what ranges to copy, implement SEEK_HOLE/SEEK_DATA (which would use
much of the FIEMAP infrastructure). Redefining the FIEMAP API will
not solve the problem of different filesystems behaving in a manner
that is not useful for coreutils....
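
For illustration, a copy loop built on SEEK_HOLE/SEEK_DATA might look
like the sketch below. It assumes a kernel and libc that support these
lseek() whence values (in Linux they arrived in 3.1, after this thread);
the function names copy_data_ranges and copy_sparse are hypothetical,
and a real tool would fall back to a plain read/write copy when lseek()
fails with EINVAL.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA and SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA               /* Linux values, if the headers hid them */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Copy only the data ranges of `in` to `out`, leaving holes in `out`.
 * `size` is the source file size. Returns 0 on success, -1 on error. */
static int copy_data_ranges(int in, int out, off_t size)
{
    char buf[65536];
    off_t pos = 0;

    while (pos < size) {
        off_t data = lseek(in, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* nothing but a hole up to EOF */
            return -1;
        }
        off_t hole = lseek(in, data, SEEK_HOLE);
        if (hole <= data)
            return -1;          /* error, or no forward progress */
        for (pos = data; pos < hole; ) {
            size_t want = sizeof(buf);
            if ((off_t)want > hole - pos)
                want = (size_t)(hole - pos);
            ssize_t n = pread(in, buf, want, pos);
            if (n <= 0)
                return -1;      /* read error or file shrank underneath us */
            if (pwrite(out, buf, (size_t)n, pos) != n)
                return -1;
            pos += n;
        }
    }
    return ftruncate(out, size);    /* gives a trailing hole its length */
}

/* Convenience wrapper: open both paths and copy `src` to `dst`. */
static int copy_sparse(const char *src, const char *dst)
{
    struct stat st;
    int rc = -1;

    assert(src != NULL && dst != NULL);
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out >= 0) {
        if (fstat(in, &st) == 0)
            rc = copy_data_ranges(in, out, st.st_size);
        if (close(out) < 0)
            rc = -1;
    }
    close(in);
    return rc;
}
```

Unlike the FIEMAP approach, nothing here asks for the source file to be
flushed to disk first; whether dirty page cache data is reported as data
is left to the filesystem's lseek() implementation.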

> Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
> it's the only way to guarantee correct behaviour for XFS.  But I would
> really rather that not be the long-term way we leave things!

(*) It's not XFS specific - ext4 behaves exactly the same way (as
Eric kindly pointed out). IOWs, it's likely that all filesystems
need the SYNC flag for one reason or another and that indicates to
me that FIEMAP is simply not the right interface for coreutils to be
using for their intended purpose.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  2:59                                     ` Ted Ts'o
@ 2011-04-19  5:27                                       ` Christoph Hellwig
  -1 siblings, 0 replies; 117+ messages in thread
From: Christoph Hellwig @ 2011-04-19  5:27 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Yongqiang Yang, Andreas Dilger, Dave Chinner, Eric Sandeen,
	xfs-oss, coreutils, linux-ext4, Pádraig Brady,
	Markus Trippelsdorf

On Mon, Apr 18, 2011 at 10:59:49PM -0400, Ted Ts'o wrote:
> Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
> it's the only way to guarantee correct behaviour for XFS.  But I would
> really rather that be the long-term way we leave things!

As Eric pointed out, both ext4 and XFS have the same behaviour when
writing into unwritten extents.  I think you are a bit confused because
ext4 also got basic handling of delalloc extents wrong before commit
6d9c85eb700bd3ac59e63bb9de463dea1aca084c, which never was a problem with
XFS.  It would be nice if ext4 developers had sent the included
regression test for xfs so that everyone could verify this behaviour,
btw.

To report written-to but not yet synced unwritten extents properly we'd
need to move fiemap away from the on-disk state reporting done so far
and do something that is purely in-memory.  It would be doable by
walking the pagecache and checking for the buffer unwritten flag
in a loop over the pages, but I'm honestly not sure it's going to
help much.  In fact, given that unwritten extents were specifically
allocated beforehand, it doesn't seem like an overly smart idea to skip
them in a copy - yes, it will save space, but it also undoes the
previous explicit preallocation.  If people want that, they should rather
add a new option to cp to turn zeroes into holes.
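
A rough sketch of what such an option could do, assuming a fixed
4096-byte block size and two file descriptors positioned at offset 0
with an empty destination. copy_making_holes() and is_all_zero() are
hypothetical names for illustration, not existing cp code, and EINTR
handling is left out for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

enum { BLOCK = 4096 };          /* assumed block size, for illustration */

/* True if the first `len` bytes of `buf` are all zero. */
bool is_all_zero(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Hypothetical helper: copy src to dst, seeking over all-zero blocks so
   that the destination gets holes there.  Returns 0 on success, -1 on
   error. */
int copy_making_holes(int src, int dst)
{
    assert(src >= 0 && dst >= 0);

    char buf[BLOCK];
    off_t total = 0;
    ssize_t n;

    while ((n = read(src, buf, sizeof buf)) > 0) {
        if (is_all_zero(buf, (size_t)n)) {
            /* Skip the write; the gap becomes a hole in dst. */
            if (lseek(dst, n, SEEK_CUR) < 0)
                return -1;
        } else {
            ssize_t off = 0;
            while (off < n) {   /* handle short writes */
                ssize_t w = write(dst, buf + off, (size_t)(n - off));
                if (w < 0) {
                    perror("write");
                    return -1;
                }
                off += w;
            }
        }
        total += n;
    }
    if (n < 0)
        return -1;

    /* A trailing hole is not materialised by lseek alone, so set the
       final size explicitly. */
    return ftruncate(dst, total);
}
```

This looks only at file contents, so it does not depend on how any
filesystem reports unwritten or delalloc extents. The cost is reading
every block, and it discards any preallocation the source had.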

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  3:44                                   ` Dave Chinner
@ 2011-04-19  6:53                                     ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-19  6:53 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, Theodore Tso, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 11:44 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
>> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> > On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >
>> > On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
>> >
>> > On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
>> >
>> > In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is
>> > fine.
>> >
>> > Except that if someone is copying a large delay allocated file, it will
>> > cause the file to be immediately snapped to disk, which might not be
>> > the greatest thing in the world.
>> >
>> > Obvious workaround - if the initial fiemap call shows unwritten
>> > extents, redo it with the sync flag set. Though that assumes that
>> > you can trust things like delalloc extents to only cover the range
>> > that valid data exists in. Which, of course, you can't assume,
>> > either. :/
>> >
>> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do
>> > anything if there is unwritten data, which is the only case we are concerned
>> > with at this point.  In any case, this is a simple solution for coreutils
>> > until such a time that a more complex solution is added in the kernel (if
>> > ever).
>> >
>> > Christoph is right, SEEK_HOLE and SEEK_DATA are a much better API for
>> > what cp would like to do.  Unfortunately it hasn't been implemented
>> > yet in the VFS...
>> >
>> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
>> >
>> > I don't see how this will change the problem in any meaningful way. There
>> > will still need to be code that is traversing the on-disk mapping, and also
>> > keeping it coherent with unwritten data in the page cache.
>>
>> It seems that we are being messed up by page cache and disk.
>> Unwritten flag returned from FIEMAP indicates blocks on disk are not
>> written, but it does not say if there is data in page cache.  So
>> FIEMAP itself just tells user the map on disk.  However there is an
>> exception for delayed allocation,  FIEMAP tells users the data is in
>> page cache.
>
> No, FIEMAP does not tell the user there is data in the page cache.
> It tells the user there is a delayed allocation extent. For XFS, a
> delayed allocation extent can cover a range _greater_ than there is
> data in the page cache - we do allocation alignment, speculative
> allocation and other tricks to avoid fragmentation via
> delayed allocation. When XFS says there is a delalloc extent, it is
> simply showing the in-memory representation of the extent. If you
> want to know where the data in the page cache actually is, you need
> to sync the file to disk to get those ranges converted to real
> extents. This is how xfs_bmap has worked for 15 years....
>
>> Maybe FIEMAP should return all known messages for unwritten extent, if
>> unwritten data exists in page cache, FIEMAP should let users know that
>> data is in page cache and space on disk has been preallocated, but
>> data has not been flushed into disk.  Actually, delayed allocation has
>> done like this. Then user-space applications can determine how to do.
>> Taking cp as an example, it will copy from page cache rather ignore
>> it.
>
> Once again, FIEMAP is for showing the filesystem's current extent
> state, not the page cache state. Ext4 may implement FIEMAP by doing
> page cache walks, but that is a filesystem specific implementation
> detail.
>
>> We need a definite definition for FIEMAP, in other words, it tells
>> users map on disk or both disk and page cache.
>
> We already have a definition - and it has nothing to do with the
> page cache state.
>
>> If the former one is taken, then FIEMAP should not consider
>> delayed allocation.
>
> Not at all. the delayed allocation extent is a first class extent
> type in XFS and it is reported directly from the extent list. Your
> viewpoint is very ext4-specific and ignores the fact that other
> filesystems were doing this sort of mapping long before even ext3
> (let alone ext4) was a glint in the designer's eye....
>
>> otherwise, FIEMAP should return all known messages for unwritten case
>> like delayed allocation.
>
> See my previous comments about extents being unwritten until data is
> physically written to them.
Understood, thank you for your explanation.

Ok.  Let's look at it from a higher view.  What you described about
extent state is specific to xfs.

I think there are 2 ways to provide a definite definition for FIEMAP
for all filesystems:

1. FIEMAP returns extent state on disk.
2. FIEMAP returns extent both in memory and on disk.

Now, the question arises in case 2.  How do we define an extent's state
in memory? Every filesystem has its own implementation of in-memory
extents, especially for delayed and unwritten extents, and I think
they are all reasonable. For example, ext4 without delayed allocation
changes unwritten extents to written ones immediately in memory, while
with delayed allocation the change is deferred until flush time.

It seems that there is only one way to provide a definite definition of
in-memory extent state - FIEMAP should return what a user has written.
This works for all filesystems.

Yongqiang.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  6:53                                     ` Yongqiang Yang
@ 2011-04-19  7:45                                       ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-19  7:45 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Andreas Dilger, Theodore Tso, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 02:53:20PM +0800, Yongqiang Yang wrote:
> On Tue, Apr 19, 2011 at 11:44 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
> >> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
> >> > On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> >
> >> > On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
> >> >
> >> > On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
> >> >
> >> > In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is
> >> > fine.
> >> >
> >> > Except that if someone is copying a large delay allocated file, it will
> >> > cause the file to be immediately snapped to disk, which might not be
> >> > the greatest thing in the world.
> >> >
> >> > Obvious workaround - if the initial fiemap call shows unwritten
> >> > extents, redo it with the sync flag set. Though that assumes that
> >> > you can trust things like delalloc extents to only cover the range
> >> > that valid data exists in. Which, of course, you can't assume,
> >> > either. :/
> >> >
> >> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do
> >> > anything if there is unwritten data, which is the only case we are concerned
> >> > with at this point.  In any case, this is a simple solution for coreutils
> >> > until such a time that a more complex solution is added in the kernel (if
> >> > ever).
> >> >
> >> > Christoph is right, SEEK_HOLE and SEEK_DATA are a much better API for
> >> > what cp would like to do.  Unfortunately it hasn't been implemented
> >> > yet in the VFS...
> >> >
> >> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
> >> >
> >> > I don't see how this will change the problem in any meaningful way. There
> >> > will still need to be code that is traversing the on-disk mapping, and also
> >> > keeping it coherent with unwritten data in the page cache.
> >>
> >> It seems that we are being messed up by page cache and disk.
> >> Unwritten flag returned from FIEMAP indicates blocks on disk are not
> >> written, but it does not say if there is data in page cache.  So
> >> FIEMAP itself just tells user the map on disk.  However there is an
> >> exception for delayed allocation,  FIEMAP tells users the data is in
> >> page cache.
> >
> > No, FIEMAP does not tell the user there is data in the page cache.
> > It tells the user there is a delayed allocation extent. For XFS, a
> > delayed allocation extent can cover a range _greater_ than there is
> > data in the page cache - we do allocation alignment, speculative
> > allocation and other tricks to avoid fragmentation via
> > delayed allocation. When XFS says there is a delalloc extent, it is
> > simply showing the in-memory representation of the extent. If you
> > want to know where the data in the page cache actually is, you need
> > to sync the file to disk to get those ranges converted to real
> > extents. This is how xfs_bmap has worked for 15 years....
> >
> >> Maybe FIEMAP should return all known messages for unwritten extent, if
> >> unwritten data exists in page cache, FIEMAP should let users know that
> >> data is in page cache and space on disk has been preallocated, but
> >> data has not been flushed into disk.  Actually, delayed allocation has
> >> done like this. Then user-space applications can determine how to do.
> >> Taking cp as an example, it will copy from page cache rather ignore
> >> it.
> >
> > Once again, FIEMAP is for showing the filesystem's current extent
> > state, not the page cache state. Ext4 may implement FIEMAP by doing
> > page cache walks, but that is a filesystem specific implementation
> > detail.
> >
> >> We need a definite definition for FIEMAP, in other words, it tells
> >> users map on disk or both disk and page cache.
> >
> > We already have a definition - and it has nothing to do with the
> > page cache state.
> >
> >> If the former one is taken, then FIEMAP should not consider
> >> delayed allocation.
> >
> > Not at all. the delayed allocation extent is a first class extent
> > type in XFS and it is reported directly from the extent list. Your
> > viewpoint is very ext4-specific and ignores the fact that other
> > filesystems were doing this sort of mapping long before even ext3
> > (let alone ext4) was a glint in the designer's eye....
> >
> >> otherwise, FIEMAP should return all known messages for unwritten case
> >> like delayed allocation.
> >
> > See my previous comments about extents being unwritten until data is
> > physically written to them.
> Understood, thank you for your explanation.
> 
> Ok.  Let's look at it from a higher view.  What you described about
> extent state is specific to xfs.
> 
> I think there are 2 ways to provide a definite definition for FIEMAP
> for all filesystems:
> 
> 1. FIEMAP returns extent state on disk.
> 2. FIEMAP returns extent both in memory and on disk.

You are *not listening*. There is no #2. FIEMAP returns the extent
state _on disk_ at the time of the call. If you want it to reflect
the in-memory state at the time of the call (for data or metadata),
you *must* use the SYNC flag to convert that in-memory state to
on-disk state, which FIEMAP then reports just fine.
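
As a sketch of what that looks like from userspace, assuming the Linux
<linux/fiemap.h> and <linux/fs.h> headers: fiemap_sync() and
count_unwritten() are hypothetical helper names for illustration, and
the caller-chosen extent count is an arbitrary limit, not a
recommendation.

```c
#include <assert.h>
#include <linux/fiemap.h>
#include <linux/fs.h>           /* FS_IOC_FIEMAP */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hypothetical helper: map up to `max_extents` extents of the whole file,
   asking the kernel to sync the file first.  Returns a malloc'd
   struct fiemap that the caller must free, or NULL on error. */
struct fiemap *fiemap_sync(int fd, unsigned max_extents)
{
    assert(max_extents > 0);

    size_t size = sizeof(struct fiemap)
                  + (size_t)max_extents * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    if (fm == NULL)
        return NULL;

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map to end of file */
    fm->fm_flags = FIEMAP_FLAG_SYNC;     /* flush in-memory state first */
    fm->fm_extent_count = max_extents;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        return NULL;
    }
    return fm;
}

/* Count the extents in a FIEMAP result that are flagged unwritten. */
unsigned count_unwritten(const struct fiemap *fm)
{
    unsigned n = 0;

    for (uint32_t i = 0; i < fm->fm_mapped_extents; i++)
        if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
            n++;
    return n;
}
```

If the result was cut short by max_extents, the last returned extent
will not carry FIEMAP_EXTENT_LAST, and the caller has to issue another
call starting after it.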

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  7:45                                       ` Dave Chinner
@ 2011-04-19  8:11                                         ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-19  8:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Andreas Dilger, Theodore Tso, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 3:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Apr 19, 2011 at 02:53:20PM +0800, Yongqiang Yang wrote:
>> On Tue, Apr 19, 2011 at 11:44 AM, Dave Chinner <david@fromorbit.com> wrote:
>> > On Tue, Apr 19, 2011 at 09:58:15AM +0800, Yongqiang Yang wrote:
>> >> On Mon, Apr 18, 2011 at 10:45 AM, Andreas Dilger <adilger@dilger.ca> wrote:
>> >> > On 2011-04-17, at 6:40 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >> >
>> >> > On Sat, Apr 16, 2011 at 08:21:28AM -0400, Theodore Tso wrote:
>> >> >
>> >> > On Apr 16, 2011, at 1:11 AM, Andreas Dilger wrote:
>> >> >
>> >> > In that case, it means cp should just always use FIEMAP_FLAG_SYNC, which is
>> >> > fine.
>> >> >
>> >> > Except that if someone is copying a large delay allocated file, it will
>> >> > cause the file to be immediately snapped to disk, which might not be
>> >> > the greatest thing in the world.
>> >> >
>> >> > Obvious workaround - if the initial fiemap call shows unwritten
>> >> > extents, redo it with the sync flag set. Though that assumes that
>> >> > you can trust things like delalloc extents to only cover the range
>> >> > that valid data exists in. Which, of course, you can't assume,
>> >> > either. :/
>> >> >
>> >> > Always passing FIEMAP_FLAG_SYNC is fine in this case. It should only do
>> >> > anything if there is unwritten data, which is the only case we are concerned
>> >> > with at this point.  In any case, this is a simple solution for coreutils
>> >> > until such a time that a more complex solution is added in the kernel (if
>> >> > ever).
>> >> >
>> >> > Christoph is right, SEEK_HOLE and SEEK_DATA are a much better API for
>> >> > what cp would like to do.  Unfortunately it hasn't been implemented
>> >> > yet in the VFS...
>> >> >
>> >> > Agreed, SEEK_HOLE/SEEK_DATA is the right way to solve this problem.
>> >> >
>> >> > I don't see how this will change the problem in any meaningful way. There
>> >> > will still need to be code that is traversing the on-disk mapping, and also
>> >> > keeping it coherent with unwritten data in the page cache.
>> >>
>> >> It seems that we are being messed up by page cache and disk.
>> >> Unwritten flag returned from FIEMAP indicates blocks on disk are not
>> >> written, but it does not say if there is data in page cache.  So
>> >> FIEMAP itself just tells user the map on disk.  However there is an
>> >> exception for delayed allocation,  FIEMAP tells users the data is in
>> >> page cache.
>> >
>> > No, FIEMAP does not tell the user there is data in the page cache.
>> > It tells the user there is a delayed allocation extent. For XFS, a
>> > delayed allocation extent can cover a range _greater_ than there is
>> > data in the page cache - we do allocation alignment, speculative
>> > allocation and other tricks to avoid fragmentation via
>> > delayed allocation. When XFS says there is a delalloc extent, it is
>> > simply showing the in-memory representation of the extent. If you
>> > want to know where the data in the page cache actually is, you need
>> > to sync the file to disk to get those ranges converted to real
>> > extents. This is how xfs_bmap has worked for 15 years....
>> >
>> >> Maybe FIEMAP should return all known messages for unwritten extent, if
>> >> unwritten data exists in page cache, FIEMAP should let users know that
>> >> data is in page cache and space on disk has been preallocated, but
>> >> data has not been flushed into disk.  Actually, delayed allocation has
>> >> done like this. Then user-space applications can determine how to do.
>> >> Taking cp as an example, it will copy from page cache rather than
>> >> ignore it.
>> >
>> > Once again, FIEMAP is for showing the filesystem's current extent
>> > state, not the page cache state. Ext4 may implement FIEMAP by doing
>> > page cache walks, but that is a filesystem specific implementation
>> > detail.
>> >
>> >> We need a definite definition for FIEMAP, in other words, it tells
>> >> users map on disk or both disk and page cache.
>> >
>> > We already have a definition - and it has nothing to do with the
>> > page cache state.
>> >
>> >> If the former one is taken, then FIEMAP should not consider
>> >> delayed allocation.
>> >
>> > Not at all. the delayed allocation extent is a first class extent
>> > type in XFS and it is reported directly from the extent list. Your
>> > viewpoint is very ext4-specific and ignores the fact that other
>> > filesystems were doing this sort of mapping long before even ext3
>> > (let alone ext4) was a glint in the designer's eye....
>> >
>> >> otherwise, FIEMAP should return all known messages for unwritten case
>> >> like delayed allocation.
>> >
>> > See my previous comments about extents being unwritten until data is
>> > physically written to them.
>> Understood, thank you for your explanation.
>>
>> Ok.  Let's look at it from a higher view.  What you described about
>> extent state is specific to xfs.
>>
>> I think there are 2 ways to provide a definite definition for FIEMAP
>> for all filesystems:
>>
>> 1. FIEMAP returns extent state on disk.
>> 2. FIEMAP returns extent both in memory and on disk.
>
> You are *not listening*. There is no #2. FIEMAP returns the extent
> state _on disk_ at the time of the call. If you want it to reflect
> the in-memory state at the time of the call (for data or metadata),
> you *must* use the SYNC flag to convert that in-memory state to
> on-disk state, which FIEMAP then reports just fine.

Sorry for being dense.

I think the delayed extent is an exception, because it is not on the disk.

Yongqiang.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>



-- 
Best Wishes
Yongqiang Yang

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  8:11                                         ` Yongqiang Yang
@ 2011-04-19 14:05                                           ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-19 14:05 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: Dave Chinner, Andreas Dilger, Theodore Tso, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On 4/19/11 3:11 AM, Yongqiang Yang wrote:
> On Tue, Apr 19, 2011 at 3:45 PM, Dave Chinner <david@fromorbit.com> wrote:
>> On Tue, Apr 19, 2011 at 02:53:20PM +0800, Yongqiang Yang wrote:

...

>>> I think there are 2 ways to provide a definite definition for FIEMAP
>>> for all filesystems:
>>>
>>> 1. FIEMAP returns extent state on disk.
>>> 2. FIEMAP returns extent both in memory and on disk.
>>
>> You are *not listening*. There is no #2. FIEMAP returns the extent
>> state _on disk_ at the time of the call. If you want it to reflect
>> the in-memory state at the time of the call (for data or metadata),
>> you *must* use the SYNC flag to convert that in-memory state to
>> on-disk state, which FIEMAP then reports just fine.
> 
> Sorry for being dense.
> 
> I think delayed extent is an exception. because it is not on the disk.
> 
> Yongqiang.

I don't think you're being dense, I think that the interface specification is just messed up.

By including flags for both unwritten and delalloc in the interface, we have hopelessly intertwined on-disk and in-memory state.

If you preallocate 1M and then do a buffered IO to that same range without a sync, and then a fiemap, what on earth should the interface return?

Writing the first part of that testcase is simple but I have no idea what the correct behavior is.

(FIEMAP_EXTENT_UNWRITTEN|FIEMAP_EXTENT_DELALLOC) ? that makes no sense.  But no other combination of flags makes sense to me either, unless we get into tortured redefinitions of what "delalloc" means.

And if we can't say what it -should- return.... well, too bad for coreutils.  :(
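
For reference, the first part of that test case might look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name prealloc_write_fiemap and the MAX_EXTENTS cap are made up here, and the flags that get printed depend on the file system and kernel, which is exactly the open question in this thread.

```c
#define _GNU_SOURCE          /* for fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>        /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>    /* struct fiemap, FIEMAP_* */

#define MAX_EXTENTS 32       /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: preallocate 1 MiB, dirty the same range through
 * the page cache without syncing, then ask FIEMAP for the mapping.
 * fiemap_flags is 0 or FIEMAP_FLAG_SYNC.  Returns the number of extents
 * reported, or -1 if any step fails or is unsupported here. */
static int prealloc_write_fiemap(int fd, uint32_t fiemap_flags)
{
    const off_t len = 1 << 20;
    char buf[4096];
    size_t sz = sizeof(struct fiemap)
                + MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm;
    int n;

    assert(fd >= 0);
    if (fallocate(fd, 0, 0, len) != 0)   /* preallocated, unwritten space */
        return -1;
    memset(buf, 'x', sizeof buf);
    for (off_t off = 0; off < len; off += sizeof buf)
        if (pwrite(fd, buf, sizeof buf, off) != (ssize_t) sizeof buf)
            return -1;                   /* buffered write, no fsync */

    fm = calloc(1, sz);
    if (fm == NULL)
        return -1;
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_flags = fiemap_flags;
    fm->fm_extent_count = MAX_EXTENTS;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) != 0) {
        free(fm);
        return -1;
    }

    n = (int) fm->fm_mapped_extents;
    for (int i = 0; i < n; i++) {
        const struct fiemap_extent *e = &fm->fm_extents[i];
        printf("extent %d: logical=%llu len=%llu%s%s\n", i,
               (unsigned long long) e->fe_logical,
               (unsigned long long) e->fe_length,
               (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? " UNWRITTEN" : "",
               (e->fe_flags & FIEMAP_EXTENT_DELALLOC) ? " DELALLOC" : "");
    }
    free(fm);
    return n;
}
```

Calling it once with 0 and once with FIEMAP_FLAG_SYNC on fresh files shows how the two views differ on a given file system; the sketch deliberately makes no claim about which output is correct.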

-Eric

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  7:45                                       ` Dave Chinner
@ 2011-04-19 14:09                                         ` Ted Ts'o
  -1 siblings, 0 replies; 117+ messages in thread
From: Ted Ts'o @ 2011-04-19 14:09 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Yongqiang Yang, Andreas Dilger, Eric Sandeen, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 05:45:38PM +1000, Dave Chinner wrote:
> You are *not listening*. There is no #2. FIEMAP returns the extent
> state _on disk_ at the time of the call.

Dave, you're being rather strident about your insistence about what
FIEMAP's semantics are.  Part of the problem here is that it's *not*
clear or settled.

If it really is the state _on_ _disk_, does XFS really have a DELALLOC
bit _on_ _disk_?

					- Ted

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 14:09                                         ` Ted Ts'o
@ 2011-04-19 14:13                                           ` Eric Sandeen
  -1 siblings, 0 replies; 117+ messages in thread
From: Eric Sandeen @ 2011-04-19 14:13 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Dave Chinner, Yongqiang Yang, Andreas Dilger, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On 4/19/11 9:09 AM, Ted Ts'o wrote:
> On Tue, Apr 19, 2011 at 05:45:38PM +1000, Dave Chinner wrote:
>> You are *not listening*. There is no #2. FIEMAP returns the extent
>> state _on disk_ at the time of the call.
> 
> Dave, you're being rather strident about your insistence about what
> FIEMAP's semantics are.  Part of the problem here is that it's *not*
> clear or settled.
> 
> If it really is the state _on_ _disk_, does XFS really have a DELALLOC
> bit _on_ _disk_?
> 
> 					- Ted
> 

no of course it doesn't....

But I too am confused about Dave's assertion that it only reflects ondisk state when we have that pesky delalloc flag.

Whose idea was that, anyway? ;)

I'd certainly buy the argument that it -should- only reflect ondisk state, and we should nuke the delalloc flag from orbit, if we could, though.

-Eric

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 14:13                                           ` Eric Sandeen
@ 2011-04-19 16:01                                             ` Ted Ts'o
  -1 siblings, 0 replies; 117+ messages in thread
From: Ted Ts'o @ 2011-04-19 16:01 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Dave Chinner, Yongqiang Yang, Andreas Dilger, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 09:13:19AM -0500, Eric Sandeen wrote:
> 
> But I too am confused about Dave's assertion that it only reflects ondisk state when we have that pesky delalloc flag.
> 
> Whose idea was that, anyway? ;)
> 
> I'd certainly buy the argument that it -should- only reflect ondisk state, and we should nuke the delalloc flag from orbit, if we could, though.

I see three options of how we can clarify FIEMAP's semantics:

1) We define it as only reflecting ondisk state, and nuke the delalloc
flag from orbit.

2) We state that if the file currently has unflushed pages in the
page cache, and FIEMAP_FLAG_SYNC is not passed, whether or not extents
return the DELALLOC flag or how they handle the UNWRITTEN flag is
undefined.

3) We state that FIEMAP is supposed to return information which
reflects the union of the on-disk and page cache state, with all that
this implies.

All of these are internally consistent definitions --- we need to
choose one, document it, and then tell the shellutils folks what they
should do.

In the case of #1 and #2, we really need to implement support for
SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
this information.
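
To make the SEEK_HOLE/SEEK_DATA idea concrete, here is a minimal sketch of how a userspace program could walk a file's data regions with lseek(), assuming a kernel and C library that provide those whence values. The function name count_data_bytes is hypothetical, and a real cp would copy each region instead of merely adding up lengths.

```c
#define _GNU_SOURCE          /* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sum the lengths of all data regions of fd as
 * reported by lseek().  Note that it moves the file offset.
 * Returns the byte count, or -1 if SEEK_DATA/SEEK_HOLE is rejected. */
static off_t count_data_bytes(int fd)
{
    off_t total = 0, pos = 0;

    assert(fd >= 0);
    for (;;) {
        /* Find the start of the next data region at or after pos. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)      /* no more data before EOF */
                break;
            perror("lseek(SEEK_DATA)");
            return -1;
        }
        /* Find where that region ends; EOF counts as a hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            perror("lseek(SEEK_HOLE)");
            return -1;
        }
        total += hole - data;        /* a copier would copy [data, hole) */
        pos = hole;
    }
    return total;
}
```

The attraction of this interface for cp is that the caller never sees extent flags at all: whether a range is unwritten, delalloc or only in the page cache is left for the file system to resolve when it answers "data or hole".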

Do we all agree on the problem statement, at least?  If so, then we
can try to come to consensus on what is the appropriate solution.

    	   		     	     - Ted

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 14:09                                         ` Ted Ts'o
@ 2011-04-19 21:08                                             ` Dave Chinner
  -1 siblings, 0 replies; 117+ messages in thread
From: Dave Chinner @ 2011-04-19 21:08 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Andreas Dilger, Eric Sandeen, Yongqiang Yang, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Tue, Apr 19, 2011 at 10:09:09AM -0400, Ted Ts'o wrote:
> On Tue, Apr 19, 2011 at 05:45:38PM +1000, Dave Chinner wrote:
> > You are *not listening*. There is no #2. FIEMAP returns the extent
> > state _on disk_ at the time of the call.
> 
> Dave, you're being rather strident about your insistence about what
> FIEMAP's semantics are. 

The bit about the page cache state being relevant? That's what I was
referring to here.

> Part of the problem here is that it's *not*
> clear or settled.
> 
> If it really is the state _on_ _disk_, does XFS really have a DELALLOC
> bit _on_ _disk_?

Sigh. No.

This whole thing blew up because of unwritten extent behaviour when
there is dirty page cache covering an unwritten extent. Delalloc
was not the issue - what I said is absolutely true for unwritten
extents.  Somewhere in the middle someone started talking about
delalloc extents and conflating their behaviour with unwritten
extents, but I continued to talk about unwritten extents and
cached pages.

Even so, for delalloc extents the dirty page state in the page cache
is irrelevant. I've said earlier that XFS delalloc extents can span
regions that have no page cache state - they don't get reported as
holes by FIEMAP because they are tracked as delalloc. IOWs, like
unwritten extents, you can't rely on delalloc extents to tell you
where the data is in the file.

So, it logically follows that you need to use the SYNC flag for both
unwritten extents and delalloc extents to find out where the data
really lies by converting them to real, written extents. i.e. the
only extents you can trust contain data from FIEMAP are the real
extents on disk....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 16:01                                             ` Ted Ts'o
@ 2011-04-20  1:53                                               ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-04-20  1:53 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Eric Sandeen, Dave Chinner, Andreas Dilger, xfs-oss, coreutils,
	linux-ext4, Pádraig Brady, Markus Trippelsdorf

On Wed, Apr 20, 2011 at 12:01 AM, Ted Ts'o <tytso@mit.edu> wrote:
> On Tue, Apr 19, 2011 at 09:13:19AM -0500, Eric Sandeen wrote:
>>
>> But I too am confused about Dave's assertion that it only reflects ondisk state when we have that pesky delalloc flag.
>>
>> Whose idea was that, anyway? ;)
>>
>> I'd certainly buy the argument that it -should- only reflect ondisk state, and we should nuke the delalloc flag from orbit, if we could, though.
>
> I see three options of how we can clarify FIEMAP's semantics:
>
> 1) We define it as only reflecting ondisk state, and nuke the delalloc
> flag from orbit.
>
> 2) We state that if the file currently has unflushed pages in the
> page cache, and FIEMAP_FLAG_SYNC is not passed, whether or not extents
> return the DELALLOC flag or how they handle the UNWRITTEN flag is
> undefined.
>
> 3) We state that FIEMAP is supposed to return information which
> reflects the union of the on-disk and page cache state, with all that
> this implies.
>
> All of these are internally consistent definitions --- we need to
> choose one, document it, and then tell the shellutils folks what they
> should do.
>
> In the case of #1 and #2, we really need to implement support for
> SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
> this information.
>
> Do we all agree on the problem statement, at least?  If so, then we
> can try to come to consensus on what is the appropriate solution.

I agree on the problem statement.  Users need to know definitively
what FIEMAP returns.

It seems that Dave is looking at the problem from a different view.

Dave thinks that FIEMAP reports where data will finally exist on disk.
Then there are 2 possibilities: unknown and known.  A delayed extent is
unknown and the others are known.  Although we know where the data in an
unwritten extent will finally exist on disk, we cannot know whether or
not it is currently in memory.  However, we do know that the data in a
delayed extent is in memory.  That sounds strange.

>
>                                     - Ted
>



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 10:26 Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Markus Trippelsdorf
@ 2011-04-20 14:39     ` Jim Meyering
  2011-04-14 14:39 ` Eric Sandeen
       [not found] ` <20110414102608.GA1678-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
  2 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-20 14:39 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: linux-ext4-u79uwXL29TY76Z2rM5mHXA, coreutils-mXXj517/zsQ,
	xfs-VZNHf3L845pBDgjK7y7TUQ

Markus Trippelsdorf wrote:
> I trashed my system this morning when I installed coreutils-8.11.
>
> What happened is that coreutils compiles and links correctly, but then
> the following command (during the installation phase):
>
> ./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64
...
>
> apparently produces files which have the length of the originals but are
> full of zeros. (and these were then installed to my live system, thereby
> trashing it).
...

Thanks again for the report.
I believe that the following series addresses this problem
and have confirmed that tests pass with 2.6.39-rc3 on all
of ext3, ext4, btrfs and xfs -- though there was what appears
to be a spurious failure in tests/cp/sparse-fiemap when run on xfs.
On one iteration, with j=31, of these nested loops

  for i in $(seq 1 2 21); do
    for j in 1 2 31 100; do

[in http://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/cp/sparse-fiemap]
the two files compared equal, yet their extents did not match,
even after merging.  I'm inclined to skip the extent-comparing check
at least for XFS, now.

Here's the unusually-technical-for-NEWS summary:

** Changes in behavior

  cp's extent-based (FIEMAP) copying code is more reliable in the face
  of varying and undocumented file system semantics:
  - it no longer treats unwritten extents specially
  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
      Before, it would incur the performance penalty of that sync only
      for 2.6.38 and older kernels.  We thought all problems would be
      resolved for 2.6.39.
  - it now attempts a FIEMAP copy only on a file that appears sparse.
      Sparse files are relatively unusual, and the copying code incurs
      the performance penalty of the now-mandatory sync only for them.




From 2783b52b273dd8fca824d8e1a64f8c4f41a54c00 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 09:49:15 +0200
Subject: [PATCH 1/4] copy: always use FIEMAP_FLAG_SYNC, for now

* src/extent-scan.c (extent_need_sync): Always return true,
to make the sole caller always use FIEMAP_FLAG_SYNC.
This will doubtless have an undesirable performance impact,
but we'll mitigate that shortly, by using extent_copy only on
files with holes.
---
 src/extent-scan.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index da7eb9d..596e7f7 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -36,6 +36,13 @@
 static bool
 extent_need_sync (void)
 {
+  /* For now always return true, to be on the safe side.
+     If/when FIEMAP semantics are well defined (before SEEK_HOLE support
+     is usable) and kernels implementing them are in use, we may relax
+     this once again.  */
+  return true;
+
+#if FIEMAP_BEHAVIOR_IS_DEFINED_AND_USABLE
   static int need_sync = -1;

   if (need_sync == -1)
@@ -57,6 +64,7 @@ extent_need_sync (void)
     }

   return need_sync;
+#endif
 }

 /* Allocate space for struct extent_scan, initialize the entries if
--
1.7.5.rc2.295.g19c42


From f35019b45e2b1ff6e1940db7b452dcb8f674f190 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:15:15 +0200
Subject: [PATCH 2/4] copy: do not treat unwritten extents specially (avoid
 XFS corruption)

* src/copy.c (extent_copy): Do not treat "unwritten extents" specially.
Otherwise, with XFS and a release-candidate 2.6.39-rc3 kernel, and
when using gold as your linker[*], and if you don't run "make check",
you could end up installing files full of zeros instead of the
expected binaries.  For a lot of discussion, see
http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895

[*] Gold preallocates space for the files it writes, which is good.
However, on XFS, that conspired with the other conditions to result
in the malfunctioning of the just-built install binary.
---
 src/copy.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 9b53127..f6f9ea6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -398,7 +398,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
           /* Treat an unwritten but allocated extent much like a hole.
              I.E. don't read, but don't convert to a hole in the destination,
              unless SPARSE_ALWAYS.  */
-          if (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)
+          /* For now, do not treat FIEMAP_EXTENT_UNWRITTEN specially,
+             because that (in combination with no sync) would lead to data
+             loss at least on XFS and ext4 when using 2.6.39-rc3 kernels.  */
+          if (0 && (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN))
             {
               empty_extent = true;
               last_ext_len = 0;
--
1.7.5.rc2.295.g19c42


From 489261905dfee95cc1ebb708e8302bb246519b8b Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:23:32 +0200
Subject: [PATCH 3/4] copy: factor out a tiny sparse-testing function

* src/copy.c (HAVE_STRUCT_STAT_ST_BLOCKS): Define to 0 if undefined,
so we can use it in the return expression, here:
(is_probably_sparse): New function, factored out of...
(copy_reg): ...here.  Use the new function.
---
 src/copy.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f6f9ea6..3db07b5 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -764,6 +764,23 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode)
   return lchmod (name, mode);
 }

+#ifndef HAVE_STRUCT_STAT_ST_BLOCKS
+# define HAVE_STRUCT_STAT_ST_BLOCKS 0
+#endif
+
+/* Use a heuristic to determine whether stat buffer SB comes from a file
+   with sparse blocks.  If the file has fewer blocks than would normally
+   be needed for a file of its size, then at least one of the blocks in
+   the file is a hole.  In that case, return true.  */
+static bool
+is_probably_sparse (struct stat const *sb)
+{
+  return (HAVE_STRUCT_STAT_ST_BLOCKS
+          && S_ISREG (sb->st_mode)
+          && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE);
+}
+
+
 /* Copy a regular file from SRC_NAME to DST_NAME.
    If the source file contains holes, copies holes and blocks of zeros
    in the source file as holes in the destination file.
@@ -984,15 +1001,13 @@ copy_reg (char const *src_name, char const *dst_name,
           if (x->sparse_mode == SPARSE_ALWAYS)
             make_holes = true;

-#if HAVE_STRUCT_STAT_ST_BLOCKS
           /* Use a heuristic to determine whether SRC_NAME contains any sparse
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
-              && ST_NBLOCKS (src_open_sb) < src_open_sb.st_size / ST_NBLOCKSIZE)
+          if (x->sparse_mode == SPARSE_AUTO
+              && is_probably_sparse (&src_open_sb))
             make_holes = true;
-#endif
         }

       /* If not making a sparse file, try to use a more-efficient
--
1.7.5.rc2.295.g19c42


From 39fdf629729319ab4011cf15c0a16cba4e4aba1b Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 11:21:09 +0200
Subject: [PATCH 4/4] copy: use FIEMAP (extent_copy) only for
 apparently-sparse files,

to avoid the expense of extent_copy's unconditional use of
FIEMAP_FLAG_SYNC.
* src/copy.c (copy_reg): Do not attempt extent_copy on a file
that appears to have no holes.
* NEWS (Changes in behavior): Document this.  At first I labeled this
as a bug fix, but that would be inaccurate, considering there is no
documentation of FIEMAP semantics, nor even consensus among kernel
FS developers.  Here's hoping SEEK_HOLE/SEEK_DATA support will soon
make it into the linux kernel.
---
 NEWS       |   13 +++++++++++++
 src/copy.c |   37 +++++++++++++++++++++----------------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/NEWS b/NEWS
index 4873b5a..7bc2ef3 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,19 @@ GNU coreutils NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Changes in behavior
+
+  cp's extent-based (FIEMAP) copying code is more reliable in the face
+  of varying and undocumented file system semantics:
+  - it no longer treats unwritten extents specially
+  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
+      Before, it would incur the performance penalty of that sync only
+      for 2.6.38 and older kernels.  We thought all problems would be
+      resolved for 2.6.39.
+  - it now attempts a FIEMAP copy only on a file that appears sparse.
+      Sparse files are relatively unusual, and the copying code incurs
+      the performance penalty of the now-mandatory sync only for them.
+

 * Noteworthy changes in release 8.11 (2011-04-13) [stable]

diff --git a/src/copy.c b/src/copy.c
index 3db07b5..6edf52e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -993,6 +993,7 @@ copy_reg (char const *src_name, char const *dst_name,

       /* Deal with sparse files.  */
       bool make_holes = false;
+      bool sparse_src = false;

       if (S_ISREG (sb.st_mode))
         {
@@ -1005,8 +1006,8 @@ copy_reg (char const *src_name, char const *dst_name,
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO
-              && is_probably_sparse (&src_open_sb))
+          sparse_src = is_probably_sparse (&src_open_sb);
+          if (x->sparse_mode == SPARSE_AUTO && sparse_src)
             make_holes = true;
         }

@@ -1038,21 +1039,25 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

-      bool normal_copy_required;
-      /* Perform an efficient extent-based copy, falling back to the
-         standard copy only if the initial extent scan fails.  If the
-         '--sparse=never' option is specified, write all data but use
-         any extents to read more efficiently.  */
-      if (extent_copy (source_desc, dest_desc, buf, buf_size,
-                       src_open_sb.st_size,
-                       S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
-                       src_name, dst_name, &normal_copy_required))
-        goto preserve_metadata;
-
-      if (! normal_copy_required)
+      if (sparse_src)
         {
-          return_val = false;
-          goto close_src_and_dst_desc;
+          bool normal_copy_required;
+
+          /* Perform an efficient extent-based copy, falling back to the
+             standard copy only if the initial extent scan fails.  If the
+             '--sparse=never' option is specified, write all data but use
+             any extents to read more efficiently.  */
+          if (extent_copy (source_desc, dest_desc, buf, buf_size,
+                           src_open_sb.st_size,
+                           S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
+                           src_name, dst_name, &normal_copy_required))
+            goto preserve_metadata;
+
+          if (! normal_copy_required)
+            {
+              return_val = false;
+              goto close_src_and_dst_desc;
+            }
         }

       off_t n_read;
--
1.7.5.rc2.295.g19c42

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 16:01                                             ` Ted Ts'o
@ 2011-04-20 15:21                                               ` Christoph Hellwig
  -1 siblings, 0 replies; 117+ messages in thread
From: Christoph Hellwig @ 2011-04-20 15:21 UTC (permalink / raw)
  To: Ted Ts'o
  Cc: Eric Sandeen, Dave Chinner, Yongqiang Yang, Andreas Dilger,
	xfs-oss, coreutils, linux-ext4, Pádraig Brady,
	Markus Trippelsdorf

On Tue, Apr 19, 2011 at 12:01:14PM -0400, Ted Ts'o wrote:
> 1) We define it as only reflecting ondisk state, and nuke the delalloc
> flag from orbit.
> 
> 2) We state that if the file currently has unflushed pages in the
> page cache, and FIEMAP_FLAG_SYNC is not passed, whether or not extents
> return the DELALLOC flag or how they handle the UNWRITTEN flag is
> undefined.

That seems like a weird option, as the pagecache state really has
nothing to do at all with the extent layout, and the existence of dirty
pages really has nothing to do with the unwritten flag.

> 3) We state that FIEMAP is supposed to return information which
> reflects the union of the on-disk and page cache state, with all that
> this implies.

How do you want to union the existence of an extent with a state
on disk, with a pending modification to it that is still in-memory
and not flushed out to disk yet?  This is looking into an uncertain
future, as the extent map might change in various other ways before
the transaction to convert the unwritten extents goes to disk.

And if we do this it would need to be a new option to FIEMAP, as
it changes the semantics from the existing one that returns the
actual state on disk (plus the magic delalloc bit).

And even if you find semantics that take pending unwritten extent
conversions into account and still make sense, how do you plan to
implement them?  For buffered writes into unwritten extents it could
be done by walking the pagecache and buffers after adding a new
flag for an already converted unwritten extent to the buffer head
state.  But there's no easy way to do that for direct I/O.

> In the case of #1 and #2, we really need to implement support for
> SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
> this information.

We need to do that anyway, as fiemap is a horrible interface for
tools that just want to skip holes.  
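
For comparison, here is a minimal sketch of how a tool could walk only
the data regions of a file with lseek's SEEK_DATA and SEEK_HOLE, on a
kernel and libc that provide them (as this thread notes, Linux did not
at the time).  The function name count_data_bytes is made up for this
illustration, and the fallback #defines assume the Linux values.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: Linux values, used only if the headers did not expose
   these constants.  */
#ifndef SEEK_DATA
# define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
# define SEEK_HOLE 4
#endif

/* Return the number of bytes in FD that the file system reports as
   data (i.e. not holes), or -1 on error.  */
off_t
count_data_bytes (int fd)
{
  off_t end = lseek (fd, 0, SEEK_END);
  if (end < 0)
    return -1;

  off_t total = 0;
  off_t pos = 0;
  while (pos < end)
    {
      /* Find the start of the next data region at or after POS.  */
      off_t data = lseek (fd, pos, SEEK_DATA);
      if (data < 0)
        {
          if (errno == ENXIO)   /* only a hole remains before EOF */
            break;
          return -1;
        }

      /* Find where that data region ends (EOF counts as a hole).  */
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole < 0)
        return -1;

      total += hole - data;
      pos = hole;
    }
  return total;
}
```

A file system without hole support may report the whole file as a
single data region, so the result can be as large as the file size.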

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19 21:08                                             ` Dave Chinner
@ 2011-04-20 15:29                                               ` Christoph Hellwig
  -1 siblings, 0 replies; 117+ messages in thread
From: Christoph Hellwig @ 2011-04-20 15:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Ted Ts'o, Yongqiang Yang, Andreas Dilger, Eric Sandeen,
	xfs-oss, coreutils, linux-ext4, Pádraig Brady,
	Markus Trippelsdorf

On Wed, Apr 20, 2011 at 07:08:25AM +1000, Dave Chinner wrote:
> So, it logically follows that you need to use the SYNC flag for both
> unwritten extents and delalloc extents to find out where their data
> really lies by converting them to real, written extents. i.e. the
> only extents you can trust contain data from FIEMAP are the real
> extents on disk....

Even more funny is that the bug report that started this thread involved
software that didn't actually care about the location on disk, at all.

cp from coreutils really just wanted an efficient way to skip holes
in sparse files, and we got into a chain reaction of various flaws
and oversights:

 (1) Linux lacks the SEEK_HOLE/SEEK_DATA interface that would make
     skipping holes trivial and thus coreutils has to use FIEMAP.
 (2) ext4 and btrfs in some cases mishandled reporting delalloc
     extents, which means coreutils had to add the sync flag,
     despite not caring where data is on disk
 (3) coreutils tried to treat unwritten extents as holes.  Which
     makes some sense given their high-level description, although
     probably not too much in practice given that we explicitly
     allocated blocks to these "holes" to optimize performance.
     But the main issue here is that there is no documentation
     that clearly states that unwritten extents reported by
     FIEMAP may actually contain useful data.  In fact there's
     no useful documentation for FIEMAP outside the kernel tree.
     An interface that complex really needs a manpage.
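
The three points above can be made concrete with a small sketch of the
FIEMAP call itself: it passes FIEMAP_FLAG_SYNC so that dirty data is
flushed before mapping, and it merely counts unwritten extents instead
of treating them as holes.  It assumes the Linux <linux/fiemap.h> and
<linux/fs.h> headers; count_extents_synced and EXTENT_BATCH are names
invented for this illustration, not coreutils code.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

enum { EXTENT_BATCH = 32 };   /* hypothetical batch size for this sketch */

/* Count the extents of FD, asking the kernel to flush dirty data first
   (FIEMAP_FLAG_SYNC).  Store in *N_UNWRITTEN how many extents carry
   FIEMAP_EXTENT_UNWRITTEN.  Return the extent count, or -1 on failure
   (e.g. the file system does not support FIEMAP).  */
long
count_extents_synced (int fd, long *n_unwritten)
{
  size_t bytes = sizeof (struct fiemap)
                 + EXTENT_BATCH * sizeof (struct fiemap_extent);
  struct fiemap *fm = malloc (bytes);
  *n_unwritten = 0;
  if (!fm)
    return -1;

  long total = 0;
  __u64 start = 0;

  for (;;)
    {
      /* Ask for the next batch of extents from START to the end.  */
      memset (fm, 0, bytes);
      fm->fm_start = start;
      fm->fm_length = FIEMAP_MAX_OFFSET - start;
      fm->fm_flags = FIEMAP_FLAG_SYNC;
      fm->fm_extent_count = EXTENT_BATCH;

      if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
        {
          free (fm);
          return -1;
        }
      if (fm->fm_mapped_extents == 0)
        break;

      int last = 0;
      for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        {
          struct fiemap_extent const *e = &fm->fm_extents[i];
          total++;
          if (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN)
            (*n_unwritten)++;
          if (e->fe_flags & FIEMAP_EXTENT_LAST)
            last = 1;
          start = e->fe_logical + e->fe_length;
        }
      if (last)
        break;
    }

  free (fm);
  return total;
}
```

A caller that only wants to skip holes still has to read every reported
extent, unwritten or not, which is part of why the interface is awkward
for that purpose.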


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-20 15:21                                               ` Christoph Hellwig
@ 2011-04-20 17:21                                                 ` Ted Ts'o
  -1 siblings, 0 replies; 117+ messages in thread
From: Ted Ts'o @ 2011-04-20 17:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Eric Sandeen, Dave Chinner, Yongqiang Yang, Andreas Dilger,
	xfs-oss, coreutils, linux-ext4, Pádraig Brady,
	Markus Trippelsdorf

On Wed, Apr 20, 2011 at 11:21:31AM -0400, Christoph Hellwig wrote:
> 
> How do you want to union the existence of an extent with a state
> on disk, with a pending modification to it that is still in-memory
> and not flushed out to disk yet?  This is looking into an uncertain
> future, as the extent map might change in various other ways before
> the transaction to convert the unwritten extents goes to disk.

So for example, suppose you have a single unwritten extent on disk,
but there are 3 regions within that extent's range that have unwritten
pages.  You return 3 or 4 fiemap_extent structures, reflecting the
state if the unwritten pages were pushed out to disk at the time of
the fiemap ioctl --- but without actually doing the expensive sync
operation.  The one case where you can't do that is in the case of
delayed allocation blocks, since you won't know where on disk they
would be going, necessarily --- but hey, conveniently we have a
DELALLOC bit already defined....
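
For readers who have not used the interface, here is a rough sketch of
how a userspace program asks for fiemap_extent structures and sees the
UNWRITTEN and DELALLOC bits discussed above.  It uses the existing
FS_IOC_FIEMAP ioctl with FIEMAP_FLAG_SYNC, not the proposed semantics.
The helper name dump_extents and the MAX_EXTENTS limit are assumptions
made for this example.

```c
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

enum { MAX_EXTENTS = 32 };      /* arbitrary limit for this sketch */

/* Hypothetical helper: print up to MAX_EXTENTS extents of FD.
   Return the number of extents mapped, or -1 on error (for example
   when the file system does not support FIEMAP).  */
static int
dump_extents (int fd)
{
  size_t size = (sizeof (struct fiemap)
                 + MAX_EXTENTS * sizeof (struct fiemap_extent));
  struct fiemap *fm = calloc (1, size);
  if (!fm)
    return -1;

  fm->fm_start = 0;
  fm->fm_length = FIEMAP_MAX_OFFSET;    /* map the whole file */
  fm->fm_flags = FIEMAP_FLAG_SYNC;      /* flush dirty data before mapping */
  fm->fm_extent_count = MAX_EXTENTS;

  if (ioctl (fd, FS_IOC_FIEMAP, fm) < 0)
    {
      free (fm);
      return -1;
    }

  int n = (int) fm->fm_mapped_extents;
  for (int i = 0; i < n; i++)
    {
      const struct fiemap_extent *e = &fm->fm_extents[i];
      printf ("logical=%llu length=%llu%s%s\n",
              (unsigned long long) e->fe_logical,
              (unsigned long long) e->fe_length,
              (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ? " unwritten" : "",
              (e->fe_flags & FIEMAP_EXTENT_DELALLOC) ? " delalloc" : "");
    }

  free (fm);
  return n;
}
```

A file with more than MAX_EXTENTS extents would need repeated calls
with an advancing fm_start; that loop is left out to keep the sketch
short.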

> And if we do this it would need to be a new option to FIEMAP, as
> it changes the semantics from the existing one that returns the
> actual state on disk (plus the magic delalloc bit).

Well, we seem to have inconsistent semantics right now, because we
never defined the semantics clearly enough from the beginning.  So no
matter which option we choose, including "the on-disk extent state
only, and nuke the delalloc bit", we will be changing semantics.  I'm
not sure we can get around that.

> And even if you find semantics that take pending unwritten extent
> conversions into account and still make sense how do you plan to
> implement them?  For buffered writes into unwritten extents it could
> be done by walking the pagecache and buffers after adding a new
> flag for an already converted unwritten extent to the buffer head
> state.  But there's no easy way to do that for direct I/O.

If the file is being actively modified (for example with direct I/O),
there will inevitably be race conditions.  If only some of the pending
conversions have been taken into account, that seems like a
reasonable result.  If a file is actively being modified by many DIO
writes, even using FIEMAP_FLAG_SYNC isn't going to help you get a
coherent view of the file, so this seems to be a previously unsolved
problem....

> > In the case of #1 and #2, we really need to implement support for
> > SEEK_HOLE/SEEK_DATA for userspace programs like cp who want to know
> > this information.
> 
> We need to do that anyway, as fiemap is a horrible interface for
> tools that just want to skip holes.

I agree that implementing SEEK_HOLE/SEEK_DATA is a good thing
regardless of which option we end up choosing.

	      	    	      	     - Ted

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-20 14:39     ` Jim Meyering
@ 2011-04-21 20:01         ` Jim Meyering
  -1 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-21 20:01 UTC (permalink / raw)
  To: Markus Trippelsdorf
  Cc: linux-ext4, coreutils, xfs

Jim Meyering wrote:
> Markus Trippelsdorf wrote:
>> I trashed my system this morning when I installed coreutils-8.11.
>>
>> What happened is that coreutils compiles and links correctly, but then
>> the following command (during the installation phase):
>>
>> ./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64
> ...
>>
>> apparently produces files which have the length of the originals but are
>> full of zeros. (and these were then installed to my live system, thereby
>> trashing it).
> ...
>
> Thanks again for the report.
> I believe that the following series addresses this problem
> and have confirmed that tests pass with 2.6.39-rc3 on all
> of ext3, ext4, btrfs and xfs -- though there was what appears
> to be a spurious failure in tests/cp/sparse-fiemap when run on xfs.
> On one iteration, with j=31, of these loops
>
>   for i in $(seq 1 2 21); do
>     for j in 1 2 31 100; do
>
> [in http://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/cp/sparse-fiemap]
> the two files compared equal, yet their extents did not match,
> even after merging.  I'm inclined to skip the extent-comparing check
> at least for XFS, now.
>
> Here's the unusually-technical-for-NEWS summary:

[slightly updated and pushed, along with test-adjusting changes]

** Changes in behavior

  cp's extent-based (FIEMAP) copying code is more reliable in the face
  of varying and undocumented file system semantics:
  - it no longer treats unwritten extents specially
  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
      Before, it would incur the performance penalty of that sync only
      for 2.6.38 and older kernels.  We thought all problems would be
      resolved for 2.6.39.
  - it now attempts a FIEMAP copy only on a file that appears sparse.
      Sparse files are relatively unusual, and the copying code incurs
      the performance penalty of the now-mandatory sync only for them.


From 9bcd045f812a75cf96ba392bc45529422f87c088 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 09:49:15 +0200
Subject: [PATCH 1/6] copy: always use FIEMAP_FLAG_SYNC, for now

* src/extent-scan.c (extent_need_sync): Always return true,
to make the sole caller always use FIEMAP_FLAG_SYNC.
This will doubtless have an undesirable performance impact,
but we'll mitigate that shortly, by using extent_copy only on
files with holes.
---
 src/extent-scan.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index da7eb9d..596e7f7 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -36,6 +36,13 @@
 static bool
 extent_need_sync (void)
 {
+  /* For now always return true, to be on the safe side.
+     If/when FIEMAP semantics are well defined (before SEEK_HOLE support
+     is usable) and kernels implementing them are in use, we may relax
+     this once again.  */
+  return true;
+
+#if FIEMAP_BEHAVIOR_IS_DEFINED_AND_USABLE
   static int need_sync = -1;

   if (need_sync == -1)
@@ -57,6 +64,7 @@ extent_need_sync (void)
     }

   return need_sync;
+#endif
 }

 /* Allocate space for struct extent_scan, initialize the entries if
--
1.7.5.rc3.291.g63e4e


From bef4fa1e1a20c636979db159647a93e5954bc542 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:15:15 +0200
Subject: [PATCH 2/6] copy: do not treat unwritten extents specially: avoid
 XFS/ext4 data loss

* src/copy.c (extent_copy): Do not treat "unwritten extents" specially.
Otherwise, with a release-candidate 2.6.39-rc3 kernel, XFS or ext4,
when using gold as your linker, and if you forget to run "make check",
you could end up installing files full of zeros instead of the expected
binaries.  For a lot of discussion, see
http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895
* tests/cp/fiemap-empty: Disable this test.
---
 src/copy.c            |    5 ++++-
 tests/cp/fiemap-empty |    5 +++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 9b53127..f6f9ea6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -398,7 +398,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
           /* Treat an unwritten but allocated extent much like a hole.
              I.E. don't read, but don't convert to a hole in the destination,
              unless SPARSE_ALWAYS.  */
-          if (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)
+          /* For now, do not treat FIEMAP_EXTENT_UNWRITTEN specially,
+             because that (in combination with no sync) would lead to data
+             loss at least on XFS and ext4 when using 2.6.39-rc3 kernels.  */
+          if (0 && (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN))
             {
               empty_extent = true;
               last_ext_len = 0;
diff --git a/tests/cp/fiemap-empty b/tests/cp/fiemap-empty
index 64c3254..836668e 100755
--- a/tests/cp/fiemap-empty
+++ b/tests/cp/fiemap-empty
@@ -19,6 +19,11 @@
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
 print_ver_ cp

+# FIXME: enable any part of this test that is still relevant,
+# or, if none are relevant (now that cp does not handle unwritten
+# extents), just remove the test altogether.
+skip_test_ 'disabled for now'
+
 touch fiemap_chk
 fiemap_capable_ fiemap_chk ||
   skip_test_ 'this file system lacks FIEMAP support'
--
1.7.5.rc3.291.g63e4e


From 846d826096fc8fb621d751f8b3db1da68a8bbd06 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:23:32 +0200
Subject: [PATCH 3/6] copy: factor out a tiny sparse-testing function

* src/copy.c (HAVE_STRUCT_STAT_ST_BLOCKS): Define to 0 if undefined,
so we can use it in the return expression, here:
(is_probably_sparse): New function, factored out of...
(copy_reg): ...here.  Use the new function.
---
 src/copy.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f6f9ea6..3db07b5 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -764,6 +764,23 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode)
   return lchmod (name, mode);
 }

+#ifndef HAVE_STRUCT_STAT_ST_BLOCKS
+# define HAVE_STRUCT_STAT_ST_BLOCKS 0
+#endif
+
+/* Use a heuristic to determine whether stat buffer SB comes from a file
+   with sparse blocks.  If the file has fewer blocks than would normally
+   be needed for a file of its size, then at least one of the blocks in
+   the file is a hole.  In that case, return true.  */
+static bool
+is_probably_sparse (struct stat const *sb)
+{
+  return (HAVE_STRUCT_STAT_ST_BLOCKS
+          && S_ISREG (sb->st_mode)
+          && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE);
+}
+
+
 /* Copy a regular file from SRC_NAME to DST_NAME.
    If the source file contains holes, copies holes and blocks of zeros
    in the source file as holes in the destination file.
@@ -984,15 +1001,13 @@ copy_reg (char const *src_name, char const *dst_name,
           if (x->sparse_mode == SPARSE_ALWAYS)
             make_holes = true;

-#if HAVE_STRUCT_STAT_ST_BLOCKS
           /* Use a heuristic to determine whether SRC_NAME contains any sparse
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
-              && ST_NBLOCKS (src_open_sb) < src_open_sb.st_size / ST_NBLOCKSIZE)
+          if (x->sparse_mode == SPARSE_AUTO
+              && is_probably_sparse (&src_open_sb))
             make_holes = true;
-#endif
         }

       /* If not making a sparse file, try to use a more-efficient
--
1.7.5.rc3.291.g63e4e


From 18a474d755aa10d881243d9457d2c420c5e4ea77 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 11:21:09 +0200
Subject: [PATCH 4/6] copy: use FIEMAP (extent_copy) only for
 apparently-sparse files,

to avoid the expense of extent_copy's unconditional use of
FIEMAP_FLAG_SYNC.
* src/copy.c (copy_reg): Do not attempt extent_copy on a file
that appears to have no holes.
* NEWS (Changes in behavior): Document this.  At first I labeled this
as a bug fix, but that would be inaccurate, considering there is no
documentation of FIEMAP semantics, nor even consensus among kernel
FS developers.  Here's hoping SEEK_HOLE/SEEK_DATA support will soon
make it into the linux kernel.
---
 NEWS       |   13 +++++++++++++
 src/copy.c |   37 +++++++++++++++++++++----------------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/NEWS b/NEWS
index 4873b5a..7bc2ef3 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,19 @@ GNU coreutils NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Changes in behavior
+
+  cp's extent-based (FIEMAP) copying code is more reliable in the face
+  of varying and undocumented file system semantics:
+  - it no longer treats unwritten extents specially
+  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
+      Before, it would incur the performance penalty of that sync only
+      for 2.6.38 and older kernels.  We thought all problems would be
+      resolved for 2.6.39.
+  - it now attempts a FIEMAP copy only on a file that appears sparse.
+      Sparse files are relatively unusual, and the copying code incurs
+      the performance penalty of the now-mandatory sync only for them.
+

 * Noteworthy changes in release 8.11 (2011-04-13) [stable]

diff --git a/src/copy.c b/src/copy.c
index 3db07b5..6edf52e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -993,6 +993,7 @@ copy_reg (char const *src_name, char const *dst_name,

       /* Deal with sparse files.  */
       bool make_holes = false;
+      bool sparse_src = false;

       if (S_ISREG (sb.st_mode))
         {
@@ -1005,8 +1006,8 @@ copy_reg (char const *src_name, char const *dst_name,
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO
-              && is_probably_sparse (&src_open_sb))
+          sparse_src = is_probably_sparse (&src_open_sb);
+          if (x->sparse_mode == SPARSE_AUTO && sparse_src)
             make_holes = true;
         }

@@ -1038,21 +1039,25 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

-      bool normal_copy_required;
-      /* Perform an efficient extent-based copy, falling back to the
-         standard copy only if the initial extent scan fails.  If the
-         '--sparse=never' option is specified, write all data but use
-         any extents to read more efficiently.  */
-      if (extent_copy (source_desc, dest_desc, buf, buf_size,
-                       src_open_sb.st_size,
-                       S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
-                       src_name, dst_name, &normal_copy_required))
-        goto preserve_metadata;
-
-      if (! normal_copy_required)
+      if (sparse_src)
         {
-          return_val = false;
-          goto close_src_and_dst_desc;
+          bool normal_copy_required;
+
+          /* Perform an efficient extent-based copy, falling back to the
+             standard copy only if the initial extent scan fails.  If the
+             '--sparse=never' option is specified, write all data but use
+             any extents to read more efficiently.  */
+          if (extent_copy (source_desc, dest_desc, buf, buf_size,
+                           src_open_sb.st_size,
+                           S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
+                           src_name, dst_name, &normal_copy_required))
+            goto preserve_metadata;
+
+          if (! normal_copy_required)
+            {
+              return_val = false;
+              goto close_src_and_dst_desc;
+            }
         }

       off_t n_read;
--
1.7.5.rc3.291.g63e4e


From 223e3832eb5a9b1aadf0a69d076f40116389565c Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Thu, 21 Apr 2011 18:08:20 +0200
Subject: [PATCH 5/6] tests: sparse-fiemap: report more detail upon failure;
 ignore an FP

* tests/cp/sparse-fiemap: Fail right away with details, when cmp fails.
When extent maps are found to differ, display them and merely warn.
---
 tests/cp/sparse-fiemap |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 2c6a250..2e8c95b 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -75,7 +75,7 @@ for i in $(seq 1 2 21); do
     # for the same reasons.
     cp --sparse=always j1 j2 || fail=1

-    cmp j1 j2 || fail=1
+    cmp j1 j2 || fail_ "data loss i=$i j=$j"
     if ! filefrag -vs j1 | grep -F extent >/dev/null; then
       test $skip != 1 && warn_ 'skipping part; you lack filefrag'
       skip=1
@@ -98,8 +98,12 @@ for i in $(seq 1 2 21); do
       # exclude the physical block numbers; they always differ
       filefrag -v j1 > ff1 || framework_failure
       filefrag -vs j2 > ff2 || framework_failure
-      { f ff1; f ff2; } | $PERL $abs_top_srcdir/tests/filefrag-extent-compare ||
-        fail=1
+      { f ff1; f ff2; } | $PERL $abs_top_srcdir/tests/filefrag-extent-compare \
+        || {
+             warn_ ignoring filefrag-reported extent map differences
+             # Show the differing extent maps.
+             head -99 ff1 ff2
+           }
     fi
     test $fail = 1 && break 2
   done
--
1.7.5.rc3.291.g63e4e


From 8c0b1de42c615a82ce7e32901ad1e4dca95b3657 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Thu, 21 Apr 2011 21:01:13 +0200
Subject: [PATCH 6/6] tests: sparse-fiemap: with root/ext3, do not create an
 ext4 FS

* tests/cp/sparse-fiemap: When this test was run as root on an ext3
file system, (ext3 had known problems), it would trickily create and
mount a loopback ext4 file system and use that instead.  However, due
to a bug in 2.6.39-rc1..rc3, this loopback test (when run in another
loopback FS) exposed a bug with 1k-blocksize ext4 whereby non-NUL
data would be read from a hole.  For details, see this:
http://thread.gmane.org/gmane.comp.file-systems.ext4/24495
---
 tests/cp/sparse-fiemap |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 2e8c95b..1394060 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -26,6 +26,10 @@ touch fiemap_chk
 if fiemap_capable_ fiemap_chk && ! df -t ext3 . >/dev/null; then
   : # Current partition has working extents.  Good!
 else
+  # FIXME: temporarily(?) skip this variant, at least until after this bug
+  # is fixed: http://thread.gmane.org/gmane.comp.file-systems.ext4/24495
+  skip_test_ "current file system has insufficient FIEMAP support"
+
   # It's not;  we need to create one, hence we need root access.
   require_root_

--
1.7.5.rc3.291.g63e4e

^ permalink raw reply related	[flat|nested] 117+ messages in thread

* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
@ 2011-04-21 20:01         ` Jim Meyering
  0 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-21 20:01 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: linux-ext4, coreutils, xfs

Jim Meyering wrote:
> Markus Trippelsdorf wrote:
>> I trashed my system this morning when I installed coreutils-8.11.
>>
>> What happened is that coreutils compiles and links correctly, but then
>> the following command (during the installation phase):
>>
>> ./ginstall chroot hostid nice who users pinky stty df stdbuf [ base64
> ...
>>
>> apparently produces files which have the length of the originals but are
>> full of zeros. (and these were then installed to my live system, thereby
>> trashing it).
> ...
>
> Thanks again for the report.
> I believe that the following series addresses this problem
> and have confirmed that tests pass with 2.6.39-rc3 on all
> of ext3, ext4, btrfs and xfs -- though there was what appears
> to be a spurious failure in tests/cp/sparse-fiemap when run on xfs.
> On one iteration, with j=31, of these loops
>
>   for i in $(seq 1 2 21); do
>     for j in 1 2 31 100; do
>
> [in http://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/cp/sparse-fiemap]
> the two files compared equal, yet their extents did not match,
> even after merging.  I'm inclined to skip the extent-comparing check
> at least for XFS, now.
>
> Here's the unusually-technical-for-NEWS summary:

[slightly updated and pushed, along with test-adjusting changes]

** Changes in behavior

  cp's extent-based (FIEMAP) copying code is more reliable in the face
  of varying and undocumented file system semantics:
  - it no longer treats unwritten extents specially
  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
      Before, it would incur the performance penalty of that sync only
      for 2.6.38 and older kernels.  We thought all problems would be
      resolved for 2.6.39.
  - it now attempts a FIEMAP copy only on a file that appears sparse.
      Sparse files are relatively unusual, and the copying code incurs
      the performance penalty of the now-mandatory sync only for them.


From 9bcd045f812a75cf96ba392bc45529422f87c088 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 09:49:15 +0200
Subject: [PATCH 1/6] copy: always use FIEMAP_FLAG_SYNC, for now

* src/extent-scan.c (extent_need_sync): Always return true,
to make the sole caller always use FIEMAP_FLAG_SYNC.
This will doubtless have an undesirable performance impact,
but we'll mitigate that shortly, by using extent_copy only on
files with holes.
---
 src/extent-scan.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/extent-scan.c b/src/extent-scan.c
index da7eb9d..596e7f7 100644
--- a/src/extent-scan.c
+++ b/src/extent-scan.c
@@ -36,6 +36,13 @@
 static bool
 extent_need_sync (void)
 {
+  /* For now always return true, to be on the safe side.
+     If/when FIEMAP semantics are well defined (before SEEK_HOLE support
+     is usable) and kernels implementing them are in use, we may relax
+     this once again.  */
+  return true;
+
+#if FIEMAP_BEHAVIOR_IS_DEFINED_AND_USABLE
   static int need_sync = -1;

   if (need_sync == -1)
@@ -57,6 +64,7 @@ extent_need_sync (void)
     }

   return need_sync;
+#endif
 }

 /* Allocate space for struct extent_scan, initialize the entries if
--
1.7.5.rc3.291.g63e4e


From bef4fa1e1a20c636979db159647a93e5954bc542 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:15:15 +0200
Subject: [PATCH 2/6] copy: do not treat unwritten extents specially: avoid
 XFS/ext4 data loss

* src/copy.c (extent_copy): Do not treat "unwritten extents" specially.
Otherwise, with a release-candidate 2.6.39-rc3 kernel, XFS or ext4,
when using gold as your linker, and if you forget to run "make check",
you could end up installing files full of zeros instead of the expected
binaries.  For a lot of discussion, see
http://thread.gmane.org/gmane.comp.file-systems.xfs.general/37895
* tests/cp/fiemap-empty: Disable this test.
---
 src/copy.c            |    5 ++++-
 tests/cp/fiemap-empty |    5 +++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index 9b53127..f6f9ea6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -398,7 +398,10 @@ extent_copy (int src_fd, int dest_fd, char *buf, size_t buf_size,
           /* Treat an unwritten but allocated extent much like a hole.
              I.E. don't read, but don't convert to a hole in the destination,
              unless SPARSE_ALWAYS.  */
-          if (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN)
+          /* For now, do not treat FIEMAP_EXTENT_UNWRITTEN specially,
+             because that (in combination with no sync) would lead to data
+             loss at least on XFS and ext4 when using 2.6.39-rc3 kernels.  */
+          if (0 && (scan.ext_info[i].ext_flags & FIEMAP_EXTENT_UNWRITTEN))
             {
               empty_extent = true;
               last_ext_len = 0;
diff --git a/tests/cp/fiemap-empty b/tests/cp/fiemap-empty
index 64c3254..836668e 100755
--- a/tests/cp/fiemap-empty
+++ b/tests/cp/fiemap-empty
@@ -19,6 +19,11 @@
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
 print_ver_ cp

+# FIXME: enable any part of this test that is still relevant,
+# or, if none are relevant (now that cp does not handle unwritten
+# extents), just remove the test altogether.
+skip_test_ 'disabled for now'
+
 touch fiemap_chk
 fiemap_capable_ fiemap_chk ||
   skip_test_ 'this file system lacks FIEMAP support'
--
1.7.5.rc3.291.g63e4e


From 846d826096fc8fb621d751f8b3db1da68a8bbd06 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 10:23:32 +0200
Subject: [PATCH 3/6] copy: factor out a tiny sparse-testing function

* src/copy.c (HAVE_STRUCT_STAT_ST_BLOCKS): Define to 0 if undefined,
so we can use it in the return expression, here:
(is_probably_sparse): New function, factored out of...
(copy_reg): ...here.  Use the new function.
---
 src/copy.c |   23 +++++++++++++++++++----
 1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/src/copy.c b/src/copy.c
index f6f9ea6..3db07b5 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -764,6 +764,23 @@ fchmod_or_lchmod (int desc, char const *name, mode_t mode)
   return lchmod (name, mode);
 }

+#ifndef HAVE_STRUCT_STAT_ST_BLOCKS
+# define HAVE_STRUCT_STAT_ST_BLOCKS 0
+#endif
+
+/* Use a heuristic to determine whether stat buffer SB comes from a file
+   with sparse blocks.  If the file has fewer blocks than would normally
+   be needed for a file of its size, then at least one of the blocks in
+   the file is a hole.  In that case, return true.  */
+static bool
+is_probably_sparse (struct stat const *sb)
+{
+  return (HAVE_STRUCT_STAT_ST_BLOCKS
+          && S_ISREG (sb->st_mode)
+          && ST_NBLOCKS (*sb) < sb->st_size / ST_NBLOCKSIZE);
+}
+
+
 /* Copy a regular file from SRC_NAME to DST_NAME.
    If the source file contains holes, copies holes and blocks of zeros
    in the source file as holes in the destination file.
@@ -984,15 +1001,13 @@ copy_reg (char const *src_name, char const *dst_name,
           if (x->sparse_mode == SPARSE_ALWAYS)
             make_holes = true;

-#if HAVE_STRUCT_STAT_ST_BLOCKS
           /* Use a heuristic to determine whether SRC_NAME contains any sparse
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO && S_ISREG (src_open_sb.st_mode)
-              && ST_NBLOCKS (src_open_sb) < src_open_sb.st_size / ST_NBLOCKSIZE)
+          if (x->sparse_mode == SPARSE_AUTO
+              && is_probably_sparse (&src_open_sb))
             make_holes = true;
-#endif
         }

       /* If not making a sparse file, try to use a more-efficient
--
1.7.5.rc3.291.g63e4e


From 18a474d755aa10d881243d9457d2c420c5e4ea77 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Wed, 20 Apr 2011 11:21:09 +0200
Subject: [PATCH 4/6] copy: use FIEMAP (extent_copy) only for
 apparently-sparse files,

to avoid the expense of extent_copy's unconditional use of
FIEMAP_FLAG_SYNC.
* src/copy.c (copy_reg): Do not attempt extent_copy on a file
that appears to have no holes.
* NEWS (Changes in behavior): Document this.  At first I labeled this
as a bug fix, but that would be inaccurate, considering there is no
documentation of FIEMAP semantics, nor even consensus among kernel
FS developers.  Here's hoping SEEK_HOLE/SEEK_DATA support will soon
make it into the linux kernel.
---
 NEWS       |   13 +++++++++++++
 src/copy.c |   37 +++++++++++++++++++++----------------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/NEWS b/NEWS
index 4873b5a..7bc2ef3 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,19 @@ GNU coreutils NEWS                                    -*- outline -*-

 * Noteworthy changes in release ?.? (????-??-??) [?]

+** Changes in behavior
+
+  cp's extent-based (FIEMAP) copying code is more reliable in the face
+  of varying and undocumented file system semantics:
+  - it no longer treats unwritten extents specially
+  - a FIEMAP-based extent copy always uses the FIEMAP_FLAG_SYNC flag.
+      Before, it would incur the performance penalty of that sync only
+      for 2.6.38 and older kernels.  We thought all problems would be
+      resolved for 2.6.39.
+  - it now attempts a FIEMAP copy only on a file that appears sparse.
+      Sparse files are relatively unusual, and the copying code incurs
+      the performance penalty of the now-mandatory sync only for them.
+

 * Noteworthy changes in release 8.11 (2011-04-13) [stable]

diff --git a/src/copy.c b/src/copy.c
index 3db07b5..6edf52e 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -993,6 +993,7 @@ copy_reg (char const *src_name, char const *dst_name,

       /* Deal with sparse files.  */
       bool make_holes = false;
+      bool sparse_src = false;

       if (S_ISREG (sb.st_mode))
         {
@@ -1005,8 +1006,8 @@ copy_reg (char const *src_name, char const *dst_name,
              blocks.  If the file has fewer blocks than would normally be
              needed for a file of its size, then at least one of the blocks in
              the file is a hole.  */
-          if (x->sparse_mode == SPARSE_AUTO
-              && is_probably_sparse (&src_open_sb))
+          sparse_src = is_probably_sparse (&src_open_sb);
+          if (x->sparse_mode == SPARSE_AUTO && sparse_src)
             make_holes = true;
         }

@@ -1038,21 +1039,25 @@ copy_reg (char const *src_name, char const *dst_name,
       buf_alloc = xmalloc (buf_size + buf_alignment_slop);
       buf = ptr_align (buf_alloc, buf_alignment);

-      bool normal_copy_required;
-      /* Perform an efficient extent-based copy, falling back to the
-         standard copy only if the initial extent scan fails.  If the
-         '--sparse=never' option is specified, write all data but use
-         any extents to read more efficiently.  */
-      if (extent_copy (source_desc, dest_desc, buf, buf_size,
-                       src_open_sb.st_size,
-                       S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
-                       src_name, dst_name, &normal_copy_required))
-        goto preserve_metadata;
-
-      if (! normal_copy_required)
+      if (sparse_src)
         {
-          return_val = false;
-          goto close_src_and_dst_desc;
+          bool normal_copy_required;
+
+          /* Perform an efficient extent-based copy, falling back to the
+             standard copy only if the initial extent scan fails.  If the
+             '--sparse=never' option is specified, write all data but use
+             any extents to read more efficiently.  */
+          if (extent_copy (source_desc, dest_desc, buf, buf_size,
+                           src_open_sb.st_size,
+                           S_ISREG (sb.st_mode) ? x->sparse_mode : SPARSE_NEVER,
+                           src_name, dst_name, &normal_copy_required))
+            goto preserve_metadata;
+
+          if (! normal_copy_required)
+            {
+              return_val = false;
+              goto close_src_and_dst_desc;
+            }
         }

       off_t n_read;
--
1.7.5.rc3.291.g63e4e


From 223e3832eb5a9b1aadf0a69d076f40116389565c Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Thu, 21 Apr 2011 18:08:20 +0200
Subject: [PATCH 5/6] tests: sparse-fiemap: report more detail upon failure;
 ignore an FP

* tests/cp/sparse-fiemap: Fail right away with details, when cmp fails.
When extent maps are found to differ, display them and merely warn.
---
 tests/cp/sparse-fiemap |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 2c6a250..2e8c95b 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -75,7 +75,7 @@ for i in $(seq 1 2 21); do
     # for the same reasons.
     cp --sparse=always j1 j2 || fail=1

-    cmp j1 j2 || fail=1
+    cmp j1 j2 || fail_ "data loss i=$i j=$j"
     if ! filefrag -vs j1 | grep -F extent >/dev/null; then
       test $skip != 1 && warn_ 'skipping part; you lack filefrag'
       skip=1
@@ -98,8 +98,12 @@ for i in $(seq 1 2 21); do
       # exclude the physical block numbers; they always differ
       filefrag -v j1 > ff1 || framework_failure
       filefrag -vs j2 > ff2 || framework_failure
-      { f ff1; f ff2; } | $PERL $abs_top_srcdir/tests/filefrag-extent-compare ||
-        fail=1
+      { f ff1; f ff2; } | $PERL $abs_top_srcdir/tests/filefrag-extent-compare \
+        || {
+             warn_ ignoring filefrag-reported extent map differences
+             # Show the differing extent maps.
+             head -99 ff1 ff2
+           }
     fi
     test $fail = 1 && break 2
   done
--
1.7.5.rc3.291.g63e4e


From 8c0b1de42c615a82ce7e32901ad1e4dca95b3657 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Thu, 21 Apr 2011 21:01:13 +0200
Subject: [PATCH 6/6] tests: sparse-fiemap: with root/ext3, do not create an
 ext4 FS

* tests/cp/sparse-fiemap: When this test was run as root on an ext3
file system, (ext3 had known problems), it would trickily create and
mount a loopback ext4 file system and use that instead.  However, due
to a bug in 2.6.39-rc1..rc3, this loopback test (when run in another
loopback FS) exposed a bug with 1k-blocksize ext4 whereby non-NUL
data would be read from a hole.  For details, see this:
http://thread.gmane.org/gmane.comp.file-systems.ext4/24495
---
 tests/cp/sparse-fiemap |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap
index 2e8c95b..1394060 100755
--- a/tests/cp/sparse-fiemap
+++ b/tests/cp/sparse-fiemap
@@ -26,6 +26,10 @@ touch fiemap_chk
 if fiemap_capable_ fiemap_chk && ! df -t ext3 . >/dev/null; then
   : # Current partition has working extents.  Good!
 else
+  # FIXME: temporarily(?) skip this variant, at least until after this bug
+  # is fixed: http://thread.gmane.org/gmane.comp.file-systems.ext4/24495
+  skip_test_ "current file system has insufficient FIEMAP support"
+
   # It's not;  we need to create one, hence we need root access.
   require_root_

--
1.7.5.rc3.291.g63e4e
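
For anyone who wants to try the is_probably_sparse heuristic from the
patches above outside the coreutils tree, here is a minimal standalone
sketch.  It assumes st_blocks counts 512-byte units (the usual case on
Linux; coreutils hides this behind the ST_NBLOCKS and ST_NBLOCKSIZE
macros).  The names looks_sparse, BLOCK_UNIT and LOOKS_SPARSE_MAIN are
made up for this illustration and do not appear in copy.c.

```c
#define _XOPEN_SOURCE 700   /* for S_IFREG and friends under strict -std= */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Assumption: st_blocks is counted in 512-byte units, as on Linux.  */
enum { BLOCK_UNIT = 512 };

/* Return true if SB describes a regular file that occupies fewer
   blocks than its size would require, i.e. it probably has a hole.  */
bool
looks_sparse (struct stat const *sb)
{
  assert (sb != NULL);
  return (S_ISREG (sb->st_mode)
          && sb->st_blocks < sb->st_size / BLOCK_UNIT);
}

#ifdef LOOKS_SPARSE_MAIN
/* Optional driver: report the heuristic's verdict for each argument.  */
int
main (int argc, char **argv)
{
  for (int i = 1; i < argc; i++)
    {
      struct stat sb;
      if (stat (argv[i], &sb) != 0)
        {
          perror (argv[i]);
          continue;
        }
      printf ("%s: %s\n", argv[i],
              looks_sparse (&sb) ? "probably sparse" : "not sparse");
    }
  return 0;
}
#endif
```

Building with -DLOOKS_SPARSE_MAIN gives a small command-line checker.
Note that the test is only a hint: a file whose size is not much larger
than its allocation can contain a hole and still be reported as not
sparse, which is why the function name says "probably".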

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-19  3:05                                         ` Eric Sandeen
@ 2011-04-21 20:12                                             ` Jim Meyering
  -1 siblings, 0 replies; 117+ messages in thread
From: Jim Meyering @ 2011-04-21 20:12 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Andreas Dilger, Ted Ts'o, Dave Chinner, Yongqiang Yang,
	xfs-oss, coreutils, linux-ext4, Markus Trippelsdorf

Eric Sandeen wrote:
> On 4/18/11 9:59 PM, Ted Ts'o wrote:
> ...
>> Maybe coreutils will need to use FIEMAP_FLAG_SYNC initially, since
>> it's the only way to guarantee correct behaviour for XFS.  But I would
>> really rather that be the long-term way we leave things!
>
> XFS ... or ext4:
>
> # xfs_io -Ff -c "falloc 0 1m" -c "pwrite 0 512k" testfile;
> /root/fiemap-test testfile
> wrote 524288/524288 bytes at offset 0
> 512 KiB, 128 ops; 0.0000 sec (161.342 MiB/sec and 41303.6463 ops/sec)
> start 0 length -1 flags 0x0 count 32
> ext: 0 logical: [ 0..  255] phys: 34048..  34303 flags: 0x801 tot: 256
>
> # uname -r
> 2.6.39-0.rc3.git2.0.fc16.x86_64
>
> Above is on ext4.  It behaves exactly like XFS in my testing; data in
> the page cache does not cause fiemap to return anything other than
> "unwritten" for preallocated extents.

Thanks for the feedback.
In case anyone wants to test or review,
I've just made a coreutils snapshot:

  http://thread.gmane.org/gmane.comp.gnu.coreutils.general/1108
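
As a reading aid for the quoted fiemap-test output: the flags value
0x801 is the interesting part.  The sketch below decodes such a flags
word into names.  The bit values are copied from the FIEMAP_EXTENT_*
constants in <linux/fiemap.h> (only a subset is listed, and they are
repeated locally so the sketch builds without kernel headers).  The
function name fiemap_flags_str is invented for this example and is not
part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subset of the FIEMAP_EXTENT_* bits from <linux/fiemap.h>.  */
static const struct { uint32_t bit; const char *name; } extent_flags[] = {
  { 0x0001, "LAST" },       /* last extent in the file */
  { 0x0002, "UNKNOWN" },    /* data location unknown */
  { 0x0004, "DELALLOC" },   /* delayed allocation, not yet on disk */
  { 0x0800, "UNWRITTEN" },  /* space allocated, but no data written */
  { 0x1000, "MERGED" },     /* extent merged for reporting */
};

/* Write a '|'-separated list of the known flag names set in FLAGS
   into BUF (of size LEN) and return BUF.  Bits not in the table
   above are ignored; names that do not fit are dropped.  */
char *
fiemap_flags_str (uint32_t flags, char *buf, size_t len)
{
  size_t used = 0;

  assert (buf != NULL && len > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof extent_flags / sizeof extent_flags[0]; i++)
    {
      if (!(flags & extent_flags[i].bit))
        continue;
      int n = snprintf (buf + used, len - used, "%s%s",
                        used ? "|" : "", extent_flags[i].name);
      if (n < 0 || (size_t) n >= len - used)
        {
          buf[used] = '\0';   /* out of room: discard the partial name */
          break;
        }
      used += (size_t) n;
    }
  return buf;
}
```

Calling fiemap_flags_str (0x801, buf, sizeof buf) yields
"LAST|UNWRITTEN": the single extent reported for the preallocated
range is marked unwritten even though data has already been written
into the page cache.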


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-04-14 16:10                       ` Yongqiang Yang
@ 2011-05-05 11:29                           ` Pádraig Brady
  -1 siblings, 0 replies; 117+ messages in thread
From: Pádraig Brady @ 2011-05-05 11:29 UTC (permalink / raw)
  To: Yongqiang Yang
  Cc: linux-ext4, Eric Sandeen, coreutils, Markus Trippelsdorf, xfs-oss

On 14/04/11 17:10, Yongqiang Yang wrote:
> Hi,
> 
> I am off my working computer.  Maybe below fix could fix the problem.
> 
> fs/ext4/extent.c
> static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
> 1877                 } else if (block >= le32_to_cpu(ex->ee_block)) {
> 1878                         /*
> 1879                          * some part of requested space is covered
> 1880                          * by found extent
> 1881                          */
> 1882                         start = block;
> 1883                         end = le32_to_cpu(ex->ee_block)
> 1884                                 + ext4_ext_get_actual_len(ex);
> 1885                         if (block + num < end)
> 1886                                 end = block + num;
>        +                        if (!ext4_ext_is_uninitialized(ex))
> 1887                         exists = 1;
> 1888                 } else {
> 1889                         BUG();
> 1890                 }

Hi,

To follow up on the above: I'm under the impression
that ext4 is expected to return extents for what
is written, irrespective of whether it has reached the
disk or not, i.e. the preallocation case where this fails
was an oversight, which the above might fix.

So is the above summary correct, and has there
been any more thoughts on a fix?

cheers,
Pádraig.


* Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
  2011-05-05 11:29                           ` Pádraig Brady
@ 2011-05-05 11:47                             ` Yongqiang Yang
  -1 siblings, 0 replies; 117+ messages in thread
From: Yongqiang Yang @ 2011-05-05 11:47 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Eric Sandeen, xfs-oss, linux-ext4, coreutils, Markus Trippelsdorf

2011/5/5 Pádraig Brady <P@draigbrady.com>:
> On 14/04/11 17:10, Yongqiang Yang wrote:
>> Hi,
>>
>> I am off my working computer.  Maybe below fix could fix the problem.
>>
>> fs/ext4/extent.c
>> static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
>> 1877                 } else if (block >= le32_to_cpu(ex->ee_block)) {
>> 1878                         /*
>> 1879                          * some part of requested space is covered
>> 1880                          * by found extent
>> 1881                          */
>> 1882                         start = block;
>> 1883                         end = le32_to_cpu(ex->ee_block)
>> 1884                                 + ext4_ext_get_actual_len(ex);
>> 1885                         if (block + num < end)
>> 1886                                 end = block + num;
>>        +                        if (!ext4_ext_is_uninitialized(ex))
>> 1887                         exists = 1;
>> 1888                 } else {
>> 1889                         BUG();
>> 1890                 }
>
> Hi,
>
> To follow up on the above.  I'm under the impression
> that ext4 is expected to return extents for what
> is written, irrespective of whether it's reached the
> disk or not. I.E. the preallocation case where this fails
No.  It just returns extent info now - allocated extents and delayed
extents.  In the preallocation case, it returns unwritten extents.
And the code above does not work.

> was an oversite, for which the above might fix.
>
> So is the above summary correct, and has there
> been any more thoughts on a fix?
>
> cheers,
> Pádraig.
>



-- 
Best Wishes
Yongqiang Yang
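
A rough sketch of what this means for a copying tool, in the spirit of
the NEWS entry earlier in the thread ("it no longer treats unwritten
extents specially"): if unwritten extents can still have data sitting
in the page cache, the safe plan is to read every reported extent and
treat only the gaps between extents as holes.  The struct ext type and
the bytes_to_read function below are hypothetical and simplified; they
are not the types or functions used by coreutils.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent record.  */
struct ext
{
  uint64_t logical;   /* byte offset of the extent within the file */
  uint64_t length;    /* extent length in bytes */
  uint32_t flags;     /* FIEMAP_EXTENT_* bits, e.g. 0x800 = unwritten */
};

/* Given N extents sorted by offset, return how many bytes of a
   FILE_SIZE-byte file a copier must read, and store the number of
   hole bytes in *HOLE_BYTES.  The flags field is deliberately not
   consulted: an unwritten extent is read like any other, and only
   the ranges not covered by any extent count as holes.  */
uint64_t
bytes_to_read (struct ext const *e, size_t n, uint64_t file_size,
               uint64_t *hole_bytes)
{
  uint64_t pos = 0, data = 0, holes = 0;

  assert (e != NULL || n == 0);
  assert (hole_bytes != NULL);
  for (size_t i = 0; i < n && pos < file_size; i++)
    {
      uint64_t start = e[i].logical;
      uint64_t end = e[i].logical + e[i].length;

      /* Clamp the extent to the not-yet-covered part of the file.  */
      if (start < pos)
        start = pos;
      if (start > file_size)
        start = file_size;
      if (end < start)
        end = start;
      if (end > file_size)
        end = file_size;

      holes += start - pos;   /* gap before this extent */
      data += end - start;    /* the extent itself, unwritten or not */
      pos = end;
    }
  holes += file_size - pos;   /* trailing gap, if any */
  *hole_bytes = holes;
  return data;
}
```

With a single 1 MiB extent flagged 0x801, as in Eric's falloc/pwrite
example, this plan reads the whole 1 MiB and finds no holes, so the
512 KiB of freshly written data is not lost.  The cost is that truly
unwritten preallocated space is read (as zeros) too.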
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


end of thread, other threads:[~2011-05-05 11:47 UTC | newest]

Thread overview: 117+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-14 10:26 Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?) Markus Trippelsdorf
2011-04-14 12:06 ` Markus Trippelsdorf
2011-04-14 14:02   ` Markus Trippelsdorf
     [not found]     ` <20110414140222.GB1679-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
2011-04-14 14:59       ` Pádraig Brady
2011-04-14 14:59         ` Pádraig Brady
     [not found]         ` <4DA70BD3.1070409-V8g9lnOeT5ydJdNcDFJN0w@public.gmane.org>
2011-04-14 15:50           ` Eric Sandeen
2011-04-14 15:50             ` Eric Sandeen
     [not found]             ` <4DA717B2.3020305-+82itfer+wXR7s880joybQ@public.gmane.org>
2011-04-14 15:52               ` Pádraig Brady
2011-04-14 15:52                 ` Pádraig Brady
2011-04-14 15:56                 ` Eric Sandeen
2011-04-14 15:56                   ` Eric Sandeen
2011-04-14 16:03                   ` Markus Trippelsdorf
2011-04-14 16:03                     ` Markus Trippelsdorf
2011-04-14 16:14                     ` Eric Sandeen
2011-04-14 16:14                       ` Eric Sandeen
     [not found]                     ` <20110414160343.GA12787-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
2011-04-14 16:21                       ` Yongqiang Yang
2011-04-14 16:21                         ` Yongqiang Yang
     [not found]                         ` <BANLkTimRxvBMp9M7zwiUY_UmmFOY5N58+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-14 16:28                           ` Markus Trippelsdorf
2011-04-14 16:28                             ` Markus Trippelsdorf
2011-04-14 16:31                             ` Eric Sandeen
2011-04-14 16:31                               ` Eric Sandeen
2011-04-14 16:48                               ` Markus Trippelsdorf
2011-04-14 16:48                                 ` Markus Trippelsdorf
2011-04-14 16:49                                 ` Eric Sandeen
2011-04-14 16:49                                   ` Eric Sandeen
2011-04-14 16:04                   ` Yongqiang Yang
2011-04-14 16:04                     ` Yongqiang Yang
2011-04-14 16:10                     ` Yongqiang Yang
2011-04-14 16:10                       ` Yongqiang Yang
     [not found]                       ` <BANLkTimoLeWMJgNFGW+zdeUeJyZ-_+8fMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-05-05 11:29                         ` Pádraig Brady
2011-05-05 11:29                           ` Pádraig Brady
2011-05-05 11:47                           ` Yongqiang Yang
2011-05-05 11:47                             ` Yongqiang Yang
     [not found]                 ` <4DA7182B.8050409-V8g9lnOeT5ydJdNcDFJN0w@public.gmane.org>
2011-04-14 17:27                   ` Jim Meyering
2011-04-14 17:27                     ` Jim Meyering
2011-04-14 19:13                     ` Pádraig Brady
2011-04-14 19:13                       ` Pádraig Brady
     [not found]                     ` <878vvcspz0.fsf-CybKA8TIZ99x3y/oJEDuiw@public.gmane.org>
2011-04-14 19:39                       ` Jim Meyering
2011-04-14 19:39                         ` Jim Meyering
2011-04-14 22:59             ` Dave Chinner
2011-04-14 23:29               ` Pádraig Brady
2011-04-14 23:29                 ` Pádraig Brady
2011-04-15  0:09                 ` Dave Chinner
2011-04-15  0:09                   ` Dave Chinner
2011-04-15  5:01                   ` Andreas Dilger
2011-04-15  5:01                     ` Andreas Dilger
2011-04-16  0:50                     ` Dave Chinner
2011-04-16  0:50                       ` Dave Chinner
2011-04-16  5:11                       ` Andreas Dilger
2011-04-16  5:11                         ` Andreas Dilger
2011-04-16 12:21                         ` Theodore Tso
2011-04-16 12:21                           ` Theodore Tso
2011-04-18  0:40                           ` Dave Chinner
2011-04-18  0:40                             ` Dave Chinner
2011-04-18  2:45                             ` Andreas Dilger
2011-04-18  2:45                               ` Andreas Dilger
2011-04-19  1:58                               ` Yongqiang Yang
2011-04-19  1:58                                 ` Yongqiang Yang
     [not found]                                 ` <BANLkTin=WEpSf6ddiOMNMOpCPP-wiEttSw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2011-04-19  2:59                                   ` Ted Ts'o
2011-04-19  2:59                                     ` Ted Ts'o
     [not found]                                     ` <20110419025949.GA3030-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2011-04-19  3:05                                       ` Eric Sandeen
2011-04-19  3:05                                         ` Eric Sandeen
     [not found]                                         ` <4DACFBEB.9040909-+82itfer+wXR7s880joybQ@public.gmane.org>
2011-04-21 20:12                                           ` Jim Meyering
2011-04-21 20:12                                             ` Jim Meyering
2011-04-19  3:30                                     ` Yongqiang Yang
2011-04-19  3:30                                       ` Yongqiang Yang
2011-04-19  4:14                                     ` Dave Chinner
2011-04-19  4:14                                       ` Dave Chinner
2011-04-19  5:27                                     ` Christoph Hellwig
2011-04-19  5:27                                       ` Christoph Hellwig
2011-04-19  3:44                                 ` Dave Chinner
2011-04-19  3:44                                   ` Dave Chinner
2011-04-19  6:53                                   ` Yongqiang Yang
2011-04-19  6:53                                     ` Yongqiang Yang
2011-04-19  7:45                                     ` Dave Chinner
2011-04-19  7:45                                       ` Dave Chinner
2011-04-19  8:11                                       ` Yongqiang Yang
2011-04-19  8:11                                         ` Yongqiang Yang
2011-04-19 14:05                                         ` Eric Sandeen
2011-04-19 14:05                                           ` Eric Sandeen
2011-04-19 14:09                                       ` Ted Ts'o
2011-04-19 14:09                                         ` Ted Ts'o
2011-04-19 14:13                                         ` Eric Sandeen
2011-04-19 14:13                                           ` Eric Sandeen
2011-04-19 16:01                                           ` Ted Ts'o
2011-04-19 16:01                                             ` Ted Ts'o
2011-04-20  1:53                                             ` Yongqiang Yang
2011-04-20  1:53                                               ` Yongqiang Yang
2011-04-20 15:21                                             ` Christoph Hellwig
2011-04-20 15:21                                               ` Christoph Hellwig
2011-04-20 17:21                                               ` Ted Ts'o
2011-04-20 17:21                                                 ` Ted Ts'o
     [not found]                                         ` <20110419140909.GD3030-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2011-04-19 21:08                                           ` Dave Chinner
2011-04-19 21:08                                             ` Dave Chinner
2011-04-20 15:29                                             ` Christoph Hellwig
2011-04-20 15:29                                               ` Christoph Hellwig
2011-04-16  6:05                       ` Yongqiang Yang
2011-04-16  6:05                         ` Yongqiang Yang
2011-04-18  0:35                         ` Dave Chinner
2011-04-18  0:35                           ` Dave Chinner
2011-04-15  8:53                   ` Jim Meyering
2011-04-15  8:53                     ` Jim Meyering
2011-04-15 17:16                     ` Christoph Hellwig
2011-04-15 17:16                       ` Christoph Hellwig
     [not found]                       ` <20110415171629.GA9088-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2011-04-15 17:24                         ` Eric Blake
2011-04-15 17:24                           ` Eric Blake
2011-04-15 17:26                           ` Christoph Hellwig
2011-04-15 17:26                             ` Christoph Hellwig
     [not found]                             ` <20110415172603.GA20086-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2011-04-15 22:28                               ` Andreas Dilger
2011-04-15 22:28                                 ` Andreas Dilger
2011-04-16  0:25                                 ` Dave Chinner
2011-04-16  0:25                                   ` Dave Chinner
2011-04-14 14:39 ` Eric Sandeen
     [not found] ` <20110414102608.GA1678-tLCgZGx+iJ+kxVt8IV0GqQ@public.gmane.org>
2011-04-20 14:39   ` Jim Meyering
2011-04-20 14:39     ` Jim Meyering
     [not found]     ` <87d3khugv1.fsf-CybKA8TIZ99x3y/oJEDuiw@public.gmane.org>
2011-04-21 20:01       ` Jim Meyering
2011-04-21 20:01         ` Jim Meyering
