All of lore.kernel.org
 help / color / mirror / Atom feed
* Creating recursive snapshots for all filesystems
@ 2013-05-02 20:41 Alexander Skwar
  2013-05-03  8:48 ` Sander
  2013-05-05 11:05 ` Kai Krakow
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Skwar @ 2013-05-02 20:41 UTC (permalink / raw)
  To: linux-btrfs

Hello once more...

I'm on Ubuntu 13.04 with Ubuntu kernel 3.8.0-19-lowlatency and
Btrfs v0.20-rc1.

(Did I say that before...? *G*)

Okay, ATM I'm writing a script for creating snapshots for "backups" of
all my btrfs filesystems. I come from a FreeBSD / Solaris background
with heavy use of ZFS.

In ZFS, it's very easy to create snapshots of all "zpools"
("filesystems") of the system, like so:

SnapName=backup.`date +%Y%m%d--%H%M%S`
zpool list -Ho name | while read zpool; do
  zfs snapshot -r $zpool@$SnapName &
done ; wait

Well… How to do something like this with btrfs? I do _not_ want to
have to manually list the filesystems/subvolumes I've got in some
script or configuration file.

Where I'm hanging right now, is that I can't seem to figure out a
"bullet proof" way to find all the subvolumes of the filesystems I
might have.

I've got this:

root@ask-home:~# btrfs fi show
failed to read /dev/sr0
Label: 'Data'  uuid: 7d2eb10f-aced-4d41-bb7f-7badbf075b6a
Total devices 1 FS bytes used 27.96GB
devid    1 size 35.00GB used 35.00GB path /dev/dm-0

Label: 'KernBtrfs'  uuid: 8b16bc2a-43e3-40da-89f5-7b333c6682f3
Total devices 1 FS bytes used 655.05MB
devid    1 size 11.85GB used 6.04GB path /dev/sda11

Btrfs v0.20-rc1

root@ask-home:~# mount -t btrfs
/dev/mapper/ssd-Data on /data type btrfs (rw,noatime,ssd,discard)
/dev/mapper/ssd-Data on /home type btrfs (rw,noatime,ssd,discard,subvol=home)
/dev/sda11 on /data/Kernel/KernBtrfs type btrfs (rw,noatime)

Now, how would I find all the subvolumes of the btrfs with the label
"Data" (or uuid 7d...) or /dev/dm-0? "btrfs subvolume list" only seems
to operate on where a btrfs is mounted. It works on "path".

I could do "btrfs subv list -a /data", but how would I figure out,
that "/data" is the "root" of the filesystem with uuid 7d...?

For now, I do something like this (-> http://pastebin.com/u08ub1i8):

SnapName=backup.`date +%Y%m%d--%H%M%S`
btrfs fi show 2>/dev/null | awk '/ path / {print $NF}' | while read path; do
  SafePath=`echo "$path" | tr / .`
  TmpMountDir=`mktemp -d /tmp/.btrfs.mount.$SafePath.XXXXXX`
  mount -t btrfs $path $TmpMountDir
  (btrfs subv list -ar $TmpMountDir; btrfs subv list -a $TmpMountDir)
| sort | uniq -u | while read _id Id _gen Gen _top _level Toplevel
_path Path; do
    btrfs subv snaps -r "$TmpMountDir/$Path" "$TmpMountDir/$Path.$SnapName"
  done
  umount $TmpMountDir
  rmdir $TmpMountDir
done

Now, this works, but seems "somewhat" complicated... But
maybe I'm just spoiled by ZFS ;)

Is there an easier way to achieve what I want? I want to achieve:

Creating recursive snapshots for all filesystems

;)

Thanks,

Alexander
--
=>        Google+ => http://plus.skwar.me         <==
=> Chat (Jabber/Google Talk) => a.skwar@gmail.com <==

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-02 20:41 Creating recursive snapshots for all filesystems Alexander Skwar
@ 2013-05-03  8:48 ` Sander
  2013-05-03 11:46   ` Alexander Skwar
  2013-05-05 11:05 ` Kai Krakow
  1 sibling, 1 reply; 18+ messages in thread
From: Sander @ 2013-05-03  8:48 UTC (permalink / raw)
  To: Alexander Skwar; +Cc: linux-btrfs

Alexander Skwar wrote (ao):
> Where I'm hanging right now, is that I can't seem to figure out a
> "bullet proof" way to find all the subvolumes of the filesystems I
> might have.

> Is there an easier way to achieve what I want? I want to achieve:
> 
> Creating recursive snapshots for all filesystems

Not sure if this helps, but I have subvolid=0, which contains all my
subvolumes, mounted under /.root/

/etc/fstab:
LABEL=panda   /  btrfs  subvol=rootvolume,space_cache,inode_cache,compress=lzo,ssd  0  0
LABEL=panda   /home           btrfs   subvol=home                                   0  0
LABEL=panda   /root           btrfs   subvol=root                                   0  0
LABEL=panda   /var            btrfs   subvol=var                                    0  0
LABEL=panda   /holding        btrfs   subvol=.holding                               0  0
LABEL=panda   /.root          btrfs   subvolid=0                                    0  0
LABEL=panda   /.backupadmin   btrfs   subvol=backupadmin                            0  0 
/Varlib       /var/lib        none    bind                                          0  0

panda:~# ls -l /.root/
total 0
drwxr-xr-x. 1 root root 580800 Jan 30 17:46 backupadmin
drwxr-xr-x. 1 root root     24 Mar 27  2012 home
drwx------. 1 root root    742 Mar 19 15:50 root
drwxr-xr-x. 1 root root    226 May 16  2012 rootvolume
drwxr-xr-x. 1 root root     96 Apr  3  2012 var

In my snapshots script:

  ...
  yyyymmddhhmm=`date +%Y%m%d_%H.%M`
  ...
  for subvolume in `ls /.root/`
  do
    ...
    /sbin/btrfs subvolume snapshot ${filesystem}/${subvolume}/ \
      /.root/.snapshot_${yyyymmddhhmm}_${hostname}_${subvolume}/ || result=2
    ...
  done
  ...

This creates timestamped snapshots for all subvolumes.

	Sander

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-03  8:48 ` Sander
@ 2013-05-03 11:46   ` Alexander Skwar
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Skwar @ 2013-05-03 11:46 UTC (permalink / raw)
  To: linux-btrfs

Hi

Sander <sander <at> humilis.net> writes:

> 
> Alexander Skwar wrote (ao):
> > Where I'm hanging right now, is that I can't seem to figure out a
> > "bullet proof" way to find all the subvolumes of the filesystems I
> > might have.
> 
> > Is there an easier way to achieve what I want? I want to achieve:
> > 
> > Creating recursive snapshots for all filesystems
> 
> Not sure if this helps, but I have subvolid=0, which contains all my
> subvolumes, mounted under /.root/

Hm, not quite what I'm after and not nearly as easy as ZFS...

"Problem" with your approach: The admin has to maintain this. 
I was looking for something, which "maints itself", so to say.
And your approach also wouldn't scale if there are sub-subvolumes.

ZFS really is so much easier (at least regarding that).

Thanks a lot, though. It's a worthwhile idea.

Regards,
Alexander



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-02 20:41 Creating recursive snapshots for all filesystems Alexander Skwar
  2013-05-03  8:48 ` Sander
@ 2013-05-05 11:05 ` Kai Krakow
  2013-05-05 12:59   ` Alexander Skwar
  1 sibling, 1 reply; 18+ messages in thread
From: Kai Krakow @ 2013-05-05 11:05 UTC (permalink / raw)
  To: linux-btrfs

Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb:

> Where I'm hanging right now, is that I can't seem to figure out a
> "bullet proof" way to find all the subvolumes of the filesystems I
> might have.

What about this:

# btrfs sub list -a /
ID 256 gen 1487089 top level 5 path <FS_TREE>/root64
ID 258 gen 1487089 top level 5 path <FS_TREE>/home
ID 259 gen 1486932 top level 5 path <FS_TREE>/usr-src
ID 260 gen 1486885 top level 5 path <FS_TREE>/usr-portage
ID 261 gen 1487089 top level 5 path <FS_TREE>/var-tmp

You still need to iterate through the mounted btrfs filesystems if you are 
using more than one:

# btrfs fi show | fgrep uuid
Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536

Then translate the uuid back to an fspath somehow.

Another option would be to use blkid:

# blkid -t TYPE=btrfs
/dev/sda3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" 
UUID_SUB="7e0a2d93-86cc-4421-9e2c-b5c405075ff3" TYPE="btrfs" 
PARTLABEL="Linux filesystem" PARTUUID="7807f64f-3d0b-4e99-81a4-975128ed2918" 
/dev/sdb3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" 
UUID_SUB="caa3a27b-8546-4519-9a13-8f50cd1caf70" TYPE="btrfs" 
PARTLABEL="Linux filesystem" PARTUUID="1330dcf5-f1d3-462b-af08-e7cad839b3dc" 
/dev/sdc3: LABEL="system" UUID="d2bb232a-2e8f-4951-8bcc-97e237f1b536" 
UUID_SUB="e919b066-db86-4818-975c-bae0548c822a" TYPE="btrfs" 
PARTLABEL="Linux filesystem" PARTUUID="847fd503-8dca-4c6b-b199-20a47d62aa55" 
/dev/sdd1: LABEL="usb-backup" UUID="7038c8fa-4293-49e9-b493-a9c46e5663ca" 
UUID_SUB="f2eac9b0-22e3-44a2-bdc5-c98ee86da71f" TYPE="btrfs" 
PARTLABEL="Linux filesystem" PARTUUID="275ba328-9d5a-494b-bf19-a995596a4b6b" 

Still needs translation back to fspathes. But that could be done with 
grep/head/lsblk trickery...

HTH,
Kai


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-05 11:05 ` Kai Krakow
@ 2013-05-05 12:59   ` Alexander Skwar
  2013-05-05 16:03     ` Kai Krakow
  2013-05-09 20:41     ` nocow 'C' flag ignored after balance Kyle Gates
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Skwar @ 2013-05-05 12:59 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-btrfs

Hello

On Sun, May 5, 2013 at 1:05 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:
> Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb:
>
>> Where I'm hanging right now, is that I can't seem to figure out a
>> "bullet proof" way to find all the subvolumes of the filesystems I
>> might have.
>
> What about this:
>
> # btrfs sub list -a /

How do I know that "/" is a btrfs? How do I know, that there are not
also other btrfs filesystems? :)

> You still need to iterate through the mounted btrfs filesystems if you are
> using more than one:
>
> # btrfs fi show | fgrep uuid
> Label: 'usb-backup'  uuid: 7038c8fa-4293-49e9-b493-a9c46e5663ca
> Label: 'system'  uuid: d2bb232a-2e8f-4951-8bcc-97e237f1b536
>
> Then translate the uuid back to an fspath somehow.

Yep. That's basically what I'm doing ATM.

>
> Another option would be to use blkid:
>
> # blkid -t TYPE=btrfs

Ah, that's cool! Didn't know that blkid had this option.

> Still needs translation back to fspathes. But that could be done with
> grep/head/lsblk trickery...

Hm, I didn't know lsblk until now, but it seems that it doesn't handle
LVM well, does it?

(See http://pastebin.com/fuZ8HHQi for better readable version.)

a@ask-home:~$ blkid -t TYPE=btrfs | grep mapper
/dev/mapper/ssd-Data: LABEL="Data"
UUID="7d2eb10f-aced-4d41-bb7f-7badbf075b6a"
UUID_SUB="582b64f5-edd5-48f2-978e-24df9a839b5b" TYPE="btrfs"

a@ask-home:~$ lsblk
NAME                      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                         8:0    0 167.7G  0 disk
├─sda1                      8:1    0   100M  0 part
├─sda2                      8:2    0  10.9G  0 part /dell-recovery
├─sda3                      8:3    0  39.1G  0 part /windows
├─sda4                      8:4    0     1K  0 part
├─sda5                      8:5    0   102M  0 part /boot
├─sda6                      8:6    0   8.1G  0 part [SWAP]
├─sda7                      8:7    0  12.2G  0 part
│ ├─ssd-Data (dm-0)       252:0    0    35G  0 lvm
│ └─ssd-UbRoot1304 (dm-1) 252:1    0    10G  0 lvm  /
├─sda8                      8:8    0  12.2G  0 part
│ └─ssd-Data (dm-0)       252:0    0    35G  0 lvm
├─sda9                      8:9    0  12.2G  0 part
│ └─ssd-Data (dm-0)       252:0    0    35G  0 lvm
├─sda10                     8:10   0  12.2G  0 part
│ └─ssd-Data (dm-0)       252:0    0    35G  0 lvm
├─sda11                     8:11   0  11.9G  0 part
├─sda12                     8:12   0  11.9G  0 part /data/Kernel/KernExt4
├─sda13                     8:13   0  11.9G  0 part
├─sda14                     8:14   0  11.9G  0 part /data/Kernel/KernReiserfs
└─sda15                     8:15   0  11.9G  0 part /data/Kernel/KernXfs
sr0                        11:0    1  1024M  0 rom

a@ask-home:~$ lsblk -P|grep Data
NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0"
TYPE="lvm" MOUNTPOINT=""
NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0"
TYPE="lvm" MOUNTPOINT=""
NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0"
TYPE="lvm" MOUNTPOINT=""
NAME="ssd-Data (dm-0)" MAJ:MIN="252:0" RM="0" SIZE="35G" RO="0"
TYPE="lvm" MOUNTPOINT=""


So I guess I'd still need to mount the root volume temporarily
somewhere to do the translation.

ZFS really wipes the floor with btrfs regarding ease of use as far as
that's concerned…

That blkid trick was quite useful, though. Thanks!

Cheers,
Alexander
-- 
=>        Google+ => http://plus.skwar.me         <==
=> Chat (Jabber/Google Talk) => a.skwar@gmail.com <==

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-05 12:59   ` Alexander Skwar
@ 2013-05-05 16:03     ` Kai Krakow
  2013-05-05 16:19       ` Alexander Skwar
  2013-05-09 20:41     ` nocow 'C' flag ignored after balance Kyle Gates
  1 sibling, 1 reply; 18+ messages in thread
From: Kai Krakow @ 2013-05-05 16:03 UTC (permalink / raw)
  To: linux-btrfs

Alexander Skwar <alexanders.mailinglists+nospam@gmail.com> schrieb:

> So I guess I'd still need to mount the root volume temporarily
> somewhere to do the translation.

That brings in the idea how bedup seems to handle this. Maybe you want to 
take one or the other idea from there as it also has to enumerate all btrfs 
filesystems and snapshots:

https://github.com/g2p/bedup

Regards,
Kai


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Creating recursive snapshots for all filesystems
  2013-05-05 16:03     ` Kai Krakow
@ 2013-05-05 16:19       ` Alexander Skwar
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Skwar @ 2013-05-05 16:19 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-btrfs

Hi Kai

On Sun, May 5, 2013 at 6:03 PM, Kai Krakow <hurikhan77+btrfs@gmail.com> wrote:

> That brings in the idea how bedup seems to handle this. Maybe you want to
> take one or the other idea from there as it also has to enumerate all btrfs
> filesystems and snapshots:

Sure, will have a look. There are _certainly_ more clever ways
to do that, than what I came up with ☺

BR,

Alexander
--
=>        Google+ => http://plus.skwar.me         <==
=> Chat (Jabber/Google Talk) => a.skwar@gmail.com <==

^ permalink raw reply	[flat|nested] 18+ messages in thread

* nocow 'C' flag ignored after balance
  2013-05-05 12:59   ` Alexander Skwar
  2013-05-05 16:03     ` Kai Krakow
@ 2013-05-09 20:41     ` Kyle Gates
  2013-05-10  5:15       ` Liu Bo
  1 sibling, 1 reply; 18+ messages in thread
From: Kyle Gates @ 2013-05-09 20:41 UTC (permalink / raw)
  To: linux-btrfs

I'll preface that I'm running Ubuntu 13.04 with the standard 3.8 series 
kernel so please disregard if this has been fixed in higher versions. This 
is on a btrfs RAID1 with 3 then 4 disks.

My use case is to set the nocow 'C' flag on a directory and copy in some 
files, then make lots of writes (same file sizes) and note that the number 
of extents stays the same, good.
Then run a balance (I added a disk) and start making writes again, now the 
number of extents starts climbing, boo.
Is this standard behavior? I realize a balance will cow the files. Are they 
also being checksummed thereby breaking the nocow flag?

I have made no snapshots and made no writes to said files while the balance 
was running.

Thanks,
Kyle 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-09 20:41     ` nocow 'C' flag ignored after balance Kyle Gates
@ 2013-05-10  5:15       ` Liu Bo
  2013-05-10 13:58         ` Kyle Gates
  0 siblings, 1 reply; 18+ messages in thread
From: Liu Bo @ 2013-05-10  5:15 UTC (permalink / raw)
  To: Kyle Gates; +Cc: linux-btrfs

On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:
> I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
> series kernel so please disregard if this has been fixed in higher
> versions. This is on a btrfs RAID1 with 3 then 4 disks.
> 
> My use case is to set the nocow 'C' flag on a directory and copy in
> some files, then make lots of writes (same file sizes) and note that
> the number of extents stays the same, good.
> Then run a balance (I added a disk) and start making writes again,
> now the number of extents starts climbing, boo.
> Is this standard behavior? I realize a balance will cow the files.
> Are they also being checksummed thereby breaking the nocow flag?
> 
> I have made no snapshots and made no writes to said files while the
> balance was running.

Hi Kyle,

It's hard to say if it's standard, it is a side effect casued by balance.

During balance, our reloc root works like a snapshot, so we set
last_snapshot on the fs root, and this makes new nocow writes think that
we have to do cow as the extent is created before taking snapshot.

But the nocow 'C' flag on the file is still there, if you make new
writes on the new extent after balance, you still get the same number of
extents.

thanks,
liubo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-10  5:15       ` Liu Bo
@ 2013-05-10 13:58         ` Kyle Gates
  2013-05-16 19:11           ` Kyle Gates
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Gates @ 2013-05-10 13:58 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On Fri, May 10, 2013 Liu Bo wrote:
> On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:
>> I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
>> series kernel so please disregard if this has been fixed in higher
>> versions. This is on a btrfs RAID1 with 3 then 4 disks.
>>
>> My use case is to set the nocow 'C' flag on a directory and copy in
>> some files, then make lots of writes (same file sizes) and note that
>> the number of extents stays the same, good.
>> Then run a balance (I added a disk) and start making writes again,
>> now the number of extents starts climbing, boo.
>> Is this standard behavior? I realize a balance will cow the files.
>> Are they also being checksummed thereby breaking the nocow flag?
>>
>> I have made no snapshots and made no writes to said files while the
>> balance was running.
>
> Hi Kyle,
>
> It's hard to say if it's standard, it is a side effect casued by balance.
>
> During balance, our reloc root works like a snapshot, so we set
> last_snapshot on the fs root, and this makes new nocow writes think that
> we have to do cow as the extent is created before taking snapshot.
>
> But the nocow 'C' flag on the file is still there, if you make new
> writes on the new extent after balance, you still get the same number of
> extents.
>
> thanks,
> liubo

Thank you for the explanation.
On my machine this didn't happen however. IIRC one 10GiB file had 24 extents 
before balance, 26 extents after balance, and 1000+ and growing when I 
checked the following day.
I'll add that I am running a relatively recent version of btrfs-tools from a 
ppa. 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-10 13:58         ` Kyle Gates
@ 2013-05-16 19:11           ` Kyle Gates
  2013-05-17  7:04             ` Liu Bo
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle Gates @ 2013-05-16 19:11 UTC (permalink / raw)
  To: bo.li.liu, dsterba; +Cc: linux-btrfs

> On Fri, May 10, 2013 Liu Bo wrote:
>> On Thu, May 09, 2013 at 03:41:49PM -0500, Kyle Gates wrote:
>>> I'll preface that I'm running Ubuntu 13.04 with the standard 3.8
>>> series kernel so please disregard if this has been fixed in higher
>>> versions. This is on a btrfs RAID1 with 3 then 4 disks.
>>>
>>> My use case is to set the nocow 'C' flag on a directory and copy in
>>> some files, then make lots of writes (same file sizes) and note that
>>> the number of extents stays the same, good.
>>> Then run a balance (I added a disk) and start making writes again,
>>> now the number of extents starts climbing, boo.
>>> Is this standard behavior? I realize a balance will cow the files.
>>> Are they also being checksummed thereby breaking the nocow flag?
>>>
>>> I have made no snapshots and made no writes to said files while the
>>> balance was running.
>>
>> Hi Kyle,
>>
>> It's hard to say if it's standard, it is a side effect casued by balance.
>>
>> During balance, our reloc root works like a snapshot, so we set
>> last_snapshot on the fs root, and this makes new nocow writes think that
>> we have to do cow as the extent is created before taking snapshot.
>>
>> But the nocow 'C' flag on the file is still there, if you make new
>> writes on the new extent after balance, you still get the same number of
>> extents.
>>
>> thanks,
>> liubo
>
> Thank you for the explanation.
> On my machine this didn't happen however. IIRC one ~10GiB file had 24 
> extents before balance, 26 extents after balance, and 1000+ and growing 
> when I checked the following day.
> I'll add that I am running a relatively recent version of btrfs-tools from 
> a ppa.
and mounted with autodefrag
Am I actually just seeing large ranges getting split while remaining 
contiguous on disk? This would imply crc calculation on the two outside 
ranges. Or perhaps there is some data being inlined for each write. I 
believe writes on this file are 32KiB each.
Does the balance produce persistent crc values in the metadata even though 
the files are nocow which implies nocrc?
...
I ran this test again and here's filefrag -v after about a day of use:

Filesystem type is: 9123683e
File size of /blah/blah/file is 10213265920 (2493474 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0 675625629               9
   1       9 675621279 675625638     55
   2      64 674410131 675621334    886
   3     950 675558303 674411017      9
   4     959 675583473 675558312     55
   5    1014 674411081 675583528    708
   6    1722 675456318 674411789      9
   7    1731 675710934 675456327     55
   8    1786 674411853 675710989    521
   9    2307 675424433 674412374      9
  10    2316 675471062 675424442     55
  11    2371 674412438 675471117    984
  12    3355 676012018 674413422      9
  13    3364 676024295 676012027     55
  14    3419 674413486 676024350    871
  15    4290 675681138 674414357      9
  16    4299 675618500 675681147     55
...
13986 2486955 671627059 675876382    627
13987 2487582 675677542 671627686      9
13988 2487591 675700351 675677551     55
13989 2487646 671627750 675700406   1212
13990 2488858 675932037 671628962      9
13991 2488867 675990025 675932046     55
13992 2488922 671629026 675990080    220
13993 2489142 675674447 671629246      9
13994 2489151 675687864 675674456     55
13995 2489206 671629310 675687919   1821
13996 2491027 676209288 671631131      9
13997 2491036 676260767 676209297     55
13998 2491091 671631195 676260822    285
13999 2491376 675650278 671631480      9
14000 2491385 675678822 675650287     55
14001 2491440 671631544 675678877   1464
14002 2492904 675534255 671633008      9
14003 2492913 675503514 675534264     55
14004 2492968 671633072 675503569    506 eof
/blah/blah/file: 14005 extents found

As you can see the 32KiB writes fit in the extents of size 9 and 55. Are 
those 9 block extents inlined?
If I understand correctly, new extents are created for these nocow writes, 
then the old extents are basically hole punched producing three (four? 
because of inlining) separate extents.
Something here begs for optimization. Perhaps balance should treat nocow 
files a little differently. That would be the time to remove the extra bits 
that prevent inplace overwrites. After the fact it becomes much more 
difficult, although removing a crc for the extent being written seems a 
little easier then iterating over the entire file.

Thanks for taking the time to read,
Kyle

P.S. I'm CCing David as I believe he wrote the patch to get the 'C' flag 
working on empty files and directories. 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-16 19:11           ` Kyle Gates
@ 2013-05-17  7:04             ` Liu Bo
  2013-05-17 14:38               ` Kyle Gates
  2013-05-28 14:22               ` Kyle Gates
  0 siblings, 2 replies; 18+ messages in thread
From: Liu Bo @ 2013-05-17  7:04 UTC (permalink / raw)
  To: Kyle Gates; +Cc: dsterba, linux-btrfs

On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:
> and mounted with autodefrag
> Am I actually just seeing large ranges getting split while remaining
> contiguous on disk? This would imply crc calculation on the two
> outside ranges. Or perhaps there is some data being inlined for each
> write. I believe writes on this file are 32KiB each.
> Does the balance produce persistent crc values in the metadata even
> though the files are nocow which implies nocrc?
> ...
> I ran this test again and here's filefrag -v after about a day of use:
> 
>[...]
> As you can see the 32KiB writes fit in the extents of size 9 and 55.
> Are those 9 block extents inlined?
> If I understand correctly, new extents are created for these nocow
> writes, then the old extents are basically hole punched producing
> three (four? because of inlining) separate extents.
> Something here begs for optimization. Perhaps balance should treat
> nocow files a little differently. That would be the time to remove
> the extra bits that prevent inplace overwrites. After the fact it
> becomes much more difficult, although removing a crc for the extent
> being written seems a little easier then iterating over the entire
> file.
> 
> Thanks for taking the time to read,
> Kyle
> 
> P.S. I'm CCing David as I believe he wrote the patch to get the 'C'
> flag working on empty files and directories.

Hi Kyle,

Can you please apply this patch and see if it helps?

thanks,
liubo


From: Liu Bo <bo.li.liu@oracle.com>

Subject: [PATCH] Btrfs: fix broken nocow after a normal balance

Balance will create reloc_root for each fs root, and it's going to
record last_snapshot to filter shared blocks.  The side effect of
setting last_snapshot is to break nocow attributes of files.

So here we update file extent's generation while walking relocated
file extents in data reloc root, and use file extent's generation
instead for checking if we have cross refs for the file extent.

That way we can make nocow happy again and have no impact on others.

Reported-by: Kyle Gates <kylegates@hotmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/ctree.h       |    2 +-
 fs/btrfs/extent-tree.c |   18 +++++++++++++-----
 fs/btrfs/inode.c       |   10 ++++++++--
 fs/btrfs/relocation.c  |    1 +
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4560052..eb2e782 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_root *root,
 				    u64 bytenr, u64 num_bytes);
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr);
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen);
 struct btrfs_block_group_cache *btrfs_lookup_block_group(
 						 struct btrfs_fs_info *info,
 						 u64 bytenr);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1e84c74..f3b3616 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2816,7 +2816,8 @@ out:
 static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 					struct btrfs_root *root,
 					struct btrfs_path *path,
-					u64 objectid, u64 offset, u64 bytenr)
+					u64 objectid, u64 offset, u64 bytenr,
+					u64 fi_gen)
 {
 	struct btrfs_root *extent_root = root->fs_info->extent_root;
 	struct extent_buffer *leaf;
@@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct btrfs_trans_handle
*trans,
 	    btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
 		goto out;
 
-	if (btrfs_extent_generation(leaf, ei) <=
-	    btrfs_root_last_snapshot(&root->root_item))
+	/*
+	 * Usually generation in extent item is larger than that in file extent
+	 * item because of delay refs.  But we don't want balance to break
+	 * file's nocow behaviour, so use file_extent's generation which has
+	 * been updates when we update fs root to point to relocated file
+	 * extents in data reloc root.
+	 */
+	fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
+	if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
 		goto out;
 
 	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
@@ -2886,7 +2894,7 @@ out:
 
 int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root,
-			  u64 objectid, u64 offset, u64 bytenr)
+			  u64 objectid, u64 offset, u64 bytenr, u64 gen)
 {
 	struct btrfs_path *path;
 	int ret;
@@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
 
 	do {
 		ret = check_committed_ref(trans, root, path, objectid,
-					  offset, bytenr);
+					  offset, bytenr, gen);
 		if (ret && ret != -ENOENT)
 			goto out;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2cfdd33..976b045 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1727,6 +1727,8 @@ next_slot:
 		ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
 		if (extent_type == BTRFS_FILE_EXTENT_REG ||
 		    extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
+			u64 gen;
+			gen = btrfs_file_extent_generation(leaf, fi);
 			disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 			extent_offset = btrfs_file_extent_offset(leaf, fi);
 			extent_end = found_key.offset +
@@ -1749,7 +1751,8 @@ next_slot:
 				goto out_check;
 			if (btrfs_cross_ref_exist(trans, root, ino,
 						  found_key.offset -
-						  extent_offset, disk_bytenr))
+						  extent_offset, disk_bytenr,
+						  gen))
 				goto out_check;
 			disk_bytenr += extent_offset;
 			disk_bytenr += cur_offset - found_key.offset;
@@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle
*trans,
 	struct btrfs_key key;
 	u64 disk_bytenr;
 	u64 backref_offset;
+	u64 fi_gen;
 	u64 extent_end;
 	u64 num_bytes;
 	int slot;
@@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle
*trans,
 	}
 	disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
 	backref_offset = btrfs_file_extent_offset(leaf, fi);
+	fi_gen = btrfs_file_extent_generation(leaf, fi);
 
 	*orig_start = key.offset - backref_offset;
 	*orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
@@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct btrfs_trans_handle
*trans,
 	 * find any we must cow
 	 */
 	if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
-				  key.offset - backref_offset, disk_bytenr))
+				  key.offset - backref_offset, disk_bytenr,
+				  fi_gen))
 		goto out;
 
 	/*
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 704a1b8..07faabf 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
 		BUG_ON(ret < 0);
 
 		btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr);
+		btrfs_set_file_extent_generation(leaf, fi, trans->transid);
 		dirty = 1;
 
 		key.offset -= btrfs_file_extent_offset(leaf, fi);
-- 
1.7.7



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-17  7:04             ` Liu Bo
@ 2013-05-17 14:38               ` Kyle Gates
  2013-05-28 14:22               ` Kyle Gates
  1 sibling, 0 replies; 18+ messages in thread
From: Kyle Gates @ 2013-05-17 14:38 UTC (permalink / raw)
  To: bo.li.liu; +Cc: dsterba, linux-btrfs

On Fri, 17 May 2013 15:04:45 +0800, Liu Bo wrote:
> On Thu, May 16, 2013 at 02:11:41PM -0500, Kyle Gates wrote:
>> and mounted with autodefrag
>> Am I actually just seeing large ranges getting split while remaining
>> contiguous on disk? This would imply crc calculation on the two
>> outside ranges. Or perhaps there is some data being inlined for each
>> write. I believe writes on this file are 32KiB each.
>> Does the balance produce persistent crc values in the metadata even
>> though the files are nocow which implies nocrc?
>> ...
>> I ran this test again and here's filefrag -v after about a day of use:
>>
>>[...]
>> As you can see the 32KiB writes fit in the extents of size 9 and 55.
>> Are those 9 block extents inlined?
>> If I understand correctly, new extents are created for these nocow
>> writes, then the old extents are basically hole punched producing
>> three (four? because of inlining) separate extents.
>> Something here begs for optimization. Perhaps balance should treat
>> nocow files a little differently. That would be the time to remove
>> the extra bits that prevent inplace overwrites. After the fact it
>> becomes much more difficult, although removing a crc for the extent
>> being written seems a little easier then iterating over the entire
>> file.
>>
>> Thanks for taking the time to read,
>> Kyle
>>
>> P.S. I'm CCing David as I believe he wrote the patch to get the 'C'
>> flag working on empty files and directories.
>
> Hi Kyle,
>
> Can you please apply this patch and see if it helps?
>
> thanks,
> liubo
>
>
> From: Liu Bo <bo.li.liu@oracle.com>
>
> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>
> Balance will create reloc_root for each fs root, and it's going to
> record last_snapshot to filter shared blocks.  The side effect of
> setting last_snapshot is to break nocow attributes of files.
>
> So here we update file extent's generation while walking relocated
> file extents in data reloc root, and use file extent's generation
> instead for checking if we have cross refs for the file extent.
>
> That way we can make nocow happy again and have no impact on others.
>
> Reported-by: Kyle Gates <kylegates@hotmail.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> fs/btrfs/ctree.h       |    2 +-
> fs/btrfs/extent-tree.c |   18 +++++++++++++-----
> fs/btrfs/inode.c       |   10 ++++++++--
> fs/btrfs/relocation.c  |    1 +
> 4 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4560052..eb2e782 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct 
> btrfs_root *root,
>      u64 bytenr, u64 num_bytes);
> int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
>    struct btrfs_root *root,
> -   u64 objectid, u64 offset, u64 bytenr);
> +   u64 objectid, u64 offset, u64 bytenr, u64 gen);
> struct btrfs_block_group_cache *btrfs_lookup_block_group(
>  struct btrfs_fs_info *info,
>  u64 bytenr);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1e84c74..f3b3616 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2816,7 +2816,8 @@ out:
> static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
>  struct btrfs_root *root,
>  struct btrfs_path *path,
> - u64 objectid, u64 offset, u64 bytenr)
> + u64 objectid, u64 offset, u64 bytenr,
> + u64 fi_gen)
> {
>  struct btrfs_root *extent_root = root->fs_info->extent_root;
>  struct extent_buffer *leaf;
> @@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct 
> btrfs_trans_handle
> *trans,
>      btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
>  goto out;
>
> - if (btrfs_extent_generation(leaf, ei) <=
> -     btrfs_root_last_snapshot(&root->root_item))
> + /*
> + * Usually generation in extent item is larger than that in file extent
> + * item because of delay refs.  But we don't want balance to break
> + * file's nocow behaviour, so use file_extent's generation which has
> + * been updates when we update fs root to point to relocated file
> + * extents in data reloc root.
> + */
> + fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
> + if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
>  goto out;
>
>  iref = (struct btrfs_extent_inline_ref *)(ei + 1);
> @@ -2886,7 +2894,7 @@ out:
>
> int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
>    struct btrfs_root *root,
> -   u64 objectid, u64 offset, u64 bytenr)
> +   u64 objectid, u64 offset, u64 bytenr, u64 gen)
> {
>  struct btrfs_path *path;
>  int ret;
> @@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle 
> *trans,
>
>  do {
>  ret = check_committed_ref(trans, root, path, objectid,
> -   offset, bytenr);
> +   offset, bytenr, gen);
>  if (ret && ret != -ENOENT)
>  goto out;
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2cfdd33..976b045 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1727,6 +1727,8 @@ next_slot:
>  ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>  if (extent_type == BTRFS_FILE_EXTENT_REG ||
>      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
> + u64 gen;
> + gen = btrfs_file_extent_generation(leaf, fi);
>  disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>  extent_offset = btrfs_file_extent_offset(leaf, fi);
>  extent_end = found_key.offset +
> @@ -1749,7 +1751,8 @@ next_slot:
>  goto out_check;
>  if (btrfs_cross_ref_exist(trans, root, ino,
>    found_key.offset -
> -   extent_offset, disk_bytenr))
> +   extent_offset, disk_bytenr,
> +   gen))
>  goto out_check;
>  disk_bytenr += extent_offset;
>  disk_bytenr += cur_offset - found_key.offset;
> @@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  struct btrfs_key key;
>  u64 disk_bytenr;
>  u64 backref_offset;
> + u64 fi_gen;
>  u64 extent_end;
>  u64 num_bytes;
>  int slot;
> @@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  }
>  disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>  backref_offset = btrfs_file_extent_offset(leaf, fi);
> + fi_gen = btrfs_file_extent_generation(leaf, fi);
>
>  *orig_start = key.offset - backref_offset;
>  *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
> @@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  * find any we must cow
>  */
>  if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
> -   key.offset - backref_offset, disk_bytenr))
> +   key.offset - backref_offset, disk_bytenr,
> +   fi_gen))
>  goto out;
>
>  /*
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 704a1b8..07faabf 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle 
> *trans,
>  BUG_ON(ret < 0);
>
>  btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr);
> + btrfs_set_file_extent_generation(leaf, fi, trans->transid);
>  dirty = 1;
>
>  key.offset -= btrfs_file_extent_offset(leaf, fi);
> -- 
> 1.7.7
>
Liu Bo,
Thank you for taking the time to produce this patch. I'm not in a position 
to try a kernel compile and will be on vacation next week. I hope someone 
else would like to give it a try and use a little dd magic to test it.
Thanks,
Kyle 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-17  7:04             ` Liu Bo
  2013-05-17 14:38               ` Kyle Gates
@ 2013-05-28 14:22               ` Kyle Gates
  2013-05-29  1:55                 ` Liu Bo
  1 sibling, 1 reply; 18+ messages in thread
From: Kyle Gates @ 2013-05-28 14:22 UTC (permalink / raw)
  To: bo.li.liu; +Cc: dsterba, linux-btrfs

> From: Liu Bo <bo.li.liu@oracle.com>
>
> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>
> Balance will create reloc_root for each fs root, and it's going to
> record last_snapshot to filter shared blocks.  The side effect of
> setting last_snapshot is to break nocow attributes of files.
>
> So here we update file extent's generation while walking relocated
> file extents in data reloc root, and use file extent's generation
> instead for checking if we have cross refs for the file extent.
>
> That way we can make nocow happy again and have no impact on others.
>
> Reported-by: Kyle Gates <kylegates@hotmail.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> fs/btrfs/ctree.h       |    2 +-
> fs/btrfs/extent-tree.c |   18 +++++++++++++-----
> fs/btrfs/inode.c       |   10 ++++++++--
> fs/btrfs/relocation.c  |    1 +
> 4 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4560052..eb2e782 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -3090,7 +3090,7 @@ int btrfs_pin_extent_for_log_replay(struct 
> btrfs_root *root,
>      u64 bytenr, u64 num_bytes);
> int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
>    struct btrfs_root *root,
> -   u64 objectid, u64 offset, u64 bytenr);
> +   u64 objectid, u64 offset, u64 bytenr, u64 gen);
> struct btrfs_block_group_cache *btrfs_lookup_block_group(
>  struct btrfs_fs_info *info,
>  u64 bytenr);
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1e84c74..f3b3616 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2816,7 +2816,8 @@ out:
> static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
>  struct btrfs_root *root,
>  struct btrfs_path *path,
> - u64 objectid, u64 offset, u64 bytenr)
> + u64 objectid, u64 offset, u64 bytenr,
> + u64 fi_gen)
> {
>  struct btrfs_root *extent_root = root->fs_info->extent_root;
>  struct extent_buffer *leaf;
> @@ -2861,8 +2862,15 @@ static noinline int check_committed_ref(struct 
> btrfs_trans_handle
> *trans,
>      btrfs_extent_inline_ref_size(BTRFS_EXTENT_DATA_REF_KEY))
>  goto out;
>
> - if (btrfs_extent_generation(leaf, ei) <=
> -     btrfs_root_last_snapshot(&root->root_item))
> + /*
> + * Usually generation in extent item is larger than that in file extent
> + * item because of delay refs.  But we don't want balance to break
> + * file's nocow behaviour, so use file_extent's generation which has
> + * been updates when we update fs root to point to relocated file
> + * extents in data reloc root.
> + */
> + fi_gen = max_t(u64, btrfs_extent_generation(leaf, ei), fi_gen);
> + if (fi_gen <= btrfs_root_last_snapshot(&root->root_item))
>  goto out;
>
>  iref = (struct btrfs_extent_inline_ref *)(ei + 1);
> @@ -2886,7 +2894,7 @@ out:
>
> int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans,
>    struct btrfs_root *root,
> -   u64 objectid, u64 offset, u64 bytenr)
> +   u64 objectid, u64 offset, u64 bytenr, u64 gen)
> {
>  struct btrfs_path *path;
>  int ret;
> @@ -2898,7 +2906,7 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle 
> *trans,
>
>  do {
>  ret = check_committed_ref(trans, root, path, objectid,
> -   offset, bytenr);
> +   offset, bytenr, gen);
>  if (ret && ret != -ENOENT)
>  goto out;
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 2cfdd33..976b045 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1727,6 +1727,8 @@ next_slot:
>  ram_bytes = btrfs_file_extent_ram_bytes(leaf, fi);
>  if (extent_type == BTRFS_FILE_EXTENT_REG ||
>      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
> + u64 gen;
> + gen = btrfs_file_extent_generation(leaf, fi);
>  disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>  extent_offset = btrfs_file_extent_offset(leaf, fi);
>  extent_end = found_key.offset +
> @@ -1749,7 +1751,8 @@ next_slot:
>  goto out_check;
>  if (btrfs_cross_ref_exist(trans, root, ino,
>    found_key.offset -
> -   extent_offset, disk_bytenr))
> +   extent_offset, disk_bytenr,
> +   gen))
>  goto out_check;
>  disk_bytenr += extent_offset;
>  disk_bytenr += cur_offset - found_key.offset;
> @@ -7002,6 +7005,7 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  struct btrfs_key key;
>  u64 disk_bytenr;
>  u64 backref_offset;
> + u64 fi_gen;
>  u64 extent_end;
>  u64 num_bytes;
>  int slot;
> @@ -7048,6 +7052,7 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  }
>  disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, fi);
>  backref_offset = btrfs_file_extent_offset(leaf, fi);
> + fi_gen = btrfs_file_extent_generation(leaf, fi);
>
>  *orig_start = key.offset - backref_offset;
>  *orig_block_len = btrfs_file_extent_disk_num_bytes(leaf, fi);
> @@ -7067,7 +7072,8 @@ static noinline int can_nocow_odirect(struct 
> btrfs_trans_handle
> *trans,
>  * find any we must cow
>  */
>  if (btrfs_cross_ref_exist(trans, root, btrfs_ino(inode),
> -   key.offset - backref_offset, disk_bytenr))
> +   key.offset - backref_offset, disk_bytenr,
> +   fi_gen))
>  goto out;
>
>  /*
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 704a1b8..07faabf 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -1637,6 +1637,7 @@ int replace_file_extents(struct btrfs_trans_handle 
> *trans,
>  BUG_ON(ret < 0);
>
>  btrfs_set_file_extent_disk_bytenr(leaf, fi, new_bytenr);
> + btrfs_set_file_extent_generation(leaf, fi, trans->transid);
>  dirty = 1;
>
>  key.offset -= btrfs_file_extent_offset(leaf, fi);
> -- 
> 1.7.7
>

Sorry for the long wait in replying.
This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu Raring 
kernel). I can probably try again on a newer version if you think it will 
help.
This was my first kernel compile so I patched by hand and waited (10 hours 
on my old 32 bit single core machine).

I did move some of the files off and back on to the filesystem to start 
fresh and compare but all seem to exhibit the same behavior after a balance.

Thanks,
Kyle 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-28 14:22               ` Kyle Gates
@ 2013-05-29  1:55                 ` Liu Bo
  2013-05-29  8:33                   ` Miao Xie
  2013-05-30 16:40                   ` Kyle Gates
  0 siblings, 2 replies; 18+ messages in thread
From: Liu Bo @ 2013-05-29  1:55 UTC (permalink / raw)
  To: Kyle Gates; +Cc: dsterba, linux-btrfs

On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
> >From: Liu Bo <bo.li.liu@oracle.com>
> >
> >Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
> >
>[...]
> 
> Sorry for the long wait in replying.
> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
> Raring kernel). I can probably try again on a newer version if you
> think it will help.
> This was my first kernel compile so I patched by hand and waited (10
> hours on my old 32 bit single core machine).
> 
> I did move some of the files off and back on to the filesystem to
> start fresh and compare but all seem to exhibit the same behavior
> after a balance.
>

Thanks for testing the patch although it didn't help you.
Actually I tested it to be sure that it fixed the problems in my reproducer.

So anyway can you please apply this debug patch in order to nail it down?

thanks,
liubo

 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index df472ab..c12a11c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
 		goto out;
 
 	if (btrfs_extent_generation(leaf, ei) <=
-	    btrfs_root_last_snapshot(&root->root_item))
+	    btrfs_root_last_snapshot(&root->root_item)) {
+		printk("extent gen %llu last_snap %llu\n",
+			btrfs_extent_generation(leaf, ei),
+			btrfs_root_last_snapshot(&root->root_item));
 		goto out;
+	}
 
 	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
 	if (btrfs_extent_inline_ref_type(leaf, iref) !=
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 23c596c..8cad6ee 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1317,16 +1317,24 @@ next_slot:
 				goto out_check;
 			if (btrfs_file_extent_compression(leaf, fi) ||
 			    btrfs_file_extent_encryption(leaf, fi) ||
-			    btrfs_file_extent_other_encoding(leaf, fi))
+			    btrfs_file_extent_other_encoding(leaf, fi)) {
+				printk("special encoding\n");
 				goto out_check;
-			if (extent_type == BTRFS_FILE_EXTENT_REG && !force)
+			}
+			if (extent_type == BTRFS_FILE_EXTENT_REG && !force) {
+				printk("BTRFS_FILE_EXTENT_REF\n");
 				goto out_check;
-			if (btrfs_extent_readonly(root, disk_bytenr))
+			}
+			if (btrfs_extent_readonly(root, disk_bytenr)) {
+				printk("ro\n");
 				goto out_check;
+			}
 			if (btrfs_cross_ref_exist(trans, root, ino,
 						  found_key.offset -
-						  extent_offset, disk_bytenr))
+						  extent_offset, disk_bytenr)) {
+				printk("cross ref\n");
 				goto out_check;
+			}
 			disk_bytenr += extent_offset;
 			disk_bytenr += cur_offset - found_key.offset;
 			num_bytes = min(end + 1, extent_end) - cur_offset;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-29  1:55                 ` Liu Bo
@ 2013-05-29  8:33                   ` Miao Xie
  2013-05-30 16:40                     ` Kyle Gates
  2013-05-30 16:40                   ` Kyle Gates
  1 sibling, 1 reply; 18+ messages in thread
From: Miao Xie @ 2013-05-29  8:33 UTC (permalink / raw)
  To: bo.li.liu; +Cc: Kyle Gates, dsterba, linux-btrfs

On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:
> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
>>> From: Liu Bo <bo.li.liu@oracle.com>
>>>
>>> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>>>
>> [...]
>>
>> Sorry for the long wait in replying.
>> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
>> Raring kernel). I can probably try again on a newer version if you
>> think it will help.
>> This was my first kernel compile so I patched by hand and waited (10
>> hours on my old 32 bit single core machine).
>>
>> I did move some of the files off and back on to the filesystem to
>> start fresh and compare but all seem to exhibit the same behavior
>> after a balance.
>>
> 
> Thanks for testing the patch although it didn't help you.
> Actually I tested it to be sure that it fixed the problems in my reproducer.
> 
> So anyway can you please apply this debug patch in order to nail it down?

Your patch can not fix the above problem is because we may update ->last_snapshot
after we relocate the file data extent.

For example, there are two block groups which will be relocated, One is data block
group, the other is metadata block group. Then we relocate the data block group firstly,
and set the new generation for the file data extent item/the relative extent item and
set (new_generation - 1) for ->last_snapshot. After the relocation of this block group,
we will end the transaction and drop the relocation tree. If we end the space balance now,
we won't break the nocow rule because ->last_snapshot is less than the generation of the file
data extent item/the relative extent item. But there is still one block group which will be
relocated, when relocating the second block group, we will also start a new transaction,
and update ->last_snapshot if need. So, ->last_snapshot is greater than the generation of the file
data extent item we set before. And the nocow rule is broken.

Back to this above problem. I don't think it is a serious problem, we only do COW once after
the relocation, then we will still honour the nocow rule. The behaviour is similar to snapshot.
So maybe it needn't be fixed.

If we must fix this problem, I think the only way is that get the generation at the beginning
of the space balance, and then set it to ->last_snapshot if ->last_snapshot is less than it,
don't use (current_generation - 1) to update the ->last_snapshot. Besides that, don't forget
to store the generation into btrfs_balance_item, or the problem will happen after we resume the
balance.

Thanks
Miao

> 
> thanks,
> liubo
> 
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index df472ab..c12a11c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans,
>  		goto out;
>  
>  	if (btrfs_extent_generation(leaf, ei) <=
> -	    btrfs_root_last_snapshot(&root->root_item))
> +	    btrfs_root_last_snapshot(&root->root_item)) {
> +		printk("extent gen %llu last_snap %llu\n",
> +			btrfs_extent_generation(leaf, ei),
> +			btrfs_root_last_snapshot(&root->root_item));
>  		goto out;
> +	}
>  
>  	iref = (struct btrfs_extent_inline_ref *)(ei + 1);
>  	if (btrfs_extent_inline_ref_type(leaf, iref) !=
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 23c596c..8cad6ee 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1317,16 +1317,24 @@ next_slot:
>  				goto out_check;
>  			if (btrfs_file_extent_compression(leaf, fi) ||
>  			    btrfs_file_extent_encryption(leaf, fi) ||
> -			    btrfs_file_extent_other_encoding(leaf, fi))
> +			    btrfs_file_extent_other_encoding(leaf, fi)) {
> +				printk("special encoding\n");
>  				goto out_check;
> -			if (extent_type == BTRFS_FILE_EXTENT_REG && !force)
> +			}
> +			if (extent_type == BTRFS_FILE_EXTENT_REG && !force) {
> +				printk("BTRFS_FILE_EXTENT_REF\n");
>  				goto out_check;
> -			if (btrfs_extent_readonly(root, disk_bytenr))
> +			}
> +			if (btrfs_extent_readonly(root, disk_bytenr)) {
> +				printk("ro\n");
>  				goto out_check;
> +			}
>  			if (btrfs_cross_ref_exist(trans, root, ino,
>  						  found_key.offset -
> -						  extent_offset, disk_bytenr))
> +						  extent_offset, disk_bytenr)) {
> +				printk("cross ref\n");
>  				goto out_check;
> +			}
>  			disk_bytenr += extent_offset;
>  			disk_bytenr += cur_offset - found_key.offset;
>  			num_bytes = min(end + 1, extent_end) - cur_offset;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-29  1:55                 ` Liu Bo
  2013-05-29  8:33                   ` Miao Xie
@ 2013-05-30 16:40                   ` Kyle Gates
  1 sibling, 0 replies; 18+ messages in thread
From: Kyle Gates @ 2013-05-30 16:40 UTC (permalink / raw)
  To: bo.li.liu; +Cc: dsterba, linux-btrfs

On Tue, May 28, 2013, Liu Bo wrote:
> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
>> >From: Liu Bo <bo.li.liu@oracle.com>
>> >
>> >Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>> >
>>[...]
>>
>> Sorry for the long wait in replying.
>> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
>> Raring kernel). I can probably try again on a newer version if you
>> think it will help.
>> This was my first kernel compile so I patched by hand and waited (10
>> hours on my old 32 bit single core machine).
>>
>> I did move some of the files off and back on to the filesystem to
>> start fresh and compare but all seem to exhibit the same behavior
>> after a balance.
>>
>
> Thanks for testing the patch although it didn't help you.
> Actually I tested it to be sure that it fixed the problems in my 
> reproducer.
>
> So anyway can you please apply this debug patch in order to nail it down?
>
> thanks,
> liubo
>
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index df472ab..c12a11c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2857,8 +2857,12 @@ static noinline int check_committed_ref(struct 
> btrfs_trans_handle *trans,
>  goto out;
>
>  if (btrfs_extent_generation(leaf, ei) <=
> -     btrfs_root_last_snapshot(&root->root_item))
> +     btrfs_root_last_snapshot(&root->root_item)) {
> + printk("extent gen %llu last_snap %llu\n",
> + btrfs_extent_generation(leaf, ei),
> + btrfs_root_last_snapshot(&root->root_item));
>  goto out;
> + }
>
>  iref = (struct btrfs_extent_inline_ref *)(ei + 1);
>  if (btrfs_extent_inline_ref_type(leaf, iref) !=
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 23c596c..8cad6ee 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1317,16 +1317,24 @@ next_slot:
>  goto out_check;
>  if (btrfs_file_extent_compression(leaf, fi) ||
>      btrfs_file_extent_encryption(leaf, fi) ||
> -     btrfs_file_extent_other_encoding(leaf, fi))
> +     btrfs_file_extent_other_encoding(leaf, fi)) {
> + printk("special encoding\n");
>  goto out_check;
> - if (extent_type == BTRFS_FILE_EXTENT_REG && !force)
> + }
> + if (extent_type == BTRFS_FILE_EXTENT_REG && !force) {
> + printk("BTRFS_FILE_EXTENT_REF\n");
>  goto out_check;
> - if (btrfs_extent_readonly(root, disk_bytenr))
> + }
> + if (btrfs_extent_readonly(root, disk_bytenr)) {
> + printk("ro\n");
>  goto out_check;
> + }
>  if (btrfs_cross_ref_exist(trans, root, ino,
>    found_key.offset -
> -   extent_offset, disk_bytenr))
> +   extent_offset, disk_bytenr)) {
> + printk("cross ref\n");
>  goto out_check;
> + }
>  disk_bytenr += extent_offset;
>  disk_bytenr += cur_offset - found_key.offset;
>  num_bytes = min(end + 1, extent_end) - cur_offset;
>
In another email Miao Xie suggests this patch won't help, so I'll wait for 
more comments/suggestions.
Thanks,
Kyle 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: nocow 'C' flag ignored after balance
  2013-05-29  8:33                   ` Miao Xie
@ 2013-05-30 16:40                     ` Kyle Gates
  0 siblings, 0 replies; 18+ messages in thread
From: Kyle Gates @ 2013-05-30 16:40 UTC (permalink / raw)
  To: miaox, bo.li.liu; +Cc: dsterba, linux-btrfs

On Wed, May 29, 2013 Miao Xie wrote:
> On wed, 29 May 2013 10:55:11 +0900, Liu Bo wrote:
>> On Tue, May 28, 2013 at 09:22:11AM -0500, Kyle Gates wrote:
>>>> From: Liu Bo <bo.li.liu@oracle.com>
>>>>
>>>> Subject: [PATCH] Btrfs: fix broken nocow after a normal balance
>>>>
>>> [...]
>>>
>>> Sorry for the long wait in replying.
>>> This patch was unsuccessful in fixing the problem (on my 3.8 Ubuntu
>>> Raring kernel). I can probably try again on a newer version if you
>>> think it will help.
>>> This was my first kernel compile so I patched by hand and waited (10
>>> hours on my old 32 bit single core machine).
>>>
>>> I did move some of the files off and back on to the filesystem to
>>> start fresh and compare but all seem to exhibit the same behavior
>>> after a balance.
>>>
>>
>> Thanks for testing the patch although it didn't help you.
>> Actually I tested it to be sure that it fixed the problems in my 
>> reproducer.
>>
>> So anyway can you please apply this debug patch in order to nail it down?
>
> Your patch can not fix the above problem is because we may 
> update ->last_snapshot
> after we relocate the file data extent.
>
> For example, there are two block groups which will be relocated, One is 
> data block
> group, the other is metadata block group. Then we relocate the data block 
> group firstly,
> and set the new generation for the file data extent item/the relative 
> extent item and
> set (new_generation - 1) for ->last_snapshot. After the relocation of this 
> block group,
> we will end the transaction and drop the relocation tree. If we end the 
> space balance now,
> we won't break the nocow rule because ->last_snapshot is less than the 
> generation of the file
> data extent item/the relative extent item. But there is still one block 
> group which will be
> relocated, when relocating the second block group, we will also start a 
> new transaction,
> and update ->last_snapshot if need. So, ->last_snapshot is greater than 
> the generation of the file
> data extent item we set before. And the nocow rule is broken.
>
> Back to this above problem. I don't think it is a serious problem, we only 
> do COW once after
> the relocation, then we will still honour the nocow rule. The behaviour is 
> similar to snapshot.
> So maybe it needn't be fixed.

I would argue that for large vm workloads, running a balance or adding disks 
is a common practice that will result in a drastic drop in performance as 
well as massive increases in metadata writes and fragmentation.
In my case my disks were thrashing severely, performance was poor and ntp 
couldn't even hold my clock stable.
If the fix is nontrival please add this to the todo list.
Thanks,
Kyle

> If we must fix this problem, I think the only way is that get the 
> generation at the beginning
> of the space balance, and then set it to ->last_snapshot 
> if ->last_snapshot is less than it,
> don't use (current_generation - 1) to update the ->last_snapshot. Besides 
> that, don't forget
> to store the generation into btrfs_balance_item, or the problem will 
> happen after we resume the
> balance.
>
> Thanks
> Miao
>
>>
>> thanks,
>> liubo
>>
>> [...]
>>
>
 


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-05-30 16:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-02 20:41 Creating recursive snapshots for all filesystems Alexander Skwar
2013-05-03  8:48 ` Sander
2013-05-03 11:46   ` Alexander Skwar
2013-05-05 11:05 ` Kai Krakow
2013-05-05 12:59   ` Alexander Skwar
2013-05-05 16:03     ` Kai Krakow
2013-05-05 16:19       ` Alexander Skwar
2013-05-09 20:41     ` nocow 'C' flag ignored after balance Kyle Gates
2013-05-10  5:15       ` Liu Bo
2013-05-10 13:58         ` Kyle Gates
2013-05-16 19:11           ` Kyle Gates
2013-05-17  7:04             ` Liu Bo
2013-05-17 14:38               ` Kyle Gates
2013-05-28 14:22               ` Kyle Gates
2013-05-29  1:55                 ` Liu Bo
2013-05-29  8:33                   ` Miao Xie
2013-05-30 16:40                     ` Kyle Gates
2013-05-30 16:40                   ` Kyle Gates

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.