* All free space eaten during defragmenting (3.14)
@ 2014-05-31  7:19 Szőts Ákos
  2014-06-01  0:56 ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Szőts Ákos @ 2014-05-31  7:19 UTC (permalink / raw)
  To: linux-btrfs

Dear list,

I tried to do a full defragmentation of my $HOME directory (which doesn't 
contain any snapshots). After some hours of running, it stopped with a „No 
space left on device” error.

I checked and it ate about 50 GB of free space.
Before: Data, single: total=433.83GiB, used=~380.00GiB
After: Data, single: total=433.83GiB, used=~430.00GiB
Both times: Metadata, DUP: total=8.00GiB, used=7.08GiB

In the "btrfs fi df" man page I didn't find anything that is related to this 
phenomenon.

My questions are:
- Is it a bug or some consequence of the defrag process?
- Can I somehow reclaim the free space?

Command used: 
shopt -s dotglob
for i in *; do echo "$i"; btrfs fi defrag -clzo -r "$i"; done

Btrfs and kernel version:
Btrfs v3.12+20131125
Linux 3.14.4-1.gbebeb6f-desktop #1 SMP PREEMPT x86_64

Best regards,
Ákos


* Re: All free space eaten during defragmenting (3.14)
  2014-05-31  7:19 All free space eaten during defragmenting (3.14) Szőts Ákos
@ 2014-06-01  0:56 ` Duncan
  2014-06-01  1:56   ` Duncan
  2014-06-01 20:39   ` Peter Chant
  0 siblings, 2 replies; 9+ messages in thread
From: Duncan @ 2014-06-01  0:56 UTC (permalink / raw)
  To: linux-btrfs

Szőts Ákos posted on Sat, 31 May 2014 09:19:03 +0200 as excerpted:

> I tried to do a full defragmentation of my $HOME directory (which
> doesn't contain any snapshots). After some hours of running, it stopped
> with a „No space left on device” error.
> 
> I checked and it ate about 50 GB of free space.
> Before: Data, single: total=433.83GiB, used=~380.00GiB
> After: Data, single: total=433.83GiB, used=~430.00GiB
> Both times: Metadata, DUP: total=8.00GiB, used=7.08GiB
> 
> In the "btrfs fi df" man page I didn't find anything that is related to
> this phenomenon.
> 
> My questions are:
> - Is it a bug or some consequence of the defrag process?
> - Can I somehow reclaim the free space?
> 
> Command used:
> shopt -s dotglob
> for i in *; do echo "$i"; btrfs fi defrag -clzo -r "$i"; done
> 
> Btrfs and kernel version:
> Btrfs v3.12+20131125
> Linux 3.14.4-1.gbebeb6f-desktop #1 SMP PREEMPT x86_64

Your btrfs-progs version is old.  You may want to update it.  Current is 
3.14.2.

Seeing just the title, I was sure I knew what happened, but that would 
have been snapshot related, and you say above that you don't have any 
snapshots... of that directory.

But... what is your layout?  Subvolumes?  Where are they mounted?  Do you 
have snapshots of any of them?  Also, a btrfs filesystem show <mountpoint> 
should normally accompany a btrfs filesystem df, as it's pretty hard to 
interpret one without the other.
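
Something like the following, with your actual mountpoint substituted in, 
would tell us most of what we need to know:

# /mountpoint is only a placeholder; use wherever the btrfs is mounted
btrfs filesystem show /mountpoint
btrfs filesystem df /mountpoint
# this one lists the subvolumes/snapshots the filesystem contains
btrfs subvolume list /mountpoint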

Here's the deal.  Due to scaling issues the original snapshot aware 
defrag code was recently disabled, so defrag now doesn't worry about 
snapshots, only defragging whatever is currently mounted.  If you have a 
lot of fragmentation and are using snapshots, the defrag will copy all 
those fragmented files in order to defrag them, thus duplicating their 
blocks and doubling their required space.  Based on the title alone, 
that's what I /thought/ happened, and given what you did /not/ say, I 
actually still think it is the case and the below assumes that, tho I'm 
no longer entirely sure.

You do say your home dir doesn't contain snapshots, but is it 
snapshotted?  If it's on a snapshotted subvolume (or the main filesystem 
if you don't do subvolumes), it'll be snapshotted along with the 
subvolume, and that explains the extra usage as it's doubling the space.  
Had you included the layout and snapshot information, we could have seen 
from that whether this was the issue, or not, but...

Meanwhile, about the defrag options.  Defrag now has the recursive option 
(-r), which you used, so you really don't need to do the fancy loop stuff 
any more.  Just use the -r/recursive option and let it do the extra work.
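
Something like this single invocation does the same job as the loop (the 
path is just an example, point it at your own home dir):

# one recursive pass, no shell loop or dotglob needed
# (tho see below about what -c does when snapshots exist)
btrfs fi defrag -r -clzo /home/youruser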

The -c/compress option will trigger file compression as well as defrag.  
Normally this would cause them to use less space (or do nothing if you 
had consistently mounted with compress=lzo), but since snapshot-aware-
defrag is currently disabled, in the presence of snapshots of the same 
files it could cause more space to be used instead, since it's likely to 
cause files that don't need defragging to be rewritten as well.

As for reclaiming the space... assuming you have snapshots of /home 
(either as snapshots of the /home subvolume or of /, if you don't have a 
/home subvolume and / is your main mountpoint) as seems likely from the 
given symptoms, now that the defrag has duplicated the data, the way to 
recover the space would be to either delete all the pre-defrag snapshots 
so that copy can be freed, or revert to the last pre-defrag snapshot, and 
delete any snapshots made since, along with the working copy.  Either way 
you'd be deleting one of the two copies of the data; which one you keep is 
up to you.
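
If you go the delete-the-old-snapshots route, the mechanics are just the 
usual subvolume delete, something like this (the paths are made up; list 
yours first to see what's actually there):

btrfs subvolume list /mnt/toplevel
# delete the pre-defrag snapshots; the space comes back once the
# background cleaner has finished with them
btrfs subvolume delete /mnt/toplevel/snapshots/home-2014-05-30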

The other alternative, of course, would be to mkfs the entire filesystem 
and restore from your backups. =:^)

Meanwhile, do consider using the autodefrag mount option.  That should 
help keep fragmentation from getting out of hand in the first place, altho 
if you've run without it for a while, you're likely to already be highly 
fragmented, and you may see a slowdown for a while until the filesystem 
catches up with the fragmentation that's already there.  Tho again, if 
you're using snapshots, given that snapshot-aware-defrag is disabled ATM, 
that could trigger data duplication as well.
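
Turning it on is just a mount option, either in fstab or on a live 
remount, something like this (the mountpoint is only an example):

# add autodefrag to the options in /etc/fstab, or try it on the fly:
mount -o remount,autodefrag /home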

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: All free space eaten during defragmenting (3.14)
  2014-06-01  0:56 ` Duncan
@ 2014-06-01  1:56   ` Duncan
  2014-06-01 20:39   ` Peter Chant
  1 sibling, 0 replies; 9+ messages in thread
From: Duncan @ 2014-06-01  1:56 UTC (permalink / raw)
  To: linux-btrfs

Duncan posted on Sun, 01 Jun 2014 00:56:09 +0000 as excerpted:

> As for reclaiming the space...

One more option there I forgot to mention... There are various dedup 
tools out there.  See the wiki ( https://btrfs.wiki.kernel.org ) and the 
list archive for more, as I've not used any of them myself.  But in theory 
at least, the dedup tools should help you recover the space, and I /think/ 
they'll do so without forcing either snapshot deletion or a fresh 
mkfs.btrfs and restore from backup, those being the other options I 
mentioned.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: All free space eaten during defragmenting (3.14)
  2014-06-01  0:56 ` Duncan
  2014-06-01  1:56   ` Duncan
@ 2014-06-01 20:39   ` Peter Chant
  2014-06-01 22:47     ` Duncan
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Chant @ 2014-06-01 20:39 UTC (permalink / raw)
  To: linux-btrfs

I have a question that has arisen from reading one of Duncan's posts:

On 06/01/2014 01:56 AM, Duncan wrote:

> Here's the deal.  Due to scaling issues the original snapshot aware 
> defrag code was recently disabled, so defrag now doesn't worry about 
> snapshots, only defragging whatever is currently mounted.  If you have a 
> lot of fragmentation and are using snapshots, the defrag will copy all 
> those fragmented files in order to defrag them, thus duplicating their 
> blocks and doubling their required space.  Based on the title alone, 
> that's what I /thought/ happened, and given what you did /not/ say, I 
> actually still think it is the case and the below assumes that, tho I'm 
> no longer entirely sure.

The above implies to me that snapshots should not normally be mounted?
I may have misread the intent.

* * *

My thought is that I have a btrfs to hold data on my system, it contains
/home in a subvolume and also subvolumes for various other things.  I
take daily, hourly and weekly snapshots and my script does delete old
ones after a while.

I also mount the base/default btrfs file system on /mnt/data-pool.  This
means that my snapshots are available in their own subdirectory, so I
presume this means that they are mounted, if not in their own right, at
least they are as part of the default subvolume.  Given the
defragmentation discussion above should I be doing this or should my
setup ensure that they are not normally mounted?

I'm not aware of how you would create a subvolume that was outside of a
mounted part of the file system 'tree' - so if I did not want my
subvolumes mounted and I wanted snapshots then I'd have to mount the
default subvolume, make snapshots, and then unmount it?  This seems a
bit clumsy and I'm not convinced that this is a sensible plan.  I don't
think this is right, can anyone confirm or deny?

I do notice that I get pauses every so often, for a second or two or 
three, while I believe btrfs is doing 'stuff'.  Although annoying, I am 
prepared to live with this, as the subvolumes and snapshotting are 
valuable to me.

Pete


-- 
Peter Chant


* Re: All free space eaten during defragmenting (3.14)
  2014-06-01 20:39   ` Peter Chant
@ 2014-06-01 22:47     ` Duncan
  2014-06-02 20:54       ` Peter Chant
  0 siblings, 1 reply; 9+ messages in thread
From: Duncan @ 2014-06-01 22:47 UTC (permalink / raw)
  To: linux-btrfs

Peter Chant posted on Sun, 01 Jun 2014 21:39:18 +0100 as excerpted:

> I have a question that has arisen from reading one of Duncan's posts:
> 
> On 06/01/2014 01:56 AM, Duncan wrote:
> 
>> Here's the deal.  Due to scaling issues the original snapshot aware
>> defrag code was recently disabled, so defrag now doesn't worry about
>> snapshots, only defragging whatever is currently mounted.  If you have
>> a lot of fragmentation and are using snapshots, the defrag will copy
>> all those fragmented files in order to defrag them, thus duplicating
>> their blocks and doubling their required space.  Based on the title
>> alone, that's what I /thought/ happened, and given what you did /not/
>> say, I actually still think it is the case and the below assumes that,
>> tho I'm no longer entirely sure.
> 
> The above implies to me that snapshots should not normally be mounted? I
> may have misread the intent.

Indeed you misread, because I didn't say exactly what I meant and you 
found a different way of interpreting it that I didn't consider. =:^\

What I /meant/ was "only defragging what you pointed the defrag at", not 
the other snapshots of the same subvolume.  "Mounted" shouldn't have 
anything to do with it, except that I didn't consider the possibility of 
having the other snapshots mounted at the same time, so said "mounted" 
when I meant the one you pointed defrag at as I wasn't thinking about 
having the others mounted too.

> My thought is that I have a btrfs to hold data on my system, it contains
> /home in a subvolume and also subvolumes for various other things.  I
> take daily, hourly and weekly snapshots and my script does delete old
> ones after a while.
> 
> I also mount the base/default btrfs file system on /mnt/data-pool.  This
> means that my snapshots are available in their own subdirectory, so I
> presume this means that they are mounted, if not in their own right, at
> least they are as part of the default subvolume.  Given the
> defragmentation discussion above should I be doing this or should my
> setup ensure that they are not normally mounted?

Your setup is fine in that regard.  My mis-speak. =:^(

The question now is, did my mis-speak fatally flaw delivery of my 
intended point, or did you get it (at least after this correction) in 
spite of my mis-speak?

That point being in three parts... 

1) btrfs snapshots work without using too much space because of btrfs' 
copy-on-write (COW) nature.  Normally, unless there is a change in the 
data from that which was snapshotted, the data will occupy the same 
amount of space no matter how many times you snapshot it.

2) With snapshot-aware-defrag (ideal but currently disabled due to 
scaling issues with the current code), defrag would take account of all 
the snapshots containing the same data, and would change them /all/ to 
point to the new data location, when defragging a snapshotted file.

3) Unfortunately, with the snapshot-awareness disabled, defrag only 
touches the particular instance of the data (normally the online working 
instance) you actually pointed it at, while the other snapshots keep 
pointing at the old instance, still pinned to the old location.  
Rewriting that single instance breaks the COW link with the other 
instances, so the defragged data ends up duplicated.

> I'm not aware of how you would create a subvolume that was outside of a
> mounted part of the file system 'tree' - so if I did not want my
> subvolumes mounted and I wanted snapshots then I'd have to mount the
> default subvolume, make snapshots, and then unmount it?  This seems a
> bit clumsy and I'm not convinced that this is a sensible plan.  I don't
> think this is right, can anyone confirm or deny?

Mounting the "master" subvolume, making the snapshots, then unmounting, so 
the snapshots are only available while the "master" subvolume is mounted, 
is one valid way of handling things.  However, it's not the only way.  Your 
way, keeping the "master" mounted all the time as well, is also valid.  I 
simply forgot that case in my original mis-speak.
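
For the record, the dance itself is only a few lines, something like this 
(device, mountpoint and subvolume names made up for illustration):

mount -o subvolid=5 /dev/sdX /mnt/master          # mount the top-level subvolume
btrfs subvolume snapshot -r /mnt/master/home \
      /mnt/master/snaps/home-$(date +%Y%m%d)      # take a read-only snapshot
umount /mnt/master                                # and hide it again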

That said, there's a couple reasons one might go to the inconvenience of 
doing the mount/umount dance, so the snapshots are only available when 
they're actually being worked with.  The first is that unmounted data is 
less likely to be accidentally damaged (altho when it's subvolumes/
snapshots on the same master filesystem, the separation and protection 
from damage isn't as great as if they were entirely separate filesystems, 
but of course you can't snapshot to entirely separate filesystems).

The second and arguably more important reason has to do with security, 
specifically root escalation vulnerabilities.  Consider system updates 
that include a security update for such a root escalation vulnerability.  
Normally, you'd take a snapshot before doing the update, so as to have a 
chance to rollback to the pre-update snapshot in case something in the 
update goes wrong.  That's a good policy, but what happens to that 
security update?  Now the pre-update snapshot still contains the 
vulnerable version, even while the working copy is patched and is no 
longer vulnerable.  Now, if you keep those snapshots mounted and some bad 
guy gets user access to your system, they can access the still vulnerable 
copy in the pre-update snapshot to upgrade their user access to root. =:^(

Now most systems today are effectively single-human-user and that human 
user has root access anyway, so it's not the huge deal it would be on a 
full multi-user system.  However, just as best practice says don't run as 
root all the time, best practice also says don't leave those pre-update 
root-escalation vulnerable executables lying around for just anyone who 
happens to have user-level execute privileges to access.  Thus, keeping 
the "master" subvolume unmounted and access to those old snapshots 
restricted, except when actually working with the snapshots, is 
considered good policy, for the same reason that not "taking the name of 
root in vain" is considered good policy.

But it's your system and your policies, serving at your convenience.  So 
whether that's too much security at the price of too little convenience, 
is up to you. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: All free space eaten during defragmenting (3.14)
  2014-06-01 22:47     ` Duncan
@ 2014-06-02 20:54       ` Peter Chant
  2014-06-03  4:46         ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Chant @ 2014-06-02 20:54 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 06/01/2014 11:47 PM, Duncan wrote:

>>> Here's the deal.  Due to scaling issues the original snapshot aware
>>> defrag code was recently disabled, so defrag now doesn't worry about
>>> snapshots, only defragging whatever is currently mounted.  If you have
>>> a lot of fragmentation and are using snapshots, the defrag will copy
>>> all those fragmented files in order to defrag them, thus duplicating
>>> their blocks and doubling their required space.  Based on the title
>>> alone, that's what I /thought/ happened, and given what you did /not/
>>> say, I actually still think it is the case and the below assumes that,
>>> tho I'm no longer entirely sure.
>>
>> The above implies to me that snapshots should not normally be mounted? I
>> may have misread the intent.
> 
> Indeed you misread, because I didn't say exactly what I meant and you 
> found a different way of interpreting it that I didn't consider. =:^\
> 

I was mildly confused.  Situation normal...


> What I /meant/ was "only defragging what you pointed the defrag at", not 
> the other snapshots of the same subvolume.  "Mounted" shouldn't have 
> anything to do with it, except that I didn't consider the possibility of 
> having the other snapshots mounted at the same time, so said "mounted" 
> when I meant the one you pointed defrag at as I wasn't thinking about 
> having the others mounted too.

Interesting.  I have set autodefrag in fstab.  I _may_ have previously
tried to defrag the top-level subvolume - faint memory, that is
pointless, as if a file exists in more than one subvolume and it is
changed in one or more it cannot be optimally defragged in all subvols at
once if I understand it correctly - as bits of it are common and bits
differ?  Or maybe separate whole copies of the file are created?  So if
using snapshots only defrag the one you are actively using, if I
understand correctly.

Thanks for the hint, it has aided my understanding.


> 
>> My thought is that I have a btrfs to hold data on my system, it contains
>> /home in a subvolume and also subvolumes for various other things.  I
>> take daily, hourly and weekly snapshots and my script does delete old
>> ones after a while.
>>
>> I also mount the base/default btrfs file system on /mnt/data-pool.  This
>> means that my snapshots are available in their own subdirectory, so I
>> presume this means that they are mounted, if not in their own right, at
>> least they are as part of the default subvolume.  Given the
>> defragmentation discussion above should I be doing this or should my
>> setup ensure that they are not normally mounted?
> 
> Your setup is fine in that regard.  My mis-speak. =:^(


Don't think it is that big an issue!  I was slightly puzzled by a
potential implication of your post vs. my user-level knowledge of btrfs.

> 
> The question now is, did my mis-speak fatally flaw delivery of my 
> intended point, or did you get it (at least after this correction) in 
> spite of my mis-speak?
> 
> That point being in three parts... 
> 
> 1) btrfs snapshots work without using too much space because of btrfs' 
> copy-on-write (COW) nature.  Normally, unless there is a change in the 
> data from that which was snapshotted, the data will occupy the same 
> amount of space no matter how many times you snapshot it.
> 

Got this.  Killer feature, unless you want to store only tiny amounts of
data on huge disks whenever you snapshot.

> 2) With snapshot-aware-defrag (ideal but currently disabled due to 
> scaling issues with the current code), defrag would take account of all 
> the snapshots containing the same data, and would change them /all/ to 
> point to the new data location, when defragging a snapshotted file.
> 

This is an issue I'm not really up on, and is one of the things I was
reading with interest on the list.

> 3) Unfortunately, with the snapshot-awareness disabled, defrag only 
> touches the particular instance of the data (normally the online working 
> instance) you actually pointed it at, while the other snapshots keep 
> pointing at the old instance, still pinned to the old location.  
> Rewriting that single instance breaks the COW link with the other 
> instances, so the defragged data ends up duplicated.

So with what I am doing, creating snapshots for 'backup' purposes only,
this should not be a big issue as this will only affect the 'working
copy'.  (No, btrfs snapshots are not my backup solution.)

> 
>> I'm not aware of how you would create a subvolume that was outside of a
>> mounted part of the file system 'tree' - so if I did not want my
>> subvolumes mounted and I wanted snapshots then I'd have to mount the
>> default subvolume, make snapshots, and then unmount it?  This seems a
>> bit clumsy and I'm not convinced that this is a sensible plan.  I don't
>> think this is right, can anyone confirm or deny?
> 
> Mounting the "master" subvolume, making the snapshots, then unmounting, so 
> the snapshots are only available while the "master" subvolume is mounted, 
> is one valid way of handling things.  However, it's not the only way.  Your 
> way, keeping the "master" mounted all the time as well, is also valid.  I 
> simply forgot that case in my original mis-speak.
> 
> That said, there's a couple reasons one might go to the inconvenience of 
> doing the mount/umount dance, so the snapshots are only available when 
> they're actually being worked with.  The first is that unmounted data is 
> less likely to be accidentally damaged (altho when it's subvolumes/
> snapshots on the same master filesystem, the separation and protection 
> from damage isn't as great as if they were entirely separate filesystems, 
> but of course you can't snapshot to entirely separate filesystems).
> 

The protection from damage could also, or perhaps better, be enforced
using read-only snapshots?

> The second and arguably more important reason has to do with security, 
> specifically root escalation vulnerabilities.  Consider system updates 
> that include a security update for such a root escalation vulnerability.  
> Normally, you'd take a snapshot before doing the update, so as to have a 
> chance to rollback to the pre-update snapshot in case something in the 
> update goes wrong.  That's a good policy, but what happens to that 
> security update?  Now the pre-update snapshot still contains the 
> vulnerable version, even while the working copy is patched and is no 
> longer vulnerable.  Now, if you keep those snapshots mounted and some bad 
> guy gets user access to your system, they can access the still vulnerable 
> copy in the pre-update snapshot to upgrade their user access to root. =:^(
> 
> Now most systems today are effectively single-human-user and that human 
> user has root access anyway, so it's not the huge deal it would be on a 
> full multi-user system.  However, just as best practice says don't run as 
> root all the time, best practice also says don't leave those pre-update 
> root-escalation vulnerable executables lying around for just anyone who 
> happens to have user-level execute privileges to access.  Thus, keeping 
> the "master" subvolume unmounted and access to those old snapshots 
> restricted, except when actually working with the snapshots, is 
> considered good policy, for the same reason that not "taking the name of 
> root in vain" is considered good policy.
> 
> But it's your system and your policies, serving at your convenience.  So 
> whether that's too much security at the price of too little convenience, 
> is up to you. =:^)
> 

This is an interesting point.  The changes are not too radical, all I
need to do is add code to my snapshot scripts to mount and unmount my
toplevel btrfs tree when performing a snapshot. Not sure if this causes
any significant time penalty as in slowing of the system with any heavy
IO.  Since snapshots are run by cron then the time taken to complete is
not critical, rather whether the act of mounting and unmounting causes
any slowing due to heavy IO.

It does not seem to offer many absolutes in the way of security, or does
it?  I suppose it does, for a normal user, remove access to older binaries
that may have shortcomings.  I suspect permissions could solve that as
well.

Food for thought in any case.  Thank you.

Pete




-- 
Peter Chant


* Re: All free space eaten during defragmenting (3.14)
  2014-06-02 20:54       ` Peter Chant
@ 2014-06-03  4:46         ` Duncan
  2014-06-03 22:21           ` Peter Chant
  0 siblings, 1 reply; 9+ messages in thread
From: Duncan @ 2014-06-03  4:46 UTC (permalink / raw)
  To: linux-btrfs

Peter Chant posted on Mon, 02 Jun 2014 21:54:56 +0100 as excerpted:

>> What I /meant/ was "only defragging what you pointed the defrag at",
>> not the other snapshots of the same subvolume.  "Mounted" shouldn't
>> have anything to do with it, except that I didn't consider the
>> possibility of having the other snapshots mounted at the same time, so
>> said "mounted" when I meant the one you pointed defrag at as I wasn't
>> thinking about having the others mounted too.
> 
> Interesting.  I have set autodefrag in fstab.  I _may_ have previously
> tried to defrag the top-level subvolume - faint memory, that is
> pointless, as if a file exists in more than one subvolume and it is
>> changed in one or more it cannot be optimally defragged in all subvols at
> once if I understand it correctly - as bits of it are common and bits
> differ?  Or maybe separate whole copies of the file are created?  So if
> using snapshots only defrag the one you are actively using, if I
> understand correctly.

Hmm... that brings up an interesting question.  I know snapshots stop at 
subvolume boundaries, but I haven't the foggiest how the -r/recursive 
option to defrag behaves.  Does defrag stop at subvolume boundaries (and 
thus snapshot boundaries, as they're simply special-case subvolumes that 
point at the same data as another subvolume as of the time they were 
taken) too?  If not, what about entirely separate filesystem boundaries 
where a second btrfs filesystem happens to be mounted inside the 
recursively defragged tree?  I simply don't know, tho I strongly suspect 
it doesn't cross full filesystem boundaries, at least.

Of course if you were using something like find and executing defrag on 
each found entry, then yes it would recurse, as find would recurse across 
filesystems and keep going (unless you told it not to using find's -xdev 
option).
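
For illustration, that approach looked something like this (the path is 
just an example):

# -xdev keeps find from descending into other mounted filesystems
find /home -xdev -type f -exec btrfs filesystem defragment {} +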


Meanwhile, you mention the autodefrag mount option.  Assuming you have it 
on all the time, there shouldn't be that much to defrag, *EXCEPT* if the -c/
compress option is used as well.  If you aren't also using the compress 
mount option by default, then you are effectively telling defrag to 
compress everything as it goes, so it will defrag-and-compress all 
files.  Which wouldn't be a problem with snapshot-aware-defrag as it'd 
compress for all snapshots at the same time too.  But with snapshot-aware-
defrag currently disabled, that would effectively force ALL files to be 
rewritten in order to compress them, thereby breaking the COW link with 
the other snapshots and duplicating ALL data.

Which would SERIOUSLY increase data usage, doubling it, except that the 
compression would reduce the size of the new version, so perhaps only a 
50% increase in data usage, with the caveat that the effectiveness of the 
compression and thus the 50% number would vary greatly depending on the 
compressibility of the data in question.

Thus, if the OP were NOT using compression previously, it was the -clzo 
that /really/ blew up the data usage, as without snapshot-aware-
defrag enabled he was effectively duplicating everything that defrag saw 
in order to compress it!  (If he was using the compress=lzo option 
before and had always used it, then adding the -clzo to defrag shouldn't 
have mattered at all, since the compress mount option would have done the 
same thing during the defrag as the defrag compress option.)

I guess that wasn't quite the intended effect of adding the -clzo flag!  
All because of the lack of snapshot-aware-defrag.

>> 2) With snapshot-aware-defrag (ideal but currently disabled due to
>> scaling issues with the current code), defrag would take account of all
>> the snapshots containing the same data, and would change them /all/ to
>> point to the new data location, when defragging a snapshotted file.
>> 
>> 
> This is an issue I'm not really up on, and is one of the things I was
> reading with interest on the list.
> 
>> 3) Unfortunately, with the snapshot-awareness disabled, defrag only
>> touches the particular instance of the data (normally the online working
>> instance) you actually pointed it at, while the other snapshots keep
>> pointing at the old instance, still pinned to the old location.
>> Rewriting that single instance breaks the COW link with the other
>> instances, so the defragged data ends up duplicated.
> 
> So with what I am doing, creating snapshots for 'backup' purposes only,
> this should not be a big issue as this will only affect the 'working
> copy'.  (No, btrfs snapshots are not my backup solution.)

If the data that you're trying to defrag is snapshotted, the defrag will 
currently break the COW link and double usage.  However, as long as you 
have the space to spare and are deleting the snapshots in a reasonable 
time (as it sounds like you are since it seems you're doing snapshots 
only to enable a stable backup), once you delete all the snapshots from 
before the defrag, you should get the space back, so it's not a permanent 
issue.

>> That said, there's a couple reasons one might go to the inconvenience
>> of doing the mount/umount dance, so the snapshots are only available
>> when they're actually being worked with.  The first is that unmounted
>> data is less likely to be accidentally damaged (altho when it's
>> subvolumes/ snapshots on the same master filesystem, the separation and
>> protection from damage isn't as great as if they were entirely separate
>> filesystems, but of course you can't snapshot to entirely separate
>> filesystems).
>> 
>> 
> The protection from damage could also, or perhaps better, be enforced
> using read-only snapshots?

Yes.  But you can put me in the multiple independent btrfs filesystems, 
each on their own partitions, camp.  My problem in principle with one big 
filesystem with subvolumes and snapshots, is that should something happen 
to damage that filesystem such that it cannot be fully recovered, all 
those snapshot and subvolume "data eggs" are in the same filesystem 
"basket", and if it drops, all those eggs are lost at the same time!

So I still vastly prefer traditional partitioning methods, with several 
independent filesystems each on their own partition, and in fact, backup 
partitions/filesystems as well, with the primary backups on partitions on 
the same pair of (mostly btrfs raid1) physical devices.  That way, if one 
btrfs filesystem or even all that were currently mounted go unrecoverably 
bad at the same time, the damage is limited, and I still have the first-
backups on the same device-pair I can boot to.  (FWIW, I have additional 
backups on other devices, just in case it's the operating device pair 
that go bad at the same time, tho I don't necessarily keep them to the 
same level of currency, as I don't consider the risk of both operating 
devices going bad at the same time all that high and accept that level of 
risk should it actually occur.)

So I'm used to unmounted meaning the whole filesystem is not in use and 
therefore reasonably safe from damage, while if it's only subvolumes/
snapshots on the same master filesystem, the level of safety in keeping 
them unmounted (or read-only mounted if mounted at all) isn't really 
comparable to the entirely separate filesystem case.  But certainly, 
there's still /some/ benefit to it.  But that's why I added the 
parenthetical caveat, because in the middle of writing that paragraph, I 
realized that the safety element wasn't as big a deal as I had originally 
thought when I started the paragraph, because I'm used to dealing with 
the separate filesystems case and that didn't apply here.

>> The second and arguably more important reason has to do with security,
>> specifically root escalation vulnerabilities.  Consider system updates
>> that include a security update for such a root escalation
>> vulnerability. Normally, you'd take a snapshot before doing the update,
>> so as to have a chance to rollback to the pre-update snapshot in case
>> something in the update goes wrong.  That's a good policy, but what
>> happens to that security update?  Now the pre-update snapshot still
>> contains the vulnerable version, even while the working copy is patched
>> and is no longer vulnerable.  Now, if you keep those snapshots mounted
>> and some bad guy gets user access to your system, they can access the
>> still vulnerable copy in the pre-update snapshot to upgrade their user
>> access to root. =:^(
>> 
> This is an interesting point.  The changes are not too radical, all I
> need to do is add code to my snapshot scripts to mount and unmount my
> toplevel btrfs tree when performing a snapshot. Not sure if this causes
>> any significant time penalty as in slowing of the system with any heavy
> IO.  Since snapshots are run by cron then the time taken to complete is
> not critical, rather whether the act of mounting and unmounting causes
> any slowing due to heavy IO.

Lest there be any confusion I should note that idea isn't original to 
me.  But as I'm reasonably security focused, once I read it on the list, 
it definitely ranked rather high on my "snapshots considerations" list, 
and you can bet I'll never have the master subvolume routinely mounted 
here as a result!

Meanwhile, unless there's something strange going on, mounts shouldn't 
affect ongoing I/O much at all.  Umounts are slightly different, in that 
on btrfs there can be some housekeeping that must be done before the 
filesystem is fully unmounted that could in theory disrupt ongoing I/O 
temporarily, but that's limited to writable mounts where some serious 
write-activity occurred, such that if you're just mounting to do a 
snapshot and umounting again, I don't believe that should be a problem, 
since in the normal case there will be only a bit of metadata to update 
from the process of doing the snapshot.

FWIW, while I actually don't do much snapshotting here, I have something 
similar set up for my "packages" filesystem, which is unmounted unless I'm 
doing system updates or package queries, and for my rootfs, which is 
mounted read-only, again unless I'm updating it.  My package-tree-update 
scripts check to see if the packages filesystem is mounted and if not 
mount it, and remount my rootfs read-write, before syncing the packages-
tree from remote.  When I'm done, I have another script that umounts the 
packages tree, and remounts the rootfs ro once again.
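
In sketch form it's something like this (the mountpoints and the sync step 
are specific to my setup and only placeholders here):

#!/bin/sh
# mount the packages filesystem if it isn't already, make / writable
mountpoint -q /mnt/packages || mount /mnt/packages
mount -o remount,rw /
# ... sync the package tree and do the updates here ...
# the companion script reverses it afterward:
umount /mnt/packages
mount -o remount,ro /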

And you're right, in comparison to the rest of the scripts, the mounting 
bit is actually quite trivial. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: All free space eaten during defragmenting (3.14)
  2014-06-03  4:46         ` Duncan
@ 2014-06-03 22:21           ` Peter Chant
  2014-06-04  9:21             ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Chant @ 2014-06-03 22:21 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 06/03/2014 05:46 AM, Duncan wrote:

>> Interesting.  I have set autodefrag in fstab.  I _may_ have previously
>> tried to defrag the top-level subvolume - faint memory, that is
>> pointless, as if a file exists in more than one subvolume and it is
>> changed in one or more it cannot be optimally defragged in all subvols at
>> once if I understand it correctly - as bits of it are common and bits
>> differ?  Or maybe separate whole copies of the file are created?  So if
>> using snapshots only defrag the one you are actively using, if I
>> understand correctly.
> 
> Hmm... that brings up an interesting question.  I know snapshots stop at 
> subvolume boundaries, but I haven't the foggiest how the -r/recursive 
> option to defrag behaves.  Does defrag stop at subvolume boundaries (and 
> thus snapshot boundaries, as they're simply special-case subvolumes that 
> point at the same data as another subvolume as of the time they were 
> taken) too?  If not, what about entirely separate filesystem boundaries 
> where a second btrfs filesystem happens to be mounted inside the 
> recursively defragged tree?  I simply don't know, tho I strongly suspect 
> it doesn't cross full filesystem boundaries, at least.

I'm not a dev so this is going rather far beyond my knowledge...

> 
> Of course if you were using something like find and executing defrag on 
> each found entry, then yes it would recurse, as find would recurse across 
> filesystems and keep going (unless you told it not to using find's -xdev 
> option).


I did not know the recursive option existed.  However, I'd previously
cursed the tools not having a recursive option or being recursive by
default.  If there is now a recursive option it would be really perverse
to use find to implement a recursive defrag.

> 
> 
> Meanwhile, you mention the autodefrag mount option.  Assuming you have it 
> on all the time, there shouldn't be that much to defrag, *EXCEPT* if the -c/
> compress option is used as well.  If you aren't also using the compress 
> mount option by default, then you are effectively telling defrag to 
> compress everything as it goes, so it will defrag-and-compress all 
> files.  Which wouldn't be a problem with snapshot-aware-defrag as it'd 
> compress for all snapshots at the same time too.  But with snapshot-aware-
> defrag currently disabled, that would effectively force ALL files to be 
> rewritten in order to compress them, thereby breaking the COW link with 
> the other snapshots and duplicating ALL data.

I've got compress=lzo, options from fstab:
device=/dev/sdb,device=/dev/sdc,autodefrag,defaults,inode_cache,noatime,
compress=lzo

I'm running kernel 3.13.6.  Not sure if snapshot-aware-defrag is enabled
or disabled in this version.  Unfortunately I really don't understand
how COW works here.  I understand the basic idea but have no idea how it
is implemented in btrfs or any other fs.


> 
> Which would SERIOUSLY increase data usage, doubling it, except that the 
> compression would reduce the size of the new version, so perhaps only a 
> 50% increase in data usage, with the caveat that the effectiveness of the 
> compression and thus the 50% number would vary greatly depending on the 
> compressibility of the data in question.
> 

>>> 3) Unfortunately, with the snapshot-awareness disabled, defrag only
>>> touches the particular instance of the data (normally the online working
>>> instance) you actually pointed it at, while the other snapshots keep
>>> pointing at the old instance, still pinned to the old location.
>>> Rewriting that single instance breaks the COW link with the other
>>> instances, so the defragged data ends up duplicated.
>>
>> So with what I am doing, creating snapshots for 'backup' purposes only,
>> this should not be a big issue as this will only affect the 'working
>> copy'.  (No, btrfs snapshots are not my backup solution.)
> 
> If the data that you're trying to defrag is snapshotted, the defrag will 
> currently break the COW link and double usage.  However, as long as you 
> have the space to spare and are deleting the snapshots in a reasonable 
> time (as it sounds like you are since it seems you're doing snapshots 
> only to enable a stable backup), once you delete all the snapshots from 
> before the defrag, you should get the space back, so it's not a permanent 
> issue.

Hmm.  From your previous discussion I get the impression that it is not
a problem if it has always compressed, or always not compressed, but it
blows up if the compression setting is changed - i.e. the compressed
file and uncompressed file are effectively completely different.


> 
>>> That said, there's a couple reasons one might go to the inconvenience
>>> of doing the mount/umount dance, so the snapshots are only available
>>> when they're actually being worked with.  The first is that unmounted
>>> data is less likely to be accidentally damaged (altho when it's
>>> subvolumes/ snapshots on the same master filesystem, the separation and
>>> protection from damage isn't as great as if they were entirely separate
>>> filesystems, but of course you can't snapshot to entirely separate
>>> filesystems).
>>>
>>>
>> The protection from damage could also, or perhaps better, be enforced
>> using read-only snapshots?
> 
> Yes.  But you can put me in the multiple independent btrfs filesystems, 
> each on their own partitions, camp.  My problem in principle with one big 
> filesystem with subvolumes and snapshots, is that should something happen 
> to damage that filesystem such that it cannot be fully recovered, all 
> those snapshot and subvolume "data eggs" are in the same filesystem 
> "basket", and if it drops, all those eggs are lost at the same time!
> 

My 'main' backup is to rsync to an ext4-formatted drive.  I have a second
backup (reminder to use it).  That one is btrfs and uses snapshots.
However, I rsync to it as well; I'm assuming that if the btrfs I am
backing up is corrupted, there is a danger that send/receive could
propagate errors?  Without knowing any better it seems like something
worth eliminating.


> So I still vastly prefer traditional partitioning methods, with several 
> independent filesystems each on their own partition, and in fact, backup 
> partitions/filesystems as well, with the primary backups on partitions on 
> the same pair of (mostly btrfs raid1) physical devices.  That way, if one 
> btrfs filesystem or even all that were currently mounted go unrecoverably 
> bad at the same time, the damage is limited, and I still have the first-
> backups on the same device-pair I can boot to.  (FWIW, I have additional 
> backups on other devices, just in case it's the operating device pair 
> that go bad at the same time, tho I don't necessarily keep them to the 
> same level of currency, as I don't consider the risk of both operating 
> devices going bad at the same time all that high and accept that level of 
> risk should it actually occur.)
> 
> So I'm used to unmounted meaning the whole filesystem is not in use and 
> therefore reasonably safe from damage, while if it's only subvolumes/
> snapshots on the same master filesystem, the level of safety in keeping 
> them unmounted (or read-only mounted if mounted at all) isn't really 
> comparable to the entirely separate filesystem case.  But certainly, 
> there's still /some/ benefit to it.  But that's why I added the 
> parenthetical caveat, because in the middle of writing that paragraph, I 
> realized that the safety element wasn't as big a deal as I had originally 
> thought when I started the paragraph, because I'm used to dealing with 
> the separate filesystems case and that didn't apply here.
> 

I've amended my scripts so the toplevel subvol and snapshots are now
only mounted during snapshot creation and deletion.

>>> The second and arguably more important reason has to do with security,
>>> specifically root escalation vulnerabilities.  Consider system updates
>>> that include a security update for such a root escalation
>>> vulnerability. Normally, you'd take a snapshot before doing the update,
>>> so as to have a chance to rollback to the pre-update snapshot in case
>>> something in the update goes wrong.  That's a good policy, but what
>>> happens to that security update?  Now the pre-update snapshot still
>>> contains the vulnerable version, even while the working copy is patched
>>> and is no longer vulnerable.  Now, if you keep those snapshots mounted
>>> and some bad guy gets user access to your system, they can access the
>>> still vulnerable copy in the pre-update snapshot to upgrade their user
>>> access to root. =:^(
>>>
>> This is an interesting point.  The changes are not too radical, all I
>> need to do is add code to my snapshot scripts to mount and unmount my
>> toplevel btrfs tree when performing a snapshot. Not sure if this causes
>>> any significant time penalty as in slowing of the system with any heavy
>> IO.  Since snapshots are run by cron then the time taken to complete is
>> not critical, rather whether the act of mounting and unmounting causes
>> any slowing due to heavy IO.
> 
> Lest there be any confusion I should note that idea isn't original to 
> me.  But as I'm reasonably security focused, once I read it on the list, 
> it definitely ranked rather high on my "snapshots considerations" list, 
> and you can bet I'll never have the master subvolume routinely mounted 
> here as a result!
> 
> Meanwhile, unless there's something strange going on, mounts shouldn't 
> affect ongoing I/O much at all.  Umounts are slightly different, in that 
> on btrfs there can be some housekeeping that must be done before the 
> filesystem is fully unmounted that could in theory disrupt ongoing I/O 
> temporarily, but that's limited to writable mounts where some serious 
> write-activity occurred, such that if you're just mounting to do a 
> snapshot and umounting again, I don't believe that should be a problem, 
> since in the normal case there will be only a bit of metadata to update 
> from the process of doing the snapshot.
> 

This is an interesting point.  When I first modified my scripts to
mount/umount the top-level sub-volume I found things slowing
dramatically.  Heavy disk IO and usage of btrfs-cleaner, btrfs-transact
and btrfs-submit for minutes on end, with only brief pauses during which
the system became usable.

Something else odd seems to be happening right now.  I'm cleaning out
some directories to free up disk space, /tmp-old out of / and also
associated snapshots.  This is on SSD but I can hear my traditional HDDs
thrashing.  Separate btrfs file systems.  Presumably a coincidence.

Hopefully things will settle down.  Though the system is still doing a
lot of disk io it is a lot more usable than earlier.

Pete



-- 
Peter Chant


* Re: All free space eaten during defragmenting (3.14)
  2014-06-03 22:21           ` Peter Chant
@ 2014-06-04  9:21             ` Duncan
  0 siblings, 0 replies; 9+ messages in thread
From: Duncan @ 2014-06-04  9:21 UTC (permalink / raw)
  To: linux-btrfs

Peter Chant posted on Tue, 03 Jun 2014 23:21:55 +0100 as excerpted:

> On 06/03/2014 05:46 AM, Duncan wrote:
> 
>> Of course if you were using something like find and executing defrag on
>> each found entry, then yes it would recurse, as find would recurse
>> across filesystems and keep going (unless you told it not to using
>> find's -xdev option).
> 
> I did not know the recursive option existed.  However, I'd previously
> cursed the tools not having a recursive option or being recursive by
> default.  If there is now a recursive option it would be really perverse
> to use find to implement a recursive defrag.

Defrag's -r/recursive option is reasonably new, but checking the btrfs-
progs git tree (since I run the git version) says that was commit 
c2c5353b, which git describe says was v0.19-725, so it should be in btrfs-
progs v3.12.  So it's not /that/ new.  Anyone still running something 
earlier than that really should update. =:^)
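
If you want to check for yourself in a btrfs-progs clone, it's something 
like:

git describe c2c5353b        # gives the v0.19-725 result mentioned above
git tag --contains c2c5353b  # lists the release tags that include it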

But the wiki recommended using find from back before the builtin 
recursive option, and I can well imagine people with already working 
scripts not wanting to fix what isn't (for them) broken. =:^)  So I 
imagine there will be find-and-defrag users for some time, tho they 
should even now be on their way to becoming a rather small percentage,
at least for folks following the keep-current recommendations.

Meanwhile, this question is bugging me so let me just ask it.  The OP was 
from a different email address (szotsaki@gmail), and once I noticed that 
I've been assuming that you and the OP are different people, tho in my 
first reply to you I assumed you were the OP.  So just to clear things 
up, different people and I can't assume that what he wrote about his case 
applies to you, correct? =:^)

>> Meanwhile, you mention the autodefrag mount option.  Assuming you have
>> it on all the time, there shouldn't be that much to defrag, *EXCEPT* if
>> the -c/ compress option is used as well.  If you aren't also using the
>> compress mount option by default, then you are effectively telling
>> defrag to compress everything as it goes, so it will
>> defrag-and-compress all files.  Which wouldn't be a problem with
>> snapshot-aware-defrag as it'd compress for all snapshots at the same
>> time too.  But with snapshot-aware-
>> defrag currently disabled, that would effectively force ALL files to be
>> rewritten in order to compress them, thereby breaking the COW link
>> with the other snapshots and duplicating ALL data.
> 
> I've got compress=lzo, options from fstab:
> device=/dev/sdb,device=/dev/sdc,autodefrag,defaults,inode_cache,noatime,
> compress=lzo
> 
> I'm running kernel 3.13.6.  Not sure if snapshot-aware-defrag is enabled
> or disabled in this version.

A git search says (Linus' mainline tree) commit 8101c8db, merge commit 
878a876b, with git describe labeling the merge commit as v3.14-rc1-13, so 
it would be in v3.14-rc2.  However, the commit in question was CCed to 
stable@, so it should have made it into a 3.13.x stable release as well.  
Whether it's in 3.13.6 specifically, I couldn't say without checking the 
stable tree or changelog, which should be easier for you to do since 
you're actually running it.  (Hint, I simply searched on "defrag", here; 
it ended up being the third hit back from 3.14.0, I believe, so it 
shouldn't be horribly buried, at least.)
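
In a stable kernel clone that check is just a changelog grep, something 
along these lines (the tag range is only an example):

# list btrfs commits that went into the 3.13 stable series and look for it
git log --oneline v3.13..v3.13.6 -- fs/btrfs/ | grep -i defrag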

> Unfortunately I really don't understand how COW works here.
> I understand the basic idea but have no idea how it is implemented
> in btrfs or any other fs.

FWIW, I think only the kernel/filesystem or at least developer types 
/really/ understand COW, but I /think/ I have a reasonable sysadmin's-
level understanding of the practical effects in terms of btrfs, simply 
from watching the list.

Meanwhile, not that it has any bearing on this thread, but about your 
mount options, FWIW you may wish to remove that inode_cache option.  I 
don't claim to have a full understanding, but from what I've picked up 
from various dev remarks, it's not necessary at all on 64-bit systems 
(well, unless you have really small files filling an exabyte size 
filesystem!) since the inode-space is large enough finding free inode 
numbers isn't an issue, and while it can be of help in specific 
situations on 32-bit systems, there's two problems with it that make it 
not suitable for the general case: (1) on large filesystems (I'm not sure 
how large but I'd guess it's TiB scale) there's danger of inode-number-
collision due to 32-bit-overflow, and (2) it must be regenerated at every 
mount, which at least on TiB-scale spinning rust can trigger several 
minutes of intense drive activity while it does so.  (The btrfs wiki now 
says it's not recommended, but has a somewhat different explanation.  
While I'm not a coder and thus in no position to say for sure based on 
the code, I believe the wiki's explanation isn't quite correct, but 
either way, it's still not recommended.)

The use-cases where inode_cache might be worthwhile are thus all 32-bit, 
and include things like busy email servers with lots of files being 
constantly created/deleted.  If in doubt, disable it.

Oh, and while I'm at it, I might as well mention that the "defaults" 
mount option is normally not necessary, except as a field-holder in fstab 
if no non-default options are being used as well.  That's the whole point 
of "defaults", that they're default, and thus don't need passed.  Tho 
(unlike inode_cache) it does no harm.
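
Applied to the fstab line you posted, minus inode_cache and the redundant 
defaults, that would leave something like this (the mountpoint and the 
first device field are just examples; keep whatever you use now):

/dev/sdb  /mnt/data-pool  btrfs  device=/dev/sdb,device=/dev/sdc,autodefrag,noatime,compress=lzo  0 0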

>> If the data that you're trying to defrag is snapshotted, the defrag
>> will currently break the COW link and double usage.  However, as long
>> as you have the space to spare and are deleting the snapshots in a
>> reasonable time (as it sounds like you are since it seems you're doing
>> snapshots only to enable a stable backup), once you delete all the
>> snapshots from before the defrag, you should get the space back, so
>> it's not a permanent issue.
> 
> Hmm.  From your previous discussion I get the impression that it is not
> a problem if it has always compressed, or always not compressed, but it
> blows up if the compression setting is changed - i.e. the compressed
> file and uncompressed file are effectively completely different.

Yes.  Except that I'm not /absolutely/ sure that a "null defrag", that 
is, one that doesn't have anything to do since everything is already 
defragged and compression is the same either way, actually does nothing, 
thereby leaving the COW linkages intact.  I /believe/ that to be the 
case, and if it /is/ the case, a defrag with the same compression on 
already defragged (due to autodefrag) data /shouldn't/ blow up data usage 
against snapshots, since it wouldn't actually move anything, but I don't 
/know/ that for sure and not being a coder I can't so easily just go and 
look at the code to see, either.

And since I basically don't use snapshots, preferring actual backups, it 
wouldn't be that easy for me to check by simply trying a few GiBs of 
defrag (since I too use compress=lzo,autodefrag) and comparing before and 
after usage, since there's no snapshots that it'd be duplicating against.

But it should be a fairly easy thing to check.  (This is where I wonder 
if you're the OP.  Obviously if so, /something/ triggered that change in 
size.)

Assuming you're not the OP and you're more concerned about future 
behavior, since you're already using autodefrag and snapshots, a before 
and after btrfs filesystem df check, with a defrag of a few GiB between, 
enough to tick over a couple digits of data usage on the df should it be 
doubling things, should help prove one way or the other.  If that doesn't 
increase usage, the same test but with say -czlib, since you're currently 
using lzo, should force the COW-link breakage and double the usage for 
that few GiB, thereby proving both the leave-alone case of no change, and 
the COW-link breakage doubling.  It'd cost that few GiB of extra space 
usage, but should clear up the question, one way or the other.
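
Spelled out, it's just a before/after df around a couple of targeted 
defrag runs, something like this (paths made up; pick a few-GiB directory 
that's covered by your snapshots):

btrfs fi df /home                          # note Data used before
btrfs fi defrag -r /home/testdir           # same compression as the mount option: should change little
btrfs fi df /home
btrfs fi defrag -r -czlib /home/testdir    # forced recompress: expect usage to jump for snapshotted data
btrfs fi df /home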

> My 'main' backup is to rsync to an ext4-formatted drive.  I have a second
> backup (reminder to use it).  That one is btrfs and uses snapshots.
> However, I rsync to it as well; I'm assuming that if the btrfs I am
> backing up is corrupted, there is a danger that send/receive could
> propagate errors?  Without knowing any better it seems like something
> worth eliminating.

From what I've seen onlist, if send/receive completes on both sides (send 
and receive) without error, you have a pretty reliable backup.  If 
there's anything wrong, one side or the other errors out.  Actually, due 
to various corner-case bugs they're still flushing out, send/receive can 
error out even if both sides are fine, too, just because something 
happened that nobody thought of yet.  One recently fixed bug, for 
example, was a scenario where two subdirs were originally nested B inside 
of A, but switched positions so A was inside B.  Until that fix, send/
receive couldn't make sense of that situation and would simply error out.

So as long as send/receive works it should be reliable.  The trouble 
isn't that it propagates filesystem corruption as it'll error out before 
it does that, but that at present there's still enough legitimate but 
strange corner-cases that it errors out on, that it isn't a reliable 
backup mechanism for /that/ reason, not because it'll propagate 
corruption.  So you can use it with confidence as long as it's working.  
Just be prepared for it to quit working without notice, even after it has 
been working reliably for awhile, and have a fallback backup method ready 
to go should that happen.

That said, since you're already using rsync, I'd suggest staying with 
that, for now.  There will be plenty of time to switch to the more 
efficient btrfs send/receive after both btrfs and send/receive have 
matured rather longer.

> I've amended my scripts so the toplevel subvol and snapshots are now
> only mounted during snapshot creation and deletion.
> 
>>> This is an interesting point.  The changes are not too radical, all I
>>> need to do is add code to my snapshot scripts to mount and unmount my
>>> toplevel btrfs tree when performing a snapshot. Not sure if this
>>> causes any significant time penalty as in slowing of the system with
>>> any heavy IO.  Since snapshots are run by cron then the time taken to
>>> complete is not critical, rather whether the act of mounting and
>>> unmounting causes any slowing due to heavy IO.
>> 
>> [U]nless there's something strange going on, mounts shouldn't
>> affect ongoing I/O much at all.  Umounts are slightly different, in
>> that on btrfs there can be some housekeeping that must be done before
>> the filesystem is fully unmounted that could in theory disrupt ongoing
>> I/O temporarily, but that's limited to writable mounts where some
>> serious write-activity occurred, such that if you're just mounting to
>> do a snapshot and umounting again, I don't believe that should be a
>> problem, since in the normal case there will be only a bit of metadata
>> to update from the process of doing the snapshot.
>> 
>> 
> This is an interesting point.  When I first modified my scripts to
> mount/umount the top-level sub-volume I found things slowing
> dramatically.  Heavy disk IO and usage of btrfs-cleaner, btrfs-transact
> and btrfs-submit for minutes on end, with only brief pauses during which
> the system became usable.

That /might/ be the inode_cache thing.  Like I said, that's not 
recommended, with one of the down sides being high I/O at mount.  I 
definitely wasn't considering it when I said mounts shouldn't affect 
ongoing I/O!

So try without that, tho I can't say that's the /entire/ problem, but it 
certainly won't be helping things!

> Something else odd seems to be happening right now.  I'm cleaning out
> some directories to free up disk space, /tmp-old out of / and also
> associated snapshots.  This is on SSD but I can hear my traditional HDDs
> thrashing.  Separate btrfs file systems.  Presumably a coincidence.
> 
> Hopefully things will settle down.  Though the system is still doing a
> lot of disk io it is a lot more usable than earlier.

One other thing that might be part of it.  Currently, btrfs does a lot of 
re-scanning (effectively btrfs device scan) as neither the userspace nor 
the kernel properly caches and reuses active btrfs filesystem and device 
information.  So mounts and various btrfs userspace actions will rescan 
instead of caching, while OTOH, the kernel btrfs subsystem can sometimes 
be oblivious to device changes that other bits of the kernel already know 
about.  There have actually been some real recent patches targeting that, 
but I think they'll hit kernel and userspace v3.16, as at least some of 
them were too late for kernel v3.15.

But try without inode_cache as I suspect that may well be a good part of 
it right there, and I'd really like to know whether I'm right or wrong on 
that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



