linux-btrfs.vger.kernel.org archive mirror
* Btrfs filesystem freezing during snapshots
@ 2014-05-26 12:28 David Bloquel
  2014-05-26 15:20 ` Martin
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: David Bloquel @ 2014-05-26 12:28 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have a problem with my btrfs filesystem, which freezes while I am
taking snapshots.

I have a cron job that snapshots around 70 subvolumes every ten
minutes. The subvolumes being snapshotted are the container
folders used by my virtual environment.
The subvolumes are not that big (from 500MB to 10GB at most, usually
around 3GB), but there is a lot of IO on the filesystem because of the
intensive use of the CTs and VMs.

At some point the snapshot process becomes really slow. At first it
snapshots around one folder per second, but after a while it can
take 30 seconds or even a few minutes to snapshot a single subvolume.
The subvolumes are very similar to each other in size and number of
files, so there is no reason for one subvolume to take 1 second
and another one 3 minutes.

Moreover, while my snapshot cron job is running, all my VMs and containers
slow down until the whole filesystem freezes, which leads to
frozen CTs and VMs (which is a real problem for me).

I can also see that my CPU load is really high during the process.

When I look at dmesg there are a lot of messages of this kind:

[96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[96540.819101] BTRFS debug (device drbd0): unlinked 2317 orphans
[96544.852499] BTRFS debug (device drbd0): unlinked 25 orphans
[96547.494132] BTRFS debug (device drbd0): unlinked 20 orphans
[96770.954615] BTRFS debug (device drbd0): unlinked 95 orphans
[96814.027538] BTRFS debug (device drbd0): unlinked 3331 orphans
[96841.240481] BTRFS debug (device drbd0): unlinked 24 orphans
[96851.094867] BTRFS debug (device drbd0): unlinked 6 orphans
[96862.285772] BTRFS debug (device drbd0): unlinked 2105 orphans
[96869.611062] BTRFS debug (device drbd0): unlinked 9 orphans
[96875.920977] BTRFS debug (device drbd0): unlinked 2 orphans
[96892.333661] BTRFS debug (device drbd0): unlinked 1640 orphans
[96902.928344] BTRFS debug (device drbd0): unlinked 482 orphans
[96907.615605] BTRFS debug (device drbd0): unlinked 83 orphans
[96914.216044] BTRFS debug (device drbd0): unlinked 39 orphans
[96921.936762] BTRFS debug (device drbd0): unlinked 50 orphans
[96927.035003] BTRFS debug (device drbd0): unlinked 12 orphans
[96932.864481] BTRFS debug (device drbd0): unlinked 5 orphans
[96937.511487] BTRFS debug (device drbd0): unlinked 31 orphans
[96946.521916] BTRFS debug (device drbd0): unlinked 5 orphans
[96948.591532] BTRFS debug (device drbd0): unlinked 4 orphans


I am not copying the whole dmesg because there are hundreds of these orphan messages.
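
For anyone who wants to condense a long run of these messages rather than paste them, here is a minimal sketch in Python. It assumes the line format is exactly the one shown in the excerpt above; the function name `summarize_orphans` is made up for illustration and is not part of any btrfs tool.

```python
import re

# Matches lines such as:
# [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
ORPHAN_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS debug \(device (?P<dev>[^)]+)\): "
    r"unlinked (?P<count>\d+) orphans"
)

def summarize_orphans(lines):
    """Return (message_count, total_orphans, largest_batch) for matching lines."""
    counts = []
    for line in lines:
        m = ORPHAN_RE.match(line.strip())
        if m:
            counts.append(int(m.group("count")))
    if not counts:
        return (0, 0, 0)
    return (len(counts), sum(counts), max(counts))

# First three lines of the excerpt above.
sample = [
    "[96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans",
    "[96540.819101] BTRFS debug (device drbd0): unlinked 2317 orphans",
    "[96544.852499] BTRFS debug (device drbd0): unlinked 25 orphans",
]
print(summarize_orphans(sample))  # (3, 2632, 2317)
```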

In addition to the orphan messages, there are also messages of this
kind in the log files:

[69537.117372] INFO: task btrfs-transacti:14507 blocked for more than 120 seconds.
[69537.117439]       Not tainted 3.12-0.bpo.1-amd64 #1
[69537.117475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69537.117535] btrfs-transacti D ffff88047fdd4300     0 14507      2 0x00000000
[69537.117546]  ffff88046bc740c0 0000000000000046 0000000000000296 ffff88046f0dc840
[69537.117557]  ffff880075987fd8 ffff880075987fd8 ffff880075987fd8 ffff88046bc740c0
[69537.117565]  0000000000000246 ffff880351942ea8 ffff880351942f30 0000000000000000
[69537.117574] Call Trace:
[69537.117613]  [<ffffffffa04b4dc5>] ? wait_for_commit.isra.25+0x55/0x90 [btrfs]
[69537.117624]  [<ffffffff81082d20>] ? add_wait_queue+0x60/0x60
[69537.117650]  [<ffffffffa04b69bb>] ? btrfs_commit_transaction+0x10b/0x9f0 [btrfs]
[69537.117675]  [<ffffffffa04b0385>] ? transaction_kthread+0x1b5/0x220 [btrfs]
[69537.117699]  [<ffffffffa04b01d0>] ? btree_readpage_end_io_hook+0x2d0/0x2d0 [btrfs]
[69537.117707]  [<ffffffff81082333>] ? kthread+0xb3/0xc0
[69537.117715]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69537.117724]  [<ffffffff814cb70c>] ? ret_from_fork+0x7c/0xb0
[69537.117732]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69657.215298] INFO: task btrfs-transacti:14507 blocked for more than 120 seconds.
[69657.215360]       Not tainted 3.12-0.bpo.1-amd64 #1
[69657.215393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69657.215450] btrfs-transacti D ffff88047fdd4300     0 14507      2 0x00000000
[69657.215455]  ffff88046bc740c0 0000000000000046 0000000000000296 ffff88046f0dc840
[69657.215461]  ffff880075987fd8 ffff880075987fd8 ffff880075987fd8 ffff88046bc740c0
[69657.215465]  0000000000000246 ffff880351942ea8 ffff880351942f30 0000000000000000
[69657.215469] Call Trace:
[69657.215490]  [<ffffffffa04b4dc5>] ? wait_for_commit.isra.25+0x55/0x90 [btrfs]
[69657.215496]  [<ffffffff81082d20>] ? add_wait_queue+0x60/0x60
[69657.215508]  [<ffffffffa04b69bb>] ? btrfs_commit_transaction+0x10b/0x9f0 [btrfs]
[69657.215520]  [<ffffffffa04b0385>] ? transaction_kthread+0x1b5/0x220 [btrfs]
[69657.215531]  [<ffffffffa04b01d0>] ? btree_readpage_end_io_hook+0x2d0/0x2d0 [btrfs]
[69657.215535]  [<ffffffff81082333>] ? kthread+0xb3/0xc0
[69657.215539]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69657.215543]  [<ffffffff814cb70c>] ? ret_from_fork+0x7c/0xb0
[69657.215547]  [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0


I think the message: "[69537.117372] INFO: task btrfs-transacti:14507
blocked for more than 120 seconds." appears when the filesystem is
frozen.


A workaround would be to wait a few seconds between snapshots to avoid
the high load. However, I think that only avoids the problem, and I
would rather fix it because I am afraid it could appear during
another operation (copying a lot of small files, etc.).
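
The pause-between-snapshots workaround described above could look roughly like the following Python sketch. It is only an illustration under assumptions: the function name, the 5-second default, and the example paths are hypothetical, and the only external command used is `btrfs subvolume snapshot -r <source> <dest>`. It also records how long each snapshot took, which makes the slow subvolumes easy to spot.

```python
import subprocess
import time

def snapshot_with_pause(pairs, pause_s=5.0, run=subprocess.run, sleep=time.sleep):
    """Snapshot each (source, dest) pair read-only, pausing between them.

    Returns a list of (source, elapsed_seconds) so slow subvolumes stand out.
    `run` and `sleep` are injectable so the logic can be tried without btrfs.
    """
    pairs = list(pairs)
    timings = []
    for i, (source, dest) in enumerate(pairs):
        start = time.monotonic()
        # Creates a read-only snapshot of `source` at `dest`.
        run(["btrfs", "subvolume", "snapshot", "-r", source, dest], check=True)
        timings.append((source, time.monotonic() - start))
        if i < len(pairs) - 1:
            sleep(pause_s)  # leave room for other I/O before the next snapshot
    return timings

# Hypothetical usage (paths are made up):
# snapshot_with_pause([("/srv/ct/101", "/srv/snap/101-1200")], pause_s=5.0)
```

As noted above, this only spreads the load out; it does not address whatever makes individual snapshots slow.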

I have checked a lot of old messages from this mailing list and I got
some clues, but no real working solution for my case.

I hope some of you can give me some advice.

If you need any further information, please do not hesitate to ask.

(Sorry for my English, I tried to make it as good as I can)

Best regards,
David


* Re: Btrfs filesystem freezing during snapshots
  2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
@ 2014-05-26 15:20 ` Martin
  2014-05-26 16:19   ` Russell Coker
  2014-05-26 15:39 ` Duncan
  2014-05-26 16:39 ` Roman Mamedov
  2 siblings, 1 reply; 6+ messages in thread
From: Martin @ 2014-05-26 15:20 UTC (permalink / raw)
  To: linux-btrfs

On 26/05/14 13:28, David Bloquel wrote:
> Hi,
> 
> I have a problem with my btrfs filesystem, which freezes while I am
> taking snapshots.
> 
> I have a cron job that snapshots around 70 subvolumes every ten
> minutes. The subvolumes being snapshotted are the container
> folders used by my virtual environment.
> The subvolumes are not that big (from 500MB to 10GB at most, usually
> around 3GB), but there is a lot of IO on the filesystem because of the
> intensive use of the CTs and VMs.
> 
> At some point the snapshot process becomes really slow. At first it
> snapshots around one folder per second, but after a while it can
> take 30 seconds or even a few minutes to snapshot a single subvolume.
> The subvolumes are very similar to each other in size and number of
> files, so there is no reason for one subvolume to take 1 second
> and another one 3 minutes.
> 
> Moreover, while my snapshot cron job is running, all my VMs and containers
> slow down until the whole filesystem freezes, which leads to
> frozen CTs and VMs (which is a real problem for me).
> 
> I can also see that my CPU load is really high during the process.
> 
> When I look at dmesg there are a lot of messages of this kind:
> 
> [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[...]

That looks to be running on top of drbd which will add a network write
overhead (unless you are dangerously running asynchronously!). Hence you
will see IO speed related limits a little sooner...

However, I will guess that your primary problem is likely due to
accumulating fragmentation due to adding ever more snapshots every 10
mins for the VMs/containers.


There are other people far more practised here than I, but some guesses
to try are:


Use "nocow" for the VM images (and container images);

Try using the btrfs auto defrag (beware your IO speed limit vs file size
to be defragged);

Avoid accumulating too many versions of any one snapshot.


Note also the "experimental" status for btrfs... I'm sure you will have
noticed the previous race problems for deleting snapshots.

Aside: I've held off from using kernel 3.12 and 3.13 due to curious
happenings on my test system. kernel 3.14.4 is behaving well so far.


Hope that gives a few clues.

Good luck,
Martin




* Re: Btrfs filesystem freezing during snapshots
  2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
  2014-05-26 15:20 ` Martin
@ 2014-05-26 15:39 ` Duncan
  2014-05-26 16:39 ` Roman Mamedov
  2 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2014-05-26 15:39 UTC (permalink / raw)
  To: linux-btrfs

David Bloquel posted on Mon, 26 May 2014 14:28:51 +0200 as excerpted:

> I have a problem with my btrfs filesystem, which freezes while I am
> taking snapshots.
> 
> I have a cron job that snapshots around 70 subvolumes every ten
> minutes. The subvolumes being snapshotted are the container
> folders used by my virtual environment.
> The subvolumes are not that big (from 500MB to 10GB at most, usually
> around 3GB), but there is a lot of IO on the filesystem because of the
> intensive use of the CTs and VMs.
> 
> At some point the snapshot process becomes really slow. At first it
> snapshots around one folder per second, but after a while it can
> take 30 seconds or even a few minutes to snapshot a single subvolume.
> The subvolumes are very similar to each other in size and number of
> files, so there is no reason for one subvolume to take 1 second
> and another one 3 minutes.
> 
> Moreover, while my snapshot cron job is running, all my VMs and containers
> slow down until the whole filesystem freezes, which leads to
> frozen CTs and VMs (which is a real problem for me).
> 
> I can also see that my CPU load is really high during the process.
> 
> When I look at dmesg there are a lot of messages of this kind:
> 
> [orphan unlinking and btrfs-transacti blocked messages, kernel 3.12.0]
> 
> A workaround would be to wait a few seconds between snapshots to avoid
> the high load. However, I think that only avoids the problem, and I
> would rather fix it because I am afraid it could appear during
> another operation (copying a lot of small files, etc.).
> 
> I have checked a lot of old messages from this mailing list and I got
> some clues, but no real working solution for my case.

You're hitting one of the btrfs performance and scaling weak-spots
head-on from two different directions at once, so it's little wonder 
you're seeing problems.  

Copy-on-write based filesystems such as btrfs will always find
"internal-rewrite-pattern" a severe challenge to deal with, because under 
normal circumstances, all those writes to blocks inside existing files 
force rewriting those blocks elsewhere, thus very heavily fragmenting the 
file.  We've had reports of files with hundreds of thousands of file 
extents!  No WONDER btrfs bogs down trying to manage these things!

Btrfs has two mechanisms to deal with this.  For small files up to a few 
hundred MiB (think firefox sqlite database files), the autodefrag mount 
option is useful, as when it sees a write into a file it queues that file 
for full rewrite.  However, as the file size increases toward a GiB and 
higher this doesn't scale so well, as the writes can come faster than the 
file can be rewritten.  

Thus for large internal-rewrite files another mechanism is needed.  Until 
the devs come up with a more efficient automated solution, the current 
recommendation is to set the NOCOW file attribute (chattr +C) on these 
files, or more accurately, on the directory before the files are created, 
so they inherit the attribute at creation.[1]  NOCOW files are updated
in-place as they would be on traditional filesystems, thus avoiding the 
fragmentation.

But unfortunately there are a number of caveats and limitations to NOCOW, 
the biggest of which is that snapshots assume COW semantics and freeze 
the existing file data in place at the time of the snapshot.  The first 
write to a file block after a snapshot therefore forces a COW write even 
on NOCOW files, as the alternative would be destroying the snapshot.

Since you're snapshotting those files every 10 minutes, that means even 
with NOCOW files every ten minutes worth of changes will be stored in 
extents written out of order!

Which is what you're coming up against.  Take a look at what filefrag 
says about some of those several gig active VM images that have been 
around for a few weeks.  I bet you find a lot of them have tens of 
thousands of extents, even if you've used the NOCOW attribute on them 
from creation as recommended.
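
To check many images at once, something like the following Python sketch may help. It assumes filefrag's usual summary line of the form `<path>: <N> extents found`; the function names, the 10000-extent threshold, and the example paths are illustrative assumptions only.

```python
import re
import subprocess

# filefrag prints one summary line per file, e.g. "/vm/a.img: 48213 extents found"
FILEFRAG_RE = re.compile(r"^(?P<path>.+): (?P<extents>\d+) extents? found$")

def parse_filefrag(output):
    """Map each path in filefrag's summary output to its extent count."""
    result = {}
    for line in output.splitlines():
        m = FILEFRAG_RE.match(line.strip())
        if m:
            result[m.group("path")] = int(m.group("extents"))
    return result

def heavily_fragmented(paths, threshold=10000, run=subprocess.run):
    """Run filefrag on `paths` and return those at or above `threshold` extents."""
    proc = run(["filefrag", *paths], capture_output=True, text=True)
    counts = parse_filefrag(proc.stdout)
    return {p: n for p, n in counts.items() if n >= threshold}

# Hypothetical usage (paths are made up):
# print(heavily_fragmented(["/srv/vm/a.img", "/srv/vm/b.img"]))
```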

The bottom line is that VM images and the like should be set NOCOW and 
excluded from snapshots using subvolumes, since snapshots stop at 
subvolume boundaries.  Use more conventional backup methods for them,
and/or since setting NOCOW and avoiding snapshots bypasses many of the 
features people actually choose btrfs to get, consider creating separate 
filesystems for your VM images, etc, using something other than btrfs, 
since btrfs simply doesn't work so well for this use-case at this time.

Another caveat/limitation of NOCOW is that it turns off btrfs data 
checksumming and (mount-option-optional) compression.  In-place 
updates don't work well with these features, and leaving them on would 
simply be an invitation to impossible-to-resolve race conditions and 
performance issues, so it is better to just force them off along with COW 
and avoid the additional danger.  However, that turns out not to be the 
problem one might think.  Most applications using such internal file 
rewrite techniques have had to evolve their own methods of dealing with 
file integrity and crash restoration, since they're used on filesystems 
without the file integrity mechanisms of btrfs.  In fact, having both 
btrfs and the application's own mechanisms trying to manage things has at 
times resulted in its own set of bugs, since neither one accounts for what 
the other is doing and the checkpoints aren't coordinated.  So 
actually, turning off btrfs file integrity checking for these files 
simply lets the applications handle it the way they do on other 
filesystems, without btrfs getting in the way.

Meanwhile, the devs are working hard at improving this use-case, but it's 
worth keeping in mind that features such as snapshotting and checksummed 
file integrity are features that other filesystems don't normally have.  
So even if there are limitations to where and how they work on btrfs, the 
fact that btrfs has them at all puts btrfs beyond other filesystems, and 
if the features must be disabled for a particular use-case, that only 
returns btrfs to the same general set of features that other filesystems 
have.

Addressing the problem from another angle, how many snapshots are you 
keeping?  You're taking snapshots every 10 minutes, but do you have 
automated thinning set up as well?  If you thin to say a snapshot every 
half hour after an hour (deleting two of three), then a snapshot every 
hour after six hours (deleting half), a snapshot every eight hours after 
a day (three a day, deleting seven of eight), a snapshot a day after a 
week (deleting two of three), and do off-media backup after four weeks 
so you can delete all snapshots older than that, you'll have 6 (10-minute, 
to 1 hour) + 10 (half-hour, to 6 hours) + 18 (hourly, to a day) + 18
(8-hourly, to a week) + 21 (daily, to four weeks) = 6+10+18+18+21 =
73 snapshots.
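
The arithmetic for the thinning schedule above can be checked with a few lines of Python. This is only a sketch of the counting, not a thinning tool; the tier table and the name `snapshots_per_tier` are illustrative.

```python
HOUR = 60          # minutes
DAY = 24 * HOUR
WEEK = 7 * DAY

# (interval between kept snapshots, age at which this tier ends), in minutes
TIERS = [
    (10, 1 * HOUR),      # every 10 minutes, up to 1 hour old
    (30, 6 * HOUR),      # every half hour, up to 6 hours old
    (HOUR, DAY),         # hourly, up to a day old
    (8 * HOUR, WEEK),    # every 8 hours, up to a week old
    (DAY, 4 * WEEK),     # daily, up to four weeks old
]

def snapshots_per_tier(tiers):
    """Count the snapshots kept in each tier of the schedule."""
    counts = []
    tier_start = 0
    for interval, tier_end in tiers:
        counts.append((tier_end - tier_start) // interval)
        tier_start = tier_end
    return counts

counts = snapshots_per_tier(TIERS)
print(counts, sum(counts))  # [6, 10, 18, 18, 21] 73
```

Dropping the first tier and starting the half-hourly tier at age zero, as `snapshots_per_tier(TIERS[1:])` does, gives 12 + 18 + 18 + 21 = 69.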

Of course, if feasible reducing the base snapshot frequency to every half 
hour will cut it to under 70, and give you a bit more time between 
snapshots to avoid the possibility of a new cycle starting before the 
last one has finished, as well.

I don't know if you're thinning now, but if not, you may have hundreds or 
thousands of existing snapshots.  Simply thinning them out to something 
reasonable like the 70-ish proposed above may well be all you need.

Finally, I note that you're still on a 3.12 kernel, while 3.14 is out and 
3.15 is well on its way.  There are still enough bugs being fixed in each 
kernel that it's worth keeping current.  Certainly, if you report 
problems here with a two-kernel-cycle-old kernel, you can expect that 
trying at least the latest stable kernel is going to be suggested, if not 
the latest rc kernel.  (I usually wait until rc2 or rc3 myself, 
figuring that by then I should have read about any really bad 
system-eating bugs, and that they will probably have been fixed as well.)  
Somewhere right about 3.12 they disabled the snapshot-aware defrag, as it 
simply was NOT scaling well in these sorts of cases, though it might have 
been 3.11.  If you don't have that snapshot-aware-defrag disabling in 
your kernel, defrags especially will take much *MUCH* longer, but IIRC it 
was disabled by 3.12, so with luck you don't have /that/ problem to worry 
about with your current kernel, at least.
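
If you want a script to flag hosts that are behind, a minimal Python sketch follows. The 3.14 minimum is just the version discussed in this thread, and the function names are hypothetical; the release string is the one from the trace in the original report.

```python
import re

def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.12-0.bpo.1-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2)))

def older_than(release, minimum=(3, 14)):
    """True if the kernel release is older than `minimum` (major, minor)."""
    return kernel_version(release) < minimum

print(older_than("3.12-0.bpo.1-amd64"))  # True
print(older_than("3.14.4"))              # False
```

On a Linux host, `platform.release()` from the standard library returns the running kernel's release string, which can be passed to `older_than` directly.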

Similarly with btrfs-progs.  Current release (last I checked, about a 
week ago myself) is 3.14.1.  If you're behind that, consider upgrading it 
too, although it's not quite as critical as the kernel.  The version before 
that was 3.12, and I'd recommend at least having that.  If you're still 
on 0.19 or 0.20-rc, better upgrade!

---
[1] NOCOW attribute inheritance:  On btrfs the nocow attribute should be 
set at file creation in order to guarantee that it applies properly.  
The easiest way to do this is to set it on the directory that will 
contain the files, then copy (not move, unless from a different 
filesystem, and not using cp --reflink) existing files from elsewhere 
into the directory with the attribute already set, so they get it set 
when they are created as well.
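
A rough Python sketch of the procedure in [1], under stated assumptions: the function names and paths are hypothetical, `chattr +C` is the only external command used, and the copy is done by plain reads and writes so that the destination is a newly created file with its own data rather than a reflink of the source.

```python
import os
import subprocess

def make_nocow_dir(path, run=subprocess.run):
    """Create `path` if needed and set the NOCOW attribute on it."""
    os.makedirs(path, exist_ok=True)
    # chattr +C on the directory; files created inside it afterwards inherit it.
    run(["chattr", "+C", path], check=True)

def copy_without_reflink(src, dst, chunk_size=1024 * 1024):
    """Copy by reading and writing bytes into a freshly created file.

    Mode "xb" fails if `dst` already exists, so the file is always created
    inside the target directory rather than reused.
    """
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)

# Hypothetical usage (paths are made up):
# make_nocow_dir("/srv/vm-nocow")
# copy_without_reflink("/srv/vm/a.img", "/srv/vm-nocow/a.img")
```

Running `lsattr` on the copied file afterwards should show the `C` flag if the attribute was inherited.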

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Btrfs filesystem freezing during snapshots
  2014-05-26 15:20 ` Martin
@ 2014-05-26 16:19   ` Russell Coker
  0 siblings, 0 replies; 6+ messages in thread
From: Russell Coker @ 2014-05-26 16:19 UTC (permalink / raw)
  To: Martin; +Cc: linux-btrfs

On Mon, 26 May 2014 16:20:55 Martin wrote:
> That looks to be running on top of drbd which will add a network write
> overhead (unless you are dangerously running asynchronously!). Hence you
> will see IO speed related limits a little sooner...

http://etbe.coker.com.au/2012/01/05/drbd-benchmarking/

Last time I did DRBD performance testing I found that the synchronous option 
was FASTER.  My theory is that no-one does much work on the options that 
aren't recommended.
 
-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/



* Re: Btrfs filesystem freezing during snapshots
  2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
  2014-05-26 15:20 ` Martin
  2014-05-26 15:39 ` Duncan
@ 2014-05-26 16:39 ` Roman Mamedov
  2014-05-26 17:02   ` Roman Mamedov
  2 siblings, 1 reply; 6+ messages in thread
From: Roman Mamedov @ 2014-05-26 16:39 UTC (permalink / raw)
  To: David Bloquel; +Cc: linux-btrfs


On Mon, 26 May 2014 14:28:51 +0200
David Bloquel <david.bloquel@jimywoo.fr> wrote:

> [69537.117439]       Not tainted 3.12-0.bpo.1-amd64 #1

Try upgrading to the kernel 3.14. From what I can tell it has significant
improvements/bugfixes in the snapshot deletion area. Just a couple of days ago
I got a recurring lock-up after deleting 50 snapshots with tens of thousands of
fragments of files in each (VM images, like in your case). Googled around a
little bit, and found a similar issue with a report that 3.14 solves the
problem. Upgraded to 3.14.4 (from 3.12.20), and voila, indeed it does.

-- 
With respect,
Roman


* Re: Btrfs filesystem freezing during snapshots
  2014-05-26 16:39 ` Roman Mamedov
@ 2014-05-26 17:02   ` Roman Mamedov
  0 siblings, 0 replies; 6+ messages in thread
From: Roman Mamedov @ 2014-05-26 17:02 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Bloquel


On Mon, 26 May 2014 22:39:16 +0600
Roman Mamedov <rm@romanrm.net> wrote:

> On Mon, 26 May 2014 14:28:51 +0200
> David Bloquel <david.bloquel@jimywoo.fr> wrote:
> 
> > [69537.117439]       Not tainted 3.12-0.bpo.1-amd64 #1
> 
> Try upgrading to the kernel 3.14. From what I can tell it has significant
> improvements/bugfixes in the snapshot deletion area. Just a couple of days ago
> I got a recurring lock-up after deleting 50 snapshots with tens of thousands of
> fragments of files in each (VM images, like in your case). Googled around a
> little bit, and found a similar issue with a report that 3.14 solves the
> problem. Upgraded to 3.14.4 (from 3.12.20), and voila, indeed it does.

Oh, I missed that your freezing happens during snapshot creation, not
deletion. But anyways, I think checking if the problem persists on the newest
kernel is still a good idea.

-- 
With respect,
Roman
