* Btrfs filesystem freezing during snapshots
@ 2014-05-26 12:28 David Bloquel
2014-05-26 15:20 ` Martin
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: David Bloquel @ 2014-05-26 12:28 UTC (permalink / raw)
To: linux-btrfs
Hi,
I have a problem with my btrfs filesystem, which freezes while I am
taking snapshots.
I have a cron job that snapshots around 70 subvolumes every ten
minutes. The subvolumes being snapshotted are the container
folders used by my virtual environment.
The subvolumes being snapshotted are not that big (from 500MB
to 10GB at most, and usually around 3GB), but there is a lot of IO on
the filesystem because the CTs and VMs are used intensively.
At some point the snapshot process becomes really slow. At first it
snapshots around one subvolume per second, but after a while it can
take 30 seconds or even a few minutes to snapshot a single subvolume.
The subvolumes are very similar to each other in size and number of
files, so there is no obvious reason why one should take 1 second
and another 3 minutes.
Moreover, while my snapshot cron job is running, all my VMs and
containers slow down until the whole filesystem freezes, which leaves
the CTs and VMs frozen (a real problem for me).
I can also see that my CPU load is really high during the process.
When I look at dmesg, there are a lot of messages of this kind:
[96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[96540.819101] BTRFS debug (device drbd0): unlinked 2317 orphans
[96544.852499] BTRFS debug (device drbd0): unlinked 25 orphans
[96547.494132] BTRFS debug (device drbd0): unlinked 20 orphans
[96770.954615] BTRFS debug (device drbd0): unlinked 95 orphans
[96814.027538] BTRFS debug (device drbd0): unlinked 3331 orphans
[96841.240481] BTRFS debug (device drbd0): unlinked 24 orphans
[96851.094867] BTRFS debug (device drbd0): unlinked 6 orphans
[96862.285772] BTRFS debug (device drbd0): unlinked 2105 orphans
[96869.611062] BTRFS debug (device drbd0): unlinked 9 orphans
[96875.920977] BTRFS debug (device drbd0): unlinked 2 orphans
[96892.333661] BTRFS debug (device drbd0): unlinked 1640 orphans
[96902.928344] BTRFS debug (device drbd0): unlinked 482 orphans
[96907.615605] BTRFS debug (device drbd0): unlinked 83 orphans
[96914.216044] BTRFS debug (device drbd0): unlinked 39 orphans
[96921.936762] BTRFS debug (device drbd0): unlinked 50 orphans
[96927.035003] BTRFS debug (device drbd0): unlinked 12 orphans
[96932.864481] BTRFS debug (device drbd0): unlinked 5 orphans
[96937.511487] BTRFS debug (device drbd0): unlinked 31 orphans
[96946.521916] BTRFS debug (device drbd0): unlinked 5 orphans
[96948.591532] BTRFS debug (device drbd0): unlinked 4 orphans
I am not copying the whole dmesg because there are hundreds of these orphan messages.
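To give a sense of scale without pasting every line, the orphan messages can be totalled. The following is only a minimal sketch: the function name `summarize_orphans` is made up for illustration, and the regular expression assumes the exact message format shown above.

```python
import re

# Matches lines such as:
# [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
ORPHAN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BTRFS debug \(device (?P<dev>[^)]+)\): "
    r"unlinked (?P<count>\d+) orphans"
)


def summarize_orphans(dmesg_text):
    """Return (number of messages, total orphans, largest single batch)."""
    counts = [int(m.group("count")) for m in ORPHAN_RE.finditer(dmesg_text)]
    if not counts:
        return (0, 0, 0)
    return (len(counts), sum(counts), max(counts))


# The first three lines of the excerpt above, used as sample input.
sample = """\
[96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[96540.819101] BTRFS debug (device drbd0): unlinked 2317 orphans
[96544.852499] BTRFS debug (device drbd0): unlinked 25 orphans
"""
print(summarize_orphans(sample))
```

Feeding it the full dmesg output would show how many orphans are unlinked per snapshot run.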
In addition to the orphan messages, there are also messages of this kind
in the log files:
[69537.117372] INFO: task btrfs-transacti:14507 blocked for more than 120 seconds.
[69537.117439] Not tainted 3.12-0.bpo.1-amd64 #1
[69537.117475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69537.117535] btrfs-transacti D ffff88047fdd4300 0 14507 2 0x00000000
[69537.117546] ffff88046bc740c0 0000000000000046 0000000000000296 ffff88046f0dc840
[69537.117557] ffff880075987fd8 ffff880075987fd8 ffff880075987fd8 ffff88046bc740c0
[69537.117565] 0000000000000246 ffff880351942ea8 ffff880351942f30 0000000000000000
[69537.117574] Call Trace:
[69537.117613] [<ffffffffa04b4dc5>] ? wait_for_commit.isra.25+0x55/0x90 [btrfs]
[69537.117624] [<ffffffff81082d20>] ? add_wait_queue+0x60/0x60
[69537.117650] [<ffffffffa04b69bb>] ? btrfs_commit_transaction+0x10b/0x9f0 [btrfs]
[69537.117675] [<ffffffffa04b0385>] ? transaction_kthread+0x1b5/0x220 [btrfs]
[69537.117699] [<ffffffffa04b01d0>] ? btree_readpage_end_io_hook+0x2d0/0x2d0 [btrfs]
[69537.117707] [<ffffffff81082333>] ? kthread+0xb3/0xc0
[69537.117715] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69537.117724] [<ffffffff814cb70c>] ? ret_from_fork+0x7c/0xb0
[69537.117732] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69657.215298] INFO: task btrfs-transacti:14507 blocked for more than 120 seconds.
[69657.215360] Not tainted 3.12-0.bpo.1-amd64 #1
[69657.215393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[69657.215450] btrfs-transacti D ffff88047fdd4300 0 14507 2 0x00000000
[69657.215455] ffff88046bc740c0 0000000000000046 0000000000000296 ffff88046f0dc840
[69657.215461] ffff880075987fd8 ffff880075987fd8 ffff880075987fd8 ffff88046bc740c0
[69657.215465] 0000000000000246 ffff880351942ea8 ffff880351942f30 0000000000000000
[69657.215469] Call Trace:
[69657.215490] [<ffffffffa04b4dc5>] ? wait_for_commit.isra.25+0x55/0x90 [btrfs]
[69657.215496] [<ffffffff81082d20>] ? add_wait_queue+0x60/0x60
[69657.215508] [<ffffffffa04b69bb>] ? btrfs_commit_transaction+0x10b/0x9f0 [btrfs]
[69657.215520] [<ffffffffa04b0385>] ? transaction_kthread+0x1b5/0x220 [btrfs]
[69657.215531] [<ffffffffa04b01d0>] ? btree_readpage_end_io_hook+0x2d0/0x2d0 [btrfs]
[69657.215535] [<ffffffff81082333>] ? kthread+0xb3/0xc0
[69657.215539] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
[69657.215543] [<ffffffff814cb70c>] ? ret_from_fork+0x7c/0xb0
[69657.215547] [<ffffffff81082280>] ? flush_kthread_worker+0xa0/0xa0
I think the message: "[69537.117372] INFO: task btrfs-transacti:14507
blocked for more than 120 seconds." appears when the filesystem is
frozen.
One workaround would be to wait a few seconds between snapshots to avoid
the high load. However, I think that only avoids the problem, and I
would rather fix it because I am afraid it could also appear during
other operations (copying a lot of small files, etc.).
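For reference, the workaround of pausing between snapshots could look roughly like the sketch below. This is not my actual cron script: the paths, the 5-second default pause, the snapshot naming scheme and the function names are all assumptions made for illustration. The only real command used is `btrfs subvolume snapshot -r <source> <dest>`.

```python
import subprocess
import time
from datetime import datetime, timezone


def snapshot_command(subvol, snap_dir, stamp):
    """Build the argv for one read-only snapshot of `subvol`."""
    name = subvol.rstrip("/").split("/")[-1]
    return ["btrfs", "subvolume", "snapshot", "-r", subvol,
            f"{snap_dir}/{name}-{stamp}"]


def snapshot_all(subvols, snap_dir, pause_seconds=5.0,
                 run=subprocess.check_call, sleep=time.sleep):
    """Snapshot each subvolume in turn, pausing between snapshots.

    `run` and `sleep` are injectable so the loop can be dry-run
    without touching a real filesystem.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for i, subvol in enumerate(subvols):
        if i > 0:
            sleep(pause_seconds)  # give pending IO a chance between snapshots
        run(snapshot_command(subvol, snap_dir, stamp))
```

With 70 subvolumes and a 5-second pause, one pass adds almost six minutes of waiting, which is most of a ten-minute cron interval; that is part of why this only hides the problem.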
I have checked a lot of old messages from this mailing list and found
some clues, but no real working solution for my case.
I hope some of you can give me some advice.
If you need any further information, please do not hesitate to ask.
(Sorry for my English, I tried to make it as good as I can)
Best regards,
David
* Re: Btrfs filesystem freezing during snapshots
2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
@ 2014-05-26 15:20 ` Martin
2014-05-26 16:19 ` Russell Coker
2014-05-26 15:39 ` Duncan
2014-05-26 16:39 ` Roman Mamedov
2 siblings, 1 reply; 6+ messages in thread
From: Martin @ 2014-05-26 15:20 UTC (permalink / raw)
To: linux-btrfs
On 26/05/14 13:28, David Bloquel wrote:
> Hi,
>
> I have a problem with my btrfs filesystem which is freezing when I am
> doing snapshots.
>
> I have a cron that is snapshoting around 70 sub volume every ten
> minutes. The sub volumes that btrfs is snapshoting are containers
> folders that are running through my virtual environment.
> Sub directories that btrfs is snapshoting are not that big (from 500MB
> to 10GB max and usually around 3GB) but there is a lot of IO on the
> filesystem because of the intensive use of the CTs and VMs.
>
> At some point the snapshot process becomes really slow, at first it
> snapshot around one folder per seconds but then after a while it can
> take 30seconds or even few minutes to snapshot one single sub volumes.
> Subvolumes are really similar to each other in size and number of
> files so there is no reason that it takes 1second for one sub volume
> and then 3minutes for another one.
>
> Moreover when my snapshot cron is running all my vms and containers
> are slowing down until the whole filesystem freezes which leads to
> frozen CT and VMs (which is a real problem for me).
>
> Moreover I can see that my CPU load is really high during the process.
>
> when I'm am looking to dmesg there is a lot of messages of this kind:
>
> [96537.686467] BTRFS debug (device drbd0): unlinked 290 orphans
[...]
That looks to be running on top of drbd which will add a network write
overhead (unless you are dangerously running asynchronously!). Hence you
will see IO speed related limits a little sooner...
However, my guess is that your primary problem is accumulating
fragmentation caused by adding ever more snapshots every 10
minutes for the VMs/containers.
There are other people here far more practised than I am, but some
things to try are:
Use "nocow" for the VM images (and container images);
Try the btrfs autodefrag mount option (beware your IO speed limit vs the
size of the files to be defragged);
Avoid accumulating too many versions of any one snapshot.
Note also the "experimental" status for btrfs... I'm sure you will have
noticed the previous race problems for deleting snapshots.
Aside: I've held off from using kernel 3.12 and 3.13 due to curious
happenings on my test system. kernel 3.14.4 is behaving well so far.
Hope that gives a few clues.
Good luck,
Martin
* Re: Btrfs filesystem freezing during snapshots
2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
2014-05-26 15:20 ` Martin
@ 2014-05-26 15:39 ` Duncan
2014-05-26 16:39 ` Roman Mamedov
2 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2014-05-26 15:39 UTC (permalink / raw)
To: linux-btrfs
David Bloquel posted on Mon, 26 May 2014 14:28:51 +0200 as excerpted:
> I have a problem with my btrfs filesystem which is freezing when I am
> doing snapshots.
>
> I have a cron that is snapshoting around 70 sub volume every ten
> minutes. The sub volumes that btrfs is snapshoting are containers
> folders that are running through my virtual environment.
> Sub directories that btrfs is snapshoting are not that big (from 500MB
> to 10GB max and usually around 3GB) but there is a lot of IO on the
> filesystem because of the intensive use of the CTs and VMs.
>
> At some point the snapshot process becomes really slow, at first it
> snapshot around one folder per seconds but then after a while it can
> take 30seconds or even few minutes to snapshot one single sub volumes.
> Subvolumes are really similar to each other in size and number of
> files so there is no reason that it takes 1second for one sub volume
> and then 3minutes for another one.
>
> Moreover when my snapshot cron is running all my vms and containers
> are slowing down until the whole filesystem freezes which leads to
> frozen CT and VMs (which is a real problem for me).
>
> Moreover I can see that my CPU load is really high during the process.
>
> when I'm am looking to dmesg there is a lot of messages of this kind:
>
> [orphan unlinking and btrfs-transacti blocked messages, kernel 3.12.0]
>
> A solution would be to wait few seconds between each snapshot to avoid
> high load however I think it's just a way to avoid the problem and I
> would rather fix it because I am affraid it could appear during
> another operation (copy of a lot of small files etc...).
>
> I have checked a lot of old messages from this mailling list and I got
> some clues but no real/working solution in my case.
You're hitting one of the btrfs performance and scaling weak-spots
head-on from two different directions at once, so it's little wonder
you're seeing problems.
Copy-on-write based filesystems such as btrfs will always find files
with an "internal-rewrite" pattern (such as VM images) a severe
challenge to deal with, because under normal circumstances, all those
writes to blocks inside existing files force rewriting those blocks
elsewhere, thus very heavily fragmenting the file. We've had reports of
files with hundreds of thousands of file extents! No WONDER btrfs bogs
down trying to manage these things!
Btrfs has two mechanisms to deal with this. For small files up to a few
hundred MiB (think firefox sqlite database files), the autodefrag mount
option is useful, as when it sees a write into a file it queues that file
for full rewrite. However, as the file size increases toward a GiB and
higher this doesn't scale so well, as the writes can come faster than the
file can be rewritten.
Thus for large internal-rewrite files another mechanism is needed. Until
the devs come up with a more efficient automated solution, the current
recommendation is to set the NOCOW file attribute (chattr +C) on these
files, or more accurately, on the directory before the files are created,
so they inherit the attribute at creation.[1] NOCOW files are updated
in-place as they would be on traditional filesystems, thus avoiding the
fragmentation.
But unfortunately there's a number of caveats and limitations to NOCOW,
the biggest of which is that snapshots assume COW semantics and freeze
the existing file data in place at the time of the snapshot, so the first
write to a file block after a snapshot forces a COW write even on NOCOW
files, as the alternative would be destroying the snapshot.
Since you're snapshotting those files every 10 minutes, that means even
with NOCOW files every ten minutes worth of changes will be stored in
extents written out of order!
Which is what you're coming up against. Take a look at what filefrag
says about some of those several gig active VM images that have been
around for a few weeks. I bet you find a lot of them have tens of
thousands of extents, even if you've used the NOCOW attribute on them
from creation as recommended.
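As a rough way to run that check over many images at once, something like the sketch below could work. It assumes filefrag's usual one-line summary per file ("<path>: <N> extents found"); the function names and the 10000-extent threshold are made up for illustration, not anything btrfs or e2fsprogs defines.

```python
import re
import subprocess

# filefrag prints a summary line per file, in the form
#   <path>: <N> extents found      (or "1 extent found")
EXTENT_RE = re.compile(r"^(?P<path>.+): (?P<count>\d+) extents? found")


def parse_filefrag(output):
    """Map each path in filefrag's output to its extent count."""
    result = {}
    for line in output.splitlines():
        m = EXTENT_RE.match(line.strip())
        if m:
            result[m.group("path")] = int(m.group("count"))
    return result


def fragmented_files(paths, threshold=10000, run=subprocess.check_output):
    """Return {path: extents} for files at or above `threshold` extents.

    `run` is injectable so the parsing can be exercised without filefrag.
    """
    out = run(["filefrag", *paths], text=True)
    counts = parse_filefrag(out)
    return {p: n for p, n in counts.items() if n >= threshold}
```

Calling `fragmented_files()` with the list of VM image paths would then return only the images that are worth worrying about.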
The bottom line is that VM images and the like should be set NOCOW and
excluded from snapshots using subvolumes, since snapshots stop at
subvolume boundaries. Use more conventional backup methods for them,
and/or since setting NOCOW and avoiding snapshots bypasses many of the
features people actually choose btrfs to get, consider creating separate
filesystems for your VM images, etc, using something other than btrfs,
since btrfs simply doesn't work so well for this use-case at this time.
Another caveat/limitation of NOCOW is that it turns off btrfs data
checksumming and (mount-option-optional) compression, since in-place
updates don't work well with these features and leaving them on would
simply be an invitation to impossible to resolve race conditions and
performance issues, so better to just force them off along with COW and
avoid the additional danger. However, that turns out not to be the
problem one might think, since most applications using such internal file
rewrite techniques have had to evolve their own methods of dealing with
file integrity and crash restoration as they're used on filesystems
without the file integrity mechanisms of btrfs, and in fact, having both
btrfs and the application's own mechanisms trying to manage things has at
times resulted in its own set of bugs since neither one accounts for what
the other is doing and the checkpoints aren't coordinated, etc. So
actually, turning off btrfs file integrity checking for these files
simply lets the applications handle it the way they do on other
filesystems, without btrfs getting in the way.
Meanwhile, the devs are working hard at improving this use-case, but it's
worth keeping in mind that features such as snapshotting and checksummed
file integrity are features that other filesystems don't normally have,
so even if there's limitations to where and how they work on btrfs, the
fact that btrfs has them at all puts btrfs beyond other filesystems, and
if the features must be disabled for a particular use-case, that only
returns btrfs to the same general set of features that other filesystems
have.
Addressing the problem from another angle, how many snapshots are you
keeping? You're taking snapshots every 10 minutes, but do you have
automated thinning set up as well? If you thin to say a snapshot every
half hour after an hour (deleting two of three), then a snapshot every
hour after six hours (deleting half), a snapshot every eight hours after
a day (three a day, deleting seven of eight), a snapshot a day after a
week (deleting two of three), and do off-media backup after four weeks
so you can delete all snapshots older than that, you'll have 6
(10-minute, to 1 hour) + 10 (half-hour, to 6 hours) + 18 (hourly, to a
day) + 18 (8-hourly, to a week) + 21 (daily, to four weeks) =
6+10+18+18+21 = 73 snapshots.
Of course, if feasible reducing the base snapshot frequency to every half
hour will cut it to under 70, and give you a bit more time between
snapshots to avoid the possibility of a new cycle starting before the
last one has finished, as well.
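The arithmetic for that thinning schedule can be checked with a short sketch. The tier table below just restates the schedule described above (interval and age range in minutes); the names `TIERS`, `retained_per_tier` and `total_retained` are made up for illustration.

```python
HOUR = 60          # minutes
DAY = 24 * HOUR
WEEK = 7 * DAY

# (label, interval between kept snapshots, tier starts at age, tier ends at age)
TIERS = [
    ("10-minute", 10, 0, HOUR),
    ("half-hourly", 30, HOUR, 6 * HOUR),
    ("hourly", HOUR, 6 * HOUR, DAY),
    ("8-hourly", 8 * HOUR, DAY, WEEK),
    ("daily", DAY, WEEK, 4 * WEEK),
]


def retained_per_tier(tiers):
    """Number of snapshots kept in each age tier."""
    return [(label, (end - start) // interval)
            for label, interval, start, end in tiers]


def total_retained(tiers):
    """Total snapshots kept once the schedule has reached steady state."""
    return sum(n for _, n in retained_per_tier(tiers))


# Same schedule, but with the base frequency reduced to every half hour.
HALF_HOUR_BASE = [("half-hourly", 30, 0, 6 * HOUR)] + TIERS[2:]

print(retained_per_tier(TIERS))
print(total_retained(TIERS))           # 73
print(total_retained(HALF_HOUR_BASE))  # 69
```

Dropping the 10-minute tier replaces 6 + 10 snapshots with 12, which is where the "under 70" figure comes from.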
I don't know if you're thinning now, but if not, you may have hundreds or
thousands of existing snapshots. Simply thinning them out to something
reasonable like the 70-ish proposed above may well be all you need.
Finally, I note that you're still on a 3.12 kernel, while 3.14 is out and
3.15 is well on its way. There's still enough bugs being fixed in each
kernel that it's worth keeping current, and certainly, if you report
problems here with a two-kernel-cycle-old kernel, you can expect that
trying at least the latest stable kernel is going to be suggested, if not
the latest rc kernel, altho I usually wait until rc2 or rc3 myself,
figuring I should have read about any real bad system eating bugs by then
and they will have probably been fixed by then as well, if I didn't.
Somewhere right about 3.12 they disabled the snapshot aware defrag as it
simply was NOT scaling well in these sorts of cases, tho it might have
been 3.11. If you don't have that snapshot-aware-defrag disabling in
your kernel, defrags especially will take much *MUCH* longer, but IIRC it
was disabled by 3.12 so with luck you don't have /that/ problem to worry
about with your current kernel, at least.
Similarly with btrfs-progs. Current release (last I checked, about a
week ago myself) is 3.14.1. If you're behind that, consider upgrading it
too, altho it's not quite as critical as the kernel. The version before
that was 3.12, and I'd recommend at least having that. If you're still
on 0.19 or 0.20-rc, better upgrade!
---
[1] NOCOW attribute inheritance: On btrfs the nocow attribute should be
set at file creation in order to guarantee that it applies properly.
The easiest way to do this is to set it on the directory that will
contain the files, then copy (not move, unless from a different
filesystem, and not using cp --reflink) existing files from elsewhere
into the directory with the attribute already set, so they get it set
when they are created as well.
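A minimal sketch of that procedure, for illustration only: the function names are made up, `chattr +C` is the real command, and the copy is a deliberately plain read/write loop so that the destination is a brand-new file with newly written data rather than a reflink or a rename.

```python
import os
import subprocess


def make_nocow_dir(path, run=subprocess.check_call):
    """Create `path` if needed and set the NOCOW attribute on it.

    Files created in the directory afterwards inherit the attribute.
    `run` is injectable so this can be dry-run off btrfs.
    """
    os.makedirs(path, exist_ok=True)
    run(["chattr", "+C", path])


def copy_as_new_file(src, dst_dir, chunk_size=1024 * 1024):
    """Copy `src` into `dst_dir` by reading and rewriting its data.

    Mode "xb" refuses to reuse an existing file, so the destination is
    always created fresh inside the NOCOW directory.
    """
    dst = os.path.join(dst_dir, os.path.basename(src))
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
    return dst
```

Note that a plain copy like this does not preserve sparseness, so a sparse VM image will take up its full size afterwards; `lsattr` on the new file should show the C attribute if inheritance worked.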
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Btrfs filesystem freezing during snapshots
2014-05-26 15:20 ` Martin
@ 2014-05-26 16:19 ` Russell Coker
0 siblings, 0 replies; 6+ messages in thread
From: Russell Coker @ 2014-05-26 16:19 UTC (permalink / raw)
To: Martin; +Cc: linux-btrfs
On Mon, 26 May 2014 16:20:55 Martin wrote:
> That looks to be running on top of drbd which will add a network write
> overhead (unless you are dangerously running asynchronously!). Hence you
> will see IO speed related limits a little sooner...
http://etbe.coker.com.au/2012/01/05/drbd-benchmarking/
Last time I did DRBD performance testing I found that the synchronous option
was FASTER. My theory is that no-one does much work on the options that
aren't recommended.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: Btrfs filesystem freezing during snapshots
2014-05-26 12:28 Btrfs filesystem freezing during snapshots David Bloquel
2014-05-26 15:20 ` Martin
2014-05-26 15:39 ` Duncan
@ 2014-05-26 16:39 ` Roman Mamedov
2014-05-26 17:02 ` Roman Mamedov
2 siblings, 1 reply; 6+ messages in thread
From: Roman Mamedov @ 2014-05-26 16:39 UTC (permalink / raw)
To: David Bloquel; +Cc: linux-btrfs
On Mon, 26 May 2014 14:28:51 +0200
David Bloquel <david.bloquel@jimywoo.fr> wrote:
> [69537.117439] Not tainted 3.12-0.bpo.1-amd64 #1
Try upgrading to the kernel 3.14. From what I can tell it has significant
improvements/bugfixes in the snapshot deletion area. Just a couple of days ago
I got a recurring lock-up after deleting 50 snapshots with tens of thousands of
fragments of files in each (VM images, like in your case). Googled around a
little bit, and found a similar issue with a report that 3.14 solves the
problem. Upgraded to 3.14.4 (from 3.12.20), and voila, indeed it does.
--
With respect,
Roman
* Re: Btrfs filesystem freezing during snapshots
2014-05-26 16:39 ` Roman Mamedov
@ 2014-05-26 17:02 ` Roman Mamedov
0 siblings, 0 replies; 6+ messages in thread
From: Roman Mamedov @ 2014-05-26 17:02 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Bloquel
On Mon, 26 May 2014 22:39:16 +0600
Roman Mamedov <rm@romanrm.net> wrote:
> On Mon, 26 May 2014 14:28:51 +0200
> David Bloquel <david.bloquel@jimywoo.fr> wrote:
>
> > [69537.117439] Not tainted 3.12-0.bpo.1-amd64 #1
>
> Try upgrading to the kernel 3.14. From what I can tell it has significant
> improvements/bugfixes in the snapshot deletion area. Just a couple of days ago
> I got a recurring lock-up after deleting 50 snapshots with tens of thousands of
> fragments of files in each (VM images, like in your case). Googled around a
> little bit, and found a similar issue with a report that 3.14 solves the
> problem. Upgraded to 3.14.4 (from 3.12.20), and voila, indeed it does.
Oh, I missed that your freezing happens during snapshot creation, not
deletion. But anyways, I think checking if the problem persists on the newest
kernel is still a good idea.
--
With respect,
Roman