* Huge load on btrfs subvolume delete
From: Daniel Caillibaud @ 2016-08-15 10:39 UTC (permalink / raw)
To: linux-btrfs
Hi,
I'm a btrfs newbie, and I have a problem with high load after each btrfs subvolume delete.
I use snapshots on lxc hosts under debian jessie with
- kernel 4.6.0-0.bpo.1-amd64
- btrfs-progs 4.6.1-1~bpo8
For backup, I have each day, for each subvolume
btrfs subvolume snapshot -r $subvol $snap
# then later
ionice -c3 btrfs subvolume delete $snap
but ionice doesn't seem to have any effect here, and after a few minutes the load grows
quite high (30-40). I don't know how to make this deletion gentler on I/O.
Is there a better way to do this?
Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which seems to be the one
doing a lot of I/O?
Currently the I/O priorities on my btrfs processes are:
ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
[btrfs-worker] none: prio 4
[btrfs-worker-hi] none: prio 4
[btrfs-delalloc] none: prio 4
[btrfs-flush_del] none: prio 4
[btrfs-cache] none: prio 4
[btrfs-submit] none: prio 4
[btrfs-fixup] none: prio 4
[btrfs-endio] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-rai] none: prio 4
[btrfs-endio-rep] none: prio 4
[btrfs-rmw] none: prio 4
[btrfs-endio-wri] none: prio 4
[btrfs-freespace] none: prio 4
[btrfs-delayed-m] none: prio 4
[btrfs-readahead] none: prio 4
[btrfs-qgroup-re] none: prio 4
[btrfs-extent-re] none: prio 4
[btrfs-cleaner] none: prio 0
[btrfs-transacti] none: prio 0
Thanks
--
Daniel
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Huge load on btrfs subvolume delete
From: Austin S. Hemmelgarn @ 2016-08-15 12:32 UTC (permalink / raw)
To: Daniel Caillibaud, linux-btrfs
On 2016-08-15 06:39, Daniel Caillibaud wrote:
> Hi,
>
> I'm a btrfs newbie, and I have a problem with high load after each btrfs subvolume delete.
>
> I use snapshots on lxc hosts under debian jessie with
> - kernel 4.6.0-0.bpo.1-amd64
> - btrfs-progs 4.6.1-1~bpo8
>
> For backup, I have each day, for each subvolume
>
> btrfs subvolume snapshot -r $subvol $snap
> # then later
> ionice -c3 btrfs subvolume delete $snap
>
> but ionice doesn't seem to have any effect here, and after a few minutes the load grows
> quite high (30-40). I don't know how to make this deletion gentler on I/O.
Before I start explaining possible solutions, it helps to explain what's
actually happening here. When you create a snapshot, BTRFS just scans
down the tree for the subvolume in question and creates new references
to everything in that subvolume in a separate tree. This is usually
insanely fast because all that needs to be done is updating metadata.
When you delete a snapshot however, it has to remove any remaining
references within the snapshot to the parent subvolume, and also has to
process any changed data that is now different from the parent subvolume
for deletion just like it would for deleting a file. As a result of
this, the work to create a snapshot only depends on the complexity of
the directory structure within the subvolume, while the work to delete
it depends on both that and how much the snapshot has changed from the
parent subvolume.
The spike in load you're seeing is the filesystem handling all that
internal accounting in the background, and I'd be willing to bet that it
varies based on how fast things are changing in the parent subvolume.
Setting idle I/O scheduling priority on the command to delete the
snapshot does nothing because all that command does is tell the kernel
to delete the snapshot; the actual deletion is handled in the filesystem
driver. While it won't help with the spike in load, you probably want
to add `--commit-after` to that subvolume deletion command. That will
cause the spike to happen almost immediately, and the command won't
return until the filesystem is finished with the accounting and thus the
load should be back to normal when it returns.
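To make that concrete, here's a sketch of the adjusted backup cycle. The
paths are hypothetical, and the `run` wrapper only echoes the commands
instead of executing them; replace the echo with "$@" to run it for real:

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

# Hypothetical paths -- adjust to your layout.
subvol=/var/lib/lxc/mycontainer/rootfs
snap=/snapshots/mycontainer-backup

# Creating the read-only snapshot is metadata-only and near-instant:
run btrfs subvolume snapshot -r "$subvol" "$snap"

# ... back up from $snap here ...

# --commit-after makes the command wait for the transaction commit,
# so the cleanup work happens now and is done when the command returns:
run btrfs subvolume delete --commit-after "$snap"
```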
>
> Is there a better way to do this?
While there isn't any way I know of to do so, there are ways you can
reduce the impact by reducing how much you're backing up:
1. You almost certainly don't need to back up the logs, and if you do,
they should probably be backed up independently from the rest of the
system image. In most cases, logs just add extra size to a backup and
have little value when you restore one, so it makes little sense to
include them. The simplest way to exclude them in your case is to make
/var/log in the LXC containers a separate subvolume. This will exclude
it from the snapshot for the backup, which will both speed up the backup
and reduce the amount of changes from the parent that occur while
creating the backup.
2. Assuming you're using a distribution compliant with the filesystem
hierarchy standard, there are a couple of directories you can safely
exclude from all backups simply because portable programs are designed
to handle losing data from these directories gracefully. Such
directories include /tmp, /var/tmp, and /var/cache, and they can be
excluded the same way as /var/log.
3. Similar arguments apply to $HOME/.cache, which is essentially a
per-user /var/cache. This is less likely to have an impact if you don't
have individual users doing things on these systems.
4. Look for other similar areas you may be able to safely exclude. For
example, I use Gentoo, and I build all my packages with external
debugging symbols which get stored in /usr/lib/debug. I only have this
set up for convenience, so there's no point in me backing it up because
I can just rebuild the package to regenerate the debugging symbols if I
need them after restoring from a backup. Similarly, I also exclude any
VCS repositories that I have copies of elsewhere, simply because I can
just clone that copy if I need it.
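Converting an existing directory like /var/log into its own subvolume is
a move-create-copy-remove dance. A sketch, again with hypothetical paths
and an echo-only `run` wrapper (do this with the container stopped, and
swap the echo for "$@" to execute):

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

# Hypothetical container rootfs path.
rootfs=/var/lib/lxc/mycontainer/rootfs

# Move the old directory aside, create a subvolume in its place,
# then copy the contents back preserving ownership and timestamps:
run mv "$rootfs/var/log" "$rootfs/var/log.old"
run btrfs subvolume create "$rootfs/var/log"
run cp -a "$rootfs/var/log.old/." "$rootfs/var/log/"
run rm -rf "$rootfs/var/log.old"
```

Because /var/log is now a separate subvolume, a snapshot of the rootfs
subvolume will no longer descend into it.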
>
> Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which seems to be the one
> doing a lot of I/O?
Yes, it's always a bad idea to mess with any scheduling properties other
than CPU affinity for kernel threads (and even messing with CPU affinity
is usually a bad idea too). The btrfs-transaction kthread (the name
gets cut off by the length limits built into the kernel) is a
particularly bad one to mess with, because it handles committing updates
to the filesystem. Setting an idle scheduling priority on it would
probably put you at severe risk of data loss or cause your system to
lock up.
>
> Currently the I/O priorities on my btrfs processes are:
>
> ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
> [btrfs-worker] none: prio 4
> [btrfs-worker-hi] none: prio 4
> [btrfs-delalloc] none: prio 4
> [btrfs-flush_del] none: prio 4
> [btrfs-cache] none: prio 4
> [btrfs-submit] none: prio 4
> [btrfs-fixup] none: prio 4
> [btrfs-endio] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-rai] none: prio 4
> [btrfs-endio-rep] none: prio 4
> [btrfs-rmw] none: prio 4
> [btrfs-endio-wri] none: prio 4
> [btrfs-freespace] none: prio 4
> [btrfs-delayed-m] none: prio 4
> [btrfs-readahead] none: prio 4
> [btrfs-qgroup-re] none: prio 4
> [btrfs-extent-re] none: prio 4
> [btrfs-cleaner] none: prio 0
> [btrfs-transacti] none: prio 0
Altogether, these are exactly what they should be on a normal kernel.
Also, neat trick with awk to get that info, I'll have to remember that.
* Re: Huge load on btrfs subvolume delete
From: Daniel Caillibaud @ 2016-08-15 14:06 UTC (permalink / raw)
To: btrfs ml
On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
ASH> > I'm a btrfs newbie, and I have a problem with high load after each btrfs subvolume delete.
[…]
ASH> Before I start explaining possible solutions, it helps to explain what's
ASH> actually happening here.
[…]
Thanks a lot for these clear and detailed explanations.
ASH> > Is there a better way to do this?
ASH> While there isn't any way I know of to do so, there are ways you can
ASH> reduce the impact by reducing how much you're backing up:
Thanks for these clues too!
I'll use --commit-after to wait for the deletion to complete before starting to rsync the next
snapshot, and I'll keep in mind the benefit of putting /var/log outside the main subvolume of
the VM (but I guess my main problem is the databases, because their data directories are the
ones with the most writes).
--
Daniel
* Re: Huge load on btrfs subvolume delete
From: Austin S. Hemmelgarn @ 2016-08-15 14:16 UTC (permalink / raw)
To: Daniel Caillibaud, btrfs ml
On 2016-08-15 10:06, Daniel Caillibaud wrote:
> On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>
> ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
> ASH> > I'm a btrfs newbie, and I have a problem with high load after each btrfs subvolume delete.
> […]
>
> ASH> Before I start explaining possible solutions, it helps to explain what's
> ASH> actually happening here.
> […]
>
> Thanks a lot for these clear and detailed explanations.
Glad I could help.
>
> ASH> > Is there a better way to do this?
>
> ASH> While there isn't any way I know of to do so, there are ways you can
> ASH> reduce the impact by reducing how much you're backing up:
>
> Thanks for these clues too!
>
> I'll use --commit-after to wait for the deletion to complete before starting to rsync the next
> snapshot, and I'll keep in mind the benefit of putting /var/log outside the main subvolume of
> the VM (but I guess my main problem is the databases, because their data directories are the
> ones with the most writes).
>
With respect to databases, you might consider backing them up separately
too. In many cases for something like an SQL database, it's a lot more
flexible to have a dump of the database as a backup than it is to have
the database files themselves, because it decouples it from the
filesystem level layout. Most good databases should be able to give you
a stable dump (assuming of course that the application using the
databases is sanely written) a whole lot faster than you could back up
the files themselves. For the couple of databases we use internally
where I work, we actually back them up separately not only to retain
this flexibility, but also because we have them on a separate backup
schedule from the rest of the systems because they change a lot more
frequently than anything else.
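As a sketch of what a separate logical backup could look like (database
name and destination path are hypothetical; `run` only echoes the command,
swap the echo for eval "$@" to execute):

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

# Hypothetical database and destination.
db=mydb
dumpfile=/backup/mydb.sql.gz

# For InnoDB tables, mysqldump's --single-transaction takes a consistent
# snapshot inside a transaction instead of locking tables for the whole
# dump, so the database stays writable while it runs:
run "mysqldump --single-transaction $db | gzip > $dumpfile"
```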
* Re: Huge load on btrfs subvolume delete
From: Daniel Caillibaud @ 2016-08-15 19:56 UTC (permalink / raw)
To: btrfs ml
On 15/08/16 at 10:16, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
ASH> With respect to databases, you might consider backing them up separately
ASH> too. In many cases for something like an SQL database, it's a lot more
ASH> flexible to have a dump of the database as a backup than it is to have
ASH> the database files themselves, because it decouples it from the
ASH> filesystem level layout.
With mysql|mariadb, getting a consistent dump requires locking tables during the dump, which is
not acceptable on production servers.
Even with specialised tools for hot dumps, doing the dump on prod servers is too heavy on I/O
(I have huge databases; writing the dump is expensive and slow).
I used to have a slave just for the dump (easy to stop the slave, dump, and restart it), but
after a while it couldn't keep up with the writes (prod was on SSD and it wasn't; the dump HDD
was 100% busy all day long), so for me it's really easier to rsync the raw files once a day to
a cheap host before dumping.
(Of course, I need to flush & lock the tables during the snapshot, before the rsync, but that's
just one or two seconds, which is still acceptable.)
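The flush-lock-snapshot-unlock cycle could be sketched like this. Paths
are hypothetical, and the `run` wrapper only echoes the commands. One
assumption worth verifying against your client version: the lock must be
held by one open session across the snapshot, so here the whole sequence
is fed to a single mysql invocation using the client's `system` builtin
to shell out for the snapshot:

```shell
#!/bin/sh
# Dry-run wrapper: prints each command instead of executing it.
run() { echo "+ $*"; }

# Hypothetical paths; the datadir must live on a btrfs subvolume.
datadir=/var/lib/mysql
snap=/snapshots/mysql-backup

# Hold the read lock in one session while the snapshot is taken:
run mysql -e "FLUSH TABLES WITH READ LOCK; system btrfs subvolume snapshot -r $datadir $snap; UNLOCK TABLES;"

# rsync the consistent snapshot to the cheap host, then drop it:
run rsync -a "$snap/" backuphost:/backup/mysql/
run btrfs subvolume delete --commit-after "$snap"
```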
--
Daniel