From: Henk Slager <eye1tm@gmail.com>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Use fast device only for metadata?
Date: Tue, 9 Feb 2016 19:23:50 +0100	[thread overview]
Message-ID: <CAPmG0jYnogfJjutzQaL0FJAUbnLJke8i1-s_sNWVdF7JauSQ4w@mail.gmail.com> (raw)
In-Reply-To: <20160209082933.52273993@jupiter.sol.kaishome.de>

On Tue, Feb 9, 2016 at 8:29 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Mon, 08 Feb 2016 13:44:17 -0800,
> Nikolaus Rath <Nikolaus@rath.org> wrote:
>
>> On Feb 07 2016, Martin Steigerwald <martin@lichtvoll.de> wrote:
>> > On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
>> >> On Sun, 07 Feb 2016 11:06:58 -0800,
>> >>
>> >> Nikolaus Rath <Nikolaus@rath.org> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a large home directory on a spinning disk that I regularly
>> >> > synchronize between different computers using unison. That takes
>> >> > ages, even though the amount of changed files is typically
>> >> > small. I suspect most of the time is spent walking through the
>> >> > file system and checking mtimes.
>> >> >
>> >> > So I was wondering if I could possibly speed-up this operation by
>> >> > storing all btrfs metadata on a fast, SSD drive. It seems that
>> >> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode,
>> >> > and the file contents in single mode. However, I could not find
>> >> > a way to tell btrfs to use a device *only* for metadata. Is
>> >> > there a way to do that?
>> >> >
>> >> > Also, what is the difference between using "dup" and "raid1" for
>> >> > the metadata?
>> >>
>> >> You may want to try bcache. It will speed up random access, which is
>> >> probably the main cause of your slow sync. Unfortunately it
>> >> requires you to reformat your btrfs partitions to add a bcache
>> >> superblock. But it's worth the effort.
>> >>
>> >> I use a nightly rsync to a USB3 disk, and bcache reduced it from 5+
>> >> hours to typically 1.5-3, depending on how much data changed.
>> >
>> > An alternative is dm-cache; I think it doesn't require
>> > recreating the filesystem.
>>
>> Yes, I tried that already but it didn't improve things at all. I
>> wrote a message to the lvm list though, so maybe someone will be able
>> to help.
>>
>> Otherwise I'll give bcache a shot. I've avoided it so far because of
>> the need to reformat and because of rumours that it doesn't work well
>> with LVM or BTRFS. But it sounds as if that's not the case...
>
> I'm using bcache+btrfs myself and it has run bulletproof so far, even
> after unintentional resets or power outages. It's important, though,
> NOT to put any storage layer between bcache and your devices, or
> between btrfs and bcache, as there are reports it becomes unstable
> with md or lvm involved. In my setup I can even use discard/trim
> without problems. I'd recommend a current kernel, though.
>
> Since it requires reformatting, it's a big pita, but it's worth the
> effort. By design it appears much more effective and stable than
> dm-cache. You could even format a bcache superblock "just in case"
> and add an SSD later; without an SSD, bcache will just work in
> passthrough mode. Actually, I have started to format all my storage
> with a bcache superblock "just in case". It is similar to having
> another partition table folded inside, so it doesn't hurt (except
> that you need bcache-probe in the initrd to detect the contained
> filesystems).
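
For reference, that "just in case" formatting looks roughly like this
with bcache-tools; the device names are placeholders, and the commands
should be double-checked against the bcache docs before running
anything:

```shell
# format the HDD partition as a bcache backing device; with no cache
# attached, /dev/bcache0 simply runs in passthrough mode
make-bcache -B /dev/sdXN
mkfs.btrfs /dev/bcache0

# later, when an SSD is available: create a cache set and attach it
make-bcache -C /dev/sdYM
bcache-super-show /dev/sdYM                # note the cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
```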

I've had the same positive bcache+btrfs experience; I have been using
it since kernel 4.1.6 and am now on the latest 4.4. In particular, it
is now possible to use VM images in normal CoW mode with performance
comparable to keeping the image on an SSD. This is with 50G images
consisting of about 50k extents, on raid10 btrfs with mount options
noatime,nossd,autodefrag and bcache in writeback mode. The initial
number of extents was on the order of 100 or so, but later small
writes from inside the VM almost all end up in bcache. A nightly
incremental send|receive takes just a few minutes, and a kernel
compile from a local git clone works almost as if from an SSD.

When the RAM cache is invalidated and bcache is detached, stopped, or
absent, filesystem finds and other operations that involve
fragmentation or lots of seeks clearly take far more time. From
there, after starting and using an OS in a VM for, let's say, 10
minutes of common tasks, speed is 'SSD-like' rather than 'HDD-like'
and stays that way (until blocks are evicted, of course).

The 'reformatting' might be avoided by using this:
https://github.com/g2p/blocks
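
From a quick look at its README, the conversion appears to be a single
call (the device name is a placeholder, and I haven't verified the
syntax):

```shell
# in-place conversion of an existing partition to a bcache backing
# device, per the g2p/blocks README (command syntax unverified by me)
blocks to-bcache /dev/sdXN
```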

I haven't used it myself, as one fs spanned the full hard disk and my
Python installations had some issues. I wanted to keep the same UUID
(due to a long-term incremental send|receive cloning setup), so I
shrank the filesystem to nearly its smallest possible size, used an
extra device (4TB) to dd_rescue the fs image onto, and then as a
second step dd_rescue'd it back to the original disk, onto a
partition that is bcache'd. A btrfs replace would also have been an
option, or some two-step add/remove action or tricks with raid1.
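
The two-step copy can be sketched on plain files; dd stands in for
dd_rescue here (they behave the same on error-free media), and every
file name below is made up:

```shell
# Demo of the two-step image copy using plain files instead of
# block devices. All names are illustrative.

head -c 1048576 /dev/urandom > shrunken_fs.img   # the shrunken filesystem
SUM0=$(sha256sum shrunken_fs.img | cut -d' ' -f1)

# step 1: image the shrunken filesystem onto the extra (4TB) device
dd if=shrunken_fs.img of=spare_device.img bs=64K status=none

# ... at this point the original disk would be repartitioned and the
# target partition formatted with make-bcache -B ...

# step 2: copy the image back onto the bcache'd partition
dd if=spare_device.img of=bcache_backing.img bs=64K status=none

SUM1=$(sha256sum bcache_backing.img | cut -d' ' -f1)
[ "$SUM0" = "$SUM1" ] && echo "image intact, fs UUID preserved"
```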

For another disk I did not have a spare, so I made a script to do an
'in-place' filesystem image move. I have browsed the superblocks (I
don't remember the size, but it's a few kB AFAIK), so a 1G copy-block
size is easily large enough, and keeping at least 2 copy blocks of
readahead on intermediate storage worked fine. The same approach can
be used for adding a LUKS header.
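
A minimal sketch of such an in-place shift, demonstrated on a small
file: it copies from the end toward the start, which avoids the
readahead buffer my script kept (the offset, block size, and names
are all illustrative):

```shell
# In-place forward shift of a filesystem image, to make room for a
# bcache superblock (default data offset: 8 KiB) or a LUKS header at
# the start of the device. Copying blocks from the END toward the
# start means each block is read before its old location is
# overwritten, so no intermediate buffer is needed. This demo uses a
# small file as a stand-in for the real disk.

head -c 300000 /dev/urandom > disk.img   # stand-in for the real device
cp disk.img before.img                   # reference copy for checking

OFF=8192          # bytes to shift forward (bcache's default data offset)
BS=65536          # copy block size (the real run used 1G)

SIZE=$(stat -c %s disk.img)
NBLK=$(( (SIZE + BS - 1) / BS ))

i=$(( NBLK - 1 ))
while [ "$i" -ge 0 ]; do
    # move block i from i*BS to i*BS + OFF (GNU dd byte-offset flags)
    dd if=disk.img of=disk.img bs="$BS" count=1 \
       skip=$(( i * BS )) seek=$(( i * BS + OFF )) \
       iflag=skip_bytes oflag=seek_bytes conv=notrunc status=none
    i=$(( i - 1 ))
done

# the shifted image must match the original, OFF bytes in
tail -c +$(( OFF + 1 )) disk.img > shifted.img
cmp -s shifted.img before.img && echo "shift ok"
```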

Thread overview: 21+ messages
2016-02-07 19:06 Use fast device only for metadata? Nikolaus Rath
2016-02-07 20:07 ` Kai Krakow
2016-02-07 20:59   ` Martin Steigerwald
2016-02-08  1:04     ` Duncan
2016-02-08 12:24     ` Austin S. Hemmelgarn
2016-02-08 13:20       ` Qu Wenruo
2016-02-08 13:29         ` Austin S. Hemmelgarn
2016-02-08 14:23           ` Qu Wenruo
2016-02-08 21:44     ` Nikolaus Rath
2016-02-08 22:12       ` Duncan
2016-02-09  7:29       ` Kai Krakow
2016-02-09 16:09         ` Nikolaus Rath
2016-02-09 21:43           ` Kai Krakow
2016-02-09 22:02             ` Chris Murphy
2016-02-09 22:38             ` Nikolaus Rath
2016-02-10  1:12               ` Henk Slager
2016-02-09 16:10         ` Nikolaus Rath
2016-02-09 21:29           ` Kai Krakow
2016-02-09 18:23         ` Henk Slager [this message]
2016-02-09 13:22       ` Austin S. Hemmelgarn
2016-02-10  4:08       ` Nikolaus Rath
