* Use fast device only for metadata?
@ 2016-02-07 19:06 Nikolaus Rath
  2016-02-07 20:07 ` Kai Krakow
  0 siblings, 1 reply; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-07 19:06 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I have a large home directory on a spinning disk that I regularly
synchronize between different computers using unison. That takes ages,
even though the number of changed files is typically small. I suspect
most of the time is spent walking through the file system and checking
mtimes.
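
A minimal sketch, assuming Python 3, of how that suspicion could be
checked: it walks a tree, stats every entry once (roughly what a sync tool
has to do to compare mtimes) and reports how long that took. The function
name walk_and_stat is made up for this illustration and is not part of
unison or btrfs.

```python
import os
import time


def walk_and_stat(root):
    """Stat every entry below root once.

    Returns (entry_count, newest_mtime, elapsed_seconds).
    """
    start = time.perf_counter()
    count = 0
    newest = 0.0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        info = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is unreadable
                    count += 1
                    newest = max(newest, info.st_mtime)
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # unreadable directory: skip it
    return count, newest, time.perf_counter() - start
```

Calling walk_and_stat on the home directory right after boot (cold cache)
and again immediately afterwards (warm cache) should give a rough idea of
how much of the sync time is pure metadata access.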

So I was wondering if I could possibly speed up this operation by
storing all btrfs metadata on a fast SSD. It seems that
mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
file contents in single mode. However, I could not find a way to tell
btrfs to use a device *only* for metadata. Is there a way to do that?

Also, what is the difference between using "dup" and "raid1" for the
metadata? 

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-07 19:06 Use fast device only for metadata? Nikolaus Rath
@ 2016-02-07 20:07 ` Kai Krakow
  2016-02-07 20:59   ` Martin Steigerwald
  0 siblings, 1 reply; 21+ messages in thread
From: Kai Krakow @ 2016-02-07 20:07 UTC (permalink / raw)
  To: linux-btrfs

On Sun, 07 Feb 2016 11:06:58 -0800,
Nikolaus Rath <Nikolaus@rath.org> wrote:

> Hello,
> 
> I have a large home directory on a spinning disk that I regularly
> synchronize between different computers using unison. That takes ages,
> even though the amount of changed files is typically small. I suspect
> most if the time is spend walking through the file system and checking
> mtimes.
> 
> So I was wondering if I could possibly speed-up this operation by
> storing all btrfs metadata on a fast, SSD drive. It seems that
> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
> file contents in single mode. However, I could not find a way to tell
> btrfs to use a device *only* for metadata. Is there a way to do that?
> 
> Also, what is the difference between using "dup" and "raid1" for the
> metadata? 

You may want to try bcache. It will speed up random access, which is
probably the main cause of your slow sync. Unfortunately, it requires
you to reformat your btrfs partitions to add a bcache superblock. But
it's worth the effort.

I use a nightly rsync to a USB3 disk, and bcache reduced it from 5+ hours
to typically 1.5-3 hours, depending on how much data changed.

-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-07 20:07 ` Kai Krakow
@ 2016-02-07 20:59   ` Martin Steigerwald
  2016-02-08  1:04     ` Duncan
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Martin Steigerwald @ 2016-02-07 20:59 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-btrfs

On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
> Am Sun, 07 Feb 2016 11:06:58 -0800
> 
> schrieb Nikolaus Rath <Nikolaus@rath.org>:
> > Hello,
> > 
> > I have a large home directory on a spinning disk that I regularly
> > synchronize between different computers using unison. That takes ages,
> > even though the amount of changed files is typically small. I suspect
> > most if the time is spend walking through the file system and checking
> > mtimes.
> > 
> > So I was wondering if I could possibly speed-up this operation by
> > storing all btrfs metadata on a fast, SSD drive. It seems that
> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
> > file contents in single mode. However, I could not find a way to tell
> > btrfs to use a device *only* for metadata. Is there a way to do that?
> > 
> > Also, what is the difference between using "dup" and "raid1" for the
> > metadata?
> 
> You may want to try bcache. It will speedup random access which is
> probably the main cause for your slow sync. Unfortunately it requires
> you to reformat your btrfs partitions to add a bcache superblock. But
> it's worth the efforts.
> 
> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
> to typically 1.5-3 depending on how much data changed.

An alternative is dm-cache; I think it doesn't require recreating the
filesystem.

I wonder what happened to the VFS hot data tracking patchset that was
floating around here quite some time ago.

-- 
Martin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-07 20:59   ` Martin Steigerwald
@ 2016-02-08  1:04     ` Duncan
  2016-02-08 12:24     ` Austin S. Hemmelgarn
  2016-02-08 21:44     ` Nikolaus Rath
  2 siblings, 0 replies; 21+ messages in thread
From: Duncan @ 2016-02-08  1:04 UTC (permalink / raw)
  To: linux-btrfs

Martin Steigerwald posted on Sun, 07 Feb 2016 21:59:48 +0100 as excerpted:

> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>> Am Sun, 07 Feb 2016 11:06:58 -0800
>> 
>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>>> 
>>> I have a large home directory on a spinning disk that I regularly
>>> synchronize between different computers using unison. That takes
>>> ages, even though the amount of changed files is typically small.
>>> I suspect most if the time is spend walking through the file system
>>> and checking mtimes.
>>> 
>>> So I was wondering if I could possibly speed-up this operation by
>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and
>>> the file contents in single mode. However, I could not find a way to
>>> tell btrfs to use a device *only* for metadata. Is there a way to do
>>> that?

As with the others, I'd recommend bcache or dmcache.  Which one is up to 
you, but AFAIK, bcache has more on-list users and is thus potentially 
better tested with btrfs and easier to find someone to compare btrfs-on-
XXcache notes with, if you find you need to.

>>> Also, what is the difference between using "dup" and "raid1" for the
>>> metadata?

Dup is 2X on a single device and is the single-device metadata default on 
spinning rust (single is the default for single-device on SSDs, primarily 
due to some SSD firmware doing dedup already, in which case dup wouldn't 
do anything but take more CPU time).

Raid1 is 2X on multiple devices, with the allocator ensuring the two 
copies don't end up on the same device.  It's the metadata default on 
multi-device.
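
To make the difference concrete, here is a toy model in Python. It is
only a sketch of the placement rule described above, not btrfs code, and
the names place_copies and survives_device_loss are invented for the
illustration.

```python
def place_copies(profile, devices):
    """Return the device holding each copy of one metadata chunk (toy model)."""
    if profile == "single":
        return [devices[0]]
    if profile == "dup":
        # Two copies, both on the same device.
        return [devices[0], devices[0]]
    if profile == "raid1":
        if len(devices) < 2:
            raise ValueError("raid1 needs at least two devices")
        # Two copies, never on the same device.
        return [devices[0], devices[1]]
    raise ValueError(f"unknown profile: {profile}")


def survives_device_loss(profile, devices, failed):
    """True if at least one copy lives on a device other than the failed one."""
    return any(dev != failed for dev in place_copies(profile, devices))
```

In this model, survives_device_loss("dup", ["hdd"], "hdd") is False while
survives_device_loss("raid1", ["hdd", "ssd"], "hdd") is True: dup guards
against a bad block, raid1 also against losing a whole device.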

>> You may want to try bcache. [...] I use a nightly rsync to USB3 disk,
>> and bcache reduced it from 5+ hours to typically 1.5-3 depending on
>> how much data changed.
> 
> An alternative is using dm-cache, I think it doesn´t need to recreate
> the filesystem.
> 
> I wonder what happened to the VFS hot data tracking stuff patchset
> floating around here quite some time ago.

AFAIK it's still around, and very possibly in-use by some major user.  I 
believe it's still on the btrfs roadmap and should eventually be 
mainlined, but with bcache and dmcache maturing now, there's not the 
pressing need for it to be btrfs-built-in that there was before.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-07 20:59   ` Martin Steigerwald
  2016-02-08  1:04     ` Duncan
@ 2016-02-08 12:24     ` Austin S. Hemmelgarn
  2016-02-08 13:20       ` Qu Wenruo
  2016-02-08 21:44     ` Nikolaus Rath
  2 siblings, 1 reply; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-08 12:24 UTC (permalink / raw)
  To: Martin Steigerwald, Kai Krakow; +Cc: linux-btrfs

On 2016-02-07 15:59, Martin Steigerwald wrote:
> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>> Am Sun, 07 Feb 2016 11:06:58 -0800
>>
>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>>> Hello,
>>>
>>> I have a large home directory on a spinning disk that I regularly
>>> synchronize between different computers using unison. That takes ages,
>>> even though the amount of changed files is typically small. I suspect
>>> most if the time is spend walking through the file system and checking
>>> mtimes.
>>>
>>> So I was wondering if I could possibly speed-up this operation by
>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>>> file contents in single mode. However, I could not find a way to tell
>>> btrfs to use a device *only* for metadata. Is there a way to do that?
>>>
>>> Also, what is the difference between using "dup" and "raid1" for the
>>> metadata?
>>
>> You may want to try bcache. It will speedup random access which is
>> probably the main cause for your slow sync. Unfortunately it requires
>> you to reformat your btrfs partitions to add a bcache superblock. But
>> it's worth the efforts.
>>
>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>> to typically 1.5-3 depending on how much data changed.
>
> An alternative is using dm-cache, I think it doesn´t need to recreate the
> filesystem.
That's correct, dm-cache can use a regular underlying storage device. 
This of course has potential implications for a multi-device filesystem 
(it can seriously confuse BTRFS and cause data corruption), but it works 
just fine for a single device filesystem.  This makes it a bit easier to 
test run, but also means you need more devices (internally, it uses 3, 
one backing device, one cache device, and a metadata device for 
persistently mapping between the two).  It's really easy to set up 
though if you have a recent version of LVM built with dm-cache support.

In general, bcache takes a bit more setup, but avoids the multi-device 
issues, and importantly, doesn't require LVM or dmsetup (which are 
usually pretty big packages on many distros).  The caveat with bcache 
though is that there have been issues in the past with data integrity 
when used with BTRFS, but if you're on a recent kernel (at least 4.0 if 
you're using BTRFS for actual data storage), you should have no issues.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 12:24     ` Austin S. Hemmelgarn
@ 2016-02-08 13:20       ` Qu Wenruo
  2016-02-08 13:29         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 21+ messages in thread
From: Qu Wenruo @ 2016-02-08 13:20 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Martin Steigerwald, Kai Krakow; +Cc: linux-btrfs



On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:
> On 2016-02-07 15:59, Martin Steigerwald wrote:
>> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>>> Am Sun, 07 Feb 2016 11:06:58 -0800
>>>
>>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>>>> Hello,
>>>>
>>>> I have a large home directory on a spinning disk that I regularly
>>>> synchronize between different computers using unison. That takes ages,
>>>> even though the amount of changed files is typically small. I suspect
>>>> most if the time is spend walking through the file system and checking
>>>> mtimes.
>>>>
>>>> So I was wondering if I could possibly speed-up this operation by
>>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>>>> file contents in single mode. However, I could not find a way to tell
>>>> btrfs to use a device *only* for metadata. Is there a way to do that?
>>>>
>>>> Also, what is the difference between using "dup" and "raid1" for the
>>>> metadata?
>>>
>>> You may want to try bcache. It will speedup random access which is
>>> probably the main cause for your slow sync. Unfortunately it requires
>>> you to reformat your btrfs partitions to add a bcache superblock. But
>>> it's worth the efforts.
>>>
>>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>>> to typically 1.5-3 depending on how much data changed.
>>
>> An alternative is using dm-cache, I think it doesn´t need to recreate the
>> filesystem.
> That's correct, dm-cache can use a regular underlying storage device.
> This of course has potential implications for a multi-device filesystem
> (it can seriously confuse BTRFS and cause data corruption), but it works
> just fine for a single device filesystem.  This makes it a bit easier to
> test run, but also means you need more devices (internally, it uses 3,
> one backing device, one cache device, and a metadata device for
> persistently mapping between the two).  It's really easy to set up
> though if you have a recent version of LVM built with dm-cache support.
>
> In general, bcache takes a bit more setup, but avoids the multi-device
> issues, and importantly, doesn't require LVM or dmsetup (which are
> usually pretty big packages on many distros).  The caveat with bcache
> though is that there have been issues in the past with data integrity
> when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
> you're using BTRFS for actual data storage), you should have no issues.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

And I just want to add more about using a device *only* for metadata.

The short answer is, unfortunately, NO.

1) Even with bcache/dm-cache, small data writes may still be cached

I'm not quite sure about the details of dm-cache/bcache, but as long as
the filesystem on top is Btrfs, it won't be possible to restrict data or
metadata to a specific device.

IIRC, bcache or a similar method may cache most random r/w of metadata,
but it's still quite possible that it caches a lot of random r/w of data
as well.

And depending on the sector size (minimal data block size) and leaf size
(metadata block size), it's even more likely that small data rather than
metadata gets cached under specific workloads, as the default sectorsize
is 4K but the default leafsize is 16K.

2) Btrfs doesn't have any special preference in chunk allocation.

Btrfs just allocates chunks in the order of unallocated space.
So even if there is a huge TB- or PB-sized spinning device and a GB-level
SSD, btrfs will just treat them according to unallocated space.
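
The point can be sketched with a toy allocator in Python. This is a
simplified model of the "single" profile only, assuming a fixed 1 GiB
chunk size for illustration; the real allocator is more involved, and the
names allocate_chunks and CHUNK_GIB are made up here.

```python
CHUNK_GIB = 1  # assumed chunk size, for illustration only


def allocate_chunks(devices, requests):
    """Place each requested chunk on the device with the most unallocated space.

    devices:  dict mapping device name to unallocated GiB
    requests: list of chunk types, e.g. ["metadata", "data"]
    Returns a list of (chunk_type, device_name) pairs.
    """
    free = dict(devices)  # work on a copy, leave the caller's dict alone
    placement = []
    for kind in requests:
        name = max(free, key=free.get)
        if free[name] < CHUNK_GIB:
            raise RuntimeError("no unallocated space left")
        free[name] -= CHUNK_GIB
        # The chunk type plays no role in choosing the device.
        placement.append((kind, name))
    return placement
```

With {"hdd": 4000, "ssd": 120} every chunk, metadata included, lands on
the hdd until its unallocated space drops to the level of the ssd.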



BTW, to really locate the bottleneck, it's better to use perf to
find out which function btrfs spends most of its time in.

Although it's a known fact that btrfs is quite slow at metadata
modification compared to other file systems, I'm still not sure
whether that's the root cause.

Thanks,
Qu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 13:20       ` Qu Wenruo
@ 2016-02-08 13:29         ` Austin S. Hemmelgarn
  2016-02-08 14:23           ` Qu Wenruo
  0 siblings, 1 reply; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-08 13:29 UTC (permalink / raw)
  To: Qu Wenruo, Martin Steigerwald, Kai Krakow; +Cc: linux-btrfs

On 2016-02-08 08:20, Qu Wenruo wrote:
> On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:
>> On 2016-02-07 15:59, Martin Steigerwald wrote:
>>> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>>>> Am Sun, 07 Feb 2016 11:06:58 -0800
>>>>
>>>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>>>>> Hello,
>>>>>
>>>>> I have a large home directory on a spinning disk that I regularly
>>>>> synchronize between different computers using unison. That takes ages,
>>>>> even though the amount of changed files is typically small. I suspect
>>>>> most if the time is spend walking through the file system and checking
>>>>> mtimes.
>>>>>
>>>>> So I was wondering if I could possibly speed-up this operation by
>>>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>>>>> file contents in single mode. However, I could not find a way to tell
>>>>> btrfs to use a device *only* for metadata. Is there a way to do that?
>>>>>
>>>>> Also, what is the difference between using "dup" and "raid1" for the
>>>>> metadata?
>>>>
>>>> You may want to try bcache. It will speedup random access which is
>>>> probably the main cause for your slow sync. Unfortunately it requires
>>>> you to reformat your btrfs partitions to add a bcache superblock. But
>>>> it's worth the efforts.
>>>>
>>>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>>>> to typically 1.5-3 depending on how much data changed.
>>>
>>> An alternative is using dm-cache, I think it doesn´t need to recreate
>>> the
>>> filesystem.
>> That's correct, dm-cache can use a regular underlying storage device.
>> This of course has potential implications for a multi-device filesystem
>> (it can seriously confuse BTRFS and cause data corruption), but it works
>> just fine for a single device filesystem.  This makes it a bit easier to
>> test run, but also means you need more devices (internally, it uses 3,
>> one backing device, one cache device, and a metadata device for
>> persistently mapping between the two).  It's really easy to set up
>> though if you have a recent version of LVM built with dm-cache support.
>>
>> In general, bcache takes a bit more setup, but avoids the multi-device
>> issues, and importantly, doesn't require LVM or dmsetup (which are
>> usually pretty big packages on many distros).  The caveat with bcache
>> though is that there have been issues in the past with data integrity
>> when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
>> you're using BTRFS for actual data storage), you should have no issues.
>
> And I just want to add more about using a device *only* for metadata.
>
> The short answer is, unfortunately, NO.
>
> 1) Even using bcache/dm-cache, it may still cache small data write
>
> Although I'm not quite sure about dm-cache/bcache, but as long as the
> top file is Btrfs, it won't be possible to limit data/metadata to/from
> specific device.
>
> IIRC, bcache or similiar method may cache most random r/w of metadata,
> it's still quite possible to cache a lot of random r/w of data.
>
> And depending on the sector size(minimal data block size) and leaf size
> (metadata block size), it's even more possible to cache small data other
> than metadata under specific worload.
> As default sectorsize is 4K, but leafsize is 16K.
The mention of dm-cache/bcache was more intended as an alternative, 
since BTRFS currently can't do what Nikolaus was trying to achieve. 
Neither will give quite the performance profile that a dedicated 
metadata device might, but they should still significantly improve 
general performance.  In essence, these function for BTRFS like L2ARC on 
an SSD does for ZFS.
>
> 2) Btrfs don't have special preference on chunk allocation.
>
> Btrfs just allocate chunks in the order of unallocated space.
> So, even there is a super big TB or PB spinning device, and GB level
> SSD, btrfs will just trust them according to unallocated space.
On at least the project page, there is a suggestion to provide this 
functionality.  In a way, it's essentially equivalent to the external 
journal device supported by ext4, XFS, OCFS2 and some other filesystems, 
and as such, I'd say it's a feature we should seriously consider looking 
at implementing eventually, even if just for feature parity, and even if 
we speed up metadata operations in BTRFS.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 13:29         ` Austin S. Hemmelgarn
@ 2016-02-08 14:23           ` Qu Wenruo
  0 siblings, 0 replies; 21+ messages in thread
From: Qu Wenruo @ 2016-02-08 14:23 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Martin Steigerwald, Kai Krakow; +Cc: linux-btrfs



On 02/08/2016 09:29 PM, Austin S. Hemmelgarn wrote:
> On 2016-02-08 08:20, Qu Wenruo wrote:
>> On 02/08/2016 08:24 PM, Austin S. Hemmelgarn wrote:
>>> On 2016-02-07 15:59, Martin Steigerwald wrote:
>>>> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>>>>> Am Sun, 07 Feb 2016 11:06:58 -0800
>>>>>
>>>>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>>>>>> Hello,
>>>>>>
>>>>>> I have a large home directory on a spinning disk that I regularly
>>>>>> synchronize between different computers using unison. That takes
>>>>>> ages,
>>>>>> even though the amount of changed files is typically small. I suspect
>>>>>> most if the time is spend walking through the file system and
>>>>>> checking
>>>>>> mtimes.
>>>>>>
>>>>>> So I was wondering if I could possibly speed-up this operation by
>>>>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>>>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and
>>>>>> the
>>>>>> file contents in single mode. However, I could not find a way to tell
>>>>>> btrfs to use a device *only* for metadata. Is there a way to do that?
>>>>>>
>>>>>> Also, what is the difference between using "dup" and "raid1" for the
>>>>>> metadata?
>>>>>
>>>>> You may want to try bcache. It will speedup random access which is
>>>>> probably the main cause for your slow sync. Unfortunately it requires
>>>>> you to reformat your btrfs partitions to add a bcache superblock. But
>>>>> it's worth the efforts.
>>>>>
>>>>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+
>>>>> hours
>>>>> to typically 1.5-3 depending on how much data changed.
>>>>
>>>> An alternative is using dm-cache, I think it doesn´t need to recreate
>>>> the
>>>> filesystem.
>>> That's correct, dm-cache can use a regular underlying storage device.
>>> This of course has potential implications for a multi-device filesystem
>>> (it can seriously confuse BTRFS and cause data corruption), but it works
>>> just fine for a single device filesystem.  This makes it a bit easier to
>>> test run, but also means you need more devices (internally, it uses 3,
>>> one backing device, one cache device, and a metadata device for
>>> persistently mapping between the two).  It's really easy to set up
>>> though if you have a recent version of LVM built with dm-cache support.
>>>
>>> In general, bcache takes a bit more setup, but avoids the multi-device
>>> issues, and importantly, doesn't require LVM or dmsetup (which are
>>> usually pretty big packages on many distros).  The caveat with bcache
>>> though is that there have been issues in the past with data integrity
>>> when used with BTRFS, but if you're on a recent kernel (at least 4.0 if
>>> you're using BTRFS for actual data storage), you should have no issues.
>>
>> And I just want to add more about using a device *only* for metadata.
>>
>> The short answer is, unfortunately, NO.
>>
>> 1) Even using bcache/dm-cache, it may still cache small data write
>>
>> Although I'm not quite sure about dm-cache/bcache, but as long as the
>> top file is Btrfs, it won't be possible to limit data/metadata to/from
>> specific device.
>>
>> IIRC, bcache or similiar method may cache most random r/w of metadata,
>> it's still quite possible to cache a lot of random r/w of data.
>>
>> And depending on the sector size(minimal data block size) and leaf size
>> (metadata block size), it's even more possible to cache small data other
>> than metadata under specific worload.
>> As default sectorsize is 4K, but leafsize is 16K.
> The mention of dm-cache/bcache was more intended as an alternative,
> since BTRFS currently can't do what Nikolaus was trying to achieve.
> Neither will give quite the performance profile that a dedicated
> metadata device might, but they should still significantly improve
> general performance.  In essence, these function for BTRFS like L2ARC on
> an SSD does for ZFS.
>>
>> 2) Btrfs don't have special preference on chunk allocation.
>>
>> Btrfs just allocate chunks in the order of unallocated space.
>> So, even there is a super big TB or PB spinning device, and GB level
>> SSD, btrfs will just trust them according to unallocated space.
> On at least the project page, there is a suggestion to provide this
> functionality.  In a way, it's essentially equivalent to the external
> journal device supported by ext4, XFS, OCFS2 and some other filesystems,
> and as such, I'd say it's a feature we should seriously consider looking
> at implementing eventually, even if just for feature parity, and even if
> we speed up metadata operations in BTRFS.

Yes, that's quite a good feature, not only for metadata speedup, but 
also for better metadata safety.

But on the other hand, I also suspect that lock concurrency, rather than
device speed, is causing slow btrfs metadata performance.

Fortunately, that's also on the project page.
But unfortunately, it may be much harder to implement than special
chunk allocation behavior.

Thanks,
Qu


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-07 20:59   ` Martin Steigerwald
  2016-02-08  1:04     ` Duncan
  2016-02-08 12:24     ` Austin S. Hemmelgarn
@ 2016-02-08 21:44     ` Nikolaus Rath
  2016-02-08 22:12       ` Duncan
                         ` (3 more replies)
  2 siblings, 4 replies; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-08 21:44 UTC (permalink / raw)
  To: linux-btrfs

On Feb 07 2016, Martin Steigerwald <martin@lichtvoll.de> wrote:
> Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
>> Am Sun, 07 Feb 2016 11:06:58 -0800
>> 
>> schrieb Nikolaus Rath <Nikolaus@rath.org>:
>> > Hello,
>> > 
>> > I have a large home directory on a spinning disk that I regularly
>> > synchronize between different computers using unison. That takes ages,
>> > even though the amount of changed files is typically small. I suspect
>> > most if the time is spend walking through the file system and checking
>> > mtimes.
>> > 
>> > So I was wondering if I could possibly speed-up this operation by
>> > storing all btrfs metadata on a fast, SSD drive. It seems that
>> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>> > file contents in single mode. However, I could not find a way to tell
>> > btrfs to use a device *only* for metadata. Is there a way to do that?
>> > 
>> > Also, what is the difference between using "dup" and "raid1" for the
>> > metadata?
>> 
>> You may want to try bcache. It will speedup random access which is
>> probably the main cause for your slow sync. Unfortunately it requires
>> you to reformat your btrfs partitions to add a bcache superblock. But
>> it's worth the efforts.
>> 
>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>> to typically 1.5-3 depending on how much data changed.
>
> An alternative is using dm-cache, I think it doesn´t need to recreate the 
> filesystem.

Yes, I tried that already but it didn't improve things at all. I wrote a
message to the lvm list though, so maybe someone will be able to help.

Otherwise I'll give bcache a shot. I've avoided it so far because of the
need to reformat and because of rumours that it doesn't work well with
LVM or BTRFS. But it sounds as if that's not the case.


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 21:44     ` Nikolaus Rath
@ 2016-02-08 22:12       ` Duncan
  2016-02-09  7:29       ` Kai Krakow
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Duncan @ 2016-02-08 22:12 UTC (permalink / raw)
  To: linux-btrfs

Nikolaus Rath posted on Mon, 08 Feb 2016 13:44:17 -0800 as excerpted:

> Otherwise I'll give bcache a shot. I've avoided it so far because of the
> need to reformat and because of rumours that it doesn't work well with
> LVM or BTRFS. But it sounds as if that's not the case..

Bcache used to have problems with btrfs, but as I and others have 
mentioned, we have people known to be using btrfs with bcache on the 
list, and it has been working fine for quite some time, now.

Bcache vs. LVM, OTOH, I know nothing about.  Tho to be fair I guess I'm a 
bit anti-LVM biased myself, as it seems a bit too much complexity for the 
offered advantages, and when I tried it some time ago along with mdraid, 
I decided to keep the mdraid, but kill the lvm as too complex to be 
confident I could manage it correctly under the pressures of a disaster 
recovery situation, possibly with limited access to documentation, 
manpages, other recovery tools, etc.  MDRaid, OTOH, was easier to 
administer, in part because it's possible to assemble mdraid direct from 
the kernel without userspace (initr* or the like if / is on it), and I 
successfully managed it thru various issues over some time.

Of course these days I use multi-device btrfs directly, no mdraid, and a 
multi-device btrfs root unfortunately does seem to require an initr*, but 
its other advantages outweigh the additional complexity of having to use 
an initr*, so...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 21:44     ` Nikolaus Rath
  2016-02-08 22:12       ` Duncan
@ 2016-02-09  7:29       ` Kai Krakow
  2016-02-09 16:09         ` Nikolaus Rath
                           ` (2 more replies)
  2016-02-09 13:22       ` Austin S. Hemmelgarn
  2016-02-10  4:08       ` Nikolaus Rath
  3 siblings, 3 replies; 21+ messages in thread
From: Kai Krakow @ 2016-02-09  7:29 UTC (permalink / raw)
  To: linux-btrfs

On Mon, 08 Feb 2016 13:44:17 -0800,
Nikolaus Rath <Nikolaus@rath.org> wrote:

> On Feb 07 2016, Martin Steigerwald <martin@lichtvoll.de> wrote:
> > Am Sonntag, 7. Februar 2016, 21:07:13 CET schrieb Kai Krakow:
> >> Am Sun, 07 Feb 2016 11:06:58 -0800
> >> 
> >> schrieb Nikolaus Rath <Nikolaus@rath.org>:
> >> > Hello,
> >> > 
> >> > I have a large home directory on a spinning disk that I regularly
> >> > synchronize between different computers using unison. That takes
> >> > ages, even though the amount of changed files is typically
> >> > small. I suspect most if the time is spend walking through the
> >> > file system and checking mtimes.
> >> > 
> >> > So I was wondering if I could possibly speed-up this operation by
> >> > storing all btrfs metadata on a fast, SSD drive. It seems that
> >> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode,
> >> > and the file contents in single mode. However, I could not find
> >> > a way to tell btrfs to use a device *only* for metadata. Is
> >> > there a way to do that?
> >> > 
> >> > Also, what is the difference between using "dup" and "raid1" for
> >> > the metadata?
> >> 
> >> You may want to try bcache. It will speedup random access which is
> >> probably the main cause for your slow sync. Unfortunately it
> >> requires you to reformat your btrfs partitions to add a bcache
> >> superblock. But it's worth the efforts.
> >> 
> >> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+
> >> hours to typically 1.5-3 depending on how much data changed.
> >
> > An alternative is using dm-cache, I think it doesn´t need to
> > recreate the filesystem.
> 
> Yes, I tried that already but it didn't improve things at all. I
> wrote a message to the lvm list though, so maybe someone will be able
> to help.
> 
> Otherwise I'll give bcache a shot. I've avoided it so far because of
> the need to reformat and because of rumours that it doesn't work well
> with LVM or BTRFS. But it sounds as if that's not the case..

I'm using bcache+btrfs myself and it has been bulletproof so far, even
after unintentional resets or power outages. It's important, though,
NOT to put any storage layer between bcache and your devices or between
btrfs and your device, as there are reports that it becomes unstable
with md or lvm involved. In my setup I can even use discard/trim
without problems. I'd recommend a current kernel, though.

Since it requires reformatting, it's a big PITA, but it's worth the
effort. From its design, it appears to be much more effective and
stable than dm-cache. You could even format a bcache superblock "just
in case" and add an SSD later. Without an SSD, bcache will just work in
passthrough mode. Actually, I started to format all my storage with a
bcache superblock "just in case". It is similar to having another
partition table folded inside, so it doesn't hurt (except that you need
bcache-probe in the initrd to detect the contained filesystems).

-- 
Regards,
Kai

Replies to list-only preferred.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 21:44     ` Nikolaus Rath
  2016-02-08 22:12       ` Duncan
  2016-02-09  7:29       ` Kai Krakow
@ 2016-02-09 13:22       ` Austin S. Hemmelgarn
  2016-02-10  4:08       ` Nikolaus Rath
  3 siblings, 0 replies; 21+ messages in thread
From: Austin S. Hemmelgarn @ 2016-02-09 13:22 UTC (permalink / raw)
  To: Nikolaus Rath, linux-btrfs

On 2016-02-08 16:44, Nikolaus Rath wrote:
> On Feb 07 2016, Martin Steigerwald <martin@lichtvoll.de> wrote:
>> On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
>>> On Sun, 07 Feb 2016 11:06:58 -0800,
>>> Nikolaus Rath <Nikolaus@rath.org> wrote:
>>>> Hello,
>>>>
>>>> I have a large home directory on a spinning disk that I regularly
>>>> synchronize between different computers using unison. That takes ages,
>>>> even though the amount of changed files is typically small. I suspect
>>>> most if the time is spend walking through the file system and checking
>>>> mtimes.
>>>>
>>>> So I was wondering if I could possibly speed-up this operation by
>>>> storing all btrfs metadata on a fast, SSD drive. It seems that
>>>> mkfs.btrfs allows me to put the metadata in raid1 or dup mode, and the
>>>> file contents in single mode. However, I could not find a way to tell
>>>> btrfs to use a device *only* for metadata. Is there a way to do that?
>>>>
>>>> Also, what is the difference between using "dup" and "raid1" for the
>>>> metadata?
>>>
>>> You may want to try bcache. It will speedup random access which is
>>> probably the main cause for your slow sync. Unfortunately it requires
>>> you to reformat your btrfs partitions to add a bcache superblock. But
>>> it's worth the efforts.
>>>
>>> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+ hours
>>> to typically 1.5-3 depending on how much data changed.
>>
>> An alternative is using dm-cache, I think it doesn´t need to recreate the
>> filesystem.
>
> Yes, I tried that already but it didn't improve things at all. I wrote a
> message to the lvm list though, so maybe someone will be able to help.
That's interesting.  I've been using BTRFS on dm-cache for a while, and
have seen measurable improvements in performance.  They are not big
improvements (only about 5% peak), but they're still improvements, which
is somewhat impressive considering that the backing storage being
cached is a RAID0 set that gets almost the same raw throughput as the
SSD caching it.  Of course, I'm using it more for the power savings
(SSDs use less power, and I've got a big enough cache that I can often
spin down the traditional disks in the RAID0 set).  I also re-tune my
system as hardware and workloads change, and my workloads tend to be
atypical (lots of sequential isochronous writes, regular long
sequential reads, and some random reads and rewrites), so YMMV.
>
> Otherwise I'll give bcache a shot. I've avoided it so far because of the
> need to reformat and because of rumours that it doesn't work well with
> LVM or BTRFS. But it sounds as if that's not the case..
It should work fine with _just_ BTRFS, but don't put any other layers
like LVM, dm-crypt, or mdraid into the storage system; bcache still has
some pretty pathological interactions with the device mapper and md
frameworks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09  7:29       ` Kai Krakow
@ 2016-02-09 16:09         ` Nikolaus Rath
  2016-02-09 21:43           ` Kai Krakow
  2016-02-09 16:10         ` Nikolaus Rath
  2016-02-09 18:23         ` Henk Slager
  2 siblings, 1 reply; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-09 16:09 UTC (permalink / raw)
  To: linux-btrfs

On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
> I'm myself using bcache+btrfs and it ran bullet proof so far, even
> after unintentional resets or power outage. It's important tho to NOT
> put any storage layer between bcache and your devices or between btrfs
> and your device as there are reports it becomes unstable with md or lvm
> involved.

Do you mean I should not use anything in the stack other than btrfs and
bcache, or do you mean I should not put anything under bcache?

In other words, I assume bcache on LVM is a bad idea. But what about LVM
on bcache?

Also, btrfs on LVM on disk is working fine for me, but you seem to be
saying that it should not? Or are you talking specifically about btrfs
on LVM on bcache?


If there's no way to put LVM anywhere into the stack that'd be a bummer,
I very much want to use dm-crypt (and I guess that counts as lvm?).

Thanks,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09  7:29       ` Kai Krakow
  2016-02-09 16:09         ` Nikolaus Rath
@ 2016-02-09 16:10         ` Nikolaus Rath
  2016-02-09 21:29           ` Kai Krakow
  2016-02-09 18:23         ` Henk Slager
  2 siblings, 1 reply; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-09 16:10 UTC (permalink / raw)
  To: linux-btrfs

On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
> You could even format a bcache superblock "just in case",
> and add an SSD later. Without SSD, bcache will just work in passthru
> mode.

Do the LVM concerns still apply in passthrough mode, or only when
there's an actual cache?

Thanks,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09  7:29       ` Kai Krakow
  2016-02-09 16:09         ` Nikolaus Rath
  2016-02-09 16:10         ` Nikolaus Rath
@ 2016-02-09 18:23         ` Henk Slager
  2 siblings, 0 replies; 21+ messages in thread
From: Henk Slager @ 2016-02-09 18:23 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Feb 9, 2016 at 8:29 AM, Kai Krakow <hurikhan77@gmail.com> wrote:
> On Mon, 08 Feb 2016 13:44:17 -0800,
> Nikolaus Rath <Nikolaus@rath.org> wrote:
>
>> On Feb 07 2016, Martin Steigerwald <martin@lichtvoll.de> wrote:
>> > On Sunday, 7 February 2016, 21:07:13 CET, Kai Krakow wrote:
>> >> On Sun, 07 Feb 2016 11:06:58 -0800,
>> >> Nikolaus Rath <Nikolaus@rath.org> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a large home directory on a spinning disk that I regularly
>> >> > synchronize between different computers using unison. That takes
>> >> > ages, even though the amount of changed files is typically
>> >> > small. I suspect most if the time is spend walking through the
>> >> > file system and checking mtimes.
>> >> >
>> >> > So I was wondering if I could possibly speed-up this operation by
>> >> > storing all btrfs metadata on a fast, SSD drive. It seems that
>> >> > mkfs.btrfs allows me to put the metadata in raid1 or dup mode,
>> >> > and the file contents in single mode. However, I could not find
>> >> > a way to tell btrfs to use a device *only* for metadata. Is
>> >> > there a way to do that?
>> >> >
>> >> > Also, what is the difference between using "dup" and "raid1" for
>> >> > the metadata?
>> >>
>> >> You may want to try bcache. It will speedup random access which is
>> >> probably the main cause for your slow sync. Unfortunately it
>> >> requires you to reformat your btrfs partitions to add a bcache
>> >> superblock. But it's worth the efforts.
>> >>
>> >> I use a nightly rsync to USB3 disk, and bcache reduced it from 5+
>> >> hours to typically 1.5-3 depending on how much data changed.
>> >
>> > An alternative is using dm-cache, I think it doesn´t need to
>> > recreate the filesystem.
>>
>> Yes, I tried that already but it didn't improve things at all. I
>> wrote a message to the lvm list though, so maybe someone will be able
>> to help.
>>
>> Otherwise I'll give bcache a shot. I've avoided it so far because of
>> the need to reformat and because of rumours that it doesn't work well
>> with LVM or BTRFS. But it sounds as if that's not the case..
>
> I'm myself using bcache+btrfs and it ran bullet proof so far, even
> after unintentional resets or power outage. It's important tho to NOT
> put any storage layer between bcache and your devices or between btrfs
> and your device as there are reports it becomes unstable with md or lvm
> involved. In my setup I can even use discard/trim without problems. I'd
> recommend a current kernel, tho.
>
> Since it requires reformatting, it's a big pita but it's worth the
> efforts. It appeared, from its design, much more effective and stable
> than dmcache. You could even format a bcache superblock "just in case",
> and add an SSD later. Without SSD, bcache will just work in passthru
> mode. Actually, I started to format all my storage with bcache
> superblock "just in case". It is similar to having another partition
> table folded inside - so it doesn't hurt (except you need bcache-probe
> in initrd to detect the contained filesystems).

Same positive bcache+BTRFS experience for me. I have been using it
since kernel 4.1.6 and am now on the latest 4.4. In particular, it is
now possible to use VM images in normal CoW mode with speed/performance
comparable to having the image on SSD. This is with 50G images
consisting of about 50k extents, raid10 btrfs with mount options
noatime,nossd,autodefrag, and writeback on. The initial number of
extents was on the order of 100 or so, but later small writes inside
the VM almost all end up in the bcache. A nightly incremental
send|receive takes just a few minutes. A kernel compile from a local
git repo clone works almost like from SSD.
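
The "writeback on" above refers to bcache's cache mode. As a minimal
sketch (not part of the setup described here), the active mode can be
read from sysfs, where the kernel lists all modes on one line and puts
the active one in square brackets. The device name bcache0 and both
helper names are assumptions made for illustration.

```python
def active_cache_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) entry from a bcache cache_mode line."""
    for word in sysfs_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode marked in {sysfs_text!r}")


def read_cache_mode(device: str = "bcache0") -> str:
    # The device name is an assumption; adjust it to the local setup.
    with open(f"/sys/block/{device}/bcache/cache_mode") as f:
        return active_cache_mode(f.read())
```

On a machine without a bcache device, read_cache_mode() simply fails
with FileNotFoundError; active_cache_mode() can be tried on any string.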

When the RAM cache is invalidated and bcache is detached / stopped /
not there, filesystem finds, or operations that have to deal with
fragmentation or a lot of seeks, clearly take far more time. From
there, after starting and using an OS in a VM for, let's say, 10
minutes for common tasks, the speed is 'SSD like' rather than 'HDD
like' and stays that way (until blocks are evicted, of course).

The 'reformatting' might be avoided by using this:
https://github.com/g2p/blocks

I haven't used it myself, as one fs took up the full hard disk and my
python installations had some issues. I wanted to keep the same UUID
(due to a long-term incremental send|receive cloning setup), so I
shrank the filesystem to almost its smallest possible size, used an
extra device (4TB) to dd_rescue the fs image onto, and then in a 2nd
step dd_rescue'd it back to the original disk (to a partition that is
bcache'd). A btrfs replace would also have been an option, or some
2-step add-remove action or tricks with raid1.

For another disk I did not have a spare disk, so I made a script to do
an 'in-place' filesystem image replace. I have browsed the superblocks
(I don't remember the size, but it's a few kB AFAIK), so a 1G
copyblocksize is more than large enough, and keeping at least 2
copyblocks of readahead stored on intermediate storage worked fine. The
same can be used for adding a LUKS header.
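
A rough sketch of such an in-place shift, for illustration only; this
is not the script mentioned above. It moves the contents of a regular
file `offset` bytes toward the end, reading far enough ahead that no
unread block is ever overwritten. Here the read-ahead is held in memory
rather than on intermediate storage, the function name and parameters
are made up, and a real block device (which cannot grow) would need the
filesystem shrunk first, as described above.

```python
import os
from collections import deque


def shift_forward_in_place(path, offset, block_size=1 << 20):
    """Move the whole content of `path` `offset` bytes toward the end."""
    size = os.path.getsize(path)
    pending = deque()    # blocks already read but not yet written back
    read_pos = 0         # start of the data that has not been read yet
    write_pos = offset   # where the next pending block will be written
    with open(path, "r+b") as f:
        while read_pos < size or pending:
            # Read ahead until writing the oldest pending block can no
            # longer clobber data that has not been read yet.
            while read_pos < size and (
                not pending or read_pos < write_pos + len(pending[0])
            ):
                f.seek(read_pos)
                block = f.read(min(block_size, size - read_pos))
                pending.append(block)
                read_pos += len(block)
            block = pending.popleft()
            f.seek(write_pos)
            f.write(block)
            write_pos += len(block)
    # The first `offset` bytes still hold stale data; the caller is
    # expected to overwrite them with the new header.
```

Try it on a copy of a small test file first; an interrupted run leaves
the image in a half-shifted state.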

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09 16:10         ` Nikolaus Rath
@ 2016-02-09 21:29           ` Kai Krakow
  0 siblings, 0 replies; 21+ messages in thread
From: Kai Krakow @ 2016-02-09 21:29 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 09 Feb 2016 08:10:15 -0800,
Nikolaus Rath <Nikolaus@rath.org> wrote:

> On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
> > You could even format a bcache superblock "just in case",
> > and add an SSD later. Without SSD, bcache will just work in passthru
> > mode.
> 
> Do the LVM concerns still apply in passthrough mode, or only when
> there's an actual cache?

I don't think anyone has ever tried... There's not much logic involved
in passthru mode, I think, but I/O would still pass through the bcache
layer, which is where the bugs may be. It may be worth stress testing
such a setup first, and then doing your backups (which you should do
anyway when using btrfs, so this is more or less a no-op).

There may even be differences depending on whether the backing device
or the caching device is on lvm, and on the order of layering
(bcache+lvm+btrfs, or lvm+bcache+btrfs). You may find more details with
the search engine of your choice. I remember there were some posts
covering exactly this, including some mid-term experience with such a
setup.

Whatever you find, passthru mode is probably the simplest path in terms
of code complexity, so it may not reproduce bugs others found. You may
want to try to reproduce their exact situations, but using only
passthru mode, and see if it works.

I suspect the hardware storage stack may also play its role (SSD
firmware, SATA/RAID chipset, trim support on/off, NCQ support, ...)


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09 16:09         ` Nikolaus Rath
@ 2016-02-09 21:43           ` Kai Krakow
  2016-02-09 22:02             ` Chris Murphy
  2016-02-09 22:38             ` Nikolaus Rath
  0 siblings, 2 replies; 21+ messages in thread
From: Kai Krakow @ 2016-02-09 21:43 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 09 Feb 2016 08:09:20 -0800,
Nikolaus Rath <Nikolaus@rath.org> wrote:

> On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
> > I'm myself using bcache+btrfs and it ran bullet proof so far, even
> > after unintentional resets or power outage. It's important tho to
> > NOT put any storage layer between bcache and your devices or
> > between btrfs and your device as there are reports it becomes
> > unstable with md or lvm involved.
> 
> Do you mean I should not use anything in the stack other than btrfs
> and bcache, or do you mean I should not put anything under bcache?

I never tried; I just use rawdevice+bcache+btrfs, with nothing stacked
below or in between. This works for me.

> In other words, I assume bcache on LVM is a bad idea. But what about
> LVM on bcache?

I think it makes a difference.

> Also, btrfs on LVM on disk is working fine for me, but you seem to be
> saying that it should not? Or are you talking specifically about btrfs
> on LVM on bcache?

Btrfs alone should be no problem. Any combination of all three could
get you in trouble. I suggest doing your own tests and keeping it as
simple as possible.

> If there's no way to put LVM anywhere into the stack that'd be a
> bummer, I very much want to use dm-crypt (and I guess that counts as
> lvm?).

Weren't there plans for integrating per-file encryption into btrfs
(like there already is for ext4)? I think this could pretty well
obsolete your plans, unless you prefer full-device encryption.

If you don't put encryption below the bcache caching device, everything
going to the cache won't be encrypted, so that's probably what you will
have to do anyway.

But I don't know how such a setup recovers from a power outage. I'm not
familiar with dm-crypt at all, or with how it integrates with, say, the
initrd.

But to get the bigger picture, let me explain how bcache works:

The caching device is always treated as dirty. That means it replays
all dirty data automatically during device discovery. Backing and
caching devices form a unified pair; that's why the superblock is
needed. It saves you from accidentally using the backing device without
the cache. So even after an unclean shutdown, the pair is always
consistent from the user-space view. Bcache will only remove persisted
data from its log once it has ensured that the data was written
correctly to the backing device. The backing device on its own,
however, is not guaranteed to be consistent at any time, unless you
cleanly stop bcache and disconnect the pair (detach the cache).
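
The behaviour described above can be sketched with a toy model. This is
only an illustration of the idea (dirty data lives in the cache until
it has been copied to the backing device); it is not how bcache is
implemented, and the class and method names are invented.

```python
class WritebackPair:
    """Toy model of a backing device plus a writeback cache."""

    def __init__(self):
        self.backing = {}   # block number -> data on the slow device
        self.dirty = {}     # block number -> data held only in the cache

    def write(self, block, data):
        # Writeback: the write is acknowledged once it is in the cache.
        self.dirty[block] = data

    def read(self, block):
        # The pair answers from the cache first, so it is always current.
        if block in self.dirty:
            return self.dirty[block]
        return self.backing.get(block)

    def flush_one(self, block):
        # Copy to the backing device first, drop the dirty entry after.
        self.backing[block] = self.dirty[block]
        del self.dirty[block]

    def detach(self):
        # A clean detach flushes everything that is still dirty.
        for block in list(self.dirty):
            self.flush_one(block)


pair = WritebackPair()
pair.write(0, "a")
pair.write(1, "b")
pair.flush_one(0)
# The pair sees both writes; the backing device alone has only block 0.
print(pair.read(1), pair.backing)
```

Only after pair.detach() does the backing dictionary on its own hold
every block, which mirrors the "cleanly stop and detach" case.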

When dm-crypt comes in, I'm not sure how this is handled, given that
the encryption key must be loaded from somewhere... Someone else may
have a better clue here.

So actually there are two questions:

1. Which order of stacking makes more sense and is more resilient to
errors?

2. Which order of stacking is exhibiting bugs?


-- 
Regards,
Kai

Replies to list-only preferred.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09 21:43           ` Kai Krakow
@ 2016-02-09 22:02             ` Chris Murphy
  2016-02-09 22:38             ` Nikolaus Rath
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Murphy @ 2016-02-09 22:02 UTC (permalink / raw)
  To: Kai Krakow; +Cc: Btrfs BTRFS

On Tue, Feb 9, 2016 at 2:43 PM, Kai Krakow <hurikhan77@gmail.com> wrote:

> Wasn't there plans for integrating per-file encryption into btrfs (like
> there's already for ext4)? I think this could pretty well obsolete your
> plans - except you prefer full-device encryption.

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Encryption

I don't know whether the ZFS strategy (it would be per subvolume on
Btrfs) or the per directory strategy of ext4 is simpler. The simpler
it is, the more viable it is, I feel.

Maybe it's too much of a Tonka toy to only encrypt file data and not
metadata (a question for someone more security conscious), but I'd
rather have some level of integrated encryption than none. So I
wonder if encryption could be a compression option - that is, it'd fit
into the compression code path and instead of compressing, it'd
encrypt. I guess the bigger problem then is user space tools to manage
keys. For booting, there'd need to be a libbtrfs api or ioctl for
systemd+plymouth to get the passphrase from the user. And for home, it
actually can't be in the startup process at all, it has to be
integrated into the desktop, using the user login passphrase to unlock
a KEK, and from there the DEK. The whole point of per directory
encryption is, a bunch of stuff remains encrypted.

If it were treated as a variation on compression, specifically a
variant of forced compression, it would mean that no key is needed to
do balance, scrub, device replace, etc., and even inline data would get
encrypted. It's an open question whether the metadata slot for
compression is big enough to include something like a key uuid, because
each dir item (at least) needs to point to the key needed to decrypt
the data. Hmm, or maybe a new tree could contain and track the
encryption keys meant for each dir item.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09 21:43           ` Kai Krakow
  2016-02-09 22:02             ` Chris Murphy
@ 2016-02-09 22:38             ` Nikolaus Rath
  2016-02-10  1:12               ` Henk Slager
  1 sibling, 1 reply; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-09 22:38 UTC (permalink / raw)
  To: linux-btrfs

On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
>> If there's no way to put LVM anywhere into the stack that'd be a
>> bummer, I very much want to use dm-crypt (and I guess that counts as
>> lvm?).
>
> Wasn't there plans for integrating per-file encryption into btrfs (like
> there's already for ext4)? I think this could pretty well obsolete your
> plans - except you prefer full-device encryption.

Well, it could obsolete it once the plan turns into an implementation,
but not today :-).

> If you don't put encryption below the bcache caching device, everything
> going to the cache won't be encrypted - so that's probably what you are
> having to do anyways.

No, I could put separate encryption layers between bcache and the
disk - for both the backing and the caching device.

> But I don't know how such a setup recovers from power outage, I'm not
> familiar with dm-crypt at all, how it integrates with maybe initrd
> etc.

Initrd is not a concern. You can put on it whatever is needed to set up
the stack.

As far as power outages are concerned, I think dm-crypt doesn't change
anything - it's an intermediate layer with no caching. Any write gets
passed through synchronously.

> The caching device is treated dirty always. That means, it replays all
> dirty data automatically during device discovery. Backing and caching
> create a unified pair - that's why the superblock is needed. It saves
> you from accidently using the backing without the cache. So even after
> unclean shutdown, from the user-space view, the pair is always
> consistent. Bcache will only remove persisted data from its log if it
> ensured it was written correctly to the backing. The backing on its
> own, however, is not guaranteed to be consistent at any time - except
> you cleanly stop bcache and disconnect the pair (detach the cache).
>
> When dm-crypt comes in, I'm not sure how this is handled - given that
> the encryption key must be loaded from somewhere... Someone else may
> have a better clue here.

The encryption keys are supplied by userspace when setting up the
device. 


> So actually there's two questions:
>
> 1. Which order of stacking makes more sense and is more resilient to
> errors?

I think in an ideal world (i.e., no software bugs), inserting dm-crypt
anywhere in the stack will not make a difference at all even when there
is a crash. Thus...

> 2. Which order of stacking is exhibiting bugs?

..indeed becomes the important question. Now if only someone had an
answer :-).


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-09 22:38             ` Nikolaus Rath
@ 2016-02-10  1:12               ` Henk Slager
  0 siblings, 0 replies; 21+ messages in thread
From: Henk Slager @ 2016-02-10  1:12 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Feb 9, 2016 at 11:38 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> On Feb 09 2016, Kai Krakow <hurikhan77@gmail.com> wrote:
>>> If there's no way to put LVM anywhere into the stack that'd be a
>>> bummer, I very much want to use dm-crypt (and I guess that counts as
>>> lvm?).
>>
>> Wasn't there plans for integrating per-file encryption into btrfs (like
>> there's already for ext4)? I think this could pretty well obsolete your
>> plans - except you prefer full-device encryption.
>
> Well, it could obsolete it once the plan turns into an implementation,
> but not today :-).
>
>> If you don't put encryption below the bcache caching device, everything
>> going to the cache won't be encrypted - so that's probably what you are
>> having to do anyways.
>
> No, I could use put separate encryption layers between bcache and the
> disk - for both the backing and the caching device.
>
>> But I don't know how such a setup recovers from power outage, I'm not
>> familiar with dm-crypt at all, how it integrates with maybe initrd
>> etc.
>
> Initrd is not a concern. You can put on it whatever is needed to set up
> the stack.
>
> As far as power outages is concerned, I think dm-crypt doesn't change
> anything - it's an intermediate layer with no caching. Any write gets
> passed through synchronously.
>
>> The caching device is treated dirty always. That means, it replays all
>> dirty data automatically during device discovery. Backing and caching
>> create a unified pair - that's why the superblock is needed. It saves
>> you from accidently using the backing without the cache. So even after
>> unclean shutdown, from the user-space view, the pair is always
>> consistent. Bcache will only remove persisted data from its log if it
>> ensured it was written correctly to the backing. The backing on its
>> own, however, is not guaranteed to be consistent at any time - except
>> you cleanly stop bcache and disconnect the pair (detach the cache).
>>
>> When dm-crypt comes in, I'm not sure how this is handled - given that
>> the encryption key must be loaded from somewhere... Someone else may
>> have a better clue here.
>
> The encryption keys are supplied by userspace when setting up the
> device.
>
>
>> So actually there's two questions:
>>
>> 1. Which order of stacking makes more sense and is more resilient to
>> errors?
>
> I think in an ideal world (i.e, no software bugs), inserting dm-crypt
> anywhere in the stack will not make a difference at all even when there
> is a crash. Thus...

To me, putting dm-crypt between bcache and btrfs made the most sense,
and I can say that it works fine. Actually, I have been using the
following since kernel 4.4.0-rc4 came out:
rawdevice + bcache + iscsi + dm-crypt + btrfs

This way, the IP link transports encrypted data (although it is only a
short local Ethernet cable + switch). It works fine: scrubs still
finish with 0 errors and the last btrfs check did not report any
errors. (It also works well with AoE, with top performance after I set
the MTUs to 9000.)

>> 2. Which order of stacking is exhibiting bugs?
>
> ..indeed becomes the important question. Now if only someone had an
> answer :-).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Use fast device only for metadata?
  2016-02-08 21:44     ` Nikolaus Rath
                         ` (2 preceding siblings ...)
  2016-02-09 13:22       ` Austin S. Hemmelgarn
@ 2016-02-10  4:08       ` Nikolaus Rath
  3 siblings, 0 replies; 21+ messages in thread
From: Nikolaus Rath @ 2016-02-10  4:08 UTC (permalink / raw)
  To: linux-btrfs

On Feb 08 2016, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Otherwise I'll give bcache a shot. I've avoided it so far because of the
> need to reformat and because of rumours that it doesn't work well with
> LVM or BTRFS. But it sounds as if that's not the case..

I now have the following stack:

btrfs on LUKS on LVM on bcache

The VG contains two bcache PVs with backing devices on different
spinning disks, and a shared cache device on SSD. I'm using Kernel 4.3.

I'm super happy with the performance: boot times decreased from 1:30
minutes to X11 and 2:00 to Firefox to roughly 0:10 to X11 and 0:30 to
Firefox.


Time will tell if it also keeps my data intact, but I hope btrfs would
at least detect any corruption. 


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-02-10  4:08 UTC | newest]

Thread overview: 21+ messages
2016-02-07 19:06 Use fast device only for metadata? Nikolaus Rath
2016-02-07 20:07 ` Kai Krakow
2016-02-07 20:59   ` Martin Steigerwald
2016-02-08  1:04     ` Duncan
2016-02-08 12:24     ` Austin S. Hemmelgarn
2016-02-08 13:20       ` Qu Wenruo
2016-02-08 13:29         ` Austin S. Hemmelgarn
2016-02-08 14:23           ` Qu Wenruo
2016-02-08 21:44     ` Nikolaus Rath
2016-02-08 22:12       ` Duncan
2016-02-09  7:29       ` Kai Krakow
2016-02-09 16:09         ` Nikolaus Rath
2016-02-09 21:43           ` Kai Krakow
2016-02-09 22:02             ` Chris Murphy
2016-02-09 22:38             ` Nikolaus Rath
2016-02-10  1:12               ` Henk Slager
2016-02-09 16:10         ` Nikolaus Rath
2016-02-09 21:29           ` Kai Krakow
2016-02-09 18:23         ` Henk Slager
2016-02-09 13:22       ` Austin S. Hemmelgarn
2016-02-10  4:08       ` Nikolaus Rath
