* mix ssd and hdd in single volume
@ 2017-04-01  6:06 UGlee
  2017-04-02  0:13 ` Duncan
  2017-04-03 12:23 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 6+ messages in thread
From: UGlee @ 2017-04-01  6:06 UTC (permalink / raw)
  To: linux-btrfs

We are working on a small NAS server for home users. The product is
equipped with a small, fast SSD (around 60-120 GB) and a large HDD (2 TB
to 4 TB).

We have two choices:

1. using bcache to accelerate I/O operations
2. combining the SSD and HDD into a single btrfs volume.

Bcache is certainly designed for our purpose. But bcache requires
complex configuration and can only start from a clean disk. Also, in our
tests on Ubuntu 16.04, data inconsistency was observed at least once,
resulting in total loss of the data on the HDD.

So we wonder whether simply putting the SSD and HDD into a single btrfs
volume, in whatever mode, would also make general read operations (mostly
readdir and getxattr) significantly faster than with a single HDD and no
SSD.


* Re: mix ssd and hdd in single volume
  2017-04-01  6:06 mix ssd and hdd in single volume UGlee
@ 2017-04-02  0:13 ` Duncan
  2017-04-03  8:30   ` Marat Khalili
  2017-04-03 12:23 ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2017-04-02  0:13 UTC (permalink / raw)
  To: linux-btrfs

UGlee posted on Sat, 01 Apr 2017 14:06:11 +0800 as excerpted:

> We are working on a small NAS server for home user. The product is
> equipped with a small fast SSD (around 60-120GB) and a large HDD (2T to
> 4T).
> 
> We have two choices:
> 
> 1. using bcache to accelerate io operation
> 2. combining SSD and HDD into a single btrfs volume.
> 
> Bcache is certainly designed for our purpose. But bcache requires
> complex configuration and can only start from clean disk. Also in our
> test in Ubuntu 16.04, data inconsistence was observed at least once,
> resulting total HDD data lost.
> 
> So we wonder if simply putting SSD and HDD into a single btrfs volume,
> in whatever mode, the general read operation (mostly readdir and
> getxattr) will also be significantly faster than a single HDD without
> SSD.

At present, bcache, or possibly the lvmcache alternative, is the only
recommended way of putting a single btrfs on a mixed-size ssd/hdd pair.

The problem is that, while the idea has been considered, there's presently
no way to tell btrfs to use the smaller ssd for hotter content.  The
btrfs chunk allocator simply doesn't have that option at present.

Which would leave you with the choice of single, raid1 or raid0 modes.
Raid1 requires two copies on separate devices, which would mean the extra
space on the larger hdd would be wasted/unusable, and the read-mode
mirror choice algorithm is purely even/odd PID-based, so on single reads
you'd have a 50% chance of fast ssd reads and a 50% chance of slow hdd
reads.  With single mode the allocator allocates to the device with the
most space available first, so until the free space equalized between the
two, all chunks would end up on the larger/slower hdd.  And raid0 would
allocate evenly (allocate-wide policy) to both, again wasting the extra
space on the larger device while only giving you overall about the same
speed as two hdds would give you, tho at times, less predictably, you'd
get the full speed of the ssd.
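
To make the three cases above concrete, here is a rough Python sketch.  It
is a toy model, not btrfs code: the function names, the 1 GiB chunk size
and the mapping of even PIDs to the ssd are all assumptions made for
illustration.  It models only the behaviour described in this post: raid1
limited by the smaller device, raid0 striping evenly across both, single
mode filling the device with the most free space first.

```python
# Toy model of the allocation behaviour described in the post; NOT btrfs code.
# Sizes are in GiB and every chunk is assumed to be exactly 1 GiB.

def usable_two_devices(profile, ssd, hdd):
    """Approximate usable data space for two devices of different size."""
    small, large = min(ssd, hdd), max(ssd, hdd)
    if profile == "single":
        return small + large   # all space on both devices can be used
    if profile == "raid1":
        return small           # every chunk needs a copy on each device
    if profile == "raid0":
        return 2 * small       # stripes are spread evenly over both devices
    raise ValueError(f"unknown profile: {profile}")


def simulate_single(ssd, hdd, chunks):
    """Place chunks one by one on the device with the most free space."""
    free = {"ssd": ssd, "hdd": hdd}
    placed = {"ssd": 0, "hdd": 0}
    for _ in range(chunks):
        target = max(free, key=free.get)
        if free[target] < 1:
            break              # both devices are full
        free[target] -= 1
        placed[target] += 1
    return placed


def raid1_mirror_for(pid):
    """Even/odd PID picks the mirror (which parity maps to which device
    is an assumption here)."""
    return ("ssd", "hdd")[pid % 2]


if __name__ == "__main__":
    for profile in ("single", "raid1", "raid0"):
        print(profile, usable_two_devices(profile, 120, 2000), "GiB usable")
    print(simulate_single(120, 2000, 1000))
    hits = sum(raid1_mirror_for(pid) == "ssd" for pid in range(100))
    print(hits, "of 100 consecutive PIDs would read from the ssd")
```

With a 120 GiB ssd and a 2000 GiB hdd, the single-mode simulation puts the
first 1000 chunks entirely on the hdd; the ssd only starts receiving chunks
once the hdd's free space has dropped to the ssd's level.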

The default two-device setup, FWIW, is raid1-mode metadata for safety, 
single-mode data.  

As you can see, none of those are ideal from a fast-small-ssd as cache to 
a large-slow-hdd perspective, thus the recommendation of bcache or 
lvmcache if that's what you want/need.

The other alternative, of course, is separate filesystems, using a 
combination of symlinks, partitioning and bind-mounts, to arrange for 
frequently accessed and performance-critical stuff such as root and /home 
to be on the smaller/faster ssd, while the larger/slower hdd is used for 
stuff like a user's multimedia partition/filesystem.  That's actually 
what I've done here and I'm *very* happy with the result, but it's the 
type of solution that must either be customized per-installation, or
perhaps be set up by a special-purpose distro installer designed with that
sort of use-case in mind.  It's /not/ the sort of thing you can do in a
NAS product, because you can't expect mass-market users to actually read
and understand the docs in order to use the product in an optimal way.


Meanwhile, since you appear to be designing a mass-market product, it's 
worth mentioning that btrfs is considered, certainly by its devs and 
users on this list, to be "still stabilizing, not fully stable and 
mature."  As such, making and having backups at the ready for any data 
considered to be more valuable than the time and resources necessary to 
make those backups is strongly recommended, even more so than when the 
filesystem is considered stable and mature (tho certainly the rule 
applies even then, but try telling that to a mass-market user...).

Additionally, since btrfs /is/ still stabilizing, we recommend that users
run relatively new kernels.  We best support the latest two kernels in
either the current kernel series (thus 4.10 and 4.9 at present) or the
mainline LTS series (thus 4.9 and 4.4 at present), and further recommend
that users at least loosely follow the list in order to keep up with
current btrfs developments and possible issues they may confront.

That doesn't sound like a particularly good choice for a mass-market NAS 
product to me.  Of course there's rockstor and others out there already 
shipping such products, but they're risking their reputation and the 
safety of their customers' data in the process, altho there are certainly a
few customers out there with the time, desire and technical know-how to
keep the recommended backups and follow current kernels and the list, and
that's exactly the sort of people you'll find already here.  But that's 
not sufficiently mass-market to appeal to most vendors.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: mix ssd and hdd in single volume
  2017-04-02  0:13 ` Duncan
@ 2017-04-03  8:30   ` Marat Khalili
  2017-04-03  8:41     ` Roman Mamedov
  0 siblings, 1 reply; 6+ messages in thread
From: Marat Khalili @ 2017-04-03  8:30 UTC (permalink / raw)
  To: linux-btrfs

On 02/04/17 03:13, Duncan wrote:
> Meanwhile, since you appear to be designing a mass-market product, it's
> worth mentioning that btrfs is considered, certainly by its devs and
> users on this list, to be "still stabilizing, not fully stable and
> mature." [...] That doesn't sound like a particularly good choice for a mass-market NAS
> product to me.  Of course there's rockstor and others out there already
> shipping such products, but they're risking their reputation and the
> safety of their customer's data in the process, altho there's certainly a
> few customers out there with the time, desire and technical know-how to
> ensure the recommended backups and following current kernel and list, and
> that's exactly the sort of people you'll find already here.  But that's
> not sufficiently mass-market to appeal to most vendors.

You may want to look here: https://www.synology.com/en-global/dsm/Btrfs .
Somebody forgot to tell Synology, which already supports btrfs on all
hardware-capable devices. I think the Rubicon has been crossed for
'mass-market NAS[es]', for better or worse.

--

With Best Regards,
Marat Khalili



* Re: mix ssd and hdd in single volume
  2017-04-03  8:30   ` Marat Khalili
@ 2017-04-03  8:41     ` Roman Mamedov
  2017-04-07  3:12       ` Duncan
  0 siblings, 1 reply; 6+ messages in thread
From: Roman Mamedov @ 2017-04-03  8:41 UTC (permalink / raw)
  To: Marat Khalili; +Cc: linux-btrfs

On Mon, 3 Apr 2017 11:30:44 +0300
Marat Khalili <mkh@rqc.ru> wrote:

> You may want to look here: https://www.synology.com/en-global/dsm/Btrfs 
> . Somebody forgot to tell Synology, which already supports btrfs in all 
> hardware-capable devices. I think Rubicon has been crossed in 
> 'mass-market NAS[es]', for good or not.

AFAIR Synology did not come to this list asking for (any kind of) advice
prior to implementing that (else they would have gotten the same kind of post
from Duncan and others), and it's not the Btrfs developers' job to run an
outreach program to contact vendors and educate them not to use Btrfs.

I don't remember seeing them actively contribute improvements or fixes,
especially for the RAID5 or RAID6 features (which they ADVERTISE on that page
as a fully working part of their product). That seems neither honest toward
end users nor like playing nicely with the upstream developers. What the
upstream gets instead is just those end users coming here one by one some
years later, asking how to fix a broken Btrfs RAID5 on an embedded box
running some 3.10 or 3.14 kernel.

-- 
With respect,
Roman


* Re: mix ssd and hdd in single volume
  2017-04-01  6:06 mix ssd and hdd in single volume UGlee
  2017-04-02  0:13 ` Duncan
@ 2017-04-03 12:23 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 6+ messages in thread
From: Austin S. Hemmelgarn @ 2017-04-03 12:23 UTC (permalink / raw)
  To: matianfu, linux-btrfs

On 2017-04-01 02:06, UGlee wrote:
> We are working on a small NAS server for home user. The product is
> equipped with a small fast SSD (around 60-120GB) and a large HDD (2T
> to 4T).
>
> We have two choices:
>
> 1. using bcache to accelerate io operation
> 2. combining SSD and HDD into a single btrfs volume.
>
> Bcache is certainly designed for our purpose. But bcache requires
> complex configuration and can only start from clean disk. Also in our
> test in Ubuntu 16.04, data inconsistence was observed at least once,
> resulting total HDD data lost.
>
> So we wonder if simply putting SSD and HDD into a single btrfs volume,
> in whatever mode, the general read operation (mostly readdir and
> getxattr) will also be significantly faster than a single HDD without
> SSD.

Have you tried dm-cache?  The general idea is similar to bcache, but 
it's been much more reliable in my experience, and it's possible to add 
it to an existing device without any need for reprovisioning  (although 
the existing device can't have any pending writes, otherwise you might 
get some data corruption).

Additionally, given what you've said, write-through mode should cover 
what you need in terms of performance, and may be more reliable on 
bcache than writeback mode.



* Re: mix ssd and hdd in single volume
  2017-04-03  8:41     ` Roman Mamedov
@ 2017-04-07  3:12       ` Duncan
  0 siblings, 0 replies; 6+ messages in thread
From: Duncan @ 2017-04-07  3:12 UTC (permalink / raw)
  To: linux-btrfs

Roman Mamedov posted on Mon, 03 Apr 2017 13:41:07 +0500 as excerpted:

> On Mon, 3 Apr 2017 11:30:44 +0300 Marat Khalili <mkh@rqc.ru> wrote:
> 
>> You may want to look here: https://www.synology.com/en-global/dsm/Btrfs
>> . Somebody forgot to tell Synology, which already supports btrfs in all
>> hardware-capable devices. I think Rubicon has been crossed in
>> 'mass-market NAS[es]', for good or not.
> 
> AFAIR Synology did not come to this list asking for (any kind of) advice
> prior to implementing that (else they would have gotten the same kind of
> post from Duncan and others)[.]  I don't remember seeing them actively
> contribute improvements or fixes especially for the RAID5 or RAID6
> features (which they ADVERTISE on that page as a fully working part of
> their product).

> That doesn't seem honest to end users or playing nicely with the
> upstream developers. What the upstream gets instead is just those
> end-users coming here one by one some years later, asking how to fix
> a broken Btrfs RAID5 on an embedded box running some 3.10 or 3.14
> kernel.

And of course then the user gets the real state of btrfs and of btrfs 
raid56 mode, particularly back that far, explained to them.  Along with 
that we'll explain that any data on it is in all likelihood lost data, 
with little to no chance at recovery.

And we'll point out that if there was serious value in the data, they 
would have investigated the state of the filesystem before they put that 
data on it, and of course, as I've already said, they'd have had backups 
for anything that was of more value than the time/resources/hassle of 
doing those backups.

And if they're lucky, that NAS will have /been/ the backup, and they'll 
still have the actual working copy at least, and can make another backup 
ASAP just in case that working copy dies too.

But if they're unlucky...

Of course the user will then blame the manufacturer, but by that time the 
warranty will be up, and even if not, while they /might/ get their money 
back, that won't get their data back.

And the manufacturer will get a bad name, but by then, having taken the
money and run, they'll be on to something else or perhaps be bought out by
someone bigger or be out of business.

And all the user will be able to do is chalk it up to experience, and 
mourn the loss of their kids' baby pictures/videos or their wedding 
videos, or whatever.  If they're /really/ lucky, they'll have put them on 
facebook or youtube or whatever, and can retrieve at least those, from 
there.

Meanwhile, the user, having been once burned, may never use the by then 
much improved btrfs, or even worse, never trust anything Linux, again.

Oh, well.  The best we can do here is warn those that /do/ value their
data enough to do their research first, so they /do/ have those backups
or at least use something a bit more mature than btrfs raid56 mode.  And
of course, continue to work on full btrfs stabilization.  I like to
think we're reasonably good at those warnings, anyway.  The
stabilization, too, but that takes time and patience, plus coder skill,
the last of which I personally don't have, so I just pitch in where
I can, answering questions, etc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

