From: Goffredo Baroncelli <kreijack@libero.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org, Michael <mclaud@roznica.com.ua>,
Hugo Mills <hugo@carfax.org.uk>,
Martin Svec <martin.svec@zoner.cz>,
Wang Yugui <wangyugui@e16-tech.com>,
Paul Jones <paul@pauljones.id.au>,
Adam Borowski <kilobyte@angband.pl>
Subject: BTRFS and *CACHE setup [was Re: [RFC][PATCH V4] btrfs: preferred_metadata: preferred device for metadata]
Date: Fri, 8 Jan 2021 18:43:52 +0100 [thread overview]
Message-ID: <0dbec46b-8f46-afca-c61a-51b85300b0f2@libero.it> (raw)
In-Reply-To: <bc7d874f-3f8b-7eff-6d18-f9613e7c6972@libero.it>
On 1/8/21 6:30 PM, Goffredo Baroncelli wrote:
> On 1/8/21 2:05 AM, Zygo Blaxell wrote:
>> On Thu, May 28, 2020 at 08:34:47PM +0200, Goffredo Baroncelli wrote:
>>>
> [...]
>>
>> I've been testing these patches for a while now. They enable an
>> interesting use case that can't otherwise be done safely, sanely or
>> cheaply with btrfs.
>
> Thanks Zygo for this feedback. As usual you are a source of very interesting considerations.
>>
>> Normally if we have an array of, say, 10 spinning disks, and we want to
>> implement a writeback cache layer with SSD, we would need 10 distinct SSD
>> devices to avoid reducing btrfs's ability to recover from drive failures.
>> The writeback cache will be modified on both reads and writes, data and
>> metadata, so we need high endurance SSDs if we want them to make it to
>> the end of their warranty. The SSD firmware has to not have crippling
>> performance bugs while under heavy write load, which means we are now
>> restricted to an expensive subset of high endurance SSDs targeted at
>> the enterprise/NAS/video production markets...and we need 10 of them!
>>
>> NVME has fairly draconian restrictions on drive count, and getting
>> anything close to 10 of them into a btrfs filesystem can be an expensive
>> challenge. (I'm not counting solutions that use USB-to-NVME bridges
>> because those don't count as "sane" or "safe").
>>
>> We can share the cache between disks, but not safely in writeback mode,
>> because a failure in one SSD could affect multiple logical btrfs disks.
>> Strictly speaking we can't do it safely in any cache mode, but at least
>> with a writethrough cache we can recover the btrfs by throwing the SSDs
>> away.
[...]
Hi Zygo,
could you elaborate on the last sentence? What I understood is that in
writethrough mode the ordering (and the barriers) are preserved, so
this mode should be safe (bugs aside).
If this is true, it would be possible to have a btrfs setup with
multiple spinning disks and only one SSD acting as a cache. Of course,
it would work only in writethrough mode, and the main benefit would be
caching data for subsequent reads.
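For what it's worth, such a single-shared-SSD writethrough layout could be sketched with bcache roughly as follows. This is only an illustration of the idea, not a recommendation from this thread; the device names and the cache-set UUID are placeholders you would substitute for your own system:

```shell
# Hypothetical sketch: one SSD (/dev/nvme0n1) caching two spinning
# disks (/dev/sda, /dev/sdb) in writethrough mode via bcache.
# Device names below are assumptions; adjust for your hardware.

# Format the backing devices and the shared cache device.
make-bcache -B /dev/sda /dev/sdb
make-bcache -C /dev/nvme0n1

# Attach each backing device to the cache set, using the cache-set
# UUID printed by 'make-bcache -C' (shown as a placeholder here).
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
echo <cset-uuid> > /sys/block/bcache1/bcache/attach

# Writethrough is the bcache default, but state it explicitly:
echo writethrough > /sys/block/bcache0/bcache/cache_mode
echo writethrough > /sys/block/bcache1/bcache/cache_mode

# btrfs raid1 across the cached devices: redundancy lives on the
# spinning disks, while the single SSD only accelerates reads.
mkfs.btrfs -d raid1 -m raid1 /dev/bcache0 /dev/bcache1
```

In writethrough mode every write is committed to the backing disk before completion is reported, so the SSD holds no data that exists nowhere else; only reads are accelerated.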
Does anyone have further experience with this? Has anyone tried to
recover a BTRFS filesystem after the cache disk died?
Oh... wait... Now I understand: if the cache disk returns bad data on
a read (without reporting an error), that bad data may be written back
to the other disks. In this case a single failure (of the cache disk)
can affect all the other disks, and the redundancy is lost...
BR
G.Baroncelli
Thread overview: 18+ messages
2020-05-28 18:34 [RFC][PATCH V4] btrfs: preferred_metadata: preferred device for metadata Goffredo Baroncelli
2020-05-28 18:34 ` [PATCH 1/4] Add an ioctl to set/retrive the device properties Goffredo Baroncelli
2020-05-28 22:03 ` Hans van Kranenburg
2020-05-28 18:34 ` [PATCH 2/4] Add flags for dedicated metadata disks Goffredo Baroncelli
2020-05-28 18:34 ` [PATCH 3/4] Export dev_item.type in sysfs /sys/fs/btrfs/<uuid>/devinfo/<devid>/type Goffredo Baroncelli
2020-05-28 18:34 ` [PATCH 4/4] btrfs: add preferred_metadata mode Goffredo Baroncelli
2020-05-28 22:02 ` Hans van Kranenburg
2020-05-29 16:26 ` Goffredo Baroncelli
2020-05-28 21:59 ` [RFC][PATCH V4] btrfs: preferred_metadata: preferred device for metadata Hans van Kranenburg
2020-05-29 16:37 ` Goffredo Baroncelli
2020-05-30 11:44 ` Zygo Blaxell
2020-05-30 11:51 ` Goffredo Baroncelli
2021-01-08 1:05 ` Zygo Blaxell
2021-01-08 17:30 ` Goffredo Baroncelli
2021-01-08 17:43 ` Goffredo Baroncelli [this message]
2021-01-09 21:23 ` Zygo Blaxell
2021-01-10 19:55 ` Goffredo Baroncelli
2021-01-16 0:25 ` Zygo Blaxell