From: Goffredo Baroncelli <kreijack@libero.it>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org, Michael <mclaud@roznica.com.ua>,
	Hugo Mills <hugo@carfax.org.uk>,
	Martin Svec <martin.svec@zoner.cz>,
	Wang Yugui <wangyugui@e16-tech.com>,
	Paul Jones <paul@pauljones.id.au>,
	Adam Borowski <kilobyte@angband.pl>
Subject: BTRFS and *CACHE setup [was Re: [RFC][PATCH V4] btrfs: preferred_metadata: preferred device for metadata]
Date: Fri, 8 Jan 2021 18:43:52 +0100
Message-ID: <0dbec46b-8f46-afca-c61a-51b85300b0f2@libero.it>
In-Reply-To: <bc7d874f-3f8b-7eff-6d18-f9613e7c6972@libero.it>

On 1/8/21 6:30 PM, Goffredo Baroncelli wrote:
> On 1/8/21 2:05 AM, Zygo Blaxell wrote:
>> On Thu, May 28, 2020 at 08:34:47PM +0200, Goffredo Baroncelli wrote:
>>>
> [...]
>>
>> I've been testing these patches for a while now.  They enable an
>> interesting use case that can't otherwise be done safely, sanely or
>> cheaply with btrfs.
> 
> Thanks Zygo for this feedback. As usual, you are a source of very interesting considerations.
>>
>> Normally if we have an array of, say, 10 spinning disks, and we want to
>> implement a writeback cache layer with SSD, we would need 10 distinct SSD
>> devices to avoid reducing btrfs's ability to recover from drive failures.
>> The writeback cache will be modified on both reads and writes, data and
>> metadata, so we need high endurance SSDs if we want them to make it to
>> the end of their warranty.  The SSD firmware has to not have crippling
>> performance bugs while under heavy write load, which means we are now
>> restricted to an expensive subset of high endurance SSDs targeted at
>> the enterprise/NAS/video production markets...and we need 10 of them!
>>
>> NVME has fairly draconian restrictions on drive count, and getting
>> anything close to 10 of them into a btrfs filesystem can be an expensive
>> challenge.  (I'm not counting solutions that use USB-to-NVME bridges
>> because those don't count as "sane" or "safe").
>>
>> We can share the cache between disks, but not safely in writeback mode,
>> because a failure in one SSD could affect multiple logical btrfs disks.
>> Strictly speaking we can't do it safely in any cache mode, but at least
>> with a writethrough cache we can recover the btrfs by throwing the SSDs
>> away.
[...]

Hi Zygo,

could you elaborate on the last sentence? What I understood is that in
writethrough mode the write ordering (and the barriers) is preserved,
so this mode should be safe (bugs aside).
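For example, with bcache the mode can be forced (and checked) via
sysfs. This is only a sketch, I have not tested it, and the device
name is hypothetical:

  # force writethrough mode on an existing bcache device
  echo writethrough > /sys/block/bcache0/bcache/cache_mode
  # reading it back shows the active mode in square brackets
  cat /sys/block/bcache0/bcache/cache_mode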

If this is true, it would be possible to have a btrfs setup with
multiple (spinning) disks and only one SSD acting as cache. Of course,
it would work safely only in writethrough mode, and the main benefit
would be caching the data for subsequent reads.
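
To make the idea concrete, a minimal sketch with bcache, assuming four
spinning disks /dev/sd[a-d] and one shared SSD /dev/nvme0n1 (all names
hypothetical, and I have not tried this myself):

  # one cache set on the SSD, shared by all the backing devices
  make-bcache -C /dev/nvme0n1
  # register every spinning disk as a backing device
  make-bcache -B /dev/sda /dev/sdb /dev/sdc /dev/sdd
  # attach each backing device to the cache set and force writethrough
  # (<cset-uuid> is a placeholder for the Set UUID printed by make-bcache -C)
  for dev in /sys/block/bcache[0-3]/bcache; do
      echo <cset-uuid> > $dev/attach
      echo writethrough > $dev/cache_mode
  done
  # finally, btrfs with redundant profiles on top of the cached devices
  mkfs.btrfs -d raid1 -m raid1 /dev/bcache0 /dev/bcache1 /dev/bcache2 /dev/bcache3

The point is that, if writethrough really preserves ordering, losing
the single SSD should leave each /dev/bcacheN consistent with its
backing disk, so the btrfs redundancy would not be reduced by sharing
one cache.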

Does anyone have further experience with this? Has anyone tried to
recover a btrfs filesystem after the cache disk died?

Oh... wait... Now I understand: if the caching disk returns bad data
(but without reporting an error), the bad data may be written to the
other disks. In this case a single failure (the cache disk) may affect
all the other disks, and the redundancy is lost...
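
On the other hand, since the cache sits below btrfs, the csum tree
still describes the data btrfs originally wrote, so such silently
corrupted blocks on the backing disks should at least be detectable
afterwards by a scrub (just the standard command, untested in this
scenario):

  # verify all data and metadata checksums; -B waits for completion
  btrfs scrub start -B /mnt
  btrfs scrub status /mnt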

BR
G.Baroncelli
