From: Hugo Mills <hugo@carfax.org.uk>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: Henk Slager <eye1tm@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs
Date: Sun, 5 Jun 2016 21:07:21 +0000
Message-ID: <20160605210721.GH24492@carfax.org.uk>
In-Reply-To: <1465160205.6702.38.camel@scientia.net>

On Sun, Jun 05, 2016 at 10:56:45PM +0200, Christoph Anton Mitterer wrote:
> On Sun, 2016-06-05 at 22:39 +0200, Henk Slager wrote:
> > > So the point I'm trying to make:
> > > People do probably not care so much whether their VM image/etc. is
> > > COWed or not, snapshots/etc. still work with that,... but they may
> > > likely care if the integrity feature is lost.
> > > So IMHO, nodatacow + checksumming deserves to be amongst the top
> > > priorities.
> > Have you tried blockdevice/HDD caching like bcache or dmcache in
> > combination with VMs on BTRFS?
> Not yet,... my personal use case is just some VMs on the notebook, and
> for this, the above would seem a bit overkill.
> For the larger VM cluster at the institute,... puh to be honest I don't
> know by heart what we do there.
> 
> 
> >   Or ZVOL for VMs in ZFS with L2ARC?
> Well but all this is an alternative solution,...
> 
> 
> > I assume the primary reason for wanting nodatacow + checksumming is
> > to
> > avoid long seektimes on HDDs due to growing fragmentation of the VM
> > images over time.
> Well the primary reason is wanting to have overall checksumming in the
> fs, regardless of which features one uses.

   The problem is that you can't guarantee consistency with
nodatacow+checksums. If you have nodatacow, then data is overwritten,
in place. If you do that, then you can't have a fully consistent
checksum -- there are always race conditions between the checksum and
the data being written (or the data and the checksum, depending on
which way round you do it).
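
   As a rough illustration of that race, here is a toy model (not
btrfs code; the class and function names are hypothetical) of a block
that is overwritten in place while its checksum is stored separately.
It assumes a crash can land between the two writes, and uses CRC32
only as a stand-in checksum:

```python
import zlib


class InPlaceBlock:
    """Hypothetical model: one nodatacow block plus a separately stored checksum."""

    def __init__(self, data: bytes):
        self.data = data
        self.csum = zlib.crc32(data)

    def verify(self) -> bool:
        # True only if the stored checksum matches the stored data.
        return zlib.crc32(self.data) == self.csum


def overwrite(block, new_data, crash_after=None, data_first=True):
    """Overwrite in place as two separate steps; optionally 'crash' after one."""
    steps = [
        ("data", lambda: setattr(block, "data", new_data)),
        ("csum", lambda: setattr(block, "csum", zlib.crc32(new_data))),
    ]
    if not data_first:
        steps.reverse()
    for name, step in steps:
        step()
        if crash_after == name:
            return  # simulated crash: the remaining step never happens


def crash_demo():
    """Crash after the first step, for both possible orderings."""
    results = {}
    for data_first in (True, False):
        block = InPlaceBlock(b"old contents")
        first = "data" if data_first else "csum"
        overwrite(block, b"new contents", crash_after=first, data_first=data_first)
        results[first] = block.verify()
    return results
```

   In this model, whichever write goes first, a crash in between
leaves data and checksum disagreeing, and a later verify() cannot tell
that apart from real corruption.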

> I think we already have some situations where tools use/set btrfs
> features by themselves (i.e. automatically)... wasn't systemd creating
> subvols per default in some locations, when there's btrfs?
> So it's no big step to postgresql/etc. setting nodatacow, making people
> lose integrity without them even knowing.
> 
> Of course, avoiding the fragmentation is the reason for the desire to
> have nodatacow.
> 
> 
> >  But even if you have nodatacow + checksumming
> > implemented, it is then still HDD access and a VM imagefile itself is
> > not guaranteed to be contiguous.
> Uhm... sure, but that's no difference to other filesystems?!
> 
> 
> > It is clear that for VM images the amount of extents will be large
> > over time (like 50k or so, autodefrag on),
> Wasn't it said that autodefrag performs badly for anything larger than
> ~1G?

   I don't recall ever seeing someone saying that. Of course, I may
have forgotten seeing it...

> >  but with a modern SSD used
> > as cache, it doesn't matter. It is still way faster than just HDD(s),
> > even with freshly copied image with <100 extents.
> Well the fragmentation has also many other consequences and not just
> seeks (assuming everyone would use SSDs, which isn't and probably won't
> be the case for quite a while).
> Most obviously you get much more IOPS and btrfs itself will, AFAIU,
> also suffer from some issues due to the fragmentation.

   This is a fundamental problem with all CoW filesystems. There are
some mitigations that can be put in place (true CoW rather than
btrfs's redirect-on-write, like some databases do, where the original
data is copied elsewhere before overwriting; cache aggressively and
with knowledge of the CoW nature of the FS, like ZFS does), but they
all have their drawbacks and pathological cases.
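
   To make the distinction concrete, a minimal sketch follows. It is a
toy block-map model with hypothetical names, not how btrfs or any
database actually lays out data. It assumes a file is just a list of
block addresses and that free space is handed out sequentially:

```python
class Disk:
    """Hypothetical disk: address -> data, with a sequential allocator."""

    def __init__(self):
        self.blocks = {}
        self.next_free = 0
        self.writes = 0  # number of block writes issued

    def alloc(self):
        addr = self.next_free
        self.next_free += 1
        return addr

    def write(self, addr, data):
        self.blocks[addr] = data
        self.writes += 1


def create_file(disk, contents):
    """Write the blocks back to back and return the file's block map."""
    file_map = []
    for chunk in contents:
        addr = disk.alloc()
        disk.write(addr, chunk)
        file_map.append(addr)
    return file_map


def redirect_on_write(disk, file_map, index, data):
    """New data goes to a fresh location; the map is repointed there.

    Returns the address where the old data still lives.
    """
    old = file_map[index]
    new = disk.alloc()
    disk.write(new, data)
    file_map[index] = new
    return old


def copy_on_write(disk, file_map, index, data):
    """Old data is copied elsewhere first, then overwritten in place.

    Returns the address of the preserved copy.
    """
    addr = file_map[index]
    saved = disk.alloc()
    disk.write(saved, disk.blocks[addr])
    disk.write(addr, data)
    return saved


def is_contiguous(file_map):
    return all(b == a + 1 for a, b in zip(file_map, file_map[1:]))
```

   In this model redirect-on-write costs one write per update but
breaks up the file's layout, while the copy-first variant keeps the
layout contiguous at the price of a second write, which is one of the
drawbacks mentioned above.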

   Hugo.

-- 
Hugo Mills             | How do you become King? You stand in the marketplace
hugo@... carfax.org.uk | and announce you're going to tax everyone. If you
http://carfax.org.uk/  | get out alive, you're King.
PGP: E2AB1DE4          |                                        Harry Harrison
