linux-btrfs.vger.kernel.org archive mirror
From: Graham Cobb <g.btrfs@cobb.uk.net>
To: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Used disk size of a received subvolume?
Date: Fri, 17 May 2019 16:28:28 +0100
Message-ID: <811bcd96-5a8e-cb10-7efb-22c1046e0f42@cobb.uk.net>
In-Reply-To: <27af7824-f3e9-47a5-7760-d3e30827a081@tty0.ch>

On 17/05/2019 14:57, Axel Burri wrote:
> btrfs fi du shows me the information wanted, but only for the last
> received subvolume (as you said it changes over time, and any later
> child will share data with it). For all others, it merely shows "this
> is what gets freed if you delete this subvolume".

It doesn't even show you that: it is possible to have shared (not
exclusive) data which is only shared between files within the subvolume,
and which will be freed if the subvolume is deleted. And, of course,
there is the obvious problem that if you only count exclusive data, no
one is charged for any of the shared segments ("Oh, my backup is getting a bit
expensive. Hmm. I know! I will back up all my files to two different
destinations, and make sure btrfs is sharing the data between both
locations! Then no one pays for it! Whoopee!")
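A toy model in Python (hypothetical data) makes both objections concrete. It is only a sketch under simplifying assumptions: an extent is identified by an id and used whole, "exclusive" is approximated as "referenced by exactly one file" (close to, but not necessarily identical to, what the Exclusive column of btrfs fi du reports), and "freed" means every reference to the extent lives in the subvolume being deleted.

```python
# Hypothetical toy data: extent id -> (size in bytes, list of (subvolume, file) references).
EXTENTS = {
    "e1": (4096, [("home", "a.txt")]),                         # used by one file only
    "e2": (8192, [("home", "b.img"), ("home", "b.copy")]),     # reflinked within "home"
    "e3": (16384, [("home", "c.iso"), ("backup", "c.iso")]),   # shared between subvolumes
}


def exclusive_bytes(extents, subvol):
    """Bytes in extents referenced by exactly one file, where that file is in subvol."""
    return sum(size for size, refs in extents.values()
               if len(refs) == 1 and refs[0][0] == subvol)


def freed_if_deleted(extents, subvol):
    """Bytes in extents whose every reference lies inside subvol."""
    return sum(size for size, refs in extents.values()
               if all(sv == subvol for sv, _file in refs))


def total_bytes(extents):
    """Bytes actually occupied on disk (each extent counted once)."""
    return sum(size for size, _refs in extents.values())


if __name__ == "__main__":
    for sv in ("home", "backup"):
        print(sv, exclusive_bytes(EXTENTS, sv), freed_if_deleted(EXTENTS, sv))
    charged = sum(exclusive_bytes(EXTENTS, sv) for sv in ("home", "backup"))
    print("on disk:", total_bytes(EXTENTS), "charged as exclusive:", charged)
```

In this model "home" shows only 4096 exclusive bytes although deleting it would free 12288, and of the 28672 bytes on disk only 4096 are charged to anyone.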

In my opinion, the shared/exclusive information in btrfs fi du is worse
than useless: it confuses people who think it means something different
from what it does. And, in btrfs, it isn't really useful to know whether
something is "exclusive" or not -- what people care about is always
something else (which is dependent on **where** it is shared, and by whom).

The biggest problem is that you haven't defined what **you** (in your
particular use case) mean by the "size" of a subvolume. For btrfs,
"size" doesn't have any single obvious definition.

Most commonly, I think, people mean "how much space on disk would be
freed up if I deleted this subvolume and all subvolumes contained within
it", although quite often they mean the similar (but not identical) "how
much space on disk would be freed up if I deleted just this subvolume".
And sometimes they actually mean "how much space on disk would be freed
up if I deleted this subvolume, the subvolumes contained within it, and
all the snapshots I have taken that are lying around, forgotten, in
some other directory tree somewhere".
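These three readings differ only in which set of subvolumes is assumed to be deleted. The Python sketch below uses hypothetical subvolume names and sizes, models references per subvolume rather than per file, and ignores partially referenced extents; it counts an extent as freed only when nothing outside the deleted set still refers to it.

```python
# Hypothetical layout: "data" contains the nested subvolume "data/cache";
# "snap-old" is a forgotten snapshot of "data" elsewhere in the tree.
# extent id -> (size in bytes, set of subvolumes referencing it)
EXTENTS = {
    "x1": (1000, {"data"}),
    "x2": (2000, {"data/cache"}),
    "x3": (4000, {"data", "snap-old"}),   # unchanged since the snapshot
    "x4": (8000, {"snap-old"}),           # since deleted from "data"
}


def freed_if_deleted(extents, deleted):
    """Bytes released if every subvolume in `deleted` is removed: an extent
    is freed only when all of its referencing subvolumes are in the set."""
    deleted = set(deleted)
    return sum(size for size, owners in extents.values() if owners <= deleted)


just_this = freed_if_deleted(EXTENTS, {"data"})                                  # 1000
with_nested = freed_if_deleted(EXTENTS, {"data", "data/cache"})                  # 3000
with_snapshots = freed_if_deleted(EXTENTS, {"data", "data/cache", "snap-old"})  # 15000
print(just_this, with_nested, with_snapshots)
```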

But often they mean something else completely, such as "how much space
is taken up by the data which was originally created in this subvolume
but which has been cloned into all sorts of places now and may not even
be referred to from this subvolume any more" (typically this is the case
if you want to charge the subvolume owner for the data usage).
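Charging by origin would look something like the sketch below. It assumes each extent carries a record of the subvolume that first wrote it; btrfs does not obviously expose such an "origin" field, so that attribute (like the data) is hypothetical and a real tool would have to reconstruct it some other way.

```python
# Hypothetical: extent id -> (size in bytes, subvolume that originally wrote it,
#                             set of subvolumes that currently reference it)
EXTENTS = {
    "a": (500, "alice", {"alice"}),
    "b": (700, "alice", {"bob", "carol"}),   # cloned away, no longer in alice
    "c": (300, "bob", {"bob"}),
}


def charge_by_origin(extents):
    """Bill every extent exactly once, to the subvolume that created it,
    regardless of where (or whether) it is still referenced."""
    bill = {}
    for size, origin, _current_refs in extents.values():
        bill[origin] = bill.get(origin, 0) + size
    return bill


print(charge_by_origin(EXTENTS))  # {'alice': 1200, 'bob': 300}
```

Every byte on disk is charged once here, whereas counting only what each subvolume currently references exclusively would bill "alice" for 500 bytes and nobody for extent "b".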

And, of course, another reading of your question would be "how much data
was transferred during this send/receive operation" (relevant if you are
running a backup service and want to charge people by how much they are
sending to the service rather than the amount of data stored).

That is why I created my "extents-list" stuff. This is a horrible hack
(one day I will rewrite it using the Python library) which lets me
answer questions like: "how much space am I wasting by keeping
historical snapshots", "how much data is being shared between two
subvolumes", "how much of the data in my latest snapshot is unique to
that snapshot" and "how much space would I actually free up if I removed
(just) these particular directories". None of these can be answered
with the existing btrfs command line tools (unless I have missed something).

> And it is pretty slow: on my backup disk (spinning rust, ~2000
> subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
> subvolume of 20GB, while btrfs find-new takes only seconds.

Yes. Answering the real questions involves collecting the FIEMAP data for
every file involved (which, for some questions, is actually every file
on the disk!), so it takes a very long time: days for my multi-terabyte
backup disk.
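For reference, collecting FIEMAP data for a single file from Python on Linux looks roughly like this. It is a sketch, not the extents-lists code: the ioctl number, flags and structure layouts follow <linux/fs.h> and <linux/fiemap.h>, while the function names `fiemap` and `parse_fiemap` are made up here. On btrfs, fe_physical is (as I understand it) an address in btrfs's own logical address space rather than a raw disk offset, which is still enough to tell whether two files share an extent.

```python
import fcntl
import os
import struct

# Values from <linux/fs.h> and <linux/fiemap.h>.
FS_IOC_FIEMAP = 0xC020660B            # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1                # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1              # last extent of the file
FIEMAP_EXTENT_SHARED = 0x2000         # extent is shared with other files
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

HDR = struct.Struct("=QQIIII")        # struct fiemap header: 32 bytes
EXT = struct.Struct("=QQQQQIIII")     # struct fiemap_extent: 56 bytes


def parse_fiemap(buf):
    """Decode a filled-in struct fiemap into (logical, physical, length, flags) tuples."""
    mapped = HDR.unpack_from(buf, 0)[3]          # fm_mapped_extents
    out = []
    for i in range(mapped):
        f = EXT.unpack_from(buf, HDR.size + i * EXT.size)
        out.append((f[0], f[1], f[2], f[5]))     # logical, physical, length, flags
    return out


def fiemap(path, batch=64):
    """Return every extent of `path`, fetching `batch` extents per ioctl call."""
    extents = []
    start = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = bytearray(HDR.size + batch * EXT.size)
            HDR.pack_into(buf, 0, start, FIEMAP_MAX_OFFSET - start,
                          FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
            got = parse_fiemap(buf)
            if not got:
                break
            extents.extend(got)
            logical, _physical, length, flags = got[-1]
            if flags & FIEMAP_EXTENT_LAST:
                break
            start = logical + length             # continue after the last extent
    finally:
        os.close(fd)
    return extents
```

For example, `[e for e in fiemap(path) if e[3] & FIEMAP_EXTENT_SHARED]` lists the extents the kernel flags as shared. Doing this for every file on a large filesystem is the slow part described above.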

> Summing up, what I'm looking for would be something like:
> 
>   btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>

You can do that with FIEMAP data. Feel free to look at extents-lists. Also
feel free to shout "this is a gross hack" and scream at me!

If you really just need it for two subvols like that, then

extents-expr -s <subvol> - <other-subvol>

will tell you how much space is in extents used in <subvol> but not used
in <other-subvol>.
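The set arithmetic behind that command can be sketched in Python, assuming each subvolume's extents have already been gathered (for example with FIEMAP) into hypothetical per-file lists of (physical, length) pairs. This is a guess at the semantics described above, not extents-expr's implementation, and it treats extents as all-or-nothing rather than handling partially referenced ones.

```python
def extent_set(file_extents):
    """Collapse per-file extent lists into one set keyed by (physical, length),
    so an extent reflinked into several files is counted once."""
    return {ext for extents in file_extents.values() for ext in extents}


def bytes_in(extents):
    return sum(length for _physical, length in extents)


def minus(a, b):
    """Bytes in extents used in subvolume a but not used in subvolume b."""
    return bytes_in(extent_set(a) - extent_set(b))


def shared(a, b):
    """Bytes in extents used by both subvolumes."""
    return bytes_in(extent_set(a) & extent_set(b))


# Hypothetical per-file extent lists: (physical address, length in bytes).
new_snap = {"f1": [(0, 4096), (8192, 4096)], "f2": [(8192, 4096), (65536, 16384)]}
old_snap = {"f1": [(0, 4096)], "f3": [(131072, 4096)]}

print(minus(new_snap, old_snap))   # 20480
print(shared(new_snap, old_snap))  # 4096
```

The extent at 8192 appears in two files of new_snap but is counted once, which is why the per-file lists are collapsed into a set before any arithmetic.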

Graham

Thread overview: 10+ messages
2019-05-16 14:54 Used disk size of a received subvolume? Axel Burri
2019-05-16 17:09 ` Remi Gauvin
2019-05-17 14:14   ` Axel Burri
2019-05-17 16:22     ` Remi Gauvin
2019-05-16 17:12 ` Hugo Mills
2019-05-17 13:57   ` Axel Burri
2019-05-17 15:28     ` Graham Cobb [this message]
2019-05-17 16:39       ` Steven Davies
2019-05-17 23:15         ` Graham Cobb
2019-05-23 16:06       ` Axel Burri
