From: Steven Davies <btrfs-list@steev.me.uk>
To: Hans van Kranenburg <hans@knorrie.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Exploring referenced extents
Date: Wed, 13 May 2020 21:08:05 +0100
Message-ID: <ac22328714cb200989294a451fc9930b@steev.me.uk>
In-Reply-To: <3e9446ef-955b-351c-8238-9ca07ee38bf6@knorrie.org>

On 2020-05-11 02:21, Hans van Kranenburg wrote:
> Hi!

Thanks for your insights!

> On 5/9/20 1:11 PM, Steven Davies wrote:
>> For curiosity I'm trying to write a tool which will show me the size of
>> data extents belonging to which files in a snapshot are exclusive to
>> that snapshot as a way to show how much space would be freed if the
>> snapshot were to be deleted, and which files in the snapshot are taking
>> up the most space.

<snip lots of useful information>

This is what I was missing when I read the documentation:

>>      find what files reference it #1
>>      for each referencing file:
>>        determine which subvolumes it lives in #2
> 
> For this, we delegate the work to the running linux kernel code, to ask
> it who's using the extent at this disk_bytenr.
> 
> https://python-btrfs.readthedocs.io/en/stable/btrfs.html#btrfs.ioctl.logical_to_ino_v2
> 
> The main thing you're looking for is the ignore_offset option, which
> will give you a list of *any* user of *any* data in that extent, instead
> of only the first 4096 bytes in it which disk_bytenr itself is part of.

I did rework the script - albeit not the way you suggested (I still walk
the file tree and look up the extents) because my subvolumes are small
and stored on relatively fast SSDs, and this way allows me to narrow the
search to a single directory - and it seems to work now. It isn't pretty
yet either! It has already told me that the oldest snapshot of my /
subvolume is huge because it contains a dump of linux-firmware that
isn't shared with anything else.
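
Roughly, the idea of walking the files and checking who else references
each extent can be sketched as below. This is only an illustration, not
the code in the linked script: the function name, the shape of
file_extents and the roots_for_extent callback are all made up here. In
a real tool the callback would presumably wrap a
logical_to_ino_v2(..., ignore_offset=True) lookup and collect the
subvolume ids of the results; here it is just a dict lookup so the
sketch runs without a btrfs filesystem.

```python
from collections import defaultdict


def exclusive_bytes_by_file(file_extents, roots_for_extent, snapshot_root):
    """Sum up extents that are referenced only from snapshot_root.

    file_extents:     {path: [(disk_bytenr, disk_num_bytes), ...]}
                      (hypothetical layout, gathered by walking the tree)
    roots_for_extent: callable, disk_bytenr -> iterable of subvolume ids
                      that reference any part of that extent (assumed to
                      be backed by a kernel lookup in a real tool)
    Returns ({path: exclusive bytes}, total exclusive bytes).
    """
    per_file = defaultdict(int)
    counted = set()  # extents already added to the snapshot-wide total
    total = 0
    for path, extents in file_extents.items():
        seen_in_file = set()  # a file may reference one extent repeatedly
        for bytenr, num_bytes in extents:
            if bytenr in seen_in_file:
                continue
            seen_in_file.add(bytenr)
            # Shared with any other subvolume: deleting frees nothing.
            if set(roots_for_extent(bytenr)) != {snapshot_root}:
                continue
            per_file[path] += num_bytes
            # Two files in the same snapshot can share an extent, so the
            # total counts each extent only once.
            if bytenr not in counted:
                counted.add(bytenr)
                total += num_bytes
    return dict(per_file), total


if __name__ == "__main__":
    # Made-up data: subvolume 260 is the snapshot, 5 is another subvolume.
    refs = {4096: {260}, 8192: {5, 260}, 16384: {260}}
    files = {
        "lib/firmware/a.bin": [(4096, 1000), (16384, 500)],
        "etc/fstab": [(8192, 200)],
        "lib/firmware/b.bin": [(16384, 500)],
    }
    per_file, total = exclusive_bytes_by_file(files, refs.__getitem__, 260)
    for path, size in sorted(per_file.items(), key=lambda kv: -kv[1]):
        print(size, path)
    print("total exclusive bytes:", total)
```

Note that the per-file numbers can add up to more than the total when
files inside the snapshot share an extent with each other, and that this
counts whole extents, which is an assumption about what gets freed.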

Next job - make it into a tree-like utility.

https://github.com/daviessm/btrfs-snapshots-diff/blob/4003a3fdec70c2a0de348e75a6576f9342754f54/btrfs-subvol-size.py

-- 
Steven Davies
