linux-btrfs.vger.kernel.org archive mirror
* Used disk size of a received subvolume?
@ 2019-05-16 14:54 Axel Burri
  2019-05-16 17:09 ` Remi Gauvin
  2019-05-16 17:12 ` Hugo Mills
  0 siblings, 2 replies; 10+ messages in thread
From: Axel Burri @ 2019-05-16 14:54 UTC (permalink / raw)
  To: linux-btrfs

Trying to get the size of a subvolume created using "btrfs receive",
I've come up with a cute little script:

   SUBVOL=/path/to/subvolume
   CGEN=$(btrfs subvolume show "$SUBVOL" \
     | sed -n 's/\s*Gen at creation:\s*//p')
   btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
     | cut -d' ' -f7 \
     | tr '\n' '+' \
     | sed 's/\+\+$/\n/' \
     | bc

This simply sums up the "len" field from all modified files since the
creation of the subvolume. Works fine, as btrfs-receive first makes a
snapshot of the parent subvolume, then adds the files according to the
send-stream.
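
As a rough sketch of the same summation (not part of the original
script), the cut/tr/sed/bc pipeline could be replaced by a single awk
filter. It assumes that find-new prints one line per extent of the form
"inode N file offset N len N ..." followed by a final "transid marker"
line; the function name sum_find_new_len is made up for this example.

```shell
# Sum the "len" field of every extent line printed by
# "btrfs subvolume find-new". Lines that do not start with "inode"
# (such as the trailing "transid marker was N" line) are ignored.
sum_find_new_len() {
    awk '$1 == "inode" && $6 == "len" { total += $7 }
         END { printf "%.0f\n", total }'
}
```

It would be used as
btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) | sum_find_new_len
and, like the original, counts the uncompressed length.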

Now this raises some questions:

1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
compressed length.

2. If there are clone-sources in the send-stream, the cloned files
probably also appear in the list.

3. Is there a better way? It would be nice to have a btrfs command for
this. It would be straightforward to add a "--summary" option to
"btrfs find-new"; another approach would be to calculate and dump the
size in either "btrfs send" or "btrfs receive".

Any thoughts? I'm willing to implement such a feature in btrfs-progs if
this sounds reasonable to you.


- Axel


Ref: https://github.com/digint/btrbk/issues/280


* Re: Used disk size of a received subvolume?
  2019-05-16 14:54 Used disk size of a received subvolume? Axel Burri
@ 2019-05-16 17:09 ` Remi Gauvin
  2019-05-17 14:14   ` Axel Burri
  2019-05-16 17:12 ` Hugo Mills
  1 sibling, 1 reply; 10+ messages in thread
From: Remi Gauvin @ 2019-05-16 17:09 UTC (permalink / raw)
  To: Axel Burri, linux-btrfs



On 2019-05-16 10:54 a.m., Axel Burri wrote:

> 
> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
> this sounds reasonable to you.
> 


BTRFS qgroups are where this is implemented.  You have to enable quotas,
and leaving quotas enabled has lots of problems (mostly performance
related), so I would not suggest leaving them on when there is lots of
activity (i.e., multiple send/receive, or deletion of many snapshots).

But you can enable quotas at any time (btrfs quota enable /path)

Wait for the rescan to finish

btrfs quota rescan -s /path  (to view status of scan)

And then:

btrfs qgroup show /path to list the space usage, (total, and what you're
looking for: Exclusive)

Note that the default groups correspond to subvolume IDs, not filenames
(someone did make a utility somewhere that will display this output with
corresponding directory names).

btrfs sub list /path is used to find the relevant IDs. (I find the -o
option useful, so it only displays the subvolumes that are children of
the /path.)
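
A minimal sketch of that ID-to-path mapping, for illustration only (this
is not the utility mentioned above). It assumes the two command outputs
have been saved to files, that "btrfs subvolume list" prints lines of
the form "ID <id> gen <n> top level <n> path <path>", and that
"btrfs qgroup show --raw" prints "0/<id> <rfer> <excl>" rows. The helper
name qgroup_with_paths is hypothetical.

```shell
# usage: qgroup_with_paths <subvolume-list-output> <qgroup-show-output>
# Prints: <qgroupid> <referenced> <exclusive> <path or "-">
qgroup_with_paths() {
    awk '
        NR == FNR {
            # First file: remember the path of every subvolume ID.
            if ($1 == "ID") {
                id = $2
                p = $0
                sub(/^ID [0-9]+ gen [0-9]+ top level [0-9]+ path /, "", p)
                path[id] = p
            }
            next
        }
        # Second file: level-0 qgroups are named 0/<subvolume id>.
        $1 ~ /^0\/[0-9]+$/ {
            split($1, q, "/")
            print $1, $2, $3, ((q[2] in path) ? path[q[2]] : "-")
        }
    ' "$1" "$2"
}
```

Rows whose ID does not appear in the subvolume list (for example the
top-level subvolume, ID 5) are printed with "-" as the path.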

As stated above, I would suggest disabling quotas when you are finished:

btrfs quota disable /path
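
Put together, the steps above could be sketched as one function. This is
an illustration under assumptions, not a tested procedure: it relies on
"btrfs quota rescan -w" waiting for the rescan to finish, and the
BTRFS variable and both function names are invented here so that the
commands can be printed instead of executed.

```shell
# Set BTRFS="echo btrfs" to print the commands instead of running them.
run_btrfs() {
    # Intentionally unquoted so that "echo btrfs" splits into two words.
    ${BTRFS:-btrfs} "$@"
}

# usage: qgroup_usage_once <mountpoint>
qgroup_usage_once() {
    mnt=$1
    rc=0
    run_btrfs quota enable "$mnt" || return
    # -w waits until the rescan has finished before returning.
    { run_btrfs quota rescan -w "$mnt" && run_btrfs qgroup show "$mnt"; } || rc=$?
    # Always switch quotas off again, even if a step above failed.
    run_btrfs quota disable "$mnt" || rc=$?
    return "$rc"
}
```

Disabling quotas at the end follows the advice above about not leaving
them on during heavy send/receive or snapshot deletion activity.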




* Re: Used disk size of a received subvolume?
  2019-05-16 14:54 Used disk size of a received subvolume? Axel Burri
  2019-05-16 17:09 ` Remi Gauvin
@ 2019-05-16 17:12 ` Hugo Mills
  2019-05-17 13:57   ` Axel Burri
  1 sibling, 1 reply; 10+ messages in thread
From: Hugo Mills @ 2019-05-16 17:12 UTC (permalink / raw)
  To: Axel Burri; +Cc: linux-btrfs


On Thu, May 16, 2019 at 04:54:42PM +0200, Axel Burri wrote:
> Trying to get the size of a subvolume created using "btrfs receive",
> I've come up with a cute little script:
> 
>    SUBVOL=/path/to/subvolume
>    CGEN=$(btrfs subvolume show "$SUBVOL" \
>      | sed -n 's/\s*Gen at creation:\s*//p')
>    btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
>      | cut -d' ' -f7 \
>      | tr '\n' '+' \
>      | sed 's/\+\+$/\n/' \
>      | bc
> 
> This simply sums up the "len" field from all modified files since the
> creation of the subvolume. Works fine, as btrfs-receive first makes a
> snapshot of the parent subvolume, then adds the files according to the
> send-stream.
> 
> Now this raises some questions:
> 
> 1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
> compressed length.
> 
> 2. If there are clone-sources in the send-stream, the cloned files
> probably also appear in the list.
> 
> 3. Is there a better way? It would be nice to have a btrfs command for
> this. It would be straight-forward to have a "--summary" option in
> "btrfs find-new", another approach would be to calculate and dump the
> size in either "btrfs send" or "btrfs receive".

   btrfs find-new also doesn't tell you about deleted files (fairly
obviously), so if anything's been removed, you'll be overestimating
the overall change in size.

> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
> this sounds reasonable to you.

   If you're looking for the incremental usage of the subvolume, why
not just use the "exclusive" value from btrfs fi du? That's exactly
that information. (And note that it changes over time, as other
subvols it shares with are deleted).
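
   For scripting, the exclusive value could be pulled out of the summary
line roughly as follows. This is a sketch that assumes the column layout
"Total  Exclusive  Set shared  Filename" and byte values from the --raw
option; fi_du_exclusive is a made-up name.

```shell
# usage: btrfs filesystem du -s --raw <subvol> | fi_du_exclusive
# Skips the header line and prints the second column (exclusive bytes).
fi_du_exclusive() {
    awk '$1 != "Total" && NF >= 4 { print $2 }'
}
```

   The number is only a snapshot of the current sharing state, since it
changes whenever related subvolumes are created or deleted.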

   Hugo.

-- 
Hugo Mills             | Your problem is that you've got too much taste to be
hugo@... carfax.org.uk | a web developer.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                          Steve Harris


* Re: Used disk size of a received subvolume?
  2019-05-16 17:12 ` Hugo Mills
@ 2019-05-17 13:57   ` Axel Burri
  2019-05-17 15:28     ` Graham Cobb
  0 siblings, 1 reply; 10+ messages in thread
From: Axel Burri @ 2019-05-17 13:57 UTC (permalink / raw)
  To: Hugo Mills, linux-btrfs



On 16/05/2019 19.12, Hugo Mills wrote:
> On Thu, May 16, 2019 at 04:54:42PM +0200, Axel Burri wrote:
>> Trying to get the size of a subvolume created using "btrfs receive",
>> I've come up with a cute little script:
>>
>>    SUBVOL=/path/to/subvolume
>>    CGEN=$(btrfs subvolume show "$SUBVOL" \
>>      | sed -n 's/\s*Gen at creation:\s*//p')
>>    btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
>>      | cut -d' ' -f7 \
>>      | tr '\n' '+' \
>>      | sed 's/\+\+$/\n/' \
>>      | bc
>>
>> This simply sums up the "len" field from all modified files since the
>> creation of the subvolume. Works fine, as btrfs-receive first makes a
>> snapshot of the parent subvolume, then adds the files according to the
>> send-stream.
>>
>> Now this raises some questions:
>>
>> 1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
>> compressed length.
>>
>> 2. If there are clone-sources in the send-stream, the cloned files
>> probably also appear in the list.
>>
>> 3. Is there a better way? It would be nice to have a btrfs command for
>> this. It would be straight-forward to have a "--summary" option in
>> "btrfs find-new", another approach would be to calculate and dump the
>> size in either "btrfs send" or "btrfs receive".
> 
>    btrfs find-new also doesn't tell you about deleted files (fairly
> obviously), so if anything's been removed, you'll be overestimating
> the overall change in size.

True, making it not very useful, especially for backups where it is
also important to see what has been deleted.

>> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
>> this sounds reasonable to you.
> 
>    If you're looking for the incremental usage of the subvolume, why
> not just use the "exclusive" value from btrfs fi du? That's exactly
> that information. (And note that it changes over time, as other
> subvols it shares with are deleted).

btrfs fi du shows me the information wanted, but only for the last
received subvolume (as you said it changes over time, and any later
child will share data with it). For all others, it merely shows "this
is what gets freed if you delete this subvolume".

And it is pretty slow: on my backup disk (spinning rust, ~2000
subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
subvolume of 20GB, while btrfs find-new takes only seconds.


Summing up, what I'm looking for would be something like:

  btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>

Which is probably not easily doable.


Thanks,

- Axel


PS:

To get an idea of what this looks like on real data, here's the
combined output of "btrfs fi du -s" (Total, Exclusive),
"find-new-sum.sh" and "find-new-prev.sh" (both in bytes) for btrbk
backups of a webservice with low traffic.

Large values in the find-new-prev column show when I updated the
software; they are probably overestimated by a factor of 2 (new package
installed, old package deleted).

     Total   Exclusive   find-new-sum  find-new-prev Filename
 319.59MiB    32.00KiB   345917734                   data.20151130
 320.24MiB    13.41MiB   630784        2453504       data.20151206
 321.44MiB    16.82MiB   135168        3699477       data.20160103
 325.77MiB   126.65MiB   258048        128418159     data.20160207
 326.92MiB   127.69MiB   262144        128242483     data.20160306
 467.85MiB   101.53MiB   585728        307220876     data.20160403
 475.99MiB   101.17MiB   544768        107036454     data.20160501
 515.12MiB    21.70MiB   450703        138940051     data.20160605
 736.80MiB    22.46MiB   573440        246238002     data.20160703
 743.39MiB   107.00MiB   540672        105290810     data.20160807
   1.15GiB   149.94MiB   651264        613553890     data.20160904
   1.23GiB    25.52MiB   111188246     238660114     data.20161002
   1.24GiB    24.70MiB   491520        10912303      data.20161106
   1.24GiB    25.66MiB   1834183       8197625       data.20161204
   1.24GiB   115.76MiB   573440        160281830     data.20170101
   1.25GiB   114.73MiB   610304        113867698     data.20170205
   1.82GiB    25.48MiB   618496        726426846     data.20170305
   1.82GiB    26.71MiB   507904        6691050       data.20170402
   1.85GiB    26.59MiB   733184        112139682     data.20170507
   1.85GiB    26.83MiB   24576         6841378       data.20170604
   1.85GiB    27.92MiB   1874740       7671216       data.20170702
   1.85GiB    28.73MiB   2441216       13105243      data.20170806
   1.86GiB    27.94MiB   491520        12743292      data.20170903
   1.90GiB    27.57MiB   634880        196836664     data.20171001
   1.90GiB    28.38MiB   503808        8083183       data.20171105
   1.90GiB    28.83MiB   425984        6748887       data.20171203
   1.90GiB   129.11MiB   708751        174442102     data.20180107
   1.87GiB   195.24MiB   1232896       249746997     data.20180204
   2.33GiB    29.74MiB   1183744       731504273     data.20180304
   2.33GiB    30.76MiB   1093632       7414396       data.20180401
   2.35GiB    29.20MiB   1048576       205278308     data.20180506
   2.35GiB    28.88MiB   1191936       2736128       data.20180513
   2.35GiB    28.79MiB   901120        179189757     data.20180520
   2.35GiB    28.82MiB   1179648       3047567       data.20180527
   2.35GiB    29.00MiB   1040384       3502223       data.20180603
   2.35GiB    29.21MiB   1183744       3093116       data.20180610
   2.46GiB    28.66MiB   1093632       366863945     data.20180617
   2.46GiB    29.24MiB   495616        3283388       data.20180624
   2.46GiB    29.39MiB   1290240       3207311       data.20180701
   2.47GiB    29.55MiB   1351680       3269245       data.20180708
   2.50GiB    29.40MiB   995328        39091775      data.20180715
   2.50GiB    29.46MiB   1236992       3411968       data.20180722
   2.50GiB    29.79MiB   1015951       3823244       data.20180729
   2.50GiB    30.11MiB   1359872       4673536       data.20180805
   2.50GiB    30.25MiB   638976        4542464       data.20180812
   2.50GiB    29.62MiB   1421312       183075636     data.20180819
   2.50GiB    29.48MiB   1114112       3473408       data.20180826
   2.50GiB    29.71MiB   1253376       3158016       data.20180902
   2.50GiB    29.74MiB   1126400       2945167       data.20180909
   2.50GiB    29.89MiB   1191936       2874597       data.20180916
   2.50GiB    30.32MiB   720896        3408015       data.20180923
   2.52GiB    30.94MiB   2097152       223821428     data.20180930
   2.52GiB    30.51MiB   1187840       6980297       data.20181007
   2.52GiB    30.45MiB   1090181       3455803       data.20181014
   2.52GiB    30.52MiB   1196814       2983306       data.20181021
   2.53GiB    30.78MiB   2231973       5325153       data.20181028
   2.52GiB    30.93MiB   1789697       239439411     data.20181104
   2.53GiB    30.92MiB   2453157       7016043       data.20181111
   2.53GiB    31.09MiB   1666725       3402269       data.20181118
   2.53GiB    30.68MiB   1060517       4639629       data.20181125
   2.53GiB    30.76MiB   1051174       4370574       data.20181202
   2.53GiB    30.68MiB   964673        3893313       data.20181209
   2.53GiB    30.65MiB   1064960       2727936       data.20181216
   2.54GiB    30.07MiB   1717236       212719260     data.20181223
   2.54GiB    30.39MiB   1231330       4679530       data.20181230
   2.54GiB    30.62MiB   1244308       3761492       data.20190106
   2.54GiB    31.58MiB   1146533       7047447       data.20190113
   2.55GiB    31.28MiB   1371905       6896944       data.20190120
   2.55GiB    31.27MiB   1941203       6495347       data.20190127
   2.55GiB    31.17MiB   1605262       5266478       data.20190203
   2.55GiB    31.54MiB   1105596       2847971       data.20190210
   2.56GiB    31.37MiB   1221804       201645985     data.20190217
   2.56GiB    31.58MiB   1360378       2971929       data.20190224
   2.56GiB    32.07MiB   2174675       5535980       data.20190303
   2.56GiB    32.02MiB   1743836       6429621       data.20190310
   2.56GiB    31.43MiB   1466368       3834494       data.20190317
   2.56GiB    31.26MiB   1544192       1544192       data.20190318
   2.56GiB    31.34MiB   1995373       1995373       data.20190319
   2.56GiB    31.37MiB   1509559       1509559       data.20190320
   2.56GiB    31.37MiB   1597093       1597093       data.20190321
   2.56GiB    30.51MiB   1552037       1552037       data.20190322
   2.57GiB    30.47MiB   2694989       2694989       data.20190323
   2.57GiB    30.57MiB   2612947       2612947       data.20190324
   2.58GiB    30.10MiB   208643677     208643677     data.20190325
   2.58GiB    31.27MiB   1580732       1580732       data.20190326
   2.58GiB    31.34MiB   1552083       1552083       data.20190327
   2.58GiB    31.48MiB   1605331       1605331       data.20190328
   2.58GiB    30.55MiB   17353623      17353623      data.20190329
   2.58GiB    30.63MiB   2715094       2715094       data.20190330
   2.58GiB    31.55MiB   1646130       1646130       data.20190331
   2.58GiB    31.51MiB   1478263       1478263       data.20190401
   2.58GiB    31.61MiB   5478839       5478839       data.20190402
   2.58GiB    31.52MiB   1711712       1711712       data.20190403
   2.58GiB    31.53MiB   1535538       1535538       data.20190404
   2.58GiB    31.57MiB   1592820       1592820       data.20190405
   2.58GiB    31.50MiB   1887794       1887794       data.20190406
   2.58GiB    31.58MiB   1531442       1531442       data.20190407
   2.58GiB    31.54MiB   1679082       1679082       data.20190408
   2.58GiB    31.55MiB   1580732       1580732       data.20190409
   2.58GiB    31.53MiB   1679036       1679036       data.20190410
   2.58GiB    31.46MiB   1748668       1748668       data.20190411
   2.58GiB    31.62MiB   1704319       1704319       data.20190412
   2.58GiB    30.16MiB   215955743     215955743     data.20190413
   2.58GiB    31.40MiB   1500930       1500930       data.20190414
   2.58GiB    31.36MiB   1357708       1357708       data.20190415
   2.58GiB    31.45MiB   1498789       1498789       data.20190416
   2.58GiB    31.75MiB   2137765       2137765       data.20190417
   2.58GiB    31.81MiB   1601212       1601212       data.20190418
   2.58GiB    31.82MiB   1730505       1730505       data.20190419
   2.58GiB    31.82MiB   1572586       1572586       data.20190420
   2.58GiB    31.87MiB   1490620       1490620       data.20190421
   2.58GiB    31.88MiB   1523457       1523457       data.20190422
   2.58GiB    31.86MiB   1924865       1924865       data.20190423
   2.58GiB    31.89MiB   1654482       1654482       data.20190424
   2.58GiB    31.84MiB   1588970       1588970       data.20190425
   2.58GiB    31.84MiB   1576613       1576613       data.20190426
   2.58GiB    31.81MiB   1756929       1756929       data.20190427
   2.58GiB    31.89MiB   1658625       1658625       data.20190428
   2.58GiB    31.84MiB   1584805       1584805       data.20190429
   2.58GiB    31.82MiB   1605377       1605377       data.20190430
   2.58GiB    31.85MiB   1588947       1588947       data.20190501
   2.58GiB    31.89MiB   1765121       1765121       data.20190502
   2.58GiB    31.88MiB   1760933       1760933       data.20190503
   2.58GiB    31.83MiB   1810154       1810154       data.20190504
   2.58GiB    32.20MiB   1609473       1609473       data.20190505
   2.61GiB     2.68MiB   149129730     149129730     data.20190506
   2.61GiB    17.41MiB   3137281       3137281       data.20190507
   2.61GiB    23.75MiB   2625258       2625258       data.20190508
   2.61GiB    27.62MiB   2404074       2404074       data.20190509
   2.61GiB    29.87MiB   1904224       1904224       data.20190510
   2.61GiB    31.22MiB   2080513       2080513       data.20190511
   2.61GiB    35.84MiB   2744042       2744042       data.20190512
   2.61GiB    37.33MiB   2019073       2019073       data.20190513
   2.61GiB    37.79MiB   2035388       2035388       data.20190514
   2.61GiB    38.17MiB   1785578       1785578       data.20190515
   2.61GiB    39.63MiB   1982163       1982163       data.20190516


$ cat find-new-sum.sh
#!/bin/bash
# Data size added on received subvolume.
subvol=$1
cgen=$(btrfs subvolume show "$subvol" \
  | sed -n 's/\s*Gen at creation:\s*//p')
sum=$(btrfs subvolume find-new "$subvol" $((cgen+1)) \
  | cut -d' ' -f7 \
  | tr '\n' '+' \
  | sed 's/\+\+$/\n/' \
  | bc)
echo "$subvol $sum"


$ cat find-new-prev.sh
#!/bin/bash
# Data size added since last backup.
# (works only if backups are received in chronological order)
# Sentinel: the first subvolume has no predecessor and is skipped.
lastgen=999999999
for subvol in "$@" ; do
    cgen=$(btrfs subvolume show "$subvol" \
      | sed -n 's/\s*Gen at creation:\s*//p')
    if [[ $lastgen -gt $cgen ]]; then
        echo "$subvol older, skipping"
    else
        sum=$(btrfs subvolume find-new "$subvol" $((lastgen+1)) \
          | cut -d' ' -f7 \
          | tr '\n' '+' \
          | sed 's/\+\+$/\n/' \
          | bc)
        echo "$subvol $sum"
    fi
    lastgen=$(btrfs subvolume show "$subvol" \
      | sed -n 's/\s*Generation:\s*//p')
done



* Re: Used disk size of a received subvolume?
  2019-05-16 17:09 ` Remi Gauvin
@ 2019-05-17 14:14   ` Axel Burri
  2019-05-17 16:22     ` Remi Gauvin
  0 siblings, 1 reply; 10+ messages in thread
From: Axel Burri @ 2019-05-17 14:14 UTC (permalink / raw)
  To: Remi Gauvin, linux-btrfs



On 16/05/2019 19.09, Remi Gauvin wrote:
> On 2019-05-16 10:54 a.m., Axel Burri wrote:
> 
>>
>> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
>> this sounds reasonable to you.
>>
> 
> 
> BTRFS qgroups are where this is implemented.  You have to enable quotas,
> and leaving quotas enabled has lots of problems, (mostly performance
> related), so I would not suggest leaving them on when there is lots of
> activity, (ie, multiple send/receive, or deletion of many snapshots.)
> 
> But you can enable quotas at any time (btrfs quota enable /path)
> 
> Wait for the rescan to finish
> 
> btrfs quota rescan -s /path  (to view status of scan)
> 
> And then:
> 
> btrfs qgroup show /path to list the space usage, (total, and what you're
> looking for: Exclusive)
> 
> Note that the default groups correspond to subvolume ID, not filename,
> (someone did make a utility somewhere that will display this output with
> corresponding directory names.)
> 
> btrfs sub list /path is used to find the relevant ID's.. (I find the -o
> option useful, so it only displays the subvolumes that are children to
> the /path)
> 
> As stated above, I would suggest disabling quotas when you are finished:
> 
> btrfs quota disable /path


Thanks for the tip, but this does not seem practical for production
systems, as it involves enabling quotas, which had many problems in the
past (not sure about the current state, but probably still true if I
understand you correctly).

Nevertheless, I played around with it and it seems to work; I'll keep it
in mind for the future.


- Axel



* Re: Used disk size of a received subvolume?
  2019-05-17 13:57   ` Axel Burri
@ 2019-05-17 15:28     ` Graham Cobb
  2019-05-17 16:39       ` Steven Davies
  2019-05-23 16:06       ` Axel Burri
  0 siblings, 2 replies; 10+ messages in thread
From: Graham Cobb @ 2019-05-17 15:28 UTC (permalink / raw)
  To: linux-btrfs

On 17/05/2019 14:57, Axel Burri wrote:
> btrfs fi du shows me the information wanted, but only for the last
> received subvolume (as you said it changes over time, and any later
> child will share data with it). For all others, it merely shows "this
> is what gets freed if you delete this subvolume".

It doesn't even show you that: it is possible to have shared (not
exclusive) data which is only shared between files within the subvolume,
and which will be freed if the subvolume is deleted. And, of course, the
obvious problem that if you only count exclusive then no one is being
charged for all the shared segments ("Oh, my backup is getting a bit
expensive. Hmm. I know! I will back up all my files to two different
destinations, and make sure btrfs is sharing the data between both
locations! Then no one pays for it! Whoopee!")

In my opinion, the shared/exclusive information in btrfs fi du is worse
than useless: it confuses people who think it means something different
from what it does. And, in btrfs, it isn't really useful to know whether
something is "exclusive" or not -- what people care about is always
something else (which is dependent on **where** it is shared, and by whom).

The biggest problem is that you haven't defined what **you** (in your
particular use case) mean by the "size" of a subvolume. For btrfs that
doesn't have any single obvious definition.

Most commonly, I think, people mean "how much space on disk would be
freed up if I deleted this subvolume and all subvolumes contained within
it", although quite often they mean the similar (but not identical) "how
much space on disk would be freed up if I deleted just this subvolume".
And sometimes they actually mean "how much space on disk would be freed
up if I deleted this subvolume, the subvolumes contained within it, and
all the snapshots I have taken that are lying around forgotten in
some other directory tree somewhere".

But often they mean something else completely, such as "how much space
is taken up by the data which was originally created in this subvolume
but which has been cloned into all sorts of places now and may not even
be referred to from this subvolume any more" (typically this is the case
if you want to charge the subvolume owner for the data usage).

And, of course, another reading of your question would be "how much data
was transferred during this send/receive operation" (relevant if you are
running a backup service and want to charge people by how much they are
sending to the service rather than the amount of data stored).

That is why I created my "extents-lists" stuff. This is a horrible hack
(one day I will rewrite it using the python library) which lets me
answer questions like: "how much space am I wasting by keeping
historical snapshots", "how much data is being shared between two
subvolumes", "how much of the data in my latest snapshot is unique to
that snapshot" and "how much space would I actually free up if I removed
(just) these particular directories". None of which can be answered from
the existing btrfs command line tools (unless I have missed something).

> And it is pretty slow: on my backup disk (spinning rust, ~2000
> subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
> subvolume of 20GB, while btrfs find-new takes only seconds.

Yes. Answering the real questions involves taking the FIEMAP data for
every file involved (which, for some questions, is actually every file
on the disk!) so it takes a very long time. Days for my multi-terabyte
backup disk.

> Summing up, what I'm looking for would be something like:
> 
>   btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>

You can do that with FIEMAP data. Feel free to look at extents-lists. Also
feel free to shout "this is a gross hack" and scream at me!

If you really just need it for two subvols like that,

extents-expr -s <subvol> - <other-subvol>

will tell you how much space is in extents used in <subvol> but not used
in <other-subvol>.
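
To illustrate the idea only (this is not how extents-lists is
implemented), the set difference could be sketched like this. It assumes
a hypothetical input format in which each file lists one extent per line
as "<physical start> <length>" in bytes, and it treats extents as equal
only when both values match exactly, so partially overlapping extents
are not handled.

```shell
# usage: extents_only_in <extents-of-subvol> <extents-of-other-subvol>
# Prints the total length of extents listed in the first file but not
# in the second. Extents repeated in the first file are counted once.
extents_only_in() {
    awk -v other="$2" '
        BEGIN {
            # Load the extents of the other subvolume into a lookup table.
            while ((getline line < other) > 0) {
                split(line, f, " ")
                seen[f[1] " " f[2]] = 1
            }
            close(other)
        }
        {
            key = $1 " " $2
            if (!(key in seen)) {
                total += $2
                seen[key] = 1
            }
        }
        END { printf "%.0f\n", total }
    ' "$1"
}
```

Producing the extent lists in the first place is the expensive part,
since it means collecting the FIEMAP data for every file involved.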

Graham


* Re: Used disk size of a received subvolume?
  2019-05-17 14:14   ` Axel Burri
@ 2019-05-17 16:22     ` Remi Gauvin
  0 siblings, 0 replies; 10+ messages in thread
From: Remi Gauvin @ 2019-05-17 16:22 UTC (permalink / raw)
  To: Axel Burri; +Cc: linux-btrfs



On 2019-05-17 10:14 a.m., Axel Burri wrote:

> 
> Nevertheless I played around with it and it seems to work, I'll keep it
> in mind for the future.
> 


fi du was not implemented the last time I had to do this, so I had
completely forgotten about it.  It makes much more sense to just use
that for the exclusive disk usage of a single subvolume.

However, qgroups do allow you to group multiple subvolumes, so, for
example, you can build a query to find out how much disk space is used
exclusively by the 10 oldest snapshots, which I find to be a more useful
question.
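
A sketch of such a grouping, assuming quotas are already enabled: create
a higher-level qgroup, assign the level-0 qgroups of the chosen
subvolumes to it, and read its exclusive value from "btrfs qgroup show".
The group ID 1/100 in the usage line is arbitrary, and the BTRFS
variable and function names are invented for this example.

```shell
# Set BTRFS="echo btrfs" to print the commands instead of running them.
run_btrfs() {
    # Intentionally unquoted so that "echo btrfs" splits into two words.
    ${BTRFS:-btrfs} "$@"
}

# usage: group_subvolumes <mountpoint> <group-id> <subvol-id>...
group_subvolumes() {
    mnt=$1
    group=$2
    shift 2
    run_btrfs qgroup create "$group" "$mnt" || return
    for id in "$@"; do
        # The qgroup of a subvolume is 0/<subvolume id>.
        run_btrfs qgroup assign "0/$id" "$group" "$mnt" || return
    done
    run_btrfs qgroup show "$mnt"
}
```

For example, "group_subvolumes /path 1/100 257 258" would group the
subvolumes with IDs 257 and 258; the row for 1/100 then reports the
space used exclusively by that set.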



* Re: Used disk size of a received subvolume?
  2019-05-17 15:28     ` Graham Cobb
@ 2019-05-17 16:39       ` Steven Davies
  2019-05-17 23:15         ` Graham Cobb
  2019-05-23 16:06       ` Axel Burri
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Davies @ 2019-05-17 16:39 UTC (permalink / raw)
  To: linux-btrfs

On 17/05/2019 16:28, Graham Cobb wrote:

> That is why I created my "extents-list" stuff. This is a horrible hack
> (one day I will rewrite it using the python library) which lets me
> answer questions like: "how much space am I wasting by keeping
> historical snapshots", "how much data is being shared between two
> subvolumes", "how much of the data in my latest snapshot is unique to
> that snapshot" and "how much space would I actually free up if I removed
> (just) these particular directories". None of which can be answered from
> the existing btrfs command line tools (unless I have missed something).

I have my own horrible hack to do something like this; if you ever get 
around to implementing it in Python could you share the code?

-- 
Steven Davies


* Re: Used disk size of a received subvolume?
  2019-05-17 16:39       ` Steven Davies
@ 2019-05-17 23:15         ` Graham Cobb
  0 siblings, 0 replies; 10+ messages in thread
From: Graham Cobb @ 2019-05-17 23:15 UTC (permalink / raw)
  To: linux-btrfs

On 17/05/2019 17:39, Steven Davies wrote:
> On 17/05/2019 16:28, Graham Cobb wrote:
> 
>> That is why I created my "extents-list" stuff. This is a horrible hack
>> (one day I will rewrite it using the python library) which lets me
>> answer questions like: "how much space am I wasting by keeping
>> historical snapshots", "how much data is being shared between two
>> subvolumes", "how much of the data in my latest snapshot is unique to
>> that snapshot" and "how much space would I actually free up if I removed
>> (just) these particular directories". None of which can be answered from
>> the existing btrfs command line tools (unless I have missed something).
> 
> I have my own horrible hack to do something like this; if you ever get
> around to implementing it in Python could you share the code?
> 

Sure. The current hack (using shell and command line tools) is at
https://github.com/GrahamCobb/extents-lists. If the python version ever
materialises I expect it will end up there as well.


* Re: Used disk size of a received subvolume?
  2019-05-17 15:28     ` Graham Cobb
  2019-05-17 16:39       ` Steven Davies
@ 2019-05-23 16:06       ` Axel Burri
  1 sibling, 0 replies; 10+ messages in thread
From: Axel Burri @ 2019-05-23 16:06 UTC (permalink / raw)
  To: Graham Cobb, linux-btrfs



On 17/05/2019 17.28, Graham Cobb wrote:
> On 17/05/2019 14:57, Axel Burri wrote:
>> btrfs fi du shows me the information wanted, but only for the last
>> received subvolume (as you said it changes over time, and any later
>> child will share data with it). For all others, it merely shows "this
>> is what gets freed if you delete this subvolume".
> 
> It doesn't even show you that: it is possible to have shared (not
> exclusive) data which is only shared between files within the subvolume,
> and which will be freed if the subvolume is deleted. And, of course, the
> obvious problem that if you only count exclusive then no one is being
> charged for all the shared segments ("Oh, my backup is getting a bit
> expensive. Hmm. I know! I will back up all my files to two different
> destinations, and make sure btrfs is sharing the data between both
> locations! Then no one pays for it! Whoopee!")
> 
> In my opinion, the shared/exclusive information in btrfs fi du is worse
> than useless: it confuses people who think it means something different
> from what it does. And, in btrfs, it isn't really useful to know whether
> something is "exclusive" or not -- what people care about is always
> something else (which is dependent on **where** it is shared, and by whom).

Agreed. Sadly btrfs-filesystem(8) does not give much information on how
"exclusive" should be interpreted.

> The biggest problem is that you haven't defined what **you** (in your
> particular use case) mean by the "size" of a subvolume. For btrfs that
> doesn't have any single obvious definition.
> 
> Most commonly, I think, people mean "how much space on disk would be
> freed up if I deleted this subvolume and all subvolumes contained within
> it", although quite often they mean the similar (but not identical) "how
> much space on disk would be freed up if I deleted just this subvolume".
> And sometimes they actually mean "how much space on disk would be freed
> up if I deleted this subvolume, the subvolumes contained with in, and
> all the snapshots I have taken but are lying around forgotten about in
> some other directory tree somewhere".
> 
> But often they mean something else completely, such as "how much space
> is taken up by the data which was originally created in this subvolume
> but which has been cloned into all sorts of places now and may not even
> be referred to from this subvolume any more" (typically this is the case
> if you want to charge the subvolume owner for the data usage).
> 
> And, of course, another reading of your question would be "how much data
> was transferred during this send/receive operation" (relevant if you are
> running a backup service and want to charge people by how much they are
> sending to the service rather than the amount of data stored).

I actually meant "how much space is taken up by the data compared to the
previous received subvolume", or any similar question which gives
insight on how much disk space is being used over time by send/receive
backups of snapshots of a source subvolume.

After a couple of years of running btrbk I have many backup subvolumes,
and I want to be able to get some statistics on which ones eat up how
much space on disk.

> That is why I created my "extents-list" stuff. This is a horrible hack
> (one day I will rewrite it using the python library) which lets me
> answer questions like: "how much space am I wasting by keeping
> historical snapshots", "how much data is being shared between two
> subvolumes", "how much of the data in my latest snapshot is unique to
> that snapshot" and "how much space would I actually free up if I removed
> (just) these particular directories". None of which can be answered from
> the existing btrfs command line tools (unless I have missed something).
> 
>> And it is pretty slow: on my backup disk (spinning rust, ~2000
>> subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
>> subvolume of 20GB, while btrfs find-new takes only seconds.
> 
> Yes. Answering the real questions involves taking the FIEMAP data for
> every file involved (which, for some questions, is actually every file
> on the disk!) so it takes a very long time. Days for my multi-terabyte
> backup disk.
> 
>> Summing up, what I'm looking for would be something like:
>>
>>   btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>
> 
> You can do that with FIEMAP data. Feel free to look at extents-lists.
> Also feel free to shout "this is a gross hack" and scream at me!
> 
> If you really just need it for two subvols like that
> 
> extents-expr -s <subvol> - <other-subvol>
> 
> will tell you how much space is in extents used in <subvol> but not used
> in <other-subvol>.

Thanks a lot, your scripts are very useful and answer my question.
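
For anyone following along, the idea behind "extents in <subvol> but not
in <other-subvol>" can be sketched roughly as below. This is only my own
illustration in Python, not the code from extents-lists; it assumes each
subvolume has already been reduced to a list of (physical start, length)
byte ranges (for example from FIEMAP data), and the function names
merge_ranges and bytes_only_in are made up for this example.

```python
def merge_ranges(extents):
    """Merge overlapping or adjacent (start, length) extents.

    Returns sorted, disjoint (start, end) byte ranges, so that data
    referenced by several files is only counted once.
    """
    merged = []
    for start, length in sorted(extents):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def bytes_only_in(subvol_extents, other_extents):
    """Count the bytes covered by subvol_extents but not by other_extents."""
    a = merge_ranges(subvol_extents)
    b = merge_ranges(other_extents)
    total = 0
    j = 0
    for start, end in a:
        pos = start
        # Skip ranges of b that end before this range of a begins.
        while j < len(b) and b[j][1] <= pos:
            j += 1
        k = j
        # Walk over every range of b that overlaps [start, end).
        while k < len(b) and b[k][0] < end:
            if b[k][0] > pos:
                total += b[k][0] - pos  # gap not covered by b
            pos = max(pos, b[k][1])
            k += 1
        if pos < end:
            total += end - pos  # tail not covered by b
    return total


# Hypothetical extent lists: (physical start, length) in bytes.
subvol = [(0, 4096), (8192, 8192)]
other = [(0, 4096), (12288, 4096)]
print(bytes_only_in(subvol, other))
```

Merging first matters: extents shared between files inside the same
subvolume would otherwise be counted more than once.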

While I love their bashyness, I re-hacked parts of them in Perl last
night so that I can use them within btrbk (I'm not sure yet whether I
want to unleash this on the masses, as many people will misinterpret
the data and shout at me about how slow it is).

Here's what I've got so far:

  # git clone -b extents-diff https://github.com/digint/btrbk.git
  # ./btrbk extents-diff /home --dry-run
  # ./btrbk extents-diff /home
  # ./btrbk extents-diff <subvol>...

If called with a single argument, btrbk looks for all related subvolumes
and prints, for each one, the difference from the previous one, sorted
by gen (transid). While this is usually fine for snapshots, parent-uuid
chains get broken for received subvolumes as soon as an intermediate
subvolume is deleted (and thus the subvolumes need to be passed as
additional arguments).
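
The "difference to the previous one, sorted by gen" idea looks roughly
like the Python sketch below. It is a simplified illustration and not
the actual btrbk or ExtentsMap.pm code: it assumes extents are aligned
to 4 KiB blocks, it compares the oldest subvolume against nothing (so
its full size is reported), and the names blocks, diff_chain and home.N
are hypothetical.

```python
BLOCK = 4096  # assumed block size; extents are assumed to be block-aligned


def blocks(extents):
    """Expand (physical start, length) extents into a set of block numbers."""
    out = set()
    for start, length in extents:
        first = start // BLOCK
        last = (start + length + BLOCK - 1) // BLOCK
        out.update(range(first, last))
    return out


def diff_chain(subvols):
    """Report, per subvolume, the bytes not present in its predecessor.

    subvols is a list of (name, gen, extents) tuples. The list is
    sorted by gen, and each entry is compared with the one before it.
    """
    result = []
    prev = set()
    for name, gen, extents in sorted(subvols, key=lambda s: s[1]):
        cur = blocks(extents)
        result.append((name, len(cur - prev) * BLOCK))
        prev = cur
    return result


# Hypothetical backups of the same source, deliberately passed out of order.
chain = diff_chain([
    ("home.3", 30, [(0, 4096), (16384, 8192)]),
    ("home.1", 10, [(0, 4096)]),
    ("home.2", 20, [(0, 8192)]),
])
for name, size in chain:
    print(name, size)
```

Expanding to block sets is wasteful for large subvolumes; it is used
here only because it makes the set difference obvious.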

The hacky perl module is here:
https://github.com/digint/btrbk/blob/extents-diff/lib/Linux/ExtentsMap.pm



end of thread, other threads:[~2019-05-23 16:06 UTC | newest]

Thread overview: 10+ messages
2019-05-16 14:54 Used disk size of a received subvolume? Axel Burri
2019-05-16 17:09 ` Remi Gauvin
2019-05-17 14:14   ` Axel Burri
2019-05-17 16:22     ` Remi Gauvin
2019-05-16 17:12 ` Hugo Mills
2019-05-17 13:57   ` Axel Burri
2019-05-17 15:28     ` Graham Cobb
2019-05-17 16:39       ` Steven Davies
2019-05-17 23:15         ` Graham Cobb
2019-05-23 16:06       ` Axel Burri
