* Used disk size of a received subvolume?
From: Axel Burri @ 2019-05-16 14:54 UTC
To: linux-btrfs

Trying to get the size of a subvolume created using "btrfs receive",
I've come up with a cute little script:

  SUBVOL=/path/to/subvolume
  CGEN=$(btrfs subvolume show "$SUBVOL" \
    | sed -n 's/\s*Gen at creation:\s*//p')
  btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
    | cut -d' ' -f7 \
    | tr '\n' '+' \
    | sed 's/\+\+$/\n/' \
    | bc

This simply sums up the "len" field from all modified files since the
creation of the subvolume. Works fine, as btrfs-receive first makes a
snapshot of the parent subvolume, then adds the files according to the
send-stream.

Now this raises some questions:

1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
compressed length.

2. If there are clone-sources in the send-stream, the cloned files
probably also appear in the list.

3. Is there a better way? It would be nice to have a btrfs command for
this. It would be straightforward to have a "--summary" option in
"btrfs find-new"; another approach would be to calculate and dump the
size in either "btrfs send" or "btrfs receive".

Any thoughts? I'm willing to implement such a feature in btrfs-progs if
this sounds reasonable to you.

- Axel

Ref: https://github.com/digint/btrbk/issues/280
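The cut/tr/sed/bc pipeline above can also be written as a single awk
filter. This is only a minimal sketch: the function name
sum_find_new_len is made up for illustration, and it assumes find-new
prints lines of the form "inode N file offset N len N disk start N ...",
so that field 6 is the word "len" and field 7 its value.

```shell
#!/bin/bash
# Sum the "len" values from `btrfs subvolume find-new` output read on stdin.
# Assumption: each extent line looks like
#   inode N file offset N len N disk start N offset N gen N flags F path
# The trailing "transid marker was N" line has no "len" field and is skipped.
sum_find_new_len() {
    # %.0f avoids exponent notation for large sums in some awk variants.
    awk '$6 == "len" { sum += $7 } END { printf "%.0f\n", sum }'
}

# Hypothetical usage on a real subvolume (needs root), not executed here:
#   btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) | sum_find_new_len
```

Matching on the literal "len" keyword rather than on a trailing "++"
keeps the filter independent of how many non-extent lines the command
prints. It inherits the same caveats as the original script: lengths
are uncompressed, and cloned or deleted data is not accounted for.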
* Re: Used disk size of a received subvolume?
From: Remi Gauvin @ 2019-05-16 17:09 UTC
To: Axel Burri, linux-btrfs

On 2019-05-16 10:54 a.m., Axel Burri wrote:
>
> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
> this sounds reasonable to you.
>

BTRFS qgroups are where this is implemented. You have to enable quotas,
and leaving quotas enabled has lots of problems (mostly performance
related), so I would not suggest leaving them on when there is lots of
activity (i.e., multiple send/receive, or deletion of many snapshots).

But you can enable quotas at any time (btrfs quota enable /path)

Wait for the rescan to finish

btrfs quota rescan -s /path (to view status of scan)

And then:

btrfs qgroup show /path to list the space usage (total, and what you're
looking for: Exclusive)

Note that the default groups correspond to subvolume ID, not filename
(someone did make a utility somewhere that will display this output with
corresponding directory names).

btrfs sub list /path is used to find the relevant IDs. (I find the -o
option useful, so it only displays the subvolumes that are children of
the /path.)

As stated above, I would suggest disabling quotas when you are finished:

btrfs quota disable /path
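The steps Remi lists could be wrapped up roughly as follows. This is an
untested sketch under stated assumptions: the function names are
invented here, the subvolume ID is passed in by hand (as found with
"btrfs sub list"), "btrfs quota rescan -w" is used to wait for the
rescan instead of polling with -s, and "btrfs qgroup show" is assumed
to print its default columns qgroupid, rfer, excl in that order.

```shell
#!/bin/bash
# Print the "excl" column for one subvolume ID from `btrfs qgroup show`
# output read on stdin. Assumes the default column order:
# qgroupid, rfer, excl. Level-0 qgroups are named "0/<subvolume id>".
qgroup_exclusive() {
    awk -v id="0/$1" '$1 == id { print $3 }'
}

# Hypothetical wrapper for the whole procedure (needs root; only defined
# here, never called). $1 = mount point, $2 = subvolume ID.
subvol_exclusive_via_quota() {
    local mnt=$1 id=$2
    btrfs quota enable "$mnt" || return
    # -w waits for the rescan to finish before continuing.
    btrfs quota rescan -w "$mnt"
    btrfs qgroup show "$mnt" | qgroup_exclusive "$id"
    # Turn quotas off again, as recommended above.
    btrfs quota disable "$mnt"
}
```

Keeping the parsing in its own function means it can be checked against
captured output without touching a real filesystem.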
* Re: Used disk size of a received subvolume?
From: Axel Burri @ 2019-05-17 14:14 UTC
To: Remi Gauvin, linux-btrfs

On 16/05/2019 19.09, Remi Gauvin wrote:
> On 2019-05-16 10:54 a.m., Axel Burri wrote:
>
>>
>> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
>> this sounds reasonable to you.
>>
>
>
> BTRFS qgroups are where this is implemented. You have to enable quotas,
> and leaving quotas enabled has lots of problems (mostly performance
> related), so I would not suggest leaving them on when there is lots of
> activity (i.e., multiple send/receive, or deletion of many snapshots).
>
> But you can enable quotas at any time (btrfs quota enable /path)
>
> Wait for the rescan to finish
>
> btrfs quota rescan -s /path (to view status of scan)
>
> And then:
>
> btrfs qgroup show /path to list the space usage (total, and what you're
> looking for: Exclusive)
>
> Note that the default groups correspond to subvolume ID, not filename
> (someone did make a utility somewhere that will display this output with
> corresponding directory names).
>
> btrfs sub list /path is used to find the relevant IDs. (I find the -o
> option useful, so it only displays the subvolumes that are children of
> the /path.)
>
> As stated above, I would suggest disabling quotas when you are finished:
>
> btrfs quota disable /path

Thanks for the tip, but this does not seem practical for production
systems, as it involves enabling quotas, which had many problems in the
past (not sure about the current state, but probably still true if I
understand you correctly).

Nevertheless I played around with it and it seems to work. I'll keep it
in mind for the future.

- Axel
* Re: Used disk size of a received subvolume?
From: Remi Gauvin @ 2019-05-17 16:22 UTC
To: Axel Burri; +Cc: linux-btrfs

On 2019-05-17 10:14 a.m., Axel Burri wrote:
>
> Nevertheless I played around with it and it seems to work. I'll keep it
> in mind for the future.
>

fi du was not implemented last time I had to do this, so I had
completely forgotten about it. It makes much more sense to just use
that for the Exclusive disk usage of a single subvolume.

However, qgroups do allow you to group multiple subvolumes, so, for
example, you can build a query to find out how much disk space is used
exclusively by the 10 oldest snapshots, which I find to be a more useful
question.
* Re: Used disk size of a received subvolume?
From: Hugo Mills @ 2019-05-16 17:12 UTC
To: Axel Burri; +Cc: linux-btrfs

On Thu, May 16, 2019 at 04:54:42PM +0200, Axel Burri wrote:
> Trying to get the size of a subvolume created using "btrfs receive",
> I've come up with a cute little script:
>
> SUBVOL=/path/to/subvolume
> CGEN=$(btrfs subvolume show "$SUBVOL" \
>   | sed -n 's/\s*Gen at creation:\s*//p')
> btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
>   | cut -d' ' -f7 \
>   | tr '\n' '+' \
>   | sed 's/\+\+$/\n/' \
>   | bc
>
> This simply sums up the "len" field from all modified files since the
> creation of the subvolume. Works fine, as btrfs-receive first makes a
> snapshot of the parent subvolume, then adds the files according to the
> send-stream.
>
> Now this raises some questions:
>
> 1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
> compressed length.
>
> 2. If there are clone-sources in the send-stream, the cloned files
> probably also appear in the list.
>
> 3. Is there a better way? It would be nice to have a btrfs command for
> this. It would be straightforward to have a "--summary" option in
> "btrfs find-new"; another approach would be to calculate and dump the
> size in either "btrfs send" or "btrfs receive".

   btrfs find-new also doesn't tell you about deleted files (fairly
obviously), so if anything's been removed, you'll be overestimating
the overall change in size.

> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
> this sounds reasonable to you.

   If you're looking for the incremental usage of the subvolume, why
not just use the "exclusive" value from btrfs fi du? That's exactly
that information. (And note that it changes over time, as other
subvols it shares with are deleted).

   Hugo.

-- 
Hugo Mills             | Your problem is that you've got too much taste to be
hugo@... carfax.org.uk | a web developer.
http://carfax.org.uk/  |
PGP: E2AB1DE4          |                                          Steve Harris
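Hugo's suggestion can be scripted by picking the Exclusive column out of
the summary that "btrfs fi du -s" prints. A minimal sketch, assuming the
column order Total, Exclusive, Set shared, Filename and paths without
spaces; the helper name fi_du_exclusive is hypothetical.

```shell
#!/bin/bash
# Print "<exclusive> <filename>" for each summary row of
# `btrfs filesystem du -s` output read on stdin.
# Assumptions: the first line is the header, the columns are
# Total, Exclusive, Set shared, Filename, and filenames contain no spaces.
fi_du_exclusive() {
    awk 'NR > 1 { print $2, $4 }'
}

# Hypothetical usage (needs root), not executed here:
#   btrfs filesystem du -s /mnt/backup/data.* | fi_du_exclusive
```

As noted in the message above, the value reported this way is a moving
target: it changes whenever another subvolume sharing the same extents
is created or deleted.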
* Re: Used disk size of a received subvolume?
From: Axel Burri @ 2019-05-17 13:57 UTC
To: Hugo Mills, linux-btrfs

On 16/05/2019 19.12, Hugo Mills wrote:
> On Thu, May 16, 2019 at 04:54:42PM +0200, Axel Burri wrote:
>> Trying to get the size of a subvolume created using "btrfs receive",
>> I've come up with a cute little script:
>>
>> SUBVOL=/path/to/subvolume
>> CGEN=$(btrfs subvolume show "$SUBVOL" \
>>   | sed -n 's/\s*Gen at creation:\s*//p')
>> btrfs subvolume find-new "$SUBVOL" $((CGEN+1)) \
>>   | cut -d' ' -f7 \
>>   | tr '\n' '+' \
>>   | sed 's/\+\+$/\n/' \
>>   | bc
>>
>> This simply sums up the "len" field from all modified files since the
>> creation of the subvolume. Works fine, as btrfs-receive first makes a
>> snapshot of the parent subvolume, then adds the files according to the
>> send-stream.
>>
>> Now this raises some questions:
>>
>> 1. How accurate is this? AFAIK "btrfs find-new" prints real length, not
>> compressed length.
>>
>> 2. If there are clone-sources in the send-stream, the cloned files
>> probably also appear in the list.
>>
>> 3. Is there a better way? It would be nice to have a btrfs command for
>> this. It would be straightforward to have a "--summary" option in
>> "btrfs find-new"; another approach would be to calculate and dump the
>> size in either "btrfs send" or "btrfs receive".
>
>    btrfs find-new also doesn't tell you about deleted files (fairly
> obviously), so if anything's been removed, you'll be overestimating
> the overall change in size.

True, making it not very useful, especially for backups where it is also
important to see what has been deleted.

>> Any thoughts? I'm willing to implement such a feature in btrfs-progs if
>> this sounds reasonable to you.
>
>    If you're looking for the incremental usage of the subvolume, why
> not just use the "exclusive" value from btrfs fi du? That's exactly
> that information. (And note that it changes over time, as other
> subvols it shares with are deleted).

btrfs fi du shows me the information wanted, but only for the last
received subvolume (as you said it changes over time, and any later
child will share data with it). For all others, it merely shows "this
is what gets freed if you delete this subvolume".

And it is pretty slow: on my backup disk (spinning rust, ~2000
subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
subvolume of 20GB, while btrfs find-new takes only seconds.

Summing up, what I'm looking for would be something like:

  btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>

Which is probably not easily doable.

Thanks,

- Axel


PS: To get an idea of what this looks like on real data, here's the
combined output of "btrfs fi du -s" (Total, Exclusive),
"find-new-sum.sh" and "find-new-prev.sh" for btrbk backups of a
webservice with low traffic. Large values in the find-new-prev column
show me when I updated the software, probably overestimated by a factor
of 2 (new package installed, old package deleted).
    Total  Exclusive  find-new-sum  find-new-prev  Filename
319.59MiB   32.00KiB     345917734                 data.20151130
320.24MiB   13.41MiB        630784        2453504  data.20151206
321.44MiB   16.82MiB        135168        3699477  data.20160103
325.77MiB  126.65MiB        258048      128418159  data.20160207
326.92MiB  127.69MiB        262144      128242483  data.20160306
467.85MiB  101.53MiB        585728      307220876  data.20160403
475.99MiB  101.17MiB        544768      107036454  data.20160501
515.12MiB   21.70MiB        450703      138940051  data.20160605
736.80MiB   22.46MiB        573440      246238002  data.20160703
743.39MiB  107.00MiB        540672      105290810  data.20160807
  1.15GiB  149.94MiB        651264      613553890  data.20160904
  1.23GiB   25.52MiB     111188246      238660114  data.20161002
  1.24GiB   24.70MiB        491520       10912303  data.20161106
  1.24GiB   25.66MiB       1834183        8197625  data.20161204
  1.24GiB  115.76MiB        573440      160281830  data.20170101
  1.25GiB  114.73MiB        610304      113867698  data.20170205
  1.82GiB   25.48MiB        618496      726426846  data.20170305
  1.82GiB   26.71MiB        507904        6691050  data.20170402
  1.85GiB   26.59MiB        733184      112139682  data.20170507
  1.85GiB   26.83MiB         24576        6841378  data.20170604
  1.85GiB   27.92MiB       1874740        7671216  data.20170702
  1.85GiB   28.73MiB       2441216       13105243  data.20170806
  1.86GiB   27.94MiB        491520       12743292  data.20170903
  1.90GiB   27.57MiB        634880      196836664  data.20171001
  1.90GiB   28.38MiB        503808        8083183  data.20171105
  1.90GiB   28.83MiB        425984        6748887  data.20171203
  1.90GiB  129.11MiB        708751      174442102  data.20180107
  1.87GiB  195.24MiB       1232896      249746997  data.20180204
  2.33GiB   29.74MiB       1183744      731504273  data.20180304
  2.33GiB   30.76MiB       1093632        7414396  data.20180401
  2.35GiB   29.20MiB       1048576      205278308  data.20180506
  2.35GiB   28.88MiB       1191936        2736128  data.20180513
  2.35GiB   28.79MiB        901120      179189757  data.20180520
  2.35GiB   28.82MiB       1179648        3047567  data.20180527
  2.35GiB   29.00MiB       1040384        3502223  data.20180603
  2.35GiB   29.21MiB       1183744        3093116  data.20180610
  2.46GiB   28.66MiB       1093632      366863945  data.20180617
  2.46GiB   29.24MiB        495616        3283388  data.20180624
  2.46GiB   29.39MiB       1290240        3207311  data.20180701
  2.47GiB   29.55MiB       1351680        3269245  data.20180708
  2.50GiB   29.40MiB        995328       39091775  data.20180715
  2.50GiB   29.46MiB       1236992        3411968  data.20180722
  2.50GiB   29.79MiB       1015951        3823244  data.20180729
  2.50GiB   30.11MiB       1359872        4673536  data.20180805
  2.50GiB   30.25MiB        638976        4542464  data.20180812
  2.50GiB   29.62MiB       1421312      183075636  data.20180819
  2.50GiB   29.48MiB       1114112        3473408  data.20180826
  2.50GiB   29.71MiB       1253376        3158016  data.20180902
  2.50GiB   29.74MiB       1126400        2945167  data.20180909
  2.50GiB   29.89MiB       1191936        2874597  data.20180916
  2.50GiB   30.32MiB        720896        3408015  data.20180923
  2.52GiB   30.94MiB       2097152      223821428  data.20180930
  2.52GiB   30.51MiB       1187840        6980297  data.20181007
  2.52GiB   30.45MiB       1090181        3455803  data.20181014
  2.52GiB   30.52MiB       1196814        2983306  data.20181021
  2.53GiB   30.78MiB       2231973        5325153  data.20181028
  2.52GiB   30.93MiB       1789697      239439411  data.20181104
  2.53GiB   30.92MiB       2453157        7016043  data.20181111
  2.53GiB   31.09MiB       1666725        3402269  data.20181118
  2.53GiB   30.68MiB       1060517        4639629  data.20181125
  2.53GiB   30.76MiB       1051174        4370574  data.20181202
  2.53GiB   30.68MiB        964673        3893313  data.20181209
  2.53GiB   30.65MiB       1064960        2727936  data.20181216
  2.54GiB   30.07MiB       1717236      212719260  data.20181223
  2.54GiB   30.39MiB       1231330        4679530  data.20181230
  2.54GiB   30.62MiB       1244308        3761492  data.20190106
  2.54GiB   31.58MiB       1146533        7047447  data.20190113
  2.55GiB   31.28MiB       1371905        6896944  data.20190120
  2.55GiB   31.27MiB       1941203        6495347  data.20190127
  2.55GiB   31.17MiB       1605262        5266478  data.20190203
  2.55GiB   31.54MiB       1105596        2847971  data.20190210
  2.56GiB   31.37MiB       1221804      201645985  data.20190217
  2.56GiB   31.58MiB       1360378        2971929  data.20190224
  2.56GiB   32.07MiB       2174675        5535980  data.20190303
  2.56GiB   32.02MiB       1743836        6429621  data.20190310
  2.56GiB   31.43MiB       1466368        3834494  data.20190317
  2.56GiB   31.26MiB       1544192        1544192  data.20190318
  2.56GiB   31.34MiB       1995373        1995373  data.20190319
  2.56GiB   31.37MiB       1509559        1509559  data.20190320
  2.56GiB   31.37MiB       1597093        1597093  data.20190321
  2.56GiB   30.51MiB       1552037        1552037  data.20190322
  2.57GiB   30.47MiB       2694989        2694989  data.20190323
  2.57GiB   30.57MiB       2612947        2612947  data.20190324
  2.58GiB   30.10MiB     208643677      208643677  data.20190325
  2.58GiB   31.27MiB       1580732        1580732  data.20190326
  2.58GiB   31.34MiB       1552083        1552083  data.20190327
  2.58GiB   31.48MiB       1605331        1605331  data.20190328
  2.58GiB   30.55MiB      17353623       17353623  data.20190329
  2.58GiB   30.63MiB       2715094        2715094  data.20190330
  2.58GiB   31.55MiB       1646130        1646130  data.20190331
  2.58GiB   31.51MiB       1478263        1478263  data.20190401
  2.58GiB   31.61MiB       5478839        5478839  data.20190402
  2.58GiB   31.52MiB       1711712        1711712  data.20190403
  2.58GiB   31.53MiB       1535538        1535538  data.20190404
  2.58GiB   31.57MiB       1592820        1592820  data.20190405
  2.58GiB   31.50MiB       1887794        1887794  data.20190406
  2.58GiB   31.58MiB       1531442        1531442  data.20190407
  2.58GiB   31.54MiB       1679082        1679082  data.20190408
  2.58GiB   31.55MiB       1580732        1580732  data.20190409
  2.58GiB   31.53MiB       1679036        1679036  data.20190410
  2.58GiB   31.46MiB       1748668        1748668  data.20190411
  2.58GiB   31.62MiB       1704319        1704319  data.20190412
  2.58GiB   30.16MiB     215955743      215955743  data.20190413
  2.58GiB   31.40MiB       1500930        1500930  data.20190414
  2.58GiB   31.36MiB       1357708        1357708  data.20190415
  2.58GiB   31.45MiB       1498789        1498789  data.20190416
  2.58GiB   31.75MiB       2137765        2137765  data.20190417
  2.58GiB   31.81MiB       1601212        1601212  data.20190418
  2.58GiB   31.82MiB       1730505        1730505  data.20190419
  2.58GiB   31.82MiB       1572586        1572586  data.20190420
  2.58GiB   31.87MiB       1490620        1490620  data.20190421
  2.58GiB   31.88MiB       1523457        1523457  data.20190422
  2.58GiB   31.86MiB       1924865        1924865  data.20190423
  2.58GiB   31.89MiB       1654482        1654482  data.20190424
  2.58GiB   31.84MiB       1588970        1588970  data.20190425
  2.58GiB   31.84MiB       1576613        1576613  data.20190426
  2.58GiB   31.81MiB       1756929        1756929  data.20190427
  2.58GiB   31.89MiB       1658625        1658625  data.20190428
  2.58GiB   31.84MiB       1584805        1584805  data.20190429
  2.58GiB   31.82MiB       1605377        1605377  data.20190430
  2.58GiB   31.85MiB       1588947        1588947  data.20190501
  2.58GiB   31.89MiB       1765121        1765121  data.20190502
  2.58GiB   31.88MiB       1760933        1760933  data.20190503
  2.58GiB   31.83MiB       1810154        1810154  data.20190504
  2.58GiB   32.20MiB       1609473        1609473  data.20190505
  2.61GiB    2.68MiB     149129730      149129730  data.20190506
  2.61GiB   17.41MiB       3137281        3137281  data.20190507
  2.61GiB   23.75MiB       2625258        2625258  data.20190508
  2.61GiB   27.62MiB       2404074        2404074  data.20190509
  2.61GiB   29.87MiB       1904224        1904224  data.20190510
  2.61GiB   31.22MiB       2080513        2080513  data.20190511
  2.61GiB   35.84MiB       2744042        2744042  data.20190512
  2.61GiB   37.33MiB       2019073        2019073  data.20190513
  2.61GiB   37.79MiB       2035388        2035388  data.20190514
  2.61GiB   38.17MiB       1785578        1785578  data.20190515
  2.61GiB   39.63MiB       1982163        1982163  data.20190516


$ cat find-new-sum.sh
#!/bin/bash
# Data size added on received subvolume.

subvol=$1
cgen=$(btrfs subvolume show "$subvol" \
  | sed -n 's/\s*Gen at creation:\s*//p')
sum=$(btrfs subvolume find-new "$subvol" $((cgen+1)) \
  | cut -d' ' -f7 \
  | tr '\n' '+' \
  | sed 's/\+\+$/\n/' \
  | bc)
echo "$subvol $sum"


$ cat find-new-prev.sh
#!/bin/bash
# Data size added since last backup.
# (works only if backups are received linear in time)

lastgen=999999999
for subvol in "$@" ; do
  cgen=$(btrfs subvolume show "$subvol" \
    | sed -n 's/\s*Gen at creation:\s*//p')
  if [[ $lastgen -gt $cgen ]]; then
    echo "$subvol older, skipping"
  else
    sum=$(btrfs subvolume find-new "$subvol" $((lastgen+1)) \
      | cut -d' ' -f7 \
      | tr '\n' '+' \
      | sed 's/\+\+$/\n/' \
      | bc)
    echo "$subvol $sum"
  fi
  lastgen=$(btrfs subvolume show "$subvol" \
    | sed -n 's/\s*Generation:\s*//p')
done
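The two find-new columns in the table are plain byte counts, while the
fi du columns are printed in binary units, which makes them awkward to
compare at a glance. A small helper along these lines can convert the
byte counts; the name bytes_to_mib is hypothetical, and it assumes the
MiB shown by btrfs means 1048576 bytes.

```shell
#!/bin/bash
# Convert a byte count to MiB with two decimals, for comparing the
# find-new sums against the Total/Exclusive columns.
# Assumption: 1 MiB = 1048576 bytes.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.2fMiB\n", b / 1048576 }'
}

# Hypothetical usage with the two scripts above, not executed here:
#   ./find-new-sum.sh data.20190516 | while read -r name bytes; do
#       echo "$name $(bytes_to_mib "$bytes")"
#   done
```

Even after conversion the columns are not expected to agree: the
find-new sums count uncompressed lengths of written extents, whereas
Exclusive reflects what is currently unshared on disk.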
* Re: Used disk size of a received subvolume?
From: Graham Cobb @ 2019-05-17 15:28 UTC
To: linux-btrfs

On 17/05/2019 14:57, Axel Burri wrote:
> btrfs fi du shows me the information wanted, but only for the last
> received subvolume (as you said it changes over time, and any later
> child will share data with it). For all others, it merely shows "this
> is what gets freed if you delete this subvolume".

It doesn't even show you that: it is possible to have shared (not
exclusive) data which is only shared between files within the subvolume,
and which will be freed if the subvolume is deleted. And, of course,
there is the obvious problem that if you only count exclusive then no
one is being charged for all the shared segments ("Oh, my backup is
getting a bit expensive. Hmm. I know! I will back up all my files to two
different destinations, and make sure btrfs is sharing the data between
both locations! Then no one pays for it! Whoopee!")

In my opinion, the shared/exclusive information in btrfs fi du is worse
than useless: it confuses people who think it means something different
from what it does. And, in btrfs, it isn't really useful to know whether
something is "exclusive" or not -- what people care about is always
something else (which is dependent on **where** it is shared, and by whom).

The biggest problem is that you haven't defined what **you** (in your
particular use case) mean by the "size" of a subvolume. For btrfs that
doesn't have any single obvious definition.

Most commonly, I think, people mean "how much space on disk would be
freed up if I deleted this subvolume and all subvolumes contained within
it", although quite often they mean the similar (but not identical) "how
much space on disk would be freed up if I deleted just this subvolume".
And sometimes they actually mean "how much space on disk would be freed
up if I deleted this subvolume, the subvolumes contained within it, and
all the snapshots I have taken but are lying around forgotten about in
some other directory tree somewhere".

But often they mean something else completely, such as "how much space
is taken up by the data which was originally created in this subvolume
but which has been cloned into all sorts of places now and may not even
be referred to from this subvolume any more" (typically this is the case
if you want to charge the subvolume owner for the data usage).

And, of course, another reading of your question would be "how much data
was transferred during this send/receive operation" (relevant if you are
running a backup service and want to charge people by how much they are
sending to the service rather than the amount of data stored).

That is why I created my "extents-lists" stuff. This is a horrible hack
(one day I will rewrite it using the python library) which lets me
answer questions like: "how much space am I wasting by keeping
historical snapshots", "how much data is being shared between two
subvolumes", "how much of the data in my latest snapshot is unique to
that snapshot" and "how much space would I actually free up if I removed
(just) these particular directories". None of which can be answered from
the existing btrfs command line tools (unless I have missed something).

> And it is pretty slow: on my backup disk (spinning rust, ~2000
> subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
> subvolume of 20GB, while btrfs find-new takes only seconds.

Yes. Answering the real questions involves taking the FIEMAP data for
every file involved (which, for some questions, is actually every file
on the disk!) so it takes a very long time. Days for my multi-terabyte
backup disk.
> Summing up, what I'm looking for would be something like:
>
>   btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>

You can do that with FIEMAP data. Feel free to look at extents-lists.
Also feel free to shout "this is a gross hack" and scream at me!

If you really just need it for two subvols like that

  extents-expr -s <subvol> - <other-subvol>

will tell you how much space is in extents used in <subvol> but not used
in <other-subvol>.

Graham
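The set-difference idea behind "extents-expr -s <subvol> -
<other-subvol>" can be illustrated with a toy version. This is not
Graham's implementation: the function name, the input format (one
"<physical-start> <length>" pair per line, e.g. distilled from FIEMAP
output) and the matching rule are all assumptions made for the sketch.

```shell
#!/bin/bash
# Sum the lengths of extents listed in file $1 but absent from file $2.
# Each input line is "<physical-start> <length>" (a hypothetical format).
# Simplification: two extents match only when both start and length are
# equal; partially overlapping extents are not split. Duplicate lines in
# $1 (the same extent referenced by several files) are counted once.
extents_only_in() {
    awk 'FILENAME == ARGV[1] { other[$1 " " $2] = 1; next }
         { key = $1 " " $2 }
         !(key in other) && !(key in counted) { counted[key] = 1; sum += $2 }
         END { printf "%.0f\n", sum }' "$2" "$1"
}

# Hypothetical usage, given extent lists prepared beforehand:
#   extents_only_in subvol.extents other-subvol.extents
```

Counting each physical extent once is what separates this kind of answer
from summing file lengths: data that is reflinked or snapshotted several
times still occupies the disk only once.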
* Re: Used disk size of a received subvolume?
From: Steven Davies @ 2019-05-17 16:39 UTC
To: linux-btrfs

On 17/05/2019 16:28, Graham Cobb wrote:

> That is why I created my "extents-lists" stuff. This is a horrible hack
> (one day I will rewrite it using the python library) which lets me
> answer questions like: "how much space am I wasting by keeping
> historical snapshots", "how much data is being shared between two
> subvolumes", "how much of the data in my latest snapshot is unique to
> that snapshot" and "how much space would I actually free up if I removed
> (just) these particular directories". None of which can be answered from
> the existing btrfs command line tools (unless I have missed something).

I have my own horrible hack to do something like this; if you ever get
around to implementing it in Python could you share the code?

-- 
Steven Davies
* Re: Used disk size of a received subvolume?
From: Graham Cobb @ 2019-05-17 23:15 UTC
To: linux-btrfs

On 17/05/2019 17:39, Steven Davies wrote:
> On 17/05/2019 16:28, Graham Cobb wrote:
>
>> That is why I created my "extents-lists" stuff. This is a horrible hack
>> (one day I will rewrite it using the python library) which lets me
>> answer questions like: "how much space am I wasting by keeping
>> historical snapshots", "how much data is being shared between two
>> subvolumes", "how much of the data in my latest snapshot is unique to
>> that snapshot" and "how much space would I actually free up if I removed
>> (just) these particular directories". None of which can be answered from
>> the existing btrfs command line tools (unless I have missed something).
>
> I have my own horrible hack to do something like this; if you ever get
> around to implementing it in Python could you share the code?
>

Sure. The current hack (using shell and command line tools) is at
https://github.com/GrahamCobb/extents-lists. If the python version ever
materialises I expect it will end up there as well.
* Re: Used disk size of a received subvolume?
From: Axel Burri @ 2019-05-23 16:06 UTC
To: Graham Cobb, linux-btrfs

On 17/05/2019 17.28, Graham Cobb wrote:
> On 17/05/2019 14:57, Axel Burri wrote:
>> btrfs fi du shows me the information wanted, but only for the last
>> received subvolume (as you said it changes over time, and any later
>> child will share data with it). For all others, it merely shows "this
>> is what gets freed if you delete this subvolume".
>
> It doesn't even show you that: it is possible to have shared (not
> exclusive) data which is only shared between files within the subvolume,
> and which will be freed if the subvolume is deleted. And, of course,
> there is the obvious problem that if you only count exclusive then no
> one is being charged for all the shared segments ("Oh, my backup is
> getting a bit expensive. Hmm. I know! I will back up all my files to two
> different destinations, and make sure btrfs is sharing the data between
> both locations! Then no one pays for it! Whoopee!")
>
> In my opinion, the shared/exclusive information in btrfs fi du is worse
> than useless: it confuses people who think it means something different
> from what it does. And, in btrfs, it isn't really useful to know whether
> something is "exclusive" or not -- what people care about is always
> something else (which is dependent on **where** it is shared, and by whom).

Agreed. Sadly btrfs-filesystem(8) does not give much information on how
"exclusive" should be interpreted.

> The biggest problem is that you haven't defined what **you** (in your
> particular use case) mean by the "size" of a subvolume. For btrfs that
> doesn't have any single obvious definition.
>
> Most commonly, I think, people mean "how much space on disk would be
> freed up if I deleted this subvolume and all subvolumes contained within
> it", although quite often they mean the similar (but not identical) "how
> much space on disk would be freed up if I deleted just this subvolume".
> And sometimes they actually mean "how much space on disk would be freed
> up if I deleted this subvolume, the subvolumes contained within it, and
> all the snapshots I have taken but are lying around forgotten about in
> some other directory tree somewhere".
>
> But often they mean something else completely, such as "how much space
> is taken up by the data which was originally created in this subvolume
> but which has been cloned into all sorts of places now and may not even
> be referred to from this subvolume any more" (typically this is the case
> if you want to charge the subvolume owner for the data usage).
>
> And, of course, another reading of your question would be "how much data
> was transferred during this send/receive operation" (relevant if you are
> running a backup service and want to charge people by how much they are
> sending to the service rather than the amount of data stored).

I actually meant "how much space is taken up by the data compared to the
previous received subvolume", or any similar question which gives
insight into how much disk space is being used over time by send/receive
backups of snapshots of a source subvolume.

After a couple of years of running btrbk I have many backup subvolumes,
and I want to be able to get some statistics on which ones eat up how
much space on disk.

> That is why I created my "extents-lists" stuff. This is a horrible hack
> (one day I will rewrite it using the python library) which lets me
> answer questions like: "how much space am I wasting by keeping
> historical snapshots", "how much data is being shared between two
> subvolumes", "how much of the data in my latest snapshot is unique to
> that snapshot" and "how much space would I actually free up if I removed
> (just) these particular directories". None of which can be answered from
> the existing btrfs command line tools (unless I have missed something).
>
>> And it is pretty slow: on my backup disk (spinning rust, ~2000
>> subvolumes, ~100 sharing data), btrfs fi du takes around 5min for a
>> subvolume of 20GB, while btrfs find-new takes only seconds.
>
> Yes. Answering the real questions involves taking the FIEMAP data for
> every file involved (which, for some questions, is actually every file
> on the disk!) so it takes a very long time. Days for my multi-terabyte
> backup disk.
>
>> Summing up, what I'm looking for would be something like:
>>
>>   btrfs fi du -s --exclusive-relative-to=<other-subvol> <subvol>
>
> You can do that with FIEMAP data. Feel free to look at extents-lists.
> Also feel free to shout "this is a gross hack" and scream at me!
>
> If you really just need it for two subvols like that
>
>   extents-expr -s <subvol> - <other-subvol>
>
> will tell you how much space is in extents used in <subvol> but not used
> in <other-subvol>.

Thanks a lot, your scripts are very useful and answer my question.

While I love their bashyness, I re-hacked parts of it in perl last
night, so that I can use it within btrbk (not sure though if I want to
unleash this on the masses, as many people will misinterpret the data
and shout at me about how slow this is).

Here's what I have so far:

  # git clone -b extents-diff https://github.com/digint/btrbk.git
  # ./btrbk extents-diff /home --dry-run
  # ./btrbk extents-diff /home
  # ./btrbk extents-diff <subvol>...

If called with a single argument, btrbk looks for all related subvolumes
and prints the difference to the previous one, sorted by gen (transid).
While this is usually fine for snapshots, parent-uuid chains get broken
for received subvolumes as soon as an intermediate subvolume is deleted
(and thus the subvolumes need to be passed as additional arguments).

The hacky perl module is here:

https://github.com/digint/btrbk/blob/extents-diff/lib/Linux/ExtentsMap.pm