linux-btrfs.vger.kernel.org archive mirror
* python-btrfs v10 preview... detailed usage reporting and a tutorial
@ 2018-09-23 21:54 Hans van Kranenburg
  2018-09-23 23:19 ` Adam Borowski
  2018-09-24  8:08 ` Nikolay Borisov
  0 siblings, 2 replies; 6+ messages in thread
From: Hans van Kranenburg @ 2018-09-23 21:54 UTC
  To: linux-btrfs

Hi all,

I'm planning for a python-btrfs release to happen in about a week.

All new changes are in the develop branch:
https://github.com/knorrie/python-btrfs/commits/develop

tl;dr: check out the two new examples added in the latest git commits
and see if they provide correct info!

## Detailed usage reporting

The new FsUsage object provides information on different levels
(physically allocated bytes on devices, virtual space usage, etc.) and
also contains code to estimate how much space is actually still
available before ENOSPC happens (i.e. the values that would ideally
show up in df output). It works for all allocation profiles!
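To make "space still available" a bit more concrete, here is a minimal sketch for the RAID1 data case only. It is not the FsUsage code: the function names are hypothetical, and it ignores metadata, chunk size granularity and all other profiles. Free space inside already allocated data block groups counts fully, while raw unallocated space only counts as far as both copies can land on different devices.

```python
from typing import List


def raid1_usable(unallocated: List[int]) -> int:
    """Bytes of data that 2-copy RAID1 can still store in unallocated space.

    Hypothetical helper: every byte needs two copies on two different
    devices, so the largest device can at most be paired with the
    unallocated space on all the other devices combined.
    """
    total = sum(unallocated)
    largest = max(unallocated, default=0)
    return min(total // 2, total - largest)


def estimate_available(data_allocated: int, data_used: int,
                       unallocated: List[int]) -> int:
    """Hypothetical estimate: free space in existing data block groups plus
    what RAID1 could still allocate from unallocated device space."""
    return (data_allocated - data_used) + raid1_usable(unallocated)


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Three devices with 100, 100 and 200 GiB unallocated.
    print(raid1_usable([100 * GiB, 100 * GiB, 200 * GiB]) // GiB)  # 200
```

With one 100 GiB and one 500 GiB device, the same helper returns only 100 GiB worth of data, because the second copy of anything beyond that has nowhere to go.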

Two examples have been added, which use the new code. I would appreciate
extra testing. Please try them and see if the reported numbers make sense:

space_calculator.py
-------------------
Initially, it is best described as a CLI version of Hugo's well-known
web-based btrfs space calculator. ;] Throw a few disk sizes at it,
choose a data and metadata profile and see how much space you would
get to store actual data.

See commit message "Add example to calculate usable and wasted space"
for example output.
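One rough way such a calculator can work is to simulate the chunk allocator: keep adding chunks, each time picking the devices with the most unallocated space, until no more chunk fits. The sketch below does that for a few data profiles in whole units (say GiB), using one unit per device stripe. The rule table and function name are hypothetical, and it ignores metadata, DUP and parity profiles, so it is not what space_calculator.py does internally.

```python
from typing import Dict, List

# Hypothetical, simplified per-profile rules:
# devs_min / devs_max: devices per chunk (devs_max 0 = no upper limit),
# devs_increment: device count must be a multiple of this,
# ncopies: how many copies of each data byte are stored.
RULES: Dict[str, Dict[str, int]] = {
    "single": {"devs_min": 1, "devs_max": 1, "devs_increment": 1, "ncopies": 1},
    "raid0": {"devs_min": 2, "devs_max": 0, "devs_increment": 1, "ncopies": 1},
    "raid1": {"devs_min": 2, "devs_max": 2, "devs_increment": 2, "ncopies": 2},
    "raid10": {"devs_min": 4, "devs_max": 0, "devs_increment": 2, "ncopies": 2},
}


def simulate_usable(sizes: List[int], profile: str) -> int:
    """Allocate one-unit stripes until no chunk fits; return usable units."""
    rule = RULES[profile]
    free = list(sizes)
    usable = 0
    while True:
        # Devices that still have room for a stripe, most free space first.
        candidates = sorted((i for i, f in enumerate(free) if f >= 1),
                            key=lambda i: free[i], reverse=True)
        ndevs = len(candidates)
        if rule["devs_max"]:
            ndevs = min(ndevs, rule["devs_max"])
        ndevs -= ndevs % rule["devs_increment"]
        if ndevs < rule["devs_min"]:
            return usable
        for i in candidates[:ndevs]:
            free[i] -= 1
        usable += ndevs // rule["ncopies"]


if __name__ == "__main__":
    for profile in RULES:
        print(profile, simulate_usable([100, 100, 200], profile))
```

With disks of 100, 100 and 200 units this yields 400 for single, 300 for raid0 (the last 100 units on the big disk are wasted, since raid0 needs at least two devices per chunk), 200 for raid1 and 0 for raid10, which needs at least four devices.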

show_usage.py
-------------
The contents of the old show_usage.py example that simply showed a list
of block groups are replaced with a detailed usage report of an existing
filesystem.

See commit message "A new show usage example!" for example output.

## A btrfs tutorial!

A while ago I started creating documentation for python-btrfs in
tutorial style. By playing around with an example filesystem we learn
where btrfs puts our data on the disks, what chunks, block groups and
extents are, how we can quickly look up interesting things in metadata
and how cows climb trees, moo.

https://github.com/knorrie/python-btrfs/issues/11
https://github.com/knorrie/python-btrfs/blob/tutorial/tutorial/README.md

I'm not sure yet if I'm going to 'ship' the first few pages already,
since it's still very much a work in progress, but in any case feedback
/ ideas are welcome. Have a look!

## Other changes

Other changes are the addition of the sync, fideduperange and
get_features ioctl calls, and a workaround for Python 3.7, which breaks
the struct module API.
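The exact shape of that workaround isn't shown here, but one documented struct change in Python 3.7 is that Struct.format returns str instead of bytes, which breaks code that builds new formats by concatenating existing ones. A minimal version-independent sketch (the helper names are hypothetical, not python-btrfs API):

```python
import struct


def struct_format(s: struct.Struct) -> str:
    """Return the format string of a Struct as str on any Python 3 version."""
    fmt = s.format
    # Before Python 3.7 Struct.format was bytes; since 3.7 it is str.
    if isinstance(fmt, bytes):
        fmt = fmt.decode("ascii")
    return fmt


def concat_le_structs(*parts: struct.Struct) -> struct.Struct:
    """Join little-endian Structs into one, keeping a single '<' prefix."""
    body = "".join(struct_format(p).lstrip("<") for p in parts)
    return struct.Struct("<" + body)


if __name__ == "__main__":
    header = struct.Struct("<QQ")
    flag = struct.Struct("<B")
    combined = concat_le_structs(header, flag)
    print(struct_format(combined), combined.size)  # <QQB 17
```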

## P.S.

And finally, when doing the above, I discovered a few extra unintended
features and bugs in the btrfs chunk allocator (Did you know RAID10
block groups are limited to 5GiB in size? Did you know that when the
last chunk added on a disk is of DUP type, it could end up having an end
beyond the limit of a device?).

I still have to actually test the second one by causing it to happen.
If anyone is interested in helping with that, please ask about it.
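For the DUP case, the invariant itself is easy to state: both stripes of a DUP chunk go on the same device, so the free hole and the device must both have room for twice the stripe size. A hypothetical check, assuming the second stripe directly follows the first (an assumption about placement for illustration, not a description of the kernel code):

```python
from typing import List, Tuple


def dup_stripe_extents(hole_start: int, stripe_size: int) -> List[Tuple[int, int]]:
    """(start, end) of the two DUP stripes placed back to back in a hole."""
    return [(hole_start + n * stripe_size, hole_start + (n + 1) * stripe_size)
            for n in range(2)]


def dup_chunk_fits(hole_start: int, hole_end: int, device_size: int,
                   stripe_size: int) -> bool:
    """True if both stripes stay inside the free hole and inside the device."""
    last_end = dup_stripe_extents(hole_start, stripe_size)[-1][1]
    return last_end <= min(hole_end, device_size)


if __name__ == "__main__":
    # A 100-unit hole at the end of a 1000-unit device: 50-unit stripes fit,
    # 51-unit stripes would end beyond the device.
    print(dup_chunk_fits(900, 1000, 1000, 50), dup_chunk_fits(900, 1000, 1000, 51))
```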

The bugs are all related to code repeated all over the place in the
kernel, containing a lot of if statements that deal with different kinds
of allocation profiles and their exceptions. What I ended up doing is
making a few helper functions instead, see the commit "Add volumes.py,
handling device / chunk logic". It would probably be nice to do the same
in the kernel code, which would also solve the mentioned bugs and
prevent similar new ones from happening.
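In the same spirit, the per-profile if statements can be replaced by one table plus a generic helper. The sketch below is hypothetical and is not the volumes.py API: its field names and values follow the kernel's per-profile attribute table (btrfs_raid_array) as I understand it, with the parity profiles left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Profile:
    devs_min: int        # minimum number of devices a chunk needs
    devs_max: int        # maximum devices per chunk, 0 = no limit
    devs_increment: int  # device count must be a multiple of this
    dev_stripes: int     # stripes placed on each device (2 for DUP)
    ncopies: int         # how many copies of each byte are stored


PROFILES: Dict[str, Profile] = {
    "single": Profile(1, 1, 1, 1, 1),
    "dup": Profile(1, 1, 1, 2, 2),
    "raid0": Profile(2, 0, 1, 1, 1),
    "raid1": Profile(2, 2, 2, 1, 2),
    "raid10": Profile(4, 0, 2, 1, 2),
}


def chunk_layout(profile: str, devs_with_space: int) -> Optional[Tuple[int, int]]:
    """Return (devices used, num_stripes) for a new chunk, or None if it can't fit."""
    p = PROFILES[profile]
    ndevs = devs_with_space
    if p.devs_max:
        ndevs = min(ndevs, p.devs_max)
    ndevs -= ndevs % p.devs_increment
    if ndevs < p.devs_min:
        return None
    return ndevs, ndevs * p.dev_stripes


def data_stripes(profile: str, num_stripes: int) -> int:
    """How many of the stripes hold unique data (the rest are copies)."""
    return num_stripes // PROFILES[profile].ncopies
```

For example, chunk_layout("raid10", 5) returns (4, 4) because RAID10 needs an even number of devices, and chunk_layout("dup", 3) returns (1, 2): one device, two stripes. The exceptions live in the data, not in branches.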

Have fun,
-- 
Hans van Kranenburg


* Re: python-btrfs v10 preview... detailed usage reporting and a tutorial
  2018-09-23 21:54 python-btrfs v10 preview... detailed usage reporting and a tutorial Hans van Kranenburg
@ 2018-09-23 23:19 ` Adam Borowski
  2018-10-08  0:03   ` Hans van Kranenburg
  2018-09-24  8:08 ` Nikolay Borisov
  1 sibling, 1 reply; 6+ messages in thread
From: Adam Borowski @ 2018-09-23 23:19 UTC
  To: Hans van Kranenburg; +Cc: linux-btrfs

On Sun, Sep 23, 2018 at 11:54:12PM +0200, Hans van Kranenburg wrote:
> Two examples have been added, which use the new code. I would appreciate
> extra testing. Please try them and see if the reported numbers make sense:
> 
> space_calculator.py
> -------------------
> Best to be initially described as a CLI version of the well-known
> webbased btrfs space calculator by Hugo. ;] Throw a few disk sizes at
> it, choose data and metadata profile and see how much space you would
> get to store actual data.
> 
> See commit message "Add example to calculate usable and wasted space"
> for example output.
> 
> show_usage.py
> -------------
> The contents of the old show_usage.py example that simply showed a list
> of block groups are replaced with a detailed usage report of an existing
> filesystem.

I wonder, perhaps at least some of the examples could be elevated to
commands meant to be run by end users?  I.e., installing them to /usr/bin/,
dropping the extension?  They'd probably need less generic names, though.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 10 people enter a bar:
⣾⠁⢰⠒⠀⣿⡁ • 1 who understands binary,
⢿⡄⠘⠷⠚⠋⠀ • 1 who doesn't,
⠈⠳⣄⠀⠀⠀⠀ • and E who prefer to write it as hex.


* Re: python-btrfs v10 preview... detailed usage reporting and a tutorial
  2018-09-23 21:54 python-btrfs v10 preview... detailed usage reporting and a tutorial Hans van Kranenburg
  2018-09-23 23:19 ` Adam Borowski
@ 2018-09-24  8:08 ` Nikolay Borisov
  2018-09-28 23:04   ` Hans van Kranenburg
  1 sibling, 1 reply; 6+ messages in thread
From: Nikolay Borisov @ 2018-09-24  8:08 UTC
  To: Hans van Kranenburg, linux-btrfs

On 24.09.2018 00:54, Hans van Kranenburg wrote:
<snip>

> 
> The bugs are all related to repeated kernel code all over the place
> containing a lot of if statements dealing with different kind of
> allocation profiles and their exceptions. What I ended up doing is
> making a few helper functions instead, see the commit "Add volumes.py,
> handling device / chunk logic". It would probably be nice to do the same
> in the kernel code, which would also solve the mentioned bugs and
> prevent new similar ones from happening.

Would you care to report each bug separately so they can be triaged and
fixed?

> 
> Have fun,
> 


* Re: python-btrfs v10 preview... detailed usage reporting and a tutorial
  2018-09-24  8:08 ` Nikolay Borisov
@ 2018-09-28 23:04   ` Hans van Kranenburg
  0 siblings, 0 replies; 6+ messages in thread
From: Hans van Kranenburg @ 2018-09-28 23:04 UTC
  To: Nikolay Borisov, linux-btrfs

On 09/24/2018 10:08 AM, Nikolay Borisov wrote:
>>
>> The bugs are all related to repeated kernel code all over the place
>> containing a lot of if statements dealing with different kind of
>> allocation profiles and their exceptions. What I ended up doing is
>> making a few helper functions instead, see the commit "Add volumes.py,
>> handling device / chunk logic". It would probably be nice to do the same
>> in the kernel code, which would also solve the mentioned bugs and
>> prevent new similar ones from happening.
> 
> Would you care to report each bug separately so they can be triaged and
> fixed?

In the case of the RAID10 5GiB thing, I think I was mixing things up.
When doing mkfs you end up with a RAID10 chunk of 5GiB (dunno why, didn't
research it); when mounting and pointing balance at it, I get a 10GiB one
back, so that's ok.

For the DUP thing, I sent an explanation ("DUP dev_extent might overlap
something next to it"), which doesn't seem to have attracted much
attention yet. I'm preparing a pile of patches to volumes.[ch] to fix
this, clean up things that I ran into and make the logic a bit less
convoluted.

-- 
Hans van Kranenburg


* Re: python-btrfs v10 preview... detailed usage reporting and a tutorial
  2018-09-23 23:19 ` Adam Borowski
@ 2018-10-08  0:03   ` Hans van Kranenburg
  2018-10-08  5:42     ` Adam Borowski
  0 siblings, 1 reply; 6+ messages in thread
From: Hans van Kranenburg @ 2018-10-08  0:03 UTC
  To: Adam Borowski; +Cc: linux-btrfs

Hi,

On 09/24/2018 01:19 AM, Adam Borowski wrote:
> On Sun, Sep 23, 2018 at 11:54:12PM +0200, Hans van Kranenburg wrote:
>> Two examples have been added, which use the new code. I would appreciate
>> extra testing. Please try them and see if the reported numbers make sense:
>>
>> space_calculator.py
>> -------------------
>> Best to be initially described as a CLI version of the well-known
>> webbased btrfs space calculator by Hugo. ;] Throw a few disk sizes at
>> it, choose data and metadata profile and see how much space you would
>> get to store actual data.
>>
>> See commit message "Add example to calculate usable and wasted space"
>> for example output.
>>
>> show_usage.py
>> -------------
>> The contents of the old show_usage.py example that simply showed a list
>> of block groups are replaced with a detailed usage report of an existing
>> filesystem.
> 
> I wonder, perhaps at least some of the examples could be elevated to
> commands meant to be run by end-user?  Ie, installing them to /usr/bin/,
> dropping the extension?  They'd probably need less generic names, though.

Some of the examples are very useful, and I keep using them frequently.
That's actually also the reason that, for now, I have just copied
examples/ to /usr/share/doc/python3-btrfs/examples in the Debian
package, so that they're easily available on all systems that I work on.

Currently the examples collection is serving a few purposes. It's my
poor man's testing framework, which covers all functionality of the lib.
It displays all the things that you can do. There's a rich git commit
message history on them, which I plan to transform into documentation
and tutorial stuff later.

So, yes, a bunch of the things are quite useful actually. The new
show_usage and space_calculator are examples of what becomes possible
once things start to rise above the level of small debugging thingies.

So what would be candidates to be promoted to 'official' utils?

0) Ah, btrfs-heatmap

Yeah, that's the thing it all started with. I started writing all of the
code to be able to debug why my filesystems were allocating raw disk
space all the time instead of reusing the free space already allocated.
But that one is already done.

https://github.com/knorrie/btrfs-heatmap/

1) Custom btrfs balance

If really needed (and luckily, the need for it mostly went away after
solving the -o ssd issues) I always use balance_least_used.py instead of
regular btrfs balance. I think it totally makes sense to do the analysis
of which block groups to feed to balance, and in what order, in user space.
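The core of that user-space analysis can be small. Here is a hypothetical sketch of the ordering idea, not the actual balance_least_used.py code: take (vaddr, length, used) for each data block group, skip the ones above a usage threshold and hand out the emptiest first, presumably because those are the cheapest to rewrite while still freeing a whole chunk of raw space.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input: (vaddr, length, used) per block group, in bytes.
BlockGroup = Tuple[int, int, int]


def least_used_first(block_groups: Iterable[BlockGroup],
                     max_usage_pct: int = 100) -> Iterator[Tuple[int, int]]:
    """Yield (vaddr, usage percent) of block groups, emptiest first."""
    heap: List[Tuple[int, int]] = []
    for vaddr, length, used in block_groups:
        pct = used * 100 // length
        if pct <= max_usage_pct:
            heapq.heappush(heap, (pct, vaddr))
    while heap:
        pct, vaddr = heapq.heappop(heap)
        yield vaddr, pct
```

Since usage shifts while balance moves data around, a real tool would probably re-read block group usage before each step instead of trusting a single snapshot. Each vaddr could then be handed to balance, for example through the vrange filter of btrfs balance.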

I also used another custom script to feed block groups with highly
fragmented free space to balance to try repairing filesystems that had
been using the cluster data extent allocator. That's not in examples,
but when you combine show_free_space_fragmentation with parts of
balance_least_used, you get the idea.
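Picking block groups with "highly fragmented free space" needs some number for fragmentation. One possible metric, purely an illustration and not necessarily what show_free_space_fragmentation reports, is the share of free space that lies outside the largest free extent. The input format (vaddr mapped to a list of free extent lengths) is hypothetical.

```python
from typing import Dict, List, Tuple


def fragmentation(free_extents: List[int]) -> float:
    """0.0 = all free space in one piece, close to 1.0 = shattered."""
    total = sum(free_extents)
    if total == 0:
        return 0.0
    return 1 - max(free_extents) / total


def most_fragmented(block_groups: Dict[int, List[int]],
                    count: int) -> List[Tuple[int, float]]:
    """Return (vaddr, score) for the `count` most fragmented block groups."""
    scored = [(vaddr, fragmentation(extents))
              for vaddr, extents in block_groups.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:count]
```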

The best example I can think of here is a program that uses the new
usage information to find out how to feed block groups to balance to
actually get a balanced filesystem with minimal amount of wasted raw
space, and then do exactly that in the quickest way possible while
providing interesting progress information, instead of just brute force
rewriting all of the data and having no idea what's actually happening.

2) Advanced usage reporting

Something like the new show_usage, but hey, when using python with some
batteries included, I guess we can relatively easily do a nice html or
pdf output with pie and bar charts which provide the user with
information about the filesystem. Just having users run that when
they're asking for help on IRC and share the result would be nice. :o)
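Before going all the way to html or pdf, even a plain-text version is useful when pasted into IRC. A tiny hypothetical helper that turns (used, allocated) pairs per block group type into bars:

```python
from typing import Dict, Tuple


def usage_bar(label: str, used: int, total: int, width: int = 40) -> str:
    """Render one line: padded label, a bar of '#' and '.', and a percentage."""
    filled = used * width // total if total else 0
    pct = used * 100 // total if total else 0
    return f"{label:<10} [{'#' * filled}{'.' * (width - filled)}] {pct:3d}%"


def usage_report(usage: Dict[str, Tuple[int, int]], width: int = 40) -> str:
    """One bar per block group type, e.g. {'Data': (used, allocated)}."""
    return "\n".join(usage_bar(label, used, total, width)
                     for label, (used, total) in usage.items())


if __name__ == "__main__":
    GiB = 1024 ** 3
    print(usage_report({"Data": (50 * GiB, 100 * GiB),
                        "Metadata": (1 * GiB, 4 * GiB)}, width=20))
```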

3) The space calculator

Yup, obviously.

4) Maybe show_orphan_cleaner_progress

I use that one now and then to get a live view of mass removal of
subvolumes (backup snapshot expiry), but it's very close to a debug
tool. Or maybe I'm already spoiled and used to it now, and I no longer
realize how frustrating it must be to see disk IO and CPU go all over
the place and have no idea what btrfs is doing.
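The "live view" part mostly comes down to sampling a counter over time. Assuming you can read how many deleted subvolumes are still waiting for cleanup (how to obtain that number is left out here), a hypothetical rate and ETA estimate could look like this:

```python
from typing import List, Optional, Tuple

# Hypothetical samples: (seconds since start, subvolumes still to be cleaned).
Sample = Tuple[float, int]


def cleanup_eta(samples: List[Sample]) -> Optional[float]:
    """Estimate seconds until nothing is left, from the first and last sample."""
    if len(samples) < 2:
        return None
    (t0, left0), (t1, left1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    rate = (left0 - left1) / (t1 - t0)  # subvolumes cleaned per second
    if rate <= 0:
        return None  # no progress observed yet
    return left1 / rate
```

Using the first and last sample keeps the estimate stable; a sliding window would react faster when the cleaner speeds up or slows down.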

5) So much more...

So... the examples are just basic test coverage. There is so much more
that can be done.

And yes, to be able to write a small thingie that uses the lib, you
already have to know a lot about btrfs. -> That's why I started writing
the tutorial.

And yes, when promoting things like the new show_usage example to
programs that are easily available, users will probably start parsing
their output with sed and awk, which is a total abomination and the
absolute opposite of the purpose of the library. So be it. Let it go. :D
"The code never bothered me anyway".

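The usual way to soften that is to offer a machine-readable output mode next to the human-readable one. A hypothetical sketch of how a promoted show_usage could do it; the --json flag, field names and numbers are all made up for illustration:

```python
import argparse
import json
from typing import Dict, List, Optional

Usage = Dict[str, Dict[str, int]]


def render_text(usage: Usage) -> str:
    """Human-readable table: nice to look at, fragile to parse."""
    lines = [f"{'Type':<10} {'Allocated':>10} {'Used':>10}"]
    for kind, numbers in sorted(usage.items()):
        lines.append(f"{kind:<10} {numbers['allocated']:>10} {numbers['used']:>10}")
    return "\n".join(lines)


def render_json(usage: Usage) -> str:
    """Machine-readable: scripts can use json.loads instead of sed/awk."""
    return json.dumps(usage, indent=2, sort_keys=True)


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="usage report (sketch)")
    parser.add_argument("--json", action="store_true", help="emit JSON")
    args = parser.parse_args(argv)
    # Made-up numbers; a real tool would fill this in from the filesystem.
    usage = {"data": {"allocated": 100, "used": 50},
             "metadata": {"allocated": 4, "used": 1}}
    return render_json(usage) if args.json else render_text(usage)


if __name__ == "__main__":
    print(main())
```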
The interesting question that remains is where the result should go.

btrfs-heatmap is a thing of its own now, but it's a bit of the "showcase"
example using the lib, with its own collection of documentation and even
the possibility to script it again.

Shipping the 'binaries' in the python3-btrfs package wouldn't be the
right thing, so where should they go? apt-get install btrfs-moar-utils-yolo?

Or should btrfs-progs start using this, to speed up providing a richer
collection of useful progs for things that are not at the essential
level (like, you won't need them inside an initramfs, so they can use
Python)?

-- 
Hans van Kranenburg


* Re: python-btrfs v10 preview... detailed usage reporting and a tutorial
  2018-10-08  0:03   ` Hans van Kranenburg
@ 2018-10-08  5:42     ` Adam Borowski
  0 siblings, 0 replies; 6+ messages in thread
From: Adam Borowski @ 2018-10-08  5:42 UTC
  To: Hans van Kranenburg; +Cc: linux-btrfs

On Mon, Oct 08, 2018 at 02:03:44AM +0200, Hans van Kranenburg wrote:
> And yes, when promoting things like the new show_usage example to
> programs that are easily available, users will probably start parsing
> the output of them with sed and awk which is a total abomination and the
> absolute opposite of the purpose of the library. So be it. Let it go. :D
> "The code never bothered me any way".

It's not like some deranged person would parse the output of, say, show_file
in Perl...
 
> The interesting question that remains is where the result should go.
> 
> btrfs-heatmap is a thing of its own now, but it's a bit of the "show
> case" example using the lib, with its own collection of documentation
> and even possibility to script it again.
> 
> Shipping the 'binaries' in the python3-btrfs package wouldn't be the
> right thing, so where should they go? apt-get install btrfs-moar-utils-yolo?

At least in Debian, moving executables between packages is a matter of
versioned Replaces (+Conflicts: old), so if at any point you decide
differently it's not a problem.  So btrfs-moar-utils-yolo should work well.

> Or should btrfs-progs start to use this to accelerate improvement for
> providing a richer collection of useful progs for things that are not on
> essential level (like, you won't need them inside initramfs, so they can
> use python)?

You might want your own package that's agile, and btrfs-progs for things
declared to be rock stable (WRT command-line API, not necessarily stability
of code).

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ 10 people enter a bar: 1 who understands binary,
⢿⡄⠘⠷⠚⠋⠀ 1 who doesn't, D who prefer to write it as hex,
⠈⠳⣄⠀⠀⠀⠀ and 1 who narrowly avoided an off-by-one error.

