* [proposal] making filesystem tools more machine friendly
@ 2017-06-30  8:17 Jan Tulak
  2017-06-30 10:22 ` Arvin Schnell
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jan Tulak @ 2017-06-30  8:17 UTC (permalink / raw)
  To: linux-fsdevel, linux-btrfs, linux-ext4, linux-xfs

AKA filesystem API

Hi guys

Currently, filesystem tools are not made with automation in mind. So
any tool that wants to interact with filesystems (be it for
automation, or to provide a more user-friendly interface) has to
screen scrape everything and cope with changing outputs.

I think it is time to give some thought to how to make the fs tools
easier to use from scripts and other tools. To put your mind at ease,
the answer to the obvious question "who will do it" is "me". I don't
want to force you into anything, though, so I'm opening this
discussion pretty early with some ideas, and I hope to hear what you
think about it before anything is set in stone. (For those who visited
Vault this year, Justin Mitchell and I had a talk about this, codename
Springfield.)

The following text attempts to identify issues with using
filesystems-related tools in scripts/applications and proposes a
solution to those issues.

Content:
1. A quick introduction
2. Details of the issues
3. Proposed Solutions
4. Conclusion

1. A quick introduction
=================

I discussed this topic with people who are building something around
fs tools, for example the developers of libblockdev (Vratislav
Podzimek, https://github.com/vpodzime/libblockdev) and System Storage
Manager (formerly Lukas Czerner, now me,
https://sourceforge.net/projects/storagemanager/). The listed
issues are a product of experience from working with those tools. The
issues relate mostly to basic operations like mkfs, fsck,
snapshots and resizing. Advanced/debugging tools like xfs_io or xfs_db
are not the focus, but they might also benefit from any change
if included in it.

The main issues of the current state, where all the tools are run with
some flags and options and produce a human readable output:
 * The output format can change, sometimes without an easy way of
detection like a version number.
 * Different output formats for different tools even in a single FS,
thus zero code reuse/sharing.
 * Screenscraping can introduce bugs (a rare inner condition can add a
field into the output and break regular expressions).
 * No (or weak) progress report.
 * Different filesystems have different input formats (flags, options)
even for the most basic capabilities.
 * Thread safety, forking

2. Details of the issues
==================

Let’s look at the issues now: why each one is an issue and whether a
small change could fix it on its own.


The output format can change, sometimes without an easy way of
detection like a version number. Most filesystems are well behaved,
but still, we don’t know exactly what people are doing with the tools,
and even adding a new field can break a script. Keeping
compatibility with older versions adds further complexity to such
tools.

What can be done about this? The new fields have to be printed somehow
and changing the format of the standard output would break everything.
Making sure that any change to the input or output always comes with a
detectable change in the version number is good practice, but it
doesn’t solve the issue; it only makes hacking around it easier.

What can really help is to have an alternative output (which can be
turned on when the user wants it), which is easy to parse and which is
resilient to a certain degree of changes because it can express
dynamic items like lists or arrays: JSON, XML...


Different input/output formats for different tools even in a single
FS, thus zero code reuse/sharing: Support for every tool and every
filesystem has to start from scratch. Nothing can be done about it
without a change in the output format. But if an optional JSON or XML
or similar format were supported, then instead of creating a parser
for every tool, a single standard, already well-tested library could
be used.


Screenscraping can introduce bugs (some rare inner condition can add a
field into the output and break regular expressions): Well, let’s just
look at how many services still can’t even parse and verify an email
address correctly. And we have a lot more complex text… Again, some
easy-to-parse format with existing libraries that would turn the text
into a bunch of variables or an object would help.


No (or weak) progress report: Especially for tools that can run for a
long time, like fsck. Screenscraping everything it throws out and then
deciding whether it is a progress report message (because instead of
“25 %” it says “running operation foo”), a message to tell the user,
or something to just ignore is a lot less comfortable and more
error-prone than “{level: ‘progress’, stage: 5, msg: ‘operation foo’}”.
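
To illustrate the difference, here is a minimal sketch in Python of a
consumer for such messages. It assumes one JSON object per line and
reuses the field names from the example above (level, stage, msg); the
framing and the "info" message are made up for illustration, not an
existing output of any tool.

```python
import json

def classify(lines):
    """Split a stream of JSON messages into progress updates and other messages."""
    progress, messages = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines between messages
        record = json.loads(line)
        if record.get("level") == "progress":
            progress.append((record.get("stage"), record.get("msg")))
        else:
            messages.append(record.get("msg"))
    return progress, messages

# Hypothetical sample output; no real tool emits exactly this today.
sample = [
    '{"level": "progress", "stage": 5, "msg": "operation foo"}',
    '{"level": "info", "msg": "checking inodes"}',
]
progress, messages = classify(sample)
```

No regular expression is involved: the caller looks at one field to
decide what kind of message it got.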


Different filesystems have different input formats (flags, options)
even for the most basic capabilities: Similar to “Different
input/output formats for different tools...”. For example, for
labeling a partition, you use mkfs.xfs -L label, but mkfs.vfat -n
label. However, changing this requires bringing the same
functionality, with a common basic specification, to the other
filesystems too.


Thread safety, forking: The people who work on a library, like
libblockdev, don’t like that they have to fork another process over
which they have no control, as they can’t guarantee anything about it.
This can’t be fixed by changing the output format, though; it would
require making a public library providing an interface similar to the
existing fs tools. No detailed access to the insides is needed, just a
way to run mkfs, fsck, snapshots, etc… without spawning another
process and without screenscraping.


3. Proposed Solutions
=================

There are two (complementary) ways to address the issues: add a
structured, machine-readable format for input/output of the tools
(e.g. JSON, XML, …) and create a library with the functionality of
the existing tools. Let’s look now at those options. I will focus on
what changes they would require, what the price of maintaining that
solution would be, and whether there are any drawbacks or additional
advantages.

An optional third option would be to create a system service/daemon,
that would use dbus or some other structured interface to accept jobs
and return results. I think that LVM people are working on something
like this.

The proposed solutions are ordered by complexity.
Also, they can be seen as follow-ups, because every proposed option
requires a big part of the work from the previous one anyway.


3.1. Structured Output
-------------------------------
In other words, what LVM already does with --reportformat
{basic|json}, and lsblk with --json. Possibly, we could also accept
JSON input. That would allow the user to, instead of using all the
flags and options of the CLI, pass something like
--jsoninput=“{dev:’/dev/abc’, force: true, … }”

Some preliminary notes about the format:
Most likely, this would mean JSON. JSON is currently preferred over
XML because it is easier for humans to read if the need arises, and its
encoder/parser is lighter and easier to use. Also, other projects like
LVM, multipath or lsblk already use JSON, so it would be nice not to
break ranks.

Required implementation changes/expected problems:
In an ideal world, a simple replacement of all prints with a wrapping
function would be enough. However, as far as I know, an overwhelming
majority of the tools have printing functions spread through the code
and print everything as soon as they know the specific bit of
information.

Changing the output into a structured format means some refactoring.
Instead of a simple printf(), an object, array or structure has to be
created and, rather than pure strings, more diagnostically useful
values have to be added to it in place of the current prints. Then,
when it is reasonable, it is printed out all at once in any desired
format. “When reasonable” would usually mean at the end, but progress
could also be printed during a long-running operation.

Because of the kinds of the required changes, the implementation can
be split into two parts: first, clean the code and move all the prints
out from spaghetti to the end. Then, once this is done, add the
structured format.
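
A rough sketch of that separation, written in Python for brevity even
though the tools themselves are C. The Report class and the field
names (label, block_size) are hypothetical and only show the shape of
the idea: the code collects values, and one function decides the
format at the end.

```python
import json

class Report:
    """Collects values during a run and renders them once, in a chosen format."""

    def __init__(self):
        self.fields = {}

    def add(self, key, value):
        # Replaces a printf() call: store the value instead of printing it.
        self.fields[key] = value

    def render(self, fmt="text"):
        # The only place that knows about output formats.
        if fmt == "json":
            return json.dumps(self.fields, sort_keys=True)
        return "\n".join(f"{k} = {v}" for k, v in self.fields.items())

report = Report()
report.add("label", "data")
report.add("block_size", 4096)
```

Adding another format later would touch only render(), not the code
that produces the values.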

Maintaining overhead:
Small. By separating the printing from generating the data, we don’t
really care about the output format anywhere except in the printing
function itself, and if a new field or value is added or an old one
removed, then the amount of work is roughly equal to the current
state.

Drawbacks:
Searching for a place in the code where something happens would be
more complicated - instead of a simple search for a string that is the
same in the code and in the output, one would search for the line that
adds the data to the message log. This could be simplified by using
the __LINE__ macro with debug output. (With JSON, an additional field
would not affect anything, so it would be completely safe.)

Additional advantages:
The refactoring can clean up the code a bit. It is easy to add any
other format in the future. Our own test suite could also benefit from
the structured output.

Comment:
This option gives the most bang for the buck, and most of the work
done for it is also required for every other option. In terms of the
specific interface (i.e. common JSON elements), we need to identify a
common subset of fields/options that every fs will use and move
anything else into fs-specific extensions.

With regard to compatibility between different filesystems, the
best way to specify the format might be a small library that would
take raw data on one side and turn it into a JSON string or even print
it (and the reverse, if input supports JSON too). This way, we would
be sure that there really is a common ground that works the same in
every fs.

Another way to achieve compatibility is to write an RFC-like
document. For example: All occurrences of a filesystem identifier MUST
be in a field named 'fsid' which SHOULD contain a UUID formatted
string. I think this is useful even if we end up with the library, as a
way to find out the common ground.
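
As an illustration only, the example rule above could be checked
mechanically along these lines. check_fsid() is a hypothetical helper,
and treating a MUST violation as an error and a SHOULD violation as a
warning is my assumption about how such a document would be enforced.

```python
import uuid

def check_fsid(record):
    """Check the example rule: 'fsid' MUST exist and SHOULD be a UUID string.

    Returns (ok, warnings). A missing field fails; a non-UUID value only warns.
    """
    if "fsid" not in record:
        return False, ["missing required field 'fsid'"]
    try:
        uuid.UUID(str(record["fsid"]))
    except ValueError:
        return True, ["'fsid' is not a UUID-formatted string"]
    return True, []
```

A short volume ID such as "ABCD-1234" would pass with a warning, while
a record without the field would be rejected.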

3.2. A Library
-------------------
A library implementing basic functions like: mkfs(int argc, char
*argv[]), fsck(), … etc. Once done, binding for other languages like
Python is possible too.

Required implementation changes/expected problems:
If the implementation of this library followed the changes that add
the structured output, then most of the work would already be done.
The most complex remaining issue would probably be that
there can be no exit() call - if anything fails, we have to gracefully
return up the stack and out of the library.

Duplication of functionality is not an issue because there is none -
the binary tools like mkfs.xfs would become simple wrappers around the
library functions: pass user input to the specified library call, then
print any message created in the library and exit.
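
A minimal sketch of that split, in Python rather than C to keep it
short. lib_mkfs() and FsToolError are hypothetical names, and the
"library" here only validates its input; the point is that the library
reports failure to its caller and only the wrapper chooses an exit
code.

```python
import sys

class FsToolError(Exception):
    """Hypothetical error type: the library raises instead of calling exit()."""

def lib_mkfs(device, label=None):
    # Hypothetical library entry point; a real one would create the filesystem.
    if not device:
        raise FsToolError("no device given")
    return {"device": device, "label": label}

def main(argv):
    # The command-line tool: pass input to the library, print, return a status.
    try:
        result = lib_mkfs(argv[1] if len(argv) > 1 else "")
    except FsToolError as err:
        print(f"mkfs: {err}", file=sys.stderr)
        return 1
    print(result)
    return 0
```

A caller embedding the library would catch the error itself; only the
command-line wrapper would pass the return value of main() to
sys.exit().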

Maintaining overhead:
None? The code would be cleaned up and moved, but there wouldn’t be
new things to maintain.

Drawbacks:
I can’t think of any...

Additional advantages:
The refactoring can clean up the code a bit.

Comment:
Useful and nice to have, but doesn’t have to be done ASAP.


3.3. A system service
------------------------------
A system service/daemon, that would use dbus or some other structured
interface to accept jobs and return results.

Required implementation changes/expected problems:
We don’t want the daemon to exit if it can’t access a file, we don’t
want it to do printfs(), … So, in the end, we have to do the
structured output and the library, and then a lot of work on top of
them.

Maintaining overhead:
All the system services things plus whatever the other two solutions require.

Drawbacks:
The biggest maintenance overhead of all the proposed solutions;
basically a new tool/subproject. Using dbus in a third-party project
is more work than just including a library and calling one function.

Additional advantages:
It wouldn’t be possible to attempt concurrent modifications of a device.

Comment:
In my opinion, this shouldn’t be our project. Let other people build
it as their front end if they want something like this; we should
instead just make it easier for them by providing one of the other
solutions.


4. Conclusion
===========

A structured output is something we should aim for. It helps a lot
with the issues and it is the cheapest option. If it goes well, it can
be later on followed by creating a library, although, at this moment,
that seems a premature thing to strive for. Creating a daemon is not a
thing any single filesystem should do on its own.

And thank you for reading all this, I look forward to your comments. :-)
Jan

-- 
Jan Tulak
jtulak@redhat.com / jan@tulak.me


* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
@ 2017-06-30 10:22 ` Arvin Schnell
  2017-06-30 13:58 ` Emmanuel Florac
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Arvin Schnell @ 2017-06-30 10:22 UTC (permalink / raw)
  To: linux-fsdevel


Hi Jan,

I appreciate your proposal and I'm all for it!

Working on the storage part of YaST I face similar problems with
parsing the output of many different programs. Having a common
JSON output will certainly help to make the code simpler and more
robust.

Currently some parsers are very difficult to write, esp. when the
output attributes contain the chars used as a separator. Some tools
even have machine-friendly output, but it is still error-prone.
E.g. parted does not protect a ':' in the partition name
of its machine friendly output. (Just an example, your proposal
is not about parted.)
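
The separator problem can be shown with a small, self-contained Python
sketch. The four fields below are made up for illustration and are not
actual parted output; the point is only that an unescaped ':' inside a
value changes the field count, while a JSON encoding of the same
values round-trips.

```python
import json

# Hypothetical record: number, start, end, partition name containing ':'.
fields = ["1", "1MiB", "100MiB", "backup:2017"]

# Colon-separated output without escaping, then a naive parser.
line = ":".join(fields)
naive = line.split(":")

# The same values as JSON survive encoding and decoding unchanged.
roundtrip = json.loads(json.dumps(fields))
```

Here the naive split yields five fields instead of four, so every
field after the name is shifted.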

ciao
  Arvin

-- 
Arvin Schnell, <aschnell@suse.com>
Senior Software Engineer, Research & Development
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
Maxfeldstraße 5
90409 Nürnberg
Germany


* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
  2017-06-30 10:22 ` Arvin Schnell
@ 2017-06-30 13:58 ` Emmanuel Florac
  2017-06-30 15:29 ` Theodore Ts'o
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Emmanuel Florac @ 2017-06-30 13:58 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel, linux-btrfs, linux-ext4, linux-xfs


On Fri, 30 Jun 2017 10:17:17 +0200
Jan Tulak <jtulak@redhat.com> wrote:

> A structured output is something we should aim for. It helps a lot
> with the issues and it is the cheapest option. If it goes well, it can
> be later on followed by creating a library, although, at this moment,
> that seems a premature thing to strive for. Creating a daemon is not a
> thing any single filesystem should do on its own.
> 
> And thank you for reading all this, I look forward to your
> comments. :-)

The whole concept seems a noble endeavour, though I'm a bit wary about
whether all the teams in charge can be convinced to adhere to it...
But it looks like a good idea for sure :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------



* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
  2017-06-30 10:22 ` Arvin Schnell
  2017-06-30 13:58 ` Emmanuel Florac
@ 2017-06-30 15:29 ` Theodore Ts'o
  2017-07-03 11:52   ` Jan Tulak
  2017-07-05 18:11 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2017-06-30 15:29 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel

I'm only replying to linux-fsdevel@ because I think that's really the
right list for us to be having this discussion.

So a couple of comments.  First of all, this sort of thing has been
proposed before.  Specifically, this was one of the problems which
EVMS (from IBM, which ultimately lost out to device mapper) had as
part of their project.  Their userspace component has a plug-in
architecture so that file systems could provide a shared library which
could be used by GUI and CLI tools.

I had that support in e2fsprogs, and it was removed when EVMS was
ultimately killed off by device-mapper and LVM.  The removal
was in commit 921f4ad53: "Remove support for EVMS 1.x plugin library"
so if you look at the sources in lib/evms in the e2fsprogs git repo at
the parent of that commit (aka 921f4ad53^), you can see what it looked
like.

Secondly, as a file system developer and maintainer for e2fsprogs, I'm
going to be extremely hesitant to accept patches which radically
reorganize e2fsprogs and which adds dependencies on third-party
libraries that emit JSON output before I know whether or not your
project is going to be successful, or will ultimately end up getting
abandoned and left for dead, as was the case for EVMS.

That being said, I probably would be willing to accept a drop-in
library, ala lib/evms, where some of the functions were handled
directly via accessing the file system directly via the plugin
library, and where some of the changes were made by wrapping mke2fs
and e2fsck.  Over time, *if* this interface gained legs, I'd be more
willing to try to make a library version of mke2fs (and then _maybe_
e2fsck) so that the plugin library could do more of the work natively,
rather than by wrapping the mke2fs and e2fsck executables.

Third, I think you are *massively* underestimating how much work is
needed to do this in a generic fashion, especially with the file
system check tool.  In any tool where you need to ask the user for
permission to make a particular change to the file system, trying to
do this in a generic way is *hard*, and the "use JSON" is not enough.

At the same time, for the common case, where you just want to "check
to see if the file system is consistent, and return a boolean", or
"check the file system and do all of the safe things", you don't need
all of the complexity of JSON.  So I think JSON is simultaneously too
much (for the simple stuff, where the user doesn't need to see the
output anyway), and too little (for the hard cases where you are doing
a file system repair operation).

For another matter, how you would pass in mkfs parameters, which are
very file system specific, in a completely generic way, is also
completely unspecified in your proposal.  This makes me deeply
suspicious you haven't thought through the issues sufficiently, and if
you try to send patches that massively reorganize e2fsprogs and adds
JSON output in what I suspect will be a horrible, terribly ugly way,
it's most likely I will decline to accept your patches.

This probably means that you will need to do something which starts by
using screen scraping at least initially, and if the maintainers
aren't willing to adopt your code, you're going to have to maintain it
yourself.  That's how EVMS started, and then they asked if I would be
willing to integrate the ext2/3 (this was before ext4) evms plugin
into e2fsprogs, and would I be willing to start making changes to
integrate it more organically into e2fsprogs.  I suspect that kind of
incremental approach is much more likely to be successful in the
long term.  You may also want to look at the EVMS plugin interfaces,
since the people who tried this did think about what made sense to try
to do in a file system independent way, and what would probably have
to be kept file system specific.

Cheers,

						- Ted


* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30 15:29 ` Theodore Ts'o
@ 2017-07-03 11:52   ` Jan Tulak
  2017-07-03 15:07     ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Tulak @ 2017-07-03 11:52 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: linux-fsdevel

On Fri, Jun 30, 2017 at 5:29 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
> So a couple of comments.  First of all, this sort of thing has been
> proposed before.  Specifically, this was one of the problems which
> EVMS (from IBM, which ultimately lost out to device mapper) had as
> part of their project.  Their userspace component has a plug-in
> architecture so that file systems could provide a shared library which
> could be used by GUI and CLI tools.
>
> I had that support in e2fsprogs, and it was removed when EVMS was
> ultimately killed off by device-mapper and LVM.  The removal
> was in commit 921f4ad53: "Remove support for EVMS 1.x plugin library"
> so if you look at the sources in lib/evms in the e2fsprogs git repo at
> the parent of that commit (aka 921f4ad53^), you can see what it looked
> like.

Thank you for this info, I wasn't aware of EVMS. It might be useful,
although, from my brief look, it seems that it wouldn't do much for
the structured output, but rather for the library later on.

>
> Secondly, as a file system developer and maintainer for e2fsprogs, I'm
> going to be extremely hesitant to accept patches which radically
> reorganize e2fsprogs and which adds dependencies on third-party
> libraries that emit JSON output before I know whether or not your
> project is going to be successful, or will ultimately end up getting
> abandoned and left for dead, as was the case for EVMS.
>
> That being said, I probably would be willing to accept a drop-in
> library, ala lib/evms, where some of the functions were handled
> directly via accessing the file system directly via the plugin
> library, and where some of the changes were made by wrapping mke2fs
> and e2fsck.  Over time, *if* this interface gained legs, I'd be more
> willing to try to make a library version of mke2fs (and then _maybe_
> e2fsck) so that the plugin library could do more of the work natively,
> rather than by wrapping the mke2fs and e2fsck executables.

Yeah, I understand your reluctance (and expected it), which is why I
opened this so early. About the drop-in in lib/, I have to check what
could or couldn't be done this way. I know xfsprogs, but I have only
looked at e2fsprogs very briefly.

My intention with the structured output is to minimize the amount of
code that has to be kept up-to-date, unlike the library. Ideally and
most of the time, the only difference would be that instead of printf
calls, there would be a function to just store the message or specific
values in a 1:1 substitution, while the part that does the actual
printing would be small, loosely coupled and wouldn't need any changes
during the usual development of e2fsprogs, xfsprogs or anything else.

In any case, starting with a wrapper is something I thought about. It
would mean that its code has to be kept up-to-date manually all the
time, but if it attracted enough interest, it could be integrated
with only minimal changes on the outside.

>
> Third, I think you are *massively* underestimating how much work is
> needed to do this in a generic fashion, especially with the file
> system check tool.  In any tool where you need to ask the user for
> permission to make a particular change to the file system, trying to
> do this in a generic way is *hard*, and the "use JSON" is not enough.
>
> At the same time, for the common case, where you just want to "check
> to see if the file system is consistent, and return a boolean", or
> "check the file system and do all of the safe things", you don't need
> all of the complexity of JSON.  So I think JSON is simultaneously too
> much (for the simple stuff, where the user doesn't need to see the
> output anyway), and too little (for the hard cases where you are doing
> a file system repair operation).

I want to limit the capabilities of this interface to non-interactive
only. So, yes, with fsck, JSON would be overkill. But the idea is to
have a single format across all the tools, so you don't need a
standalone parser for every tool, even if some tools don't need
anything more than an exit code and/or one message on stderr. In the
case of ext2/3/4 it is more about resize2fs and tune2fs, where the
JSON would be much more useful, than fsck.

>
> For another matter, how you would pass in mkfs parameters, which are
> very file system specific, in a completely generic way, is also
> completely unspecified in your proposal.  This makes me deeply
> suspicious you haven't thought through the issues sufficiently, and if
> you try to send patches that massively reorganize e2fsprogs and adds
> JSON output in what I suspect will be a horrible, terribly ugly way,
> it's most likely I will decline to accept your patches.

The generic way would be along the lines: These ten common and
frequently used fields are generic and work everywhere, anything else
has a prefix (ext4-, xfs-, btrfs-, ...) or is inside of a fs-specific
list of extensions "ext4":{"some-option":value}. And during the
parsing, all the fields would be mapped to some mkfs arguments,
usually 1:1. Similar to what could be done with output, like putting
volume identifier into a specific field (e.g. "fsid") no matter what
the filesystem is.
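
A small Python sketch of that mapping, to make it concrete. The
request layout follows the description above (generic fields plus an
fs-specific extension block). The flag table is illustrative: -L for
mkfs.xfs and -n for mkfs.vfat are the label flags mentioned in the
proposal, -f is the mkfs.xfs force flag, and build_argv() and the
"some-option" key are hypothetical.

```python
# Generic option name -> command-line flag, per filesystem (illustrative subset).
GENERIC_FLAGS = {
    "xfs": {"label": "-L", "force": "-f"},
    "vfat": {"label": "-n"},
}

def build_argv(fstype, request):
    """Translate the generic part of a request into an argument list.

    Returns (argv, extensions); fs-specific options are handed back untouched.
    """
    flags = GENERIC_FLAGS[fstype]
    argv = [f"mkfs.{fstype}"]
    for key, value in request.items():
        if key in ("dev", fstype):
            continue  # the device goes last; extensions are returned as-is
        if key not in flags:
            raise ValueError(f"{key!r} is not a generic option for {fstype}")
        if value is True:
            argv.append(flags[key])          # boolean switch
        elif value is not False:
            argv += [flags[key], str(value)]  # option with an argument
    argv.append(request["dev"])
    return argv, request.get(fstype, {})
```

For example, {"dev": "/dev/abc", "label": "data"} would become
mkfs.xfs -L data /dev/abc for xfs and mkfs.vfat -n data /dev/abc for
vfat, which is the 1:1 mapping meant above.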

But I'm still trying to find out what the common usage patterns are;
whether there is some frequently used subset for which the
unification is worth doing at all. So, it is quite possible that I
will ditch this particular idea completely (and add the JSON while
keeping every field name and structure fs-specific) if I find out that
it is better left to higher-level tools; if the only unified options
would be volume label, force flag, and target device, then the time
saved for other developers does not balance the effort spent on the
unification of fields.

So, I didn't specify how to do it in detail because I don't know yet
at what point it would be worth the effort (when it would be useful
enough). But at the same time, this decision is better made
early, so I mentioned it in my proposal, to see if people are
interested in it at all.

>
> This probably means that you will need to do something which starts by
> using screen scraping at least initially, and if the maintainers
> aren't willing to adopt your code, you're going to have to maintain it
> yourself.  That's how EVMS started, and then they asked if I would be
> willing to integrate the ext2/3 (this was before ext4) evms plugin
> into e2fsprogs, and would I be willing to start making changes to
> integrate it more organically into e2fsprogs.  I suspect that kind of
> incremental approach is much more likely to be successful in the
> long term.  You may also want to look at the EVMS plugin interfaces,
> since the people who tried this did think about what made sense to try
> to do in a file system independent way, and what would probably have
> to be kept file system specific.
>

Thanks for pointing out EVMS, I will see what I can learn from that
attempt. Starting with screenscraping is certainly an option and might
be the only viable one. This raises some other questions, though:
given the temporality of the wrapper, I would rather use other
languages than C/Bash (e.g. Python) to simplify and speed up the
development. But I have doubts about whether you would be willing to
adopt this into e2fsprogs, which would, in turn, reduce the usability
of this approach.

Anyway, thank you very much for your reply. :-)

Cheers,
Jan


-- 
Jan Tulak
jtulak@redhat.com / jan@tulak.me


* Re: [proposal] making filesystem tools more machine friendly
  2017-07-03 11:52   ` Jan Tulak
@ 2017-07-03 15:07     ` Theodore Ts'o
  2017-07-03 17:37       ` Darrick J. Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2017-07-03 15:07 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel

On Mon, Jul 03, 2017 at 01:52:49PM +0200, Jan Tulak wrote:
> I want to limit the capabilities of this interface to non-interactive
> only. So, yes, with fsck, JSON would be overkill. But the idea is to
> have a single format across all the tools, so you don't need a
> standalone parser for every tool, even if some tools don't need
> anything more than an exit code and/or one message on stderr. In the
> case of ext2/3/4 it is more about resize2fs and tune2fs, where the
> JSON would be much more useful, than fsck.

I'm not sure what sort of use cases you have in mind where structured
output would be useful.

For mke2fs or resize2fs, in general all you care about is "did the
operation succeed", right?  What did you have in mind where more
information than "the operation was successful" would be needed?

For tune2fs, what options or output in tune2fs are sufficiently file
system independent that you think it would be worth exporting to your
infrastructure?

> The generic way would be along the lines: These ten common and
> frequently used fields are generic and work everywhere, anything else
> has a prefix (ext4-, xfs-, btrfs-, ...) or is inside of a fs-specific
> list of extensions "ext4":{"some-option":value}. And during the
> parsing, all the fields would be mapped to some mkfs arguments,
> usually 1:1. Similar to what could be done with output, like putting
> volume identifier into a specific field (e.g. "fsid") no matter what
> the filesystem is.

It might be useful if you could give some "user stories" that explain
at a high level what the user might want to do with the ultimate
user-visible interface that would require this kind of precision.

Most users don't know how to use the specialized options to
mkfs.<FSTYP> and to be honest, most don't need to.  The way I've dealt
with this for mke2fs.conf is that when someone has come up with a
specialized recipe for a unique sort of file system type --- say,
maybe for a Lustre Metadata server, or the back-end storage for some
kind of cluster file system like Hadoopfs, or specialized options for
an Android phone --- someone with wizard-level skills will edit
/etc/mke2fs.conf, with perhaps something like this:

    smr-host-managed = {
        features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
        cluster_size = 32768
        hash_alg = half_md4
        reserved_ratio = 0.0
        num_backup_sb = 0
        packed_meta_blocks = 1
        make_hugefiles = 1
        inode_ratio = 4194304
        hugefiles_dir = /smr
        hugefiles_name = smr-file
        hugefiles_digits = 0
        hugefiles_size = 0
        hugefiles_align = 256M
        hugefiles_align_disk = true
        num_hugefiles = 1
        zero_hugefiles = false
        flex_bg_size = 262144
    }

... and then all the user will have to do is run "mke2fs -t ext4 -T
smr-host-managed /dev/sdXX".  (The intended use case for this is to
support an SMR-aware user-space application that was going to be
managing the host-managed SMR zones directly.)

I suspect telling the user that they have to type a whole series of
parameters of the form:

	"ext4":{"some-option":value}

into a GUI would not be a particularly user-friendly suggestion.  :-)

The other question I'd ask is how many people really are going to want
to use your infrastructure?  Is it only going to be for the "point and
click" users who will want a simplified interface?  Are you trying to
make something that will be useful for advanced/expert users?  What
value are you going to be able to add that will convince the
advanced/expert users that they should learn your new
"fstype":{"some-option":value} syntax when typing at the command line
will probably be ten times faster, easier, and less rage-inducing than
trying to reverse-engineer out some interface that was designed not to
scare the civilians?  (There's a reason why many drivers prefer manual
to automatic transmission on their cars.  :-)

> Thanks for pointing out EVMS, I will see what I can learn from that
> attempt. Starting with screenscraping is certainly an option and might
> be the only viable one. This raises some other questions, though:
> given the temporality of the wrapper, I would rather use other
> languages than C/Bash (e.g. Python) to simplify and speed up the
> development. But I have doubts about whether you would be willing to
> adopt this into e2fsprogs, which would, in turn, reduce the usability
> of this approach.

The reason why you might want to consider C is because:

   * It allows the plugin to be imported into many different
     programming languages: Python, Go, Perl, etc., via using
     something like SWIG.

   * Different file system maintainers will be willing to accept
     maintenance of your plugin at different times.  For some file
     systems, you may have to wrap the command-line tools forever; for
     one thing the file system may no longer be under active
     maintenance (ex: iso9660) but you still might want to use it
     in your GUI interface.  Other file system developers will be
     willing to take over the plugin and support it as a native part
     of their file system tools more quickly.

   * The first operations that you might want to make native
     instead of being screen scraped (getting the file system size,
     the amount of free space, etc.)  are things which are most easily
     done in C.

     So if you want to ask the file system developers to take over the
     plugin, they are much more likely to be willing to say yes if the
     plugin is already in C, as opposed to asking them to take over
     some Python class where trying to integrate python code to call
     into libext2fs is going to be a pain in the ass.  For that
     matter, you might want to implement the plugins to call libext2fs
     and libxfs directly for those basic functions.  That's what the
     EVMS developers did, and those interfaces in libext2fs are
     guaranteed to have ABI and API stability.  Anyway, if your goal
     is to convince file system developers to eventually take
     ownership of the interface/plugin module, it will be much easier
     to do that if it is in C --- trust me on that.

   * I don't think it's going to be that hard to use C; as I've said,
     I really disbelieve that there are that many places where you
     need to screenscrape.  Most of what you will probably need to do
     is to return the exit status of mke2fs, fsck, resize2fs, etc.
     Those programs that you do need to screen scrape will have
     outputs similar to dumpe2fs, which is stupid-easy to parse, and
     are also, as I've noted above, the simplest thing to move to
     being done in native code calling the file system's C library.
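
As an illustration of how little parsing "dumpe2fs -h"-style output
needs, here is a minimal Python sketch.  It assumes the output consists
of "Name:   value" lines; the sample text is illustrative, not captured
from a real device, and the function name is hypothetical.

```python
def parse_dumpe2fs_header(text):
    """Turn 'Key:   value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, since values such as
        # timestamps contain colons of their own.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        fields[key.strip()] = value.strip()
    return fields


# Illustrative sample in the style of "dumpe2fs -h" output
SAMPLE = """\
Filesystem volume name:   <none>
Block count:              2621440
Free blocks:              2500000
Block size:               4096
"""

info = parse_dumpe2fs_header(SAMPLE)
size_bytes = int(info["Block count"]) * int(info["Block size"])
```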

Oh, one thing.  I'll note that e2fsprogs has progress bar support
already, and it was designed so it could be easily integrated into a
GUI.  As far as I know Ubuntu was the only distro that used it ---
progress bars tend not to be high on most distros' product manager's
feature priority lists --- but it's there.  See e2fsck's -C option.
This support was also plumbed into fsck (see its -C option), so I
designed it to be something that other file systems could implement.

Also, we're Linux systems programmers, not Web developers writing
Javascript; why use JSON and require fancy parsing when you can just
isolate the completion information onto a separate file descriptor?  I
don't know how big and complex your JSON parsing library is, but all
that was needed to parse *my* completion information is the single
line of C code:

	fscanf(progress_f, "%d %lu %lu %ms\n", &pass, &cur, &max, &text).
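
For a consumer written in a scripting language the same line is just as
easy to handle.  A minimal Python sketch, assuming each progress line
has the layout implied by the format string above (pass number, current
value, maximum value, one whitespace-free token); the function names
are hypothetical.

```python
def parse_progress_line(line):
    """Parse one 'pass cur max text' progress line."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError("malformed progress line: %r" % line)
    pass_no, cur, maximum = (int(p) for p in parts[:3])
    # Like %ms, take a single whitespace-delimited token as the text
    return pass_no, cur, maximum, parts[3]


def percent_done(cur, maximum):
    """Completion of the current pass as a percentage."""
    return 100.0 * cur / maximum if maximum else 0.0
```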

Cheers,

						- Ted


* Re: [proposal] making filesystem tools more machine friendly
  2017-07-03 15:07     ` Theodore Ts'o
@ 2017-07-03 17:37       ` Darrick J. Wong
  2017-07-04 13:57         ` Jan Tulak
  0 siblings, 1 reply; 17+ messages in thread
From: Darrick J. Wong @ 2017-07-03 17:37 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Jan Tulak, linux-fsdevel

On Mon, Jul 03, 2017 at 11:07:22AM -0400, Theodore Ts'o wrote:
> On Mon, Jul 03, 2017 at 01:52:49PM +0200, Jan Tulak wrote:
> > I want to limit the capabilities of this interface to non-interactive
> > only. So, yes, with fsck, JSON would be overkill. But the idea is to
> > have a single format across all the tools, so you don't need a
> > standalone parser for every tool, even if some tools don't need
> > anything more than an exit code and/or one message on stderr. In the
> > case of ext2/3/4 it is more about resize2fs and tune2fs, where the
> > JSON would be much more useful, than fsck.
> 
> I'm not sure what sort of use cases you have in mind where structured
> output would be useful.
> 
> For mke2fs or resize2fs, in general all you care about is "did the
> operation succeed", right?  What did you have in mind where more
> information than "the operation was successful" would be needed?
> 
> For tune2fs, what options or output in tune2fs are sufficiently file
> system independent that you think it would be worth exporting to your
> infrastructure?
> 
> > The generic way would be along the lines: These ten common and
> > frequently used fields are generic and work everywhere, anything else
> > has a prefix (ext4-, xfs-, btrfs-, ...) or is inside of a fs-specific
> > list of extensions "ext4":{"some-option":value}. And during the
> > parsing, all the fields would be mapped to some mkfs arguments,
> > usually 1:1. Similar to what could be done with output, like putting
> > volume identifier into a specific field (e.g. "fsid") no matter what
> > the filesystem is.
> 
> It might be useful if you could give some "user stories" that explain
> at a high level what the user might want to do with the ultimate
> user-visible interface that would require this kind of precision.

Yes, please.

> Most users don't know how to use the specialized options to
> mkfs.<FSTYP> and to be honest, most don't need to.  The way I've dealt

To reinforce that point, in XFS land we tell people to take the defaults
unless they have verified that a specific problem of theirs is fixed by
changing some tuning parameter (and doesn't cause other problems).  Many
of the user complaints I see on the (xfs) list are a result of people
engaging in copy-pasta without understanding the tradeoffs they're
implicitly accepting.

Once users get to the point of having verified their workloads with
non-default options, they often encode a mkfs.xfs command line into
their deployment scripts.  So while I can be convinced that there's
value in a(nother) EVMS-like thing encapsulating (generic) mkfs, I'm not
currently persuaded that there's much point in adapting mkfs.$FSTYP to
accept a bunch of geometry/feature options via json.

(This probably changes for things like btrfs that also manage volumes.)

> with this for mke2fs.conf is that when someone has come up with a
> specialized recipe for a unique sort of file system type --- say,
> maybe for a Lustre Metadata server, or the back-end storage for some
> kind of cluster file system like Hadoopfs, or specialized options for
> an Android phone --- someone with wizard-level skills will edit
> /etc/mke2fs.conf, with perhaps something like this:
> 
>     smr-host-managed = {
>         features = extent,huge_file,bigalloc,flex_bg,uninit_bg,dir_nlink,extra_isize,^resize_inode,sparse_super2
>         cluster_size = 32768
>         hash_alg = half_md4
>         reserved_ratio = 0.0
>         num_backup_sb = 0
>         packed_meta_blocks = 1
>         make_hugefiles = 1
>         inode_ratio = 4194304
>         hugefiles_dir = /smr
>         hugefiles_name = smr-file
>         hugefiles_digits = 0
>         hugefiles_size = 0
>         hugefiles_align = 256M
>         hugefiles_align_disk = true
>         num_hugefiles = 1
>         zero_hugefiles = false
>         flex_bg_size = 262144
>     }
> 
> ... and then all the user will have to do is run "mke2fs -t ext4 -T
> smr-host-managed /dev/sdXX".  (The intended use case for this is to
> support an SMR-aware user-space application that was going to be
> managing the host-managed SMR zones directly.)
> 
> I suspect telling the user that they have to type a whole series of
> parameters of the form:
> 
> 	"ext4":{"some-option":value}
> 
> into a GUI would not be a particularly user-friendly suggestion.  :-)
> 
> The other question I'd ask is how many people really are going to want
> to use your infrastructure?  Is it only going to be for the "point and
> click" users who will want a simplified interface?  Are you trying to
> make something that will be useful for advanced/expert users?  What
> value are you going to be able to add that will convince the
> advanced/expert users that they should learn your new
> "fstype":{"some-option":value} syntax when typing at the command line
> will probably be ten times faster, easier, and less rage-inducing than
> trying to reverse-engineer out some interface that was designed not to
> scare the civilians?  (There's a reason why many drivers prefer manual
> to automatic transmission on their cars.  :-)

I doubt Jan is proposing to eliminate ye olde getopt arguments --
presumably the GUI (or whatever) will generate the json code and the
library stuffs it into mkfs?

Or to put it another way, I would not accept a mkfs.xfs patch ripping
out the classic getopt options, and I suspect Eric Sandeen wouldn't
either.

> > Thanks for pointing out EVMS, I will see what I can learn from that
> > attempt. Starting with screenscraping is certainly an option and might
> > be the only viable one. This raises some other questions, though:
> > given the temporality of the wrapper, I would rather use other
> > languages than C/Bash (e.g. Python) to simplify and speed up the
> > development. But I have doubts about whether you would be willing to
> > adopt this into e2fsprogs, which would, in turn, reduce the usability
> > of this approach.
> 
> The reason why you might want to consider C is because:
> 
>    * It allows the plugin to be imported into many different
>      programming languages: Python, Go, Perl, etc., via using
>      something like SWIG.
> 
>    * Different file system maintainers will be willing to accept
>      maintenance of your plugin at different times.  For some file
>      systems, you may have to wrap the command-line tools forever; for
>      one thing the file system may no longer be under active
>      maintenance (ex: iso9660) but you still might want to use it
>      in your GUI interface.  Other file system developers will be
>      willing to take over the plugin and support it as a native part
>      of their file system tools more quickly.
> 
>    * The first operations that you might want to make native
>      instead of being screen scraped (getting the file system size,
>      the amount of free space, etc.)  are things which are most easily
>      done in C.

Weird ioctls and other syscalls aren't so hard in python.  I've written
a clumsy GETFSMAP wrapper:

https://github.com/djwong/filemapper/blob/master/getfsmap.py

But you're correct to point out that wrapping libext2fs in Python would
be difficult, and even more invasive if you took the interesting parts
of, say, tune2fs and put them into a separate library and made tune2fs
merely an arguments-parsing wrapper around the separate library.

I'd occasionally thought about wrapping libext2fs in Python and
concluded that it was too much work for not enough gain, and Ted would
likely NAK it anyway.
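
For the basic numbers (file system size, free space) on a mounted file
system, Python does not even need an ioctl.  A minimal sketch using
os.statvfs from the standard library; the function name and the
dictionary keys are hypothetical, and this only works for mounted file
systems.

```python
import os


def filesystem_usage(mount_point):
    """Return basic size and usage figures for a mounted file system."""
    st = os.statvfs(mount_point)
    return {
        "block_size": st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_bytes": st.f_bfree * st.f_frsize,
        "total_inodes": st.f_files,
        "free_inodes": st.f_ffree,
    }


usage = filesystem_usage("/")
```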

>      So if you want to ask the file system developers to take over the
>      plugin, they are much more likely to be willing to say yes if the
>      plugin is already in C, as opposed to asking them to take over
>      some Python class where trying to integrate python code to call
>      into libext2fs is going to be a pain in the ass.  For that
>      matter, you might want to implement the plugins to call libext2fs
>      and libxfs directly for those basic functions.  That's what the
>      EVMS developers did, and those interfaces in libext2fs are
>      guaranteed to have ABI and API stability.  Anyway, if your goal
>      is to convince file system developers to eventually take
>      ownership of the interface/plugin module, it will be much easier
>      to do that if it is in C --- trust me on that.

Yeah, probably. :)

I don't think we'll ever have A[BP]I stability in "libxfs".

For that matter, libxfs isn't even a proper library; we statically link
all the object files directly into the tools.  But then Jan is (so far)
only asking to wrap the tools, not reimplement them too.

>    * I don't think it's going to be that hard to use C; as I've said,
>      I really disbelieve that there are that many places where you
>      need to screenscrape.  Most of what you will probably need to do
>      is to return the exit status of mke2fs, fsck, resize2fs, etc.
>      Those programs that you do need to screen scrape will have
>      outputs similar to dumpe2fs, which is stupid-easy to parse, and
>      are also, as I've noted above, the simplest thing to move to
>      being done in native code calling the file system's C library.
> 
> Oh, one thing.  I'll note that e2fsprogs has progress bar support
> already, and it was designed so it could be easily integrated into a
> GUI.  As far as I know Ubuntu was the only distro that used it ---
> progress bars tend not to be high on most distros' product manager's
> feature priority lists --- but it's there.  See e2fsck's -C option.
> This support was also plumbed into fsck (see its -C option), so I
> designed it to be something that other file systems could implement.
> 
> Also, we're Linux systems programmers, not Web developers writing
> Javascript; why use JSON and require fancy parsing when you can just
> isolate the completion information onto a separate file descriptor?  I

I imagine that json makes it easier to export structured information
than a basic flat file, but otoh my experience with wrapped filesystem
tools is that the good tools report nonzero error status and point you
at the full text logs of what happened.  The bad ones, of course, don't
check the status and don't record the output.

> don't know how big and complex your JSON parsing library is, but all
> that was needed to parse *my* completion information is the single
> line of C code:
> 
> 	fscanf(progress_f, "%d %lu %lu %ms\n", &pass, &cur, &max, &text).

FWIW I've been building the same into xfs_scrub if anyone cares... :)

--D

> 
> Cheers,
> 
> 						- Ted


* Re: [proposal] making filesystem tools more machine friendly
  2017-07-03 17:37       ` Darrick J. Wong
@ 2017-07-04 13:57         ` Jan Tulak
  2017-07-04 19:07           ` Theodore Ts'o
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Tulak @ 2017-07-04 13:57 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Theodore Ts'o, linux-fsdevel

Reply to both Darrick and Ted.

On Mon, Jul 3, 2017 at 7:37 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> On Mon, Jul 03, 2017 at 11:07:22AM -0400, Theodore Ts'o wrote:
>> On Mon, Jul 03, 2017 at 01:52:49PM +0200, Jan Tulak wrote:

>>
>> I'm not sure what sort of use cases you have in mind where structured
>> output would be useful.
>>
>> For mke2fs or resize2fs, in general all you care about is "did the
>> operation succeed", right?  What did you have in mind where more
>> information than "the operation was successful" would be needed?
>>
>> For tune2fs, what options or output in tune2fs are sufficiently file
>> system independent that you think it would be worth exporting to your
>> infrastructure?

I think I muddied the water with the fs-independent fields... :-/ The
main thing I'm trying to achieve is to ditch the separate syntax
parsers for every tool and fs and instead have only one format, one
parser, and the difference between filesystems and tools is only in
the data produced by the parser. If some fields in the data are going
to be common, it is nice, but that's just a detail.

There are some similar values between filesystems, but they are really
basic ones - blocks, inodes, and maybe a few other things, so I
admit the usability of common names for these fields is questionable.
What I really want is that I can run tune2fs or xfs_info, pass the
output through a standard, common parser, and get a structure filled
with data.
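
To make the idea concrete, here is a minimal Python sketch of the
consumer side.  The JSON documents and every field name in them are
hypothetical: no current tool is claimed to emit this format.  The
point is only that one parser handles both filesystems and the
difference is in the data.

```python
import json

# Hypothetical tool output; field names are assumptions for illustration
SAMPLE_XFS = (
    '{"fstype": "xfs", "block_size": 4096, "blocks": 2621440,'
    ' "xfs": {"agcount": 4}}'
)
SAMPLE_EXT4 = (
    '{"fstype": "ext4", "block_size": 4096, "blocks": 1310720,'
    ' "ext4": {"features": ["extent", "flex_bg"]}}'
)


def load_fs_info(text):
    """Split parsed output into common fields and fs-specific fields."""
    info = json.loads(text)
    fstype = info["fstype"]
    # Everything except the per-filesystem section counts as common
    common = {k: v for k, v in info.items() if k != fstype}
    specific = info.get(fstype, {})
    return common, specific


for sample in (SAMPLE_XFS, SAMPLE_EXT4):
    common, specific = load_fs_info(sample)
    size_bytes = common["blocks"] * common["block_size"]
```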

>>
>> > The generic way would be along the lines: These ten common and
>> > frequently used fields are generic and work everywhere, anything else
>> > has a prefix (ext4-, xfs-, btrfs-, ...) or is inside of a fs-specific
>> > list of extensions "ext4":{"some-option":value}. And during the
>> > parsing, all the fields would be mapped to some mkfs arguments,
>> > usually 1:1. Similar to what could be done with output, like putting
>> > volume identifier into a specific field (e.g. "fsid") no matter what
>> > the filesystem is.
>>
>> It might be useful if you could give some "user stories" that explain
>> at a high level what the user might want to do with the ultimate
>> user-visible interface that would require this kind of precision.
>
> Yes, please.

I intend it for more automated or abstract storage management. One
where the user doesn't really care about specific values in any given
filesystem, but an intermediate layer can read and configure them.
Perhaps similar to the "mke2fs -t ext4 -T smr-host-managed /dev/sdXX"
Ted mentioned below.

It is either for tools that group multiple subsystems under one hood,
abstracting most of the differences between the various filesystems
while still letting the user design the storage (e.g. System Storage
Manager - SSM), or for strongly automated solutions, where the user
just throws some drives into a pool, configures intended use and maybe
a few other things like redundancy, and the rest happens in the
background - LVM, fs formatting, raid... (e.g. the new project
stratis-storage [1]). But GUIs like YaST or Anaconda will certainly
benefit from it as well.

[1] https://github.com/stratis-storage/stratisd

>
>> Most users don't know how to use the specialized options to
>> mkfs.<FSTYP> and to be honest, most don't need to.  The way I've dealt
>
> To reinforce that point, in XFS land we tell people to take the defaults
> unless they have verified that a specific problem of theirs is fixed by
> changing some tuning parameter (and doesn't cause other problems).  Many
> of the user complaints I see on the (xfs) list are a result of people
> engaging in copy-pasta without understanding the tradeoffs they're
> implicitly accepting.
>
> Once users get to the point of having verified their workloads with
> non-default options, they often encode a mkfs.xfs command line into
> their deployment scripts.  So while I can be convinced that there's
> value in a(nother) EVMS-like thing encapsulating (generic) mkfs, I'm not
> currently persuaded that there's much point in adapting mkfs.$FSTYP to
> accept a bunch of geometry/feature options via json.
>
> (This probably changes for things like btrfs that also manage volumes.)

See above, the goal is not to give an end user a way to fiddle
with configuration, but rather to simplify the work for developers of
automated tools, so they can tune the filesystem for specific
workloads without having to care about syntax details that are
different between tools and filesystems.

>
>> with this for mke2fs.conf is that when someone has come up with a
>> specialized recipe for a unique sort of file system type --- say,
>> maybe for a Lustre Metadata server, or the back-end storage for some
>> kind of cluster file system like Hadoopfs, or specialized options for
>> an Android phone --- someone with wizard-level skills will edit
>> /etc/mke2fs.conf, with perhaps something like this:
>>
<snip>
>>
>> ... and then all the user will have to do is run "mke2fs -t ext4 -T
>> smr-host-managed /dev/sdXX".  (The intended use case for this is to
>> support an SMR-aware user-space application that was going to be
>> managing the host-managed SMR zones directly.)
>>
>> I suspect telling the user that they have to type a whole series of
>> parameters of the form:
>>
>>       "ext4":{"some-option":value}
>>
>> into a GUI would not be a particularly user-friendly suggestion.  :-)
>>
>> The other question I'd ask is how many people really are going to want
>> to use your infrastructure?  Is it only going to be for the "point and
>> click" users who will want a simplified interface?  Are you trying to
>> make something that will be useful for advanced/expert users?  What
>> value are you going to be able to add that will convince the
>> advanced/expert users that they should learn your new
>> "fstype":{"some-option":value} syntax when typing at the command line
>> will probably be ten times faster, easier, and less rage-inducing than
>> trying to reverse-engineer out some interface that was designed not to
>> scare the civilians?  (There's a reason why many drivers prefer manual
>> to automatic transmission on their cars.  :-)
>
> I doubt Jan is proposing to eliminate ye olde getopt arguments --
> presumably the GUI (or whatever) will generate the json code and the
> library stuffs it into mkfs?
>
> Or to put it another way, I would not accept a mkfs.xfs patch ripping
> out the classic getopt options, and I suspect Eric Sandeen wouldn't
> either.

The old getopt is going to stay and anyone using mkfs.<FS> directly
will see everything the same way as it is now. :-) The json way is
only to simplify "fs tool <---> abstracting tool" data exchange. Which
answers the question "who will use it": the json will be seen only by
developers or advanced users who are writing scripts, and who want
better compatibility between versions and across filesystems. But it
shouldn't be used in lieu of current getopts anywhere else. The json
input would probably do an internal transformation from json to getopt
anyway unless there is a useful internal infrastructure where the
getopt part could be skipped.
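
A minimal Python sketch of such a json-to-getopt transformation, for
mke2fs only.  The input field names ("block_size", "label", the "ext4"
section) and the mapping table are assumptions made for illustration;
the -b, -L and -O options they map to are existing mke2fs options.

```python
import json

# Hypothetical generic field names mapped to existing mke2fs options
GENERIC_TO_MKE2FS = {"block_size": "-b", "label": "-L"}


def json_to_mke2fs_argv(text, device):
    """Translate a JSON description into a classic mke2fs command line."""
    spec = json.loads(text)
    argv = ["mke2fs", "-t", "ext4"]
    for field, flag in GENERIC_TO_MKE2FS.items():
        if field in spec:
            argv += [flag, str(spec[field])]
    # Filesystem-specific section, passed through as a feature list
    features = spec.get("ext4", {}).get("features")
    if features:
        argv += ["-O", ",".join(features)]
    argv.append(device)
    return argv


request = '{"block_size": 4096, "label": "data", "ext4": {"features": ["extent", "^resize_inode"]}}'
argv = json_to_mke2fs_argv(request, "/dev/sdXX")
```

Anyone calling mke2fs directly is unaffected; the getopt interface
remains the only thing the tool itself has to understand.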

>>
>> The reason why you might want to consider C is because:
>>
>>    * It allows the plugin to be imported into many different
>>      programming languages: Python, Go, Perl, etc., via using
>>      something like SWIG.
>>
>>    * Different file system maintainers will be willing to accept
>>      maintenance of your plugin at different times.  For some file
>>      systems, you may have to wrap the command-line tools forever; for
>>      one thing the file system may no longer be under active
>>      maintenance (ex: iso9660) but you still might want to use it
>>      in your GUI interface.  Other file system developers will be
>>      willing to take over the plugin and support it as a native part
>>      of their file system tools more quickly.
>>
>>    * The first operations that you might want to make native
>>      instead of being screen scraped (getting the file system size,
>>      the amount of free space, etc.)  are things which are most easily
>>      done in C.
>
> Weird ioctls and other syscalls aren't so hard in python.  I've written
> a clumsy GETFSMAP wrapper:
>
> https://github.com/djwong/filemapper/blob/master/getfsmap.py
>
> But you're correct to point out that wrapping libext2fs in Python would
> be difficult, and even more invasive if you took the interesting parts
> of, say, tune2fs and put them into a separate library and made tune2fs
> merely an arguments-parsing wrapper around the separate library.
>
> I'd occasionally thought about wrapping libext2fs in Python and
> concluded that it was too much work for not enough gain, and Ted would
> likely NAK it anyway.

Understood, anything other than C won't fly. :-)

>
>>      So if you want to ask the file system developers to take over the
>>      plugin, they are much more likely to be willing to say yes if the
>>      plugin is already in C, as opposed to asking them to take over
>>      some Python class where trying to integrate python code to call
>>      into libext2fs is going to be a pain in the ass.  For that
>>      matter, you might want to implement the plugins to call libext2fs
>>      and libxfs directly for those basic functions.  That's what the
>>      EVMS developers did, and those interfaces in libext2fs are
>>      guaranteed to have ABI and API stability.  Anyway, if your goal
>>      is to convince file system developers to eventually take
>>      ownership of the interface/plugin module, it will be much easier
>>      to do that if it is in C --- trust me on that.
>
> Yeah, probably. :)
>
> I don't think we'll ever have A[BP]I stability in "libxfs".
>
> For that matter, libxfs isn't even a proper library; we statically link
> all the object files directly into the tools.  But then Jan is (so far)
> only asking to wrap the tools, not reimplement them too.

Yes. As I mentioned at the end of the very first email, I think that
simply adding a structured output (and possibly input) to the tools
will help the most. All the other stuff about providing calls as a
library is something I consider a far future if it happens at all.

>
>>    * I don't think it's going to be that hard to use C; as I've said,
>>      I really disbelieve that there are that many places where you
>>      need to screenscrape.  Most of what you will probably need to do
>>      is to return the exit status of mke2fs, fsck, resize2fs, etc.
>>      Those programs that you do need to screen scrape will have
>>      outputs similar to dumpe2fs, which is stupid-easy to parse, and
>>      are also, as I've noted above, the simplest thing to move to
>>      being done in native code calling the file system's C library.
>>
>> Oh, one thing.  I'll note that e2fsprogs has progress bar support
>> already, and it was designed so it could be easily integrated into a
>> GUI.  As far as I know Ubuntu was the only distro that used it ---
>> progress bars tend not to be high on most distros' product manager's
>> feature priority lists --- but it's there.  See e2fsck's -C option.
>> This support was also plumbed into fsck (see its -C option), so I
>> designed it to be something that other file systems could implement.
>>
>> Also, we're Linux systems programmers, not Web developers writing
>> Javascript; why use JSON and require fancy parsing when you can just
>> isolate the completion information onto a separate file descriptor?  I
>
> I imagine that json makes it easier to export structured information
> than a basic flat file, but otoh my experience with wrapped filesystem
> tools is that the good tools report nonzero error status and point you
> at the full text logs of what happened.  The bad ones, of course, don't
> check the status and don't record the output.
>
>> don't know how big and complex your JSON parsing library is, but all
>> that was needed to parse *my* completion information is the single
>> line of C code:
>>
>>       fscanf(progress_f, "%d %lu %lu %ms\n", &pass, &cur, &max, &text).
>
> FWIW I've been building the same into xfs_scrub if anyone cares... :)
>

A return code and a log file are not going to help much when I need to
know the geometry of a volume. I suggested adding json also to the
simple cases (where return code would be enough) only because this
change should be almost for free (the infrastructure will be there
anyway) and it will add the same output format as the more verbose
tools have. That being said, it doesn't have to be json. Another
format might work as well. The advantage of json is that it already
solved all the special cases that may happen, supports nested
structures and is already used by other storage-related tools. If
there should be a single format, it needs to handle snapshots and
subvolumes, etc, even if not every tool will use its whole power.


And by the way, I'm going to be off until Monday. The next two days
(Wed/Thu) are public holidays here in CZ and then it is almost a
weekend, so I will be off the grid for most of the next five days.

Cheers,
Jan

-- 
Jan Tulak
jtulak@redhat.com / jan@tulak.me


* Re: [proposal] making filesystem tools more machine friendly
  2017-07-04 13:57         ` Jan Tulak
@ 2017-07-04 19:07           ` Theodore Ts'o
  2017-07-12  8:42             ` Jan Tulak
  0 siblings, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2017-07-04 19:07 UTC (permalink / raw)
  To: Jan Tulak; +Cc: Darrick J. Wong, linux-fsdevel

On Tue, Jul 04, 2017 at 03:57:38PM +0200, Jan Tulak wrote:
> 
> There are some similar values between filesystems, but they are really
> basic ones - blocks, inodes, and maybe a few other things, so I
> admit the usability of common names for these fields is questionable.
> What I really want is that I can run tune2fs or xfs_info, pass the
> output through a standard, common parser, and get a structure filled
> with data.

Can you make a list of what you might need?  The reason why I was
asking for "user stories" is because I was trying to get you to think
in more detail about exactly what functionality you really need.  Not
just something hand-wavy like "tools that want to abstract most of the
differences between various file systems", but exactly which file
system operations.  (e.g., create a file system, resize a file system,
check a file system --- what else?)

I think it would help to make a list of the functionality that you
need to abstract.  Thinking about "user stories" is a common way UI
designers try to get developers to think in terms of "what does the
user want to do", as opposed to the developer-centric "oh, I want an
abstract file system interface" answer.

In terms of what you might need, I'm guessing it won't be much more
than:

* Blocksize
* Number of blocks in the file system
* Number of free blocks in the file system
* Number of inodes in the file system (*)
* Number of free inodes in the file system (*)
* Volume label (**)
* UUID (**)

(*) It's not clear you even need the number of inodes / free inodes.
Try going through potential "user stories" and see what you find.

(**) Already available via libblkid and the blkid CLI.
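
For the label and UUID, a minimal Python sketch that goes through the
blkid CLI.  It assumes the KEY=value lines produced by "blkid -o
export"; the sample text and its UUID are illustrative, the function
names are hypothetical, and any escaping blkid applies to unusual
characters is not handled here.

```python
import subprocess


def parse_blkid_export(text):
    """Parse KEY=value lines into a dict."""
    tags = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            tags[key] = value
    return tags


def probe_device(device):
    """Ask blkid about one device; raises CalledProcessError on failure."""
    out = subprocess.run(
        ["blkid", "-o", "export", device],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_blkid_export(out)


# Illustrative sample, not captured from a real device
SAMPLE_EXPORT = (
    "DEVNAME=/dev/sdXX\n"
    "LABEL=data\n"
    "UUID=01234567-89ab-cdef-0123-456789abcdef\n"
    "TYPE=ext4\n"
)
tags = parse_blkid_export(SAMPLE_EXPORT)
```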

I suspect you may be making this way more complicated than it needs to
be.  These fields are pretty basic, and haven't changed in decades.  I'm
pretty sure the original EVMS screen-scraping code that parsed the
output of "dumpe2fs -h" from 10+ years ago would still work today.  So
when you assert that the screen scraping is terrible because the tools
are constantly changing the output --- I'd like to gently challenge
that assertion.

These are also the fields which are *trivially* easy to extract via a
C interface, and which again, even if they were done without the
support of the file system developers would almost certainly be
stable.  The volume ID and UUID are extracted from the superblock of a
large number of file systems in libblkid, and most of the probe code
involved hasn't changed in decades.  Changes were only to support new
file systems, not because file systems are constantly changing the
superblock encoding for where to find the Volume label or UUID.  I
assure you the same would be true for the fields listed above.

Doing something like libblkid's probe functions for the number of
blocks/inodes will almost certainly be less work than trying to get
JSON patches crammed into a large number of file system tools.  If you
did it in a standalone, libblkid-like library, it would also have the
benefit that you could query a file system for its basic statistics
even if the file system's userspace tools have not been installed in
the distribution.

So perhaps that's something you should consider?
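
As a rough illustration of the libblkid-style probe suggested above, here
is a minimal sketch that reads the basic statistics straight out of an
ext2/3/4 superblock. The function name probe_ext_stats and the returned
dictionary keys are made up for this example and are not part of any
existing library. It only reads the low 32 bits of the block counters, so
it is an assumption that the file system does not need the 64-bit fields,
and on a mounted file system the on-disk free counters may lag behind.

```python
import struct
import uuid

EXT_SUPERBLOCK_OFFSET = 1024   # the ext2/3/4 superblock starts 1024 bytes in
EXT_MAGIC = 0xEF53             # value of s_magic for ext2/ext3/ext4


def probe_ext_stats(image: bytes):
    """Return basic ext2/3/4 statistics, or None if the magic does not match.

    probe_ext_stats is a hypothetical helper written for this example.
    """
    sb = image[EXT_SUPERBLOCK_OFFSET:EXT_SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 136:
        return None
    (magic,) = struct.unpack_from("<H", sb, 56)           # s_magic
    if magic != EXT_MAGIC:
        return None
    # s_inodes_count, s_blocks_count_lo, s_r_blocks_count_lo,
    # s_free_blocks_count_lo, s_free_inodes_count (all little-endian u32)
    inodes, blocks, _reserved, free_blocks, free_inodes = struct.unpack_from(
        "<5I", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)  # s_log_block_size
    return {
        "block_size": 1024 << log_block_size,
        "blocks": blocks,
        "free_blocks": free_blocks,
        "inodes": inodes,
        "free_inodes": free_inodes,
        "uuid": str(uuid.UUID(bytes=sb[104:120])),        # s_uuid
        "label": sb[120:136].split(b"\0", 1)[0].decode("utf-8", "replace"),
    }
```

A caller would pass in the first couple of kilobytes read from the block
device or image file. Supporting another file system would mean adding
another small probe function of the same shape.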

							- Ted

* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
                   ` (2 preceding siblings ...)
  2017-06-30 15:29 ` Theodore Ts'o
@ 2017-07-05 18:11 ` Christoph Hellwig
  2017-07-12 13:00   ` Jan Tulak
  2017-07-12 17:10 ` Richard W.M. Jones
  2017-11-27 14:57 ` Andrew Price
  5 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2017-07-05 18:11 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel, linux-btrfs, linux-ext4, linux-xfs

Hi Jan,

I really like the idea of JSON output for fs/block tools.

nvme-cli has been doing it for a while, and that has been very
useful.  It comes with its own little hand-rolled JSON output
helpers:

	https://github.com/linux-nvme/nvme-cli/blob/master/json.c

so it doesn't introduce any new dependencies.  If you want to parse
input as well, a library might be a better approach, though.

I really don't like the shared-object EVMS model mentioned in the
thread, as that creates hard ELF dependencies.

* Re: [proposal] making filesystem tools more machine friendly
  2017-07-04 19:07           ` Theodore Ts'o
@ 2017-07-12  8:42             ` Jan Tulak
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Tulak @ 2017-07-12  8:42 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Darrick J. Wong, linux-fsdevel

On Tue, Jul 4, 2017 at 9:07 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> In terms of what you might need, I'm guessing it won't be much more
> than:
>
> * Blocksize
> * Number of blocks in the file system
> * Number of free blocks in the file system
> * Number of inodes in the file system (*)
> * Number of free inodes in the file system (*)
> * Volume label (**)
> * UUID (**)
>
> (*) It's not clear you even need the number of inodes / free inodes.
> Try going through potential "user stories" and see what you find.
>
> (**) Already available via libblkid and the blkid CLI.


For ext4/xfs, that's almost all. But it gets worse with btrfs, as you
can see in this list:
 * create/resize/remove/check a filesystem on a volume
 * get basic stats (label, uuid, size, free space, blocksize)
 * set/edit/unset quotas (disk, user, group, project)
 * add/remove device from a pool
 * create/delete/mount subvolumes and snapshots in a pool

The reason why I include the pool and subvolume capabilities of btrfs
(and possibly zfs) is that other filesystems can achieve similar
results with LVM and the tools I mentioned are bridging and hiding
this difference. So, for ext4/xfs, I need only to add the fields
regarding quotas to the list and that should be all I can think of
now. Btrfs is another issue, though.


>
> I suspect you may be making this way more complicated than it needs to
> be.  These fields are pretty basic, and haven't changed in decades.  I'm
> pretty sure the original EVMS screen-scraping code that parsed the
> output of "dumpe2fs -h" from 10+ years ago would still work today.  So
> when you assert that the screen scraping is terrible because the tools
> are constantly changing the output --- I'd like to gently challenge
> that assertion.

OK, I exaggerated how much it changes. Still, even a new field can
cause trouble (depending on how the parser is written), which is an
unnecessary issue.
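
To make the "depending on how the parser is written" point concrete, here
is a small sketch comparing a positional parser with a key-based one. The
two sample texts are hand-written imitations of a "Key: value" style
report, not captured tool output, and "Some new field" is invented purely
to show what an added line does to each parser.

```python
SAMPLE_OLD = """\
Filesystem volume name:   root
Block count:              262144
Free blocks:              120000
Block size:               4096
"""

# Same report, with one hypothetical extra line added near the top.
SAMPLE_NEW = """\
Filesystem volume name:   root
Some new field:           whatever
Block count:              262144
Free blocks:              120000
Block size:               4096
"""


def parse_by_position(text):
    """Fragile: assumes the block count is always on the second line."""
    lines = text.splitlines()
    return int(lines[1].split(":", 1)[1])


def parse_by_key(text):
    """Sturdier: split each line at its first colon and look up by name."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

The positional parser works on the old text and raises ValueError on the
new one, while the key-based parser returns the same block count for
both. A key-based parser still breaks if a field is renamed, which is
where a documented structured format would help.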

>
> These are also the fields which are *trivially* easy to extract via a
> C interface, and which again, even if they were done without the
> support of the file system developers would almost certainly be
> stable.  The volume label and UUID are extracted from the superblock of a
> large number of file systems in libblkid, and most of the probe code
> involved hasn't changed in decades.  Changes were only to support new
> file systems, not because file systems are constantly changing the
> superblock encoding for where to find the Volume label or UUID.  I
> assure you the same would be true for the fields listed above.
>
> Doing something like libblkid's probe functions for the number of
> block/inodes will almost certainly be less work than trying to get
> JSON patches crammed into a large number of file system tools.  If you
> did it in a standalone, libblkid-like library, it would also have the
> benefit that you could query a file system for its basic statistics
> even if the file system's userspace tools have not been installed in
> the distribution.
>
> So perhaps that's something you should consider?
>
>                                                         - Ted

An independent tool is a good idea at the beginning, making it
accessible to more people (who don't have to rebuild *progs), but we
would still have another layer for functionality that could be
implemented directly within the fs tools. While I may go this way
initially, I would still consider a merge as a goal.

I think that this way might be reasonable and acceptable for everyone:
Start with the wrappers, first for tools where the situation is the
worst, and meanwhile make some small changes in *progs which are useful
for everyone - e.g. ensure that every tool can be run
non-interactively when required (some time ago, there were issues with
Anaconda getting stuck when a fs tool waited for a confirmation, but I
can't find the bugs right now to check the details and whether the
tools were fixed or Anaconda added a workaround). And when the benefit
of the new interface is clear and there are some users, we can talk
about how to merge it.
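
A wrapper of the kind described above could start as little more than a
table that hides per-filesystem flag differences, such as the label
option (mkfs.xfs -L versus mkfs.vfat -n). The sketch below only builds
the argument list and does not run anything. The function name
mkfs_command and the table LABEL_FLAG are invented for this example, and
the set of supported file systems is an arbitrary assumption.

```python
# Label flag per mkfs variant; the selection of file systems is illustrative.
LABEL_FLAG = {
    "xfs": "-L",
    "ext4": "-L",
    "vfat": "-n",
}


def mkfs_command(fstype, device, label=None):
    """Build the argv for creating a file system (hypothetical helper)."""
    if fstype not in LABEL_FLAG:
        raise ValueError(f"unsupported filesystem type: {fstype}")
    argv = [f"mkfs.{fstype}"]
    if label is not None:
        argv += [LABEL_FLAG[fstype], label]
    argv.append(device)
    return argv
```

For example, mkfs_command("vfat", "/dev/sdX1", "boot") returns
["mkfs.vfat", "-n", "boot", "/dev/sdX1"]. Actually running the command,
and making sure it never waits for a confirmation, is left out here on
purpose.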

Cheers,
Jan

-- 
Jan Tulak
jtulak@redhat.com / jan@tulak.me

* Re: [proposal] making filesystem tools more machine friendly
  2017-07-05 18:11 ` Christoph Hellwig
@ 2017-07-12 13:00   ` Jan Tulak
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Tulak @ 2017-07-12 13:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel

On Wed, Jul 5, 2017 at 8:11 PM, Christoph Hellwig <hch@infradead.org> wrote:
> Hi Jan,
>
> I really like the idea of JSON output for fs/block tools.
>
> nvme-cli has been doing it for a while, and that has been very
> useful.  It comes with its own little hand-rolled JSON output
> helpers:
>
>         https://github.com/linux-nvme/nvme-cli/blob/master/json.c

Thanks for the link. :-)

Jan

-- 
Jan Tulak
jtulak@redhat.com / jan@tulak.me

* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
                   ` (3 preceding siblings ...)
  2017-07-05 18:11 ` Christoph Hellwig
@ 2017-07-12 17:10 ` Richard W.M. Jones
  2017-11-27 14:57 ` Andrew Price
  5 siblings, 0 replies; 17+ messages in thread
From: Richard W.M. Jones @ 2017-07-12 17:10 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel, linux-btrfs, linux-ext4, linux-xfs


libguestfs could really use structured output from more of the command
line tools.  Particularly:

 - all the ext4 tools
 - all the xfs tools
 - all the btrfs tools
 - parted

and more.  See also:

  https://github.com/libguestfs/libguestfs/tree/master/daemon

A dbus service would not be useful.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

* Re: [proposal] making filesystem tools more machine friendly
  2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
                   ` (4 preceding siblings ...)
  2017-07-12 17:10 ` Richard W.M. Jones
@ 2017-11-27 14:57 ` Andrew Price
  2017-11-27 15:38   ` Jan Tulak
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Price @ 2017-11-27 14:57 UTC (permalink / raw)
  To: Jan Tulak, linux-fsdevel

Hi,

On 30/06/17 09:17, Jan Tulak wrote:
> AKA filesystem API
> 
> Hi guys
> 
> Currently, filesystem tools are not made with automation in mind. So
> any tool that wants to interact with filesystems (be it for
> automation, or to provide a more user-friendly interface) has to
> screen scrape everything and cope with changing outputs.
> 
> I think it is the time to focus some thoughts on how to make the fs
> tools easier to be used by scripts and other tools. 

What's the status of this? I'd like to make sure gfs2-utils is geared up 
for it and catered for, whatever solution is chosen.

Perhaps this ship has already sailed, but, I think a json dependency may 
be a little too heavy, and perhaps a simpler stream of key-value lists 
that can be generated with fprintf() would suffice? For accepting 
options, we already have code to parse that sort of thing as we handle 
"foo=bar,baz=42" style strings for extended options so it shouldn't take 
much work to repurpose that. That said, I don't think passing options to 
tools in this way holds much value over specifying them in argv.
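
For reference, handling the "foo=bar,baz=42" style mentioned above takes
only a few lines. This is a sketch and not the gfs2-utils code: the
function names parse_kv and format_kv are made up, values stay as
strings, a bare key is treated as a boolean flag, and there is no
escaping, so values containing "," or "=" are assumed not to occur.

```python
def parse_kv(text):
    """Parse 'foo=bar,baz=42' into a dict of strings (hypothetical helper)."""
    result = {}
    for item in text.split(","):
        if not item:
            continue                      # tolerate empty items like 'a=1,,b=2'
        key, sep, value = item.partition("=")
        result[key] = value if sep else True   # bare key acts as a flag
    return result


def format_kv(record):
    """Inverse of parse_kv for records whose values print as plain text."""
    return ",".join(f"{key}={value}" for key, value in record.items())
```

parse_kv("foo=bar,baz=42") gives {"foo": "bar", "baz": "42"}, and
format_kv turns that back into the original string.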

Willing to budge for consensus, though.

Andy

> Now, to ease you,
> the answer to the obvious question "who will do it" is "me". I don't
> want to force you into anything, though, so I'm opening this
> discussion pretty early with some ideas and I hope to hear from you
> what you think about it before anything is set in stone. (For
> those who visited Vault this year, Justin Mitchell and I had a talk
> about this, codename Springfield.)
> 
> The following text attempts to identify issues with using
> filesystems-related tools in scripts/applications and proposes a
> solution to those issues.
> 
> Content:
> 1. A quick introduction
> 2. Details of the issues
> 3. Proposed Solutions
> 4. Conclusion
> 
> 1. A quick introduction
> =================
> 
> I discussed this topic with people who are building something around
> fs tools. For example, the developer of libblockdev (Vratislav
> Podzimek, https://github.com/vpodzime/libblockdev) or system storage
> manager (was Lukas Czerner, now it is me,
> https://sourceforge.net/projects/storagemanager/), and the listed
> issues are a product of experience from working with those tools. The
> issues are related mostly to basic operations like mkfs, fsck,
> snapshots and resizing. Advanced/debugging tools like xfs_io or xfs_db
> are not in the focus, but they might benefit from any possible change
> too if included in it.
> 
> The main issues of the current state, where all the tools are run with
> some flags and options and produce a human readable output:
>   * The output format can change, sometimes without an easy way of
> detection like a version number.
>   * Different output formats for different tools even in a single FS,
> thus zero code reuse/sharing.
>   * Screenscraping can introduce bugs (a rare inner condition can add a
> field into the output and break regular expressions).
>   * No (or weak) progress report.
>   * Different filesystems have different input formats (flags, options)
> even for the most basic capabilities.
>   * Thread safety, forking
> 
> 2. Details of the issues
> ==================
> 
> Let’s look at the issues now: why it is an issue and if we can do some
> small change to fix it on its own.
> 
> 
> The output format can change, sometimes without an easy way of
> detection like a version number. Most filesystems are well behaved,
> but still, we don’t know what exactly are people doing with the tools
> and even adding a new field can possibly break a script. Keeping a
> compatibility with older versions adds another complexity to such
> tools.
> 
> What can be done about this? The new fields have to be printed somehow
> and changing the format of the standard output would break everything.
> Making sure that if the input or output changes in any way, it is
> always with a detectable difference in the version number is a good
> practice, but it doesn’t solve the issue, it only makes hacking around
> it easier.
> 
> What can really help is to have an alternative output (which can be
> turned on when the user wants it), which is easy to parse and which is
> resilient to a certain degree of changes because it can express
> dynamic items like lists or arrays: JSON, XML...
> 
> 
> Different input/output formats for different tools even in a single
> FS, thus zero code reuse/sharing: Support for every tool and every
> filesystem has to start from scratch. Nothing can be done about it
> without a change in the output format. But if an optional JSON or XML
> or something was supported, then instead of creating a parser for
> every tool, just one standard and already well-tested library could
> be used.
> 
> 
> Screenscraping can introduce bugs (some rare inner condition can add a
> field into the output and break regular expressions): Well, let’s just
> look at how many services still can’t even parse and verify an email
> address correctly. And we have a lot more complex text… Again, some
> easy-to-parse format with existing libraries that would turn the text
> into a bunch of variables or an object would help.
> 
> 
> No (or weak) progress report: Especially for tools that can run for a
> long time, like fsck. Screenscraping everything it throws out and then
> deciding whether it is a progress report message (because instead of
> “25 %” it says “running operation foo”), a message to tell the user,
> or something to just ignore is a lot less comfortable and more error-prone
> than “{level: ‘progress’, stage: 5, msg: ‘operation foo’}”.
> 
> 
> Different filesystems have different input formats (flags, options)
> even for the most basic capabilities: Similar to “Different
> input/output formats for different tools...”. For example, for
> labeling a partition, you use mkfs.xfs -L label, but mkfs.vfat -n
> label. However, changing this requires getting the same functionality
> with a common basic specification to other filesystems too.
> 
> 
> Thread safety, forking: The people who work on a library, like
> libblockdev, don’t like that they have to fork another process over
> which they have no control, as they can’t guarantee anything about it.
> This can’t be fixed by changing the output format, though, but would
> require making a public library providing a similar interface as the
> existing fs tools. No detailed access to insides is needed, just a way
> how to run mkfs, fsck, snapshots, etc… without spawning another
> process and without screenscraping.
> 
> 
> 3. Proposed Solutions
> =================
> 
> There are two (complementary) ways how to address the issues: add a
> structured, machine-readable format for input/output of the tools
> (e.g. JSON, XML, …) and to create a library with the functionality of
> the existing tools. Let’s look now at those options. I will focus on
> what changes they would require, what would be the price for
> maintaining that solution and if there are any drawbacks or additional
> advantages.
> 
> An optional third option would be to create a system service/daemon,
> that would use dbus or some other structured interface to accept jobs
> and return results. I think that LVM people are working on something
> like this.
> 
> The proposed solutions are ordered in regards to their complexity.
> Also, they can be seen as follow-ups, because every proposed option
> requires big part of the work from the previous one anyway.
> 
> 
> 3.1. Structured Output
> -------------------------------
> In other words, what LVM already does with --reportformat
> {basic|json}, and lsblk with --json. Possibly, we could also make JSON
> input too. That would allow the user to, instead of using all the
> flags and options of CLI, make something like
> --jsoninput=“{dev:’/dev/abc’, force: true, … }”
> 
> Some preliminary notes about the format:
> Most likely, this would mean JSON. JSON is currently preferred over
> XML because it is easier to read by humans if the need arises, and its
> encoder/parser is lighter and easier to use. Also, other projects like
> LVM, multipath or lsblk already use JSON, so it would be nice not to
> break the group.
> 
> Required implementation changes/expected problems:
> In an ideal world, a simple replacement of all prints with a wrapping
> function would be enough. However, as far as I know, an overwhelming
> majority of the tools have printing functions spread through the code
> and print everything as soon as they know the specific bit of
> information.
> 
> Change of the output into a structured format means some refactoring.
> Instead of simple printf(), an object, array or structure has to be
> created and rather than pure strings, more diagnostically useful
> values have to be added into it in place of the current prints. Then,
> when it is reasonable, print it out all at once in any desired format.
> The “when reasonable” would usually mean at the end, but it could also
> print progress if it is a long time running operation.
> 
> Because of the kinds of the required changes, the implementation can
> be split into two parts: first, clean the code and move all the prints
> out from spaghetti to the end. Then, once this is done, add the
> structured format.
> 
> Maintaining overhead:
> Small. By separating the printing from generating the data, we don’t
> really care about the output format anywhere except in the printing
> function itself, and if a new field or value is added or an old one
> removed, then the amount of work is roughly equal to the current
> state.
> 
> Drawbacks:
> Searching for a place in the code where something happens would be
> more complicated - instead of simple search for a string that is the
> same in the code and in the output, one would search for the line that
> adds the data to the message log. This could be simplified by using
> __LINE__ macro with debug output. (With JSON, an additional field
> would not affect anything, so it would be completely safe.)
> 
> Additional advantages:
> The refactoring can clean up the code a bit. It is easy to add any
> other format in the future. Our own test suite could also benefit from
> the structured output.
> 
> Comment:
> The most bang for the buck option and most of the work done for this
> is required also for every other option. In terms of specific
> interface (i.e. common JSON elements), we need to identify a common
> subset of fields/options that every fs will use and anything else move
> into fs-specific extensions.
> 
> With regards to the compatibility between different filesystems, the
> best way how to specify the format might be a small library that would
> take raw data on one side and turn it into JSON string or even print
> it (and the reverse, if input supports json too). This way, we would
> be sure that there really is a common ground that works the same in
> every fs.
> 
> Another way how to achieve the compatibility is to make an RFC-like
> document. For example: All occurrences of a filesystem identifier MUST
> be in a field named 'fsid' which SHOULD contain a UUID formatted
> string. I think this is useful even if we end up with the library as a
> way to find out the common ground.
> 
> 3.2. A Library
> -------------------
> A library implementing basic functions like: mkfs(int argc, char
> *argv[]), fsck(), … etc. Once done, binding for other languages like
> Python is possible too.
> 
> Required implementation changes/expected problems:
> If the implementation of this library would follow up after the
> changes that add the structured output, then most of the work would be
> already done. The most complex issue remaining would probably be that
> there can be no exit() call - if anything fails, we have to gracefully
> return up the stack and out of the library.
> 
> Duplication of functionality is not an issue because there is none -
> the binary tools like mkfs.xfs would become simple wrappers around the
> library functions: pass user input to the specified library call, then
> print any message created in the library and exit.
> 
> Maintaining overhead:
> None? The code would be cleaned up and moved, but there wouldn’t be
> new things to maintain.
> 
> Drawbacks:
> I can’t think of any...
> 
> Additional advantages:
> The refactoring can clean up the code a bit.
> 
> Comment:
> Useful and nice to have, but doesn’t have to be done ASAP.
> 
> 
> 3.3. A system service
> ------------------------------
> A system service/daemon, that would use dbus or some other structured
> interface to accept jobs and return results.
> 
> Required implementation changes/expected problems:
> We don’t want the daemon to exit if it can’t access a file, we don’t
> want it to do printfs(), … So, at the end, we have to do the
> structured output and library and then a lot of work above it.
> 
> Maintaining overhead:
> All the system services things plus whatever the other two solutions require.
> 
> Drawbacks:
> The biggest maintaining overhead of all proposed solutions, basically
> a new tool/subproject. Using dbus in a third party project is more
> work than to just include a library and call one function.
> 
> Additional advantages:
> It wouldn’t be possible to attempt concurrent modifications of a device.
> 
> Comment:
> In my opinion, this shouldn’t be our project. Let other people make it
> their front if they want something like this and instead, just make it
> easier for them by using one of the other solutions.
> 
> 
> 4. Conclusion
> ===========
> 
> A structured output is something we should aim for. It helps a lot
> with the issues and it is the cheapest option. If it goes well, it can
> be later on followed by creating a library, although, at this moment,
> that seems a premature thing to strive for. Creating a daemon is not a
> thing any single filesystem should do on its own.
> 
> And thank you for reading all this, I look forward to your comments. :-)
> Jan
> 

* Re: [proposal] making filesystem tools more machine friendly
  2017-11-27 14:57 ` Andrew Price
@ 2017-11-27 15:38   ` Jan Tulak
  2017-11-27 16:24     ` Andrew Price
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Tulak @ 2017-11-27 15:38 UTC (permalink / raw)
  To: Andrew Price; +Cc: linux-fsdevel

On Mon, Nov 27, 2017 at 3:57 PM, Andrew Price <anprice@redhat.com> wrote:
> Hi,
>
> On 30/06/17 09:17, Jan Tulak wrote:
>>
>> AKA filesystem API
>>
>> Hi guys
>>
>> Currently, filesystem tools are not made with automation in mind. So
>> any tool that wants to interact with filesystems (be it for
>> automation, or to provide a more user-friendly interface) has to
>> screen scrape everything and cope with changing outputs.
>>
>> I think it is the time to focus some thoughts on how to make the fs
>> tools easier to be used by scripts and other tools.
>
>
> What's the status of this? I'd like to make sure gfs2-utils is geared up for
> it and catered for, whatever solution is chosen.

I decided to start with a wrapper - that solves some of the use cases
I'm concerned with sooner. And if there are enough users of this
wrapper, then in time I can look at the integration again.

>
> Perhaps this ship has already sailed, but, I think a json dependency may be
> a little too heavy, and perhaps a simpler stream of key-value lists that can
> be generated with fprintf() would suffice? For accepting options, we already
> have code to parse that sort of thing as we handle "foo=bar,baz=42" style

That's not enough once you have any hierarchy in your data, e.g.
multiple volumes in a group, where each has a name or path... Hence, I
was aiming for a richer format, like JSON. However, if gfs2-utils does
not need anything more, there is no need to use a complex solution at
the moment.

What makes it easy to add a common format later on is not having
prints scattered all through the program together with a heap of exit()
calls, but rather having some structure that holds the state, from
which the output is created at one point no matter what. Of course,
that can be difficult to achieve if you want to print some progress or
if the program would require deeper changes... which is why I am
leaving it be for now, until I can show that there really is sufficient
interest from the users of a wrapper and the effort is better
justified.

Cheers,
Jan

> strings for extended options so it shouldn't take much work to repurpose
> that. That said, I don't think passing options to tools in this way holds
> much value over specifying them in argv.
>
> Willing to budge for consensus, though.
>

* Re: [proposal] making filesystem tools more machine friendly
  2017-11-27 15:38   ` Jan Tulak
@ 2017-11-27 16:24     ` Andrew Price
  2017-11-27 16:42       ` Jan Tulak
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Price @ 2017-11-27 16:24 UTC (permalink / raw)
  To: Jan Tulak; +Cc: linux-fsdevel

On 27/11/17 15:38, Jan Tulak wrote:
> On Mon, Nov 27, 2017 at 3:57 PM, Andrew Price <anprice@redhat.com> wrote:
>> On 30/06/17 09:17, Jan Tulak wrote:
>>>
>>> AKA filesystem API
>>>
>>> Hi guys
>>>
>>> Currently, filesystem tools are not made with automation in mind. So
>>> any tool that wants to interact with filesystems (be it for
>>> automation, or to provide a more user-friendly interface) has to
>>> screen scrape everything and cope with changing outputs.
>>>
>>> I think it is the time to focus some thoughts on how to make the fs
>>> tools easier to be used by scripts and other tools.
>>
>>
>> What's the status of this? I'd like to make sure gfs2-utils is geared up for
>> it and catered for, whatever solution is chosen.
> 
> I decided to go the way of using wrapper at first - that solves some
> of the use cases I'm concerned sooner. And if there are enough users
> of this wrapper, then in a time I can look at the integration again.
> 
>>
>> Perhaps this ship has already sailed, but, I think a json dependency may be
>> a little too heavy, and perhaps a simpler stream of key-value lists that can
>> be generated with fprintf() would suffice? For accepting options, we already
>> have code to parse that sort of thing as we handle "foo=bar,baz=42" style
> 
> That's not enough once you have any hierarchy in your data, i.e.
> multiple volumes in a group, and each has a name or path... Hence, I
> strived for an advanced format, like JSON. 

A nested format is not necessarily required to describe a nested 
structure. It does take away the need for a "parent" field in each 
record but I'm not convinced that is a sufficient trade-off for adding a 
json lib dependency.
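
To illustrate the point about a "parent" field, here is a small sketch in
which flat records, one per line in a key-value stream, are turned back
into a nested structure on the consumer side. The record layout, the
field names "name" and "parent", and the sample volume names are all
assumptions made up for this example. It also assumes that names are
unique and that the parent links contain no cycles.

```python
# Flat records as they might arrive from a key-value stream; an empty
# "parent" marks a top-level entry. All names here are illustrative.
RECORDS = [
    {"name": "vg0", "parent": ""},
    {"name": "lv_root", "parent": "vg0"},
    {"name": "lv_home", "parent": "vg0"},
]


def build_tree(records):
    """Rebuild a nested dict from flat records with a 'parent' field."""
    children = {}
    for rec in records:
        children.setdefault(rec["parent"], []).append(rec["name"])

    def subtree(name):
        # Recurse through the children index; leaves become empty dicts.
        return {child: subtree(child) for child in children.get(name, [])}

    return subtree("")
```

build_tree(RECORDS) returns {"vg0": {"lv_root": {}, "lv_home": {}}}, so
the nesting is recoverable, at the cost of every consumer having to do
this reconstruction itself.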

> However, if gfs2-utils does
> not need anything more, there is no need to use a complex solution for
> this moment.

I think the only hierarchical support we might want is for progress 
reporting to be split into stages and sub-stages.

> What makes it easy to add a common format, later on, is not having
> prints all through the program together with a heap of exit() calls,

I'm not sure I understand this point...

> and rather, if there is some structure containing the state from which
> the output is created at one point no matter what. Of course, that can
> be difficult to achieve if you want to print some progress or if the
> program would require deeper changes... which is why I let it be for
> now until I can show that there really is a sufficient interest from
> the users of a wrapper, and the effort is better justified.

Okay, thanks for the update. In case it helps in future, I'm willing to 
experiment with a (test?) tool that works with the gfs2-utils in a 
development branch.

Andy

* Re: [proposal] making filesystem tools more machine friendly
  2017-11-27 16:24     ` Andrew Price
@ 2017-11-27 16:42       ` Jan Tulak
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Tulak @ 2017-11-27 16:42 UTC (permalink / raw)
  To: Andrew Price; +Cc: linux-fsdevel

On Mon, Nov 27, 2017 at 5:24 PM, Andrew Price <anprice@redhat.com> wrote:
> On 27/11/17 15:38, Jan Tulak wrote:
>>
>> On Mon, Nov 27, 2017 at 3:57 PM, Andrew Price <anprice@redhat.com> wrote:
>>>
>>> On 30/06/17 09:17, Jan Tulak wrote:
>>>>
>>>>
>>>> AKA filesystem API
>>>>
>>>> Hi guys
>>>>
>>>> Currently, filesystem tools are not made with automation in mind. So
>>>> any tool that wants to interact with filesystems (be it for
>>>> automation, or to provide a more user-friendly interface) has to
>>>> screen scrape everything and cope with changing outputs.
>>>>
>>>> I think it is the time to focus some thoughts on how to make the fs
>>>> tools easier to be used by scripts and other tools.
>>>
>>>
>>>
>>> What's the status of this? I'd like to make sure gfs2-utils is geared up
>>> for
>>> it and catered for, whatever solution is chosen.
>>
>>
>> I decided to go the way of using wrapper at first - that solves some
>> of the use cases I'm concerned sooner. And if there are enough users
>> of this wrapper, then in a time I can look at the integration again.
>>
>>>
>>> Perhaps this ship has already sailed, but, I think a json dependency may
>>> be
>>> a little too heavy, and perhaps a simpler stream of key-value lists that
>>> can
>>> be generated with fprintf() would suffice? For accepting options, we
>>> already
>>> have code to parse that sort of thing as we handle "foo=bar,baz=42" style
>>
>>
>> That's not enough once you have any hierarchy in your data, i.e.
>> multiple volumes in a group, and each has a name or path... Hence, I
>> strived for an advanced format, like JSON.
>
>
> A nested format is not necessarily required to describe a nested structure.
> It does take away the need for a "parent" field in each record but I'm not
> convinced that is a sufficient trade-off for adding a json lib dependency.

JSON was presented as an option because it is already established and
well known. I'm not against using another format if it is universal
enough and easy to use by 3rd parties, but the discussion never got
that far, because the more important underlying issues weren't
sufficiently solved. The output format is really just a cosmetic thing.


>
>> What makes it easy to add a common format, later on, is not having
>> prints all through the program together with a heap of exit() calls,
>
>
> I'm not sure I understand this point...

I mean this: Is any failure propagated up to main(), which contains
the only exit() in the whole program, or is exit() called immediately
on any issue, even in functions ten levels of stack deep, without
returning up the stack? If the former, then it is easy to just print
the state of the application at the end of any run, no matter what
happens. If the latter, it can be more difficult.
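
A minimal sketch of the first variant, where failures travel up the
stack and the state is printed at a single point, could look like the
following. The names ToolError, check_device, run and main are made up
for this example, the check itself is a stand-in for real work, and JSON
is used only because it is convenient to print here.

```python
import json


class ToolError(Exception):
    """Raised deep in the call stack instead of calling exit() there."""


def check_device(state, device):
    # Stand-in for real work; records the step, then may fail.
    state["steps"].append("check_device")
    if not device.startswith("/dev/"):
        raise ToolError(f"not a device path: {device}")


def run(device):
    """Do the work and always return the collected state."""
    state = {"device": device, "steps": [], "status": "ok", "error": None}
    try:
        check_device(state, device)
        state["steps"].append("done")
    except ToolError as exc:
        state["status"] = "failed"
        state["error"] = str(exc)
    return state


def main(argv):
    """Single output point; the caller passes the result to sys.exit()."""
    state = run(argv[1] if len(argv) > 1 else "")
    print(json.dumps(state))
    return 0 if state["status"] == "ok" else 1
```

The only exit would then be a sys.exit(main(sys.argv)) at the bottom of
the script. With exit() calls scattered through the helpers, the state
dictionary would never reach the print in the failure case.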

>
>> and rather, if there is some structure containing the state from which
>> the output is created at one point no matter what. Of course, that can
>> be difficult to achieve if you want to print some progress or if the
>> program would require deeper changes... which is why I let it be for
>> now until I can show that there really is a sufficient interest from
>> the users of a wrapper, and the effort is better justified.
>
>
> Okay, thanks for the update. In case it helps in future, I'm willing to
> experiment with a (test?) tool that works with the gfs2-utils in a
> development branch.
>

OK, thanks.

Jan
