From: Andrew Price <anprice@redhat.com>
To: Jan Tulak <jtulak@redhat.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [proposal] making filesystem tools more machine friendly
Date: Mon, 27 Nov 2017 14:57:52 +0000
Message-ID: <095aeded-39d1-c331-22cc-cdf1da069e3f@redhat.com>
In-Reply-To: <CACj3i70p=ybXZWRncseqwnbs7HAYu-SL02+cj--7T5YnqcwVKQ@mail.gmail.com>

Hi,

On 30/06/17 09:17, Jan Tulak wrote:
> AKA filesystem API
> 
> Hi guys
> 
> Currently, filesystem tools are not made with automation in mind. So
> any tool that wants to interact with filesystems (be it for
> automation, or to provide a more user-friendly interface) has to
> screen scrape everything and cope with changing outputs.
> 
> I think it is time to give some thought to how to make the fs
> tools easier to use from scripts and other tools.

What's the status of this? I'd like to make sure gfs2-utils is geared up 
for it and catered for, whatever solution is chosen.

Perhaps this ship has already sailed, but I think a JSON dependency may
be a little too heavy; perhaps a simpler stream of key-value lists
that can be generated with fprintf() would suffice? For accepting
options, we already have code to parse that sort of thing, as we handle
"foo=bar,baz=42"-style strings for extended options, so it shouldn't
take much work to repurpose it. That said, I don't think passing options
to tools in this way holds much value over specifying them in argv.
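
For illustration only, a minimal C sketch of the fprintf()-based alternative. It assumes one record per line made of space-separated key=value pairs; that format and the names kv_emit(), kv_parse_opts() and struct kv are hypothetical, not taken from gfs2-utils. The parser splits a "foo=bar,baz=42"-style option string in place.

```c
#include <stdio.h>
#include <string.h>

/* One key/value pair; both pointers refer into the caller's buffer. */
struct kv {
    char *key;
    char *val;
};

/* Emit one record as a single line of space-separated key=value pairs.
 * Assumes (for this sketch) that values contain no spaces, '=' or newlines. */
static void kv_emit(FILE *out, const char *keys[], const char *vals[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%s%s=%s", i ? " " : "", keys[i], vals[i]);
    fputc('\n', out);
}

/* Split a "foo=bar,baz=42" style string in place into at most max pairs.
 * A bare word such as "force" gets an empty value; empty items are skipped.
 * Returns the number of pairs stored in out. */
static size_t kv_parse_opts(char *s, struct kv *out, size_t max)
{
    size_t n = 0;
    while (s && *s && n < max) {
        char *next = strchr(s, ',');
        if (next)
            *next++ = '\0';     /* terminate this item, step past ',' */
        char *eq = strchr(s, '=');
        if (eq)
            *eq++ = '\0';       /* split key from value */
        if (*s) {
            out[n].key = s;
            out[n].val = eq ? eq : "";
            n++;
        }
        s = next;
    }
    return n;
}
```

With this format a consumer needs nothing beyond string splitting; the trade-off against JSON is that nested lists and values containing spaces would need extra conventions.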

Willing to budge for consensus, though.

Andy

> Now, to put you at ease,
> the answer to the obvious question "who will do it?" is "me". I don't
> want to force you into anything, though, so I'm opening this
> discussion early with some ideas, and I hope to hear what you
> think about it before anything is set in stone. (For
> those who visited Vault this year, Justin Mitchell and I gave a talk
> about this, codename Springfield.)
> 
> The following text attempts to identify issues with using
> filesystems-related tools in scripts/applications and proposes a
> solution to those issues.
> 
> Content:
> 1. A quick introduction
> 2. Details of the issues
> 3. Proposed Solutions
> 4. Conclusion
> 
> 1. A quick introduction
> =================
> 
> I discussed this topic with people who are building things on top of
> fs tools, for example the developers of libblockdev (Vratislav
> Podzimek, https://github.com/vpodzime/libblockdev) and system storage
> manager (formerly Lukas Czerner, now me,
> https://sourceforge.net/projects/storagemanager/). The issues listed
> below come from experience working with those tools. They relate
> mostly to basic operations like mkfs, fsck, snapshots and resizing.
> Advanced/debugging tools like xfs_io or xfs_db are not the focus, but
> they might benefit from any change too if included in it.
> 
> The main issues with the current state, where all the tools are run with
> some flags and options and produce human-readable output:
>   * The output format can change, sometimes without an easy way of
> detection like a version number.
>   * Different output formats for different tools even in a single FS,
> thus zero code reuse/sharing.
>   * Screenscraping can introduce bugs (a rare internal condition can add a
> field to the output and break regular expressions).
>   * No (or weak) progress report.
>   * Different filesystems have different input formats (flags, options)
> even for the most basic capabilities.
>   * Thread safety, forking
> 
> 2. Details of the issues
> ==================
> 
> Let’s look at the issues now: why each one is a problem and whether
> some small change could fix it on its own.
> 
> 
> The output format can change, sometimes without an easy way of
> detection like a version number. Most filesystems are well behaved,
> but still, we don’t know exactly what people are doing with the tools,
> and even adding a new field can break a script. Keeping
> compatibility with older versions adds further complexity to such
> tools.
> 
> What can be done about this? The new fields have to be printed somehow,
> and changing the format of the standard output would break everything.
> Ensuring that any change to the input or output always comes with a
> detectable change in the version number is good practice, but it
> doesn’t solve the issue; it only makes hacking around it easier.
> 
> What can really help is an alternative output (which can be
> turned on when the user wants it) that is easy to parse and
> resilient to a certain degree of change because it can express
> dynamic items like lists or arrays: JSON, XML...
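
The resilience argument can be shown with a small C sketch. A JSON or XML parser gives this property for free; to stay dependency-free, the sketch assumes a flat line of space-separated key=value pairs instead, and record_get() is a hypothetical helper. Because it looks fields up by name, a field added by a newer tool version, in any position, does not disturb it, whereas a scraper relying on column positions or a fixed regular expression could break.

```c
#include <string.h>

/* Look up `key` in a line of space-separated key=value pairs and copy its
 * value into buf (truncated to fit). Returns 1 if found, 0 otherwise.
 * Field order and unknown extra fields do not matter. */
static int record_get(const char *line, const char *key, char *buf, size_t len)
{
    size_t klen = strlen(key);
    const char *p = line;

    if (len == 0)
        return 0;
    for (;;) {
        p += strspn(p, " \n");               /* skip separators */
        if (*p == '\0')
            return 0;                         /* key not present */
        size_t toklen = strcspn(p, " \n");   /* length of this key=value item */
        if (toklen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = toklen - klen - 1;
            if (vlen >= len)
                vlen = len - 1;
            memcpy(buf, p + klen + 1, vlen);
            buf[vlen] = '\0';
            return 1;
        }
        p += toklen;                          /* not it: move to the next item */
    }
}
```

Note the check for '=' right after the key: without it, a lookup for "blocks" would wrongly match an item such as "blocksize=4096".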
> 
> 
> Different input/output formats for different tools even in a single
> FS, thus zero code reuse/sharing: Support for every tool and every
> filesystem has to start from scratch. Nothing can be done about it
> without a change in the output format. But if an optional JSON or XML
> (or similar) format were supported, then instead of writing a parser
> for every tool, one standard format and an existing, well-tested
> library could be used.
> 
> 
> Screenscraping can introduce bugs (some rare internal condition can add a
> field to the output and break regular expressions): Well, let’s just
> look at how many services still can’t even parse and verify an email
> address correctly, and our output is much more complex text… Again, some
> easy-to-parse format with existing libraries that would turn the text
> into a bunch of variables or an object would help.
> 
> 
> No (or weak) progress report: This matters especially for tools that
> can run for a long time, like fsck. Screenscraping everything they print
> and then deciding whether each line is a progress report (because
> instead of “25 %” it says “running operation foo”), a message to show
> the user, or something to just ignore is a lot less comfortable and
> more error-prone than {"level": "progress", "stage": 5, "msg": "operation foo"}.
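
A minimal C sketch of emitting such progress records as one JSON object per line (commonly called JSON Lines), so a consumer can read line by line and dispatch on "level". The field names follow the example above; progress_fmt() and json_escape() are hypothetical helpers, not an agreed schema.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst as the body of a JSON string, escaping '"', '\\' and
 * control characters. Returns 0 on success, -1 if dst is too small. */
static int json_escape(char *dst, size_t len, const char *src)
{
    size_t o = 0;
    if (len == 0)
        return -1;
    for (const unsigned char *s = (const unsigned char *)src; *s; s++) {
        char tmp[7];
        size_t n;
        if (*s == '"' || *s == '\\') {
            tmp[0] = '\\';
            tmp[1] = (char)*s;
            n = 2;
        } else if (*s < 0x20) {
            n = (size_t)snprintf(tmp, sizeof tmp, "\\u%04x", (unsigned)*s);
        } else {
            tmp[0] = (char)*s;
            n = 1;
        }
        if (o + n >= len)
            return -1;                 /* keep room for the terminating NUL */
        memcpy(dst + o, tmp, n);
        o += n;
    }
    dst[o] = '\0';
    return 0;
}

/* Format one progress record as a single line of JSON.
 * Returns the number of characters written, or -1 on overflow. */
static int progress_fmt(char *buf, size_t len, int stage, const char *msg)
{
    char esc[256];
    if (json_escape(esc, sizeof esc, msg) != 0)
        return -1;
    int n = snprintf(buf, len,
                     "{\"level\":\"progress\",\"stage\":%d,\"msg\":\"%s\"}\n",
                     stage, esc);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The tool would write each formatted line to stdout and flush it, so a front end can update a progress bar without guessing which lines are progress.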
> 
> 
> Different filesystems have different input formats (flags, options)
> even for the most basic capabilities: Similar to “Different
> input/output formats for different tools...”. For example, to
> label a partition, you use mkfs.xfs -L label, but mkfs.vfat -n
> label. However, changing this requires bringing the same
> functionality, with a common basic specification, to other
> filesystems too.
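
To make the cost concrete, this is roughly the per-filesystem table a wrapper has to carry today just to set a label. The xfs and vfat flags are the ones quoted above; ext4 (mke2fs -L) is added for illustration, and build_mkfs_label_argv() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

struct label_flag {
    const char *fstype;
    const char *flag;
};

/* Flag that sets the volume label at mkfs time, per filesystem. */
static const struct label_flag label_flags[] = {
    { "xfs",  "-L" },
    { "vfat", "-n" },
    { "ext4", "-L" },
};

/* Fill argv with "mkfs.<fstype> <flag> <label> <device>" plus a terminating
 * NULL; prog receives the program name. Returns argc (4), or -1 if fstype
 * is unknown or prog is too small. */
static int build_mkfs_label_argv(const char *fstype, const char *label,
                                 const char *device, char *prog,
                                 size_t proglen, const char *argv[5])
{
    for (size_t i = 0; i < sizeof label_flags / sizeof label_flags[0]; i++) {
        if (strcmp(label_flags[i].fstype, fstype) != 0)
            continue;
        int n = snprintf(prog, proglen, "mkfs.%s", fstype);
        if (n < 0 || (size_t)n >= proglen)
            return -1;
        argv[0] = prog;
        argv[1] = label_flags[i].flag;
        argv[2] = label;
        argv[3] = device;
        argv[4] = NULL;
        return 4;
    }
    return -1;                         /* filesystem not in the table */
}
```

The resulting argv could then be handed to execvp() (with a cast, since execvp() takes char *const []). A common option name across filesystems would make the table unnecessary.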
> 
> 
> Thread safety, forking: The people who work on a library, like
> libblockdev, don’t like having to fork another process over
> which they have no control, as they can’t guarantee anything about it.
> This can’t be fixed by changing the output format, though; it would
> require a public library providing a similar interface to the
> existing fs tools. No detailed access to the internals is needed, just
> a way to run mkfs, fsck, snapshots, etc… without spawning another
> process and without screenscraping.
> 
> 
> 3. Proposed Solutions
> =================
> 
> There are two (complementary) ways to address the issues: adding a
> structured, machine-readable format for the input/output of the tools
> (e.g. JSON, XML, …) and creating a library with the functionality of
> the existing tools. Let’s look at those options now. I will focus on
> what changes they would require, what the cost of maintaining each
> solution would be, and whether there are any drawbacks or additional
> advantages.
> 
> A third, optional approach would be to create a system service/daemon
> that would use dbus or some other structured interface to accept jobs
> and return results. I think that LVM people are working on something
> like this.
> 
> The proposed solutions are ordered by complexity. They can also be
> seen as successive steps, because each proposed option requires a
> large part of the work from the previous one anyway.
> 
> 
> 3.1. Structured Output
> -------------------------------
> In other words, what LVM already does with --reportformat
> {basic|json}, and lsblk with --json. We could possibly accept JSON
> input too. That would allow the user, instead of using all the CLI
> flags and options, to pass something like
> --jsoninput='{"dev": "/dev/abc", "force": true, … }'
> 
> Some preliminary notes about the format:
> Most likely, this would mean JSON. JSON is currently preferred over
> XML because it is easier for humans to read if the need arises, and its
> encoders/parsers are lighter and easier to use. Also, other projects
> like LVM, multipath or lsblk already use JSON, so it would be nice to
> stay consistent with them.
> 
> Required implementation changes/expected problems:
> In an ideal world, simply replacing all prints with a wrapping
> function would be enough. However, as far as I know, the overwhelming
> majority of the tools have print calls spread throughout the code
> and print each piece of information as soon as they know it.
> 
> Changing the output to a structured format means some refactoring.
> Instead of a simple printf(), an object, array or structure has to be
> created, and in place of the current prints, values that are more
> diagnostically useful than plain strings have to be added to it. Then,
> when reasonable, it is printed all at once in any desired format.
> “When reasonable” would usually mean at the end, but a long-running
> operation could also print progress along the way.
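
A minimal C sketch of that separation, under stated assumptions: the tool records key/value pairs in a hypothetical struct msg_log where it used to call printf(), and a single log_print() decides at the end whether to emit the human-readable form or JSON. Values are kept as strings and JSON escaping is omitted for brevity; a real implementation would likely keep typed values.

```c
#include <stdio.h>
#include <string.h>

#define LOG_MAX 32

/* One collected item: a key and a value instead of a preformatted line. */
struct log_entry {
    const char *key;
    char value[64];
};

struct msg_log {
    struct log_entry e[LOG_MAX];
    size_t n;
};

enum out_fmt { OUT_BASIC, OUT_JSON };

/* Record a value where the tool previously called printf(). */
static int log_add(struct msg_log *log, const char *key, const char *value)
{
    if (log->n >= LOG_MAX)
        return -1;
    log->e[log->n].key = key;
    snprintf(log->e[log->n].value, sizeof log->e[log->n].value, "%s", value);
    log->n++;
    return 0;
}

/* Print everything at once; only this function knows about output formats. */
static void log_print(FILE *out, const struct msg_log *log, enum out_fmt fmt)
{
    if (fmt == OUT_JSON)
        fputc('{', out);
    for (size_t i = 0; i < log->n; i++) {
        if (fmt == OUT_JSON)
            fprintf(out, "%s\"%s\":\"%s\"", i ? "," : "",
                    log->e[i].key, log->e[i].value);
        else
            fprintf(out, "%s: %s\n", log->e[i].key, log->e[i].value);
    }
    if (fmt == OUT_JSON)
        fputs("}\n", out);
}
```

Adding a field touches only the log_add() call site; both output formats pick it up without further changes.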
> 
> Given the kind of changes required, the implementation can be split
> into two parts: first, clean up the code and move all the prints out
> of the spaghetti to the end. Then, once this is done, add the
> structured format.
> 
> Maintaining overhead:
> Small. Once printing is separated from generating the data, the output
> format matters only in the printing function itself, and adding a new
> field or value, or removing an old one, takes roughly the same amount
> of work as it does now.
> 
> Drawbacks:
> Finding the place in the code where something happens would be
> more complicated: instead of simply searching for a string that is the
> same in the code and in the output, one would search for the line that
> adds the data to the message log. This could be eased by including the
> __LINE__ macro in debug output. (With JSON, an additional field
> would not affect anything, so it would be completely safe.)
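
One way the __LINE__ idea could look, as a sketch: a hypothetical EMIT_KV() macro passes __FILE__ and __LINE__ to the emitter, which adds a "src" member only when debug output is enabled. The debug_output flag and the "src" member name are assumptions; JSON escaping is omitted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical switch, e.g. set from a --debug option. */
static int debug_output = 0;

/* Emit one key/value pair as a JSON object on its own line. When debug
 * output is on, add a "src" member naming the file and line that produced
 * it; consumers that don't know "src" can simply ignore it. */
static void emit_kv(FILE *out, const char *key, const char *val,
                    const char *file, int line)
{
    const char *base = strrchr(file, '/');   /* strip directories */
    base = base ? base + 1 : file;
    if (debug_output)
        fprintf(out, "{\"%s\":\"%s\",\"src\":\"%s:%d\"}\n", key, val, base, line);
    else
        fprintf(out, "{\"%s\":\"%s\"}\n", key, val);
}

/* Callers use the macro so the location is captured automatically. */
#define EMIT_KV(out, key, val) emit_kv((out), (key), (val), __FILE__, __LINE__)
```

Grepping the source for the reported file:line then leads straight to the call that produced the data.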
> 
> Additional advantages:
> The refactoring can clean up the code a bit. It is easy to add any
> other format in the future. Our own test suite could also benefit from
> the structured output.
> 
> Comment:
> This is the option with the most bang for the buck, and most of the
> work done for it is also required for every other option. In terms of
> the specific interface (i.e. common JSON elements), we need to identify
> a common subset of fields/options that every fs will use and move
> everything else into fs-specific extensions.
> 
> With regard to compatibility between different filesystems, the
> best way to specify the format might be a small library that would
> take raw data on one side and turn it into a JSON string or even print
> it (and the reverse, if input supports JSON too). This way, we would
> be sure that there really is common ground that works the same in
> every fs.
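
A sketch of what such a shared helper might look like: a fixed set of common fields that every filesystem fills in the same way, plus fs-specific fields kept under a separate "ext" object. All names here (struct fs_common, struct fs_ext_field, fsinfo_to_json(), the "ext" key) are hypothetical, and string escaping is omitted.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Common subset that every filesystem reports the same way. */
struct fs_common {
    const char *fsid;               /* UUID-formatted string */
    const char *label;
    unsigned long long size_bytes;
};

/* Fs-specific extra, reported under the "ext" object. */
struct fs_ext_field {
    const char *key;
    const char *value;
};

/* Append formatted text at buf + *off; returns -1 if it does not fit. */
static int appendf(char *buf, size_t len, size_t *off, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *off, len - *off, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= len - *off)
        return -1;
    *off += (size_t)n;
    return 0;
}

/* Format common fields, then extensions, as one JSON object.
 * Returns the length written, or -1 if buf is too small. */
static int fsinfo_to_json(char *buf, size_t len, const struct fs_common *c,
                          const struct fs_ext_field *ext, size_t n_ext)
{
    size_t off = 0;
    if (appendf(buf, len, &off,
                "{\"fsid\":\"%s\",\"label\":\"%s\",\"size\":%llu,\"ext\":{",
                c->fsid, c->label, c->size_bytes) != 0)
        return -1;
    for (size_t i = 0; i < n_ext; i++)
        if (appendf(buf, len, &off, "%s\"%s\":\"%s\"", i ? "," : "",
                    ext[i].key, ext[i].value) != 0)
            return -1;
    if (appendf(buf, len, &off, "}}") != 0)
        return -1;
    return (int)off;
}
```

Because every filesystem goes through the same function, the common fields cannot drift apart in name or format between tools.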
> 
> Another way to achieve compatibility is to write an RFC-like
> document. For example: All occurrences of a filesystem identifier MUST
> be in a field named 'fsid' which SHOULD contain a UUID-formatted
> string. I think this is useful even if we end up with the library, as
> a way to find the common ground.
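
Rules written this way can also be checked mechanically in a conformance test. A minimal sketch for the SHOULD clause on 'fsid', assuming the canonical 8-4-4-4-12 hexadecimal UUID text form; is_uuid_string() is a hypothetical helper and does not validate version or variant bits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if s has the canonical UUID text form, e.g.
 * "123e4567-e89b-12d3-a456-426614174000": 36 characters, hyphens at
 * offsets 8, 13, 18 and 23, hexadecimal digits everywhere else. */
static bool is_uuid_string(const char *s)
{
    if (strlen(s) != 36)
        return false;
    for (size_t i = 0; i < 36; i++) {
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (s[i] != '-')
                return false;
        } else if (!isxdigit((unsigned char)s[i])) {
            return false;
        }
    }
    return true;
}
```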
> 
> 3.2. A Library
> -------------------
> A library implementing basic functions like mkfs(int argc, char
> *argv[]), fsck(), … Once done, bindings for other languages like
> Python are possible too.
> 
> Required implementation changes/expected problems:
> If this library were implemented after the changes that add the
> structured output, most of the work would already be done. The most
> complex remaining issue would probably be that there can be no exit()
> calls: if anything fails, we have to return gracefully up the stack
> and out of the library.
> 
> Duplicated functionality is not an issue because there would be none:
> binary tools like mkfs.xfs would become simple wrappers around the
> library functions that pass user input to the corresponding library
> call, print any messages created in the library, and exit.
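
A minimal sketch of that split, assuming a hypothetical fslib_mkfs() that keeps the mkfs(int argc, char *argv[]) shape from above but adds a caller-owned message buffer. Failures come back as return codes rather than exit(), and the binary is reduced to a wrapper that prints the message and maps the result to an exit status. The filesystem-creation step itself is only a stub.

```c
#include <stdio.h>

enum fslib_err { FSLIB_OK = 0, FSLIB_EINVAL = 1 };

/* Hypothetical library entry point. It never calls exit(): every failure
 * is reported through the return value and a message the caller owns. */
static int fslib_mkfs(int argc, char *argv[], char *msg, size_t msglen)
{
    if (argc < 2) {
        snprintf(msg, msglen, "no device specified");
        return FSLIB_EINVAL;            /* return up the stack instead of exit() */
    }
    /* Stub: a real implementation would create the filesystem here. */
    snprintf(msg, msglen, "would create filesystem on %s", argv[argc - 1]);
    return FSLIB_OK;
}

/* The mkfs binary as a thin wrapper: pass user input to the library, print
 * the message it produced, and turn the result into an exit status.
 * A real main() would simply return mkfs_tool_main(argc, argv). */
static int mkfs_tool_main(int argc, char *argv[])
{
    char msg[256] = "";
    int rc = fslib_mkfs(argc, argv, msg, sizeof msg);
    fprintf(rc == FSLIB_OK ? stdout : stderr, "%s\n", msg);
    return rc == FSLIB_OK ? 0 : 1;
}
```

A caller such as libblockdev could call fslib_mkfs() directly and inspect the message, with no fork and no screenscraping.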
> 
> Maintaining overhead:
> None? The code would be cleaned up and moved, but there wouldn’t be
> new things to maintain.
> 
> Drawbacks:
> I can’t think of any...
> 
> Additional advantages:
> The refactoring can clean up the code a bit.
> 
> Comment:
> Useful and nice to have, but doesn’t have to be done ASAP.
> 
> 
> 3.3. A system service
> ------------------------------
> A system service/daemon that would use dbus or some other structured
> interface to accept jobs and return results.
> 
> Required implementation changes/expected problems:
> We don’t want the daemon to exit if it can’t access a file, and we
> don’t want it to call printf(), … So, in the end, we would have to do
> the structured output and the library, and then a lot of work on top
> of them.
> 
> Maintaining overhead:
> Everything a system service entails, plus whatever the other two solutions require.
> 
> Drawbacks:
> The biggest maintenance overhead of all the proposed solutions; it is
> basically a new tool/subproject. Using dbus in a third-party project
> is more work than just including a library and calling one function.
> 
> Additional advantages:
> It wouldn’t be possible to attempt concurrent modifications of a device.
> 
> Comment:
> In my opinion, this shouldn’t be our project. Let other people take it
> on if they want something like this, and instead just make it easier
> for them by implementing one of the other solutions.
> 
> 
> 4. Conclusion
> ===========
> 
> A structured output is something we should aim for. It helps a lot
> with the issues and it is the cheapest option. If it goes well, it can
> later be followed by a library, although at this moment that seems
> premature to strive for. Creating a daemon is not something any
> single filesystem should do on its own.
> 
> And thank you for reading all this, I look forward to your comments. :-)
> Jan
> 


Thread overview: 17+ messages
2017-06-30  8:17 [proposal] making filesystem tools more machine friendly Jan Tulak
2017-06-30 10:22 ` Arvin Schnell
2017-06-30 13:58 ` Emmanuel Florac
2017-06-30 15:29 ` Theodore Ts'o
2017-07-03 11:52   ` Jan Tulak
2017-07-03 15:07     ` Theodore Ts'o
2017-07-03 17:37       ` Darrick J. Wong
2017-07-04 13:57         ` Jan Tulak
2017-07-04 19:07           ` Theodore Ts'o
2017-07-12  8:42             ` Jan Tulak
2017-07-05 18:11 ` Christoph Hellwig
2017-07-12 13:00   ` Jan Tulak
2017-07-12 17:10 ` Richard W.M. Jones
2017-11-27 14:57 ` Andrew Price [this message]
2017-11-27 15:38   ` Jan Tulak
2017-11-27 16:24     ` Andrew Price
2017-11-27 16:42       ` Jan Tulak
