linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFC: btrfs-progs json formatting
@ 2019-06-25 16:09 David Sterba
  2019-06-25 18:07 ` Hans van Kranenburg
  0 siblings, 1 reply; 4+ messages in thread
From: David Sterba @ 2019-06-25 16:09 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I'd like to get some feedback on the json output, the overall structure
of the information and naming.

Code: git://github.com/kdave/btrfs-progs.git preview-json

The one example command that supports it is

 $ ./btrfs --format json subvolume show /mnt

The test tests/cli-tests/011-* can demonstrate the capabilities.  The
json formatter and example use are in the top commits, otherwise the
branch contains lot of unrelated changes.

Example output:

{
  "__header": {
    "version": "1"
  },
  "subvolume-show": {
    "path": "subv2",
    "name": "subv2",
    "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
    "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
    "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
    "otime": "2019-06-25 17:15:28 +0200",
    "subvolume_id": "256",
    "generation": "6",
    "parent_id": "5",
    "toplevel_id": "5",
    "flags": "-",
    "snapshots": [
    ],
    "quota-group": {
      "qgroupid": "0/256",
      "qgroup-limit-referenced": "-",
      "qgroup-limit-exclusive": "-",
      "qgroup-usage-referenced": "16.00KiB",
      "qgroup-usage-exclusive": "16.00KiB"
    }
  }
}

The current proposal:

- the toplevel group contains 2 maps, one is some generic header, to
  identify version of the format or the version of progs that produces
  it or whatever could be useful and I did not think of it

- the 2nd map is named according to the command that generated the
  output, this is to be able to merge outputs from several commands, or
  to somehow define the contents of the map

Compare that to a simple unnamed group with bunch of values:

{
  "path": "subv2",
  "name": "subv2",
  "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
  "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
  "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
  "otime": "2019-06-25 17:15:28 +0200",
  "subvolume_id": "256",
  "generation": "6",
  "parent_id": "5",
  "toplevel_id": "5",
  "flags": "-",
  "snapshots": [
  ],
  "quota-group": {
    "qgroupid": "0/256",
    "qgroup-limit-referenced": "-",
    "qgroup-limit-exclusive": "-",
    "qgroup-usage-referenced": "16.00KiB",
    "qgroup-usage-exclusive": "16.00KiB"
  }
}

That's a bit shorter but makes validation harder. I assume that the
output would be accessed programatically like (python pseudo-code)

---
rawjson = output_of_command("btrfs --format json subvolume show /mnt")
json = Json.from_string(rawjson)

if json["subvolume-show"]["flags"] != "-":
  print("Subvolume %s has flags", json["subvolume-show"]["name"])
---

so the latter is subset of the former, without one map reference.

Once the json format is settled, more commands can be easily converted.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: btrfs-progs json formatting
  2019-06-25 16:09 RFC: btrfs-progs json formatting David Sterba
@ 2019-06-25 18:07 ` Hans van Kranenburg
  2019-06-26 16:47   ` David Sterba
  0 siblings, 1 reply; 4+ messages in thread
From: Hans van Kranenburg @ 2019-06-25 18:07 UTC (permalink / raw)
  To: dsterba, linux-btrfs

Hi,

On 6/25/19 6:09 PM, David Sterba wrote:
> Hi,
> 
> I'd like to get some feedback on the json output, the overall structure
> of the information and naming.
> 
> Code: git://github.com/kdave/btrfs-progs.git preview-json
> 
> The one example command that supports it is
> 
>  $ ./btrfs --format json subvolume show /mnt

I think for deciding what the structure will look like, it's very
interesting to look at what tools, utils, use cases, etc.. will use this
and how. Taking the bike for a ride, instead of painting the shed.

E.g.
 - Does the user expect the same text as in a current btrfs sub list,
but while being able to quickly use specific fields instead of using awk
to split the current output? -> a field flags with contents like
"foo,bar" or "FOO|BAR" might be expected instead of an integer.
 - Does the user want to write another tool that actually
programmatically does more? In that case the user might prefer having
the number instead of the text of flags which needs to be parsed back again.

> The test tests/cli-tests/011-* can demonstrate the capabilities.  The
> json formatter and example use are in the top commits, otherwise the
> branch contains lot of unrelated changes.
> 
> Example output:
> 
> {
>   "__header": {
>     "version": "1"
>   },
>   "subvolume-show": {
>     "path": "subv2",
>     "name": "subv2",
>     "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>     "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>     "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>     "otime": "2019-06-25 17:15:28 +0200",
>     "subvolume_id": "256",
>     "generation": "6",
>     "parent_id": "5",
>     "toplevel_id": "5",
>     "flags": "-",
>     "snapshots": [
>     ],
>     "quota-group": {
>       "qgroupid": "0/256",
>       "qgroup-limit-referenced": "-",
>       "qgroup-limit-exclusive": "-",
>       "qgroup-usage-referenced": "16.00KiB",
>       "qgroup-usage-exclusive": "16.00KiB"
>     }
>   }
> }
> 
> The current proposal:
> 
> - the toplevel group contains 2 maps, one is some generic header, to
>   identify version of the format or the version of progs that produces
>   it or whatever could be useful and I did not think of it
> 
> - the 2nd map is named according to the command that generated the
>   output, this is to be able to merge outputs from several commands, or
>   to somehow define the contents of the map

In what example use case would I be combining output of several commands?

> Compare that to a simple unnamed group with bunch of values:
> 
> {
>   "path": "subv2",
>   "name": "subv2",
>   "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>   "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>   "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>   "otime": "2019-06-25 17:15:28 +0200",
>   "subvolume_id": "256",
>   "generation": "6",
>   "parent_id": "5",
>   "toplevel_id": "5",
>   "flags": "-",
>   "snapshots": [
>   ],
>   "quota-group": {
>     "qgroupid": "0/256",
>     "qgroup-limit-referenced": "-",
>     "qgroup-limit-exclusive": "-",
>     "qgroup-usage-referenced": "16.00KiB",
>     "qgroup-usage-exclusive": "16.00KiB"
>   }
> }
> 
> That's a bit shorter but makes validation harder. I assume that the
> output would be accessed programatically like (python pseudo-code)

What about having structures that are result-data-oriented instead of
command-oriented?

E.g. something that shows a subvol or lists them could have a bunch of
"subvolumes" in the result:

{
  "__header": {
    [...]
  },
  "subvolumes": {
    5: {
    }
    256: {
    },
    260: {
    },
    1337: {
    }
  }
}

Then subvol show 260 could return the same format, with just one of them:

{
  "__header": {
    [...]
  },
  "subvolumes": {
    260: {
    },
  }
}

Now I can also combine the result of a few targeted subvol show commands:

subvols = sub_show_260_json["subvolumes"]
subvols.update(sub_show_1337_json["subvolumes"])
list(subvols.keys())
 -> [260, 1337]

> ---
> rawjson = output_of_command("btrfs --format json subvolume show /mnt")
> json = Json.from_string(rawjson)
> 
> if json["subvolume-show"]["flags"] != "-":
>   print("Subvolume %s has flags", json["subvolume-show"]["name"])
> ---
> 
> so the latter is subset of the former, without one map reference.
> 
> Once the json format is settled, more commands can be easily converted.
> 

Just some random thoughts.

Hans

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: btrfs-progs json formatting
  2019-06-25 18:07 ` Hans van Kranenburg
@ 2019-06-26 16:47   ` David Sterba
  2019-06-27  8:45     ` Hans van Kranenburg
  0 siblings, 1 reply; 4+ messages in thread
From: David Sterba @ 2019-06-26 16:47 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: dsterba, linux-btrfs

On Tue, Jun 25, 2019 at 08:07:25PM +0200, Hans van Kranenburg wrote:
> Hi,
> 
> On 6/25/19 6:09 PM, David Sterba wrote:
> > Hi,
> > 
> > I'd like to get some feedback on the json output, the overall structure
> > of the information and naming.
> > 
> > Code: git://github.com/kdave/btrfs-progs.git preview-json
> > 
> > The one example command that supports it is
> > 
> >  $ ./btrfs --format json subvolume show /mnt
> 
> I think for deciding what the structure will look like, it's very
> interesting to look at what tools, utils, use cases, etc.. will use this
> and how. Taking the bike for a ride, instead of painting the shed.

Geez bike shedding mentioned in 2nd sentence?

> E.g.
>  - Does the user expect the same text as in a current btrfs sub list,
> but while being able to quickly use specific fields instead of using awk
> to split the current output? -> a field flags with contents like
> "foo,bar" or "FOO|BAR" might be expected instead of an integer.

The idea is that the default output is for humans, but if there's need
to parse the output, json is the format for that. Cutting out the
information using 'awk' is not expected here.

So the information provided by the text and json in a single command is
almost the same, with json some enhancements are possible to allow
linking to other commands, eg. not just the subvolume path but also the
id. Something that you suggested below.

>  - Does the user want to write another tool that actually
> programmatically does more? In that case the user might prefer having
> the number instead of the text of flags which needs to be parsed back again.

Ok, so bitfield values can be exported in two ways, one as a raw number
and one as the string.

> > The test tests/cli-tests/011-* can demonstrate the capabilities.  The
> > json formatter and example use are in the top commits, otherwise the
> > branch contains lot of unrelated changes.
> > 
> > Example output:
> > 
> > {
> >   "__header": {
> >     "version": "1"
> >   },
> >   "subvolume-show": {
> >     "path": "subv2",
> >     "name": "subv2",
> >     "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >     "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >     "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >     "otime": "2019-06-25 17:15:28 +0200",
> >     "subvolume_id": "256",
> >     "generation": "6",
> >     "parent_id": "5",
> >     "toplevel_id": "5",
> >     "flags": "-",
> >     "snapshots": [
> >     ],
> >     "quota-group": {
> >       "qgroupid": "0/256",
> >       "qgroup-limit-referenced": "-",
> >       "qgroup-limit-exclusive": "-",
> >       "qgroup-usage-referenced": "16.00KiB",
> >       "qgroup-usage-exclusive": "16.00KiB"
> >     }
> >   }
> > }
> > 
> > The current proposal:
> > 
> > - the toplevel group contains 2 maps, one is some generic header, to
> >   identify version of the format or the version of progs that produces
> >   it or whatever could be useful and I did not think of it
> > 
> > - the 2nd map is named according to the command that generated the
> >   output, this is to be able to merge outputs from several commands, or
> >   to somehow define the contents of the map
> 
> In what example use case would I be combining output of several commands?

Well, that's always an issue with usecases. I don't have immediate use
for that but as the format will be set to stone, additional flexibility
can pay off in the future.

The util-linux tools support json output and that's what I'm trying to
copy for sake of some consistency, as the 'btrfs' tool is sort of
low-level administration tool very close in purpose.

Check eg 'lsblk --json' or 'lscpu --json', that print "blockdevices" or
"lscpu" as the name of the map.

> > Compare that to a simple unnamed group with bunch of values:
> > 
> > {
> >   "path": "subv2",
> >   "name": "subv2",
> >   "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >   "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >   "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
> >   "otime": "2019-06-25 17:15:28 +0200",
> >   "subvolume_id": "256",
> >   "generation": "6",
> >   "parent_id": "5",
> >   "toplevel_id": "5",
> >   "flags": "-",
> >   "snapshots": [
> >   ],
> >   "quota-group": {
> >     "qgroupid": "0/256",
> >     "qgroup-limit-referenced": "-",
> >     "qgroup-limit-exclusive": "-",
> >     "qgroup-usage-referenced": "16.00KiB",
> >     "qgroup-usage-exclusive": "16.00KiB"
> >   }
> > }
> > 
> > That's a bit shorter but makes validation harder. I assume that the
> > output would be accessed programatically like (python pseudo-code)
> 
> What about having structures that are result-data-oriented instead of
> command-oriented?

The commands give different level of details and retrieving some of that
can take longer. Eg. the quota information query the tree for the
limits, so this might not be suitable for the 'list' command.

> E.g. something that shows a subvol or lists them could have a bunch of
> "subvolumes" in the result:
> 
> {
>   "__header": {
>     [...]
>   },
>   "subvolumes": {
>     5: {
>     }
>     256: {
>     },
>     260: {
>     },
>     1337: {
>     }
>   }
> }
> 
> Then subvol show 260 could return the same format, with just one of them:
> 
> {
>   "__header": {
>     [...]
>   },
>   "subvolumes": {
>     260: {
>     },
>   }
> }
> 
> Now I can also combine the result of a few targeted subvol show commands:
> 
> subvols = sub_show_260_json["subvolumes"]
> subvols.update(sub_show_1337_json["subvolumes"])
> list(subvols.keys())
>  -> [260, 1337]

I think I see what you mean, perhaps more commands need to be explored
to see if we can find some common structures to avoid the per-command
output. Something like lightweight-subvolume and full-details-subvolume,
with some way distinguish that programatically or just leave it to
try/catch.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: btrfs-progs json formatting
  2019-06-26 16:47   ` David Sterba
@ 2019-06-27  8:45     ` Hans van Kranenburg
  0 siblings, 0 replies; 4+ messages in thread
From: Hans van Kranenburg @ 2019-06-27  8:45 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 6/26/19 6:47 PM, David Sterba wrote:
> On Tue, Jun 25, 2019 at 08:07:25PM +0200, Hans van Kranenburg wrote:
>> Hi,
>>
>> On 6/25/19 6:09 PM, David Sterba wrote:
>>> Hi,
>>>
>>> I'd like to get some feedback on the json output, the overall structure
>>> of the information and naming.
>>>
>>> Code: git://github.com/kdave/btrfs-progs.git preview-json
>>>
>>> The one example command that supports it is
>>>
>>>  $ ./btrfs --format json subvolume show /mnt
>>
>> I think for deciding what the structure will look like, it's very
>> interesting to look at what tools, utils, use cases, etc.. will use this
>> and how. Taking the bike for a ride, instead of painting the shed.
> 
> Geez bike shedding mentioned in 2nd sentence?

Sorry, I didn't mean it like that. :)

What I was trying to say is that I'm genuinely interested in hearing the
actual use cases from actual users who asked for this feature in the
past, because I think it will be the best help to improve the output
format (as opposed to looking at the structures from an aesthetical
viewpoint).

If you still know who requested the feature in the past, then maybe Cc
them on the thread? Might be missed easily, otherwise.

>> E.g.
>>  - Does the user expect the same text as in a current btrfs sub list,
>> but while being able to quickly use specific fields instead of using awk
>> to split the current output? -> a field flags with contents like
>> "foo,bar" or "FOO|BAR" might be expected instead of an integer.
> 
> The idea is that the default output is for humans, but if there's need
> to parse the output, json is the format for that. Cutting out the
> information using 'awk' is not expected here.
> 
> So the information provided by the text and json in a single command is
> almost the same, with json some enhancements are possible to allow
> linking to other commands, eg. not just the subvolume path but also the
> id. Something that you suggested below.
> 
>>  - Does the user want to write another tool that actually
>> programmatically does more? In that case the user might prefer having
>> the number instead of the text of flags which needs to be parsed back again.
> 
> Ok, so bitfield values can be exported in two ways, one as a raw number
> and one as the string.
> 
>>> The test tests/cli-tests/011-* can demonstrate the capabilities.  The
>>> json formatter and example use are in the top commits, otherwise the
>>> branch contains lot of unrelated changes.
>>>
>>> Example output:
>>>
>>> {
>>>   "__header": {
>>>     "version": "1"
>>>   },
>>>   "subvolume-show": {
>>>     "path": "subv2",
>>>     "name": "subv2",
>>>     "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>     "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>     "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>     "otime": "2019-06-25 17:15:28 +0200",
>>>     "subvolume_id": "256",
>>>     "generation": "6",
>>>     "parent_id": "5",
>>>     "toplevel_id": "5",
>>>     "flags": "-",
>>>     "snapshots": [
>>>     ],
>>>     "quota-group": {
>>>       "qgroupid": "0/256",
>>>       "qgroup-limit-referenced": "-",
>>>       "qgroup-limit-exclusive": "-",
>>>       "qgroup-usage-referenced": "16.00KiB",
>>>       "qgroup-usage-exclusive": "16.00KiB"
>>>     }
>>>   }
>>> }
>>>
>>> The current proposal:
>>>
>>> - the toplevel group contains 2 maps, one is some generic header, to
>>>   identify version of the format or the version of progs that produces
>>>   it or whatever could be useful and I did not think of it
>>>
>>> - the 2nd map is named according to the command that generated the
>>>   output, this is to be able to merge outputs from several commands, or
>>>   to somehow define the contents of the map
>>
>> In what example use case would I be combining output of several commands?
> 
> Well, that's always an issue with usecases. I don't have immediate use
> for that but as the format will be set to stone, additional flexibility
> can pay off in the future.
> 
> The util-linux tools support json output and that's what I'm trying to
> copy for sake of some consistency, as the 'btrfs' tool is sort of
> low-level administration tool very close in purpose.
> 
> Check eg 'lsblk --json' or 'lscpu --json', that print "blockdevices" or
> "lscpu" as the name of the map.
> 
>>> Compare that to a simple unnamed group with bunch of values:
>>>
>>> {
>>>   "path": "subv2",
>>>   "name": "subv2",
>>>   "uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>   "parent_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>   "received_uuid": "9178604e-e961-ef40-a5f0-2c109126b209",
>>>   "otime": "2019-06-25 17:15:28 +0200",
>>>   "subvolume_id": "256",
>>>   "generation": "6",
>>>   "parent_id": "5",
>>>   "toplevel_id": "5",
>>>   "flags": "-",
>>>   "snapshots": [
>>>   ],
>>>   "quota-group": {
>>>     "qgroupid": "0/256",
>>>     "qgroup-limit-referenced": "-",
>>>     "qgroup-limit-exclusive": "-",
>>>     "qgroup-usage-referenced": "16.00KiB",
>>>     "qgroup-usage-exclusive": "16.00KiB"
>>>   }
>>> }
>>>
>>> That's a bit shorter but makes validation harder. I assume that the
>>> output would be accessed programatically like (python pseudo-code)
>>
>> What about having structures that are result-data-oriented instead of
>> command-oriented?
> 
> The commands give different level of details and retrieving some of that
> can take longer. Eg. the quota information query the tree for the
> limits, so this might not be suitable for the 'list' command.
> 
>> E.g. something that shows a subvol or lists them could have a bunch of
>> "subvolumes" in the result:
>>
>> {
>>   "__header": {
>>     [...]
>>   },
>>   "subvolumes": {
>>     5: {
>>     }
>>     256: {
>>     },
>>     260: {
>>     },
>>     1337: {
>>     }
>>   }
>> }
>>
>> Then subvol show 260 could return the same format, with just one of them:
>>
>> {
>>   "__header": {
>>     [...]
>>   },
>>   "subvolumes": {
>>     260: {
>>     },
>>   }
>> }
>>
>> Now I can also combine the result of a few targeted subvol show commands:
>>
>> subvols = sub_show_260_json["subvolumes"]
>> subvols.update(sub_show_1337_json["subvolumes"])
>> list(subvols.keys())
>>  -> [260, 1337]
> 
> I think I see what you mean, perhaps more commands need to be explored
> to see if we can find some common structures to avoid the per-command
> output. Something like lightweight-subvolume and full-details-subvolume,
> with some way distinguish that programatically or just leave it to
> try/catch.
> 

Hans

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-06-27  8:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-25 16:09 RFC: btrfs-progs json formatting David Sterba
2019-06-25 18:07 ` Hans van Kranenburg
2019-06-26 16:47   ` David Sterba
2019-06-27  8:45     ` Hans van Kranenburg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).