git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Josh Steadmon <steadmon@google.com>,
	Jeff Hostetler <jeffhost@microsoft.com>,
	git@vger.kernel.org, Emily Shaffer <emilyshaffer@google.com>
Subject: Re: [PATCH] trace2: increment event format version
Date: Wed, 01 Dec 2021 16:57:16 +0100	[thread overview]
Message-ID: <211201.86zgpk9u3t.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <9ce9c989-5a8c-1d5c-cc7e-9447760ed110@jeffhostetler.com>


On Wed, Dec 01 2021, Jeff Hostetler wrote:

> On 11/12/21 5:33 PM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Nov 11 2021, Junio C Hamano wrote:
>> 
>>> Josh Steadmon <steadmon@google.com> writes:
>>>
>>>> On 2021.11.11 15:03, Junio C Hamano wrote:
>>>>> Josh Steadmon <steadmon@google.com> writes:
>>>>>
>>>>>> In 64bc752 (trace2: add trace2_child_ready() to report on background
>>>>>> children, 2021-09-20), we added a new "child_ready" event. In
>>>>>> Documentation/technical/api-trace2.txt, we promise that adding a new
>>>>>> event type will result in incrementing the trace2 event format version
>>>>>> number, but this was not done. Correct this in code & docs.
>>>>>>
> ...
>> On the field itself I also wonder if it's useful at all. I'd think
>> anyone implementing a parser for the format would dispatch to a lookup
>> handling known keys, so having a version indicating "new keys here"
>> seems rather useless.
>> 
>
> That may be true, but it is easier to have a version number and
> allow parsers to ignore it, than it is to not have a version number
> and at some point in the future require parsers try to figure it
> out. IMHO.
>
> So far we've only added new event types (cmd_ancestry and child_ready)
> and everything is in regular JSON forms, so parsing and dispatching
> is relatively easy and the version number is not really needed.
> But, if in the future we decide to change the contents within one of
> those events, then the version number may be significant.

Yes, exactly. Versioning it makes sense, I'm commenting specifically on
the criteria for bumping the version. I.e. this bit in
trace2/tr2_tgt_event.c (with my comments)::
    
     * The version should be incremented if new event types are added

That seems entirely useless, i.e. just ignore unknown types. If you get
an unknown type you know something is new, no need for a version number
bump for this case.

It adds no new information you're not getting with "oh? a new field?".

     * if existing fields are removed, [...]

Useful, you don't always know if the absence of a field means it won't
be there at all, or if something else is up, particularly if the field
is optional. E.g. if we rename "error" to "exception" (or whatever) we
don't want an older consumer to report that there's no errors happening.

     * [...]or if there are significant changes in
     * interpretation of existing events or fields.

Carving out "significant changes" seems rather odd.

Let's just say "any changes"? E.g. I don't think my 2d3491b117c (tr2:
log N parent process names on Linux, 2021-08-27) qualifies as
significant change, but why not bump the version even for that?

In that case it was part of a release where we bumped the version
anyway, but let's come up with some sane criteria.

What's "significant" is also entirely subjective. I think 2d3491b117c is
trivial in the context of the entirety of our trace2 data. But if all
you care about is logging process trees it's highly significant.

     * Smaller changes, such as adding
     * a new field to an existing event, do not require an increment to the EVENT
     * format version.

I'd think a new sub-field would be in the same class as a new top-level
event field, i.e. adding a version wouldn't add any information. It
falls under the same general rule that you can traverse future data with
your "version" in hand, and a whitelist of fields you know how to
handle. If the version is the same but you've got new unknown fields you
shouldn't need to care.

IOW I think this would make more sense as a version bumping criteria:

    The version should be incremented whenever an existing consumer of
    trace2 data might want to act differently based on the new data.

    An exception to this is that any new event types do not merit
    bumping the version number. E.g. we have a top-level event type
    "error" now, but might hypothetically add a new "warning" type.
    Such an addition won't require bumping the version.

    Likewise adding new mandatory fields to existing events doesn't
    require bumping the version. E.g. the "error" type has (as of
    writing) a "fmt" and "msg" field. Let's say a future version adds an
    "id" (as in unique id for the error) field, such an addition won't
    require bumping the version.

    In other words, consumers of the trace2 JSON format are expected to
    walk the structure and only pick those things that they know about.
    Any unknown fields the consumer doesn't know about can be safely
    discarded. This won't apply if the version is bumped, then all bets
    are off, and the meaning of existing fields may or may not have
    changed.

    The idea is to encourage additive changes over changes to existing
    fields, and to reduce the work in maintaining the consumers of the
    format.

    As long as consumers ignore new unknown data they won't need to
    be updated every time the format changes in any way, only for
    potentially backwards-incompatible changes.

Wouldn't this be a saner policy for version bumping? AFAICT the only
thing you wouldn't be getting that you're getting now is the trivial
optimization of being able to say cheaply route trace2 payloads based on
version number alone (but even that is iffy now due to the subjectivity
of "significant change").

  reply	other threads:[~2021-12-01 16:18 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-11 22:34 [PATCH] trace2: increment event format version Josh Steadmon
2021-11-11 23:03 ` Junio C Hamano
2021-11-11 23:06   ` Josh Steadmon
2021-11-11 23:47     ` Junio C Hamano
2021-11-12 22:33       ` Ævar Arnfjörð Bjarmason
2021-11-12 23:28         ` Junio C Hamano
2021-12-01 15:49         ` Jeff Hostetler
2021-12-01 15:57           ` Ævar Arnfjörð Bjarmason [this message]
2021-12-01 19:56             ` Josh Steadmon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211201.86zgpk9u3t.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).