Structured feeds

From: Dmitry Vyukov @ 2019-11-05 10:02 UTC
  To: workflows, automated-testing
  Cc: Konstantin Ryabitsev, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman, Veronika Kabatova

Hi,

This is another follow-up to the Lyon meetings. The main discussion was
about the email process (attestation, archival, etc):
https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t

I think providing info in a structured form is the key to building
more tooling and automation at a reasonable cost. So I discussed with
CI/Gerrit people and Konstantin how structured information can fit
into the current "feeds model" and what the next steps would be for
bringing it to life.

Here is the outline of the idea.
The current public-inbox format is a git repo with refs/heads/master
that contains a single file "m" in RFC822 format. We add
refs/heads/json with a single file "j" that contains structured data
in JSON format. These are 2 separate branches because some clients may
want to fetch just one of them.

Current clients will create only a plain-text "m" entry. However, newer
clients can also create a parallel "j" entry with the same info in
structured form. "m" and "j" are cross-referenced using the
Message-ID. It's OK to have only "m", or both, but not only "j" (every
client needs to generate at least some text representation for every
message).
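
As a rough sketch of the pairing rule above: the "j" schema is not
defined yet, so the "message_id" key and the check_pair() helper below
are purely assumptions made for illustration.

```python
import json
from email import message_from_string


def check_pair(m_text, j_text):
    """Check one feed entry: "m" is required, "j" is optional.

    Returns the shared Message-ID, or raises ValueError.
    The "message_id" JSON key is hypothetical; no schema exists yet.
    """
    if m_text is None:
        raise ValueError('entry has no "m": a text form is required')
    msg_id = message_from_string(m_text)["Message-ID"]
    if msg_id is None:
        raise ValueError('"m" has no Message-ID header')
    msg_id = msg_id.strip()
    if j_text is not None:
        # Cross-reference: "j" must point at the same Message-ID as "m".
        j_id = json.loads(j_text).get("message_id")
        if j_id != msg_id:
            raise ValueError('"j" does not reference the Message-ID of "m"')
    return msg_id
```

An entry with only "m" passes, an entry with matching "m" and "j"
passes, and an entry with only "j" is rejected.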

Currently we have public-inbox feeds only for mailing lists. The idea
is that more entities will have their own "private" feeds. For example,
each CI system, static analysis system, or third-party code review
system has its own feed. Eventually people have their own feeds too.
The feeds can be relatively easily converted to a local inbox, imported
into Gmail, etc (potentially with some filtering).

Besides private feeds there are also aggregated feeds, so that not
everybody has to fetch thousands of repositories. kernel.org will
provide one, but it can be mirrored (or built independently) anywhere
else. If I create https://github.com/dvyukov/kfeed.git for my feed and
Linus creates
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
then the aggregated feed will map these to the following branches:
refs/heads/github.com/dvyukov/kfeed/master
refs/heads/github.com/dvyukov/kfeed/json
refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
Standardized naming of sub-feeds allows a single repo to host multiple
feeds. For example, a github/gitlab/gerrit bridge could host multiple
individual feeds for its users.
So far there is no proposal for feed auto-discovery. One needs to
notify kernel.org for inclusion of their feed into the main aggregated
feed.
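
The naming above could be derived mechanically from the feed URL. A
minimal sketch, assuming the rule is simply "host + path, minus a
trailing .git" as in the example branches; aggregated_refs() is a
made-up helper name, not part of any existing tool.

```python
from urllib.parse import urlparse


def aggregated_refs(feed_url):
    """Map a feed repo URL to its two branches in the aggregated feed.

    Assumed rule: host + path with a trailing ".git" dropped, followed
    by "master" (RFC822 text) and "json" (structured data).
    """
    u = urlparse(feed_url)
    path = u.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    prefix = f"refs/heads/{u.netloc}/{path}"
    return [f"{prefix}/master", f"{prefix}/json"]
```

For https://github.com/dvyukov/kfeed.git this yields
refs/heads/github.com/dvyukov/kfeed/master and
refs/heads/github.com/dvyukov/kfeed/json.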

Konstantin offered that kernel.org can send emails for some feeds.
That is, normally one sends out an email and then commits it to the
feed. Instead, some systems can just commit the message to the feed and
then kernel.org will pull the feed and send emails on the user's
behalf. This allows clients to not deal with email at all (including
mail client setup). Which is nice.

Eventually git-lfs (https://git-lfs.github.com) may be used to embed
blobs right into feeds. This would allow users to fetch only the
blobs they are interested in. But this does not need to happen from
day one.

As soon as we have a bridge from plain-text emails into the structured
form, we can start building everything else in the structured world.
Such a bridge needs to parse new incoming emails, try to make sense out
of them (new patch, new patch version, comment, etc) and then push the
information in structured form. Then e.g. CIs can fetch info about
patches under review, test and post structured results. Bridging in the
opposite direction happens semi-automatically as CI also pushes a text
representation of results and that just needs to be sent as email.
Alternatively, we could have a separate explicit converter of
structured messages into plain text, which would allow removing some
duplication and presenting results in a more consistent form.
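
To make the email-to-structured direction more concrete, here is a very
rough sketch of what the classification step of such a bridge might
look like. Everything in it is an assumption: the record keys
("type", "version", "in_reply_to", ...) are invented since no schema
exists yet, and the subject-line heuristics are deliberately naive.

```python
import json
import re
from email import message_from_string

# Subject tags such as "[PATCH]", "[PATCH v2 1/3]", "[RFC PATCH v3]".
PATCH_TAG = re.compile(r"\[[^\]]*\bPATCH\b[^\]]*\]", re.IGNORECASE)
VERSION = re.compile(r"\bv(\d+)\b", re.IGNORECASE)


def to_structured(raw_email):
    """Turn one RFC822 message into a JSON record (hypothetical schema)."""
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "")
    record = {
        "message_id": msg.get("Message-ID", "").strip(),
        "subject": subject,
    }
    is_re = subject.lower().startswith("re:")
    tag = PATCH_TAG.search(subject)
    if tag and not is_re:
        # A patch; no "vN" inside the tag is treated as version 1.
        m = VERSION.search(tag.group(0))
        version = int(m.group(1)) if m else 1
        record["type"] = "patch" if version == 1 else "patch_version"
        record["version"] = version
    elif is_re or msg.get("In-Reply-To") is not None:
        record["type"] = "comment"
        record["in_reply_to"] = msg.get("In-Reply-To", "").strip()
    else:
        record["type"] = "other"
    return json.dumps(record, sort_keys=True)
```

A real bridge would have to handle much more (cover letters, series
threading, trailers such as Reviewed-by, non-standard subjects), so
this only illustrates the shape of the problem.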

Similarly, it should be much simpler for Patchwork/Gerrit to present
current patches under review. Local mode should work almost seamlessly
-- you fetch the aggregated feed and then run a local instance on top
of it.

No work has been done on the actual form/schema of the structured
feeds. That's something we need to figure out while working on a
prototype. However, good references would be the git-appraise schema:
https://github.com/google/git-appraise/tree/master/schema
and the gerrit schema (not sure what's a good link). Does anybody know
where the gitlab schema is? Or other similar schemas?

Thoughts and comments are welcome.
Thanks
