* Re: Structured feeds
2019-11-05 10:02 Structured feeds Dmitry Vyukov
@ 2019-11-06 15:35 ` Daniel Axtens
2019-11-06 20:50 ` Konstantin Ryabitsev
` (2 more replies)
2019-11-06 19:54 ` Han-Wen Nienhuys
` (3 subsequent siblings)
4 siblings, 3 replies; 33+ messages in thread
From: Daniel Axtens @ 2019-11-06 15:35 UTC (permalink / raw)
To: Dmitry Vyukov, workflows, automated-testing
Cc: Konstantin Ryabitsev, Brendan Higgins, Han-Wen Nienhuys,
Kevin Hilman, Veronika Kabatova
> As soon as we have a bridge from plain-text emails into the structured
> form, we can start building everything else in the structured world.
> Such bridge needs to parse new incoming emails, try to make sense out
> of them (new patch, new patch version, comment, etc) and then push the
> information in structured form. Then e.g. CIs can fetch info about
This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
at almost thirteen hundred lines, and that's with the benefit of the
Python standard library. It also regularly gets patched to handle
changes to email systems (e.g. DMARC), changes to git (git request-pull
format changed subtly in 2.14.3), the bizarre ways people send email,
and so on.
Patchwork does expose much of this as an API, for example for patches:
https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
build on that feel free. We can possibly add data to the API if that
would be helpful. (Patches are always welcome too, if you don't want to
wait an indeterminate amount of time.)
Regards,
Daniel
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-06 15:35 ` Daniel Axtens
@ 2019-11-06 20:50 ` Konstantin Ryabitsev
2019-11-07 9:08 ` Dmitry Vyukov
` (2 more replies)
2019-11-07 8:53 ` Dmitry Vyukov
2019-11-07 20:43 ` [Automated-testing] " Don Zickus
2 siblings, 3 replies; 33+ messages in thread
From: Konstantin Ryabitsev @ 2019-11-06 20:50 UTC (permalink / raw)
To: Daniel Axtens
Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova
On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>This is a non-trivial problem, fwiw. Patchwork's email parser clocks
>in
>at almost thirteen hundred lines, and that's with the benefit of the
>Python standard library. It also regularly gets patched to handle
>changes to email systems (e.g. DMARC), changes to git (git request-pull
>format changed subtly in 2.14.3), the bizarre ways people send email,
>and so on.
I'm actually very interested in seeing patchwork switch from being fed
mail directly from postfix to using public-inbox repositories as its
source of patches. I know it's easy enough to accomplish as-is, by
piping things from public-inbox to parsemail.sh, but it would be even
more awesome if patchwork learned to work with these repos natively.
The way I see it:
- site administrator configures upstream public-inbox feeds
- a backend process clones these repositories
- if it doesn't find a refs/heads/json, then it does its own parsing
to generate a structured feed with patches/series/trailers/pull
requests, cross-referencing them by series as necessary. Something
like a subset of this, excluding patchwork-specific data:
https://patchwork.kernel.org/api/1.1/patches/11177661/
- if it does find an existing structured feed, it simply uses it (e.g.
it was made available by another patchwork instance)
- the same backend process updates the repositories from upstream using
proper manifest files (e.g. see
https://lore.kernel.org/workflows/manifest.js.gz)
- patchwork projects then consume one (or more) of these structured
feeds to generate the actionable list of patches that maintainers can
use, perhaps with optional filtering by specific headers (list-id,
from, cc), patch paths, keywords, etc.
Basically, parsemail.sh is split into two, where one part does feed
cloning, pulling, and parsing into structured data (if not already
done), and another populates actual patchwork project with patches
matching requested parameters.
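As a minimal sketch of the branch check in the steps above (Python, since Patchwork is a Django project; `refs/heads/json` is the ref name proposed here, not an established public-inbox convention):

```python
import subprocess

def has_structured_feed(repo_path):
    """Return True if a cloned public-inbox repository already carries
    a pre-built structured feed on a 'json' branch, per the scheme
    sketched above. The ref name is the proposal's, so treat it as an
    assumption."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "--verify", "--quiet",
         "refs/heads/json"],
        capture_output=True)
    # rev-parse --verify exits 0 only if the ref exists
    return result.returncode == 0
```

The backend process would then either consume `refs/heads/json` directly or fall back to its own parsing.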
I see the following upsides to this:
- we consume public-inbox feeds directly, no longer losing patches due
to MTA problems, postfix burps, parse failures, etc
- a project can have multiple sources for patches instead of being tied
to a single mailing list
- downstream patchwork instances (the "local patchwork" tool I mentioned
earlier) can benefit from structured feeds provided by
patchwork.kernel.org
>Patchwork does expose much of this as an API, for example for patches:
>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>build on that feel free. We can possibly add data to the API if that
>would be helpful. (Patches are always welcome too, if you don't want to
>wait an indeterminate amount of time.)
As I said previously, I may be able to fund development of various
features, but I want to make sure that I properly work with upstream.
That requires getting consensus on features to make sure that we don't
spend funds and efforts on a feature that gets rejected. :)
Would the above feature (using one or more public-inbox repositories as
sources for a patchwork project) be a welcome addition to upstream?
-K
* Re: Structured feeds
2019-11-06 20:50 ` Konstantin Ryabitsev
@ 2019-11-07 9:08 ` Dmitry Vyukov
2019-11-07 10:57 ` Daniel Axtens
2019-11-07 11:09 ` Daniel Axtens
2019-11-08 14:18 ` Daniel Axtens
2 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 9:08 UTC (permalink / raw)
To: Konstantin Ryabitsev
Cc: Daniel Axtens, workflows, automated-testing, Brendan Higgins,
Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova
On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks
> >in
> >at almost thirteen hundred lines, and that's with the benefit of the
> >Python standard library. It also regularly gets patched to handle
> >changes to email systems (e.g. DMARC), changes to git (git request-pull
> >format changed subtly in 2.14.3), the bizarre ways people send email,
> >and so on.
>
> I'm actually very interested in seeing patchwork switch from being fed
> mail directly from postfix to using public-inbox repositories as its
> source of patches. I know it's easy enough to accomplish as-is, by
> piping things from public-inbox to parsemail.sh, but it would be even
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
> - if it doesn't find a refs/heads/json, then it does its own parsing
> to generate a structured feed with patches/series/trailers/pull
> requests, cross-referencing them by series as necessary. Something
> like a subset of this, excluding patchwork-specific data:
> https://patchwork.kernel.org/api/1.1/patches/11177661/
> - if it does find an existing structured feed, it simply uses it (e.g.
> it was made available by another patchwork instance)
It would be an interesting feature if a patchwork instance converted
and exported text emails to structured info. The result could then be
consumed by CIs for pre-commit testing and by other systems, without
the need to duplicate the conversion.
* Re: Structured feeds
2019-11-07 9:08 ` Dmitry Vyukov
@ 2019-11-07 10:57 ` Daniel Axtens
2019-11-07 11:26 ` Veronika Kabatova
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 10:57 UTC (permalink / raw)
To: Dmitry Vyukov, Konstantin Ryabitsev
Cc: workflows, automated-testing, Brendan Higgins, Han-Wen Nienhuys,
Kevin Hilman, Veronika Kabatova
Dmitry Vyukov <dvyukov@google.com> writes:
> On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
>>
>> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks
>> >in
>> >at almost thirteen hundred lines, and that's with the benefit of the
>> >Python standard library. It also regularly gets patched to handle
>> >changes to email systems (e.g. DMARC), changes to git (git request-pull
>> >format changed subtly in 2.14.3), the bizarre ways people send email,
>> >and so on.
>>
>> I'm actually very interested in seeing patchwork switch from being fed
>> mail directly from postfix to using public-inbox repositories as its
>> source of patches. I know it's easy enough to accomplish as-is, by
>> piping things from public-inbox to parsemail.sh, but it would be even
>> more awesome if patchwork learned to work with these repos natively.
>>
>> The way I see it:
>>
>> - site administrator configures upstream public-inbox feeds
>> - a backend process clones these repositories
>> - if it doesn't find a refs/heads/json, then it does its own parsing
>> to generate a structured feed with patches/series/trailers/pull
>> requests, cross-referencing them by series as necessary. Something
>> like a subset of this, excluding patchwork-specific data:
>> https://patchwork.kernel.org/api/1.1/patches/11177661/
>> - if it does find an existing structured feed, it simply uses it (e.g.
>> it was made available by another patchwork instance)
>
> It would be an interesting feature if a patchwork instance converted
> and exported text emails to structured info. The result could then be
> consumed by CIs for pre-commit testing and by other systems, without
> the need to duplicate the conversion.
This already happens.
Snowpatch does this and uses it to run CI checks on patch series as soon
as they arrive, and sends them back to patchwork as test results. It has
been running on linuxppc-dev for over a year.
Snowpatch is at https://github.com/ruscur/snowpatch
An example patch showing the checks having been run is
https://patchwork.ozlabs.org/patch/1190589/
I think there's a different CI system used for some device-tree patches:
e.g. https://patchwork.ozlabs.org/patch/1190714/ - I have no idea how
this works in the backend, but it also uses the patchwork API.
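To sketch what "sends them back to patchwork as test results" involves: Patchwork exposes a REST endpoint for per-patch "checks". The snippet below builds such a request without sending it. The endpoint path and field names follow the public Patchwork API docs, but versions differ, so verify the exact shapes against your instance:

```python
import json
from urllib import request

def build_check_request(base_url, patch_id, token, state, context,
                        description=""):
    """Build (but do not send) a POST reporting one CI result as a
    Patchwork 'check'. 'state' is e.g. "pending", "success", "warning",
    or "fail"; 'context' is the short CI name shown next to the patch."""
    payload = json.dumps({
        "state": state,
        "context": context,
        "description": description,
    }).encode()
    return request.Request(
        f"{base_url}/api/patches/{patch_id}/checks/",
        data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Token {token}"})

# Sending is then just: request.urlopen(build_check_request(...))
```

Separating "build" from "send" keeps the request shape testable without network access.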
Regards,
Daniel
* Re: Structured feeds
2019-11-07 10:57 ` Daniel Axtens
@ 2019-11-07 11:26 ` Veronika Kabatova
2019-11-08 0:24 ` Eric Wong
0 siblings, 1 reply; 33+ messages in thread
From: Veronika Kabatova @ 2019-11-07 11:26 UTC (permalink / raw)
To: Daniel Axtens, Dmitry Vyukov
Cc: Konstantin Ryabitsev, workflows, automated-testing,
Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman
----- Original Message -----
> From: "Daniel Axtens" <dja@axtens.net>
> To: "Dmitry Vyukov" <dvyukov@google.com>, "Konstantin Ryabitsev" <konstantin@linuxfoundation.org>
> Cc: workflows@vger.kernel.org, automated-testing@yoctoproject.org, "Brendan Higgins" <brendanhiggins@google.com>,
> "Han-Wen Nienhuys" <hanwen@google.com>, "Kevin Hilman" <khilman@baylibre.com>, "Veronika Kabatova"
> <vkabatov@redhat.com>
> Sent: Thursday, November 7, 2019 11:57:19 AM
> Subject: Re: Structured feeds
>
> Dmitry Vyukov <dvyukov@google.com> writes:
>
> > On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> >>
> >> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks
> >> >in
> >> >at almost thirteen hundred lines, and that's with the benefit of the
> >> >Python standard library. It also regularly gets patched to handle
> >> >changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> >format changed subtly in 2.14.3), the bizarre ways people send email,
> >> >and so on.
> >>
> >> I'm actually very interested in seeing patchwork switch from being fed
> >> mail directly from postfix to using public-inbox repositories as its
> >> source of patches. I know it's easy enough to accomplish as-is, by
> >> piping things from public-inbox to parsemail.sh, but it would be even
> >> more awesome if patchwork learned to work with these repos natively.
> >>
> >> The way I see it:
> >>
> >> - site administrator configures upstream public-inbox feeds
> >> - a backend process clones these repositories
> >> - if it doesn't find a refs/heads/json, then it does its own parsing
> >> to generate a structured feed with patches/series/trailers/pull
> >> requests, cross-referencing them by series as necessary. Something
> >> like a subset of this, excluding patchwork-specific data:
> >> https://patchwork.kernel.org/api/1.1/patches/11177661/
> >> - if it does find an existing structured feed, it simply uses it (e.g.
> >> it was made available by another patchwork instance)
> >
> > It would be an interesting feature if a patchwork instance converted
> > and exported text emails to structured info. The result could then be
> > consumed by CIs for pre-commit testing and by other systems, without
> > the need to duplicate the conversion.
>
> This already happens.
>
> Snowpatch does this and uses it to run CI checks on patch series as soon
> as they arrive, and sends them back to patchwork as test results. It has
> been running on linuxppc-dev for over a year.
>
> Snowpatch is at https://github.com/ruscur/snowpatch
>
> An example patch showing the checks having been run is
> https://patchwork.ozlabs.org/patch/1190589/
>
CKI does something similar too [0].
The code contains some RHEL-specific checks as we are not running patch
testing for upstream yet. The PW checks can be submitted from the pipeline.
We should probably update the trigger to use the events API...
The only "structured information" CKI requires is to have the patch in the
correct PW project, which is mapped to a git tree/branch so we know where
to apply the patch. However, there are cases where more information is
needed, such as when multiple different branches can be used with the
same project, or when the patch depends on another change.
This situation should be resolved with the freeform tagging feature I
proposed a while ago (blocked by DB refactoring; original series can be found
at [1]). This feature would allow developers to add any tags to their patches,
similar to the signed-off-by line. The extracted tags can then be queried in
the API and used by CI.
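The "extracted tags" part of that feature amounts to trailer parsing. A minimal sketch of pulling "Key: value" lines out of the last paragraph of a patch description (the same shape the signed-off-by line uses; the exact tag grammar the series proposes may differ):

```python
import re

# One trailer line: a hyphenated key, a colon, then the value.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")

def extract_trailers(body):
    """Return (key, value) pairs from trailer lines in the final
    paragraph of a patch/commit description."""
    paragraphs = [p for p in body.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return []
    trailers = []
    for line in paragraphs[-1].splitlines():
        m = TRAILER_RE.match(line.strip())
        if m:
            trailers.append((m.group(1), m.group(2)))
    return trailers
```

A CI could then filter on hypothetical tags like "Depends-on" before deciding which branch to apply against.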
I'll be totally honest and admit I ignored most of the implementation details
of public inbox feeds (will take a look when I have some free time) but as
long as they contain the original email, the feature should be usable with
them too.
[0] https://gitlab.com/cki-project/pipeline-trigger/blob/master/triggers/patch_trigger.py
[1] https://patchwork.ozlabs.org/project/patchwork/list/?series=66057
Veronika
> I think there's a different CI system used for some device-tree patches:
> e.g. https://patchwork.ozlabs.org/patch/1190714/ - I have no idea how
> this works in the backend, but it also uses the patchwork API.
>
> Regards,
> Daniel
>
* Re: Structured feeds
2019-11-07 11:26 ` Veronika Kabatova
@ 2019-11-08 0:24 ` Eric Wong
0 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2019-11-08 0:24 UTC (permalink / raw)
To: Veronika Kabatova
Cc: Daniel Axtens, Dmitry Vyukov, Konstantin Ryabitsev, workflows,
automated-testing, Brendan Higgins, Han-Wen Nienhuys,
Kevin Hilman
Veronika Kabatova <vkabatov@redhat.com> wrote:
> I'll be totally honest and admit I ignored most of the implementation details
> of public inbox feeds (will take a look when I have some free time) but as
> long as they contain the original email, the feature should be usable with
> them too.
Implementation details should not matter to consumers.
public-inbox exposes everything as NNTP which is the same
message format as email. NNTP is also much more stable and
established than the v2 git layout of public-inbox (which could
be superseded by a hypothetical "v3" layout).
I highly recommend anybody consuming public-inbox (and not
making 1:1 mirrors) use NNTP since it's well-established
and doesn't enforce long-term storage requirements.
I hope to support HTTP(S) CONNECT tunneling as a means for users
behind firewalls to get around NNTP port 119/563 restrictions.
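For consumers taking that advice, the skeleton below speaks just enough NNTP (RFC 3977) to fetch one article over the plain port-119 protocol mentioned above. Host and group names would be placeholders; a real client wants TLS (port 563), status-code checking, and pipelining. The dot-unstuffing helper is split out so it can be tested without a server:

```python
import socket

def nntp_unstuff(line):
    """Reverse NNTP dot-stuffing: a body line beginning with '..'
    was originally a line beginning with '.'."""
    return line[1:] if line.startswith(b"..") else line

def nntp_fetch_article(host, group, number, port=119):
    """Fetch one article from an NNTP server as raw RFC 5322 bytes.
    Minimal sketch: no TLS and no error handling."""
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rwb")
        f.readline()                                   # 200 greeting
        f.write(b"GROUP " + group.encode() + b"\r\n"); f.flush()
        f.readline()                                   # 211 count low high group
        f.write(b"ARTICLE %d\r\n" % number); f.flush()
        f.readline()                                   # 220 article follows
        lines = []
        while True:
            line = f.readline().rstrip(b"\r\n")
            if line == b".":                           # end-of-article marker
                break
            lines.append(nntp_unstuff(line))
        f.write(b"QUIT\r\n"); f.flush()
        return b"\n".join(lines)
```

Since NNTP hands back the original message bytes, the same email parser works unchanged on either feed.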
* Re: Structured feeds
2019-11-06 20:50 ` Konstantin Ryabitsev
2019-11-07 9:08 ` Dmitry Vyukov
@ 2019-11-07 11:09 ` Daniel Axtens
2019-11-08 14:18 ` Daniel Axtens
2 siblings, 0 replies; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 11:09 UTC (permalink / raw)
To: Konstantin Ryabitsev, patchwork
Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova
Sending on to the patchwork list for discussion. I think at least some
of this makes sense for Patchwork to support, I'll do a more detailed
analysis/breakdown later on.
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>>This is a non-trivial problem, fwiw. Patchwork's email parser clocks
>>in
>>at almost thirteen hundred lines, and that's with the benefit of the
>>Python standard library. It also regularly gets patched to handle
>>changes to email systems (e.g. DMARC), changes to git (git request-pull
>>format changed subtly in 2.14.3), the bizarre ways people send email,
>>and so on.
>
> I'm actually very interested in seeing patchwork switch from being fed
> mail directly from postfix to using public-inbox repositories as its
> source of patches. I know it's easy enough to accomplish as-is, by
> piping things from public-inbox to parsemail.sh, but it would be even
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
> - if it doesn't find a refs/heads/json, then it does its own parsing
> to generate a structured feed with patches/series/trailers/pull
> requests, cross-referencing them by series as necessary. Something
> like a subset of this, excluding patchwork-specific data:
> https://patchwork.kernel.org/api/1.1/patches/11177661/
> - if it does find an existing structured feed, it simply uses it (e.g.
> it was made available by another patchwork instance)
> - the same backend process updates the repositories from upstream using
> proper manifest files (e.g. see
> https://lore.kernel.org/workflows/manifest.js.gz)
>
> - patchwork projects then consume one (or more) of these structured
> feeds to generate the actionable list of patches that maintainers can
> use, perhaps with optional filtering by specific headers (list-id,
> from, cc), patch paths, keywords, etc.
>
> Basically, parsemail.sh is split into two, where one part does feed
> cloning, pulling, and parsing into structured data (if not already
> done), and another populates actual patchwork project with patches
> matching requested parameters.
>
> I see the following upsides to this:
>
> - we consume public-inbox feeds directly, no longer losing patches due
> to MTA problems, postfix burps, parse failures, etc
> - a project can have multiple sources for patches instead of being tied
> to a single mailing list
> - downstream patchwork instances (the "local patchwork" tool I mentioned
> earlier) can benefit from structured feeds provided by
> patchwork.kernel.org
>
>>Patchwork does expose much of this as an API, for example for patches:
>>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>>build on that feel free. We can possibly add data to the API if that
>>would be helpful. (Patches are always welcome too, if you don't want to
>>wait an indeterminate amount of time.)
>
> As I said previously, I may be able to fund development of various
> features, but I want to make sure that I properly work with upstream.
> That requires getting consensus on features to make sure that we don't
> spend funds and efforts on a feature that gets rejected. :)
>
> Would the above feature (using one or more public-inbox repositories as
> sources for a patchwork project) be a welcome addition to upstream?
>
> -K
* Re: Structured feeds
2019-11-06 20:50 ` Konstantin Ryabitsev
2019-11-07 9:08 ` Dmitry Vyukov
2019-11-07 11:09 ` Daniel Axtens
@ 2019-11-08 14:18 ` Daniel Axtens
2019-11-09 7:41 ` Johannes Berg
2 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-08 14:18 UTC (permalink / raw)
To: Konstantin Ryabitsev, patchwork
Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova
> I'm actually very interested in seeing patchwork switch from being fed
> mail directly from postfix to using public-inbox repositories as its
> source of patches. I know it's easy enough to accomplish as-is, by
> piping things from public-inbox to parsemail.sh, but it would be even
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
> - if it doesn't find a refs/heads/json, then it does its own parsing
> to generate a structured feed with patches/series/trailers/pull
> requests, cross-referencing them by series as necessary. Something
> like a subset of this, excluding patchwork-specific data:
> https://patchwork.kernel.org/api/1.1/patches/11177661/
> - if it does find an existing structured feed, it simply uses it (e.g.
> it was made available by another patchwork instance)
> - the same backend process updates the repositories from upstream using
> proper manifest files (e.g. see
> https://lore.kernel.org/workflows/manifest.js.gz)
>
> - patchwork projects then consume one (or more) of these structured
> feeds to generate the actionable list of patches that maintainers can
> use, perhaps with optional filtering by specific headers (list-id,
> from, cc), patch paths, keywords, etc.
>
> Basically, parsemail.sh is split into two, where one part does feed
> cloning, pulling, and parsing into structured data (if not already
> done), and another populates actual patchwork project with patches
> matching requested parameters.
This is very confusing to me. Let me see if I have it correct.
You want to split out a chunk of parsemail that takes email messages,
either from regular email or from public-inbox, and spits out a
structured feed.
You then want patchwork to consume that structured feed.
I don't know how that would work architecturally - converting emails
into a structured feed requires a lot of the patchwork core.
It would be a lot simpler from the patchwork side to teach parsemail to
consume a public-inbox git feed, and to write an API consumer that
takes the structured data Patchwork produces, strips out the bits you
don't care about, and feeds it into other projects.
>
> I see the following upsides to this:
>
> - we consume public-inbox feeds directly, no longer losing patches due
> to MTA problems, postfix burps, parse failures, etc
This much I am OK with as an additional option for sites. FWIW,
consuming a public-inbox feed doesn't protect you against most parse
failures - they are due to things like duplicate message-ids and bad
mail from the sender end. It should protect against issues due to
postfix invoking multiple parsemails in parallel, but that shouldn't be
losing patches, just getting series metadata wrong.
> - a project can have multiple sources for patches instead of being tied
> to a single mailing list
You can get around this pretty easily now with the --list-id= parameter,
and I think the netdev patchwork might do this to grab bpf patches? I
think there's a little shim at OzLabs that does this.
I also don't see how a public-inbox feed helps. Currently pw determines
the list based on a header in the email, unless overridden. public-inbox
emails will also have that header, so either patchwork looks at those
headers or you tell patchwork explicitly that a particular public-inbox
feed corresponds to a particular list. Either way I think this leaves
you in the same situation you were in before, unless I have
misunderstood...
> - downstream patchwork instances (the "local patchwork" tool I mentioned
> earlier) can benefit from structured feeds provided by
> patchwork.kernel.org
Do I understand correctly that this is basically a stripped-down version
of what the API provides, but in git form?
>>Patchwork does expose much of this as an API, for example for patches:
>>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>>build on that feel free. We can possibly add data to the API if that
>>would be helpful. (Patches are always welcome too, if you don't want to
>>wait an indeterminate amount of time.)
>
> As I said previously, I may be able to fund development of various
> features, but I want to make sure that I properly work with upstream.
> That requires getting consensus on features to make sure that we don't
> spend funds and efforts on a feature that gets rejected. :)
>
> Would the above feature (using one or more public-inbox repositories as
> sources for a patchwork project) be a welcome addition to upstream?
I think a lot about patchwork development in terms of good incremental
changes. This is largely because maintainers get quite cross with us if
we break things, and I don't like that.
What I would be happy with as a first step (not necessarily saying this
is _all_ I would accept, just that this is what I'd want to see _first_)
is:
- code that efficiently reads a public-inbox git repository/folder of
git repositories and feeds it into the existing parser. I have very
inefficient code that converts public-inbox to an mbox and then
parses that, but I'm sure you can do better with a git library.
- careful thought about how to do this incrementally. It's obvious how
to do email incrementally, but I think you need to keep an extra bit
of state around to incrementally parse the git archive. I think.
- careful thought about how to do this in a way that doesn't require
sites that don't want to load public-inbox feeds to install lots of
random git-parsing code.
Once you can do that, I'm happy to think more about your more ambitious
plans.
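For the first item, one way to stream messages out of a public-inbox v2 epoch without materialising an mbox is to batch-read each commit's 'm' blob (the file name the v2 layout uses for newly added mail) through a single `git cat-file` process. This is a sketch based on my reading of the v2 format and would need adapting for v1 archives:

```python
import subprocess

def iter_messages(repo):
    """Yield each raw message (bytes) stored in a public-inbox v2
    epoch repository, oldest first, via one git cat-file process."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", "HEAD"],
        capture_output=True, text=True, check=True).stdout.split()
    cat = subprocess.Popen(
        ["git", "-C", repo, "cat-file", "--batch"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for rev in revs:
        cat.stdin.write(f"{rev}:m\n".encode()); cat.stdin.flush()
        header = cat.stdout.readline().split()
        if header[-1] == b"missing":   # e.g. a delete-only commit
            continue
        size = int(header[-1])         # "<sha> blob <size>"
        blob = cat.stdout.read(size)
        cat.stdout.read(1)             # trailing LF after the blob
        yield blob
    cat.stdin.close(); cat.wait()
```

Each yielded blob is a complete RFC 5322 message, so it can go straight into the existing email parser.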
Regards,
Daniel
>
> -K
* Re: Structured feeds
2019-11-08 14:18 ` Daniel Axtens
@ 2019-11-09 7:41 ` Johannes Berg
2019-11-12 10:44 ` Daniel Borkmann
[not found] ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
0 siblings, 2 replies; 33+ messages in thread
From: Johannes Berg @ 2019-11-09 7:41 UTC (permalink / raw)
To: Daniel Axtens, Konstantin Ryabitsev, patchwork
Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
automated-testing, Dmitry Vyukov
On Sat, 2019-11-09 at 01:18 +1100, Daniel Axtens wrote:
> >
> - code that efficiently reads a public-inbox git repository/folder of
> git repositories and feeds it into the existing parser. I have very
> inefficient code that converts public-inbox to an mbox and then
> parses that, but I'm sure you can do better with a git library.
Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
maildir converter, with procmail support. I assume that would actually
satisfy this step already, since you can just substitute the patchwork
parser for procmail.
> - careful thought about how to do this incrementally. It's obvious how
> to do email incrementally, but I think you need to keep an extra bit
> of state around to incrementally parse the git archive. I think.
Not sure he had an incremental mode figured out there, but that can't
really be all *that* hard, just store the last-successfully-parsed git
sha1?
johannes
* Re: Structured feeds
2019-11-09 7:41 ` Johannes Berg
@ 2019-11-12 10:44 ` Daniel Borkmann
[not found] ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
1 sibling, 0 replies; 33+ messages in thread
From: Daniel Borkmann @ 2019-11-12 10:44 UTC (permalink / raw)
To: Johannes Berg, Daniel Axtens, Konstantin Ryabitsev, patchwork
Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
automated-testing, Dmitry Vyukov
On 11/9/19 8:41 AM, Johannes Berg wrote:
> On Sat, 2019-11-09 at 01:18 +1100, Daniel Axtens wrote:
>>>
>> - code that efficiently reads a public-inbox git repository/folder of
>> git repositories and feeds it into the existing parser. I have very
>> inefficient code that converts public-inbox to an mbox and then
>> parses that, but I'm sure you can do better with a git library.
>
> Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> maildir converter, with procmail support. I assume that would actually
> satisfy this step already, since you can just substitute the patchwork
> parser for procmail.
>
>> - careful thought about how to do this incrementally. It's obvious how
>> to do email incrementally, but I think you need to keep an extra bit
>> of state around to incrementally parse the git archive. I think.
>
> Not sure he had an incremental mode figured out there, but that can't
> really be all *that* hard, just store the last-successfully-parsed git
> sha1?
Yep, that is what it is doing, so that we only need to walk the repo(s)
upon a new git fetch to the point where we stopped last time.
Thanks,
Daniel
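The store-the-last-sha1 scheme described above fits in a few lines; the state-file layout here is illustrative, not what the actual tool does:

```python
import subprocess

def new_commits_since(repo, state_file):
    """List commits added since the last run, oldest first, and
    remember where we stopped by writing the newest sha1 to a
    plain-text state file."""
    try:
        last = open(state_file).read().strip()
        rev_range = f"{last}..HEAD"
    except FileNotFoundError:
        rev_range = "HEAD"                 # first run: walk everything
    shas = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", rev_range],
        capture_output=True, text=True, check=True).stdout.split()
    if shas:
        open(state_file, "w").write(shas[-1])
    return shas
```

After a `git fetch`, only the commits in `last..HEAD` get re-parsed, which is exactly the incremental behaviour discussed.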
[parent not found: <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>]
* Re: Structured feeds
[not found] ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
@ 2019-11-30 18:16 ` Johannes Berg
2019-11-30 18:36 ` Stephen Finucane
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Berg @ 2019-11-30 18:16 UTC (permalink / raw)
To: Stephen Finucane, Daniel Axtens, Konstantin Ryabitsev, patchwork
Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
automated-testing, Dmitry Vyukov
On Sat, 2019-11-30 at 18:04 +0000, Stephen Finucane wrote:
> > Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> > maildir converter, with procmail support. I assume that would actually
> > satisfy this step already, since you can just substitute the patchwork
> > parser for procmail.
>
> What do you mean "substitute the patchwork parser for procmail"? From
> reading this thread, I got the impression that we'd be changing what
> feeds things into the 'parsemail' management command, right?
Yes, that's exactly what I meant. I was looking at it from Daniel's
tool's POV, so instead of calling procmail it can call patchwork.
johannes
* Re: Structured feeds
2019-11-30 18:16 ` Johannes Berg
@ 2019-11-30 18:36 ` Stephen Finucane
0 siblings, 0 replies; 33+ messages in thread
From: Stephen Finucane @ 2019-11-30 18:36 UTC (permalink / raw)
To: Johannes Berg, Daniel Axtens, Konstantin Ryabitsev, patchwork
Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
automated-testing, Dmitry Vyukov
On Sat, 2019-11-30 at 19:16 +0100, Johannes Berg wrote:
> On Sat, 2019-11-30 at 18:04 +0000, Stephen Finucane wrote:
>
> > > Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> > > maildir converter, with procmail support. I assume that would actually
> > > satisfy this step already, since you can just substitute the patchwork
> > > parser for procmail.
> >
> > What do you mean "substitute the patchwork parser for procmail"? From
> > reading this thread, I got the impression that we'd be changing what
> > feeds things into the 'parsemail' management command, right?
>
> Yes, that's exactly what I meant. I was looking at it from Daniel's
> tool's POV, so instead of calling procmail it can call patchwork.
>
> johannes
Ah, then that I have no issues with :) I've managed to configure
getmail to feed into a patchwork instance exactly once, and found
configuring postfix so daunting that I actually suggested just using an
email-as-a-service provider to take the hassle out of things [1].
Anything that lets one avoid working with those tools is a good thing
in my mind.
Stephen
[1] https://patchwork.readthedocs.io/en/latest/deployment/installation/#use-a-email-as-a-service-provider
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-06 15:35 ` Daniel Axtens
2019-11-06 20:50 ` Konstantin Ryabitsev
@ 2019-11-07 8:53 ` Dmitry Vyukov
2019-11-07 10:40 ` Daniel Axtens
2019-11-07 20:43 ` [Automated-testing] " Don Zickus
2 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 8:53 UTC (permalink / raw)
To: Daniel Axtens
Cc: workflows, automated-testing, Konstantin Ryabitsev,
Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
Veronika Kabatova
On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
>
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
>
> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> at almost thirteen hundred lines, and that's with the benefit of the
> Python standard library. It also regularly gets patched to handle
> changes to email systems (e.g. DMARC), changes to git (git request-pull
> format changed subtly in 2.14.3), the bizarre ways people send email,
> and so on.
>
> Patchwork does expose much of this as an API, for example for patches:
> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> build on that feel free. We can possibly add data to the API if that
> would be helpful. (Patches are always welcome too, if you don't want to
> wait an indeterminate amount of time.)
Hi Daniel,
Thanks!
Could you provide a link to the code?
Do you have a test suite for the parser (set of email samples and what
they should be parsed to)?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-07 8:53 ` Dmitry Vyukov
@ 2019-11-07 10:40 ` Daniel Axtens
2019-11-07 10:43 ` Dmitry Vyukov
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 10:40 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Konstantin Ryabitsev,
Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
Veronika Kabatova
Dmitry Vyukov <dvyukov@google.com> writes:
> On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
>>
>> > As soon as we have a bridge from plain-text emails into the structured
>> > form, we can start building everything else in the structured world.
>> > Such bridge needs to parse new incoming emails, try to make sense out
>> > of them (new patch, new patch version, comment, etc) and then push the
>> > information in structured form. Then e.g. CIs can fetch info about
>>
>> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>> at almost thirteen hundred lines, and that's with the benefit of the
>> Python standard library. It also regularly gets patched to handle
>> changes to email systems (e.g. DMARC), changes to git (git request-pull
>> format changed subtly in 2.14.3), the bizarre ways people send email,
>> and so on.
>>
>> Patchwork does expose much of this as an API, for example for patches:
>> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>> build on that feel free. We can possibly add data to the API if that
>> would be helpful. (Patches are always welcome too, if you don't want to
>> wait an indeterminate amount of time.)
>
> Hi Daniel,
>
> Thanks!
> Could you provide a link to the code?
> Do you have a test suite for the parser (set of email samples and what
> they should be parsed to)?
Sure:
https://github.com/getpatchwork/patchwork in particular
https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py and
https://github.com/getpatchwork/patchwork/tree/master/patchwork/tests
Regards,
Daniel
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-07 10:40 ` Daniel Axtens
@ 2019-11-07 10:43 ` Dmitry Vyukov
0 siblings, 0 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 10:43 UTC (permalink / raw)
To: Daniel Axtens
Cc: workflows, automated-testing, Konstantin Ryabitsev,
Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
Veronika Kabatova
On Thu, Nov 7, 2019 at 11:41 AM Daniel Axtens <dja@axtens.net> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> writes:
>
> > On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
> >>
> >> > As soon as we have a bridge from plain-text emails into the structured
> >> > form, we can start building everything else in the structured world.
> >> > Such bridge needs to parse new incoming emails, try to make sense out
> >> > of them (new patch, new patch version, comment, etc) and then push the
> >> > information in structured form. Then e.g. CIs can fetch info about
> >>
> >> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >> at almost thirteen hundred lines, and that's with the benefit of the
> >> Python standard library. It also regularly gets patched to handle
> >> changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> format changed subtly in 2.14.3), the bizarre ways people send email,
> >> and so on.
> >>
> >> Patchwork does expose much of this as an API, for example for patches:
> >> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> >> build on that feel free. We can possibly add data to the API if that
> >> would be helpful. (Patches are always welcome too, if you don't want to
> >> wait an indeterminate amount of time.)
> >
> > Hi Daniel,
> >
> > Thanks!
> > Could you provide a link to the code?
> > Do you have a test suite for the parser (set of email samples and what
> > they should be parsed to)?
>
> Sure:
> https://github.com/getpatchwork/patchwork in particular
> https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py and
> https://github.com/getpatchwork/patchwork/tree/master/patchwork/tests
Added here for future reference:
https://github.com/dvyukov/kit/blob/master/doc/references.md#patchwork
Thanks!
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-06 15:35 ` Daniel Axtens
2019-11-06 20:50 ` Konstantin Ryabitsev
2019-11-07 8:53 ` Dmitry Vyukov
@ 2019-11-07 20:43 ` Don Zickus
2019-11-08 7:58 ` Dmitry Vyukov
2019-11-08 11:44 ` Daniel Axtens
2 siblings, 2 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-07 20:43 UTC (permalink / raw)
To: Daniel Axtens
Cc: Dmitry Vyukov, workflows, automated-testing, Han-Wen Nienhuys,
Konstantin Ryabitsev
On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
>
> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> at almost thirteen hundred lines, and that's with the benefit of the
> Python standard library. It also regularly gets patched to handle
> changes to email systems (e.g. DMARC), changes to git (git request-pull
> format changed subtly in 2.14.3), the bizarre ways people send email,
> and so on.
Does it ever make sense to just use git to do the translation to structured
json? Git has similar logic and can easily handle its own changes. Tools
like git-mailinfo and git-mailsplit probably do a good chunk of the
work today.
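As a purely illustrative sketch of the kind of extraction git-mailinfo performs — done here with Python's stdlib `email` module rather than git itself, and with field names and a crude body-split heuristic that are assumptions, not git's exact behavior:

```python
import email
import email.policy
import json

def mail_to_struct(raw: bytes) -> dict:
    """Extract roughly the fields git-mailinfo emits (Author, Subject,
    Date) plus the body split into commit message text and diff."""
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    body = msg.get_body(preferencelist=("plain",)).get_content()
    # git-mailinfo cuts the body at the start of the diff; do the
    # same here, crudely, at the first "diff --git " marker.
    text, sep, rest = body.partition("diff --git ")
    return {
        "author": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
        "message": text.strip(),
        "diff": (sep + rest).strip() if sep else "",
    }

raw = b"""From: Jane Dev <jane@example.org>
Subject: [PATCH] fix off-by-one
Date: Tue, 5 Nov 2019 10:02:00 +0100

Fix the loop bound.

diff --git a/a.c b/a.c
"""
print(json.dumps(mail_to_struct(raw), indent=2))
```

git mailsplit would first break an mbox into one file per message; each file could then be fed through a converter along these lines.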
It wouldn't pull together series info.
Just a thought.
Cheers,
Don
>
> Patchwork does expose much of this as an API, for example for patches:
> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> build on that feel free. We can possibly add data to the API if that
> would be helpful. (Patches are always welcome too, if you don't want to
> wait an indeterminate amount of time.)
>
> Regards,
> Daniel
>
>
> --
> _______________________________________________
> automated-testing mailing list
> automated-testing@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-07 20:43 ` [Automated-testing] " Don Zickus
@ 2019-11-08 7:58 ` Dmitry Vyukov
2019-11-08 15:26 ` Don Zickus
2019-11-08 11:44 ` Daniel Axtens
1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-08 7:58 UTC (permalink / raw)
To: Don Zickus
Cc: Daniel Axtens, workflows, automated-testing, Han-Wen Nienhuys,
Konstantin Ryabitsev
On Thu, Nov 7, 2019 at 9:44 PM Don Zickus <dzickus@redhat.com> wrote:
>
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > > As soon as we have a bridge from plain-text emails into the structured
> > > form, we can start building everything else in the structured world.
> > > Such bridge needs to parse new incoming emails, try to make sense out
> > > of them (new patch, new patch version, comment, etc) and then push the
> > > information in structured form. Then e.g. CIs can fetch info about
> >
> > This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> > at almost thirteen hundred lines, and that's with the benefit of the
> > Python standard library. It also regularly gets patched to handle
> > changes to email systems (e.g. DMARC), changes to git (git request-pull
> > format changed subtly in 2.14.3), the bizarre ways people send email,
> > and so on.
>
> Does it ever make sense to just use git to do the translation to structured
> json? Git has similar logic and can easily handle its own changes. Tools
> like git-mailinfo and git-mailsplit probably do a good chunk of the
> work today.
>
> It wouldn't pull together series info.
Hi Don,
Could you elaborate? What exactly do you mean? I don't understand the
overall proposal.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-08 7:58 ` Dmitry Vyukov
@ 2019-11-08 15:26 ` Don Zickus
0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-08 15:26 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Daniel Axtens, workflows, automated-testing, Han-Wen Nienhuys,
Konstantin Ryabitsev
On Fri, Nov 08, 2019 at 08:58:44AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:44 PM Don Zickus <dzickus@redhat.com> wrote:
> >
> > On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > > > As soon as we have a bridge from plain-text emails into the structured
> > > > form, we can start building everything else in the structured world.
> > > > Such bridge needs to parse new incoming emails, try to make sense out
> > > > of them (new patch, new patch version, comment, etc) and then push the
> > > > information in structured form. Then e.g. CIs can fetch info about
> > >
> > > This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> > > at almost thirteen hundred lines, and that's with the benefit of the
> > > Python standard library. It also regularly gets patched to handle
> > > changes to email systems (e.g. DMARC), changes to git (git request-pull
> > > format changed subtly in 2.14.3), the bizarre ways people send email,
> > > and so on.
> >
> > Does it ever make sense to just use git to do the translation to structured
> > json? Git has similar logic and can easily handle its own changes. Tools
> > like git-mailinfo and git-mailsplit probably do a good chunk of the
> > work today.
> >
> > It wouldn't pull together series info.
>
> Hi Don,
>
> Could you elaborate? What exactly do you mean? I don't understand the
> overall proposal.
The problem I was looking at was that patchwork has large, elaborate Python
code to translate human-written, git-formatted patches into some structured
form. And rightfully so.
But git has similar code in order to make git-am work.
When applying an email to public-inbox, I had assumed it was using a tool
like git-am that would call into git-mailsplit and git-mailinfo to split
apart the email into various pieces and put them in .git/rebase-apply.
At that point most of the text parsing is done.
So the thought was to have another public-inbox tool that took advantage of
the already-split data and just took the small step of finishing the
conversion into a structured file 'j', as opposed to sending the text email
through an external tool like patchwork to re-split the data into structured
pieces again.
Adding to that thought: every time git changed its format or text output,
leveraging git's existing knowledge of the change (assuming public-inbox
consistently used the latest git tools), rather than updating external
tools, would reduce the ripple effect of having to update all external
tools before developers can utilize new git features or changes.
But looking through the public-inbox code, it appears to do things
differently, so the idea may not work at all.
So just treat my idea as looking at the problem from a different angle to
see if there is an easier solution.
Cheers,
Don
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-07 20:43 ` [Automated-testing] " Don Zickus
2019-11-08 7:58 ` Dmitry Vyukov
@ 2019-11-08 11:44 ` Daniel Axtens
2019-11-08 14:54 ` Don Zickus
1 sibling, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-08 11:44 UTC (permalink / raw)
To: Don Zickus, patchwork
Cc: Dmitry Vyukov, workflows, automated-testing, Han-Wen Nienhuys,
Konstantin Ryabitsev
Don Zickus <dzickus@redhat.com> writes:
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>> > As soon as we have a bridge from plain-text emails into the structured
>> > form, we can start building everything else in the structured world.
>> > Such bridge needs to parse new incoming emails, try to make sense out
>> > of them (new patch, new patch version, comment, etc) and then push the
>> > information in structured form. Then e.g. CIs can fetch info about
>>
>> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>> at almost thirteen hundred lines, and that's with the benefit of the
>> Python standard library. It also regularly gets patched to handle
>> changes to email systems (e.g. DMARC), changes to git (git request-pull
>> format changed subtly in 2.14.3), the bizarre ways people send email,
>> and so on.
>
> Does it ever make sense to just use git to do the translation to structured
> json? Git has similar logic and can easily handle its own changes. Tools
> like git-mailinfo and git-mailsplit probably do a good chunk of the
> work today.
>
+patchwork@
So patchwork, in theory at least, is VCS-agnostic: if a mail contains a
unified-diff, we can treat it as a patch. We do have some special
handling for git pull requests, but we also have tests for parsing of
CVS and, if memory serves, Mercurial too. So we haven't wanted to depend
on git-specific tools. Maybe in future we will give up on that, but we
haven't yet.
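That unified-diff heuristic can be illustrated in a few lines — a sketch only, not Patchwork's actual detection code:

```python
import re

# Unified-diff hunk headers ("@@ -l,c +l,c @@") look the same whether
# the diff came from git, Mercurial, or CVS, so matching on them keeps
# the check VCS-agnostic.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)

def contains_patch(body: str) -> bool:
    """Return True if the mail body appears to contain a unified diff."""
    return bool(HUNK_RE.search(body))

print(contains_patch("Some comment.\n@@ -1,2 +1,2 @@\n-old\n+new\n"))
print(contains_patch("no diff here, just prose"))
```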
Regards,
Daniel
> It wouldn't pull together series info.
>
> Just a thought.
>
> Cheers,
> Don
>
>
>
>>
>> Patchwork does expose much of this as an API, for example for patches:
>> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>> build on that feel free. We can possibly add data to the API if that
>> would be helpful. (Patches are always welcome too, if you don't want to
>> wait an indeterminate amount of time.)
>>
>> Regards,
>> Daniel
>>
>>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-08 11:44 ` Daniel Axtens
@ 2019-11-08 14:54 ` Don Zickus
0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-08 14:54 UTC (permalink / raw)
To: Daniel Axtens
Cc: patchwork, Dmitry Vyukov, workflows, automated-testing,
Han-Wen Nienhuys, Konstantin Ryabitsev
On Fri, Nov 08, 2019 at 10:44:37PM +1100, Daniel Axtens wrote:
> Don Zickus <dzickus@redhat.com> writes:
>
> > On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >> > As soon as we have a bridge from plain-text emails into the structured
> >> > form, we can start building everything else in the structured world.
> >> > Such bridge needs to parse new incoming emails, try to make sense out
> >> > of them (new patch, new patch version, comment, etc) and then push the
> >> > information in structured form. Then e.g. CIs can fetch info about
> >>
> >> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >> at almost thirteen hundred lines, and that's with the benefit of the
> >> Python standard library. It also regularly gets patched to handle
> >> changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> format changed subtly in 2.14.3), the bizarre ways people send email,
> >> and so on.
> >
> > Does it ever make sense to just use git to do the translation to structured
> > json? Git has similar logic and can easily handle its own changes. Tools
> > like git-mailinfo and git-mailsplit probably do a good chunk of the
> > work today.
> >
> +patchwork@
>
> So patchwork, in theory at least, is VCS-agnostic: if a mail contains a
> unified-diff, we can treat it as a patch. We do have some special
> handling for git pull requests, but we also have tests for parsing of
> CVS and, if memory serves, Mercurial too. So we haven't wanted to depend
> on git-specific tools. Maybe in future we will give up on that, but we
> haven't yet.
Fair point. Thanks!
Cheers,
Don
>
> Regards,
> Daniel
>
> > It wouldn't pull together series info.
> >
> > Just a thought.
> >
> > Cheers,
> > Don
> >
> >
> >
> >>
> >> Patchwork does expose much of this as an API, for example for patches:
> >> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> >> build on that feel free. We can possibly add data to the API if that
> >> would be helpful. (Patches are always welcome too, if you don't want to
> >> wait an indeterminate amount of time.)
> >>
> >> Regards,
> >> Daniel
> >>
> >>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-05 10:02 Structured feeds Dmitry Vyukov
2019-11-06 15:35 ` Daniel Axtens
@ 2019-11-06 19:54 ` Han-Wen Nienhuys
2019-11-06 20:31 ` Sean Whitton
2019-11-07 9:04 ` Dmitry Vyukov
2019-11-07 8:48 ` [Automated-testing] " Tim.Bird
` (2 subsequent siblings)
4 siblings, 2 replies; 33+ messages in thread
From: Han-Wen Nienhuys @ 2019-11-06 19:54 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Konstantin Ryabitsev,
Brendan Higgins, Kevin Hilman, Veronika Kabatova
On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
>
> blobs right into feeds. This would allow users to fetch only the
> blobs they are interested in. But this does not need to happen from
> day one.
I would avoid building something around git-lfs. The git upstream
project is actively working on providing something that is less hacky
and more reproducible.
Also, if we're using Git to represent the feed and are thinking about
embedding blobs, it would be much more practical to just add a copy of
the linux kernel to the Lore repository, and introduce a commit for
each patch. The linux kernel is about 1.5G, which is much smaller than
the Lore archive, isn't it? You could store each patch under any of
these branch names:
refs/patches/MESSAGE-ID
refs/patches/URL-ESCAPE(MESSAGE-ID)
refs/patches/SHA1(MESSAGE-ID)
refs/patches/AUTHOR/MESSAGE-ID
This will lead to a large number of branches, but this is actually
something that is being addressed in Git with reftable.
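For the URL-ESCAPE variant, a hedged sketch of mapping Message-IDs to ref names (real code would need to satisfy every git-check-ref-format rule; the fix-ups here only cover the common offenders that percent-encoding leaves alone):

```python
from urllib.parse import quote

def msgid_ref(message_id: str) -> str:
    """Map a Message-ID to a ref like refs/patches/URL-ESCAPE(MESSAGE-ID).

    Sketch only: quote() never escapes '.', '-', '_' or '~', and git
    forbids '~' and '..' in ref names, so fix those up by hand.
    """
    mid = message_id.strip("<>")
    escaped = quote(mid, safe="")
    escaped = escaped.replace("~", "%7E").replace("..", ".%2E")
    return "refs/patches/" + escaped

print(msgid_ref("<20191105.1002.dvyukov@example.org>"))
# refs/patches/20191105.1002.dvyukov%40example.org
```

A bot would then publish a patch commit with something like `git update-ref <ref> <commit>`.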
> No work has been done on the actual form/schema of the structured
> feeds. That's something we need to figure out working on a prototype.
> However, good references would be git-appraise schema:
> https://github.com/google/git-appraise/tree/master/schema
> and gerrit schema (not sure what's a good link).
The gerrit schema for reviews is unfortunately not documented, but it
should be. I'll try to write down something next week, but here is the
gist of it:
Each review ("change") in Gerrit is numbered. The different revisions
("patchsets") of a change 12345 are stored under
refs/changes/45/12345/${PATCHSET_NUMBER}
they are stored as commits to the main project, ie. if you fetch this
ref, you can check out the proposed change.
A change 12345 has its review metadata under
refs/changes/45/12345/meta
The metadata is a notes branch. The commit messages on the branch hold
global data on the change (votes, global comments). The per-file
comments are in a notemap, where the key is the SHA1 of the patchset
the comment refers to, and the value is JSON data. The format of the
JSON is here:
https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/server/notedb/RevisionNoteData.java#25
with the meat in Comment class
https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/Comment.java#33
an example
{
"key": {
"uuid": "c7be1334_47885e36",
"filename":
"java/com/google/gerrit/server/restapi/project/CommitsCollection.java",
"patchSetId": 7
},
"lineNbr": 158,
"author": {
"id": 1026112
},
"writtenOn": "2019-11-06T09:00:50Z",
"side": 1,
"message": "nit: factor this out in a variable, use
toImmutableList as collector",
"range": {
"startLine": 156,
"startChar": 32,
"endLine": 158,
"endChar": 66
},
"revId": "071c601d6ee1a2a9f520415fd9efef8e00f9cf60",
"serverId": "173816e5-2b9a-37c3-8a2e-48639d4f1153",
"unresolved": true
},
For CI-type comments, we have "checks" data and robot comments (an
extension of the previous comment), defined here:
https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/RobotComment.java#22
Here is an example of CI data that we keep:
"checks": {
"fmt:commitmsg-462a7efcf7234c5824393847968ddd28853aef6e": {
"state": "FAILED",
"message": "/COMMIT_MSG: subject must not end in \u0027.\u0027",
"started": "2019-09-13T17:12:46Z",
"created": "2019-09-11T17:42:40Z",
"updated": "2019-09-13T17:12:47Z"
}
JSON definition:
https://gerrit.googlesource.com/plugins/checks/+/0e609a4599d17308664e1d41c0f91447640ee9fe/java/com/google/gerrit/plugins/checks/db/NoteDbCheck.java#16
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-06 19:54 ` Han-Wen Nienhuys
@ 2019-11-06 20:31 ` Sean Whitton
2019-11-07 9:04 ` Dmitry Vyukov
1 sibling, 0 replies; 33+ messages in thread
From: Sean Whitton @ 2019-11-06 20:31 UTC (permalink / raw)
To: Han-Wen Nienhuys; +Cc: workflows
Hello,
On Wed 06 Nov 2019 at 08:54PM +01, Han-Wen Nienhuys wrote:
> On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
>>
>> blobs right into feeds. This would allow users to fetch only the
>> blobs they are interested in. But this does not need to happen from
>> day one.
>
> I would avoid building something around git-lfs. The git upstream
> project is actively working on providing something that is less hacky
> and more reproducible.
Could you share a link to this please?
(There is also git-annex.)
--
Sean Whitton
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Structured feeds
2019-11-06 19:54 ` Han-Wen Nienhuys
2019-11-06 20:31 ` Sean Whitton
@ 2019-11-07 9:04 ` Dmitry Vyukov
1 sibling, 0 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 9:04 UTC (permalink / raw)
To: Han-Wen Nienhuys
Cc: workflows, automated-testing, Konstantin Ryabitsev,
Brendan Higgins, Kevin Hilman, Veronika Kabatova
On Wed, Nov 6, 2019 at 8:54 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
>
> On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> >
> > blobs right into feeds. This would allow users to fetch only the
> > blobs they are interested in. But this does not need to happen from
> > day one.
>
> I would avoid building something around git-lfs. The git upstream
> project is actively working on providing something that is less hacky
> and more reproducible.
Noted.
I mostly just captured what Konstantin pointed to. I think (1) blob
embedding is not version 1, (2) whatever we do, somebody needs to
prototype and try first.
> Also, if we're using Git to represent the feed and are thinking about
> embedding blobs,
Blobs are not about patches. Patches are small and not binary.
Blobs would be kernel binaries, test binaries, images, code dumps, etc.
> it would be much more practical to just add a copy of
> the linux kernel to the Lore repository, and introduce a commit for
> each patch. The linux kernel is about 1.5G, which is much smaller than
> the Lore archive, isn't it? You could store each patch under any of
> these branch names :
>
> refs/patches/MESSAGE-ID
> refs/patches/URL-ESCAPE(MESSAGE-ID)
> refs/patches/SHA1(MESSAGE-ID)
> refs/patches/AUTHOR/MESSAGE-ID
>
> this will lead to a large number of branches, but this is actually
> something that is being addressed in Git with reftable.
Interesting. I need to think about how exactly it can be integrated, as
the kernel is not a single tree. Though, obviously, fetching an exact git
tree is very nice. But it's somewhat orthogonal to feeds and could be
provided by another specialized bot feed ("I posted your patch to git
and it's available here"); that way this will work for legacy email
patches too.
> > No work has been done on the actual form/schema of the structured
> > feeds. That's something we need to figure out working on a prototype.
> > However, good references would be git-appraise schema:
> > https://github.com/google/git-appraise/tree/master/schema
> > and gerrit schema (not sure what's a good link).
>
>
> The gerrit schema for reviews is unfortunately not documented, but it
> should be. I'll try to write down something next week, but here is the
> gist of it:
>
> Each review ("change") in Gerrit is numbered. The different revisions
> ("patchsets") of a change 12345 are stored under
>
> refs/changes/45/12345/${PATCHSET_NUMBER}
>
> they are stored as commits to the main project, ie. if you fetch this
> ref, you can check out the proposed change.
>
> A change 12345 has its review metadata under
>
> refs/changes/45/12345/meta
>
> The metadata is a notes branch. The commit messages on the branch hold
> global data on the change (votes, global comments). The per file
> comments are in a notemap, where the key is the SHA1 of the patchset
> the comment refers to, and the value is JSON data. The format of the
> JSON is here:
>
> https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/server/notedb/RevisionNoteData.java#25
>
> with the meat in Comment class
>
> https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/Comment.java#33
>
> an example
>
> {
> "key": {
> "uuid": "c7be1334_47885e36",
> "filename":
> "java/com/google/gerrit/server/restapi/project/CommitsCollection.java",
> "patchSetId": 7
> },
> "lineNbr": 158,
> "author": {
> "id": 1026112
> },
> "writtenOn": "2019-11-06T09:00:50Z",
> "side": 1,
> "message": "nit: factor this out in a variable, use
> toImmutableList as collector",
> "range": {
> "startLine": 156,
> "startChar": 32,
> "endLine": 158,
> "endChar": 66
> },
> "revId": "071c601d6ee1a2a9f520415fd9efef8e00f9cf60",
> "serverId": "173816e5-2b9a-37c3-8a2e-48639d4f1153",
> "unresolved": true
> },
>
> for CI type comments, we have "checks" data and robot comments (an
> extension of the previous comment), defined here:
>
> https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/RobotComment.java#22
>
> here is an example of CI data that we keep:
>
> "checks": {
> "fmt:commitmsg-462a7efcf7234c5824393847968ddd28853aef6e": {
> "state": "FAILED",
> "message": "/COMMIT_MSG: subject must not end in \u0027.\u0027",
> "started": "2019-09-13T17:12:46Z",
> "created": "2019-09-11T17:42:40Z",
> "updated": "2019-09-13T17:12:47Z"
> }
>
> JSON definition:
> https://gerrit.googlesource.com/plugins/checks/+/0e609a4599d17308664e1d41c0f91447640ee9fe/java/com/google/gerrit/plugins/checks/db/NoteDbCheck.java#16
I've added a reference to this for future reference here:
https://github.com/dvyukov/kit/blob/master/doc/references.md
Thanks!
^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [Automated-testing] Structured feeds
2019-11-05 10:02 Structured feeds Dmitry Vyukov
2019-11-06 15:35 ` Daniel Axtens
2019-11-06 19:54 ` Han-Wen Nienhuys
@ 2019-11-07 8:48 ` Tim.Bird
2019-11-07 9:13 ` Dmitry Vyukov
2019-11-07 20:53 ` Don Zickus
2019-11-12 22:54 ` Konstantin Ryabitsev
4 siblings, 1 reply; 33+ messages in thread
From: Tim.Bird @ 2019-11-07 8:48 UTC (permalink / raw)
To: dvyukov, workflows, automated-testing; +Cc: hanwen, konstantin
> -----Original Message-----
> From: Dmitry Vyukov
>
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
>
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
>
> Here is the outline of the idea.
> The current public inbox format is a git repo with refs/heads/master
> that contains a single file "m" in RFC822 format. We add
> refs/heads/json with a single file "j" that contains structured data
> in JSON format. 2 separate branches b/c some clients may want to fetch
> just one of them.
Can you provide some idea (maybe a few examples) of the types of
structured data that would be in the json branch?
-- Tim
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Automated-testing] Structured feeds
2019-11-07 8:48 ` [Automated-testing] " Tim.Bird
@ 2019-11-07 9:13 ` Dmitry Vyukov
2019-11-07 9:20 ` Tim.Bird
0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 9:13 UTC (permalink / raw)
To: Tim Bird
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Thu, Nov 7, 2019 at 9:48 AM <Tim.Bird@sony.com> wrote:
> > -----Original Message-----
> > From: Dmitry Vyukov
> >
> > This is another follow up after Lyon meetings. The main discussion was
> > mainly around email process (attestation, archival, etc):
> > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> >
> > I think providing info in a structured form is the key for allowing
> > building more tooling and automation at a reasonable price. So I
> > discussed with CI/Gerrit people and Konstantin how the structured
> > information can fit into the current "feeds model" and what would be
> > the next steps for bringing it to life.
> >
> > Here is the outline of the idea.
> > The current public inbox format is a git repo with refs/heads/master
> > that contains a single file "m" in RFC822 format. We add
> > refs/heads/json with a single file "j" that contains structured data
> > in JSON format. 2 separate branches b/c some clients may want to fetch
> > just one of them.
>
> Can you provide some idea (maybe a few examples) of the types of
> structured data that would be in the json branch?
Hi Tim,
Nobody has yet tried to define exact formats. Generally, it should expose
info about patches, comments, and test results in an easy-to-consume form.
Here are examples for patchwork, git-appraise and gerrit:
https://patchwork.ozlabs.org/api/patches/?order=-id
https://github.com/google/git-appraise/tree/master/schema
https://lore.kernel.org/workflows/87sgn0zr09.fsf@iris.silentflame.com/T/#m3db87b43cf5e581ba4d3a7fd5f1fbff5aea3546a
I would expect that the format would resemble these formats to a
significant degree. But I guess we need to come up with something, try
to use it, see what's missing or needs to be improved, and iterate.
Do you have any specific recommendations or ways in which you see it
will be consumed?
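To make the discussion more concrete, here is a rough sketch of what a single "j" entry might look like; every field name below is hypothetical, since no schema has been agreed on:

```python
import json

# Hypothetical structured representation of a single patch email.
# All field names are invented for illustration; no schema exists yet.
entry = {
    "version": 1,
    "type": "patch",
    "message_id": "<20191105100221.example@localhost>",
    "in_reply_to": None,
    "author": {"name": "Jane Doe", "email": "jane@example.org"},
    "subject": "[PATCH v2 1/3] example: do something",
    "series": {"version": 2, "number": 1, "total": 3},
}

# Serialized the way a "j" file might sit next to the RFC822 "m" file.
print(json.dumps(entry, indent=2, sort_keys=True))
```

The point is only that the entry carries the same information as the "m" file, cross-referenced by Message-ID, in a form tools can consume without parsing email.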
* RE: [Automated-testing] Structured feeds
2019-11-07 9:13 ` Dmitry Vyukov
@ 2019-11-07 9:20 ` Tim.Bird
0 siblings, 0 replies; 33+ messages in thread
From: Tim.Bird @ 2019-11-07 9:20 UTC (permalink / raw)
To: dvyukov; +Cc: workflows, automated-testing, hanwen, konstantin
> -----Original Message-----
> From: Dmitry Vyukov
>
> On Thu, Nov 7, 2019 at 9:48 AM <Tim.Bird@sony.com> wrote:
> > > -----Original Message-----
> > > From: Dmitry Vyukov
> > >
> > > This is another follow up after Lyon meetings. The main discussion was
> > > mainly around email process (attestation, archival, etc):
> > >
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > >
> > > I think providing info in a structured form is the key for allowing
> > > building more tooling and automation at a reasonable price. So I
> > > discussed with CI/Gerrit people and Konstantin how the structured
> > > information can fit into the current "feeds model" and what would be
> > > the next steps for bringing it to life.
> > >
> > > Here is the outline of the idea.
> > > The current public inbox format is a git repo with refs/heads/master
> > > that contains a single file "m" in RFC822 format. We add
> > > refs/heads/json with a single file "j" that contains structured data
> > > in JSON format. 2 separate branches b/c some clients may want to fetch
> > > just one of them.
> >
> > Can you provide some idea (maybe a few examples) of the types of
> > structured data that would be in the json branch?
>
> Hi Tim,
>
> Nobody has yet tried to define exact formats. Generally, it should expose
> info about patches, comments, and test results in an easy-to-consume form.
> Here are examples for patchwork, git-appraise and gerrit:
> https://patchwork.ozlabs.org/api/patches/?order=-id
> https://github.com/google/git-appraise/tree/master/schema
> https://lore.kernel.org/workflows/87sgn0zr09.fsf@iris.silentflame.com/T/#
> m3db87b43cf5e581ba4d3a7fd5f1fbff5aea3546a
> I would expect that the format would resemble these formats to a
> significant degree. But I guess we need to come up with something, try
> to use it, see what's missing or needs to be improved, and iterate.
> Do you have any specific recommendations or ways in which you see it
> will be consumed?
I have no recommendations. (yet.) :-)
I'm just trying to figure out what the proposal is about, and its scope.
Thanks for the links.
-- Tim
* Re: [Automated-testing] Structured feeds
2019-11-05 10:02 Structured feeds Dmitry Vyukov
` (2 preceding siblings ...)
2019-11-07 8:48 ` [Automated-testing] " Tim.Bird
@ 2019-11-07 20:53 ` Don Zickus
2019-11-08 8:05 ` Dmitry Vyukov
2019-11-12 22:54 ` Konstantin Ryabitsev
4 siblings, 1 reply; 33+ messages in thread
From: Don Zickus @ 2019-11-07 20:53 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> Hi,
>
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
>
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
>
> Here is the outline of the idea.
> The current public inbox format is a git repo with refs/heads/master
> that contains a single file "m" in RFC822 format. We add
> refs/heads/json with a single file "j" that contains structured data
> in JSON format. 2 separate branches b/c some clients may want to fetch
> just one of them.
>
> Current clients will only create plain text "m" entry. However, newer
> clients can also create a parallel "j" entry with the same info in
> structured form. "m" and "j" are cross-referenced using the
> Message-ID. It's OK to have only "m", or both, but not only "j" (any
> client needs to generate at least some text representation for every
> message).
Interesting idea.
One of the nuisances of email is that client tools have quirks. In Red Hat,
we have used patchworkV1 for quite a long time. These email client 'quirks'
broke a lot of expectations in the database, leading us to fix the tool and
manually clean up the data.
In the case of translating to a 'j' file, what happens if the data is
incorrectly translated due to client 'quirks'? Is it expected that the 'j'
data is manually reviewed before committing (probably not)? Or is it left
alone as-is? Or is a follow-on 'j' change committed?
A similar problem could probably be expanded to CI systems contributing their
data in some result file 'r'.
Cheers,
Don
>
> Currently we have public inbox feeds only for mailing lists. The idea
> is that more entities will have own "private" feeds. For example, each
> CI system, static analysis system, or third-party code review system
> has its own feed. Eventually people have own feeds too. The feeds can
> be relatively easily converted to a local inbox, imported into GMail,
> etc (potentially with some filtering).
>
> Besides private feeds there are also aggregated feeds to not require
> everybody to fetch thousands of repositories. kernel.org will provide
> one, but it can be mirrored (or build independently) anywhere else. If
> I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> creates git://git.kernel.org/pub/scm/linux/kernel/git/dvyukov/kfeed.git,
> then the aggregated feed will map these to the following branches:
> refs/heads/github.com/dvyukov/kfeed/master
> refs/heads/github.com/dvyukov/kfeed/json
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> Standardized naming of sub-feeds allows a single repo to host multiple
> feeds. For example, github/gitlab/gerrit bridge could host multiple
> individual feeds for their users.
> So far there is no proposal for feed auto-discovery. One needs to
> notify kernel.org for inclusion of their feed into the main aggregated
> feed.
>
> Konstantin offered that kernel.org can send emails for some feeds.
> That is, normally one sends out an email and then commits it to the
> feed. Instead some systems can just commit the message to feed and
> then kernel.org will pull the feed and send emails on user's behalf.
> This allows clients to not deal with email at all (including mail
> client setup). Which is nice.
>
> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> blobs right into feeds. This would allow users to fetch only the
> blobs they are interested in. But this does not need to happen from
> day one.
>
> As soon as we have a bridge from plain-text emails into the structured
> form, we can start building everything else in the structured world.
> Such bridge needs to parse new incoming emails, try to make sense out
> of them (new patch, new patch version, comment, etc) and then push the
> information in structured form. Then e.g. CIs can fetch info about
> patches under review, test and post structured results. Bridging in the
> opposite direction happens semi-automatically as CI also pushes a text
> representation of results and that just needs to be sent as email.
> Alternatively, we could have a separate explicit converter of
> structured messages into plain text, which would allow us to remove some
> duplication and present results in a more consistent form.
>
> Similarly, it should be much simpler for Patchwork/Gerrit to present
> current patches under review. Local mode should work almost seamlessly
> -- you fetch the aggregated feed and then run local instance on top of
> it.
>
> No work has been done on the actual form/schema of the structured
> feeds. That's something we need to figure out working on a prototype.
> However, good references would be git-appraise schema:
> https://github.com/google/git-appraise/tree/master/schema
> and gerrit schema (not sure what's a good link). Does anybody know
> where the gitlab schema is? Or other similar schemes?
>
> Thoughts and comments are welcome.
> Thanks
> --
> _______________________________________________
> automated-testing mailing list
> automated-testing@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing
* Re: [Automated-testing] Structured feeds
2019-11-07 20:53 ` Don Zickus
@ 2019-11-08 8:05 ` Dmitry Vyukov
2019-11-08 14:52 ` Don Zickus
0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-08 8:05 UTC (permalink / raw)
To: Don Zickus
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus@redhat.com> wrote:
>
> On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > Hi,
> >
> > This is another follow up after Lyon meetings. The main discussion was
> > mainly around email process (attestation, archival, etc):
> > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> >
> > I think providing info in a structured form is the key for allowing
> > building more tooling and automation at a reasonable price. So I
> > discussed with CI/Gerrit people and Konstantin how the structured
> > information can fit into the current "feeds model" and what would be
> > the next steps for bringing it to life.
> >
> > Here is the outline of the idea.
> > The current public inbox format is a git repo with refs/heads/master
> > that contains a single file "m" in RFC822 format. We add
> > refs/heads/json with a single file "j" that contains structured data
> > in JSON format. 2 separate branches b/c some clients may want to fetch
> > just one of them.
> >
> > Current clients will only create plain text "m" entry. However, newer
> > clients can also create a parallel "j" entry with the same info in
> > structured form. "m" and "j" are cross-referenced using the
> > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > client needs to generate at least some text representation for every
> > message).
>
> Interesting idea.
>
> One of the nuisances of email is that client tools have quirks. In Red Hat,
> we have used patchworkV1 for quite a long time. These email client 'quirks'
> broke a lot of expectations in the database, leading us to fix the tool and
> manually clean up the data.
>
> In the case of translating to a 'j' file, what happens if the data is
> incorrectly translated due to client 'quirks'? Is it expected that the 'j'
> data is manually reviewed before committing (probably not)? Or is it left
> alone as-is? Or is a follow-on 'j' change committed?
Good point.
I would expect that eventually there will be updates to the format and
new versions, which are easy to indicate in JSON with a "version": 2
attribute. Code that parses these messages will need to keep quirks for
older formats.
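As an illustration of keeping quirks for older formats, a consumer might dispatch on a version attribute roughly like this (the v1/v2 differences below are invented, not a real schema):

```python
import json

def parse_j_entry(raw):
    """Parse one "j" entry, keeping quirks for older format versions.

    Entries without a "version" attribute are treated as version 1,
    so messages published before the format evolved keep parsing.
    Field names here are hypothetical.
    """
    msg = json.loads(raw)
    version = msg.get("version", 1)
    if version == 1:
        # Hypothetical v1 quirk: author was a single free-form string.
        author = msg.get("author", "")
    elif version == 2:
        # Hypothetical v2: author split into name/email fields.
        author = "{name} <{email}>".format(**msg["author"])
    else:
        raise ValueError("unknown j-entry version: %d" % version)
    return {"version": version, "author": author}

print(parse_j_entry('{"author": "Jane Doe <jane@example.org>"}'))
print(parse_j_entry(
    '{"version": 2, "author": {"name": "Jane Doe", "email": "jane@example.org"}}'))
```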
Realistically nobody will review the data (besides the initial
testing). I guess in the end it depends on (1) how badly it's screwed
up, (2) whether the correct data is preserved in at least some form
(consider a client that pushes bad structured data which is also
misrepresented in the plain-text form, or simply missing there).
Fixing up data later is not possible. Appending corrections is possible.
> A similar problem could probably be expanded to CI systems contributing their
> data in some result file 'r'.
The idea is that all systems push "j". It's the contents of the feed
that matter. CI systems will push messages of different types (test
results), but we don't need "r" for this.
> Cheers,
> Don
>
> >
> > Currently we have public inbox feeds only for mailing lists. The idea
> > is that more entities will have own "private" feeds. For example, each
> > CI system, static analysis system, or third-party code review system
> > has its own feed. Eventually people have own feeds too. The feeds can
> > be relatively easily converted to a local inbox, imported into GMail,
> > etc (potentially with some filtering).
> >
> > Besides private feeds there are also aggregated feeds to not require
> > everybody to fetch thousands of repositories. kernel.org will provide
> > one, but it can be mirrored (or build independently) anywhere else. If
> > I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> > creates git://git.kernel.org/pub/scm/linux/kernel/git/dvyukov/kfeed.git,
> > then the aggregated feed will map these to the following branches:
> > refs/heads/github.com/dvyukov/kfeed/master
> > refs/heads/github.com/dvyukov/kfeed/json
> > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> > Standardized naming of sub-feeds allows a single repo to host multiple
> > feeds. For example, github/gitlab/gerrit bridge could host multiple
> > individual feeds for their users.
> > So far there is no proposal for feed auto-discovery. One needs to
> > notify kernel.org for inclusion of their feed into the main aggregated
> > feed.
> >
> > Konstantin offered that kernel.org can send emails for some feeds.
> > That is, normally one sends out an email and then commits it to the
> > feed. Instead some systems can just commit the message to feed and
> > then kernel.org will pull the feed and send emails on user's behalf.
> > This allows clients to not deal with email at all (including mail
> > client setup). Which is nice.
> >
> > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> > blobs right into feeds. This would allow users to fetch only the
> > blobs they are interested in. But this does not need to happen from
> > day one.
> >
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
> > patches under review, test and post structured results. Bridging in the
> > opposite direction happens semi-automatically as CI also pushes a text
> > representation of results and that just needs to be sent as email.
> > Alternatively, we could have a separate explicit converter of
> > structured messages into plain text, which would allow us to remove some
> > duplication and present results in a more consistent form.
> >
> > Similarly, it should be much simpler for Patchwork/Gerrit to present
> > current patches under review. Local mode should work almost seamlessly
> > -- you fetch the aggregated feed and then run local instance on top of
> > it.
> >
> > No work has been done on the actual form/schema of the structured
> > feeds. That's something we need to figure out working on a prototype.
> > However, good references would be git-appraise schema:
> > https://github.com/google/git-appraise/tree/master/schema
> > and gerrit schema (not sure what's a good link). Does anybody know
> > where the gitlab schema is? Or other similar schemes?
> >
> > Thoughts and comments are welcome.
> > Thanks
> > --
> > _______________________________________________
> > automated-testing mailing list
> > automated-testing@yoctoproject.org
> > https://lists.yoctoproject.org/listinfo/automated-testing
>
* Re: [Automated-testing] Structured feeds
2019-11-08 8:05 ` Dmitry Vyukov
@ 2019-11-08 14:52 ` Don Zickus
2019-11-11 9:20 ` Dmitry Vyukov
0 siblings, 1 reply; 33+ messages in thread
From: Don Zickus @ 2019-11-08 14:52 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Fri, Nov 08, 2019 at 09:05:02AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus@redhat.com> wrote:
> >
> > On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > > Hi,
> > >
> > > This is another follow up after Lyon meetings. The main discussion was
> > > mainly around email process (attestation, archival, etc):
> > > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > >
> > > I think providing info in a structured form is the key for allowing
> > > building more tooling and automation at a reasonable price. So I
> > > discussed with CI/Gerrit people and Konstantin how the structured
> > > information can fit into the current "feeds model" and what would be
> > > the next steps for bringing it to life.
> > >
> > > Here is the outline of the idea.
> > > The current public inbox format is a git repo with refs/heads/master
> > > that contains a single file "m" in RFC822 format. We add
> > > refs/heads/json with a single file "j" that contains structured data
> > > in JSON format. 2 separate branches b/c some clients may want to fetch
> > > just one of them.
> > >
> > > Current clients will only create plain text "m" entry. However, newer
> > > clients can also create a parallel "j" entry with the same info in
> > > structured form. "m" and "j" are cross-referenced using the
> > > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > > client needs to generate at least some text representation for every
> > > message).
> >
> > Interesting idea.
> >
> > One of the nuisances of email is that client tools have quirks. In Red Hat,
> > we have used patchworkV1 for quite a long time. These email client 'quirks'
> > broke a lot of expectations in the database, leading us to fix the tool and
> > manually clean up the data.
> >
> > In the case of translating to a 'j' file, what happens if the data is
> > incorrectly translated due to client 'quirks'? Is it expected that the 'j'
> > data is manually reviewed before committing (probably not)? Or is it left
> > alone as-is? Or is a follow-on 'j' change committed?
>
> Good point.
> I would expect that eventually there will be updates to the format and
> new versions, which are easy to indicate in JSON with a "version": 2
> attribute. Code that parses these messages will need to keep quirks for
> older formats.
> Realistically nobody will review the data (besides the initial
> testing). I guess in the end it depends on (1) how badly it's screwed
> up, (2) whether the correct data is preserved in at least some form
> (consider a client that pushes bad structured data which is also
> misrepresented in the plain-text form, or simply missing there).
> Fixing up data later is not possible. Appending corrections is possible.
Ok. Yeah, in my head I was thinking the data is largely right, just
occasionally 1 or 2 fields were misrepresented due to a bad client tool or
human error in the text.
In Red Hat we use internal metadata for checking our patches through our
process (namely the Bugzilla id). It isn't unusual for someone to accidentally
fat-finger the bugzilla id when posting their patch.
I was thinking there could be a follow-on 'type' that appends corrections as
you stated, say 'type: correction', that corrects the original data. This
would have to be linked through message-id or some unique identifier.
Then I assume any tool that parses the feed 'j' would correlate all the data
based around some unique ids such that picking up corrections would just be
a natural extension?
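The correlation described above could be sketched roughly as follows; the entry shapes, including the 'corrects' and 'fields' keys, are invented purely for illustration:

```python
# Collate a "j" feed by unique id so that later 'correction' entries
# override fields of the message they amend. Entry shapes are invented
# for illustration; nothing like this has been standardized.
def collate(entries):
    state = {}
    for e in entries:
        if e.get("type") == "correction":
            # A correction carries only the fields it fixes, plus a
            # pointer (here via message-id) to the entry it amends.
            state.setdefault(e["corrects"], {}).update(e["fields"])
        else:
            state.setdefault(e["message_id"], {}).update(
                {k: v for k, v in e.items() if k != "message_id"})
    return state

feed = [
    {"message_id": "<p1>", "type": "patch", "bugzilla": "12345"},
    {"message_id": "<c1>", "type": "correction", "corrects": "<p1>",
     "fields": {"bugzilla": "12354"}},  # fat-fingered id, fixed later
]
print(collate(feed))
```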
Cheers,
Don
>
> > A similar problem could probably be expanded to CI systems contributing their
> > data in some result file 'r'.
>
> The idea is that all systems push "j". It's the contents of the feed
> that matter. CI systems will push messages of different types (test
> results), but we don't need "r" for this.
>
> > Cheers,
> > Don
> >
> > >
> > > Currently we have public inbox feeds only for mailing lists. The idea
> > > is that more entities will have own "private" feeds. For example, each
> > > CI system, static analysis system, or third-party code review system
> > > has its own feed. Eventually people have own feeds too. The feeds can
> > > be relatively easily converted to a local inbox, imported into GMail,
> > > etc (potentially with some filtering).
> > >
> > > Besides private feeds there are also aggregated feeds to not require
> > > everybody to fetch thousands of repositories. kernel.org will provide
> > > one, but it can be mirrored (or build independently) anywhere else. If
> > > I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> > > creates git://git.kernel.org/pub/scm/linux/kernel/git/dvyukov/kfeed.git,
> > > then the aggregated feed will map these to the following branches:
> > > refs/heads/github.com/dvyukov/kfeed/master
> > > refs/heads/github.com/dvyukov/kfeed/json
> > > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> > > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> > > Standardized naming of sub-feeds allows a single repo to host multiple
> > > feeds. For example, github/gitlab/gerrit bridge could host multiple
> > > individual feeds for their users.
> > > So far there is no proposal for feed auto-discovery. One needs to
> > > notify kernel.org for inclusion of their feed into the main aggregated
> > > feed.
> > >
> > > Konstantin offered that kernel.org can send emails for some feeds.
> > > That is, normally one sends out an email and then commits it to the
> > > feed. Instead some systems can just commit the message to feed and
> > > then kernel.org will pull the feed and send emails on user's behalf.
> > > This allows clients to not deal with email at all (including mail
> > > client setup). Which is nice.
> > >
> > > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> > > blobs right into feeds. This would allow users to fetch only the
> > > blobs they are interested in. But this does not need to happen from
> > > day one.
> > >
> > > As soon as we have a bridge from plain-text emails into the structured
> > > form, we can start building everything else in the structured world.
> > > Such bridge needs to parse new incoming emails, try to make sense out
> > > of them (new patch, new patch version, comment, etc) and then push the
> > > information in structured form. Then e.g. CIs can fetch info about
> > > patches under review, test and post structured results. Bridging in the
> > > opposite direction happens semi-automatically as CI also pushes a text
> > > representation of results and that just needs to be sent as email.
> > > Alternatively, we could have a separate explicit converter of
> > > structured messages into plain text, which would allow us to remove some
> > > duplication and present results in a more consistent form.
> > >
> > > Similarly, it should be much simpler for Patchwork/Gerrit to present
> > > current patches under review. Local mode should work almost seamlessly
> > > -- you fetch the aggregated feed and then run local instance on top of
> > > it.
> > >
> > > No work has been done on the actual form/schema of the structured
> > > feeds. That's something we need to figure out working on a prototype.
> > > However, good references would be git-appraise schema:
> > > https://github.com/google/git-appraise/tree/master/schema
> > > and gerrit schema (not sure what's a good link). Does anybody know
> > > where the gitlab schema is? Or other similar schemes?
> > >
> > > Thoughts and comments are welcome.
> > > Thanks
> > > --
> > > _______________________________________________
> > > automated-testing mailing list
> > > automated-testing@yoctoproject.org
> > > https://lists.yoctoproject.org/listinfo/automated-testing
> >
* Re: [Automated-testing] Structured feeds
2019-11-08 14:52 ` Don Zickus
@ 2019-11-11 9:20 ` Dmitry Vyukov
2019-11-11 15:14 ` Don Zickus
0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-11 9:20 UTC (permalink / raw)
To: Don Zickus
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Fri, Nov 8, 2019 at 3:53 PM Don Zickus <dzickus@redhat.com> wrote:
>
> On Fri, Nov 08, 2019 at 09:05:02AM +0100, Dmitry Vyukov wrote:
> > On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus@redhat.com> wrote:
> > >
> > > On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > > > Hi,
> > > >
> > > > This is another follow up after Lyon meetings. The main discussion was
> > > > mainly around email process (attestation, archival, etc):
> > > > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > > >
> > > > I think providing info in a structured form is the key for allowing
> > > > building more tooling and automation at a reasonable price. So I
> > > > discussed with CI/Gerrit people and Konstantin how the structured
> > > > information can fit into the current "feeds model" and what would be
> > > > the next steps for bringing it to life.
> > > >
> > > > Here is the outline of the idea.
> > > > The current public inbox format is a git repo with refs/heads/master
> > > > that contains a single file "m" in RFC822 format. We add
> > > > refs/heads/json with a single file "j" that contains structured data
> > > > in JSON format. 2 separate branches b/c some clients may want to fetch
> > > > just one of them.
> > > >
> > > > Current clients will only create plain text "m" entry. However, newer
> > > > clients can also create a parallel "j" entry with the same info in
> > > > structured form. "m" and "j" are cross-referenced using the
> > > > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > > > client needs to generate at least some text representation for every
> > > > message).
> > >
> > > Interesting idea.
> > >
> > > One of the nuisances of email is that client tools have quirks. In Red Hat,
> > > we have used patchworkV1 for quite a long time. These email client 'quirks'
> > > broke a lot of expectations in the database, leading us to fix the tool and
> > > manually clean up the data.
> > >
> > > In the case of translating to a 'j' file, what happens if the data is
> > > incorrectly translated due to client 'quirks'? Is it expected that the 'j'
> > > data is manually reviewed before committing (probably not)? Or is it left
> > > alone as-is? Or is a follow-on 'j' change committed?
> >
> > Good point.
> > I would expect that eventually there will be updates to the format and
> > new versions, which are easy to indicate in JSON with a "version": 2
> > attribute. Code that parses these messages will need to keep quirks for
> > older formats.
> > Realistically nobody will review the data (besides the initial
> > testing). I guess in the end it depends on (1) how badly it's screwed
> > up, (2) whether the correct data is preserved in at least some form
> > (consider a client that pushes bad structured data which is also
> > misrepresented in the plain-text form, or simply missing there).
> > Fixing up data later is not possible. Appending corrections is possible.
>
> Ok. Yeah, in my head I was thinking the data is largely right, just
> occasionally 1 or 2 fields were misrepresented due to a bad client tool or
> human error in the text.
>
> In Red Hat we use internal metadata for checking our patches through our
> process (namely the Bugzilla id). It isn't unusual for someone to accidentally
> fat-finger the bugzilla id when posting their patch.
>
> I was thinking there could be a follow-on 'type' that appends corrections as
> you stated, say 'type: correction', that corrects the original data. This
> would have to be linked through message-id or some unique identifier.
>
> Then I assume any tool that parses the feed 'j' would correlate all the data
> based around some unique ids such that picking up corrections would just be
> a natural extension?
Yes, this should be handled naturally in this model. Since it's not
possible to mutate any previously published info, everything is
represented as additions/corrections: adding a comment to a patch,
adding Reviewed-by, adding Nack, adding test results. The final state
of a patch is always reconstructed by "replaying" all messages
published regarding the patch. So naturally if we mis-parsed a message
as "Acked-by: X" and then corrected that to "Nacked-by: X" and
republished, whoever replays the feed should replace Acked-by
with Nacked-by.
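That replay model can be sketched as a simple fold over the per-patch message stream; the message shapes below are invented for illustration, not a proposed format:

```python
# Reconstruct the final state of a patch by replaying, in order, every
# message published about it; a later message naturally supersedes an
# earlier (possibly mis-parsed) one. Message shapes are invented.
def replay(messages):
    state = {"tags": {}}
    for msg in messages:
        if msg["type"] == "tag":
            # e.g. Acked-by / Nacked-by: the latest tag from a given
            # person wins, so a republished correction takes effect.
            state["tags"][msg["who"]] = msg["tag"]
        elif msg["type"] == "test-result":
            state["result"] = msg["result"]
    return state

stream = [
    {"type": "tag", "who": "X", "tag": "Acked-by"},   # mis-parsed message
    {"type": "test-result", "result": "pass"},
    {"type": "tag", "who": "X", "tag": "Nacked-by"},  # republished correction
]
print(replay(stream))
```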
* Re: [Automated-testing] Structured feeds
2019-11-11 9:20 ` Dmitry Vyukov
@ 2019-11-11 15:14 ` Don Zickus
0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-11 15:14 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev
On Mon, Nov 11, 2019 at 10:20:22AM +0100, Dmitry Vyukov wrote:
> > Ok. Yeah, in my head I was thinking the data is largely right, just
> > occasionally 1 or 2 fields were misrepresented due to a bad client tool or
> > human error in the text.
> >
> > In Red Hat we use internal metadata for checking our patches through our
> > process (namely the Bugzilla id). It isn't unusual for someone to accidentally
> > fat-finger the bugzilla id when posting their patch.
> >
> > I was thinking there could be a follow-on 'type' that appends corrections as
> > you stated, say 'type: correction', that corrects the original data. This
> > would have to be linked through message-id or some unique identifier.
> >
> > Then I assume any tool that parses the feed 'j' would correlate all the data
> > based around some unique ids such that picking up corrections would just be
> > a natural extension?
>
> Yes, this should be handled naturally in this model. Since it's not
> possible to mutate any previously published info, everything is
> represented as additions/corrections: adding a comment to a patch,
> adding Reviewed-by, adding Nack, adding test results. The final state
> of a patch is always reconstructed by "replaying" all messages
> published regarding the patch. So naturally if we mis-parsed a message
> as "Acked-by: X" and then corrected that to "Nacked-by: X" and
> republished, whoever replays the feed should replace Acked-by
> with Nacked-by.
Great. That makes sense to me. Thanks!
Cheers,
Don
* Re: Structured feeds
2019-11-05 10:02 Structured feeds Dmitry Vyukov
` (3 preceding siblings ...)
2019-11-07 20:53 ` Don Zickus
@ 2019-11-12 22:54 ` Konstantin Ryabitsev
4 siblings, 0 replies; 33+ messages in thread
From: Konstantin Ryabitsev @ 2019-11-12 22:54 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: workflows, automated-testing, Brendan Higgins, Han-Wen Nienhuys,
Kevin Hilman, Veronika Kabatova
On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> Hi,
>
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
>
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
BTW, someone recently highlighted the following project to me:
https://openci.io (certificate lapsed, so ignore the browser warning)
The goal of this workgroup was to establish cross-CI communication using
pub/sub subscriptions and broadcast events. The following docs may be
of interest to people on this list:
https://event-driven-federated-cicd.openci.io/key-considerations-and-contraints
https://pipeline-messaging-protocol.openci.io/
-K