* Structured feeds
@ 2019-11-05 10:02 Dmitry Vyukov
  2019-11-06 15:35 ` Daniel Axtens
                   ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-05 10:02 UTC (permalink / raw)
  To: workflows, automated-testing
  Cc: Konstantin Ryabitsev, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman, Veronika Kabatova

Hi,

This is another follow-up after the Lyon meetings. The main discussion
was around the email process (attestation, archival, etc.):
https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t

I think providing info in a structured form is the key to building more
tooling and automation at a reasonable cost. So I discussed with the
CI/Gerrit people and Konstantin how structured information can fit into
the current "feeds model" and what the next steps would be for bringing
it to life.

Here is the outline of the idea.
The current public inbox format is a git repo with refs/heads/master
that contains a single file "m" in RFC822 format. We add
refs/heads/json with a single file "j" that contains structured data
in JSON format. These are two separate branches because some clients
may want to fetch just one of them.

Current clients will create only the plain-text "m" entry. However, newer
clients can also create a parallel "j" entry with the same info in
structured form. "m" and "j" are cross-referenced using the
Message-ID. It's OK to have only "m", or both, but not only "j" (every
client needs to generate at least some text representation for every
message).
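
A minimal sketch of the cross-referencing rule above, in Python. It assumes each "j" entry carries the Message-ID in a field called `message_id`; that field name and the helper names are hypothetical, since no schema has been defined yet.

```python
import email
import json


def message_id_of(raw_m: bytes) -> str:
    """Return the Message-ID header of an RFC822 "m" file."""
    msg = email.message_from_bytes(raw_m)
    value = msg["Message-ID"]
    if value is None:
        raise ValueError('"m" entry has no Message-ID header')
    return str(value).strip()


def pair_entries(m_files, j_files):
    """Pair "m" and "j" entries by Message-ID.

    Returns {message_id: (raw_m, parsed_j_or_None)}. A "j" entry without a
    matching "m" entry violates the rule above and raises ValueError.
    """
    paired = {message_id_of(raw): (raw, None) for raw in m_files}
    for raw_j in j_files:
        data = json.loads(raw_j)
        mid = data["message_id"]  # hypothetical field name
        if mid not in paired:
            raise ValueError(f'"j" entry without "m" entry: {mid}')
        paired[mid] = (paired[mid][0], data)
    return paired
```

An "m" entry with no "j" counterpart simply maps to `None`, which matches the case of current clients that only produce plain text.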

Currently we have public inbox feeds only for mailing lists. The idea
is that more entities will have their own "private" feeds. For example,
each CI system, static analysis system, or third-party code review
system has its own feed. Eventually people will have their own feeds
too. The feeds can be relatively easily converted to a local inbox,
imported into GMail, etc. (potentially with some filtering).

Besides private feeds there are also aggregated feeds, so that not
everybody has to fetch thousands of repositories. kernel.org will
provide one, but it can be mirrored (or built independently) anywhere
else. If I create https://github.com/dvyukov/kfeed.git for my feed and
Linus creates
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
then the aggregated feed will map these to the following branches:
refs/heads/github.com/dvyukov/kfeed/master
refs/heads/github.com/dvyukov/kfeed/json
refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
Standardized naming of sub-feeds allows a single repo to host multiple
feeds. For example, github/gitlab/gerrit bridge could host multiple
individual feeds for their users.
So far there is no proposal for feed auto-discovery. One needs to
notify kernel.org for inclusion of their feed into the main aggregated
feed.
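
The branch mapping above can be sketched as a small Python function. This is only an illustration that reproduces the example branch names; the function name `aggregated_refs` and the handling of the `.git` suffix are assumptions, not an agreed convention.

```python
from urllib.parse import urlsplit


def aggregated_refs(feed_url: str) -> dict:
    """Map a feed repository URL to its branches in the aggregated feed.

    The scheme is dropped, a trailing ".git" is stripped, and host plus path
    become the branch prefix under refs/heads/.
    """
    parts = urlsplit(feed_url)
    path = parts.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    prefix = f"refs/heads/{parts.netloc}/{path}"
    return {"master": f"{prefix}/master", "json": f"{prefix}/json"}


if __name__ == "__main__":
    print(aggregated_refs("https://github.com/dvyukov/kfeed.git"))
```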

Konstantin offered that kernel.org can send emails for some feeds.
That is, normally one sends out an email and then commits it to the
feed. Instead, some systems can just commit the message to the feed,
and kernel.org will then pull the feed and send emails on the user's
behalf. This allows clients to not deal with email at all (including
mail client setup), which is nice.

Eventually git-lfs (https://git-lfs.github.com) may be used to embed
blobs right into feeds. This would allow users to fetch only the
blobs they are interested in. But this does not need to happen from
day one.

As soon as we have a bridge from plain-text emails into the structured
form, we can start building everything else in the structured world.
Such a bridge needs to parse new incoming emails, try to make sense of
them (new patch, new patch version, comment, etc.) and then push the
information in structured form. Then, e.g., CIs can fetch info about
patches under review, test them, and post structured results. Bridging
in the opposite direction happens semi-automatically, as the CI also
pushes a text representation of the results, and that just needs to be
sent as email. Alternatively, we could have a separate explicit
converter from structured messages into plain text, which would remove
some duplication and present results in a more consistent form.
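
To give a feel for the very first step of such a bridge, here is a deliberately naive Python sketch that classifies a message by its subject line alone. The categories, the function name and the output keys are all hypothetical, and a real bridge would also have to look at headers, threading and the body.

```python
import re

# Bracketed prefixes such as "[PATCH]", "[PATCH v2 1/3]" or "[RFC PATCH v3]".
_PREFIX = re.compile(r"\[([^\]]*\bPATCH\b[^\]]*)\]", re.IGNORECASE)
_VERSION = re.compile(r"\bv(\d+)\b", re.IGNORECASE)


def classify_subject(subject: str) -> dict:
    """Guess whether a message is a new patch, a new version, or a comment."""
    subject = subject.strip()
    if re.match(r"(?i)re:", subject):
        # Replies are treated as comments, even if they quote a patch subject.
        return {"kind": "comment"}
    match = _PREFIX.search(subject)
    if match is None:
        return {"kind": "other"}
    version = _VERSION.search(match.group(1))
    number = int(version.group(1)) if version else 1
    kind = "patch" if number == 1 else "new_version"
    return {"kind": kind, "version": number}
```

Subject conventions vary between lists and senders, so a heuristic like this is only a starting point for a prototype.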

Similarly, it should be much simpler for Patchwork/Gerrit to present
current patches under review. Local mode should work almost seamlessly
-- you fetch the aggregated feed and then run a local instance on top
of it.

No work has been done on the actual form/schema of the structured
feeds. That's something we need to figure out while working on a
prototype. However, good references would be the git-appraise schema:
https://github.com/google/git-appraise/tree/master/schema
and the Gerrit schema (not sure what's a good link). Does anybody know
where the GitLab schema is? Or other similar schemas?

Thoughts and comments are welcome.
Thanks


* Re: Structured feeds
  2019-11-05 10:02 Structured feeds Dmitry Vyukov
@ 2019-11-06 15:35 ` Daniel Axtens
  2019-11-06 20:50   ` Konstantin Ryabitsev
                     ` (2 more replies)
  2019-11-06 19:54 ` Han-Wen Nienhuys
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 33+ messages in thread
From: Daniel Axtens @ 2019-11-06 15:35 UTC (permalink / raw)
  To: Dmitry Vyukov, workflows, automated-testing
  Cc: Konstantin Ryabitsev, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman, Veronika Kabatova

> As soon as we have a bridge from plain-text emails into the structured
> form, we can start building everything else in the structured world.
> Such bridge needs to parse new incoming emails, try to make sense out
> of them (new patch, new patch version, comment, etc) and then push the
> information in structured form. Then e.g. CIs can fetch info about

This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
at almost thirteen hundred lines, and that's with the benefit of the
Python standard library. It also regularly gets patched to handle
changes to email systems (e.g. DMARC), changes to git (git request-pull
format changed subtly in 2.14.3), the bizarre ways people send email,
and so on.

Patchwork does expose much of this as an API, for example for patches:
https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
build on that feel free. We can possibly add data to the API if that
would be helpful. (Patches are always welcome too, if you don't want to
wait an indeterminate amount of time.)

Regards,
Daniel




* Re: Structured feeds
  2019-11-05 10:02 Structured feeds Dmitry Vyukov
  2019-11-06 15:35 ` Daniel Axtens
@ 2019-11-06 19:54 ` Han-Wen Nienhuys
  2019-11-06 20:31   ` Sean Whitton
  2019-11-07  9:04   ` Dmitry Vyukov
  2019-11-07  8:48 ` [Automated-testing] " Tim.Bird
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 33+ messages in thread
From: Han-Wen Nienhuys @ 2019-11-06 19:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Konstantin Ryabitsev,
	Brendan Higgins, Kevin Hilman, Veronika Kabatova

On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> blobs right into feeds. This would allow users to fetch only the
> blobs they are interested in. But this does not need to happen from
> day one.

I would avoid building something around git-lfs. The git upstream
project is actively working on providing something that is less hacky
and more reproducible.

Also, if we're using Git to represent the feed and are thinking about
embedding blobs, it would be much more practical to just add a copy of
the Linux kernel to the Lore repository, and introduce a commit for
each patch. The Linux kernel is about 1.5G, which is much smaller than
the Lore archive, isn't it? You could store each patch under any of
these branch names:

  refs/patches/MESSAGE-ID
  refs/patches/URL-ESCAPE(MESSAGE-ID)
  refs/patches/SHA1(MESSAGE-ID)
  refs/patches/AUTHOR/MESSAGE-ID

This will lead to a large number of branches, but this is actually
something that is being addressed in Git with reftable.
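
For illustration, the four naming options can be computed in Python as below. The function name is hypothetical, and the choices of stripping the angle brackets and hashing the UTF-8 bytes of the Message-ID are assumptions.

```python
import hashlib
from urllib.parse import quote


def patch_ref_candidates(message_id: str, author: str) -> list:
    """Return the four candidate ref names for one patch, in the order listed."""
    mid = message_id.strip("<>")
    return [
        f"refs/patches/{mid}",
        f"refs/patches/{quote(mid, safe='')}",
        f"refs/patches/{hashlib.sha1(mid.encode('utf-8')).hexdigest()}",
        f"refs/patches/{author}/{mid}",
    ]
```

A raw Message-ID can contain characters that git does not accept in ref names, so the escaped or hashed variants are probably the safer choices.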

> No work has been done on the actual form/schema of the structured
> feeds. That's something we need to figure out working on a prototype.
> However, good references would be git-appraise schema:
> https://github.com/google/git-appraise/tree/master/schema
> and gerrit schema (not sure what's a good link).


The gerrit schema for reviews is unfortunately not documented, but it
should be. I'll try to write down something next week, but here is the
gist of it:

Each review ("change") in Gerrit is numbered. The different revisions
("patchsets") of a change 12345 are stored under

  refs/changes/45/12345/${PATCHSET_NUMBER}

They are stored as commits in the main project, i.e. if you fetch this
ref, you can check out the proposed change.

A change 12345 has its review metadata under

  refs/changes/45/12345/meta
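
A small Python sketch of this ref layout. It assumes the first path component is the last two digits of the change number, zero-padded for small numbers; that matches the 45/12345 example and is my understanding of Gerrit's convention, but it is not spelled out above. The function names are hypothetical.

```python
def change_prefix(change_number: int) -> str:
    """Return the common ref prefix for all refs of one Gerrit change."""
    shard = f"{change_number % 100:02d}"  # last two digits, zero-padded
    return f"refs/changes/{shard}/{change_number}"


def patchset_ref(change_number: int, patchset_number: int) -> str:
    """Ref holding the commit for one revision ("patchset") of a change."""
    return f"{change_prefix(change_number)}/{patchset_number}"


def meta_ref(change_number: int) -> str:
    """Ref holding the review metadata (notes branch) of a change."""
    return f"{change_prefix(change_number)}/meta"
```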

The metadata is a notes branch. The commit messages on the branch hold
global data on the change (votes, global comments). The per file
comments are in a notemap, where the key is the SHA1 of the patchset
the comment refers to, and the value is JSON data. The format of the
JSON is here:

 https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/server/notedb/RevisionNoteData.java#25

with the meat in Comment class

  https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/Comment.java#33

an example

   {
      "key": {
        "uuid": "c7be1334_47885e36",
        "filename": "java/com/google/gerrit/server/restapi/project/CommitsCollection.java",
        "patchSetId": 7
      },
      "lineNbr": 158,
      "author": {
        "id": 1026112
      },
      "writtenOn": "2019-11-06T09:00:50Z",
      "side": 1,
      "message": "nit: factor this out in a variable, use toImmutableList as collector",
      "range": {
        "startLine": 156,
        "startChar": 32,
        "endLine": 158,
        "endChar": 66
      },
      "revId": "071c601d6ee1a2a9f520415fd9efef8e00f9cf60",
      "serverId": "173816e5-2b9a-37c3-8a2e-48639d4f1153",
      "unresolved": true
   }

For CI-type comments, we have "checks" data and robot comments (an
extension of the previous comment), defined here:

https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/RobotComment.java#22

Here is an example of CI data that we keep:

 "checks": {
    "fmt:commitmsg-462a7efcf7234c5824393847968ddd28853aef6e": {
      "state": "FAILED",
      "message": "/COMMIT_MSG: subject must not end in \u0027.\u0027",
      "started": "2019-09-13T17:12:46Z",
      "created": "2019-09-11T17:42:40Z",
      "updated": "2019-09-13T17:12:47Z"
    }
 }

JSON definition:
https://gerrit.googlesource.com/plugins/checks/+/0e609a4599d17308664e1d41c0f91447640ee9fe/java/com/google/gerrit/plugins/checks/db/NoteDbCheck.java#16
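
As a rough sketch of how a consumer might use such data, the Python below lists the failed checks. It assumes the "checks" fragment is wrapped in an enclosing JSON object and relies only on the "state" and "message" keys visible in the example; the function name is hypothetical.

```python
import json


def failed_checks(document: str) -> list:
    """Return (check name, message) pairs for checks whose state is FAILED."""
    checks = json.loads(document)["checks"]
    return [
        (name, check["message"])
        for name, check in sorted(checks.items())
        if check["state"] == "FAILED"
    ]
```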


-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.


* Re: Structured feeds
  2019-11-06 19:54 ` Han-Wen Nienhuys
@ 2019-11-06 20:31   ` Sean Whitton
  2019-11-07  9:04   ` Dmitry Vyukov
  1 sibling, 0 replies; 33+ messages in thread
From: Sean Whitton @ 2019-11-06 20:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: workflows

Hello,

On Wed 06 Nov 2019 at 08:54PM +01, Han-Wen Nienhuys wrote:

> On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>>
>> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
>> blobs right into feeds. This would allow users to fetch only the
>> blobs they are interested in. But this does not need to happen from
>> day one.
>
> I would avoid building something around git-lfs. The git upstream
> project is actively working on providing something that is less hacky
> and more reproducible.

Could you share a link to this please?

(There is also git-annex.)

-- 
Sean Whitton


* Re: Structured feeds
  2019-11-06 15:35 ` Daniel Axtens
@ 2019-11-06 20:50   ` Konstantin Ryabitsev
  2019-11-07  9:08     ` Dmitry Vyukov
                       ` (2 more replies)
  2019-11-07  8:53   ` Dmitry Vyukov
  2019-11-07 20:43   ` [Automated-testing] " Don Zickus
  2 siblings, 3 replies; 33+ messages in thread
From: Konstantin Ryabitsev @ 2019-11-06 20:50 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
	Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova

On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>at almost thirteen hundred lines, and that's with the benefit of the
>Python standard library. It also regularly gets patched to handle
>changes to email systems (e.g. DMARC), changes to git (git request-pull
>format changed subtly in 2.14.3), the bizarre ways people send email,
>and so on.

I'm actually very interested in seeing patchwork switch from being fed 
mail directly from postfix to using public-inbox repositories as its 
source of patches. I know it's easy enough to accomplish as-is, by 
piping things from public-inbox to parsemail.sh, but it would be even 
more awesome if patchwork learned to work with these repos natively.

The way I see it:

- site administrator configures upstream public-inbox feeds
- a backend process clones these repositories
   - if it doesn't find a refs/heads/json, then it does its own parsing 
     to generate a structured feed with patches/series/trailers/pull 
     requests, cross-referencing them by series as necessary. Something 
     like a subset of this, excluding patchwork-specific data:
     https://patchwork.kernel.org/api/1.1/patches/11177661/
   - if it does find an existing structured feed, it simply uses it (e.g.  
     it was made available by another patchwork instance)
- the same backend process updates the repositories from upstream using 
   proper manifest files (e.g. see 
   https://lore.kernel.org/workflows/manifest.js.gz)

- patchwork projects then consume one (or more) of these structured 
   feeds to generate the actionable list of patches that maintainers can 
   use, perhaps with optional filtering by specific headers (list-id, 
   from, cc), patch paths, keywords, etc.

Basically, parsemail.sh is split into two, where one part does feed 
cloning, pulling, and parsing into structured data (if not already 
done), and another populates actual patchwork project with patches 
matching requested parameters.
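
A very rough Python sketch of the two halves described above: deciding whether a cloned feed still needs local parsing, and selecting messages for a project by header. The function names and the shape of the `filters` mapping are hypothetical and do not reflect any existing patchwork interface.

```python
import email


def needs_local_parsing(refs) -> bool:
    """True if the cloned feed has no structured branch to reuse."""
    return "refs/heads/json" not in refs


def matches_filters(raw_m: bytes, filters: dict) -> bool:
    """Check a raw RFC822 message against per-project header filters.

    `filters` maps a header name (e.g. "list-id", "from", "cc") to a
    substring that must appear in that header, compared case-insensitively.
    """
    msg = email.message_from_bytes(raw_m)
    for header, needle in filters.items():
        value = str(msg.get(header, ""))
        if needle.lower() not in value.lower():
            return False
    return True
```

Filtering by patch paths or keywords would need the parsed structured data rather than the headers alone.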

I see the following upsides to this:

- we consume public-inbox feeds directly, no longer losing patches due 
   to MTA problems, postfix burps, parse failures, etc
- a project can have multiple sources for patches instead of being tied 
   to a single mailing list
- downstream patchwork instances (the "local patchwork" tool I mentioned 
   earlier) can benefit from structured feeds provided by 
   patchwork.kernel.org

>Patchwork does expose much of this as an API, for example for patches:
>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>build on that feel free. We can possibly add data to the API if that
>would be helpful. (Patches are always welcome too, if you don't want to
>wait an indeterminate amount of time.)

As I said previously, I may be able to fund development of various 
features, but I want to make sure that I properly work with upstream.  
That requires getting consensus on features to make sure that we don't 
spend funds and efforts on a feature that gets rejected. :)

Would the above feature (using one or more public-inbox repositories as 
sources for a patchwork project) be a welcome addition to upstream?

-K


* RE: [Automated-testing] Structured feeds
  2019-11-05 10:02 Structured feeds Dmitry Vyukov
  2019-11-06 15:35 ` Daniel Axtens
  2019-11-06 19:54 ` Han-Wen Nienhuys
@ 2019-11-07  8:48 ` Tim.Bird
  2019-11-07  9:13   ` Dmitry Vyukov
  2019-11-07 20:53 ` Don Zickus
  2019-11-12 22:54 ` Konstantin Ryabitsev
  4 siblings, 1 reply; 33+ messages in thread
From: Tim.Bird @ 2019-11-07  8:48 UTC (permalink / raw)
  To: dvyukov, workflows, automated-testing; +Cc: hanwen, konstantin



> -----Original Message-----
> From: Dmitry Vyukov
> 
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> 
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
> 
> Here is the outline of the idea.
> The current public inbox format is a git repo with refs/heads/master
> that contains a single file "m" in RFC822 format. We add
> refs/heads/json with a single file "j" that contains structured data
> in JSON format. 2 separate branches b/c some clients may want to fetch
> just one of them.

Can you provide some idea (maybe a few examples) of the types of
structured data that would be in the json branch?
 -- Tim



* Re: Structured feeds
  2019-11-06 15:35 ` Daniel Axtens
  2019-11-06 20:50   ` Konstantin Ryabitsev
@ 2019-11-07  8:53   ` Dmitry Vyukov
  2019-11-07 10:40     ` Daniel Axtens
  2019-11-07 20:43   ` [Automated-testing] " Don Zickus
  2 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07  8:53 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: workflows, automated-testing, Konstantin Ryabitsev,
	Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
	Veronika Kabatova

On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
>
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
>
> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> at almost thirteen hundred lines, and that's with the benefit of the
> Python standard library. It also regularly gets patched to handle
> changes to email systems (e.g. DMARC), changes to git (git request-pull
> format changed subtly in 2.14.3), the bizarre ways people send email,
> and so on.
>
> Patchwork does expose much of this as an API, for example for patches:
> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> build on that feel free. We can possibly add data to the API if that
> would be helpful. (Patches are always welcome too, if you don't want to
> wait an indeterminate amount of time.)

Hi Daniel,

Thanks!
Could you provide a link to the code?
Do you have a test suite for the parser (set of email samples and what
they should be parsed to)?


* Re: Structured feeds
  2019-11-06 19:54 ` Han-Wen Nienhuys
  2019-11-06 20:31   ` Sean Whitton
@ 2019-11-07  9:04   ` Dmitry Vyukov
  1 sibling, 0 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07  9:04 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: workflows, automated-testing, Konstantin Ryabitsev,
	Brendan Higgins, Kevin Hilman, Veronika Kabatova

On Wed, Nov 6, 2019 at 8:54 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
>
> On Tue, Nov 5, 2019 at 11:02 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> > blobs right into feeds. This would allow users to fetch only the
> > blobs they are interested in. But this does not need to happen from
> > day one.
>
> I would avoid building something around git-lfs. The git upstream
> project is actively working on providing something that is less hacky
> and more reproducible.

Noted.
I mostly just captured what Konstantin pointed to. I think (1) blob
embedding is not version 1, (2) whatever we do, somebody needs to
prototype and try first.

> Also, if we're using Git to represent the feed and are thinking about
> embedding blobs,

Blobs are not about patches. Patches are small and not binary.
Blobs would be kernel binaries, test binaries, images, code dumps, etc.

> it would be much more practical to just add a copy of
> the linux kernel to the Lore repository, and introduce a commit for
> each patch. The linux kernel is about 1.5G, which is much smaller than
> the Lore archive, isn't it? You could store each patch under any of
> these branch names :
>
>   refs/patches/MESSAGE-ID
>   refs/patches/URL-ESCAPE(MESSAGE-ID)
>   refs/patches/SHA1(MESSAGE-ID)
>   refs/patches/AUTHOR/MESSAGE-ID
>
> this will lead to a large number of branches, but this is actually
> something that is being addressed in Git with reftable.

Interesting. I need to think about how exactly it can be integrated, as
the kernel is not a single tree. Though, obviously, fetching the exact
git tree is very nice. But it's somewhat orthogonal to feeds and may be
provided by another specialized bot feed ("I posted your patch to git
and it's available here"); this way it will work for legacy email
patches too.

> > No work has been done on the actual form/schema of the structured
> > feeds. That's something we need to figure out working on a prototype.
> > However, good references would be git-appraise schema:
> > https://github.com/google/git-appraise/tree/master/schema
> > and gerrit schema (not sure what's a good link).
>
>
> The gerrit schema for reviews is unfortunately not documented, but it
> should be. I'll try to write down something next week, but here is the
> gist of it:
>
> Each review ("change") in Gerrit is numbered. The different revisions
> ("patchsets") of a change 12345 are stored under
>
>   refs/changes/45/12345/${PATCHSET_NUMBER}
>
> they are stored as commits to the main project, ie. if you fetch this
> ref, you can check out the proposed change.
>
> A change 12345 has its review metadata under
>
>   refs/changes/45/12345/meta
>
> The metadata is a notes branch. The commit messages on the branch hold
> global data on the change (votes, global comments). The per file
> comments are in a notemap, where the key is the SHA1 of the patchset
> the comment refers to, and the value is JSON data. The format of the
> JSON is here:
>
>  https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/server/notedb/RevisionNoteData.java#25
>
> with the meat in Comment class
>
>   https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/Comment.java#33
>
> an example
>
>    {
>       "key": {
>         "uuid": "c7be1334_47885e36",
>         "filename":
> "java/com/google/gerrit/server/restapi/project/CommitsCollection.java",
>         "patchSetId": 7
>       },
>       "lineNbr": 158,
>       "author": {
>         "id": 1026112
>       },
>       "writtenOn": "2019-11-06T09:00:50Z",
>       "side": 1,
>       "message": "nit: factor this out in a variable, use
> toImmutableList as collector",
>       "range": {
>         "startLine": 156,
>         "startChar": 32,
>         "endLine": 158,
>         "endChar": 66
>       },
>       "revId": "071c601d6ee1a2a9f520415fd9efef8e00f9cf60",
>       "serverId": "173816e5-2b9a-37c3-8a2e-48639d4f1153",
>       "unresolved": true
>     },
>
> for CI type comments, we have "checks" data and robot comments (an
> extension of the previous comment), defined here:
>
> https://gerrit.googlesource.com/gerrit/+/9a6b8da5736536405da8bf5956fb3b47e322afa8/java/com/google/gerrit/entities/RobotComment.java#22
>
> here is an example of CI data that we keep:
>
>  "checks": {
>     "fmt:commitmsg-462a7efcf7234c5824393847968ddd28853aef6e": {
>       "state": "FAILED",
>       "message": "/COMMIT_MSG: subject must not end in \u0027.\u0027",
>       "started": "2019-09-13T17:12:46Z",
>       "created": "2019-09-11T17:42:40Z",
>       "updated": "2019-09-13T17:12:47Z"
>     }
>
> JSON definition:
> https://gerrit.googlesource.com/plugins/checks/+/0e609a4599d17308664e1d41c0f91447640ee9fe/java/com/google/gerrit/plugins/checks/db/NoteDbCheck.java#16

I've added a link to this for future reference here:
https://github.com/dvyukov/kit/blob/master/doc/references.md
Thanks!


* Re: Structured feeds
  2019-11-06 20:50   ` Konstantin Ryabitsev
@ 2019-11-07  9:08     ` Dmitry Vyukov
  2019-11-07 10:57       ` Daniel Axtens
  2019-11-07 11:09     ` Daniel Axtens
  2019-11-08 14:18     ` Daniel Axtens
  2 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07  9:08 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Daniel Axtens, workflows, automated-testing, Brendan Higgins,
	Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova

On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >at almost thirteen hundred lines, and that's with the benefit of the
> >Python standard library. It also regularly gets patched to handle
> >changes to email systems (e.g. DMARC), changes to git (git request-pull
> >format changed subtly in 2.14.3), the bizarre ways people send email,
> >and so on.
>
> I'm actually very interested in seeing patchwork switch from being fed
> mail directly from postfix to using public-inbox repositories as its
> source of patches. I know it's easy enough to accomplish as-is, by
> piping things from public-inbox to parsemail.sh, but it would be even
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
>    - if it doesn't find a refs/heads/json, then it does its own parsing
>      to generate a structured feed with patches/series/trailers/pull
>      requests, cross-referencing them by series as necessary. Something
>      like a subset of this, excluding patchwork-specific data:
>      https://patchwork.kernel.org/api/1.1/patches/11177661/
>    - if it does find an existing structured feed, it simply uses it (e.g.
>      it was made available by another patchwork instance)

It would be an interesting feature if a patchwork instance converted
and exported text emails as structured info. It could then be consumed
by CIs for pre-commit testing and by other systems without the need to
duplicate the conversion.


* Re: [Automated-testing] Structured feeds
  2019-11-07  8:48 ` [Automated-testing] " Tim.Bird
@ 2019-11-07  9:13   ` Dmitry Vyukov
  2019-11-07  9:20     ` Tim.Bird
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07  9:13 UTC (permalink / raw)
  To: Tim Bird
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Thu, Nov 7, 2019 at 9:48 AM <Tim.Bird@sony.com> wrote:
> > -----Original Message-----
> > From: Dmitry Vyukov
> >
> > This is another follow up after Lyon meetings. The main discussion was
> > mainly around email process (attestation, archival, etc):
> > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> >
> > I think providing info in a structured form is the key for allowing
> > building more tooling and automation at a reasonable price. So I
> > discussed with CI/Gerrit people and Konstantin how the structured
> > information can fit into the current "feeds model" and what would be
> > the next steps for bringing it to life.
> >
> > Here is the outline of the idea.
> > The current public inbox format is a git repo with refs/heads/master
> > that contains a single file "m" in RFC822 format. We add
> > refs/heads/json with a single file "j" that contains structured data
> > in JSON format. 2 separate branches b/c some clients may want to fetch
> > just one of them.
>
> Can you provide some idea (maybe a few examples) of the types of
> structured data that would  be in the json branch?

Hi Tim,

Nobody has yet tried to define exact formats. Generally, it should
expose info about patches, comments, and test results in an
easy-to-consume form.
Here are examples for patchwork, git-appraise and gerrit:
https://patchwork.ozlabs.org/api/patches/?order=-id
https://github.com/google/git-appraise/tree/master/schema
https://lore.kernel.org/workflows/87sgn0zr09.fsf@iris.silentflame.com/T/#m3db87b43cf5e581ba4d3a7fd5f1fbff5aea3546a
I would expect the format to resemble these formats to a significant
degree. But I guess we need to come up with something, try to use it,
see what's missing or needs to be improved, and iterate.
Do you have any specific recommendations or ways in which you see it
being consumed?


* RE: [Automated-testing] Structured feeds
  2019-11-07  9:13   ` Dmitry Vyukov
@ 2019-11-07  9:20     ` Tim.Bird
  0 siblings, 0 replies; 33+ messages in thread
From: Tim.Bird @ 2019-11-07  9:20 UTC (permalink / raw)
  To: dvyukov; +Cc: workflows, automated-testing, hanwen, konstantin



> -----Original Message-----
> From: Dmitry Vyukov 
> 
> On Thu, Nov 7, 2019 at 9:48 AM <Tim.Bird@sony.com> wrote:
> > > -----Original Message-----
> > > From: Dmitry Vyukov
> > >
> > > This is another follow up after Lyon meetings. The main discussion was
> > > mainly around email process (attestation, archival, etc):
> > >
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > >
> > > I think providing info in a structured form is the key for allowing
> > > building more tooling and automation at a reasonable price. So I
> > > discussed with CI/Gerrit people and Konstantin how the structured
> > > information can fit into the current "feeds model" and what would be
> > > the next steps for bringing it to life.
> > >
> > > Here is the outline of the idea.
> > > The current public inbox format is a git repo with refs/heads/master
> > > that contains a single file "m" in RFC822 format. We add
> > > refs/heads/json with a single file "j" that contains structured data
> > > in JSON format. 2 separate branches b/c some clients may want to fetch
> > > just one of them.
> >
> > Can you provide some idea (maybe a few examples) of the types of
> > structured data that would  be in the json branch?
> 
> Hi Tim,
> 
> Nobody yet tried to define exact formats. Generally it should expose
> info about patches, comments, test results in an easy to consume form.
> Here are examples for patchwork, git-appraise and gerrit:
> https://patchwork.ozlabs.org/api/patches/?order=-id
> https://github.com/google/git-appraise/tree/master/schema
> https://lore.kernel.org/workflows/87sgn0zr09.fsf@iris.silentflame.com/T/#m3db87b43cf5e581ba4d3a7fd5f1fbff5aea3546a
> I would expect that the format would resemble these formats to
> significant degree. But I guess we need to come up with something, try
> to use, see what's missing/needs to be improved and iterate.
> Do you have any specific recommendations or ways in which you see it
> will be consumed?

I have no recommendations.  (yet.)  :-)
I'm just trying to figure out what the proposal is about, and its scope.
Thanks for the links.
 -- Tim


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-07  8:53   ` Dmitry Vyukov
@ 2019-11-07 10:40     ` Daniel Axtens
  2019-11-07 10:43       ` Dmitry Vyukov
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 10:40 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Konstantin Ryabitsev,
	Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
	Veronika Kabatova

Dmitry Vyukov <dvyukov@google.com> writes:

> On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
>>
>> > As soon as we have a bridge from plain-text emails into the structured
>> > form, we can start building everything else in the structured world.
>> > Such bridge needs to parse new incoming emails, try to make sense out
>> > of them (new patch, new patch version, comment, etc) and then push the
>> > information in structured form. Then e.g. CIs can fetch info about
>>
>> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>> at almost thirteen hundred lines, and that's with the benefit of the
>> Python standard library. It also regularly gets patched to handle
>> changes to email systems (e.g. DMARC), changes to git (git request-pull
>> format changed subtly in 2.14.3), the bizarre ways people send email,
>> and so on.
>>
>> Patchwork does expose much of this as an API, for example for patches:
>> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>> build on that feel free. We can possibly add data to the API if that
>> would be helpful. (Patches are always welcome too, if you don't want to
>> wait an indeterminate amount of time.)
>
> Hi Daniel,
>
> Thanks!
> Could you provide a link to the code?
> Do you have a test suite for the parser (set of email samples and what
> they should be parsed to)?

Sure:
https://github.com/getpatchwork/patchwork in particular
https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py and
https://github.com/getpatchwork/patchwork/tree/master/patchwork/tests

Regards,
Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-07 10:40     ` Daniel Axtens
@ 2019-11-07 10:43       ` Dmitry Vyukov
  0 siblings, 0 replies; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-07 10:43 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: workflows, automated-testing, Konstantin Ryabitsev,
	Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman,
	Veronika Kabatova

On Thu, Nov 7, 2019 at 11:41 AM Daniel Axtens <dja@axtens.net> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> writes:
>
> > On Wed, Nov 6, 2019 at 4:35 PM Daniel Axtens <dja@axtens.net> wrote:
> >>
> >> > As soon as we have a bridge from plain-text emails into the structured
> >> > form, we can start building everything else in the structured world.
> >> > Such bridge needs to parse new incoming emails, try to make sense out
> >> > of them (new patch, new patch version, comment, etc) and then push the
> >> > information in structured form. Then e.g. CIs can fetch info about
> >>
> >> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >> at almost thirteen hundred lines, and that's with the benefit of the
> >> Python standard library. It also regularly gets patched to handle
> >> changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> format changed subtly in 2.14.3), the bizarre ways people send email,
> >> and so on.
> >>
> >> Patchwork does expose much of this as an API, for example for patches:
> >> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> >> build on that feel free. We can possibly add data to the API if that
> >> would be helpful. (Patches are always welcome too, if you don't want to
> >> wait an indeterminate amount of time.)
> >
> > Hi Daniel,
> >
> > Thanks!
> > Could you provide a link to the code?
> > Do you have a test suite for the parser (set of email samples and what
> > they should be parsed to)?
>
> Sure:
> https://github.com/getpatchwork/patchwork in particular
> https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py and
> https://github.com/getpatchwork/patchwork/tree/master/patchwork/tests

Added here for future reference:
https://github.com/dvyukov/kit/blob/master/doc/references.md#patchwork
Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-07  9:08     ` Dmitry Vyukov
@ 2019-11-07 10:57       ` Daniel Axtens
  2019-11-07 11:26         ` Veronika Kabatova
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 10:57 UTC (permalink / raw)
  To: Dmitry Vyukov, Konstantin Ryabitsev
  Cc: workflows, automated-testing, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman, Veronika Kabatova

Dmitry Vyukov <dvyukov@google.com> writes:

> On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
>>
>> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>> >at almost thirteen hundred lines, and that's with the benefit of the
>> >Python standard library. It also regularly gets patched to handle
>> >changes to email systems (e.g. DMARC), changes to git (git request-pull
>> >format changed subtly in 2.14.3), the bizarre ways people send email,
>> >and so on.
>>
>> I'm actually very interested in seeing patchwork switch from being fed
>> mail directly from postfix to using public-inbox repositories as its
>> source of patches. I know it's easy enough to accomplish as-is, by
>> piping things from public-inbox to parsemail.sh, but it would be even
>> more awesome if patchwork learned to work with these repos natively.
>>
>> The way I see it:
>>
>> - site administrator configures upstream public-inbox feeds
>> - a backend process clones these repositories
>>    - if it doesn't find a refs/heads/json, then it does its own parsing
>>      to generate a structured feed with patches/series/trailers/pull
>>      requests, cross-referencing them by series as necessary. Something
>>      like a subset of this, excluding patchwork-specific data:
>>      https://patchwork.kernel.org/api/1.1/patches/11177661/
>>    - if it does find an existing structured feed, it simply uses it (e.g.
>>      it was made available by another patchwork instance)
>
> It's an interesting feature if a patchwork instance would convert and
> export text emails to structured info. Then it can be consumed by CIs
> for precommit testing and other systems without the need to duplicate
> conversion.

This already happens.

Snowpatch does this and uses it to run CI checks on patch series as soon
as they arrive, and sends them back to patchwork as test results. It has
been running on linuxppc-dev for over a year.

Snowpatch is at https://github.com/ruscur/snowpatch

An example patch showing the checks having been run is
https://patchwork.ozlabs.org/patch/1190589/

I think there's a different CI system used for some device-tree patches:
e.g. https://patchwork.ozlabs.org/patch/1190714/ - I have no idea how
this works in the backend, but it also uses the patchwork API.

Regards,
Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-06 20:50   ` Konstantin Ryabitsev
  2019-11-07  9:08     ` Dmitry Vyukov
@ 2019-11-07 11:09     ` Daniel Axtens
  2019-11-08 14:18     ` Daniel Axtens
  2 siblings, 0 replies; 33+ messages in thread
From: Daniel Axtens @ 2019-11-07 11:09 UTC (permalink / raw)
  To: Konstantin Ryabitsev, patchwork
  Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
	Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova

Sending on to the patchwork list for discussion. I think at least some
of this makes sense for Patchwork to support; I'll do a more detailed
analysis/breakdown later on.

Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>>This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>>at almost thirteen hundred lines, and that's with the benefit of the
>>Python standard library. It also regularly gets patched to handle
>>changes to email systems (e.g. DMARC), changes to git (git request-pull
>>format changed subtly in 2.14.3), the bizarre ways people send email,
>>and so on.
>
> I'm actually very interested in seeing patchwork switch from being fed 
> mail directly from postfix to using public-inbox repositories as its 
> source of patches. I know it's easy enough to accomplish as-is, by 
> piping things from public-inbox to parsemail.sh, but it would be even 
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
>    - if it doesn't find a refs/heads/json, then it does its own parsing 
>      to generate a structured feed with patches/series/trailers/pull 
>      requests, cross-referencing them by series as necessary. Something 
>      like a subset of this, excluding patchwork-specific data:
>      https://patchwork.kernel.org/api/1.1/patches/11177661/
>    - if it does find an existing structured feed, it simply uses it (e.g.  
>      it was made available by another patchwork instance)
> - the same backend process updates the repositories from upstream using 
>    proper manifest files (e.g. see 
>    https://lore.kernel.org/workflows/manifest.js.gz)
>
> - patchwork projects then consume one (or more) of these structured 
>    feeds to generate the actionable list of patches that maintainers can 
>    use, perhaps with optional filtering by specific headers (list-id, 
>    from, cc), patch paths, keywords, etc.
>
> Basically, parsemail.sh is split into two, where one part does feed 
> cloning, pulling, and parsing into structured data (if not already 
> done), and another populates actual patchwork project with patches 
> matching requested parameters.
>
> I see the following upsides to this:
>
> - we consume public-inbox feeds directly, no longer losing patches due 
>    to MTA problems, postfix burps, parse failures, etc
> - a project can have multiple sources for patches instead of being tied 
>    to a single mailing list
> - downstream patchwork instances (the "local patchwork" tool I mentioned 
>    earlier) can benefit from structured feeds provided by 
>    patchwork.kernel.org
>
>>Patchwork does expose much of this as an API, for example for patches:
>>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>>build on that feel free. We can possibly add data to the API if that
>>would be helpful. (Patches are always welcome too, if you don't want to
>>wait an indeterminate amount of time.)
>
> As I said previously, I may be able to fund development of various 
> features, but I want to make sure that I properly work with upstream.  
> That requires getting consensus on features to make sure that we don't 
> spend funds and efforts on a feature that gets rejected. :)
>
> Would the above feature (using one or more public-inbox repositories as 
> sources for a patchwork project) be a welcome addition to upstream?
>
> -K

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-07 10:57       ` Daniel Axtens
@ 2019-11-07 11:26         ` Veronika Kabatova
  2019-11-08  0:24           ` Eric Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Veronika Kabatova @ 2019-11-07 11:26 UTC (permalink / raw)
  To: Daniel Axtens, Dmitry Vyukov
  Cc: Konstantin Ryabitsev, workflows, automated-testing,
	Brendan Higgins, Han-Wen Nienhuys, Kevin Hilman



----- Original Message -----
> From: "Daniel Axtens" <dja@axtens.net>
> To: "Dmitry Vyukov" <dvyukov@google.com>, "Konstantin Ryabitsev" <konstantin@linuxfoundation.org>
> Cc: workflows@vger.kernel.org, automated-testing@yoctoproject.org, "Brendan Higgins" <brendanhiggins@google.com>,
> "Han-Wen Nienhuys" <hanwen@google.com>, "Kevin Hilman" <khilman@baylibre.com>, "Veronika Kabatova"
> <vkabatov@redhat.com>
> Sent: Thursday, November 7, 2019 11:57:19 AM
> Subject: Re: Structured feeds
> 
> Dmitry Vyukov <dvyukov@google.com> writes:
> 
> > On Wed, Nov 6, 2019 at 9:50 PM Konstantin Ryabitsev
> > <konstantin@linuxfoundation.org> wrote:
> >>
> >> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >> >This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >> >at almost thirteen hundred lines, and that's with the benefit of the
> >> >Python standard library. It also regularly gets patched to handle
> >> >changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> >format changed subtly in 2.14.3), the bizarre ways people send email,
> >> >and so on.
> >>
> >> I'm actually very interested in seeing patchwork switch from being fed
> >> mail directly from postfix to using public-inbox repositories as its
> >> source of patches. I know it's easy enough to accomplish as-is, by
> >> piping things from public-inbox to parsemail.sh, but it would be even
> >> more awesome if patchwork learned to work with these repos natively.
> >>
> >> The way I see it:
> >>
> >> - site administrator configures upstream public-inbox feeds
> >> - a backend process clones these repositories
> >>    - if it doesn't find a refs/heads/json, then it does its own parsing
> >>      to generate a structured feed with patches/series/trailers/pull
> >>      requests, cross-referencing them by series as necessary. Something
> >>      like a subset of this, excluding patchwork-specific data:
> >>      https://patchwork.kernel.org/api/1.1/patches/11177661/
> >>    - if it does find an existing structured feed, it simply uses it (e.g.
> >>      it was made available by another patchwork instance)
> >
> > It's an interesting feature if a patchwork instance would convert and
> > export text emails to structured info. Then it can be consumed by CIs
> > for precommit testing and other systems without the need to duplicate
> > conversion.
> 
> This already happens.
> 
> Snowpatch does this and uses it to run CI checks on patch series as soon
> as they arrive, and sends them back to patchwork as test results. It has
> been running on linuxppc-dev for over a year.
> 
> Snowpatch is at https://github.com/ruscur/snowpatch
> 
> An example patch showing the checks having been run is
> https://patchwork.ozlabs.org/patch/1190589/
> 

CKI does something similar too [0].

The code contains some RHEL-specific checks as we are not running patch
testing for upstream yet. The PW checks can be submitted from the pipeline.
We should probably update the trigger to use the events API...


The only "structured information" CKI requires is to have the patch in the
correct PW project, which is mapped to a git tree/branch so we know where
to apply the patch. However, there are cases when more information is needed,
such as if multiple different branches can be used with the same project, or
the patch depends on another change.

This situation should be resolved with the freeform tagging feature I
proposed a while ago (blocked by DB refactoring; original series can be found
at [1]). This feature would allow developers to add any tags to their patches,
similar to the signed-off-by line. The extracted tags can then be queried in
the API and used by CI.
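
To make the tagging idea a bit more concrete, here is a rough sketch of
pulling trailer-style tags out of a patch email body. This is not the code
from the series at [1]; the function name extract_tags and the
Target-Branch / Depends-On tag names are made up for illustration, and only
the "Key: value" shape (as in the signed-off-by line) comes from the
description above. It is deliberately naive: any "Word: text" line before
the "---" separator is treated as a tag.

```python
import re

# Hypothetical pattern for "Key: value" trailer lines such as Signed-off-by.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):[ \t]+(.+)$")


def extract_tags(body):
    """Return a dict mapping lower-cased tag names to lists of values."""
    tags = {}
    for line in body.splitlines():
        if line.strip() == "---":
            # The diffstat and the patch follow; tags only live above this.
            break
        match = TRAILER_RE.match(line)
        if match:
            key, value = match.groups()
            tags.setdefault(key.lower(), []).append(value.strip())
    return tags


# Illustrative input; Target-Branch and Depends-On are invented tag names.
sample = """\
Fix the frobnicator

Signed-off-by: A Developer <dev@example.com>
Target-Branch: stable-5.3
Depends-On: <20191107112600.1234@example.com>
---
 drivers/foo.c | 2 +-
"""

tags = extract_tags(sample)
```

A CI trigger could then look at tags.get("target-branch") to pick the tree
to apply the patch to, falling back to the project default when the tag is
absent.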

I'll be totally honest and admit I ignored most of the implementation details
of public inbox feeds (will take a look when I have some free time) but as
long as they contain the original email, the feature should be usable with
them too.


[0] https://gitlab.com/cki-project/pipeline-trigger/blob/master/triggers/patch_trigger.py
[1] https://patchwork.ozlabs.org/project/patchwork/list/?series=66057


Veronika

> I think there's a different CI system used for some device-tree patches:
> e.g. https://patchwork.ozlabs.org/patch/1190714/ - I have no idea how
> this works in the backend, but it also uses the patchwork API.
> 
> Regards,
> Daniel
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-06 15:35 ` Daniel Axtens
  2019-11-06 20:50   ` Konstantin Ryabitsev
  2019-11-07  8:53   ` Dmitry Vyukov
@ 2019-11-07 20:43   ` Don Zickus
  2019-11-08  7:58     ` Dmitry Vyukov
  2019-11-08 11:44     ` Daniel Axtens
  2 siblings, 2 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-07 20:43 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: Dmitry Vyukov, workflows, automated-testing, Han-Wen Nienhuys,
	Konstantin Ryabitsev

On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
> 
> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> at almost thirteen hundred lines, and that's with the benefit of the
> Python standard library. It also regularly gets patched to handle
> changes to email systems (e.g. DMARC), changes to git (git request-pull
> format changed subtly in 2.14.3), the bizarre ways people send email,
> and so on.

Does it ever make sense to just use git to do the translation to structured
json?  Git has similar logic and can easily handle its own changes.  Tools
like git-mailinfo and git-mailsplit probably do a good chunk of the
work today.

It wouldn't pull together series info.
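
As a rough illustration of the per-message translation step, here is a
minimal sketch that turns one RFC822 message into a JSON record. It uses
Python's standard email module as a stand-in rather than calling
git-mailinfo, and the record field names (message_id, type, title,
version, index, total) are assumptions, not an agreed schema. The subject
regex only understands plain "[PATCH vN i/n]" prefixes, so it covers a
small fraction of what a real parser has to handle, and it does not group
messages into series.

```python
import json
import re
from email import message_from_string

# Matches subject prefixes such as "[PATCH]", "[PATCH 1/2]" or "[PATCH v2 3/5]".
SUBJECT_RE = re.compile(
    r"^\[PATCH(?:\s+v(?P<version>\d+))?"
    r"(?:\s+(?P<index>\d+)/(?P<total>\d+))?\]\s*(?P<title>.*)$"
)


def to_record(raw):
    """Translate one raw email into a dict (hypothetical record layout)."""
    msg = message_from_string(raw)
    record = {
        "message_id": msg["Message-ID"],
        "from": msg["From"],
        "subject": msg["Subject"],
        "type": "comment",  # default when the subject is not a patch
    }
    match = SUBJECT_RE.match(msg["Subject"] or "")
    if match:
        record.update(
            type="patch",
            title=match.group("title"),
            version=int(match.group("version") or 1),
            index=int(match.group("index") or 1),
            total=int(match.group("total") or 1),
        )
    return record


raw_patch = (
    "Message-ID: <1@example.com>\n"
    "From: A Developer <dev@example.com>\n"
    "Subject: [PATCH v2 3/5] mm: fix leak\n"
    "\n"
    "Commit message here.\n"
)
patch_record = to_record(raw_patch)
patch_json = json.dumps(patch_record, sort_keys=True)
```

The index/total pair from the subject is only a hint; tying the patches of
a series together would still need the In-Reply-To/References handling
that a tool like Patchwork does.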

Just a thought.

Cheers,
Don



> 
> Patchwork does expose much of this as an API, for example for patches:
> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
> build on that feel free. We can possibly add data to the API if that
> would be helpful. (Patches are always welcome too, if you don't want to
> wait an indeterminate amount of time.)
> 
> Regards,
> Daniel
> 
> 
> -- 
> _______________________________________________
> automated-testing mailing list
> automated-testing@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-05 10:02 Structured feeds Dmitry Vyukov
                   ` (2 preceding siblings ...)
  2019-11-07  8:48 ` [Automated-testing] " Tim.Bird
@ 2019-11-07 20:53 ` Don Zickus
  2019-11-08  8:05   ` Dmitry Vyukov
  2019-11-12 22:54 ` Konstantin Ryabitsev
  4 siblings, 1 reply; 33+ messages in thread
From: Don Zickus @ 2019-11-07 20:53 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> Hi,
> 
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> 
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
> 
> Here is the outline of the idea.
> The current public inbox format is a git repo with refs/heads/master
> that contains a single file "m" in RFC822 format. We add
> refs/heads/json with a single file "j" that contains structured data
> in JSON format. 2 separate branches b/c some clients may want to fetch
> just one of them.
> 
> Current clients will only create plain text "m" entry. However, newer
> clients can also create a parallel "j" entry with the same info in
> structured form. "m" and "j" are cross-referenced using the
> Message-ID. It's OK to have only "m", or both, but not only "j" (any
> client needs to generate at least some text representation for every
> message).

Interesting idea.

One of the nuisances of email is that the client tools have quirks.  At Red
Hat, we have used patchworkV1 for quite a long time.  These email client
'quirks' broke a lot of expectations in the database, leading us to fix the
tool and manually clean up the data.

In the case of translating to a 'j' file, what happens if the data is
incorrectly translated due to client 'quirks'?  Is the 'j' data expected to
be manually reviewed before committing (probably not)?  Is it left alone
as-is?  Or is a follow-on 'j' change committed?

A similar problem could probably be expanded to CI systems contributing their
data in some result file 'r'.

Cheers,
Don

> 
> Currently we have public inbox feeds only for mailing lists. The idea
> is that more entities will have their own "private" feeds. For example, each
> CI system, static analysis system, or third-party code review system
> has its own feed. Eventually people will have their own feeds too. The feeds
> can be relatively easily converted to a local inbox, imported into GMail,
> etc (potentially with some filtering).
> 
> Besides private feeds there are also aggregated feeds to not require
> everybody to fetch thousands of repositories. kernel.org will provide
> one, but it can be mirrored (or built independently) anywhere else. If
> I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> creates git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
> then the aggregated feed will map these to the following branches:
> refs/heads/github.com/dvyukov/kfeed/master
> refs/heads/github.com/dvyukov/kfeed/json
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> Standardized naming of sub-feeds allows a single repo to host multiple
> feeds. For example, github/gitlab/gerrit bridge could host multiple
> individual feeds for their users.
> So far there is no proposal for feed auto-discovery. One needs to
> notify kernel.org for inclusion of their feed into the main aggregated
> feed.
> 
> Konstantin offered that kernel.org can send emails for some feeds.
> That is, normally one sends out an email and then commits it to the
> feed. Instead some systems can just commit the message to feed and
> then kernel.org will pull the feed and send emails on user's behalf.
> This allows clients to not deal with email at all (including mail
> client setup). Which is nice.
> 
> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> blobs right into feeds. This would allow users to fetch only the
> blobs they are interested in. But this does not need to happen from
> day one.
> 
> As soon as we have a bridge from plain-text emails into the structured
> form, we can start building everything else in the structured world.
> Such bridge needs to parse new incoming emails, try to make sense out
> of them (new patch, new patch version, comment, etc) and then push the
> information in structured form. Then e.g. CIs can fetch info about
> patches under review, test and post structured results. Bridging in the
> opposite direction happens semi-automatically as CI also pushes text
> representation of results and that just needs to be sent as email.
> Alternatively, we could have a separate explicit converter of
> structured messages into plain text, which would allow removing some
> duplication and presenting results in a more consistent form.
> 
> Similarly, it should be much simpler for Patchwork/Gerrit to present
> current patches under review. Local mode should work almost seamlessly
> -- you fetch the aggregated feed and then run local instance on top of
> it.
> 
> No work has been done on the actual form/schema of the structured
> feeds. That's something we need to figure out working on a prototype.
> However, good references would be git-appraise schema:
> https://github.com/google/git-appraise/tree/master/schema
> and gerrit schema (not sure what's a good link). Does anybody know
> where the gitlab schema is? Or other similar schemes?
> 
> Thoughts and comments are welcome.
> Thanks
> -- 
> _______________________________________________
> automated-testing mailing list
> automated-testing@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-07 11:26         ` Veronika Kabatova
@ 2019-11-08  0:24           ` Eric Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Eric Wong @ 2019-11-08  0:24 UTC (permalink / raw)
  To: Veronika Kabatova
  Cc: Daniel Axtens, Dmitry Vyukov, Konstantin Ryabitsev, workflows,
	automated-testing, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman

Veronika Kabatova <vkabatov@redhat.com> wrote:
> I'll be totally honest and admit I ignored most of the implementation details
> of public inbox feeds (will take a look when I have some free time) but as
> long as they contain the original email, the feature should be usable with
> them too.

Implementation details should not matter to consumers.

public-inbox exposes everything as NNTP which is the same
message format as email.  NNTP is also much more stable and
established than the v2 git layout of public-inbox (which could
be superseded by a hypothetical "v3" layout).

I highly recommend anybody consuming public-inbox (and not
making 1:1 mirrors) use NNTP since it's well-established
and doesn't enforce long-term storage requirements.

I hope to support HTTP(S) CONNECT tunneling as a means for users
behind firewalls to get around NNTP port 119/563 restrictions.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-07 20:43   ` [Automated-testing] " Don Zickus
@ 2019-11-08  7:58     ` Dmitry Vyukov
  2019-11-08 15:26       ` Don Zickus
  2019-11-08 11:44     ` Daniel Axtens
  1 sibling, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-08  7:58 UTC (permalink / raw)
  To: Don Zickus
  Cc: Daniel Axtens, workflows, automated-testing, Han-Wen Nienhuys,
	Konstantin Ryabitsev

On Thu, Nov 7, 2019 at 9:44 PM Don Zickus <dzickus@redhat.com> wrote:
>
> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > > As soon as we have a bridge from plain-text emails into the structured
> > > form, we can start building everything else in the structured world.
> > > Such bridge needs to parse new incoming emails, try to make sense out
> > > of them (new patch, new patch version, comment, etc) and then push the
> > > information in structured form. Then e.g. CIs can fetch info about
> >
> > This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> > at almost thirteen hundred lines, and that's with the benefit of the
> > Python standard library. It also regularly gets patched to handle
> > changes to email systems (e.g. DMARC), changes to git (git request-pull
> > format changed subtly in 2.14.3), the bizarre ways people send email,
> > and so on.
>
> Does it ever make sense to just use git to do the translation to structured
> json?  Git has similar logic and can easily handle its own changes.  Tools
> like git-mailinfo and git-mailsplit probably do a good chunk of the
> work today.
>
> It wouldn't pull together series info.

Hi Don,

Could you elaborate? What exactly do you mean? I don't understand the
overall proposal.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-07 20:53 ` Don Zickus
@ 2019-11-08  8:05   ` Dmitry Vyukov
  2019-11-08 14:52     ` Don Zickus
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-08  8:05 UTC (permalink / raw)
  To: Don Zickus
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus@redhat.com> wrote:
>
> On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > Hi,
> >
> > This is another follow up after Lyon meetings. The main discussion was
> > mainly around email process (attestation, archival, etc):
> > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> >
> > I think providing info in a structured form is the key for allowing
> > building more tooling and automation at a reasonable price. So I
> > discussed with CI/Gerrit people and Konstantin how the structured
> > information can fit into the current "feeds model" and what would be
> > the next steps for bringing it to life.
> >
> > Here is the outline of the idea.
> > The current public inbox format is a git repo with refs/heads/master
> > that contains a single file "m" in RFC822 format. We add
> > refs/heads/json with a single file "j" that contains structured data
> > in JSON format. 2 separate branches b/c some clients may want to fetch
> > just one of them.
> >
> > Current clients will only create plain text "m" entry. However, newer
> > clients can also create a parallel "j" entry with the same info in
> > structured form. "m" and "j" are cross-referenced using the
> > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > client needs to generate at least some text representation for every
> > message).
>
> Interesting idea.
>
> One of the nuisances of email is that the client tools have quirks.  At Red
> Hat, we have used patchworkV1 for quite a long time.  These email client
> 'quirks' broke a lot of expectations in the database, leading us to fix the
> tool and manually clean up the data.
>
> In the case of translating to a 'j' file, what happens if the data is
> incorrectly translated due to client 'quirks'?  Is the 'j' data expected to
> be manually reviewed before committing (probably not)?  Is it left alone
> as-is?  Or is a follow-on 'j' change committed?

Good point.
I would expect that eventually there will be updates to the format and
new versions, which are easy to add to json with a "version":2 attribute.
Code that parses these messages will need to keep quirks for older
formats.
Realistically nobody will review the data (besides the initial
testing). I guess in the end it depends on (1) how badly it's broken and
(2) whether correct data is preserved in at least some form (consider a
client that pushes bad structured data which is also misrepresented in
the plain text form, or simply missing there).
Fixing up data later is not possible. Appending corrections is possible.
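
A minimal sketch of what "appending corrections" could look like on the
consumer side, assuming one JSON record per feed entry. Only the "version"
attribute comes from the text above; the "message_id", "corrects" and
"data" field names, and the rule that a later record overrides fields of
the record it corrects, are assumptions made up for this example.

```python
import json

SUPPORTED_VERSION = 2  # newest format this hypothetical reader understands


def resolve(feed_lines):
    """Replay an append-only feed; later corrections override earlier fields."""
    current = {}
    for line in feed_lines:
        rec = json.loads(line)
        if rec.get("version", 1) > SUPPORTED_VERSION:
            continue  # unknown future format: skip rather than misparse
        # A correction names the record it amends; otherwise it is a new record.
        target = rec.get("corrects", rec["message_id"])
        merged = dict(current.get(target, {}))
        merged.update(rec["data"])
        current[target] = merged
    return current


feed = [
    '{"version": 2, "message_id": "<1@example.com>",'
    ' "data": {"type": "patch", "series_index": 2}}',
    '{"version": 2, "message_id": "<2@example.com>",'
    ' "corrects": "<1@example.com>", "data": {"series_index": 3}}',
]
state = resolve(feed)
```

Nothing already committed is rewritten here: the bad record stays in the
feed and the reader simply applies the later entry on top of it.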

> A similar problem could probably be expanded to CI systems contributing their
> data in some result file 'r'.

The idea is that all systems push "j". It's the contents of the feed
that matter. CI systems will push messages of different types (test
results), but we don't need "r" for this.

> Cheers,
> Don
>
> >
> > Currently we have public inbox feeds only for mailing lists. The idea
> > is that more entities will have their own "private" feeds. For example, each
> > CI system, static analysis system, or third-party code review system
> > has its own feed. Eventually people will have their own feeds too. The feeds
> > can be relatively easily converted to a local inbox, imported into GMail,
> > etc (potentially with some filtering).
> >
> > Besides private feeds there are also aggregated feeds to not require
> > everybody to fetch thousands of repositories. kernel.org will provide
> > one, but it can be mirrored (or built independently) anywhere else. If
> > I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> > creates git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
> > then the aggregated feed will map these to the following branches:
> > refs/heads/github.com/dvyukov/kfeed/master
> > refs/heads/github.com/dvyukov/kfeed/json
> > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> > Standardized naming of sub-feeds allows a single repo to host multiple
> > feeds. For example, github/gitlab/gerrit bridge could host multiple
> > individual feeds for their users.
> > So far there is no proposal for feed auto-discovery. One needs to
> > notify kernel.org for inclusion of their feed into the main aggregated
> > feed.
> >
> > Konstantin offered that kernel.org can send emails for some feeds.
> > That is, normally one sends out an email and then commits it to the
> > feed. Instead some systems can just commit the message to feed and
> > then kernel.org will pull the feed and send emails on user's behalf.
> > This allows clients to not deal with email at all (including mail
> > client setup). Which is nice.
> >
> > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> > blobs right into feeds. This would allow users to fetch only the
> > blobs they are interested in. But this does not need to happen from
> > day one.
> >
> > As soon as we have a bridge from plain-text emails into the structured
> > form, we can start building everything else in the structured world.
> > Such bridge needs to parse new incoming emails, try to make sense out
> > of them (new patch, new patch version, comment, etc) and then push the
> > information in structured form. Then e.g. CIs can fetch info about
> > patches under review, test and post structured results. Bridging in the
> > opposite direction happens semi-automatically as CI also pushes text
> > representation of results and that just needs to be sent as email.
> > Alternatively, we could have a separate explicit converter of
> > structured messages into plain text, which would remove some
> > duplication and present results in a more consistent form.
> >
> > Similarly, it should be much simpler for Patchwork/Gerrit to present
> > current patches under review. Local mode should work almost seamlessly
> > -- you fetch the aggregated feed and then run local instance on top of
> > it.
> >
> > No work has been done on the actual form/schema of the structured
> > feeds. That's something we need to figure out working on a prototype.
> > However, good references would be git-appraise schema:
> > https://github.com/google/git-appraise/tree/master/schema
> > and gerrit schema (not sure what's a good link). Does anybody know
> > where the gitlab schema is? Or other similar schemes?
> >
> > Thoughts and comments are welcome.
> > Thanks
> > --
> > _______________________________________________
> > automated-testing mailing list
> > automated-testing@yoctoproject.org
> > https://lists.yoctoproject.org/listinfo/automated-testing
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-07 20:43   ` [Automated-testing] " Don Zickus
  2019-11-08  7:58     ` Dmitry Vyukov
@ 2019-11-08 11:44     ` Daniel Axtens
  2019-11-08 14:54       ` Don Zickus
  1 sibling, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-08 11:44 UTC (permalink / raw)
  To: Don Zickus, patchwork
  Cc: Dmitry Vyukov, workflows, automated-testing, Han-Wen Nienhuys,
	Konstantin Ryabitsev

Don Zickus <dzickus@redhat.com> writes:

> On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>> > As soon as we have a bridge from plain-text emails into the structured
>> > form, we can start building everything else in the structured world.
>> > Such bridge needs to parse new incoming emails, try to make sense out
>> > of them (new patch, new patch version, comment, etc) and then push the
>> > information in structured form. Then e.g. CIs can fetch info about
>> 
>> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
>> at almost thirteen hundred lines, and that's with the benefit of the
>> Python standard library. It also regularly gets patched to handle
>> changes to email systems (e.g. DMARC), changes to git (git request-pull
>> format changed subtly in 2.14.3), the bizarre ways people send email,
>> and so on.
>
> Does it ever make sense to just use git to do the translation to structured
> json?  Git has similar logic and can easily handle its own changes.  Tools
> like git-mailinfo and git-mailsplit probably do a good chunk of the
> work today.
>
+patchwork@

So patchwork, in theory at least, is VCS-agnostic: if a mail contains a
unified-diff, we can treat it as a patch. We do have some special
handling for git pull requests, but we also have tests for parsing of
CVS and if memory serves Mercurial too. So we haven't wanted to depend
on git-specific tools. Maybe in future we will give up on that, but we
haven't yet.

Regards,
Daniel

> It wouldn't pull together series info.
>
> Just a thought.
>
> Cheers,
> Don
>
>
>
>> 
>> Patchwork does expose much of this as an API, for example for patches:
>> https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>> build on that feel free. We can possibly add data to the API if that
>> would be helpful. (Patches are always welcome too, if you don't want to
>> wait an indeterminate amount of time.)
>> 
>> Regards,
>> Daniel
>> 
>> 
>> -- 
>> _______________________________________________
>> automated-testing mailing list
>> automated-testing@yoctoproject.org
>> https://lists.yoctoproject.org/listinfo/automated-testing

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-06 20:50   ` Konstantin Ryabitsev
  2019-11-07  9:08     ` Dmitry Vyukov
  2019-11-07 11:09     ` Daniel Axtens
@ 2019-11-08 14:18     ` Daniel Axtens
  2019-11-09  7:41       ` Johannes Berg
  2 siblings, 1 reply; 33+ messages in thread
From: Daniel Axtens @ 2019-11-08 14:18 UTC (permalink / raw)
  To: Konstantin Ryabitsev, patchwork
  Cc: Dmitry Vyukov, workflows, automated-testing, Brendan Higgins,
	Han-Wen Nienhuys, Kevin Hilman, Veronika Kabatova


> I'm actually very interested in seeing patchwork switch from being fed 
> mail directly from postfix to using public-inbox repositories as its 
> source of patches. I know it's easy enough to accomplish as-is, by 
> piping things from public-inbox to parsemail.sh, but it would be even 
> more awesome if patchwork learned to work with these repos natively.
>
> The way I see it:
>
> - site administrator configures upstream public-inbox feeds
> - a backend process clones these repositories
>    - if it doesn't find a refs/heads/json, then it does its own parsing 
>      to generate a structured feed with patches/series/trailers/pull 
>      requests, cross-referencing them by series as necessary. Something 
>      like a subset of this, excluding patchwork-specific data:
>      https://patchwork.kernel.org/api/1.1/patches/11177661/
>    - if it does find an existing structured feed, it simply uses it (e.g.  
>      it was made available by another patchwork instance)
> - the same backend process updates the repositories from upstream using 
>    proper manifest files (e.g. see 
>    https://lore.kernel.org/workflows/manifest.js.gz)
>
> - patchwork projects then consume one (or more) of these structured 
>    feeds to generate the actionable list of patches that maintainers can 
>    use, perhaps with optional filtering by specific headers (list-id, 
>    from, cc), patch paths, keywords, etc.
>
> Basically, parsemail.sh is split into two, where one part does feed 
> cloning, pulling, and parsing into structured data (if not already 
> done), and another populates actual patchwork project with patches 
> matching requested parameters.

This is very confusing to me. Let me see if I have it correct.

You want to split out a chunk of parsemail that takes email messages,
either from regular email or from public-inbox, and spits out a
structured feed.

You then want patchwork to consume that structured feed.

I don't know how that would work architecturally - converting emails
into a structured feed requires a lot of the patchwork core.

It would be a lot simpler from the patchwork side to teach parsemail to
consume a public-inbox git feed, and to write an API consumer that takes
the structured data that Patchwork produces, strips out the bits you
don't care about, and feeds it into other projects.

>
> I see the following upsides to this:
>
> - we consume public-inbox feeds directly, no longer losing patches due 
>    to MTA problems, postfix burps, parse failures, etc

This much I am OK with as an additional option for sites. FWIW,
consuming a public-inbox feed doesn't protect you against most parse
failures - they are due to things like duplicate message-ids and bad
mail from the sender end. It should protect against issues due to
postfix invoking multiple parsemails in parallel, but that shouldn't be
losing patches, just getting series metadata wrong.

> - a project can have multiple sources for patches instead of being tied 
>    to a single mailing list

You can get around this pretty easily now with the --list-id parameter,
and I think the netdev patchwork might do this to grab bpf patches? I
think there's a little shim at OzLabs that does this.

I also don't see how a public-inbox feed helps. Currently pw determines
the list based on a header in the email, unless overridden. public-inbox
emails will also have that header, so either patchwork looks at those
headers or you tell patchwork explicitly that a particular public-inbox
feed corresponds to a particular list. Either way I think this leaves
you in the same situation you were in before, unless I have
misunderstood...

> - downstream patchwork instances (the "local patchwork" tool I mentioned 
>    earlier) can benefit from structured feeds provided by 
>    patchwork.kernel.org

Do I understand correctly that this is basically a stripped-down version
of what the API provides, but in git form?

>>Patchwork does expose much of this as an API, for example for patches:
>>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>>build on that feel free. We can possibly add data to the API if that
>>would be helpful. (Patches are always welcome too, if you don't want to
>>wait an indeterminate amount of time.)
>
> As I said previously, I may be able to fund development of various 
> features, but I want to make sure that I properly work with upstream.  
> That requires getting consensus on features to make sure that we don't 
> spend funds and efforts on a feature that gets rejected. :)
>
> Would the above feature (using one or more public-inbox repositories as 
> sources for a patchwork project) be a welcome addition to upstream?

I think a lot about patchwork development in terms of good incremental
changes. This is largely because maintainers get quite cross with us if
we break things, and I don't like that.

What I would be happy with as a first step (not necessarily saying this
is _all_ I would accept, just that this is what I'd want to see _first_)
is:

 - code that efficiently reads a public-inbox git repository/folder of
   git repositories and feeds it into the existing parser. I have very
   inefficient code that converts public-inbox to an mbox and then
   parses that, but I'm sure you can do better with a git library.

 - careful thought about how to do this incrementally. It's obvious how
   to do email incrementally, but I think you need to keep an extra bit
   of state around to incrementally parse the git archive. I think.

 - careful thought about how to do this in a way that doesn't require
   sites that don't want to load public-inbox feeds to install lots of
   random git-parsing code.
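
As a rough sketch of the first and third points together (Python, not
patchwork code): shelling out to the git command line keeps the
dependencies down to the standard library. It assumes the layout
described earlier in this thread, one commit per message with the mail
stored in a file "m". The function name and the injectable `run`
argument are made up for the example, and commits that carry no "m" file
are not handled.

```python
import subprocess
from email import message_from_bytes


def iter_messages(repo, branch="master", run=subprocess.check_output):
    """Yield (commit, Message) for every commit on the branch, oldest first.

    `run` is any callable that takes an argv list and returns the
    command's stdout as bytes; it is a parameter only so the git calls
    can be replaced in tests.
    """
    # Oldest-first list of commit ids on the branch.
    out = run(["git", "-C", repo, "rev-list", "--reverse", branch])
    for commit in out.decode().split():
        # "<commit>:m" names the file "m" in that commit's tree.
        raw = run(["git", "-C", repo, "show", f"{commit}:m"])
        yield commit, message_from_bytes(raw)
```

Each yielded message could then be handed to the existing parser in place
of what currently arrives on stdin. Spawning one git process per message
is slow; it only shows the shape of the loop.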

Once you can do that, I'm happy to think more about your more ambitious
plans.

Regards,
Daniel

>
> -K

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-08  8:05   ` Dmitry Vyukov
@ 2019-11-08 14:52     ` Don Zickus
  2019-11-11  9:20       ` Dmitry Vyukov
  0 siblings, 1 reply; 33+ messages in thread
From: Don Zickus @ 2019-11-08 14:52 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Fri, Nov 08, 2019 at 09:05:02AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus@redhat.com> wrote:
> >
> > On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > > Hi,
> > >
> > > This is another follow up after Lyon meetings. The main discussion was
> > > mainly around email process (attestation, archival, etc):
> > > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > >
> > > I think providing info in a structured form is the key for allowing
> > > building more tooling and automation at a reasonable price. So I
> > > discussed with CI/Gerrit people and Konstantin how the structured
> > > information can fit into the current "feeds model" and what would be
> > > the next steps for bringing it to life.
> > >
> > > Here is the outline of the idea.
> > > The current public inbox format is a git repo with refs/heads/master
> > > that contains a single file "m" in RFC822 format. We add
> > > refs/heads/json with a single file "j" that contains structured data
> > > in JSON format. 2 separate branches b/c some clients may want to fetch
> > > just one of them.
> > >
> > > Current clients will only create plain text "m" entry. However, newer
> > > clients can also create a parallel "j" entry with the same info in
> > > structured form. "m" and "j" are cross-referenced using the
> > > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > > client needs to generate at least some text representation for every
> > > message).
> >
> > Interesting idea.
> >
> > One of the nuisances of email is the client tools have quirks.  In Red Hat,
> > we have used patchworkV1 for quite a long time.  These email client 'quirks'
> > broke a lot of expectations in the database leading us to fix the tool and
> > manually clean up the data.
> >
> > In the case of translating to a 'j' file.  What happens if the data is
> > incorrectly translated due to client 'quirks'?  Is it expected the 'j' data
> > is manually reviewed before committing (probably not).  Or is it left alone
> > as-is? Or a follow-on 'j' change is committed?
> 
> Good point.
> I would expect that eventually there will be updates to the format and
> new versions, which are easy to mark in JSON with a "version": 2
> attribute. Code that parses these messages will need to keep quirks for
> older formats.
> Realistically nobody will review the data (besides the initial
> testing). I guess in the end it depends on (1) how badly it's screwed
> up, and (2) whether correct data is preserved in at least some form
> (consider a client that pushes bad structured data, where it's also
> misrepresented in the plain text form, or simply missing there).
> Fixing up data later is not possible. Appending corrections is possible.

Ok.  Yeah, in my head I was thinking the data is largely right, just
occasionally 1 or 2 fields were misrepresented due to a bad client tool or
human error in the text.

At Red Hat we use internal metadata for checking our patches through our
process (namely the Bugzilla id).  It isn't unusual for someone to
accidentally fat-finger the Bugzilla id when posting their patch.

I was thinking of a follow-on 'type' that appends corrections as you
stated, say 'type: correction', that corrects the original data.  This would
have to be linked through message-id or some unique identifier.

Then I assume any tool that parses the feed 'j' would correlate all the data
based around some unique ids such that picking up corrections would just be
a natural extension?

Cheers,
Don



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-08 11:44     ` Daniel Axtens
@ 2019-11-08 14:54       ` Don Zickus
  0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-08 14:54 UTC (permalink / raw)
  To: Daniel Axtens
  Cc: patchwork, Dmitry Vyukov, workflows, automated-testing,
	Han-Wen Nienhuys, Konstantin Ryabitsev

On Fri, Nov 08, 2019 at 10:44:37PM +1100, Daniel Axtens wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> > On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> >> > As soon as we have a bridge from plain-text emails into the structured
> >> > form, we can start building everything else in the structured world.
> >> > Such bridge needs to parse new incoming emails, try to make sense out
> >> > of them (new patch, new patch version, comment, etc) and then push the
> >> > information in structured form. Then e.g. CIs can fetch info about
> >> 
> >> This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> >> at almost thirteen hundred lines, and that's with the benefit of the
> >> Python standard library. It also regularly gets patched to handle
> >> changes to email systems (e.g. DMARC), changes to git (git request-pull
> >> format changed subtly in 2.14.3), the bizarre ways people send email,
> >> and so on.
> >
> > Does it ever make sense to just use git to do the translation to structured
> > json?  Git has similar logic and can easily handle its own changes.  Tools
> > like git-mailinfo and git-mailsplit probably do a good chunk of the
> > work today.
> >
> +patchwork@
> 
> So patchwork, in theory at least, is VCS-agnostic: if a mail contains a
> unified-diff, we can treat it as a patch. We do have some special
> handling for git pull requests, but we also have tests for parsing of
> CVS and if memory serves Mercurial too. So we haven't wanted to depend
> on git-specific tools. Maybe in future we will give up on that, but we
> haven't yet.

Fair point.  Thanks!

Cheers,
Don



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-08  7:58     ` Dmitry Vyukov
@ 2019-11-08 15:26       ` Don Zickus
  0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-08 15:26 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Daniel Axtens, workflows, automated-testing, Han-Wen Nienhuys,
	Konstantin Ryabitsev

On Fri, Nov 08, 2019 at 08:58:44AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:44 PM Don Zickus <dzickus@redhat.com> wrote:
> >
> > On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > > > As soon as we have a bridge from plain-text emails into the structured
> > > > form, we can start building everything else in the structured world.
> > > > Such bridge needs to parse new incoming emails, try to make sense out
> > > > of them (new patch, new patch version, comment, etc) and then push the
> > > > information in structured form. Then e.g. CIs can fetch info about
> > >
> > > This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> > > at almost thirteen hundred lines, and that's with the benefit of the
> > > Python standard library. It also regularly gets patched to handle
> > > changes to email systems (e.g. DMARC), changes to git (git request-pull
> > > format changed subtly in 2.14.3), the bizarre ways people send email,
> > > and so on.
> >
> > Does it ever make sense to just use git to do the translation to structured
> > json?  Git has similar logic and can easily handle its own changes.  Tools
> > like git-mailinfo and git-mailsplit probably do a good chunk of the
> > work today.
> >
> > It wouldn't pull together series info.
> 
> Hi Don,
> 
> Could you elaborate? What exactly do you mean? I don't understand the
> overall proposal.

The problem I was looking at was, patchwork has this large elaborate python
code to translate human git formatted patches into some structured form.
And rightfully so.

But git has similar code in order to make git-am work.

When applying an email to public-inbox, I had assumed it was using a tool
like git-am that would call into git-mailsplit and git-mailinfo to split
apart the email into various pieces and put them in .git/rebase-apply.

At that point most of the text parsing is done.

So the thought was to have another public-inbox tool that took advantage of
the already split data and just take the small step to finish converting
into a structured file 'j'.  As opposed to sending the text email through an
external tool like patchwork to re-split the data into structured pieces
again.
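
A minimal sketch of that small step, in Python. It assumes git mailinfo
keeps printing its "Author:", "Email:", "Subject:" and "Date:" lines on
stdout and writing the log message and the patch to the two files given
on its command line. The JSON field names, including "has_patch", are
made up here and are not a proposed schema.

```python
import json
import os
import subprocess
import tempfile


def parse_mailinfo_headers(text):
    """Turn mailinfo's "Key: value" stdout lines into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip blank or malformed lines
            info[key.lower()] = value
    return info


def email_to_json(raw_email):
    """Run one raw email (bytes) through git mailinfo and return JSON text."""
    with tempfile.TemporaryDirectory() as tmp:
        msg_path = os.path.join(tmp, "msg")
        patch_path = os.path.join(tmp, "patch")
        # git mailinfo reads the mail on stdin and fills the two files.
        proc = subprocess.run(
            ["git", "mailinfo", msg_path, patch_path],
            input=raw_email, stdout=subprocess.PIPE, check=True)
        record = parse_mailinfo_headers(proc.stdout.decode("utf-8", "replace"))
        with open(msg_path, encoding="utf-8", errors="replace") as f:
            record["message"] = f.read()
        with open(patch_path, encoding="utf-8", errors="replace") as f:
            record["has_patch"] = bool(f.read().strip())
    return json.dumps(record, sort_keys=True)
```

This only covers a single message. As said earlier, it wouldn't pull
together series info, so grouping patches into a series would still need
logic on top.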

Adding to that thought: every time git changes its format or text
output, leveraging git's own knowledge of the change (assuming
public-inbox consistently used the latest git tool), instead of updating
external tools, would reduce the ripple effect of having to update all
external tools before developers can use new git features or changes.

But looking through the public-inbox code, it appears to do things
differently, so the idea may not work at all.

So just treat my idea as looking at the problem from a different angle to
see if there is an easier solution.

Cheers,
Don


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-08 14:18     ` Daniel Axtens
@ 2019-11-09  7:41       ` Johannes Berg
  2019-11-12 10:44         ` Daniel Borkmann
       [not found]         ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Berg @ 2019-11-09  7:41 UTC (permalink / raw)
  To: Daniel Axtens, Konstantin Ryabitsev, patchwork
  Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
	automated-testing, Dmitry Vyukov

On Sat, 2019-11-09 at 01:18 +1100, Daniel Axtens wrote:
> > 
>  - code that efficiently reads a public-inbox git repository/folder of
>    git repositories and feeds it into the existing parser. I have very
>    inefficient code that converts public-inbox to an mbox and then
>    parses that, but I'm sure you can do better with a git library.

Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
maildir converter, with procmail support. I assume that would actually
satisfy this step already, since you can just substitute the patchwork
parser for procmail.

>  - careful thought about how to do this incrementally. It's obvious how
>    to do email incrementally, but I think you need to keep an extra bit
>    of state around to incrementally parse the git archive. I think.

Not sure he had an incremental mode figured out there, but that can't
really be all *that* hard, just store the last-successfully-parsed git
sha1?
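
A rough sketch of that in Python, assuming the branch is only ever
appended to (a rewritten history would invalidate the stored sha1). The
state file, the function names and the injectable `run` argument are all
hypothetical.

```python
import os
import subprocess


def read_state(path):
    """Return the last successfully parsed commit id, or None."""
    try:
        with open(path) as f:
            return f.read().strip() or None
    except FileNotFoundError:
        return None


def process_new(repo, state_path, handle, branch="master",
                run=subprocess.check_output):
    """Call handle(commit) for every commit not parsed yet, oldest first.

    `run` takes an argv list and returns stdout as bytes; it is a
    parameter only so the git call can be replaced in tests.
    """
    last = read_state(state_path)
    # First run: walk the whole branch. Later runs: only "last..branch".
    rev = branch if last is None else f"{last}..{branch}"
    out = run(["git", "-C", repo, "rev-list", "--reverse", rev])
    commits = out.decode().split()
    for commit in commits:
        handle(commit)  # e.g. parse the file "m" of this commit
        # Record progress only after the commit was handled, atomically.
        tmp = state_path + ".tmp"
        with open(tmp, "w") as f:
            f.write(commit + "\n")
        os.replace(tmp, state_path)
    return len(commits)
```

If handle() raises, the state file still points at the previous commit,
so the next run retries from the message that failed.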

johannes


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-08 14:52     ` Don Zickus
@ 2019-11-11  9:20       ` Dmitry Vyukov
  2019-11-11 15:14         ` Don Zickus
  0 siblings, 1 reply; 33+ messages in thread
From: Dmitry Vyukov @ 2019-11-11  9:20 UTC (permalink / raw)
  To: Don Zickus
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Fri, Nov 8, 2019 at 3:53 PM Don Zickus <dzickus@redhat.com> wrote:
>
> Ok.  Yeah, in my head I was thinking the data is largely right, just
> occasionally 1 or 2 fields were misrepresented due to a bad client tool or
> human error in the text.
>
> At Red Hat we use internal metadata for checking our patches through our
> process (namely the Bugzilla id).  It isn't unusual for someone to
> accidentally fat-finger the Bugzilla id when posting their patch.
>
> I was thinking of a follow-on 'type' that appends corrections as you
> stated, say 'type: correction', that corrects the original data.  This would
> have to be linked through message-id or some unique identifier.
>
> Then I assume any tool that parses the feed 'j' would correlate all the data
> based around some unique ids such that picking up corrections would just be
> a natural extension?

Yes, this should be handled naturally in this model. Since it's not
possible to mutate any previously published info, everything is
represented as additions/corrections: adding a comment to a patch,
adding Reviewed-by, adding Nack, adding test results. The final state
of a patch is always reconstructed by "replaying" all messages
published regarding the patch. So naturally, if we mis-parsed a message
as "Acked-by: X", then corrected that to "Nacked-by: X" and
republished, whoever replays the feed should replace Acked-by
with Nacked-by.
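
A minimal sketch of that replay, under stated assumptions: each "j" entry
is taken to be a JSON object with the hypothetical fields "id", "type",
"patch", "tag", "who" and "corrects". None of these names are defined in
this thread; they only illustrate how an append-only feed can carry
corrections that a reader applies while replaying.

```python
import json

def replay(entries):
    """Rebuild per-patch state by applying feed entries in publication order.

    All field names used here are assumptions made for illustration.
    """
    state = {}  # patch id -> {message id -> (tag, who)}
    for raw in entries:
        entry = json.loads(raw)
        tags = state.setdefault(entry["patch"], {})
        if entry["type"] == "correction":
            # The earlier entry stays in the feed; only the replayed
            # view drops it in favour of the corrected one.
            tags.pop(entry["corrects"], None)
        tags[entry["id"]] = (entry["tag"], entry["who"])
    return {patch: sorted(tags.values()) for patch, tags in state.items()}

# Example feed: an Acked-by that is later corrected to Nacked-by.
feed = [
    json.dumps({"id": "m1", "type": "tag", "patch": "p1",
                "tag": "Acked-by", "who": "X"}),
    json.dumps({"id": "m2", "type": "tag", "patch": "p1",
                "tag": "Reviewed-by", "who": "Y"}),
    json.dumps({"id": "m3", "type": "correction", "corrects": "m1",
                "patch": "p1", "tag": "Nacked-by", "who": "X"}),
]
final = replay(feed)
```

Here the replayed state for "p1" holds Nacked-by from X and Reviewed-by
from Y, while all three entries remain in the feed untouched.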

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [Automated-testing] Structured feeds
  2019-11-11  9:20       ` Dmitry Vyukov
@ 2019-11-11 15:14         ` Don Zickus
  0 siblings, 0 replies; 33+ messages in thread
From: Don Zickus @ 2019-11-11 15:14 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Han-Wen Nienhuys, Konstantin Ryabitsev

On Mon, Nov 11, 2019 at 10:20:22AM +0100, Dmitry Vyukov wrote:
> > Ok.  Yeah, in my head I was thinking the data is largely right, just
> > occasionally 1 or 2 fields were misrepresented due to a bad client tool or
> > human error in the text.
> >
> > At Red Hat we use internal metadata for checking our patches through our
> > process (namely Bugzilla id).  It isn't unusual for someone to accidentally
> > fat-finger the Bugzilla id when posting their patch.
> >
> > I was thinking if there is a follow-on 'type' that appends corrections as you
> > stated, say 'type: correction', that corrects the original data.  This would
> > have to be linked through message-id or some unique identifier.
> >
> > Then I assume any tool that parses the feed 'j' would correlate all the data
> > based around some unique ids such that picking up corrections would just be
> > a natural extension?
> 
> Yes, this should be handled naturally in this model. Since it's not
> possible to mutate any previously published info, everything is
> represented as additions/corrections: adding a comment to a patch,
> adding Reviewed-by, adding Nack, adding test results. The final state
> of a patch is always reconstructed by "replaying" all messages
> published regarding the patch. So naturally, if we mis-parsed a message
> as "Acked-by: X", then corrected that to "Nacked-by: X" and
> republished, whoever replays the feed should replace Acked-by
> with Nacked-by.

Great.  That makes sense to me.  Thanks!

Cheers,
Don


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-09  7:41       ` Johannes Berg
@ 2019-11-12 10:44         ` Daniel Borkmann
       [not found]         ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Borkmann @ 2019-11-12 10:44 UTC (permalink / raw)
  To: Johannes Berg, Daniel Axtens, Konstantin Ryabitsev, patchwork
  Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
	automated-testing, Dmitry Vyukov

On 11/9/19 8:41 AM, Johannes Berg wrote:
> On Sat, 2019-11-09 at 01:18 +1100, Daniel Axtens wrote:
>>>
>>   - code that efficiently reads a public-inbox git repository/folder of
>>     git repositories and feeds it into the existing parser. I have very
>>     inefficient code that converts public-inbox to an mbox and then
>>     parses that, but I'm sure you can do better with a git library.
> 
> Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> maildir converter, with procmail support. I assume that would actually
> satisfy this step already, since you can just substitute the patchwork
> parser for procmail.
> 
>>   - careful thought about how to do this incrementally. It's obvious how
>>     to do email incrementally, but I think you need to keep an extra bit
>>     of state around to incrementally parse the git archive. I think.
> 
> Not sure he had an incremental mode figured out there, but that can't
> really be all *that* hard, just store the last-successfully-parsed git
> sha1?

Yep, that is what it is doing, so that we only need to walk the repo(s)
upon a new git fetch to the point where we stopped last time.
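
A rough sketch of that bookkeeping, not the converter's actual code: it
assumes the commit ids arrive newest first (the default order of
`git rev-list`) and that the last successfully parsed id is kept in a
small state dict. The names `new_commits`, `process` and `state` are
made up for this example.

```python
def new_commits(history, last_seen):
    """Return the commits newer than last_seen, oldest first.

    history is a list of commit ids, newest first. If last_seen is
    None or not found, the whole history is returned.
    """
    fresh = []
    for sha in history:
        if sha == last_seen:
            break
        fresh.append(sha)
    fresh.reverse()
    return fresh

def process(history, state, handle):
    """Feed each unseen commit to handle() and record progress."""
    for sha in new_commits(history, state.get("last")):
        handle(sha)
        # Advance only after handle() succeeded, so a failure leaves
        # the marker on the last successfully parsed commit.
        state["last"] = sha

# Example: "c1" was parsed on a previous run; "c2" and "c3" are new.
history = ["c3", "c2", "c1"]
state = {"last": "c1"}
parsed = []
process(history, state, parsed.append)
```

With a real repository, `git rev-list --reverse <last>..master` yields
the same range directly, so only the new commits are walked after a
fetch.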

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-05 10:02 Structured feeds Dmitry Vyukov
                   ` (3 preceding siblings ...)
  2019-11-07 20:53 ` Don Zickus
@ 2019-11-12 22:54 ` Konstantin Ryabitsev
  4 siblings, 0 replies; 33+ messages in thread
From: Konstantin Ryabitsev @ 2019-11-12 22:54 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: workflows, automated-testing, Brendan Higgins, Han-Wen Nienhuys,
	Kevin Hilman, Veronika Kabatova

On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> Hi,
> 
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> 
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.

BTW, someone recently highlighted the following project to me:

https://openci.io (certificate lapsed, so ignore the browser warning)

The goal of this workgroup was to establish cross-CI communication using
pubsub subscriptions and broadcast events. The following docs may be
of interest to people on this list:

https://event-driven-federated-cicd.openci.io/key-considerations-and-contraints
https://pipeline-messaging-protocol.openci.io/

-K

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
       [not found]         ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
@ 2019-11-30 18:16           ` Johannes Berg
  2019-11-30 18:36             ` Stephen Finucane
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Berg @ 2019-11-30 18:16 UTC (permalink / raw)
  To: Stephen Finucane, Daniel Axtens, Konstantin Ryabitsev, patchwork
  Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
	automated-testing, Dmitry Vyukov

On Sat, 2019-11-30 at 18:04 +0000, Stephen Finucane wrote:

> > Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> > maildir converter, with procmail support. I assume that would actually
> > satisfy this step already, since you can just substitute the patchwork
> > parser for procmail.
> 
> What do you mean "substitute the patchwork parser for procmail"? From
> reading this thread, I got the impression that we'd be changing what
> feeds things into the 'parsemail' management command, right?

Yes, that's exactly what I meant. I was looking at it from Daniel's
tool's POV, so instead of calling procmail it can call patchwork.

johannes


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Structured feeds
  2019-11-30 18:16           ` Johannes Berg
@ 2019-11-30 18:36             ` Stephen Finucane
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Finucane @ 2019-11-30 18:36 UTC (permalink / raw)
  To: Johannes Berg, Daniel Axtens, Konstantin Ryabitsev, patchwork
  Cc: workflows, Kevin Hilman, Brendan Higgins, Han-Wen Nienhuys,
	automated-testing, Dmitry Vyukov

On Sat, 2019-11-30 at 19:16 +0100, Johannes Berg wrote:
> On Sat, 2019-11-30 at 18:04 +0000, Stephen Finucane wrote:
> 
> > > Somebody (Daniel Borkmann?) posted a (very fast) public-inbox git to
> > > maildir converter, with procmail support. I assume that would actually
> > > satisfy this step already, since you can just substitute the patchwork
> > > parser for procmail.
> > 
> > What do you mean "substitute the patchwork parser for procmail"? From
> > reading this thread, I got the impression that we'd be changing what
> > feeds things into the 'parsemail' management command, right?
> 
> Yes, that's exactly what I meant. I was looking at it from Daniel's
> tool's POV, so instead of calling procmail it can call patchwork.
> 
> johannes

Ah, then that I have no issues with :) I've managed to configure
getmail to feed into a patchwork instance exactly once, and found
configuring postfix so daunting that I actually suggested just using an
email-as-a-service provider to take the hassle out of things [1].
Anything that lets one avoid working with those tools is a good thing
in my mind.

Stephen

[1] https://patchwork.readthedocs.io/en/latest/deployment/installation/#use-a-email-as-a-service-provider


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2019-11-30 18:36 UTC | newest]

Thread overview: 33+ messages
2019-11-05 10:02 Structured feeds Dmitry Vyukov
2019-11-06 15:35 ` Daniel Axtens
2019-11-06 20:50   ` Konstantin Ryabitsev
2019-11-07  9:08     ` Dmitry Vyukov
2019-11-07 10:57       ` Daniel Axtens
2019-11-07 11:26         ` Veronika Kabatova
2019-11-08  0:24           ` Eric Wong
2019-11-07 11:09     ` Daniel Axtens
2019-11-08 14:18     ` Daniel Axtens
2019-11-09  7:41       ` Johannes Berg
2019-11-12 10:44         ` Daniel Borkmann
     [not found]         ` <208edf06eb4c56a4f376caf0feced65f09d23f93.camel@that.guru>
2019-11-30 18:16           ` Johannes Berg
2019-11-30 18:36             ` Stephen Finucane
2019-11-07  8:53   ` Dmitry Vyukov
2019-11-07 10:40     ` Daniel Axtens
2019-11-07 10:43       ` Dmitry Vyukov
2019-11-07 20:43   ` [Automated-testing] " Don Zickus
2019-11-08  7:58     ` Dmitry Vyukov
2019-11-08 15:26       ` Don Zickus
2019-11-08 11:44     ` Daniel Axtens
2019-11-08 14:54       ` Don Zickus
2019-11-06 19:54 ` Han-Wen Nienhuys
2019-11-06 20:31   ` Sean Whitton
2019-11-07  9:04   ` Dmitry Vyukov
2019-11-07  8:48 ` [Automated-testing] " Tim.Bird
2019-11-07  9:13   ` Dmitry Vyukov
2019-11-07  9:20     ` Tim.Bird
2019-11-07 20:53 ` Don Zickus
2019-11-08  8:05   ` Dmitry Vyukov
2019-11-08 14:52     ` Don Zickus
2019-11-11  9:20       ` Dmitry Vyukov
2019-11-11 15:14         ` Don Zickus
2019-11-12 22:54 ` Konstantin Ryabitsev
