* A common place for CI results?
From: Veronika Kabatova @ 2019-04-05 14:41 UTC
To: automated-testing, info

Hi,

as we know from this list, there's plenty of CI systems doing some testing
on the upstream kernels (and maybe some others we don't know about).

It would be great if there was a single common place where all the CI
systems can put their results. This would make it much easier for the
kernel maintainers and developers to see testing status, since they only
need to check one place instead of keeping a list of sites/mailing lists
where each CI posts its contributions.

A few weeks ago we've been talking with some people about kernelci.org
being in a good place to act as the central upstream kernel CI piece that
most maintainers already know about. So I'm wondering if it would be
possible for kernelci to also act as an aggregator of all results? There's
already an API for publishing a report [0], so it shouldn't be too hard to
adjust it to handle and show more information. I also found the beta
version for test results [1], so most of the needed functionality seems to
be already there. Since there will be multiple CI systems, the source and
contact point for the contributor (so maintainers know whom to ask about
results if needed) would likely be the only missing essential data point.

The common place for results would also make it easier for new CI systems
to get involved with upstream. There are likely other companies out there
running some tests on the kernel internally that don't publish the results
anywhere. Only adding some API calls into their code (with the data they
are allowed to publish) would make it very simple for them to start
contributing. If we want to make them interested, the starting point needs
to be trivial.
Different companies have different setups and policies and might not be
able to fulfill arbitrary requirements, so they opt not to get involved at
all, which is a shame because their results can be useful. After the
initial "onboarding" step they might be willing to contribute more and
more too.

Please let me know if the idea makes sense or if something similar is
already planned. I'd be happy to contribute to the effort because I
believe it would make everyone's life easier and we'd all benefit from it
(and maybe someone else from my team would be willing to help out too if
needed).

Thanks,
Veronika Kabatova
CKI Project

[0] https://api.kernelci.org/examples.html#sending-a-boot-report
[1] https://kernelci.org/test/

^ permalink raw reply [flat|nested] 15+ messages in thread
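To make the "trivial starting point" concrete: publishing a result through an API like [0] could look roughly like the sketch below. The payload field names are assumptions for illustration, not the confirmed kernelci schema, and the 'contributor' field is the proposed addition from this thread, not an existing API field.

```python
import json

def build_boot_report(lab, job, kernel, board, status, contributor):
    """Assemble a minimal result payload for a report API like [0].

    All field names here are assumed for illustration; 'contributor' is
    the contact-point field proposed in this thread, not part of the
    current API.
    """
    return {
        "lab_name": lab,
        "job": job,
        "kernel": kernel,
        "board": board,
        "boot_result": status,
        "contributor": contributor,
    }

payload = build_boot_report(
    lab="lab-example", job="mainline", kernel="v5.1-rc4",
    board="qemu-x86_64", status="PASS",
    contributor="ci-team@example.com",
)
print(json.dumps(payload, indent=2))

# Sending would then be a single authenticated POST, e.g. with requests:
#   requests.post("https://api.kernelci.org/boot",
#                 headers={"Authorization": token}, json=payload)
```

A CI system that already collects results internally would only need a small shim like this at the end of its pipeline, which is what makes onboarding cheap.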
* Re: A common place for CI results?
From: Tim.Bird @ 2019-04-08 22:16 UTC
To: vkabatov, automated-testing, info

> -----Original Message-----
> From: Veronika Kabatova
> ...
> as we know from this list, there's plenty CI systems doing some testing
> on the upstream kernels (and maybe some others we don't know about).
>
> It would be great if there was a single common place where all the CI
> systems can put their results. This would make it much easier for the
> kernel maintainers and developers to see testing status since they only
> need to check one place instead of having a list of sites/mailing lists
> where each CI posts their contributions.

We've had discussions about this, and decided there are a few issues.
Some of these you identify below.

> A few weeks ago, with some people we've been talking about kernelci.org
> being in a good place to act as the central upstream kernel CI piece
> that most maintainers already know about. So I'm wondering if it would
> be possible for kernelci to also act as an aggregator of all results?

Right now, the kernelCI central server is (to my knowledge) maintained
by Kevin Hilman, on his own dime. That may be changing with the Linux
Foundation possibly creating a testing project to provide support for
this. But in any event, at the scale we're talking about (with lots of
test frameworks and potentially thousands of boards and hundreds of
thousands of test run results arriving daily), hosting this is costly.
So there's a question of who pays for this.

> There's already an API
> for publishing a report [0] so it shouldn't be too hard to adjust it to
> handle and show more information.
> I also found the beta version for test
> results [1] so actually, most of the needed functionality seems to be
> already there. Since there will be multiple CI systems, the source and
> contact point for the contributor (so maintainers know whom to ask
> about results if needed) would likely be the only missing essential
> data point.

One of the things on our action item list is to have discussions about a
common results format. See https://elinux.org/ATS_2018_Minutes (towards
the end, right before "Decisions from the summit"). I think this
addresses the issue of what information is needed for a universal
results format. I think we should definitely add a 'contributor' field
to a common definition, for the reasons you mention.

Another issue is making it so that different test frameworks emit the
same testcase names when they run the same test. For example, in Fuego
there is a testcase called Functional.LTP.syscalls.abort07. It's not
required, but it seems like it would be valuable if CKI, Linaro, Fuego
and others decided on a canonical name for this particular testcase, so
it was the same in each run result.

I took an action item from our meetings at Linaro last week to look at
this issue (testcase name harmonization).

> The common place for results would also make it easier for new CI
> systems to get involved with upstream. There are likely other companies
> out there running some tests on kernel internally but don't publish the
> results anywhere. Only adding some API calls into their code (with the
> data they are allowed to publish) would make it very simple for them to
> start contributing. If we want to make them interested, the starting
> point needs to be trivial. Different companies have different setups
> and policies and they might not be able to fulfill arbitrary
> requirements so they opt to not get involved at all, which is a shame
> because their results can be useful.
> After the initial "onboarding"
> step they might be willing to contribute more and more too.

Indeed. Probably most groups don't publish their test results, even when
they are using open source tests. There are lots of reasons for this
(including there not being a place to publish them, as you mention). It
would be good to also address the other reasons that testing entities
don't publish, and try to remove as many obstacles to (or encourage as
much as possible) publishing of test results.

> Please let me know if the idea makes sense or if something similar is
> already in plans. I'd be happy to contribute to the effort because I
> believe it would make everyone's life easier and we'd all benefit from
> it (and maybe someone else from my team would be willing to help out
> too if needed).

I think it makes a lot of sense, and we'd like to take steps to make
that possible.

The aspect of this that I plan to work on myself is testcase name
harmonization. That's one aspect of standardizing a common or universal
results format. But I've already got a lot of things I'm working on. If
someone else wants to volunteer to work on this, or head up a workgroup
for it, let me know.

Regards,
-- Tim
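The testcase-name harmonization discussed above could start as a small alias table that maps each framework's name to one canonical name. Only the Fuego name below comes from the thread; the CKI- and LKFT-style aliases are invented for illustration.

```python
# Hypothetical alias table: canonical testcase name -> names that
# individual frameworks report for the same test. The Fuego name is
# from the message above; the other aliases are assumed examples.
CANONICAL = {
    "Functional.LTP.syscalls.abort07": {
        "ltp.syscalls.abort07",            # assumed CKI-style name
        "ltp-syscalls-tests/abort07",      # assumed LKFT-style name
    },
}

def canonical_name(reported):
    """Map a framework-specific testcase name to its canonical form."""
    for canon, aliases in CANONICAL.items():
        if reported == canon or reported in aliases:
            return canon
    return reported  # unknown names pass through unchanged

print(canonical_name("ltp-syscalls-tests/abort07"))
# -> Functional.LTP.syscalls.abort07
```

Dropping such a translation in front of an aggregator would let each framework keep its internal names while results still line up per-testcase.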
* Re: A common place for CI results?
From: Guenter Roeck @ 2019-04-09 13:41 UTC
To: kernelci, Bird, Timothy; +Cc: vkabatov, automated-testing, info

On Mon, Apr 8, 2019 at 10:48 PM <Tim.Bird@sony.com> wrote:

[...]

> Right now, the kernelCI central server is (to my knowledge) maintained
> by Kevin Hilman, on his own dime. That may be changing with the Linux
> Foundation possibly creating a testing project to provide support for
> this. But in any event, at the scale we're talking about (with lots of
> test frameworks and potentially thousands of boards and hundreds of
> thousands of test run results arriving daily), hosting this is costly.
> So there's a question of who pays for this.

In theory that would be the Linux Foundation as part of the KernelCI
project.
Unfortunately, while companies and people do show interest in KernelCI,
there seems to be little interest in actually joining the project. My
understanding is that the Linux Foundation will only make it official
if/when there are five members. Currently there are three, Google being
one of them. Any company interested in the project may want to consider
joining it. When doing so, you'll have influence in setting its
direction, and that may include hosting test results other than those
from KernelCI itself.

Guenter

[...]
* Re: [Automated-testing] A common place for CI results?
From: Mark Brown @ 2019-04-10 9:28 UTC
To: Guenter Roeck; +Cc: kernelci, Bird, Timothy, info, automated-testing

On Tue, Apr 09, 2019 at 06:41:24AM -0700, Guenter Roeck wrote:
> On Mon, Apr 8, 2019 at 10:48 PM <Tim.Bird@sony.com> wrote:

> > Right now, the kernelCI central server is (to my knowledge) maintained
> > by Kevin Hilman, on his own dime. That may be changing with the Linux

Linaro is paying for the core servers (the Hetzner boxes with the core
servers are a combination of Linaro and Collabora; IIRC the boxes
Collabora is paying for are all builders). As far as I'm aware, no
individual is paying out of pocket for anything except for labs at the
minute.
* Re: A common place for CI results?
From: Veronika Kabatova @ 2019-04-10 17:47 UTC
To: Guenter Roeck, Timothy Bird; +Cc: kernelci, automated-testing, info

----- Original Message -----
> From: "Guenter Roeck" <groeck@google.com>
> Sent: Tuesday, April 9, 2019 3:41:24 PM
> Subject: Re: A common place for CI results?
>
> On Mon, Apr 8, 2019 at 10:48 PM <Tim.Bird@sony.com> wrote:

[...]

> In theory that would be the Linux Foundation as part of the KernelCI
> project. Unfortunately, while companies and people do show interest in
> KernelCI, there seems to be little interest in actually joining the
> project. My understanding is that the Linux Foundation will only make
> it official if/when there are five members. Currently there are three,
> Google being one of them. Any company interested in the project may
> possibly want to consider joining it. When doing so, you'll have
> influence in setting its direction, and that may include hosting test
> results other than those from KernelCI itself.

Is there any page with details on how to join, and what the requirements
on us are, that I can pass along to management to get an official
statement? We are definitely interested in more involvement with
upstream, both the kernel and different CI systems, as we have a common
goal. If we can help each other out and build a central CI system for
upstream kernels that people can rely on, we want to be a part of this
effort. We have started our own interaction with upstream (see my intro
email on this list), but as all CI systems face the same challenges it
only makes sense to join forces.

> > Some other issues, are making it so that different test frameworks
> > emit the same testcase names when they run the same test. For
> > example, in Fuego there is a testcase called
> > Functional.LTP.syscalls.abort07. It's not required, but it seems like
> > it would be valuable if CKI, Linaro, Fuego and others decided on a
> > canonical name for this particular testcase, so they were the same in
> > each run result.

Good point. CKI only reports full testsuite names as results (so it
would be "LTP lite"), and then we have a short log with subtests and
results, and a longer log with details. But for CI systems that report
each subtest separately, having a common name (with maybe "LTP" as an
aggregated result) would definitely be beneficial and easier to parse by
both humans and automation.

> > Indeed. Probably most groups don't publish their test results, even
> > when they are using open source tests. There are lots of reasons for
> > this (including there not being a place to publish them, as you
> > mention). It would be good to also address the other reasons that
> > testing entities don't publish, and try to remove as many obstacles
> > (or to try to encourage as much as possible) publishing of test
> > results.

Absolutely agreed.

> > The aspect of this that I plan to work on myself is testcase name
> > harmonization. That's one aspect of standardizing a common or
> > universal results format. But I've already got a lot of things I'm
> > working on. If someone else wants to volunteer to work on this, or
> > head up a workgroup to work on this, let me know.

Totally understand your situation, too much work and too little time :)
I can try to put an idea together and post it here for feedback to help
out. Do you have any data points or previous discussions to link? It
would be great to have something to build upon, instead of posting a
brain dump that won't work for already-known issues (that aren't known
by me).

Veronika
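The two reporting granularities Veronika describes (CKI publishing one "LTP lite" verdict vs. frameworks publishing every subtest) can coexist if the aggregated result is derived from the subtests. A minimal sketch, assuming PASS/FAIL/SKIP statuses:

```python
# Collapse per-subtest results into the one suite-level verdict that a
# coarser-grained CI would publish. Status names are assumed.
def suite_verdict(subtests):
    """Reduce {testcase: 'PASS'|'FAIL'|'SKIP'} to a single suite result."""
    results = set(subtests.values())
    if "FAIL" in results:
        return "FAIL"                # any failing subtest fails the suite
    if results <= {"SKIP"}:
        return "SKIP"                # everything skipped (or no subtests)
    return "PASS"

subtests = {
    "Functional.LTP.syscalls.abort07": "PASS",
    "Functional.LTP.syscalls.accept01": "FAIL",
}
print(suite_verdict(subtests))  # -> FAIL
```

With a rule like this, a shared results store could accept subtest-level data where available and still display a comparable suite-level row for every CI.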
* Re: [Automated-testing] A common place for CI results?
From: Kevin Hilman @ 2019-04-10 21:13 UTC
To: Veronika Kabatova
Cc: Guenter Roeck, Timothy Bird, info, automated-testing, kernelci

On Wed, Apr 10, 2019 at 10:47 AM Veronika Kabatova <vkabatov@redhat.com> wrote:

[...]

> Is there any page with details on how to join and what are the requirements
> on us that I can pass along to management to get an official statement?

Attached is the LF slide deck with the project overview, membership
levels, costs, etc. I'd be happy to discuss more with you on a call
after you review the deck, but it would have to be next week as I'm OoO
for the rest of this week.

Thanks,

Kevin

[-- Attachment #2: kernelCI Project Pitch.pdf --]
* Re: [Automated-testing] A common place for CI results?
From: Veronika Kabatova @ 2019-04-11 16:02 UTC
To: Kevin Hilman
Cc: Guenter Roeck, Timothy Bird, info, automated-testing, kernelci

----- Original Message -----
> From: "Kevin Hilman" <khilman@baylibre.com>
> Sent: Wednesday, April 10, 2019 11:13:40 PM
> Subject: Re: [Automated-testing] A common place for CI results?
>
> Attached is the LF slide deck with the project overview, membership
> levels and costs etc.
>
> I'd be happy to discuss more with you on a call after you review the
> deck, but it would have to be next week as I'm OoO for the rest of
> this week.

Sounds good. Feel free to reach out off-list to set up the time and call
location. I'll prepare a list of questions to discuss (especially as I
have no idea how project memberships work, even though Red Hat is
already a member of the Linux Foundation). Afterwards I can pass along
all the information and try to get funding.

Thanks,
Veronika
* Re: A common place for CI results?
From: Tim.Bird @ 2019-05-14 23:01 UTC
To: vkabatov, automated-testing, info

> -----Original Message-----
> From: Veronika Kabatova

[...]

> Please let me know if the idea makes sense or if something similar is
> already in plans. I'd be happy to contribute to the effort because I
> believe it would make everyone's life easier and we'd all benefit from
> it (and maybe someone else from my team would be willing to help out
> too if needed).

I never responded to this, but this sounds like a really good idea to
me. I don't care much which backend we aggregate to, but it would be
good as a community to start using one service. It would help to find
issues with the API, or the results schema, if multiple people started
using it.

I know that people using Fuego are sending data to their own instances
of KernelCI. But I don't know what the issues are for sending this data
to a shared KernelCI service.

I would be interested in hooking up my lab to send Fuego results to
KernelCI. This would be a good exercise. I'm not sure what the next
steps would be, but maybe we could discuss this on the next automated
testing conference call.

-- Tim
* Re: A common place for CI results? 2019-05-14 23:01 ` Tim.Bird @ 2019-05-15 20:33 ` Dan Rue 2019-05-15 21:06 ` Tom Gall 2019-05-15 22:58 ` [Automated-testing] " Carlos Hernandez 0 siblings, 2 replies; 15+ messages in thread From: Dan Rue @ 2019-05-15 20:33 UTC (permalink / raw) To: kernelci, Tim.Bird; +Cc: vkabatov, automated-testing, info On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote: > > > > -----Original Message----- > > From: Veronika Kabatova > > > > Hi, > > > > as we know from this list, there's plenty CI systems doing some testing on > > the > > upstream kernels (and maybe some others we don't know about). > > > > It would be great if there was a single common place where all the CI systems > > can put their results. This would make it much easier for the kernel > > maintainers and developers to see testing status since they only need to > > check one place instead of having a list of sites/mailing lists where each CI > > posts their contributions. > > > > > > A few weeks ago, with some people we've been talking about kernelci.org > > being > > in a good place to act as the central upstream kernel CI piece that most > > maintainers already know about. So I'm wondering if it would be possible for > > kernelci to also act as an aggregator of all results? There's already an API > > for publishing a report [0] so it shouldn't be too hard to adjust it to > > handle and show more information. I also found the beta version for test > > results [1] so actually, most of the needed functionality seems to be already > > there. Since there will be multiple CI systems, the source and contact point > > for the contributor (so maintainers know whom to ask about results if > > needed) > > would likely be the only missing essential data point. > > > > > > The common place for results would also make it easier for new CI systems > > to > > get involved with upstream. 
There are likely other companies out there > > running > > some tests on kernel internally but don't publish the results anywhere. Only > > adding some API calls into their code (with the data they are allowed to > > publish) would make it very simple for them to start contributing. If we want > > to make them interested, the starting point needs to be trivial. Different > > companies have different setups and policies and they might not be able to > > fulfill arbitrary requirements so they opt to not get involved at all, which > > is a shame because their results can be useful. After the initial "onboarding" > > step they might be willing to contribute more and more too. > > > > > > Please let me know if the idea makes sense or if something similar is already > > in plans. I'd be happy to contribute to the effort because I believe it would > > make everyone's life easier and we'd all benefit from it (and maybe > > someone > > else from my team would be willing to help out too if needed). > > I never responded to this, yea, you did. ;) > but this sounds like a really good idea to me. I don't care much which > backend we aggregate to, but it would be good as a community to start > using one service to start with. It would help to find issues with > the API, or the results schema, if multiple people started using it. > > I know that people using Fuego are sending data to their own instances > of KernelCI. But I don't know what the issues are for sending this > data to a shared KernelCI service. > > I would be interested in hooking up my lab to send Fuego results to > KernelCI. This would be a good exercise. I'm not sure what the next > steps would be, but maybe we could discuss this on the next automated > testing conference call. OK here's my idea. I don't personally think kernelci (or LKFT) are set up to aggregate results currently. We have too many assumptions about where tests are coming from, how things are built, etc. 
In other words, dealing with noisy data is going to be non-trivial in any
existing project.

I would propose aggregating data into something like google's BigQuery.
This has a few benefits:
- Non-opinionated place to hold structured data
- Allows many downstream use-cases
- Managed hosting, and data is publicly available
- Storage is sponsored by google as a part of
  https://cloud.google.com/bigquery/public-data/
- First 1TB of query per 'project' is free, and users pay for more
  queries than that

With storage taken care of, how do we get the data in?

First, we'll need some canonical data structure defined. I would approach
defining the canonical structure in conjunction with the first few
projects that are interested in contributing their results. Each project
will have an ETL pipeline which will extract the test results from a
given project (such as kernelci, lkft, etc), translate it into the
canonical data structure, and load it into the google bigquery dataset at
a regular interval or in real-time. The translation layer is where things
like test names are handled.

The things this leaves me wanting are:
- raw data storage. It would be nice if raw data were stored somewhere
  permanent in some intermediary place so that later implementations
  could happen, and for data that doesn't fit into whatever structure we
  end up with.
- time, to actually try it and find the gaps. This is just an idea I've
  been thinking about. Anyone with experience here that can help flesh
  this out?

Dan

--
Linaro - Kernel Validation

^ permalink raw reply [flat|nested] 15+ messages in thread
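Dan's extract/translate/load flow can be made concrete with a short sketch. The following Python is purely illustrative: the canonical field names, and the shape of the incoming kernelci-style record, are hypothetical stand-ins, since the thread leaves the actual schema to be defined together with the first contributing projects.

```python
# Sketch of the "translate" step of an ETL pipeline feeding a shared
# results dataset. Every field name here is hypothetical; the thread
# proposes defining the real schema with the first contributing projects.
import hashlib
import json
from datetime import datetime, timezone

CANONICAL_FIELDS = {
    "origin",        # which CI system produced the result (kernelci, cki, ...)
    "contact",       # whom to ask about the result, per Veronika's point
    "tree", "branch", "commit",
    "test_name",     # normalized name, handled by the translation layer
    "environment",   # hardware / arch the test ran on
    "status",        # PASS / FAIL / SKIP
    "timestamp",
    "raw_ref",       # pointer back to the raw record ("raw data storage")
}

def translate_kernelci_result(raw: dict) -> dict:
    """Map one hypothetical kernelci-style record to the canonical shape."""
    record = {
        "origin": "kernelci",
        "contact": raw.get("lab_contact", "unknown"),
        "tree": raw["job"],
        "branch": raw.get("git_branch", "unknown"),
        "commit": raw["git_commit"],
        # Translation layer: normalize project-specific test names here.
        "test_name": raw["test_case"].lower().replace(" ", "_"),
        "environment": raw.get("device_type", "unknown"),
        "status": raw["status"].upper(),
        "timestamp": raw.get("created_on",
                             datetime.now(timezone.utc).isoformat()),
        # Content-address the raw payload so it can be found again later.
        "raw_ref": hashlib.sha256(
            json.dumps(raw, sort_keys=True).encode()).hexdigest(),
    }
    # Sanity check: every canonical field is present, nothing extra.
    assert set(record) == CANONICAL_FIELDS
    return record
```

The load step is deliberately omitted; in practice it would stream such records into the shared dataset, for example with the BigQuery client's insert_rows_json().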
* Re: A common place for CI results? 2019-05-15 20:33 ` Dan Rue @ 2019-05-15 21:06 ` Tom Gall 2019-05-20 15:32 ` Veronika Kabatova 2019-05-15 22:58 ` [Automated-testing] " Carlos Hernandez 1 sibling, 1 reply; 15+ messages in thread From: Tom Gall @ 2019-05-15 21:06 UTC (permalink / raw) To: kernelci, Dan Rue; +Cc: Tim.Bird, vkabatov, automated-testing, info > On May 15, 2019, at 3:33 PM, Dan Rue <dan.rue@linaro.org> wrote: > > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote: >> >> >>> -----Original Message----- >>> From: Veronika Kabatova >>> >>> Hi, >>> >>> as we know from this list, there's plenty CI systems doing some testing on >>> the >>> upstream kernels (and maybe some others we don't know about). >>> >>> It would be great if there was a single common place where all the CI systems >>> can put their results. This would make it much easier for the kernel >>> maintainers and developers to see testing status since they only need to >>> check one place instead of having a list of sites/mailing lists where each CI >>> posts their contributions. >>> >>> >>> A few weeks ago, with some people we've been talking about kernelci.org >>> being >>> in a good place to act as the central upstream kernel CI piece that most >>> maintainers already know about. So I'm wondering if it would be possible for >>> kernelci to also act as an aggregator of all results? There's already an API >>> for publishing a report [0] so it shouldn't be too hard to adjust it to >>> handle and show more information. I also found the beta version for test >>> results [1] so actually, most of the needed functionality seems to be already >>> there. Since there will be multiple CI systems, the source and contact point >>> for the contributor (so maintainers know whom to ask about results if >>> needed) >>> would likely be the only missing essential data point. >>> >>> >>> The common place for results would also make it easier for new CI systems >>> to >>> get involved with upstream. 
There are likely other companies out there >>> running >>> some tests on kernel internally but don't publish the results anywhere. Only >>> adding some API calls into their code (with the data they are allowed to >>> publish) would make it very simple for them to start contributing. If we want >>> to make them interested, the starting point needs to be trivial. Different >>> companies have different setups and policies and they might not be able to >>> fulfill arbitrary requirements so they opt to not get involved at all, which >>> is a shame because their results can be useful. After the initial "onboarding" >>> step they might be willing to contribute more and more too. >>> >>> >>> Please let me know if the idea makes sense or if something similar is already >>> in plans. I'd be happy to contribute to the effort because I believe it would >>> make everyone's life easier and we'd all benefit from it (and maybe >>> someone >>> else from my team would be willing to help out too if needed). >> >> I never responded to this, > > yea, you did. ;) > >> but this sounds like a really good idea to me. I don't care much which >> backend we aggregate to, but it would be good as a community to start >> using one service to start with. It would help to find issues with >> the API, or the results schema, if multiple people started using it. >> >> I know that people using Fuego are sending data to their own instances >> of KernelCI. But I don't know what the issues are for sending this >> data to a shared KernelCI service. >> >> I would be interested in hooking up my lab to send Fuego results to >> KernelCI. This would be a good exercise. I'm not sure what the next >> steps would be, but maybe we could discuss this on the next automated >> testing conference call. > > OK here's my idea. > > I don't personally think kernelci (or LKFT) are set up to aggregate > results currently. We have too many assumptions about where tests are > coming from, how things are built, etc. 
In other words, dealing with
> noisy data is going to be non-trivial in any existing project.

I completely agree.

> I would propose aggregating data into something like google's BigQuery.
> This has a few benefits:
> - Non-opinionated place to hold structured data
> - Allows many downstream use-cases
> - Managed hosting, and data is publicly available
> - Storage is sponsored by google as a part of
>   https://cloud.google.com/bigquery/public-data/
> - First 1TB of query per 'project' is free, and users pay for more
>   queries than that

I very much like this idea. I do lots of android kernel testing and being
able to work with / compare / contribute to what is essentially a pile of
data in BQ would be great. As an end user working with the data I’d also
have lots of dashboard options to customize and share queries with
others.

> With storage taken care of, how do we get the data in?
> First, we'll need some canonical data structure defined. I would
> approach defining the canonical structure in conjunction with the first
> few projects that are interested in contributing their results. Each
> project will have an ETL pipeline which will extract the test results
> from a given project (such as kernelci, lkft, etc), translate it into
> the canonical data structure, and load it into the google bigquery
> dataset at a regular interval or in real-time. The translation layer is
> where things like test names are handled.

Exactly. I would hope that the various projects that are producing data
would be motivated to plug in. After all, it makes the data they are
producing more useful and available to a larger group of people.

> The things this leaves me wanting are:
> - raw data storage. It would be nice if raw data were stored somewhere
>   permanent in some intermediary place so that later implementations
>   could happen, and for data that doesn't fit into whatever structure we
>   end up with.

I agree.

> - time, to actually try it and find the gaps.
> This is just an idea I've been thinking about. Anyone with experience
> here that can help flesh this out?

I’m willing to lend a hand.

> Dan
>
> --
> Linaro - Kernel Validation

Tom

—
Director, Linaro Consumer Group

^ permalink raw reply [flat|nested] 15+ messages in thread
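As a concrete example of the kind of downstream use-case Tom has in mind (comparing results across labs over the shared pile of data), here is a rough sketch of one query. In a real deployment this would likely be a SQL GROUP BY run against the BigQuery dataset; the record fields below are illustrative only, no schema has been agreed in this thread.

```python
# Hypothetical downstream consumer of the shared results dataset:
# find (commit, test) pairs where different CI origins disagree.
# Field names are illustrative, not a settled schema.
from collections import defaultdict

def find_disagreements(records):
    """Group results by (commit, test_name) and report cases where
    two origins saw different statuses for the same test."""
    by_key = defaultdict(dict)  # (commit, test) -> {origin: status}
    for r in records:
        by_key[(r["commit"], r["test_name"])][r["origin"]] = r["status"]
    return {
        key: origins
        for key, origins in by_key.items()
        if len(set(origins.values())) > 1
    }

results = [
    {"commit": "deadbeef", "test_name": "boot", "origin": "kernelci",
     "status": "PASS"},
    {"commit": "deadbeef", "test_name": "boot", "origin": "cki",
     "status": "FAIL"},
    {"commit": "deadbeef", "test_name": "kselftest_net", "origin": "lkft",
     "status": "PASS"},
]
print(find_disagreements(results))
# → {('deadbeef', 'boot'): {'kernelci': 'PASS', 'cki': 'FAIL'}}
```

This is also the shape of the "redundancy check" that only becomes possible once everyone reports into one place.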
* Re: A common place for CI results? 2019-05-15 21:06 ` Tom Gall @ 2019-05-20 15:32 ` Veronika Kabatova 2019-05-28 8:24 ` Guillaume Tucker 0 siblings, 1 reply; 15+ messages in thread From: Veronika Kabatova @ 2019-05-20 15:32 UTC (permalink / raw) To: Tom Gall, Dan Rue, Tim Bird; +Cc: kernelci, automated-testing, info ----- Original Message ----- > From: "Tom Gall" <tom.gall@linaro.org> > To: kernelci@groups.io, "Dan Rue" <dan.rue@linaro.org> > Cc: "Tim Bird" <Tim.Bird@sony.com>, vkabatov@redhat.com, automated-testing@yoctoproject.org, info@kernelci.org > Sent: Wednesday, May 15, 2019 11:06:33 PM > Subject: Re: A common place for CI results? > > > > > On May 15, 2019, at 3:33 PM, Dan Rue <dan.rue@linaro.org> wrote: > > > > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote: > >> > >> > >>> -----Original Message----- > >>> From: Veronika Kabatova > >>> > >>> Hi, > >>> > >>> as we know from this list, there's plenty CI systems doing some testing > >>> on > >>> the > >>> upstream kernels (and maybe some others we don't know about). > >>> > >>> It would be great if there was a single common place where all the CI > >>> systems > >>> can put their results. This would make it much easier for the kernel > >>> maintainers and developers to see testing status since they only need to > >>> check one place instead of having a list of sites/mailing lists where > >>> each CI > >>> posts their contributions. > >>> > >>> > >>> A few weeks ago, with some people we've been talking about kernelci.org > >>> being > >>> in a good place to act as the central upstream kernel CI piece that most > >>> maintainers already know about. So I'm wondering if it would be possible > >>> for > >>> kernelci to also act as an aggregator of all results? There's already an > >>> API > >>> for publishing a report [0] so it shouldn't be too hard to adjust it to > >>> handle and show more information. 
I also found the beta version for test > >>> results [1] so actually, most of the needed functionality seems to be > >>> already > >>> there. Since there will be multiple CI systems, the source and contact > >>> point > >>> for the contributor (so maintainers know whom to ask about results if > >>> needed) > >>> would likely be the only missing essential data point. > >>> > >>> > >>> The common place for results would also make it easier for new CI systems > >>> to > >>> get involved with upstream. There are likely other companies out there > >>> running > >>> some tests on kernel internally but don't publish the results anywhere. > >>> Only > >>> adding some API calls into their code (with the data they are allowed to > >>> publish) would make it very simple for them to start contributing. If we > >>> want > >>> to make them interested, the starting point needs to be trivial. > >>> Different > >>> companies have different setups and policies and they might not be able > >>> to > >>> fulfill arbitrary requirements so they opt to not get involved at all, > >>> which > >>> is a shame because their results can be useful. After the initial > >>> "onboarding" > >>> step they might be willing to contribute more and more too. > >>> > >>> > >>> Please let me know if the idea makes sense or if something similar is > >>> already > >>> in plans. I'd be happy to contribute to the effort because I believe it > >>> would > >>> make everyone's life easier and we'd all benefit from it (and maybe > >>> someone > >>> else from my team would be willing to help out too if needed). > >> > >> I never responded to this, > > > > yea, you did. ;) > > > >> but this sounds like a really good idea to me. I don't care much which > >> backend we aggregate to, but it would be good as a community to start > >> using one service to start with. It would help to find issues with > >> the API, or the results schema, if multiple people started using it. 
> >> > >> I know that people using Fuego are sending data to their own instances > >> of KernelCI. But I don't know what the issues are for sending this > >> data to a shared KernelCI service. > >> > >> I would be interested in hooking up my lab to send Fuego results to > >> KernelCI. This would be a good exercise. I'm not sure what the next > >> steps would be, but maybe we could discuss this on the next automated > >> testing conference call. > > > > OK here's my idea. > > > > I don't personally think kernelci (or LKFT) are set up to aggregate > > results currently. We have too many assumptions about where tests are > > coming from, how things are built, etc. In other words, dealing with > > noisy data is going to be non-trivial in any existing project. > > I completely agree. > This is a good point. I'm totally fine with having a separate independent place for aggregation. > > I would propose aggregating data into something like google's BigQuery. > > This has a few benefits: > > - Non-opinionated place to hold structured data > > - Allows many downstream use-cases > > - Managed hosting, and data is publicly available > > - Storage is sponsored by google as a part of > > https://cloud.google.com/bigquery/public-data/ > > - First 1TB of query per 'project' is free, and users pay for more > > queries than that > > I very much like this idea. I do lots of android kernel testing > and being able to work with / compare / contribute to what > is essentially a pile of data in BQ would be great. As an > end user working with the data I’d also have lots of dash > board options to customize and share queries with others. > > > With storage taken care of, how do we get the data in? > > > First, we'll need some canonical data structure defined. I would > > approach defining the canonical structure in conjunction with the first > > few projects that are interested in contributing their results. 
Each > > project will have an ETL pipeline which will extract the test results > > from a given project (such as kernelci, lkft, etc), translate it into > > the canonical data structure, and load it into the google bigquery > > dataset at a regular interval or in real-time. The translation layer is > > where things like test names are handled. > +1, exactly how I imagined this part. > Exactly. I would hope that the various projects that are producing > data would be motived to plug in. After all, it makes the data > they are producing more useful and available to a larger group > of people. > > > The things this leaves me wanting are: > > - raw data storage. It would be nice if raw data were stored somewhere > > permanent in some intermediary place so that later implementations > > could happen, and for data that doesn't fit into whatever structure we > > end up with. > > I agree. +1 > > > - time, to actually try it and find the gaps. This is just an idea I've > > been thinking about. Anyone with experience here that can help flesh > > this out? > > I’m willing to lend a hand. > Thanks for starting up a specific proposal! I agree with everything that was brought up. I'll try to find time to participate in the implementation part too (although my experience with data storage is.. limited, I should be able to help out with the structure prototyping and maybe other parts too). Thanks again, Veronika CKI Project > > Dan > > > > -- > > Linaro - Kernel Validation > > Tom > > — > Directory, Linaro Consumer Group > > ^ permalink raw reply [flat|nested] 15+ messages in thread
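The "raw data storage" item that gets a +1 above could be prototyped very simply: keep raw submissions content-addressed in a flat store, so canonical records only carry a hash back-reference and richer re-imports stay possible later. A stdlib-only sketch follows; the on-disk layout is made up for illustration and is not any project's actual format.

```python
# Minimal content-addressed store for raw CI submissions, so that data
# which doesn't fit the canonical structure yet is never thrown away.
# Purely illustrative; no project in this thread defines this layout.
import hashlib
import json
import tempfile
from pathlib import Path

def store_raw(root: Path, payload: dict) -> str:
    """Write the raw payload once, keyed by its sha256; return the key
    that a canonical record can carry as a back-reference."""
    blob = json.dumps(payload, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()
    path = root / key[:2] / f"{key}.json"  # fan out like git's object store
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():                  # idempotent: duplicates are free
        path.write_bytes(blob)
    return key

def load_raw(root: Path, key: str) -> dict:
    """Fetch a raw submission back for a later, richer import."""
    return json.loads((root / key[:2] / f"{key}.json").read_text())

# Example round-trip in a throwaway directory:
store = Path(tempfile.mkdtemp())
key = store_raw(store, {"suite": "ltp", "arch": "x86_64", "result": "pass"})
```

Storing by content hash also means a resubmitted identical result costs nothing and can never diverge from what was originally reported.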
* Re: A common place for CI results? 2019-05-20 15:32 ` Veronika Kabatova @ 2019-05-28 8:24 ` Guillaume Tucker 2019-05-28 14:45 ` Veronika Kabatova 0 siblings, 1 reply; 15+ messages in thread From: Guillaume Tucker @ 2019-05-28 8:24 UTC (permalink / raw) To: kernelci, vkabatov; +Cc: Tom Gall, Dan Rue, Tim Bird, automated-testing, info [-- Attachment #1: Type: text/plain, Size: 11085 bytes --] Hello, On Mon, May 20, 2019 at 4:38 PM Veronika Kabatova <vkabatov@redhat.com> wrote: > > > ----- Original Message ----- > > From: "Tom Gall" <tom.gall@linaro.org> > > To: kernelci@groups.io, "Dan Rue" <dan.rue@linaro.org> > > Cc: "Tim Bird" <Tim.Bird@sony.com>, vkabatov@redhat.com, > automated-testing@yoctoproject.org, info@kernelci.org > > Sent: Wednesday, May 15, 2019 11:06:33 PM > > Subject: Re: A common place for CI results? > > > > > > > > > On May 15, 2019, at 3:33 PM, Dan Rue <dan.rue@linaro.org> wrote: > > > > > > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote: > > >> > > >> > > >>> -----Original Message----- > > >>> From: Veronika Kabatova > > >>> > > >>> Hi, > > >>> > > >>> as we know from this list, there's plenty CI systems doing some > testing > > >>> on > > >>> the > > >>> upstream kernels (and maybe some others we don't know about). > > >>> > > >>> It would be great if there was a single common place where all the CI > > >>> systems > > >>> can put their results. This would make it much easier for the kernel > > >>> maintainers and developers to see testing status since they only > need to > > >>> check one place instead of having a list of sites/mailing lists where > > >>> each CI > > >>> posts their contributions. > > >>> > > >>> > > >>> A few weeks ago, with some people we've been talking about > kernelci.org > > >>> being > > >>> in a good place to act as the central upstream kernel CI piece that > most > > >>> maintainers already know about. 
So I'm wondering if it would be > possible > > >>> for > > >>> kernelci to also act as an aggregator of all results? There's > already an > > >>> API > > >>> for publishing a report [0] so it shouldn't be too hard to adjust it > to > > >>> handle and show more information. I also found the beta version for > test > > >>> results [1] so actually, most of the needed functionality seems to be > > >>> already > > >>> there. Since there will be multiple CI systems, the source and > contact > > >>> point > > >>> for the contributor (so maintainers know whom to ask about results if > > >>> needed) > > >>> would likely be the only missing essential data point. > > >>> > > >>> > > >>> The common place for results would also make it easier for new CI > systems > > >>> to > > >>> get involved with upstream. There are likely other companies out > there > > >>> running > > >>> some tests on kernel internally but don't publish the results > anywhere. > > >>> Only > > >>> adding some API calls into their code (with the data they are > allowed to > > >>> publish) would make it very simple for them to start contributing. > If we > > >>> want > > >>> to make them interested, the starting point needs to be trivial. > > >>> Different > > >>> companies have different setups and policies and they might not be > able > > >>> to > > >>> fulfill arbitrary requirements so they opt to not get involved at > all, > > >>> which > > >>> is a shame because their results can be useful. After the initial > > >>> "onboarding" > > >>> step they might be willing to contribute more and more too. > > >>> > > >>> > > >>> Please let me know if the idea makes sense or if something similar is > > >>> already > > >>> in plans. I'd be happy to contribute to the effort because I believe > it > > >>> would > > >>> make everyone's life easier and we'd all benefit from it (and maybe > > >>> someone > > >>> else from my team would be willing to help out too if needed). 
> > >> > > >> I never responded to this, > > > > > > yea, you did. ;) > > > > > >> but this sounds like a really good idea to me. I don't care much which > > >> backend we aggregate to, but it would be good as a community to start > > >> using one service to start with. It would help to find issues with > > >> the API, or the results schema, if multiple people started using it. > > >> > > >> I know that people using Fuego are sending data to their own instances > > >> of KernelCI. But I don't know what the issues are for sending this > > >> data to a shared KernelCI service. > > >> > > >> I would be interested in hooking up my lab to send Fuego results to > > >> KernelCI. This would be a good exercise. I'm not sure what the next > > >> steps would be, but maybe we could discuss this on the next automated > > >> testing conference call. > > > > > > OK here's my idea. > > > > > > I don't personally think kernelci (or LKFT) are set up to aggregate > > > results currently. We have too many assumptions about where tests are > > > coming from, how things are built, etc. In other words, dealing with > > > noisy data is going to be non-trivial in any existing project. > > > > I completely agree. > > > > This is a good point. I'm totally fine with having a separate independent > place for aggregation. > > > > I would propose aggregating data into something like google's BigQuery. > > > This has a few benefits: > > > - Non-opinionated place to hold structured data > > > - Allows many downstream use-cases > > > - Managed hosting, and data is publicly available > > > - Storage is sponsored by google as a part of > > > https://cloud.google.com/bigquery/public-data/ > > > - First 1TB of query per 'project' is free, and users pay for more > > > queries than that > > > > I very much like this idea. I do lots of android kernel testing > > and being able to work with / compare / contribute to what > > is essentially a pile of data in BQ would be great. 
As an > > end user working with the data I’d also have lots of dash > > board options to customize and share queries with others. > > > > > With storage taken care of, how do we get the data in? > > > > > First, we'll need some canonical data structure defined. I would > > > approach defining the canonical structure in conjunction with the first > > > few projects that are interested in contributing their results. Each > > > project will have an ETL pipeline which will extract the test results > > > from a given project (such as kernelci, lkft, etc), translate it into > > > the canonical data structure, and load it into the google bigquery > > > dataset at a regular interval or in real-time. The translation layer is > > > where things like test names are handled. > > > > +1, exactly how I imagined this part. > > > Exactly. I would hope that the various projects that are producing > > data would be motived to plug in. After all, it makes the data > > they are producing more useful and available to a larger group > > of people. > > > > > The things this leaves me wanting are: > > > - raw data storage. It would be nice if raw data were stored somewhere > > > permanent in some intermediary place so that later implementations > > > could happen, and for data that doesn't fit into whatever structure we > > > end up with. > > > > I agree. > > +1 > > > > > > - time, to actually try it and find the gaps. This is just an idea I've > > > been thinking about. Anyone with experience here that can help flesh > > > this out? > > > > I’m willing to lend a hand. > > > > Thanks for starting up a specific proposal! I agree with everything that > was > brought up. I'll try to find time to participate in the implementation part > too (although my experience with data storage is.. limited, I should be > able > to help out with the structure prototyping and maybe other parts too). 
>

This all sounds great: a scalable common location to store the results,
and definitions of test case names. However, there is a whole layer of
logic above and around this which KernelCI does, and I'm sure other CI
systems also do with some degree of overlap. So it seems to me that
solving how to deal with the results is only one piece in the puzzle to
get a common CI architecture for upstream kernel testing. Sorry I'm a
bit late to the party so I'll add my 2¢ here...

Around the end of last year I made this document and mentioned it on
this list, about making KernelCI more modular:

https://docs.google.com/document/d/15F42HdHTO6NbSL53_iLl77lfe1XQKdWaHAf7XCNkKD8/edit?usp=sharing
https://groups.io/g/kernelci/topic/kernelci_modular_pipeline/29692355

The idea is to make it possible to have alternative components in the
KernelCI "pipeline". Right now, KernelCI has these components:

* Jenkins job to monitor git branches and build kernels
* LAVA to run tests
* Custom backend and storage server to keep binaries and data
* Custom web frontend to show the results

They could all be replaced or used in conjunction with alternative build
systems, database engines, test lab schedulers and dashboards. The key
thing is the code orchestrating all this, which is kept in the
kernelci-core repository.

For example, when a change has been detected in a tree, rather than
triggering kernel builds on Jenkins there could be a request sent to
another build system to do that elsewhere. Likewise, when some builds
are ready to be tested, jobs could be scheduled in non-LAVA labs simply
by sending another kind of HTTP request than the ones we're currently
sending to the LAVA APIs. This could easily be described in some config
files; in fact we already have one with the list of labs to submit jobs
to. Builds and tests are configured in YAML files in KernelCI, which
could also easily be extended with new attributes.
The big advantage of having a central way to orchestrate all this is
that results are going to be consistent and higher-level features can be
enabled: each tree will be sampled at the same commit, so we don't end
up with one CI lab running a version of mainline a few patches behind
another one etc... It means we can expand some KernelCI features to a
larger ecosystem of CI labs, such as:

* redundancy checks when the same test / hardware is tested in multiple
  places (say, if a RPi fails to boot in a LAVA lab but not in CKI's
  lab...)
* regression tracking across the whole spectrum of CI labs
* common reports for each single kernel revision being tested
* bisections extended to non-LAVA labs

It feels like a diagram would be needed to really give an idea of how
this would work. APIs and callback mechanisms would need to be well
defined to have clear entry points for the various components in a
modular system like this. I think we would be able to reuse some of the
things currently used by KernelCI and improve them, taking into account
what other CI labs have been doing (LKFT, CKI...).

I'm only scratching the surface here, but I wanted to raise this point
to see if others shared the same vision. It would be unfortunate if we
came up with a great solution focused on results, but then realised that
it had big design limitations when trying to add more abstract
functionality across all the contributing CI labs.

Well, I tried to keep it short - hope this makes sense.

Cheers,
Guillaume

[-- Attachment #2: Type: text/html, Size: 14230 bytes --]

^ permalink raw reply [flat|nested] 15+ messages in thread
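One possible shape for the config-driven dispatch Guillaume describes: the orchestrator reads a lab list (KernelCI already keeps one in YAML) and picks a submission method per lab type, so a non-LAVA lab only differs by the kind of HTTP request it receives. Everything below is a hypothetical sketch; the keys, lab names and endpoints do not come from kernelci-core.

```python
# Sketch of config-driven job dispatch for a modular pipeline: one
# orchestrator, multiple lab "drivers". Lab entries mimic the YAML lab
# config KernelCI already has, but every key here is hypothetical.

def submit_lava(lab: dict, job: dict) -> dict:
    # Real code would POST this to the LAVA REST API at lab["url"].
    return {"method": "lava-rest", "url": lab["url"] + "/jobs", "body": job}

def submit_callback(lab: dict, job: dict) -> dict:
    # Non-LAVA labs get a plain HTTP callback with the same job payload.
    return {"method": "http-callback", "url": lab["url"], "body": job}

DRIVERS = {"lava": submit_lava, "callback": submit_callback}

LABS = [  # stand-in for the parsed YAML lab config
    {"name": "lab-a", "type": "lava",
     "url": "https://lava.example.org"},
    {"name": "lab-b", "type": "callback",
     "url": "https://ci.example.org/submit"},
]

def dispatch(job: dict, labs=LABS) -> list:
    """Build one submission request per configured lab, chosen by type."""
    return [DRIVERS[lab["type"]](lab, job) for lab in labs]

requests = dispatch({"kernel": "v5.2-rc1", "test": "baseline"})
```

Because every lab receives the same job description for the same commit, the consistency needed for redundancy checks and common reports falls out of the dispatch step itself.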
* Re: A common place for CI results? 2019-05-28 8:24 ` Guillaume Tucker @ 2019-05-28 14:45 ` Veronika Kabatova 0 siblings, 0 replies; 15+ messages in thread From: Veronika Kabatova @ 2019-05-28 14:45 UTC (permalink / raw) To: Guillaume Tucker Cc: kernelci, Tom Gall, Dan Rue, Tim Bird, automated-testing, info ----- Original Message ----- > From: "Guillaume Tucker" <guillaume.tucker@gmail.com> > To: kernelci@groups.io, vkabatov@redhat.com > Cc: "Tom Gall" <tom.gall@linaro.org>, "Dan Rue" <dan.rue@linaro.org>, "Tim > Bird" <Tim.Bird@sony.com>, automated-testing@yoctoproject.org, > info@kernelci.org > Sent: Tuesday, May 28, 2019 10:24:44 AM > Subject: Re: A common place for CI results? > Hello, > On Mon, May 20, 2019 at 4:38 PM Veronika Kabatova < vkabatov@redhat.com > > wrote: > > ----- Original Message ----- > > > > From: "Tom Gall" < tom.gall@linaro.org > > > > > To: kernelci@groups.io , "Dan Rue" < dan.rue@linaro.org > > > > > Cc: "Tim Bird" < Tim.Bird@sony.com >, vkabatov@redhat.com , > > > automated-testing@yoctoproject.org , info@kernelci.org > > > > Sent: Wednesday, May 15, 2019 11:06:33 PM > > > > Subject: Re: A common place for CI results? > > > > > > > > > > > > > > > > > On May 15, 2019, at 3:33 PM, Dan Rue < dan.rue@linaro.org > wrote: > > > > > > > > > > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote: > > > > >> > > > > >> > > > > >>> -----Original Message----- > > > > >>> From: Veronika Kabatova > > > > >>> > > > > >>> Hi, > > > > >>> > > > > >>> as we know from this list, there's plenty CI systems doing some > > > >>> testing > > > > >>> on > > > > >>> the > > > > >>> upstream kernels (and maybe some others we don't know about). > > > > >>> > > > > >>> It would be great if there was a single common place where all the CI > > > > >>> systems > > > > >>> can put their results. 
This would make it much easier for the kernel > > > > >>> maintainers and developers to see testing status since they only need > > > >>> to > > > > >>> check one place instead of having a list of sites/mailing lists where > > > > >>> each CI > > > > >>> posts their contributions. > > > > >>> > > > > >>> > > > > >>> A few weeks ago, with some people we've been talking about > > > >>> kernelci.org > > > > >>> being > > > > >>> in a good place to act as the central upstream kernel CI piece that > > > >>> most > > > > >>> maintainers already know about. So I'm wondering if it would be > > > >>> possible > > > > >>> for > > > > >>> kernelci to also act as an aggregator of all results? There's already > > > >>> an > > > > >>> API > > > > >>> for publishing a report [0] so it shouldn't be too hard to adjust it > > > >>> to > > > > >>> handle and show more information. I also found the beta version for > > > >>> test > > > > >>> results [1] so actually, most of the needed functionality seems to be > > > > >>> already > > > > >>> there. Since there will be multiple CI systems, the source and > > > >>> contact > > > > >>> point > > > > >>> for the contributor (so maintainers know whom to ask about results if > > > > >>> needed) > > > > >>> would likely be the only missing essential data point. > > > > >>> > > > > >>> > > > > >>> The common place for results would also make it easier for new CI > > > >>> systems > > > > >>> to > > > > >>> get involved with upstream. There are likely other companies out > > > >>> there > > > > >>> running > > > > >>> some tests on kernel internally but don't publish the results > > > >>> anywhere. > > > > >>> Only > > > > >>> adding some API calls into their code (with the data they are allowed > > > >>> to > > > > >>> publish) would make it very simple for them to start contributing. If > > > >>> we > > > > >>> want > > > > >>> to make them interested, the starting point needs to be trivial. 
> > > > >>> Different > > > > >>> companies have different setups and policies and they might not be > > > >>> able > > > > >>> to > > > > >>> fulfill arbitrary requirements so they opt to not get involved at > > > >>> all, > > > > >>> which > > > > >>> is a shame because their results can be useful. After the initial > > > > >>> "onboarding" > > > > >>> step they might be willing to contribute more and more too. > > > > >>> > > > > >>> > > > > >>> Please let me know if the idea makes sense or if something similar is > > > > >>> already > > > > >>> in plans. I'd be happy to contribute to the effort because I believe > > > >>> it > > > > >>> would > > > > >>> make everyone's life easier and we'd all benefit from it (and maybe > > > > >>> someone > > > > >>> else from my team would be willing to help out too if needed). > > > > >> > > > > >> I never responded to this, > > > > > > > > > > yea, you did. ;) > > > > > > > > > >> but this sounds like a really good idea to me. I don't care much which > > > > >> backend we aggregate to, but it would be good as a community to start > > > > >> using one service to start with. It would help to find issues with > > > > >> the API, or the results schema, if multiple people started using it. > > > > >> > > > > >> I know that people using Fuego are sending data to their own instances > > > > >> of KernelCI. But I don't know what the issues are for sending this > > > > >> data to a shared KernelCI service. > > > > >> > > > > >> I would be interested in hooking up my lab to send Fuego results to > > > > >> KernelCI. This would be a good exercise. I'm not sure what the next > > > > >> steps would be, but maybe we could discuss this on the next automated > > > > >> testing conference call. > > > > > > > > > > OK here's my idea. > > > > > > > > > > I don't personally think kernelci (or LKFT) are set up to aggregate > > > > > results currently. 
> > > > > We have too many assumptions about where tests are coming from, how
> > > > > things are built, etc. In other words, dealing with noisy data is
> > > > > going to be non-trivial in any existing project.
> > > >
> > > > I completely agree.
> > >
> > > This is a good point. I'm totally fine with having a separate
> > > independent place for aggregation.
> > >
> > > > > I would propose aggregating data into something like google's
> > > > > BigQuery. This has a few benefits:
> > > > > - Non-opinionated place to hold structured data
> > > > > - Allows many downstream use-cases
> > > > > - Managed hosting, and data is publicly available
> > > > > - Storage is sponsored by google as a part of
> > > > >   https://cloud.google.com/bigquery/public-data/
> > > > > - First 1TB of query per 'project' is free, and users pay for more
> > > > >   queries than that
> > > >
> > > > I very much like this idea. I do lots of android kernel testing and
> > > > being able to work with / compare / contribute to what is essentially
> > > > a pile of data in BQ would be great. As an end user working with the
> > > > data I'd also have lots of dashboard options to customize and share
> > > > queries with others.
> > > >
> > > > > With storage taken care of, how do we get the data in?
> > > > >
> > > > > First, we'll need some canonical data structure defined. I would
> > > > > approach defining the canonical structure in conjunction with the
> > > > > first few projects that are interested in contributing their
> > > > > results. Each project will have an ETL pipeline which will extract
> > > > > the test results from a given project (such as kernelci, lkft, etc),
> > > > > translate it into the canonical data structure, and load it into the
> > > > > google bigquery dataset at a regular interval or in real-time. The
> > > > > translation layer is where things like test names are handled.
> > >
> > > +1, exactly how I imagined this part.
> > >
> > > > Exactly.
> > > > I would hope that the various projects that are producing data would
> > > > be motivated to plug in. After all, it makes the data they are
> > > > producing more useful and available to a larger group of people.
> > > >
> > > > > The things this leaves me wanting are:
> > > > > - raw data storage. It would be nice if raw data were stored
> > > > >   somewhere permanent in some intermediary place so that later
> > > > >   implementations could happen, and for data that doesn't fit into
> > > > >   whatever structure we end up with.
> > > >
> > > > I agree.
> > >
> > > +1
> > >
> > > > > - time, to actually try it and find the gaps. This is just an idea
> > > > >   I've been thinking about. Anyone with experience here that can
> > > > >   help flesh this out?
> > > >
> > > > I'm willing to lend a hand.
> > >
> > > Thanks for starting up a specific proposal! I agree with everything
> > > that was brought up. I'll try to find time to participate in the
> > > implementation part too (although my experience with data storage is...
> > > limited, I should be able to help out with the structure prototyping
> > > and maybe other parts too).
>
> This all sounds great: having a common location to store the results that
> is scalable, and definitions of test case names. However, there is a
> whole layer of logic above and around this which KernelCI does, and I'm
> sure other CI systems also do with some degree of overlap. So it seems to
> me that solving how to deal with the results is only one piece in the
> puzzle to get a common CI architecture for upstream kernel testing.
> Sorry I'm a bit late to the party so I'll add my 2¢ here...
> Around the end of last year I made this document and mentioned it on
> this list, about making KernelCI more modular:
>
> https://docs.google.com/document/d/15F42HdHTO6NbSL53_iLl77lfe1XQKdWaHAf7XCNkKD8/edit?usp=sharing
> https://groups.io/g/kernelci/topic/kernelci_modular_pipeline/29692355

I have a plan to read this document but still didn't get around to it :(

> The idea is to make it possible to have alternative components in the
> KernelCI "pipeline". Right now, KernelCI has these components:
>
> * Jenkins job to monitor git branches and build kernels
> * LAVA to run tests
> * Custom backend and storage server to keep binaries and data
> * Custom web frontend to show the results
>
> They could all be replaced or used in conjunction with alternative build
> systems, database engines, test lab schedulers and dashboards. The key
> thing is the code orchestrating all this, which is kept in the
> kernelci-core repository.
>
> For example, when a change has been detected in a tree, rather than
> triggering kernel builds on Jenkins there could be a request sent to
> another build system to do that elsewhere. Likewise, when some builds
> are ready to be tested, jobs could be scheduled in non-LAVA labs simply
> by sending another kind of HTTP request than the ones we're currently
> sending to the LAVA APIs. This could be easily described in some config
> files; in fact we already have one with the list of labs to submit jobs
> to. Builds and tests are configured in YAML files in KernelCI, which
> could easily be extended too with new attributes.
>
> The big advantage of having a central way to orchestrate all this is
> that results are going to be consistent and higher-level features can be
> enabled: each tree will be sampled at the same commit, so we don't end
> up with one CI lab running a version of mainline a few patches later
> than another one etc...
> It means we can expand some KernelCI features to a larger ecosystem of
> CI labs, such as:
>
> * redundancy checks when the same test / hardware is tested in multiple
>   places (say, if a RPi fails to boot in a LAVA lab but not in CKI's
>   lab...)
> * regression tracking across the whole spectrum of CI labs
> * common reports for each single kernel revision being tested
> * bisections extended to non-LAVA labs
>
> It feels like a diagram would be needed to really give an idea of how
> this would work. APIs and callback mechanisms would need to be well
> defined to have clear entry points for the various components in a
> modular system like this. I think we would be able to reuse some of the
> things currently used by KernelCI and improve them, taking into account
> what other CI labs have been doing (LKFT, CKI...).
>
> I'm only scratching the surface here, but I wanted to raise this point
> to see if others shared the same vision. It would be unfortunate if we
> came up with a great solution focused on results, but then realised that
> it had big design limitations when trying to add more abstract
> functionality across all the contributing CI labs.

This is definitely something I'd love to see in the (likely very far)
future too. However, I'd say that it would require changes in the CI
systems and not necessarily in the result format / way of displaying the
data, which is what we are trying to set up and agree on here. Each of
the CI systems in question would be responsible for their API calls and
the receiving side would just validate what it got, and this is likely
not something that would change even in the far future when we get more
integrated. I agree with all the points you made and we should definitely
keep them in mind going forward, but unless I overlooked something (which
is totally possible :) these two things don't depend on each other.

> Well, I tried to keep it short - hope this makes any sense.

It does.
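The redundancy checks mentioned above (the same test on the same hardware reported from several labs) become a simple aggregation query once results share a common structure. A rough sketch, with every field name and lab name invented for illustration:

```python
from collections import defaultdict

def find_lab_disagreements(results):
    # Group aggregated results by (test, hardware) and flag any
    # combination where labs report different outcomes -- e.g. a board
    # that boots in one LAVA lab but fails to boot in CKI's lab.
    by_key = defaultdict(dict)  # (test, hardware) -> {lab: status}
    for r in results:
        by_key[(r["test_name"], r["hardware"])][r["lab"]] = r["status"]
    return {key: labs for key, labs in by_key.items()
            if len(set(labs.values())) > 1}

results = [
    {"lab": "lava-lab-1", "test_name": "boot", "hardware": "rpi3", "status": "fail"},
    {"lab": "cki",        "test_name": "boot", "hardware": "rpi3", "status": "pass"},
    {"lab": "lava-lab-1", "test_name": "boot", "hardware": "x86",  "status": "pass"},
]
print(sorted(find_lab_disagreements(results)))  # → [('boot', 'rpi3')]
```

In practice this would be a query over the shared dataset rather than an in-memory pass, but the grouping logic is the same.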
I really have to find some time to get through that doc.

Veronika

> Cheers,
> Guillaume

^ permalink raw reply	[flat|nested] 15+ messages in thread
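The "translate" stage of the ETL pipeline proposed earlier in the thread might look roughly like the sketch below. No canonical schema had been agreed at this point, so every field name here, and the kernelci-style input record, is hypothetical:

```python
# Hypothetical sketch of one project's ETL "translate" step: map a
# CI-system-specific result record into a canonical structure before
# loading it into the shared dataset. All field names are invented.

def translate_kernelci_result(raw):
    """Map a (hypothetical) kernelci-style record to the canonical form."""
    status_map = {"PASS": "pass", "FAIL": "fail"}
    return {
        "origin": "kernelci",            # which CI system produced this
        "kernel_revision": raw["git_commit"],
        "tree": raw["git_branch"],
        "test_name": raw["test_case"],   # name normalization happens here
        "hardware": raw.get("board", "unknown"),
        # unknown statuses fall back to "skip", purely for illustration
        "status": status_map.get(raw["result"], "skip"),
    }

raw = {
    "git_commit": "a1b2c3d",
    "git_branch": "mainline/master",
    "test_case": "baseline.dmesg",
    "result": "PASS",
}
row = translate_kernelci_result(raw)
print(row["status"])  # → pass
```

Each contributing project would maintain its own translator like this, which is also where per-project quirks (test naming, missing fields) get absorbed before the load step.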
* Re: [Automated-testing] A common place for CI results?
  2019-05-15 20:33 ` Dan Rue
  2019-05-15 21:06 ` Tom Gall
@ 2019-05-15 22:58 ` Carlos Hernandez
  2019-05-16 12:05 ` Mark Brown
  1 sibling, 1 reply; 15+ messages in thread
From: Carlos Hernandez @ 2019-05-15 22:58 UTC (permalink / raw)
To: Dan Rue, kernelci, Tim.Bird; +Cc: info, automated-testing

On 5/15/19 4:33 PM, Dan Rue wrote:
> OK here's my idea.
>
> I don't personally think kernelci (or LKFT) are set up to aggregate
> results currently. We have too many assumptions about where tests are
> coming from, how things are built, etc. In other words, dealing with
> noisy data is going to be non-trivial in any existing project.
>
> I would propose aggregating data into something like google's BigQuery.
> This has a few benefits:
> - Non-opinionated place to hold structured data
> - Allows many downstream use-cases
> - Managed hosting, and data is publicly available
> - Storage is sponsored by google as a part of
>   https://cloud.google.com/bigquery/public-data/
> - First 1TB of query per 'project' is free, and users pay for more
>   queries than that
>
> With storage taken care of, how do we get the data in?
>
> First, we'll need some canonical data structure defined. I would
> approach defining the canonical structure in conjunction with the first
> few projects that are interested in contributing their results. Each
> project will have an ETL pipeline which will extract the test results
> from a given project (such as kernelci, lkft, etc), translate it into
> the canonical data structure, and load it into the google bigquery
> dataset at a regular interval or in real-time. The translation layer is
> where things like test names are handled.

+1

I like the idea

> The things this leaves me wanting are:
> - raw data storage.
>   It would be nice if raw data were stored somewhere permanent in some
>   intermediary place so that later implementations could happen, and
>   for data that doesn't fit into whatever structure we end up with.

If required, we could setup a related table w/ raw data. I believe max
cell size ~ 100MB per https://cloud.google.com/bigquery/quotas

However, another approach could be to define the structure version in
the schema. New fields can be added and left blank for old data.

> - time, to actually try it and find the gaps. This is just an idea I've
>   been thinking about. Anyone with experience here that can help flesh
>   this out?
>
> Dan

--
Carlos
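The structure-version idea above could be sketched as follows: each row records the schema version it was written with, and newer fields are left blank for older rows. Version numbers and field names here are made up:

```python
# Hypothetical versioned result schema: each version adds fields, and
# rows written under an older version are upgraded on read by filling
# the fields they lack with None, rather than rewriting stored data.
SCHEMA_FIELDS = {
    1: ["kernel_revision", "test_name", "status"],
    2: ["kernel_revision", "test_name", "status", "contact"],  # v2 adds a contact point
}
LATEST = max(SCHEMA_FIELDS)

def upgrade_row(row):
    """Return a copy of the row matching the latest schema, with fields
    unknown at its original version left as None."""
    upgraded = {field: row.get(field) for field in SCHEMA_FIELDS[LATEST]}
    upgraded["schema_version"] = LATEST
    return upgraded

old = {"schema_version": 1, "kernel_revision": "a1b2c3d",
       "test_name": "boot", "status": "pass"}
print(upgrade_row(old)["contact"])  # → None
```

This keeps old data queryable without migration, at the cost Mark notes in his reply: every consumer has to know that blanks may mean "predates this field".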
* Re: [Automated-testing] A common place for CI results?
  2019-05-15 22:58 ` [Automated-testing] " Carlos Hernandez
@ 2019-05-16 12:05 ` Mark Brown
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Brown @ 2019-05-16 12:05 UTC (permalink / raw)
To: Carlos Hernandez; +Cc: Dan Rue, kernelci, Tim.Bird, automated-testing, info

On Wed, May 15, 2019 at 06:58:04PM -0400, Carlos Hernandez wrote:
> On 5/15/19 4:33 PM, Dan Rue wrote:

> > This has a few benefits:
> > - Non-opinionated place to hold structured data

Of course structure is opinion :/

> +1
> I like the idea

Me too.

> > The things this leaves me wanting are:
> > - raw data storage. It would be nice if raw data were stored somewhere
> >   permanent in some intermediary place so that later implementations
> >   could happen, and for data that doesn't fit into whatever structure
> >   we end up with.

> If required, we could setup a related table w/ raw data. I believe max
> cell size ~ 100MB per https://cloud.google.com/bigquery/quotas

> However, another approach could be to define the structure version in
> the schema. New fields can be added and left blank for old data.

Versioned structures do make tooling to use the data more difficult to
implement. I think Dan's idea is good, especially early on when things
are being tried for the first time.
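Dan's wish for permanent raw data storage and Carlos's related-table suggestion, both quoted above, could combine into something like the sketch below: structured rows in one table, the untouched original payload in another, linked by an id. All names are hypothetical, and plain dicts stand in for the real tables:

```python
import hashlib
import json

canonical_rows = []  # stand-in for the structured table queries run against
raw_payloads = {}    # stand-in for the related "raw" table: id -> payload

def load_result(canonical, raw_payload):
    """Store the translated row, and keep the original payload verbatim
    so future schema changes can re-translate from the source data."""
    raw_json = json.dumps(raw_payload, sort_keys=True)
    result_id = hashlib.sha256(raw_json.encode()).hexdigest()[:12]
    canonical_rows.append({**canonical, "result_id": result_id})
    raw_payloads[result_id] = raw_json
    return result_id

rid = load_result(
    {"test_name": "boot", "status": "pass"},
    {"vendor_specific_field": 42, "log": "ok"},  # data with no canonical home
)
print(rid in raw_payloads)  # → True
```

Since the raw side is never reshaped, it also sidesteps the versioned-structure tooling cost: consumers query the canonical table, and only a re-translation job ever touches the raw one.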
end of thread, other threads:[~2019-05-28 14:45 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <299272045.11819252.1554465036421.JavaMail.zimbra@redhat.com>
2019-04-05 14:41 ` A common place for CI results? Veronika Kabatova
2019-04-08 22:16   ` Tim.Bird
2019-04-09 13:41   ` Guenter Roeck
2019-04-10  9:28   ` [Automated-testing] " Mark Brown
2019-04-10 17:47   ` Veronika Kabatova
2019-04-10 21:13   ` [Automated-testing] " Kevin Hilman
2019-04-11 16:02   ` Veronika Kabatova
2019-05-14 23:01 ` Tim.Bird
2019-05-15 20:33   ` Dan Rue
2019-05-15 21:06   ` Tom Gall
2019-05-20 15:32   ` Veronika Kabatova
2019-05-28  8:24   ` Guillaume Tucker
2019-05-28 14:45   ` Veronika Kabatova
2019-05-15 22:58 ` [Automated-testing] " Carlos Hernandez
2019-05-16 12:05   ` Mark Brown