From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 20 May 2019 11:32:09 -0400 (EDT)
From: Veronika Kabatova <vkabatov@redhat.com>
Message-ID: <1070842434.21130927.1558366329557.JavaMail.zimbra@redhat.com>
In-Reply-To: <A0783550-C80A-4037-9B18-256AEA3CDAA7@linaro.org>
References: <299272045.11819252.1554465036421.JavaMail.zimbra@redhat.com> <457016061.11846096.1554475313006.JavaMail.zimbra@redhat.com> <ECADFF3FD767C149AD96A924E7EA6EAF977191AE@USCULXMSG01.am.sony.com> <20190515203334.lbhpqpdyulqztr5c@xps.therub.org> <A0783550-C80A-4037-9B18-256AEA3CDAA7@linaro.org>
Subject: Re: A common place for CI results?
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
List-ID: <kernelci.groups.io>
To: Tom Gall <tom.gall@linaro.org>, Dan Rue <dan.rue@linaro.org>, Tim Bird <Tim.Bird@sony.com>
Cc: kernelci@groups.io, automated-testing@yoctoproject.org, info@kernelci.org


----- Original Message -----
> From: "Tom Gall" <tom.gall@linaro.org>
> To: kernelci@groups.io, "Dan Rue" <dan.rue@linaro.org>
> Cc: "Tim Bird" <Tim.Bird@sony.com>, vkabatov@redhat.com, automated-testing@yoctoproject.org, info@kernelci.org
> Sent: Wednesday, May 15, 2019 11:06:33 PM
> Subject: Re: A common place for CI results?
>
>
>
> > On May 15, 2019, at 3:33 PM, Dan Rue <dan.rue@linaro.org> wrote:
> >
> > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird@sony.com wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Veronika Kabatova
> >>>
> >>> Hi,
> >>>
> >>> as we know from this list, there are plenty of CI systems doing some
> >>> testing on the upstream kernels (and maybe some others we don't know
> >>> about).
> >>>
> >>> It would be great if there was a single common place where all the CI
> >>> systems can put their results. This would make it much easier for the
> >>> kernel maintainers and developers to see testing status since they only
> >>> need to check one place instead of having a list of sites/mailing lists
> >>> where each CI posts their contributions.
> >>>
> >>>
> >>> A few weeks ago, some of us were talking about kernelci.org being in a
> >>> good place to act as the central upstream kernel CI piece that most
> >>> maintainers already know about. So I'm wondering if it would be possible
> >>> for kernelci to also act as an aggregator of all results? There's
> >>> already an API for publishing a report [0] so it shouldn't be too hard
> >>> to adjust it to handle and show more information. I also found the beta
> >>> version for test results [1] so actually, most of the needed
> >>> functionality seems to be already there. Since there will be multiple CI
> >>> systems, the source and contact point for the contributor (so
> >>> maintainers know whom to ask about results if needed) would likely be
> >>> the only missing essential data point.
> >>>
> >>>
> >>> The common place for results would also make it easier for new CI
> >>> systems to get involved with upstream. There are likely other companies
> >>> out there running some tests on the kernel internally that don't publish
> >>> the results anywhere. Adding only a few API calls to their code (with
> >>> the data they are allowed to publish) would make it very simple for them
> >>> to start contributing. If we want to make them interested, the starting
> >>> point needs to be trivial. Different companies have different setups and
> >>> policies and they might not be able to fulfill arbitrary requirements so
> >>> they opt to not get involved at all, which is a shame because their
> >>> results can be useful. After the initial "onboarding" step they might be
> >>> willing to contribute more and more too.
> >>>
> >>>
> >>> Please let me know if the idea makes sense or if something similar is
> >>> already in plans. I'd be happy to contribute to the effort because I
> >>> believe it would make everyone's life easier and we'd all benefit from
> >>> it (and maybe someone else from my team would be willing to help out too
> >>> if needed).
> >>
> >> I never responded to this,
> >
> > yea, you did. ;)
> >
> >> but this sounds like a really good idea to me. I don't care much which
> >> backend we aggregate to, but it would be good as a community to start
> >> using one service to start with.  It would help to find issues with
> >> the API, or the results schema, if multiple people started using it.
> >>
> >> I know that people using Fuego are sending data to their own instances
> >> of KernelCI.  But I don't know what the issues are for sending this
> >> data to a shared KernelCI service.
> >>
> >> I would be interested in hooking up my lab to send Fuego results to
> >> KernelCI.  This would be a good exercise.  I'm not sure what the next
> >> steps would be, but maybe we could discuss this on the next automated
> >> testing conference call.
> >
> > OK here's my idea.
> >=20
> > I don't personally think kernelci (or LKFT) are set up to aggregate
> > results currently. We have too many assumptions about where tests are
> > coming from, how things are built, etc. In other words, dealing with
> > noisy data is going to be non-trivial in any existing project.
>
> I completely agree.
>

This is a good point. I'm totally fine with having a separate independent
place for aggregation.

> > I would propose aggregating data into something like Google's BigQuery.
> > This has a few benefits:
> > - Non-opinionated place to hold structured data
> > - Allows many downstream use-cases
> > - Managed hosting, and data is publicly available
> > - Storage is sponsored by Google as a part of
> >  https://cloud.google.com/bigquery/public-data/
> > - First 1TB of queries per 'project' is free, and users pay for more
> >  queries than that
>
> I very much like this idea. I do lots of Android kernel testing
> and being able to work with / compare / contribute to what
> is essentially a pile of data in BQ would be great. As an
> end user working with the data I’d also have lots of dashboard
> options to customize and share queries with others.
>
> > With storage taken care of, how do we get the data in?
>
> > First, we'll need some canonical data structure defined. I would
> > approach defining the canonical structure in conjunction with the first
> > few projects that are interested in contributing their results. Each
> > project will have an ETL pipeline which will extract the test results
> > from a given project (such as kernelci, lkft, etc), translate it into
> > the canonical data structure, and load it into the Google BigQuery
> > dataset at a regular interval or in real-time. The translation layer is
> > where things like test names are handled.
>

+1, exactly how I imagined this part.
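
To make the translation step a bit more concrete, here is a rough Python
sketch of what one project's pipeline could look like. Everything in it is
an assumption for illustration only: the field names, the "example-ci-a" and
"example-ci-b" origins and the test name maps are made up, and the real
canonical structure still has to be agreed on with the first contributing
projects. The output is newline-delimited JSON, which is one of the formats
BigQuery can load.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CanonicalResult:
    """One test result in a hypothetical canonical schema (all fields assumed)."""
    origin: str           # CI system that produced the result
    contact: str          # whom maintainers can ask about the result
    kernel_revision: str  # what was tested
    test_name: str        # name after normalization
    status: str           # lower-cased outcome, e.g. "pass" or "fail"


# Made-up per-origin name maps; the translation layer is where each project's
# own test names get mapped onto shared ones.
TEST_NAME_MAP = {
    "example-ci-a": {"boot-test": "boot"},
    "example-ci-b": {"Boot test": "boot"},
}


def translate(origin, contact, raw):
    """Translate one project-specific result dict into the canonical form."""
    names = TEST_NAME_MAP.get(origin, {})
    return CanonicalResult(
        origin=origin,
        contact=contact,
        kernel_revision=raw["revision"],
        # Unknown names pass through unchanged so nothing is silently dropped.
        test_name=names.get(raw["name"], raw["name"]),
        status=raw["result"].lower(),
    )


def to_ndjson(results):
    """Serialize results as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in results)
```

Keeping the origin and contact on every row is deliberate in this sketch, so
that a maintainer looking at a single result can still tell who to ask.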

> Exactly. I would hope that the various projects that are producing
> data would be motivated to plug in. After all, it makes the data
> they are producing more useful and available to a larger group
> of people.
>
> > The things this leaves me wanting are:
> > - raw data storage. It would be nice if raw data were stored somewhere
> >  permanent in some intermediary place so that later implementations
> >  could happen, and for data that doesn't fit into whatever structure we
> >  end up with.
>
> I agree.

+1
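
One possible way to do the raw storage part, as a minimal sketch only: keep
each raw payload untouched under its SHA-256 digest and let the structured
record carry that digest, so data can be re-translated later if the structure
changes. The directory layout, the ".raw" suffix and the function names are
assumptions I made up for the example, and a real setup would more likely
use some object storage than a local directory.

```python
import hashlib
from pathlib import Path


def archive_raw(payload: bytes, origin: str, root: Path) -> str:
    """Store a raw result payload unchanged and return its SHA-256 digest.

    The digest doubles as the file name, so archiving the same payload
    twice does not create a second copy.
    """
    digest = hashlib.sha256(payload).hexdigest()
    path = root / origin / f"{digest}.raw"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(payload)
    return digest


def load_raw(digest: str, origin: str, root: Path) -> bytes:
    """Read back a payload previously stored by archive_raw()."""
    return (root / origin / f"{digest}.raw").read_bytes()
```

The returned digest could then be stored as an extra column next to the
translated result.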

>
> > - time, to actually try it and find the gaps. This is just an idea I've
> >  been thinking about. Anyone with experience here that can help flesh
> >  this out?
>
> I’m willing to lend a hand.
>

Thanks for starting up a specific proposal! I agree with everything that was
brought up. I'll try to find time to participate in the implementation part
too (although my experience with data storage is limited, I should be able
to help out with the structure prototyping and maybe other parts too).


Thanks again,

Veronika
CKI Project

> > Dan
> >
> > --
> > Linaro - Kernel Validation
>
> Tom
>
> --
> Directory, Linaro Consumer Group
>
>