Re: [Automated-testing] A common place for CI results?

From: Carlos Hernandez <ceh@ti.com>
To: Dan Rue <dan.rue@linaro.org>, kernelci@groups.io, Tim.Bird@sony.com
Cc: info@kernelci.org, automated-testing@yoctoproject.org
Subject: Re: [Automated-testing] A common place for CI results?
Date: Wed, 15 May 2019 18:58:04 -0400
Message-ID: <516df501-82c2-67d5-890e-5a4274c7ce35@ti.com>
In-Reply-To: <20190515203334.lbhpqpdyulqztr5c@xps.therub.org>

On 5/15/19 4:33 PM, Dan Rue wrote:
> OK here's my idea.
>
> I don't personally think kernelci (or LKFT) are set up to aggregate
> results currently. We have too many assumptions about where tests are
> coming from, how things are built, etc. In other words, dealing with
> noisy data is going to be non-trivial in any existing project.
>
> I would propose aggregating data into something like google's BigQuery.
> This has a few benefits:
> - Non-opinionated place to hold structured data
> - Allows many downstream use-cases
> - Managed hosting, and data is publicly available
> - Storage is sponsored by google as a part of
>    https://cloud.google.com/bigquery/public-data/
> - First 1TB of query per 'project' is free, and users pay for more
>    queries than that
>
> With storage taken care of, how do we get the data in?
>
> First, we'll need some canonical data structure defined. I would
> approach defining the canonical structure in conjunction with the first
> few projects that are interested in contributing their results. Each
> project will have an ETL pipeline which will extract the test results
> from a given project (such as kernelci, lkft, etc), translate it into
> the canonical data structure, and load it into the google bigquery
> dataset at a regular interval or in real-time. The translation layer is
> where things like test names are handled.

+1

I like the idea.
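
A rough sketch of the per-project translation step described above, in
Python. Everything here is an assumption made for illustration: the
canonical structure had not been defined at this point, so the record
fields, the NAME_MAP table, and the test names are hypothetical. The
extract and load steps are left out.

```python
from dataclasses import dataclass, asdict

# Hypothetical canonical record; the real structure was still to be defined.
@dataclass
class CanonicalResult:
    origin: str      # contributing project, e.g. "kernelci" or "lkft"
    test_name: str   # normalized test name
    status: str      # lowercased result, e.g. "pass", "fail", "skip"

# Hypothetical per-project name mappings; this is the translation layer
# "where things like test names are handled".
NAME_MAP = {
    "lkft": {"ltp-syscalls-tests/openat01": "ltp.syscalls.openat01"},
}

def translate(origin, raw):
    """Translate one project-specific result dict into the canonical form."""
    # Fall back to the project's own name when no mapping is known.
    name = NAME_MAP.get(origin, {}).get(raw["name"], raw["name"])
    return CanonicalResult(origin=origin, test_name=name,
                           status=raw["result"].lower())

# Rows as plain dicts, ready to hand to whatever loader is chosen.
rows = [asdict(translate("lkft", {"name": "ltp-syscalls-tests/openat01",
                                  "result": "PASS"}))]
```

Keeping the name mapping as data rather than code would let each project
maintain its own table without touching the shared translation logic.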

>
> The things this leaves me wanting are:
> - raw data storage. It would be nice if raw data were stored somewhere
>    permanent in some intermediary place so that later implementations
>    could happen, and for data that doesn't fit into whatever structure we
>    end up with.

If required, we could set up a related table with the raw data. I believe
the max cell size is ~100MB per https://cloud.google.com/bigquery/quotas

However, another approach could be to record the structure version in
the schema. New fields can then be added and left blank for old data.
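
A minimal sketch of that versioning idea in Python, under assumptions:
the field names, the schema_version column, and the two versions shown
are hypothetical, and nothing here is specific to BigQuery.

```python
# Hypothetical field lists per structure version; newer versions only add fields.
SCHEMA_FIELDS = {
    1: ["origin", "test_name", "status"],
    2: ["origin", "test_name", "status", "duration_s"],
}
LATEST = max(SCHEMA_FIELDS)

def upgrade(row):
    """Shape a row for the latest structure, leaving new fields blank (None)."""
    out = {field: row.get(field) for field in SCHEMA_FIELDS[LATEST]}
    # Keep the version the row was written with so old data stays identifiable.
    out["schema_version"] = row.get("schema_version", 1)
    return out

old = {"schema_version": 1, "origin": "kernelci",
       "test_name": "boot", "status": "pass"}
upgraded = upgrade(old)  # duration_s is present but None for this old row
```

Because fields are only ever added, old rows never need rewriting; a
reader can use schema_version to tell "blank because old" apart from
"blank because not reported".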

> - time, to actually try it and find the gaps. This is just an idea I've
>    been thinking about. Anyone with experience here that can help flesh
>    this out?
>
> Dan

-- 
Carlos
