From: Carlos Hernandez <ceh@ti.com>
To: Dan Rue <dan.rue@linaro.org>, kernelci@groups.io, Tim.Bird@sony.com
Cc: info@kernelci.org, automated-testing@yoctoproject.org
Subject: Re: [Automated-testing] A common place for CI results?
Date: Wed, 15 May 2019 18:58:04 -0400
Message-ID: <516df501-82c2-67d5-890e-5a4274c7ce35@ti.com>
In-Reply-To: <20190515203334.lbhpqpdyulqztr5c@xps.therub.org>

On 5/15/19 4:33 PM, Dan Rue wrote:
> OK here's my idea.
>
> I don't personally think kernelci (or LKFT) are set up to aggregate
> results currently. We have too many assumptions about where tests are
> coming from, how things are built, etc. In other words, dealing with
> noisy data is going to be non-trivial in any existing project.
>
> I would propose aggregating data into something like google's BigQuery.
> This has a few benefits:
> - Non-opinionated place to hold structured data
> - Allows many downstream use-cases
> - Managed hosting, and data is publicly available
> - Storage is sponsored by google as a part of
>    https://cloud.google.com/bigquery/public-data/
> - First 1TB of query per 'project' is free, and users pay for more
>    queries than that
>
> With storage taken care of, how do we get the data in?
>
> First, we'll need some canonical data structure defined. I would
> approach defining the canonical structure in conjunction with the first
> few projects that are interested in contributing their results. Each
> project will have an ETL pipeline which will extract the test results
> from a given project (such as kernelci, lkft, etc), translate it into
> the canonical data structure, and load it into the google bigquery
> dataset at a regular interval or in real-time. The translation layer is
> where things like test names are handled.

+1

I like the idea
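
To make the translation layer concrete, here is a rough sketch in Python.
Everything in it is an assumption for illustration: the canonical field
names, the shape of the incoming record, and the name/status maps are all
made up, and none of it reflects the actual kernelci or LKFT formats.

```python
from dataclasses import dataclass, asdict


@dataclass
class CanonicalResult:
    """Hypothetical canonical record; the real fields are still to be defined."""
    origin: str          # contributing project, e.g. "lkft"
    test_name: str       # canonical test name after translation
    status: str          # normalized to PASS / FAIL / SKIP / UNKNOWN
    kernel_version: str  # e.g. output of `git describe`


# Hypothetical per-project lookup tables owned by the translation layer.
LKFT_NAME_MAP = {"ltp-syscalls-tests/read01": "ltp.syscalls.read01"}
STATUS_MAP = {"pass": "PASS", "fail": "FAIL", "skip": "SKIP"}


def translate_lkft(raw: dict) -> dict:
    """Translate one (made-up) project-specific record into a canonical row."""
    project_name = f"{raw['suite']}/{raw['name']}"
    result = CanonicalResult(
        origin="lkft",
        # Fall back to the project's own name when no mapping exists yet.
        test_name=LKFT_NAME_MAP.get(project_name, project_name),
        status=STATUS_MAP.get(raw["result"].lower(), "UNKNOWN"),
        kernel_version=raw["git_describe"],
    )
    # A plain dict is easy to serialize for whatever loader is used.
    return asdict(result)
```

Each contributing project would own a function like translate_lkft(), so
test-name handling stays out of the shared dataset.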

>
> The things this leaves me wanting are:
> - raw data storage. It would be nice if raw data were stored somewhere
>    permanent in some intermediary place so that later implementations
>    could happen, and for data that doesn't fit into whatever structure we
>    end up with.

If required, we could set up a related table with the raw data. I believe
the maximum cell size is ~100 MB, per https://cloud.google.com/bigquery/quotas

However, another approach could be to record the structure version in
the schema itself. New fields can then be added and left blank for old data.
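
A minimal sketch of the versioning idea, in Python. The column names and
the two schema versions are assumptions made up for illustration; the
only point is that old rows keep their original version and get blanks
(None) for fields that did not exist yet.

```python
# Hypothetical schema history: version 2 adds a "duration_s" column.
FIELDS_BY_VERSION = {
    1: ["origin", "test_name", "status"],
    2: ["origin", "test_name", "status", "duration_s"],
}
CURRENT_VERSION = max(FIELDS_BY_VERSION)


def normalize(row: dict) -> dict:
    """Return the row in the current layout, leaving new fields blank."""
    # Rows written before versioning existed are treated as version 1.
    version = row.get("schema_version", 1)
    if version not in FIELDS_BY_VERSION:
        raise ValueError(f"unknown schema_version: {version}")
    # Fields the old row never had come back as None (blank).
    out = {field: row.get(field) for field in FIELDS_BY_VERSION[CURRENT_VERSION]}
    # Keep the original version so consumers can tell why a field is blank.
    out["schema_version"] = version
    return out
```

With this, a consumer can distinguish "not recorded in that schema
version" from "recorded but empty" by looking at schema_version.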

> - time, to actually try it and find the gaps. This is just an idea I've
>    been thinking about. Anyone with experience here that can help flesh
>    this out?
>
> Dan

-- 
Carlos

