* KernelCI backend redesign and generic lab support
@ 2021-03-05 20:55 Guillaume Tucker
From: Guillaume Tucker @ 2021-03-05 20:55 UTC (permalink / raw)
To: Michał Gałka, ticotimo, Nikolai Kondrashov,
Michael Grzeschik, santiago.esteban, Jan Lübbe
Cc: kernelci, automated-testing
Hello,
As it has been mentioned multiple times recently, the
kernelci-backend code is ageing pretty badly: it's doing too
many things so it's hard to maintain, there are better ways to
implement a backend now with less code, and it's still Python
2.7. Also, there is a need to better support non-LAVA labs such
as Labgrid. Finally, in order to really implement a modular
KernelCI pipeline, we need a good messaging system to
orchestrate the different components - which is similar to
having a generic way to notify labs about tests to run. For all
these reasons, it's now time to seriously consider how we should
replace it with a better architecture.
I've gathered some ideas in this email regarding how we might go
about doing that. It seems like there are several people
motivated to help on different aspects of the work, so it would
be really great to organise this as a community development
effort.
Please feel free to share your thoughts about any of the points
below, and tell whether you're interested to take part in any of
it. If there appears to be enough interest, we should schedule
a meeting to kick-start this in a couple of weeks or so.
* Design ideas
* REST API to submit / retrieve data
* same idea as existing one but simplified implementation using jsonschema
* auth tokens but if possible using existing frameworks to simplify code
* interface to database
* same idea as now but with better models implementation
* pub/sub mechanism to coordinate pipeline with events
* new feature, framework to be decided (Cloud Events? Autobahn?)
* no logic in backend, only messages
* send notifications when things get added in database
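To make the "no logic in backend, only messages" idea concrete, here is a minimal in-process sketch, stdlib only and with all names hypothetical: the backend stores an object and emits an event, and client-side services simply subscribe.

```python
import json
import uuid

# Minimal in-process stand-in for the proposed pub/sub mechanism: the
# backend stores an object, then publishes a "created" event.  Client
# services (log parser, email reports, KCIDB bridge...) subscribe and
# react; the backend itself carries no application logic.
class PubSub:
    def __init__(self):
        self._subscribers = {}  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel, message):
        for callback in self._subscribers.get(channel, []):
            callback(message)

def store_and_notify(db, pubsub, obj):
    """Store an object and emit an event - no other logic here."""
    obj_id = str(uuid.uuid4())
    db[obj_id] = obj
    pubsub.publish('node', json.dumps({'op': 'created', 'id': obj_id}))
    return obj_id

db, bus, seen = {}, PubSub(), []
bus.subscribe('node', lambda msg: seen.append(json.loads(msg)))
store_and_notify(db, bus, {'kind': 'build', 'status': 'pass'})
```

In the real design the dispatcher would be an external broker (Redis pub/sub or similar) rather than in-process callbacks, but the division of responsibilities is the same.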
* Client side
Some features currently in kernelci-backend should be moved to client side
and rely on the pub/sub and API instead:
* LAVA callback handling (receive from LAVA, push via API)
* log parsing (subscribe to events, get log when notified, send results)
* email reports (subscribe to events, generate reports and send directly)
* KCIDB bridge (subscribe to events, forward to KCIDB API)
About getting tests to run in labs, this could then be unified
so that LAVA labs are handled in the same way as non-LAVA
ones. At the moment, the Jenkins pipeline knows when builds
are completed and directly schedules LAVA jobs to run.
Instead, we should have a service listening to events to know
when builds are available, and schedule LAVA jobs then. Other
labs could do that too, by receiving the same events but then
performing actions that are specific to their own
implementation. For common ones such as LabGrid and
Kubernetes, some code could be added to kernelci-core like we
currently have for LAVA to facilitate translating KernelCI
events into "lab dialects".
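As a sketch of that unification, a single service could subscribe to build events and fan them out to lab-specific handlers that each speak their own "dialect". Everything below (handler names, event fields) is hypothetical:

```python
# Hypothetical sketch of lab-agnostic scheduling: one service listens
# for "build complete" events and hands them to lab-specific handlers
# (LAVA, LabGrid, Kubernetes...) that translate them into native jobs.
def lava_submit(build):
    return f"lava job for {build['kernel']}"

def labgrid_submit(build):
    return f"labgrid job for {build['kernel']}"

LAB_HANDLERS = {'lava-lab': lava_submit, 'labgrid-lab': labgrid_submit}

def on_build_complete(event, labs=LAB_HANDLERS):
    """Translate one KernelCI build event into per-lab submissions."""
    if event.get('status') != 'complete':
        return []
    return [labs[name](event['build']) for name in sorted(labs)]

jobs = on_build_complete({
    'status': 'complete',
    'build': {'kernel': 'v5.12-rc1'},
})
```

The common translation helpers would live in kernelci-core, as the existing LAVA support does, so labs only implement the final submission step.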
About emails, we could also have a micro-service listening for
emails such as replies to reports previously sent (say, to
automatically change the status of a tracked regression...) or
for specific ones such as stable reviews.
* Implementation ideas
The current Python 2.7 implementation uses Tornado as the web
framework, Redis for object caching and locking, Celery for
asynchronous processing and interfaces with MongoDB. Here's
what I propose to do:
* start new design using Python 3.x (minor version TBD) using current one as
reference rather than doing a straight port
* keep Tornado as the web framework since it still has a good community and
is well suited for backend applications
* keep Redis for caching and locking, but also use it for the pub/sub
mechanism provided out of the box (we may host it on Azure)
* see if we really need to keep Celery when we have client-side services
* keep MongoDB as it's been working well for us, also to reduce the effort
with the new design and have the ability to directly import existing data
(we may host it in Azure)
* separate the "storage" server from the backend, as it currently relies on
it to be on the same host which is causing bad design and unnecessary
dependencies (the backend shouldn't even need to read anything from
storage, only client code would be doing this using URLs stored in the
database)
* use the "kernelci" Python package from kernelci-core to define common code
as appropriate such as YAML configuration handling and JSON schema
validation, to be shared between the backend and client code
* Schema
The current schema has worked well for many years, but it has
also become inconsistent and hard to maintain. For example,
the names of the fields are getting translated in several
places from "tree" to "job", from "kernel" to "git_describe",
from "build_environment" to "compiler"... So it needs a big
refresh.
Also, one important thing to consider would be to have common
object properties for all the database entries so we could
make a tree structure with them. For example, tests may
depend on other tests and also on builds, and also on
revisions. Pretty much like object inheritance, we could have
a basic "type" and then derivatives such as build and test.
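A hypothetical sketch of that common-base idea using plain dataclasses: every database entry shares the same base shape, and parent links form the dependency tree from revision down to individual tests.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch of the "common object properties" idea: every entry is a Node
# with a kind ("revision", "build", "test"...) and an optional parent,
# so revisions, builds and tests form one dependency tree.  Field names
# are illustrative, not a schema proposal.
@dataclass
class Node:
    kind: str
    name: str
    parent: Optional['Node'] = None

    def path(self):
        """Full path from the root revision down to this node."""
        prefix = self.parent.path() if self.parent else []
        return prefix + [self.name]

revision = Node('revision', 'v5.12-rc1')
build = Node('build', 'defconfig', parent=revision)
test = Node('test', 'baseline.login', parent=build)
```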
So I think we should take this opportunity to start with a new
schema design, taking inspiration from the current one and
what has been done with KCIDB in terms of content.
It would somehow relate to the YAML configuration where
dependencies should be better expressed (i.e. run this test
once this build has completed, and this other test once the
first test has completed...). This is the same dependency
tree as in the results, just without the runtime details and
actual results.
All this would deserve a discussion of its own, and I think we
should start with an over-simplified schema to get the
components up and running with the new design.
* Development
It would seem like the different pieces can be worked on in
parallel to some extent, so it would be good to create a
backlog on GitHub to define some high-level objectives
accordingly. Then people who are interested can assign issues
to themselves.
We should try to have this working in Docker from the start,
to make it easier for all contributors to have a compatible
environment and also to actually deploy it. We can run an
instance of it on staging.kernelci.org on an alternative
port from the current REST API.
I believe it should be fine to ignore the web frontend
initially; we can then adjust it to use the newly
designed API. We do, however, have to keep its use-case in
mind and the type of queries it would typically be making.
We may have a minimal frontend instance reworked with only
one view as a basic end-to-end test.
How does that all sound?
Have a good week-end!
Best wishes,
Guillaume
* Re: KernelCI backend redesign and generic lab support
From: Bjorn Andersson @ 2021-04-13 2:54 UTC (permalink / raw)
To: kernelci, guillaume.tucker
Cc: Michał Gałka, ticotimo, Nikolai Kondrashov, Michael Grzeschik,
santiago.esteban, Jan Lübbe, automated-testing
On Fri 05 Mar 14:55 CST 2021, Guillaume Tucker wrote:
> Hello,
>
Hi Guillaume,
Sorry it took me a while to give you some feedback on this.
> As it has been mentioned multiple times recently, the
> kernelci-backend code is ageing pretty badly: it's doing too
> many things so it's hard to maintain, there are better ways to
> implement a backend now with less code, and it's still Python
> 2.7. Also, there is a need to better support non-LAVA labs such
> as Labgrid. Finally, in order to really implement a modular
> KernelCI pipeline, we need a good messaging system to
> orchestrate the different components - which is similar to
> having a generic way to notify labs about tests to run. For all
> these reasons, it's now time to seriously consider how we should
> replace it with a better architecture.
>
> I've gathered some ideas in this email regarding how we might go
> about doing that. It seems like there are several people
> motivated to help on different aspects of the work, so it would
> be really great to organise this as a community development
> effort.
>
> Please feel free to share your thoughts about any of the points
> below, and tell whether you're interested to take part in any of
> it. If there appears to be enough interest, we should schedule
> a meeting to kick-start this in a couple of weeks or so.
>
>
> * Design ideas
>
> * REST API to submit / retrieve data
> * same idea as existing one but simplified implementation using jsonschema
> * auth tokens but if possible using existing frameworks to simplify code
>
> * interface to database
> * same idea as now but with better models implementation
>
> * pub/sub mechanism to coordinate pipeline with events
> * new feature, framework to be decided (Cloud Events? Autobahn?)
> * no logic in backend, only messages
> * send notifications when things get added in database
My current approach for lab-bjorn is to poll the REST api from time to
time for builds that match some search criteria relevant to my boards
and submit these builds to a RabbitMQ "topic" exchange. Then I have
individual jobs per board that consume these builds, run tests and
submit test results to a queue, which is finally consumed by a thing
that reports back using the REST api.
The scraper in the beginning works, but replacing it with a subscriber
model would feel like a better design. Perhaps RabbitMQ is too low
level? But the model would be nice to have.
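For reference, a RabbitMQ topic exchange routes messages by matching dotted routing keys against binding patterns where `*` matches exactly one word and `#` matches zero or more. The real broker does this server-side; a small stdlib sketch of the matching model:

```python
# Stand-in for RabbitMQ topic-exchange routing semantics: patterns are
# dot-separated words where '*' matches exactly one word and '#'
# matches zero or more.  Illustration only - the broker implements
# this matching itself when a queue is bound to a topic exchange.
def topic_matches(pattern, key):
    return _match(pattern.split('.'), key.split('.'))

def _match(pat, words):
    if not pat:
        return not words
    head, rest = pat[0], pat[1:]
    if head == '#':
        # '#' may consume any number of words, including none
        return any(_match(rest, words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    return (head == '*' or head == words[0]) and _match(rest, words[1:])
```

So a per-board consumer bound with e.g. `build.arm64.#` would only see builds relevant to it, which is close to the filtering Bjorn's scraper does by hand.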
>
>
> * Client side
>
> Some features currently in kernelci-backend should be moved to client side
> and rely on the pub/sub and API instead:
>
> * LAVA callback handling (receive from LAVA, push via API)
> * log parsing (subscribe to events, get log when notified, send results)
Since I moved to the REST api for reporting, instead of faking a LAVA
instance, I lost a few details - such as the LAVA parser generating html
logs. Nothing serious, but unifying the interface here would be good.
Regards,
Bjorn
* Re: KernelCI backend redesign and generic lab support
From: Guillaume Tucker @ 2021-10-19 11:42 UTC (permalink / raw)
To: Bjorn Andersson, kernelci
Cc: ticotimo, Nikolai Kondrashov, Michael Grzeschik,
santiago.esteban, automated-testing, Jan Lübbe,
Michał Gałka
On 13/04/2021 03:54, Bjorn Andersson wrote:
> On Fri 05 Mar 14:55 CST 2021, Guillaume Tucker wrote:
>
>> Hello,
>>
>
> Hi Guillaume,
>
> Sorry it took me a while to give you some feedback on this.
No worries, it's a long-term redesign :)
The good news is that we've now got something off the ground with
the new KernelCI API project:
https://github.com/kernelci/kernelci-api
See below some notes based on the initial ideas in this thread.
The proof-of-concept has helped shed some light on a few things,
now I think there's enough to start designing things in a way
that would overcome the limitations of the current
kernelci-backend.
It seems like a great opportunity for new people to start
contributing and to really make it a collaborative work. We
could have a hackfest dedicated to it in a few months' time.
There is also an Outreachy project about it, with a few
candidates already contributing:
https://www.outreachy.org/outreachy-december-2021-internship-round/communities/kernelci/create-new-kernelci-api/cfp/
While kernelci-backend is entirely monolithic, this new
architecture should be very modular with more logic on the client
side. This should facilitate people working on different things
in parallel.
Feel free to join any of the weekly calls to discuss this, every
Tuesday at 17:00 BST (https://meet.kernel.social/kernelci-dev).
>> As it has been mentioned multiple times recently, the
>> kernelci-backend code is ageing pretty badly: it's doing too
>> many things so it's hard to maintain, there are better ways to
>> implement a backend now with less code, and it's still Python
>> 2.7. Also, there is a need to better support non-LAVA labs such
>> as Labgrid. Finally, in order to really implement a modular
>> KernelCI pipeline, we need a good messaging system to
>> orchestrate the different components - which is similar to
>> having a generic way to notify labs about tests to run. For all
>> these reasons, it's now time to seriously consider how we should
>> replace it with a better architecture.
>>
>> I've gathered some ideas in this email regarding how we might go
>> about doing that. It seems like there are several people
>> motivated to help on different aspects of the work, so it would
>> be really great to organise this as a community development
>> effort.
>>
>> Please feel free to share your thoughts about any of the points
>> below, and tell whether you're interested to take part in any of
>> it. If there appears to be enough interest, we should schedule
>> a meeting to kick-start this in a couple of weeks or so.
>>
>>
>> * Design ideas
>>
>> * REST API to submit / retrieve data
>> * same idea as existing one but simplified implementation using jsonschema
Actually, the new design is using FastAPI which relies on
Pydantic and OpenAPI. This provides validation of the data
schema, automatically generated API documentation and
interoperability with other web services.
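As a taste of that validation layer, here is a minimal Pydantic model (field names hypothetical); FastAPI applies exactly this kind of validation to incoming request bodies and rejects invalid ones automatically.

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model sketch: in FastAPI, declaring a request body with
# a Pydantic model like this gets validation for free - a missing
# required field becomes an HTTP 422 without any handler code.
class Node(BaseModel):
    kind: str
    name: str
    status: str = 'pending'

node = Node(kind='build', name='defconfig')

try:
    Node(kind='build')  # missing required "name" field
    error = None
except ValidationError as exc:
    error = exc
```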
>> * auth tokens but if possible using existing frameworks to simplify code
FastAPI provides OAuth2 support, that's what the new design is
using. It's based on username / password accounts but tokens can
be used too (JWT). It also means we could use third-party
authentication e.g. GitHub...
>> * interface to database
>> * same idea as now but with better models implementation
That's where Pydantic comes into play, and it's a key part of
FastAPI which can directly validate incoming data and create
objects following Pydantic models.
Also, FastAPI relies on the asynchronous features provided
natively by Python 3 which we can use with Redis and Mongo DB via
the aioredis and motor Python packages. This means we can have
the same benefits as Celery but without the added complexity of
managing tasks "by hand". It also means the client request can
wait for the async task and get an HTTP error if it fails, without
blocking any backend threads. That's an advantage compared to
the current Celery-based solution, where the client gets an HTTP
202 right away when the task starts but never gets to know if the
task failed later on.
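That benefit can be sketched with plain asyncio; the stand-in function below replaces what would be an awaited motor or aioredis call, and all names are hypothetical.

```python
import asyncio

# Sketch of native async vs Celery: the handler awaits the database
# write directly, so a failure becomes an error response for this very
# request, instead of a background task failing after a 202 was
# already returned.  fake_db_insert stands in for an awaited motor call.
async def fake_db_insert(doc):
    await asyncio.sleep(0)
    if 'name' not in doc:
        raise ValueError('invalid document')
    return {'_id': 42, **doc}

async def handle_post(doc):
    try:
        stored = await fake_db_insert(doc)
        return 201, stored
    except ValueError as exc:
        return 400, {'error': str(exc)}  # the client sees the failure

ok = asyncio.run(handle_post({'name': 'defconfig'}))
bad = asyncio.run(handle_post({}))
```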
>> * pub/sub mechanism to coordinate pipeline with events
>> * new feature, framework to be decided (Cloud Events? Autobahn?)
I just made a PR for this:
https://github.com/kernelci/kernelci-api/pull/7
See the README on the incoming branch with some examples of how
to use it. It's based on Redis, but with the authentication
provided by FastAPI. It's using CloudEvents to format the
messages in a standard way, which should help interacting with
other web services that also use CloudEvents (I guess it's
becoming a standard but I don't know how widespread it is yet).
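For reference, a CloudEvents 1.0 message is just a small JSON envelope with a few required attributes; a stdlib-only sketch, where the `type` and `source` values are hypothetical:

```python
import json
import uuid
from datetime import datetime, timezone

# Sketch of a CloudEvents 1.0 JSON envelope: "specversion", "id",
# "source" and "type" are the required attributes per the spec; the
# actual type/source values here are made up for illustration.
def make_event(event_type, source, data):
    return json.dumps({
        'specversion': '1.0',
        'id': str(uuid.uuid4()),
        'type': event_type,
        'source': source,
        'time': datetime.now(timezone.utc).isoformat(),
        'data': data,
    })

msg = make_event('api.node.created', 'https://api.kernelci.org/',
                 {'kind': 'build', 'status': 'pass'})
event = json.loads(msg)
```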
>> * no logic in backend, only messages
By "no logic", this means keeping things such as email
notifications, regression tracking, job submission, log parsing
all on the client side. I think this is still a valid principle.
>> * send notifications when things get added in database
The idea is that the backend will provide some basic mechanism to
generate event messages when events occur (e.g. when some data is
being pushed to it) in a systematic way and without any actual
application logic. I believe this makes more sense than having
every client submit both data _and_ an event to say it has
submitted some data, since this should be entirely deterministic.
> My current approach for lab-bjorn is to poll the REST api from time to
> time for builds that match some search criteria relevant to my boards
> and submit these builds to a RabbitMQ "topic" exchange. Then I have
> individual jobs per board that consume these builds, run tests and
> submit test results to a queue, which is finally consumed by a thing
> that reports back using the REST api.
>
> The scraper in the beginning works, but replacing it with a subscriber
> model would feel like a better design. Perhaps RabbitMQ is too low
> level? But the model would be nice to have.
This is of course one of the main use-cases for the new pub/sub
mechanism. It should in fact be a generic way of triggering
anything: builds, tests, data post-processing, email reports...
Please feel free to compare the proposed solution based on Redis
and FastAPI with other frameworks, it's still very easy to move
things around at this stage of development.
>> * Client side
>>
>> Some features currently in kernelci-backend should be moved to client side
>> and rely on the pub/sub and API instead:
>>
>> * LAVA callback handling (receive from LAVA, push via API)
>> * log parsing (subscribe to events, get log when notified, send results)
As I already mentioned above, this still seems valid to me.
> Since I moved to the REST api for reporting, instead of faking a LAVA
> instance, I lost a few details - such as the LAVA parser generating html
> logs. Nothing serious, but unifying the interface here would be good.
Absolutely.
Best wishes,
Guillaume