* Expanding CI to live testing
From: Verdun, Jean-Marie @ 2020-06-24 20:16 UTC
  To: openbmc


Hi,

As some of you are aware, I am working on a Continuous Integration system that allows developers to test their builds on real hardware. I built a proof of concept before we had to lock down our Houston campus. The good news is that it is starting to work, and I am using it extensively to work on linuxboot (it is available here: https://osfci.tech). So what can I do with it?

Through the CI, I can load pre-compiled BMC/BIOS images, access the console (serial, during system boot), control the server's power (forcing a cold boot in case the machine bricks due to bad code), and access the BMC web interface once it is integrated. I tried to make the code modular, and the architecture is the following:


  *   An internet gateway server that handles all incoming traffic and routes it to the relevant downstream servers
  *   1 to N controller nodes, each attached to a test server. A controller node handles remote power management and console access, as well as the flash emulators that are physically connected to the SPI bus of the test machine.
     *   Attached to each controller node there is a compile machine, used to compile either linuxboot or OpenBMC (OpenBMC is the reason for this email) when the user does not recompile locally but just develops the code. When the build is done, it is transferred to the CTRL node, which loads it onto the live machine. There is also one pre-built image that can be used (if you are developing only linuxboot or only OpenBMC, you can still test the whole stack).

So the minimal setup requires four servers, which may look like a lot, but some of them can be very basic (like the controller node and the gateway).
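
To give an idea of the routing, here is a minimal sketch (not the actual osfci code) of how the gateway could forward traffic for one test server to its controller node, using Go's standard reverse proxy; the controller address and the /ctrl/ path prefix are invented for the example:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical address of one controller node; a real gateway would
        // keep a pool of controller nodes and pick a free one.
        ctrl, err := url.Parse("http://ctrl-01.internal:8080")
        if err != nil {
            log.Fatal(err)
        }

        // Forward console, power and flash traffic to the controller node.
        http.Handle("/ctrl/", httputil.NewSingleHostReverseProxy(ctrl))

        // TLS termination and authentication are omitted in this sketch.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }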

My main goal is to be able to offer development capacity from a laptop to any firmware developer, whatever his or her location. I have found it pretty useful during the tough times we are living through with the pandemic.

My secondary goal is to automate live testing on real hardware and probably to interface the CTRL pool with a Robot Framework server (https://robotframework.org/). This part still needs to be developed; the current API has the basic code to support it, but it seriously needs renaming and an agreed set of conventions.

Everything is written in Go (a lot of bad shell scripts are still there, but I am working on removing them over time).

I am currently in the process of adapting it to OpenBMC. The source code is public (https://github.com/HewlettPackard/osfci). Feel free to fork the tool, and ping me if you need to know how to set up your own CI; it is also a good educational tool for demonstrating the efficiency of our work on a live system that you can test. I have a couple of end users who are discovering open source firmware through that tool and my guidance.

The demo image uses our standard login/password scheme.

My current challenge with OpenBMC is related to build time, and to not competing with the existing infrastructure but rather being integrated into it. I tried to understand how we test new pull requests, and it looks like we are using Jenkins. I have no experience with it, but that is fine (I have used Travis CI and AppVeyor).

The compile node in my current CI could run a Jenkins build instance (not sure about the convention); this would require that I also build the linuxboot side with the same tool. I can probably adapt that. As of now, the build process runs in two different Docker instances, which are pretty efficient. A from-scratch OpenBMC build takes about 44 minutes (using an in-memory file system). That is a long time, which will require either a batch scheduler or using Jenkins as-is. I could accelerate that using a non-clean-state cache, but that is not really my target at the present time (feel free to convince me I am wrong and should use it!).

Right now my process is very interactive: the end user gets access to one CTRL node (through a web page), and from there has a 30-minute allocation to either upload a pre-existing build or specify a GitHub repo plus a branch to kick off a full linuxboot build. That is OK with linuxboot, as the build time is about 4 minutes or less (from scratch). It is not with OpenBMC.

So I have a couple of questions:


  *   Could the Jenkins build be made into a Docker image, knowing that my compile node runs under Ubuntu (I believe 18.04)?
  *   Could we find a way, once our Jenkins cluster build is done, to extract the build result and automate its transfer to, perhaps, an object storage pool under a unique UUID? The challenge will be to surface that UUID in the Gerrit page or the Jenkins log.
  *   If the build is successful, the end user could use that unique UUID to test on a live system. The osfci system would then fetch the build results from the object storage backend and bootstrap them on the first available CTRL node.
  *   Then an interactive session could start, or the Robot Framework system could look at the results and feed them back to Jenkins, or to the end user.

In fact, I would like to avoid compiling the same thing twice; that seems wasteful to me.

Regarding the object storage pool, I must admit that I like the MinIO project, which is compatible with the S3 storage API. So if that seems reasonable to you, I can work on building up that backend. We could decide to keep builds from the past 30 days and remove them after that period.
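
To make that concrete, here is a rough sketch of what the upload step could look like with the MinIO Go client (github.com/minio/minio-go/v7) and github.com/google/uuid; the endpoint, credentials, bucket and file names below are placeholders, not anything that exists in osfci today. A 30-day retention policy could then be a bucket lifecycle rule rather than extra code.

    package main

    import (
        "context"
        "log"

        "github.com/google/uuid"
        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        // Placeholder endpoint and credentials for the object storage pool.
        client, err := minio.New("storage.example.com:9000", &minio.Options{
            Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }

        // The UUID is what Jenkins would surface (Gerrit comment or build log)
        // so that osfci can fetch the exact artifact later.
        id := uuid.New().String()

        // Upload the flash image produced by the build (placeholder names).
        _, err = client.FPutObject(context.Background(), "openbmc-builds",
            id+"/obmc-phosphor-image.static.mtd",
            "/tmp/build/obmc-phosphor-image.static.mtd",
            minio.PutObjectOptions{ContentType: "application/octet-stream"})
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("build stored under UUID %s", id)
    }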

Let me know your thoughts.

I believe that if we have an integrated development environment we might reduce the number of forks of the project and hopefully get quicker upstream submissions, reducing the pain of integrating old code or of handling merge conflicts. This does not fix the “secret” aspect we may struggle with for some systems under design, but that is another story.

vejmarie



* Re: Expanding CI to live testing
From: Andrew Geissler @ 2020-06-25 20:26 UTC
  To: Verdun, Jean-Marie; +Cc: openbmc



> On Jun 24, 2020, at 3:16 PM, Verdun, Jean-Marie <jean-marie.verdun@hpe.com> wrote:
> 
> Hi,
>  
> As some of you are aware, I am working on a Continuous Integration system that allows developers to test their builds on real hardware. I built a proof of concept before we had to lock down our Houston campus. The good news is that it is starting to work, and I am using it extensively to work on linuxboot (it is available here: https://osfci.tech). So what can I do with it?

Hi Jean-Marie, welcome to OpenBMC. My name is Andrew and I’m involved with a lot of our OpenBMC CI efforts.

>  
> My secondary goal is to automate live testing on real hardware and probably to interface the CTRL pool with a Robot Framework server (https://robotframework.org/). This part still needs to be developed; the current API has the basic code to support it, but it seriously needs renaming and an agreed set of conventions.

There are two types of CI in OpenBMC. The first is repository CI, where we build an individual software repository and run its unit tests. This all happens within a Docker container and also runs a variety of other checks, like code formatting and valgrind-type checks.

The second type of CI is where we do the full bitbake and build a real image that can be verified within QEMU and on real hardware. This CI happens once a change has been merged into a software repository. It is also all driven from within Docker containers. Our public OpenBMC Jenkins builds a variety of system configurations; the systems built in CI are chosen to get the most coverage of OpenBMC code. Once HPE has a system upstream, we could discuss adding it to our public CI.

https://github.com/openbmc/openbmc/wiki/OpenBMC-Infrastructure-Workgroup#infrastructure-scripts has a somewhat dated but still relevant overview of the scripts we use for CI within openbmc.

>  
> My current challenge with OpenBMC is related to build time, and to not competing with the existing infrastructure but rather being integrated into it. I tried to understand how we test new pull requests, and it looks like we are using Jenkins. I have no experience with it, but that is fine (I have used Travis CI and AppVeyor).

Yes, it’s better to just get the system you need added to the OpenBMC upstream CI. The way we do hardware CI within IBM is the following:

- We have our own Jenkins running in our lab.
- This Jenkins watches for the upstream Jenkins to mark a Gerrit commit as Verified (i.e., it has passed all upstream CI).
- Once this occurs, the downstream Jenkins runs some logic to find the flash image it needs from the upstream Jenkins.
- It then uses the OpenBMC Robot test framework suite (https://github.com/openbmc/openbmc-test-automation) to flash the image and run a set of test cases on one of our servers.
- Upon completion of the downstream hardware CI, the downstream Jenkins adds a comment to the Gerrit review saying whether it passed or failed.
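
For illustration only, the "watch for Verified" step could look roughly like the sketch below, which queries Gerrit's REST changes endpoint for open changes carrying Verified=1 (the host, project, and query below are examples only; note that Gerrit prepends an XSSI prefix to its JSON responses):

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "log"
        "net/http"
        "strings"
    )

    type change struct {
        ID      string `json:"id"`
        Subject string `json:"subject"`
        Number  int    `json:"_number"`
    }

    func main() {
        // Example query: open changes on openbmc/openbmc marked Verified=1.
        query := "https://gerrit.openbmc-project.xyz/changes/" +
            "?q=project:openbmc/openbmc+status:open+label:Verified=1"
        resp, err := http.Get(query)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }

        // Strip Gerrit's ")]}'" prefix before decoding the JSON payload.
        payload := strings.TrimPrefix(string(body), ")]}'")

        var changes []change
        if err := json.Unmarshal([]byte(payload), &changes); err != nil {
            log.Fatal(err)
        }
        for _, c := range changes {
            fmt.Printf("candidate for hardware CI: %d %s\n", c.Number, c.Subject)
        }
    }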
 
> So I have a couple of questions:
>  
>   • Could the Jenkins build be made into a Docker image, knowing that my compile node runs under Ubuntu (I believe 18.04)?
>   • Could we find a way, once our Jenkins cluster build is done, to extract the build result and automate its transfer to, perhaps, an object storage pool under a unique UUID? The challenge will be to surface that UUID in the Gerrit page or the Jenkins log.
>   • If the build is successful, the end user could use that unique UUID to test on a live system. The osfci system would then fetch the build results from the object storage backend and bootstrap them on the first available CTRL node.
>   • Then an interactive session could start, or the Robot Framework system could look at the results and feed them back to Jenkins, or to the end user.

I think it would be great if we could have your infrastructure follow a design similar to the one laid out above: have it monitor Gerrit for the Verified label, then kick off validation of the image(s) on your collection of hardware and report status back via a comment on the Gerrit review.
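
As a rough sketch of that "report status back" step, posting a comment through Gerrit's set-review REST endpoint could look like the following (the change number, credentials, and message are placeholders; authentication details depend on how the Gerrit account is set up):

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "net/http"
    )

    func main() {
        // Placeholder change number; "current" addresses the latest revision.
        // The /a/ prefix selects authenticated REST access.
        target := "https://gerrit.openbmc-project.xyz/a/changes/12345/revisions/current/review"

        review := map[string]interface{}{
            "message": "osfci hardware CI: PASSED on HPE test server (build UUID ...)",
        }
        payload, err := json.Marshal(review)
        if err != nil {
            log.Fatal(err)
        }

        req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(payload))
        if err != nil {
            log.Fatal(err)
        }
        req.SetBasicAuth("osfci-bot", "HTTP_PASSWORD") // placeholder credentials
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Printf("gerrit replied: %s", resp.Status)
    }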

> vejmarie


* Re: Expanding CI to live testing
From: Verdun, Jean-Marie @ 2020-06-25 20:31 UTC
  To: Andrew Geissler; +Cc: openbmc

Hi Andrew,

Thanks for your feedback. I like your approach and will review it carefully. I might have a ton of questions, or not (who knows), but I hate to reinvent the wheel when something already works on the automation side.

Do you offer public access to a debug system on the IBM platform? It might be fun to have a look at it on a real machine; it has probably been 20 years since I last had the opportunity to use a Power chip.

Have a great day,

vejmarie


* Re: Expanding CI to live testing
From: Andrew Geissler @ 2020-06-26 14:33 UTC
  To: Verdun, Jean-Marie; +Cc: openbmc



> On Jun 25, 2020, at 3:31 PM, Verdun, Jean-Marie <jean-marie.verdun@hpe.com> wrote:
> 
> Hi Andrew,
> 
> Thanks for your feedback. I like your approach and will review it carefully. I might have a ton of questions, or not (who knows), but I hate to reinvent the wheel when something already works on the automation side.

Sure thing, feel free to reach out here on the mailing list or on IRC.

> 
> Do you offer public access to a debug system on the IBM platform? It might be fun to have a look at it on a real machine; it has probably been 20 years since I last had the opportunity to use a Power chip.
> 

Nope, no public access here. In fact, one of the limitations of using a system like this is that an IBM employee (usually me or George) has to look at the HW CI failures and post the relevant debug data to the Gerrit commit.

> Have a great day,
> 
> vejmarie
> 
