* Expanding CI to live testing
From: Verdun, Jean-Marie @ 2020-06-24 20:16 UTC
  To: openbmc


Hi,

As some of you are aware, I am working on a Continuous Integration system that allows developers to test their builds on real hardware. I built a proof of concept before we had to lock down our Houston campus. The good news is that it is starting to work, and I am using it extensively to work on linuxboot (it is available here: https://osfci.tech). So what can I do with it?

Through the CI, I can load pre-compiled BMC/BIOS images, access the console (serial, during system boot), control the server's power (forcing a cold boot in case the machine bricks due to bad code), and access the BMC web interface once it is integrated. I tried to make the code modular, and the architecture is the following:


  *   An internet gateway server that handles all incoming traffic and routes it to the relevant downstream servers
  *   1 to N controller nodes, each attached to a test server. A controller node handles remote power management and console access, as well as the flash emulators that are physically connected to the SPI bus of the test machine.
     *   Attached to each controller node there is a compile machine, used to compile either linuxboot or OpenBMC (OpenBMC is the reason for this email) when the user does not recompile locally but just develops the code. When the build is done, it is transferred to the CTRL node, which loads it onto the live machine. There is also one pre-built image that can be used (if you are developing only linuxboot or only OpenBMC, you can still test the whole stack).

So the minimal setup requires four servers, which may look like a lot, but some of them can be very basic (like the controller node and the gateway).
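
To give an idea of the routing, here is a minimal sketch (not the actual osfci code) of how the gateway could forward traffic for one test server to its controller node, using Go's standard reverse proxy; the controller address and the /ctrl/ path prefix are invented for the example:

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Hypothetical address of one controller node; a real gateway would
        // keep a pool of controller nodes and pick a free one.
        ctrl, err := url.Parse("http://ctrl-01.internal:8080")
        if err != nil {
            log.Fatal(err)
        }

        // Forward console, power and flash traffic to the controller node.
        http.Handle("/ctrl/", httputil.NewSingleHostReverseProxy(ctrl))

        // TLS termination and authentication are omitted in this sketch.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }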

My main goal is to be able to offer development capacity from a laptop to any firmware developer, whatever his or her location. I have found it pretty useful during the tough times we are living through with the pandemic.

My secondary goal is to automate live testing on real hardware and probably to interface the CTRL pool with a Robot Framework server (https://robotframework.org/). This part still needs to be developed; the current API has the basic code to support it, but it seriously needs renaming and an agreed set of conventions.

Everything is written in Go (a lot of bad shell scripts are still there, but I am working on removing them over time).

I am currently in the process of adapting it to OpenBMC. The source code is public (https://github.com/HewlettPackard/osfci). Feel free to fork the tool, and ping me if you need to know how to set up your own CI; it is also a good educational tool for demonstrating the efficiency of our work on a live system that you can test. I have a couple of end users who are discovering open source firmware through that tool and my guidance.

The demo image uses our standard login/password scheme.

My current challenge with OpenBMC is related to build time, and to not competing with the existing infrastructure but rather being integrated into it. I tried to understand how we test new pull requests, and it looks like we are using Jenkins. I have no experience with it, but that is fine (I have used Travis CI and AppVeyor).

The compile node in my current CI could run a Jenkins build instance (not sure about the convention); this would require that I also build the linuxboot side with the same tool. I can probably adapt that. As of now, the build process runs in two different Docker instances, which are pretty efficient. A from-scratch OpenBMC build takes about 44 minutes (using an in-memory file system). That is a long time, which will require either a batch scheduler or using Jenkins as-is. I could accelerate that using a non-clean-state cache, but that is not really my target at the present time (feel free to convince me I am wrong and should use it!).

Right now my process is very interactive: the end user gets access to one CTRL node (through a web page), and from there has a 30-minute allocation to either upload a pre-existing build or specify a GitHub repo plus a branch to kick off a full linuxboot build. That is OK with linuxboot, as the build time is about 4 minutes or less (from scratch). It is not with OpenBMC.

So I have a couple of questions:


  *   Could the Jenkins build be made into a Docker image, knowing that my compile node runs under Ubuntu (I believe 18.04)?
  *   Could we find a way, once our Jenkins cluster build is done, to extract the build result and automate its transfer to, perhaps, an object storage pool under a unique UUID? The challenge will be to surface that UUID in the Gerrit page or the Jenkins log.
  *   If the build is successful, the end user could use that unique UUID to test on a live system. The osfci system would then fetch the build results from the object storage backend and bootstrap them on the first available CTRL node.
  *   Then an interactive session could start, or the Robot Framework system could look at the results and feed them back to Jenkins, or to the end user.

In fact, I would like to avoid compiling the same thing twice; that seems wasteful to me.

Regarding the object storage pool, I must admit that I like the MinIO project, which is compatible with the S3 storage API. So if that seems reasonable to you, I can work on building up that backend. We could decide to keep builds from the past 30 days and remove them after that period.
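
To make that concrete, here is a rough sketch of what the upload step could look like with the MinIO Go client (github.com/minio/minio-go/v7) and github.com/google/uuid; the endpoint, credentials, bucket and file names below are placeholders, not anything that exists in osfci today. A 30-day retention policy could then be a bucket lifecycle rule rather than extra code.

    package main

    import (
        "context"
        "log"

        "github.com/google/uuid"
        "github.com/minio/minio-go/v7"
        "github.com/minio/minio-go/v7/pkg/credentials"
    )

    func main() {
        // Placeholder endpoint and credentials for the object storage pool.
        client, err := minio.New("storage.example.com:9000", &minio.Options{
            Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
            Secure: true,
        })
        if err != nil {
            log.Fatal(err)
        }

        // The UUID is what Jenkins would surface (Gerrit comment or build log)
        // so that osfci can fetch the exact artifact later.
        id := uuid.New().String()

        // Upload the flash image produced by the build (placeholder names).
        _, err = client.FPutObject(context.Background(), "openbmc-builds",
            id+"/obmc-phosphor-image.static.mtd",
            "/tmp/build/obmc-phosphor-image.static.mtd",
            minio.PutObjectOptions{ContentType: "application/octet-stream"})
        if err != nil {
            log.Fatal(err)
        }
        log.Printf("build stored under UUID %s", id)
    }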

Let me know your thoughts.

I believe that if we have an integrated development environment we might reduce the number of forks of the project and hopefully get quicker upstream submissions, reducing the pain of integrating old code or of handling merge conflicts. This does not fix the “secret” aspect we may struggle with for some systems under design, but that is another story.

vejmarie



* Re: Expanding CI to live testing
From: Andrew Geissler @ 2020-06-25 20:26 UTC
  To: Verdun, Jean-Marie; +Cc: openbmc



> On Jun 24, 2020, at 3:16 PM, Verdun, Jean-Marie <jean-marie.verdun@hpe.com> wrote:
> 
> Hi,
>  
> As some of you are aware, I am working on a Continuous Integration system that allows developers to test their builds on real hardware. I built a proof of concept before we had to lock down our Houston campus. The good news is that it is starting to work, and I am using it extensively to work on linuxboot (it is available here: https://osfci.tech). So what can I do with it?

Hi Jean-Marie, welcome to OpenBMC. My name is Andrew and I’m involved with a lot of our OpenBMC CI efforts.

>  
> My secondary goal is to automate live testing on real hardware and probably to interface the CTRL pool with a Robot Framework server (https://robotframework.org/). This part still needs to be developed; the current API has the basic code to support it, but it seriously needs renaming and an agreed set of conventions.

There are two types of CI in OpenBMC. The first is repository CI, where we build an individual software repository and run its unit tests. This all happens within a Docker container and also runs a variety of other checks, like code formatting and valgrind-type checks.

The second type of CI is where we do the full bitbake and build a real image that can be verified within QEMU and on real hardware. This CI happens once a change has been merged into a software repository. It is also all driven from within Docker containers. Our public OpenBMC Jenkins builds a variety of system configurations; the systems built in CI are chosen to get the most coverage of OpenBMC code. Once HPE has a system upstream, we could discuss adding it to our public CI.

https://github.com/openbmc/openbmc/wiki/OpenBMC-Infrastructure-Workgroup#infrastructure-scripts has a somewhat dated but still relevant overview of the scripts we use for CI within openbmc.

>  
> My current challenge with OpenBMC is related to build time, and to not competing with the existing infrastructure but rather being integrated into it. I tried to understand how we test new pull requests, and it looks like we are using Jenkins. I have no experience with it, but that is fine (I have used Travis CI and AppVeyor).

Yes, it’s better to just get the system you need added to the OpenBMC upstream CI. The way we do hardware CI within IBM is the following:

- We have our own Jenkins running in our lab.
- This Jenkins watches for the upstream Jenkins to mark a Gerrit commit as Verified (i.e., it has passed all upstream CI).
- Once this occurs, the downstream Jenkins runs some logic to find the flash image it needs from the upstream Jenkins.
- It then uses the OpenBMC Robot test framework suite (https://github.com/openbmc/openbmc-test-automation) to flash the image and run a set of test cases on one of our servers.
- Upon completion of the downstream hardware CI, the downstream Jenkins adds a comment to the Gerrit review saying whether it passed or failed.
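
For illustration only, the "watch for Verified" step could look roughly like the sketch below, which queries Gerrit's REST changes endpoint for open changes carrying Verified=1 (the host, project, and query below are examples only; note that Gerrit prepends an XSSI prefix to its JSON responses):

    package main

    import (
        "encoding/json"
        "fmt"
        "io"
        "log"
        "net/http"
        "strings"
    )

    type change struct {
        ID      string `json:"id"`
        Subject string `json:"subject"`
        Number  int    `json:"_number"`
    }

    func main() {
        // Example query: open changes on openbmc/openbmc marked Verified=1.
        query := "https://gerrit.openbmc-project.xyz/changes/" +
            "?q=project:openbmc/openbmc+status:open+label:Verified=1"
        resp, err := http.Get(query)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            log.Fatal(err)
        }

        // Strip Gerrit's ")]}'" prefix before decoding the JSON payload.
        payload := strings.TrimPrefix(string(body), ")]}'")

        var changes []change
        if err := json.Unmarshal([]byte(payload), &changes); err != nil {
            log.Fatal(err)
        }
        for _, c := range changes {
            fmt.Printf("candidate for hardware CI: %d %s\n", c.Number, c.Subject)
        }
    }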
 
> So I have a couple of questions:
>  
>   • Could the Jenkins build be made into a Docker image, knowing that my compile node runs under Ubuntu (I believe 18.04)?
>   • Could we find a way, once our Jenkins cluster build is done, to extract the build result and automate its transfer to, perhaps, an object storage pool under a unique UUID? The challenge will be to surface that UUID in the Gerrit page or the Jenkins log.
>   • If the build is successful, the end user could use that unique UUID to test on a live system. The osfci system would then fetch the build results from the object storage backend and bootstrap them on the first available CTRL node.
>   • Then an interactive session could start, or the Robot Framework system could look at the results and feed them back to Jenkins, or to the end user.

I think it would be great if we could have your infrastructure follow a design similar to the one laid out above: have it monitor Gerrit for the Verified label, then kick off validation of the image(s) on your collection of hardware and report status back via a comment on the Gerrit review.
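
As a rough sketch of that "report status back" step, posting a comment through Gerrit's set-review REST endpoint could look like the following (the change number, credentials, and message are placeholders; authentication details depend on how the Gerrit account is set up):

    package main

    import (
        "bytes"
        "encoding/json"
        "log"
        "net/http"
    )

    func main() {
        // Placeholder change number; "current" addresses the latest revision.
        // The /a/ prefix selects authenticated REST access.
        target := "https://gerrit.openbmc-project.xyz/a/changes/12345/revisions/current/review"

        review := map[string]interface{}{
            "message": "osfci hardware CI: PASSED on HPE test server (build UUID ...)",
        }
        payload, err := json.Marshal(review)
        if err != nil {
            log.Fatal(err)
        }

        req, err := http.NewRequest(http.MethodPost, target, bytes.NewReader(payload))
        if err != nil {
            log.Fatal(err)
        }
        req.SetBasicAuth("osfci-bot", "HTTP_PASSWORD") // placeholder credentials
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Printf("gerrit replied: %s", resp.Status)
    }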

> vejmarie


* Re: Expanding CI to live testing
From: Verdun, Jean-Marie @ 2020-06-25 20:31 UTC
  To: Andrew Geissler; +Cc: openbmc

Hi Andrew,

Thanks for your feedback. I like your approach and will review it carefully. I might have a ton of questions, or not (who knows), but I hate to reinvent the wheel when something already works on the automation side.

Do you offer public access to a debug system on the IBM platform? It might be fun to have a look at it on a real machine; it has probably been 20 years since I last had the opportunity to use a Power chip.

Have a great day,

vejmarie


* Re: Expanding CI to live testing
From: Andrew Geissler @ 2020-06-26 14:33 UTC
  To: Verdun, Jean-Marie; +Cc: openbmc



> On Jun 25, 2020, at 3:31 PM, Verdun, Jean-Marie <jean-marie.verdun@hpe.com> wrote:
> 
> Hi Andrew,
> 
> Thanks for your feedback. I like your approach and will review it carefully. I might have a ton of questions, or not (who knows), but I hate to reinvent the wheel when something already works on the automation side.

Sure thing, feel free to reach out here on the mailing list or on IRC.

> 
> Do you offer public access to a debug system on the IBM platform? It might be fun to have a look at it on a real machine; it has probably been 20 years since I last had the opportunity to use a Power chip.
> 

Nope, no public access here. In fact, one of the limitations of using a system like this is that an IBM employee (usually me or George) has to look at the HW CI failures and post the relevant debug data to the Gerrit commit.

> Have a great day,
> 
> vejmarie
> 
