* Reduced build and test coverage
@ 2023-07-06 13:51 Guillaume Tucker
  2023-07-13  5:51 ` Guillaume Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Tucker @ 2023-07-06 13:51 UTC (permalink / raw)
  To: kernelci

While we now have a stable Azure subscription, it turns out the
resources we're using are burning our newly allocated budget much
more quickly than what we had before.  There are several ways to
address this so we're confident it's only a temporary issue, but
in the meantime we have to downscale our build and test coverage
to keep the daily costs as low as possible.

As such, we're planning to make the following changes during
Monday's production update:

* no allmodconfig builds
* no builds with special config fragments (crypto, ima, x86_kvm_guest...)
* most trees will only be building a minimal set of configs
* kselftest and LTP test collections are each run on only one platform
* baseline-nfs is disabled on platforms that don't run tests using NFS

This will allow us to shut down some Kubernetes clusters used for
kernel builds and reduce the data transfers from storage.  We'll
confirm next week whether this is going to be rolled out in
production and provide more updates as we find some solutions.
Please let us know if this is causing some disruption and we may
be able to tweak things to reach a compromise until things are
back to normal.

Thanks,
Guillaume


* Re: Reduced build and test coverage
  2023-07-06 13:51 Reduced build and test coverage Guillaume Tucker
@ 2023-07-13  5:51 ` Guillaume Tucker
  2023-09-19 14:29   ` Ricardo Cañuelo
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Tucker @ 2023-07-13  5:51 UTC (permalink / raw)
  To: kernelci

On 06/07/2023 15:51, Guillaume Tucker wrote:
> While we now have a stable Azure subscription, it turns out the
> resources we're using are burning our newly allocated budget much
> more quickly than what we had before.  There are several ways to
> address this so we're confident it's only a temporary issue, but
> in the meantime we have to downscale our build and test coverage
> to keep the daily costs as low as possible.
> 
> As such, we're planning to make the following changes during
> Monday's production update:
> 
> * no allmodconfig builds
> * no builds with special config fragments (crypto, ima, x86_kvm_guest...)
> * most trees will only be building a minimal set of configs
> * kselftest and LTP test collections are each run on only one platform
> * baseline-nfs is disabled on platforms that don't run tests using NFS
> 
> This will allow us to shut down some Kubernetes clusters used for
> kernel builds and reduce the data transfers from storage.  We'll
> confirm next week whether this is going to be rolled out in
> production and provide more updates as we find some solutions.
> Please let us know if this is causing some disruption and we may
> be able to tweak things to reach a compromise until things are
> back to normal.

This is to confirm we've now deployed the trimmed-down
configuration with fewer builds and tests as described above.
You can see the results coming in with smaller numbers here:

  https://linux.kernelci.org/job/

The good news is that it is reducing costs a lot which should
help us provide some continuity of service until we find a
solution to restore full coverage.  Several alternatives were
discussed in the board meeting yesterday and we should come up
with something by the end of July.

On a side note, we're currently only building some cherry-picked
user-space rootfs images when some functional changes are needed
to run tests correctly.  If you require one to be updated (LTP,
IGT, v4l2-compliance...) then please let us know.

Thanks,
Guillaume



* Re: Reduced build and test coverage
  2023-07-13  5:51 ` Guillaume Tucker
@ 2023-09-19 14:29   ` Ricardo Cañuelo
  2023-09-19 14:31     ` Mark Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Ricardo Cañuelo @ 2023-09-19 14:29 UTC (permalink / raw)
  To: Guillaume Tucker; +Cc: kernelci

Hi Guillaume,

Would it be possible to have the bisector running again? The bisector
results are the main driving point when reporting regressions found by
KernelCI, and without them the task of investigating regressions is
rather impractical and time consuming. Even running it at a reduced
capacity is much better than not having it at all.

The branches we'd be interested in are (in order of importance):
- mainline/master (the bare minimum)
- stable-rc/queue/6.1
- stable-rc/queue/5.15
- stable-rc/queue/5.4
- stable-rc/queue/4.19

Thanks,
Ricardo


* Re: Reduced build and test coverage
  2023-09-19 14:29   ` Ricardo Cañuelo
@ 2023-09-19 14:31     ` Mark Brown
  2023-09-19 14:44       ` Ricardo Cañuelo
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Brown @ 2023-09-19 14:31 UTC (permalink / raw)
  To: Ricardo Cañuelo; +Cc: Guillaume Tucker, kernelci

On Tue, Sep 19, 2023 at 04:29:21PM +0200, Ricardo Cañuelo wrote:
> Hi Guillaume,
> 
> Would it be possible to have the bisector running again? The bisector
> results are the main driving point when reporting regressions found by
> KernelCI, and without them the task of investigating regressions is
> rather impractical and time consuming. Even running it at a reduced
> capacity is much better than not having it at all.
> 
> The branches we'd be interested in are (in order of importance):
> - mainline/master (the bare minimum)
> - stable-rc/queue/6.1
> - stable-rc/queue/5.15
> - stable-rc/queue/5.4
> - stable-rc/queue/4.19

If we're doing this -next would be very helpful too.


* Re: Reduced build and test coverage
  2023-09-19 14:31     ` Mark Brown
@ 2023-09-19 14:44       ` Ricardo Cañuelo
  2023-09-19 16:12         ` Mark Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Ricardo Cañuelo @ 2023-09-19 14:44 UTC (permalink / raw)
  To: Mark Brown; +Cc: Guillaume Tucker, kernelci

On Tue, Sep 19 2023 at 15:31:31, Mark Brown <broonie@kernel.org> wrote:
> If we're doing this -next would be very helpful too.

Does it make sense to bisect -next, though? It's constantly rebased,
so I don't think it has a coherent, "linear" log like mainline, which
means bisections aren't guaranteed to work on it. Is that right?

Ricardo


* Re: Reduced build and test coverage
  2023-09-19 14:44       ` Ricardo Cañuelo
@ 2023-09-19 16:12         ` Mark Brown
  2023-09-19 16:43           ` Guillaume Charles Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Brown @ 2023-09-19 16:12 UTC (permalink / raw)
  To: Ricardo Cañuelo; +Cc: Guillaume Tucker, kernelci

On Tue, Sep 19, 2023 at 04:44:06PM +0200, Ricardo Cañuelo wrote:
> On mar, sep 19 2023 at 15:31:31, Mark Brown <broonie@kernel.org> wrote:
> > If we're doing this -next would be very helpful too.

> Does it make sense to bisect -next, though? Considering that it's
> constantly rebased, so I don't think it has a coherent and "linear" log
> like mainline. That means that bisections aren't guaranteed to work on
> it. Is that right?

When bisecting -next you should generally bisect it against the mainline
it was based on.
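
As a rough illustration of bisecting -next against the mainline it was
based on, here is a minimal Python sketch. The helper names, the `repo`
path and the example refs are hypothetical, and it assumes the merge
base between the -next ref and mainline is a commit where the test still
passes. It only wraps two standard Git commands, `git merge-base` and
`git bisect start`, and is not how KernelCI itself implements this.

```python
import subprocess
from typing import List


def find_merge_base(repo: str, next_ref: str, mainline_ref: str) -> str:
    """Return the common ancestor of a -next ref and a mainline ref.

    Runs `git merge-base` in the given repository (hypothetical path).
    """
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", next_ref, mainline_ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def bisect_start_command(repo: str, bad_ref: str, good_ref: str) -> List[str]:
    """Build the `git bisect start <bad> <good>` command line.

    The command is returned rather than executed so it can be inspected
    before touching the working tree.
    """
    return ["git", "-C", repo, "bisect", "start", bad_ref, good_ref]
```

With a local clone, one would call `find_merge_base(repo, next_ref,
"origin/master")`, pass the result as `good_ref`, run the returned
command, and then mark each step with `git bisect good` or
`git bisect bad`.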


* Re: Reduced build and test coverage
  2023-09-19 16:12         ` Mark Brown
@ 2023-09-19 16:43           ` Guillaume Charles Tucker
  2023-09-25  8:50             ` Guillaume Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Charles Tucker @ 2023-09-19 16:43 UTC (permalink / raw)
  To: Mark Brown; +Cc: Ricardo Cañuelo, kernelci

On Tuesday, September 19, 2023 18:12 CEST, Mark Brown <broonie@kernel.org> wrote:

> On Tue, Sep 19, 2023 at 04:44:06PM +0200, Ricardo Cañuelo wrote:
> > On mar, sep 19 2023 at 15:31:31, Mark Brown <broonie@kernel.org> wrote:
> > > If we're doing this -next would be very helpful too.
> 
> > Does it make sense to bisect -next, though? Considering that it's
> > constantly rebased, so I don't think it has a coherent and "linear" log
> > like mainline. That means that bisections aren't guaranteed to work on
> > it. Is that right?
> 
> When bisecting -next you should generally bisect it against the mainline
> it was based on.

Yes that's how the KernelCI bisection works. It's the most useful
branch to bisect automatically precisely because it's rebased every
day with thousands of commits, so a bisection takes between 10 and
15 iterations and it's painful to do manually. And that's where
most of the bugs are.  Then mainline, stable and stable-rc are the
other critical trees to cover so it makes sense to enable them
first.

We've discussed enabling bisections again with sysadmins and I
think we can have this done in a week or two.  Other issues are
about measuring costs empirically for different parts of the system
and adding a filter in the legacy back end to avoid flooding
Jenkins with jobs that won't get run (they still use up all the
RAM).
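
The figure of 10 to 15 iterations mentioned above follows from bisection
halving the candidate range at every step. The Python sketch below is a
simplified model only: it assumes a linear history with a single first
bad commit, whereas real kernel history contains merges, so actual
`git bisect` step counts can differ slightly. The function names are
illustrative and not part of any KernelCI tool.

```python
def bisection_steps(num_commits: int) -> int:
    """Worst-case number of test runs needed to isolate the first bad
    commit among num_commits candidates, i.e. ceil(log2(num_commits))."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return (num_commits - 1).bit_length()


def simulate_bisection(num_commits: int, first_bad: int):
    """Binary-search a linear range of commits numbered 0..num_commits-1.

    Commits with an index >= first_bad are treated as failing.
    Returns (index of first bad commit found, number of commits tested).
    """
    lo, hi = 0, num_commits - 1
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1                 # one build-and-test iteration
        if mid >= first_bad:       # test fails: culprit is at mid or earlier
            hi = mid
        else:                      # test passes: culprit is after mid
            lo = mid + 1
    return lo, steps
```

Under this model, about a thousand candidate commits need 10 iterations
and about thirty thousand need 15, which matches the range quoted above.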

Cheers,
Guillaume



* Re: Reduced build and test coverage
  2023-09-19 16:43           ` Guillaume Charles Tucker
@ 2023-09-25  8:50             ` Guillaume Tucker
  2023-10-10 10:43               ` Guillaume Tucker
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Tucker @ 2023-09-25  8:50 UTC (permalink / raw)
  To: kernelci; +Cc: Ricardo Cañuelo, Mark Brown

On 19/09/2023 18:43, Guillaume Charles Tucker wrote:
> On Tuesday, September 19, 2023 18:12 CEST, Mark Brown <broonie@kernel.org> wrote:
> 
>> On Tue, Sep 19, 2023 at 04:44:06PM +0200, Ricardo Cañuelo wrote:
>>> On mar, sep 19 2023 at 15:31:31, Mark Brown <broonie@kernel.org> wrote:
>>>> If we're doing this -next would be very helpful too.
>>
>>> Does it make sense to bisect -next, though? Considering that it's
>>> constantly rebased, so I don't think it has a coherent and "linear" log
>>> like mainline. That means that bisections aren't guaranteed to work on
>>> it. Is that right?
>>
>> When bisecting -next you should generally bisect it against the mainline
>> it was based on.
> 
> Yes that's how the KernelCI bisection works. It's the most useful
> branch to bisect automatically precisely because it's rebased every
> day with thousands of commits, so a bisection takes between 10 and
> 15 iterations and it's painful to do manually. And that's where
> most of the bugs are.  Then mainline, stable and stable-rc are the
> other critical trees to cover so it makes sense to enable them
> first.
> 
> We've discussed enabling bisections again with sysadmins and I
> think we can have this done in a week or two.  Other issues are
> about measuring costs empirically for different parts of the system
> and adding a filter in the legacy back end to avoid flooding
> Jenkins with jobs that won't get run (they still use up all the
> RAM).

We've now re-enabled the full functional test coverage, which can
result in roughly twice the number of test results coming from
LAVA labs.  We'll be monitoring the additional costs incurred as
a result this week and comparing them with previous weeks.

Then the plan is to prepare everything to have bisections enabled
again next Monday but with only mainline, linux-next, stable-rc
and stable as discussed previously.  These are the trees that
benefit the most from automated bisection.  We'll be able to
measure the added costs for that again in the following weeks
although we already have an estimate.

In October, the current plan is to enable some more builds such
as allmodconfig on a couple more trees such as mainline (right
now only linux-next has allmodconfig builds enabled) and continue
measuring the related costs.

This exercise aims at:

* knowing more accurately the cost of each area of KernelCI's
  coverage to better secure its long-term funding

* streamlining the overall config to maximise the value of the
  results produced by KernelCI

* reviewing the whole configuration in preparation for migrating
  to the new API with a fresh implementation

As mentioned before, if the current coverage is missing parts
that are critical to any use-case in the community then please
raise the issue here and we can address it.  For example, we're
looking into Android builds and bisections on more trees based on
feedback from users.

Thanks,
Guillaume



* Re: Reduced build and test coverage
  2023-09-25  8:50             ` Guillaume Tucker
@ 2023-10-10 10:43               ` Guillaume Tucker
  0 siblings, 0 replies; 9+ messages in thread
From: Guillaume Tucker @ 2023-10-10 10:43 UTC (permalink / raw)
  To: kernelci; +Cc: Ricardo Cañuelo, Mark Brown

On 25/09/2023 10:50, Guillaume Tucker wrote:
> On 19/09/2023 18:43, Guillaume Charles Tucker wrote:
>> On Tuesday, September 19, 2023 18:12 CEST, Mark Brown <broonie@kernel.org> wrote:
>>
>>> On Tue, Sep 19, 2023 at 04:44:06PM +0200, Ricardo Cañuelo wrote:
>>>> On mar, sep 19 2023 at 15:31:31, Mark Brown <broonie@kernel.org> wrote:
>>>>> If we're doing this -next would be very helpful too.
>>>
>>>> Does it make sense to bisect -next, though? Considering that it's
>>>> constantly rebased, so I don't think it has a coherent and "linear" log
>>>> like mainline. That means that bisections aren't guaranteed to work on
>>>> it. Is that right?
>>>
>>> When bisecting -next you should generally bisect it against the mainline
>>> it was based on.
>>
>> Yes that's how the KernelCI bisection works. It's the most useful
>> branch to bisect automatically precisely because it's rebased every
>> day with thousands of commits, so a bisection takes between 10 and
>> 15 iterations and it's painful to do manually. And that's where
>> most of the bugs are.  Then mainline, stable and stable-rc are the
>> other critical trees to cover so it makes sense to enable them
>> first.
>>
>> We've discussed enabling bisections again with sysadmins and I
>> think we can have this done in a week or two.  Other issues are
>> about measuring costs empirically for different parts of the system
>> and adding a filter in the legacy back end to avoid flooding
>> Jenkins with jobs that won't get run (they still use up all the
>> RAM).
> 
> We've now re-enabled the full functional test coverage which can
> result in roughly twice the amount of test results coming from
> LAVA labs.  We'll be monitoring the additional costs incurred as
> a result this week and compare with previous weeks.
> 
> Then the plan is to prepare everything to have bisections enabled
> again next Monday but with only mainline, linux-next, stable-rc
> and stable as discussed previously.  These are the trees that
> benefit the most from automated bisection.  We'll be able to
> measure the added costs for that again in the following weeks
> although we already have an estimate.

We can now confirm that bisections are running again in
production, see details about it here:

  https://github.com/kernelci/kernelci-project/issues/260

It is still based on the legacy system but with some small
internal improvements to avoid cluttering the queue which should
improve the quality of service for users.

> In October, the current plan is to enable some more builds such
> as allmodconfig on a couple more trees such as mainline (right
> now only linux-next has allmodconfig builds enabled) and continue
> measuring the related costs.

This is still the plan for next Monday's production update, which
should give us two weeks to measure the added costs with large
kernel builds such as allmodconfig.

> This exercise aims at:
> 
> * knowing more accurately the cost of each area of KernelCI's
>   coverage to better secure its long-term funding
> 
> * streamlining the overall config to maximise the value of the
>   results produced by KernelCI
> 
> * reviewing the whole configuration in preparation for migrating
>   to the new API with a fresh implementation

Note: The Android kernel builds configuration will also be
updated as per the other email thread with Todd.  This should in
fact deliver more value by streamlining the build coverage so
costs aren't expected to change much as a result.  The large GKI
configs are already being built and were never disabled.

> As mentioned before, if the current coverage is missing parts
> that are critical to any use-case in the community then please
> raise the issue here and we can address it.  For example, we're
> looking into Android builds and bisections on more trees based on
> feedback from users.

Likewise, if anything else in the build and test coverage needs
to be adjusted, please let us know.

Thanks,
Guillaume

