* Chrome OS and KernelCI
From: Guillaume Tucker @ 2021-04-30  6:35 UTC
  To: kernelci; +Cc: Jesse Barnes, kernelci-members, automated-testing

As you all know, KernelCI is dedicated to testing the upstream kernel.  This is
what the results on linux.kernelci.org are all about.  Still, there are many
products out there running Linux with many downstream changes, such as distros
for desktop and enterprise applications, mobile devices etc...  Several of the
KernelCI Linux Foundation members make Linux-based products, and testing
upstream is valuable to them because every issue caught there is something they
won't have to fix in their product.  In fact, it brings products closer to
upstream kernels over time.

One such member is Google, with Chrome OS products.  There are an increasing
number of Chromebook devices in KernelCI (mostly in Collabora's lab) which can
be used to test that upstream kernels and in particular stable ones are working
well on this hardware.  They're currently running all the regular tests like
other platforms: LTP, kselftest, igt, v4l2-compliance...  Now, as a member
company, Google would like to extend coverage with additional tests that are
only currently available within Chrome OS.  This is exploring a new avenue for
KernelCI, and it's important to ensure all aspects and decisions are well
discussed openly with the community.

Of course, any test that can find a kernel issue is useful.  However, in the
case of Chrome OS tests:

* They can typically only be run within a Chrome OS user-space

This is due to the dependencies on libraries and services that only exist in
Chrome OS.  Those tests can in theory be made more portable, but not in a
trivial way.  It makes it harder for any kernel developer to reproduce a test
than, say, with KUnit, kselftest or LTP.  Even if the user-space is
directly available for everyone to use locally, it is still an extra hurdle.

* Some may run higher-level workloads than bare-metal kernel tests

When a kernel panics, we know it's a kernel problem.  When a video stream is
not playing correctly in a Chrome web browser, or if performance has dropped,
it can be harder to directly blame the kernel even if it's the only moving part
between CI runs.  For example, it may be due to a sub-optimal user-space
implementation made visible only as a side-effect of some kernel changes.  So
when reporting Chrome OS test errors, it can be tricky to confidently point the
finger at some kernel patch.

* The issues they find may not be detected by any other kind of tests

If a Chrome OS test finds an issue, for example a benchmark drop, but generic
benchmarks don't, then it's tempting to say the issue is specific to Chrome OS.
When reporting this issue to a random kernel developer not working on Chrome OS
products, it may be harder to convince them they've broken something than with
well-known generic tests.  Each issue is different, and maybe some will be
obviously due to the kernel in which case it's all good.  But there is a
possibility that reporting unclear issues too quickly can have a negative
effect in the community for both Chrome OS and KernelCI.


It would be interesting to know what others think of this, and whether the
issues highlighted above would set a precedent that could cause KernelCI to
deviate from its intended purpose.


In terms of addressing these issues, one option is to create a separate
KernelCI instance hosted on chromeos.kernelci.org.  We could then have extra
kernel builds for Chromebooks (e.g. config fragments), dedicated Chrome OS
rootfs images to run Chrome OS tests and email reports sent only to Chrome OS
developers if we want to.  This would give us a stepping stone to try things
out without interfering with linux.kernelci.org.  Whenever some tests are
deemed acceptable, they could be migrated to the main linux.kernelci.org
instance.  It would still be public and building only upstream / stable
kernels, but with a focus on Chrome OS testing.  All the results would also be
sent to the common reporting database KCIDB.

Having a dedicated instance is something every KernelCI LF project member can
benefit from, and this would be the first one.  So decisions around how it gets
done will set some precedent for other members.


Does anyone have any concerns about this?  Or, on the contrary, would it seem
appropriate to directly enable tests with a Chrome OS user-space on the main
linux.kernelci.org instance and report issues in the same way as any generic
test results?  Please share any thoughts you may have.  It's important to
ensure we find ways for member companies to fully benefit from KernelCI while
also serving the wider kernel community in the best possible way.

Best wishes,
Guillaume


* Re: Chrome OS and KernelCI
From: Jesse Barnes @ 2021-04-30 14:46 UTC
  To: Guillaume Tucker; +Cc: kernelci, kernelci-members, automated-testing

On Thu, Apr 29, 2021 at 11:35 PM Guillaume Tucker
<guillaume.tucker@collabora.com> wrote:
>
> As you all know, KernelCI is dedicated to testing the upstream kernel.  This is
> what the results on linux.kernelci.org are all about.  Still, there are many
> products out there running Linux with many downstream changes, such as distros
> for desktop and enterprise applications, mobile devices etc...  Several of the
> KernelCI Linux Foundation members make Linux-based products, and testing
> upstream is valuable to them because every issue caught there is something they
> won't have to fix in their product.  In fact, it brings products closer to
> upstream kernels over time.
>
> One such member is Google, with Chrome OS products.  There are an increasing
> number of Chromebook devices in KernelCI (mostly in Collabora's lab) which can
> be used to test that upstream kernels and in particular stable ones are working
> well on this hardware.  They're currently running all the regular tests like
> other platforms: LTP, kselftest, igt, v4l2-compliance...  Now, as a member
> company, Google would like to extend coverage with additional tests that are
> only currently available within Chrome OS.  This is exploring a new avenue for
> KernelCI, and it's important to ensure all aspects and decisions are well
> discussed openly with the community.

[wow this turned into a big reply; short version is I think we have
some really valuable tests that people will want to see results from,
and I definitely think we shouldn't limit ourselves to traditional
distros]

Thanks Guillaume for starting this discussion; summarizing your points
as I understand them:
1) non-traditional userspace required
2) some higher level tests
3) may be harder to reproduce by some developers
4) higher level tests may be harder to isolate as kernel issues

I think 1 and 3 are related since they both create barriers to
developers running traditional distros to reproduce problems.
Similarly, if a problem only occurs on a Chromebook, developers may
have a hard time reproducing (I expect this to happen in the audio
stack sometimes for example).  Some alternatives we've talked about
include containerizing the tests instead of having a full rootfs; this
may allow external developers to simply pull an image and run it,
simplifying that aspect of testing.  That introduces its own issues
though in terms of device access and fidelity to the original test
intent, so would require some effort to get right (but might be a good
thing to invest in for the long term).

2 and 4 are also related, and are real challenges.  That said, I don't
think we should give up easily here!  One of my big motivations for
pushing our tests into KernelCI is to provide upstream developers with
test results they can't get today.  In particular, I want to push at
least a couple of categories of test:
- desktop style memory pressure tests
- tests that measure input latency in web apps while running mixed
workloads including video conferencing

In both cases, the results of our tests report user visible metrics
like tab switch times and input latency as seen by the user (e.g.
keypress to glyph on screen), in addition to several more common
metrics like CPU usage, framerate, etc.  I think several upstream
developers are interested in this type of feedback, and I think they
close a gap in upstream testing we've had for some time.  If one of
these tests regresses, it will involve some work to figure out, and
there's definitely a statistical element to the results, but if we
keep the userspace and test images fixed while only changing the
kernel, I think we can be fairly confident about kernel
responsibility if enough runs show a change (though clearly there are
exceptions here).
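To make the "enough runs show a change" idea concrete, here is a minimal
sketch in Python.  It is not what Chrome OS or KernelCI actually runs.  It
assumes a metric where higher is worse (such as keypress-to-glyph latency in
milliseconds), assumes the userspace and test images are fixed so the kernel
is the only variable, and uses a two-sided permutation test on the difference
of means, which is just one reasonable choice among many.  The function names
and the `alpha` and `min_runs` thresholds are hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(baseline, candidate, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    Hypothetical helper: estimates how often randomly relabelling the
    pooled runs gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so repeated CI runs agree
    observed = abs(fmean(candidate) - fmean(baseline))
    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of which kernel produced each run
        diff = abs(fmean(pooled[n_base:]) - fmean(pooled[:n_base]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


def looks_like_regression(baseline, candidate, alpha=0.01, min_runs=10):
    """Hypothetical policy: flag only a significant move in the bad direction."""
    if len(baseline) < min_runs or len(candidate) < min_runs:
        return False  # too few runs to separate a kernel change from noise
    worse = fmean(candidate) > fmean(baseline)  # assumes higher is worse
    return worse and permutation_p_value(baseline, candidate) < alpha
```

With ten runs per kernel, `looks_like_regression(baseline_ms, candidate_ms)`
flags a consistent 8 ms increase in latency, while it reports nothing for
identical runs, for an improvement, or when fewer than `min_runs` samples are
available.  A permutation test is used here because it makes no normality
assumption, which suits latency distributions that are often skewed.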

One obvious advantage of having these high level tests is that they
represent common workloads and measure user visible behavior, and big
regressions especially would be something the broader community would
want to avoid, if only for selfish reasons!  And over time, we could
add additional tests that correlate to higher level behaviors but that
are easier to reproduce and debug (though for mm and scheduling in
particular, I don't think there's a good substitute for comprehensive
high level tests).

I think if we can figure out an approach on the
userspace/reproducibility issues, we have an opportunity to make Linux
a lot better for the wide variety of userspace stacks out there; imo
we shouldn't limit ourselves to traditional distros, especially as
things have evolved so far from those early days (e.g. packages to
containers & VMs, moving away from X, development of new approaches
like io_uring with implications for app & distro architecture, etc).

And I think the high level tests are just a starting point; I'm hoping
KernelCI over time ingests a large body of tests, from low level to
high level, providing the broader community with a variety of metrics
to make releases even higher quality than they are today.

Thanks,
Jesse
