* Proposal for a regular upstream performance testing
@ 2020-11-26  8:10 Lukáš Doktor
  2020-11-26  8:23 ` Jason Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Lukáš Doktor @ 2020-11-26  8:10 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Charles Shih, Aleksandar Markovic, Stefan Hajnoczi

Hello guys,

I have been around qemu on the Avocado-vt side for quite some time, and a while ago I shifted my focus to performance testing. Currently I am not aware of any upstream CI that continuously monitors upstream qemu performance, and I'd like to change that. There is a lot to cover, so please bear with me.

Goal
====

The goal of this initiative is to detect system-wide performance regressions, as well as improvements, early; ideally to pin-point the individual commits and notify people that they should fix things. All of this upstream and with the least human interaction possible.

Unlike Ahmed Karaman's recent work on https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/ my aim is at the system-wide performance inside the guest (using workloads like fio, uperf, ...).

Tools
=====

In house we have several different tools used by various teams, and I bet there are tons of other tools out there that can do this. I cannot speak for all teams, but over time many teams at Red Hat have come to like pbench https://distributed-system-analysis.github.io/pbench/ to run the tests and produce machine-readable results, while using other tools (Ansible, scripts, ...) to provision the systems and to generate the comparisons.

As for myself, I used Python for a PoC and over the last year I pushed hard to turn it into a usable and sensible tool which I'd like to offer: https://run-perf.readthedocs.io/en/latest/ In any case, I am open to suggestions and comparisons. As I am using it downstream to watch for regressions, I do plan on continuing to develop the tool as well as the pipelines (unless a better tool is found that would replace it or parts of it).

How
===

This is a tough question. Ideally this should be a standalone service that would only notify the author of the patch that caused the change with a bunch of useful data so they can either address the issue or just be aware of this change and mark it as expected.

Ideally the community should also have a way to issue their custom builds in order to verify their patches, so they can debug and address issues better than by just committing to qemu-master.

The problem with those is that we cannot simply use travis/gitlab/... machines for running these tests, because we are measuring actual in-guest performance. We can't just stop the clock when the host decides to schedule another container/VM. I briefly checked public bare-metal offerings like Rackspace, but these are most probably not sufficient either because (unless I'm wrong) they only give you a machine, and it is not guaranteed that it will be the same machine next time. If we are to compare results we don't need just the same model, we really need the very same machine. Any change to the machine might lead to a significant difference (a disk replacement, even a firmware update...).

Solution 1
----------

Since I am already doing this for downstream builds, I can start doing it for upstream as well. At this point I can offer a single pipeline watching only changes in qemu (downstream we check distro/kernel changes as well, but that would require too much time at this point) on a single x86_64 machine. I cannot offer public access to the testing machine, nor checking custom builds (unless someone provides publicly available machine(s) that I could use for this). What I can offer is running the checks on the latest qemu master, publishing the reports, bisecting issues and notifying people about the changes. An example of a report can be found here: https://drive.google.com/file/d/1V2w7QpSuybNusUaGxnyT5zTUvtZDOfsb/view?usp=sharing and documentation of the format is here: https://run-perf.readthedocs.io/en/latest/scripts.html#html-results I can also attach the raw pbench results if needed (as well as details about the tests that were executed, their parameters and other details).

Currently the covered scenarios would be a default libvirt machine with qcow2 storage and a tuned libvirt machine (cpus, hugepages, numa, raw disk...) running fio, uperf and linpack on the latest GA RHEL. In the future I can add/tweak the scenarios as well as the test selection based on your feedback.
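
For illustration, the fio scenario boils down to the kind of in-guest job sketched below. This is only an assumed, representative invocation; the actual runs are driven by pbench with its own job definitions, durations and parameters:

     # assumed example: 4k random reads against a test file inside the guest
     fio --name=randread --filename=/var/tmp/fio.test --size=1G \
         --rw=randread --bs=4k --ioengine=libaio --direct=1 \
         --numjobs=4 --time_based --runtime=60 --group_reporting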

Solution 2
----------

I can offer documentation: https://run-perf.readthedocs.io/en/latest/jenkins.html and someone can fork it or take inspiration from it and set up the pipelines on their system, making them available to the outside world and adding custom scenarios and variants. Note the setup does not require Jenkins; it's just an example and could easily be turned into a cronjob or whatever you choose.

Solution 3
----------

You name it. I bet there are many other ways to perform system-wide performance testing.

Regards,
Lukáš



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
@ 2020-11-26  8:23 ` Jason Wang
  2020-11-26  9:43 ` Daniel P. Berrangé
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2020-11-26  8:23 UTC (permalink / raw)
  To: Lukáš Doktor, QEMU Developers
  Cc: Charles Shih, Aleksandar Markovic, Stefan Hajnoczi


On 2020/11/26 4:10 PM, Lukáš Doktor wrote:
> Hello guys,
>
> I had been around qemu on the Avocado-vt side for quite some time and 
> a while ago I shifted my focus on performance testing. Currently I am 
> not aware of any upstream CI that would continuously monitor the 
> upstream qemu performance and I'd like to change that. There is a lot 
> to cover so please bear with me.
>
> Goal
> ====
>
> The goal of this initiative is to detect system-wide performance 
> regressions as well as improvements early, ideally pin-point the 
> individual commits and notify people that they should fix things. All 
> in upstream and ideally with least human interaction possible.
>
> Unlike the recent work of Ahmed Karaman's 
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/ my aim is on 
> the system-wide performance inside the guest (like fio, uperf, ...)
>
> Tools
> =====
>
> In house we have several different tools used by various teams and I 
> bet there are tons of other tools out there that can do that. I can 
> not speak for all teams but over the time many teams at Red Hat have 
> come to like pbench 
> https://distributed-system-analysis.github.io/pbench/ to run the tests 
> and produce machine readable results and use other tools (Ansible, 
> scripts, ...) to provision the systems and to generate the comparisons.
>
> As for myself I used python for PoC and over the last year I pushed 
> hard to turn it into a usable and sensible tool which I'd like to 
> offer: https://run-perf.readthedocs.io/en/latest/ anyway I am open to 
> suggestions and comparisons. As I am using it downstream to watch 
> regressions I do plan on keep developing the tool as well as the 
> pipelines (unless a better tool is found that would replace it or it's 
> parts).


FYI, Intel has invested a lot in the 0-day Linux kernel automated 
performance regression testing: https://01.org/lkp. It's being actively 
developed upstream.

It's powerful and tons of regressions were reported (and bisected).

I think it can use qemu somehow, but I'm not sure. Maybe we can give it a try.

Thanks


>
> How
> ===
>
> This is a tough question. Ideally this should be a standalone service 
> that would only notify the author of the patch that caused the change 
> with a bunch of useful data so they can either address the issue or 
> just be aware of this change and mark it as expected.
>
> Ideally the community should have a way to also issue their custom 
> builds in order to verify their patches so they can debug and address 
> issues better than just commit to qemu-master.
>
> The problem with those is that we can not simply use travis/gitlab/... 
> machines for running those tests, because we are measuring in-guest 
> actual performance. We can't just stop the time when the machine 
> decides to schedule another container/vm. I briefly checked the public 
> bare-metal offerings like rackspace but these are most probably not 
> sufficient either because (unless I'm wrong) they only give you a 
> machine but it is not guaranteed that it will be the same machine the 
> next time. If we are to compare the results we don't need just the 
> same model, we really need the very same machine. Any change to the 
> machine might lead to a significant difference (disk replacement, even 
> firmware update...).
>
> Solution 1
> ----------
>
> Doing this for downstream builds I can start doing this for upstream 
> as well. At this point I can offer a single pipeline watching only 
> changes in qemu (downstream we are checking distro/kernel changes as 
> well but that would require too much time at this point) on a single 
> x86_64 machine. I can not offer a public access to the testing 
> machine, not even checking custom builds (unless someone provides me a 
> publicly available machine(s) that I would use for this). What I can 
> offer is running the checks on the latest qemu master, publishing the 
> reports, bisecting issues and notifying people about the changes. An 
> example of a report can be found here: 
> https://drive.google.com/file/d/1V2w7QpSuybNusUaGxnyT5zTUvtZDOfsb/view?usp=sharing 
> a documentation of the format is here: 
> https://run-perf.readthedocs.io/en/latest/scripts.html#html-results I 
> can also attach the raw pbench results if needed (as well as details 
> about the tests that were executed and the params and other details).
>
> Currently the covered scenarios would be a default libvirt machine 
> with qcow2 storage and tuned libvirt machine (cpus, hugepages, numa, 
> raw disk...) running fio, uperf and linpack on the latest GA RHEL. In 
> the future I can add/tweak the scenarios as well as tests selection 
> based on your feedback.
>
> Solution 2
> ----------
>
> I can offer a documentation: 
> https://run-perf.readthedocs.io/en/latest/jenkins.html and someone can 
> fork/inspire by it and setup the pipelines on their system, making it 
> available to the outside world, add your custom scenarios and 
> variants. Note the setup does not require Jenkins, it's just an 
> example and could be easily turned into a cronjob or whatever you chose.
>
> Solution 3
> ----------
>
> You name it. I bet there are many other ways to perform system-wide 
> performance testing.
>
> Regards,
> Lukáš
>
>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
  2020-11-26  8:23 ` Jason Wang
@ 2020-11-26  9:43 ` Daniel P. Berrangé
  2020-11-26 11:29   ` Lukáš Doktor
  2020-11-30 13:23   ` Stefan Hajnoczi
  2020-11-26 10:17 ` Peter Maydell
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 22+ messages in thread
From: Daniel P. Berrangé @ 2020-11-26  9:43 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Charles Shih, Aleksandar Markovic, QEMU Developers, Stefan Hajnoczi

On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> How
> ===
> 
> This is a tough question. Ideally this should be a standalone service that
> would only notify the author of the patch that caused the change with a
> bunch of useful data so they can either address the issue or just be aware
> of this change and mark it as expected.

We need to distinguish between the service that co-ordinates and reports
the testing, vs the service which runs the tests.

For the service which runs the tests, it is critical that it be a standalone
bare metal machine with nothing else being run, to ensure reproducibility of
results as you say.

For the service which co-ordinates and reports test results, we ideally want
it to be integrated into our primary CI dashboard, which is GitLab CI at
this time.

> Ideally the community should have a way to also issue their custom builds
> in order to verify their patches so they can debug and address issues
> better than just commit to qemu-master.

Allowing community builds certainly adds an extra dimension of complexity
to the problem, as you need some kind of permissions control, as you can't
allow any arbitrary user on the web to trigger jobs with arbitrary code,
as that is a significant security risk to your infra.

I think I'd just suggest providing a mechanism for the user to easily spin
up performance test jobs on their own hardware. This could be as simple
as providing a docker container recipe that users can deploy on some
arbitrary machine of their choosing that contains the test rig. All they
should need to do is provide a git ref, and then launching the container and
running jobs should be a single command. They can simply run the tests
twice, with and without the patch series in question.

> The problem with those is that we can not simply use travis/gitlab/...
> machines for running those tests, because we are measuring in-guest
> actual performance.

As mentioned above - distinguish between the CI framework, and the
actual test runner.



> Solution 3
> ----------
> 
> You name it. I bet there are many other ways to perform system-wide
> performance testing.

IMHO ideally we should use GitLab CI as the dashboard for triggering
the tests and reporting results back. We should not use the GitLab
shared runners though, for the reasons you describe, of course. Instead
we should register our own dedicated bare metal machine to run the perf
jobs. Cleber has already done some work in this area to provide custom
runners for some of the integration testing work. Red Hat is providing
the hardware for those runners, but I don't know what spare hardware we
have available, if any, that could be dedicated to the performance
regression tests.
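
Registering such a dedicated machine would be a one-off step; a rough
sketch with gitlab-runner is shown below (the URL, token, executor and
tags are placeholders, and the exact registration flow may differ
between GitLab versions):

     # assumed example: register a dedicated bare-metal host as a tagged runner
     gitlab-runner register --non-interactive \
         --url "https://gitlab.com/" \
         --registration-token "REDACTED_TOKEN" \
         --executor "shell" \
         --description "perf-bare-metal-1" \
         --tag-list "perf,bare-metal"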


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
  2020-11-26  8:23 ` Jason Wang
  2020-11-26  9:43 ` Daniel P. Berrangé
@ 2020-11-26 10:17 ` Peter Maydell
  2020-11-26 11:16   ` Lukáš Doktor
  2020-11-30 13:25 ` Stefan Hajnoczi
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Peter Maydell @ 2020-11-26 10:17 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Charles Shih, Aleksandar Markovic, QEMU Developers, Stefan Hajnoczi

On Thu, 26 Nov 2020 at 08:13, Lukáš Doktor <ldoktor@redhat.com> wrote:
> The goal of this initiative is to detect system-wide performance
> regressions as well as improvements early, ideally pin-point the
> individual commits and notify people that they should fix things.
> All in upstream and ideally with least human interaction possible.

So, my general view on this is that automated testing of performance
is nice, but unless it also comes with people willing to do the
work of identifying and fixing the causes of performance regressions
there's a risk that it degrades into another automated email or set
of graphs that nobody pays much attention to (outside of the obvious
"oops this commit dropped us by 50%" mistakes). As with fuzz testing,
I'm a bit wary of setting up an automated system that just pushes
work onto humans who already have full plates. Is Red Hat also
planning to set performance requirements and assign engineers to
keep us from falling below them?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26 10:17 ` Peter Maydell
@ 2020-11-26 11:16   ` Lukáš Doktor
  0 siblings, 0 replies; 22+ messages in thread
From: Lukáš Doktor @ 2020-11-26 11:16 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Charles Shih, Aleksandar Markovic, QEMU Developers, Stefan Hajnoczi

On 26. 11. 20 at 11:17, Peter Maydell wrote:
> On Thu, 26 Nov 2020 at 08:13, Lukáš Doktor <ldoktor@redhat.com> wrote:
>> The goal of this initiative is to detect system-wide performance
>> regressions as well as improvements early, ideally pin-point the
>> individual commits and notify people that they should fix things.
>> All in upstream and ideally with least human interaction possible.
> 
> So, my general view on this is that automated testing of performance
> is nice, but unless it also comes with people willing to do the
> work of identifying and fixing the causes of performance regressions
> there's a risk that it degrades into another automated email or set
> of graphs that nobody pays much attention to (outside of the obvious
> "oops this commit dropped us by 50%" mistakes). As with fuzz testing,
> I'm a bit wary of setting up an automated system that just pushes
> work onto humans who already have full plates. Is RedHat also
> planning to set performance requirements and assign engineers to
> keep us from falling below them ?
> 
> thanks
> -- PMM
> 

In the proposed "solution 1" my role would be to maintain, judge and help analyzing the reports if needed. As for the fixing the code I can not serve, that would have to be on the individual contributors/maintainers, the best I can do is to bisect and CC the authors.

In "solution 2" that would be on the other volunteer with my assistance, if needed. Note that currently the pipeline is not that clever so it requires manual interaction for bisection but I do have plan on improving that soon.

Also note that the purpose of this email is also a call for ideas, because maybe there is a better tool out there and if it fitted our needs better I wouldn't mind to switching to it.

Regards,
Lukáš



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  9:43 ` Daniel P. Berrangé
@ 2020-11-26 11:29   ` Lukáš Doktor
  2020-11-30 13:23   ` Stefan Hajnoczi
  1 sibling, 0 replies; 22+ messages in thread
From: Lukáš Doktor @ 2020-11-26 11:29 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Charles Shih, Aleksandar Markovic, QEMU Developers, Stefan Hajnoczi

On 26. 11. 20 at 10:43, Daniel P. Berrangé wrote:
> On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
>> How
>> ===
>>
>> This is a tough question. Ideally this should be a standalone service that
>> would only notify the author of the patch that caused the change with a
>> bunch of useful data so they can either address the issue or just be aware
>> of this change and mark it as expected.
> 
> We need to distinguish between the service that co-ordinates and reports
> the testing, vs the service which runs the tests.
> 
> For the service which runs the tests, it is critical that it be a standalone
> bare metal machine with nothing else being run, to ensure reproducability of
> results as you say.
> 

Ack, for "solution 1" that would be me and I do have a dedicated machine (more will hopefully come). In "solution 2" that would be up to the other volunteer and there could be a combination, of course.

> For the service which co-ordinates and reports test results, we ideally want
> it to be integrated into our primary CI dashboard, which is GitLab CI at
> this time.
> 

At this point I don't have the resources to do this per commit, nor per push. I know that in GitHub it is possible to manually inject CI results via:

     curl -u $GITHUB_USER:$GITHUB_TOKEN --data "{\"state\": \"$status\", \"description\": \"$description\", \"context\": \"manual/$GITHUB_USER\"}" -H "Accept: application/vnd.github.v3+json" "$base_url/statuses/$commit"

If something like this is available in GitLab then I would be glad to start injecting my results.
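
For reference, GitLab's commit status API looks like it could serve the same purpose; a rough equivalent of the above might be (the project ID, token and variable names are assumptions on my side):

     # assumed sketch: POST a commit status to GitLab (API v4)
     curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
          --data "state=$status" \
          --data "description=$description" \
          --data "context=manual/$GITLAB_USER" \
          --data "target_url=$report_url" \
          "https://gitlab.com/api/v4/projects/$PROJECT_ID/statuses/$commit"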

>> Ideally the community should have a way to also issue their custom builds
>> in order to verify their patches so they can debug and address issues
>> better than just commit to qemu-master.
> 
> Allowing community builds certainly adds an extra dimension of complexity
> to the problem, as you need some kind of permissions control, as you can't
> allow any arbitrary user on the web to trigger jobs with arbitrary code,
> as that is a significant security risk to your infra.
> 
> I think I'd just suggest providing a mechanism for the user to easily spin
> up performance test jobs on their own hardware. This could be as simple
> as providing a docker container recipe that users can deploy on some
> arbitrary machine of their choosing that contains the test rig. All they
> should need do is provide a git ref, and then launching the container and
> running jobs should be a single command. They can simply run the tests
> twice, with and without the patch series in question.
> 

Sure, I can bundle run-perf in a container along with some helpers to simplify the usage.
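
As a sketch of how that single-command flow could look (the image name, mount point and environment variable below are made up for illustration, not an existing artifact):

     # hypothetical image and options, only to illustrate the intended workflow
     podman run --rm -it --privileged \
         -v /var/lib/runperf-results:/results:Z \
         -e QEMU_GIT_REF="my-branch" \
         example.org/run-perf-rig:latest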

>> The problem with those is that we can not simply use travis/gitlab/...
>> machines for running those tests, because we are measuring in-guest
>> actual performance.
> 
> As mentioned above - distinguish between the CI framework, and the
> actual test runner.
> 
> 
> 
>> Solution 3
>> ----------
>>
>> You name it. I bet there are many other ways to perform system-wide
>> performance testing.
> 
> IMHO ideally we should use GitLab CI as the dashboard for trigger
> the tests, and report results back.  We should not use the GitLab
> shared runners though for reasons you describe of course. Instead
> register our own dedicated bare metal machine to run the perf jobs.
> Cleber has already done some work in this area to provide custom
> runners for some of the integration testing work. Red Hat is providing
> the hardware for those runners, but I don't know what spare we have
> available, if any,  that could be dedicated for the performance
> regression tests
> 

Thanks for the pointer, I'll ask Cleber about the integration possibilities.

> 
> Regards,
> Daniel
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  9:43 ` Daniel P. Berrangé
  2020-11-26 11:29   ` Lukáš Doktor
@ 2020-11-30 13:23   ` Stefan Hajnoczi
  2020-12-01  7:51     ` Lukáš Doktor
  1 sibling, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-11-30 13:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Lukáš Doktor, Charles Shih, Aleksandar Markovic,
	QEMU Developers

On Thu, Nov 26, 2020 at 09:43:38AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> > Ideally the community should have a way to also issue their custom builds
> > in order to verify their patches so they can debug and address issues
> > better than just commit to qemu-master.
> 
> Allowing community builds certainly adds an extra dimension of complexity
> to the problem, as you need some kind of permissions control, as you can't
> allow any arbitrary user on the web to trigger jobs with arbitrary code,
> as that is a significant security risk to your infra.

syzkaller and other upstream CI/fuzzing systems do this, so it may be
hard but not impossible.

> I think I'd just suggest providing a mechanism for the user to easily spin
> up performance test jobs on their own hardware. This could be as simple
> as providing a docker container recipe that users can deploy on some
> arbitrary machine of their choosing that contains the test rig. All they
> should need do is provide a git ref, and then launching the container and
> running jobs should be a single command. They can simply run the tests
> twice, with and without the patch series in question.

As soon as developers need to recreate an environment it becomes
time-consuming and there is a risk that the issue won't be reproduced.
That doesn't mean the system is useless - big regressions will still be
tackled - but I think it's too much friction and we should aim to run
community builds.

> > The problem with those is that we can not simply use travis/gitlab/...
> > machines for running those tests, because we are measuring in-guest
> > actual performance.
> 
> As mentioned above - distinguish between the CI framework, and the
> actual test runner.

Does the CI framework or the test runner handle detecting regressions
and providing historical data? I ask because I'm not sure if GitLab CI
provides any of this functionality or whether we'd need to write a
custom CI tool to track and report regressions.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
                   ` (2 preceding siblings ...)
  2020-11-26 10:17 ` Peter Maydell
@ 2020-11-30 13:25 ` Stefan Hajnoczi
  2020-12-01  8:05   ` Lukáš Doktor
  2020-12-02  8:23 ` Chenqun (kuhn)
  2022-03-21  8:46 ` Lukáš Doktor
  5 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-11-30 13:25 UTC (permalink / raw)
  To: Lukáš Doktor; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> The problem with those is that we can not simply use travis/gitlab/... machines for running those tests, because we are measuring in-guest actual performance. We can't just stop the time when the machine decides to schedule another container/vm. I briefly checked the public bare-metal offerings like rackspace but these are most probably not sufficient either because (unless I'm wrong) they only give you a machine but it is not guaranteed that it will be the same machine the next time. If we are to compare the results we don't need just the same model, we really need the very same machine. Any change to the machine might lead to a significant difference (disk replacement, even firmware update...).

Do you have a suggested bare metal setup?

I think it's more complicated than having a single bare metal host. It
could involve a network boot server, a network traffic generator machine
for external network iperf testing, etc.

What is the minimal environment needed for bare metal hosts?

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-30 13:23   ` Stefan Hajnoczi
@ 2020-12-01  7:51     ` Lukáš Doktor
  0 siblings, 0 replies; 22+ messages in thread
From: Lukáš Doktor @ 2020-12-01  7:51 UTC (permalink / raw)
  To: Stefan Hajnoczi, Daniel P. Berrangé
  Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On 30. 11. 20 at 14:23, Stefan Hajnoczi wrote:
> On Thu, Nov 26, 2020 at 09:43:38AM +0000, Daniel P. Berrangé wrote:
>> On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
>>> Ideally the community should have a way to also issue their custom builds
>>> in order to verify their patches so they can debug and address issues
>>> better than just commit to qemu-master.
>>
>> Allowing community builds certainly adds an extra dimension of complexity
>> to the problem, as you need some kind of permissions control, as you can't
>> allow any arbitrary user on the web to trigger jobs with arbitrary code,
>> as that is a significant security risk to your infra.
> 
> syzkaller and other upstream CI/fuzzing systems do this, so it may be
> hard but not impossible.
> 

Sure, not impossible, but it cannot be offered by me at this point. I can't promise anything, but maybe this can change in the future, or in solution 2 someone else might resolve the permission issues and I would only assist with the setup (if needed).

>> I think I'd just suggest providing a mechanism for the user to easily spin
>> up performance test jobs on their own hardware. This could be as simple
>> as providing a docker container recipe that users can deploy on some
>> arbitrary machine of their choosing that contains the test rig. All they
>> should need do is provide a git ref, and then launching the container and
>> running jobs should be a single command. They can simply run the tests
>> twice, with and without the patch series in question.
> 
> As soon as developers need to recreate an environment it becomes
> time-consuming and there is a risk that the issue won't be reproduced.
> That doesn't mean the system is useless - big regressions will still be
> tackled - but I think it's too much friction and we should aim to run
> community builds.
> 

I do understand, but unfortunately at this point I cannot offer that.

>>> The problem with those is that we can not simply use travis/gitlab/...
>>> machines for running those tests, because we are measuring in-guest
>>> actual performance.
>>
>> As mentioned above - distinguish between the CI framework, and the
>> actual test runner.
> 
> Does the CI framework or the test runner handle detecting regressions
> and providing historical data? I ask because I'm not sure if GitLab CI
> provides any of this functionality or whether we'd need to write a
> custom CI tool to track and report regressions.
> 

Currently I am using Jenkins, which allows publishing results (number of failures and total checks) and storing artifacts. I am storing the pbench JSON results with metadata (a few MBs) and the HTML report (also a few MBs). Each HTML report contains a timeline of usually 14 previous builds, which are used as a reference.

Provided GitLab can do something similar, we should be able to see the number of tests run/failed somewhere and then browse the builds' HTML reports. Last but not least, we can fetch the pbench JSON results and issue another comparison cherry-picking individual results (internally I have a pipeline that does that for me; I could add a helper to do it via cmdline/container for others as well).

Regards,
Lukáš

> Stefan
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-30 13:25 ` Stefan Hajnoczi
@ 2020-12-01  8:05   ` Lukáš Doktor
  2020-12-01 10:22     ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Lukáš Doktor @ 2020-12-01  8:05 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On 30. 11. 20 at 14:25, Stefan Hajnoczi wrote:
> On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
>> The problem with those is that we can not simply use travis/gitlab/... machines for running those tests, because we are measuring in-guest actual performance. We can't just stop the time when the machine decides to schedule another container/vm. I briefly checked the public bare-metal offerings like rackspace but these are most probably not sufficient either because (unless I'm wrong) they only give you a machine but it is not guaranteed that it will be the same machine the next time. If we are to compare the results we don't need just the same model, we really need the very same machine. Any change to the machine might lead to a significant difference (disk replacement, even firmware update...).
> 
> Do you have a suggested bare metal setup?
> 
> I think it's more complicated than having a single bare metal host. It
> could involve a network boot server, a network traffic generator machine
> for external network iperf testing, etc.
> 

Yes. At this point I only test the host->guest connection, but run-perf is prepared to test multi-host connections as well (tested with uperf, but a dedicated traffic generator could be added too). Another machine is promised but not yet on the way.

> What is the minimal environment needed for bare metal hosts?
> 

Not sure what you mean by that. For provisioning I have a beaker plugin; other plugins can be added if needed. Even without beaker one can provide an already installed machine and skip the provisioning step. Run-perf would then only apply the profiles (including fetching the VM images from public sources) and run the tests on them. Note that certain profiles might need to reboot the machine, and in such a case the tested machine cannot be the one running run-perf; other profiles can use the current machine, but it's still not a very good idea as the additional overhead might spoil the results.

Note that for very simple issues which do not require a special setup I usually just run a custom VM on my laptop and use a Localhost profile on that VM, which basically results in testing that custom-setup VM's performance. It's dirty but very fast as a first-level check.

> Stefan
> 

Regards,
Lukáš



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-12-01  8:05   ` Lukáš Doktor
@ 2020-12-01 10:22     ` Stefan Hajnoczi
  2020-12-01 12:06       ` Lukáš Doktor
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-12-01 10:22 UTC (permalink / raw)
  To: Lukáš Doktor; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On Tue, Dec 01, 2020 at 09:05:49AM +0100, Lukáš Doktor wrote:
> On 30. 11. 20 at 14:25, Stefan Hajnoczi wrote:
> > On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> > What is the minimal environment needed for bare metal hosts?
> > 
> 
> Not sure what you mean by that. For provisioning I have a beaker plugin, other plugins can be added if needed. Even without beaker one can also provide an installed machine and skip the provisioning step. Runperf would then only apply the profiles (including fetching the VM images from public sources) and run the tests on them. Note that for certain profiles might need to reboot the machine and in such case the tested machine can not be the one running run-perf, other profiles can use the current machine but it's still not a very good idea as the additional overhead might spoil the results.
> 
> Note that for a very simple issue which do not require a special setup I am usually just running a custom VM on my laptop and use a Localhost profile on that VM, which basically results in testing that custom-setup VM's performance. It's dirty but very fast for the first-level check.

I was thinking about ensuring each run starts from the same clean
state. This requires reprovisioning the machine.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-12-01 10:22     ` Stefan Hajnoczi
@ 2020-12-01 12:06       ` Lukáš Doktor
  2020-12-01 12:35         ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Lukáš Doktor @ 2020-12-01 12:06 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On 01. 12. 20 at 11:22, Stefan Hajnoczi wrote:
> On Tue, Dec 01, 2020 at 09:05:49AM +0100, Lukáš Doktor wrote:
>> On 30. 11. 20 at 14:25, Stefan Hajnoczi wrote:
>>> On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
>>> What is the minimal environment needed for bare metal hosts?
>>>
>>
>> Not sure what you mean by that. For provisioning I have a beaker plugin, other plugins can be added if needed. Even without beaker one can also provide an installed machine and skip the provisioning step. Runperf would then only apply the profiles (including fetching the VM images from public sources) and run the tests on them. Note that for certain profiles might need to reboot the machine and in such case the tested machine can not be the one running run-perf, other profiles can use the current machine but it's still not a very good idea as the additional overhead might spoil the results.
>>
>> Note that for a very simple issue which do not require a special setup I am usually just running a custom VM on my laptop and use a Localhost profile on that VM, which basically results in testing that custom-setup VM's performance. It's dirty but very fast for the first-level check.
> 
> I was thinking about reprovisioning the machine to ensure each run
> starts from the same clean state. This requires reprovisioning.
> 
> Stefan
> 

Sure, I probably shortened it too much. In my setup I am using a beaker plugin that reprovisions the machine. As for others, they can either use the beaker plugin as well or just prepare the machine prior to the execution, as described in the previous paragraph.

Regards,
Lukáš



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-12-01 12:06       ` Lukáš Doktor
@ 2020-12-01 12:35         ` Stefan Hajnoczi
  2020-12-02  8:58           ` Chenqun (kuhn)
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2020-12-01 12:35 UTC (permalink / raw)
  To: Lukáš Doktor; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On Tue, Dec 01, 2020 at 01:06:35PM +0100, Lukáš Doktor wrote:
> On 01. 12. 20 at 11:22, Stefan Hajnoczi wrote:
> > On Tue, Dec 01, 2020 at 09:05:49AM +0100, Lukáš Doktor wrote:
> > > On 30. 11. 20 at 14:25, Stefan Hajnoczi wrote:
> > > > On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> > > > What is the minimal environment needed for bare metal hosts?
> > > > 
> > > 
> > > Not sure what you mean by that. For provisioning I have a beaker plugin, other plugins can be added if needed. Even without beaker one can also provide an installed machine and skip the provisioning step. Runperf would then only apply the profiles (including fetching the VM images from public sources) and run the tests on them. Note that for certain profiles might need to reboot the machine and in such case the tested machine can not be the one running run-perf, other profiles can use the current machine but it's still not a very good idea as the additional overhead might spoil the results.
> > > 
> > > Note that for a very simple issue which do not require a special setup I am usually just running a custom VM on my laptop and use a Localhost profile on that VM, which basically results in testing that custom-setup VM's performance. It's dirty but very fast for the first-level check.
> > 
> > I was thinking about reprovisioning the machine to ensure each run
> > starts from the same clean state. This requires reprovisioning.
> > 
> > Stefan
> > 
> 
> Sure, I probably shorten it unnecessary too much. In my setup I am using a beaker plugin that reprovisions the machine. As for others they can either use beaker plugin as well or they can just prepare the machine prior to the execution as described in the previous paragraph.

FWIW I'm not aware of anyone else taking on this work upstream. Whatever
you can do for upstream will be the QEMU disk/network/etc performance
regression testing effort. Someone might show up with engineering time
and machine resources, but the chance is low.

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
                   ` (3 preceding siblings ...)
  2020-11-30 13:25 ` Stefan Hajnoczi
@ 2020-12-02  8:23 ` Chenqun (kuhn)
  2022-03-21  8:46 ` Lukáš Doktor
  5 siblings, 0 replies; 22+ messages in thread
From: Chenqun (kuhn) @ 2020-12-02  8:23 UTC (permalink / raw)
  To: Lukáš Doktor, QEMU Developers
  Cc: Charles Shih, Chenzhendong (alex),
	Aleksandar Markovic, wufengguang, Stefan Hajnoczi

> -----Original Message-----
> From: Qemu-devel
> [mailto:qemu-devel-bounces+kuhn.chenqun=huawei.com@nongnu.org] On
> Behalf Of Lukáš Doktor
> Sent: Thursday, November 26, 2020 4:10 PM
> To: QEMU Developers <qemu-devel@nongnu.org>
> Cc: Charles Shih <cheshi@redhat.com>; Aleksandar Markovic
> <aleksandar.qemu.devel@gmail.com>; Stefan Hajnoczi
> <stefanha@redhat.com>
> Subject: Proposal for a regular upstream performance testing
> 
> Hello guys,
> 
> I had been around qemu on the Avocado-vt side for quite some time and a while
> ago I shifted my focus on performance testing. Currently I am not aware of any
> upstream CI that would continuously monitor the upstream qemu performance
> and I'd like to change that. There is a lot to cover so please bear with me.
> 
> Goal
> ====
> 
> The goal of this initiative is to detect system-wide performance regressions as
> well as improvements early, ideally pin-point the individual commits and notify
> people that they should fix things. All in upstream and ideally with least human
> interaction possible.
> 
> Unlike the recent work of Ahmed Karaman's
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/ my aim is on the
> system-wide performance inside the guest (like fio, uperf, ...)
> 
> Tools
> =====
> 
> In house we have several different tools used by various teams and I bet there
> are tons of other tools out there that can do that. I can not speak for all teams
> but over the time many teams at Red Hat have come to like pbench
> https://distributed-system-analysis.github.io/pbench/ to run the tests and
> produce machine readable results and use other tools (Ansible, scripts, ...) to
> provision the systems and to generate the comparisons.
> 
> As for myself I used python for PoC and over the last year I pushed hard to turn
> it into a usable and sensible tool which I'd like to offer:
> https://run-perf.readthedocs.io/en/latest/ anyway I am open to suggestions
> and comparisons. As I am using it downstream to watch regressions I do plan
> on keep developing the tool as well as the pipelines (unless a better tool is
> found that would replace it or it's parts).
> 
> How
> ===
> 
> This is a tough question. Ideally this should be a standalone service that would
> only notify the author of the patch that caused the change with a bunch of
> useful data so they can either address the issue or just be aware of this change
> and mark it as expected.
> 
> Ideally the community should have a way to also issue their custom builds in
> order to verify their patches so they can debug and address issues better than
> just commit to qemu-master.
> 
> The problem with those is that we can not simply use travis/gitlab/... machines
> for running those tests, because we are measuring in-guest actual
> performance. We can't just stop the time when the machine decides to
> schedule another container/vm. I briefly checked the public bare-metal
> offerings like rackspace but these are most probably not sufficient either
> because (unless I'm wrong) they only give you a machine but it is not
> guaranteed that it will be the same machine the next time. If we are to
> compare the results we don't need just the same model, we really need the
> very same machine. Any change to the machine might lead to a significant
> difference (disk replacement, even firmware update...).

Hi Lukáš,

  It's nice to see a discussion of the QEMU performance topic.
If you need a CI platform and physical machine environments, maybe compass-ci can help you.

Compass-ci is an open CI platform of the openEuler community and is growing.

Here's a brief readme:
https://gitee.com/wu_fengguang/compass-ci/blob/master/README.en.md


Thanks,
Chen Qun
> 
> Solution 1
> ----------
> 
> Doing this for downstream builds I can start doing this for upstream as well. At
> this point I can offer a single pipeline watching only changes in qemu
> (downstream we are checking distro/kernel changes as well but that would
> require too much time at this point) on a single x86_64 machine. I can not offer
> a public access to the testing machine, not even checking custom builds (unless
> someone provides me a publicly available machine(s) that I would use for this).
> What I can offer is running the checks on the latest qemu master, publishing
> the reports, bisecting issues and notifying people about the changes. An
> example of a report can be found here:
> https://drive.google.com/file/d/1V2w7QpSuybNusUaGxnyT5zTUvtZDOfsb/view
> ?usp=sharing a documentation of the format is here:
> https://run-perf.readthedocs.io/en/latest/scripts.html#html-results I can also
> attach the raw pbench results if needed (as well as details about the tests that
> were executed and the params and other details).
> 
> Currently the covered scenarios would be a default libvirt machine with qcow2
> storage and tuned libvirt machine (cpus, hugepages, numa, raw disk...) running
> fio, uperf and linpack on the latest GA RHEL. In the future I can add/tweak the
> scenarios as well as tests selection based on your feedback.
> 
> Solution 2
> ----------
> 
> I can offer a documentation:
> https://run-perf.readthedocs.io/en/latest/jenkins.html and someone can
> fork/inspire by it and setup the pipelines on their system, making it available to
> the outside world, add your custom scenarios and variants. Note the setup
> does not require Jenkins, it's just an example and could be easily turned into a
> cronjob or whatever you chose.
> 
> Solution 3
> ----------
> 
> You name it. I bet there are many other ways to perform system-wide
> performance testing.
> 
> Regards,
> Lukáš
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Proposal for a regular upstream performance testing
  2020-12-01 12:35         ` Stefan Hajnoczi
@ 2020-12-02  8:58           ` Chenqun (kuhn)
  0 siblings, 0 replies; 22+ messages in thread
From: Chenqun (kuhn) @ 2020-12-02  8:58 UTC (permalink / raw)
  To: Stefan Hajnoczi, Lukáš Doktor
  Cc: Charles Shih, Chenzhendong (alex),
	Aleksandar Markovic, QEMU Developers, wufengguang

> On Tue, Dec 01, 2020 at 01:06:35PM +0100, Lukáš Doktor wrote:
> > On 01. 12. 20 at 11:22, Stefan Hajnoczi wrote:
> > > On Tue, Dec 01, 2020 at 09:05:49AM +0100, Lukáš Doktor wrote:
> > > > On 30. 11. 20 at 14:25, Stefan Hajnoczi wrote:
> > > > > On Thu, Nov 26, 2020 at 09:10:14AM +0100, Lukáš Doktor wrote:
> > > > > What is the minimal environment needed for bare metal hosts?
> > > > >
> > > >
> > > > Not sure what you mean by that. For provisioning I have a beaker plugin,
> other plugins can be added if needed. Even without beaker one can also provide
> an installed machine and skip the provisioning step. Runperf would then only
> apply the profiles (including fetching the VM images from public sources) and
> run the tests on them. Note that for certain profiles might need to reboot the
> machine and in such case the tested machine can not be the one running
> run-perf, other profiles can use the current machine but it's still not a very good
> idea as the additional overhead might spoil the results.
> > > >
> > > > Note that for a very simple issue which do not require a special setup I am
> usually just running a custom VM on my laptop and use a Localhost profile on
> that VM, which basically results in testing that custom-setup VM's
> performance. It's dirty but very fast for the first-level check.
> > >
> > > I was thinking about reprovisioning the machine to ensure each run
> > > starts from the same clean state. This requires reprovisioning.
> > >
> > > Stefan
> > >
> >
> > Sure, I probably shorten it unnecessary too much. In my setup I am using a
> beaker plugin that reprovisions the machine. As for others they can either use
> beaker plugin as well or they can just prepare the machine prior to the
> execution as described in the previous paragraph.
> 
> FWIW I'm not aware of anyone else taking on this work upstream. Whatever
> you can do for upstream will be the QEMU disk/network/etc preformance
> regression testing effort. Someone might show up with engineering time and
> machine resources, but the chance is low.

Maybe we could provide CI platforms, machine resources and some other possible help : )

Currently there is only a short readme; we will complete the documentation as soon as possible.
https://gitee.com/wu_fengguang/compass-ci/blob/master/README.en.md

Thanks,
Chen Qun


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
                   ` (4 preceding siblings ...)
  2020-12-02  8:23 ` Chenqun (kuhn)
@ 2022-03-21  8:46 ` Lukáš Doktor
  2022-03-21  9:42   ` Stefan Hajnoczi
  5 siblings, 1 reply; 22+ messages in thread
From: Lukáš Doktor @ 2022-03-21  8:46 UTC (permalink / raw)
  To: QEMU Developers; +Cc: Charles Shih, Aleksandar Markovic, Stefan Hajnoczi


Dear qemu developers,

you might remember the "replied to" email from a bit over a year ago, which raised a discussion about a qemu performance regression CI. At the KVM Forum I presented some details about my testing pipeline: https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9 I think it's stable enough to become part of the official CI so people can consume it, rely on it and hopefully even suggest configuration changes.

The CI consists of:

1. Jenkins pipeline(s) - internal, not available to developers, running daily builds of the latest available commit
2. Publicly available anonymized results: https://ldoktor.github.io/tmp/RedHat-Perf-worker1/
3. (optional) a manual gitlab pulling job which is triggered by the Jenkins pipeline when that particular commit is checked

Item (1) is described here: https://run-perf.readthedocs.io/en/latest/jenkins.html and can be replicated on other premises; the individual jobs can be executed directly ( https://run-perf.readthedocs.io ) on any Linux box using Fedora guests (via pip or a container: https://run-perf.readthedocs.io/en/latest/container.html ).

As for (3), I made a testing pipeline available here: https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing test and one allowed-to-fail actual testing job. If you think such an integration would be useful, I can add it as another job to the official qemu repo. Note the integration is a bit hacky as, due to resources, we cannot test all commits but rather test on a daily basis, which is not officially supported by gitlab.
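
For illustration, such a hand-off from Jenkins could be little more than a call to GitLab's pipeline trigger API once a daily build has been tested; a rough sketch (the project ID, trigger token and variable name are assumptions):

     # assumed sketch: let Jenkins trigger the GitLab job for the tested commit
     curl --request POST \
          --form "token=$TRIGGER_TOKEN" \
          --form "ref=master" \
          --form "variables[TESTED_COMMIT]=$COMMIT_SHA" \
          "https://gitlab.com/api/v4/projects/$PROJECT_ID/trigger/pipeline"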

Note the aim of this project is to ensure some very basic system-level workflow performance stays the same or that the differences are described and ideally pinned to individual commits. It should not replace thorough release testing or low-level performance tests.

Regards,
Lukáš


On 26. 11. 20 at 9:10, Lukáš Doktor wrote:
> Hello guys,
> 
> I had been around qemu on the Avocado-vt side for quite some time and a while ago I shifted my focus on performance testing. Currently I am not aware of any upstream CI that would continuously monitor the upstream qemu performance and I'd like to change that. There is a lot to cover so please bear with me.
> 
> Goal
> ====
> 
> The goal of this initiative is to detect system-wide performance regressions as well as improvements early, ideally pin-point the individual commits and notify people that they should fix things. All in upstream and ideally with least human interaction possible.
> 
> Unlike the recent work of Ahmed Karaman's https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/ my aim is on the system-wide performance inside the guest (like fio, uperf, ...)
> 
> Tools
> =====
> 
> In house we have several different tools used by various teams and I bet there are tons of other tools out there that can do that. I can not speak for all teams but over the time many teams at Red Hat have come to like pbench https://distributed-system-analysis.github.io/pbench/ to run the tests and produce machine readable results and use other tools (Ansible, scripts, ...) to provision the systems and to generate the comparisons.
> 
> As for myself I used python for PoC and over the last year I pushed hard to turn it into a usable and sensible tool which I'd like to offer: https://run-perf.readthedocs.io/en/latest/ anyway I am open to suggestions and comparisons. As I am using it downstream to watch regressions I do plan on keep developing the tool as well as the pipelines (unless a better tool is found that would replace it or it's parts).
> 
> How
> ===
> 
> This is a tough question. Ideally this should be a standalone service that would only notify the author of the patch that caused the change with a bunch of useful data so they can either address the issue or just be aware of this change and mark it as expected.
> 
> Ideally the community should have a way to also issue their custom builds in order to verify their patches so they can debug and address issues better than just commit to qemu-master.
> 
> The problem with those is that we can not simply use travis/gitlab/... machines for running those tests, because we are measuring in-guest actual performance. We can't just stop the time when the machine decides to schedule another container/vm. I briefly checked the public bare-metal offerings like rackspace but these are most probably not sufficient either because (unless I'm wrong) they only give you a machine but it is not guaranteed that it will be the same machine the next time. If we are to compare the results we don't need just the same model, we really need the very same machine. Any change to the machine might lead to a significant difference (disk replacement, even firmware update...).
> 
> Solution 1
> ----------
> 
> Doing this for downstream builds I can start doing this for upstream as well. At this point I can offer a single pipeline watching only changes in qemu (downstream we are checking distro/kernel changes as well but that would require too much time at this point) on a single x86_64 machine. I can not offer a public access to the testing machine, not even checking custom builds (unless someone provides me a publicly available machine(s) that I would use for this). What I can offer is running the checks on the latest qemu master, publishing the reports, bisecting issues and notifying people about the changes. An example of a report can be found here: https://drive.google.com/file/d/1V2w7QpSuybNusUaGxnyT5zTUvtZDOfsb/view?usp=sharing a documentation of the format is here: https://run-perf.readthedocs.io/en/latest/scripts.html#html-results I can also attach the raw pbench results if needed (as well as details about the tests that were executed and the params and other details).
> 
> Currently the covered scenarios would be a default libvirt machine with qcow2 storage and tuned libvirt machine (cpus, hugepages, numa, raw disk...) running fio, uperf and linpack on the latest GA RHEL. In the future I can add/tweak the scenarios as well as tests selection based on your feedback.
> 
> Solution 2
> ----------
> 
> I can offer a documentation: https://run-perf.readthedocs.io/en/latest/jenkins.html and someone can fork/inspire by it and setup the pipelines on their system, making it available to the outside world, add your custom scenarios and variants. Note the setup does not require Jenkins, it's just an example and could be easily turned into a cronjob or whatever you chose.
> 
> Solution 3
> ----------
> 
> You name it. I bet there are many other ways to perform system-wide performance testing.
> 
> Regards,
> Lukáš


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Proposal for a regular upstream performance testing
  2022-03-21  8:46 ` Lukáš Doktor
@ 2022-03-21  9:42   ` Stefan Hajnoczi
  2022-03-21 10:29     ` Lukáš Doktor
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2022-03-21  9:42 UTC (permalink / raw)
  To: Lukáš Doktor; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers

On Mon, Mar 21, 2022 at 09:46:12AM +0100, Lukáš Doktor wrote:
> Dear qemu developers,
> 
> you might remember the "replied to" email from a bit over year ago to raise a discussion about a qemu performance regression CI. On KVM forum I presented https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9 some details about my testing pipeline. I think it's stable enough to become part of the official CI so people can consume, rely on it and hopefully even suggest configuration changes.
> 
> The CI consists of:
> 
> 1. Jenkins pipeline(s) - internal, not available to developers, running daily builds of the latest available commit
> 2. Publicly available anonymized results: https://ldoktor.github.io/tmp/RedHat-Perf-worker1/

This link is 404.

> 3. (optional) a manual gitlab pulling job which is triggered by the Jenkins pipeline when that particular commit is checked
> 
> Item (1) is described here: https://run-perf.readthedocs.io/en/latest/jenkins.html and can be replicated on other premises; the individual jobs can also be executed directly (https://run-perf.readthedocs.io) on any Linux box using Fedora guests (via pip or a container: https://run-perf.readthedocs.io/en/latest/container.html ).
> 
> As for (3), I made a testing pipeline available here: https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing test and one allowed-to-fail actual testing job. If you think such integration would be useful, I can add it as another job to the official qemu repo. Note the integration is a bit hacky as, due to resources, we cannot test all commits but rather test on a daily basis, which is not officially supported by gitlab.
> 
> Note the aim of this project is to ensure that the performance of some very basic system-level workflows stays the same, or that any differences are described and ideally pinned to individual commits. It should not replace thorough release testing or low-level performance tests.

If I understand correctly the GitLab CI integration you described
follows the "push" model where Jenkins (running on your own machine)
triggers a manual job in GitLab CI simply to indicate the status of the
nightly performance regression test?
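
Such a status job in .gitlab-ci.yml could look roughly like the sketch below; the job name, the PERF_STATUS variable and the script lines are illustrative assumptions, not taken from the actual pipeline:

  # hypothetical manual, allowed-to-fail status job triggered from Jenkins
  nightly-perf-status:
    stage: test
    rules:
      - when: manual        # started via the pipeline/trigger API, not on every push
    allow_failure: true     # a perf report should not block unrelated CI
    script:
      - echo "Nightly perf run for $CI_COMMIT_SHA finished with status $PERF_STATUS"
      - test "$PERF_STATUS" = "pass"   # PERF_STATUS would be supplied by the trigger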

What process should QEMU follow to handle performance regressions
identified by your job? In other words, which stakeholders need to
triage, notify, debug, etc when a regression is identified?

My guess is:
- Someone (you or the qemu.git committer) needs to watch the job status and triage failures.
- That person then notifies likely authors of suspected commits so they can investigate.
- The authors need a way to reproduce the issue - either locally or by pushing commits to GitLab and waiting for test results.
- Fixes will be merged as additional qemu.git commits since commit history cannot be rewritten.
- If necessary a git-revert(1) commit can be merged to temporarily undo a commit that caused issues.
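
For the last point, a minimal example (the hash is a placeholder for whichever commit caused the regression):

  git revert --no-edit <offending-commit-sha>   # creates a new commit undoing the change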

Who will watch the job status and triage failures?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Proposal for a regular upstream performance testing
  2022-03-21  9:42   ` Stefan Hajnoczi
@ 2022-03-21 10:29     ` Lukáš Doktor
  2022-03-22 15:05       ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Lukáš Doktor @ 2022-03-21 10:29 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Charles Shih, Aleksandar Markovic, QEMU Developers


[-- Attachment #1.1.1: Type: text/plain, Size: 4613 bytes --]

Hello Stefan,

On 21. 03. 22 at 10:42, Stefan Hajnoczi wrote:
> On Mon, Mar 21, 2022 at 09:46:12AM +0100, Lukáš Doktor wrote:
>> Dear qemu developers,
>>
>> you might remember the "replied to" email from a bit over year ago to raise a discussion about a qemu performance regression CI. On KVM forum I presented https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9 some details about my testing pipeline. I think it's stable enough to become part of the official CI so people can consume, rely on it and hopefully even suggest configuration changes.
>>
>> The CI consists of:
>>
>> 1. Jenkins pipeline(s) - internal, not available to developers, running daily builds of the latest available commit
>> 2. Publicly available anonymized results: https://ldoktor.github.io/tmp/RedHat-Perf-worker1/
> 
> This link is 404.
> 

My mistake, it works well without the trailing slash: https://ldoktor.github.io/tmp/RedHat-Perf-worker1

>> 3. (optional) a manual gitlab pulling job which triggered by the Jenkins pipeline when that particular commit is checked
>>
>> The (1) is described here: https://run-perf.readthedocs.io/en/latest/jenkins.html and can be replicated on other premises and the individual jobs can be executed directly https://run-perf.readthedocs.io on any linux box using Fedora guests (via pip or container https://run-perf.readthedocs.io/en/latest/container.html ).
>>
>> As for the (3) I made a testing pipeline available here: https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing test and one allow-to-fail actual testing job. If you think such integration would be useful, I can add it as another job to the official qemu repo. Note the integration is a bit hacky as, due to resources, we can not test all commits but rather test on daily basis, which is not officially supported by gitlab.
>>
>> Note the aim of this project is to ensure some very basic system-level workflow performance stays the same or that the differences are described and ideally pinned to individual commits. It should not replace thorough release testing or low-level performance tests.
> 
> If I understand correctly the GitLab CI integration you described
> follows the "push" model where Jenkins (running on your own machine)
> triggers a manual job in GitLab CI simply to indicate the status of the
> nightly performance regression test?
> 
> What process should QEMU follow to handle performance regressions
> identified by your job? In other words, which stakeholders need to
> triage, notify, debug, etc when a regression is identified?
> 
> My guess is:
> - Someone (you or the qemu.git committer) need to watch the job status and triage failures.
> - That person then notifies likely authors of suspected commits so they can investigate.
> - The authors need a way to reproduce the issue - either locally or by pushing commits to GitLab and waiting for test results.
> - Fixes will be merged as additional qemu.git commits since commit history cannot be rewritten.
> - If necessary a git-revert(1) commit can be merged to temporarily undo a commit that caused issues.
> 
> Who will watch the job status and triage failures?
> 
> Stefan

This is exactly the main question I'd like to resolve as part of considering-this-to-be-official-part-of-the-upstream-qemu-testing. At this point our team is offering its services to maintain this single worker for daily jobs, monitoring the status and pinging people in case of bisectable results.

From the upstream qemu community we are mainly looking for feedback:

* whether they'd want to be notified of such issues (and via what means)
* whether the current approach seems to be actually performing useful tasks
* whether the reports are understandable
* whether the reports should be regularly pushed to a publicly available place (or only on a regression/improvement)
* whether there are any volunteers interested in issues that are not clearly bisectable (probably grouped by topic)

Note that not all issues need to be addressed; some might only result in notes that should help us understand why qemu behaves differently after rebasing our downstream version.

As for the hopefully-not-so-distant-future, we already have a second machine based on el9 with an NVMe disk in preparation, and if this pipeline proves to be useful we plan to cover other architecture(s) as well. Aside from this, other companies might replicate our setup based on the documentation, with their machines, their scenarios and their distros of choice.

Regards,
Lukáš

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 12153 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]


* Re: Proposal for a regular upstream performance testing
  2022-03-21 10:29     ` Lukáš Doktor
@ 2022-03-22 15:05       ` Stefan Hajnoczi
  2022-03-28  6:18         ` Lukáš Doktor
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2022-03-22 15:05 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Charles Shih, Kevin Wolf, Aleksandar Markovic, QEMU Developers,
	qemu-block

[-- Attachment #1: Type: text/plain, Size: 5109 bytes --]

On Mon, Mar 21, 2022 at 11:29:42AM +0100, Lukáš Doktor wrote:
> Hello Stefan,
> 
> On 21. 03. 22 at 10:42, Stefan Hajnoczi wrote:
> > On Mon, Mar 21, 2022 at 09:46:12AM +0100, Lukáš Doktor wrote:
> >> Dear qemu developers,
> >>
> >> you might remember the "replied to" email from a bit over year ago to raise a discussion about a qemu performance regression CI. On KVM forum I presented https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9 some details about my testing pipeline. I think it's stable enough to become part of the official CI so people can consume, rely on it and hopefully even suggest configuration changes.
> >>
> >> The CI consists of:
> >>
> >> 1. Jenkins pipeline(s) - internal, not available to developers, running daily builds of the latest available commit
> >> 2. Publicly available anonymized results: https://ldoktor.github.io/tmp/RedHat-Perf-worker1/
> > 
> > This link is 404.
> > 
> 
> My mistake, it works well without the tailing slash: https://ldoktor.github.io/tmp/RedHat-Perf-worker1
> 
> >> 3. (optional) a manual gitlab pulling job which triggered by the Jenkins pipeline when that particular commit is checked
> >>
> >> The (1) is described here: https://run-perf.readthedocs.io/en/latest/jenkins.html and can be replicated on other premises and the individual jobs can be executed directly https://run-perf.readthedocs.io on any linux box using Fedora guests (via pip or container https://run-perf.readthedocs.io/en/latest/container.html ).
> >>
> >> As for the (3) I made a testing pipeline available here: https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing test and one allow-to-fail actual testing job. If you think such integration would be useful, I can add it as another job to the official qemu repo. Note the integration is a bit hacky as, due to resources, we can not test all commits but rather test on daily basis, which is not officially supported by gitlab.
> >>
> >> Note the aim of this project is to ensure some very basic system-level workflow performance stays the same or that the differences are described and ideally pinned to individual commits. It should not replace thorough release testing or low-level performance tests.
> > 
> > If I understand correctly the GitLab CI integration you described
> > follows the "push" model where Jenkins (running on your own machine)
> > triggers a manual job in GitLab CI simply to indicate the status of the
> > nightly performance regression test?
> > 
> > What process should QEMU follow to handle performance regressions
> > identified by your job? In other words, which stakeholders need to
> > triage, notify, debug, etc when a regression is identified?
> > 
> > My guess is:
> > - Someone (you or the qemu.git committer) need to watch the job status and triage failures.
> > - That person then notifies likely authors of suspected commits so they can investigate.
> > - The authors need a way to reproduce the issue - either locally or by pushing commits to GitLab and waiting for test results.
> > - Fixes will be merged as additional qemu.git commits since commit history cannot be rewritten.
> > - If necessary a git-revert(1) commit can be merged to temporarily undo a commit that caused issues.
> > 
> > Who will watch the job status and triage failures?
> > 
> > Stefan
> 
> This is exactly the main question I'd like to resolve as part of considering-this-to-be-official-part-of-the-upstream-qemu-testing. At this point our team is offering it's service to maintain this single worker for daily jobs, monitoring the status and pinging people in case of bisectable results.

That's great! The main hurdle is finding someone to triage regressions
and if you are volunteering to do that then these regression tests would
be helpful to QEMU.

> From the upstream qemu community we are mainly looking for a feedback:
> 
> * whether they'd want to be notified of such issues (and via what means)

I have CCed Kevin Wolf in case he has any questions regarding how fio
regressions will be handled.

I'm happy to be contacted when a regression bisects to a commit I
authored.

> * whether the current approach seems to be actually performing useful tasks
> * whether the reports are understandable

Reports aren't something I would look at as a developer. Although the
history and current status may be useful to some maintainers, that
information isn't critical. Developers simply need to know which commit
introduced a regression and the details of how to run the regression.

> * whether the reports should be regularly pushed into publicly available place (or just on regression/improvement)
> * whether there are any volunteers to be interested in non-clearly-bisectable issues (probably by-topic)

One option is to notify maintainers, but when I'm in this position
myself I usually only investigate critical issues due to limited time.

Regarding how to contact people, I suggest emailing them and CCing
qemu-devel so others are aware.

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Proposal for a regular upstream performance testing
  2022-03-22 15:05       ` Stefan Hajnoczi
@ 2022-03-28  6:18         ` Lukáš Doktor
  2022-03-28  9:57           ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Lukáš Doktor @ 2022-03-28  6:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Charles Shih, Kevin Wolf, Aleksandar Markovic, QEMU Developers,
	qemu-block


[-- Attachment #1.1.1: Type: text/plain, Size: 8261 bytes --]

Hello Stefan, folks,

I seem to have another hit - an improvement actually - and it bisects all the way to you, Stefan. Let me use this as another example of how such a process could look, and we can use it to hammer out the details such as how to submit the request, whom to notify and how to proceed further.

---

Last week I noticed an improvement in the TunedLibvirt/fio-rot-Aj-8i/0000:./write-4KiB/throughput/iops_sec.mean check (<driver name="qemu" type="raw" io="native" cache="none"/>, fio, rotational disk, raw file on host xfs partition, jobs=#cpus, iodepth=8, 4k writes) and bisected it to:

commit fc8796465c6cd4091efe6a2f8b353f07324f49c7
Author: Stefan Hajnoczi <stefanha@redhat.com>
Date:   Wed Feb 23 15:57:03 2022 +0000

    aio-posix: fix spurious ->poll_ready() callbacks in main loop

Could you please confirm that it does make sense and that it is expected? (looks like it from the description).
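
For context, the workload behind that metric corresponds roughly to a fio job file along these lines (a sketch reconstructed from the test name above; the target device and numjobs=8 are assumptions, not the exact runperf-generated job):

  ; sketch of the write-4KiB job; target path and numjobs are assumptions
  [global]
  direct=1
  ioengine=libaio
  rw=write
  bs=4k
  iodepth=8
  runtime=60
  time_based=1

  [write-4KiB]
  ; disk inside the guest, backed by a raw file on a host xfs partition
  filename=/dev/vdb
  ; jobs=#cpus, assuming an 8-vCPU guest
  numjobs=8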

Note that this commit was pinpointed using the 2-out-of-3 results; there were actually some small differences between commits fc8 and cc5. The fc8 and 202 results scored similarly to both the good and the bad commits, with 202 being closer to the bad one. Since cc5 the results seem to stabilize further, scoring slightly lower than the median fc8 result. Anyway, I don't have enough data to declare anything; I can bisect it further if needed.

The git bisect log:

git bisect start
# good: [ecf1bbe3227cc1c54d7374aa737e7e0e60ee0c29] Merge tag 'pull-ppc-20220321' of https://github.com/legoater/qemu into staging
git bisect good ecf1bbe3227cc1c54d7374aa737e7e0e60ee0c29
# bad: [9d36d5f7e0dc905d8cb3dd437e479eb536417d3b] Merge tag 'pull-block-2022-03-22' of https://gitlab.com/hreitz/qemu into staging
git bisect bad 9d36d5f7e0dc905d8cb3dd437e479eb536417d3b
# bad: [0f7d7d72aa99c8e48bbbf37262a9c66c83113f76] iotests: use qemu_img_json() when applicable
git bisect bad 0f7d7d72aa99c8e48bbbf37262a9c66c83113f76
# bad: [cc5387a544325c26dcf124ac7d3999389c24e5c6] block/rbd: fix write zeroes with growing images
git bisect bad cc5387a544325c26dcf124ac7d3999389c24e5c6
# good: [b21e2380376c470900fcadf47507f4d5ade75e85] Use g_new() & friends where that makes obvious sense
git bisect good b21e2380376c470900fcadf47507f4d5ade75e85
# bad: [2028ab513bf0232841a909e1368309858919dbcc] Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging
git bisect bad 2028ab513bf0232841a909e1368309858919dbcc
# bad: [fc8796465c6cd4091efe6a2f8b353f07324f49c7] aio-posix: fix spurious ->poll_ready() callbacks in main loop
git bisect bad fc8796465c6cd4091efe6a2f8b353f07324f49c7
# good: [8a947c7a586e16a048894e1a0a73d154435e90ef] aio-posix: fix build failure io_uring 2.2
git bisect good 8a947c7a586e16a048894e1a0a73d154435e90ef
# first bad commit: [fc8796465c6cd4091efe6a2f8b353f07324f49c7] aio-posix: fix spurious ->poll_ready() callbacks in main loop

Also please find the bisection report attached. I can attach the VM xml file or other logs if needed.

Regards,
Lukáš


On 22. 03. 22 at 16:05, Stefan Hajnoczi wrote:
> On Mon, Mar 21, 2022 at 11:29:42AM +0100, Lukáš Doktor wrote:
>> Hello Stefan,
>>
>> On 21. 03. 22 at 10:42, Stefan Hajnoczi wrote:
>>> On Mon, Mar 21, 2022 at 09:46:12AM +0100, Lukáš Doktor wrote:
>>>> Dear qemu developers,
>>>>
>>>> you might remember the "replied to" email from a bit over year ago to raise a discussion about a qemu performance regression CI. On KVM forum I presented https://www.youtube.com/watch?v=Cbm3o4ACE3Y&list=PLbzoR-pLrL6q4ZzA4VRpy42Ua4-D2xHUR&index=9 some details about my testing pipeline. I think it's stable enough to become part of the official CI so people can consume, rely on it and hopefully even suggest configuration changes.
>>>>
>>>> The CI consists of:
>>>>
>>>> 1. Jenkins pipeline(s) - internal, not available to developers, running daily builds of the latest available commit
>>>> 2. Publicly available anonymized results: https://ldoktor.github.io/tmp/RedHat-Perf-worker1/
>>>
>>> This link is 404.
>>>
>>
>> My mistake, it works well without the tailing slash: https://ldoktor.github.io/tmp/RedHat-Perf-worker1
>>
>>>> 3. (optional) a manual gitlab pulling job which triggered by the Jenkins pipeline when that particular commit is checked
>>>>
>>>> The (1) is described here: https://run-perf.readthedocs.io/en/latest/jenkins.html and can be replicated on other premises and the individual jobs can be executed directly https://run-perf.readthedocs.io on any linux box using Fedora guests (via pip or container https://run-perf.readthedocs.io/en/latest/container.html ).
>>>>
>>>> As for the (3) I made a testing pipeline available here: https://gitlab.com/ldoktor/qemu/-/pipelines with one always-passing test and one allow-to-fail actual testing job. If you think such integration would be useful, I can add it as another job to the official qemu repo. Note the integration is a bit hacky as, due to resources, we can not test all commits but rather test on daily basis, which is not officially supported by gitlab.
>>>>
>>>> Note the aim of this project is to ensure some very basic system-level workflow performance stays the same or that the differences are described and ideally pinned to individual commits. It should not replace thorough release testing or low-level performance tests.
>>>
>>> If I understand correctly the GitLab CI integration you described
>>> follows the "push" model where Jenkins (running on your own machine)
>>> triggers a manual job in GitLab CI simply to indicate the status of the
>>> nightly performance regression test?
>>>
>>> What process should QEMU follow to handle performance regressions
>>> identified by your job? In other words, which stakeholders need to
>>> triage, notify, debug, etc when a regression is identified?
>>>
>>> My guess is:
>>> - Someone (you or the qemu.git committer) need to watch the job status and triage failures.
>>> - That person then notifies likely authors of suspected commits so they can investigate.
>>> - The authors need a way to reproduce the issue - either locally or by pushing commits to GitLab and waiting for test results.
>>> - Fixes will be merged as additional qemu.git commits since commit history cannot be rewritten.
>>> - If necessary a git-revert(1) commit can be merged to temporarily undo a commit that caused issues.
>>>
>>> Who will watch the job status and triage failures?
>>>
>>> Stefan
>>
>> This is exactly the main question I'd like to resolve as part of considering-this-to-be-official-part-of-the-upstream-qemu-testing. At this point our team is offering it's service to maintain this single worker for daily jobs, monitoring the status and pinging people in case of bisectable results.
> 
> That's great! The main hurdle is finding someone to triage regressions
> and if you are volunteering to do that then these regression tests would
> be helpful to QEMU.
> 
>> From the upstream qemu community we are mainly looking for a feedback:
>>
>> * whether they'd want to be notified of such issues (and via what means)
> 
> I have CCed Kevin Wolf in case he has any questions regarding how fio
> regressions will be handled.
> 
> I'm happy to be contacted when a regression bisects to a commit I
> authored.
> 
>> * whether the current approach seems to be actually performing useful tasks
>> * whether the reports are understandable
> 
> Reports aren't something I would look at as a developer. Although the
> history and current status may be useful to some maintainers, that
> information isn't critical. Developers simply need to know which commit
> introduced a regression and the details of how to run the regression.
> 
>> * whether the reports should be regularly pushed into publicly available place (or just on regression/improvement)
>> * whether there are any volunteers to be interested in non-clearly-bisectable issues (probably by-topic)
> 
> One option is to notify maintainers, but when I'm in this position
> myself I usually only investigate critical issues due to limited time.
> 
> Regarding how to contact people, I suggest emailing them and CCing
> qemu-devel so others are aware.
> 
> Thanks,
> Stefan

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 12153 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]


* Re: Proposal for a regular upstream performance testing
  2022-03-28  6:18         ` Lukáš Doktor
@ 2022-03-28  9:57           ` Stefan Hajnoczi
  2022-03-28 11:09             ` Lukáš Doktor
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2022-03-28  9:57 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Charles Shih, Kevin Wolf, Aleksandar Markovic, QEMU Developers,
	qemu-block

[-- Attachment #1: Type: text/plain, Size: 1967 bytes --]

On Mon, Mar 28, 2022 at 08:18:43AM +0200, Lukáš Doktor wrote:
> Hello Stefan, folks,
> 
> I seem to have another hit, an improvement actually and it seems to be bisected all the way to you, Stefan. Let me use this as another example of how such process could look like and we can use this to hammer-out the details like via what means to submit the request, whom to notify and how to proceed further.
> 
> ---
> 
> Last week I noticed an improvement in TunedLibvirt/fio-rot-Aj-8i/0000:./write-4KiB/throughput/iops_sec.mean (<driver name="qemu" type="raw" io="native" cache="none"/>, fio, rotationary disk, raw file on host xfs partition, jobs=#cpus, iodepth=8, 4k writes) check and bisected it to:
> 
> commit fc8796465c6cd4091efe6a2f8b353f07324f49c7
> Author: Stefan Hajnoczi <stefanha@redhat.com>
> Date:   Wed Feb 23 15:57:03 2022 +0000
> 
>     aio-posix: fix spurious ->poll_ready() callbacks in main loop
> 
> Could you please confirm that it does make sense and that it is expected? (looks like it from the description).
> 
> Note that this commit was pin pointed using 2 out of 3 commits result, there were actually some little differences between commits fc8 and cc5. The fc8 and 202 results scored similarly to both, good and bad commits with 2 being closer to the bad one. Since cc5 they seem to stabilize further scoring slightly lower than the median fc8 result. Anyway I don't have enough data to declare anything. I can bisect it further if needed.

Yes, I can imagine that commit fc8796465c6c might improve non-IOThread
performance!

I don't know how to read the report:
- What is the difference between "Group stats" and "Failures"?
- Why are there 3 different means in "Group stats"?
- Why are there 3 "fc8" columns in "Failures"?

I don't feel confident searching git-log(1) output with 3-character
commit IDs. git itself uses 12 characters for short commit IDs with a
reasonably low chance of collisions.
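
For example, git itself can produce such a 12-character abbreviation for the commit discussed above:

  $ git rev-parse --short=12 fc8796465c6cd4091efe6a2f8b353f07324f49c7
  fc8796465c6c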

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]


* Re: Proposal for a regular upstream performance testing
  2022-03-28  9:57           ` Stefan Hajnoczi
@ 2022-03-28 11:09             ` Lukáš Doktor
  0 siblings, 0 replies; 22+ messages in thread
From: Lukáš Doktor @ 2022-03-28 11:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Charles Shih, Kevin Wolf, Aleksandar Markovic, QEMU Developers,
	qemu-block


[-- Attachment #1.1.1: Type: text/plain, Size: 3551 bytes --]

On 28. 03. 22 at 11:57, Stefan Hajnoczi wrote:
> On Mon, Mar 28, 2022 at 08:18:43AM +0200, Lukáš Doktor wrote:
>> Hello Stefan, folks,
>>
>> I seem to have another hit, an improvement actually and it seems to be bisected all the way to you, Stefan. Let me use this as another example of how such process could look like and we can use this to hammer-out the details like via what means to submit the request, whom to notify and how to proceed further.
>>
>> ---
>>
>> Last week I noticed an improvement in TunedLibvirt/fio-rot-Aj-8i/0000:./write-4KiB/throughput/iops_sec.mean (<driver name="qemu" type="raw" io="native" cache="none"/>, fio, rotationary disk, raw file on host xfs partition, jobs=#cpus, iodepth=8, 4k writes) check and bisected it to:
>>
>> commit fc8796465c6cd4091efe6a2f8b353f07324f49c7
>> Author: Stefan Hajnoczi <stefanha@redhat.com>
>> Date:   Wed Feb 23 15:57:03 2022 +0000
>>
>>     aio-posix: fix spurious ->poll_ready() callbacks in main loop
>>
>> Could you please confirm that it does make sense and that it is expected? (looks like it from the description).
>>
>> Note that this commit was pin pointed using 2 out of 3 commits result, there were actually some little differences between commits fc8 and cc5. The fc8 and 202 results scored similarly to both, good and bad commits with 2 being closer to the bad one. Since cc5 they seem to stabilize further scoring slightly lower than the median fc8 result. Anyway I don't have enough data to declare anything. I can bisect it further if needed.
> 
> Yes, I can imagine that commit fc8796465c6c might improve non-IOThread
> performance!
> 

Hello Stefan, and thank you for this confirmation as well as the questions. Let me explain a bit, and perhaps you can help me improve the report - it was designed for a CI team, so I can see where the confusion might come from.

> I don't know how to read the report:
> - What is the difference between "Group stats" and "Failures"?
> - Why are there 3 different means in "Group stats"?

The group stats combine (average) multiple individual failures that share some common aspect, and stricter thresholds are then used based on the number of matches. The group names contain an asterisk (*) character, which works the same way Linux globs do, and all matching tests are included. In this report it doesn't really make sense, as only a single test variant was performed, but when various tests/scenarios are used it can indicate a change across all fio write jobs, or across all tests on various profiles...
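
Conceptually the grouping works like the Python sketch below (the test names and numbers are made up, and the real run-perf code handles the thresholds differently):

  import fnmatch
  from statistics import mean

  # per-test relative changes (percent), keyed by test name - made-up values
  changes = {
      "TunedLibvirt/fio-rot-Aj-8i/write-4KiB/iops_sec.mean": 9.0,
      "TunedLibvirt/fio-rot-Aj-8i/write-64KiB/iops_sec.mean": 3.0,
  }

  def group_stat(pattern):
      """Average all results whose name matches the glob-like pattern."""
      matched = [value for name, value in changes.items()
                 if fnmatch.fnmatch(name, pattern)]
      return mean(matched) if matched else None

  # one group covering every fio write job of this profile
  print(group_stat("TunedLibvirt/fio-*/write-*/iops_sec.mean"))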

> - Why are there 3 "fc8" columns in "Failures"?

To avoid the bisection wandering away I usually use 3-5 samples per run. Still, this wasn't reliable enough in some cases and the bisection ended up on random commits, so I added a 2-out-of-3 feature: every commit is checked twice, and when the two results don't match a third run decides. In the end all of the attempts are included in the report to better visualize how stable the results are.
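
In other words, the decision logic is roughly this (a sketch of the idea, not run-perf's actual code):

  def verdict(run_check):
      """run_check() performs one full perf run and returns "good" or "bad"."""
      first = run_check()
      second = run_check()
      if first == second:        # two matching results decide immediately
          return first
      return run_check()         # otherwise a third run breaks the tie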

> 
> I don't feel confident searching git-log(1) output with 3-character
> commit IDs. git itself uses 12 characters for short commit IDs with a
> reasonably low chance of collisions.
> 

I chose only 3 characters because the bisection window is usually quite small and the bisection log with the full commit hashes is attached. Using longer hashes results in a very wide table, which I wanted to avoid. Perhaps I can modify the links, which currently point to the Jenkins results, to point to the qemu SHA instead - what do you think?

Regards,
Lukáš

> Stefan

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 12153 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]


end of thread, other threads:[~2022-03-28 11:12 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-26  8:10 Proposal for a regular upstream performance testing Lukáš Doktor
2020-11-26  8:23 ` Jason Wang
2020-11-26  9:43 ` Daniel P. Berrangé
2020-11-26 11:29   ` Lukáš Doktor
2020-11-30 13:23   ` Stefan Hajnoczi
2020-12-01  7:51     ` Lukáš Doktor
2020-11-26 10:17 ` Peter Maydell
2020-11-26 11:16   ` Lukáš Doktor
2020-11-30 13:25 ` Stefan Hajnoczi
2020-12-01  8:05   ` Lukáš Doktor
2020-12-01 10:22     ` Stefan Hajnoczi
2020-12-01 12:06       ` Lukáš Doktor
2020-12-01 12:35         ` Stefan Hajnoczi
2020-12-02  8:58           ` Chenqun (kuhn)
2020-12-02  8:23 ` Chenqun (kuhn)
2022-03-21  8:46 ` Lukáš Doktor
2022-03-21  9:42   ` Stefan Hajnoczi
2022-03-21 10:29     ` Lukáš Doktor
2022-03-22 15:05       ` Stefan Hajnoczi
2022-03-28  6:18         ` Lukáš Doktor
2022-03-28  9:57           ` Stefan Hajnoczi
2022-03-28 11:09             ` Lukáš Doktor
