* [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
@ 2020-06-29 10:25 Ahmed Karaman
  2020-06-29 10:40 ` Aleksandar Markovic
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-29 10:25 UTC (permalink / raw)
  To: QEMU Developers, Aleksandar Markovic, Alex Bennée,
	Eric Blake, Richard Henderson, Lukáš Doktor


Hi,

The second report of the TCG Continuous Benchmarking series builds
upon the QEMU performance metrics calculated in the previous report.
This report presents a method to dissect the number of instructions
executed by a QEMU invocation into three main phases:
- Code Generation
- JIT Execution
- Helpers Execution
It devises a Python script that automates this process.

After that, the report presents an experiment for comparing the
output of running the script on 17 different targets. Many conclusions
can be drawn from the results and two of them are discussed in the
analysis section.
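The dissection above can be sketched in a few lines of Python. The bucketing heuristics below (a helper_ name prefix for helpers, raw code addresses for JIT-ted code, everything else counted as translation) are illustrative assumptions about a callgrind_annotate-style function listing, not the exact rules used by the report's script:

```python
# Sketch: split a callgrind_annotate-style function listing into the
# three phases discussed above. The matching rules are illustrative
# assumptions, not the report's exact methodology.
import re

def dissect(annotate_lines):
    buckets = {"code_generation": 0, "jit_execution": 0, "helpers": 0}
    for line in annotate_lines:
        m = re.match(r"\s*([\d,]+)\s+(\S+)", line)
        if not m:
            continue
        instructions = int(m.group(1).replace(",", ""))
        name = m.group(2)
        if name.startswith("helper_"):
            buckets["helpers"] += instructions
        elif name.startswith("0x"):   # JIT-ted code appears as raw addresses
            buckets["jit_execution"] += instructions
        else:                         # translation and other QEMU-side work
            buckets["code_generation"] += instructions
    return buckets

sample = [
    "2,222,222  tb_gen_code",
    "9,876,543  0x00007f3a4c000000",
    "1,234,567  helper_lookup_tb_ptr",
]
print(dissect(sample))
```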

Report link:
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/

Previous reports:
Report 1 - Measuring Basic Performance Metrics of QEMU:
https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html

Best regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
@ 2020-06-29 10:40 ` Aleksandar Markovic
  2020-06-29 14:26   ` Ahmed Karaman
  2020-06-29 16:03 ` Alex Bennée
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-29 10:40 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Richard Henderson


Monday, 29 June 2020, Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
wrote:

> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.
>
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/
> Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
>
My sincere congratulations on Report 2!

And, on top of that, listing previous reports, as you did in the
paragraph above, is an excellent idea.

Keep the reports coming!

Aleksandar



> Best regards,
> Ahmed Karaman
>



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:40 ` Aleksandar Markovic
@ 2020-06-29 14:26   ` Ahmed Karaman
  0 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-29 14:26 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Richard Henderson


Thank you for your support!

On Mon, Jun 29, 2020, 12:40 PM Aleksandar Markovic <
aleksandar.qemu.devel@gmail.com> wrote:

>
>
> Monday, 29 June 2020, Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
> wrote:
>
>> Hi,
>>
>> The second report of the TCG Continuous Benchmarking series builds
>> upon the QEMU performance metrics calculated in the previous report.
>> This report presents a method to dissect the number of instructions
>> executed by a QEMU invocation into three main phases:
>> - Code Generation
>> - JIT Execution
>> - Helpers Execution
>> It devises a Python script that automates this process.
>>
>> After that, the report presents an experiment for comparing the
>> output of running the script on 17 different targets. Many conclusions
>> can be drawn from the results and two of them are discussed in the
>> analysis section.
>>
>> Report link:
>>
>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>>
>> Previous reports:
>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>
>>
> My sincere congratulations on the Report 2!!
>
> And, on top of that, this is an excellent idea to list previous reports,
> as you did in the paragraph above.
>
> Keep reports coming!!
>
> Aleksandar
>
>
>
>> Best regards,
>> Ahmed Karaman
>>
>



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
  2020-06-29 10:40 ` Aleksandar Markovic
@ 2020-06-29 16:03 ` Alex Bennée
  2020-06-29 18:21   ` Aleksandar Markovic
                     ` (2 more replies)
  2020-06-30  4:33 ` Lukáš Doktor
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 27+ messages in thread
From: Alex Bennée @ 2020-06-29 16:03 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson


Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:

> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.

A couple of comments. One thing I think is missing from your analysis is
the total number of guest instructions being emulated. As you point out,
each guest will have different code efficiency in terms of its
generated code.

Assuming your test case is constant execution (i.e. runs the same each
time) you could run it through a plugins build to extract the number of
guest instructions, e.g.:

  ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  insns: 158603512
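Relating that guest-instruction count to the total host-instruction count from the previous report's measurements would then give the average translation overhead; a quick sketch, where the host total is a made-up placeholder rather than a measured value:

```python
# Sketch: host-to-guest instruction ratio. guest_insns comes from the
# libinsn.so plugin output above; host_insns is a hypothetical
# placeholder, not a real measurement.
guest_insns = 158_603_512
host_insns = 1_200_000_000   # placeholder: e.g. a callgrind Ir total

ratio = host_insns / guest_insns
print(f"{ratio:.2f} host instructions per guest instruction")
```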

I should have also pointed out in your last report that running FP heavy
code will always be biased towards helper/softfloat code to the
detriment of everything else. I think you need more of a mix of
benchmarks to get a better view.

When Emilio did the last set of analysis, he used a suite he built out of
nbench and a Perl benchmark:

  https://github.com/cota/dbt-bench

As he quoted in his README:

  NBench programs are small, with execution time dominated by small code
  loops. Thus, when run under a DBT engine, the resulting performance
  depends almost entirely on the quality of the output code.

  The Perl benchmarks compile Perl code. As is common for compilation
  workloads, they execute large amounts of code and show no particular
  code execution hotspots. Thus, the resulting DBT performance depends
  largely on code translation speed.
  
By only having one benchmark, you are going to miss out on the envelope
of use cases.

>
> Report link:
>https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman


-- 
Alex Bennée



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 16:03 ` Alex Bennée
@ 2020-06-29 18:21   ` Aleksandar Markovic
  2020-06-29 21:16   ` Ahmed Karaman
  2020-07-01 13:44   ` Ahmed Karaman
  2 siblings, 0 replies; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-29 18:21 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Ahmed Karaman, Lukáš Doktor, Aleksandar Markovic,
	QEMU Developers, Richard Henderson


> I should have also pointed out in your
> by only having one benchmark you are going to miss out on the envelope
> of use cases.
>

Alex, thank you for all your comments and the other perspectives that you
always bring to Ahmed's and everyone else's attention. I always imagine
you as a "four-dimensional" engineer for your unabashed presentation of
out-of-the-box ideas. I actually truly like this often-inspiring style.

However, it seems to me that this last paragraph is a slightly unjust
critique, as if it doesn't come from you.

The report is not about a benchmark; it is about a script that does
something. Ahmed never said "we are going to benchmark" anything. The
program in the report is just an example used for illustration.

And now you say: it is not good for benchmarking. Well, no single example
is good for benchmarking, and, again, the report is not about benchmarking.
Why do you mention benchmarking at all, then? And what is Ahmed supposed to
do? Flood the report with dozens of programs and dozens of tables,
thousands of numbers, and find some average - just to illustrate the
script?

The variety of test programs will be the subject of future reports.

Otherwise, these are all intriguing and useful proposals from your side,
and many thanks for them!

Yours,
Aleksandar



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 16:03 ` Alex Bennée
  2020-06-29 18:21   ` Aleksandar Markovic
@ 2020-06-29 21:16   ` Ahmed Karaman
  2020-07-01 13:44   ` Ahmed Karaman
  2 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-29 21:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson

On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
>
> A couple of comments. One thing I think is missing from your analysis is
> the total number of guest instructions being emulated. As you point out,
> each guest will have different code efficiency in terms of its
> generated code.
>
> Assuming your test case is constant execution (i.e. runs the same each
> time)
Yes indeed, the report utilizes Callgrind for the measurements, so the
results are very stable.
>you could run it through a plugins build to extract the number of
> guest instructions, e.g.:
>
>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>   insns: 158603512
>
That's a very nice suggestion. Maybe this will be the idea of a whole
new report. I'll try to execute the provided command and will let you
know if I have any questions.
> I should have also pointed out in your last report that running FP heavy
> code will always be biased towards helper/softfloat code to the
> detriment of everything else. I think you need more of a mix of
> benchmarks to get a better view.
>
> When Emilio did the last set of analysis he used a suite he built out of
> nbench and a perl benchmark:
>
>   https://github.com/cota/dbt-bench
>
> As he quoted in his README:
>
>   NBench programs are small, with execution time dominated by small code
>   loops. Thus, when run under a DBT engine, the resulting performance
>   depends almost entirely on the quality of the output code.
>
>   The Perl benchmarks compile Perl code. As is common for compilation
>   workloads, they execute large amounts of code and show no particular
>   code execution hotspots. Thus, the resulting DBT performance depends
>   largely on code translation speed.
>
> by only having one benchmark you are going to miss out on the envelope
> of use cases.
>
Future reports will introduce a variety of benchmarks. This report -
and the previous one - are introductory. The benchmark was used only
to demonstrate the reports' ideas, not as a rigorous benchmarking
workload.
> >
> > Report link:
> >https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
  2020-06-29 10:40 ` Aleksandar Markovic
  2020-06-29 16:03 ` Alex Bennée
@ 2020-06-30  4:33 ` Lukáš Doktor
  2020-06-30  7:18   ` Ahmed Karaman
  2020-06-30  9:41   ` Aleksandar Markovic
  2020-06-30  5:59 ` 罗勇刚(Yonggang Luo)
  2020-07-01 14:47 ` Ahmed Karaman
  4 siblings, 2 replies; 27+ messages in thread
From: Lukáš Doktor @ 2020-06-30  4:33 UTC (permalink / raw)
  To: Ahmed Karaman, QEMU Developers, Aleksandar Markovic,
	Alex Bennée, Eric Blake, Richard Henderson



Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a):
> Hi,
> 
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
> 
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.
> 
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> 
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> 
> Best regards,
> Ahmed Karaman

Hello Ahmed,

very nice reading, both reports so far. One thing that could be displayed better is the system you used to generate the results. This would come in handy especially later, when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, QEMU version, and eventually other dependencies (like compiler, flags, ...). For this report, something like:

architecture: x86_64
cpu_codename: Kaby Lake
cpu: i7-8650U
ram: 32GB DDR4
os: Fedora 32
qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
compiler: gcc-10.1.1-1.fc32.x86_64
flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm  --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  --enable-fdt   --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring

would do. Maybe it'd even be a good idea to create a script that reports this basic set of information and run it after each of the perf scripts, so people don't forget to double-check the conditions - but others might disagree, so take this only as a suggestion.
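A best-effort sketch of such a reporting script, using the "unknown" fallback idea; the field names follow the example above, and the qemu binary name is only a placeholder:

```python
# Sketch: best-effort environment report with "unknown" fallbacks.
# Field names follow the example above; the qemu binary name is a
# placeholder, not an assumption about any particular install.
import platform
import shutil
import subprocess

def best_effort(cmd):
    """Run a command and return its first output line, or 'unknown'."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return out.stdout.strip().splitlines()[0]
    except (OSError, subprocess.CalledProcessError, IndexError):
        return "unknown"

def environment_report(qemu_binary="qemu-x86_64"):
    qemu = shutil.which(qemu_binary)
    return {
        "architecture": platform.machine() or "unknown",
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "qemu": best_effort([qemu, "--version"]) if qemu else "unknown",
        "compiler": best_effort(["gcc", "--version"]),
    }

for key, value in environment_report().items():
    print(f"{key}: {value}")
```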

Regards,
Lukáš

PS: Automating CPU codenames, host OSes and such could be tricky, but one can use other libraries or just a best-effort approach with a fallback to "unknown", letting people fill it in manually or add their branch to your script.





* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
                   ` (2 preceding siblings ...)
  2020-06-30  4:33 ` Lukáš Doktor
@ 2020-06-30  5:59 ` 罗勇刚(Yonggang Luo)
  2020-06-30  7:29   ` Ahmed Karaman
  2020-07-01 14:47 ` Ahmed Karaman
  4 siblings, 1 reply; 27+ messages in thread
From: 罗勇刚(Yonggang Luo) @ 2020-06-30  5:59 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Aleksandar Markovic, Richard Henderson


Wonderful work! May I reproduce it on my local machine?

On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com>
wrote:

> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.
>
> Report link:
>
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman
>


-- 
         此致
礼
罗勇刚
Yours
    sincerely,
Yonggang Luo



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  4:33 ` Lukáš Doktor
@ 2020-06-30  7:18   ` Ahmed Karaman
  2020-06-30  8:58     ` Aleksandar Markovic
  2020-06-30  9:41   ` Aleksandar Markovic
  1 sibling, 1 reply; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-30  7:18 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Aleksandar Markovic, Alex Bennée, QEMU Developers,
	Richard Henderson

On Tue, Jun 30, 2020 at 6:34 AM Lukáš Doktor <ldoktor@redhat.com> wrote:
>
> Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a):
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
> Hello Ahmed,
>
> very nice reading, both reports so far. One thing that could be displayed better is the system you used to generate the results. This would come in handy especially later, when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, QEMU version, and eventually other dependencies (like compiler, flags, ...). For this report, something like:
>
> architecture: x86_64
> cpu_codename: Kaby Lake
> cpu: i7-8650U
> ram: 32GB DDR4
> os: Fedora 32
> qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
> compiler: gcc-10.1.1-1.fc32.x86_64
> flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm  --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  --enable-fdt   --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring
>
> would do. Maybe it'd even be a good idea to create a script that reports this basic set of information and run it after each of the perf scripts, so people don't forget to double-check the conditions - but others might disagree, so take this only as a suggestion.
>
> Regards,
> Lukáš
>
> PS: Automating CPU codenames, host OSes and such could be tricky, but one can use other libraries or just a best-effort approach with a fallback to "unknown", letting people fill it in manually or add their branch to your script.
>
> Regards,
> Lukáš
>
Thanks Mr. Lukáš, I'm really glad you found both reports interesting.

Both reports are based on QEMU version 5.0.0; this wasn't mentioned in
the reports, so thanks for the reminder. I'll add a short note about
that.

The QEMU build used is a very basic GCC build (created by just running
../configure in the build directory without any flags).

Regarding the detailed machine information (CPU, RAM, etc.), the two
reports introduce concepts and methodologies that will produce
consistent results on whichever machine they are executed on. So I
think it's unnecessary to mention the detailed system information used
in the reports for now.

Best regards,
Ahmed Karaman



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  5:59 ` 罗勇刚(Yonggang Luo)
@ 2020-06-30  7:29   ` Ahmed Karaman
  2020-06-30  8:21     ` Aleksandar Markovic
  0 siblings, 1 reply; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-30  7:29 UTC (permalink / raw)
  To: luoyonggang
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Aleksandar Markovic, Richard Henderson

On Tue, Jun 30, 2020 at 7:59 AM 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> wrote:
>
> Wonderful work, May I reproduce the work on my local machine?
>
> On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote:
>>
>> Hi,
>>
>> The second report of the TCG Continuous Benchmarking series builds
>> upon the QEMU performance metrics calculated in the previous report.
>> This report presents a method to dissect the number of instructions
>> executed by a QEMU invocation into three main phases:
>> - Code Generation
>> - JIT Execution
>> - Helpers Execution
>> It devises a Python script that automates this process.
>>
>> After that, the report presents an experiment for comparing the
>> output of running the script on 17 different targets. Many conclusions
>> can be drawn from the results and two of them are discussed in the
>> analysis section.
>>
>> Report link:
>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>>
>> Previous reports:
>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>
>> Best regards,
>> Ahmed Karaman
>
>
>
> --
>          此致
> 礼
> 罗勇刚
> Yours
>     sincerely,
> Yonggang Luo

Thanks Mr. Yonggang. Yes of course, go ahead.
Please let me know if you have any further questions.

Best Regards,
Ahmed Karaman



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  7:29   ` Ahmed Karaman
@ 2020-06-30  8:21     ` Aleksandar Markovic
  2020-06-30  9:52       ` Aleksandar Markovic
  0 siblings, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-30  8:21 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	luoyonggang, Richard Henderson

Tue, 30 Jun 2020 at 09:30, Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> On Tue, Jun 30, 2020 at 7:59 AM 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> wrote:
> >
> > Wonderful work, May I reproduce the work on my local machine?
> >
> > On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> The second report of the TCG Continuous Benchmarking series builds
> >> upon the QEMU performance metrics calculated in the previous report.
> >> This report presents a method to dissect the number of instructions
> >> executed by a QEMU invocation into three main phases:
> >> - Code Generation
> >> - JIT Execution
> >> - Helpers Execution
> >> It devises a Python script that automates this process.
> >>
> >> After that, the report presents an experiment for comparing the
> >> output of running the script on 17 different targets. Many conclusions
> >> can be drawn from the results and two of them are discussed in the
> >> analysis section.
> >>
> >> Report link:
> >> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >>
> >> Previous reports:
> >> Report 1 - Measuring Basic Performance Metrics of QEMU:
> >> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >>
> >> Best regards,
> >> Ahmed Karaman
> >
> >
> >
> > --
> >          此致
> > 礼
> > 罗勇刚
> > Yours
> >     sincerely,
> > Yonggang Luo
>
> Thanks Mr. Yonggang. Yes of course, go ahead.
> Please let me know if you have any further questions.
>

Yes, Ahmed, you said Mr. Yonggang can go ahead - but you didn't say how. :)

As far as I know, this is how Ahmed's test bed is set up:

1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host.
2) Install QEMU build prerequisite packages.
3) Install perf (this step is not necessary for Report 2, but it is
for Report 1).
4) Install valgrind.
5) Install 16 gcc cross-compilers (which, together with the native
compiler, sum up to the 17 possible QEMU targets).

That is all fine if Mr. Yonggang is able to do the above, or if he
already has a similar system.

I am fairly convinced that the setup for any Debian-based Linux
distribution will be almost identical to the one described above.

However, let's say Mr. Yonggang's system is a SUSE-based distribution
(SUSE Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He
could do steps 2), 3), and 4) in a fairly similar manner, but step 5)
will be difficult. I know that support for cross-compilers is
relatively poor on SUSE-based distributions. I think Mr. Yonggang
could run the experiment from the second part of Report 2 for only 5
or 6 targets, rather than 17 as you did.

The bottom line for Report 2:

I think there should be an "Appendix" note on installing
cross-compilers, as well as a general note on your test bed and some
guidelines for people like Mr. Yonggang who wish to reproduce the
results on their own systems.
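For Debian-based hosts, such an appendix could largely reduce to a per-target package list; the mapping below is an illustrative, incomplete sketch (package names may differ between releases):

```python
# Sketch: emit the apt command for installing cross-compilers for a few
# QEMU linux-user targets on a Debian-based host. The target-to-package
# mapping is illustrative and incomplete.
CROSS_GCC = {
    "aarch64": "gcc-aarch64-linux-gnu",
    "mips": "gcc-mips-linux-gnu",
    "ppc64le": "gcc-powerpc64le-linux-gnu",
    "riscv64": "gcc-riscv64-linux-gnu",
    "s390x": "gcc-s390x-linux-gnu",
}

def apt_install_command(targets):
    """Build the apt-get line for the requested QEMU targets."""
    missing = [t for t in targets if t not in CROSS_GCC]
    if missing:
        raise ValueError(f"no known cross-compiler package for: {missing}")
    packages = " ".join(CROSS_GCC[t] for t in sorted(targets))
    return f"sudo apt-get install {packages}"

print(apt_install_command(["aarch64", "riscv64"]))
```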

Sincerely,
Aleksandar











> Best Regards,
> Ahmed Karaman



* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  7:18   ` Ahmed Karaman
@ 2020-06-30  8:58     ` Aleksandar Markovic
  2020-06-30 12:46       ` Lukáš Doktor
  0 siblings, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-30  8:58 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Richard Henderson

Tue, 30 Jun 2020 at 09:19, Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> On Tue, Jun 30, 2020 at 6:34 AM Lukáš Doktor <ldoktor@redhat.com> wrote:
> >
> > Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a):
> > > Hi,
> > >
> > > The second report of the TCG Continuous Benchmarking series builds
> > > upon the QEMU performance metrics calculated in the previous report.
> > > This report presents a method to dissect the number of instructions
> > > executed by a QEMU invocation into three main phases:
> > > - Code Generation
> > > - JIT Execution
> > > - Helpers Execution
> > > It devises a Python script that automates this process.
> > >
> > > After that, the report presents an experiment for comparing the
> > > output of running the script on 17 different targets. Many conclusions
> > > can be drawn from the results and two of them are discussed in the
> > > analysis section.
> > >
> > > Report link:
> > > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> > >
> > > Previous reports:
> > > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> > >
> > > Best regards,
> > > Ahmed Karaman
> >
> > Hello Ahmed,
> >
> > very nice reading, both reports so far. One thing that could be displayed better is the system you used to generate the results. This would come in handy especially later, when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, QEMU version, and eventually other dependencies (like compiler, flags, ...). For this report, something like:
> >
> > architecture: x86_64
> > cpu_codename: Kaby Lake
> > cpu: i7-8650U
> > ram: 32GB DDR4
> > os: Fedora 32
> > qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
> > compiler: gcc-10.1.1-1.fc32.x86_64
> > flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm  --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  --enable-fdt   --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring
> >
> > would do. Maybe it'd even be a good idea to create a script that reports this basic set of information and run it after each of the perf scripts, so people don't forget to double-check the conditions - but others might disagree, so take this only as a suggestion.
> >
> > Regards,
> > Lukáš
> >
> > PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script.
> >
> > Regards,
> > Lukáš
> >
> Thanks Mr. Lukáš, I'm really glad you found both reports interesting.
>
> Both reports are based on QEMU version 5.0.0; this wasn't mentioned in
> the reports, so thanks for the reminder. I'll add a short note about
> that.
>
> The used QEMU build is a very basic GCC build (created by just running
> ../configure in the build directory without any flags).
>
> Regarding the detailed machine information (CPU, RAM, etc.), the two
> reports introduce some concepts and methodologies that will produce
> consistent results on whichever machine they are executed on. So I
> think it's unnecessary to mention the detailed system information used
> in the reports for now.
>

Ahmed, I don't entirely agree with you on this topic.

I think you treated Mr. Lukas's comments in an overly lax way.

Yes, the results will be stable (within a small fraction of a percent)
on a particular given system (which is proved in the "Stability
Experiment" section of Report 1). That is great! Although it sounds
elementary, this is not easy to achieve, so I am glad you did it.

However, we know that the results for hosts of different architectures
will be different - we expect that.

A 32-bit Intel host will also most likely produce significantly
different results than a 64-bit Intel host. By the way, 64-bit targets
in QEMU linux-user mode are not supported on 32-bit hosts (although
nothing stops the user from starting corresponding instances of QEMU
on a 32-bit host, the results are unpredictable).

Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps
enlighten us on whether QEMU (from the point of view of TCG target)
behaves differently on different Intel 64-bit hosts, and to what
degree?

I currently work remotely, but once I am physically at my office I
will have a variety of hosts at the company, and would be happy to do
the comparison between them, wrt what you presented in Report 2.

In conclusion, I think a basic description of your test bed is missing
in your reports. And, for the final reports (which we call "nightly
reports"), a detailed system description, as Mr Lukas outlined, is,
also in my opinion, necessary.

Thanks, Mr. Lukas, for bringing this to our attention!

Yours,
Aleksandar




> Best regards,
> Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  4:33 ` Lukáš Doktor
  2020-06-30  7:18   ` Ahmed Karaman
@ 2020-06-30  9:41   ` Aleksandar Markovic
  2020-06-30 12:58     ` Lukáš Doktor
  1 sibling, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-30  9:41 UTC (permalink / raw)
  To: Lukáš Doktor
  Cc: Ahmed Karaman, Alex Bennée, QEMU Developers, Richard Henderson

On Tue, Jun 30, 2020 at 06:34, Lukáš Doktor <ldoktor@redhat.com> wrote:
>
> On 29. 06. 20 at 12:25, Ahmed Karaman wrote:
> > Hi,
> >
> > The second report of the TCG Continuous Benchmarking series builds
> > upon the QEMU performance metrics calculated in the previous report.
> > This report presents a method to dissect the number of instructions
> > executed by a QEMU invocation into three main phases:
> > - Code Generation
> > - JIT Execution
> > - Helpers Execution
> > It devises a Python script that automates this process.
> >
> > After that, the report presents an experiment for comparing the
> > output of running the script on 17 different targets. Many conclusions
> > can be drawn from the results and two of them are discussed in the
> > analysis section.
> >
> > Report link:
> > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
> >
> > Previous reports:
> > Report 1 - Measuring Basic Performance Metrics of QEMU:
> > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
> >
> > Best regards,
> > Ahmed Karaman
>
> Hello Ahmed,
>
> very nice reading, both reports so far. One thing that could be displayed better is the system you used to generate this. This would come in handy especially later when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, qemu version, and eventually other deps (like compiler, flags, ...). For this report something like:
>
> architecture: x86_64
> cpu_codename: Kaby Lake
> cpu: i7-8650U
> ram: 32GB DDR4
> os: Fedora 32
> qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
> compiler: gcc-10.1.1-1.fc32.x86_64
> flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm  --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  --enable-fdt   --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring
>
> would do. Maybe it'd even be a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion.
>

I just want to follow up on this observation here; it is not related
to Ahmed's report at all.

We often receive bug reports of the following style: "I have a Debian
10.2 system and mips emulation misbehaves". As you may imagine, I
assign the bug to myself, install a Debian 10.2 system on my
experimental box, and mips emulation works like a charm.
<banging-head-against-the-wall-emoji> Obviously, I need more info on
the submitter's system.

After all these years, we don't have (or at least I don't know about
it) a script that we could give the submitter, one that picks up
various aspects of their system. This script, since it is not "for
presentation", could be even far more aggressive in picking up system
information than what Lukas mentioned above. It could collect the
output of various relevant commands and zip it into a single file. We
should have "get_system_info.py" in our scripts directory!

Sincerely,
Aleksandar


> Regards,
> Lukáš
>
> PS: Automated cpu codenames, host OSes and such could be tricky, but one can use other libraries or just a best-effort approach with a fallback to "unknown", letting people fill it in manually or add their branch to your script.
>
> Regards,
> Lukáš
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  8:21     ` Aleksandar Markovic
@ 2020-06-30  9:52       ` Aleksandar Markovic
  2020-06-30 19:02         ` Ahmed Karaman
  0 siblings, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-06-30  9:52 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	luoyonggang, Richard Henderson

> As far as I know, this is how Ahmed's test bed is set up:
>
> 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host.
> 2) Install QEMU build prerequisite packages.
> 3) Install perf (this step is not necessary for Report 2, but it is
> for Report 1).
> 4) Install valgrind.
> 5) Install 16 gcc cross-compilers (which, together with the native
> compiler, sum up to the 17 possible QEMU targets).
>

The following commands install the cross-compilers needed for creating
the table in the second part of Ahmed's Report 2:

sudo apt-get install g++
sudo apt-get install g++-aarch64-linux-gnu
sudo apt-get install g++-alpha-linux-gnu
sudo apt-get install g++-arm-linux-gnueabi
sudo apt-get install g++-hppa-linux-gnu
sudo apt-get install g++-m68k-linux-gnu
sudo apt-get install g++-mips-linux-gnu
sudo apt-get install g++-mips64-linux-gnuabi64
sudo apt-get install g++-mips64el-linux-gnuabi64
sudo apt-get install g++-mipsel-linux-gnu
sudo apt-get install g++-powerpc-linux-gnu
sudo apt-get install g++-powerpc64-linux-gnu
sudo apt-get install g++-powerpc64le-linux-gnu
sudo apt-get install g++-riscv64-linux-gnu
sudo apt-get install g++-s390x-linux-gnu
sudo apt-get install g++-sh4-linux-gnu
sudo apt-get install g++-sparc64-linux-gnu
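For scripting, the list above collapses into a single variable (a sketch only: package names exactly as listed, Debian/Ubuntu only, and the actual install line is left commented out because it needs root):

```shell
# The 17 compiler packages from the list above: native g++ plus 16 cross ones.
packages="g++
g++-aarch64-linux-gnu g++-alpha-linux-gnu g++-arm-linux-gnueabi
g++-hppa-linux-gnu g++-m68k-linux-gnu g++-mips-linux-gnu
g++-mips64-linux-gnuabi64 g++-mips64el-linux-gnuabi64 g++-mipsel-linux-gnu
g++-powerpc-linux-gnu g++-powerpc64-linux-gnu g++-powerpc64le-linux-gnu
g++-riscv64-linux-gnu g++-s390x-linux-gnu g++-sh4-linux-gnu
g++-sparc64-linux-gnu"
set -- $packages                      # word-split into positional parameters
echo "$# compiler packages to install"   # prints: 17 compiler packages to install
# sudo apt-get install -y $packages   # uncomment on a Debian/Ubuntu host
```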

Ahmed, I think this should be in an Appendix section of Report 2.

Sincerely,
Aleksandar

> That is all fine if Mr. Yonggang is able to do the above, or if he
> already has a similar system.
>
> I am fairly convinced that the setup for any Debian-based Linux
> distribution will be almost identical to the one described above.
>
> However, let's say Mr. Yonggang's system is a Suse-based distribution (SUSE
> Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could
> do steps 2), 3), 4) in a fairly similar manner. But step 5) will be
> difficult. I know that support for cross-compilers is relatively poor
> for Suse-based distributions. I think Mr. Yonggang could run the experiment
> from the second part of Report 2 for only 5 or 6 targets, rather than
> the 17 as you did.
>
> The bottom line for Report 2:
>
> I think there should be an "Appendix" note on installing
> cross-compilers. And some general note on your test bed, as well as
> some guidelines for all people like Mr. Yonggang who wish to reproduce
> the results on their own systems.
>
> Sincerely,
> Aleksandar
>
>
>
> > Best Regards,
> > Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  8:58     ` Aleksandar Markovic
@ 2020-06-30 12:46       ` Lukáš Doktor
  2020-06-30 19:14         ` Ahmed Karaman
  0 siblings, 1 reply; 27+ messages in thread
From: Lukáš Doktor @ 2020-06-30 12:46 UTC (permalink / raw)
  To: Aleksandar Markovic, Ahmed Karaman
  Cc: Alex Bennée, QEMU Developers, Richard Henderson



> However, we know that the results for hosts of different architectures
> will be different - we expect that.
> 
> A 32-bit Intel host will also most likely produce significantly
> different results than a 64-bit Intel host. By the way, 64-bit targets
> in QEMU linux-user mode are not supported on 32-bit hosts (although
> nothing stops the user from starting corresponding instances of QEMU
> on a 32-bit host, the results are unpredictable).
> 
> Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps
> enlighten us on whether QEMU (from the point of view of TCG target)
> behaves differently on different Intel 64-bit hosts, and to what
> degree?
> 
> I currently work remotely, but once I am physically at my office I
> will have a variety of hosts at the company, and would be happy to do
> the comparison between them, wrt what you presented in Report 2.
> 
> In conclusion, I think a basic description of your test bed is missing
> in your reports. And, for final reports (which we call "nightly
> reports") a detailed system description, as Mr Lukas outlined, is,
> also in my opinion, necessary.
> 
> Thanks, Mr. Lukas, for bringing this to our attention!
> 

You're welcome. I'm more on the python side, but as far as I know, different cpu models (provided their features are enabled) and especially architectures result in very different code paths. Imagine an old processor without vector instructions compared to newer ones that can process multiple instructions at once.

As for the reports, I don't think that at this point it would be necessary to focus on anything besides a single cpu model (x86_64 Intel), as there are already many variables. Later someone can follow up with a cross-arch comparison, if necessary.

Regards,
Lukáš

> Yours,
> Aleksandar
> 
> 
> 
> 
>> Best regards,
>> Ahmed Karaman
> 




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  9:41   ` Aleksandar Markovic
@ 2020-06-30 12:58     ` Lukáš Doktor
  0 siblings, 0 replies; 27+ messages in thread
From: Lukáš Doktor @ 2020-06-30 12:58 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Ahmed Karaman, Alex Bennée, QEMU Developers, Richard Henderson



On 30. 06. 20 at 11:41, Aleksandar Markovic wrote:
> On Tue, Jun 30, 2020 at 06:34, Lukáš Doktor <ldoktor@redhat.com> wrote:
>>
>> On 29. 06. 20 at 12:25, Ahmed Karaman wrote:
>>> Hi,
>>>
>>> The second report of the TCG Continuous Benchmarking series builds
>>> upon the QEMU performance metrics calculated in the previous report.
>>> This report presents a method to dissect the number of instructions
>>> executed by a QEMU invocation into three main phases:
>>> - Code Generation
>>> - JIT Execution
>>> - Helpers Execution
>>> It devises a Python script that automates this process.
>>>
>>> After that, the report presents an experiment for comparing the
>>> output of running the script on 17 different targets. Many conclusions
>>> can be drawn from the results and two of them are discussed in the
>>> analysis section.
>>>
>>> Report link:
>>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>>>
>>> Previous reports:
>>> Report 1 - Measuring Basic Performance Metrics of QEMU:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>>>
>>> Best regards,
>>> Ahmed Karaman
>>
>> Hello Ahmed,
>>
>> very nice reading, both reports so far. One thing that could be displayed better is the system you used to generate this. This would come in handy especially later when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, qemu version, and eventually other deps (like compiler, flags, ...). For this report something like:
>>
>> architecture: x86_64
>> cpu_codename: Kaby Lake
>> cpu: i7-8650U
>> ram: 32GB DDR4
>> os: Fedora 32
>> qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
>> compiler: gcc-10.1.1-1.fc32.x86_64
>> flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm  --enable-vhost-net --enable-vhost-net --enable-attr  --enable-kvm  --enable-fdt   --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring
>>
>> would do. Maybe it'd even be a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion.
>>
> 
> I just want to follow up on this observation here, and not related to
> Ahmed's report at all.
> 
> We often receive bug reports of the following style: "I have a Debian
> 10.2 system and mips emulation misbehaves". As you may imagine, I
> assign the bug to myself, install a Debian 10.2 system on my
> experimental box, and mips emulation works like a charm.
> <banging-head-against-the-wall-emoji> Obviously, I need more info on
> the submitter's system.
> 
> After all these years, we don't have (or at least I don't know about
> it) a script that we could give the submitter, one that picks up
> various aspects of their system. This script, since it is not "for
> presentation", could be even far more aggressive in picking up system
> information than what Lukas mentioned above. It could collect the
> output of various relevant commands and zip it into a single file. We
> should have "get_system_info.py" in our scripts directory!
> 
> Sincerely,
> Aleksandar
> 

Well, this itself is a very complicated matter that could deserve a GSoC project of its own. It's hard to balance the utils required to obtain the knowledge. I'm fond of sosreport, which is heavily used by RH, but the result is quite big. A slightly smaller set can be generated via ansible, which itself gathers a lot of useful information. If we are to speak only about a minimal approach especially tailored to qemu, then I'd suggest taking a look at `avocado.utils`, especially `avocado.utils.cpu`, as Avocado is already used for qemu testing.

Anyway, don't consider this a complete list; I just wanted to demonstrate how difficult and complex this subject is.

Regards,
Lukáš

> 
>> Regards,
>> Lukáš
>>
>> PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script.
>>
>> Regards,
>> Lukáš
>>
> 




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30  9:52       ` Aleksandar Markovic
@ 2020-06-30 19:02         ` Ahmed Karaman
  0 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-30 19:02 UTC (permalink / raw)
  To: Aleksandar Markovic, luoyonggang
  Cc: Lukáš Doktor, Alex Bennée, QEMU Developers,
	Richard Henderson

On Tue, Jun 30, 2020 at 11:52 AM Aleksandar Markovic
<aleksandar.qemu.devel@gmail.com> wrote:
>
> > As far as I know, this is how Ahmed's test bed is set up:
> >
> > 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host.
> > 2) Install QEMU build prerequisite packages.
> > 3) Install perf (this step is not necessary for Report 2, but it is
> > for Report 1).
> > 4) Install valgrind.
> > 5) Install 16 gcc cross-compilers (which, together with the native
> > compiler, sum up to the 17 possible QEMU targets).
> >
>
> The following commands install the cross-compilers needed for creating
> the table in the second part of Ahmed's Report 2:
>
> sudo apt-get install g++
> sudo apt-get install g++-aarch64-linux-gnu
> sudo apt-get install g++-alpha-linux-gnu
> sudo apt-get install g++-arm-linux-gnueabi
> sudo apt-get install g++-hppa-linux-gnu
> sudo apt-get install g++-m68k-linux-gnu
> sudo apt-get install g++-mips-linux-gnu
> sudo apt-get install g++-mips64-linux-gnuabi64
> sudo apt-get install g++-mips64el-linux-gnuabi64
> sudo apt-get install g++-mipsel-linux-gnu
> sudo apt-get install g++-powerpc-linux-gnu
> sudo apt-get install g++-powerpc64-linux-gnu
> sudo apt-get install g++-powerpc64le-linux-gnu
> sudo apt-get install g++-riscv64-linux-gnu
> sudo apt-get install g++-s390x-linux-gnu
> sudo apt-get install g++-sh4-linux-gnu
> sudo apt-get install g++-sparc64-linux-gnu
>
> Ahmed, I think this should be in an Appendix section of Report 2.
>
> Sincerely,
> Aleksandar
>
> > That is all fine if Mr. Yonggang is able to do the above, or if he
> > already has a similar system.
> >
> > I am fairly convinced that the setup for any Debian-based Linux
> > distribution will be almost identical to the one described above.
> >
> > However, let's say Mr. Yonggang's system is a Suse-based distribution (SUSE
> > Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could
> > do steps 2), 3), 4) in a fairly similar manner. But step 5) will be
> > difficult. I know that support for cross-compilers is relatively poor
> > for Suse-based distributions. I think Mr. Yonggang could run the experiment
> > from the second part of Report 2 for only 5 or 6 targets, rather than
> > the 17 as you did.
> >
> > The bottom line for Report 2:
> >
> > I think there should be an "Appendix" note on installing
> > cross-compilers. And some general note on your test bed, as well as
> > some guidelines for all people like Mr. Yonggang who wish to reproduce
> > the results on their own systems.
> >
> > Sincerely,
> > Aleksandar
> >
> >
> >
> > > Best Regards,
> > > Ahmed Karaman
Thanks Mr. Aleksandar for your input on this one.
This is indeed my setup for the testbed used for the two previous
reports and all the upcoming ones.
To help Mr. Yonggang with his setup, and anybody else trying to set
this up, I plan to post a mini-report (Report 0) to lay down the
instructions for setting up a system similar to the one used in the
reports.

Best regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-30 12:46       ` Lukáš Doktor
@ 2020-06-30 19:14         ` Ahmed Karaman
  0 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-06-30 19:14 UTC (permalink / raw)
  To: Lukáš Doktor, Aleksandar Markovic
  Cc: Alex Bennée, QEMU Developers, Richard Henderson

On Tue, Jun 30, 2020 at 2:46 PM Lukáš Doktor <ldoktor@redhat.com> wrote:
>
> > However, we know that the results for hosts of different architectures
> > will be different - we expect that.
> >
> > A 32-bit Intel host will also most likely produce significantly
> > different results than a 64-bit Intel host. By the way, 64-bit targets
> > in QEMU linux-user mode are not supported on 32-bit hosts (although
> > nothing stops the user from starting corresponding instances of QEMU
> > on a 32-bit host, the results are unpredictable).
> >
> > Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps
> > enlighten us on whether QEMU (from the point of view of TCG target)
> > behaves differently on different Intel 64-bit hosts, and to what
> > degree?
> >
> > I currently work remotely, but once I am physically at my office I
> > will have a variety of hosts at the company, and would be happy to do
> > the comparison between them, wrt what you presented in Report 2.
> >
> > In conclusion, I think a basic description of your test bed is missing
> > in your reports. And, for final reports (which we call "nightly
> > reports") a detailed system description, as Mr Lukas outlined, is,
> > also in my opinion, necessary.
> >
> > Thanks, Mr. Lukas, for bringing this to our attention!
> >
>
> You're welcome. I'm more on the python side, but as far as I know, different cpu models (provided their features are enabled) and especially architectures result in very different code paths. Imagine an old processor without vector instructions compared to newer ones that can process multiple instructions at once.
>
> As for the reports, I don't think that at this point it would be necessary to focus on anything besides a single cpu model (x86_64 Intel) as there are already many variables. Later someone can follow-up with a cross-arch comparison, if necessary.
>
> Regards,
> Lukáš
>
> > Yours,
> > Aleksandar
> >
> >
> >
> >
> >> Best regards,
> >> Ahmed Karaman
> >
>
>
Thanks Mr. Lukáš and Aleksandar,
OK, now I see how important it is to have this information somewhere
on the reports page.

In response to Mr. Yonggang, I said I would create a mini-report as a
guide for setting up the testbed.
I will add a section to this report with the detailed hardware
information of the used system.
Thanks for bringing this to my attention.

Best regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 16:03 ` Alex Bennée
  2020-06-29 18:21   ` Aleksandar Markovic
  2020-06-29 21:16   ` Ahmed Karaman
@ 2020-07-01 13:44   ` Ahmed Karaman
  2020-07-01 15:42     ` Alex Bennée
  2 siblings, 1 reply; 27+ messages in thread
From: Ahmed Karaman @ 2020-07-01 13:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson

On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Assuming your test case is constant execution (i.e. runs the same each
> time) you could run it through a plugins build to extract the number of
> guest instructions, e.g.:
>
>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>   insns: 158603512
>
> --
> Alex Bennée

Hi Mr. Alex,
I've created a plugins build as you said, using the "--enable-plugins" option.
I've searched for the "libinsn.so" plugin that you mentioned in your
command, but it isn't in that path.

Are there any other options that I should configure my build with?
Thanks in advance.

Regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
                   ` (3 preceding siblings ...)
  2020-06-30  5:59 ` 罗勇刚(Yonggang Luo)
@ 2020-07-01 14:47 ` Ahmed Karaman
  4 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-07-01 14:47 UTC (permalink / raw)
  To: Lukáš Doktor, luoyonggang, QEMU Developers
  Cc: Aleksandar Markovic, Alex Bennée, Richard Henderson

On Mon, Jun 29, 2020 at 12:25 PM Ahmed Karaman
<ahmedkhaledkaraman@gmail.com> wrote:
>
> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.
>
> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman

Hi Mr. Lukáš and Yonggang,

I've created a separate "setup" page on the reports website.
https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/setup/

It contains the hardware and OS information of the used system.
It also contains all dependencies and setup instructions required to
set up a machine identical to the one used in the reports.

If you have any further questions or you're using a different Linux
distribution, please let me know.

Best regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-01 13:44   ` Ahmed Karaman
@ 2020-07-01 15:42     ` Alex Bennée
  2020-07-01 17:47       ` Ahmed Karaman
  2020-07-03 22:46       ` Aleksandar Markovic
  0 siblings, 2 replies; 27+ messages in thread
From: Alex Bennée @ 2020-07-01 15:42 UTC (permalink / raw)
  To: Ahmed Karaman
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson


Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:

> On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Assuming your test case is constant execution (i.e. runs the same each
>> time) you could run it through a plugins build to extract the number of
>> guest instructions, e.g.:
>>
>>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
>>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>>   insns: 158603512
>>
>> --
>> Alex Bennée
>
> Hi Mr. Alex,
> I've created a plugins build as you've said using "--enable-plugins" option.
> I've searched for "libinsn.so" plugin that you've mentioned in your
> command but it isn't in that path.

make plugins

and you should find them in tests/plugins/
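Collected in one place, the steps discussed in this subthread come out roughly as the recipe below. It is printed rather than executed, since it needs a QEMU source tree; the QEMU 5.0-era build layout is assumed, and the plugin path follows the invocation quoted earlier (tests/plugin/), even though the directory is referred to as tests/plugins/ above:

```shell
# Reference recipe for the plugin workflow described in this subthread
# (run from a "build" directory inside a QEMU 5.0-era checkout).
recipe='../configure --enable-plugins --target-list=aarch64-linux-user
make -j$(nproc)
make plugins
./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so \
    -d plugin ./tests/tcg/aarch64-linux-user/sha1'
printf '%s\n' "$recipe"
```

The last line should end with an "insns: <guest instruction count>" line, as in Alex's example.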

>
> Are there any other options that I should configure my build with?
> Thanks in advance.
>
> Regards,
> Ahmed Karaman


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-01 15:42     ` Alex Bennée
@ 2020-07-01 17:47       ` Ahmed Karaman
  2020-07-03 22:46       ` Aleksandar Markovic
  1 sibling, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-07-01 17:47 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson

On Wed, Jul 1, 2020 at 5:42 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>
> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote:
> >>
> >> Assuming your test case is constant execution (i.e. runs the same each
> >> time) you could run it through a plugins build to extract the number of
> >> guest instructions, e.g.:
> >>
> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >>   insns: 158603512
> >>
> >> --
> >> Alex Bennée
> >
> > Hi Mr. Alex,
> > I've created a plugins build as you've said using "--enable-plugins" option.
> > I've searched for "libinsn.so" plugin that you've mentioned in your
> > command but it isn't in that path.
>
> make plugins
>
> and you should find them in tests/plugins/
>
> >
> > Are there any other options that I should configure my build with?
> > Thanks in advance.
> >
> > Regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée

Thanks a lot.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-01 15:42     ` Alex Bennée
  2020-07-01 17:47       ` Ahmed Karaman
@ 2020-07-03 22:46       ` Aleksandar Markovic
  2020-07-04  8:45         ` Alex Bennée
  1 sibling, 1 reply; 27+ messages in thread
From: Aleksandar Markovic @ 2020-07-03 22:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers,
	Richard Henderson


On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>
> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org>
> wrote:
> >>
> >> Assuming your test case is constant execution (i.e. runs the same each
> >> time) you could run it through a plugins build to extract the number of
> >> guest instructions, e.g.:
> >>
> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d
> plugin ./tests/tcg/aarch64-linux-user/sha1
> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >>   insns: 158603512
> >>
> >> --
> >> Alex Bennée
> >
> > Hi Mr. Alex,
> > I've created a plugins build as you've said using "--enable-plugins"
> option.
> > I've searched for "libinsn.so" plugin that you've mentioned in your
> > command but it isn't in that path.
>
> make plugins
>
> and you should find them in tests/plugins/
>
>
Hi, both Alex and Ahmed,

Ahmed showed me tonight the first results with the number of guest
instructions. It was almost eye-opening to me. The thing is, until now,
I had only a vague picture that, on average, "many" host instructions
are generated per one guest instruction. Now, I could see the exact
ratio for each target, for a particular example.

A question for Alex:

- What would be the application of this new info? (Except that one has
the nice feeling, like I do, of knowing the exact host/guest
instruction ratio for a particular scenario.)

I just have a feeling there is more significance to this new data than
I currently see. Could it be used in analysis of performance? Or in
measuring the quality of emulation (TCG operation)? But how exactly?
What conclusions could potentially be derived from knowing the number
of guest instructions?

Sorry for a "stupid" question.

Aleksandar




> >
> > Are there any other options that I should configure my build with?
> > Thanks in advance.
> >
> > Regards,
> > Ahmed Karaman
>
>
> --
> Alex Bennée
>

[-- Attachment #2: Type: text/html, Size: 2867 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-03 22:46       ` Aleksandar Markovic
@ 2020-07-04  8:45         ` Alex Bennée
  2020-07-04  9:19           ` Aleksandar Markovic
                             ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Alex Bennée @ 2020-07-04  8:45 UTC (permalink / raw)
  To: Aleksandar Markovic
  Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers,
	Richard Henderson


Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes:

> On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>>
>> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
>>
>> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org>
>> wrote:
>> >>
>> >> Assuming your test case is constant execution (i.e. runs the same each
>> >> time) you could run it through a plugins build to extract the number of
>> >> guest instructions, e.g.:
>> >>
>> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d
>> plugin ./tests/tcg/aarch64-linux-user/sha1
>> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
>> >>   insns: 158603512
>> >>
>> >> --
>> >> Alex Bennée
>> >
>> > Hi Mr. Alex,
>> > I've created a plugins build as you've said using "--enable-plugins"
>> option.
>> > I've searched for "libinsn.so" plugin that you've mentioned in your
>> > command but it isn't in that path.
>>
>> make plugins
>>
>> and you should find them in tests/plugins/
>>
>>
> Hi, both Alex and Ahmed,
>
> Ahmed showed me tonight the first results with number of guest
> instructions. It was almost eye-opening to me. The thing is, by now, I had
> only vague picture that, on average, "many" host instructions are generated
> per one guest instruction. Now, I could see exact ratio for each target,
> for a particular example.
>
> A question for Alex:
>
> - What would be the application of this new info? (Except that one has nice
> feeling, like I do, of knowing the exact ratio host/guest instruction for a
> particular scenario.)

Well I think the total number of guest instructions is important because
some architectures are more efficient than others and this will have an
impact on the total executed instructions.

> I just have a feeling there is more significance of this new data that I
> currently see. Could it be that it can be used in analysis of performance?
> Or measuring quality of emulation (TCG operation)? But how exactly? What
> conclusion could potentially be derived from knowing number of guest
> instructions?

Knowing the ratio (especially as it changes between workloads) means you
can better pinpoint where the inefficiencies lie. You don't want to
spend your time chasing down an inefficiency that is down to the guest
compiler ;-)
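(Editorial note: the point above — that the ratio's variation between workloads helps locate inefficiencies — can be operationalised. A sketch that flags workloads whose ratio stands out from the rest; the workload names and numbers are invented for illustration.)

```python
from statistics import median

# Hypothetical per-workload host/guest instruction ratios.
ratios = {"sha1": 6.3, "matmult": 5.9, "qsort": 6.1, "jpeg": 11.8}

med = median(ratios.values())
# Flag workloads whose ratio is well above the median: likely places
# where TCG emits disproportionately many host instructions.
suspects = [name for name, r in ratios.items() if r > 1.5 * med]
print(f"median ratio: {med:.1f}, suspects: {suspects}")
```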

>
> Sorry for a "stupid" question.
>
> Aleksandar
>
>
>
>
>> >
>> > Are there any other options that I should configure my build with?
>> > Thanks in advance.
>> >
>> > Regards,
>> > Ahmed Karaman
>>
>>
>> --
>> Alex Bennée
>>


-- 
Alex Bennée


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-04  8:45         ` Alex Bennée
@ 2020-07-04  9:19           ` Aleksandar Markovic
  2020-07-04  9:55           ` Aleksandar Markovic
  2020-07-04 17:10           ` Ahmed Karaman
  2 siblings, 0 replies; 27+ messages in thread
From: Aleksandar Markovic @ 2020-07-04  9:19 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers,
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 3256 bytes --]

On Saturday, July 4, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes:
>
> > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >>
> >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
> >>
> >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org>
> >> wrote:
> >> >>
> >> >> Assuming your test case is constant execution (i.e. runs the same
> each
> >> >> time) you could run it through a plugins build to extract the number
> of
> >> >> guest instructions, e.g.:
> >> >>
> >> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so
> -d
> >> plugin ./tests/tcg/aarch64-linux-user/sha1
> >> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >> >>   insns: 158603512
> >> >>
> >> >> --
> >> >> Alex Bennée
> >> >
> >> > Hi Mr. Alex,
> >> > I've created a plugins build as you've said using "--enable-plugins"
> >> option.
> >> > I've searched for "libinsn.so" plugin that you've mentioned in your
> >> > command but it isn't in that path.
> >>
> >> make plugins
> >>
> >> and you should find them in tests/plugins/
> >>
> >>
> > Hi, both Alex and Ahmed,
> >
> > Ahmed showed me tonight the first results with number of guest
> > instructions. It was almost eye-opening to me. The thing is, by now, I
> had
> > only vague picture that, on average, "many" host instructions are
> generated
> > per one guest instruction. Now, I could see exact ratio for each target,
> > for a particular example.
> >
> > A question for Alex:
> >
> > - What would be the application of this new info? (Except that one has
> nice
> > feeling, like I do, of knowing the exact ratio host/guest instruction
> for a
> > particular scenario.)
>
> Well I think the total number of guest instructions is important because
> some architectures are more efficient than others and this will have an
> impact on the total executed instructions.
>
> > I just have a feeling there is more significance of this new data that I
> > currently see. Could it be that it can be used in analysis of
> performance?
> > Or measuring quality of emulation (TCG operation)? But how exactly? What
> > conclusion could potentially be derived from knowing number of guest
> > instructions?
>
> Knowing the ratio (especially as it changes between workloads) means you
> can better pinpoint where the inefficiencies lie. You don't want to
> spend your time chasing down an inefficiency that is down to the guest
> compiler ;-)
>
>
Thanks, Alex.

I am still thinking that, looking at the broader picture, that ratio, if
applied to an appropriate set of diverse workloads and averaged, could be
considered something like the "efficiency of QEMU" - and that measure could
possibly be used when making TCG changes aimed at achieving better
performance.
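(Editorial note: the aggregate measure proposed above could be sketched as a geometric mean of per-workload ratios — the usual choice for averaging ratios, so that no single workload dominates. The numbers below are invented placeholders.)

```python
from statistics import geometric_mean  # Python 3.8+

# Hypothetical host/guest ratios over a diverse set of workloads.
ratios = [6.3, 5.9, 6.1, 7.4, 5.5]

efficiency = geometric_mean(ratios)
print(f"aggregate host/guest ratio: {efficiency:.2f}")
# A TCG change that lowers this number (same guests, same workloads)
# would count as an across-the-board efficiency improvement.
```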

Interesting!

A.



> >
> > Sorry for a "stupid" question.
> >
> > Aleksandar
> >
> >
> >
> >
> >> >
> >> > Are there any other options that I should configure my build with?
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > Ahmed Karaman
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée
>

[-- Attachment #2: Type: text/html, Size: 4552 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-04  8:45         ` Alex Bennée
  2020-07-04  9:19           ` Aleksandar Markovic
@ 2020-07-04  9:55           ` Aleksandar Markovic
  2020-07-04 17:10           ` Ahmed Karaman
  2 siblings, 0 replies; 27+ messages in thread
From: Aleksandar Markovic @ 2020-07-04  9:55 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers,
	Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 3441 bytes --]

On Saturday, July 4, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:

>
> Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes:
>
> > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >>
> >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
> >>
> >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org>
> >> wrote:
> >> >>
> >> >> Assuming your test case is constant execution (i.e. runs the same
> each
> >> >> time) you could run it through a plugins build to extract the number
> of
> >> >> guest instructions, e.g.:
> >> >>
> >> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so
> -d
> >> plugin ./tests/tcg/aarch64-linux-user/sha1
> >> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >> >>   insns: 158603512
> >> >>
> >> >> --
> >> >> Alex Bennée
> >> >
> >> > Hi Mr. Alex,
> >> > I've created a plugins build as you've said using "--enable-plugins"
> >> option.
> >> > I've searched for "libinsn.so" plugin that you've mentioned in your
> >> > command but it isn't in that path.
> >>
> >> make plugins
> >>
> >> and you should find them in tests/plugins/
> >>
> >>
> > Hi, both Alex and Ahmed,
> >
> > Ahmed showed me tonight the first results with number of guest
> > instructions. It was almost eye-opening to me. The thing is, by now, I
> had
> > only vague picture that, on average, "many" host instructions are
> generated
> > per one guest instruction. Now, I could see exact ratio for each target,
> > for a particular example.
> >
> > A question for Alex:
> >
> > - What would be the application of this new info? (Except that one has
> nice
> > feeling, like I do, of knowing the exact ratio host/guest instruction
> for a
> > particular scenario.)
>
> Well I think the total number of guest instructions is important because
> some architectures are more efficient than others and this will have an
> impact on the total executed instructions.
>
> > I just have a feeling there is more significance of this new data that I
> > currently see. Could it be that it can be used in analysis of
> performance?
> > Or measuring quality of emulation (TCG operation)? But how exactly? What
> > conclusion could potentially be derived from knowing number of guest
> > instructions?
>
> Knowing the ratio (especially as it changes between workloads) means you
> can better pinpoint where the inefficiencies lie. You don't want to
> spend your time chasing down an inefficiency that is down to the guest
> compiler ;-)
>
>
Yes, it is definitely worth having the exact number of guest instructions!

Ahmed and I knew from the outset, like everybody else for that matter, that
the workload, the guest compiler, and the architecture itself immensely
impact any measurement.

However, if we keep the same guest, guest compiler, and workload, and
change just QEMU, then we should be able to draw conclusions on
QEMU-specific issues, and hopefully remove some inefficiencies. I hope you
will see that approach in Ahmed's next reports.
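(Editorial note: the comparison described above — same guest, compiler, and workload, varying only QEMU — reduces to a simple delta in host instruction counts. A sketch; the counts are hypothetical placeholders for numbers measured with a tool such as callgrind.)

```python
# Total host instructions for the same workload under two QEMU builds;
# the figures are hypothetical placeholders.
baseline = 2_500_000_000   # e.g. unmodified QEMU
patched = 2_350_000_000    # e.g. QEMU with a candidate TCG change

change_pct = (patched - baseline) / baseline * 100
print(f"host instruction count change: {change_pct:+.1f}%")
```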

Aleksandar





> >
> > Sorry for a "stupid" question.
> >
> > Aleksandar
> >
> >
> >
> >
> >> >
> >> > Are there any other options that I should configure my build with?
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > Ahmed Karaman
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée
>

[-- Attachment #2: Type: text/html, Size: 4756 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts
  2020-07-04  8:45         ` Alex Bennée
  2020-07-04  9:19           ` Aleksandar Markovic
  2020-07-04  9:55           ` Aleksandar Markovic
@ 2020-07-04 17:10           ` Ahmed Karaman
  2 siblings, 0 replies; 27+ messages in thread
From: Ahmed Karaman @ 2020-07-04 17:10 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers,
	Richard Henderson

On Sat, Jul 4, 2020 at 10:45 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes:
>
> > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote:
> >
> >>
> >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes:
> >>
> >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org>
> >> wrote:
> >> >>
> >> >> Assuming your test case is constant execution (i.e. runs the same each
> >> >> time) you could run it through a plugins build to extract the number of
> >> >> guest instructions, e.g.:
> >> >>
> >> >>   ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d
> >> plugin ./tests/tcg/aarch64-linux-user/sha1
> >> >>   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
> >> >>   insns: 158603512
> >> >>
> >> >> --
> >> >> Alex Bennée
> >> >
> >> > Hi Mr. Alex,
> >> > I've created a plugins build as you've said using "--enable-plugins"
> >> option.
> >> > I've searched for "libinsn.so" plugin that you've mentioned in your
> >> > command but it isn't in that path.
> >>
> >> make plugins
> >>
> >> and you should find them in tests/plugins/
> >>
> >>
> > Hi, both Alex and Ahmed,
> >
> > Ahmed showed me tonight the first results with number of guest
> > instructions. It was almost eye-opening to me. The thing is, by now, I had
> > only vague picture that, on average, "many" host instructions are generated
> > per one guest instruction. Now, I could see exact ratio for each target,
> > for a particular example.
> >
> > A question for Alex:
> >
> > - What would be the application of this new info? (Except that one has nice
> > feeling, like I do, of knowing the exact ratio host/guest instruction for a
> > particular scenario.)
>
> Well I think the total number of guest instructions is important because
> some architectures are more efficient than others and this will have an
> impact on the total executed instructions.
>
> > I just have a feeling there is more significance of this new data that I
> > currently see. Could it be that it can be used in analysis of performance?
> > Or measuring quality of emulation (TCG operation)? But how exactly? What
> > conclusion could potentially be derived from knowing number of guest
> > instructions?
>
> Knowing the ratio (especially as it changes between workloads) means you
> can better pinpoint where the inefficiencies lie. You don't want to
> spend your time chasing down an inefficiency that is down to the guest
> compiler ;-)
>
> >
> > Sorry for a "stupid" question.
> >
> > Aleksandar
> >
> >
> >
> >
> >> >
> >> > Are there any other options that I should configure my build with?
> >> > Thanks in advance.
> >> >
> >> > Regards,
> >> > Ahmed Karaman
> >>
> >>
> >> --
> >> Alex Bennée
> >>
>
>
> --
> Alex Bennée

Thanks Mr. Alex for your help!

Regards,
Ahmed Karaman


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-07-04 17:11 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman
2020-06-29 10:40 ` Aleksandar Markovic
2020-06-29 14:26   ` Ahmed Karaman
2020-06-29 16:03 ` Alex Bennée
2020-06-29 18:21   ` Aleksandar Markovic
2020-06-29 21:16   ` Ahmed Karaman
2020-07-01 13:44   ` Ahmed Karaman
2020-07-01 15:42     ` Alex Bennée
2020-07-01 17:47       ` Ahmed Karaman
2020-07-03 22:46       ` Aleksandar Markovic
2020-07-04  8:45         ` Alex Bennée
2020-07-04  9:19           ` Aleksandar Markovic
2020-07-04  9:55           ` Aleksandar Markovic
2020-07-04 17:10           ` Ahmed Karaman
2020-06-30  4:33 ` Lukáš Doktor
2020-06-30  7:18   ` Ahmed Karaman
2020-06-30  8:58     ` Aleksandar Markovic
2020-06-30 12:46       ` Lukáš Doktor
2020-06-30 19:14         ` Ahmed Karaman
2020-06-30  9:41   ` Aleksandar Markovic
2020-06-30 12:58     ` Lukáš Doktor
2020-06-30  5:59 ` 罗勇刚(Yonggang Luo)
2020-06-30  7:29   ` Ahmed Karaman
2020-06-30  8:21     ` Aleksandar Markovic
2020-06-30  9:52       ` Aleksandar Markovic
2020-06-30 19:02         ` Ahmed Karaman
2020-07-01 14:47 ` Ahmed Karaman
