* [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts @ 2020-06-29 10:25 Ahmed Karaman 2020-06-29 10:40 ` Aleksandar Markovic ` (4 more replies) 0 siblings, 5 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-06-29 10:25 UTC (permalink / raw) To: QEMU Developers, Aleksandar Markovic, Alex Bennée, Eric Blake, Richard Henderson, Lukáš Doktor [-- Attachment #1: Type: text/plain, Size: 875 bytes --] Hi, The second report of the TCG Continuous Benchmarking series builds upon the QEMU performance metrics calculated in the previous report. This report presents a method to dissect the number of instructions executed by a QEMU invocation into three main phases: - Code Generation - JIT Execution - Helpers Execution It devises a Python script that automates this process. After that, the report presents an experiment for comparing the output of running the script on 17 different targets. Many conclusions can be drawn from the results and two of them are discussed in the analysis section. Report link: https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ Previous reports: Report 1 - Measuring Basic Performance Metrics of QEMU: https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html Best regards, Ahmed Karaman [-- Attachment #2: Type: text/html, Size: 1285 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
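The three-phase dissection described in the report can be sketched in miniature. This is only an illustrative toy, not the report's actual Python script: the function names, the classification heuristic, and the instruction counts below are all hypothetical stand-ins for real Callgrind data.

```python
# Toy sketch of dissecting a QEMU run's instruction count into the three
# phases named in the report. Assumes per-function instruction counts have
# already been obtained (e.g. parsed from Callgrind output). All names and
# numbers here are hypothetical placeholders.

def classify(function_name):
    """Map a function name to one of the three phases (simplified heuristic)."""
    if function_name.startswith("tcg_gen") or "translate" in function_name:
        return "code_generation"
    if function_name.startswith("helper_"):
        return "helpers_execution"
    return "jit_execution"

def dissect(counts):
    """Sum instruction counts per phase and return each phase's percentage."""
    totals = {"code_generation": 0, "jit_execution": 0, "helpers_execution": 0}
    for name, n in counts.items():
        totals[classify(name)] += n
    grand = sum(totals.values())
    return {phase: round(100 * n / grand, 2) for phase, n in totals.items()}

# Hypothetical per-function counts for one QEMU invocation.
sample = {
    "tcg_gen_code": 250_000,
    "translate_insn": 150_000,
    "helper_divu": 100_000,
    "jit_block_0x4000": 500_000,
}
print(dissect(sample))
```

The real script would of course walk Callgrind's call graph rather than match names with a heuristic; the point is only the shape of the computation.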
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman @ 2020-06-29 10:40 ` Aleksandar Markovic 2020-06-29 14:26 ` Ahmed Karaman 2020-06-29 16:03 ` Alex Bennée ` (3 subsequent siblings) 4 siblings, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-29 10:40 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1271 bytes --] On Monday, 29 June 2020, Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: > Hi, > > The second report of the TCG Continuous Benchmarking series builds > upon the QEMU performance metrics calculated in the previous report. > This report presents a method to dissect the number of instructions > executed by a QEMU invocation into three main phases: > - Code Generation > - JIT Execution > - Helpers Execution > It devises a Python script that automates this process. > > After that, the report presents an experiment for comparing the > output of running the script on 17 different targets. Many conclusions > can be drawn from the results and two of them are discussed in the > analysis section. > > Report link: > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/ > Dissecting-QEMU-Into-Three-Main-Parts/ > > Previous reports: > Report 1 - Measuring Basic Performance Metrics of QEMU: > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > My sincere congratulations on Report 2!! And, on top of that, it is an excellent idea to list previous reports, as you did in the paragraph above. Keep reports coming!! Aleksandar > Best regards, > Ahmed Karaman > [-- Attachment #2: Type: text/html, Size: 2080 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:40 ` Aleksandar Markovic @ 2020-06-29 14:26 ` Ahmed Karaman 0 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-06-29 14:26 UTC (permalink / raw) To: Aleksandar Markovic Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1456 bytes --] Thank you for your support! On Mon, Jun 29, 2020, 12:40 PM Aleksandar Markovic < aleksandar.qemu.devel@gmail.com> wrote: > > > понедељак, 29. јун 2020., Ahmed Karaman <ahmedkhaledkaraman@gmail.com> је > написао/ла: > >> Hi, >> >> The second report of the TCG Continuous Benchmarking series builds >> upon the QEMU performance metrics calculated in the previous report. >> This report presents a method to dissect the number of instructions >> executed by a QEMU invocation into three main phases: >> - Code Generation >> - JIT Execution >> - Helpers Execution >> It devises a Python script that automates this process. >> >> After that, the report presents an experiment for comparing the >> output of running the script on 17 different targets. Many conclusions >> can be drawn from the results and two of them are discussed in the >> analysis section. >> >> Report link: >> >> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ >> >> Previous reports: >> Report 1 - Measuring Basic Performance Metrics of QEMU: >> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html >> >> > My sincere congratulations on the Report 2!! > > And, on top of that, this is an excellent idea to list previous reports, > as you did in the paragraph above. > > Keep reports coming!! > > Aleksandar > > > >> Best regards, >> Ahmed Karaman >> > [-- Attachment #2: Type: text/html, Size: 2519 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman 2020-06-29 10:40 ` Aleksandar Markovic @ 2020-06-29 16:03 ` Alex Bennée 2020-06-29 18:21 ` Aleksandar Markovic ` (2 more replies) 2020-06-30 4:33 ` Lukáš Doktor ` (2 subsequent siblings) 4 siblings, 3 replies; 27+ messages in thread From: Alex Bennée @ 2020-06-29 16:03 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > Hi, > > The second report of the TCG Continuous Benchmarking series builds > upon the QEMU performance metrics calculated in the previous report. > This report presents a method to dissect the number of instructions > executed by a QEMU invocation into three main phases: > - Code Generation > - JIT Execution > - Helpers Execution > It devises a Python script that automates this process. > > After that, the report presents an experiment for comparing the > output of running the script on 17 different targets. Many conclusions > can be drawn from the results and two of them are discussed in the > analysis section. A couple of comments. One thing I think is missing from your analysis is the total number of guest instructions being emulated. As you point out each guest will have different code efficiency in terms of its generated code. Assuming your test case is constant execution (i.e. 
runs the same each time) you could run it through a plugins build to extract the number of guest instructions, e.g.: ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1 SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 insns: 158603512 I should have also pointed out in your last report that running FP heavy code will always be biased towards helper/softfloat code to the detriment of everything else. I think you need more of a mix of benchmarks to get a better view. When Emilio did the last set of analysis he used a suite he built out of nbench and a perl benchmark: https://github.com/cota/dbt-bench As he quoted in his README: NBench programs are small, with execution time dominated by small code loops. Thus, when run under a DBT engine, the resulting performance depends almost entirely on the quality of the output code. The Perl benchmarks compile Perl code. As is common for compilation workloads, they execute large amounts of code and show no particular code execution hotspots. Thus, the resulting DBT performance depends largely on code translation speed. By only having one benchmark you are going to miss out on the envelope of use cases. > > Report link: >https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > Previous reports: > Report 1 - Measuring Basic Performance Metrics of QEMU: > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > Best regards, > Ahmed Karaman -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 16:03 ` Alex Bennée @ 2020-06-29 18:21 ` Aleksandar Markovic 2020-06-29 21:16 ` Ahmed Karaman 2020-07-01 13:44 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-29 18:21 UTC (permalink / raw) To: Alex Bennée Cc: Ahmed Karaman, Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1290 bytes --] > I should have also pointed out in your > by only having one benchmark you are going to miss out on the envelope > of use cases. > Alex, thank you for all your comments, and the other perspectives that you always bring to Ahmed's and everyone else's attention. I always imagine you as a "four-dimensional" engineer for your unabashed presentation of out-of-the-box ideas. I actually truly like this, quite often inspiring, style. However, it seems to me that this last paragraph is a slightly unjust critique, as if it doesn't come from you. The report is not about a benchmark, it is about a script that does something. Ahmed never said "we are going to benchmark" anything. The program in the report is just an example used for illustration. And, now you say: it is not good for benchmarking. Well, no example is good for benchmarking, and, again, the report is not about benchmarking. Why do you mention benchmarking at all, then? And what is Ahmed supposed to do? To flood the report with dozens of programs and dozens of tables, thousands of numbers, find some average - just to illustrate the script? The variety of test programs will be the subject of future reports. Otherwise, all intriguing and useful proposals from your side, and many thanks for them!! Yours, Aleksandar [-- Attachment #2: Type: text/html, Size: 1671 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 16:03 ` Alex Bennée 2020-06-29 18:21 ` Aleksandar Markovic @ 2020-06-29 21:16 ` Ahmed Karaman 2020-07-01 13:44 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-06-29 21:16 UTC (permalink / raw) To: Alex Bennée Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote: > > > Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > > > Hi, > > > > The second report of the TCG Continuous Benchmarking series builds > > upon the QEMU performance metrics calculated in the previous report. > > This report presents a method to dissect the number of instructions > > executed by a QEMU invocation into three main phases: > > - Code Generation > > - JIT Execution > > - Helpers Execution > > It devises a Python script that automates this process. > > > > After that, the report presents an experiment for comparing the > > output of running the script on 17 different targets. Many conclusions > > can be drawn from the results and two of them are discussed in the > > analysis section. > > A couple of comments. One think I think is missing from your analysis is > the total number of guest instructions being emulated. As you point out > each guest will have different code efficiency in terms of it's > generated code. > > Assuming your test case is constant execution (i.e. runs the same each > time) Yes indeed, the report utilizes Callgrind in the measurements so the results are very stable. >you could run in through a plugins build to extract the number of > guest instructions, e.g.: > > ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1 > SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > insns: 158603512 > That's a very nice suggestion. Maybe this will be the idea of a whole new report. 
I'll try to execute the provided command and will let you know if I have any questions. > I should have also pointed out in your last report that running FP heavy > code will always be biased towards helper/softfloat code to the > detriment of everything else. I think you need more of a mix of > benchmarks to get a better view. > > When Emilio did the last set of analysis he used a suite he built out of > nbench and a perl benchmark: > > https://github.com/cota/dbt-bench > > As he quoted in his README: > > NBench programs are small, with execution time dominated by small code > loops. Thus, when run under a DBT engine, the resulting performance > depends almost entirely on the quality of the output code. > > The Perl benchmarks compile Perl code. As is common for compilation > workloads, they execute large amounts of code and show no particular > code execution hotspots. Thus, the resulting DBT performance depends > largely on code translation speed. > > by only having one benchmark you are going to miss out on the envelope > of use cases. > Future reports will introduce a variety of benchmarks. This report - and the previous one - are introductory reports. The benchmark was used only to demonstrate the report's ideas. It was not used as a strict benchmarking program. > > > > Report link: > >https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > > > Previous reports: > > Report 1 - Measuring Basic Performance Metrics of QEMU: > > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > > > Best regards, > > Ahmed Karaman > > > -- > Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 16:03 ` Alex Bennée 2020-06-29 18:21 ` Aleksandar Markovic @ 2020-06-29 21:16 ` Ahmed Karaman 2020-07-01 13:44 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-07-01 13:44 UTC (permalink / raw) To: Alex Bennée Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote: > > Assuming your test case is constant execution (i.e. runs the same each > time) you could run in through a plugins build to extract the number of > guest instructions, e.g.: > > ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1 > SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > insns: 158603512 > > -- > Alex Bennée Hi Mr. Alex, I've created a plugins build as you've said, using the "--enable-plugins" option. I've searched for the "libinsn.so" plugin that you've mentioned in your command, but it isn't in that path. Are there any other options that I should configure my build with? Thanks in advance. Regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-01 13:44 ` Ahmed Karaman @ 2020-07-01 15:42 ` Alex Bennée 2020-07-01 17:47 ` Ahmed Karaman 2020-07-03 22:46 ` Aleksandar Markovic 0 siblings, 2 replies; 27+ messages in thread From: Alex Bennée @ 2020-07-01 15:42 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote: >> >> Assuming your test case is constant execution (i.e. runs the same each >> time) you could run in through a plugins build to extract the number of >> guest instructions, e.g.: >> >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1 >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 >> insns: 158603512 >> >> -- >> Alex Bennée > > Hi Mr. Alex, > I've created a plugins build as you've said using "--enable-plugins" option. > I've searched for "libinsn.so" plugin that you've mentioned in your > command but it isn't in that path. make plugins and you should find them in tests/plugins/ > > Are there any other options that I should configure my build with? > Thanks in advance. > > Regards, > Ahmed Karaman -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
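Once the plugin is built and run, its `-d plugin` output ends with an `insns:` line like the one quoted above. A small parser for that line might look like this; the sample text mirrors the output quoted in the thread rather than a live QEMU run:

```python
import re

def parse_insn_count(plugin_output):
    """Extract the guest instruction count from libinsn.so's '-d plugin' output."""
    match = re.search(r"^insns:\s*(\d+)\s*$", plugin_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'insns:' line found in plugin output")
    return int(match.group(1))

# Sample output copied from the sha1 example earlier in the thread.
sample_output = """\
SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
insns: 158603512
"""
print(parse_insn_count(sample_output))  # → 158603512
```

In practice one would capture the QEMU process's stderr (where `-d plugin` logs go by default) and feed it to this function.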
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-01 15:42 ` Alex Bennée @ 2020-07-01 17:47 ` Ahmed Karaman 2020-07-03 22:46 ` Aleksandar Markovic 1 sibling, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-07-01 17:47 UTC (permalink / raw) To: Alex Bennée Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson On Wed, Jul 1, 2020 at 5:42 PM Alex Bennée <alex.bennee@linaro.org> wrote: > > > Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > > > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> wrote: > >> > >> Assuming your test case is constant execution (i.e. runs the same each > >> time) you could run in through a plugins build to extract the number of > >> guest instructions, e.g.: > >> > >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1 > >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > >> insns: 158603512 > >> > >> -- > >> Alex Bennée > > > > Hi Mr. Alex, > > I've created a plugins build as you've said using "--enable-plugins" option. > > I've searched for "libinsn.so" plugin that you've mentioned in your > > command but it isn't in that path. > > make plugins > > and you should find them in tests/plugins/ > > > > > Are there any other options that I should configure my build with? > > Thanks in advance. > > > > Regards, > > Ahmed Karaman > > > -- > Alex Bennée Thanks a lot. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-01 15:42 ` Alex Bennée 2020-07-01 17:47 ` Ahmed Karaman @ 2020-07-03 22:46 ` Aleksandar Markovic 2020-07-04 8:45 ` Alex Bennée 1 sibling, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-07-03 22:46 UTC (permalink / raw) To: Alex Bennée Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 2039 bytes --] On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > > > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> > wrote: > >> > >> Assuming your test case is constant execution (i.e. runs the same each > >> time) you could run in through a plugins build to extract the number of > >> guest instructions, e.g.: > >> > >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d > plugin ./tests/tcg/aarch64-linux-user/sha1 > >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > >> insns: 158603512 > >> > >> -- > >> Alex Bennée > > > > Hi Mr. Alex, > > I've created a plugins build as you've said using "--enable-plugins" > option. > > I've searched for "libinsn.so" plugin that you've mentioned in your > > command but it isn't in that path. > > make plugins > > and you should find them in tests/plugins/ > > Hi, both Alex and Ahmed, Ahmed showed me tonight the first results with number of guest instructions. It was almost eye-opening to me. The thing is, by now, I had only vague picture that, on average, "many" host instructions are generated per one guest instruction. Now, I could see exact ratio for each target, for a particular example. A question for Alex: - What would be the application of this new info? (Except that one has nice feeling, like I do, of knowing the exact ratio host/guest instruction for a particular scenario.) 
I just have a feeling there is more significance to this new data than I currently see. Could it be that it can be used in analysis of performance? Or measuring quality of emulation (TCG operation)? But how exactly? What conclusion could potentially be derived from knowing the number of guest instructions? Sorry for a "stupid" question. Aleksandar > > > > Are there any other options that I should configure my build with? > > Thanks in advance. > > > > Regards, > > Ahmed Karaman > > > -- > Alex Bennée > [-- Attachment #2: Type: text/html, Size: 2867 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-03 22:46 ` Aleksandar Markovic @ 2020-07-04 8:45 ` Alex Bennée 2020-07-04 9:19 ` Aleksandar Markovic ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Alex Bennée @ 2020-07-04 8:45 UTC (permalink / raw) To: Aleksandar Markovic Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers, Richard Henderson Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes: > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > >> >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: >> >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> >> wrote: >> >> >> >> Assuming your test case is constant execution (i.e. runs the same each >> >> time) you could run in through a plugins build to extract the number of >> >> guest instructions, e.g.: >> >> >> >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d >> plugin ./tests/tcg/aarch64-linux-user/sha1 >> >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 >> >> insns: 158603512 >> >> >> >> -- >> >> Alex Bennée >> > >> > Hi Mr. Alex, >> > I've created a plugins build as you've said using "--enable-plugins" >> option. >> > I've searched for "libinsn.so" plugin that you've mentioned in your >> > command but it isn't in that path. >> >> make plugins >> >> and you should find them in tests/plugins/ >> >> > Hi, both Alex and Ahmed, > > Ahmed showed me tonight the first results with number of guest > instructions. It was almost eye-opening to me. The thing is, by now, I had > only vague picture that, on average, "many" host instructions are generated > per one guest instruction. Now, I could see exact ratio for each target, > for a particular example. > > A question for Alex: > > - What would be the application of this new info? (Except that one has nice > feeling, like I do, of knowing the exact ratio host/guest instruction for a > particular scenario.) 
Well I think the total number of guest instructions is important because some architectures are more efficient than others and this will have an impact on the total executed instructions. > I just have a feeling there is more significance of this new data that I > currently see. Could it be that it can be used in analysis of performance? > Or measuring quality of emulation (TCG operation)? But how exactly? What > conclusion could potentially be derived from knowing number of guest > instructions? Knowing the ratio (especially as it changes between workloads) means you can better pinpoint where the inefficiencies lie. You don't want to spend your time chasing down an inefficiency that is down to the guest compiler ;-) > > Sorry for a "stupid" question. > > Aleksandar > > > > > >> > > >> > Are there any other options that I should configure my build with? > >> > Thanks in advance. > >> > > >> > Regards, > >> > Ahmed Karaman > >> > >> > >> -- > >> Alex Bennée > >> -- Alex Bennée ^ permalink raw reply [flat|nested] 27+ messages in thread
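The host/guest ratio discussed above is a one-line computation once both counts are in hand. In the sketch below, the guest count echoes the sha1/libinsn.so example quoted in the thread, while the host count is a purely hypothetical figure standing in for a Callgrind measurement:

```python
def host_guest_ratio(host_insns, guest_insns):
    """Ratio of host instructions executed per emulated guest instruction."""
    if guest_insns <= 0:
        raise ValueError("guest instruction count must be positive")
    return host_insns / guest_insns

# Guest count from the sha1 example in the thread; host count is a
# hypothetical placeholder for a Callgrind instruction total.
host_insns = 1_268_828_096
guest_insns = 158_603_512
print(f"{host_guest_ratio(host_insns, guest_insns):.2f} host insns per guest insn")
```

Tracking how this number shifts between workloads and targets is what lets one separate QEMU-side inefficiency from guest-compiler effects.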
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-04 8:45 ` Alex Bennée @ 2020-07-04 9:19 ` Aleksandar Markovic 2020-07-04 9:55 ` Aleksandar Markovic 2020-07-04 17:10 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Aleksandar Markovic @ 2020-07-04 9:19 UTC (permalink / raw) To: Alex Bennée Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 3256 bytes --] On Saturday, July 4, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes: > > > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > > >> > >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > >> > >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> > >> wrote: > >> >> > >> >> Assuming your test case is constant execution (i.e. runs the same > each > >> >> time) you could run in through a plugins build to extract the number > of > >> >> guest instructions, e.g.: > >> >> > >> >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so > -d > >> plugin ./tests/tcg/aarch64-linux-user/sha1 > >> >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > >> >> insns: 158603512 > >> >> > >> >> -- > >> >> Alex Bennée > >> > > >> > Hi Mr. Alex, > >> > I've created a plugins build as you've said using "--enable-plugins" > >> option. > >> > I've searched for "libinsn.so" plugin that you've mentioned in your > >> > command but it isn't in that path. > >> > >> make plugins > >> > >> and you should find them in tests/plugins/ > >> > >> > > Hi, both Alex and Ahmed, > > > > Ahmed showed me tonight the first results with number of guest > > instructions. It was almost eye-opening to me. The thing is, by now, I > had > > only vague picture that, on average, "many" host instructions are > generated > > per one guest instruction. 
Now, I could see exact ratio for each target, > > for a particular example. > > > > A question for Alex: > > > > - What would be the application of this new info? (Except that one has > nice > > feeling, like I do, of knowing the exact ratio host/guest instruction > for a > > particular scenario.) > > Well I think the total number of guest instructions is important because > some architectures are more efficient than others and this will an > impact on the total executed instructions. > > > I just have a feeling there is more significance of this new data that I > > currently see. Could it be that it can be used in analysis of > performance? > > Or measuring quality of emulation (TCG operation)? But how exactly? What > > conclusion could potentially be derived from knowing number of guest > > instructions? > > Knowing the ratio (especially as it changes between workloads) means you > can better pin point where the inefficiencies lie. You don't want to > spend your time chasing down an inefficiency that is down to the guest > compiler ;-) > > Thanks, Alex. I am still thinking, looking at the broader picture, that maybe that ratio, if applied to an appropriate set of diverse workloads and averaged, could be considered something like "efficiency of QEMU" - and that measure could possibly be used when making some TCG changes aimed at achieving better performance. Interesting! A. > > > > Sorry for a "stupid" question. > > > > Aleksandar > > > > > > > > > >> > > >> > Are there any other options that I should configure my build with? > >> > Thanks in advance. > >> > > >> > Regards, > >> > Ahmed Karaman > >> > >> > >> -- > >> Alex Bennée > >> > > > -- > Alex Bennée > [-- Attachment #2: Type: text/html, Size: 4552 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-04 8:45 ` Alex Bennée 2020-07-04 9:19 ` Aleksandar Markovic @ 2020-07-04 9:55 ` Aleksandar Markovic 2020-07-04 17:10 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Aleksandar Markovic @ 2020-07-04 9:55 UTC (permalink / raw) To: Alex Bennée Cc: Ahmed Karaman, Lukáš Doktor, QEMU Developers, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 3441 bytes --] On Saturday, July 4, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes: > > > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > > >> > >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > >> > >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> > >> wrote: > >> >> > >> >> Assuming your test case is constant execution (i.e. runs the same > each > >> >> time) you could run in through a plugins build to extract the number > of > >> >> guest instructions, e.g.: > >> >> > >> >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so > -d > >> plugin ./tests/tcg/aarch64-linux-user/sha1 > >> >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > >> >> insns: 158603512 > >> >> > >> >> -- > >> >> Alex Bennée > >> > > >> > Hi Mr. Alex, > >> > I've created a plugins build as you've said using "--enable-plugins" > >> option. > >> > I've searched for "libinsn.so" plugin that you've mentioned in your > >> > command but it isn't in that path. > >> > >> make plugins > >> > >> and you should find them in tests/plugins/ > >> > >> > > Hi, both Alex and Ahmed, > > > > Ahmed showed me tonight the first results with number of guest > > instructions. It was almost eye-opening to me. The thing is, by now, I > had > > only vague picture that, on average, "many" host instructions are > generated > > per one guest instruction. 
Now, I could see exact ratio for each target, > > for a particular example. > > > > A question for Alex: > > > > - What would be the application of this new info? (Except that one has > nice > > feeling, like I do, of knowing the exact ratio host/guest instruction > for a > > particular scenario.) > > Well I think the total number of guest instructions is important because > some architectures are more efficient than others and this will an > impact on the total executed instructions. > > > I just have a feeling there is more significance of this new data that I > > currently see. Could it be that it can be used in analysis of > performance? > > Or measuring quality of emulation (TCG operation)? But how exactly? What > > conclusion could potentially be derived from knowing number of guest > > instructions? > > Knowing the ratio (especially as it changes between workloads) means you > can better pin point where the inefficiencies lie. You don't want to > spend your time chasing down an inefficiency that is down to the guest > compiler ;-) > > Yes, it is definitely worth having the exact number of guest instructions! Ahmed and I knew from the outset, like everybody else for that matter, that the workload, the guest compiler, and the architecture itself immensely impact any measurement. However, if we keep the same guest, guest compiler, and workload, and change just qemu, then we should be able to draw conclusions on qemu-specific issues, and hopefully remove some inefficiencies. I hope you will see that approach in Ahmed's next reports. Aleksandar > > > > Sorry for a "stupid" question. > > > > Aleksandar > > > > > > > > > >> > > >> > Are there any other options that I should configure my build with? > >> > Thanks in advance. > >> > > >> > Regards, > >> > Ahmed Karaman > >> > >> > >> -- > >> Alex Bennée > >> > > > -- > Alex Bennée > [-- Attachment #2: Type: text/html, Size: 4756 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-07-04 8:45 ` Alex Bennée 2020-07-04 9:19 ` Aleksandar Markovic 2020-07-04 9:55 ` Aleksandar Markovic @ 2020-07-04 17:10 ` Ahmed Karaman 2 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-07-04 17:10 UTC (permalink / raw) To: Alex Bennée Cc: Lukáš Doktor, Aleksandar Markovic, QEMU Developers, Richard Henderson On Sat, Jul 4, 2020 at 10:45 AM Alex Bennée <alex.bennee@linaro.org> wrote: > > > Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> writes: > > > On Wednesday, July 1, 2020, Alex Bennée <alex.bennee@linaro.org> wrote: > > > >> > >> Ahmed Karaman <ahmedkhaledkaraman@gmail.com> writes: > >> > >> > On Mon, Jun 29, 2020 at 6:03 PM Alex Bennée <alex.bennee@linaro.org> > >> wrote: > >> >> > >> >> Assuming your test case is constant execution (i.e. runs the same each > >> >> time) you could run in through a plugins build to extract the number of > >> >> guest instructions, e.g.: > >> >> > >> >> ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d > >> plugin ./tests/tcg/aarch64-linux-user/sha1 > >> >> SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 > >> >> insns: 158603512 > >> >> > >> >> -- > >> >> Alex Bennée > >> > > >> > Hi Mr. Alex, > >> > I've created a plugins build as you've said using "--enable-plugins" > >> option. > >> > I've searched for "libinsn.so" plugin that you've mentioned in your > >> > command but it isn't in that path. > >> > >> make plugins > >> > >> and you should find them in tests/plugins/ > >> > >> > > Hi, both Alex and Ahmed, > > > > Ahmed showed me tonight the first results with number of guest > > instructions. It was almost eye-opening to me. The thing is, by now, I had > > only vague picture that, on average, "many" host instructions are generated > > per one guest instruction. Now, I could see exact ratio for each target, > > for a particular example. 
> > > > A question for Alex: > > > > - What would be the application of this new info? (Except that one has nice > > feeling, like I do, of knowing the exact ratio host/guest instruction for a > > particular scenario.) > > Well I think the total number of guest instructions is important because > some architectures are more efficient than others and this will an > impact on the total executed instructions. > > > I just have a feeling there is more significance of this new data that I > > currently see. Could it be that it can be used in analysis of performance? > > Or measuring quality of emulation (TCG operation)? But how exactly? What > > conclusion could potentially be derived from knowing number of guest > > instructions? > > Knowing the ratio (especially as it changes between workloads) means you > can better pin point where the inefficiencies lie. You don't want to > spend your time chasing down an inefficiency that is down to the guest > compiler ;-) > > > > > Sorry for a "stupid" question. > > > > Aleksandar > > > > > > > > > >> > > >> > Are there any other options that I should configure my build with? > >> > Thanks in advance. > >> > > >> > Regards, > >> > Ahmed Karaman > >> > >> > >> -- > >> Alex Bennée > >> > > > -- > Alex Bennée Thanks Mr. Alex for your help! Regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
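The host/guest instruction ratio discussed in this sub-thread can be computed mechanically once both counts are available. The sketch below is illustrative and not part of the TCG Continuous Benchmarking scripts; the helper names are hypothetical. The guest count is parsed from the `libinsn.so` plugin output shown above ("insns: 158603512"), while the host count would come from a separate measurement, e.g. Callgrind's instruction totals as used in Report 1.

```python
import re


def parse_guest_insns(plugin_output: str) -> int:
    """Extract the guest instruction count from the libinsn.so plugin
    output, which ends with a line such as 'insns: 158603512'."""
    match = re.search(r"insns:\s*(\d+)", plugin_output)
    if match is None:
        raise ValueError("no 'insns:' line found in plugin output")
    return int(match.group(1))


def host_guest_ratio(host_insns: int, guest_insns: int) -> float:
    """Average number of host instructions QEMU executes per guest
    instruction (host count taken e.g. from Callgrind totals)."""
    return host_insns / guest_insns


# Example using the guest count quoted in this thread (aarch64 sha1 run).
guest = parse_guest_insns("SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6\ninsns: 158603512")
print("guest instructions:", guest)
```

For instance, a (purely illustrative) host count of 1.6 billion instructions against the 158,603,512 guest instructions above would give a ratio of roughly 10 host instructions per guest instruction.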
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman 2020-06-29 10:40 ` Aleksandar Markovic 2020-06-29 16:03 ` Alex Bennée @ 2020-06-30 4:33 ` Lukáš Doktor 2020-06-30 7:18 ` Ahmed Karaman 2020-06-30 9:41 ` Aleksandar Markovic 2020-06-30 5:59 ` 罗勇刚(Yonggang Luo) 2020-07-01 14:47 ` Ahmed Karaman 4 siblings, 2 replies; 27+ messages in thread From: Lukáš Doktor @ 2020-06-30 4:33 UTC (permalink / raw) To: Ahmed Karaman, QEMU Developers, Aleksandar Markovic, Alex Bennée, Eric Blake, Richard Henderson [-- Attachment #1.1: Type: text/plain, Size: 2630 bytes --] Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a): > Hi, > > The second report of the TCG Continuous Benchmarking series builds > upon the QEMU performance metrics calculated in the previous report. > This report presents a method to dissect the number of instructions > executed by a QEMU invocation into three main phases: > - Code Generation > - JIT Execution > - Helpers Execution > It devises a Python script that automates this process. > > After that, the report presents an experiment for comparing the > output of running the script on 17 different targets. Many conclusions > can be drawn from the results and two of them are discussed in the > analysis section. > > Report link: > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > Previous reports: > Report 1 - Measuring Basic Performance Metrics of QEMU: > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > Best regards, > Ahmed Karaman Hello Ahmed, very nice reading, both reports so far. One thing that could be better displayed is the system you used to generate this. This would come in handy especially later when you move from examples to actual reports.
I think it'd make sense to add a section with a clear definition of the machine as well as the operating system, qemu version and possibly other deps (like compiler, flags, ...). For this report something like:

architecture: x86_64
cpu_codename: Kaby Lake
cpu: i7-8650U
ram: 32GB DDR4
os: Fedora 32
qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7
compiler: gcc-10.1.1-1.fc32.x86_64
flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm --enable-vhost-net --enable-vhost-net --enable-attr --enable-kvm --enable-fdt --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring

would do. Maybe it'd even be a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion. Regards, Lukáš PS: Automated cpu codenames, host OSes and such could be tricky, but one can use other libraries or just a best-effort approach with a fallback to "unknown", letting people fill it in manually or add their branch to your script. Regards, Lukáš [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
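The reporting script Lukáš suggests could be sketched along these lines. This is a best-effort sketch, not an existing QEMU or benchmarking-series script; the field names follow his example, the "unknown" fallback follows his PS, and the trickier fields (CPU codename, RAM size) are deliberately left out.

```python
import platform
import subprocess


def best_effort(cmd):
    """Run a command and return its first output line, or 'unknown' if the
    tool is missing or fails (the fallback suggested in the PS above)."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return out.stdout.splitlines()[0].strip() or "unknown"
    except Exception:
        return "unknown"


def collect_info():
    """Gather a minimal description of the test bed."""
    return {
        "architecture": platform.machine() or "unknown",
        "os": best_effort(["uname", "-sr"]),
        "python": platform.python_version(),
        "compiler": best_effort(["gcc", "--version"]),
    }


for key, value in collect_info().items():
    print(f"{key}: {value}")
```

Appending this output to each perf script's report, as suggested, would make the measurement conditions self-documenting.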
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 4:33 ` Lukáš Doktor @ 2020-06-30 7:18 ` Ahmed Karaman 2020-06-30 8:58 ` Aleksandar Markovic 2020-06-30 9:41 ` Aleksandar Markovic 1 sibling, 1 reply; 27+ messages in thread From: Ahmed Karaman @ 2020-06-30 7:18 UTC (permalink / raw) To: Lukáš Doktor Cc: Aleksandar Markovic, Alex Bennée, QEMU Developers, Richard Henderson On Tue, Jun 30, 2020 at 6:34 AM Lukáš Doktor <ldoktor@redhat.com> wrote: > > Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a): > > Hi, > > > > The second report of the TCG Continuous Benchmarking series builds > > upon the QEMU performance metrics calculated in the previous report. > > This report presents a method to dissect the number of instructions > > executed by a QEMU invocation into three main phases: > > - Code Generation > > - JIT Execution > > - Helpers Execution > > It devises a Python script that automates this process. > > > > After that, the report presents an experiment for comparing the > > output of running the script on 17 different targets. Many conclusions > > can be drawn from the results and two of them are discussed in the > > analysis section. > > > > Report link: > > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > > > Previous reports: > > Report 1 - Measuring Basic Performance Metrics of QEMU: > > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > > > Best regards, > > Ahmed Karaman > > Hello Ahmed, > > very nice reading, both reports so far. One thing that could be better displayed is the system you used this to generate. This would come handy especially later when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operation system, qemu version and eventually other deps (like compiler, flags, ...). 
For this report something like: > > architecture: x86_64 > cpu_codename: Kaby Lake > cpu: i7-8650U > ram: 32GB DDR4 > os: Fedora 32 > qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7 > compiler: gcc-10.1.1-1.fc32.x86_64 > flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm --enable-vhost-net --enable-vhost-net --enable-attr --enable-kvm --enable-fdt --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring > > would do. Maybe it'd be even a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion. > > Regards, > Lukáš > > PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script. > > Regards, > Lukáš > Thanks Mr. Lukáš, I'm really glad you found both reports interesting. Both reports are based on QEMU version 5.0.0, this wasn't mentioned in the reports so thanks for the reminder. I'll add a short note about that. The used QEMU build is a very basic GCC build (created by just running ../configure in the build directory without any flags). Regarding the detailed machine information (CPU, RAM ... etc), The two reports introduce some concepts and methodologies that will produce consistent results on whichever machine they are executed on. So I think it's unnecessary to mention the detailed system information used in the reports for now. Best regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 7:18 ` Ahmed Karaman @ 2020-06-30 8:58 ` Aleksandar Markovic 2020-06-30 12:46 ` Lukáš Doktor 0 siblings, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-30 8:58 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Richard Henderson уто, 30. јун 2020. у 09:19 Ahmed Karaman <ahmedkhaledkaraman@gmail.com> је написао/ла: > > On Tue, Jun 30, 2020 at 6:34 AM Lukáš Doktor <ldoktor@redhat.com> wrote: > > > > Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a): > > > Hi, > > > > > > The second report of the TCG Continuous Benchmarking series builds > > > upon the QEMU performance metrics calculated in the previous report. > > > This report presents a method to dissect the number of instructions > > > executed by a QEMU invocation into three main phases: > > > - Code Generation > > > - JIT Execution > > > - Helpers Execution > > > It devises a Python script that automates this process. > > > > > > After that, the report presents an experiment for comparing the > > > output of running the script on 17 different targets. Many conclusions > > > can be drawn from the results and two of them are discussed in the > > > analysis section. > > > > > > Report link: > > > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > > > > > Previous reports: > > > Report 1 - Measuring Basic Performance Metrics of QEMU: > > > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > > > > > Best regards, > > > Ahmed Karaman > > > > Hello Ahmed, > > > > very nice reading, both reports so far. One thing that could be better displayed is the system you used this to generate. This would come handy especially later when you move from examples to actual reports. 
I think it'd make sense to add a section with a clear definition of the machine as well as the operation system, qemu version and eventually other deps (like compiler, flags, ...). For this report something like: > > > > architecture: x86_64 > > cpu_codename: Kaby Lake > > cpu: i7-8650U > > ram: 32GB DDR4 > > os: Fedora 32 > > qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7 > > compiler: gcc-10.1.1-1.fc32.x86_64 > > flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm --enable-vhost-net --enable-vhost-net --enable-attr --enable-kvm --enable-fdt --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring > > > > would do. Maybe it'd be even a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion. > > > > Regards, > > Lukáš > > > > PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script. > > > > Regards, > > Lukáš > > > Thanks Mr. Lukáš, I'm really glad you found both reports interesting. > > Both reports are based on QEMU version 5.0.0, this wasn't mentioned in > the reports so thanks for the reminder. I'll add a short note about > that. > > The used QEMU build is a very basic GCC build (created by just running > ../configure in the build directory without any flags). > > Regarding the detailed machine information (CPU, RAM ... 
etc), The two > reports introduce some concepts and methodologies that will produce > consistent results on whichever machine they are executed on. So I > think it's unnecessary to mention the detailed system information used > in the reports for now. > Ahmed, I don't entirely agree with you on this topic. I think you treated Mr. Lukas's comments in an overly lax way. Yes, the results will be stable (within a small fraction of a percent) on a particular given system (which is proved in the "Stability Experiment" section of Report 1). That is great! Although it sounds elementary, this is not easy to achieve, so I am glad you did it. However, we know that the results for hosts of different architectures will be different - we expect that. A 32-bit Intel host will also most likely produce significantly different results than 64-bit Intel hosts. By the way, 64-bit targets in QEMU linux-user mode are not supported on 32-bit hosts (although nothing stops the user from starting corresponding instances of QEMU on a 32-bit host, the results are unpredictable). Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps enlighten us on whether QEMU (from the point of view of TCG target) behaves differently on different Intel 64-bit hosts, and to what degree? I currently work remotely, but once I am physically at my office I will have a variety of hosts at the company, and would be happy to do the comparison between them, with respect to what you presented in Report 2. In conclusion, I think a basic description of your test bed is missing in your reports. And, for final reports (which we call "nightly reports") a detailed system description, as Mr. Lukas outlined, is, also in my opinion, necessary. Thanks, Mr. Lukas, for bringing this to our attention! Yours, Aleksandar > Best regards, > Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 8:58 ` Aleksandar Markovic @ 2020-06-30 12:46 ` Lukáš Doktor 2020-06-30 19:14 ` Ahmed Karaman 0 siblings, 1 reply; 27+ messages in thread From: Lukáš Doktor @ 2020-06-30 12:46 UTC (permalink / raw) To: Aleksandar Markovic, Ahmed Karaman Cc: Alex Bennée, QEMU Developers, Richard Henderson [-- Attachment #1.1: Type: text/plain, Size: 1881 bytes --] > However, we know that the results for hosts of different architectures > will be different - we expect that. > > 32-bit Intel host will also most likely produce significantly > different results than 64-bit Intel hosts. By the way, 64-bit targets > in QEMU linux-user mode are not supported on 32-bit hosts (although > nothing stops the user to start corresponding instances of QEMU on a > 32-bit host, but the results are unpredictable. > > Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps > enlighten us on whether QEMU (from the point of view of TCG target) > behaves differently on different Intel 64-bit hosts, and to what > degree? > > I currently work remotely, but once I am be physically at my office I > will have a variety of hosts at the company, and would be happy to do > the comparison between them, wrt what you presented in Report 2. > > In conclusion, I think a basic description of your test bed is missing > in your reports. And, for final reports (which we call "nightly > reports") a detailed system description, as Mr Lukas outlined, is, > also in my opinion, necessary. > > Thanks, Mr. Lukas, for bringing this to our attention! > You're welcome. I'm more on the python side, but as far as I know different cpu models (provided their features are enabled) and especially architectures result in way different code-paths. Imagine an old processor without vector instructions compared to newer ones that can process multiple instructions at once.
As for the reports, I don't think that at this point it would be necessary to focus on anything besides a single cpu model (x86_64 Intel) as there are already many variables. Later someone can follow-up with a cross-arch comparison, if necessary. Regards, Lukáš > Yours, > Aleksandar > > > > >> Best regards, >> Ahmed Karaman > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 12:46 ` Lukáš Doktor @ 2020-06-30 19:14 ` Ahmed Karaman 0 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-06-30 19:14 UTC (permalink / raw) To: Lukáš Doktor, Aleksandar Markovic Cc: Alex Bennée, QEMU Developers, Richard Henderson On Tue, Jun 30, 2020 at 2:46 PM Lukáš Doktor <ldoktor@redhat.com> wrote: > > > However, we know that the results for hosts of different architectures > > will be different - we expect that. > > > > 32-bit Intel host will also most likely produce significantly > > different results than 64-bit Intel hosts. By the way, 64-bit targets > > in QEMU linux-user mode are not supported on 32-bit hosts (although > > nothing stops the user to start corresponding instances of QEMU on a > > 32-bit host, but the results are unpredictable. > > > > Let's focus now on Intel 64-bit hosts only. Richard, can you perhaps > > enlighten us on whether QEMU (from the point of view of TCG target) > > behaves differently on different Intel 64-bit hosts, and to what > > degree? > > > > I currently work remotely, but once I am be physically at my office I > > will have a variety of hosts at the company, and would be happy to do > > the comparison between them, wrt what you presented in Report 2. > > > > In conclusion, I think a basic description of your test bed is missing > > in your reports. And, for final reports (which we call "nightly > > reports") a detailed system description, as Mr Lukas outlined, is, > > also in my opinion, necessary. > > > > Thanks, Mr. Lukas, for bringing this to our attention! > > > > You're welcome. I'm more on the python side, but as far as I know different cpu models (provided their features are enabled) and especially architectures result in way different code-paths. Imagine an old processor without vector instructions compare to newer ones that can process multiple instructions at once. 
> > As for the reports, I don't think that at this point it would be necessary to focus on anything besides a single cpu model (x86_64 Intel) as there are already many variables. Later someone can follow-up with a cross-arch comparison, if necessary. > > Regards, > Lukáš > > > Yours, > > Aleksandar > > > > > > > > > >> Best regards, > >> Ahmed Karaman > > > > Thanks Mr. Lukáš and Aleksandar, OK, now I see how important it is to have this information somewhere on the reports page. In response to Mr. Yonggang, I said I will create a mini-report as a guide for setting up the testbed. I will add a section to this report with the detailed hardware information of the used system. Thanks for bringing this to our attention. Best regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 4:33 ` Lukáš Doktor 2020-06-30 7:18 ` Ahmed Karaman @ 2020-06-30 9:41 ` Aleksandar Markovic 2020-06-30 12:58 ` Lukáš Doktor 1 sibling, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-30 9:41 UTC (permalink / raw) To: Lukáš Doktor Cc: Ahmed Karaman, Alex Bennée, QEMU Developers, Richard Henderson уто, 30. јун 2020. у 06:34 Lukáš Doktor <ldoktor@redhat.com> је написао/ла: > > Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a): > > Hi, > > > > The second report of the TCG Continuous Benchmarking series builds > > upon the QEMU performance metrics calculated in the previous report. > > This report presents a method to dissect the number of instructions > > executed by a QEMU invocation into three main phases: > > - Code Generation > > - JIT Execution > > - Helpers Execution > > It devises a Python script that automates this process. > > > > After that, the report presents an experiment for comparing the > > output of running the script on 17 different targets. Many conclusions > > can be drawn from the results and two of them are discussed in the > > analysis section. > > > > Report link: > > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > > > Previous reports: > > Report 1 - Measuring Basic Performance Metrics of QEMU: > > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > > > Best regards, > > Ahmed Karaman > > Hello Ahmed, > > very nice reading, both reports so far. One thing that could be better displayed is the system you used this to generate. This would come handy especially later when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operation system, qemu version and eventually other deps (like compiler, flags, ...). 
For this report something like: > > architecture: x86_64 > cpu_codename: Kaby Lake > cpu: i7-8650U > ram: 32GB DDR4 > os: Fedora 32 > qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7 > compiler: gcc-10.1.1-1.fc32.x86_64 > flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm --enable-vhost-net --enable-vhost-net --enable-attr --enable-kvm --enable-fdt --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring > > would do. Maybe it'd be even a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion. > I just want to follow up on this observation here; it is not related to Ahmed's report at all. We often receive bug reports of the following style: "I have a Debian 10.2 system and mips emulation misbehaves". As you may imagine, I assign the bug to myself, install a Debian 10.2 system on my experimental box, and mips emulation works like a charm. <banging-head-against-the-wall-emoji> Obviously, I need more info on the submitter's system. After all these years, we don't have (or at least I don't know about it) a script that we could give the submitter, which picks up various aspects of his system. This script, since it is not "for presentation", could be even far more aggressive in picking up system information than what Lukas mentioned above. It could collect the output of various relevant commands, and zip it into a single file. We should have "get_system_info.py" in our scripts directory!
Sincerely, Aleksandar > Regards, > Lukáš > > PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script. > > Regards, > Lukáš > ^ permalink raw reply [flat|nested] 27+ messages in thread
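The "get_system_info.py" idea proposed above (aggressively collecting the output of relevant commands into a single file that a bug reporter can attach) could be sketched as follows. This is a hypothetical script, not something that exists in the QEMU tree, and the command list is only a minimal illustration meant to be extended.

```python
import shutil
import subprocess
from pathlib import Path

# Commands whose output is worth attaching to a bug report; extend freely
# (lscpu, free, distribution release files, qemu --version, ...).
COMMANDS = [
    ["uname", "-a"],
    ["gcc", "--version"],
    ["ldd", "--version"],
]


def dump_system_info(path="system_info.txt"):
    """Collect the output of each command into a single report file,
    noting tools that are not installed instead of failing."""
    lines = []
    for cmd in COMMANDS:
        lines.append("$ " + " ".join(cmd))
        if shutil.which(cmd[0]) is None:
            lines.append("(not installed)")
        else:
            result = subprocess.run(cmd, capture_output=True, text=True)
            lines.append(result.stdout + result.stderr)
    Path(path).write_text("\n".join(lines))
    return path


print(dump_system_info())
```

The resulting file could then be compressed and attached to the bug report, giving the triager the submitter's exact environment in one place.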
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 9:41 ` Aleksandar Markovic @ 2020-06-30 12:58 ` Lukáš Doktor 0 siblings, 0 replies; 27+ messages in thread From: Lukáš Doktor @ 2020-06-30 12:58 UTC (permalink / raw) To: Aleksandar Markovic Cc: Ahmed Karaman, Alex Bennée, QEMU Developers, Richard Henderson [-- Attachment #1.1: Type: text/plain, Size: 4573 bytes --] Dne 30. 06. 20 v 11:41 Aleksandar Markovic napsal(a): > уто, 30. јун 2020. у 06:34 Lukáš Doktor <ldoktor@redhat.com> је написао/ла: >> >> Dne 29. 06. 20 v 12:25 Ahmed Karaman napsal(a): >>> Hi, >>> >>> The second report of the TCG Continuous Benchmarking series builds >>> upon the QEMU performance metrics calculated in the previous report. >>> This report presents a method to dissect the number of instructions >>> executed by a QEMU invocation into three main phases: >>> - Code Generation >>> - JIT Execution >>> - Helpers Execution >>> It devises a Python script that automates this process. >>> >>> After that, the report presents an experiment for comparing the >>> output of running the script on 17 different targets. Many conclusions >>> can be drawn from the results and two of them are discussed in the >>> analysis section. >>> >>> Report link: >>> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ >>> >>> Previous reports: >>> Report 1 - Measuring Basic Performance Metrics of QEMU: >>> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html >>> >>> Best regards, >>> Ahmed Karaman >> >> Hello Ahmed, >> >> very nice reading, both reports so far. One thing that could be better displayed is the system you used this to generate. This would come handy especially later when you move from examples to actual reports. I think it'd make sense to add a section with a clear definition of the machine as well as the operation system, qemu version and eventually other deps (like compiler, flags, ...). 
For this report something like: >> >> architecture: x86_64 >> cpu_codename: Kaby Lake >> cpu: i7-8650U >> ram: 32GB DDR4 >> os: Fedora 32 >> qemu: 470dd165d152ff7ceac61c7b71c2b89220b3aad7 >> compiler: gcc-10.1.1-1.fc32.x86_64 >> flags: --target-list="x86_64-softmmu,ppc64-softmmu,aarch64-softmmu,s390x-softmmu,riscv64-softmmu" --disable-werror --disable-sparse --enable-sdl --enable-kvm --enable-vhost-net --enable-vhost-net --enable-attr --enable-kvm --enable-fdt --enable-vnc --enable-seccomp --block-drv-rw-whitelist="vmdk,null-aio,quorum,null-co,blkverify,file,nbd,raw,blkdebug,host_device,qed,nbd,iscsi,gluster,rbd,qcow2,throttle,copy-on-read" --python=/usr/bin/python3 --enable-linux-io-uring >> >> would do. Maybe it'd be even a good idea to create a script to report this basic set of information and add it after each of the perf scripts so people don't forget to double-check the conditions, but others might disagree so take this only as a suggestion. >> > > I just want to follow up on this observation here, and not related to > Ahmed's report at all. > > We often receive bug reports of the following style: "I have Debian > 10.2 system and mips emulation misbehaves". As you may imagine, I > assign the bug to myself, install Debian 10.2 system on my > experimental box, and mips emulation works like charm. > <banging-head-against-the-wall-emoji> Obviously, I need more info on > the submitter's system. > > After all these years, we don't have (or at least I don't know about > it) a script that we could give the submitter, that picks up various > aspects of his system. This script, since it is not "for presentation" > could be even far more aggressive in picking ups system information > that what Lukas mentioned above. It could collect the output of > various relevant commands, and yip it in a single file. We should have > "get_system_info.py" in our scripts directory! 
> > Sincerely, > Aleksandar > Well this itself is a very complicated matter that could deserve a GSoC project. It's hard to balance the utils required to obtain the knowledge. I'm fond of sosreport, which is heavily used by RH, but the result is quite big. A slightly smaller set can be generated via ansible, which itself gathers a lot of useful information. If we are to speak only about a minimal approach especially tailored to qemu, then I'd suggest taking a look at `avocado.utils`, especially `avocado.utils.cpu`, as Avocado is already used for qemu testing. Anyway, don't consider this a complete list; I just wanted to demonstrate how difficult and complex this subject is. Regards, Lukáš > >> Regards, >> Lukáš >> >> PS: Automated cpu codenames, hosts OSes and such could be tricky, but one can use other libraries or just best-effort-approach with fallback to "unknown" to let people filling it manually or adding their branch to your script. >> >> Regards, >> Lukáš >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman ` (2 preceding siblings ...) 2020-06-30 4:33 ` Lukáš Doktor @ 2020-06-30 5:59 ` 罗勇刚(Yonggang Luo) 2020-06-30 7:29 ` Ahmed Karaman 2020-07-01 14:47 ` Ahmed Karaman 4 siblings, 1 reply; 27+ messages in thread From: 罗勇刚(Yonggang Luo) @ 2020-06-30 5:59 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Aleksandar Markovic, Richard Henderson [-- Attachment #1: Type: text/plain, Size: 1183 bytes --] Wonderful work, May I reproduce the work on my local machine? On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: > Hi, > > The second report of the TCG Continuous Benchmarking series builds > upon the QEMU performance metrics calculated in the previous report. > This report presents a method to dissect the number of instructions > executed by a QEMU invocation into three main phases: > - Code Generation > - JIT Execution > - Helpers Execution > It devises a Python script that automates this process. > > After that, the report presents an experiment for comparing the > output of running the script on 17 different targets. Many conclusions > can be drawn from the results and two of them are discussed in the > analysis section. > > Report link: > > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > Previous reports: > Report 1 - Measuring Basic Performance Metrics of QEMU: > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > Best regards, > Ahmed Karaman > -- 此致 礼 罗勇刚 Yours sincerely, Yonggang Luo [-- Attachment #2: Type: text/html, Size: 1910 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 5:59 ` 罗勇刚(Yonggang Luo) @ 2020-06-30 7:29 ` Ahmed Karaman 2020-06-30 8:21 ` Aleksandar Markovic 0 siblings, 1 reply; 27+ messages in thread From: Ahmed Karaman @ 2020-06-30 7:29 UTC (permalink / raw) To: luoyonggang Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Aleksandar Markovic, Richard Henderson On Tue, Jun 30, 2020 at 7:59 AM 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> wrote: > > Wonderful work, May I reproduce the work on my local machine? > > On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: >> >> Hi, >> >> The second report of the TCG Continuous Benchmarking series builds >> upon the QEMU performance metrics calculated in the previous report. >> This report presents a method to dissect the number of instructions >> executed by a QEMU invocation into three main phases: >> - Code Generation >> - JIT Execution >> - Helpers Execution >> It devises a Python script that automates this process. >> >> After that, the report presents an experiment for comparing the >> output of running the script on 17 different targets. Many conclusions >> can be drawn from the results and two of them are discussed in the >> analysis section. >> >> Report link: >> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ >> >> Previous reports: >> Report 1 - Measuring Basic Performance Metrics of QEMU: >> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html >> >> Best regards, >> Ahmed Karaman > > > > -- > 此致 > 礼 > 罗勇刚 > Yours > sincerely, > Yonggang Luo Thanks Mr. Yonggang. Yes of course, go ahead. Please let me know if you have any further questions. Best Regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 7:29 ` Ahmed Karaman @ 2020-06-30 8:21 ` Aleksandar Markovic 2020-06-30 9:52 ` Aleksandar Markovic 0 siblings, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-30 8:21 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, luoyonggang, Richard Henderson On Tue, Jun 30, 2020 at 9:30 AM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: > > On Tue, Jun 30, 2020 at 7:59 AM 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com> wrote: > > > > Wonderful work! May I reproduce the work on my local machine? > > > > On Mon, Jun 29, 2020 at 6:26 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: > >> > >> Hi, > >> > >> The second report of the TCG Continuous Benchmarking series builds > >> upon the QEMU performance metrics calculated in the previous report. > >> This report presents a method to dissect the number of instructions > >> executed by a QEMU invocation into three main phases: > >> - Code Generation > >> - JIT Execution > >> - Helpers Execution > >> It devises a Python script that automates this process. > >> > >> After that, the report presents an experiment for comparing the > >> output of running the script on 17 different targets. Many conclusions > >> can be drawn from the results and two of them are discussed in the > >> analysis section. > >> > >> Report link: > >> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > >> > >> Previous reports: > >> Report 1 - Measuring Basic Performance Metrics of QEMU: > >> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > >> > >> Best regards, > >> Ahmed Karaman > > > > > > > > -- > > 此致 > > 礼 > > 罗勇刚 > > Yours > > sincerely, > > Yonggang Luo > > Thanks Mr. Yonggang. Yes, of course, go ahead. > Please let me know if you have any further questions. > Yes, Ahmed, you said Mr. 
Yonggang can go ahead - but you didn't say how. :) As far as I know, this is how Ahmed's test bed is set up: 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host. 2) Install QEMU build prerequisite packages. 3) Install perf (this step is not necessary for Report 2, but it is for Report 1). 4) Install valgrind. 5) Install 16 gcc cross-compilers. (which, together with the native compiler, will sum up to the 17 possible QEMU targets) That is all fine if Mr. Yonggang is able to do the above, or if he already has a similar system. I am fairly convinced that the setup for any Debian-based Linux distribution will be almost identical to the one described above. However, let's say Mr. Yonggang's system is a Suse-based distribution (SUSE Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could do steps 2), 3), 4) in a fairly similar manner. But, step 5) will be difficult. I know that support for cross-compilers is relatively poor for Suse-based distributions. I think Mr. Yonggang could run the experiment from the second part of Report 2 only for 5 or 6 targets, rather than 17 as you did. The bottom line for Report 2: I think there should be an "Appendix" note on installing cross-compilers. And some general note on your test bed, as well as some guideline for all people like Mr. Yonggang who wish to reproduce the results on their own systems. Sincerely, Aleksandar > Best Regards, > Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
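Steps 2)-4) above boil down to having a handful of tools on the PATH; a minimal POSIX-shell sketch for checking them before building QEMU (the tool names checked here are an assumption drawn from the steps in this thread, not an official QEMU requirements list):

```shell
#!/bin/sh
# Report which of the test-bed prerequisites from the steps above are
# already present on this machine. The tool list is an assumption based
# on the setup described in this thread.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
    fi
}

for tool in gcc make perf valgrind; do
    check_tool "$tool"
done
```

Anything reported as missing can then be installed with the distribution's package manager before configuring QEMU.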
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 8:21 ` Aleksandar Markovic @ 2020-06-30 9:52 ` Aleksandar Markovic 2020-06-30 19:02 ` Ahmed Karaman 0 siblings, 1 reply; 27+ messages in thread From: Aleksandar Markovic @ 2020-06-30 9:52 UTC (permalink / raw) To: Ahmed Karaman Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, luoyonggang, Richard Henderson > As far as I know, this is how Ahmed's test bed is set up: > > 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host. > 2) Install QEMU build prerequisite packages. > 3) Install perf (this step is not necessary for Report 2, but it is > for Report 1). > 4) Install valgrind. > 5) Install 16 gcc cross-compilers. (which, together with the native > compiler, will sum up to the 17 possible QEMU targets) > The following commands install the cross-compilers needed for creating the table in the second part of Ahmed's Report 2: sudo apt-get install g++ sudo apt-get install g++-aarch64-linux-gnu sudo apt-get install g++-alpha-linux-gnu sudo apt-get install g++-arm-linux-gnueabi sudo apt-get install g++-hppa-linux-gnu sudo apt-get install g++-m68k-linux-gnu sudo apt-get install g++-mips-linux-gnu sudo apt-get install g++-mips64-linux-gnuabi64 sudo apt-get install g++-mips64el-linux-gnuabi64 sudo apt-get install g++-mipsel-linux-gnu sudo apt-get install g++-powerpc-linux-gnu sudo apt-get install g++-powerpc64-linux-gnu sudo apt-get install g++-powerpc64le-linux-gnu sudo apt-get install g++-riscv64-linux-gnu sudo apt-get install g++-s390x-linux-gnu sudo apt-get install g++-sh4-linux-gnu sudo apt-get install g++-sparc64-linux-gnu Ahmed, I think this should be in an Appendix section of Report 2. Sincerely, Aleksandar > That is all fine if Mr. Yonggang is able to do the above, or if he > already has a similar system. 
> > I am fairly convinced that the setup for any Debian-based Linux > distribution will be almost identical to the one described above. > > However, let's say Mr. Yonggang's system is a Suse-based distribution (SUSE > Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could > do steps 2), 3), 4) in a fairly similar manner. But, step 5) will be > difficult. I know that support for cross-compilers is relatively poor > for Suse-based distributions. I think Mr. Yonggang could run the experiment > from the second part of Report 2 only for 5 or 6 targets, rather than > 17 as you did. > > The bottom line for Report 2: > > I think there should be an "Appendix" note on installing > cross-compilers. And some general note on your test bed, as well as > some guideline for all people like Mr. Yonggang who wish to reproduce the > results on their own systems. > > Sincerely, > Aleksandar > > > Best Regards, > > Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
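The seventeen separate apt-get invocations listed above can also be collapsed into one command. A sketch that only prints the consolidated command, so it can be reviewed before being run (package names are copied from the message above; availability may vary by Debian/Ubuntu release):

```shell
#!/bin/sh
# Cross-compiler packages for the 17 QEMU targets discussed in Report 2
# (names taken from the message above; one string, continued with '\').
packages="g++ g++-aarch64-linux-gnu g++-alpha-linux-gnu g++-arm-linux-gnueabi \
g++-hppa-linux-gnu g++-m68k-linux-gnu g++-mips-linux-gnu \
g++-mips64-linux-gnuabi64 g++-mips64el-linux-gnuabi64 g++-mipsel-linux-gnu \
g++-powerpc-linux-gnu g++-powerpc64-linux-gnu g++-powerpc64le-linux-gnu \
g++-riscv64-linux-gnu g++-s390x-linux-gnu g++-sh4-linux-gnu \
g++-sparc64-linux-gnu"

# Print, rather than execute, the single consolidated install command.
echo "sudo apt-get install -y $packages"
```

Piping the printed command to `sh` (or re-typing it) then installs everything in one apt-get transaction instead of seventeen.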
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-30 9:52 ` Aleksandar Markovic @ 2020-06-30 19:02 ` Ahmed Karaman 0 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-06-30 19:02 UTC (permalink / raw) To: Aleksandar Markovic, luoyonggang Cc: Lukáš Doktor, Alex Bennée, QEMU Developers, Richard Henderson On Tue, Jun 30, 2020 at 11:52 AM Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> wrote: > > > As far as I know, this is how Ahmed's test bed is set up: > > > > 1) Fresh installation of Ubuntu 18.04 on an Intel 64-bit host. > > 2) Install QEMU build prerequisite packages. > > 3) Install perf (this step is not necessary for Report 2, but it is > > for Report 1). > > 4) Install valgrind. > > 5) Install 16 gcc cross-compilers. (which, together with the native > > compiler, will sum up to the 17 possible QEMU targets) > > > > The following commands install the cross-compilers needed for creating > the table in the second part of Ahmed's Report 2: > > sudo apt-get install g++ > sudo apt-get install g++-aarch64-linux-gnu > sudo apt-get install g++-alpha-linux-gnu > sudo apt-get install g++-arm-linux-gnueabi > sudo apt-get install g++-hppa-linux-gnu > sudo apt-get install g++-m68k-linux-gnu > sudo apt-get install g++-mips-linux-gnu > sudo apt-get install g++-mips64-linux-gnuabi64 > sudo apt-get install g++-mips64el-linux-gnuabi64 > sudo apt-get install g++-mipsel-linux-gnu > sudo apt-get install g++-powerpc-linux-gnu > sudo apt-get install g++-powerpc64-linux-gnu > sudo apt-get install g++-powerpc64le-linux-gnu > sudo apt-get install g++-riscv64-linux-gnu > sudo apt-get install g++-s390x-linux-gnu > sudo apt-get install g++-sh4-linux-gnu > sudo apt-get install g++-sparc64-linux-gnu > > Ahmed, I think this should be in an Appendix section of Report 2. > > Sincerely, > Aleksandar > > > That is all fine if Mr. Yonggang is able to do the above, or if he > > already has a similar system. 
> > > > I am fairly convinced that the setup for any Debian-based Linux > > distribution will be almost identical to the one described above. > > > > However, let's say Mr. Yonggang's system is a Suse-based distribution (SUSE > > Linux Enterprise, openSUSE Leap, openSUSE Tumbleweed, Gecko). He could > > do steps 2), 3), 4) in a fairly similar manner. But, step 5) will be > > difficult. I know that support for cross-compilers is relatively poor > > for Suse-based distributions. I think Mr. Yonggang could run the experiment > > from the second part of Report 2 only for 5 or 6 targets, rather than > > 17 as you did. > > > > The bottom line for Report 2: > > > > I think there should be an "Appendix" note on installing > > cross-compilers. And some general note on your test bed, as well as > > some guideline for all people like Mr. Yonggang who wish to reproduce the > > results on their own systems. > > > > Sincerely, > > Aleksandar > > > > > Best Regards, > > > Ahmed Karaman Thanks, Mr. Aleksandar, for your input on this one. This is indeed my setup for the testbed used for the two previous reports and all the upcoming ones. To help Mr. Yonggang with his setup, and anybody else trying to set this up, I plan to post a mini-report (Report 0) to lay down the instructions for setting up a system similar to the one used in the reports. Best regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman ` (3 preceding siblings ...) 2020-06-30 5:59 ` 罗勇刚(Yonggang Luo) @ 2020-07-01 14:47 ` Ahmed Karaman 4 siblings, 0 replies; 27+ messages in thread From: Ahmed Karaman @ 2020-07-01 14:47 UTC (permalink / raw) To: Lukáš Doktor, luoyonggang, QEMU Developers Cc: Aleksandar Markovic, Alex Bennée, Richard Henderson On Mon, Jun 29, 2020 at 12:25 PM Ahmed Karaman <ahmedkhaledkaraman@gmail.com> wrote: > > Hi, > > The second report of the TCG Continuous Benchmarking series builds > upon the QEMU performance metrics calculated in the previous report. > This report presents a method to dissect the number of instructions > executed by a QEMU invocation into three main phases: > - Code Generation > - JIT Execution > - Helpers Execution > It devises a Python script that automates this process. > > After that, the report presents an experiment for comparing the > output of running the script on 17 different targets. Many conclusions > can be drawn from the results and two of them are discussed in the > analysis section. > > Report link: > https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/ > > Previous reports: > Report 1 - Measuring Basic Performance Metrics of QEMU: > https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html > > Best regards, > Ahmed Karaman Hi Mr. Lukáš and Yonggang, I've created a separate "setup" page on the reports website. https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/setup/ It contains the hardware and OS information of the used system. It also contains all dependencies and setup instructions required to set up a machine identical to the one used in the reports. If you have any further questions or you're using a different Linux distribution, please let me know. 
Best regards, Ahmed Karaman ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2020-07-04 17:11 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-29 10:25 [REPORT] [GSoC - TCG Continuous Benchmarking] [#2] Dissecting QEMU Into Three Main Parts Ahmed Karaman 2020-06-29 10:40 ` Aleksandar Markovic 2020-06-29 14:26 ` Ahmed Karaman 2020-06-29 16:03 ` Alex Bennée 2020-06-29 18:21 ` Aleksandar Markovic 2020-06-29 21:16 ` Ahmed Karaman 2020-07-01 13:44 ` Ahmed Karaman 2020-07-01 15:42 ` Alex Bennée 2020-07-01 17:47 ` Ahmed Karaman 2020-07-03 22:46 ` Aleksandar Markovic 2020-07-04 8:45 ` Alex Bennée 2020-07-04 9:19 ` Aleksandar Markovic 2020-07-04 9:55 ` Aleksandar Markovic 2020-07-04 17:10 ` Ahmed Karaman 2020-06-30 4:33 ` Lukáš Doktor 2020-06-30 7:18 ` Ahmed Karaman 2020-06-30 8:58 ` Aleksandar Markovic 2020-06-30 12:46 ` Lukáš Doktor 2020-06-30 19:14 ` Ahmed Karaman 2020-06-30 9:41 ` Aleksandar Markovic 2020-06-30 12:58 ` Lukáš Doktor 2020-06-30 5:59 ` 罗勇刚(Yonggang Luo) 2020-06-30 7:29 ` Ahmed Karaman 2020-06-30 8:21 ` Aleksandar Markovic 2020-06-30 9:52 ` Aleksandar Markovic 2020-06-30 19:02 ` Ahmed Karaman 2020-07-01 14:47 ` Ahmed Karaman