[correcting some email addresses]

On Sun, May 3, 2020 at 01:20, Aleksandar Markovic <
aleksandar.qemu.devel@gmail.com> wrote:

> Hi, all.
>
> I just want to share some bits and pieces of data that I got while doing
> some preliminary experiments for the GSoC project "TCG Continuous
> Benchmarking", which Ahmed Karaman, a fourth-year (final-year) student at
> the Electrical Engineering Faculty in Cairo, will carry out.
>
> *User Mode*
>
>    * As expected, for any program doing substantial floating-point
> calculation, the softfloat library will be the heaviest consumer of CPU
> cycles.
>    * We plan to examine the performance behaviour of non-FP programs
> (integer arithmetic), or even non-numeric programs (sorting strings, for
> example).
>
> *System Mode*
>
>    * I profiled the boot of several machines using a tool called
> callgrind (a part of valgrind). The tool offers a plethora of information;
> however, it seems to be somewhat confused by the use of coroutines, which
> makes some of its reports look very illogical, or just plain ugly. Still, it
> seems valid data can be extracted from it. Without going into details, here
> is what it says for one machine (bear in mind that results may vary to a
> great extent between machines):
>      ** The boot involved six threads: one for display handling, one
> for emulation, and four more. The last four did almost nothing during
> boot, sitting idle almost the entire time, waiting for something. As for
> "Total Instruction Fetch Count" (the main measure used in callgrind), the
> counts were split in a 1:3 proportion between the display thread and the
> emulation thread (the remaining threads were negligible). Interestingly
> enough, for another machine that proportion was 1:20.
>      ** The display thread is dominated by the vga_update_display()
> function (21.5% "self" time and 51.6% "self + callees" time, called almost
> 40 000 times). Other functions worth mentioning are
> cpu_physical_memory_snapshot_get_dirty() and
> memory_region_snapshot_get_dirty(), which are very small functions, but are
> both invoked over 26 000 000 times and together contribute over 20% of the
> display thread's instruction fetch count.
>      ** Focusing now on the emulation thread, the "Total Instruction Fetch
> Count" was roughly distributed this way:
>            - 15.7% is execution of JIT-ed code from the translation block
> buffer
>            - 39.9% is execution of helpers
>            - 44.4% is code translation stage, including some coroutine
> activities
>         Top two among helpers:
>           - helper_le_stl_memory()
>           - helper_lookup_tb_ptr() (this one is invoked a whopping
> 36 000 000 times)
>         Single largest instruction consumer in code translation:
>           - liveness_pass_1(), which accounts for 21.5% of the entire
> "emulation thread" consumption or, put another way, almost half of the
> code translation stage (which sits at 44.4%)
>
> Please take all this with a grain of salt, since these results are only
> preliminary.
>
> I would like to use this opportunity to welcome Ahmed Karaman, a talented
> young man from Egypt, into the QEMU development community. He will work on
> the "TCG Continuous Benchmarking" project this summer. Please do help him
> in his first steps as our colleague. Best of luck to Ahmed!
>
> Thanks,
> Aleksandar
>
>
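
As a quick sanity check on the percentages quoted above, here is a minimal
Python sketch that redoes the arithmetic. Only the numbers come from the
message; the variable and function names are made up for illustration, and
this is not part of any QEMU or callgrind tooling.

```python
# Emulation-thread shares of "Total Instruction Fetch Count" (percent),
# copied from the quoted message.
emulation_shares = {
    "jit_code": 15.7,      # execution of JIT-ed code from the TB buffer
    "helpers": 39.9,       # execution of helpers
    "translation": 44.4,   # code translation stage (incl. some coroutines)
}

# liveness_pass_1() share of the whole emulation thread (percent).
liveness_pass_1_share = 21.5


def share_from_ratio(a, b):
    """Fraction of the total taken by the first side of an a:b split."""
    return a / (a + b)


# The three emulation-thread buckets should add up to the whole thread.
total = sum(emulation_shares.values())

# "Almost half of the code translation stage".
liveness_in_translation = liveness_pass_1_share / emulation_shares["translation"]

# Display thread share for the 1:3 and 1:20 display:emulation proportions.
display_share_1_3 = share_from_ratio(1, 3)
display_share_1_20 = share_from_ratio(1, 20)

print(f"emulation buckets total: {total:.1f}%")
print(f"liveness_pass_1 within translation: {liveness_in_translation:.1%}")
print(f"display thread share at 1:3:  {display_share_1_3:.1%}")
print(f"display thread share at 1:20: {display_share_1_20:.1%}")
```

So the three buckets cover the whole emulation thread, liveness_pass_1() is
a bit under half of the translation stage, and the display thread's share
of the two main threads drops from a quarter to under 5% between the two
machines.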