[correcting some email addresses]

On Sun, May 3, 2020 at 01:20, Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> wrote:

> Hi, all.
>
> I just want to share with you some bits and pieces of data that I got
> while doing some preliminary experimentation for the GSoC project "TCG
> Continuous Benchmarking", which Ahmed Karaman, a student in the fourth and
> final year at the Electrical Engineering Faculty in Cairo, will carry out.
>
> *User Mode*
>
>    * As expected, for any program dealing with any substantial
> floating-point calculation, the softfloat library will be the heaviest
> consumer of CPU cycles.
>    * We plan to examine the performance behaviour of non-FP programs
> (integer arithmetic), or even non-numeric programs (sorting strings, for
> example).
>
> *System Mode*
>
>    * I profiled the booting of several machines using a tool called
> callgrind (a part of valgrind). The tool offers a plethora of information;
> however, it seems to be a little confused by the usage of coroutines, and
> that makes some of its reports look very illogical, or plain ugly. Still, it
> seems valid data can be extracted from it. Without going into details, here
> is what it says for one machine (bear in mind that results may vary to a
> great extent between machines):
>      ** The boot involved six threads: one for display handling, one
> for emulation, and four more. The last four did almost nothing during
> boot, sitting idle almost the entire time, waiting for something. As for
> the "Total Instruction Fetch Count" (this is the main measure used in
> callgrind), it was distributed in proportion 1:3 between the display thread
> and the emulation thread (the other threads were negligible) (but,
> interestingly enough, for another machine that proportion was 1:20).
>      ** The display thread is dominated by the vga_update_display() function
> (21.5% "self" time and 51.6% "self + callees" time, called almost 40000
> times).
> Other functions worth mentioning are
> cpu_physical_memory_snapshot_get_dirty() and
> memory_region_snapshot_get_dirty(), which are very small functions, but are
> both invoked over 26 000 000 times, and together contribute over 20% of the
> display thread's instruction fetch count.
>      ** Focusing now on the emulation thread, "Total Instruction Fetch
> Counts" were roughly distributed this way:
>            - 15.7% is execution of JIT-ed code from the translation block
> buffer
>            - 39.9% is execution of helpers
>            - 44.4% is the code translation stage, including some coroutine
> activities
>         Top two among helpers:
>            - helper_le_stl_memory()
>            - helper_lookup_tb_ptr() (this one is invoked a whopping
> 36 000 000 times)
>         Single largest instruction consumer of code translation:
>            - liveness_pass_1(), which constitutes 21.5% of the entire
> "emulation thread" consumption, or, put another way, almost half of the
> code translation stage (which sits at 44.4%)
>
> Please take all this with a grain of salt, since these results are only
> preliminary.
>
> I would like to use this opportunity to welcome Ahmed Karaman, a talented
> young man from Egypt who will work on the "TCG Continuous Benchmarking"
> project this summer, into the QEMU development community. Please do help
> him in his first steps as our colleague. Best of luck to Ahmed!
>
> Thanks,
> Aleksandar
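
For what it's worth, the emulation-thread percentages quoted above can be cross-checked with a few lines of Python. This is only a sketch: the dictionary keys and the share_within() helper are names made up for illustration, and the numbers are copied from the message, not re-measured.

```python
# Figures quoted in the message above, as percent of the emulation thread's
# "Total Instruction Fetch Count". All names here are illustrative only.
emulation_breakdown = {
    "jit_code_execution": 15.7,
    "helpers": 39.9,
    "code_translation": 44.4,
}
liveness_pass_1_share = 21.5  # percent of the whole emulation thread


def share_within(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole` (same units for both)."""
    return 100.0 * part / whole


# The three categories should account for the whole emulation thread.
total = sum(emulation_breakdown.values())

# liveness_pass_1() expressed relative to the translation stage alone.
liveness_of_translation = share_within(
    liveness_pass_1_share, emulation_breakdown["code_translation"]
)

print(f"breakdown total: {total:.1f}%")
print(f"liveness_pass_1 within translation: {liveness_of_translation:.1f}%")
```

The three categories add up to the whole emulation thread, and liveness_pass_1() comes out at a bit over 48% of the translation stage, which matches the "almost half" in the message.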