All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yin Fengwei <fengwei.yin@intel.com>
To: lkp@lists.01.org
Subject: Re: [x86/asm] 0507503671: will-it-scale.per_process_ops -4.9% regression
Date: Mon, 15 Nov 2021 17:53:32 +0800	[thread overview]
Message-ID: <0cfae054-3150-35d4-f95f-68b5ec1897e7@intel.com> (raw)
In-Reply-To: <20211115073738.GA23967@xsang-OptiPlex-9020>

[-- Attachment #1: Type: text/plain, Size: 9369 bytes --]

Hi,

On 11/15/2021 3:37 PM, kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed a -4.9% regression of will-it-scale.per_process_ops due to commit:
> 
> 
> commit: 0507503671f9b1c867e889cbec0f43abf904f23c ("x86/asm: Avoid adding register pressure for the init case in static_cpu_has()")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: will-it-scale
> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory
> with following parameters:
> 
> 	nr_task: 50%
> 	mode: process
> 	test: mmap2
> 	cpufreq_governor: performance
> 	ucode: 0xd000280
> 
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
> 
> 
> please be noted, since we don't have clue why this commit could cause
> performance drop, so we did further tests on other platforms or with
> different parameters, and got below results.
Add Kees in case he is interest to this behavior.

Observation on this regression:
   After the patch, the better code is generated. De-assembled the function intel_pmu_store_lbr with
   vmlinux built from commit f87bc8dc7a7c and 0507503671f9 and got:
  
   With commit f87bc8dc7a7c (parent commit):
     https://zerobin.net/?22efb1114b097030#ryD/8LpasEIg8WrS6O/M+sHYJp7c/LoAXPfeB7BUqu4=

   With commit 0507503671f9:
     https://zerobin.net/?e57652572b3ec83c#CvobUggve54SIHmlZ6jkzs3s4k8iQN4ophFoOs7LHMI=

   The assembly code with commit 0507503671f9 is smaller than with parent commit. The
   register r12 in parent commit is strange IIUC.


   BTW, the reason that we picked up function intel_pmu_store_lbr is:
     It's the first function in System.map which has different size w/o the patch


Suppose the performance data with commit 0507503671f9 should be better. But the
test result showed it had improvement only on one test box. On other three test box,
it introduced regressions. Looks like strange.


Regards
Yin, Fengwei

> 
> except the 1% improvement from the first test on a 4 sockets Haswell-EX,
> others all show similar regression:
> 
> +------------------+----------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops +1.0% improvement                   |
> | test machine     | 144 threads 4 sockets Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory |
> | test parameters  | cpufreq_governor=performance                                                     |
> |                  | mode=process                                                                     |
> |                  | nr_task=50%                                                                      |
> |                  | test=mmap2                                                                       |
> |                  | ucode=0x16                                                                       |
> +------------------+----------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -3.7% regression                    |
> | test machine     | 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory |
> | test parameters  | cpufreq_governor=performance                                                     |
> |                  | mode=process                                                                     |
> |                  | nr_task=50%                                                                      |
> |                  | test=mmap2                                                                       |
> |                  | ucode=0x700001e                                                                  |
> +------------------+----------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -5.1% regression                    |
> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory  |
> | test parameters  | cpufreq_governor=performance                                                     |
> |                  | mode=process                                                                     |
> |                  | nr_task=16                                                                       |
> |                  | test=mmap2                                                                       |
> |                  | ucode=0xd000280                                                                  |
> +------------------+----------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -5.9% regression                    |
> | test machine     | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory  |
> | test parameters  | cpufreq_governor=performance                                                     |
> |                  | mode=process                                                                     |
> |                  | nr_task=16                                                                       |
> |                  | test=mmap1                                                                       |
> |                  | ucode=0x5003006                                                                  |
> +------------------+----------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -3.5% regression                    |
> | test machine     | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory  |
> | test parameters  | cpufreq_governor=performance                                                     |
> |                  | mode=process                                                                     |
> |                  | nr_task=50%                                                                      |
> |                  | test=mmap2                                                                       |
> |                  | ucode=0x5003006                                                                  |
> +------------------+----------------------------------------------------------------------------------+
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang@intel.com>
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> To reproduce:
> 
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
> 
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
>   gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp2/mmap2/will-it-scale/0xd000280
> 
> commit: 
>   f87bc8dc7a ("x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix")
>   0507503671 ("x86/asm: Avoid adding register pressure for the init case in static_cpu_has()")
> 
> f87bc8dc7a7c438c 0507503671f9b1c867e889cbec0 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>   41898923            -4.9%   39829159        will-it-scale.64.processes
>     654670            -4.9%     622330        will-it-scale.per_process_ops
>   41898923            -4.9%   39829159        will-it-scale.workload
>       6918 ± 54%    +116.5%      14975 ± 14%  softirqs.CPU20.SCHED
>     240.00 ± 18%     +57.8%     378.67 ± 20%  slabinfo.biovec-64.active_objs
>     240.00 ± 18%     +57.8%     378.67 ± 20%  slabinfo.biovec-64.num_objs
>       0.01 ± 28%     -36.1%       0.01 ± 14%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>       6114 ± 24%     -46.1%       3296 ± 46%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
>       6114 ± 24%     -46.1%       3296 ± 46%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
>       1409 ± 30%     -30.7%     977.00 ± 23%  interrupts.CPU1.CAL:Function_call_interrupts
>       3001 ± 58%     -70.3%     892.50 ± 69%  interrupts.CPU1.RES:Rescheduling_interrupts
>     669.83 ±172%    +696.9%       5338 ±102%  interrupts.CPU108.NMI:Non-maskable_interrupts

  reply	other threads:[~2021-11-15  9:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-15  7:37 [x86/asm] 0507503671: will-it-scale.per_process_ops -4.9% regression kernel test robot
2021-11-15  9:53 ` Yin Fengwei [this message]
2021-11-15 19:20   ` H. Peter Anvin
2021-11-15 20:39     ` Peter Zijlstra
2021-11-15 21:15       ` H. Peter Anvin
2021-11-15 21:46         ` Peter Zijlstra
2021-11-16  1:41           ` Feng Tang
2021-11-16  1:57         ` Yin Fengwei
2021-11-16  1:40     ` Yin Fengwei
2021-11-16  9:33       ` Yin Fengwei
2021-11-17  2:44         ` Oliver Sang
2021-11-17  1:00       ` H. Peter Anvin
2021-11-17  1:27         ` Yin Fengwei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0cfae054-3150-35d4-f95f-68b5ec1897e7@intel.com \
    --to=fengwei.yin@intel.com \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.