My mistake, I confused r- and p-values (The r-value is useful but not as useful as p.)

The p-value is a statistical measure of validity. It is typically produced either bya t-test or z-test.

https://en.wikipedia.org/wiki/P-value
https://en.wikipedia.org/wiki/Student%27s_t-test
https://en.wikipedia.org/wiki/Z-test

The z-test is more "mathematically pure", but as far as I understand the t-test is less likely to have false positives for a smaller number of data points.

On November 15, 2021 5:40:55 PM PST, Yin Fengwei <fengwei.yin@intel.com> wrote:
>Hi,
>
>On 11/16/2021 3:20 AM, H. Peter Anvin wrote:
>> [Cc: Peter Z.]
>> 
>> This seems totally bizarre... that is an *enormous* change, and if I'm reading it right it seems like this somehow related to the performance monitoring framework itself?
>We can rerun the benchmark with performance monitor totally disabled.
>
>> 
>> The lower-performance init code is all pushed into the pre-boot path, unless for some strange reason not all code gets patched e.g. at module loading time.
>> 
>> A quick peek around made me notice a few minor possibilities, but none of them look particularly sane:
>> 
>> 1. We don't use "asm inline" in asm_volatile_goto, and we probably
>>    should; otherwise gcc might get the idea this is a more heavyweight
>>    operation than it actually is.
>> 2. There is a workaround in asm_volatile_goto for a bug which apparently
>>    was fixed in gcc 4.8.x that might mislead gcc's code generator into
>>    generating worse code.
>> 
>> Did you see any functions for which the code got *bigger*?
>I checked 7 or 8 functions. Most of them were same. None of them got bigger.
>This intel_pmu_store_lbr was smaller. I could do a full comparing.
>
>> 
>> 
>> Not directly related, but it would be really helpful to get an r-value with your statistics.  It would greatly help avoiding chasing ghosts.
>Can you share what's the "r-value" and how to get it? Thanks.
>
>
>Regards
>Yin, Fengwei
>
>> 
>> 
>> 
>> 
>> On 11/15/21 01:53, Yin Fengwei wrote:
>>> Hi,
>>>
>>> On 11/15/2021 3:37 PM, kernel test robot wrote:
>>>>
>>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed a -4.9% regression of will-it-scale.per_process_ops due to commit:
>>>>
>>>>
>>>> commit: 0507503671f9b1c867e889cbec0f43abf904f23c ("x86/asm: Avoid adding register pressure for the init case in static_cpu_has()")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> in testcase: will-it-scale
>>>> on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory
>>>> with following parameters:
>>>>
>>>>     nr_task: 50%
>>>>     mode: process
>>>>     test: mmap2
>>>>     cpufreq_governor: performance
>>>>     ucode: 0xd000280
>>>>
>>>> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
>>>> test-url: https://github.com/antonblanchard/will-it-scale
>>>>
>>>>
>>>> please be noted, since we don't have clue why this commit could cause
>>>> performance drop, so we did further tests on other platforms or with
>>>> different parameters, and got below results.
>>> Add Kees in case he is interest to this behavior.
>>>
>>> Observation on this regression:
>>>     After the patch, the better code is generated. De-assembled the function intel_pmu_store_lbr with
>>>     vmlinux built from commit f87bc8dc7a7c and 0507503671f9 and got:
>>>        With commit f87bc8dc7a7c (parent commit):
>>>       https://zerobin.net/?22efb1114b097030#ryD/8LpasEIg8WrS6O/M+sHYJp7c/LoAXPfeB7BUqu4=
>>>
>>>     With commit 0507503671f9:
>>>       https://zerobin.net/?e57652572b3ec83c#CvobUggve54SIHmlZ6jkzs3s4k8iQN4ophFoOs7LHMI=
>>>
>>>     The assembly code with commit 0507503671f9 is smaller than with parent commit. The
>>>     register r12 in parent commit is strange IIUC.
>>>
>>>
>>>     BTW, the reason that we picked up function intel_pmu_store_lbr is:
>>>       It's the first function in System.map which has different size w/o the patch
>>>
>>>
>>> Suppose the performance data with commit 0507503671f9 should be better. But the
>>> test result showed it had improvement only on one test box. On other three test box,
>>> it introduced regressions. Looks like strange.
>>>
>>>
>>> Regards
>>> Yin, Fengwei
>>>
>>>>
>>>> except the 1% improvement from the first test on a 4 sockets Haswell-EX,
>>>> others all show similar regression:
>>>>
>>>> +------------------+----------------------------------------------------------------------------------+
>>>> | testcase: change | will-it-scale: will-it-scale.per_process_ops +1.0% improvement                   |
>>>> | test machine     | 144 threads 4 sockets Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory |
>>>> | test parameters  | cpufreq_governor=performance                                                     |
>>>> |                  | mode=process                                                                     |
>>>> |                  | nr_task=50%                                                                      |
>>>> |                  | test=mmap2                                                                       |
>>>> |                  | ucode=0x16                                                                       |
>>>> +------------------+----------------------------------------------------------------------------------+
>>>> | testcase: change | will-it-scale: will-it-scale.per_process_ops -3.7% regression                    |
>>>> | test machine     | 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory |
>>>> | test parameters  | cpufreq_governor=performance                                                     |
>>>> |                  | mode=process                                                                     |
>>>> |                  | nr_task=50%                                                                      |
>>>> |                  | test=mmap2                                                                       |
>>>> |                  | ucode=0x700001e                                                                  |
>>>> +------------------+----------------------------------------------------------------------------------+
>>>> | testcase: change | will-it-scale: will-it-scale.per_process_ops -5.1% regression                    |
>>>> | test machine     | 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz with 256G memory  |
>>>> | test parameters  | cpufreq_governor=performance                                                     |
>>>> |                  | mode=process                                                                     |
>>>> |                  | nr_task=16                                                                       |
>>>> |                  | test=mmap2                                                                       |
>>>> |                  | ucode=0xd000280                                                                  |
>>>> +------------------+----------------------------------------------------------------------------------+
>>>> | testcase: change | will-it-scale: will-it-scale.per_process_ops -5.9% regression                    |
>>>> | test machine     | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory  |
>>>> | test parameters  | cpufreq_governor=performance                                                     |
>>>> |                  | mode=process                                                                     |
>>>> |                  | nr_task=16                                                                       |
>>>> |                  | test=mmap1                                                                       |
>>>> |                  | ucode=0x5003006                                                                  |
>>>> +------------------+----------------------------------------------------------------------------------+
>>>> | testcase: change | will-it-scale: will-it-scale.per_process_ops -3.5% regression                    |
>>>> | test machine     | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory  |
>>>> | test parameters  | cpufreq_governor=performance                                                     |
>>>> |                  | mode=process                                                                     |
>>>> |                  | nr_task=50%                                                                      |
>>>> |                  | test=mmap2                                                                       |
>>>> |                  | ucode=0x5003006                                                                  |
>>>> +------------------+----------------------------------------------------------------------------------+
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>>
>>>>
>>>> Details are as below:
>>>> -------------------------------------------------------------------------------------------------->
>>>>
>>>>
>>>> To reproduce:
>>>>
>>>>          git clone https://github.com/intel/lkp-tests.git
>>>>          cd lkp-tests
>>>>          sudo bin/lkp install job.yaml           # job file is attached in this email
>>>>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>>>>          sudo bin/lkp run generated-yaml-file
>>>>
>>>>          # if come across any failure that blocks the test,
>>>>          # please remove ~/.lkp and /lkp dir to run from a clean state.
>>>>
>>>> =========================================================================================
>>>> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
>>>>    gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp2/mmap2/will-it-scale/0xd000280
>>>>
>>>> commit:
>>>>    f87bc8dc7a ("x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix")
>>>>    0507503671 ("x86/asm: Avoid adding register pressure for the init case in static_cpu_has()")
>>>>
>>>> f87bc8dc7a7c438c 0507503671f9b1c867e889cbec0
>>>> ---------------- ---------------------------
>>>>           %stddev     %change         %stddev
>>>>               \          |                \
>>>>    41898923            -4.9%   39829159        will-it-scale.64.processes
>>>>      654670            -4.9%     622330        will-it-scale.per_process_ops
>>>>    41898923            -4.9%   39829159        will-it-scale.workload
>>>>        6918 ± 54%    +116.5%      14975 ± 14%  softirqs.CPU20.SCHED
>>>>      240.00 ± 18%     +57.8%     378.67 ± 20%  slabinfo.biovec-64.active_objs
>>>>      240.00 ± 18%     +57.8%     378.67 ± 20%  slabinfo.biovec-64.num_objs
>>>>        0.01 ± 28%     -36.1%       0.01 ± 14%  perf-sched.sch_delay.max.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
>>>>        6114 ± 24%     -46.1%       3296 ± 46%  perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
>>>>        6114 ± 24%     -46.1%       3296 ± 46%  perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
>>>>        1409 ± 30%     -30.7%     977.00 ± 23%  interrupts.CPU1.CAL:Function_call_interrupts
>>>>        3001 ± 58%     -70.3%     892.50 ± 69%  interrupts.CPU1.RES:Rescheduling_interrupts
>>>>      669.83 ±172%    +696.9%       5338 ±102%  interrupts.CPU108.NMI:Non-maskable_interrupts

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.