linux-kernel.vger.kernel.org archive mirror
* [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
From: Paul Menzel @ 2021-04-02  8:33 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86
  Cc: LKML, Song Liu, linux-raid, it+linux-x86

Dear Linux folks,


On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4
speed shown at the beginning of the boot.

                        5.4.97        5.10.24
----------------------------------------------
raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
raid6: avx2x1 xor()   10855 MB/s    11143 MB/s

We are able to reproduce this with different models: Supermicro 
AS-2023US-TR4/H11DSU-iN and Dell PowerEdge R7425 (with different 
microcode versions).

Can you reproduce this on your systems?

Bisecting is going to be hard, as the systems are in production and also
take a while to boot. (Maybe kexec would help here.)


Kind regards,

Paul


PS: Some more information:

```
[    0.000000] Linux version 5.4.97.mx64.368 (root@theinternet.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #1 SMP Wed Feb 10 18:22:50 CET 2021
[…]
[    0.000000] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[    0.630603] raid6: avx2x4   gen() 18429 MB/s
[    0.651607] raid6: avx2x4   xor()  6644 MB/s
[    0.672605] raid6: avx2x2   gen() 17894 MB/s
[    0.693603] raid6: avx2x2   xor() 11642 MB/s
[    0.714605] raid6: avx2x1   gen() 13992 MB/s
[    0.735604] raid6: avx2x1   xor() 10855 MB/s
[    0.756607] raid6: sse2x4   gen() 12246 MB/s
[    0.777605] raid6: sse2x4   xor()  5724 MB/s
[    0.798605] raid6: sse2x2   gen() 10945 MB/s
[    0.819603] raid6: sse2x2   xor()  8097 MB/s
[    0.840606] raid6: sse2x1   gen()  5941 MB/s
[    0.861606] raid6: sse2x1   xor()  5894 MB/s
[    0.866565] raid6: using algorithm avx2x4 gen() 18429 MB/s
[    0.871567] raid6: .... xor() 6644 MB/s, rmw enabled
[    0.877566] raid6: using avx2x2 recovery algorithm
[…]
```


```
[    0.000000] Linux version 5.10.24.mx64.375 (root@theinternet.molgen.mpg.de) (gcc (GCC) 7.5.0, GNU ld (GNU Binutils) 2.32) #1 SMP Fri Mar 19 12:29:21 CET 2021
[…]
[    0.000000] DMI: Supermicro AS -2023US-TR4/H11DSU-iN, BIOS 1.1 02/07/2018
[…]
[    0.655382] raid6: avx2x4   gen()  6155 MB/s
[    0.676382] raid6: avx2x4   xor()  4274 MB/s
[    0.697380] raid6: avx2x2   gen() 18744 MB/s
[    0.718380] raid6: avx2x2   xor() 11950 MB/s
[    0.739380] raid6: avx2x1   gen() 17112 MB/s
[    0.760380] raid6: avx2x1   xor() 11143 MB/s
[    0.781381] raid6: sse2x4   gen() 11062 MB/s
[    0.802380] raid6: sse2x4   xor()  5180 MB/s
[    0.823380] raid6: sse2x2   gen() 12467 MB/s
[    0.844380] raid6: sse2x2   xor()  7672 MB/s
[    0.865381] raid6: sse2x1   gen()  9733 MB/s
[    0.886380] raid6: sse2x1   xor()  5717 MB/s
[    0.890674] raid6: using algorithm avx2x2 gen() 18744 MB/s
[    0.895673] raid6: .... xor() 11950 MB/s, rmw enabled
[    0.901673] raid6: using avx2x2 recovery algorithm
```
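
To put the two boot logs side by side, the per-algorithm lines can be parsed and the relative change computed. The following is only a minimal sketch, assuming Python 3; the names `parse_raid6`, `relative_change`, `OLD_LOG` and `NEW_LOG` are made up for illustration, and the sample lines are copied from the two logs above. It compares single boot-time readings and says nothing about how repeatable they are.

```python
import re

# Matches per-algorithm lines such as
# "[    0.630603] raid6: avx2x4   gen() 18429 MB/s".
# The "using algorithm ..." and ".... xor()" summary lines do not match.
RAID6_LINE = re.compile(r"raid6:\s+(\w+)\s+(gen|xor)\(\)\s+(\d+)\s+MB/s")

# Excerpts copied from the 5.4.97 and 5.10.24 logs above.
OLD_LOG = """\
[    0.630603] raid6: avx2x4   gen() 18429 MB/s
[    0.651607] raid6: avx2x4   xor()  6644 MB/s
[    0.714605] raid6: avx2x1   gen() 13992 MB/s
[    0.866565] raid6: using algorithm avx2x4 gen() 18429 MB/s
"""
NEW_LOG = """\
[    0.655382] raid6: avx2x4   gen()  6155 MB/s
[    0.676382] raid6: avx2x4   xor()  4274 MB/s
[    0.739380] raid6: avx2x1   gen() 17112 MB/s
[    0.890674] raid6: using algorithm avx2x2 gen() 18744 MB/s
"""


def parse_raid6(log_text):
    """Return {(algorithm, operation): MB/s} for the benchmark lines."""
    return {
        (m.group(1), m.group(2)): int(m.group(3))
        for m in RAID6_LINE.finditer(log_text)
    }


def relative_change(old, new):
    """Return {(algorithm, operation): percent change from old to new}."""
    return {
        key: 100.0 * (new[key] - old[key]) / old[key]
        for key in old
        if key in new
    }


changes = relative_change(parse_raid6(OLD_LOG), parse_raid6(NEW_LOG))
for (algo, op), pct in sorted(changes.items()):
    print(f"{algo} {op}(): {pct:+.1f} %")
```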

```
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          128
On-line CPU(s) list:             0-127
Thread(s) per core:              2
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    8
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           1
Model name:                      AMD EPYC 7601 32-Core Processor
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         3100.798
CPU max MHz:                     2200.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4399.53
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       4 MiB
L2 cache:                        32 MiB
L3 cache:                        128 MiB
NUMA node0 CPU(s):               0-7,64-71
NUMA node1 CPU(s):               8-15,72-79
NUMA node2 CPU(s):               16-23,80-87
NUMA node3 CPU(s):               24-31,88-95
NUMA node4 CPU(s):               32-39,96-103
NUMA node5 CPU(s):               40-47,104-111
NUMA node6 CPU(s):               48-55,112-119
NUMA node7 CPU(s):               56-63,120-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
```


* Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
From: Borislav Petkov @ 2021-04-02 14:05 UTC
  To: Paul Menzel
  Cc: Thomas Gleixner, Ingo Molnar, x86, LKML, Song Liu, linux-raid,
	it+linux-x86

On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
> shown at the beginning of the boot.
> 
>                        5.4.97        5.10.24
> ----------------------------------------------
> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s

Looks like those two might help:

49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
From: Paul Menzel @ 2021-04-06 10:58 UTC
  To: Borislav Petkov
  Cc: Thomas Gleixner, Ingo Molnar, x86, LKML, Song Liu, linux-raid,
	it+linux-x86, Krzysztof Olędzki, Andy Lutomirski,
	Krzysztof Mazur

Dear Borislav,


On 02.04.21 at 16:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:

>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                         5.4.97        5.10.24
>> ----------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s
> 
> Looks like those two might help:
> 
> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()
> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state

I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 
5055 with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression 
is still present for `avx2x4 xor()`:

                         5.4.95       5.10.24
----------------------------------------------
raid6: avx2x4 gen()    23964 MB/s   24540 MB/s
raid6: avx2x4 xor()    13101 MB/s    8354 MB/s
raid6: avx2x2 gen()    22746 MB/s   26972 MB/s
raid6: avx2x2 xor()    14917 MB/s   16463 MB/s
raid6: avx2x1 gen()    17519 MB/s   24394 MB/s
raid6: avx2x1 xor()    14091 MB/s   15330 MB/s
raid6: sse2x4 gen()    16867 MB/s   16136 MB/s
raid6: sse2x4 xor()     9667 MB/s    8176 MB/s
raid6: sse2x2 gen()    14996 MB/s   18234 MB/s
raid6: sse2x2 xor()    10765 MB/s   10455 MB/s
raid6: sse2x1 gen()     7667 MB/s   13769 MB/s
raid6: sse2x1 xor()     7818 MB/s    7741 MB/s

What system are you using, and what results do you get with 5.4 and 
5.12-rc6?


Kind regards,

Paul


* Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
From: Borislav Petkov @ 2021-04-06 12:41 UTC
  To: Paul Menzel
  Cc: Thomas Gleixner, Ingo Molnar, x86, LKML, Song Liu, linux-raid,
	it+linux-x86, Krzysztof Olędzki, Andy Lutomirski,
	Krzysztof Mazur

On Tue, Apr 06, 2021 at 12:58:15PM +0200, Paul Menzel wrote:
> I booted Linux 5.12-rc6, containing these commits, on a Dell OptiPlex 5055
> with AMD Ryzen 5 PRO 1500 Quad-Core Processor, and the regression is still
> present for `avx2x4 xor()`:

So I don't think that's a regression - this looks more like "you should
not look at those numbers and compare them". Below are some results from
boot logs on one of my test boxes, first column is the kernel version.

IOW, you can use those numbers as a random number generator.

Now, I'm not saying that nothing has changed since the
5.4-5.6-ish timeframe, but this needs to be checked with a proper
benchmark before looking at what could be causing it. It could be the
MXCSR clearing, but we do need that, so there won't be a
whole lot we can do.

But someone would have to sit down and do proper measurements first. And
bisect. Then we'll see...

HTH.

01-0+   :raid6: avx2x4   xor() 10311 MB/s
01-rc3+ :raid6: avx2x4   xor()  5497 MB/s
01-rc6+ :raid6: avx2x4   xor()  5369 MB/s
02-rc3+ :raid6: avx2x4   xor()  9812 MB/s
02-rc5+ :raid6: avx2x4   xor() 11479 MB/s
03-rc1+ :raid6: avx2x4   xor()  6434 MB/s
03-rc2+ :raid6: avx2x4   xor()  5487 MB/s
03-rc3+ :raid6: avx2x4   xor()  4840 MB/s
03-rc5+ :raid6: avx2x4   xor() 11104 MB/s
04-rc1+ :raid6: avx2x4   xor()  6443 MB/s
04-rc2+ :raid6: avx2x4   xor()  4959 MB/s
04-rc3+ :raid6: avx2x4   xor()  4918 MB/s
04-rc7+ :raid6: avx2x4   xor()  5219 MB/s
05-rc1+ :raid6: avx2x4   xor()  5362 MB/s
05-rc2+ :raid6: avx2x4   xor()  5356 MB/s
05-rc7+ :raid6: avx2x4   xor()  5821 MB/s
06-rc1+ :raid6: avx2x4   xor()  3358 MB/s
06-rc2+ :raid6: avx2x4   xor()  3591 MB/s
06-rc4+ :raid6: avx2x4   xor()  3947 MB/s
06-rc6+ :raid6: avx2x4   xor()  4100 MB/s
06-rc7+ :raid6: avx2x4   xor()  4038 MB/s
07-0+   :raid6: avx2x4   xor()  3410 MB/s
07-rc1+ :raid6: avx2x4   xor()  4836 MB/s
07-rc2+ :raid6: avx2x4   xor()  3194 MB/s
07-rc5  :raid6: avx2x4   xor()  4220 MB/s
07-rc6+ :raid6: avx2x4   xor()  3949 MB/s
07-rc7+ :raid6: avx2x4   xor()  3238 MB/s
09-0+   :raid6: avx2x4   xor()  3259 MB/s
09-rc1+ :raid6: avx2x4   xor()  2963 MB/s
09-rc4+ :raid6: avx2x4   xor()  2593 MB/s
09-rc5+ :raid6: avx2x4   xor()  2555 MB/s
09-rc7+ :raid6: avx2x4   xor()  3333 MB/s
09-rc8+ :raid6: avx2x4   xor()  2979 MB/s
10-rc4+ :raid6: avx2x4   xor()  4482 MB/s
10-rc5+ :raid6: avx2x4   xor()  6170 MB/s
10-rc7+ :raid6: avx2x4   xor()  3557 MB/s
11-rc1+ :raid6: avx2x4   xor()  1461 MB/s
11-rc2+ :raid6: avx2x4   xor()  4095 MB/s
11-rc7+ :raid6: avx2x4   xor()  6088 MB/s
12-rc1+ :raid6: avx2x4   xor()  4147 MB/s
12-rc2+ :raid6: avx2x4   xor()  4361 MB/s
12-rc3+ :raid6: avx2x4   xor()  4070 MB/s
12-rc4+ :raid6: avx2x4   xor()  6078 MB/s

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [regression 5.4.97 → 5.10.24]: raid6 avx2x4 speed drops from 18429 MB/s to 6155 MB/s
From: Thomas Backlund @ 2021-04-03 12:49 UTC
  To: Borislav Petkov, Paul Menzel
  Cc: Thomas Gleixner, Ingo Molnar, x86, LKML, Song Liu, linux-raid,
	it+linux-x86

On 2021-04-02 at 17:05, Borislav Petkov wrote:
> On Fri, Apr 02, 2021 at 10:33:51AM +0200, Paul Menzel wrote:
>> Dear Linux folks,
>>
>>
>> On a two-socket AMD EPYC 7601, we noticed a decrease in raid6 avx2x4 speed
>> shown at the beginning of the boot.
>>
>>                         5.4.97        5.10.24
>> ----------------------------------------------
>> raid6: avx2x4 gen()   18429 MB/s     6155 MB/s
>> raid6: avx2x4 xor()    6644 MB/s     4274 MB/s
>> raid6: avx2x2 gen()   17894 MB/s    18744 MB/s
>> raid6: avx2x2 xor()   11642 MB/s    11950 MB/s
>> raid6: avx2x1 gen()   13992 MB/s    17112 MB/s
>> raid6: avx2x1 xor()   10855 MB/s    11143 MB/s
>
> Looks like those two might help:
>

That would mean only this is missing:
> 49200d17d27d x86/fpu/64: Don't FNINIT in kernel_fpu_begin()


as this one landed in 5.10.11:
> e45122893a98 x86/fpu: Add kernel_fpu_begin_mask() to selectively initialize state
>

--
Thomas


