From: Heiko Sieger <1856335@bugs.launchpad.net>
To: qemu-devel@nongnu.org
Subject: [Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
Date: Sun, 10 May 2020 20:01:51 -0000 [thread overview]
Message-ID: <158914091142.4693.6888270013870332292.malone@soybean.canonical.com> (raw)
In-Reply-To: 157625616239.22064.10423897892496347105.malonedeb@gac.canonical.com
I upgraded to QEMU emulator version 5.0.50 and am now using the pc-q35-5.1 machine type (the latest), with the following libvirt configuration:
<memory unit="KiB">50331648</memory>
<currentMemory unit="KiB">50331648</currentMemory>
<memoryBacking>
<hugepages/>
</memoryBacking>
<vcpu placement="static">24</vcpu>
<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="12"/>
<vcpupin vcpu="2" cpuset="1"/>
<vcpupin vcpu="3" cpuset="13"/>
<vcpupin vcpu="4" cpuset="2"/>
<vcpupin vcpu="5" cpuset="14"/>
<vcpupin vcpu="6" cpuset="3"/>
<vcpupin vcpu="7" cpuset="15"/>
<vcpupin vcpu="8" cpuset="4"/>
<vcpupin vcpu="9" cpuset="16"/>
<vcpupin vcpu="10" cpuset="5"/>
<vcpupin vcpu="11" cpuset="17"/>
<vcpupin vcpu="12" cpuset="6"/>
<vcpupin vcpu="13" cpuset="18"/>
<vcpupin vcpu="14" cpuset="7"/>
<vcpupin vcpu="15" cpuset="19"/>
<vcpupin vcpu="16" cpuset="8"/>
<vcpupin vcpu="17" cpuset="20"/>
<vcpupin vcpu="18" cpuset="9"/>
<vcpupin vcpu="19" cpuset="21"/>
<vcpupin vcpu="20" cpuset="10"/>
<vcpupin vcpu="21" cpuset="22"/>
<vcpupin vcpu="22" cpuset="11"/>
<vcpupin vcpu="23" cpuset="23"/>
</cputune>
<os>
<type arch="x86_64" machine="pc-q35-5.1">hvm</type>
<loader readonly="yes" type="pflash">/usr/share/OVMF/x64/OVMF_CODE.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
<boot dev="hd"/>
<bootmenu enable="no"/>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state="on"/>
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vpindex state="on"/>
<synic state="on"/>
<stimer state="on"/>
<vendor_id state="on" value="AuthenticAMD"/>
<frequencies state="on"/>
</hyperv>
<kvm>
<hidden state="on"/>
</kvm>
<vmport state="off"/>
<ioapic driver="kvm"/>
</features>
<cpu mode="host-passthrough" check="none">
<topology sockets="1" cores="12" threads="2"/>
<cache mode="passthrough"/>
<feature policy="require" name="invtsc"/>
<feature policy="require" name="hypervisor"/>
<feature policy="require" name="topoext"/>
<numa>
<cell id="0" cpus="0-2,12-14" memory="12582912" unit="KiB"/>
<cell id="1" cpus="3-5,15-17" memory="12582912" unit="KiB"/>
<cell id="2" cpus="6-8,18-20" memory="12582912" unit="KiB"/>
<cell id="3" cpus="9-11,21-23" memory="12582912" unit="KiB"/>
</numa>
</cpu>
...
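Note that the <cputune> pinning above assumes that host CPUs n and n+12 are SMT siblings of the same physical core, which is the usual enumeration on a 12-core/24-thread Zen part. This is not part of the report itself, but it can be double-checked on the host with:

  # print the SMT sibling pair for each of the first 12 host cores
  for c in $(seq 0 11); do
      echo -n "cpu$c: "
      cat /sys/devices/system/cpu/cpu$c/topology/thread_siblings_list
  done

If the pinning matches the host, this should print pairs such as "cpu0: 0,12", "cpu1: 1,13", and so on.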
/var/log/libvirt/qemu/win10.log:
-machine pc-q35-5.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu host,invtsc=on,hypervisor=on,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-vendor-id=AuthenticAMD,hv-frequencies,hv-crash,kvm=off,host-cache-info=on,l3-cache=off \
-m 49152 \
-overcommit mem-lock=off \
-smp 24,sockets=1,cores=12,threads=2 \
-mem-prealloc \
-mem-path /dev/hugepages/libvirt/qemu/3-win10 \
-numa node,nodeid=0,cpus=0-2,cpus=12-14,mem=12288 \
-numa node,nodeid=1,cpus=3-5,cpus=15-17,mem=12288 \
-numa node,nodeid=2,cpus=6-8,cpus=18-20,mem=12288 \
-numa node,nodeid=3,cpus=9-11,cpus=21-23,mem=12288 \
...
For some reason I always get l3-cache=off.
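As far as I can tell, the l3-cache=off comes from the <cache mode="passthrough"/> element above: libvirt appears to translate passthrough into host-cache-info=on,l3-cache=off, and only enables the emulated guest L3 when an emulate-mode cache element is used. A minimal variant (untested here) would be:

  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="12" threads="2"/>
    <!-- emulate a single guest-visible L3 instead of passing host cache info through -->
    <cache level="3" mode="emulate"/>
    ...
  </cpu>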
CoreInfo.exe in Windows 10 then produces the following report
(shortened):
Logical to Physical Processor Map:
**---------------------- Physical Processor 0 (Hyperthreaded)
--*--------------------- Physical Processor 1
---*-------------------- Physical Processor 2
----**------------------ Physical Processor 3 (Hyperthreaded)
------**---------------- Physical Processor 4 (Hyperthreaded)
--------*--------------- Physical Processor 5
---------*-------------- Physical Processor 6
----------**------------ Physical Processor 7 (Hyperthreaded)
------------**---------- Physical Processor 8 (Hyperthreaded)
--------------*--------- Physical Processor 9
---------------*-------- Physical Processor 10
----------------**------ Physical Processor 11 (Hyperthreaded)
------------------**---- Physical Processor 12 (Hyperthreaded)
--------------------*--- Physical Processor 13
---------------------*-- Physical Processor 14
----------------------** Physical Processor 15 (Hyperthreaded)
Logical Processor to Socket Map:
************************ Socket 0
Logical Processor to NUMA Node Map:
***---------***--------- NUMA Node 0
---***---------***------ NUMA Node 1
------***---------***--- NUMA Node 2
---------***---------*** NUMA Node 3
Approximate Cross-NUMA Node Access Cost (relative to fastest):
00 01 02 03
00: 1.4 1.2 1.1 1.2
01: 1.1 1.1 1.3 1.1
02: 1.0 1.1 1.0 1.2
03: 1.1 1.2 1.2 1.2
Logical Processor to Cache Map:
**---------------------- Data Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**---------------------- Instruction Cache 0, Level 1, 32 KB, Assoc 8, LineSize 64
**---------------------- Unified Cache 0, Level 2, 512 KB, Assoc 8, LineSize 64
***--------------------- Unified Cache 1, Level 3, 16 MB, Assoc 16, LineSize 64
--*--------------------- Data Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*--------------------- Instruction Cache 1, Level 1, 32 KB, Assoc 8, LineSize 64
--*--------------------- Unified Cache 2, Level 2, 512 KB, Assoc 8, LineSize 64
---*-------------------- Data Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
---*-------------------- Instruction Cache 2, Level 1, 32 KB, Assoc 8, LineSize 64
---*-------------------- Unified Cache 3, Level 2, 512 KB, Assoc 8, LineSize 64
---***------------------ Unified Cache 4, Level 3, 16 MB, Assoc 16, LineSize 64
----**------------------ Data Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
----**------------------ Instruction Cache 3, Level 1, 32 KB, Assoc 8, LineSize 64
----**------------------ Unified Cache 5, Level 2, 512 KB, Assoc 8, LineSize 64
------**---------------- Data Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
------**---------------- Instruction Cache 4, Level 1, 32 KB, Assoc 8, LineSize 64
------**---------------- Unified Cache 6, Level 2, 512 KB, Assoc 8, LineSize 64
------**---------------- Unified Cache 7, Level 3, 16 MB, Assoc 16, LineSize 64
--------*--------------- Data Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
--------*--------------- Instruction Cache 5, Level 1, 32 KB, Assoc 8, LineSize 64
--------*--------------- Unified Cache 8, Level 2, 512 KB, Assoc 8, LineSize 64
--------*--------------- Unified Cache 9, Level 3, 16 MB, Assoc 16, LineSize 64
---------*-------------- Data Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
---------*-------------- Instruction Cache 6, Level 1, 32 KB, Assoc 8, LineSize 64
---------*-------------- Unified Cache 10, Level 2, 512 KB, Assoc 8, LineSize 64
---------***------------ Unified Cache 11, Level 3, 16 MB, Assoc 16, LineSize 64
----------**------------ Data Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
----------**------------ Instruction Cache 7, Level 1, 32 KB, Assoc 8, LineSize 64
----------**------------ Unified Cache 12, Level 2, 512 KB, Assoc 8, LineSize 64
------------**---------- Data Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
------------**---------- Instruction Cache 8, Level 1, 32 KB, Assoc 8, LineSize 64
------------**---------- Unified Cache 13, Level 2, 512 KB, Assoc 8, LineSize 64
------------***--------- Unified Cache 14, Level 3, 16 MB, Assoc 16, LineSize 64
--------------*--------- Data Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*--------- Instruction Cache 9, Level 1, 32 KB, Assoc 8, LineSize 64
--------------*--------- Unified Cache 15, Level 2, 512 KB, Assoc 8, LineSize 64
---------------*-------- Data Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*-------- Instruction Cache 10, Level 1, 32 KB, Assoc 8, LineSize 64
---------------*-------- Unified Cache 16, Level 2, 512 KB, Assoc 8, LineSize 64
---------------*-------- Unified Cache 17, Level 3, 16 MB, Assoc 16, LineSize 64
----------------**------ Data Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
----------------**------ Instruction Cache 11, Level 1, 32 KB, Assoc 8, LineSize 64
----------------**------ Unified Cache 18, Level 2, 512 KB, Assoc 8, LineSize 64
----------------**------ Unified Cache 19, Level 3, 16 MB, Assoc 16, LineSize 64
------------------**---- Data Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------------**---- Instruction Cache 12, Level 1, 32 KB, Assoc 8, LineSize 64
------------------**---- Unified Cache 20, Level 2, 512 KB, Assoc 8, LineSize 64
------------------***--- Unified Cache 21, Level 3, 16 MB, Assoc 16, LineSize 64
--------------------*--- Data Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*--- Instruction Cache 13, Level 1, 32 KB, Assoc 8, LineSize 64
--------------------*--- Unified Cache 22, Level 2, 512 KB, Assoc 8, LineSize 64
---------------------*-- Data Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*-- Instruction Cache 14, Level 1, 32 KB, Assoc 8, LineSize 64
---------------------*-- Unified Cache 23, Level 2, 512 KB, Assoc 8, LineSize 64
---------------------*** Unified Cache 24, Level 3, 16 MB, Assoc 16, LineSize 64
----------------------** Data Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------** Instruction Cache 15, Level 1, 32 KB, Assoc 8, LineSize 64
----------------------** Unified Cache 25, Level 2, 512 KB, Assoc 8, LineSize 64
Logical Processor to Group Map:
************************ Group 0
The above result is even further away from the actual L3 cache configuration.
So numatune doesn't produce the expected outcome.
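For comparison, the host's real L3 (CCX) boundaries can be dumped from sysfs; this is a generic check, not taken from the original report:

  # one line per logical host CPU: which CPUs share its L3 cache (i.e. the CCX)
  for c in /sys/devices/system/cpu/cpu[0-9]*; do
      echo "$(basename $c): L3 shared with $(cat $c/cache/index3/shared_cpu_list)"
  done

With 3 cores per CCX and the pinning above, the host should report groups like 0-2,12-14, which is exactly the layout the guest's Coreinfo map fails to reproduce.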
https://bugs.launchpad.net/bugs/1856335
Title:
Cache Layout wrong on many Zen Arch CPUs
Status in QEMU:
New
Bug description:
AMD CPUs have an L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
to always map the cache as if it were a 4-core-per-CCX CPU, which is
incorrect and costs upwards of 30% performance (more realistically 10%)
in L3-cache-layout-aware applications.
Example on a 4-CCX CPU (1950X with 8 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='8' threads='1'/>
In Windows, Coreinfo reports correctly:
****---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
On a 3-CCX CPU (3960X with 6 cores and no SMT):
<cpu mode='custom' match='exact' check='full'>
<model fallback='forbid'>EPYC-IBPB</model>
<vendor>AMD</vendor>
<topology sockets='1' cores='6' threads='1'/>
In Windows, Coreinfo reports incorrectly:
****-- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64
Validated against qemu-kvm versions 3.0, 3.1, 4.1, and 4.2.
With newer QEMU there is a fix (that behaves correctly) using the dies parameter:
<qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
The problem is that the dies are exposed differently from how AMD does
it natively: they are exposed to Windows as sockets. This means that if
you are not a business user, you can never have a machine with more
than two CCXs (6 cores), because consumer versions of Windows only
support two sockets. (Should this be reported as a separate bug?)
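For completeness, that qemu:arg presumably sits inside a <qemu:commandline> block, with the qemu XML namespace declared on the <domain> element; the wrapper below is only a sketch and is not taken from the original report:

  <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
    ...
    <qemu:commandline>
      <qemu:arg value='-smp'/>
      <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
    </qemu:commandline>
  </domain>

Newer libvirt releases should also accept dies directly in the topology element, e.g. <topology sockets='1' dies='2' cores='3' threads='1'/>, which avoids the raw command-line passthrough.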