From: Heiko Sieger <1856335@bugs.launchpad.net>
To: qemu-devel@nongnu.org
Subject: [Bug 1856335] Re: Cache Layout wrong on many Zen Arch CPUs
Date: Sun, 10 May 2020 20:01:51 -0000	[thread overview]
Message-ID: <158914091142.4693.6888270013870332292.malone@soybean.canonical.com> (raw)
In-Reply-To: 157625616239.22064.10423897892496347105.malonedeb@gac.canonical.com

I upgraded to QEMU emulator version 5.0.50.
Using the q35-5.1 machine type (the latest) and the following libvirt configuration:

  <memory unit="KiB">50331648</memory>
  <currentMemory unit="KiB">50331648</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement="static">24</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="12"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="13"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="14"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="15"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="16"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="17"/>
    <vcpupin vcpu="12" cpuset="6"/>
    <vcpupin vcpu="13" cpuset="18"/>
    <vcpupin vcpu="14" cpuset="7"/>
    <vcpupin vcpu="15" cpuset="19"/>
    <vcpupin vcpu="16" cpuset="8"/>
    <vcpupin vcpu="17" cpuset="20"/>
    <vcpupin vcpu="18" cpuset="9"/>
    <vcpupin vcpu="19" cpuset="21"/>
    <vcpupin vcpu="20" cpuset="10"/>
    <vcpupin vcpu="21" cpuset="22"/>
    <vcpupin vcpu="22" cpuset="11"/>
    <vcpupin vcpu="23" cpuset="23"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-5.1">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/OVMF/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <boot dev="hd"/>
    <bootmenu enable="no"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <synic state="on"/>
      <stimer state="on"/>
      <vendor_id state="on" value="AuthenticAMD"/>
      <frequencies state="on"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none">
    <topology sockets="1" cores="12" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="invtsc"/>
    <feature policy="require" name="hypervisor"/>
    <feature policy="require" name="topoext"/>
    <numa>
      <cell id="0" cpus="0-2,12-14" memory="12582912" unit="KiB"/>
      <cell id="1" cpus="3-5,15-17" memory="12582912" unit="KiB"/>
      <cell id="2" cpus="6-8,18-20" memory="12582912" unit="KiB"/>
      <cell id="3" cpus="9-11,21-23" memory="12582912" unit="KiB"/>
    </numa>
  </cpu>
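As a quick sanity check, the four NUMA cells above sum exactly to the configured guest memory:

```shell
# Each of the 4 NUMA cells is 12582912 KiB; together they must equal the
# <memory unit="KiB">50331648</memory> (48 GiB) declared at the top.
echo $(( 4 * 12582912 ))   # -> 50331648
```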

...

/var/log/libvirt/qemu/win10.log:

-machine pc-q35-5.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-cpu host,invtsc=on,hypervisor=on,topoext=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-vendor-id=AuthenticAMD,hv-frequencies,hv-crash,kvm=off,host-cache-info=on,l3-cache=off \
-m 49152 \
-overcommit mem-lock=off \
-smp 24,sockets=1,cores=12,threads=2 \
-mem-prealloc \
-mem-path /dev/hugepages/libvirt/qemu/3-win10 \
-numa node,nodeid=0,cpus=0-2,cpus=12-14,mem=12288 \
-numa node,nodeid=1,cpus=3-5,cpus=15-17,mem=12288 \
-numa node,nodeid=2,cpus=6-8,cpus=18-20,mem=12288 \
-numa node,nodeid=3,cpus=9-11,cpus=21-23,mem=12288 \
...

For some reason I always get l3-cache=off.
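The generated -cpu line above pairs host-cache-info=on with l3-cache=off, which appears to be how libvirt translates the <cache mode="passthrough"/> element in the XML. As an experiment one could launch QEMU by hand with the L3 property flipped on; this is only a sketch with the same machine/topology values as above, not a tested invocation (host-cache-info and l3-cache are existing QEMU x86 CPU properties; the -m/-display values are filler):

```shell
# Sketch only (assumes a stand-alone test run outside libvirt): same
# machine type, topology and cache-related -cpu properties as the
# generated command line above, but with l3-cache flipped to "on" to see
# whether the guest cache map changes.
qemu-system-x86_64 -accel kvm -machine pc-q35-5.1 \
  -cpu host,topoext=on,host-cache-info=on,l3-cache=on \
  -smp 24,sockets=1,cores=12,threads=2 \
  -m 4096 -display none
```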

CoreInfo.exe in Windows 10 then produces the following report
(shortened):

Logical to Physical Processor Map:
**----------------------  Physical Processor 0 (Hyperthreaded)
--*---------------------  Physical Processor 1
---*--------------------  Physical Processor 2
----**------------------  Physical Processor 3 (Hyperthreaded)
------**----------------  Physical Processor 4 (Hyperthreaded)
--------*---------------  Physical Processor 5
---------*--------------  Physical Processor 6
----------**------------  Physical Processor 7 (Hyperthreaded)
------------**----------  Physical Processor 8 (Hyperthreaded)
--------------*---------  Physical Processor 9
---------------*--------  Physical Processor 10
----------------**------  Physical Processor 11 (Hyperthreaded)
------------------**----  Physical Processor 12 (Hyperthreaded)
--------------------*---  Physical Processor 13
---------------------*--  Physical Processor 14
----------------------**  Physical Processor 15 (Hyperthreaded)

Logical Processor to Socket Map:
************************  Socket 0

Logical Processor to NUMA Node Map:
***---------***---------  NUMA Node 0
---***---------***------  NUMA Node 1
------***---------***---  NUMA Node 2
---------***---------***  NUMA Node 3

Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01  02  03
00: 1.4 1.2 1.1 1.2
01: 1.1 1.1 1.3 1.1
02: 1.0 1.1 1.0 1.2
03: 1.1 1.2 1.2 1.2

Logical Processor to Cache Map:
**----------------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**----------------------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**----------------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
***---------------------  Unified Cache       1, Level 3,   16 MB, Assoc  16, LineSize  64
--*---------------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--*---------------------  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--*---------------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
---*--------------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
---*--------------------  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
---*--------------------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
---***------------------  Unified Cache       4, Level 3,   16 MB, Assoc  16, LineSize  64
----**------------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
----**------------------  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
----**------------------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
------**----------------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
------**----------------  Instruction Cache   4, Level 1,   32 KB, Assoc   8, LineSize  64
------**----------------  Unified Cache       6, Level 2,  512 KB, Assoc   8, LineSize  64
------**----------------  Unified Cache       7, Level 3,   16 MB, Assoc  16, LineSize  64
--------*---------------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
--------*---------------  Instruction Cache   5, Level 1,   32 KB, Assoc   8, LineSize  64
--------*---------------  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
--------*---------------  Unified Cache       9, Level 3,   16 MB, Assoc  16, LineSize  64
---------*--------------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
---------*--------------  Instruction Cache   6, Level 1,   32 KB, Assoc   8, LineSize  64
---------*--------------  Unified Cache      10, Level 2,  512 KB, Assoc   8, LineSize  64
---------***------------  Unified Cache      11, Level 3,   16 MB, Assoc  16, LineSize  64
----------**------------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
----------**------------  Instruction Cache   7, Level 1,   32 KB, Assoc   8, LineSize  64
----------**------------  Unified Cache      12, Level 2,  512 KB, Assoc   8, LineSize  64
------------**----------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
------------**----------  Instruction Cache   8, Level 1,   32 KB, Assoc   8, LineSize  64
------------**----------  Unified Cache      13, Level 2,  512 KB, Assoc   8, LineSize  64
------------***---------  Unified Cache      14, Level 3,   16 MB, Assoc  16, LineSize  64
--------------*---------  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*---------  Instruction Cache   9, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*---------  Unified Cache      15, Level 2,  512 KB, Assoc   8, LineSize  64
---------------*--------  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*--------  Instruction Cache  10, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*--------  Unified Cache      16, Level 2,  512 KB, Assoc   8, LineSize  64
---------------*--------  Unified Cache      17, Level 3,   16 MB, Assoc  16, LineSize  64
----------------**------  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
----------------**------  Instruction Cache  11, Level 1,   32 KB, Assoc   8, LineSize  64
----------------**------  Unified Cache      18, Level 2,  512 KB, Assoc   8, LineSize  64
----------------**------  Unified Cache      19, Level 3,   16 MB, Assoc  16, LineSize  64
------------------**----  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------------**----  Instruction Cache  12, Level 1,   32 KB, Assoc   8, LineSize  64
------------------**----  Unified Cache      20, Level 2,  512 KB, Assoc   8, LineSize  64
------------------***---  Unified Cache      21, Level 3,   16 MB, Assoc  16, LineSize  64
--------------------*---  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------*---  Instruction Cache  13, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------*---  Unified Cache      22, Level 2,  512 KB, Assoc   8, LineSize  64
---------------------*--  Data Cache         14, Level 1,   32 KB, Assoc   8, LineSize  64
---------------------*--  Instruction Cache  14, Level 1,   32 KB, Assoc   8, LineSize  64
---------------------*--  Unified Cache      23, Level 2,  512 KB, Assoc   8, LineSize  64
---------------------***  Unified Cache      24, Level 3,   16 MB, Assoc  16, LineSize  64
----------------------**  Data Cache         15, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------**  Instruction Cache  15, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------**  Unified Cache      25, Level 2,  512 KB, Assoc   8, LineSize  64

Logical Processor to Group Map:
************************  Group 0


The above result is even further away from the actual L3 cache configuration.

So numatune doesn't produce the expected outcome.
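For comparison, with the topology above (12 cores, 2 threads, topoext) and this CPU's 3-cores-per-CCX layout, the L3 map CoreInfo should show is easy to compute. A small sketch, assuming SMT siblings are enumerated adjacently as topoext does:

```shell
# Expected L3/CCX index per guest logical CPU for the 12-core, 2-thread
# guest above with 3 cores per CCX: four groups of 6 logical CPUs
# (0-5, 6-11, 12-17, 18-23) should each share one 16 MB L3.
threads=2
cores_per_ccx=3
ccx_of() {
  echo $(( ( $1 / threads ) / cores_per_ccx ))
}
for lcpu in 0 5 6 11 12 17 18 23; do
  echo "logical CPU $lcpu -> L3/CCX $(ccx_of "$lcpu")"
done
```

The CoreInfo dump above clearly does not match this grouping.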

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1856335

Title:
  Cache Layout wrong on many Zen Arch CPUs

Status in QEMU:
  New

Bug description:
  AMD CPUs have an L3 cache per 2, 3 or 4 cores. Currently, TOPOEXT seems
  to always map the cache as if it were a 4-core-per-CCX CPU, which is
  incorrect and costs upwards of 30% performance (more realistically 10%)
  in L3-cache-layout-aware applications.

  Example on a 4-CCX CPU (1950X /w 8 Cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='8' threads='1'/>

  In Windows, Coreinfo reports correctly:

  ****----  Unified Cache 1, Level 3,    8 MB, Assoc  16, LineSize  64
  ----****  Unified Cache 6, Level 3,    8 MB, Assoc  16, LineSize  64

  On a 3-CCX CPU (3960X /w 6 cores and no SMT):

    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>EPYC-IBPB</model>
      <vendor>AMD</vendor>
      <topology sockets='1' cores='6' threads='1'/>

  In Windows, Coreinfo reports incorrectly:

  ****--  Unified Cache  1, Level 3,    8 MB, Assoc  16, LineSize  64
  ----**  Unified Cache  6, Level 3,    8 MB, Assoc  16, LineSize  64

  Validated against 3.0, 3.1, 4.1 and 4.2 versions of qemu-kvm.

  With newer QEMU there is a fix (that does behave correctly) using the dies parameter:
   <qemu:arg value='cores=3,threads=1,dies=2,sockets=1'/>
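  For reference, that qemu:arg line corresponds to this plain QEMU invocation (a sketch; the dies= parameter exists in QEMU 4.1 and later, and everything besides -smp/-cpu is illustrative filler):

```shell
# Sketch: 6 cores split into two dies of 3 cores each, matching the
# dies-based workaround quoted above.
qemu-system-x86_64 -accel kvm -cpu EPYC-IBPB \
  -smp 6,sockets=1,dies=2,cores=3,threads=1 \
  -m 2048 -display none
```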

  The problem is that the dies are exposed differently from how AMD does
  it natively: they are exposed to Windows as sockets, which means that
  if you are not a business user, you can't ever have a machine with more
  than two CCXs (6 cores), as consumer versions of Windows only support
  two sockets. (Should this be reported as a separate bug?)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1856335/+subscriptions



Thread overview: 42+ messages
2019-12-13 16:56 [Bug 1856335] [NEW] Cache Layout wrong on many Zen Arch CPUs Damir
2019-12-16 10:06 ` [Bug 1856335] " Damir
2019-12-22 10:09 ` Damir
2019-12-22 10:10 ` Damir
2019-12-23 15:41 ` Babu Moger
2020-04-15 20:46 ` Heiko Sieger
2020-04-15 21:34 ` Babu Moger
2020-04-20 22:58 ` Babu Moger
2020-04-26 10:43 ` Heiko Sieger
2020-05-03 18:32 ` Heiko Sieger
2020-05-03 18:38 ` Heiko Sieger
2020-05-05 22:18 ` Babu Moger
2020-05-07 12:06 ` Heiko Sieger
2020-05-07 14:38 ` Babu Moger
2020-05-10 17:47 ` Damir
2020-05-10 20:01 ` Heiko Sieger [this message]
2020-05-14 23:31 ` Jan Klos
2020-05-15  2:41 ` Jan Klos
2020-05-15 13:04 ` Jan Klos
2020-05-15 13:41 ` Damir
2020-05-15 17:34 ` Babu Moger
2020-05-17 11:15 ` Jan Klos
2020-05-17 11:25 ` Jan Klos
2020-05-18 17:32 ` Heiko Sieger
2020-05-18 18:21 ` Babu Moger
2020-05-18 19:19 ` Heiko Sieger
2020-05-19  9:34 ` Jan Klos
2020-05-19 20:35 ` Heiko Sieger
2020-05-20 21:47 ` Heiko Sieger
2020-05-20 23:28 ` Heiko Sieger
2020-05-21 12:45 ` Jan Klos
2020-05-24 10:34 ` Heiko Sieger
2020-05-29  6:31 ` Heiko Sieger
2020-06-12  8:53 ` Jan Klos
2020-07-10 14:41 ` Heiko Sieger
2020-07-10 19:54 ` Jan Klos
2020-07-26 15:11 ` Sanjay Basu
2020-07-26 17:30 ` Heiko Sieger
2020-07-26 22:20 ` Sanjay Basu
2020-07-29  2:23 ` Heiko Sieger
2021-05-02 18:14 ` Thomas Huth
2021-07-02  4:17 ` Launchpad Bug Tracker
