* [Qemu-devel] Qemu and 32 PCIe devices
From: Marcin Juszkiewicz @ 2017-08-08 10:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: Laszlo Ersek, Marcel Apfelbaum

[-- Attachment #1: Type: text/plain, Size: 3679 bytes --]

Hello

A few days ago I had trouble getting PCIe hotplug working on an AArch64
machine. I enabled PCI hotplug in the kernel and then got hit by several
issues.

Our setup is a bunch of aarch64 servers, and we use OpenStack to provide
access to arm64 systems. OpenStack uses libvirt to control the VMs and
allows network interfaces and disk volumes to be added to running systems.

libvirt treats AArch64 as a PCIe machine without legacy PCI slots, so to
hotplug anything you first need enough pcie-root-port entries, as
described in QEMU's docs/pcie.txt and in a patch to the libvirt
documentation [1][2].

1. https://bugs.linaro.org/attachment.cgi?id=782
2. https://www.redhat.com/archives/libvir-list/2017-July/msg01033.html
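
(For illustration -- the values below are arbitrary -- a root port kept
free for future hotplug is simply a pcie-root-port with nothing plugged
into it, e.g. on the QEMU command line:

  -device pcie-root-port,port=0x18,chassis=17,id=pci.17,bus=pcie.0,addr=0x3

A device can then be hotplugged into that port later, via virsh
attach-device or device_add in the QEMU monitor.)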


But things get complicated once you approach the 32 PCIe device limit
(which in our setup will probably never happen). UEFI takes ages to boot,
only to land in the UEFI shell because it has forgotten all of the PCIe
devices. With 31 devices it boots (also after a long time).


I attached two XML files with VM definitions (for use with virsh), along
with their QEMU command lines. One has 31 PCI devices, the other has 32.
Both use [30] as the rootfs.

30.
https://builds.96boards.org/snapshots/reference-platform/components/developer-cloud/debian/cloud-image/30/debian-cloud-image.qcow2
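
To reproduce, the attached definitions can be loaded straight into
libvirt with the usual virsh commands, e.g.:

  virsh define vm-pci-32.xml
  virsh start pci
  virsh console pci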

Output from the VM with 31 PCI devices:

root@unassigned-hostname:~# lspci;lspci |wc -l
00:00.0 Host bridge: Red Hat, Inc. Device 0008
00:01.0 PCI bridge: Red Hat, Inc. Device 000c
00:01.1 PCI bridge: Red Hat, Inc. Device 000c
00:01.2 PCI bridge: Red Hat, Inc. Device 000c
00:01.3 PCI bridge: Red Hat, Inc. Device 000c
00:01.4 PCI bridge: Red Hat, Inc. Device 000c
00:01.5 PCI bridge: Red Hat, Inc. Device 000c
00:01.6 PCI bridge: Red Hat, Inc. Device 000c
00:01.7 PCI bridge: Red Hat, Inc. Device 000c
00:02.0 PCI bridge: Red Hat, Inc. Device 000c
00:02.1 PCI bridge: Red Hat, Inc. Device 000c
00:02.2 PCI bridge: Red Hat, Inc. Device 000c
00:02.3 PCI bridge: Red Hat, Inc. Device 000c
00:02.4 PCI bridge: Red Hat, Inc. Device 000c
00:02.5 PCI bridge: Red Hat, Inc. Device 000c
00:02.6 PCI bridge: Red Hat, Inc. Device 000c
01:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
02:00.0 SCSI storage controller: Red Hat, Inc Device 1048 (rev 01)
03:00.0 Communication controller: Red Hat, Inc Device 1043 (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc Device 1042 (rev 01)
05:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
06:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
07:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
08:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
09:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0a:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0b:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0c:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0d:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0e:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
0f:00.0 Ethernet controller: Red Hat, Inc Device 1041 (rev 01)
31
root@unassigned-hostname:~# lspci -t
-[0000:00]-+-00.0
           +-01.0-[01]----00.0
           +-01.1-[02]----00.0
           +-01.2-[03]----00.0
           +-01.3-[04]----00.0
           +-01.4-[05]----00.0
           +-01.5-[06]----00.0
           +-01.6-[07]----00.0
           +-01.7-[08]----00.0
           +-02.0-[09]----00.0
           +-02.1-[0a]----00.0
           +-02.2-[0b]----00.0
           +-02.3-[0c]----00.0
           +-02.4-[0d]----00.0
           +-02.5-[0e]----00.0
           \-02.6-[0f]----00.0

From what I was told, parts of that issue lie in UEFI, some in QEMU, and
some in the Linux kernel.

[-- Attachment #2: vm-pci-32.xml --]
[-- Type: text/xml, Size: 8579 bytes --]

<domain type='kvm'>
  <name>pci</name>
  <uuid>5361dd27-bdc1-4178-b26f-323b4009b226</uuid>
  <memory unit='KiB'>12312576</memory>
  <currentMemory unit='KiB'>12312576</currentMemory>
  <vcpu placement='static'>5</vcpu>
  <os>
    <type arch='aarch64' machine='virt-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/pci_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <gic version='2'/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-aarch64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/linaro/virt/debian-30.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='15' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='15' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='16' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='16' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:37:6a:d5'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:db:35:b3'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:8a:54:6c'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:04:2b:e4'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:6b:fc:ac'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:d4:b0:e2'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:52:2f:cc'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:0d:b1:bb'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:8a:a5:69'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:ad:60:b1'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:e3:9a:70'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:56:58:b3'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
  </devices>
</domain>


[-- Attachment #3: vm-pci-31.xml --]
[-- Type: text/xml, Size: 10405 bytes --]

<domain type='kvm' id='57'>
  <name>pci</name>
  <uuid>5361dd27-bdc1-4178-b26f-323b4009b226</uuid>
  <memory unit='KiB'>12312576</memory>
  <currentMemory unit='KiB'>12312576</currentMemory>
  <vcpu placement='static'>5</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='aarch64' machine='virt-2.9'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/AAVMF/AAVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/pci_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <gic version='2'/>
  </features>
  <cpu mode='host-passthrough' check='none'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-aarch64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/linaro/virt/debian-30.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0xf'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x10'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x11'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x12'/>
      <alias name='pci.11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0x13'/>
      <alias name='pci.12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x14'/>
      <alias name='pci.13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x15'/>
      <alias name='pci.14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='15' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='15' port='0x16'/>
      <alias name='pci.15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:37:6a:d5'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:db:35:b3'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:8a:54:6c'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='virtio'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:04:2b:e4'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet5'/>
      <model type='virtio'/>
      <alias name='net3'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:6b:fc:ac'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet6'/>
      <model type='virtio'/>
      <alias name='net4'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:d4:b0:e2'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet7'/>
      <model type='virtio'/>
      <alias name='net5'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:52:2f:cc'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet8'/>
      <model type='virtio'/>
      <alias name='net6'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:0d:b1:bb'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet9'/>
      <model type='virtio'/>
      <alias name='net7'/>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:8a:a5:69'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet10'/>
      <model type='virtio'/>
      <alias name='net8'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:ad:60:b1'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet11'/>
      <model type='virtio'/>
      <alias name='net9'/>
      <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:e3:9a:70'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet12'/>
      <model type='virtio'/>
      <alias name='net10'/>
      <address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:56:58:b3'/>
      <source network='default' bridge='virbr0'/>
      <target dev='vnet13'/>
      <model type='virtio'/>
      <alias name='net11'/>
      <address type='pci' domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/4'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/4'>
      <source path='/dev/pts/4'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-57-pci/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
  </devices>
  <seclabel type='none' model='none'/>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+64055:+64055</label>
    <imagelabel>+64055:+64055</imagelabel>
  </seclabel>
</domain>


[-- Attachment #4: pci-32.txt --]
[-- Type: text/plain, Size: 4266 bytes --]

/usr/bin/qemu-system-aarch64 -name guest=pci,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-108-pci/master-key.aes -machine virt-2.9,accel=kvm,usb=off,dump-guest-core=off -cpu host -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/pci_VARS.fd,if=pflash,format=raw,unit=1 -m 12024 -realtime mlock=off -smp 5,sockets=5,cores=1,threads=1 -uuid 5361dd27-bdc1-4178-b26f-323b4009b226 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-108-pci/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 -device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 -device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 -device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 -device pcie-root-port,port=0xf,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x7 -device pcie-root-port,port=0x10,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=10,id=pci.10,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=11,id=pci.11,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=12,id=pci.12,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=13,id=pci.13,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=14,id=pci.14,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=15,id=pci.15,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=0x17,chassis=16,id=pci.16,bus=pcie.0,addr=0x2.0x7 -device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -drive file=/home/linaro/virt/debian-30.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:37:6a:d5,bus=pci.1,addr=0x0 -netdev tap,fd=30,id=hostnet1,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:db:35:b3,bus=pci.5,addr=0x0 -netdev tap,fd=32,id=hostnet2,vhost=on,vhostfd=33 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:8a:54:6c,bus=pci.6,addr=0x0 -netdev tap,fd=34,id=hostnet3,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:04:2b:e4,bus=pci.7,addr=0x0 -netdev tap,fd=36,id=hostnet4,vhost=on,vhostfd=37 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=52:54:00:6b:fc:ac,bus=pci.8,addr=0x0 -netdev tap,fd=38,id=hostnet5,vhost=on,vhostfd=39 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=52:54:00:d4:b0:e2,bus=pci.9,addr=0x0 -netdev tap,fd=40,id=hostnet6,vhost=on,vhostfd=41 -device virtio-net-pci,netdev=hostnet6,id=net6,mac=52:54:00:52:2f:cc,bus=pci.10,addr=0x0 -netdev tap,fd=42,id=hostnet7,vhost=on,vhostfd=43 -device virtio-net-pci,netdev=hostnet7,id=net7,mac=52:54:00:0d:b1:bb,bus=pci.11,addr=0x0 -netdev tap,fd=44,id=hostnet8,vhost=on,vhostfd=45 -device virtio-net-pci,netdev=hostnet8,id=net8,mac=52:54:00:8a:a5:69,bus=pci.12,addr=0x0 -netdev 
tap,fd=46,id=hostnet9,vhost=on,vhostfd=47 -device virtio-net-pci,netdev=hostnet9,id=net9,mac=52:54:00:ad:60:b1,bus=pci.13,addr=0x0 -netdev tap,fd=48,id=hostnet10,vhost=on,vhostfd=49 -device virtio-net-pci,netdev=hostnet10,id=net10,mac=52:54:00:e3:9a:70,bus=pci.14,addr=0x0 -netdev tap,fd=50,id=hostnet11,vhost=on,vhostfd=51 -device virtio-net-pci,netdev=hostnet11,id=net11,mac=52:54:00:56:58:b3,bus=pci.15,addr=0x0 -chardev pty,id=charserial0 -serial chardev:charserial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-108-pci/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -msg timestamp=on

[-- Attachment #5: pci-31.txt --]
[-- Type: text/plain, Size: 4188 bytes --]

/usr/bin/qemu-system-aarch64 -name guest=pci,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-107-pci/master-key.aes -machine virt-2.9,accel=kvm,usb=off,dump-guest-core=off -cpu host -drive file=/usr/share/AAVMF/AAVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/pci_VARS.fd,if=pflash,format=raw,unit=1 -m 12024 -realtime mlock=off -smp 5,sockets=5,cores=1,threads=1 -uuid 5361dd27-bdc1-4178-b26f-323b4009b226 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-107-pci/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 -device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 -device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 -device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 -device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 -device pcie-root-port,port=0xf,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x7 -device pcie-root-port,port=0x10,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=10,id=pci.10,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=11,id=pci.11,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=12,id=pci.12,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=13,id=pci.13,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=14,id=pci.14,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=15,id=pci.15,bus=pcie.0,addr=0x2.0x6 -device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -drive file=/home/linaro/virt/debian-30.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:37:6a:d5,bus=pci.1,addr=0x0 -netdev tap,fd=31,id=hostnet1,vhost=on,vhostfd=32 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:db:35:b3,bus=pci.5,addr=0x0 -netdev tap,fd=33,id=hostnet2,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:8a:54:6c,bus=pci.6,addr=0x0 -netdev tap,fd=35,id=hostnet3,vhost=on,vhostfd=36 -device virtio-net-pci,netdev=hostnet3,id=net3,mac=52:54:00:04:2b:e4,bus=pci.7,addr=0x0 -netdev tap,fd=37,id=hostnet4,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet4,id=net4,mac=52:54:00:6b:fc:ac,bus=pci.8,addr=0x0 -netdev tap,fd=39,id=hostnet5,vhost=on,vhostfd=40 -device virtio-net-pci,netdev=hostnet5,id=net5,mac=52:54:00:d4:b0:e2,bus=pci.9,addr=0x0 -netdev tap,fd=41,id=hostnet6,vhost=on,vhostfd=42 -device virtio-net-pci,netdev=hostnet6,id=net6,mac=52:54:00:52:2f:cc,bus=pci.10,addr=0x0 -netdev tap,fd=43,id=hostnet7,vhost=on,vhostfd=44 -device virtio-net-pci,netdev=hostnet7,id=net7,mac=52:54:00:0d:b1:bb,bus=pci.11,addr=0x0 -netdev tap,fd=45,id=hostnet8,vhost=on,vhostfd=46 -device virtio-net-pci,netdev=hostnet8,id=net8,mac=52:54:00:8a:a5:69,bus=pci.12,addr=0x0 -netdev tap,fd=47,id=hostnet9,vhost=on,vhostfd=48 -device 
virtio-net-pci,netdev=hostnet9,id=net9,mac=52:54:00:ad:60:b1,bus=pci.13,addr=0x0 -netdev tap,fd=49,id=hostnet10,vhost=on,vhostfd=50 -device virtio-net-pci,netdev=hostnet10,id=net10,mac=52:54:00:e3:9a:70,bus=pci.14,addr=0x0 -netdev tap,fd=51,id=hostnet11,vhost=on,vhostfd=52 -device virtio-net-pci,netdev=hostnet11,id=net11,mac=52:54:00:56:58:b3,bus=pci.15,addr=0x0 -chardev pty,id=charserial0 -serial chardev:charserial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-107-pci/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -msg timestamp=on


* Re: [Qemu-devel] Qemu and 32 PCIe devices
From: Laszlo Ersek @ 2017-08-08 15:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Paolo Bonzini, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 08/08/17 12:39, Marcin Juszkiewicz wrote:
>
> A few days ago I had trouble getting PCIe hotplug working on an
> AArch64 machine. I enabled PCI hotplug in the kernel and then got hit
> by several issues.
>
> Our setup is a bunch of aarch64 servers, and we use OpenStack to
> provide access to arm64 systems. OpenStack uses libvirt to control the
> VMs and allows network interfaces and disk volumes to be added to
> running systems.
>
> libvirt treats AArch64 as a PCIe machine without legacy PCI slots, so
> to hotplug anything you first need enough pcie-root-port entries, as
> described in QEMU's docs/pcie.txt and in a patch to the libvirt
> documentation [1][2].
>
> 1. https://bugs.linaro.org/attachment.cgi?id=782
> 2. https://www.redhat.com/archives/libvir-list/2017-July/msg01033.html
>
>
> But things get complicated once you approach the 32 PCIe device limit
> (which in our setup will probably never happen). UEFI takes ages to
> boot, only to land in the UEFI shell because it has forgotten all of
> the PCIe devices. With 31 devices it boots (also after a long time).

OK, let me quote my earlier off-list follow-up (so that the discussion
can proceed publicly -- hopefully the other thread participants will also
repeat their off-list messages, and/or clarify my rendering of them):

On 08/07/17 19:32, Laszlo Ersek wrote:
>
> (1) Everything that's being worked out right now for PCI Express
> hotplug on Q35 (x86) applies equally to aarch64. Meaning, hotplug
> oriented aperture reservations for bus numbers, "IO ports", and
> various types of MMIO.
>
> This is to say that resource reservation is not a done deal (it's
> being designed) for x86 even, and it will take changes for both QEMU
> and OVMF. In OVMF, the relevant driver is "OvmfPkg/PciHotPlugInitDxe".
> Once we implement the necessary logic there (using a new
> "communication channel" with QEMU), then the driver should be possible
> to include in the ArmVirtQemu builds as well.
>
> Marcel, can you please provide pointers to the qemu-devel and seabios
> mailing list discussions?

(Marcel provided the following links, "IO/MEM/Bus reservation hints":

 SeaBIOS:
   - https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06289.html
   - https://www.mail-archive.com/qemu-devel@nongnu.org/msg468550.html
   - https://www.mail-archive.com/qemu-devel@nongnu.org/msg470584.html
 QEMU:
   - https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07110.html
   - https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09157.html
)

>
> The OVMF (and AAVMF) RHBZs for adhering to QEMU's IO and MMIO aperture
> hints are:
>
> - https://bugzilla.redhat.com/show_bug.cgi?id=1434740
> - https://bugzilla.redhat.com/show_bug.cgi?id=1434747
>
> The prerequisite QEMU RHBZs are:
> - https://bugzilla.redhat.com/show_bug.cgi?id=1344299
> - https://bugzilla.redhat.com/show_bug.cgi?id=1437113
>
> In addition, there are some issues with edk2's generic PciBusDxe as
> well; for example ATM it ignores bus number reservations for the kinds
> of hotplug controllers that we care about:
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=656
>
> I got a promise on edk2-devel from Ruiyu Ni (PciBusDxe maintainer from
> Intel) that he'd fix the issue at some point, but it's "not high
> priority".
>
>
> (2) As discussed earlier, the aarch64 "virt" machine type has a
> serious limitation relative to Q35: the former's MMCONFIG space is so
> small that it allows for only 16 buses. Each PCI Express root port and
> downstream port uses up a separate bus number. Please re-check
> "docs/pcie.txt" in the QEMU source tree with particular attention to
> section "4. Bus numbers issues" and "5. Hot-plug".
>
> When the edk2 PciBusDxe driver runs out of any kind of aperture (such
> as bus numbers, for example), it starts dropping devices. In general
> it prefers to drop devices with the largest resource consumptions (so
> that it has to drop the fewest devices for enabling the rest to work).
> In case many "uniform" devices are used, it is unspecified which ones
> get dropped. This can easily lead to boot failures (if the dropped
> devices include the one(s) you wanted to boot off of).
>
> (3) The edk2 boot performance when using a large number of PCI
> (Express) devices is indeed really bad. I've looked into it earlier
> (on Q35), briefly, see for example:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1441550#c11

(Here Marcel referenced the following message, which also reports slow
PCI enumeration:
<https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07182.html>.
In this case the issue hit the guest kernel; OVMF was not used (and
SeaBIOS was unaffected).)

> As you can see, platform-independent, NIC enumeration-related code in
> edk2 is *really heavy* on UEFI variables. If you had a physical
> machine with lots of pflash, and tens (or hundreds) of NICs, the perf
> would suffer mostly the same.
>
> Anyway, beyond the things written in that comment, there is one very
> interesting symptom that makes me think another (milder?) bottleneck
> could be in QEMU:
>
> When having a large number of PCI(e) devices (to the tune of 100+),
> even the printing of DEBUG messages slows down extremely, to the point
> where I can basically follow, with the naked eye, the printing of
> *individual* characters, on the QEMU debug port. (Obviously such
> printing is unrelated to PCI devices; the QEMU debug port is a simple
> platform device on Q35 and i440fx). This suggests to me that the high
> number of MemoryRegion objects in QEMU, used for the BARs of PCI(e)
> devices and bridges, could slow down the dispatching of the individual
> IO or MMIO accesses. I don't know what data structure QEMU uses for
> representing the "flat view" of the address space, but I think it
> *could* be a bottleneck. At least I tried to profile a few bits in the
> firmware, and found nothing related specifically to the slowdown of
> DEBUG prints.

(Paolo says the data structure is a radix tree, so the bottleneck being
there would be surprising. Also Paolo has given me tips for profiling,
so I'm looking into it.)

(Another remark from Paolo, paraphrased: programming the BARs relies on
an O(n^3) algorithm, fixing which is on the todo list.)

>
> In summary:
> - all of the hotplug stuff is still under design / in flux even for
>   x86
> - MMCONFIG of "virt" is too small (too few bus numbers available)
> - the boot perf issue might even need KVM tracing (this is not to say
>   that the UEFI variable massaging during edk2 NIC binding isn't
>   resource hungry! See again RHBZ#1441550 comment 11.)

(Drew added a few more points about mach-virt here; I'll let him repeat
himself.)

Thanks,
Laszlo


* Re: [Qemu-devel] Qemu and 32 PCIe devices
From: Laszlo Ersek @ 2017-08-09  1:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Paolo Bonzini, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

[-- Attachment #1: Type: text/plain, Size: 26228 bytes --]

On 08/08/17 17:51, Laszlo Ersek wrote:
> On 08/08/17 12:39, Marcin Juszkiewicz wrote:

>> Anyway, beyond the things written in that comment, there is one very
>> interesting symptom that makes me think another (milder?) bottleneck
>> could be in QEMU:
>>
>> When having a large number of PCI(e) devices (to the tune of 100+),
>> even the printing of DEBUG messages slows down extremely, to the
>> point where I can basically follow, with the naked eye, the printing
>> of *individual* characters, on the QEMU debug port. (Obviously such
>> printing is unrelated to PCI devices; the QEMU debug port is a simple
>> platform device on Q35 and i440fx). This suggests to me that the high
>> number of MemoryRegion objects in QEMU, used for the BARs of PCI(e)
>> devices and bridges, could slow down the dispatching of the
>> individual IO or MMIO accesses. I don't know what data structure QEMU
>> uses for representing the "flat view" of the address space, but I
>> think it *could* be a bottleneck. At least I tried to profile a few
>> bits in the firmware, and found nothing related specifically to the
>> slowdown of DEBUG prints.
>
> (Paolo says the data structure is a radix tree, so the bottleneck
> being there would be surprising. Also Paolo has given me tips for
> profiling, so I'm looking into it.)

I have some results here. They are probably not as well compiled as they
would be by someone who profiles stuff every morning for breakfast, but
it's a start, I hope.


(1) My command script was the following. I used virtio-scsi-pci
controllers because they don't need backends and also have no pflash
impact in the firmware.

The PCI hierarchy is basically a DMI-PCI bridge plugged into the PCI
Express root complex, with up to five PCI-PCI bridges on that, and then
31 virtio-scsi-pci devices on each PCI-PCI bridge. By moving the "exit
0" around, I could control the number of virtio PCI devices. Pretty
crude, but it worked.

Also, I used PCI (and not PCI Express) bridges and devices because I
didn't want to run out of IO space for this test. (That's a different
question, related to other points in this (and other) threads.)

> CODE=/home/virt-images/OVMF_CODE.4m.fd
> TMPL=/home/virt-images/OVMF_VARS.4m.fd
>
> cd ~/tmp/
>
> if ! [ -e vars16.fd ]; then
>   cp $TMPL vars16.fd
> fi
>
> qemu-system-x86_64 \
>   \
>   -machine q35,vmport=off,accel=kvm \
>   -drive if=pflash,readonly,format=raw,file=$CODE \
>   -drive if=pflash,format=raw,file=vars16.fd \
>   -net none \
>   -display none \
>   -fw_cfg name=opt/ovmf/PcdResizeXterm,string=y \
>   -m 2048 \
>   -debugcon file:debug16.log \
>   -global isa-debugcon.iobase=0x402 \
>   -name debug-threads=on \
>   -s \
>   -chardev stdio,signal=off,mux=on,id=char0 \
>   -mon chardev=char0,mode=readline \
>   -serial chardev:char0 \
>   \
>   -trace enable='pci_update_mappings*' \
>   \
>   -device i82801b11-bridge,id=dmi-pci-bridge \
>   \
>   -device pci-bridge,id=bridge-1,chassis_nr=1,bus=dmi-pci-bridge \
>   -device pci-bridge,id=bridge-2,chassis_nr=2,bus=dmi-pci-bridge \
>   -device pci-bridge,id=bridge-3,chassis_nr=3,bus=dmi-pci-bridge \
>   -device pci-bridge,id=bridge-4,chassis_nr=4,bus=dmi-pci-bridge \
>   -device pci-bridge,id=bridge-5,chassis_nr=5,bus=dmi-pci-bridge \
>   \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x2.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x3.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x4.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x5.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x6.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x7.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x8.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x9.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xa.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xb.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xc.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xd.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xe.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0xf.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x10.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x11.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x12.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x13.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x14.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x15.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x16.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x17.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x18.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x19.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1a.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1b.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1c.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1d.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1e.0 \
>   -device virtio-scsi-pci,bus=bridge-1,addr=0x1f.0 \
>   \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x2.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x3.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x4.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x5.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x6.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x7.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x8.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x9.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xa.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xb.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xc.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xd.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xe.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0xf.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x10.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x11.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x12.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x13.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x14.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x15.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x16.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x17.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x18.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x19.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1a.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1b.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1c.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1d.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1e.0 \
>   -device virtio-scsi-pci,bus=bridge-2,addr=0x1f.0 \
>   \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x2.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x3.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x4.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x5.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x6.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x7.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x8.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x9.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xa.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xb.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xc.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xd.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xe.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0xf.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x10.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x11.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x12.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x13.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x14.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x15.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x16.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x17.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x18.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x19.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1a.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1b.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1c.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1d.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1e.0 \
>   -device virtio-scsi-pci,bus=bridge-3,addr=0x1f.0 \
>
> exit 0
>
>   \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x2.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x3.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x4.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x5.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x6.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x7.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x8.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x9.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xa.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xb.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xc.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xd.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xe.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0xf.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x10.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x11.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x12.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x13.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x14.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x15.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x16.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x17.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x18.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x19.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1a.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1b.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1c.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1d.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1e.0 \
>   -device virtio-scsi-pci,bus=bridge-4,addr=0x1f.0 \
>   \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x1.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x2.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x3.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x4.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x5.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x6.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x7.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x8.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x9.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xa.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xb.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xc.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xd.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xe.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0xf.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x10.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x11.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x12.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x13.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x14.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x15.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x16.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x17.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x18.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x19.0 \
>   -device virtio-scsi-pci,bus=bridge-5,addr=0x1a.0
>
> exit 0
>
>   -drive id=iso,if=none,format=raw,readonly,file=/usr/share/OVMF/UefiShell.iso \
>   -device ide-cd,drive=iso,bootindex=0 \
>   \


(2) At first I did some super crude measurements, basically from VM
start to UEFI shell prompt:

- 31 devices (single PCI-PCI bridge):  14 seconds
- 46 devices (two PCI-PCI bridges):    31 seconds
- 62 devices (two PCI-PCI bridges):    64 seconds
- 93 devices (three PCI-PCI bridges): 203 seconds

The device count and runtime multipliers are, relative to the first
line:
- 31/31  = 1;     14/14  =  1
- 46/31 ~= 1.48;  31/14 ~=  2.21 (note: 1.48 * 1.48 ~= 2.19)
- 62/31  = 2;     64/14 ~=  4.57 (note: 2    * 2     = 4)
- 93/31  = 3;    203/14 ~= 14.5  (note: 3    * 3     = 9)

So the boot time seems to be a super-quadratic function of the virtio
PCI device count.


(3) Paolo told me to grab a "perf top" screenshot while the firmware
was in one of those parts that seemed to slow down when adding more
devices.
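
(Concretely, this was something along the lines of the following,
attached to the QEMU process; the exact perf options may have differed:

> perf top -p "$(pgrep -f qemu-system-x86_64)"

)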

I took several such screenshots (from different slowed-down parts, with
different device counts), and they were all close variants of the
following:

>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>   17.14%  qemu-system-x86_64                  [.] subpage_register
>   10.31%  qemu-system-x86_64                  [.] int128_add
>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>    7.30%  qemu-system-x86_64                  [.] int128_ge
>    4.89%  qemu-system-x86_64                  [.] int128_nz
>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc

and

>   25.53%  qemu-system-x86_64                  [.] subpage_register
>   15.14%  qemu-system-x86_64                  [.] render_memory_region
>    7.99%  qemu-system-x86_64                  [.] int128_add
>    6.17%  qemu-system-x86_64                  [.] addrrange_end
>    5.61%  qemu-system-x86_64                  [.] int128_ge
>    3.94%  qemu-system-x86_64                  [.] int128_nz
>    3.82%  qemu-system-x86_64                  [.] phys_page_compact
>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc

and

>   30.16%  qemu-system-x86_64             [.] subpage_register
>   14.44%  qemu-system-x86_64             [.] render_memory_region
>    7.92%  qemu-system-x86_64             [.] int128_add
>    6.02%  qemu-system-x86_64             [.] addrrange_end
>    5.57%  qemu-system-x86_64             [.] int128_ge
>    3.86%  qemu-system-x86_64             [.] int128_nz
>    2.20%  libc-2.17.so                   [.] __memset_sse2

(Note, this was an unoptimized, debug build of QEMU, if that matters.)

In the middle case, I even probed into the firmware (using QEMU's GDB
server), which gave me:

> #0  0x000000007fce71cd in MmioRead16 (Address=2148630532)
>     at MdePkg/Library/BaseIoLibIntrinsic/IoLib.c:154
> #1  0x000000007fce670f in PciExpressRead16 (Address=1146884)
>     at MdePkg/Library/BasePciExpressLib/PciExpressLib.c:483
> #2  0x000000007fce1faa in PciRead16 (Address=1146884)
>     at OvmfPkg/Library/DxePciLibI440FxQ35/PciLib.c:454
> #3  0x000000007fce2e0d in PciSegmentRead16 (Address=1146884)
>     at MdePkg/Library/BasePciSegmentLibPci/PciSegmentLib.c:428
> #4  0x000000007fce3068 in PciSegmentReadBuffer (StartAddress=1146884,
>     Size=2, Buffer=0x7feef8fe) at
>     MdePkg/Library/BasePciSegmentLibPci/PciSegmentLib.c:1174
> #5  0x000000007fce011b in RootBridgeIoPciAccess (This=0x7f6051c0,
>     Read=1 '\001', Width=EfiPciWidthUint16,  Address=1146884, Count=1,
>     Buffer=0x7feef8fe) at
>     MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c:964
> #6  0x000000007fce01a9 in RootBridgeIoPciRead (This=0x7f6051c0,
>     Width=EfiPciWidthUint16, Address=16973828, Count=1,
>     Buffer=0x7feef8fe) at
>     MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c:996
> #7  0x000000007ea9a8e0 in PciIoConfigRead (This=0x7f59e828,
>     Width=EfiPciIoWidthUint16, Offset=4, Count=1,  Buffer=0x7feef8fe)
>     at MdeModulePkg/Bus/Pci/PciBusDxe/PciIo.c:753
> #8  0x000000007ea93240 in PciOperateRegister (PciIoDevice=0x7f59e818,
>     Command=0, Offset=4 '\004',  Operation=3 '\003', PtrCommand=0x0)
>     at MdeModulePkg/Bus/Pci/PciBusDxe/PciCommand.c:46
> #9  0x000000007ea9bad1 in PciIoAttributes (This=0x7f59e828,
>     Operation=EfiPciIoAttributeOperationEnable,  Attributes=0,
>     Result=0x0) at MdeModulePkg/Bus/Pci/PciBusDxe/PciIo.c:1743
> #10 0x000000007ea9bb48 in PciIoAttributes (This=0x7f59c428,
>     Operation=EfiPciIoAttributeOperationEnable,  Attributes=1792,
>     Result=0x0) at MdeModulePkg/Bus/Pci/PciBusDxe/PciIo.c:1753
> #11 0x000000007eb1a98d in DetectAndPreparePlatformPciDevicePath
>     (Handle=0x7f120418, PciIo=0x7f59c428, Pci=0x7feefaa0) at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:834
> #12 0x000000007eb1a92e in VisitingAPciInstance (Handle=0x7f120418,
>     Instance=0x7f59c428,  Context=0x7eb1a958
>     <DetectAndPreparePlatformPciDevicePath>) at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:789
> #13 0x000000007eb1a894 in VisitAllInstancesOfProtocol (Id=0x7eb2dd40
>     <gEfiPciIoProtocolGuid>,  CallBackFunction=0x7eb1a8c2
>     <VisitingAPciInstance>, Context=0x7eb1a958
>     <DetectAndPreparePlatformPciDevicePath>) at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:748
> #14 0x000000007eb1a956 in VisitAllPciInstances
>     (CallBackFunction=0x7eb1a958
>     <DetectAndPreparePlatformPciDevicePath>) at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:804
> #15 0x000000007eb1ab5a in DetectAndPreparePlatformPciDevicePaths
>     (DetectVgaOnly=0 '\000') at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:904
> #16 0x000000007eb1abba in PlatformInitializeConsole
>     (PlatformConsole=0x7eb2e280) at
>     OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:938
> #17 0x000000007eb1a157 in PlatformBootManagerBeforeConsole ()
>     at OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c:397
> #18 0x000000007eb05d20 in BdsEntry (This=0x7eb2de08)
>     at MdeModulePkg/Universal/BdsDxe/BdsEntry.c:908
> #19 0x000000007fef1263 in DxeMain (HobStart=0x7fca3018)
>     at MdeModulePkg/Core/Dxe/DxeMain/DxeMain.c:521
> #20 0x000000007fef0589 in ProcessModuleEntryPointList
>     (HobStart=0x7bf56000) at
>     Build/OvmfX64/NOOPT_GCC48/X64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/AutoGen.c:417
> #21 0x000000007fef0260 in _ModuleEntryPoint (HobStart=0x7bf56000)
>     at MdePkg/Library/DxeCoreEntryPoint/DxeCoreEntryPoint.c:54

In advance: the interesting frames are those with
"EfiPciIoAttributeOperationEnable".


(4) Two very visible slowdowns occur: (a) when
"OvmfPkg/AcpiPlatformDxe/PciDecoding.c" temporarily enables, in a loop,
IO and MMIO decoding for all PCI devices that were enumerated earlier --
so that the AML generation in QEMU can take their BARs into account --
and (b) when the (legacy) virtio driver is bound to the virtio PCI
devices.

Loop (a) consists basically purely of PCI command register massaging.
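
In outline, that loop boils down to one PciIo->Attributes() call per
enumerated device (a heavily simplified sketch, not the verbatim
PciDecoding.c code; declarations, error handling and the exact
bookkeeping are omitted):

> //
> // Save each device's original attributes, then turn on IO and MMIO
> // decoding for the duration of QEMU's ACPI table generation.
> //
> Status = gBS->LocateHandleBuffer (ByProtocol, &gEfiPciIoProtocolGuid,
>                NULL /* SearchKey */, &HandleCount, &Handles);
> for (Idx = 0; Idx < HandleCount; ++Idx) {
>   Status = gBS->HandleProtocol (Handles[Idx], &gEfiPciIoProtocolGuid,
>                  (VOID **)&PciIo);
>   PciIo->Attributes (PciIo, EfiPciIoAttributeOperationGet, 0,
>            &OriginalAttributes[Idx]);
>   PciIo->Attributes (PciIo, EfiPciIoAttributeOperationEnable,
>            EFI_PCI_IO_ATTRIBUTE_IO | EFI_PCI_IO_ATTRIBUTE_MEMORY, NULL);
> }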

Part (b) needed a bit ^W whole lot more investigation, so I learned how
to enable edk2's built-in profiling for OVMF, added a few "perf points"
to the VirtioPciDeviceBindingStart() function in
"OvmfPkg/VirtioPciDeviceDxe/VirtioPciDevice.c", and printed the
following measurements at the UEFI shell prompt with the DP (display
perf) utility:

When booting with 31 devices:

> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> DB:Support:      25558      137584           5           2         242
>   DB:Start:        175     7021314       40121           0     1221468
>   OpenProto         31         124           3           0           5
>  GetPciAttr         31          84           2           0           3
>    EnaPciIo         31     2671566       86179           0      102605
>    VPciInit         31        1927          62           0          72
>  InstVProto         31       12302         396           0         543

When booting with 93 devices:

> ==[ Cumulative ]========
> (Times in microsec.)     Cumulative   Average     Shortest    Longest
>    Name         Count     Duration    Duration    Duration    Duration
> -------------------------------------------------------------------------------
> DB:Support:      37214      274119           7           2         414
>   DB:Start:        423    87530041      206926           0     3730887
>   OpenProto         93         510           5           0           9
>  GetPciAttr         93         302           3           0          11
>    EnaPciIo         93    62163154      668420           0     1008274
>    VPciInit         93        6564          70           0         239
>  InstVProto         93       42478         456           0        1128

The DB:Support and DB:Start intervals are generic (they cover all other
drivers as well); they reflect the call counts and durations for when
any driver is asked to determine whether it supports a given device,
and -- after a positive answer -- when the same driver is asked to
actually bind the device.

The other six lines are specific actions in
VirtioPciDeviceBindingStart().
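
Each of those tokens comes from a PERF_START()/PERF_END() pair that I
placed around the corresponding call; roughly like this (a trimmed
sketch, not the exact code -- in particular the attribute mask is only
illustrative, and Status/PciIo are locals of the surrounding driver
function):

> #include <Library/PerformanceLib.h>
>
>   //
>   // Bracket the PciIo attribute call that enables IO decoding, so that
>   // the DP shell utility reports it under the "EnaPciIo" token.
>   //
>   PERF_START (NULL, "EnaPciIo", NULL, 0);
>   Status = PciIo->Attributes (
>                     PciIo,
>                     EfiPciIoAttributeOperationEnable,
>                     EFI_PCI_IO_ATTRIBUTE_IO | EFI_PCI_IO_ATTRIBUTE_BUS_MASTER,
>                     NULL
>                     );
>   PERF_END (NULL, "EnaPciIo", NULL, 0);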

We see the following, when going from 31 to 93 virtio PCI devices:

- the call counts (less than) triple on all rows, good

- the average, shortest and longest durations remain mostly unchanged
  for the DB:Support, OpenProto, GetPciAttr, VPciInit, InstVProto rows,

- on the "EnaPciIo" row (which stands for enabling IO decoding in the
  legacy or transitional virtio device's command register), the average
  duration gets multiplied by ~7.8 (668420/86179), while the max
  duration is multiplied by ~9.8 (1008274/102605),

- the DB:Start row shows a similar trend: it includes "EnaPciIo", but
  its averages are diluted by UEFI drivers that don't bind to PciIo
  protocol instances,

- the cumulative duration for EnaPciIo goes from ~2.7 seconds to ~62.2
  seconds (a factor of ~23.3, corresponding to three times as many
  devices bound, times the avg. EnaPciIo duration getting multiplied by
  ~7.8).


(5) After this, I turned to QEMU, and applied the following patch:

> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 258fbe51e2ee..a3c250483211 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1289,19 +1289,27 @@ static void pci_update_mappings(PCIDevice *d)
>              trace_pci_update_mappings_del(d, pci_bus_num(d->bus),
>                                            PCI_SLOT(d->devfn),
>                                            PCI_FUNC(d->devfn),
>                                            i, r->addr, r->size);
>              memory_region_del_subregion(r->address_space, r->memory);
> +            trace_pci_update_mappings_del_done(d, pci_bus_num(d->bus),
> +                                               PCI_SLOT(d->devfn),
> +                                               PCI_FUNC(d->devfn),
> +                                               i, r->addr, r->size);
>          }
>          r->addr = new_addr;
>          if (r->addr != PCI_BAR_UNMAPPED) {
>              trace_pci_update_mappings_add(d, pci_bus_num(d->bus),
>                                            PCI_SLOT(d->devfn),
>                                            PCI_FUNC(d->devfn),
>                                            i, r->addr, r->size);
>              memory_region_add_subregion_overlap(r->address_space,
>                                                  r->addr, r->memory, 1);
> +            trace_pci_update_mappings_add_done(d, pci_bus_num(d->bus),
> +                                               PCI_SLOT(d->devfn),
> +                                               PCI_FUNC(d->devfn),
> +                                               i, r->addr, r->size);
>          }
>      }
>
>      pci_update_vga(d);
>  }
> diff --git a/hw/pci/trace-events b/hw/pci/trace-events
> index f68c178afc2b..ffb5c473e048 100644
> --- a/hw/pci/trace-events
> +++ b/hw/pci/trace-events
> @@ -1,10 +1,12 @@
>  # See docs/devel/tracing.txt for syntax documentation.
>
>  # hw/pci/pci.c
>  pci_update_mappings_del(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
>  pci_update_mappings_add(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
> +pci_update_mappings_del_done(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
> +pci_update_mappings_add_done(void *d, uint32_t bus, uint32_t slot, uint32_t func, int bar, uint64_t addr, uint64_t size) "d=%p %02x:%02x.%x %d,0x%"PRIx64"+0x%"PRIx64
>
>  # hw/pci/pci_host.c
>  pci_cfg_read(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x -> 0x%x"
>  pci_cfg_write(const char *dev, unsigned devid, unsigned fnid, unsigned offs, unsigned val) "%s %02u:%u @0x%x <- 0x%x"

(This is matched by the "-trace enable='pci_update_mappings*'" option
under (1).)

With this patch, I only tested the "93 devices" case, as the slowdown
became visible to the naked eye from the trace messages, as the firmware
enabled more and more BARs / command registers (and inversely, the
speedup was perceivable when the firmware disabled more and more BARs /
command registers).

I will attach the trace output (compressed), from the launch of the VM
to reaching the UEFI shell. It would be interesting to plot the time
differences between the add/del and matching add_done/del_done trace
points (the difference grows continuously as more and more BARs are
enabled); for now I'll quote two extremes from the BAR-enablement loop:

Lines 2311-2312:

> 13602@1502236033.130033:pci_update_mappings_add       d=0x55567cfd4620  02:01.0  0,0xa780+0x40
> 13602@1502236033.133795:pci_update_mappings_add_done  d=0x55567cfd4620  02:01.0  0,0xa780+0x40

Lines 2863-2864:

> 13602@1502236107.343233:pci_update_mappings_add       d=0x555702401bf0  04:1f.0  0,0x8000+0x40
> 13602@1502236107.751508:pci_update_mappings_add_done  d=0x555702401bf0  04:1f.0  0,0x8000+0x40

Enabling IO decoding (hence BAR 0) for the first virtio-scsi-pci device,
02:01.0, took 3.762 msecs.

Enabling IO decoding (hence BAR 0), at the end of the same firmware
loop, for the last virtio-scsi-pci device, 04:1f.0, took 408.275 msecs,
which is approx. 108 times as long. (Almost half a second -- it's
human-perceivable.)
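
In case someone wants to extract all of those add -> add_done deltas
from the attached trace, here is a minimal pairing sketch (standalone C,
reading the trace from stdin; it assumes the exact stderr line format
quoted above and ignores the del/del_done pair):

> #include <stdio.h>
> #include <string.h>
>
> int main(void)
> {
>   char   line[512];
>   double add_ts = 0;
>   int    have_add = 0;
>
>   while (fgets(line, sizeof line, stdin) != NULL) {
>     double ts;
>     char   event[64];
>
>     /* e.g. "13602@1502236033.130033:pci_update_mappings_add d=0x..." */
>     if (sscanf(line, "%*d@%lf:%63[^ ]", &ts, event) != 2) {
>       continue;
>     }
>     if (strcmp(event, "pci_update_mappings_add") == 0) {
>       add_ts = ts;
>       have_add = 1;
>     } else if (strcmp(event, "pci_update_mappings_add_done") == 0 &&
>                have_add) {
>       printf("%.3f msec\n", (ts - add_ts) * 1000.0);
>       have_add = 0;
>     }
>   }
>   return 0;
> }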


I didn't try to follow QEMU more deeply. Hopefully this is useful for
something.

Thanks
Laszlo

[-- Attachment #2: trace-column-t.txt.xz --]
[-- Type: application/x-xz, Size: 12752 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09  1:06   ` Laszlo Ersek
@ 2017-08-09  7:26     ` Paolo Bonzini
  2017-08-09 10:00       ` Laszlo Ersek
  2017-08-09 17:16       ` Michael S. Tsirkin
  0 siblings, 2 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-08-09  7:26 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 09/08/2017 03:06, Laszlo Ersek wrote:
>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc

Yes, this is the O(n^3) thing.  An optimized build should be faster
because int128 operations will be inlined and become much more efficient.

> With this patch, I only tested the "93 devices" case, as the slowdown
> became visible to the naked eye from the trace messages, as the firmware
> enabled more and more BARs / command registers (and inversely, the
> speedup was perceivable when the firmware disabled more and more BARs /
> command registers).

This is an interesting observation, and it's expected.  Looking at the
O(n^3) complexity more in detail you have N operations, where the "i"th
operates on "i" DMA address spaces, all of which have at least "i"
memory regions (at least 1 BAR per device).

So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
Expressing it as a sum shows why it gets slower as time progresses.
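
As a quick numerical illustration (a standalone snippet, not QEMU
code), plugging the two device counts from the measurements into the
closed form:

    #include <stdio.h>

    /* sum of i^2 for i = 1..n, via the closed form n(n+1)(2n+1)/6 */
    static unsigned long long cost(unsigned n)
    {
        return (unsigned long long)n * (n + 1) * (2 * n + 1) / 6;
    }

    int main(void)
    {
        printf("N=31:  %llu\n", cost(31));                     /* 10416  */
        printf("N=93:  %llu\n", cost(93));                     /* 272459 */
        printf("ratio: %.1f\n", (double)cost(93) / cost(31));  /* ~26.2  */
        return 0;
    }

The predicted ~26x growth is in the same ballpark as the measured ~23x
growth of the cumulative EnaPciIo duration.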

The solution is to note that those "i" address spaces are actually all
the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09  7:26     ` Paolo Bonzini
@ 2017-08-09 10:00       ` Laszlo Ersek
  2017-08-09 10:16         ` Paolo Bonzini
  2017-08-09 17:16       ` Michael S. Tsirkin
  1 sibling, 1 reply; 10+ messages in thread
From: Laszlo Ersek @ 2017-08-09 10:00 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 08/09/17 09:26, Paolo Bonzini wrote:
> On 09/08/2017 03:06, Laszlo Ersek wrote:
>>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc
> 
> Yes, this is the O(n^3) thing.  An optimized build should be faster
> because int128 operations will be inlined and become much more efficient.
> 
>> With this patch, I only tested the "93 devices" case, as the slowdown
>> became visible to the naked eye from the trace messages, as the firmware
>> enabled more and more BARs / command registers (and inversely, the
>> speedup was perceivable when the firmware disabled more and more BARs /
>> command registers).
> 
> This is an interesting observation, and it's expected.  Looking at the
> O(n^3) complexity more in detail you have N operations, where the "i"th
> operates on "i" DMA address spaces, all of which have at least "i"
> memory regions (at least 1 BAR per device).

- Can you please give me a pointer to the code where the "i"th operation
works on "i" DMA address spaces? (Not that I dream about patching *that*
code, wherever it may live :) )

- You mentioned that changing this is on the ToDo list. I couldn't find
it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere
else?

(I'm not trying to urge any changes in the area, I'd just like to learn
about the code & the tracker item, if there's one.)

Thanks!
Laszlo

> 
> So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
> Expressing it as a sum shows why it gets slower as time progresses.
> 
> The solution is to note that those "i" address spaces are actually all
> the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).
> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09 10:00       ` Laszlo Ersek
@ 2017-08-09 10:16         ` Paolo Bonzini
  2017-08-09 10:56           ` Laszlo Ersek
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Bonzini @ 2017-08-09 10:16 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 09/08/2017 12:00, Laszlo Ersek wrote:
> On 08/09/17 09:26, Paolo Bonzini wrote:
>> On 09/08/2017 03:06, Laszlo Ersek wrote:
>>>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc
>>
>> Yes, this is the O(n^3) thing.  An optimized build should be faster
>> because int128 operations will be inlined and become much more efficient.
>>
>>> With this patch, I only tested the "93 devices" case, as the slowdown
>>> became visible to the naked eye from the trace messages, as the firmware
>>> enabled more and more BARs / command registers (and inversely, the
>>> speedup was perceivable when the firmware disabled more and more BARs /
>>> command registers).
>>
>> This is an interesting observation, and it's expected.  Looking at the
>> O(n^3) complexity more in detail you have N operations, where the "i"th
>> operates on "i" DMA address spaces, all of which have at least "i"
>> memory regions (at least 1 BAR per device).
> 
> - Can you please give me a pointer to the code where the "i"th operation
> works on "i" DMA address spaces? (Not that I dream about patching *that*
> code, wherever it may live :) )

It's all driven by actions of the guest.

Simply, by the time you get to the "i"th command register, you have
enabled bus-master DMA on "i" devices (so that "i" DMA address spaces
are non-empty) and you have enabled BARs on "i" devices (so that their
BARs are included in the address spaces).

> - You mentioned that changing this is on the ToDo list. I couldn't find
> it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere
> else?

I've added it to https://wiki.qemu.org/index.php/ToDo/MemoryAPI (thanks
for the nudge).

Paolo

> (I'm not trying to urge any changes in the area, I'd just like to learn
> about the code & the tracker item, if there's one.)
> 
> Thanks!
> Laszlo
> 
>>
>> So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
>> Expressing it as a sum shows why it gets slower as time progresses.
>>
>> The solution is to note that those "i" address spaces are actually all
>> the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).
>>
>> Thanks,
>>
>> Paolo
>>
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09 10:16         ` Paolo Bonzini
@ 2017-08-09 10:56           ` Laszlo Ersek
  2017-08-09 11:11             ` Peter Maydell
  2017-08-09 11:15             ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Laszlo Ersek @ 2017-08-09 10:56 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 08/09/17 12:16, Paolo Bonzini wrote:
> On 09/08/2017 12:00, Laszlo Ersek wrote:
>> On 08/09/17 09:26, Paolo Bonzini wrote:
>>> On 09/08/2017 03:06, Laszlo Ersek wrote:
>>>>>   20.14%  qemu-system-x86_64                  [.] render_memory_region
>>>>>   17.14%  qemu-system-x86_64                  [.] subpage_register
>>>>>   10.31%  qemu-system-x86_64                  [.] int128_add
>>>>>    7.86%  qemu-system-x86_64                  [.] addrrange_end
>>>>>    7.30%  qemu-system-x86_64                  [.] int128_ge
>>>>>    4.89%  qemu-system-x86_64                  [.] int128_nz
>>>>>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
>>>>>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc
>>>
>>> Yes, this is the O(n^3) thing.  An optimized build should be faster
>>> because int128 operations will be inlined and become much more efficient.
>>>
>>>> With this patch, I only tested the "93 devices" case, as the slowdown
>>>> became visible to the naked eye from the trace messages, as the firmware
>>>> enabled more and more BARs / command registers (and inversely, the
>>>> speedup was perceivable when the firmware disabled more and more BARs /
>>>> command registers).
>>>
>>> This is an interesting observation, and it's expected.  Looking at the
>>> O(n^3) complexity more in detail you have N operations, where the "i"th
>>> operates on "i" DMA address spaces, all of which have at least "i"
>>> memory regions (at least 1 BAR per device).
>>
>> - Can you please give me a pointer to the code where the "i"th operation
>> works on "i" DMA address spaces? (Not that I dream about patching *that*
>> code, wherever it may live :) )
> 
> It's all driven by actions of the guest.
> 
> Simply, by the time you get to the "i"th command register, you have
> enabled bus-master DMA on "i" devices (so that "i" DMA address spaces
> are non-empty) and you have enabled BARs on "i" devices (so that their
> BARs are included in the address spaces).
> 
>> - You mentioned that changing this is on the ToDo list. I couldn't find
>> it under <https://wiki.qemu.org/index.php/ToDo>. Is it tracked somewhere
>> else?
> 
> I've added it to https://wiki.qemu.org/index.php/ToDo/MemoryAPI (thanks
> for the nudge).

Thank you!

Allow me one last question -- why (and since when) does each device have
its own separate address space? Is that related to the virtual IOMMU?

Now that I look at the "info mtree" monitor output of a random VM, I see
the following "address-space"s:
- memory
- I/O
- cpu-memory
- bunch of nameless ones, with top level regions called
  "bus master container"
- several named "virtio-pci-cfg-as"
- KVM-SMRAM

I (sort of) understand MemoryRegions and aliases, but:
- I don't know why "memory" and "cpu-memory" exist separately, for example,
- I seem to remember that the "bunch of nameless ones" has not always
been there? (I could be totally wrong, of course.)

... There is one address_space_init() call in "hw/pci/pci.c", and it
comes (most recently) from commit 3716d5902d74 ("pci: introduce a bus
master container", 2017-03-13). The earliest commit that added it seems
to be 817dcc536898 ("pci: give each device its own address space",
2012-10-03). The commit messages do mention IOMMUs.

Thanks!
Laszlo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09 10:56           ` Laszlo Ersek
@ 2017-08-09 11:11             ` Peter Maydell
  2017-08-09 11:15             ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2017-08-09 11:11 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Paolo Bonzini, QEMU Developers, Marcin Juszkiewicz,
	Marcel Apfelbaum, Drew Jones, Christoffer Dall,
	Gema Gomez-Solano, Marc Zygnier

On 9 August 2017 at 11:56, Laszlo Ersek <lersek@redhat.com> wrote:
> Now that I look at the "info mtree" monitor output of a random VM, I see
> the following "address-space"s:
> - memory
> - I/O
> - cpu-memory
> - bunch of nameless ones, with top level regions called
>   "bus master container"
> - several named "virtio-pci-cfg-as"
> - KVM-SMRAM
>
> I (sort of) understand MemoryRegions and aliases, but:
> - I don't know why "memory" and "cpu-memory" exist separately, for example,

"memory" is the "system address space", ie the default overall
view of memory where most devices appear and which gets
used by DMA'ing devices that don't have a proper model of
what their view of the world should be.
"cpu-memory" is specifically the view of the world that
the CPU has. Per-CPU memory mapped devices and devices
that are visible to the CPU but not to random DMA'ing
things can appear here (and not in the 'memory' address space).

Generally "cpu-memory" will be created effectively as
"memory" plus some other stuff.

Some CPU architectures have more than one address space
per CPU -- notably ARM TrustZone-supporting  CPUs
have "cpu-memory" for the NonSecure view and "cpu-secure-memory"
for the Secure view.

The overall aim here is to better model the real world,
where different memory transaction masters can sit on
different buses or in different places in the bus fabric,
and thus have access to different things. Anything that
is a bus master should ideally have its own AddressSpace,
because for QEMU an AddressSpace is the endpoint that
you use to initiate memory transactions. MemoryRegions
on the other hand are what you use for building up the
hierarchy of devices sitting on buses and so on, so
when you're setting up the simulation you pass a device
which is a bus master a MemoryRegion defining "this is
the world you can see", and the device creates an
AddressSpace from it in order to be able to interact
with it.

(We have address_space_init_shareable() to reduce the
proliferation of theoretically distinct but in
practice identical AddressSpaces.)
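
A concrete sketch of that pattern (simplified and hypothetical -- the
device name and plumbing are made up, includes and error handling
omitted):

    typedef struct MyDevState {
        AddressSpace dma_as;
    } MyDevState;

    /* The board passes in the MemoryRegion describing what this bus
     * master can see; the device turns it into an AddressSpace. */
    static void mydev_init_dma(MyDevState *s, MemoryRegion *downstream)
    {
        address_space_init(&s->dma_as, downstream, "mydev-dma");
    }

    /* later, when the device model wants to do DMA:
     *   dma_memory_read(&s->dma_as, addr, buf, len);
     */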

Similarly, I'm not familiar with the PCI code, but
conceptually each PCI device is a bus master and wants
an AddressSpace to do transactions into.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09 10:56           ` Laszlo Ersek
  2017-08-09 11:11             ` Peter Maydell
@ 2017-08-09 11:15             ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2017-08-09 11:15 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel
  Cc: Marcin Juszkiewicz, Marcel Apfelbaum, Drew Jones,
	Christoffer Dall, Peter Maydell, Gema Gomez-Solano, Marc Zygnier

On 09/08/2017 12:56, Laszlo Ersek wrote:
> Allow me one last question -- why (and since when) does each device have
> its own separate address space? Is that related to the virtual IOMMU?

No (though it helps there too).  It's because a device that has
bus-master DMA disabled in the command register cannot see RAM.

So each device has an address space that is just a huge alias for RAM.
The alias is enabled when bus-master DMA is enabled, and disabled when
bus-master DMA is disabled.
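
Condensed sketch of the mechanism (not the verbatim hw/pci/pci.c code;
"dma_root" stands for the root MemoryRegion of the bus's DMA view):

    /* per-device setup: the DMA view is an alias of the bus's address
     * space, and it starts out disabled */
    memory_region_init_alias(&d->bus_master_enable_region, OBJECT(d),
                             "bus master", dma_root, 0,
                             memory_region_size(dma_root));
    memory_region_set_enabled(&d->bus_master_enable_region, false);
    address_space_init(&d->bus_master_as,
                       &d->bus_master_enable_region, "bus master");

    /* on every config write that touches the command register: */
    memory_region_set_enabled(&d->bus_master_enable_region,
                              pci_get_word(d->config + PCI_COMMAND)
                              & PCI_COMMAND_MASTER);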

> Now that I look at the "info mtree" monitor output of a random VM, I see
> the following "address-space"s:
> - memory
> - I/O
> - cpu-memory
> - bunch of nameless ones, with top level regions called

here they are :)

>   "bus master container"
> - several named "virtio-pci-cfg-as"
> - KVM-SMRAM
> 
> I (sort of) understand MemoryRegions and aliases, but:
> - I don't know why "memory" and "cpu-memory" exist separately, for example,

cpu-memory is also mostly a huge alias to memory.  But some
architectures may see some areas slightly differently when accessed from
devices vs. CPUs.

For example, the 0xFEE00000 area on x86 accesses the APIC when written
from CPUs and triggers MSIs when written from devices.  Not the best
example because we don't use memory vs. cpu-memory to model it, but it's
an example.

Another example was SMM on TCG; when entering SMM, SMRAM used to be
shown/hidden in cpu-memory (and not in memory).  Nowadays TCG uses a
completely separate address space, like KVM-SMRAM in your example above,
but you can see that cpu-memory can come in handy. :)

> - I seem to remember that the "bunch of nameless ones" has not always
> been there? (I could be totally wrong, of course.)

It's always been there IIRC, it comes from 817dcc536898.

Paolo

> ... There is one address_space_init() call in "hw/pci/pci.c", and it
> comes (most recently) from commit 3716d5902d74 ("pci: introduce a bus
> master container", 2017-03-13). The earliest commit that added it seems
> to be 817dcc536898 ("pci: give each device its own address space",
> 2012-10-03). The commit messages do mention IOMMUs.
> 
> Thanks!
> Laszlo
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] Qemu and 32 PCIe devices
  2017-08-09  7:26     ` Paolo Bonzini
  2017-08-09 10:00       ` Laszlo Ersek
@ 2017-08-09 17:16       ` Michael S. Tsirkin
  1 sibling, 0 replies; 10+ messages in thread
From: Michael S. Tsirkin @ 2017-08-09 17:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Laszlo Ersek, qemu-devel, Peter Maydell, Drew Jones,
	Christoffer Dall, Marc Zygnier, Gema Gomez-Solano,
	Marcin Juszkiewicz, Marcel Apfelbaum

On Wed, Aug 09, 2017 at 09:26:11AM +0200, Paolo Bonzini wrote:
> On 09/08/2017 03:06, Laszlo Ersek wrote:
> >>   20.14%  qemu-system-x86_64                  [.] render_memory_region
> >>   17.14%  qemu-system-x86_64                  [.] subpage_register
> >>   10.31%  qemu-system-x86_64                  [.] int128_add
> >>    7.86%  qemu-system-x86_64                  [.] addrrange_end
> >>    7.30%  qemu-system-x86_64                  [.] int128_ge
> >>    4.89%  qemu-system-x86_64                  [.] int128_nz
> >>    3.94%  qemu-system-x86_64                  [.] phys_page_compact
> >>    2.73%  qemu-system-x86_64                  [.] phys_map_node_alloc
> 
> Yes, this is the O(n^3) thing.  An optimized build should be faster
> because int128 operations will be inlined and become much more efficient.
> 
> > With this patch, I only tested the "93 devices" case, as the slowdown
> > became visible to the naked eye from the trace messages, as the firmware
> > enabled more and more BARs / command registers (and inversely, the
> > speedup was perceivable when the firmware disabled more and more BARs /
> > command registers).
> 
> This is an interesting observation, and it's expected.  Looking at the
> O(n^3) complexity more in detail you have N operations, where the "i"th
> operates on "i" DMA address spaces, all of which have at least "i"
> memory regions (at least 1 BAR per device).
> 
> So the total cost is sum i=1..N i^2 = N(N+1)(2N+1)/6 = O(n^3).
> Expressing it as a sum shows why it gets slower as time progresses.
> 
> The solution is to note that those "i" address spaces are actually all
> the same, so we can get it down to sum i=1..N i = N(N+1)/2 = O(n^2).
> 
> Thanks,
> 
> Paolo

We'll probably run into more issues with the vIOMMU but I guess we
can look into it later.

Resolving addresses lazily somehow might be interesting. And would
the caching work that went in a while ago, but got disabled since we
couldn't iron out all the small issues, help go in that direction
somehow?

-- 
MST

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-08-09 17:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-08 10:39 [Qemu-devel] Qemu and 32 PCIe devices Marcin Juszkiewicz
2017-08-08 15:51 ` Laszlo Ersek
2017-08-09  1:06   ` Laszlo Ersek
2017-08-09  7:26     ` Paolo Bonzini
2017-08-09 10:00       ` Laszlo Ersek
2017-08-09 10:16         ` Paolo Bonzini
2017-08-09 10:56           ` Laszlo Ersek
2017-08-09 11:11             ` Peter Maydell
2017-08-09 11:15             ` Paolo Bonzini
2017-08-09 17:16       ` Michael S. Tsirkin
