All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support
@ 2021-06-21  9:21 Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-21  9:21 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz, pierre.gondois

Patches + EDK2 binary that I used for testing can be found at [5].

This series aims to add support for PCI Express 1.1. It is based on the
last patch [0] of the reassignable BAR series. The patch was discarded at
the time because there was no easy solution to solve the overlap between
the UART address and kvmtool's PCI I/O region, which made EDK2 and/or a
guest compiled with 64k pages very unhappy [1]. This is not the case
anymore, as the UART has been moved to address 0x1000000 in commit
45b4968e0de1 ("hw/serial: ARM/arm64: Use MMIO at higher addresses").

The series has also been tested with EDK2 built from the patches [6] that
add PCI Express when running under kvmtool. This means that someone will be
able to download an official iso from the debian website and install it in
a kvmtool VM.

The first two patches in the series are small and hopefully straightford
cleanups for stuff that I discovered when playing with kvmtool.

The third patch implements the PCI Express support only for the arm and
arm64 architectures. The reason for that is that I don't know how to do it
for x86, powerpc and mips (and for the last two I don't even have machines
to test it).

The last patch implements a fix for a Realtek RTL8168 NIC, where the Linux
drivers falls back to a device specific method of initialization if the
device is not PCI Express capable (doesn't have the PCI Express
Capability) [2].


Changes in v2
=============

* Gathered Reviewed-by tag, many thanks!

* Renamed #2 "arm/fdt.c: Warn if MMIO device doesn't provide a node
  generator" to "arm/fdt.c: Don't generate the node if generator function
  is NULL" and replaced the warning with a debug message.

* Added the PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1 define when it's not present
  on the system in patch #4.


Testing for v2
==============

In this iteration, the only change that impacts PCI Express support is the
addition of the PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1 define when it's not
present on the system. Because of this, I believe the testing I did for v1
is still valid.

However, for completeness, a did a sanity run on my x86 machine. Also, the
EDK2 version that I used for testing on arm64 was built from a
work-in-progress tree, and in the meantime the patches have landed on the
mailing list [6]. I also ran some tests with EDK2 built from those patches.
Details below.

On a Ryzen 3900x:
-----------------

amd64 architecture and no PCIE support, making sure no regressions are
introduced.

1. Direct kernel boot + Debian 10 disk with SDL, to exercise the emulated
VESA device.  Was able to login using the display manager and
virtio-{net,blk} were working correctly.

On odroid-c4:
-------------

1. Debian 10 disk + EDK2 + --force-pci. The kernel was booted via Debian
grub, and I tried kernels compiled with 4k, 16k and 64k page sizes.

On AMD Seattle:
---------------

1. Using the EDK2 image and the passthrough Realtek RTL8168 NIC as the
network interface, and a vanilla netinstall iso from the debian website [3]
I was able to install debian in a virtual machine. The installation hint
from the testing for v1 still applies.

2. Realtek RTL8168 + EDK2 boot + --force-pci, kernel compiled with 4k and
64k pages (Seattle doesn't support 16k pages).

3. Intel 82574L NIC + EDK2 boot + --force-pci, kernel compiled with 4k and
64k pages.

4. AMD FirePro W2100 VGA + HDMI audio (both assigned to the VM) + EDK2 boot
+ --force-pci, kernel compiled from v5.10 (see testing for v1) with 4k and
64k pages.

5. NVIDIA Quadro P400 VGA + HDMI audio (both assigned to the VM) + EDK2
boot + --force-pci, kernel compiled with 4k and 64k pages (see testing for
v1).


Testing for v1
==============

Warning, wall of text. Unless specified, the guest kernel was built from
tag v5.12.

On a Ryzen 3900x:
-----------------

amd64 architecture and no PCIE support, making sure no regressions are
introduced.

1. Direct kernel boot + Debian 10 disk with SDL, to exercise the emulated
VESA device.  Was able to login using the display manager and
virtio-{net,blk} were working correctly.

2. Direct kernel boot + Debian 10 disk with SDL + Realtek RTL8168 + Intel
82574L PCIE NIC, both assigned to the VM. Assigning an ip address to the
Realtek NIC fails with the message: "No native access to PCI extended
config space, falling back to CSI", which makes sense since kvmtool is
emulating legacy PCI 3.0 for the amd64 architecture. Other than that,
everything works as expected.

On odroid-c4:
-------------

1. Debian 10 disk + upstream EDK2 built from commit 1f515342d8d8
("DynamicTablesPkg: Use AML_NAME_SEG_SIZE define"), **without** --force-pci
(so using virtio-mmio). Kernel compiled with 4k, 16k and 64k pages. This
was done to make sure there are no regressions.

2. Direct kernel boot + Debian 10 disk, with --force-pci. Tried 3 versions
of the kernel, compiled with 4k, 16k and 64k pagesize. Got the warning:
"TCP: enp0s0: Driver has suspect GRO implementation, TCP performance may be
compromised." I suspect it is because of kvmtool legacy version of virtio.
This was further confirmed by running the same kernel with kvmtool built
from master, with and without --force-pci, the warning was still there.

3. Debian 10 disk + a work-in-progress version of EDK2 which enables PCIE
support for kvmtool, with --force-pci. The kernel was booted via Debian
grub, and same as above, I tried with kernels compiled with 4k, 16k and 64k
page sizes.

On AMD Seattle:
---------------

1. Using the EDK2 image and the passthrough Realtek RTL8168 NIC as the
network interface, I was able to use a vanilla netinstall iso from the
debian website [3] and install debian in a virtual machine. Woohoo!

One gotcha during installation: because kvmtool doesn't emulate a SCSI
CD-ROM, you need to manually specify the virtio disk for the installation
iso. At the 'Detect and mount CD-ROM' prompt, choose No when asked to load
CD-ROM drivers from removable media, Yes to manually select a CD-ROM module
and device, none when choosing the CD-ROM module (it's a virtio disk), then
the device file for accessing the CD-ROM is /dev/vda (only if the iso file
is the first --disk kvmtool parameter, otherwise /dev/vdb if it's the
second, and so on).

2. Realtek RTL8168, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages (Seattle doesn't support
16k pages) for both direct kernel boot and EDK2 boot.

3. Intel 82574L NIC, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages for both direct boot and
EDK2 boot.

4. AMD FirePro W2100 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

For this test, I switched the guest kernel to v5.10 because with v5.11 and
v5.12 I was getting this kernel panic caused by a NULL pointer deference:

[..]
[    0.943927] [drm] radeon kernel modesetting enabled.
[    0.945050] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.946313] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.947736] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.949193] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.950151] radeon 0000:00:00.0: Fatal error during GPU init
[    0.950990] [drm] radeon: finishing device.
[    0.951633] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[    0.952936] Mem abort info:
[    0.953369]   ESR = 0x96000004
[    0.953838]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.954635]   SET = 0, FnV = 0
[    0.955100]   EA = 0, S1PTW = 0
[    0.955590] Data abort info:
[    0.956033]   ISV = 0, ISS = 0x00000004
[    0.956608]   CM = 0, WnR = 0
[    0.957099] [0000000000000020] user address but active_mm is swapper
[    0.958051] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    0.958881] Modules linked in:
[    0.959356] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0 #13
[    0.960268] Hardware name: linux,dummy-virt (DT)
[    0.960970] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[    0.962013] pc : ttm_resource_manager_evict_all+0x64/0x1f0
[    0.962972] lr : ttm_resource_manager_evict_all+0x5c/0x1f0
[    0.963931] sp : ffff80001212ba00
[    0.964517] x29: ffff80001212ba00 x28: 0000000000000000 
[    0.965448] x27: ffff8000118004e0 x26: ffff8000120cd000 
[    0.966371] x25: 0000000000000000 x24: ffff000080c946e8 
[    0.967296] x23: 0000000000000020 x22: 0000000000000000 
[    0.968227] x21: 0000000000000000 x20: ffff8000120cdb90 
[    0.969152] x19: ffff000080c94000 x18: ffffffffffffffff 
[    0.970076] x17: 0000000000000000 x16: 0000000000000001 
[    0.970999] x15: ffff80009212b787 x14: 0000000000000006 
[    0.971928] x13: ffff800011de2368 x12: 0000000000000264 
[    0.972852] x11: 00000000000000cc x10: ffff800011de2368 
[    0.973780] x9 : ffff800011de2368 x8 : 00000000ffffefff 
[    0.974701] x7 : ffff800011e3a368 x6 : ffff800011e3a368 
[    0.975637] x5 : 0000000000000000 x4 : 0000000000000000 
[    0.976559] x3 : ffff8000120cdb90 x2 : 0000000000000001 
[    0.977483] x1 : 0000000000000000 x0 : 0000000000000000 
[    0.978410] Call trace:
[    0.978851]  ttm_resource_manager_evict_all+0x64/0x1f0
[    0.979759]  radeon_bo_evict_vram+0x1c/0x30
[    0.980494]  radeon_device_fini+0x34/0xe8
[    0.981209]  radeon_driver_unload_kms+0x48/0x90
[    0.982000]  radeon_driver_load_kms+0x124/0x174
[    0.982792]  drm_dev_register+0xe0/0x210
[    0.983486]  radeon_pci_probe+0x120/0x1bc
[    0.984180]  local_pci_probe+0x40/0xac
[    0.984843]  pci_device_probe+0x114/0x1b0
[    0.985548]  really_probe+0xe4/0x4c0
[    0.986181]  driver_probe_device+0x58/0xc0
[    0.986902]  device_driver_attach+0xc0/0xcc
[    0.987642]  __driver_attach+0x84/0x124
[    0.988317]  bus_for_each_dev+0x70/0xd0
[    0.988996]  driver_attach+0x24/0x30
[    0.989627]  bus_add_driver+0x104/0x1ec
[    0.990300]  driver_register+0x78/0x130
[    0.990974]  __pci_register_driver+0x48/0x54
[    0.991730]  radeon_module_init+0x54/0x64
[    0.992438]  do_one_initcall+0x50/0x1b0
[    0.993115]  kernel_init_freeable+0x1d4/0x23c
[    0.993880]  kernel_init+0x14/0x118
[    0.994496]  ret_from_fork+0x10/0x34
[    0.995132] Code: f90033ff 9420650e d37c7f36 8b1602b6 (f94012c0) 
[    0.996201] ---[ end trace 88eed6171e8cb9bc ]---
[    0.997011] note: swapper/0[1] exited with preempt_count 1
[    0.997840] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.998984] SMP: stopping secondary CPUs
[    0.999605] Kernel Offset: disabled
[    1.000137] CPU features: 0x00240022,61006082
[    1.000793] Memory Limit: none
[    1.001330] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

This is how dmesg looks like with v5.10, v5.8 and v5.6:

[..]
[    0.972061] [drm] radeon kernel modesetting enabled.
[    0.973162] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.974426] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.976037] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.977435] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.978381] radeon 0000:00:00.0: Fatal error during GPU init
[    0.979341] [drm] radeon: finishing device.
[    0.979963] [TTM] Memory type 2 has not been initialized
[    0.988250] radeon: probe of 0000:00:00.0 failed with error -22
[    0.989282] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    0.993326] loop: module loaded
[..]

In my opinion, this is an upstream bug caused by incorrect clean up when
probing fails. I plan to see if I can reproduce it on my x86 machine (to
make it easier to other people to reproduce it) and then report it
upstream.

Note that I used the radeon driver instead of amdgpu because this is the
recommended driver [4] for the GCN1 architecture.

5. NVIDIA Quadro P400 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

Nouveau seems to work as expected (it binds to the GPU). but during driver
initialization it looks like the system hangs for 30s-1m. My guess is that
something times out in the driver due to missing emulation in kvmtool:

[..]
[    0.335506] [drm] radeon kernel modesetting enabled.
[    0.336369] nouveau 0000:00:00.0: enabling device (0000 -> 0003)
[    0.359468] nouveau 0000:00:00.0: NVIDIA GP107 (137000a1)
[    0.505066] nouveau 0000:00:00.0: bios: version 86.07.6b.00.01
              <---- hangs here
[  123.867379] nouveau 0000:00:00.0: acr: firmware unavailable
[  123.868337] nouveau 0000:00:00.0: pmu: firmware unavailable
[  123.869488] nouveau 0000:00:00.0: gr: firmware unavailable
[  123.870506] nouveau 0000:00:00.0: sec2: firmware unavailable
[  123.928149] nouveau 0000:00:00.0: fb: 2048 MiB GDDR5
[  123.963159] [TTM] Zone  kernel: Available graphics memory: 8313888 KiB
[  123.964823] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  123.966172] nouveau 0000:00:00.0: DRM: VRAM: 2048 MiB
[  123.967101] nouveau 0000:00:00.0: DRM: GART: 536870912 MiB
[  123.968258] nouveau 0000:00:00.0: DRM: BIT table 'A' not found
[  123.969403] nouveau 0000:00:00.0: DRM: BIT table 'L' not found
[  123.970498] nouveau 0000:00:00.0: DRM: TMDS table version 2.0
[  123.971688] nouveau 0000:00:00.0: DRM: DCB version 4.1
[  123.972639] nouveau 0000:00:00.0: DRM: DCB outp 00: 01800f56 04600020
[  123.973820] nouveau 0000:00:00.0: DRM: DCB outp 01: 01000f52 04620020
[  123.975083] nouveau 0000:00:00.0: DRM: DCB outp 02: 01811f46 04600010
[  123.976500] nouveau 0000:00:00.0: DRM: DCB outp 03: 01011f42 04620010
[  123.977681] nouveau 0000:00:00.0: DRM: DCB outp 04: 02822f76 04600020
[  123.978955] nouveau 0000:00:00.0: DRM: DCB outp 05: 02022f72 00020020
[  123.980309] nouveau 0000:00:00.0: DRM: DCB conn 00: 00002046
[  123.981352] nouveau 0000:00:00.0: DRM: DCB conn 01: 00001146
[  123.982379] nouveau 0000:00:00.0: DRM: DCB conn 02: 00020246
[  123.984507] nouveau 0000:00:00.0: DRM: failed to create kernel channel, -22
[  123.986661] nouveau 0000:00:00.0: DRM: MM: using COPY for buffer copies
[  124.291297] nouveau 0000:00:00.0: [drm] Cannot find any crtc or sizes
[  124.292839] [drm] Initialized nouveau 1.3.1 20120801 for 0000:00:00.0 on minor 0
[..]

6. Crucial MX500 SSD connected to a generic PCIE to sata adapter assigned
to the VM, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, 4k and 64k pages kernel for both direct kernel and UEFI boot.

This was weird. On the host, the PCIE adapter worked just fine with kernel
v5.8, but on v5.12 the host was not able to initialize it:

[    2.891697] ata2: SATA link down (SStatus 0 SControl 300)
[    3.211695] ata3: SATA link down (SStatus 0 SControl 300)
[    3.531699] ata4: SATA link down (SStatus 0 SControl 300)
[    3.851694] ata5: SATA link down (SStatus 0 SControl 300)
[    4.141559] ata9: SATA link down (SStatus 0 SControl 0)
[    4.171691] ata6: SATA link down (SStatus 0 SControl 300)
[    4.491695] ata7: SATA link down (SStatus 0 SControl 300)
[    4.811693] ata8: SATA link down (SStatus 0 SControl 300)
[    6.973559] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[    6.983615] ata10: softreset failed (SRST command error)
[    6.989992] ata10: reset failed (errno=-5), retrying in 8 secs
[   17.173560] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   17.183618] ata10: softreset failed (SRST command error)
[   17.189990] ata10: reset failed (errno=-5), retrying in 8 secs
[   27.413557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   27.423615] ata10: softreset failed (SRST command error)
[   27.429986] ata10: reset failed (errno=-5), retrying in 33 secs
[   60.837548] ata10: limiting SATA link speed to 1.5 Gbps
[   63.001557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   63.011615] ata10: softreset failed (SRST command error)
[   63.017988] ata10: reset failed, giving up

Assigning it to a VM worked though after the host running Linux v5.8
unitializes the adapter, so I'm going to consider this a pass. After a few
more tests, I was able to trigger the same error on v5.8. On v5.12
initialization has failed every time (so far, at least).

[0] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/T/#m835c93ef1dc7c539b4cdda85aee23210d494ea49
[1] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/
[2] https://www.spinics.net/lists/kvm/msg245607.html
[3] https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/debian-10.9.0-arm64-netinst.iso
[4] https://wiki.archlinux.org/title/Xorg#AMD
[5] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pci-express-v2-edk2-binary
[6] https://edk2.groups.io/g/devel/message/76522?p=,,,20,0,0,0::Created,,armvirtpkg,20,2,0,83558261

Alexandru Elisei (4):
  Move fdt_irq_fn typedef to fdt.h
  arm/fdt.c: Don't generate the node if generator function is NULL
  arm/arm64: Add PCI Express 1.1 support
  arm/arm64: vfio: Add PCI Express Capability Structure

 arm/fdt.c                         |  7 ++-
 arm/include/arm-common/kvm-arch.h |  4 +-
 arm/pci.c                         |  2 +-
 hw/rtc.c                          |  1 +
 include/kvm/fdt.h                 |  2 +
 include/kvm/kvm.h                 |  1 -
 include/kvm/pci.h                 | 75 ++++++++++++++++++++++++++++---
 pci.c                             |  5 ++-
 vfio/pci.c                        | 44 ++++++++++++++----
 9 files changed, 121 insertions(+), 20 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h
  2021-06-21  9:21 [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
@ 2021-06-21  9:21 ` Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL Alexandru Elisei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-21  9:21 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz, pierre.gondois

The device tree code passes the function generate_irq_prop() to MMIO
devices to create the "interrupts" property. The typedef fdt_irq_fn is the
type used to pass the function to the device. It makes more sense for the
typedef to be in fdt.h with the rest of the device tree functions, so move
it there.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/rtc.c          | 1 +
 include/kvm/fdt.h | 2 ++
 include/kvm/kvm.h | 1 -
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/rtc.c b/hw/rtc.c
index aec31c52a85a..9b8785a869dd 100644
--- a/hw/rtc.c
+++ b/hw/rtc.c
@@ -1,5 +1,6 @@
 #include "kvm/rtc.h"
 
+#include "kvm/fdt.h"
 #include "kvm/ioport.h"
 #include "kvm/kvm.h"
 
diff --git a/include/kvm/fdt.h b/include/kvm/fdt.h
index 4e6157256482..060c37b947cc 100644
--- a/include/kvm/fdt.h
+++ b/include/kvm/fdt.h
@@ -25,6 +25,8 @@ enum irq_type {
 	IRQ_TYPE_LEVEL_MASK	= (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH),
 };
 
+typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
+
 extern char *fdt_stdout_path;
 
 /* Helper for the various bits of code that generate FDT nodes */
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 6c28afa3f0bb..56e9c8e347a0 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -44,7 +44,6 @@
 struct kvm_cpu;
 typedef void (*mmio_handler_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 				u32 len, u8 is_write, void *ptr);
-typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
 
 enum {
 	KVM_VMSTATE_RUNNING,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL
  2021-06-21  9:21 [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
@ 2021-06-21  9:21 ` Alexandru Elisei
  2021-06-21 14:03   ` Andre Przywara
  2021-06-21  9:21 ` [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
  3 siblings, 1 reply; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-21  9:21 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz, pierre.gondois

Print a more helpful debugging message when a MMIO device hasn't set a
function to generate an FDT node instead of causing a segmentation fault by
dereferencing a NULL pointer.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 02091e9e0bee..7032985e99a3 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
 	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
 	while (dev_hdr) {
 		generate_mmio_fdt_nodes = dev_hdr->data;
-		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
+		if (generate_mmio_fdt_nodes) {
+			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
+		} else {
+			pr_debug("Missing FDT node generator for MMIO device %d",
+				 dev_hdr->dev_num);
+		}
 		dev_hdr = device__next_dev(dev_hdr);
 	}
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-21  9:21 [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
  2021-06-21  9:21 ` [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL Alexandru Elisei
@ 2021-06-21  9:21 ` Alexandru Elisei
  2021-06-21 14:04   ` Andre Przywara
  2021-06-21  9:21 ` [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
  3 siblings, 1 reply; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-21  9:21 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz, pierre.gondois

PCI Express comes with an extended addressing scheme, which directly
translated into a bigger device configuration space (256->4096 bytes)
and bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management [1] and PCI Express capability [2]).

However, our virtio PCI implementation implements version 0.9 of the
protocol and it still uses transitional PCI device ID's, so we have
opted to omit the mandatory PCI Express capabilities. For VFIO, the power
management and PCI Express capability are left for a subsequent patch.

[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/kvm-arch.h |  4 ++-
 arm/pci.c                         |  2 +-
 include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
 pci.c                             |  5 +--
 vfio/pci.c                        | 26 ++++++++++------
 5 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index 436b67b843fc..c645ac001bca 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -49,7 +49,7 @@
 
 
 #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
-#define ARM_PCI_CFG_SIZE	(1ULL << 24)
+#define ARM_PCI_CFG_SIZE	(1ULL << 28)
 #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
 #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
 				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
@@ -77,6 +77,8 @@
 
 #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
 
+#define ARCH_HAS_PCI_EXP	1
+
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
 	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
diff --git a/arm/pci.c b/arm/pci.c
index ed325fa4a811..2251f627d8b5 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
-	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
+	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
 	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
 
 	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index bf81323d83b7..42d9e1c5645f 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -10,6 +10,7 @@
 #include "kvm/devices.h"
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
+#include "kvm/kvm-arch.h"
 
 #define pci_dev_err(pci_hdr, fmt, ...) \
 	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
@@ -32,10 +33,49 @@
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
 #define PCI_IOPORT_START	0x6200
-#define PCI_CFG_SIZE		(1ULL << 24)
 
 struct kvm;
 
+/*
+ * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
+ * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
+ */
+#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
+#define PCI_DEV_CFG_SIZE_LEGACY		256
+#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
+#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
+
+#ifdef ARCH_HAS_PCI_EXP
+#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
+#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
+
+union pci_config_address {
+	struct {
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	enable_bit	: 1;		/* 31       */
+#else
+		unsigned	enable_bit	: 1;		/* 31       */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+#endif
+	};
+	u32 w;
+};
+
+#else
+#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
+#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
+
 union pci_config_address {
 	struct {
 #if __BYTE_ORDER == __LITTLE_ENDIAN
@@ -58,6 +98,9 @@ union pci_config_address {
 	};
 	u32 w;
 };
+#endif /* ARCH_HAS_PCI_EXP */
+
+#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct msix_table {
 	struct msi_msg msg;
@@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
 				   int bar_num, void *data);
 
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
-#define PCI_DEV_CFG_SIZE	256
-#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		      u8 offset, void *data, int sz);
+		      u16 offset, void *data, int sz);
 	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		     u8 offset, void *data, int sz);
+		     u16 offset, void *data, int sz);
 };
 
 struct pci_device_header {
diff --git a/pci.c b/pci.c
index d6da79e0a56a..e593033164c1 100644
--- a/pci.c
+++ b/pci.c
@@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
-	u8 bar, offset;
+	u8 bar;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
@@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 offset;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 49ecd12a38cd..6a4204634e71 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -313,7 +313,7 @@ out_unlock:
 }
 
 static void vfio_pci_msix_cap_write(struct kvm *kvm,
-				    struct vfio_device *vdev, u8 off,
+				    struct vfio_device *vdev, u16 off,
 				    void *data, int sz)
 {
 	struct vfio_pci_device *pdev = &vdev->pci;
@@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
 }
 
 static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
-				     u8 off, u8 *data, u32 sz)
+				     u16 off, u8 *data, u32 sz)
 {
 	size_t i;
 	u32 mask = 0;
@@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
 }
 
 static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
-				   u8 off, u8 *data, u32 sz)
+				   u16 off, u8 *data, u32 sz)
 {
 	u8 ctrl;
 	struct msi_msg msg;
@@ -553,7 +553,7 @@ out:
 }
 
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			      u8 offset, void *data, int sz)
+			      u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
 }
 
 static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			       u8 offset, void *data, int sz)
+			       u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 {
 	int ret;
 	size_t size;
-	u8 pos, next;
+	u16 pos, next;
 	struct pci_cap_hdr *cap;
-	u8 virt_hdr[PCI_DEV_CFG_SIZE];
+	u8 *virt_hdr;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
 	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
 		return 0;
 
-	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
+	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
+	if (!virt_hdr)
+		return -ENOMEM;
 
 	pos = pdev->hdr.capabilities & ~3;
 
@@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
 	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
 
+	free(virt_hdr);
+
 	return 0;
 }
 
@@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 
 	/* Install our fake Configuration Space */
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-	hdr_sz = PCI_DEV_CFG_SIZE;
+	/*
+	 * We don't touch the extended configuration space, let's be cautious
+	 * and not overwrite it all with zeros, or bad things might happen.
+	 */
+	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
 	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
 		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
 			     hdr_sz);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure
  2021-06-21  9:21 [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2021-06-21  9:21 ` [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2021-06-21  9:21 ` Alexandru Elisei
  2021-06-21 14:04   ` Andre Przywara
  3 siblings, 1 reply; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-21  9:21 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz, pierre.gondois

It turns out that some Linux drivers (like Realtek R8169) fall back to a
device-specific configuration method if the device is not PCI Express
capable:

[    1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI

Add the PCI Express Capability Structure and populate it for assigned
devices, as this is how the Linux PCI driver determines if a device is PCI
Express capable.

Because we don't emulate a PCI Express link, a root complex or any slot
related properties, the PCI Express capability is kept as small as possible
by ignoring those fields.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h | 24 ++++++++++++++++++++++++
 vfio/pci.c        | 18 ++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 42d9e1c5645f..0f2d5bbabdc3 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -46,6 +46,8 @@ struct kvm;
 #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
 
 #ifdef ARCH_HAS_PCI_EXP
+#define arch_has_pci_exp()	(true)
+
 #define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
 #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
 
@@ -73,6 +75,8 @@ union pci_config_address {
 };
 
 #else
+#define arch_has_pci_exp()	(false)
+
 #define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
 #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
 
@@ -143,6 +147,24 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_exp_cap {
+	u8 cap;
+	u8 next;
+	u16 cap_reg;
+	u32 dev_cap;
+	u16 dev_ctrl;
+	u16 dev_status;
+	u32 link_cap;
+	u16 link_ctrl;
+	u16 link_status;
+	u32 slot_cap;
+	u16 slot_ctrl;
+	u16 slot_status;
+	u16 root_ctrl;
+	u16 root_cap;
+	u32 root_status;
+};
+
 struct pci_device_header;
 
 typedef int (*bar_activate_fn_t)(struct kvm *kvm,
@@ -188,6 +210,8 @@ struct pci_device_header {
 			u8		min_gnt;
 			u8		max_lat;
 			struct msix_cap msix;
+			/* Used only by architectures which support PCIE */
+			struct pci_exp_cap pci_exp;
 		} __attribute__((packed));
 		/* Pad to PCI config space size */
 		u8	__pad[PCI_DEV_CFG_SIZE];
diff --git a/vfio/pci.c b/vfio/pci.c
index 6a4204634e71..14d578a8f2eb 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -12,6 +12,11 @@
 
 #include <assert.h>
 
+/* Some distros don't have the define. */
+#ifndef PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1
+#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12
+#endif
+
 /* Wrapper around UAPI vfio_irq_set */
 union vfio_irq_eventfd {
 	struct vfio_irq_set	irq;
@@ -623,6 +628,12 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
 		return PCI_CAP_MSIX_SIZEOF;
 	case PCI_CAP_ID_MSI:
 		return vfio_pci_msi_cap_size((void *)cap_hdr);
+	case PCI_CAP_ID_EXP:
+		/*
+		 * We don't emulate any of the link, slot and root complex
+		 * properties, so ignore them.
+		 */
+		return PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1;
 	default:
 		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
 		return 0;
@@ -696,6 +707,13 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 			pdev->msi.pos = pos;
 			pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
 			break;
+		case PCI_CAP_ID_EXP:
+			if (!arch_has_pci_exp())
+				continue;
+			ret = vfio_pci_add_cap(vdev, virt_hdr, cap, pos);
+			if (ret)
+				return ret;
+			break;
 		}
 	}
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL
  2021-06-21  9:21 ` [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL Alexandru Elisei
@ 2021-06-21 14:03   ` Andre Przywara
  0 siblings, 0 replies; 11+ messages in thread
From: Andre Przywara @ 2021-06-21 14:03 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

On Mon, 21 Jun 2021 10:21:26 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Print a more helpful debugging message when a MMIO device hasn't set a
> function to generate an FDT node instead of causing a segmentation fault by
> dereferencing a NULL pointer.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks for the change.
One nit below, but anyway:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

> ---
>  arm/fdt.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/fdt.c b/arm/fdt.c
> index 02091e9e0bee..7032985e99a3 100644
> --- a/arm/fdt.c
> +++ b/arm/fdt.c
> @@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
>  	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
>  	while (dev_hdr) {
>  		generate_mmio_fdt_nodes = dev_hdr->data;
> -		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> +		if (generate_mmio_fdt_nodes) {
> +			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> +		} else {
> +			pr_debug("Missing FDT node generator for MMIO device %d",
> +				 dev_hdr->dev_num);
> +		}

I think coding style says you don't need the brackets here.

Cheers,
Andre

>  		dev_hdr = device__next_dev(dev_hdr);
>  	}
>  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-21  9:21 ` [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2021-06-21 14:04   ` Andre Przywara
  2021-06-23  9:32     ` Alexandru Elisei
  0 siblings, 1 reply; 11+ messages in thread
From: Andre Przywara @ 2021-06-21 14:04 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

On Mon, 21 Jun 2021 10:21:27 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> PCI Express comes with an extended addressing scheme, which directly
> translated into a bigger device configuration space (256->4096 bytes)
> and bigger PCI configuration space (16->256 MB), as well as mandatory
> capabilities (power management [1] and PCI Express capability [2]).
> 
> However, our virtio PCI implementation implements version 0.9 of the
> protocol and it still uses transitional PCI device ID's, so we have
> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> management and PCI Express capability are left for a subsequent patch.
> 
> [1] PCI Express Base Specification Revision 1.1, section 7.6
> [2] PCI Express Base Specification Revision 1.1, section 7.8
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Sorry, one thing I missed on the first review:

> ---
>  arm/include/arm-common/kvm-arch.h |  4 ++-
>  arm/pci.c                         |  2 +-
>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>  pci.c                             |  5 +--
>  vfio/pci.c                        | 26 ++++++++++------
>  5 files changed, 70 insertions(+), 18 deletions(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index 436b67b843fc..c645ac001bca 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -49,7 +49,7 @@
>  
>  
>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
> @@ -77,6 +77,8 @@
>  
>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>  
> +#define ARCH_HAS_PCI_EXP	1
> +
>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>  {
>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> diff --git a/arm/pci.c b/arm/pci.c
> index ed325fa4a811..2251f627d8b5 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>  
>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index bf81323d83b7..42d9e1c5645f 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -10,6 +10,7 @@
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
> +#include "kvm/kvm-arch.h"
>  
>  #define pci_dev_err(pci_hdr, fmt, ...) \
>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> @@ -32,10 +33,49 @@
>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>  #define PCI_IO_SIZE		0x100
>  #define PCI_IOPORT_START	0x6200
> -#define PCI_CFG_SIZE		(1ULL << 24)
>  
>  struct kvm;
>  
> +/*
> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
> + */
> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
> +#define PCI_DEV_CFG_SIZE_LEGACY		256
> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
> +
> +#ifdef ARCH_HAS_PCI_EXP
> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
> +
> +union pci_config_address {
> +	struct {
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	enable_bit	: 1;		/* 31       */
> +#else
> +		unsigned	enable_bit	: 1;		/* 31       */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +#endif
> +	};
> +	u32 w;
> +};
> +
> +#else
> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
> +
>  union pci_config_address {
>  	struct {
>  #if __BYTE_ORDER == __LITTLE_ENDIAN
> @@ -58,6 +98,9 @@ union pci_config_address {
>  	};
>  	u32 w;
>  };
> +#endif /* ARCH_HAS_PCI_EXP */
> +
> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct msix_table {
>  	struct msi_msg msg;
> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>  				   int bar_num, void *data);
>  
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> -#define PCI_DEV_CFG_SIZE	256
> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		      u8 offset, void *data, int sz);
> +		      u16 offset, void *data, int sz);
>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		     u8 offset, void *data, int sz);
> +		     u16 offset, void *data, int sz);
>  };
>  
>  struct pci_device_header {
> diff --git a/pci.c b/pci.c
> index d6da79e0a56a..e593033164c1 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> -	u8 bar, offset;
> +	u8 bar;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
> -	u8 offset;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 49ecd12a38cd..6a4204634e71 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -313,7 +313,7 @@ out_unlock:
>  }
>  
>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> -				    struct vfio_device *vdev, u8 off,
> +				    struct vfio_device *vdev, u16 off,
>  				    void *data, int sz)
>  {
>  	struct vfio_pci_device *pdev = &vdev->pci;
> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>  }
>  
>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> -				     u8 off, u8 *data, u32 sz)
> +				     u16 off, u8 *data, u32 sz)
>  {
>  	size_t i;
>  	u32 mask = 0;
> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>  }
>  
>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> -				   u8 off, u8 *data, u32 sz)
> +				   u16 off, u8 *data, u32 sz)
>  {
>  	u8 ctrl;
>  	struct msi_msg msg;
> @@ -553,7 +553,7 @@ out:
>  }
>  
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			      u8 offset, void *data, int sz)
> +			      u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>  }
>  
>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			       u8 offset, void *data, int sz)
> +			       u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
>  	int ret;
>  	size_t size;
> -	u8 pos, next;
> +	u16 pos, next;
>  	struct pci_cap_hdr *cap;
> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> +	u8 *virt_hdr;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> +	if (!virt_hdr)
> +		return -ENOMEM;

This will leak if the function returns with an error, due to
vfio_pci_add_cap() calls failing.
The easiest way out is probably initialising ret = 0 and using "goto
out;", I guess.

Cheers,
Andre



>  
>  	pos = pdev->hdr.capabilities & ~3;
>  
> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>  
> +	free(virt_hdr);
> +
>  	return 0;
>  }
>  
> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  
>  	/* Install our fake Configuration Space */
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> -	hdr_sz = PCI_DEV_CFG_SIZE;
> +	/*
> +	 * We don't touch the extended configuration space, let's be cautious
> +	 * and not overwrite it all with zeros, or bad things might happen.
> +	 */
> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>  			     hdr_sz);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure
  2021-06-21  9:21 ` [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
@ 2021-06-21 14:04   ` Andre Przywara
  0 siblings, 0 replies; 11+ messages in thread
From: Andre Przywara @ 2021-06-21 14:04 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

On Mon, 21 Jun 2021 10:21:28 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> It turns out that some Linux drivers (like Realtek R8169) fall back to a
> device-specific configuration method if the device is not PCI Express
> capable:
> 
> [    1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI
> 
> Add the PCI Express Capability Structure and populate it for assigned
> devices, as this is how the Linux PCI driver determines if a device is PCI
> Express capable.
> 
> Because we don't emulate a PCI Express link, a root complex or any slot
> related properties, the PCI Express capability is kept as small as possible
> by ignoring those fields.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

OK, took a while to wrap my head around how this capability limit works,
but it seems correct.
Thanks for fixing compilation on my system :-)

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks,
Andre

> ---
>  include/kvm/pci.h | 24 ++++++++++++++++++++++++
>  vfio/pci.c        | 18 ++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 42d9e1c5645f..0f2d5bbabdc3 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -46,6 +46,8 @@ struct kvm;
>  #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
>  
>  #ifdef ARCH_HAS_PCI_EXP
> +#define arch_has_pci_exp()	(true)
> +
>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>  
> @@ -73,6 +75,8 @@ union pci_config_address {
>  };
>  
>  #else
> +#define arch_has_pci_exp()	(false)
> +
>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>  
> @@ -143,6 +147,24 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_exp_cap {
> +	u8 cap;
> +	u8 next;
> +	u16 cap_reg;
> +	u32 dev_cap;
> +	u16 dev_ctrl;
> +	u16 dev_status;
> +	u32 link_cap;
> +	u16 link_ctrl;
> +	u16 link_status;
> +	u32 slot_cap;
> +	u16 slot_ctrl;
> +	u16 slot_status;
> +	u16 root_ctrl;
> +	u16 root_cap;
> +	u32 root_status;
> +};
> +
>  struct pci_device_header;
>  
>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> @@ -188,6 +210,8 @@ struct pci_device_header {
>  			u8		min_gnt;
>  			u8		max_lat;
>  			struct msix_cap msix;
> +			/* Used only by architectures which support PCIE */
> +			struct pci_exp_cap pci_exp;
>  		} __attribute__((packed));
>  		/* Pad to PCI config space size */
>  		u8	__pad[PCI_DEV_CFG_SIZE];
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 6a4204634e71..14d578a8f2eb 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -12,6 +12,11 @@
>  
>  #include <assert.h>
>  
> +/* Some distros don't have the define. */
> +#ifndef PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1
> +#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12
> +#endif
> +
>  /* Wrapper around UAPI vfio_irq_set */
>  union vfio_irq_eventfd {
>  	struct vfio_irq_set	irq;
> @@ -623,6 +628,12 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
>  		return PCI_CAP_MSIX_SIZEOF;
>  	case PCI_CAP_ID_MSI:
>  		return vfio_pci_msi_cap_size((void *)cap_hdr);
> +	case PCI_CAP_ID_EXP:
> +		/*
> +		 * We don't emulate any of the link, slot and root complex
> +		 * properties, so ignore them.
> +		 */
> +		return PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1;
>  	default:
>  		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
>  		return 0;
> @@ -696,6 +707,13 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  			pdev->msi.pos = pos;
>  			pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
>  			break;
> +		case PCI_CAP_ID_EXP:
> +			if (!arch_has_pci_exp())
> +				continue;
> +			ret = vfio_pci_add_cap(vdev, virt_hdr, cap, pos);
> +			if (ret)
> +				return ret;
> +			break;
>  		}
>  	}
>  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-21 14:04   ` Andre Przywara
@ 2021-06-23  9:32     ` Alexandru Elisei
  2021-06-23 10:06       ` Andre Przywara
  0 siblings, 1 reply; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-23  9:32 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

Hi Andre,

On 6/21/21 3:04 PM, Andre Przywara wrote:
> On Mon, 21 Jun 2021 10:21:27 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> PCI Express comes with an extended addressing scheme, which directly
>> translated into a bigger device configuration space (256->4096 bytes)
>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>> capabilities (power management [1] and PCI Express capability [2]).
>>
>> However, our virtio PCI implementation implements version 0.9 of the
>> protocol and it still uses transitional PCI device ID's, so we have
>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>> management and PCI Express capability are left for a subsequent patch.
>>
>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Sorry, one thing I missed on the first review:
>
>> ---
>>  arm/include/arm-common/kvm-arch.h |  4 ++-
>>  arm/pci.c                         |  2 +-
>>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>>  pci.c                             |  5 +--
>>  vfio/pci.c                        | 26 ++++++++++------
>>  5 files changed, 70 insertions(+), 18 deletions(-)
>>
>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>> index 436b67b843fc..c645ac001bca 100644
>> --- a/arm/include/arm-common/kvm-arch.h
>> +++ b/arm/include/arm-common/kvm-arch.h
>> @@ -49,7 +49,7 @@
>>  
>>  
>>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>> @@ -77,6 +77,8 @@
>>  
>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>  
>> +#define ARCH_HAS_PCI_EXP	1
>> +
>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>  {
>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>> diff --git a/arm/pci.c b/arm/pci.c
>> index ed325fa4a811..2251f627d8b5 100644
>> --- a/arm/pci.c
>> +++ b/arm/pci.c
>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>  
>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index bf81323d83b7..42d9e1c5645f 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -10,6 +10,7 @@
>>  #include "kvm/devices.h"
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>> +#include "kvm/kvm-arch.h"
>>  
>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> @@ -32,10 +33,49 @@
>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>  #define PCI_IO_SIZE		0x100
>>  #define PCI_IOPORT_START	0x6200
>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>  
>>  struct kvm;
>>  
>> +/*
>> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
>> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
>> + */
>> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
>> +#define PCI_DEV_CFG_SIZE_LEGACY		256
>> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
>> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
>> +
>> +#ifdef ARCH_HAS_PCI_EXP
>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>> +
>> +union pci_config_address {
>> +	struct {
>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +#else
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +#endif
>> +	};
>> +	u32 w;
>> +};
>> +
>> +#else
>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>> +
>>  union pci_config_address {
>>  	struct {
>>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>> @@ -58,6 +98,9 @@ union pci_config_address {
>>  	};
>>  	u32 w;
>>  };
>> +#endif /* ARCH_HAS_PCI_EXP */
>> +
>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct msix_table {
>>  	struct msi_msg msg;
>> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>  				   int bar_num, void *data);
>>  
>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>> -#define PCI_DEV_CFG_SIZE	256
>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct pci_config_operations {
>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		      u8 offset, void *data, int sz);
>> +		      u16 offset, void *data, int sz);
>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		     u8 offset, void *data, int sz);
>> +		     u16 offset, void *data, int sz);
>>  };
>>  
>>  struct pci_device_header {
>> diff --git a/pci.c b/pci.c
>> index d6da79e0a56a..e593033164c1 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> -	u8 bar, offset;
>> +	u8 bar;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>> -	u8 offset;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 49ecd12a38cd..6a4204634e71 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -313,7 +313,7 @@ out_unlock:
>>  }
>>  
>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>> -				    struct vfio_device *vdev, u8 off,
>> +				    struct vfio_device *vdev, u16 off,
>>  				    void *data, int sz)
>>  {
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>  }
>>  
>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				     u8 off, u8 *data, u32 sz)
>> +				     u16 off, u8 *data, u32 sz)
>>  {
>>  	size_t i;
>>  	u32 mask = 0;
>> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>  }
>>  
>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				   u8 off, u8 *data, u32 sz)
>> +				   u16 off, u8 *data, u32 sz)
>>  {
>>  	u8 ctrl;
>>  	struct msi_msg msg;
>> @@ -553,7 +553,7 @@ out:
>>  }
>>  
>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			      u8 offset, void *data, int sz)
>> +			      u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  }
>>  
>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			       u8 offset, void *data, int sz)
>> +			       u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  {
>>  	int ret;
>>  	size_t size;
>> -	u8 pos, next;
>> +	u16 pos, next;
>>  	struct pci_cap_hdr *cap;
>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>> +	u8 *virt_hdr;
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>  
>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>  		return 0;
>>  
>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>> +	if (!virt_hdr)
>> +		return -ENOMEM;
> This will leak if the function returns with an error, due to
> vfio_pci_add_cap() calls failing.
> The easiest way out is probably initialising ret = 0 and using "goto
> out;", I guess.

I don't understand what you mean, can you elaborate?

From what I can tell, the function doesn't do any assignment before this point, so
I don't know what will leak. vfio_pci_add_cap() calls are skipped entirely if the
function returns early here, so I don't see how they can fail.

Thanks,

Alex

>
> Cheers,
> Andre
>
>
>
>>  
>>  	pos = pdev->hdr.capabilities & ~3;
>>  
>> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>>  
>> +	free(virt_hdr);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>  
>>  	/* Install our fake Configuration Space */
>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>> -	hdr_sz = PCI_DEV_CFG_SIZE;
>> +	/*
>> +	 * We don't touch the extended configuration space, let's be cautious
>> +	 * and not overwrite it all with zeros, or bad things might happen.
>> +	 */
>> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>>  			     hdr_sz);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-23  9:32     ` Alexandru Elisei
@ 2021-06-23 10:06       ` Andre Przywara
  2021-06-23 10:12         ` Alexandru Elisei
  0 siblings, 1 reply; 11+ messages in thread
From: Andre Przywara @ 2021-06-23 10:06 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

On Wed, 23 Jun 2021 10:32:49 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi Alex,

> On 6/21/21 3:04 PM, Andre Przywara wrote:
> > On Mon, 21 Jun 2021 10:21:27 +0100
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >  
> >> PCI Express comes with an extended addressing scheme, which directly
> >> translated into a bigger device configuration space (256->4096 bytes)
> >> and bigger PCI configuration space (16->256 MB), as well as mandatory
> >> capabilities (power management [1] and PCI Express capability [2]).
> >>
> >> However, our virtio PCI implementation implements version 0.9 of the
> >> protocol and it still uses transitional PCI device ID's, so we have
> >> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> >> management and PCI Express capability are left for a subsequent patch.
> >>
> >> [1] PCI Express Base Specification Revision 1.1, section 7.6
> >> [2] PCI Express Base Specification Revision 1.1, section 7.8
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>  
> > Sorry, one thing I missed on the first review:
> >  
> >> ---
> >>  arm/include/arm-common/kvm-arch.h |  4 ++-
> >>  arm/pci.c                         |  2 +-
> >>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
> >>  pci.c                             |  5 +--
> >>  vfio/pci.c                        | 26 ++++++++++------
> >>  5 files changed, 70 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> >> index 436b67b843fc..c645ac001bca 100644
> >> --- a/arm/include/arm-common/kvm-arch.h
> >> +++ b/arm/include/arm-common/kvm-arch.h
> >> @@ -49,7 +49,7 @@
> >>  
> >>  
> >>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
> >> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> >> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
> >>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
> >>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
> >>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
> >> @@ -77,6 +77,8 @@
> >>  
> >>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
> >>  
> >> +#define ARCH_HAS_PCI_EXP	1
> >> +
> >>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
> >>  {
> >>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> >> diff --git a/arm/pci.c b/arm/pci.c
> >> index ed325fa4a811..2251f627d8b5 100644
> >> --- a/arm/pci.c
> >> +++ b/arm/pci.c
> >> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
> >>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
> >>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
> >>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> >> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> >> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
> >>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
> >>  
> >>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> >> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> >> index bf81323d83b7..42d9e1c5645f 100644
> >> --- a/include/kvm/pci.h
> >> +++ b/include/kvm/pci.h
> >> @@ -10,6 +10,7 @@
> >>  #include "kvm/devices.h"
> >>  #include "kvm/msi.h"
> >>  #include "kvm/fdt.h"
> >> +#include "kvm/kvm-arch.h"
> >>  
> >>  #define pci_dev_err(pci_hdr, fmt, ...) \
> >>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> >> @@ -32,10 +33,49 @@
> >>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
> >>  #define PCI_IO_SIZE		0x100
> >>  #define PCI_IOPORT_START	0x6200
> >> -#define PCI_CFG_SIZE		(1ULL << 24)
> >>  
> >>  struct kvm;
> >>  
> >> +/*
> >> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
> >> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
> >> + */
> >> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
> >> +#define PCI_DEV_CFG_SIZE_LEGACY		256
> >> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
> >> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
> >> +
> >> +#ifdef ARCH_HAS_PCI_EXP
> >> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
> >> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
> >> +
> >> +union pci_config_address {
> >> +	struct {
> >> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> >> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> >> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> >> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> >> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> >> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> >> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> >> +		unsigned	enable_bit	: 1;		/* 31       */
> >> +#else
> >> +		unsigned	enable_bit	: 1;		/* 31       */
> >> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> >> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> >> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> >> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> >> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> >> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> >> +#endif
> >> +	};
> >> +	u32 w;
> >> +};
> >> +
> >> +#else
> >> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
> >> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
> >> +
> >>  union pci_config_address {
> >>  	struct {
> >>  #if __BYTE_ORDER == __LITTLE_ENDIAN
> >> @@ -58,6 +98,9 @@ union pci_config_address {
> >>  	};
> >>  	u32 w;
> >>  };
> >> +#endif /* ARCH_HAS_PCI_EXP */
> >> +
> >> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
> >>  
> >>  struct msix_table {
> >>  	struct msi_msg msg;
> >> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
> >>  				   int bar_num, void *data);
> >>  
> >>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> >> -#define PCI_DEV_CFG_SIZE	256
> >> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
> >>  
> >>  struct pci_config_operations {
> >>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -		      u8 offset, void *data, int sz);
> >> +		      u16 offset, void *data, int sz);
> >>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -		     u8 offset, void *data, int sz);
> >> +		     u16 offset, void *data, int sz);
> >>  };
> >>  
> >>  struct pci_device_header {
> >> diff --git a/pci.c b/pci.c
> >> index d6da79e0a56a..e593033164c1 100644
> >> --- a/pci.c
> >> +++ b/pci.c
> >> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
> >>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >>  {
> >>  	void *base;
> >> -	u8 bar, offset;
> >> +	u8 bar;
> >> +	u16 offset;
> >>  	struct pci_device_header *pci_hdr;
> >>  	u8 dev_num = addr.device_number;
> >>  	u32 value = 0;
> >> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
> >>  
> >>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >>  {
> >> -	u8 offset;
> >> +	u16 offset;
> >>  	struct pci_device_header *pci_hdr;
> >>  	u8 dev_num = addr.device_number;
> >>  
> >> diff --git a/vfio/pci.c b/vfio/pci.c
> >> index 49ecd12a38cd..6a4204634e71 100644
> >> --- a/vfio/pci.c
> >> +++ b/vfio/pci.c
> >> @@ -313,7 +313,7 @@ out_unlock:
> >>  }
> >>  
> >>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> >> -				    struct vfio_device *vdev, u8 off,
> >> +				    struct vfio_device *vdev, u16 off,
> >>  				    void *data, int sz)
> >>  {
> >>  	struct vfio_pci_device *pdev = &vdev->pci;
> >> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
> >>  }
> >>  
> >>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> >> -				     u8 off, u8 *data, u32 sz)
> >> +				     u16 off, u8 *data, u32 sz)
> >>  {
> >>  	size_t i;
> >>  	u32 mask = 0;
> >> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> >>  }
> >>  
> >>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> >> -				   u8 off, u8 *data, u32 sz)
> >> +				   u16 off, u8 *data, u32 sz)
> >>  {
> >>  	u8 ctrl;
> >>  	struct msi_msg msg;
> >> @@ -553,7 +553,7 @@ out:
> >>  }
> >>  
> >>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -			      u8 offset, void *data, int sz)
> >> +			      u16 offset, void *data, int sz)
> >>  {
> >>  	struct vfio_region_info *info;
> >>  	struct vfio_pci_device *pdev;
> >> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
> >>  }
> >>  
> >>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -			       u8 offset, void *data, int sz)
> >> +			       u16 offset, void *data, int sz)
> >>  {
> >>  	struct vfio_region_info *info;
> >>  	struct vfio_pci_device *pdev;
> >> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
> >>  {
> >>  	int ret;
> >>  	size_t size;
> >> -	u8 pos, next;
> >> +	u16 pos, next;
> >>  	struct pci_cap_hdr *cap;
> >> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> >> +	u8 *virt_hdr;
> >>  	struct vfio_pci_device *pdev = &vdev->pci;
> >>  
> >>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
> >>  		return 0;
> >>  
> >> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> >> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> >> +	if (!virt_hdr)
> >> +		return -ENOMEM;  
> > This will leak if the function returns with an error, due to
> > vfio_pci_add_cap() calls failing.
> > The easiest way out is probably initialising ret = 0 and using "goto
> > out;", I guess.  
> 
> I don't understand what you mean, can you elaborate?
> 
> From what I can tell, the function doesn't do any assignment before this point, so
> I don't know what will leak. vfio_pci_add_cap() calls are skipped entirely if the
> function returns early here, so I don't see how they can fail.

I meant you only free virt_hdr at the end of the function, if it
returns successfully. If if returns with an error from any of the
vfio_pci_add_cap() calls in the switch/case statement, the allocation
will never be freed.
Sorry if my wording was unclear!

Cheers,
Andre

> >>  	pos = pdev->hdr.capabilities & ~3;
> >>  
> >> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
> >>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
> >>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
> >>  
> >> +	free(virt_hdr);
> >> +
> >>  	return 0;
> >>  }
> >>  
> >> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
> >>  
> >>  	/* Install our fake Configuration Space */
> >>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> >> -	hdr_sz = PCI_DEV_CFG_SIZE;
> >> +	/*
> >> +	 * We don't touch the extended configuration space, let's be cautious
> >> +	 * and not overwrite it all with zeros, or bad things might happen.
> >> +	 */
> >> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
> >>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
> >>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
> >>  			     hdr_sz);  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-23 10:06       ` Andre Przywara
@ 2021-06-23 10:12         ` Alexandru Elisei
  0 siblings, 0 replies; 11+ messages in thread
From: Alexandru Elisei @ 2021-06-23 10:12 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi,
	maz, pierre.gondois

Hi Andre,

On 6/23/21 11:06 AM, Andre Przywara wrote:
> On Wed, 23 Jun 2021 10:32:49 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi Alex,
>
>> On 6/21/21 3:04 PM, Andre Przywara wrote:
>>> On Mon, 21 Jun 2021 10:21:27 +0100
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>  
>>>> PCI Express comes with an extended addressing scheme, which directly
>>>> translated into a bigger device configuration space (256->4096 bytes)
>>>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>>>> capabilities (power management [1] and PCI Express capability [2]).
>>>>
>>>> However, our virtio PCI implementation implements version 0.9 of the
>>>> protocol and it still uses transitional PCI device ID's, so we have
>>>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>>>> management and PCI Express capability are left for a subsequent patch.
>>>>
>>>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>>>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>  
>>> Sorry, one thing I missed on the first review:
>>>  
>>>> ---
>>>>  arm/include/arm-common/kvm-arch.h |  4 ++-
>>>>  arm/pci.c                         |  2 +-
>>>>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>>>>  pci.c                             |  5 +--
>>>>  vfio/pci.c                        | 26 ++++++++++------
>>>>  5 files changed, 70 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>>>> index 436b67b843fc..c645ac001bca 100644
>>>> --- a/arm/include/arm-common/kvm-arch.h
>>>> +++ b/arm/include/arm-common/kvm-arch.h
>>>> @@ -49,7 +49,7 @@
>>>>  
>>>>  
>>>>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
>>>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>>>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>>>>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>>>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>>>> @@ -77,6 +77,8 @@
>>>>  
>>>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>>>  
>>>> +#define ARCH_HAS_PCI_EXP	1
>>>> +
>>>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>>>  {
>>>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>>>> diff --git a/arm/pci.c b/arm/pci.c
>>>> index ed325fa4a811..2251f627d8b5 100644
>>>> --- a/arm/pci.c
>>>> +++ b/arm/pci.c
>>>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>>>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>>>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>>>  
>>>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>>>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>>>> index bf81323d83b7..42d9e1c5645f 100644
>>>> --- a/include/kvm/pci.h
>>>> +++ b/include/kvm/pci.h
>>>> @@ -10,6 +10,7 @@
>>>>  #include "kvm/devices.h"
>>>>  #include "kvm/msi.h"
>>>>  #include "kvm/fdt.h"
>>>> +#include "kvm/kvm-arch.h"
>>>>  
>>>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>>>> @@ -32,10 +33,49 @@
>>>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>>>  #define PCI_IO_SIZE		0x100
>>>>  #define PCI_IOPORT_START	0x6200
>>>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>>>  
>>>>  struct kvm;
>>>>  
>>>> +/*
>>>> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
>>>> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
>>>> + */
>>>> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
>>>> +#define PCI_DEV_CFG_SIZE_LEGACY		256
>>>> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
>>>> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
>>>> +
>>>> +#ifdef ARCH_HAS_PCI_EXP
>>>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>>>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>>>> +
>>>> +union pci_config_address {
>>>> +	struct {
>>>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>>> +#else
>>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>>> +#endif
>>>> +	};
>>>> +	u32 w;
>>>> +};
>>>> +
>>>> +#else
>>>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>>>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>>>> +
>>>>  union pci_config_address {
>>>>  	struct {
>>>>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>>>> @@ -58,6 +98,9 @@ union pci_config_address {
>>>>  	};
>>>>  	u32 w;
>>>>  };
>>>> +#endif /* ARCH_HAS_PCI_EXP */
>>>> +
>>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>  
>>>>  struct msix_table {
>>>>  	struct msi_msg msg;
>>>> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>>>  				   int bar_num, void *data);
>>>>  
>>>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>>> -#define PCI_DEV_CFG_SIZE	256
>>>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>  
>>>>  struct pci_config_operations {
>>>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -		      u8 offset, void *data, int sz);
>>>> +		      u16 offset, void *data, int sz);
>>>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -		     u8 offset, void *data, int sz);
>>>> +		     u16 offset, void *data, int sz);
>>>>  };
>>>>  
>>>>  struct pci_device_header {
>>>> diff --git a/pci.c b/pci.c
>>>> index d6da79e0a56a..e593033164c1 100644
>>>> --- a/pci.c
>>>> +++ b/pci.c
>>>> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>>  {
>>>>  	void *base;
>>>> -	u8 bar, offset;
>>>> +	u8 bar;
>>>> +	u16 offset;
>>>>  	struct pci_device_header *pci_hdr;
>>>>  	u8 dev_num = addr.device_number;
>>>>  	u32 value = 0;
>>>> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>>>  
>>>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>>  {
>>>> -	u8 offset;
>>>> +	u16 offset;
>>>>  	struct pci_device_header *pci_hdr;
>>>>  	u8 dev_num = addr.device_number;
>>>>  
>>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>>> index 49ecd12a38cd..6a4204634e71 100644
>>>> --- a/vfio/pci.c
>>>> +++ b/vfio/pci.c
>>>> @@ -313,7 +313,7 @@ out_unlock:
>>>>  }
>>>>  
>>>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>>> -				    struct vfio_device *vdev, u8 off,
>>>> +				    struct vfio_device *vdev, u16 off,
>>>>  				    void *data, int sz)
>>>>  {
>>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>>> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>>>  }
>>>>  
>>>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>>> -				     u8 off, u8 *data, u32 sz)
>>>> +				     u16 off, u8 *data, u32 sz)
>>>>  {
>>>>  	size_t i;
>>>>  	u32 mask = 0;
>>>> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>>>  }
>>>>  
>>>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>>>> -				   u8 off, u8 *data, u32 sz)
>>>> +				   u16 off, u8 *data, u32 sz)
>>>>  {
>>>>  	u8 ctrl;
>>>>  	struct msi_msg msg;
>>>> @@ -553,7 +553,7 @@ out:
>>>>  }
>>>>  
>>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -			      u8 offset, void *data, int sz)
>>>> +			      u16 offset, void *data, int sz)
>>>>  {
>>>>  	struct vfio_region_info *info;
>>>>  	struct vfio_pci_device *pdev;
>>>> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>>>  }
>>>>  
>>>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -			       u8 offset, void *data, int sz)
>>>> +			       u16 offset, void *data, int sz)
>>>>  {
>>>>  	struct vfio_region_info *info;
>>>>  	struct vfio_pci_device *pdev;
>>>> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>>>  {
>>>>  	int ret;
>>>>  	size_t size;
>>>> -	u8 pos, next;
>>>> +	u16 pos, next;
>>>>  	struct pci_cap_hdr *cap;
>>>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>>>> +	u8 *virt_hdr;
>>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>>>  
>>>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>>>  		return 0;
>>>>  
>>>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>>>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>>>> +	if (!virt_hdr)
>>>> +		return -ENOMEM;  
>>> This will leak if the function returns with an error, due to
>>> vfio_pci_add_cap() calls failing.
>>> The easiest way out is probably initialising ret = 0 and using "goto
>>> out;", I guess.  
>> I don't understand what you mean, can you elaborate?
>>
>> From what I can tell, the function doesn't do any assignment before this point, so
>> I don't know what will leak. vfio_pci_add_cap() calls are skipped entirely if the
>> function returns early here, so I don't see how they can fail.
> I meant you only free virt_hdr at the end of the function, if it
> returns successfully. If if returns with an error from any of the
> vfio_pci_add_cap() calls in the switch/case statement, the allocation
> will never be freed.
> Sorry if my wording was unclear!

You're right, we don't free virt_hdr if vfio_parse_caps() fails. I will fix that.

Thanks,

Alex

>
> Cheers,
> Andre
>
>>>>  	pos = pdev->hdr.capabilities & ~3;
>>>>  
>>>> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>>>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>>>>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>>>>  
>>>> +	free(virt_hdr);
>>>> +
>>>>  	return 0;
>>>>  }
>>>>  
>>>> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>>>  
>>>>  	/* Install our fake Configuration Space */
>>>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>>>> -	hdr_sz = PCI_DEV_CFG_SIZE;
>>>> +	/*
>>>> +	 * We don't touch the extended configuration space, let's be cautious
>>>> +	 * and not overwrite it all with zeros, or bad things might happen.
>>>> +	 */
>>>> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>>>>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>>>>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>>>>  			     hdr_sz);  

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-23 10:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-21  9:21 [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
2021-06-21  9:21 ` [PATCH v2 kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
2021-06-21  9:21 ` [PATCH v2 kvmtool 2/4] arm/fdt.c: Don't generate the node if generator function is NULL Alexandru Elisei
2021-06-21 14:03   ` Andre Przywara
2021-06-21  9:21 ` [PATCH v2 kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
2021-06-21 14:04   ` Andre Przywara
2021-06-23  9:32     ` Alexandru Elisei
2021-06-23 10:06       ` Andre Przywara
2021-06-23 10:12         ` Alexandru Elisei
2021-06-21  9:21 ` [PATCH v2 kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
2021-06-21 14:04   ` Andre Przywara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.