All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support
@ 2021-06-09 18:38 Alexandru Elisei
  2021-06-09 18:38 ` [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-09 18:38 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz

This series aims to add support for PCI Express 1.1. It is based on the
last patch [0] of the reassignable BAR series. The patch was discarded at
the time because there was no easy solution to solve the overlap between
the UART address and kvmtool's PCI I/O region, which made EDK2 and/or a
guest compiled with 64k pages very unhappy [1]. This is not the case
anymore, as the UART has been moved to address 0x1000000 in commit
45b4968e0de1 ("hw/serial: ARM/arm64: Use MMIO at higher addresses").

The series has also been tested with a work-in-progress version of EDK2
which supports PCI Express when running under kvmtool. Those patches will
be posted soon. The end result is that someone will be able to download an
official iso from the debian website and install it in a kvmtool VM.

The first two patches in the series are small and hopefully straightford
cleanups for stuff that I discovered when playing with kvmtool.

The third patch implements the PCI Express support only for the arm and
arm64 architectures. The reason for that is that I don't know how to do it
for x86, powerpc and mips (and for the last two I don't even have machines
to test it).

The last patch implements a fix for a Realtek RTL8168 NIC, where the Linux
drivers falls back to a device specific method of initialization if the
device is not PCI Express capable (doesn't have the PCI Express
Capability) [2].

Testing
=======

Warning, wall of text. Unless specified, the guest kernel was built from
tag v5.12.

On a Ryzen 3900x:
-----------------

amd64 architecture and no PCIE support, making sure no regressions are
introduced.

1. Direct kernel boot + Debian 10 disk with SDL, to exercise the emulated
VESA device.  Was able to login using the display manager and
virtio-{net,blk} were working correctly.

2. Direct kernel boot + Debian 10 disk with SDL + Realtek RTL8168 + Intel
82574L PCIE NIC, both assigned to the VM. Assigning an ip address to the
Realtek NIC fails with the message: "No native access to PCI extended
config space, falling back to CSI", which makes sense since kvmtool is
emulating legacy PCI 3.0 for the amd64 architecture. Other than that,
everything works as expected.

On odroid-c4:
-------------

1. Debian 10 disk + upstream EDK2 built from commit 1f515342d8d8
("DynamicTablesPkg: Use AML_NAME_SEG_SIZE define"), **without** --force-pci
(so using virtio-mmio). Kernel compiled with 4k, 16k and 64k pages. This
was done to make sure there are no regressions.

2. Direct kernel boot + Debian 10 disk, with --force-pci. Tried 3 versions
of the kernel, compiled with 4k, 16k and 64k pagesize. Got the warning:
"TCP: enp0s0: Driver has suspect GRO implementation, TCP performance may be
compromised." I suspect it is because of kvmtool legacy version of virtio.
This was further confirmed by running the same kernel with kvmtool built
from master, with and without --force-pci, the warning was still there.

3. Debian 10 disk + a work-in-progress version of EDK2 which enables PCIE
support for kvmtool, with --force-pci. The kernel was booted via Debian
grub, and same as above, I tried with kernels compiled with 4k, 16k and 64k
page sizes.

On AMD Seattle:
---------------

1. Using the EDK2 image and the passthrough Realtek RTL8168 NIC as the
network interface, I was able to use a vanilla netinstall iso from the
debian website [3] and install debian in a virtual machine. Woohoo!

One gotcha during installation: because kvmtool doesn't emulate a SCSI
CD-ROM, you need to manually specify the virtio disk for the installation
iso. At the 'Detect and mount CD-ROM' prompt, choose No when asked to load
CD-ROM drivers from removable media, Yes to manually select a CD-ROM module
and device, none when choosing the CD-ROM module (it's a virtio disk), then
the device file for accessing the CD-ROM is /dev/vda (only if the iso file
is the first --disk kvmtool parameter, otherwise /dev/vdb if it's the
second, and so on).

2. Realtek RTL8168, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages (Seattle doesn't support
16k pages) for both direct kernel boot and EDK2 boot.

3. Intel 82574L NIC, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages for both direct boot and
EDK2 boot.

4. AMD FirePro W2100 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

For this test, I switched the guest kernel to v5.10 because with v5.11 and
v5.12 I was getting this kernel panic caused by a NULL pointer deference:

[..]
[    0.943927] [drm] radeon kernel modesetting enabled.
[    0.945050] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.946313] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.947736] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.949193] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.950151] radeon 0000:00:00.0: Fatal error during GPU init
[    0.950990] [drm] radeon: finishing device.
[    0.951633] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[    0.952936] Mem abort info:
[    0.953369]   ESR = 0x96000004
[    0.953838]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.954635]   SET = 0, FnV = 0
[    0.955100]   EA = 0, S1PTW = 0
[    0.955590] Data abort info:
[    0.956033]   ISV = 0, ISS = 0x00000004
[    0.956608]   CM = 0, WnR = 0
[    0.957099] [0000000000000020] user address but active_mm is swapper
[    0.958051] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    0.958881] Modules linked in:
[    0.959356] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0 #13
[    0.960268] Hardware name: linux,dummy-virt (DT)
[    0.960970] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[    0.962013] pc : ttm_resource_manager_evict_all+0x64/0x1f0
[    0.962972] lr : ttm_resource_manager_evict_all+0x5c/0x1f0
[    0.963931] sp : ffff80001212ba00
[    0.964517] x29: ffff80001212ba00 x28: 0000000000000000 
[    0.965448] x27: ffff8000118004e0 x26: ffff8000120cd000 
[    0.966371] x25: 0000000000000000 x24: ffff000080c946e8 
[    0.967296] x23: 0000000000000020 x22: 0000000000000000 
[    0.968227] x21: 0000000000000000 x20: ffff8000120cdb90 
[    0.969152] x19: ffff000080c94000 x18: ffffffffffffffff 
[    0.970076] x17: 0000000000000000 x16: 0000000000000001 
[    0.970999] x15: ffff80009212b787 x14: 0000000000000006 
[    0.971928] x13: ffff800011de2368 x12: 0000000000000264 
[    0.972852] x11: 00000000000000cc x10: ffff800011de2368 
[    0.973780] x9 : ffff800011de2368 x8 : 00000000ffffefff 
[    0.974701] x7 : ffff800011e3a368 x6 : ffff800011e3a368 
[    0.975637] x5 : 0000000000000000 x4 : 0000000000000000 
[    0.976559] x3 : ffff8000120cdb90 x2 : 0000000000000001 
[    0.977483] x1 : 0000000000000000 x0 : 0000000000000000 
[    0.978410] Call trace:
[    0.978851]  ttm_resource_manager_evict_all+0x64/0x1f0
[    0.979759]  radeon_bo_evict_vram+0x1c/0x30
[    0.980494]  radeon_device_fini+0x34/0xe8
[    0.981209]  radeon_driver_unload_kms+0x48/0x90
[    0.982000]  radeon_driver_load_kms+0x124/0x174
[    0.982792]  drm_dev_register+0xe0/0x210
[    0.983486]  radeon_pci_probe+0x120/0x1bc
[    0.984180]  local_pci_probe+0x40/0xac
[    0.984843]  pci_device_probe+0x114/0x1b0
[    0.985548]  really_probe+0xe4/0x4c0
[    0.986181]  driver_probe_device+0x58/0xc0
[    0.986902]  device_driver_attach+0xc0/0xcc
[    0.987642]  __driver_attach+0x84/0x124
[    0.988317]  bus_for_each_dev+0x70/0xd0
[    0.988996]  driver_attach+0x24/0x30
[    0.989627]  bus_add_driver+0x104/0x1ec
[    0.990300]  driver_register+0x78/0x130
[    0.990974]  __pci_register_driver+0x48/0x54
[    0.991730]  radeon_module_init+0x54/0x64
[    0.992438]  do_one_initcall+0x50/0x1b0
[    0.993115]  kernel_init_freeable+0x1d4/0x23c
[    0.993880]  kernel_init+0x14/0x118
[    0.994496]  ret_from_fork+0x10/0x34
[    0.995132] Code: f90033ff 9420650e d37c7f36 8b1602b6 (f94012c0) 
[    0.996201] ---[ end trace 88eed6171e8cb9bc ]---
[    0.997011] note: swapper/0[1] exited with preempt_count 1
[    0.997840] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.998984] SMP: stopping secondary CPUs
[    0.999605] Kernel Offset: disabled
[    1.000137] CPU features: 0x00240022,61006082
[    1.000793] Memory Limit: none
[    1.001330] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

This is how dmesg looks like with v5.10, v5.8 and v5.6:

[..]
[    0.972061] [drm] radeon kernel modesetting enabled.
[    0.973162] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.974426] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.976037] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.977435] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.978381] radeon 0000:00:00.0: Fatal error during GPU init
[    0.979341] [drm] radeon: finishing device.
[    0.979963] [TTM] Memory type 2 has not been initialized
[    0.988250] radeon: probe of 0000:00:00.0 failed with error -22
[    0.989282] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    0.993326] loop: module loaded
[..]

In my opinion, this is an upstream bug caused by incorrect clean up when
probing fails. I plan to see if I can reproduce it on my x86 machine (to
make it easier to other people to reproduce it) and then report it
upstream.

Note that I used the radeon driver instead of amdgpu because this is the
recommended driver [4] for the GCN1 architecture.

5. NVIDIA Quadro P400 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

Nouveau seems to work as expected (it binds to the GPU). but during driver
initialization it looks like the system hangs for 30s-1m. My guess is that
something times out in the driver due to missing emulation in kvmtool:

[..]
[    0.335506] [drm] radeon kernel modesetting enabled.
[    0.336369] nouveau 0000:00:00.0: enabling device (0000 -> 0003)
[    0.359468] nouveau 0000:00:00.0: NVIDIA GP107 (137000a1)
[    0.505066] nouveau 0000:00:00.0: bios: version 86.07.6b.00.01
              <---- hangs here
[  123.867379] nouveau 0000:00:00.0: acr: firmware unavailable
[  123.868337] nouveau 0000:00:00.0: pmu: firmware unavailable
[  123.869488] nouveau 0000:00:00.0: gr: firmware unavailable
[  123.870506] nouveau 0000:00:00.0: sec2: firmware unavailable
[  123.928149] nouveau 0000:00:00.0: fb: 2048 MiB GDDR5
[  123.963159] [TTM] Zone  kernel: Available graphics memory: 8313888 KiB
[  123.964823] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  123.966172] nouveau 0000:00:00.0: DRM: VRAM: 2048 MiB
[  123.967101] nouveau 0000:00:00.0: DRM: GART: 536870912 MiB
[  123.968258] nouveau 0000:00:00.0: DRM: BIT table 'A' not found
[  123.969403] nouveau 0000:00:00.0: DRM: BIT table 'L' not found
[  123.970498] nouveau 0000:00:00.0: DRM: TMDS table version 2.0
[  123.971688] nouveau 0000:00:00.0: DRM: DCB version 4.1
[  123.972639] nouveau 0000:00:00.0: DRM: DCB outp 00: 01800f56 04600020
[  123.973820] nouveau 0000:00:00.0: DRM: DCB outp 01: 01000f52 04620020
[  123.975083] nouveau 0000:00:00.0: DRM: DCB outp 02: 01811f46 04600010
[  123.976500] nouveau 0000:00:00.0: DRM: DCB outp 03: 01011f42 04620010
[  123.977681] nouveau 0000:00:00.0: DRM: DCB outp 04: 02822f76 04600020
[  123.978955] nouveau 0000:00:00.0: DRM: DCB outp 05: 02022f72 00020020
[  123.980309] nouveau 0000:00:00.0: DRM: DCB conn 00: 00002046
[  123.981352] nouveau 0000:00:00.0: DRM: DCB conn 01: 00001146
[  123.982379] nouveau 0000:00:00.0: DRM: DCB conn 02: 00020246
[  123.984507] nouveau 0000:00:00.0: DRM: failed to create kernel channel, -22
[  123.986661] nouveau 0000:00:00.0: DRM: MM: using COPY for buffer copies
[  124.291297] nouveau 0000:00:00.0: [drm] Cannot find any crtc or sizes
[  124.292839] [drm] Initialized nouveau 1.3.1 20120801 for 0000:00:00.0 on minor 0
[..]

6. Crucial MX500 SSD connected to a generic PCIE to sata adapter assigned
to the VM, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, 4k and 64k pages kernel for both direct kernel and UEFI boot.

This was weird. On the host, the PCIE adapter worked just fine with kernel
v5.8, but on v5.12 the host was not able to initialize it:

[    2.891697] ata2: SATA link down (SStatus 0 SControl 300)
[    3.211695] ata3: SATA link down (SStatus 0 SControl 300)
[    3.531699] ata4: SATA link down (SStatus 0 SControl 300)
[    3.851694] ata5: SATA link down (SStatus 0 SControl 300)
[    4.141559] ata9: SATA link down (SStatus 0 SControl 0)
[    4.171691] ata6: SATA link down (SStatus 0 SControl 300)
[    4.491695] ata7: SATA link down (SStatus 0 SControl 300)
[    4.811693] ata8: SATA link down (SStatus 0 SControl 300)
[    6.973559] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[    6.983615] ata10: softreset failed (SRST command error)
[    6.989992] ata10: reset failed (errno=-5), retrying in 8 secs
[   17.173560] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   17.183618] ata10: softreset failed (SRST command error)
[   17.189990] ata10: reset failed (errno=-5), retrying in 8 secs
[   27.413557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   27.423615] ata10: softreset failed (SRST command error)
[   27.429986] ata10: reset failed (errno=-5), retrying in 33 secs
[   60.837548] ata10: limiting SATA link speed to 1.5 Gbps
[   63.001557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   63.011615] ata10: softreset failed (SRST command error)
[   63.017988] ata10: reset failed, giving up

Assigning it to a VM worked though after the host running Linux v5.8
unitializes the adapter, so I'm going to consider this a pass. After a few
more tests, I was able to trigger the same error on v5.8. On v5.12
initialization has failed every time (so far, at least).

[0] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/T/#m835c93ef1dc7c539b4cdda85aee23210d494ea49
[1] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/
[2] https://www.spinics.net/lists/kvm/msg245607.html
[3] https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/debian-10.9.0-arm64-netinst.iso
[4] https://wiki.archlinux.org/title/Xorg#AMD

Alexandru Elisei (4):
  Move fdt_irq_fn typedef to fdt.h
  arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  arm/arm64: Add PCI Express 1.1 support
  arm/arm64: vfio: Add PCI Express Capability Structure

 arm/fdt.c                         |  7 ++-
 arm/include/arm-common/kvm-arch.h |  5 ++-
 arm/pci.c                         |  2 +-
 hw/rtc.c                          |  1 +
 include/kvm/fdt.h                 |  2 +
 include/kvm/kvm.h                 |  1 -
 include/kvm/pci.h                 | 75 ++++++++++++++++++++++++++++---
 pci.c                             |  5 ++-
 vfio/pci.c                        | 39 ++++++++++++----
 9 files changed, 116 insertions(+), 21 deletions(-)

-- 
2.32.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h
  2021-06-09 18:38 [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
@ 2021-06-09 18:38 ` Alexandru Elisei
  2021-06-10 16:13   ` Andre Przywara
  2021-06-09 18:38 ` [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator Alexandru Elisei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-09 18:38 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz

The device tree code passes the function generate_irq_prop() to MMIO
devices to create the "interrupts" property. The typedef fdt_irq_fn is the
type used to pass the function to the device. It makes more sense for the
typedef to be in fdt.h with the rest of the device tree functions, so move
it there.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/rtc.c          | 1 +
 include/kvm/fdt.h | 2 ++
 include/kvm/kvm.h | 1 -
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/rtc.c b/hw/rtc.c
index aec31c52a85a..9b8785a869dd 100644
--- a/hw/rtc.c
+++ b/hw/rtc.c
@@ -1,5 +1,6 @@
 #include "kvm/rtc.h"
 
+#include "kvm/fdt.h"
 #include "kvm/ioport.h"
 #include "kvm/kvm.h"
 
diff --git a/include/kvm/fdt.h b/include/kvm/fdt.h
index 4e6157256482..060c37b947cc 100644
--- a/include/kvm/fdt.h
+++ b/include/kvm/fdt.h
@@ -25,6 +25,8 @@ enum irq_type {
 	IRQ_TYPE_LEVEL_MASK	= (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH),
 };
 
+typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
+
 extern char *fdt_stdout_path;
 
 /* Helper for the various bits of code that generate FDT nodes */
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 6c28afa3f0bb..56e9c8e347a0 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -44,7 +44,6 @@
 struct kvm_cpu;
 typedef void (*mmio_handler_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 				u32 len, u8 is_write, void *ptr);
-typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
 
 enum {
 	KVM_VMSTATE_RUNNING,
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  2021-06-09 18:38 [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
  2021-06-09 18:38 ` [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
@ 2021-06-09 18:38 ` Alexandru Elisei
  2021-06-10 16:13   ` Andre Przywara
  2021-06-09 18:38 ` [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
  2021-06-09 18:38 ` [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
  3 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-09 18:38 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz

Print a more helpful warning when a MMIO device hasn't set a function to
generate an FDT instead of causing a segmentation fault by dereferencing a
NULL pointer.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index 02091e9e0bee..06287a13e395 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
 	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
 	while (dev_hdr) {
 		generate_mmio_fdt_nodes = dev_hdr->data;
-		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
+		if (generate_mmio_fdt_nodes) {
+			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
+		} else {
+			pr_warning("Missing FDT node generator for MMIO device %d",
+				   dev_hdr->dev_num);
+		}
 		dev_hdr = device__next_dev(dev_hdr);
 	}
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-09 18:38 [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
  2021-06-09 18:38 ` [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
  2021-06-09 18:38 ` [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator Alexandru Elisei
@ 2021-06-09 18:38 ` Alexandru Elisei
  2021-06-10 16:14   ` Andre Przywara
  2021-06-09 18:38 ` [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
  3 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-09 18:38 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz

PCI Express comes with an extended addressing scheme, which directly
translated into a bigger device configuration space (256->4096 bytes)
and bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management [1] and PCI Express capability [2]).

However, our virtio PCI implementation implements version 0.9 of the
protocol and it still uses transitional PCI device ID's, so we have
opted to omit the mandatory PCI Express capabilities. For VFIO, the power
management and PCI Express capability are left for a subsequent patch.

[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/kvm-arch.h |  4 ++-
 arm/pci.c                         |  2 +-
 include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
 pci.c                             |  5 +--
 vfio/pci.c                        | 26 ++++++++++------
 5 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index 436b67b843fc..c645ac001bca 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -49,7 +49,7 @@
 
 
 #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
-#define ARM_PCI_CFG_SIZE	(1ULL << 24)
+#define ARM_PCI_CFG_SIZE	(1ULL << 28)
 #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
 #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
 				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
@@ -77,6 +77,8 @@
 
 #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
 
+#define ARCH_HAS_PCI_EXP	1
+
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
 	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
diff --git a/arm/pci.c b/arm/pci.c
index ed325fa4a811..2251f627d8b5 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
-	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
+	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
 	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
 
 	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index bf81323d83b7..42d9e1c5645f 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -10,6 +10,7 @@
 #include "kvm/devices.h"
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
+#include "kvm/kvm-arch.h"
 
 #define pci_dev_err(pci_hdr, fmt, ...) \
 	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
@@ -32,10 +33,49 @@
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
 #define PCI_IOPORT_START	0x6200
-#define PCI_CFG_SIZE		(1ULL << 24)
 
 struct kvm;
 
+/*
+ * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
+ * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
+ */
+#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
+#define PCI_DEV_CFG_SIZE_LEGACY		256
+#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
+#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
+
+#ifdef ARCH_HAS_PCI_EXP
+#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
+#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
+
+union pci_config_address {
+	struct {
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	enable_bit	: 1;		/* 31       */
+#else
+		unsigned	enable_bit	: 1;		/* 31       */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+#endif
+	};
+	u32 w;
+};
+
+#else
+#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
+#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
+
 union pci_config_address {
 	struct {
 #if __BYTE_ORDER == __LITTLE_ENDIAN
@@ -58,6 +98,9 @@ union pci_config_address {
 	};
 	u32 w;
 };
+#endif /* ARCH_HAS_PCI_EXP */
+
+#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct msix_table {
 	struct msi_msg msg;
@@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
 				   int bar_num, void *data);
 
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
-#define PCI_DEV_CFG_SIZE	256
-#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		      u8 offset, void *data, int sz);
+		      u16 offset, void *data, int sz);
 	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		     u8 offset, void *data, int sz);
+		     u16 offset, void *data, int sz);
 };
 
 struct pci_device_header {
diff --git a/pci.c b/pci.c
index d6da79e0a56a..e593033164c1 100644
--- a/pci.c
+++ b/pci.c
@@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
-	u8 bar, offset;
+	u8 bar;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
@@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 offset;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 49ecd12a38cd..6a4204634e71 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -313,7 +313,7 @@ out_unlock:
 }
 
 static void vfio_pci_msix_cap_write(struct kvm *kvm,
-				    struct vfio_device *vdev, u8 off,
+				    struct vfio_device *vdev, u16 off,
 				    void *data, int sz)
 {
 	struct vfio_pci_device *pdev = &vdev->pci;
@@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
 }
 
 static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
-				     u8 off, u8 *data, u32 sz)
+				     u16 off, u8 *data, u32 sz)
 {
 	size_t i;
 	u32 mask = 0;
@@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
 }
 
 static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
-				   u8 off, u8 *data, u32 sz)
+				   u16 off, u8 *data, u32 sz)
 {
 	u8 ctrl;
 	struct msi_msg msg;
@@ -553,7 +553,7 @@ out:
 }
 
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			      u8 offset, void *data, int sz)
+			      u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
 }
 
 static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			       u8 offset, void *data, int sz)
+			       u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 {
 	int ret;
 	size_t size;
-	u8 pos, next;
+	u16 pos, next;
 	struct pci_cap_hdr *cap;
-	u8 virt_hdr[PCI_DEV_CFG_SIZE];
+	u8 *virt_hdr;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
 	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
 		return 0;
 
-	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
+	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
+	if (!virt_hdr)
+		return -ENOMEM;
 
 	pos = pdev->hdr.capabilities & ~3;
 
@@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
 	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
 
+	free(virt_hdr);
+
 	return 0;
 }
 
@@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 
 	/* Install our fake Configuration Space */
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-	hdr_sz = PCI_DEV_CFG_SIZE;
+	/*
+	 * We don't touch the extended configuration space, let's be cautious
+	 * and not overwrite it all with zeros, or bad things might happen.
+	 */
+	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
 	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
 		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
 			     hdr_sz);
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure
  2021-06-09 18:38 [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2021-06-09 18:38 ` [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2021-06-09 18:38 ` Alexandru Elisei
  2021-06-10 16:14   ` Andre Przywara
  3 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-09 18:38 UTC (permalink / raw)
  To: will, julien.thierry.kdev, kvm
  Cc: andre.przywara, sami.mujawar, lorenzo.pieralisi, maz

It turns out that some Linux drivers (like Realtek R8169) fall back to a
device-specific configuration method if the device is not PCI Express
capable:

[    1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI

Add the PCI Express Capability Structure and populate it for assigned
devices, as this is how the Linux PCI driver determines if a device is PCI
Express capable.

Because we don't emulate a PCI Express link, a root complex or any slot
related properties, the PCI Express capability is kept as small as possible
by ignoring those fields.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h | 24 ++++++++++++++++++++++++
 vfio/pci.c        | 13 +++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 42d9e1c5645f..0f2d5bbabdc3 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -46,6 +46,8 @@ struct kvm;
 #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
 
 #ifdef ARCH_HAS_PCI_EXP
+#define arch_has_pci_exp()	(true)
+
 #define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
 #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
 
@@ -73,6 +75,8 @@ union pci_config_address {
 };
 
 #else
+#define arch_has_pci_exp()	(false)
+
 #define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
 #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
 
@@ -143,6 +147,24 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_exp_cap {
+	u8 cap;
+	u8 next;
+	u16 cap_reg;
+	u32 dev_cap;
+	u16 dev_ctrl;
+	u16 dev_status;
+	u32 link_cap;
+	u16 link_ctrl;
+	u16 link_status;
+	u32 slot_cap;
+	u16 slot_ctrl;
+	u16 slot_status;
+	u16 root_ctrl;
+	u16 root_cap;
+	u32 root_status;
+};
+
 struct pci_device_header;
 
 typedef int (*bar_activate_fn_t)(struct kvm *kvm,
@@ -188,6 +210,8 @@ struct pci_device_header {
 			u8		min_gnt;
 			u8		max_lat;
 			struct msix_cap msix;
+			/* Used only by architectures which support PCIE */
+			struct pci_exp_cap pci_exp;
 		} __attribute__((packed));
 		/* Pad to PCI config space size */
 		u8	__pad[PCI_DEV_CFG_SIZE];
diff --git a/vfio/pci.c b/vfio/pci.c
index 6a4204634e71..5c9bec6db710 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -623,6 +623,12 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
 		return PCI_CAP_MSIX_SIZEOF;
 	case PCI_CAP_ID_MSI:
 		return vfio_pci_msi_cap_size((void *)cap_hdr);
+	case PCI_CAP_ID_EXP:
+		/*
+		 * We don't emulate any of the link, slot and root complex
+		 * properties, so ignore them.
+		 */
+		return PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1;
 	default:
 		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
 		return 0;
@@ -696,6 +702,13 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 			pdev->msi.pos = pos;
 			pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
 			break;
+		case PCI_CAP_ID_EXP:
+			if (!arch_has_pci_exp())
+				continue;
+			ret = vfio_pci_add_cap(vdev, virt_hdr, cap, pos);
+			if (ret)
+				return ret;
+			break;
 		}
 	}
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h
  2021-06-09 18:38 ` [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
@ 2021-06-10 16:13   ` Andre Przywara
  0 siblings, 0 replies; 16+ messages in thread
From: Andre Przywara @ 2021-06-10 16:13 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Wed,  9 Jun 2021 19:38:09 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> The device tree code passes the function generate_irq_prop() to MMIO
> devices to create the "interrupts" property. The typedef fdt_irq_fn is the
> type used to pass the function to the device. It makes more sense for the
> typedef to be in fdt.h with the rest of the device tree functions, so move
> it there.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Not sure why we really need that, but doesn't seem to hurt:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/rtc.c          | 1 +
>  include/kvm/fdt.h | 2 ++
>  include/kvm/kvm.h | 1 -
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/rtc.c b/hw/rtc.c
> index aec31c52a85a..9b8785a869dd 100644
> --- a/hw/rtc.c
> +++ b/hw/rtc.c
> @@ -1,5 +1,6 @@
>  #include "kvm/rtc.h"
>  
> +#include "kvm/fdt.h"
>  #include "kvm/ioport.h"
>  #include "kvm/kvm.h"
>  
> diff --git a/include/kvm/fdt.h b/include/kvm/fdt.h
> index 4e6157256482..060c37b947cc 100644
> --- a/include/kvm/fdt.h
> +++ b/include/kvm/fdt.h
> @@ -25,6 +25,8 @@ enum irq_type {
>  	IRQ_TYPE_LEVEL_MASK	= (IRQ_TYPE_LEVEL_LOW | IRQ_TYPE_LEVEL_HIGH),
>  };
>  
> +typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
> +
>  extern char *fdt_stdout_path;
>  
>  /* Helper for the various bits of code that generate FDT nodes */
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 6c28afa3f0bb..56e9c8e347a0 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -44,7 +44,6 @@
>  struct kvm_cpu;
>  typedef void (*mmio_handler_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>  				u32 len, u8 is_write, void *ptr);
> -typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type irq_type);
>  
>  enum {
>  	KVM_VMSTATE_RUNNING,


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  2021-06-09 18:38 ` [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator Alexandru Elisei
@ 2021-06-10 16:13   ` Andre Przywara
  2021-06-10 16:38     ` Alexandru Elisei
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Przywara @ 2021-06-10 16:13 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Wed,  9 Jun 2021 19:38:10 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Print a more helpful warning when a MMIO device hasn't set a function to
> generate an FDT instead of causing a segmentation fault by dereferencing a
> NULL pointer.

Not calling generate_mmio_fdt_nodes() if it's NULL is certainly a good
idea, but how did you trigger it?
Because I am wondering whether every MMIO device needs to have an DT
generator? And if that's not the case, a warning might be already too
much.

So either just drop a print at all or use pr_info()/pr_debug()?

Cheers,
Andre

> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/fdt.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/fdt.c b/arm/fdt.c
> index 02091e9e0bee..06287a13e395 100644
> --- a/arm/fdt.c
> +++ b/arm/fdt.c
> @@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
>  	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
>  	while (dev_hdr) {
>  		generate_mmio_fdt_nodes = dev_hdr->data;
> -		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> +		if (generate_mmio_fdt_nodes) {
> +			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> +		} else {
> +			pr_warning("Missing FDT node generator for MMIO device %d",
> +				   dev_hdr->dev_num);
> +		}
>  		dev_hdr = device__next_dev(dev_hdr);
>  	}
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-09 18:38 ` [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2021-06-10 16:14   ` Andre Przywara
  2021-06-10 16:44     ` Alexandru Elisei
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Przywara @ 2021-06-10 16:14 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Wed,  9 Jun 2021 19:38:11 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> PCI Express comes with an extended addressing scheme, which directly
> translated into a bigger device configuration space (256->4096 bytes)
> and bigger PCI configuration space (16->256 MB), as well as mandatory
> capabilities (power management [1] and PCI Express capability [2]).
> 
> However, our virtio PCI implementation implements version 0.9 of the
> protocol and it still uses transitional PCI device ID's, so we have
> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> management and PCI Express capability are left for a subsequent patch.
> 
> [1] PCI Express Base Specification Revision 1.1, section 7.6
> [2] PCI Express Base Specification Revision 1.1, section 7.8
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/include/arm-common/kvm-arch.h |  4 ++-
>  arm/pci.c                         |  2 +-
>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>  pci.c                             |  5 +--
>  vfio/pci.c                        | 26 ++++++++++------
>  5 files changed, 70 insertions(+), 18 deletions(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index 436b67b843fc..c645ac001bca 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -49,7 +49,7 @@
>  
>  
>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
> @@ -77,6 +77,8 @@
>  
>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>  
> +#define ARCH_HAS_PCI_EXP	1
> +
>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>  {
>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> diff --git a/arm/pci.c b/arm/pci.c
> index ed325fa4a811..2251f627d8b5 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>  
>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index bf81323d83b7..42d9e1c5645f 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -10,6 +10,7 @@
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
> +#include "kvm/kvm-arch.h"
>  
>  #define pci_dev_err(pci_hdr, fmt, ...) \
>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> @@ -32,10 +33,49 @@
>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>  #define PCI_IO_SIZE		0x100
>  #define PCI_IOPORT_START	0x6200
> -#define PCI_CFG_SIZE		(1ULL << 24)
>  
>  struct kvm;
>  
> +/*
> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
> + */
> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
> +#define PCI_DEV_CFG_SIZE_LEGACY		256
> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096

Do we need the size right now? If not, we can use
#define PCI_DEV_CFG_SIZE  (PCI_CFG_SIZE << 16) down below?
To make it more obvious where this comes from?

The rest looks alright, thanks for addressing the comments from last
year ;-)

Cheers,
Andre


> +
> +#ifdef ARCH_HAS_PCI_EXP
> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
> +
> +union pci_config_address {
> +	struct {
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	enable_bit	: 1;		/* 31       */
> +#else
> +		unsigned	enable_bit	: 1;		/* 31       */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +#endif
> +	};
> +	u32 w;
> +};
> +
> +#else
> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
> +
>  union pci_config_address {
>  	struct {
>  #if __BYTE_ORDER == __LITTLE_ENDIAN
> @@ -58,6 +98,9 @@ union pci_config_address {
>  	};
>  	u32 w;
>  };
> +#endif /* ARCH_HAS_PCI_EXP */
> +
> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct msix_table {
>  	struct msi_msg msg;
> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>  				   int bar_num, void *data);
>  
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> -#define PCI_DEV_CFG_SIZE	256
> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		      u8 offset, void *data, int sz);
> +		      u16 offset, void *data, int sz);
>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		     u8 offset, void *data, int sz);
> +		     u16 offset, void *data, int sz);
>  };
>  
>  struct pci_device_header {
> diff --git a/pci.c b/pci.c
> index d6da79e0a56a..e593033164c1 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> -	u8 bar, offset;
> +	u8 bar;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
> -	u8 offset;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 49ecd12a38cd..6a4204634e71 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -313,7 +313,7 @@ out_unlock:
>  }
>  
>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> -				    struct vfio_device *vdev, u8 off,
> +				    struct vfio_device *vdev, u16 off,
>  				    void *data, int sz)
>  {
>  	struct vfio_pci_device *pdev = &vdev->pci;
> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>  }
>  
>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> -				     u8 off, u8 *data, u32 sz)
> +				     u16 off, u8 *data, u32 sz)
>  {
>  	size_t i;
>  	u32 mask = 0;
> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>  }
>  
>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> -				   u8 off, u8 *data, u32 sz)
> +				   u16 off, u8 *data, u32 sz)
>  {
>  	u8 ctrl;
>  	struct msi_msg msg;
> @@ -553,7 +553,7 @@ out:
>  }
>  
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			      u8 offset, void *data, int sz)
> +			      u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>  }
>  
>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			       u8 offset, void *data, int sz)
> +			       u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
>  	int ret;
>  	size_t size;
> -	u8 pos, next;
> +	u16 pos, next;
>  	struct pci_cap_hdr *cap;
> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> +	u8 *virt_hdr;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> +	if (!virt_hdr)
> +		return -ENOMEM;
>  
>  	pos = pdev->hdr.capabilities & ~3;
>  
> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>  
> +	free(virt_hdr);
> +
>  	return 0;
>  }
>  
> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  
>  	/* Install our fake Configuration Space */
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> -	hdr_sz = PCI_DEV_CFG_SIZE;
> +	/*
> +	 * We don't touch the extended configuration space, let's be cautious
> +	 * and not overwrite it all with zeros, or bad things might happen.
> +	 */
> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>  			     hdr_sz);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure
  2021-06-09 18:38 ` [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
@ 2021-06-10 16:14   ` Andre Przywara
  2021-06-10 17:17     ` Alexandru Elisei
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Przywara @ 2021-06-10 16:14 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Wed,  9 Jun 2021 19:38:12 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> It turns out that some Linux drivers (like Realtek R8169) fall back to a
> device-specific configuration method if the device is not PCI Express
> capable:
> 
> [    1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI
> 

Nice discovery, thanks!

> Add the PCI Express Capability Structure and populate it for assigned
> devices, as this is how the Linux PCI driver determines if a device is PCI
> Express capable.
> 
> Because we don't emulate a PCI Express link, a root complex or any slot
> related properties, the PCI Express capability is kept as small as possible
> by ignoring those fields.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/pci.h | 24 ++++++++++++++++++++++++
>  vfio/pci.c        | 13 +++++++++++++
>  2 files changed, 37 insertions(+)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 42d9e1c5645f..0f2d5bbabdc3 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -46,6 +46,8 @@ struct kvm;
>  #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
>  
>  #ifdef ARCH_HAS_PCI_EXP
> +#define arch_has_pci_exp()	(true)
> +
>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>  
> @@ -73,6 +75,8 @@ union pci_config_address {
>  };
>  
>  #else
> +#define arch_has_pci_exp()	(false)
> +
>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>  
> @@ -143,6 +147,24 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_exp_cap {
> +	u8 cap;
> +	u8 next;
> +	u16 cap_reg;
> +	u32 dev_cap;
> +	u16 dev_ctrl;
> +	u16 dev_status;
> +	u32 link_cap;
> +	u16 link_ctrl;
> +	u16 link_status;
> +	u32 slot_cap;
> +	u16 slot_ctrl;
> +	u16 slot_status;
> +	u16 root_ctrl;
> +	u16 root_cap;
> +	u32 root_status;
> +};
> +
>  struct pci_device_header;
>  
>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> @@ -188,6 +210,8 @@ struct pci_device_header {
>  			u8		min_gnt;
>  			u8		max_lat;
>  			struct msix_cap msix;
> +			/* Used only by architectures which support PCIE */
> +			struct pci_exp_cap pci_exp;
>  		} __attribute__((packed));
>  		/* Pad to PCI config space size */
>  		u8	__pad[PCI_DEV_CFG_SIZE];
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 6a4204634e71..5c9bec6db710 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -623,6 +623,12 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
>  		return PCI_CAP_MSIX_SIZEOF;
>  	case PCI_CAP_ID_MSI:
>  		return vfio_pci_msi_cap_size((void *)cap_hdr);
> +	case PCI_CAP_ID_EXP:
> +		/*
> +		 * We don't emulate any of the link, slot and root complex
> +		 * properties, so ignore them.
> +		 */
> +		return PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1;

My (admittedly) older distro kernel headers don't carry this symbol.
Can we have a "#ifndef ... #define ... #endif" at the beginning of
this file?
If "Slackware" isn't convincing enough, RHEL 7.4 doesn't include it
either. And this issue hits x86 as well, even though we don't use PCIe
there. Also we do something similar in patch 3/4.

Rest looks alright.

Cheers,
Andre

>  	default:
>  		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
>  		return 0;
> @@ -696,6 +702,13 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  			pdev->msi.pos = pos;
>  			pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
>  			break;
> +		case PCI_CAP_ID_EXP:
> +			if (!arch_has_pci_exp())
> +				continue;
> +			ret = vfio_pci_add_cap(vdev, virt_hdr, cap, pos);
> +			if (ret)
> +				return ret;
> +			break;
>  		}
>  	}
>  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  2021-06-10 16:13   ` Andre Przywara
@ 2021-06-10 16:38     ` Alexandru Elisei
  2021-06-14 14:07       ` Andre Przywara
  0 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-10 16:38 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 6/10/21 5:13 PM, Andre Przywara wrote:
> On Wed,  9 Jun 2021 19:38:10 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> Print a more helpful warning when a MMIO device hasn't set a function to
>> generate an FDT instead of causing a segmentation fault by dereferencing a
>> NULL pointer.
> Not calling generate_mmio_fdt_nodes() if it's NULL is certainly a good
> idea, but how did you trigger it?

I was able to trigger it when I was hacking a custom MMIO device emulation to test
some behaviour in KVM.

> Because I am wondering whether every MMIO device needs to have an DT
> generator? And if that's not the case, a warning might be already too
> much.

I don't know how the guest will be able to discover the device if it's not in the
DT, that's why I put the warning. If there are devices which can be discovered by
the guest when they are missing from the DT, then I'll drop the warning.

Thanks,

Alex

>
> So either just drop a print at all or use pr_info()/pr_debug()?
>
> Cheers,
> Andre
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/fdt.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/fdt.c b/arm/fdt.c
>> index 02091e9e0bee..06287a13e395 100644
>> --- a/arm/fdt.c
>> +++ b/arm/fdt.c
>> @@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
>>  	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
>>  	while (dev_hdr) {
>>  		generate_mmio_fdt_nodes = dev_hdr->data;
>> -		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
>> +		if (generate_mmio_fdt_nodes) {
>> +			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
>> +		} else {
>> +			pr_warning("Missing FDT node generator for MMIO device %d",
>> +				   dev_hdr->dev_num);
>> +		}
>>  		dev_hdr = device__next_dev(dev_hdr);
>>  	}
>>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-10 16:14   ` Andre Przywara
@ 2021-06-10 16:44     ` Alexandru Elisei
  2021-06-10 19:00       ` Andre Przywara
  0 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-10 16:44 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 6/10/21 5:14 PM, Andre Przywara wrote:
> On Wed,  9 Jun 2021 19:38:11 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> PCI Express comes with an extended addressing scheme, which directly
>> translated into a bigger device configuration space (256->4096 bytes)
>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>> capabilities (power management [1] and PCI Express capability [2]).
>>
>> However, our virtio PCI implementation implements version 0.9 of the
>> protocol and it still uses transitional PCI device ID's, so we have
>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>> management and PCI Express capability are left for a subsequent patch.
>>
>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/include/arm-common/kvm-arch.h |  4 ++-
>>  arm/pci.c                         |  2 +-
>>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>>  pci.c                             |  5 +--
>>  vfio/pci.c                        | 26 ++++++++++------
>>  5 files changed, 70 insertions(+), 18 deletions(-)
>>
>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>> index 436b67b843fc..c645ac001bca 100644
>> --- a/arm/include/arm-common/kvm-arch.h
>> +++ b/arm/include/arm-common/kvm-arch.h
>> @@ -49,7 +49,7 @@
>>  
>>  
>>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>> @@ -77,6 +77,8 @@
>>  
>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>  
>> +#define ARCH_HAS_PCI_EXP	1
>> +
>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>  {
>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>> diff --git a/arm/pci.c b/arm/pci.c
>> index ed325fa4a811..2251f627d8b5 100644
>> --- a/arm/pci.c
>> +++ b/arm/pci.c
>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>  
>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index bf81323d83b7..42d9e1c5645f 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -10,6 +10,7 @@
>>  #include "kvm/devices.h"
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>> +#include "kvm/kvm-arch.h"
>>  
>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> @@ -32,10 +33,49 @@
>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>  #define PCI_IO_SIZE		0x100
>>  #define PCI_IOPORT_START	0x6200
>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>  
>>  struct kvm;
>>  
>> +/*
>> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
>> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
>> + */
>> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
>> +#define PCI_DEV_CFG_SIZE_LEGACY		256
>> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
>> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
> Do we need the size right now? If not, we can use
> #define PCI_DEV_CFG_SIZE  (PCI_CFG_SIZE << 16) down below?
> To make it more obvious where this comes from?

I put it there because we need PCI_DEV_CFG_SIZE_LEGACY in the vfio code, and it
looked strangely asymmetrical to have the legacy size but not the extended size.
Unless there's something I'm misinterpreting about your comment.

Thanks,

Alex

>
> The rest looks alright, thanks for addressing the comments from last
> year ;-)
>
> Cheers,
> Andre
>
>
>> +
>> +#ifdef ARCH_HAS_PCI_EXP
>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>> +
>> +union pci_config_address {
>> +	struct {
>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +#else
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +#endif
>> +	};
>> +	u32 w;
>> +};
>> +
>> +#else
>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>> +
>>  union pci_config_address {
>>  	struct {
>>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>> @@ -58,6 +98,9 @@ union pci_config_address {
>>  	};
>>  	u32 w;
>>  };
>> +#endif /* ARCH_HAS_PCI_EXP */
>> +
>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct msix_table {
>>  	struct msi_msg msg;
>> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>  				   int bar_num, void *data);
>>  
>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>> -#define PCI_DEV_CFG_SIZE	256
>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct pci_config_operations {
>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		      u8 offset, void *data, int sz);
>> +		      u16 offset, void *data, int sz);
>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		     u8 offset, void *data, int sz);
>> +		     u16 offset, void *data, int sz);
>>  };
>>  
>>  struct pci_device_header {
>> diff --git a/pci.c b/pci.c
>> index d6da79e0a56a..e593033164c1 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> -	u8 bar, offset;
>> +	u8 bar;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>> -	u8 offset;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 49ecd12a38cd..6a4204634e71 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -313,7 +313,7 @@ out_unlock:
>>  }
>>  
>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>> -				    struct vfio_device *vdev, u8 off,
>> +				    struct vfio_device *vdev, u16 off,
>>  				    void *data, int sz)
>>  {
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>  }
>>  
>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				     u8 off, u8 *data, u32 sz)
>> +				     u16 off, u8 *data, u32 sz)
>>  {
>>  	size_t i;
>>  	u32 mask = 0;
>> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>  }
>>  
>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				   u8 off, u8 *data, u32 sz)
>> +				   u16 off, u8 *data, u32 sz)
>>  {
>>  	u8 ctrl;
>>  	struct msi_msg msg;
>> @@ -553,7 +553,7 @@ out:
>>  }
>>  
>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			      u8 offset, void *data, int sz)
>> +			      u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  }
>>  
>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			       u8 offset, void *data, int sz)
>> +			       u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  {
>>  	int ret;
>>  	size_t size;
>> -	u8 pos, next;
>> +	u16 pos, next;
>>  	struct pci_cap_hdr *cap;
>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>> +	u8 *virt_hdr;
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>  
>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>  		return 0;
>>  
>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>> +	if (!virt_hdr)
>> +		return -ENOMEM;
>>  
>>  	pos = pdev->hdr.capabilities & ~3;
>>  
>> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>>  
>> +	free(virt_hdr);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>  
>>  	/* Install our fake Configuration Space */
>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>> -	hdr_sz = PCI_DEV_CFG_SIZE;
>> +	/*
>> +	 * We don't touch the extended configuration space, let's be cautious
>> +	 * and not overwrite it all with zeros, or bad things might happen.
>> +	 */
>> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>>  			     hdr_sz);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure
  2021-06-10 16:14   ` Andre Przywara
@ 2021-06-10 17:17     ` Alexandru Elisei
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-10 17:17 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 6/10/21 5:14 PM, Andre Przywara wrote:
> On Wed,  9 Jun 2021 19:38:12 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> It turns out that some Linux drivers (like Realtek R8169) fall back to a
>> device-specific configuration method if the device is not PCI Express
>> capable:
>>
>> [    1.433825] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config space, falling back to CSI
>>
> Nice discovery, thanks!
>
>> Add the PCI Express Capability Structure and populate it for assigned
>> devices, as this is how the Linux PCI driver determines if a device is PCI
>> Express capable.
>>
>> Because we don't emulate a PCI Express link, a root complex or any slot
>> related properties, the PCI Express capability is kept as small as possible
>> by ignoring those fields.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  include/kvm/pci.h | 24 ++++++++++++++++++++++++
>>  vfio/pci.c        | 13 +++++++++++++
>>  2 files changed, 37 insertions(+)
>>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index 42d9e1c5645f..0f2d5bbabdc3 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -46,6 +46,8 @@ struct kvm;
>>  #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
>>  
>>  #ifdef ARCH_HAS_PCI_EXP
>> +#define arch_has_pci_exp()	(true)
>> +
>>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>>  
>> @@ -73,6 +75,8 @@ union pci_config_address {
>>  };
>>  
>>  #else
>> +#define arch_has_pci_exp()	(false)
>> +
>>  #define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>>  #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>>  
>> @@ -143,6 +147,24 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pci_exp_cap {
>> +	u8 cap;
>> +	u8 next;
>> +	u16 cap_reg;
>> +	u32 dev_cap;
>> +	u16 dev_ctrl;
>> +	u16 dev_status;
>> +	u32 link_cap;
>> +	u16 link_ctrl;
>> +	u16 link_status;
>> +	u32 slot_cap;
>> +	u16 slot_ctrl;
>> +	u16 slot_status;
>> +	u16 root_ctrl;
>> +	u16 root_cap;
>> +	u32 root_status;
>> +};
>> +
>>  struct pci_device_header;
>>  
>>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>> @@ -188,6 +210,8 @@ struct pci_device_header {
>>  			u8		min_gnt;
>>  			u8		max_lat;
>>  			struct msix_cap msix;
>> +			/* Used only by architectures which support PCIE */
>> +			struct pci_exp_cap pci_exp;
>>  		} __attribute__((packed));
>>  		/* Pad to PCI config space size */
>>  		u8	__pad[PCI_DEV_CFG_SIZE];
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 6a4204634e71..5c9bec6db710 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -623,6 +623,12 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
>>  		return PCI_CAP_MSIX_SIZEOF;
>>  	case PCI_CAP_ID_MSI:
>>  		return vfio_pci_msi_cap_size((void *)cap_hdr);
>> +	case PCI_CAP_ID_EXP:
>> +		/*
>> +		 * We don't emulate any of the link, slot and root complex
>> +		 * properties, so ignore them.
>> +		 */
>> +		return PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1;
> My (admittedly) older distro kernel headers don't carry this symbol.
> Can we have a "#ifndef ... #define ... #endif" at the beginning of
> this file?

Good catch, we'll fix in the next iteration.

Thanks,

Alex

> If "Slackware" isn't convincing enough, RHEL 7.4 doesn't include it
> either. And this issue hits x86 as well, even though we don't use PCIe
> there. Also we do something similar in patch 3/4.
>
> Rest looks alright.
>
> Cheers,
> Andre
>
>>  	default:
>>  		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
>>  		return 0;
>> @@ -696,6 +702,13 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  			pdev->msi.pos = pos;
>>  			pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
>>  			break;
>> +		case PCI_CAP_ID_EXP:
>> +			if (!arch_has_pci_exp())
>> +				continue;
>> +			ret = vfio_pci_add_cap(vdev, virt_hdr, cap, pos);
>> +			if (ret)
>> +				return ret;
>> +			break;
>>  		}
>>  	}
>>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-10 16:44     ` Alexandru Elisei
@ 2021-06-10 19:00       ` Andre Przywara
  2021-06-11  8:51         ` Alexandru Elisei
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Przywara @ 2021-06-10 19:00 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 10 Jun 2021 17:44:00 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Hi Andre,
> 
> On 6/10/21 5:14 PM, Andre Przywara wrote:
> > On Wed,  9 Jun 2021 19:38:11 +0100
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >  
> >> PCI Express comes with an extended addressing scheme, which directly
> >> translated into a bigger device configuration space (256->4096 bytes)
> >> and bigger PCI configuration space (16->256 MB), as well as mandatory
> >> capabilities (power management [1] and PCI Express capability [2]).
> >>
> >> However, our virtio PCI implementation implements version 0.9 of the
> >> protocol and it still uses transitional PCI device ID's, so we have
> >> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> >> management and PCI Express capability are left for a subsequent patch.
> >>
> >> [1] PCI Express Base Specification Revision 1.1, section 7.6
> >> [2] PCI Express Base Specification Revision 1.1, section 7.8
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >> ---
> >>  arm/include/arm-common/kvm-arch.h |  4 ++-
> >>  arm/pci.c                         |  2 +-
> >>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
> >>  pci.c                             |  5 +--
> >>  vfio/pci.c                        | 26 ++++++++++------
> >>  5 files changed, 70 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> >> index 436b67b843fc..c645ac001bca 100644
> >> --- a/arm/include/arm-common/kvm-arch.h
> >> +++ b/arm/include/arm-common/kvm-arch.h
> >> @@ -49,7 +49,7 @@
> >>  
> >>  
> >>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
> >> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> >> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
> >>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
> >>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
> >>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
> >> @@ -77,6 +77,8 @@
> >>  
> >>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
> >>  
> >> +#define ARCH_HAS_PCI_EXP	1
> >> +
> >>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
> >>  {
> >>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> >> diff --git a/arm/pci.c b/arm/pci.c
> >> index ed325fa4a811..2251f627d8b5 100644
> >> --- a/arm/pci.c
> >> +++ b/arm/pci.c
> >> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
> >>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
> >>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
> >>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> >> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> >> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
> >>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
> >>  
> >>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> >> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> >> index bf81323d83b7..42d9e1c5645f 100644
> >> --- a/include/kvm/pci.h
> >> +++ b/include/kvm/pci.h
> >> @@ -10,6 +10,7 @@
> >>  #include "kvm/devices.h"
> >>  #include "kvm/msi.h"
> >>  #include "kvm/fdt.h"
> >> +#include "kvm/kvm-arch.h"
> >>  
> >>  #define pci_dev_err(pci_hdr, fmt, ...) \
> >>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> >> @@ -32,10 +33,49 @@
> >>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
> >>  #define PCI_IO_SIZE		0x100
> >>  #define PCI_IOPORT_START	0x6200
> >> -#define PCI_CFG_SIZE		(1ULL << 24)
> >>  
> >>  struct kvm;
> >>  
> >> +/*
> >> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
> >> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
> >> + */
> >> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
> >> +#define PCI_DEV_CFG_SIZE_LEGACY		256
> >> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
> >> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096  
> > Do we need the size right now? If not, we can use
> > #define PCI_DEV_CFG_SIZE  (PCI_CFG_SIZE << 16) down below?
> > To make it more obvious where this comes from?  
> 
> I put it there because we need PCI_DEV_CFG_SIZE_LEGACY in the vfio code, and it
> looked strangely asymmetrical to have the legacy size but not the extended size.
> Unless there's something I'm misinterpreting about your comment.

Sorry for not talking straight (and swapping arguments), I meant:
#define PCI_DEV_CFG_SIZE_LEGACY		256
#define PCI_DEV_CFG_SIZE_EXTENDED 	4096
#ifdef ARCH_HAS_PCI_EXP
#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
....
#endif
#define PCI_CFG_SIZE		(PCI_DEV_CFG_SIZE << 16)

But it's just a nit, to prove that I read the patch ;-)

Cheers,
Andre

> 
> Thanks,
> 
> Alex
> 
> >
> > The rest looks alright, thanks for addressing the comments from last
> > year ;-)
> >
> > Cheers,
> > Andre
> >
> >  
> >> +
> >> +#ifdef ARCH_HAS_PCI_EXP
> >> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
> >> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
> >> +
> >> +union pci_config_address {
> >> +	struct {
> >> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> >> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> >> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> >> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> >> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> >> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> >> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> >> +		unsigned	enable_bit	: 1;		/* 31       */
> >> +#else
> >> +		unsigned	enable_bit	: 1;		/* 31       */
> >> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> >> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> >> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> >> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> >> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> >> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> >> +#endif
> >> +	};
> >> +	u32 w;
> >> +};
> >> +
> >> +#else
> >> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
> >> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
> >> +
> >>  union pci_config_address {
> >>  	struct {
> >>  #if __BYTE_ORDER == __LITTLE_ENDIAN
> >> @@ -58,6 +98,9 @@ union pci_config_address {
> >>  	};
> >>  	u32 w;
> >>  };
> >> +#endif /* ARCH_HAS_PCI_EXP */
> >> +
> >> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
> >>  
> >>  struct msix_table {
> >>  	struct msi_msg msg;
> >> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
> >>  				   int bar_num, void *data);
> >>  
> >>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> >> -#define PCI_DEV_CFG_SIZE	256
> >> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
> >>  
> >>  struct pci_config_operations {
> >>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -		      u8 offset, void *data, int sz);
> >> +		      u16 offset, void *data, int sz);
> >>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -		     u8 offset, void *data, int sz);
> >> +		     u16 offset, void *data, int sz);
> >>  };
> >>  
> >>  struct pci_device_header {
> >> diff --git a/pci.c b/pci.c
> >> index d6da79e0a56a..e593033164c1 100644
> >> --- a/pci.c
> >> +++ b/pci.c
> >> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
> >>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >>  {
> >>  	void *base;
> >> -	u8 bar, offset;
> >> +	u8 bar;
> >> +	u16 offset;
> >>  	struct pci_device_header *pci_hdr;
> >>  	u8 dev_num = addr.device_number;
> >>  	u32 value = 0;
> >> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
> >>  
> >>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >>  {
> >> -	u8 offset;
> >> +	u16 offset;
> >>  	struct pci_device_header *pci_hdr;
> >>  	u8 dev_num = addr.device_number;
> >>  
> >> diff --git a/vfio/pci.c b/vfio/pci.c
> >> index 49ecd12a38cd..6a4204634e71 100644
> >> --- a/vfio/pci.c
> >> +++ b/vfio/pci.c
> >> @@ -313,7 +313,7 @@ out_unlock:
> >>  }
> >>  
> >>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> >> -				    struct vfio_device *vdev, u8 off,
> >> +				    struct vfio_device *vdev, u16 off,
> >>  				    void *data, int sz)
> >>  {
> >>  	struct vfio_pci_device *pdev = &vdev->pci;
> >> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
> >>  }
> >>  
> >>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> >> -				     u8 off, u8 *data, u32 sz)
> >> +				     u16 off, u8 *data, u32 sz)
> >>  {
> >>  	size_t i;
> >>  	u32 mask = 0;
> >> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> >>  }
> >>  
> >>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> >> -				   u8 off, u8 *data, u32 sz)
> >> +				   u16 off, u8 *data, u32 sz)
> >>  {
> >>  	u8 ctrl;
> >>  	struct msi_msg msg;
> >> @@ -553,7 +553,7 @@ out:
> >>  }
> >>  
> >>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -			      u8 offset, void *data, int sz)
> >> +			      u16 offset, void *data, int sz)
> >>  {
> >>  	struct vfio_region_info *info;
> >>  	struct vfio_pci_device *pdev;
> >> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
> >>  }
> >>  
> >>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> -			       u8 offset, void *data, int sz)
> >> +			       u16 offset, void *data, int sz)
> >>  {
> >>  	struct vfio_region_info *info;
> >>  	struct vfio_pci_device *pdev;
> >> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
> >>  {
> >>  	int ret;
> >>  	size_t size;
> >> -	u8 pos, next;
> >> +	u16 pos, next;
> >>  	struct pci_cap_hdr *cap;
> >> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> >> +	u8 *virt_hdr;
> >>  	struct vfio_pci_device *pdev = &vdev->pci;
> >>  
> >>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
> >>  		return 0;
> >>  
> >> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> >> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> >> +	if (!virt_hdr)
> >> +		return -ENOMEM;
> >>  
> >>  	pos = pdev->hdr.capabilities & ~3;
> >>  
> >> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
> >>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
> >>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
> >>  
> >> +	free(virt_hdr);
> >> +
> >>  	return 0;
> >>  }
> >>  
> >> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
> >>  
> >>  	/* Install our fake Configuration Space */
> >>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> >> -	hdr_sz = PCI_DEV_CFG_SIZE;
> >> +	/*
> >> +	 * We don't touch the extended configuration space, let's be cautious
> >> +	 * and not overwrite it all with zeros, or bad things might happen.
> >> +	 */
> >> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
> >>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
> >>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
> >>  			     hdr_sz);  


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support
  2021-06-10 19:00       ` Andre Przywara
@ 2021-06-11  8:51         ` Alexandru Elisei
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-11  8:51 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 6/10/21 8:00 PM, Andre Przywara wrote:
> On Thu, 10 Jun 2021 17:44:00 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> Hi Andre,
>>
>> On 6/10/21 5:14 PM, Andre Przywara wrote:
>>> On Wed,  9 Jun 2021 19:38:11 +0100
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>
>>> Hi,
>>>  
>>>> PCI Express comes with an extended addressing scheme, which directly
>>>> translated into a bigger device configuration space (256->4096 bytes)
>>>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>>>> capabilities (power management [1] and PCI Express capability [2]).
>>>>
>>>> However, our virtio PCI implementation implements version 0.9 of the
>>>> protocol and it still uses transitional PCI device ID's, so we have
>>>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>>>> management and PCI Express capability are left for a subsequent patch.
>>>>
>>>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>>>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>> ---
>>>>  arm/include/arm-common/kvm-arch.h |  4 ++-
>>>>  arm/pci.c                         |  2 +-
>>>>  include/kvm/pci.h                 | 51 ++++++++++++++++++++++++++++---
>>>>  pci.c                             |  5 +--
>>>>  vfio/pci.c                        | 26 ++++++++++------
>>>>  5 files changed, 70 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>>>> index 436b67b843fc..c645ac001bca 100644
>>>> --- a/arm/include/arm-common/kvm-arch.h
>>>> +++ b/arm/include/arm-common/kvm-arch.h
>>>> @@ -49,7 +49,7 @@
>>>>  
>>>>  
>>>>  #define KVM_PCI_CFG_AREA	ARM_AXI_AREA
>>>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>>>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>>>>  #define KVM_PCI_MMIO_AREA	(KVM_PCI_CFG_AREA + ARM_PCI_CFG_SIZE)
>>>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>>>> @@ -77,6 +77,8 @@
>>>>  
>>>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>>>  
>>>> +#define ARCH_HAS_PCI_EXP	1
>>>> +
>>>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>>>  {
>>>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>>>> diff --git a/arm/pci.c b/arm/pci.c
>>>> index ed325fa4a811..2251f627d8b5 100644
>>>> --- a/arm/pci.c
>>>> +++ b/arm/pci.c
>>>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>>>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>>>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>>>  
>>>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>>>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>>>> index bf81323d83b7..42d9e1c5645f 100644
>>>> --- a/include/kvm/pci.h
>>>> +++ b/include/kvm/pci.h
>>>> @@ -10,6 +10,7 @@
>>>>  #include "kvm/devices.h"
>>>>  #include "kvm/msi.h"
>>>>  #include "kvm/fdt.h"
>>>> +#include "kvm/kvm-arch.h"
>>>>  
>>>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>>>> @@ -32,10 +33,49 @@
>>>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>>>  #define PCI_IO_SIZE		0x100
>>>>  #define PCI_IOPORT_START	0x6200
>>>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>>>  
>>>>  struct kvm;
>>>>  
>>>> +/*
>>>> + * On some distributions, pci_regs.h doesn't define PCI_CFG_SPACE_SIZE and
>>>> + * PCI_CFG_SPACE_EXP_SIZE, so we define our own.
>>>> + */
>>>> +#define PCI_CFG_SIZE_LEGACY		(1ULL << 24)
>>>> +#define PCI_DEV_CFG_SIZE_LEGACY		256
>>>> +#define PCI_CFG_SIZE_EXTENDED		(1ULL << 28)
>>>> +#define PCI_DEV_CFG_SIZE_EXTENDED 	4096  
>>> Do we need the size right now? If not, we can use
>>> #define PCI_DEV_CFG_SIZE  (PCI_CFG_SIZE << 16) down below?
>>> To make it more obvious where this comes from?  
>> I put it there because we need PCI_DEV_CFG_SIZE_LEGACY in the vfio code, and it
>> looked strangely asymmetrical to have the legacy size but not the extended size.
>> Unless there's something I'm misinterpreting about your comment.
> Sorry for not talking straight (and swapping arguments), I meant:
> #define PCI_DEV_CFG_SIZE_LEGACY		256
> #define PCI_DEV_CFG_SIZE_EXTENDED 	4096
> #ifdef ARCH_HAS_PCI_EXP
> #define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
> ....
> #endif
> #define PCI_CFG_SIZE		(PCI_DEV_CFG_SIZE << 16)

Oh, I see now. I would prefer to keep the shift count explicit, because it makes
it easier to check against the bitmap in pci_config_address (bit 24 is the first
reserved bit for legacy PCI, bit 28 is the first reserved bit for PCI Express),
against the ARM_PCI_CFG_SIZE in the memory layout for arm/arm64 and against Table
7-1: Enhanced Configuration Address Mapping from PCI EXPRESS BASE SPECIFICATION,
REV. 1.1.

Thanks,

Alex

>
> But it's just a nit, to prove that I read the patch ;-)
>
> Cheers,
> Andre
>
>> Thanks,
>>
>> Alex
>>
>>> The rest looks alright, thanks for addressing the comments from last
>>> year ;-)
>>>
>>> Cheers,
>>> Andre
>>>
>>>  
>>>> +
>>>> +#ifdef ARCH_HAS_PCI_EXP
>>>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_EXTENDED
>>>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_EXTENDED
>>>> +
>>>> +union pci_config_address {
>>>> +	struct {
>>>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>>> +#else
>>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>>> +#endif
>>>> +	};
>>>> +	u32 w;
>>>> +};
>>>> +
>>>> +#else
>>>> +#define PCI_CFG_SIZE		PCI_CFG_SIZE_LEGACY
>>>> +#define PCI_DEV_CFG_SIZE	PCI_DEV_CFG_SIZE_LEGACY
>>>> +
>>>>  union pci_config_address {
>>>>  	struct {
>>>>  #if __BYTE_ORDER == __LITTLE_ENDIAN
>>>> @@ -58,6 +98,9 @@ union pci_config_address {
>>>>  	};
>>>>  	u32 w;
>>>>  };
>>>> +#endif /* ARCH_HAS_PCI_EXP */
>>>> +
>>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>  
>>>>  struct msix_table {
>>>>  	struct msi_msg msg;
>>>> @@ -110,14 +153,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>>>  				   int bar_num, void *data);
>>>>  
>>>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>>> -#define PCI_DEV_CFG_SIZE	256
>>>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>  
>>>>  struct pci_config_operations {
>>>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -		      u8 offset, void *data, int sz);
>>>> +		      u16 offset, void *data, int sz);
>>>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -		     u8 offset, void *data, int sz);
>>>> +		     u16 offset, void *data, int sz);
>>>>  };
>>>>  
>>>>  struct pci_device_header {
>>>> diff --git a/pci.c b/pci.c
>>>> index d6da79e0a56a..e593033164c1 100644
>>>> --- a/pci.c
>>>> +++ b/pci.c
>>>> @@ -353,7 +353,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>>  {
>>>>  	void *base;
>>>> -	u8 bar, offset;
>>>> +	u8 bar;
>>>> +	u16 offset;
>>>>  	struct pci_device_header *pci_hdr;
>>>>  	u8 dev_num = addr.device_number;
>>>>  	u32 value = 0;
>>>> @@ -392,7 +393,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>>>  
>>>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>>  {
>>>> -	u8 offset;
>>>> +	u16 offset;
>>>>  	struct pci_device_header *pci_hdr;
>>>>  	u8 dev_num = addr.device_number;
>>>>  
>>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>>> index 49ecd12a38cd..6a4204634e71 100644
>>>> --- a/vfio/pci.c
>>>> +++ b/vfio/pci.c
>>>> @@ -313,7 +313,7 @@ out_unlock:
>>>>  }
>>>>  
>>>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>>> -				    struct vfio_device *vdev, u8 off,
>>>> +				    struct vfio_device *vdev, u16 off,
>>>>  				    void *data, int sz)
>>>>  {
>>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>>> @@ -345,7 +345,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>>>  }
>>>>  
>>>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>>> -				     u8 off, u8 *data, u32 sz)
>>>> +				     u16 off, u8 *data, u32 sz)
>>>>  {
>>>>  	size_t i;
>>>>  	u32 mask = 0;
>>>> @@ -393,7 +393,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>>>  }
>>>>  
>>>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>>>> -				   u8 off, u8 *data, u32 sz)
>>>> +				   u16 off, u8 *data, u32 sz)
>>>>  {
>>>>  	u8 ctrl;
>>>>  	struct msi_msg msg;
>>>> @@ -553,7 +553,7 @@ out:
>>>>  }
>>>>  
>>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -			      u8 offset, void *data, int sz)
>>>> +			      u16 offset, void *data, int sz)
>>>>  {
>>>>  	struct vfio_region_info *info;
>>>>  	struct vfio_pci_device *pdev;
>>>> @@ -571,7 +571,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>>>  }
>>>>  
>>>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> -			       u8 offset, void *data, int sz)
>>>> +			       u16 offset, void *data, int sz)
>>>>  {
>>>>  	struct vfio_region_info *info;
>>>>  	struct vfio_pci_device *pdev;
>>>> @@ -658,15 +658,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>>>  {
>>>>  	int ret;
>>>>  	size_t size;
>>>> -	u8 pos, next;
>>>> +	u16 pos, next;
>>>>  	struct pci_cap_hdr *cap;
>>>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>>>> +	u8 *virt_hdr;
>>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>>>  
>>>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>>>  		return 0;
>>>>  
>>>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>>>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>>>> +	if (!virt_hdr)
>>>> +		return -ENOMEM;
>>>>  
>>>>  	pos = pdev->hdr.capabilities & ~3;
>>>>  
>>>> @@ -702,6 +704,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>>>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>>>>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>>>>  
>>>> +	free(virt_hdr);
>>>> +
>>>>  	return 0;
>>>>  }
>>>>  
>>>> @@ -812,7 +816,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>>>  
>>>>  	/* Install our fake Configuration Space */
>>>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>>>> -	hdr_sz = PCI_DEV_CFG_SIZE;
>>>> +	/*
>>>> +	 * We don't touch the extended configuration space, let's be cautious
>>>> +	 * and not overwrite it all with zeros, or bad things might happen.
>>>> +	 */
>>>> +	hdr_sz = PCI_DEV_CFG_SIZE_LEGACY;
>>>>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>>>>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>>>>  			     hdr_sz);  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  2021-06-10 16:38     ` Alexandru Elisei
@ 2021-06-14 14:07       ` Andre Przywara
  2021-06-14 15:38         ` Alexandru Elisei
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Przywara @ 2021-06-14 14:07 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 10 Jun 2021 17:38:02 +0100
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi Alex,

> On 6/10/21 5:13 PM, Andre Przywara wrote:
> > On Wed,  9 Jun 2021 19:38:10 +0100
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >  
> >> Print a more helpful warning when a MMIO device hasn't set a function to
> >> generate an FDT instead of causing a segmentation fault by dereferencing a
> >> NULL pointer.  
> > Not calling generate_mmio_fdt_nodes() if it's NULL is certainly a good
> > idea, but how did you trigger it?  
> 
> I was able to trigger it when I was hacking a custom MMIO device emulation to test
> some behaviour in KVM.
> 
> > Because I am wondering whether every MMIO device needs to have an DT
> > generator? And if that's not the case, a warning might be already too
> > much.  
> 
> I don't know how the guest will be able to discover the device if it's not in the
> DT, that's why I put the warning.
> If there are devices which can be discovered by
> the guest when they are missing from the DT, then I'll drop the warning.

Well, not discovered, probably, but the guest (or parts of the guest,
think EFI firmware) could have hard-coded knowledge about what to
expect. That's what we did for the RTC and initially for the CFI flash.
I agree it's not the best way to do (and we fixed both of those
devices), but it's also nothing a *user* could do much about, so having a
pr_debug() there seems more appropriate to me.

Cheers,
Andre

> >
> > So either just drop a print at all or use pr_info()/pr_debug()?
> >
> > Cheers,
> > Andre
> >  
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >> ---
> >>  arm/fdt.c | 7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arm/fdt.c b/arm/fdt.c
> >> index 02091e9e0bee..06287a13e395 100644
> >> --- a/arm/fdt.c
> >> +++ b/arm/fdt.c
> >> @@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
> >>  	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
> >>  	while (dev_hdr) {
> >>  		generate_mmio_fdt_nodes = dev_hdr->data;
> >> -		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> >> +		if (generate_mmio_fdt_nodes) {
> >> +			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
> >> +		} else {
> >> +			pr_warning("Missing FDT node generator for MMIO device %d",
> >> +				   dev_hdr->dev_num);
> >> +		}
> >>  		dev_hdr = device__next_dev(dev_hdr);
> >>  	}
> >>    


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator
  2021-06-14 14:07       ` Andre Przywara
@ 2021-06-14 15:38         ` Alexandru Elisei
  0 siblings, 0 replies; 16+ messages in thread
From: Alexandru Elisei @ 2021-06-14 15:38 UTC (permalink / raw)
  To: Andre Przywara
  Cc: will, julien.thierry.kdev, kvm, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 6/14/21 3:07 PM, Andre Przywara wrote:
> On Thu, 10 Jun 2021 17:38:02 +0100
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi Alex,
>
>> On 6/10/21 5:13 PM, Andre Przywara wrote:
>>> On Wed,  9 Jun 2021 19:38:10 +0100
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>  
>>>> Print a more helpful warning when a MMIO device hasn't set a function to
>>>> generate an FDT instead of causing a segmentation fault by dereferencing a
>>>> NULL pointer.  
>>> Not calling generate_mmio_fdt_nodes() if it's NULL is certainly a good
>>> idea, but how did you trigger it?  
>> I was able to trigger it when I was hacking a custom MMIO device emulation to test
>> some behaviour in KVM.
>>
>>> Because I am wondering whether every MMIO device needs to have an DT
>>> generator? And if that's not the case, a warning might be already too
>>> much.  
>> I don't know how the guest will be able to discover the device if it's not in the
>> DT, that's why I put the warning.
>> If there are devices which can be discovered by
>> the guest when they are missing from the DT, then I'll drop the warning.
> Well, not discovered, probably, but the guest (or parts of the guest,
> think EFI firmware) could have hard-coded knowledge about what to
> expect. That's what we did for the RTC and initially for the CFI flash.
> I agree it's not the best way to do (and we fixed both of those
> devices), but it's also nothing a *user* could do much about, so having a
> pr_debug() there seems more appropriate to me.

I think you're right, pr_debug() makes more sense here.

Thanks,

Alex

>
> Cheers,
> Andre
>
>>> So either just drop a print at all or use pr_info()/pr_debug()?
>>>
>>> Cheers,
>>> Andre
>>>  
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>> ---
>>>>  arm/fdt.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arm/fdt.c b/arm/fdt.c
>>>> index 02091e9e0bee..06287a13e395 100644
>>>> --- a/arm/fdt.c
>>>> +++ b/arm/fdt.c
>>>> @@ -171,7 +171,12 @@ static int setup_fdt(struct kvm *kvm)
>>>>  	dev_hdr = device__first_dev(DEVICE_BUS_MMIO);
>>>>  	while (dev_hdr) {
>>>>  		generate_mmio_fdt_nodes = dev_hdr->data;
>>>> -		generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
>>>> +		if (generate_mmio_fdt_nodes) {
>>>> +			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
>>>> +		} else {
>>>> +			pr_warning("Missing FDT node generator for MMIO device %d",
>>>> +				   dev_hdr->dev_num);
>>>> +		}
>>>>  		dev_hdr = device__next_dev(dev_hdr);
>>>>  	}
>>>>    

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-06-14 15:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 18:38 [PATCH kvmtool 0/4] arm/arm64: PCI Express 1.1 support Alexandru Elisei
2021-06-09 18:38 ` [PATCH kvmtool 1/4] Move fdt_irq_fn typedef to fdt.h Alexandru Elisei
2021-06-10 16:13   ` Andre Przywara
2021-06-09 18:38 ` [PATCH kvmtool 2/4] arm/fdt.c: Warn if MMIO device doesn't provide a node generator Alexandru Elisei
2021-06-10 16:13   ` Andre Przywara
2021-06-10 16:38     ` Alexandru Elisei
2021-06-14 14:07       ` Andre Przywara
2021-06-14 15:38         ` Alexandru Elisei
2021-06-09 18:38 ` [PATCH kvmtool 3/4] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
2021-06-10 16:14   ` Andre Przywara
2021-06-10 16:44     ` Alexandru Elisei
2021-06-10 19:00       ` Andre Przywara
2021-06-11  8:51         ` Alexandru Elisei
2021-06-09 18:38 ` [PATCH kvmtool 4/4] arm/arm64: vfio: Add PCI Express Capability Structure Alexandru Elisei
2021-06-10 16:14   ` Andre Przywara
2021-06-10 17:17     ` Alexandru Elisei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.