From: Alexandru Elisei <alexandru.elisei@arm.com>
To: will@kernel.org, julien.thierry.kdev@gmail.com, kvm@vger.kernel.org
Cc: andre.przywara@arm.com, sami.mujawar@arm.com,
	lorenzo.pieralisi@arm.com, maz@kernel.org,
	pierre.gondois@arm.com
Subject: [PATCH v2 kvmtool 0/4] arm/arm64: PCI Express 1.1 support
Date: Mon, 21 Jun 2021 10:21:24 +0100
Message-ID: <20210621092128.11313-1-alexandru.elisei@arm.com>

Patches + EDK2 binary that I used for testing can be found at [5].

This series aims to add support for PCI Express 1.1. It is based on the
last patch [0] of the reassignable BAR series. The patch was discarded at
the time because there was no easy solution to solve the overlap between
the UART address and kvmtool's PCI I/O region, which made EDK2 and/or a
guest compiled with 64k pages very unhappy [1]. This is not the case
anymore, as the UART has been moved to address 0x1000000 in commit
45b4968e0de1 ("hw/serial: ARM/arm64: Use MMIO at higher addresses").

The series has also been tested with EDK2 built from the patches [6] that
add PCI Express support when running under kvmtool. This means that someone
will be able to download an official iso from the debian website and
install it in a kvmtool VM.

The first two patches in the series are small and hopefully straightforward
cleanups for stuff that I discovered when playing with kvmtool.

The third patch implements the PCI Express support only for the arm and
arm64 architectures. The reason for that is that I don't know how to do it
for x86, powerpc and mips (and for the last two I don't even have machines
to test it).
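
For context on what PCI Express changes in the configuration space layout:
legacy PCI gives each function 256 bytes of configuration space, while PCI
Express extends that to 4096 bytes, reached through a memory-mapped region
(ECAM) in which the bus, device and function numbers select a 4 KiB window.
The sketch below shows the standard ECAM offset computation. The helper
name and the macro names are made up for illustration and are not taken
from the kvmtool patches.

```c
#include <stdint.h>

/* Per-function configuration space size. */
#define PCI_CFG_SPACE_SIZE	256u	/* legacy PCI */
#define PCIE_CFG_SPACE_SIZE	4096u	/* PCI Express */

/*
 * Offset of a register inside an ECAM window:
 *   bits [27:20] bus, [19:15] device, [14:12] function, [11:0] register.
 * ecam_offset() is a hypothetical helper, not a kvmtool function.
 */
static uint32_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	return ((uint32_t)bus << 20) |
	       ((uint32_t)(dev & 0x1f) << 15) |
	       ((uint32_t)(fn & 0x7) << 12) |
	       (reg & (PCIE_CFG_SPACE_SIZE - 1));
}
```

For example, ecam_offset(0, 1, 0, 0x100) addresses register 0x100 of device
1 on bus 0, which is the first register beyond the 256 bytes that legacy
PCI can reach.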

The last patch implements a fix for a Realtek RTL8168 NIC, where the Linux
driver falls back to a device specific method of initialization if the
device is not PCI Express capable (doesn't have the PCI Express
Capability) [2].
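
To illustrate what "PCI Express capable" means here: software typically
finds the PCI Express Capability by walking the capability list in
configuration space. The following is a minimal sketch of that walk over an
in-memory copy of a function's first 256 bytes of configuration space.
find_capability() and the buffer are hypothetical and do not reproduce the
Linux or kvmtool code; the register offsets and IDs are the standard ones.

```c
#include <stdint.h>

#define PCI_STATUS		0x06	/* status register (low byte used here) */
#define PCI_STATUS_CAP_LIST	0x10	/* capability list is present */
#define PCI_CAPABILITY_LIST	0x34	/* offset of the first capability */
#define PCI_CAP_ID_EXP		0x10	/* PCI Express Capability ID */

/*
 * Return the offset of capability cap_id in cfg (256 bytes of config
 * space), or 0 if the device does not advertise it.
 */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48;	/* bound the walk in case the list loops */

	if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == cap_id)
			return pos;
		/* Byte 1 of each capability points to the next one. */
		pos = cfg[pos + 1] & 0xfc;
	}

	return 0;
}
```

A return value of 0 for PCI_CAP_ID_EXP corresponds to the "not PCI Express
capable" case described above.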


Changes in v2
=============

* Gathered Reviewed-by tag, many thanks!

* Renamed #2 "arm/fdt.c: Warn if MMIO device doesn't provide a node
  generator" to "arm/fdt.c: Don't generate the node if generator function
  is NULL" and replaced the warning with a debug message.

* Added the PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1 define when it's not present
  on the system in patch #4.
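
The fallback define follows the usual pattern for constants that older
system headers may lack. A minimal sketch, assuming the value 12 (the
capability header through the Device Status register, without the link
registers); the patch itself is authoritative for the value and for where
the define lives:

```c
/*
 * In real code this would follow the include of the system PCI header.
 * The define only takes effect when the header does not provide it.
 * The value 12 is an assumption, check it against patch #4.
 */
#ifndef PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1
#define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1	12
#endif
```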


Testing for v2
==============

In this iteration, the only change that impacts PCI Express support is the
addition of the PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V1 define when it's not
present on the system. Because of this, I believe the testing I did for v1
is still valid.

However, for completeness, I did a sanity run on my x86 machine. Also, the
EDK2 version that I used for testing on arm64 was built from a
work-in-progress tree, and in the meantime the patches have landed on the
mailing list [6]. I also ran some tests with EDK2 built from those patches.
Details below.

On a Ryzen 3900x:
-----------------

amd64 architecture and no PCIE support, making sure no regressions are
introduced.

1. Direct kernel boot + Debian 10 disk with SDL, to exercise the emulated
VESA device. I was able to log in using the display manager and
virtio-{net,blk} were working correctly.

On odroid-c4:
-------------

1. Debian 10 disk + EDK2 + --force-pci. The kernel was booted via Debian
grub, and I tried kernels compiled with 4k, 16k and 64k page sizes.

On AMD Seattle:
---------------

1. Using the EDK2 image and the passthrough Realtek RTL8168 NIC as the
network interface, and a vanilla netinstall iso from the debian website [3]
I was able to install debian in a virtual machine. The installation hint
from the testing for v1 still applies.

2. Realtek RTL8168 + EDK2 boot + --force-pci, kernel compiled with 4k and
64k pages (Seattle doesn't support 16k pages).

3. Intel 82574L NIC + EDK2 boot + --force-pci, kernel compiled with 4k and
64k pages.

4. AMD FirePro W2100 VGA + HDMI audio (both assigned to the VM) + EDK2 boot
+ --force-pci, kernel compiled from v5.10 (see testing for v1) with 4k and
64k pages.

5. NVIDIA Quadro P400 VGA + HDMI audio (both assigned to the VM) + EDK2
boot + --force-pci, kernel compiled with 4k and 64k pages (see testing for
v1).


Testing for v1
==============

Warning, wall of text. Unless specified, the guest kernel was built from
tag v5.12.

On a Ryzen 3900x:
-----------------

amd64 architecture and no PCIE support, making sure no regressions are
introduced.

1. Direct kernel boot + Debian 10 disk with SDL, to exercise the emulated
VESA device. I was able to log in using the display manager and
virtio-{net,blk} were working correctly.

2. Direct kernel boot + Debian 10 disk with SDL + Realtek RTL8168 + Intel
82574L PCIE NIC, both assigned to the VM. Assigning an IP address to the
Realtek NIC fails with the message: "No native access to PCI extended
config space, falling back to CSI", which makes sense since kvmtool is
emulating legacy PCI 3.0 for the amd64 architecture. Other than that,
everything works as expected.

On odroid-c4:
-------------

1. Debian 10 disk + upstream EDK2 built from commit 1f515342d8d8
("DynamicTablesPkg: Use AML_NAME_SEG_SIZE define"), **without** --force-pci
(so using virtio-mmio). Kernel compiled with 4k, 16k and 64k pages. This
was done to make sure there are no regressions.

2. Direct kernel boot + Debian 10 disk, with --force-pci. Tried 3 versions
of the kernel, compiled with 4k, 16k and 64k page sizes. Got the warning:
"TCP: enp0s0: Driver has suspect GRO implementation, TCP performance may be
compromised." I suspect it is because of kvmtool's legacy version of virtio.
This was further confirmed by running the same kernel with kvmtool built
from master, with and without --force-pci; the warning was still there.

3. Debian 10 disk + a work-in-progress version of EDK2 which enables PCIE
support for kvmtool, with --force-pci. The kernel was booted via Debian
grub, and same as above, I tried with kernels compiled with 4k, 16k and 64k
page sizes.

On AMD Seattle:
---------------

1. Using the EDK2 image and the passthrough Realtek RTL8168 NIC as the
network interface, I was able to use a vanilla netinstall iso from the
debian website [3] and install debian in a virtual machine. Woohoo!

One gotcha during installation: because kvmtool doesn't emulate a SCSI
CD-ROM, you need to manually specify the virtio disk for the installation
iso. At the 'Detect and mount CD-ROM' prompt, choose No when asked to load
CD-ROM drivers from removable media, Yes to manually select a CD-ROM module
and device, none when choosing the CD-ROM module (it's a virtio disk), then
the device file for accessing the CD-ROM is /dev/vda (only if the iso file
is the first --disk kvmtool parameter, otherwise /dev/vdb if it's the
second, and so on).
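
The device naming rule above can be written down as a small helper. This is
only an illustrative sketch: virtio_disk_name() is a hypothetical function,
it assumes the guest names virtio block devices in --disk order as
described above, and it handles only the first 26 disks (vda to vdz).

```c
#include <stdio.h>

/*
 * Write the guest device path for the disk_index-th --disk argument
 * (0-based) into buf. Returns 0 on success, -1 if the index is outside
 * the single-letter range or buf is too small.
 */
static int virtio_disk_name(unsigned int disk_index, char *buf, size_t len)
{
	int n;

	if (disk_index >= 26)
		return -1;

	n = snprintf(buf, len, "/dev/vd%c", (char)('a' + disk_index));

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the iso passed as the first --disk parameter, index 0 yields /dev/vda;
as the second parameter, index 1 yields /dev/vdb.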

2. Realtek RTL8168, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages (Seattle doesn't support
16k pages) for both direct kernel boot and EDK2 boot.

3. Intel 82574L NIC, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, kernel compiled with 4k and 64k pages for both direct boot and
EDK2 boot.

4. AMD FirePro W2100 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

For this test, I switched the guest kernel to v5.10 because with v5.11 and
v5.12 I was getting this kernel panic caused by a NULL pointer dereference:

[..]
[    0.943927] [drm] radeon kernel modesetting enabled.
[    0.945050] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.946313] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.947736] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.949193] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.950151] radeon 0000:00:00.0: Fatal error during GPU init
[    0.950990] [drm] radeon: finishing device.
[    0.951633] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[    0.952936] Mem abort info:
[    0.953369]   ESR = 0x96000004
[    0.953838]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.954635]   SET = 0, FnV = 0
[    0.955100]   EA = 0, S1PTW = 0
[    0.955590] Data abort info:
[    0.956033]   ISV = 0, ISS = 0x00000004
[    0.956608]   CM = 0, WnR = 0
[    0.957099] [0000000000000020] user address but active_mm is swapper
[    0.958051] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    0.958881] Modules linked in:
[    0.959356] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.11.0 #13
[    0.960268] Hardware name: linux,dummy-virt (DT)
[    0.960970] pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
[    0.962013] pc : ttm_resource_manager_evict_all+0x64/0x1f0
[    0.962972] lr : ttm_resource_manager_evict_all+0x5c/0x1f0
[    0.963931] sp : ffff80001212ba00
[    0.964517] x29: ffff80001212ba00 x28: 0000000000000000 
[    0.965448] x27: ffff8000118004e0 x26: ffff8000120cd000 
[    0.966371] x25: 0000000000000000 x24: ffff000080c946e8 
[    0.967296] x23: 0000000000000020 x22: 0000000000000000 
[    0.968227] x21: 0000000000000000 x20: ffff8000120cdb90 
[    0.969152] x19: ffff000080c94000 x18: ffffffffffffffff 
[    0.970076] x17: 0000000000000000 x16: 0000000000000001 
[    0.970999] x15: ffff80009212b787 x14: 0000000000000006 
[    0.971928] x13: ffff800011de2368 x12: 0000000000000264 
[    0.972852] x11: 00000000000000cc x10: ffff800011de2368 
[    0.973780] x9 : ffff800011de2368 x8 : 00000000ffffefff 
[    0.974701] x7 : ffff800011e3a368 x6 : ffff800011e3a368 
[    0.975637] x5 : 0000000000000000 x4 : 0000000000000000 
[    0.976559] x3 : ffff8000120cdb90 x2 : 0000000000000001 
[    0.977483] x1 : 0000000000000000 x0 : 0000000000000000 
[    0.978410] Call trace:
[    0.978851]  ttm_resource_manager_evict_all+0x64/0x1f0
[    0.979759]  radeon_bo_evict_vram+0x1c/0x30
[    0.980494]  radeon_device_fini+0x34/0xe8
[    0.981209]  radeon_driver_unload_kms+0x48/0x90
[    0.982000]  radeon_driver_load_kms+0x124/0x174
[    0.982792]  drm_dev_register+0xe0/0x210
[    0.983486]  radeon_pci_probe+0x120/0x1bc
[    0.984180]  local_pci_probe+0x40/0xac
[    0.984843]  pci_device_probe+0x114/0x1b0
[    0.985548]  really_probe+0xe4/0x4c0
[    0.986181]  driver_probe_device+0x58/0xc0
[    0.986902]  device_driver_attach+0xc0/0xcc
[    0.987642]  __driver_attach+0x84/0x124
[    0.988317]  bus_for_each_dev+0x70/0xd0
[    0.988996]  driver_attach+0x24/0x30
[    0.989627]  bus_add_driver+0x104/0x1ec
[    0.990300]  driver_register+0x78/0x130
[    0.990974]  __pci_register_driver+0x48/0x54
[    0.991730]  radeon_module_init+0x54/0x64
[    0.992438]  do_one_initcall+0x50/0x1b0
[    0.993115]  kernel_init_freeable+0x1d4/0x23c
[    0.993880]  kernel_init+0x14/0x118
[    0.994496]  ret_from_fork+0x10/0x34
[    0.995132] Code: f90033ff 9420650e d37c7f36 8b1602b6 (f94012c0) 
[    0.996201] ---[ end trace 88eed6171e8cb9bc ]---
[    0.997011] note: swapper/0[1] exited with preempt_count 1
[    0.997840] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.998984] SMP: stopping secondary CPUs
[    0.999605] Kernel Offset: disabled
[    1.000137] CPU features: 0x00240022,61006082
[    1.000793] Memory Limit: none
[    1.001330] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
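
As a reading aid for the "Mem abort info" and "Data abort info" lines in
the trace above, here is a minimal sketch that extracts a few fields from
the ESR value 0x96000004 using the architectural layout (EC in bits
[31:26], IL in bit 25, and for data aborts WnR in bit 6 and the fault
status code in bits [5:0]). The struct and function names are made up for
this example.

```c
#include <stdint.h>

struct esr_fields {
	unsigned int ec;	/* exception class, bits [31:26] */
	unsigned int il;	/* instruction length bit, bit 25 */
	unsigned int wnr;	/* data abort: write-not-read, bit 6 */
	unsigned int dfsc;	/* data abort: fault status code, bits [5:0] */
};

/* Split an ESR value into the fields listed above. */
static struct esr_fields decode_esr(uint32_t esr)
{
	struct esr_fields f = {
		.ec   = (esr >> 26) & 0x3f,
		.il   = (esr >> 25) & 0x1,
		.wnr  = (esr >> 6) & 0x1,
		.dfsc = esr & 0x3f,
	};

	return f;
}
```

For 0x96000004 this gives EC = 0x25 and IL = 1, matching the "DABT (current
EL), IL = 32 bits" line, WnR = 0 (a read) and a fault status code of 0x04
(a translation fault), which is consistent with a load from the unmapped
address 0x20.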

This is what dmesg looks like with v5.10, v5.8 and v5.6:

[..]
[    0.972061] [drm] radeon kernel modesetting enabled.
[    0.973162] [drm] initializing kernel modesetting (OLAND 0x1002:0x6608 0x1002:0x2120 0x00).
[    0.974426] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.976037] radeon 0000:00:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    0.977435] [drm:radeon_get_bios] *ERROR* Unable to locate a BIOS ROM
[    0.978381] radeon 0000:00:00.0: Fatal error during GPU init
[    0.979341] [drm] radeon: finishing device.
[    0.979963] [TTM] Memory type 2 has not been initialized
[    0.988250] radeon: probe of 0000:00:00.0 failed with error -22
[    0.989282] cacheinfo: Unable to detect cache hierarchy for CPU 0
[    0.993326] loop: module loaded
[..]

In my opinion, this is an upstream bug caused by incorrect clean up when
probing fails. I plan to see if I can reproduce it on my x86 machine (to
make it easier for other people to reproduce it) and then report it
upstream.

Note that I used the radeon driver instead of amdgpu because this is the
recommended driver [4] for the GCN1 architecture.

5. NVIDIA Quadro P400 VGA + HDMI audio, both assigned to a VM, direct kernel
boot and EDK2 boot with Debian 10 disk, --force-pci, kernel compiled with
4k and 64k pages for both direct boot and EDK2 boot.

Nouveau seems to work as expected (it binds to the GPU), but during driver
initialization it looks like the system hangs for 30s-1m. My guess is that
something times out in the driver due to missing emulation in kvmtool:

[..]
[    0.335506] [drm] radeon kernel modesetting enabled.
[    0.336369] nouveau 0000:00:00.0: enabling device (0000 -> 0003)
[    0.359468] nouveau 0000:00:00.0: NVIDIA GP107 (137000a1)
[    0.505066] nouveau 0000:00:00.0: bios: version 86.07.6b.00.01
              <---- hangs here
[  123.867379] nouveau 0000:00:00.0: acr: firmware unavailable
[  123.868337] nouveau 0000:00:00.0: pmu: firmware unavailable
[  123.869488] nouveau 0000:00:00.0: gr: firmware unavailable
[  123.870506] nouveau 0000:00:00.0: sec2: firmware unavailable
[  123.928149] nouveau 0000:00:00.0: fb: 2048 MiB GDDR5
[  123.963159] [TTM] Zone  kernel: Available graphics memory: 8313888 KiB
[  123.964823] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[  123.966172] nouveau 0000:00:00.0: DRM: VRAM: 2048 MiB
[  123.967101] nouveau 0000:00:00.0: DRM: GART: 536870912 MiB
[  123.968258] nouveau 0000:00:00.0: DRM: BIT table 'A' not found
[  123.969403] nouveau 0000:00:00.0: DRM: BIT table 'L' not found
[  123.970498] nouveau 0000:00:00.0: DRM: TMDS table version 2.0
[  123.971688] nouveau 0000:00:00.0: DRM: DCB version 4.1
[  123.972639] nouveau 0000:00:00.0: DRM: DCB outp 00: 01800f56 04600020
[  123.973820] nouveau 0000:00:00.0: DRM: DCB outp 01: 01000f52 04620020
[  123.975083] nouveau 0000:00:00.0: DRM: DCB outp 02: 01811f46 04600010
[  123.976500] nouveau 0000:00:00.0: DRM: DCB outp 03: 01011f42 04620010
[  123.977681] nouveau 0000:00:00.0: DRM: DCB outp 04: 02822f76 04600020
[  123.978955] nouveau 0000:00:00.0: DRM: DCB outp 05: 02022f72 00020020
[  123.980309] nouveau 0000:00:00.0: DRM: DCB conn 00: 00002046
[  123.981352] nouveau 0000:00:00.0: DRM: DCB conn 01: 00001146
[  123.982379] nouveau 0000:00:00.0: DRM: DCB conn 02: 00020246
[  123.984507] nouveau 0000:00:00.0: DRM: failed to create kernel channel, -22
[  123.986661] nouveau 0000:00:00.0: DRM: MM: using COPY for buffer copies
[  124.291297] nouveau 0000:00:00.0: [drm] Cannot find any crtc or sizes
[  124.292839] [drm] Initialized nouveau 1.3.1 20120801 for 0000:00:00.0 on minor 0
[..]

6. Crucial MX500 SSD connected to a generic PCIE to SATA adapter assigned
to the VM, direct kernel boot and EDK2 boot with Debian 10 disk,
--force-pci, 4k and 64k pages kernel for both direct kernel and UEFI boot.

This was weird. On the host, the PCIE adapter worked just fine with kernel
v5.8, but on v5.12 the host was not able to initialize it:

[    2.891697] ata2: SATA link down (SStatus 0 SControl 300)
[    3.211695] ata3: SATA link down (SStatus 0 SControl 300)
[    3.531699] ata4: SATA link down (SStatus 0 SControl 300)
[    3.851694] ata5: SATA link down (SStatus 0 SControl 300)
[    4.141559] ata9: SATA link down (SStatus 0 SControl 0)
[    4.171691] ata6: SATA link down (SStatus 0 SControl 300)
[    4.491695] ata7: SATA link down (SStatus 0 SControl 300)
[    4.811693] ata8: SATA link down (SStatus 0 SControl 300)
[    6.973559] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[    6.983615] ata10: softreset failed (SRST command error)
[    6.989992] ata10: reset failed (errno=-5), retrying in 8 secs
[   17.173560] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   17.183618] ata10: softreset failed (SRST command error)
[   17.189990] ata10: reset failed (errno=-5), retrying in 8 secs
[   27.413557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   27.423615] ata10: softreset failed (SRST command error)
[   27.429986] ata10: reset failed (errno=-5), retrying in 33 secs
[   60.837548] ata10: limiting SATA link speed to 1.5 Gbps
[   63.001557] arm-smmu e0a00000.smmu: Unhandled context fault: fsr=0x2, iova=0x8002420000, fsynr=0x181, cbfrsynra=0x100, cb=0
[   63.011615] ata10: softreset failed (SRST command error)
[   63.017988] ata10: reset failed, giving up

Assigning it to a VM worked though after the host running Linux v5.8
unitializes the adapter, so I'm going to consider this a pass. After a few
more tests, I was able to trigger the same error on v5.8. On v5.12
initialization has failed every time (so far, at least).

[0] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/T/#m835c93ef1dc7c539b4cdda85aee23210d494ea49
[1] https://lore.kernel.org/kvm/20200326152438.6218-1-alexandru.elisei@arm.com/
[2] https://www.spinics.net/lists/kvm/msg245607.html
[3] https://cdimage.debian.org/debian-cd/current/arm64/iso-cd/debian-10.9.0-arm64-netinst.iso
[4] https://wiki.archlinux.org/title/Xorg#AMD
[5] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/pci-express-v2-edk2-binary
[6] https://edk2.groups.io/g/devel/message/76522?p=,,,20,0,0,0::Created,,armvirtpkg,20,2,0,83558261

Alexandru Elisei (4):
  Move fdt_irq_fn typedef to fdt.h
  arm/fdt.c: Don't generate the node if generator function is NULL
  arm/arm64: Add PCI Express 1.1 support
  arm/arm64: vfio: Add PCI Express Capability Structure

 arm/fdt.c                         |  7 ++-
 arm/include/arm-common/kvm-arch.h |  4 +-
 arm/pci.c                         |  2 +-
 hw/rtc.c                          |  1 +
 include/kvm/fdt.h                 |  2 +
 include/kvm/kvm.h                 |  1 -
 include/kvm/pci.h                 | 75 ++++++++++++++++++++++++++++---
 pci.c                             |  5 ++-
 vfio/pci.c                        | 44 ++++++++++++++----
 9 files changed, 121 insertions(+), 20 deletions(-)

-- 
2.32.0

