* [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support
@ 2020-03-26 15:24 Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 01/32] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
                   ` (32 more replies)
  0 siblings, 33 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
the guest from trying to reassign the BARs. Let's make the BARs
reassignable so we can get rid of this band-aid.

Let's also extend the legacy PCI emulation, which came out in 1992, so we
can properly emulate the PCI Express version 1.1 protocol, which is new in
comparison, being published in 2005.

During device configuration, Linux can assign a region resource to a BAR
that temporarily overlaps with another device. This means that the code
that emulates BAR reassignment must be aware of all the registered PCI
devices. Because of this, and to avoid re-implementing the same code for
each emulated device, the algorithm for activating/deactivating emulation
as BAR addresses change lives completely inside the PCI code. Each device
registers two callback functions: one called when device emulation is
activated (for example, to enable emulation for a newly assigned BAR
address), and one called when device emulation is deactivated (a previous
BAR address has changed, and emulation for that region must be disabled).

There's one change this iteration that deserves a more elaborate
explanation. I have removed patch #9 "arm/pci: Fix PCI IO region". The
problem that the patch is trying to fix is still present: the PCI ioport
region overlaps with the 8250 uart, but the fix that the patch proposed was
incorrect, as it only works when the guest uses 4k pages (gory details at
[1]). I considered several fixes:

1. Changing the uart to live at an address above 64k, which makes sense: on
arm it is memory mapped anyway, and the Linux driver gets the address from
the dtb.

2. Changing the PCI CPU I/O addresses to live in the PCI MMIO space, and
keeping the PCI bus I/O addresses unchanged (in the 0-64K region).

3. Dropping PCI I/O ports altogether, which also makes sense: I/O ports are
an x86 architectural feature, and the PCI spec mandates that all controls
that are accessible via an I/O BAR should also be implemented by a memory
BAR.

However, none of the above approaches is obviously the right solution, and
none is trivial to implement, for various reasons:

1. Software might rely on those addresses being fixed (kvm-unit-tests,
EDK2).

2. It requires invasive and subtle changes in the PCI and the device
emulation code, because now we have to distinguish between PCI CPU addresses
(which we trap and emulate) and PCI bus addresses, which will be used by
the guest to program the devices (for example, the I/O BAR addresses).

3. It requires changes to each device's PCI code, because a failed PCI I/O
port address request becomes a recoverable error on the arm and arm64
architectures.

Considering that this series is already very big, and that this bug has
been present since 2014 without anyone complaining about it, I decided to
leave any potential fixes out. I will investigate which fix works best and
send it as a standalone series.

I tested the patches on an x86_64 machine and an arm64 machine:

1. On an AMD Seattle machine. I tried PCI passthrough using two different
guests in each case: one with 4k pages, and one with 64k pages (the CPU
doesn't support 16k pages):
- Intel 82574L Gigabit Ethernet card
- Samsung 970 Pro NVME (controller registers are in the same BAR region as the
  MSIX table and PBA, I wrote a nasty hack to make it work, will try to upstream
  something after this series)
- Realtek 8168 Gigabit Ethernet card
- NVIDIA Quadro P400 (nouveau probe fails because expansion ROM emulation is
  not implemented in kvmtool; I will write a fix for that)
- AMD Firepro W2100 (amdgpu driver probe fails just like on the NVIDIA card)
- Seagate Barracuda 1000GB drive and Crucial MX500 SSD attached to a PCIE to
  SATA card.

I also wrote a kvm-unit-tests test that stresses the MMIO emulation locking
scheme I implemented: it spawns 4 vcpus (the Seattle machine has 8 physical
CPUs) that toggle memory access, and another 4 vcpus that read from the
memory region described by a BAR. I ran this under valgrind, and no
deadlocks or use-after-free errors were detected.

2. Ryzen 3900x + Gigabyte x570 Aorus Master (BIOS F10). I tried PCI
passthrough with a Realtek 8168 and an Intel 82574L Gigabit Ethernet card at
the same time, while also running with --sdl enabled to test VESA device
emulation. I also managed to get the guest to do BAR reassignment for one
device by using the kernel command line parameter
pci.resource_alignment=16@pci:10ec:8168.

Patches 1-18 are fixes and cleanups, and can be merged independently. They
are pretty straightforward, so if the size of the series looks off-putting,
please review these first. I am aware that the series has grown quite a
lot; I am willing to split the fixes from the rest of the patches, or to do
whatever else would make reviewing easier.

Changes in v3:
* Patches 18, 24 and 26 are new.
* Dropped #9 "arm/pci: Fix PCI IO region" for reasons explained above.
* Reworded commit message for #12 "vfio/pci: Ignore expansion ROM BAR
  writes" to make it clear that kvmtool's implementation of VFIO doesn't
  support expansion BAR emulation.
* Moved variable declaration at the start of the function for #13
  "vfio/pci: Don't access unallocated regions".
* Patch #15 "Don't ignore errors registering a device, ioport or mmio
  emulation" uses ioport__remove consistently.
* Reworked error cleanup for #16 "hw/vesa: Don't ignore fatal errors".
* Reworded commit message for #17 "hw/vesa: Set the size for BAR 0".
* Reworked #19 "ioport: mmio: Use a mutex and reference counting for
  locking" to also use reference counting to prevent races (and updated the
  commit message in the process).
* Added the function pci__bar_size to #20 "pci: Add helpers for BAR values
  and memory/IO space access".
* Protect mem_banks list with a mutex in #22 "vfio: Destroy memslot when
  unmapping the associated VAs"; also moved the munmap after the slot is
  destroyed, as per the KVM api.
* Dropped the rework of the vesa device in patch #27 "pci: Implement
  callbacks for toggling BAR emulation". Also added a few assert statements
  in some callbacks to make sure that they don't get called with an
  unexpected BAR number (which can only result from a coding error). Also
  return -ENOENT when kvm__deregister_mmio fails, to match ioport behavior
  and for better diagnostics.
* Dropped the PCI configuration write callback from the vesa device in #28
  "pci: Toggle BAR I/O and memory space emulation". Apparently, if we don't
  allow the guest kernel to disable BAR access, it treats the VESA device
  as a VGA boot device, instead of a regular VGA device, and Xorg cannot
  use it.
* Dropped struct bar_info from #29 "pci: Implement reassignable BARs". Also
  changed pci_dev_err to pci_dev_warn in pci_{activate,deactivate}_bar,
  because the errors can be benign (they can happen because two vcpus race
  against each other to deactivate memory or I/O access, for example).
* Replaced the PCI device configuration space define with the actual
  number in #32 "arm/arm64: Add PCI Express 1.1 support". On some distros
  the defines weren't present in /usr/include/linux/pci_regs.h.
* Implemented other minor review comments.
* Gathered Reviewed-by tags.

Changes in v2:
* Patches 2, 11-18, 20, 22-27, 29 are new.
* Patches 11, 13, and 14 have been dropped.
* Reworked the way BAR reassignment is implemented.
* The patch "Add PCI Express 1.1 support" has been reworked to apply only
  to arm64. For x86 we would need ACPI support in order to advertise the
  location of the ECAM space.
* Gathered Reviewed-by tags.
* Implemented review comments.

[1] https://www.spinics.net/lists/kvm/msg209178.html
[3] https://www.spinics.net/lists/arm-kernel/msg778623.html


Alexandru Elisei (27):
  Makefile: Use correct objcopy binary when cross-compiling for x86_64
  hw/i8042: Compile only for x86
  Remove pci-shmem device
  Check that a PCI device's memory size is power of two
  arm/pci: Advertise only PCI bus 0 in the DT
  vfio/pci: Allocate correct size for MSIX table and PBA BARs
  vfio/pci: Don't assume that only even numbered BARs are 64bit
  vfio/pci: Ignore expansion ROM BAR writes
  vfio/pci: Don't access unallocated regions
  virtio: Don't ignore initialization failures
  Don't ignore errors registering a device, ioport or mmio emulation
  hw/vesa: Don't ignore fatal errors
  hw/vesa: Set the size for BAR 0
  ioport: Fail when registering overlapping ports
  ioport: mmio: Use a mutex and reference counting for locking
  pci: Add helpers for BAR values and memory/IO space access
  virtio/pci: Get emulated region address from BARs
  vfio: Destroy memslot when unmapping the associated VAs
  vfio: Reserve ioports when configuring the BAR
  pci: Limit configuration transaction size to 32 bits
  vfio/pci: Don't write configuration value twice
  vesa: Create device exactly once
  pci: Implement callbacks for toggling BAR emulation
  pci: Toggle BAR I/O and memory space emulation
  pci: Implement reassignable BARs
  vfio: Trap MMIO access to BAR addresses which aren't page aligned
  arm/arm64: Add PCI Express 1.1 support

Julien Thierry (4):
  ioport: pci: Move port allocations to PCI devices
  pci: Fix ioport allocation size
  virtio/pci: Make memory and IO BARs independent
  arm/fdt: Remove 'linux,pci-probe-only' property

Sami Mujawar (1):
  pci: Fix BAR resource sizing arbitration

 Makefile                          |   6 +-
 arm/fdt.c                         |   1 -
 arm/include/arm-common/kvm-arch.h |   4 +-
 arm/ioport.c                      |   3 +-
 arm/pci.c                         |   4 +-
 builtin-run.c                     |   6 +-
 hw/i8042.c                        |  14 +-
 hw/pci-shmem.c                    | 400 ------------------------------
 hw/vesa.c                         | 113 ++++++---
 include/kvm/devices.h             |   3 +-
 include/kvm/ioport.h              |  12 +-
 include/kvm/kvm-config.h          |   2 +-
 include/kvm/kvm.h                 |  11 +-
 include/kvm/pci-shmem.h           |  32 ---
 include/kvm/pci.h                 | 163 +++++++++++-
 include/kvm/rbtree-interval.h     |   4 +-
 include/kvm/util.h                |   2 +
 include/kvm/vesa.h                |   6 +-
 include/kvm/virtio-pci.h          |   3 -
 include/kvm/virtio.h              |   7 +-
 include/linux/compiler.h          |   2 +-
 ioport.c                          | 108 ++++----
 kvm.c                             | 101 +++++++-
 mips/kvm.c                        |   3 +-
 mmio.c                            |  91 +++++--
 pci.c                             | 334 +++++++++++++++++++++++--
 powerpc/include/kvm/kvm-arch.h    |   2 +-
 powerpc/ioport.c                  |   3 +-
 powerpc/spapr_pci.c               |   2 +-
 vfio/core.c                       |  22 +-
 vfio/pci.c                        | 233 +++++++++++++----
 virtio/9p.c                       |   9 +-
 virtio/balloon.c                  |  10 +-
 virtio/blk.c                      |  14 +-
 virtio/console.c                  |  11 +-
 virtio/core.c                     |   9 +-
 virtio/mmio.c                     |  13 +-
 virtio/net.c                      |  32 +--
 virtio/pci.c                      | 206 ++++++++++-----
 virtio/scsi.c                     |  14 +-
 x86/include/kvm/kvm-arch.h        |   2 +-
 x86/ioport.c                      |  66 +++--
 42 files changed, 1287 insertions(+), 796 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

-- 
2.20.1



* [PATCH v3 kvmtool 01/32] Makefile: Use correct objcopy binary when cross-compiling for x86_64
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 02/32] hw/i8042: Compile only for x86 Alexandru Elisei
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Use the compiler toolchain version of objcopy instead of the native one
when cross-compiling for the x86_64 architecture.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index b76d844f2e01..6d6880dd4f8a 100644
--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,7 @@ CC	:= $(CROSS_COMPILE)gcc
 CFLAGS	:=
 LD	:= $(CROSS_COMPILE)ld
 LDFLAGS	:=
+OBJCOPY	:= $(CROSS_COMPILE)objcopy
 
 FIND	:= find
 CSCOPE	:= cscope
@@ -479,7 +480,7 @@ x86/bios/bios.bin.elf: x86/bios/entry.S x86/bios/e820.c x86/bios/int10.c x86/bio
 
 x86/bios/bios.bin: x86/bios/bios.bin.elf
 	$(E) "  OBJCOPY " $@
-	$(Q) objcopy -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
+	$(Q) $(OBJCOPY) -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
 
 x86/bios/bios-rom.o: x86/bios/bios-rom.S x86/bios/bios.bin x86/bios/bios-rom.h
 	$(E) "  CC      " $@
-- 
2.20.1



* [PATCH v3 kvmtool 02/32] hw/i8042: Compile only for x86
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 01/32] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 03/32] pci: Fix BAR resource sizing arbitration Alexandru Elisei
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The initialization function for the i8042 emulated device does exactly
nothing on all architectures except x86. As a result, the device is usable
only on x86, so let's make the file an architecture-specific object file.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile   | 2 +-
 hw/i8042.c | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 6d6880dd4f8a..33eddcbb4d66 100644
--- a/Makefile
+++ b/Makefile
@@ -103,7 +103,6 @@ OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 OBJS	+= builtin-sandbox.o
 OBJS	+= virtio/mmio.o
-OBJS	+= hw/i8042.o
 
 # Translate uname -m into ARCH string
 ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \
@@ -124,6 +123,7 @@ endif
 #x86
 ifeq ($(ARCH),x86)
 	DEFINES += -DCONFIG_X86
+	OBJS	+= hw/i8042.o
 	OBJS	+= x86/boot.o
 	OBJS	+= x86/cpuid.o
 	OBJS	+= x86/interrupt.o
diff --git a/hw/i8042.c b/hw/i8042.c
index 288b7d1108ac..2d8c96e9c7e6 100644
--- a/hw/i8042.c
+++ b/hw/i8042.c
@@ -349,10 +349,6 @@ static struct ioport_operations kbd_ops = {
 
 int kbd__init(struct kvm *kvm)
 {
-#ifndef CONFIG_X86
-	return 0;
-#endif
-
 	kbd_reset();
 	state.kvm = kvm;
 	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
-- 
2.20.1



* [PATCH v3 kvmtool 03/32] pci: Fix BAR resource sizing arbitration
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 01/32] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 02/32] hw/i8042: Compile only for x86 Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 04/32] Remove pci-shmem device Alexandru Elisei
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Julien Thierry

From: Sami Mujawar <sami.mujawar@arm.com>

According to the 'PCI Local Bus Specification, Revision 3.0,
February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'

    "Software saves the original value of the Base Address register,
    writes 0 FFFF FFFFh to the register, then reads it back. Size
    calculation can be done from the 32-bit value read by first
    clearing encoding information bits (bit 0 for I/O, bits 0-3 for
    memory), inverting all 32 bits (logical NOT), then incrementing
    by 1. The resultant 32-bit value is the memory/I/O range size
    decoded by the register. Note that the upper 16 bits of the result
    is ignored if the Base Address register is for I/O and bits 16-31
    returned zero upon read."

kvmtool was returning the actual BAR resource size, which is incorrect:
software drivers would invert all 32 bits (logical NOT), then increment by
1, ending up with a very large resource size (in some cases more than 4GB),
due to which drivers assert or fail to work.

e.g. if the BAR resource size was 0x1000, kvmtool would return 0x1000
instead of 0xFFFFF00x.

Fixed pci__config_wr() to return the size of the BAR in accordance with
the PCI Local Bus specification, Implementation Notes.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Reworked algorithm, removed power-of-two check]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/pci.c b/pci.c
index 689869cb79a3..3198732935eb 100644
--- a/pci.c
+++ b/pci.c
@@ -149,6 +149,8 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	u8 bar, offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
+	u32 value = 0;
+	u32 mask;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -169,13 +171,41 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-	 * If the kernel masks the BAR it would expect to find the size of the
-	 * BAR there next time it reads from it. When the kernel got the size it
-	 * would write the address back.
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
 	 */
-	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
-		u32 sz = pci_hdr->bar_size[bar];
-		memcpy(base + offset, &sz, sizeof(sz));
+	if (bar < 6) {
+		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
+			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
+		else
+			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+		/*
+		 * According to the PCI local bus specification REV 3.0:
+		 * The number of upper bits that a device actually implements
+		 * depends on how much of the address space the device will
+		 * respond to. A device that wants a 1 MB memory address space
+		 * (using a 32-bit base address register) would build the top
+		 * 12 bits of the address register, hardwiring the other bits
+		 * to 0.
+		 *
+		 * Furthermore, software can determine how much address space
+		 * the device requires by writing a value of all 1's to the
+		 * register and then reading the value back. The device will
+		 * return 0's in all don't-care address bits, effectively
+		 * specifying the address space required.
+		 *
+		 * Software computes the size of the address space with the
+		 * formula S = ~B + 1, where S is the memory size and B is the
+		 * value read from the BAR. This means that the BAR value that
+		 * kvmtool should return is B = ~(S - 1).
+		 */
+		memcpy(&value, data, size);
+		if (value == 0xffffffff)
+			value = ~(pci_hdr->bar_size[bar] - 1);
+		/* Preserve the special bits. */
+		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
+		memcpy(base + offset, &value, size);
 	} else {
 		memcpy(base + offset, data, size);
 	}
-- 
2.20.1



* [PATCH v3 kvmtool 04/32] Remove pci-shmem device
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 03/32] pci: Fix BAR resource sizing arbitration Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 05/32] Check that a PCI device's memory size is power of two Alexandru Elisei
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The pci-shmem emulated device ("ivshmem") was created by QEMU for
cross-VM data sharing. The only Linux driver that uses this device is
the Android Virtual System on a Chip staging driver, which also mentions
a character device driver implemented on top of shmem, which was removed
from Linux.

On the kvmtool side, the only commits touching the pci-shmem device
since it was introduced in 2012 were made when refactoring various
kvmtool subsystems. Let's lift the maintenance burden from the kvmtool
maintainers and remove this unused device.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile                |   1 -
 builtin-run.c           |   5 -
 hw/pci-shmem.c          | 400 ----------------------------------------
 include/kvm/pci-shmem.h |  32 ----
 4 files changed, 438 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

diff --git a/Makefile b/Makefile
index 33eddcbb4d66..f75413e74819 100644
--- a/Makefile
+++ b/Makefile
@@ -99,7 +99,6 @@ OBJS	+= util/read-write.o
 OBJS	+= util/util.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
-OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 OBJS	+= builtin-sandbox.o
 OBJS	+= virtio/mmio.o
diff --git a/builtin-run.c b/builtin-run.c
index f8dc6c7229b0..9cb8c75300eb 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -31,7 +31,6 @@
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
 #include "kvm/guest_compat.h"
-#include "kvm/pci-shmem.h"
 #include "kvm/kvm-ipc.h"
 #include "kvm/builtin-debug.h"
 
@@ -99,10 +98,6 @@ void kvm_run_set_wrapper_sandbox(void)
 	OPT_INTEGER('c', "cpus", &(cfg)->nrcpus, "Number of CPUs"),	\
 	OPT_U64('m', "mem", &(cfg)->ram_size, "Virtual machine memory"	\
 		" size in MiB."),					\
-	OPT_CALLBACK('\0', "shmem", NULL,				\
-		     "[pci:]<addr>:<size>[:handle=<handle>][:create]",	\
-		     "Share host shmem with guest via pci device",	\
-		     shmem_parser, NULL),				\
 	OPT_CALLBACK('d', "disk", kvm, "image or rootfs_dir", "Disk "	\
 			" image or rootfs directory", img_name_parser,	\
 			kvm),						\
diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
deleted file mode 100644
index f92bc75544d7..000000000000
--- a/hw/pci-shmem.c
+++ /dev/null
@@ -1,400 +0,0 @@
-#include "kvm/devices.h"
-#include "kvm/pci-shmem.h"
-#include "kvm/virtio-pci-dev.h"
-#include "kvm/irq.h"
-#include "kvm/kvm.h"
-#include "kvm/pci.h"
-#include "kvm/util.h"
-#include "kvm/ioport.h"
-#include "kvm/ioeventfd.h"
-
-#include <linux/kvm.h>
-#include <linux/byteorder.h>
-#include <sys/ioctl.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-
-#define MB_SHIFT (20)
-#define KB_SHIFT (10)
-#define GB_SHIFT (30)
-
-static struct pci_device_header pci_shmem_pci_device = {
-	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id	= cpu_to_le16(0x1110),
-	.header_type	= PCI_HEADER_TYPE_NORMAL,
-	.class[2]	= 0xFF,	/* misc pci device */
-	.status		= cpu_to_le16(PCI_STATUS_CAP_LIST),
-	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
-	.msix.cap	= PCI_CAP_ID_MSIX,
-	.msix.ctrl	= cpu_to_le16(1),
-	.msix.table_offset = cpu_to_le32(1),		/* Use BAR 1 */
-	.msix.pba_offset = cpu_to_le32(0x1001),		/* Use BAR 1 */
-};
-
-static struct device_header pci_shmem_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &pci_shmem_pci_device,
-};
-
-/* registers for the Inter-VM shared memory device */
-enum ivshmem_registers {
-	INTRMASK = 0,
-	INTRSTATUS = 4,
-	IVPOSITION = 8,
-	DOORBELL = 12,
-};
-
-static struct shmem_info *shmem_region;
-static u16 ivshmem_registers;
-static int local_fd;
-static u32 local_id;
-static u64 msix_block;
-static u64 msix_pba;
-static struct msix_table msix_table[2];
-
-int pci_shmem__register_mem(struct shmem_info *si)
-{
-	if (!shmem_region) {
-		shmem_region = si;
-	} else {
-		pr_warning("only single shmem currently avail. ignoring.\n");
-		free(si);
-	}
-	return 0;
-}
-
-static bool shmem_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		ioport__write32(data, local_id);
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static bool shmem_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static struct ioport_operations shmem_pci__io_ops = {
-	.io_in	= shmem_pci__io_in,
-	.io_out	= shmem_pci__io_out,
-};
-
-static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
-{
-	void *mem;
-
-	if (addr - msix_block < 0x1000)
-		mem = &msix_table;
-	else
-		mem = &msix_pba;
-
-	if (is_write)
-		memcpy(mem + addr - msix_block, data, len);
-	else
-		memcpy(data, mem + addr - msix_block, len);
-}
-
-/*
- * Return an irqfd which can be used by other guests to signal this guest
- * whenever they need to poke it
- */
-int pci_shmem__get_local_irqfd(struct kvm *kvm)
-{
-	int fd, gsi, r;
-
-	if (local_fd == 0) {
-		fd = eventfd(0, 0);
-		if (fd < 0)
-			return fd;
-
-		if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
-			gsi = irq__add_msix_route(kvm, &msix_table[0].msg,
-						  pci_shmem_device.dev_num << 3);
-			if (gsi < 0)
-				return gsi;
-		} else {
-			gsi = pci_shmem_pci_device.irq_line;
-		}
-
-		r = irq__add_irqfd(kvm, gsi, fd, -1);
-		if (r < 0)
-			return r;
-
-		local_fd = fd;
-	}
-
-	return local_fd;
-}
-
-/*
- * Connect a new client to ivshmem by adding the appropriate datamatch
- * to the DOORBELL
- */
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.fd		= fd,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-/*
- * Remove a client connected to ivshmem by removing the appropriate datamatch
- * from the DOORBELL
- */
-int pci_shmem__remove_client(struct kvm *kvm, u32 id)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO
-				| KVM_IOEVENTFD_FLAG_DATAMATCH
-				| KVM_IOEVENTFD_FLAG_DEASSIGN,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-static void *setup_shmem(const char *key, size_t len, int creating)
-{
-	int fd;
-	int rtn;
-	void *mem;
-	int flag = O_RDWR;
-
-	if (creating)
-		flag |= O_CREAT;
-
-	fd = shm_open(key, flag, S_IRUSR | S_IWUSR);
-	if (fd < 0) {
-		pr_warning("Failed to open shared memory file %s\n", key);
-		return NULL;
-	}
-
-	if (creating) {
-		rtn = ftruncate(fd, (off_t) len);
-		if (rtn < 0)
-			pr_warning("Can't ftruncate(fd,%zu)\n", len);
-	}
-	mem = mmap(NULL, len,
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
-	if (mem == MAP_FAILED) {
-		pr_warning("Failed to mmap shared memory file");
-		mem = NULL;
-	}
-	close(fd);
-
-	return mem;
-}
-
-int shmem_parser(const struct option *opt, const char *arg, int unset)
-{
-	const u64 default_size = SHMEM_DEFAULT_SIZE;
-	const u64 default_phys_addr = SHMEM_DEFAULT_ADDR;
-	const char *default_handle = SHMEM_DEFAULT_HANDLE;
-	struct shmem_info *si = malloc(sizeof(struct shmem_info));
-	u64 phys_addr;
-	u64 size;
-	char *handle = NULL;
-	int create = 0;
-	const char *p = arg;
-	char *next;
-	int base = 10;
-	int verbose = 0;
-
-	const int skip_pci = strlen("pci:");
-	if (verbose)
-		pr_info("shmem_parser(%p,%s,%d)", opt, arg, unset);
-	/* parse out optional addr family */
-	if (strcasestr(p, "pci:")) {
-		p += skip_pci;
-	} else if (strcasestr(p, "mem:")) {
-		die("I can't add to E820 map yet.\n");
-	}
-	/* parse out physical addr */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	phys_addr = strtoll(p, &next, base);
-	if (next == p && phys_addr == 0) {
-		pr_info("shmem: no physical addr specified, using default.");
-		phys_addr = default_phys_addr;
-	}
-	if (*next != ':' && *next != '\0')
-		die("shmem: unexpected chars after phys addr.\n");
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out size */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	size = strtoll(p, &next, base);
-	if (next == p && size == 0) {
-		pr_info("shmem: no size specified, using default.");
-		size = default_size;
-	}
-	/* look for [KMGkmg][Bb]*  uses base 2. */
-	int skip_B = 0;
-	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
-		if (*(next + 1) == 'B' || *(next + 1) == 'b')
-			skip_B = 1;
-		switch (*next) {
-		case 'K':
-		case 'k':
-			size = size << KB_SHIFT;
-			break;
-		case 'M':
-		case 'm':
-			size = size << MB_SHIFT;
-			break;
-		case 'G':
-		case 'g':
-			size = size << GB_SHIFT;
-			break;
-		default:
-			die("shmem: bug in detecting size prefix.");
-			break;
-		}
-		next += 1 + skip_B;
-	}
-	if (*next != ':' && *next != '\0') {
-		die("shmem: unexpected chars after phys size. <%c><%c>\n",
-		    *next, *p);
-	}
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out optional shmem handle */
-	const int skip_handle = strlen("handle=");
-	next = strcasestr(p, "handle=");
-	if (*p && next) {
-		if (p != next)
-			die("unexpected chars before handle\n");
-		p += skip_handle;
-		next = strchrnul(p, ':');
-		if (next - p) {
-			handle = malloc(next - p + 1);
-			strncpy(handle, p, next - p);
-			handle[next - p] = '\0';	/* just in case. */
-		}
-		if (*next == '\0')
-			p = next;
-		else
-			p = next + 1;
-	}
-	/* parse optional create flag to see if we should create shm seg. */
-	if (*p && strcasestr(p, "create")) {
-		create = 1;
-		p += strlen("create");
-	}
-	if (*p != '\0')
-		die("shmem: unexpected trailing chars\n");
-	if (handle == NULL) {
-		handle = malloc(strlen(default_handle) + 1);
-		strcpy(handle, default_handle);
-	}
-	if (verbose) {
-		pr_info("shmem: phys_addr = %llx",
-			(unsigned long long)phys_addr);
-		pr_info("shmem: size      = %llx", (unsigned long long)size);
-		pr_info("shmem: handle    = %s", handle);
-		pr_info("shmem: create    = %d", create);
-	}
-
-	si->phys_addr = phys_addr;
-	si->size = size;
-	si->handle = handle;
-	si->create = create;
-	pci_shmem__register_mem(si);	/* ownership of si, etc. passed on. */
-	return 0;
-}
-
-int pci_shmem__init(struct kvm *kvm)
-{
-	char *mem;
-	int r;
-
-	if (shmem_region == NULL)
-		return 0;
-
-	/* Register MMIO space for MSI-X */
-	r = ioport__register(kvm, IOPORT_EMPTY, &shmem_pci__io_ops, IOPORT_SIZE, NULL);
-	if (r < 0)
-		return r;
-	ivshmem_registers = (u16)r;
-
-	msix_block = pci_get_io_space_block(0x1010);
-	kvm__register_mmio(kvm, msix_block, 0x1010, false, callback_mmio_msix, NULL);
-
-	/*
-	 * This registers 3 BARs:
-	 *
-	 * 0 - ivshmem registers
-	 * 1 - MSI-X MMIO space
-	 * 2 - Shared memory block
-	 */
-	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
-	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[1] = 0x1010;
-	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
-
-	device__register(&pci_shmem_device);
-
-	/* Open shared memory and plug it into the guest */
-	mem = setup_shmem(shmem_region->handle, shmem_region->size,
-				shmem_region->create);
-	if (mem == NULL)
-		return -EINVAL;
-
-	kvm__register_dev_mem(kvm, shmem_region->phys_addr, shmem_region->size,
-			      mem);
-	return 0;
-}
-dev_init(pci_shmem__init);
-
-int pci_shmem__exit(struct kvm *kvm)
-{
-	return 0;
-}
-dev_exit(pci_shmem__exit);
diff --git a/include/kvm/pci-shmem.h b/include/kvm/pci-shmem.h
deleted file mode 100644
index 6cff2b85bfd3..000000000000
--- a/include/kvm/pci-shmem.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef KVM__PCI_SHMEM_H
-#define KVM__PCI_SHMEM_H
-
-#include <linux/types.h>
-#include <linux/list.h>
-
-#include "kvm/parse-options.h"
-
-#define SHMEM_DEFAULT_SIZE (16 << MB_SHIFT)
-#define SHMEM_DEFAULT_ADDR (0xc8000000)
-#define SHMEM_DEFAULT_HANDLE "/kvm_shmem"
-
-struct kvm;
-struct shmem_info;
-
-struct shmem_info {
-	u64 phys_addr;
-	u64 size;
-	char *handle;
-	int create;
-};
-
-int pci_shmem__init(struct kvm *kvm);
-int pci_shmem__exit(struct kvm *kvm);
-int pci_shmem__register_mem(struct shmem_info *si);
-int shmem_parser(const struct option *opt, const char *arg, int unset);
-
-int pci_shmem__get_local_irqfd(struct kvm *kvm);
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd);
-int pci_shmem__remove_client(struct kvm *kvm, u32 id);
-
-#endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 05/32] Check that a PCI device's memory size is a power of two
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (3 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 04/32] Remove pci-shmem device Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 06/32] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

According to the PCI Local Bus Specification [1], a device's memory size
must be a power of two. This is also implicit in the mechanism a CPU uses
to discover a PCI device's memory size requirement: it writes all ones to
the BAR and reads back the value, so only power-of-two sizes can be
encoded.

The vesa device requests a memory size that isn't a power of two.
According to the same spec [1], a device is allowed to consume more memory
than it actually requires. As a result, the amount of memory that the vesa
device reserves has been rounded up to the next power of two.

To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
in places where the memory size is known at compile time.

[1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c          | 3 +++
 include/kvm/util.h | 2 ++
 include/kvm/vesa.h | 6 +++++-
 virtio/pci.c       | 3 +++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index f3c5114cf4fe..d75b4b316a1e 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -58,6 +58,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	char *mem;
 	int r;
 
+	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
+	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
+
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
 
diff --git a/include/kvm/util.h b/include/kvm/util.h
index 4ca7aa9392b6..199724c4018c 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -104,6 +104,8 @@ static inline unsigned long roundup_pow_of_two(unsigned long x)
 	return x ? 1UL << fls_long(x - 1) : 0;
 }
 
+#define is_power_of_two(x)	((x) > 0 ? ((x) & ((x) - 1)) == 0 : 0)
+
 struct kvm;
 void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
diff --git a/include/kvm/vesa.h b/include/kvm/vesa.h
index 0fac11ab5a9f..e7d971343642 100644
--- a/include/kvm/vesa.h
+++ b/include/kvm/vesa.h
@@ -5,8 +5,12 @@
 #define VESA_HEIGHT	480
 
 #define VESA_MEM_ADDR	0xd0000000
-#define VESA_MEM_SIZE	(4*VESA_WIDTH*VESA_HEIGHT)
 #define VESA_BPP	32
+/*
+ * We actually only need VESA_BPP/8*VESA_WIDTH*VESA_HEIGHT bytes. But the memory
+ * size must be a power of 2, so we round up.
+ */
+#define VESA_MEM_SIZE	(1 << 21)
 
 struct kvm;
 struct biosregs;
diff --git a/virtio/pci.c b/virtio/pci.c
index 99653cad2c0f..04e801827df9 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -435,6 +435,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
+	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
+	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
+
 	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
-- 
2.20.1



* [PATCH v3 kvmtool 06/32] arm/pci: Advertise only PCI bus 0 in the DT
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (4 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 05/32] Check that a PCI device's memory size is a power of two Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 07/32] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The "bus-range" property encodes the PCI bus number of the PCI
controller and the largest bus number of any PCI buses that are
subordinate to this node [1]. kvmtool emulates only PCI bus 0.
Advertise this in the PCI DT node by setting "bus-range" to <0,0>.

[1] IEEE Std 1275-1994, Section 3 "Bus Nodes Properties and Methods"

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arm/pci.c b/arm/pci.c
index 557cfa98938d..ed325fa4a811 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -30,7 +30,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
 	unsigned nentries = 0;
 	/* Bus range */
-	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
+	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(0), };
 	/* Configuration Space */
 	u64 cfg_reg_prop[] = { cpu_to_fdt64(KVM_PCI_CFG_AREA),
 			       cpu_to_fdt64(ARM_PCI_CFG_SIZE), };
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 07/32] ioport: pci: Move port allocations to PCI devices
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (5 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 06/32] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size Alexandru Elisei
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

The dynamic ioport allocation with IOPORT_EMPTY is currently only used
by PCI devices. Other devices use fixed ports for which they request
registration to the ioport API.

PCI ports need to be in the PCI IO space, and there is no reason the
ioport API should know that a port being allocated is a PCI port that
must be placed in the PCI IO space. This currently just happens to be
the case.

Move the responsibility for dynamic allocation of ioports from the
ioport API to the PCI code.

In the future, if other types of devices also need dynamic ioport
allocation, they'll have to figure out the range of ports they are
allowed to use.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Renamed functions for clarity]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c                      |  4 ++--
 include/kvm/ioport.h           |  3 ---
 include/kvm/pci.h              |  4 +++-
 ioport.c                       | 18 ------------------
 pci.c                          | 17 +++++++++++++----
 powerpc/include/kvm/kvm-arch.h |  2 +-
 vfio/core.c                    |  6 ++++--
 vfio/pci.c                     |  4 ++--
 virtio/pci.c                   |  7 ++++---
 x86/include/kvm/kvm-arch.h     |  2 +-
 10 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index d75b4b316a1e..24fb46faad3b 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-
-	r = ioport__register(kvm, IOPORT_EMPTY, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index db52a479742b..b10fcd5b4412 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,11 +14,8 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_START			0x6200
 #define IOPORT_SIZE			0x400
 
-#define IOPORT_EMPTY			USHRT_MAX
-
 struct kvm;
 
 struct ioport {
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index a86c15a70e6d..ccb155e3e8fe 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -19,6 +19,7 @@
 #define PCI_CONFIG_DATA		0xcfc
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
+#define PCI_IOPORT_START	0x6200
 #define PCI_CFG_SIZE		(1ULL << 24)
 
 struct kvm;
@@ -152,7 +153,8 @@ struct pci_device_header {
 int pci__init(struct kvm *kvm);
 int pci__exit(struct kvm *kvm);
 struct pci_device_header *pci__find_dev(u8 dev_num);
-u32 pci_get_io_space_block(u32 size);
+u32 pci_get_mmio_block(u32 size);
+u16 pci_get_io_port_block(u32 size);
 void pci__assign_irq(struct device_header *dev_hdr);
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
diff --git a/ioport.c b/ioport.c
index a6dc65e3e6c6..a72e4035881a 100644
--- a/ioport.c
+++ b/ioport.c
@@ -16,24 +16,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
-DEFINE_MUTEX(ioport_mutex);
-
-static u16			free_io_port_idx; /* protected by ioport_mutex */
-
 static struct rb_root		ioport_tree = RB_ROOT;
 
-static u16 ioport__find_free_port(void)
-{
-	u16 free_port;
-
-	mutex_lock(&ioport_mutex);
-	free_port = IOPORT_START + free_io_port_idx * IOPORT_SIZE;
-	free_io_port_idx++;
-	mutex_unlock(&ioport_mutex);
-
-	return free_port;
-}
-
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
 {
 	struct rb_int_node *node;
@@ -85,8 +69,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	int r;
 
 	br_write_lock(kvm);
-	if (port == IOPORT_EMPTY)
-		port = ioport__find_free_port();
 
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
diff --git a/pci.c b/pci.c
index 3198732935eb..80b5c5d3d7f3 100644
--- a/pci.c
+++ b/pci.c
@@ -15,15 +15,24 @@ static u32 pci_config_address_bits;
  * (That's why it can still 32bit even with 64bit guests-- 64bit
  * PCI isn't currently supported.)
  */
-static u32 io_space_blocks		= KVM_PCI_MMIO_AREA;
+static u32 mmio_blocks			= KVM_PCI_MMIO_AREA;
+static u16 io_port_blocks		= PCI_IOPORT_START;
+
+u16 pci_get_io_port_block(u32 size)
+{
+	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+
+	io_port_blocks = port + size;
+	return port;
+}
 
 /*
  * BARs must be naturally aligned, so enforce this in the allocator.
  */
-u32 pci_get_io_space_block(u32 size)
+u32 pci_get_mmio_block(u32 size)
 {
-	u32 block = ALIGN(io_space_blocks, size);
-	io_space_blocks = block + size;
+	u32 block = ALIGN(mmio_blocks, size);
+	mmio_blocks = block + size;
 	return block;
 }
 
diff --git a/powerpc/include/kvm/kvm-arch.h b/powerpc/include/kvm/kvm-arch.h
index 8126b96cb66a..26d440b22bdd 100644
--- a/powerpc/include/kvm/kvm-arch.h
+++ b/powerpc/include/kvm/kvm-arch.h
@@ -34,7 +34,7 @@
 #define KVM_MMIO_START			PPC_MMIO_START
 
 /*
- * This is the address that pci_get_io_space_block() starts allocating
+ * This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address.
  */
 #define KVM_IOPORT_AREA			0x0
diff --git a/vfio/core.c b/vfio/core.c
index 17b5b0cfc9ac..0ed1e6fee6bf 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,8 +202,10 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = ioport__register(kvm, IOPORT_EMPTY, &vfio_ioport_ops,
-					    region->info.size, region);
+		int port = pci_get_io_port_block(region->info.size);
+
+		port = ioport__register(kvm, port, &vfio_ioport_ops,
+					region->info.size, region);
 		if (port < 0)
 			return port;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 76e24c156906..8e5d8572bc0c 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -750,7 +750,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
 	 * powers of two.
 	 */
 	mmio_size = roundup_pow_of_two(table->size + pba->size);
-	table->guest_phys_addr = pci_get_io_space_block(mmio_size);
+	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
 	if (!table->guest_phys_addr) {
 		pr_err("cannot allocate IO space");
 		ret = -ENOMEM;
@@ -846,7 +846,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	if (!region->is_ioport) {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
-		region->guest_phys_addr = pci_get_io_space_block(map_size);
+		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
 	/* Map the BARs into the guest or setup a trap region. */
diff --git a/virtio/pci.c b/virtio/pci.c
index 04e801827df9..d73414abde05 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -438,18 +438,19 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_io_space_block(IOPORT_SIZE);
+	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_io_space_block(PCI_IO_SIZE * 2);
+	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vpci);
 	if (r < 0)
diff --git a/x86/include/kvm/kvm-arch.h b/x86/include/kvm/kvm-arch.h
index bfdd3438a9de..85cd336c7577 100644
--- a/x86/include/kvm/kvm-arch.h
+++ b/x86/include/kvm/kvm-arch.h
@@ -16,7 +16,7 @@
 
 #define KVM_MMIO_START		KVM_32BIT_GAP_START
 
-/* This is the address that pci_get_io_space_block() starts allocating
+/* This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address (though same on x86).
  */
 #define KVM_IOPORT_AREA		0x0
-- 
2.20.1



* [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (6 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 07/32] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30  9:27   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 09/32] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
                   ` (24 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

The PCI Local Bus Specification, Rev. 3.0,
Section 6.2.5.1. "Address Maps" states:
"Devices that map control functions into I/O Space must not consume more
than 256 bytes per I/O Base Address register."

Yet all the PCI devices allocate IO regions of IOPORT_SIZE (= 1024
bytes).

Fix this by having PCI devices use 256-byte regions for their IO BARs.

There is no hard requirement on the size of the memory region described
by memory BARs. Since BAR 1 is supposed to offer the same functionality as
IO ports, let's make its size match BAR 0.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Added rationale for changing BAR1 size to PCI_IO_SIZE]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c            |  4 ++--
 include/kvm/ioport.h |  1 -
 pci.c                |  2 +-
 virtio/pci.c         | 15 +++++++--------
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 24fb46faad3b..d8d91aa9c873 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index b10fcd5b4412..8c86b7151f25 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,7 +14,6 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_SIZE			0x400
 
 struct kvm;
 
diff --git a/pci.c b/pci.c
index 80b5c5d3d7f3..b6892d974c08 100644
--- a/pci.c
+++ b/pci.c
@@ -20,7 +20,7 @@ static u16 io_port_blocks		= PCI_IOPORT_START;
 
 u16 pci_get_io_port_block(u32 size)
 {
-	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+	u16 port = ALIGN(io_port_blocks, PCI_IO_SIZE);
 
 	io_port_blocks = port + size;
 	return port;
diff --git a/virtio/pci.c b/virtio/pci.c
index d73414abde05..eeb5b5efa6e1 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -421,7 +421,7 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_pci *vpci = ptr;
 	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (IOPORT_SIZE - 1));
+	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
 
 	kvm__emulate_io(vcpu, port, data, direction, len, 1);
 }
@@ -435,17 +435,16 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
-	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
+	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
@@ -475,8 +474,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
-		.bar_size[0]		= cpu_to_le32(IOPORT_SIZE),
-		.bar_size[1]		= cpu_to_le32(IOPORT_SIZE),
+		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
+		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
-- 
2.20.1



* [PATCH v3 kvmtool 09/32] virtio/pci: Make memory and IO BARs independent
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (7 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 10/32] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

Currently, callbacks for memory BAR 1 call the IO port emulation.  This
means that the memory BAR needs I/O Space to be enabled whenever Memory
Space is enabled.

Refactor the code so the two types of BARs are independent. Also, unify
ioport/mmio callback arguments so that they all receive a virtio_device.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Cosmetic changes wrt where local variables are initialized]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 virtio/pci.c | 63 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 40 insertions(+), 23 deletions(-)

diff --git a/virtio/pci.c b/virtio/pci.c
index eeb5b5efa6e1..281c31817598 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -87,8 +87,8 @@ static inline bool virtio_pci__msix_enabled(struct virtio_pci *vpci)
 	return vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE);
 }
 
-static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_in(struct kvm *kvm, struct virtio_device *vdev,
+					 void *data, int size, unsigned long offset)
 {
 	u32 config_offset;
 	struct virtio_pci *vpci = vdev->virtio;
@@ -117,20 +117,17 @@ static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vd
 	return false;
 }
 
-static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_in(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct virt_queue *vq;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_HOST_FEATURES:
@@ -154,13 +151,22 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 		vpci->isr = VIRTIO_IRQ_LOW;
 		break;
 	default:
-		ret = virtio_pci__specific_io_in(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_in(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	struct virtio_device *vdev = ioport->priv;
+	struct virtio_pci *vpci = vdev->virtio;
+	unsigned long offset = port - vpci->port_addr;
+
+	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
+}
+
 static void update_msix_map(struct virtio_pci *vpci,
 			    struct msix_table *msix_entry, u32 vecnum)
 {
@@ -185,8 +191,8 @@ static void update_msix_map(struct virtio_pci *vpci,
 	irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
 }
 
-static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_out(struct kvm *kvm, struct virtio_device *vdev,
+					  void *data, int size, unsigned long offset)
 {
 	struct virtio_pci *vpci = vdev->virtio;
 	u32 config_offset, vec;
@@ -259,19 +265,16 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	return false;
 }
 
-static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				 unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_GUEST_FEATURES:
@@ -304,13 +307,22 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 		virtio_notify_status(kvm, vdev, vpci->dev, vpci->status);
 		break;
 	default:
-		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_out(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	struct virtio_device *vdev = ioport->priv;
+	struct virtio_pci *vpci = vdev->virtio;
+	unsigned long offset = port - vpci->port_addr;
+
+	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
+}
+
 static struct ioport_operations virtio_pci__io_ops = {
 	.io_in	= virtio_pci__io_in,
 	.io_out	= virtio_pci__io_out,
@@ -320,7 +332,8 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 					   u64 addr, u8 *data, u32 len,
 					   u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
 	int vecnum;
 	size_t offset;
@@ -419,11 +432,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 					 u64 addr, u8 *data, u32 len,
 					 u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
-	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 
-	kvm__emulate_io(vcpu, port, data, direction, len, 1);
+	if (!is_write)
+		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
+				    data, len);
+	else
+		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
+				     data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
@@ -445,13 +462,13 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vpci);
+			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
 	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vpci);
+			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
 
-- 
2.20.1



* [PATCH v3 kvmtool 10/32] vfio/pci: Allocate correct size for MSIX table and PBA BARs
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (8 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 09/32] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 11/32] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

kvmtool assumes that the BAR that holds the addresses of the MSIX table
and PBA structure has a size equal to their combined size, and it
allocates MMIO space accordingly. However, when initializing the BARs,
the BAR size is set to the region size reported by VFIO. When the
physical BAR size is greater than the MMIO space that kvmtool allocates,
the BAR can overlap with another BAR, in which case kvmtool will fail to
map the memory. This was found when trying to do PCI passthrough with a
PCIe Realtek r8168 NIC, while the guest was also using virtio-block and
virtio-net devices:

[..]
[    0.197926] PCI: OF: PROBE_ONLY enabled
[    0.198454] pci-host-generic 40000000.pci: host bridge /pci ranges:
[    0.199291] pci-host-generic 40000000.pci:    IO 0x00007000..0x0000ffff -> 0x00007000
[    0.200331] pci-host-generic 40000000.pci:   MEM 0x41000000..0x7fffffff -> 0x41000000
[    0.201480] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x40ffffff] for [bus 00]
[    0.202635] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
[    0.203535] pci_bus 0000:00: root bus resource [bus 00]
[    0.204227] pci_bus 0000:00: root bus resource [io  0x0000-0x8fff] (bus address [0x7000-0xffff])
[    0.205483] pci_bus 0000:00: root bus resource [mem 0x41000000-0x7fffffff]
[    0.206456] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
[    0.207399] pci 0000:00:00.0: reg 0x10: [io  0x0000-0x00ff]
[    0.208252] pci 0000:00:00.0: reg 0x18: [mem 0x41002000-0x41002fff]
[    0.209233] pci 0000:00:00.0: reg 0x20: [mem 0x41000000-0x41003fff]
[    0.210481] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
[    0.211349] pci 0000:00:01.0: reg 0x10: [io  0x0100-0x01ff]
[    0.212118] pci 0000:00:01.0: reg 0x14: [mem 0x41003000-0x410030ff]
[    0.212982] pci 0000:00:01.0: reg 0x18: [mem 0x41003200-0x410033ff]
[    0.214247] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
[    0.215096] pci 0000:00:02.0: reg 0x10: [io  0x0200-0x02ff]
[    0.215863] pci 0000:00:02.0: reg 0x14: [mem 0x41003400-0x410034ff]
[    0.216723] pci 0000:00:02.0: reg 0x18: [mem 0x41003600-0x410037ff]
[    0.218105] pci 0000:00:00.0: can't claim BAR 4 [mem 0x41000000-0x41003fff]: address conflict with 0000:00:00.0 [mem 0x41002000-0x41002fff]
[..]

Guest output of lspci -vv:

00:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
	Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 0000 [size=256]
	Region 2: Memory at 41002000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 41000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00001000

Let's fix this by allocating an amount of MMIO memory equal to the size
of the BAR that contains the MSIX table and/or PBA.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 68 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 8e5d8572bc0c..bbb8469c8d93 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -715,17 +715,44 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 	return 0;
 }
 
-static int vfio_pci_create_msix_table(struct kvm *kvm,
-				      struct vfio_pci_device *pdev)
+static int vfio_pci_get_region_info(struct vfio_device *vdev, u32 index,
+				    struct vfio_region_info *info)
+{
+	int ret;
+
+	*info = (struct vfio_region_info) {
+		.argsz = sizeof(*info),
+		.index = index,
+	};
+
+	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+	if (ret) {
+		ret = -errno;
+		vfio_dev_err(vdev, "cannot get info for BAR %u", index);
+		return ret;
+	}
+
+	if (info->size && !is_power_of_two(info->size)) {
+		vfio_dev_err(vdev, "region is not power of two: 0x%llx",
+				info->size);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 {
 	int ret;
 	size_t i;
-	size_t mmio_size;
+	size_t map_size;
 	size_t nr_entries;
 	struct vfio_pci_msi_entry *entries;
+	struct vfio_pci_device *pdev = &vdev->pci;
 	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
 	struct vfio_pci_msix_table *table = &pdev->msix_table;
 	struct msix_cap *msix = PCI_CAP(&pdev->hdr, pdev->msix.pos);
+	struct vfio_region_info info;
 
 	table->bar = msix->table_offset & PCI_MSIX_TABLE_BIR;
 	pba->bar = msix->pba_offset & PCI_MSIX_TABLE_BIR;
@@ -744,15 +771,31 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
 	for (i = 0; i < nr_entries; i++)
 		entries[i].config.ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT;
 
+	ret = vfio_pci_get_region_info(vdev, table->bar, &info);
+	if (ret)
+		return ret;
+	if (!info.size)
+		return -EINVAL;
+	map_size = info.size;
+
+	if (table->bar != pba->bar) {
+		ret = vfio_pci_get_region_info(vdev, pba->bar, &info);
+		if (ret)
+			return ret;
+		if (!info.size)
+			return -EINVAL;
+		map_size += info.size;
+	}
+
 	/*
 	 * To ease MSI-X cap configuration in case they share the same BAR,
 	 * collapse table and pending array. The size of the BAR regions must be
 	 * powers of two.
 	 */
-	mmio_size = roundup_pow_of_two(table->size + pba->size);
-	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
+	map_size = ALIGN(map_size, PAGE_SIZE);
+	table->guest_phys_addr = pci_get_mmio_block(map_size);
 	if (!table->guest_phys_addr) {
-		pr_err("cannot allocate IO space");
+		pr_err("cannot allocate MMIO space");
 		ret = -ENOMEM;
 		goto out_free;
 	}
@@ -816,17 +859,10 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 
 	region->vdev = vdev;
 	region->is_ioport = !!(bar & PCI_BASE_ADDRESS_SPACE_IO);
-	region->info = (struct vfio_region_info) {
-		.argsz = sizeof(region->info),
-		.index = nr,
-	};
 
-	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &region->info);
-	if (ret) {
-		ret = -errno;
-		vfio_dev_err(vdev, "cannot get info for BAR %zu", nr);
+	ret = vfio_pci_get_region_info(vdev, nr, &region->info);
+	if (ret)
 		return ret;
-	}
 
 	/* Ignore invalid or unimplemented regions */
 	if (!region->info.size)
@@ -871,7 +907,7 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 		return ret;
 
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) {
-		ret = vfio_pci_create_msix_table(kvm, pdev);
+		ret = vfio_pci_create_msix_table(kvm, vdev);
 		if (ret)
 			return ret;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 11/32] vfio/pci: Don't assume that only even numbered BARs are 64bit
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (9 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 10/32] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Not all devices have the bottom 32 bits of a 64-bit BAR in an
even-numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3
are 64-bit. Remove this assumption.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index bbb8469c8d93..1bdc20038411 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -920,8 +920,10 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 
 	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
 		/* Ignore top half of 64-bit BAR */
-		if (i % 2 && is_64bit)
+		if (is_64bit) {
+			is_64bit = false;
 			continue;
+		}
 
 		ret = vfio_pci_configure_bar(kvm, vdev, i);
 		if (ret)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (10 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 11/32] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30  9:29   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions Alexandru Elisei
                   ` (20 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

To get the size of the expansion ROM, software writes 0xfffff800 to the
expansion ROM BAR in the PCI configuration space. PCI emulation executes
the optional configuration space write callback that a device can implement
before emulating this write.

kvmtool's VFIO implementation doesn't support emulating expansion ROMs.
However, the callback writes the guest value to the hardware BAR, then
reads it back into the emulated BAR to make sure the write has completed
successfully.

After this, we return to regular PCI emulation, and because the BAR is no
longer 0, we write back to it the value that the guest used to get the
size. As a result, the guest will think that the ROM size is 0x800 after
the subsequent read, and we end up unintentionally exposing to the guest a
BAR which we don't emulate.

Let's fix this by ignoring writes to the expansion ROM BAR.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/vfio/pci.c b/vfio/pci.c
index 1bdc20038411..1f38f90c3ae9 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -472,6 +472,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_device *vdev;
 	void *base = pci_hdr;
 
+	if (offset == PCI_ROM_ADDRESS)
+		return;
+
 	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
 	vdev = container_of(pdev, struct vfio_device, pci);
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (11 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30  9:29   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures Alexandru Elisei
                   ` (19 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Don't try to configure a BAR if there is no region associated with it.

Also move the variable declarations from inside the loop to the start of
the function for consistency.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 1f38f90c3ae9..4412c6d7a862 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -645,16 +645,19 @@ static int vfio_pci_parse_cfg_space(struct vfio_device *vdev)
 static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 {
 	int i;
+	u64 base;
 	ssize_t hdr_sz;
 	struct msix_cap *msix;
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_region *region;
 
 	/* Initialise the BARs */
 	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
-		u64 base;
-		struct vfio_region *region = &vdev->regions[i];
+		if ((u32)i == vdev->info.num_regions)
+			break;
 
+		region = &vdev->regions[i];
 		/* Construct a fake reg to match what we've mapped. */
 		if (region->is_ioport) {
 			base = (region->port_base & PCI_BASE_ADDRESS_IO_MASK) |
@@ -853,11 +856,12 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	u32 bar;
 	size_t map_size;
 	struct vfio_pci_device *pdev = &vdev->pci;
-	struct vfio_region *region = &vdev->regions[nr];
+	struct vfio_region *region;
 
 	if (nr >= vdev->info.num_regions)
 		return 0;
 
+	region = &vdev->regions[nr];
 	bar = pdev->hdr.bar[nr];
 
 	region->vdev = vdev;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (12 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30  9:30   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 15/32] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
                   ` (18 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Don't ignore an error in the bus-specific initialization function in
virtio_init; don't ignore the result of virtio_init; and don't return 0
in virtio_blk__init and virtio_scsi__init when we encounter an error.
Hopefully this will save developers some time when debugging faulty
virtio devices in a guest.

To take advantage of the cleanup function virtio_blk__exit, we now add
the new device to the list before calling virtio_init.

To safeguard against this in the future, virtio_init has been annotated
with the compiler attribute warn_unused_result.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/kvm.h        |  1 +
 include/kvm/virtio.h     |  7 ++++---
 include/linux/compiler.h |  2 +-
 virtio/9p.c              |  9 ++++++---
 virtio/balloon.c         | 10 +++++++---
 virtio/blk.c             | 14 +++++++++-----
 virtio/console.c         | 11 ++++++++---
 virtio/core.c            |  9 +++++----
 virtio/net.c             | 32 ++++++++++++++++++--------------
 virtio/scsi.c            | 14 +++++++++-----
 10 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 7a738183d67a..c6dc6ef72d11 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -8,6 +8,7 @@
 
 #include <stdbool.h>
 #include <linux/types.h>
+#include <linux/compiler.h>
 #include <time.h>
 #include <signal.h>
 #include <sys/prctl.h>
diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 19b913732cd5..3a311f54f2a5 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -7,6 +7,7 @@
 #include <linux/virtio_pci.h>
 
 #include <linux/types.h>
+#include <linux/compiler.h>
 #include <linux/virtio_config.h>
 #include <sys/uio.h>
 
@@ -204,9 +205,9 @@ struct virtio_ops {
 	int (*reset)(struct kvm *kvm, struct virtio_device *vdev);
 };
 
-int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
-		struct virtio_ops *ops, enum virtio_trans trans,
-		int device_id, int subsys_id, int class);
+int __must_check virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
+			     struct virtio_ops *ops, enum virtio_trans trans,
+			     int device_id, int subsys_id, int class);
 int virtio_compat_add_message(const char *device, const char *config);
 const char* virtio_trans_name(enum virtio_trans trans);
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 898420b81aec..a662ba0a5c68 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,7 +14,7 @@
 #define __packed	__attribute__((packed))
 #define __iomem
 #define __force
-#define __must_check
+#define __must_check	__attribute__((warn_unused_result))
 #define unlikely
 
 #endif
diff --git a/virtio/9p.c b/virtio/9p.c
index ac70dbc31207..b78f2b3f0e09 100644
--- a/virtio/9p.c
+++ b/virtio/9p.c
@@ -1551,11 +1551,14 @@ int virtio_9p_img_name_parser(const struct option *opt, const char *arg, int uns
 int virtio_9p__init(struct kvm *kvm)
 {
 	struct p9_dev *p9dev;
+	int r;
 
 	list_for_each_entry(p9dev, &devs, list) {
-		virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
-			    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
-			    VIRTIO_ID_9P, PCI_CLASS_9P);
+		r = virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
+				VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
+				VIRTIO_ID_9P, PCI_CLASS_9P);
+		if (r < 0)
+			return r;
 	}
 
 	return 0;
diff --git a/virtio/balloon.c b/virtio/balloon.c
index 0bd16703dfee..8e8803fed607 100644
--- a/virtio/balloon.c
+++ b/virtio/balloon.c
@@ -264,6 +264,8 @@ struct virtio_ops bln_dev_virtio_ops = {
 
 int virtio_bln__init(struct kvm *kvm)
 {
+	int r;
+
 	if (!kvm->cfg.balloon)
 		return 0;
 
@@ -273,9 +275,11 @@ int virtio_bln__init(struct kvm *kvm)
 	bdev.stat_waitfd	= eventfd(0, 0);
 	memset(&bdev.config, 0, sizeof(struct virtio_balloon_config));
 
-	virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
-		    VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
+	r = virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
+			VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
+	if (r < 0)
+		return r;
 
 	if (compat_id == -1)
 		compat_id = virtio_compat_add_message("virtio-balloon", "CONFIG_VIRTIO_BALLOON");
diff --git a/virtio/blk.c b/virtio/blk.c
index f267be1563dc..4d02d101af81 100644
--- a/virtio/blk.c
+++ b/virtio/blk.c
@@ -306,6 +306,7 @@ static struct virtio_ops blk_dev_virtio_ops = {
 static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
 {
 	struct blk_dev *bdev;
+	int r;
 
 	if (!disk)
 		return -EINVAL;
@@ -323,12 +324,14 @@ static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
 		.kvm			= kvm,
 	};
 
-	virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
-		    VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
-
 	list_add_tail(&bdev->list, &bdevs);
 
+	r = virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
+			VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
+	if (r < 0)
+		return r;
+
 	disk_image__set_callback(bdev->disk, virtio_blk_complete);
 
 	if (compat_id == -1)
@@ -359,7 +362,8 @@ int virtio_blk__init(struct kvm *kvm)
 
 	return 0;
 cleanup:
-	return virtio_blk__exit(kvm);
+	virtio_blk__exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_blk__init);
 
diff --git a/virtio/console.c b/virtio/console.c
index f1be02549222..e0b98df37965 100644
--- a/virtio/console.c
+++ b/virtio/console.c
@@ -230,12 +230,17 @@ static struct virtio_ops con_dev_virtio_ops = {
 
 int virtio_console__init(struct kvm *kvm)
 {
+	int r;
+
 	if (kvm->cfg.active_console != CONSOLE_VIRTIO)
 		return 0;
 
-	virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
-		    VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
+	r = virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
+			VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
+	if (r < 0)
+		return r;
+
 	if (compat_id == -1)
 		compat_id = virtio_compat_add_message("virtio-console", "CONFIG_VIRTIO_CONSOLE");
 
diff --git a/virtio/core.c b/virtio/core.c
index e10ec362e1ea..f5b3c07fc100 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -259,6 +259,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		int device_id, int subsys_id, int class)
 {
 	void *virtio;
+	int r;
 
 	switch (trans) {
 	case VIRTIO_PCI:
@@ -272,7 +273,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		vdev->ops->init			= virtio_pci__init;
 		vdev->ops->exit			= virtio_pci__exit;
 		vdev->ops->reset		= virtio_pci__reset;
-		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
+		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
 		break;
 	case VIRTIO_MMIO:
 		virtio = calloc(sizeof(struct virtio_mmio), 1);
@@ -285,13 +286,13 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		vdev->ops->init			= virtio_mmio_init;
 		vdev->ops->exit			= virtio_mmio_exit;
 		vdev->ops->reset		= virtio_mmio_reset;
-		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
+		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
 		break;
 	default:
-		return -1;
+		r = -1;
 	};
 
-	return 0;
+	return r;
 }
 
 int virtio_compat_add_message(const char *device, const char *config)
diff --git a/virtio/net.c b/virtio/net.c
index 091406912a24..425c13ba1136 100644
--- a/virtio/net.c
+++ b/virtio/net.c
@@ -910,7 +910,7 @@ done:
 
 static int virtio_net__init_one(struct virtio_net_params *params)
 {
-	int i, err;
+	int i, r;
 	struct net_dev *ndev;
 	struct virtio_ops *ops;
 	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
@@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		return -ENOMEM;
 
 	ops = malloc(sizeof(*ops));
-	if (ops == NULL) {
-		err = -ENOMEM;
-		goto err_free_ndev;
-	}
+	if (ops == NULL)
+		return -ENOMEM;
 
 	list_add_tail(&ndev->list, &ndevs);
 
@@ -969,8 +967,10 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 				   virtio_trans_name(trans));
 	}
 
-	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
-		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+	r = virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
+			PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+	if (r < 0)
+		return r;
 
 	if (params->vhost)
 		virtio_net__vhost_init(params->kvm, ndev);
@@ -979,19 +979,17 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		compat_id = virtio_compat_add_message("virtio-net", "CONFIG_VIRTIO_NET");
 
 	return 0;
-
-err_free_ndev:
-	free(ndev);
-	return err;
 }
 
 int virtio_net__init(struct kvm *kvm)
 {
-	int i;
+	int i, r;
 
 	for (i = 0; i < kvm->cfg.num_net_devices; i++) {
 		kvm->cfg.net_params[i].kvm = kvm;
-		virtio_net__init_one(&kvm->cfg.net_params[i]);
+		r = virtio_net__init_one(&kvm->cfg.net_params[i]);
+		if (r < 0)
+			goto cleanup;
 	}
 
 	if (kvm->cfg.num_net_devices == 0 && kvm->cfg.no_net == 0) {
@@ -1007,10 +1005,16 @@ int virtio_net__init(struct kvm *kvm)
 		str_to_mac(kvm->cfg.guest_mac, net_params.guest_mac);
 		str_to_mac(kvm->cfg.host_mac, net_params.host_mac);
 
-		virtio_net__init_one(&net_params);
+		r = virtio_net__init_one(&net_params);
+		if (r < 0)
+			goto cleanup;
 	}
 
 	return 0;
+
+cleanup:
+	virtio_net__exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_net__init);
 
diff --git a/virtio/scsi.c b/virtio/scsi.c
index 1ec78fe0945a..16a86cb7e0e6 100644
--- a/virtio/scsi.c
+++ b/virtio/scsi.c
@@ -234,6 +234,7 @@ static void virtio_scsi_vhost_init(struct kvm *kvm, struct scsi_dev *sdev)
 static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
 {
 	struct scsi_dev *sdev;
+	int r;
 
 	if (!disk)
 		return -EINVAL;
@@ -260,12 +261,14 @@ static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
 	strlcpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
 	sdev->target.vhost_tpgt = strtol(disk->tpgt, NULL, 0);
 
-	virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
-		    VIRTIO_ID_SCSI, PCI_CLASS_BLK);
-
 	list_add_tail(&sdev->list, &sdevs);
 
+	r = virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
+			VIRTIO_ID_SCSI, PCI_CLASS_BLK);
+	if (r < 0)
+		return r;
+
 	virtio_scsi_vhost_init(kvm, sdev);
 
 	if (compat_id == -1)
@@ -302,7 +305,8 @@ int virtio_scsi_init(struct kvm *kvm)
 
 	return 0;
 cleanup:
-	return virtio_scsi_exit(kvm);
+	virtio_scsi_exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_scsi_init);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 15/32] Don't ignore errors registering a device, ioport or mmio emulation
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (13 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors Alexandru Elisei
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

An error returned by device__register, kvm__register_mmio or
ioport__register means that the device will not be emulated properly.
Annotate the functions with __must_check, so we get a compiler warning
when such an error is ignored.

And fix several instances where the caller returns 0 even though the
function failed.

Also make sure the ioport emulation code uses ioport_remove consistently.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/ioport.c          |  3 +-
 hw/i8042.c            | 12 ++++++--
 hw/vesa.c             |  4 ++-
 include/kvm/devices.h |  3 +-
 include/kvm/ioport.h  |  6 ++--
 include/kvm/kvm.h     |  6 ++--
 ioport.c              | 25 ++++++++--------
 mips/kvm.c            |  3 +-
 powerpc/ioport.c      |  3 +-
 virtio/mmio.c         | 13 +++++++--
 x86/ioport.c          | 66 ++++++++++++++++++++++++++++++++-----------
 11 files changed, 101 insertions(+), 43 deletions(-)

diff --git a/arm/ioport.c b/arm/ioport.c
index bdd30b6fe812..2f0feb9ab69f 100644
--- a/arm/ioport.c
+++ b/arm/ioport.c
@@ -1,8 +1,9 @@
 #include "kvm/ioport.h"
 #include "kvm/irq.h"
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	return 0;
 }
 
 void ioport__map_irq(u8 *irq)
diff --git a/hw/i8042.c b/hw/i8042.c
index 2d8c96e9c7e6..37a99a2dc6b8 100644
--- a/hw/i8042.c
+++ b/hw/i8042.c
@@ -349,10 +349,18 @@ static struct ioport_operations kbd_ops = {
 
 int kbd__init(struct kvm *kvm)
 {
+	int r;
+
 	kbd_reset();
 	state.kvm = kvm;
-	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
-	ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
+	r = ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
+	if (r < 0)
+		return r;
+	r = ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
+	if (r < 0) {
+		ioport__unregister(kvm, I8042_DATA_REG);
+		return r;
+	}
 
 	return 0;
 }
diff --git a/hw/vesa.c b/hw/vesa.c
index d8d91aa9c873..b92cc990b730 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -70,7 +70,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_base_addr			= (u16)r;
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
-	device__register(&vesa_device);
+	r = device__register(&vesa_device);
+	if (r < 0)
+		return ERR_PTR(r);
 
 	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
 	if (mem == MAP_FAILED)
diff --git a/include/kvm/devices.h b/include/kvm/devices.h
index 405f19521977..e445db6f56b1 100644
--- a/include/kvm/devices.h
+++ b/include/kvm/devices.h
@@ -3,6 +3,7 @@
 
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include <linux/compiler.h>
 
 enum device_bus_type {
 	DEVICE_BUS_PCI,
@@ -18,7 +19,7 @@ struct device_header {
 	struct rb_node		node;
 };
 
-int device__register(struct device_header *dev);
+int __must_check device__register(struct device_header *dev);
 void device__unregister(struct device_header *dev);
 struct device_header *device__find_dev(enum device_bus_type bus_type,
 				       u8 dev_num);
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index 8c86b7151f25..62a719327e3f 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -33,11 +33,11 @@ struct ioport_operations {
 							    enum irq_type));
 };
 
-void ioport__setup_arch(struct kvm *kvm);
+int ioport__setup_arch(struct kvm *kvm);
 void ioport__map_irq(u8 *irq);
 
-int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
-			int count, void *param);
+int __must_check ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
+				  int count, void *param);
 int ioport__unregister(struct kvm *kvm, u16 port);
 int ioport__init(struct kvm *kvm);
 int ioport__exit(struct kvm *kvm);
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index c6dc6ef72d11..50119a8672eb 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -128,9 +128,9 @@ static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
 				 KVM_MEM_TYPE_RESERVED);
 }
 
-int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
-		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
-			void *ptr);
+int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
+				    void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
+				    void *ptr);
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__reboot(struct kvm *kvm);
 void kvm__pause(struct kvm *kvm);
diff --git a/ioport.c b/ioport.c
index a72e4035881a..cb778ed8d757 100644
--- a/ioport.c
+++ b/ioport.c
@@ -73,7 +73,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
 		pr_warning("ioport re-registered: %x", port);
-		rb_int_erase(&ioport_tree, &entry->node);
+		ioport_remove(&ioport_tree, entry);
 	}
 
 	entry = malloc(sizeof(*entry));
@@ -91,16 +91,21 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	};
 
 	r = ioport_insert(&ioport_tree, entry);
-	if (r < 0) {
-		free(entry);
-		br_write_unlock(kvm);
-		return r;
-	}
-
-	device__register(&entry->dev_hdr);
+	if (r < 0)
+		goto out_free;
+	r = device__register(&entry->dev_hdr);
+	if (r < 0)
+		goto out_remove;
 	br_write_unlock(kvm);
 
 	return port;
+
+out_remove:
+	ioport_remove(&ioport_tree, entry);
+out_free:
+	free(entry);
+	br_write_unlock(kvm);
+	return r;
 }
 
 int ioport__unregister(struct kvm *kvm, u16 port)
@@ -196,9 +201,7 @@ out:
 
 int ioport__init(struct kvm *kvm)
 {
-	ioport__setup_arch(kvm);
-
-	return 0;
+	return ioport__setup_arch(kvm);
 }
 dev_base_init(ioport__init);
 
diff --git a/mips/kvm.c b/mips/kvm.c
index 211770da0d85..26355930d3b6 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -100,8 +100,9 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 		die_perror("KVM_IRQ_LINE ioctl");
 }
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	return 0;
 }
 
 bool kvm__arch_cpu_supports_vm(void)
diff --git a/powerpc/ioport.c b/powerpc/ioport.c
index 58dc625c54fe..0c188b61a51a 100644
--- a/powerpc/ioport.c
+++ b/powerpc/ioport.c
@@ -12,9 +12,10 @@
 
 #include <stdlib.h>
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
 	/* PPC has no legacy ioports to set up */
+	return 0;
 }
 
 void ioport__map_irq(u8 *irq)
diff --git a/virtio/mmio.c b/virtio/mmio.c
index 03cecc366292..5537c39367d6 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -292,13 +292,16 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
 	struct virtio_mmio *vmmio = vdev->virtio;
+	int r;
 
 	vmmio->addr	= virtio_mmio_get_io_space_block(VIRTIO_MMIO_IO_SIZE);
 	vmmio->kvm	= kvm;
 	vmmio->dev	= dev;
 
-	kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
-			   false, virtio_mmio_mmio_callback, vdev);
+	r = kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
+			       false, virtio_mmio_mmio_callback, vdev);
+	if (r < 0)
+		return r;
 
 	vmmio->hdr = (struct virtio_mmio_hdr) {
 		.magic		= {'v', 'i', 'r', 't'},
@@ -313,7 +316,11 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.data		= generate_virtio_mmio_fdt_node,
 	};
 
-	device__register(&vmmio->dev_hdr);
+	r = device__register(&vmmio->dev_hdr);
+	if (r < 0) {
+		kvm__deregister_mmio(kvm, vmmio->addr);
+		return r;
+	}
 
 	/*
 	 * Instantiate guest virtio-mmio devices using kernel command line
diff --git a/x86/ioport.c b/x86/ioport.c
index 8572c758ed4f..7ad7b8f3f497 100644
--- a/x86/ioport.c
+++ b/x86/ioport.c
@@ -69,50 +69,84 @@ void ioport__map_irq(u8 *irq)
 {
 }
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	int r;
+
 	/* Legacy ioport setup */
 
 	/* 0000 - 001F - DMA1 controller */
-	ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
+	r = ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x0020 - 0x003F - 8259A PIC 1 */
-	ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0040-005F - PIT - PROGRAMMABLE INTERVAL TIMER (8253, 8254) */
-	ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
+	r = ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0092 - PS/2 system control port A */
-	ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
+	r = ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x00A0 - 0x00AF - 8259A PIC 2 */
-	ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* 00C0 - 001F - DMA2 controller */
-	ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
+	r = ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 00E0-00EF are 'motherboard specific' so we use them for our
 	   internal debugging purposes.  */
-	ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
+	r = ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 00ED - DUMMY PORT FOR DELAY??? */
-	ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
+	r = ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x00F0 - 0x00FF - Math co-processor */
-	ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0278-027A - PARALLEL PRINTER PORT (usually LPT1, sometimes LPT2) */
-	ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
+	r = ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0378-037A - PARALLEL PRINTER PORT (usually LPT2, sometimes LPT3) */
-	ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
+	r = ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 03D4-03D5 - COLOR VIDEO - CRT CONTROL REGISTERS */
-	ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
-	ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
+	r = ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
+	r = ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
-	ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
+	r = ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0510 - QEMU BIOS configuration register */
-	ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
+
+	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (14 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 15/32] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30  9:30   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 17/32] hw/vesa: Set the size for BAR 0 Alexandru Elisei
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Failing an mmap call or failing to create a memslot means that device
emulation will not work, so don't ignore such errors.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index b92cc990b730..ad09373ea2ff 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -63,22 +63,25 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
+	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
-		return ERR_PTR(r);
+		goto out_error;
 
-	vesa_base_addr			= (u16)r;
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
 	r = device__register(&vesa_device);
 	if (r < 0)
-		return ERR_PTR(r);
+		goto unregister_ioport;
 
 	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
-	if (mem == MAP_FAILED)
-		ERR_PTR(-errno);
+	if (mem == MAP_FAILED) {
+		r = -errno;
+		goto unregister_device;
+	}
 
-	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	if (r < 0)
+		goto unmap_dev;
 
 	vesafb = (struct framebuffer) {
 		.width			= VESA_WIDTH,
@@ -90,4 +93,13 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 		.kvm			= kvm,
 	};
 	return fb__register(&vesafb);
+
+unmap_dev:
+	munmap(mem, VESA_MEM_SIZE);
+unregister_device:
+	device__unregister(&vesa_device);
+unregister_ioport:
+	ioport__unregister(kvm, vesa_base_addr);
+out_error:
+	return ERR_PTR(r);
 }
-- 
2.20.1



* [PATCH v3 kvmtool 17/32] hw/vesa: Set the size for BAR 0
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (15 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports Alexandru Elisei
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Implemented BARs have a non-zero address and a size. Let's set the size
for BAR 0.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vesa.c b/hw/vesa.c
index ad09373ea2ff..dd59a112330b 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -69,6 +69,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 		goto out_error;
 
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
+	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
 	r = device__register(&vesa_device);
 	if (r < 0)
 		goto unregister_ioport;
-- 
2.20.1



* [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (16 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 17/32] hw/vesa: Set the size for BAR 0 Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-30 11:13   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
                   ` (14 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

If we try to register a range of ports which overlaps with an already
registered I/O port region, then device emulation for that region will not
work anymore. There's nothing sane that the ioport emulation layer can do
in this case, so refuse to allocate the port. This matches the behavior of
kvm__register_mmio.

There's no need to protect allocating a new ioport struct with a lock, so
move the lock to protect the actual ioport insertion in the tree.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 ioport.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/ioport.c b/ioport.c
index cb778ed8d757..d9f2e8ea3c3b 100644
--- a/ioport.c
+++ b/ioport.c
@@ -68,14 +68,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
-
-	entry = ioport_search(&ioport_tree, port);
-	if (entry) {
-		pr_warning("ioport re-registered: %x", port);
-		ioport_remove(&ioport_tree, entry);
-	}
-
 	entry = malloc(sizeof(*entry));
 	if (entry == NULL)
 		return -ENOMEM;
@@ -90,6 +82,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 		},
 	};
 
+	br_write_lock(kvm);
 	r = ioport_insert(&ioport_tree, entry);
 	if (r < 0)
 		goto out_free;
-- 
2.20.1



* [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (17 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-31 11:51   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

kvmtool uses brlock for protecting accesses to the ioport and mmio
red-black trees. brlock allows concurrent reads, but only one writer, which
is assumed not to be a VCPU thread (for more information see commit
0b907ed2eaec ("kvm tools: Add a brlock")). This is done by issuing a
compiler barrier on read and pausing the entire virtual machine on writes.
When KVM_BRLOCK_DEBUG is defined, brlock instead uses a pthread read/write
lock.

When we implement reassignable BARs, the mmio or ioport mapping will be
done as a result of a VCPU mmio access. If brlock is a pthread read/write
lock, we will try to acquire the write lock with the read lock already held
by the same VCPU and we will deadlock. If it's not, the VCPU will have to
call kvm__pause, which means the virtual machine will stay paused forever.

Let's avoid all this by using a mutex and reference counting the red-black
tree entries. This way we can guarantee that we won't unregister a node
that another thread is currently using for emulation.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/ioport.h          |  2 +
 include/kvm/rbtree-interval.h |  4 +-
 ioport.c                      | 64 +++++++++++++++++-------
 mmio.c                        | 91 +++++++++++++++++++++++++----------
 4 files changed, 118 insertions(+), 43 deletions(-)

diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index 62a719327e3f..039633f76bdd 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -22,6 +22,8 @@ struct ioport {
 	struct ioport_operations	*ops;
 	void				*priv;
 	struct device_header		dev_hdr;
+	u32				refcount;
+	bool				remove;
 };
 
 struct ioport_operations {
diff --git a/include/kvm/rbtree-interval.h b/include/kvm/rbtree-interval.h
index 730eb5e8551d..17cd3b5f3199 100644
--- a/include/kvm/rbtree-interval.h
+++ b/include/kvm/rbtree-interval.h
@@ -6,7 +6,9 @@
 
 #define RB_INT_INIT(l, h) \
 	(struct rb_int_node){.low = l, .high = h}
-#define rb_int(n) rb_entry(n, struct rb_int_node, node)
+#define rb_int(n)	rb_entry(n, struct rb_int_node, node)
+#define rb_int_start(n)	((n)->low)
+#define rb_int_end(n)	((n)->low + (n)->high - 1)
 
 struct rb_int_node {
 	struct rb_node	node;
diff --git a/ioport.c b/ioport.c
index d9f2e8ea3c3b..5d7288809528 100644
--- a/ioport.c
+++ b/ioport.c
@@ -2,7 +2,6 @@
 
 #include "kvm/kvm.h"
 #include "kvm/util.h"
-#include "kvm/brlock.h"
 #include "kvm/rbtree-interval.h"
 #include "kvm/mutex.h"
 
@@ -16,6 +15,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
+static DEFINE_MUTEX(ioport_lock);
+
 static struct rb_root		ioport_tree = RB_ROOT;
 
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
@@ -39,6 +40,36 @@ static void ioport_remove(struct rb_root *root, struct ioport *data)
 	rb_int_erase(root, &data->node);
 }
 
+static struct ioport *ioport_get(struct rb_root *root, u64 addr)
+{
+	struct ioport *ioport;
+
+	mutex_lock(&ioport_lock);
+	ioport = ioport_search(root, addr);
+	if (ioport)
+		ioport->refcount++;
+	mutex_unlock(&ioport_lock);
+
+	return ioport;
+}
+
+/* Called with ioport_lock held. */
+static void ioport_unregister(struct rb_root *root, struct ioport *data)
+{
+	device__unregister(&data->dev_hdr);
+	ioport_remove(root, data);
+	free(data);
+}
+
+static void ioport_put(struct rb_root *root, struct ioport *data)
+{
+	mutex_lock(&ioport_lock);
+	data->refcount--;
+	if (data->remove && data->refcount == 0)
+		ioport_unregister(root, data);
+	mutex_unlock(&ioport_lock);
+}
+
 #ifdef CONFIG_HAS_LIBFDT
 static void generate_ioport_fdt_node(void *fdt,
 				     struct device_header *dev_hdr,
@@ -80,16 +111,18 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 			.bus_type	= DEVICE_BUS_IOPORT,
 			.data		= generate_ioport_fdt_node,
 		},
+		.refcount	= 0,
+		.remove		= false,
 	};
 
-	br_write_lock(kvm);
+	mutex_lock(&ioport_lock);
 	r = ioport_insert(&ioport_tree, entry);
 	if (r < 0)
 		goto out_free;
 	r = device__register(&entry->dev_hdr);
 	if (r < 0)
 		goto out_remove;
-	br_write_unlock(kvm);
+	mutex_unlock(&ioport_lock);
 
 	return port;
 
@@ -97,7 +130,7 @@ out_remove:
 	ioport_remove(&ioport_tree, entry);
 out_free:
 	free(entry);
-	br_write_unlock(kvm);
+	mutex_unlock(&ioport_lock);
 	return r;
 }
 
@@ -106,22 +139,22 @@ int ioport__unregister(struct kvm *kvm, u16 port)
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
+	mutex_lock(&ioport_lock);
 
 	r = -ENOENT;
 	entry = ioport_search(&ioport_tree, port);
 	if (!entry)
 		goto done;
 
-	device__unregister(&entry->dev_hdr);
-	ioport_remove(&ioport_tree, entry);
-
-	free(entry);
+	if (entry->refcount)
+		entry->remove = true;
+	else
+		ioport_unregister(&ioport_tree, entry);
 
 	r = 0;
 
 done:
-	br_write_unlock(kvm);
+	mutex_unlock(&ioport_lock);
 
 	return r;
 }
@@ -136,9 +169,7 @@ static void ioport__unregister_all(void)
 	while (rb) {
 		rb_node = rb_int(rb);
 		entry = ioport_node(rb_node);
-		device__unregister(&entry->dev_hdr);
-		ioport_remove(&ioport_tree, entry);
-		free(entry);
+		ioport_unregister(&ioport_tree, entry);
 		rb = rb_first(&ioport_tree);
 	}
 }
@@ -164,8 +195,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	void *ptr = data;
 	struct kvm *kvm = vcpu->kvm;
 
-	br_read_lock(kvm);
-	entry = ioport_search(&ioport_tree, port);
+	entry = ioport_get(&ioport_tree, port);
 	if (!entry)
 		goto out;
 
@@ -180,9 +210,9 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 		ptr += size;
 	}
 
-out:
-	br_read_unlock(kvm);
+	ioport_put(&ioport_tree, entry);
 
+out:
 	if (ret)
 		return true;
 
diff --git a/mmio.c b/mmio.c
index 61e1d47a587d..57a4d5b9d64d 100644
--- a/mmio.c
+++ b/mmio.c
@@ -1,7 +1,7 @@
 #include "kvm/kvm.h"
 #include "kvm/kvm-cpu.h"
 #include "kvm/rbtree-interval.h"
-#include "kvm/brlock.h"
+#include "kvm/mutex.h"
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -15,10 +15,14 @@
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
+static DEFINE_MUTEX(mmio_lock);
+
 struct mmio_mapping {
 	struct rb_int_node	node;
 	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
 	void			*ptr;
+	u32			refcount;
+	bool			remove;
 };
 
 static struct rb_root mmio_tree = RB_ROOT;
@@ -51,6 +55,11 @@ static int mmio_insert(struct rb_root *root, struct mmio_mapping *data)
 	return rb_int_insert(root, &data->node);
 }
 
+static void mmio_remove(struct rb_root *root, struct mmio_mapping *data)
+{
+	rb_int_erase(root, &data->node);
+}
+
 static const char *to_direction(u8 is_write)
 {
 	if (is_write)
@@ -59,6 +68,41 @@ static const char *to_direction(u8 is_write)
 	return "read";
 }
 
+static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
+{
+	struct mmio_mapping *mmio;
+
+	mutex_lock(&mmio_lock);
+	mmio = mmio_search(root, phys_addr, len);
+	if (mmio)
+		mmio->refcount++;
+	mutex_unlock(&mmio_lock);
+
+	return mmio;
+}
+
+/* Called with mmio_lock held. */
+static void mmio_deregister(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
+{
+	struct kvm_coalesced_mmio_zone zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= rb_int_start(&mmio->node),
+		.size	= 1,
+	};
+	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+
+	mmio_remove(root, mmio);
+	free(mmio);
+}
+
+static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
+{
+	mutex_lock(&mmio_lock);
+	mmio->refcount--;
+	if (mmio->remove && mmio->refcount == 0)
+		mmio_deregister(kvm, root, mmio);
+	mutex_unlock(&mmio_lock);
+}
+
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
 			void *ptr)
@@ -72,9 +116,11 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 		return -ENOMEM;
 
 	*mmio = (struct mmio_mapping) {
-		.node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
-		.mmio_fn = mmio_fn,
-		.ptr	= ptr,
+		.node		= RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
+		.mmio_fn	= mmio_fn,
+		.ptr		= ptr,
+		.refcount	= 0,
+		.remove		= false,
 	};
 
 	if (coalesce) {
@@ -88,9 +134,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 			return -errno;
 		}
 	}
-	br_write_lock(kvm);
+	mutex_lock(&mmio_lock);
 	ret = mmio_insert(&mmio_tree, mmio);
-	br_write_unlock(kvm);
+	mutex_unlock(&mmio_lock);
 
 	return ret;
 }
@@ -98,25 +144,20 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 {
 	struct mmio_mapping *mmio;
-	struct kvm_coalesced_mmio_zone zone;
 
-	br_write_lock(kvm);
+	mutex_lock(&mmio_lock);
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
 	if (mmio == NULL) {
-		br_write_unlock(kvm);
+		mutex_unlock(&mmio_lock);
 		return false;
 	}
 
-	zone = (struct kvm_coalesced_mmio_zone) {
-		.addr	= phys_addr,
-		.size	= 1,
-	};
-	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+	if (mmio->refcount)
+		mmio->remove = true;
+	else
+		mmio_deregister(kvm, &mmio_tree, mmio);
+	mutex_unlock(&mmio_lock);
 
-	rb_int_erase(&mmio_tree, &mmio->node);
-	br_write_unlock(kvm);
-
-	free(mmio);
 	return true;
 }
 
@@ -124,18 +165,18 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 {
 	struct mmio_mapping *mmio;
 
-	br_read_lock(vcpu->kvm);
-	mmio = mmio_search(&mmio_tree, phys_addr, len);
-
-	if (mmio)
-		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
-	else {
+	mmio = mmio_get(&mmio_tree, phys_addr, len);
+	if (!mmio) {
 		if (vcpu->kvm->cfg.mmio_debug)
 			fprintf(stderr,	"Warning: Ignoring MMIO %s at %016llx (length %u)\n",
 				to_direction(is_write),
 				(unsigned long long)phys_addr, len);
+		goto out;
 	}
-	br_read_unlock(vcpu->kvm);
 
+	mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
+	mmio_put(vcpu->kvm, &mmio_tree, mmio);
+
+out:
 	return true;
 }
-- 
2.20.1



* [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (18 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-02  8:33   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 21/32] virtio/pci: Get emulated region address from BARs Alexandru Elisei
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

When we add support for reassignable BARs, we're going to be checking the
BAR type, the address written to it and whether access to memory or I/O
space is enabled quite often; make our life easier by adding helpers for
these checks.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h   | 53 +++++++++++++++++++++++++++++++++++++++++++++
 pci.c               |  4 ++--
 powerpc/spapr_pci.c |  2 +-
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index ccb155e3e8fe..adb4b5c082d5 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -5,6 +5,7 @@
 #include <linux/kvm.h>
 #include <linux/pci_regs.h>
 #include <endian.h>
+#include <stdbool.h>
 
 #include "kvm/devices.h"
 #include "kvm/msi.h"
@@ -161,4 +162,56 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+static inline bool __pci__memory_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_MEMORY;
+}
+
+static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__memory_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__io_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_IO;
+}
+
+static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__io_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__bar_is_io(u32 bar)
+{
+	return bar & PCI_BASE_ADDRESS_SPACE_IO;
+}
+
+static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
+}
+
+static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return !pci__bar_is_io(pci_hdr, bar_num);
+}
+
+static inline u32 __pci__bar_address(u32 bar)
+{
+	if (__pci__bar_is_io(bar))
+		return bar & PCI_BASE_ADDRESS_IO_MASK;
+	return bar & PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_address(pci_hdr->bar[bar_num]);
+}
+
+static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci_hdr->bar_size[bar_num];
+}
+
 #endif /* KVM__PCI_H */
diff --git a/pci.c b/pci.c
index b6892d974c08..7399c76c0819 100644
--- a/pci.c
+++ b/pci.c
@@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	 * size, it will write the address back.
 	 */
 	if (bar < 6) {
-		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
+		if (pci__bar_is_io(pci_hdr, bar))
 			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
 		else
 			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
@@ -211,7 +211,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 		 */
 		memcpy(&value, data, size);
 		if (value == 0xffffffff)
-			value = ~(pci_hdr->bar_size[bar] - 1);
+			value = ~(pci__bar_size(pci_hdr, bar) - 1);
 		/* Preserve the special bits. */
 		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
 		memcpy(base + offset, &value, size);
diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
index a15f7d895a46..7be44d950acb 100644
--- a/powerpc/spapr_pci.c
+++ b/powerpc/spapr_pci.c
@@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
 				of_pci_b_ddddd(devid) |
 				of_pci_b_fff(fn) |
 				of_pci_b_rrrrrrrr(bars[i]));
-			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
+			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
 			reg[n+1].addr = 0;
 
 			assigned_addresses[n].phys_hi = cpu_to_be32(
-- 
2.20.1



* [PATCH v3 kvmtool 21/32] virtio/pci: Get emulated region address from BARs
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (19 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The struct virtio_pci fields port_addr, mmio_addr and msix_io_block
represent the same addresses that are written in the corresponding BARs.
Remove this duplication of information and always use the address from the
BAR. This will make our life a lot easier when we add support for
reassignable BARs, because we won't have to update the fields on each BAR
change.

No functional changes.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/virtio-pci.h |  3 --
 virtio/pci.c             | 82 +++++++++++++++++++++++++---------------
 2 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index 278a25950d8b..959b4b81c871 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -24,8 +24,6 @@ struct virtio_pci {
 	void			*dev;
 	struct kvm		*kvm;
 
-	u16			port_addr;
-	u32			mmio_addr;
 	u8			status;
 	u8			isr;
 	u32			features;
@@ -43,7 +41,6 @@ struct virtio_pci {
 	u32			config_gsi;
 	u32			vq_vector[VIRTIO_PCI_MAX_VQ];
 	u32			gsis[VIRTIO_PCI_MAX_VQ];
-	u32			msix_io_block;
 	u64			msix_pba;
 	struct msix_table	msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];
 
diff --git a/virtio/pci.c b/virtio/pci.c
index 281c31817598..d111dc499f5e 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -13,6 +13,21 @@
 #include <linux/byteorder.h>
 #include <string.h>
 
+static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 0);
+}
+
+static u32 virtio_pci__mmio_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 1);
+}
+
+static u32 virtio_pci__msix_io_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 2);
+}
+
 static void virtio_pci__ioevent_callback(struct kvm *kvm, void *param)
 {
 	struct virtio_pci_ioevent_param *ioeventfd = param;
@@ -25,6 +40,8 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 {
 	struct ioevent ioevent;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 	int r, flags = 0;
 	int fd;
 
@@ -48,7 +65,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		flags |= IOEVENTFD_FLAG_USER_POLL;
 
 	/* ioport */
-	ioevent.io_addr	= vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= fd = eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags | IOEVENTFD_FLAG_PIO);
@@ -56,7 +73,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		return r;
 
 	/* mmio */
-	ioevent.io_addr	= vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags);
@@ -68,7 +85,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 	return 0;
 
 free_ioport_evt:
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	return r;
 }
 
@@ -76,9 +93,11 @@ static void virtio_pci_exit_vq(struct kvm *kvm, struct virtio_device *vdev,
 			       int vq)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 
-	ioeventfd__del_event(vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	virtio_exit_vq(kvm, vdev, vpci->dev, vq);
 }
 
@@ -162,7 +181,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 {
 	struct virtio_device *vdev = ioport->priv;
 	struct virtio_pci *vpci = vdev->virtio;
-	unsigned long offset = port - vpci->port_addr;
+	unsigned long offset = port - virtio_pci__port_addr(vpci);
 
 	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
 }
@@ -318,7 +337,7 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	struct virtio_device *vdev = ioport->priv;
 	struct virtio_pci *vpci = vdev->virtio;
-	unsigned long offset = port - vpci->port_addr;
+	unsigned long offset = port - virtio_pci__port_addr(vpci);
 
 	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
 }
@@ -335,17 +354,18 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
+	u32 msix_io_addr = virtio_pci__msix_io_addr(vpci);
 	int vecnum;
 	size_t offset;
 
-	if (addr > vpci->msix_io_block + PCI_IO_SIZE) {
+	if (addr > msix_io_addr + PCI_IO_SIZE) {
 		if (is_write)
 			return;
 		table  = (struct msix_table *)&vpci->msix_pba;
-		offset = addr - (vpci->msix_io_block + PCI_IO_SIZE);
+		offset = addr - (msix_io_addr + PCI_IO_SIZE);
 	} else {
 		table  = vpci->msix_table;
-		offset = addr - vpci->msix_io_block;
+		offset = addr - msix_io_addr;
 	}
 	vecnum = offset / sizeof(struct msix_table);
 	offset = offset % sizeof(struct msix_table);
@@ -434,19 +454,20 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
 
 	if (!is_write)
-		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
-				    data, len);
+		virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
 	else
-		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
-				     data, len);
+		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr, msix_io_block;
+	u16 port_addr;
 	int r;
 
 	vpci->kvm = kvm;
@@ -454,20 +475,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
+	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
+			     vdev);
 	if (r < 0)
 		return r;
-	vpci->port_addr = (u16)r;
+	port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
+	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
+	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
+	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
@@ -483,11 +505,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.class[2]		= (class >> 16) & 0xff,
 		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
 		.subsys_id		= cpu_to_le16(subsys_id),
-		.bar[0]			= cpu_to_le32(vpci->port_addr
+		.bar[0]			= cpu_to_le32(port_addr
 							| PCI_BASE_ADDRESS_SPACE_IO),
-		.bar[1]			= cpu_to_le32(vpci->mmio_addr
+		.bar[1]			= cpu_to_le32(mmio_addr
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
-		.bar[2]			= cpu_to_le32(vpci->msix_io_block
+		.bar[2]			= cpu_to_le32(msix_io_block
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
@@ -534,11 +556,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	return 0;
 
 free_msix_mmio:
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
+	kvm__deregister_mmio(kvm, msix_io_block);
 free_mmio:
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
+	kvm__deregister_mmio(kvm, mmio_addr);
 free_ioport:
-	ioport__unregister(kvm, vpci->port_addr);
+	ioport__unregister(kvm, port_addr);
 	return r;
 }
 
@@ -558,9 +580,9 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev)
 	struct virtio_pci *vpci = vdev->virtio;
 
 	virtio_pci__reset(kvm, vdev);
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
-	ioport__unregister(kvm, vpci->port_addr);
+	kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci));
+	kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci));
+	ioport__unregister(kvm, virtio_pci__port_addr(vpci));
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (20 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 21/32] virtio/pci: Get emulated region address from BARs Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-02  8:33   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 23/32] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
                   ` (10 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

When we want to map a device region into the guest address space, first we
perform an mmap on the device fd. The resulting VMA is a mapping between
host userspace addresses and physical addresses associated with the device.
Next, we create a memslot, which populates the stage 2 table with the
mappings between guest physical addresses and the device physical addresses.

However, when we want to unmap the device from the guest address space, we
only call munmap, which destroys the VMA and the stage 2 mappings, but
doesn't destroy the memslot and kvmtool's internal mem_bank structure
associated with the memslot.

This has been perfectly fine so far, because we only unmap a device region
when we exit kvmtool. This will change when we add support for
reassignable BARs, and we will have to unmap vfio regions as the guest
kernel writes new addresses in the BARs. This can lead to two possible
problems:

- We refuse to create a valid BAR mapping because of a stale mem_bank
  structure which belonged to a previously unmapped region.

- It is possible that the mmap in vfio_map_region returns the same address
  that was used to create a memslot, but was unmapped by vfio_unmap_region.
  Guest accesses to the device memory will fault because the stage 2
  mappings are missing, and this can lead to performance degradation.

Let's do the right thing and destroy the memslot and the mem_bank struct
associated with it when we unmap a vfio region. Set host_addr to NULL after
the munmap call so we won't try to unmap an address which is currently used
by the process for something else if vfio_unmap_region gets called twice.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/kvm.h |   4 ++
 kvm.c             | 101 ++++++++++++++++++++++++++++++++++++++++------
 vfio/core.c       |   6 +++
 3 files changed, 99 insertions(+), 12 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 50119a8672eb..9428f57a1c0c 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -1,6 +1,7 @@
 #ifndef KVM__KVM_H
 #define KVM__KVM_H
 
+#include "kvm/mutex.h"
 #include "kvm/kvm-arch.h"
 #include "kvm/kvm-config.h"
 #include "kvm/util-init.h"
@@ -56,6 +57,7 @@ struct kvm_mem_bank {
 	void			*host_addr;
 	u64			size;
 	enum kvm_mem_type	type;
+	u32			slot;
 };
 
 struct kvm {
@@ -72,6 +74,7 @@ struct kvm {
 	u64			ram_size;
 	void			*ram_start;
 	u64			ram_pagesize;
+	struct mutex		mem_banks_lock;
 	struct list_head	mem_banks;
 
 	bool			nmi_disabled;
@@ -106,6 +109,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
 void kvm__irq_trigger(struct kvm *kvm, int irq);
 bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
 bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
+int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
 		      enum kvm_mem_type type);
 static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
diff --git a/kvm.c b/kvm.c
index 57c4ff98ec4c..26f6b9bc58a3 100644
--- a/kvm.c
+++ b/kvm.c
@@ -157,6 +157,7 @@ struct kvm *kvm__new(void)
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&kvm->mem_banks_lock);
 	kvm->sys_fd = -1;
 	kvm->vm_fd = -1;
 
@@ -183,20 +184,84 @@ int kvm__exit(struct kvm *kvm)
 }
 core_exit(kvm__exit);
 
+int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
+		     void *userspace_addr)
+{
+	struct kvm_userspace_memory_region mem;
+	struct kvm_mem_bank *bank;
+	int ret;
+
+	mutex_lock(&kvm->mem_banks_lock);
+	list_for_each_entry(bank, &kvm->mem_banks, list)
+		if (bank->guest_phys_addr == guest_phys &&
+		    bank->size == size && bank->host_addr == userspace_addr)
+			break;
+
+	if (&bank->list == &kvm->mem_banks) {
+		pr_err("Region [%llx-%llx] not found", guest_phys,
+		       guest_phys + size - 1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (bank->type == KVM_MEM_TYPE_RESERVED) {
+		pr_err("Cannot delete reserved region [%llx-%llx]",
+		       guest_phys, guest_phys + size - 1);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mem = (struct kvm_userspace_memory_region) {
+		.slot			= bank->slot,
+		.guest_phys_addr	= guest_phys,
+		.memory_size		= 0,
+		.userspace_addr		= (unsigned long)userspace_addr,
+	};
+
+	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	if (ret < 0) {
+		ret = -errno;
+		goto out;
+	}
+
+	list_del(&bank->list);
+	free(bank);
+	kvm->mem_slots--;
+	ret = 0;
+
+out:
+	mutex_unlock(&kvm->mem_banks_lock);
+	return ret;
+}
+
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		      void *userspace_addr, enum kvm_mem_type type)
 {
 	struct kvm_userspace_memory_region mem;
 	struct kvm_mem_bank *merged = NULL;
 	struct kvm_mem_bank *bank;
+	struct list_head *prev_entry;
+	u32 slot;
 	int ret;
 
-	/* Check for overlap */
+	mutex_lock(&kvm->mem_banks_lock);
+	/* Check for overlap and find first empty slot. */
+	slot = 0;
+	prev_entry = &kvm->mem_banks;
 	list_for_each_entry(bank, &kvm->mem_banks, list) {
 		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
 		u64 end = guest_phys + size - 1;
-		if (guest_phys > bank_end || end < bank->guest_phys_addr)
+		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
+			/*
+			 * Keep the banks sorted ascending by slot, so it's
+			 * easier for us to find a free slot.
+			 */
+			if (bank->slot == slot) {
+				slot++;
+				prev_entry = &bank->list;
+			}
 			continue;
+		}
 
 		/* Merge overlapping reserved regions */
 		if (bank->type == KVM_MEM_TYPE_RESERVED &&
@@ -226,38 +291,50 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		       kvm_mem_type_to_string(bank->type), bank->guest_phys_addr,
 		       bank->guest_phys_addr + bank->size - 1);
 
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
-	if (merged)
-		return 0;
+	if (merged) {
+		ret = 0;
+		goto out;
+	}
 
 	bank = malloc(sizeof(*bank));
-	if (!bank)
-		return -ENOMEM;
+	if (!bank) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	INIT_LIST_HEAD(&bank->list);
 	bank->guest_phys_addr		= guest_phys;
 	bank->host_addr			= userspace_addr;
 	bank->size			= size;
 	bank->type			= type;
+	bank->slot			= slot;
 
 	if (type != KVM_MEM_TYPE_RESERVED) {
 		mem = (struct kvm_userspace_memory_region) {
-			.slot			= kvm->mem_slots++,
+			.slot			= slot,
 			.guest_phys_addr	= guest_phys,
 			.memory_size		= size,
 			.userspace_addr		= (unsigned long)userspace_addr,
 		};
 
 		ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
-		if (ret < 0)
-			return -errno;
+		if (ret < 0) {
+			ret = -errno;
+			goto out;
+		}
 	}
 
-	list_add(&bank->list, &kvm->mem_banks);
+	list_add(&bank->list, prev_entry);
+	kvm->mem_slots++;
+	ret = 0;
 
-	return 0;
+out:
+	mutex_unlock(&kvm->mem_banks_lock);
+	return ret;
 }
 
 void *guest_flat_to_host(struct kvm *kvm, u64 offset)
diff --git a/vfio/core.c b/vfio/core.c
index 0ed1e6fee6bf..b80ebf3bb913 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 
 void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
 {
+	u64 map_size;
+
 	if (region->host_addr) {
+		map_size = ALIGN(region->info.size, PAGE_SIZE);
+		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
+				 region->host_addr);
 		munmap(region->host_addr, region->info.size);
+		region->host_addr = NULL;
 	} else if (region->is_ioport) {
 		ioport__unregister(kvm, region->port_base);
 	} else {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 23/32] vfio: Reserve ioports when configuring the BAR
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (21 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Let's be consistent and reserve ioports when we are configuring the BAR,
not when we map it, just like we do with mmio regions.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++------
 vfio/pci.c  | 4 +++-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/vfio/core.c b/vfio/core.c
index b80ebf3bb913..bad3c7c8cd26 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,14 +202,11 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = pci_get_io_port_block(region->info.size);
-
-		port = ioport__register(kvm, port, &vfio_ioport_ops,
-					region->info.size, region);
+		int port = ioport__register(kvm, region->port_base,
+					   &vfio_ioport_ops, region->info.size,
+					   region);
 		if (port < 0)
 			return port;
-
-		region->port_base = port;
 		return 0;
 	}
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 4412c6d7a862..fe02574390f6 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -886,7 +886,9 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		}
 	}
 
-	if (!region->is_ioport) {
+	if (region->is_ioport) {
+		region->port_base = pci_get_io_port_block(region->info.size);
+	} else {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (22 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 23/32] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-02  8:34   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice Alexandru Elisei
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus
Extension":

"The bandwidth requirements for I/O and configuration transactions cannot
justify the added complexity, and, therefore, only memory transactions
support 64-bit data transfers".

Further down, the spec also describes the possible responses of a target
which has been requested to do a 64-bit transaction. Limit the transaction
to the lower 32 bits, to match the second accepted behaviour.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/pci.c b/pci.c
index 7399c76c0819..611e2c0bf1da 100644
--- a/pci.c
+++ b/pci.c
@@ -119,6 +119,9 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	union pci_config_address pci_config_address;
 
+	if (size > 4)
+		size = 4;
+
 	pci_config_address.w = ioport__read32(&pci_config_address_bits);
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -135,6 +138,9 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	union pci_config_address pci_config_address;
 
+	if (size > 4)
+		size = 4;
+
 	pci_config_address.w = ioport__read32(&pci_config_address_bits);
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -248,6 +254,9 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 	cfg_addr.w		= (u32)addr;
 	cfg_addr.enable_bit	= 1;
 
+	if (len > 4)
+		len = 4;
+
 	if (is_write)
 		pci__config_wr(kvm, cfg_addr, data, len);
 	else
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (23 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-02  8:35   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 26/32] vesa: Create device exactly once Alexandru Elisei
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

After writing to the device fd as part of the PCI configuration space
emulation, we read back from the device to make sure that the write
finished. The value is read back into the emulated PCI configuration
space, and afterwards the PCI emulation code copies that same value there
again. Let's read from the device fd into a temporary variable instead,
to prevent this double write.

The double write is harmless in itself. But when we implement
reassignable BARs, we need to keep track of the old BAR value, and the
VFIO code is overwriting it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index fe02574390f6..8b2a0c8dbac3 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
 	struct vfio_device *vdev;
-	void *base = pci_hdr;
+	u32 tmp;
 
 	if (offset == PCI_ROM_ADDRESS)
 		return;
@@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
 		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
 
-	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
+	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
 		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
 			      sz, offset);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 26/32] vesa: Create device exactly once
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (24 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-02  8:58   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
                   ` (6 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

A vesa device is used by the SDL, GTK or VNC framebuffers. Don't allow the
user to specify more than one of these options because kvmtool will create
identical devices and bad things will happen:

$ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk
  # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159
  Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff]
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
======= Backtrace: =========
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
(+0x777e5)[0x7fae0ed447e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c]
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
/usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
/usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
======= Backtrace: =========
Aborted (core dumped)

The vesa device is explicitly created during the initialization phase of
the above framebuffers. Remove the superfluous check for their existence.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index dd59a112330b..8071ad153f27 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -61,8 +61,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
 	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
 
-	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
-		return NULL;
+	if (device__find_dev(vesa_device.bus_type, vesa_device.dev_num) == &vesa_device) {
+		r = -EEXIST;
+		goto out_error;
+	}
+
 	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
 	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (25 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 26/32] vesa: Create device exactly once Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-03 11:57   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
                   ` (5 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Alexandru Elisei

From: Alexandru Elisei <alexandru.elisei@gmail.com>

Implement callbacks for activating and deactivating emulation for a BAR
region. This is in preparation for allowing a guest operating system to
enable and disable access to I/O or memory space, or to reassign the
BARs.

The emulated vesa device framebuffer isn't designed to allow stopping and
restarting at arbitrary points in the guest execution. Furthermore, on x86,
the kernel will not change the BAR addresses, which on bare metal are
programmed by the firmware, so take the easy way out and refuse to
activate/deactivate emulation for the BAR regions. We also take this
opportunity to make the vesa emulation code more consistent by moving all
static variable definitions to one place, at the top of the file.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c         |  70 ++++++++++++++++++++------------
 include/kvm/pci.h |  18 ++++++++-
 pci.c             |  44 ++++++++++++++++++++
 vfio/pci.c        | 100 ++++++++++++++++++++++++++++++++++++++--------
 virtio/pci.c      |  90 ++++++++++++++++++++++++++++++-----------
 5 files changed, 254 insertions(+), 68 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 8071ad153f27..31c2d16ae4de 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -18,6 +18,31 @@
 #include <inttypes.h>
 #include <unistd.h>
 
+static struct pci_device_header vesa_pci_device = {
+	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
+	.header_type	= PCI_HEADER_TYPE_NORMAL,
+	.revision_id	= 0,
+	.class[2]	= 0x03,
+	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
+	.bar_size[1]	= VESA_MEM_SIZE,
+};
+
+static struct device_header vesa_device = {
+	.bus_type	= DEVICE_BUS_PCI,
+	.data		= &vesa_pci_device,
+};
+
+static struct framebuffer vesafb = {
+	.width		= VESA_WIDTH,
+	.height		= VESA_HEIGHT,
+	.depth		= VESA_BPP,
+	.mem_addr	= VESA_MEM_ADDR,
+	.mem_size	= VESA_MEM_SIZE,
+};
+
 static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
 {
 	return true;
@@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
 	.io_out			= vesa_pci_io_out,
 };
 
-static struct pci_device_header vesa_pci_device = {
-	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
-	.header_type		= PCI_HEADER_TYPE_NORMAL,
-	.revision_id		= 0,
-	.class[2]		= 0x03,
-	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
-	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
-	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
-	.bar_size[1]		= VESA_MEM_SIZE,
-};
-
-static struct device_header vesa_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &vesa_pci_device,
-};
+static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      int bar_num, void *data)
+{
+	/* We don't support remapping of the framebuffer. */
+	return 0;
+}
 
-static struct framebuffer vesafb;
+static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
+				int bar_num, void *data)
+{
+	/* We don't support remapping of the framebuffer. */
+	return -EINVAL;
+}
 
 struct framebuffer *vesa__init(struct kvm *kvm)
 {
@@ -73,6 +93,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
 	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
+	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
+				      vesa__bar_deactivate, NULL);
+	if (r < 0)
+		goto unregister_ioport;
+
 	r = device__register(&vesa_device);
 	if (r < 0)
 		goto unregister_ioport;
@@ -87,15 +112,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	if (r < 0)
 		goto unmap_dev;
 
-	vesafb = (struct framebuffer) {
-		.width			= VESA_WIDTH,
-		.height			= VESA_HEIGHT,
-		.depth			= VESA_BPP,
-		.mem			= mem,
-		.mem_addr		= VESA_MEM_ADDR,
-		.mem_size		= VESA_MEM_SIZE,
-		.kvm			= kvm,
-	};
+	vesafb.mem = mem;
+	vesafb.kvm = kvm;
 	return fb__register(&vesafb);
 
 unmap_dev:
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index adb4b5c082d5..1d7d4c0cea5a 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -89,12 +89,19 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_device_header;
+
+typedef int (*bar_activate_fn_t)(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data);
+typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data);
+
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
 #define PCI_DEV_CFG_SIZE	256
 #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
-struct pci_device_header;
-
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
 		      u8 offset, void *data, int sz);
@@ -136,6 +143,9 @@ struct pci_device_header {
 
 	/* Private to lkvm */
 	u32		bar_size[6];
+	bar_activate_fn_t	bar_activate_fn;
+	bar_deactivate_fn_t	bar_deactivate_fn;
+	void *data;
 	struct pci_config_operations	cfg_ops;
 	/*
 	 * PCI INTx# are level-triggered, but virtual device often feature
@@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
+
 static inline bool __pci__memory_space_enabled(u16 command)
 {
 	return command & PCI_COMMAND_MEMORY;
diff --git a/pci.c b/pci.c
index 611e2c0bf1da..4ace190898f2 100644
--- a/pci.c
+++ b/pci.c
@@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
 		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
 }
 
+static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci__bar_size(pci_hdr, bar_num);
+}
+
 static void *pci_config_address_ptr(u16 port)
 {
 	unsigned long offset;
@@ -273,6 +278,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
 	return hdr->data;
 }
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
+{
+	int i, r;
+	bool has_bar_regions = false;
+
+	assert(bar_activate_fn && bar_deactivate_fn);
+
+	pci_hdr->bar_activate_fn = bar_activate_fn;
+	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
+	pci_hdr->data = data;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		has_bar_regions = true;
+
+		if (pci__bar_is_io(pci_hdr, i) &&
+		    pci__io_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+
+		if (pci__bar_is_memory(pci_hdr, i) &&
+		    pci__memory_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+	}
+
+	assert(has_bar_regions);
+
+	return 0;
+}
+
 int pci__init(struct kvm *kvm)
 {
 	int r;
diff --git a/vfio/pci.c b/vfio/pci.c
index 8b2a0c8dbac3..18e22a8c5320 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -8,6 +8,8 @@
 #include <sys/resource.h>
 #include <sys/time.h>
 
+#include <assert.h>
+
 /* Wrapper around UAPI vfio_irq_set */
 union vfio_irq_eventfd {
 	struct vfio_irq_set	irq;
@@ -446,6 +448,81 @@ out_unlock:
 	mutex_unlock(&pdev->msi.mutex);
 }
 
+static int vfio_pci_bar_activate(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region;
+	bool has_msix;
+	int ret;
+
+	assert((u32)bar_num < vdev->info.num_regions);
+
+	region = &vdev->regions[bar_num];
+	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
+
+	if (has_msix && (u32)bar_num == table->bar) {
+		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
+					 table->size, false,
+					 vfio_pci_msix_table_access, pdev);
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if (has_msix && (u32)bar_num == pba->bar) {
+		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
+					 pba->size, false,
+					 vfio_pci_msix_pba_access, pdev);
+		goto out;
+	}
+
+	ret = vfio_map_region(kvm, vdev, region);
+out:
+	return ret;
+}
+
+static int vfio_pci_bar_deactivate(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region;
+	bool has_msix, success;
+	int ret;
+
+	assert((u32)bar_num < vdev->info.num_regions);
+
+	region = &vdev->regions[bar_num];
+	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
+
+	if (has_msix && (u32)bar_num == table->bar) {
+		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
+		/* kvm__deregister_mmio fails when the region is not found. */
+		ret = (success ? 0 : -ENOENT);
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if (has_msix && (u32)bar_num == pba->bar) {
+		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
+		ret = (success ? 0 : -ENOENT);
+		goto out;
+	}
+
+	vfio_unmap_region(kvm, region);
+	ret = 0;
+
+out:
+	return ret;
+}
+
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			      u8 offset, void *data, int sz)
 {
@@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 		ret = -ENOMEM;
 		goto out_free;
 	}
-	pba->guest_phys_addr = table->guest_phys_addr + table->size;
-
-	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
-				 false, vfio_pci_msix_table_access, pdev);
-	if (ret < 0)
-		goto out_free;
 
 	/*
 	 * We could map the physical PBA directly into the guest, but it's
@@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 	 * between MSI-X table and PBA. For the sake of isolation, create a
 	 * virtual PBA.
 	 */
-	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
-				 vfio_pci_msix_pba_access, pdev);
-	if (ret < 0)
-		goto out_free;
+	pba->guest_phys_addr = table->guest_phys_addr + table->size;
 
 	pdev->msix.entries = entries;
 	pdev->msix.nr_entries = nr_entries;
@@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
-	/* Map the BARs into the guest or setup a trap region. */
-	ret = vfio_map_region(kvm, vdev, region);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 	}
 
 	/* We've configured the BARs, fake up a Configuration Space */
-	return vfio_pci_fixup_cfg_space(vdev);
+	ret = vfio_pci_fixup_cfg_space(vdev);
+	if (ret)
+		return ret;
+
+	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
+					 vfio_pci_bar_deactivate, vdev);
 }
 
 /*
diff --git a/virtio/pci.c b/virtio/pci.c
index d111dc499f5e..598da699c241 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -11,6 +11,7 @@
 #include <sys/ioctl.h>
 #include <linux/virtio_pci.h>
 #include <linux/byteorder.h>
+#include <assert.h>
 #include <string.h>
 
 static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
@@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
+static int virtio_pci__bar_activate(struct kvm *kvm,
+				    struct pci_device_header *pci_hdr,
+				    int bar_num, void *data)
+{
+	struct virtio_device *vdev = data;
+	u32 bar_addr, bar_size;
+	int r = -EINVAL;
+
+	assert(bar_num <= 2);
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	bar_size = pci__bar_size(pci_hdr, bar_num);
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
+				     bar_size, vdev);
+		if (r > 0)
+			r = 0;
+		break;
+	case 1:
+		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__io_mmio_callback, vdev);
+		break;
+	case 2:
+		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__msix_mmio_callback, vdev);
+	}
+
+	return r;
+}
+
+static int virtio_pci__bar_deactivate(struct kvm *kvm,
+				      struct pci_device_header *pci_hdr,
+				      int bar_num, void *data)
+{
+	u32 bar_addr;
+	bool success;
+	int r = -EINVAL;
+
+	assert(bar_num <= 2);
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__unregister(kvm, bar_addr);
+		break;
+	case 1:
+	case 2:
+		success = kvm__deregister_mmio(kvm, bar_addr);
+		/* kvm__deregister_mmio fails when the region is not found. */
+		r = (success ? 0 : -ENOENT);
+	}
+
+	return r;
+}
+
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
@@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
 	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
-			     vdev);
-	if (r < 0)
-		return r;
-	port_addr = (u16)r;
-
 	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vdev);
-	if (r < 0)
-		goto free_ioport;
-
 	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vdev);
-	if (r < 0)
-		goto free_mmio;
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
+	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
+				      virtio_pci__bar_activate,
+				      virtio_pci__bar_deactivate, vdev);
+	if (r < 0)
+		return r;
+
 	vpci->dev_hdr = (struct device_header) {
 		.bus_type		= DEVICE_BUS_PCI,
 		.data			= &vpci->pci_hdr,
@@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	r = device__register(&vpci->dev_hdr);
 	if (r < 0)
-		goto free_msix_mmio;
+		return r;
 
 	/* save the IRQ that device__register() has allocated */
 	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
 
 	return 0;
-
-free_msix_mmio:
-	kvm__deregister_mmio(kvm, msix_io_block);
-free_mmio:
-	kvm__deregister_mmio(kvm, mmio_addr);
-free_ioport:
-	ioport__unregister(kvm, port_addr);
-	return r;
 }
 
 int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (26 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-06 14:03   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs Alexandru Elisei
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

During configuration of the BAR addresses, a Linux guest disables and
enables access to I/O and memory space. When access is disabled, we don't
stop emulating the memory regions described by the BARs. Now that we have
callbacks for activating and deactivating emulation of a BAR region,
let's use them to stop emulation when access is disabled, and
re-activate it when access is re-enabled.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/pci.c b/pci.c
index 4ace190898f2..c2860e6707fe 100644
--- a/pci.c
+++ b/pci.c
@@ -163,6 +163,42 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static void pci_config_command_wr(struct kvm *kvm,
+				  struct pci_device_header *pci_hdr,
+				  u16 new_command)
+{
+	int i;
+	bool toggle_io, toggle_mem;
+
+	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
+	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
+			if (__pci__io_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+
+		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
+			if (__pci__memory_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+	}
+
+	pci_hdr->command = new_command;
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -188,6 +224,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	if (*(u32 *)(base + offset) == 0)
 		return;
 
+	if (offset == PCI_COMMAND) {
+		memcpy(&value, data, size);
+		pci_config_command_wr(kvm, pci_hdr, (u16)value);
+		return;
+	}
+
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-- 
2.20.1



* [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (27 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-06 14:05   ` André Przywara
  2020-03-26 15:24 ` [PATCH v3 kvmtool 30/32] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
                   ` (3 subsequent siblings)
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

BARs are used by the guest to configure the access to the PCI device by
writing the address to which the device will respond. The basic idea for
adding support for reassignable BARs is straightforward: deactivate
emulation for the memory region described by the old BAR value, and
activate emulation for the new region.

BAR reassignment can be done while device access is enabled and memory
regions for different devices can overlap as long as no access is made
to the overlapping memory regions. This means that it is legal for the
BARs of two distinct devices to point to an overlapping memory region,
and indeed, this is how Linux does resource assignment at boot. To
account for this situation, the simple algorithm described above is
enhanced to scan for all devices and:

- Deactivate emulation for any BARs that might overlap with the new BAR
  value.

- Enable emulation for any BARs that were overlapping with the old value
  after the BAR has been updated.

Activating/deactivating emulation of a memory region has side effects.
To prevent the same callback from being executed twice, we now keep
track of each region's emulation state. Double execution can happen, for
example, if we program a BAR with an address that overlaps a second BAR,
thus deactivating emulation for the second BAR, and then disable all
region accesses to the second BAR by writing to the command register.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h |  14 ++-
 pci.c             | 273 +++++++++++++++++++++++++++++++++++++---------
 vfio/pci.c        |  12 ++
 3 files changed, 244 insertions(+), 55 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 1d7d4c0cea5a..be75f77fd2cb 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -11,6 +11,17 @@
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
 
+#define pci_dev_err(pci_hdr, fmt, ...) \
+	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_warn(pci_hdr, fmt, ...) \
+	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_info(pci_hdr, fmt, ...) \
+	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_dbg(pci_hdr, fmt, ...) \
+	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_die(pci_hdr, fmt, ...) \
+	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
@@ -142,7 +153,8 @@ struct pci_device_header {
 	};
 
 	/* Private to lkvm */
-	u32		bar_size[6];
+	u32			bar_size[6];
+	bool			bar_active[6];
 	bar_activate_fn_t	bar_activate_fn;
 	bar_deactivate_fn_t	bar_deactivate_fn;
 	void *data;
diff --git a/pci.c b/pci.c
index c2860e6707fe..68ece65441a6 100644
--- a/pci.c
+++ b/pci.c
@@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
 	return pci__bar_size(pci_hdr, bar_num);
 }
 
+static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci_hdr->bar_active[bar_num];
+}
+
 static void *pci_config_address_ptr(u16 port)
 {
 	unsigned long offset;
@@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			    int bar_num)
+{
+	int r = 0;
+
+	if (pci_bar_is_active(pci_hdr, bar_num))
+		goto out;
+
+	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
+			    bar_num);
+		goto out;
+	}
+	pci_hdr->bar_active[bar_num] = true;
+
+out:
+	return r;
+}
+
+static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      int bar_num)
+{
+	int r = 0;
+
+	if (!pci_bar_is_active(pci_hdr, bar_num))
+		goto out;
+
+	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
+			    bar_num);
+		goto out;
+	}
+	pci_hdr->bar_active[bar_num] = false;
+
+out:
+	return r;
+}
+
 static void pci_config_command_wr(struct kvm *kvm,
 				  struct pci_device_header *pci_hdr,
 				  u16 new_command)
@@ -179,26 +224,179 @@ static void pci_config_command_wr(struct kvm *kvm,
 
 		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
 			if (__pci__io_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
 			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 
 		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
 			if (__pci__memory_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
 			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 	}
 
 	pci_hdr->command = new_command;
 }
 
+static int pci_deactivate_bar_regions(struct kvm *kvm,
+				      struct pci_device_header *pci_hdr,
+				      u32 start, u32 size)
+{
+	struct device_header *dev_hdr;
+	struct pci_device_header *tmp_hdr;
+	u32 tmp_addr, tmp_size;
+	int i, r;
+
+	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+	while (dev_hdr) {
+		tmp_hdr = dev_hdr->data;
+		for (i = 0; i < 6; i++) {
+			if (!pci_bar_is_implemented(tmp_hdr, i))
+				continue;
+
+			tmp_addr = pci__bar_address(tmp_hdr, i);
+			tmp_size = pci__bar_size(tmp_hdr, i);
+
+			if (tmp_addr + tmp_size <= start ||
+			    tmp_addr >= start + size)
+				continue;
+
+			r = pci_deactivate_bar(kvm, tmp_hdr, i);
+			if (r < 0)
+				return r;
+		}
+		dev_hdr = device__next_dev(dev_hdr);
+	}
+
+	return 0;
+}
+
+static int pci_activate_bar_regions(struct kvm *kvm,
+				    struct pci_device_header *pci_hdr,
+				    u32 start, u32 size)
+{
+	struct device_header *dev_hdr;
+	struct pci_device_header *tmp_hdr;
+	u32 tmp_addr, tmp_size;
+	int i, r;
+
+	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+	while (dev_hdr) {
+		tmp_hdr = dev_hdr->data;
+		for (i = 0; i < 6; i++) {
+			if (!pci_bar_is_implemented(tmp_hdr, i))
+				continue;
+
+			tmp_addr = pci__bar_address(tmp_hdr, i);
+			tmp_size = pci__bar_size(tmp_hdr, i);
+
+			if (tmp_addr + tmp_size <= start ||
+			    tmp_addr >= start + size)
+				continue;
+
+			r = pci_activate_bar(kvm, tmp_hdr, i);
+			if (r < 0)
+				return r;
+		}
+		dev_hdr = device__next_dev(dev_hdr);
+	}
+
+	return 0;
+}
+
+static void pci_config_bar_wr(struct kvm *kvm,
+			      struct pci_device_header *pci_hdr, int bar_num,
+			      u32 value)
+{
+	u32 old_addr, new_addr, bar_size;
+	u32 mask;
+	int r;
+
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
+	else
+		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+
+	/*
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
+	 *
+	 * According to the PCI local bus specification REV 3.0: The number of
+	 * upper bits that a device actually implements depends on how much of
+	 * the address space the device will respond to. A device that wants a 1
+	 * MB memory address space (using a 32-bit base address register) would
+	 * build the top 12 bits of the address register, hardwiring the other
+	 * bits to 0.
+	 *
+	 * Furthermore, software can determine how much address space the device
+	 * requires by writing a value of all 1's to the register and then
+	 * reading the value back. The device will return 0's in all don't-care
+	 * address bits, effectively specifying the address space required.
+	 *
+	 * Software computes the size of the address space with the formula
+	 * S =  ~B + 1, where S is the memory size and B is the value read from
+	 * the BAR. This means that the BAR value that kvmtool should return is
+	 * B = ~(S - 1).
+	 */
+	if (value == 0xffffffff) {
+		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
+		/* Preserve the special bits. */
+		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+
+	/* Don't toggle emulation when region type access is disabled. */
+	if (pci__bar_is_io(pci_hdr, bar_num) &&
+	    !pci__io_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	if (pci__bar_is_memory(pci_hdr, bar_num) &&
+	    !pci__memory_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	old_addr = pci__bar_address(pci_hdr, bar_num);
+	new_addr = __pci__bar_address(value);
+	bar_size = pci__bar_size(pci_hdr, bar_num);
+
+	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0)
+		return;
+
+	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
+	if (r < 0) {
+		/*
+		 * We cannot update the BAR because of an overlapping region
+		 * that failed to deactivate emulation, so keep the old BAR
+		 * value and re-activate emulation for it.
+		 */
+		pci_activate_bar(kvm, pci_hdr, bar_num);
+		return;
+	}
+
+	pci_hdr->bar[bar_num] = value;
+	r = pci_activate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0) {
+		/*
+		 * New region cannot be emulated, re-enable the regions that
+		 * were overlapping.
+		 */
+		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
+		return;
+	}
+
+	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -206,7 +404,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
-	u32 mask;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -231,46 +428,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	}
 
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
-
-	/*
-	 * If the kernel masks the BAR, it will expect to find the size of the
-	 * BAR there next time it reads from it. After the kernel reads the
-	 * size, it will write the address back.
-	 */
 	if (bar < 6) {
-		if (pci__bar_is_io(pci_hdr, bar))
-			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
-		else
-			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
-		/*
-		 * According to the PCI local bus specification REV 3.0:
-		 * The number of upper bits that a device actually implements
-		 * depends on how much of the address space the device will
-		 * respond to. A device that wants a 1 MB memory address space
-		 * (using a 32-bit base address register) would build the top
-		 * 12 bits of the address register, hardwiring the other bits
-		 * to 0.
-		 *
-		 * Furthermore, software can determine how much address space
-		 * the device requires by writing a value of all 1's to the
-		 * register and then reading the value back. The device will
-		 * return 0's in all don't-care address bits, effectively
-		 * specifying the address space required.
-		 *
-		 * Software computes the size of the address space with the
-		 * formula S = ~B + 1, where S is the memory size and B is the
-		 * value read from the BAR. This means that the BAR value that
-		 * kvmtool should return is B = ~(S - 1).
-		 */
 		memcpy(&value, data, size);
-		if (value == 0xffffffff)
-			value = ~(pci__bar_size(pci_hdr, bar) - 1);
-		/* Preserve the special bits. */
-		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
-		memcpy(base + offset, &value, size);
-	} else {
-		memcpy(base + offset, data, size);
+		pci_config_bar_wr(kvm, pci_hdr, bar, value);
+		return;
 	}
+
+	memcpy(base + offset, data, size);
 }
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
@@ -338,20 +502,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
 			continue;
 
 		has_bar_regions = true;
+		assert(!pci_bar_is_active(pci_hdr, i));
 
 		if (pci__bar_is_io(pci_hdr, i) &&
 		    pci__io_space_enabled(pci_hdr)) {
-				r = bar_activate_fn(kvm, pci_hdr, i, data);
-				if (r < 0)
-					return r;
-			}
+			r = pci_activate_bar(kvm, pci_hdr, i);
+			if (r < 0)
+				return r;
+		}
 
 		if (pci__bar_is_memory(pci_hdr, i) &&
 		    pci__memory_space_enabled(pci_hdr)) {
-				r = bar_activate_fn(kvm, pci_hdr, i, data);
-				if (r < 0)
-					return r;
-			}
+			r = pci_activate_bar(kvm, pci_hdr, i);
+			if (r < 0)
+				return r;
+		}
 	}
 
 	assert(has_bar_regions);
diff --git a/vfio/pci.c b/vfio/pci.c
index 18e22a8c5320..2b891496547d 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -457,6 +457,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
 	struct vfio_pci_msix_table *table = &pdev->msix_table;
 	struct vfio_region *region;
+	u32 bar_addr;
 	bool has_msix;
 	int ret;
 
@@ -465,7 +466,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	region = &vdev->regions[bar_num];
 	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
 
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		region->port_base = bar_addr;
+	else
+		region->guest_phys_addr = bar_addr;
+
 	if (has_msix && (u32)bar_num == table->bar) {
+		table->guest_phys_addr = region->guest_phys_addr;
 		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
 					 table->size, false,
 					 vfio_pci_msix_table_access, pdev);
@@ -474,6 +482,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	}
 
 	if (has_msix && (u32)bar_num == pba->bar) {
+		if (pba->bar == table->bar)
+			pba->guest_phys_addr = table->guest_phys_addr + table->size;
+		else
+			pba->guest_phys_addr = region->guest_phys_addr;
 		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
 					 pba->size, false,
 					 vfio_pci_msix_pba_access, pdev);
-- 
2.20.1



* [PATCH v3 kvmtool 30/32] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (28 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 31/32] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

PCI now supports configurable BARs. Get rid of the no-longer-needed,
Linux-only fdt property.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index c80e6da323b6..02091e9e0bee 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
 
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
-	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
 
 	/* Pass on our amended command line to a Linux kernel only. */
 	if (kvm->cfg.firmware_filename) {
-- 
2.20.1



* [PATCH v3 kvmtool 31/32] vfio: Trap MMIO access to BAR addresses which aren't page aligned
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (29 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 30/32] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-03-26 15:24 ` [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
  2020-04-06 14:14 ` [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE " André Przywara
  32 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

KVM_SET_USER_MEMORY_REGION will fail if the guest physical address is
not aligned to the page size. However, it is legal for a guest to
program an address which isn't aligned to the page size. Trap and
emulate MMIO accesses to the region when that happens.

Without this patch, when assigning a Seagate Barracuda hard drive to a
VM, I was seeing these errors:

[    0.286029] pci 0000:00:00.0: BAR 0: assigned [mem 0x41004600-0x4100467f]
  Error: 0000:01:00.0: failed to register region with KVM
  Error: [1095:3132] Error activating emulation for BAR 0
[..]
[   10.561794] irq 13: nobody cared (try booting with the "irqpoll" option)
[   10.563122] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-seattle-00009-g909b20467ed1 #133
[   10.563124] Hardware name: linux,dummy-virt (DT)
[   10.563126] Call trace:
[   10.563134]  dump_backtrace+0x0/0x140
[   10.563137]  show_stack+0x14/0x20
[   10.563141]  dump_stack+0xbc/0x100
[   10.563146]  __report_bad_irq+0x48/0xd4
[   10.563148]  note_interrupt+0x288/0x378
[   10.563151]  handle_irq_event_percpu+0x80/0x88
[   10.563153]  handle_irq_event+0x44/0xc8
[   10.563155]  handle_fasteoi_irq+0xb4/0x160
[   10.563157]  generic_handle_irq+0x24/0x38
[   10.563159]  __handle_domain_irq+0x60/0xb8
[   10.563162]  gic_handle_irq+0x50/0xa0
[   10.563164]  el1_irq+0xb8/0x180
[   10.563166]  arch_cpu_idle+0x10/0x18
[   10.563170]  do_idle+0x204/0x290
[   10.563172]  cpu_startup_entry+0x20/0x40
[   10.563175]  rest_init+0xd4/0xe0
[   10.563180]  arch_call_rest_init+0xc/0x14
[   10.563182]  start_kernel+0x420/0x44c
[   10.563183] handlers:
[   10.563650] [<000000001e474803>] sil24_interrupt
[   10.564559] Disabling IRQ #13
[..]
[   11.832916] ata1: spurious interrupt (slot_stat 0x0 active_tag -84148995 sactive 0x0)
[   12.045444] ata_ratelimit: 1 callbacks suppressed

With this patch, I don't see the errors and the device works as
expected.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/vfio/core.c b/vfio/core.c
index bad3c7c8cd26..0b45e78b40b4 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -226,6 +226,15 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 	if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP))
 		return vfio_setup_trap_region(kvm, vdev, region);
 
+	/*
+	 * KVM_SET_USER_MEMORY_REGION will fail because the guest physical
+	 * address isn't page aligned, let's emulate the region ourselves.
+	 */
+	if (region->guest_phys_addr & (PAGE_SIZE - 1))
+		return kvm__register_mmio(kvm, region->guest_phys_addr,
+					  region->info.size, false,
+					  vfio_mmio_access, region);
+
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_READ)
 		prot |= PROT_READ;
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
-- 
2.20.1



* [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (30 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 31/32] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
@ 2020-03-26 15:24 ` Alexandru Elisei
  2020-04-06 14:06   ` André Przywara
  2020-04-06 14:14 ` [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE " André Przywara
  32 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-03-26 15:24 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

PCI Express comes with an extended addressing scheme, which directly
translates into a bigger device configuration space (256->4096 bytes)
and a bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management [1] and the PCI Express capability [2]).

However, our virtio PCI implementation implements version 0.9 of the
protocol and still uses transitional PCI device IDs, so we have opted
to omit the mandatory PCI Express capabilities. For VFIO, the power
management and PCI Express capabilities are left for a subsequent patch.

[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/kvm-arch.h |  4 +-
 arm/pci.c                         |  2 +-
 builtin-run.c                     |  1 +
 include/kvm/kvm-config.h          |  2 +-
 include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
 pci.c                             |  5 +-
 vfio/pci.c                        | 26 +++++++----
 7 files changed, 96 insertions(+), 20 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b9d486d5eac2..13c55fa3dc29 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -23,7 +23,7 @@
 
 #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
 #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
-#define ARM_PCI_CFG_SIZE	(1ULL << 24)
+#define ARM_PCI_CFG_SIZE	(1ULL << 28)
 #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
 				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
 
@@ -50,6 +50,8 @@
 
 #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
 
+#define ARCH_HAS_PCI_EXP	1
+
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
 	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
diff --git a/arm/pci.c b/arm/pci.c
index ed325fa4a811..2251f627d8b5 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
-	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
+	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
 	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
 
 	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
diff --git a/builtin-run.c b/builtin-run.c
index 9cb8c75300eb..def8a1f803ad 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -27,6 +27,7 @@
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include "kvm/vfio.h"
 #include "kvm/rtc.h"
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index a052b0bc7582..a1012c57b7a7 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -2,7 +2,6 @@
 #define KVM_CONFIG_H_
 
 #include "kvm/disk-image.h"
-#include "kvm/vfio.h"
 #include "kvm/kvm-config-arch.h"
 
 #define DEFAULT_KVM_DEV		"/dev/kvm"
@@ -18,6 +17,7 @@
 #define MIN_RAM_SIZE_MB		(64ULL)
 #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
 
+struct vfio_device_params;
 struct kvm_config {
 	struct kvm_config_arch arch;
 	struct disk_image_params disk_image[MAX_DISK_IMAGES];
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index be75f77fd2cb..71ee9d8cb01f 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -10,6 +10,7 @@
 #include "kvm/devices.h"
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
+#include "kvm.h"
 
 #define pci_dev_err(pci_hdr, fmt, ...) \
 	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
@@ -32,9 +33,41 @@
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
 #define PCI_IOPORT_START	0x6200
-#define PCI_CFG_SIZE		(1ULL << 24)
 
-struct kvm;
+#define PCIE_CAP_REG_VER	0x1
+#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
+#define PM_CAP_VER		0x3
+
+#ifdef ARCH_HAS_PCI_EXP
+#define PCI_CFG_SIZE		(1ULL << 28)
+#define PCI_DEV_CFG_SIZE	4096
+
+union pci_config_address {
+	struct {
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	enable_bit	: 1;		/* 31       */
+#else
+		unsigned	enable_bit	: 1;		/* 31       */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+#endif
+	};
+	u32 w;
+};
+
+#else
+#define PCI_CFG_SIZE		(1ULL << 24)
+#define PCI_DEV_CFG_SIZE	256
 
 union pci_config_address {
 	struct {
@@ -58,6 +91,8 @@ union pci_config_address {
 	};
 	u32 w;
 };
+#endif
+#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct msix_table {
 	struct msi_msg msg;
@@ -100,6 +135,33 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pcie_cap {
+	u8 cap;
+	u8 next;
+	u16 cap_reg;
+	u32 dev_cap;
+	u16 dev_ctrl;
+	u16 dev_status;
+	u32 link_cap;
+	u16 link_ctrl;
+	u16 link_status;
+	u32 slot_cap;
+	u16 slot_ctrl;
+	u16 slot_status;
+	u16 root_ctrl;
+	u16 root_cap;
+	u32 root_status;
+};
+
+struct pm_cap {
+	u8 cap;
+	u8 next;
+	u16 pmc;
+	u16 pmcsr;
+	u8 pmcsr_bse;
+	u8 data;
+};
+
 struct pci_device_header;
 
 typedef int (*bar_activate_fn_t)(struct kvm *kvm,
@@ -110,14 +172,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
 				   int bar_num, void *data);
 
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
-#define PCI_DEV_CFG_SIZE	256
-#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		      u8 offset, void *data, int sz);
+		      u16 offset, void *data, int sz);
 	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		     u8 offset, void *data, int sz);
+		     u16 offset, void *data, int sz);
 };
 
 struct pci_device_header {
@@ -147,6 +207,10 @@ struct pci_device_header {
 			u8		min_gnt;
 			u8		max_lat;
 			struct msix_cap msix;
+#ifdef ARCH_HAS_PCI_EXP
+			struct pm_cap pm;
+			struct pcie_cap pcie;
+#endif
 		} __attribute__((packed));
 		/* Pad to PCI config space size */
 		u8	__pad[PCI_DEV_CFG_SIZE];
diff --git a/pci.c b/pci.c
index 68ece65441a6..b471209a6efc 100644
--- a/pci.c
+++ b/pci.c
@@ -400,7 +400,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
-	u8 bar, offset;
+	u8 bar;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
@@ -439,7 +440,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 offset;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 2b891496547d..6b8726227ea0 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -311,7 +311,7 @@ out_unlock:
 }
 
 static void vfio_pci_msix_cap_write(struct kvm *kvm,
-				    struct vfio_device *vdev, u8 off,
+				    struct vfio_device *vdev, u16 off,
 				    void *data, int sz)
 {
 	struct vfio_pci_device *pdev = &vdev->pci;
@@ -343,7 +343,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
 }
 
 static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
-				     u8 off, u8 *data, u32 sz)
+				     u16 off, u8 *data, u32 sz)
 {
 	size_t i;
 	u32 mask = 0;
@@ -391,7 +391,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
 }
 
 static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
-				   u8 off, u8 *data, u32 sz)
+				   u16 off, u8 *data, u32 sz)
 {
 	u8 ctrl;
 	struct msi_msg msg;
@@ -536,7 +536,7 @@ out:
 }
 
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			      u8 offset, void *data, int sz)
+			      u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -554,7 +554,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
 }
 
 static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			       u8 offset, void *data, int sz)
+			       u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -638,15 +638,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 {
 	int ret;
 	size_t size;
-	u8 pos, next;
+	u16 pos, next;
 	struct pci_cap_hdr *cap;
-	u8 virt_hdr[PCI_DEV_CFG_SIZE];
+	u8 *virt_hdr;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
 	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
 		return 0;
 
-	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
+	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
+	if (!virt_hdr)
+		return -errno;
 
 	pos = pdev->hdr.capabilities & ~3;
 
@@ -682,6 +684,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
 	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
 
+	free(virt_hdr);
+
 	return 0;
 }
 
@@ -792,7 +796,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 
 	/* Install our fake Configuration Space */
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-	hdr_sz = PCI_DEV_CFG_SIZE;
+	/*
+	 * We don't touch the extended configuration space, let's be cautious
+	 * and not overwrite it all with zeros, or bad things might happen.
+	 */
+	hdr_sz = PCI_CFG_SPACE_SIZE;
 	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
 		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
 			     hdr_sz);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size
  2020-03-26 15:24 ` [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size Alexandru Elisei
@ 2020-03-30  9:27   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-03-30  9:27 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	Julien Thierry

On 26/03/2020 15:24, Alexandru Elisei wrote:
> From: Julien Thierry <julien.thierry@arm.com>

Hi,

> The PCI Local Bus Specification, Rev. 3.0,
> Section 6.2.5.1. "Address Maps" states:
> "Devices that map control functions into I/O Space must not consume more
> than 256 bytes per I/O Base Address register."
> 
> Yet all the PCI devices allocate IO ports of IOPORT_SIZE (= 1024 bytes).
> 
> Fix this by having PCI devices use 256-byte regions for IO BARs.
> 
> There is no hard requirement on the size of the memory region described
> by memory BARs. Since BAR 1 is supposed to offer the same functionality as
> IO ports, let's make its size match BAR 0.
> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> [Added rationale for changing BAR1 size to PCI_IO_SIZE]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c            |  4 ++--
>  include/kvm/ioport.h |  1 -
>  pci.c                |  2 +-
>  virtio/pci.c         | 15 +++++++--------
>  4 files changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index 24fb46faad3b..d8d91aa9c873 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
> -	r = pci_get_io_port_block(IOPORT_SIZE);
> -	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
> +	r = pci_get_io_port_block(PCI_IO_SIZE);
> +	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
>  	if (r < 0)
>  		return ERR_PTR(r);
>  
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index b10fcd5b4412..8c86b7151f25 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -14,7 +14,6 @@
>  
>  /* some ports we reserve for own use */
>  #define IOPORT_DBG			0xe0
> -#define IOPORT_SIZE			0x400
>  
>  struct kvm;
>  
> diff --git a/pci.c b/pci.c
> index 80b5c5d3d7f3..b6892d974c08 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -20,7 +20,7 @@ static u16 io_port_blocks		= PCI_IOPORT_START;
>  
>  u16 pci_get_io_port_block(u32 size)
>  {
> -	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
> +	u16 port = ALIGN(io_port_blocks, PCI_IO_SIZE);
>  
>  	io_port_blocks = port + size;
>  	return port;
> diff --git a/virtio/pci.c b/virtio/pci.c
> index d73414abde05..eeb5b5efa6e1 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -421,7 +421,7 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  {
>  	struct virtio_pci *vpci = ptr;
>  	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
> -	u16 port = vpci->port_addr + (addr & (IOPORT_SIZE - 1));
> +	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
>  
>  	kvm__emulate_io(vcpu, port, data, direction, len, 1);
>  }
> @@ -435,17 +435,16 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	vpci->kvm = kvm;
>  	vpci->dev = dev;
>  
> -	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
> -	r = pci_get_io_port_block(IOPORT_SIZE);
> -	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
> +	r = pci_get_io_port_block(PCI_IO_SIZE);
> +	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
>  	if (r < 0)
>  		return r;
>  	vpci->port_addr = (u16)r;
>  
> -	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
> -	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
> +	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> +	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
>  			       virtio_pci__io_mmio_callback, vpci);
>  	if (r < 0)
>  		goto free_ioport;
> @@ -475,8 +474,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  							| PCI_BASE_ADDRESS_SPACE_MEMORY),
>  		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
>  		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
> -		.bar_size[0]		= cpu_to_le32(IOPORT_SIZE),
> -		.bar_size[1]		= cpu_to_le32(IOPORT_SIZE),
> +		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
> +		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>  	};
>  
> 



* Re: [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes
  2020-03-26 15:24 ` [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
@ 2020-03-30  9:29   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-03-30  9:29 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> To get the size of the expansion ROM, software writes 0xfffff800 to the
> expansion ROM BAR in the PCI configuration space. PCI emulation executes
> the optional configuration space write callback that a device can implement
> before emulating this write.
> 
> kvmtool's implementation of VFIO doesn't have support for emulating
> expansion ROMs. However, the callback writes the guest value to the
> hardware BAR, and then it reads it back to the emulated BAR to make sure
> the write has completed successfully.
> 
> After this, we return to regular PCI emulation and because the BAR is no
> longer 0, we write back to the BAR the value that the guest used to get the
> size. As a result, the guest will think that the ROM size is 0x800 after
> the subsequent read and we end up unintentionally exposing to the guest a
> BAR which we don't emulate.
> 
> Let's fix this by ignoring writes to the expansion ROM BAR.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks,
Andre

> ---
>  vfio/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 1bdc20038411..1f38f90c3ae9 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -472,6 +472,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_device *vdev;
>  	void *base = pci_hdr;
>  
> +	if (offset == PCI_ROM_ADDRESS)
> +		return;
> +
>  	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
>  	vdev = container_of(pdev, struct vfio_device, pci);
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> 



* Re: [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions
  2020-03-26 15:24 ` [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions Alexandru Elisei
@ 2020-03-30  9:29   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-03-30  9:29 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:

Hi,

> Don't try to configure a BAR if there is no region associated with it.
> 
> Also move the variable declarations from inside the loop to the start of
> the function for consistency.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  vfio/pci.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 1f38f90c3ae9..4412c6d7a862 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -645,16 +645,19 @@ static int vfio_pci_parse_cfg_space(struct vfio_device *vdev)
>  static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  {
>  	int i;
> +	u64 base;
>  	ssize_t hdr_sz;
>  	struct msix_cap *msix;
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_region *region;
>  
>  	/* Initialise the BARs */
>  	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
> -		u64 base;
> -		struct vfio_region *region = &vdev->regions[i];
> +		if ((u32)i == vdev->info.num_regions)
> +			break;
>  
> +		region = &vdev->regions[i];
>  		/* Construct a fake reg to match what we've mapped. */
>  		if (region->is_ioport) {
>  			base = (region->port_base & PCI_BASE_ADDRESS_IO_MASK) |
> @@ -853,11 +856,12 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  	u32 bar;
>  	size_t map_size;
>  	struct vfio_pci_device *pdev = &vdev->pci;
> -	struct vfio_region *region = &vdev->regions[nr];
> +	struct vfio_region *region;
>  
>  	if (nr >= vdev->info.num_regions)
>  		return 0;
>  
> +	region = &vdev->regions[nr];
>  	bar = pdev->hdr.bar[nr];
>  
>  	region->vdev = vdev;
> 



* Re: [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures
  2020-03-26 15:24 ` [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures Alexandru Elisei
@ 2020-03-30  9:30   ` André Przywara
  2020-04-14 10:27     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-03-30  9:30 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 26/03/2020 15:24, Alexandru Elisei wrote:
> Don't ignore an error in the bus-specific initialization function in
> virtio_init; don't ignore the result of virtio_init; and don't return 0
> in virtio_blk__init and virtio_scsi__init when we encounter an error.
> Hopefully this will save some developers' time debugging faulty virtio
> devices in a guest.
> 
> To take advantage of the cleanup function virtio_blk__exit, we have
> moved appending the new device to the list before the call to
> virtio_init.
> 
> To safeguard against this in the future, virtio_init has been annotated
> with the compiler attribute warn_unused_result.

This is a good idea, but it actually triggers an unrelated, long-standing
bug in vesa.c (on x86):
hw/vesa.c: In function ‘vesa__init’:
hw/vesa.c:77:3: error: ignoring return value of ‘ERR_PTR’, declared with
attribute warn_unused_result [-Werror=unused-result]
   ERR_PTR(-errno);
   ^
cc1: all warnings being treated as errors

So could you add the missing "return" statement in that line, to fix
that bug?
I see that this gets rectified two patches later, but for the sake of
bisect-ability it would be good to keep this compilable all the way through.

> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/kvm.h        |  1 +
>  include/kvm/virtio.h     |  7 ++++---
>  include/linux/compiler.h |  2 +-
>  virtio/9p.c              |  9 ++++++---
>  virtio/balloon.c         | 10 +++++++---
>  virtio/blk.c             | 14 +++++++++-----
>  virtio/console.c         | 11 ++++++++---
>  virtio/core.c            |  9 +++++----
>  virtio/net.c             | 32 ++++++++++++++++++--------------
>  virtio/scsi.c            | 14 +++++++++-----
>  10 files changed, 68 insertions(+), 41 deletions(-)
> 
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 7a738183d67a..c6dc6ef72d11 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -8,6 +8,7 @@
>  
>  #include <stdbool.h>
>  #include <linux/types.h>
> +#include <linux/compiler.h>
>  #include <time.h>
>  #include <signal.h>
>  #include <sys/prctl.h>
> diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
> index 19b913732cd5..3a311f54f2a5 100644
> --- a/include/kvm/virtio.h
> +++ b/include/kvm/virtio.h
> @@ -7,6 +7,7 @@
>  #include <linux/virtio_pci.h>
>  
>  #include <linux/types.h>
> +#include <linux/compiler.h>
>  #include <linux/virtio_config.h>
>  #include <sys/uio.h>
>  
> @@ -204,9 +205,9 @@ struct virtio_ops {
>  	int (*reset)(struct kvm *kvm, struct virtio_device *vdev);
>  };
>  
> -int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
> -		struct virtio_ops *ops, enum virtio_trans trans,
> -		int device_id, int subsys_id, int class);
> +int __must_check virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
> +			     struct virtio_ops *ops, enum virtio_trans trans,
> +			     int device_id, int subsys_id, int class);
>  int virtio_compat_add_message(const char *device, const char *config);
>  const char* virtio_trans_name(enum virtio_trans trans);
>  
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 898420b81aec..a662ba0a5c68 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -14,7 +14,7 @@
>  #define __packed	__attribute__((packed))
>  #define __iomem
>  #define __force
> -#define __must_check
> +#define __must_check	__attribute__((warn_unused_result))
>  #define unlikely
>  
>  #endif
> diff --git a/virtio/9p.c b/virtio/9p.c
> index ac70dbc31207..b78f2b3f0e09 100644
> --- a/virtio/9p.c
> +++ b/virtio/9p.c
> @@ -1551,11 +1551,14 @@ int virtio_9p_img_name_parser(const struct option *opt, const char *arg, int uns
>  int virtio_9p__init(struct kvm *kvm)
>  {
>  	struct p9_dev *p9dev;
> +	int r;
>  
>  	list_for_each_entry(p9dev, &devs, list) {
> -		virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
> -			    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
> -			    VIRTIO_ID_9P, PCI_CLASS_9P);
> +		r = virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
> +				VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
> +				VIRTIO_ID_9P, PCI_CLASS_9P);
> +		if (r < 0)
> +			return r;
>  	}
>  
>  	return 0;
> diff --git a/virtio/balloon.c b/virtio/balloon.c
> index 0bd16703dfee..8e8803fed607 100644
> --- a/virtio/balloon.c
> +++ b/virtio/balloon.c
> @@ -264,6 +264,8 @@ struct virtio_ops bln_dev_virtio_ops = {
>  
>  int virtio_bln__init(struct kvm *kvm)
>  {
> +	int r;
> +
>  	if (!kvm->cfg.balloon)
>  		return 0;
>  
> @@ -273,9 +275,11 @@ int virtio_bln__init(struct kvm *kvm)
>  	bdev.stat_waitfd	= eventfd(0, 0);
>  	memset(&bdev.config, 0, sizeof(struct virtio_balloon_config));
>  
> -	virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
> -		    VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
> +	r = virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
> +			VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
> +	if (r < 0)
> +		return r;
>  
>  	if (compat_id == -1)
>  		compat_id = virtio_compat_add_message("virtio-balloon", "CONFIG_VIRTIO_BALLOON");
> diff --git a/virtio/blk.c b/virtio/blk.c
> index f267be1563dc..4d02d101af81 100644
> --- a/virtio/blk.c
> +++ b/virtio/blk.c
> @@ -306,6 +306,7 @@ static struct virtio_ops blk_dev_virtio_ops = {
>  static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
>  {
>  	struct blk_dev *bdev;
> +	int r;
>  
>  	if (!disk)
>  		return -EINVAL;
> @@ -323,12 +324,14 @@ static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
>  		.kvm			= kvm,
>  	};
>  
> -	virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
> -		    VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
> -
>  	list_add_tail(&bdev->list, &bdevs);
>  
> +	r = virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
> +			VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
> +	if (r < 0)
> +		return r;
> +
>  	disk_image__set_callback(bdev->disk, virtio_blk_complete);
>  
>  	if (compat_id == -1)
> @@ -359,7 +362,8 @@ int virtio_blk__init(struct kvm *kvm)
>  
>  	return 0;
>  cleanup:
> -	return virtio_blk__exit(kvm);
> +	virtio_blk__exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_blk__init);
>  
> diff --git a/virtio/console.c b/virtio/console.c
> index f1be02549222..e0b98df37965 100644
> --- a/virtio/console.c
> +++ b/virtio/console.c
> @@ -230,12 +230,17 @@ static struct virtio_ops con_dev_virtio_ops = {
>  
>  int virtio_console__init(struct kvm *kvm)
>  {
> +	int r;
> +
>  	if (kvm->cfg.active_console != CONSOLE_VIRTIO)
>  		return 0;
>  
> -	virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
> -		    VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
> +	r = virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
> +			VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
> +	if (r < 0)
> +		return r;
> +
>  	if (compat_id == -1)
>  		compat_id = virtio_compat_add_message("virtio-console", "CONFIG_VIRTIO_CONSOLE");
>  
> diff --git a/virtio/core.c b/virtio/core.c
> index e10ec362e1ea..f5b3c07fc100 100644
> --- a/virtio/core.c
> +++ b/virtio/core.c
> @@ -259,6 +259,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		int device_id, int subsys_id, int class)
>  {
>  	void *virtio;
> +	int r;
>  
>  	switch (trans) {
>  	case VIRTIO_PCI:
> @@ -272,7 +273,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		vdev->ops->init			= virtio_pci__init;
>  		vdev->ops->exit			= virtio_pci__exit;
>  		vdev->ops->reset		= virtio_pci__reset;
> -		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
> +		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
>  		break;
>  	case VIRTIO_MMIO:
>  		virtio = calloc(sizeof(struct virtio_mmio), 1);
> @@ -285,13 +286,13 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		vdev->ops->init			= virtio_mmio_init;
>  		vdev->ops->exit			= virtio_mmio_exit;
>  		vdev->ops->reset		= virtio_mmio_reset;
> -		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
> +		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
>  		break;
>  	default:
> -		return -1;
> +		r = -1;
>  	};
>  
> -	return 0;
> +	return r;
>  }
>  
>  int virtio_compat_add_message(const char *device, const char *config)
> diff --git a/virtio/net.c b/virtio/net.c
> index 091406912a24..425c13ba1136 100644
> --- a/virtio/net.c
> +++ b/virtio/net.c
> @@ -910,7 +910,7 @@ done:
>  
>  static int virtio_net__init_one(struct virtio_net_params *params)
>  {
> -	int i, err;
> +	int i, r;
>  	struct net_dev *ndev;
>  	struct virtio_ops *ops;
>  	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
> @@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  		return -ENOMEM;
>  
>  	ops = malloc(sizeof(*ops));
> -	if (ops == NULL) {
> -		err = -ENOMEM;
> -		goto err_free_ndev;
> -	}
> +	if (ops == NULL)
> +		return -ENOMEM;
>  
>  	list_add_tail(&ndev->list, &ndevs);

As mentioned in the reply to the v2 version, I think this is still
leaking memory here.

The rest looks fine.

Cheers,
Andre

>  
> @@ -969,8 +967,10 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  				   virtio_trans_name(trans));
>  	}
>  
> -	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
> -		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
> +	r = virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
> +			PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
> +	if (r < 0)
> +		return r;
>  
>  	if (params->vhost)
>  		virtio_net__vhost_init(params->kvm, ndev);
> @@ -979,19 +979,17 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  		compat_id = virtio_compat_add_message("virtio-net", "CONFIG_VIRTIO_NET");
>  
>  	return 0;
> -
> -err_free_ndev:
> -	free(ndev);
> -	return err;
>  }
>  
>  int virtio_net__init(struct kvm *kvm)
>  {
> -	int i;
> +	int i, r;
>  
>  	for (i = 0; i < kvm->cfg.num_net_devices; i++) {
>  		kvm->cfg.net_params[i].kvm = kvm;
> -		virtio_net__init_one(&kvm->cfg.net_params[i]);
> +		r = virtio_net__init_one(&kvm->cfg.net_params[i]);
> +		if (r < 0)
> +			goto cleanup;
>  	}
>  
>  	if (kvm->cfg.num_net_devices == 0 && kvm->cfg.no_net == 0) {
> @@ -1007,10 +1005,16 @@ int virtio_net__init(struct kvm *kvm)
>  		str_to_mac(kvm->cfg.guest_mac, net_params.guest_mac);
>  		str_to_mac(kvm->cfg.host_mac, net_params.host_mac);
>  
> -		virtio_net__init_one(&net_params);
> +		r = virtio_net__init_one(&net_params);
> +		if (r < 0)
> +			goto cleanup;
>  	}
>  
>  	return 0;
> +
> +cleanup:
> +	virtio_net__exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_net__init);
>  
> diff --git a/virtio/scsi.c b/virtio/scsi.c
> index 1ec78fe0945a..16a86cb7e0e6 100644
> --- a/virtio/scsi.c
> +++ b/virtio/scsi.c
> @@ -234,6 +234,7 @@ static void virtio_scsi_vhost_init(struct kvm *kvm, struct scsi_dev *sdev)
>  static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
>  {
>  	struct scsi_dev *sdev;
> +	int r;
>  
>  	if (!disk)
>  		return -EINVAL;
> @@ -260,12 +261,14 @@ static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
>  	strlcpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
>  	sdev->target.vhost_tpgt = strtol(disk->tpgt, NULL, 0);
>  
> -	virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
> -		    VIRTIO_ID_SCSI, PCI_CLASS_BLK);
> -
>  	list_add_tail(&sdev->list, &sdevs);
>  
> +	r = virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
> +			VIRTIO_ID_SCSI, PCI_CLASS_BLK);
> +	if (r < 0)
> +		return r;
> +
>  	virtio_scsi_vhost_init(kvm, sdev);
>  
>  	if (compat_id == -1)
> @@ -302,7 +305,8 @@ int virtio_scsi_init(struct kvm *kvm)
>  
>  	return 0;
>  cleanup:
> -	return virtio_scsi_exit(kvm);
> +	virtio_scsi_exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_scsi_init);
>  
> 



* Re: [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors
  2020-03-26 15:24 ` [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors Alexandru Elisei
@ 2020-03-30  9:30   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-03-30  9:30 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> Failing an mmap call or memslot creation means that device emulation
> will not work; don't ignore it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c | 28 ++++++++++++++++++++--------
>  1 file changed, 20 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index b92cc990b730..ad09373ea2ff 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -63,22 +63,25 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
> -	r = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
> +	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
> +	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
>  	if (r < 0)
> -		return ERR_PTR(r);
> +		goto out_error;
>  
> -	vesa_base_addr			= (u16)r;
>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>  	r = device__register(&vesa_device);
>  	if (r < 0)
> -		return ERR_PTR(r);
> +		goto unregister_ioport;
>  
>  	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> -	if (mem == MAP_FAILED)
> -		ERR_PTR(-errno);
> +	if (mem == MAP_FAILED) {
> +		r = -errno;
> +		goto unregister_device;
> +	}
>  
> -	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
> +	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
> +	if (r < 0)
> +		goto unmap_dev;
>  
>  	vesafb = (struct framebuffer) {
>  		.width			= VESA_WIDTH,
> @@ -90,4 +93,13 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  		.kvm			= kvm,
>  	};
>  	return fb__register(&vesafb);
> +
> +unmap_dev:
> +	munmap(mem, VESA_MEM_SIZE);
> +unregister_device:
> +	device__unregister(&vesa_device);
> +unregister_ioport:
> +	ioport__unregister(kvm, vesa_base_addr);
> +out_error:
> +	return ERR_PTR(r);
>  }
> 



* Re: [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports
  2020-03-26 15:24 ` [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports Alexandru Elisei
@ 2020-03-30 11:13   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-03-30 11:13 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> If we try to register a range of ports that overlaps with an already
> registered I/O port region, then device emulation for that region will no
> longer work. There's nothing sane the ioport emulation layer can do in this
> case, so refuse to allocate the port. This matches the behavior of
> kvm__register_mmio.
> 
> There's no need to protect allocating a new ioport struct with a lock, so
> move the lock to protect the actual ioport insertion in the tree.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

That looks correct to me. The protection of the ioport_tree by the
br_lock would deserve a mention in a comment, but this is a
separate issue, so:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  ioport.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/ioport.c b/ioport.c
> index cb778ed8d757..d9f2e8ea3c3b 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -68,14 +68,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	struct ioport *entry;
>  	int r;
>  
> -	br_write_lock(kvm);
> -
> -	entry = ioport_search(&ioport_tree, port);
> -	if (entry) {
> -		pr_warning("ioport re-registered: %x", port);
> -		ioport_remove(&ioport_tree, entry);
> -	}
> -
>  	entry = malloc(sizeof(*entry));
>  	if (entry == NULL)
>  		return -ENOMEM;
> @@ -90,6 +82,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  		},
>  	};
>  
> +	br_write_lock(kvm);
>  	r = ioport_insert(&ioport_tree, entry);
>  	if (r < 0)
>  		goto out_free;
> 



* Re: [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking
  2020-03-26 15:24 ` [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
@ 2020-03-31 11:51   ` André Przywara
  2020-05-01 15:30     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-03-31 11:51 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:

Hi,

> kvmtool uses brlock for protecting accesses to the ioport and mmio
> red-black trees. brlock allows concurrent reads, but only one writer, which
> is assumed not to be a VCPU thread (for more information see commit
> 0b907ed2eaec ("kvm tools: Add a brlock")). This is done by issuing a
> compiler barrier on read and pausing the entire virtual machine on writes.
> When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write
> lock.
> 
> When we implement reassignable BARs, the mmio or ioport mapping will
> be done as a result of a VCPU mmio access. When brlock is a pthread
> read/write lock, it means that we will try to acquire a write lock with the
> read lock already held by the same VCPU and we will deadlock. When it's
> not, a VCPU will have to call kvm__pause, which means the virtual machine
> will stay paused forever.
> 
> Let's avoid all this by using a mutex and reference counting the red-black
> tree entries. This way we can guarantee that we won't unregister a node
> that another thread is currently using for emulation.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/ioport.h          |  2 +
>  include/kvm/rbtree-interval.h |  4 +-
>  ioport.c                      | 64 +++++++++++++++++-------
>  mmio.c                        | 91 +++++++++++++++++++++++++----------
>  4 files changed, 118 insertions(+), 43 deletions(-)
> 
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index 62a719327e3f..039633f76bdd 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -22,6 +22,8 @@ struct ioport {
>  	struct ioport_operations	*ops;
>  	void				*priv;
>  	struct device_header		dev_hdr;
> +	u32				refcount;
> +	bool				remove;

The use of this extra "remove" variable seems somehow odd. I think
normally you would initialise the refcount to 1, and let the unregister
operation do a put as well, with the removal code triggered if the count
reaches zero. At least this is what kref does, can we do the same here?
Or is there anything that would prevent it? I think it's a good idea to
stick to existing design patterns for things like refcounts.

Cheers,
Andre.


>  };
>  
>  struct ioport_operations {
> diff --git a/include/kvm/rbtree-interval.h b/include/kvm/rbtree-interval.h
> index 730eb5e8551d..17cd3b5f3199 100644
> --- a/include/kvm/rbtree-interval.h
> +++ b/include/kvm/rbtree-interval.h
> @@ -6,7 +6,9 @@
>  
>  #define RB_INT_INIT(l, h) \
>  	(struct rb_int_node){.low = l, .high = h}
> -#define rb_int(n) rb_entry(n, struct rb_int_node, node)
> +#define rb_int(n)	rb_entry(n, struct rb_int_node, node)
> +#define rb_int_start(n)	((n)->low)
> +#define rb_int_end(n)	((n)->low + (n)->high - 1)
>  
>  struct rb_int_node {
>  	struct rb_node	node;
> diff --git a/ioport.c b/ioport.c
> index d9f2e8ea3c3b..5d7288809528 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -2,7 +2,6 @@
>  
>  #include "kvm/kvm.h"
>  #include "kvm/util.h"
> -#include "kvm/brlock.h"
>  #include "kvm/rbtree-interval.h"
>  #include "kvm/mutex.h"
>  
> @@ -16,6 +15,8 @@
>  
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>  
> +static DEFINE_MUTEX(ioport_lock);
> +
>  static struct rb_root		ioport_tree = RB_ROOT;
>  
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> @@ -39,6 +40,36 @@ static void ioport_remove(struct rb_root *root, struct ioport *data)
>  	rb_int_erase(root, &data->node);
>  }
>  
> +static struct ioport *ioport_get(struct rb_root *root, u64 addr)
> +{
> +	struct ioport *ioport;
> +
> +	mutex_lock(&ioport_lock);
> +	ioport = ioport_search(root, addr);
> +	if (ioport)
> +		ioport->refcount++;
> +	mutex_unlock(&ioport_lock);
> +
> +	return ioport;
> +}
> +
> +/* Called with ioport_lock held. */
> +static void ioport_unregister(struct rb_root *root, struct ioport *data)
> +{
> +	device__unregister(&data->dev_hdr);
> +	ioport_remove(root, data);
> +	free(data);
> +}
> +
> +static void ioport_put(struct rb_root *root, struct ioport *data)
> +{
> +	mutex_lock(&ioport_lock);
> +	data->refcount--;
> +	if (data->remove && data->refcount == 0)
> +		ioport_unregister(root, data);
> +	mutex_unlock(&ioport_lock);
> +}
> +
>  #ifdef CONFIG_HAS_LIBFDT
>  static void generate_ioport_fdt_node(void *fdt,
>  				     struct device_header *dev_hdr,
> @@ -80,16 +111,18 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  			.bus_type	= DEVICE_BUS_IOPORT,
>  			.data		= generate_ioport_fdt_node,
>  		},
> +		.refcount	= 0,
> +		.remove		= false,
>  	};
>  
> -	br_write_lock(kvm);
> +	mutex_lock(&ioport_lock);
>  	r = ioport_insert(&ioport_tree, entry);
>  	if (r < 0)
>  		goto out_free;
>  	r = device__register(&entry->dev_hdr);
>  	if (r < 0)
>  		goto out_remove;
> -	br_write_unlock(kvm);
> +	mutex_unlock(&ioport_lock);
>  
>  	return port;
>  
> @@ -97,7 +130,7 @@ out_remove:
>  	ioport_remove(&ioport_tree, entry);
>  out_free:
>  	free(entry);
> -	br_write_unlock(kvm);
> +	mutex_unlock(&ioport_lock);
>  	return r;
>  }
>  
> @@ -106,22 +139,22 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>  	struct ioport *entry;
>  	int r;
>  
> -	br_write_lock(kvm);
> +	mutex_lock(&ioport_lock);
>  
>  	r = -ENOENT;
>  	entry = ioport_search(&ioport_tree, port);
>  	if (!entry)
>  		goto done;
>  
> -	device__unregister(&entry->dev_hdr);
> -	ioport_remove(&ioport_tree, entry);
> -
> -	free(entry);
> +	if (entry->refcount)
> +		entry->remove = true;
> +	else
> +		ioport_unregister(&ioport_tree, entry);
>  
>  	r = 0;
>  
>  done:
> -	br_write_unlock(kvm);
> +	mutex_unlock(&ioport_lock);
>  
>  	return r;
>  }
> @@ -136,9 +169,7 @@ static void ioport__unregister_all(void)
>  	while (rb) {
>  		rb_node = rb_int(rb);
>  		entry = ioport_node(rb_node);
> -		device__unregister(&entry->dev_hdr);
> -		ioport_remove(&ioport_tree, entry);
> -		free(entry);
> +		ioport_unregister(&ioport_tree, entry);
>  		rb = rb_first(&ioport_tree);
>  	}
>  }
> @@ -164,8 +195,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	void *ptr = data;
>  	struct kvm *kvm = vcpu->kvm;
>  
> -	br_read_lock(kvm);
> -	entry = ioport_search(&ioport_tree, port);
> +	entry = ioport_get(&ioport_tree, port);
>  	if (!entry)
>  		goto out;
>  
> @@ -180,9 +210,9 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  		ptr += size;
>  	}
>  
> -out:
> -	br_read_unlock(kvm);
> +	ioport_put(&ioport_tree, entry);
>  
> +out:
>  	if (ret)
>  		return true;
>  
> diff --git a/mmio.c b/mmio.c
> index 61e1d47a587d..57a4d5b9d64d 100644
> --- a/mmio.c
> +++ b/mmio.c
> @@ -1,7 +1,7 @@
>  #include "kvm/kvm.h"
>  #include "kvm/kvm-cpu.h"
>  #include "kvm/rbtree-interval.h"
> -#include "kvm/brlock.h"
> +#include "kvm/mutex.h"
>  
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -15,10 +15,14 @@
>  
>  #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>  
> +static DEFINE_MUTEX(mmio_lock);
> +
>  struct mmio_mapping {
>  	struct rb_int_node	node;
>  	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
>  	void			*ptr;
> +	u32			refcount;
> +	bool			remove;
>  };
>  
>  static struct rb_root mmio_tree = RB_ROOT;
> @@ -51,6 +55,11 @@ static int mmio_insert(struct rb_root *root, struct mmio_mapping *data)
>  	return rb_int_insert(root, &data->node);
>  }
>  
> +static void mmio_remove(struct rb_root *root, struct mmio_mapping *data)
> +{
> +	rb_int_erase(root, &data->node);
> +}
> +
>  static const char *to_direction(u8 is_write)
>  {
>  	if (is_write)
> @@ -59,6 +68,41 @@ static const char *to_direction(u8 is_write)
>  	return "read";
>  }
>  
> +static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
> +{
> +	struct mmio_mapping *mmio;
> +
> +	mutex_lock(&mmio_lock);
> +	mmio = mmio_search(root, phys_addr, len);
> +	if (mmio)
> +		mmio->refcount++;
> +	mutex_unlock(&mmio_lock);
> +
> +	return mmio;
> +}
> +
> +/* Called with mmio_lock held. */
> +static void mmio_deregister(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
> +{
> +	struct kvm_coalesced_mmio_zone zone = (struct kvm_coalesced_mmio_zone) {
> +		.addr	= rb_int_start(&mmio->node),
> +		.size	= 1,
> +	};
> +	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> +
> +	mmio_remove(root, mmio);
> +	free(mmio);
> +}
> +
> +static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
> +{
> +	mutex_lock(&mmio_lock);
> +	mmio->refcount--;
> +	if (mmio->remove && mmio->refcount == 0)
> +		mmio_deregister(kvm, root, mmio);
> +	mutex_unlock(&mmio_lock);
> +}
> +
>  int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>  		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
>  			void *ptr)
> @@ -72,9 +116,11 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  		return -ENOMEM;
>  
>  	*mmio = (struct mmio_mapping) {
> -		.node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
> -		.mmio_fn = mmio_fn,
> -		.ptr	= ptr,
> +		.node		= RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
> +		.mmio_fn	= mmio_fn,
> +		.ptr		= ptr,
> +		.refcount	= 0,
> +		.remove		= false,
>  	};
>  
>  	if (coalesce) {
> @@ -88,9 +134,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  			return -errno;
>  		}
>  	}
> -	br_write_lock(kvm);
> +	mutex_lock(&mmio_lock);
>  	ret = mmio_insert(&mmio_tree, mmio);
> -	br_write_unlock(kvm);
> +	mutex_unlock(&mmio_lock);
>  
>  	return ret;
>  }
> @@ -98,25 +144,20 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  {
>  	struct mmio_mapping *mmio;
> -	struct kvm_coalesced_mmio_zone zone;
>  
> -	br_write_lock(kvm);
> +	mutex_lock(&mmio_lock);
>  	mmio = mmio_search_single(&mmio_tree, phys_addr);
>  	if (mmio == NULL) {
> -		br_write_unlock(kvm);
> +		mutex_unlock(&mmio_lock);
>  		return false;
>  	}
>  
> -	zone = (struct kvm_coalesced_mmio_zone) {
> -		.addr	= phys_addr,
> -		.size	= 1,
> -	};
> -	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> +	if (mmio->refcount)
> +		mmio->remove = true;
> +	else
> +		mmio_deregister(kvm, &mmio_tree, mmio);
> +	mutex_unlock(&mmio_lock);
>  
> -	rb_int_erase(&mmio_tree, &mmio->node);
> -	br_write_unlock(kvm);
> -
> -	free(mmio);
>  	return true;
>  }
>  
> @@ -124,18 +165,18 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  {
>  	struct mmio_mapping *mmio;
>  
> -	br_read_lock(vcpu->kvm);
> -	mmio = mmio_search(&mmio_tree, phys_addr, len);
> -
> -	if (mmio)
> -		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> -	else {
> +	mmio = mmio_get(&mmio_tree, phys_addr, len);
> +	if (!mmio) {
>  		if (vcpu->kvm->cfg.mmio_debug)
>  			fprintf(stderr,	"Warning: Ignoring MMIO %s at %016llx (length %u)\n",
>  				to_direction(is_write),
>  				(unsigned long long)phys_addr, len);
> +		goto out;
>  	}
> -	br_read_unlock(vcpu->kvm);
>  
> +	mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> +	mmio_put(vcpu->kvm, &mmio_tree, mmio);
> +
> +out:
>  	return true;
>  }
> 



* Re: [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access
  2020-03-26 15:24 ` [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
@ 2020-04-02  8:33   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-04-02  8:33 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> We're going to be checking the BAR type, the address written to it and
> whether access to memory or I/O space is enabled quite often when we add
> support for reassignable BARs; make our lives easier by adding helpers for it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  include/kvm/pci.h   | 53 +++++++++++++++++++++++++++++++++++++++++++++
>  pci.c               |  4 ++--
>  powerpc/spapr_pci.c |  2 +-
>  3 files changed, 56 insertions(+), 3 deletions(-)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index ccb155e3e8fe..adb4b5c082d5 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -5,6 +5,7 @@
>  #include <linux/kvm.h>
>  #include <linux/pci_regs.h>
>  #include <endian.h>
> +#include <stdbool.h>
>  
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
> @@ -161,4 +162,56 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>  
> +static inline bool __pci__memory_space_enabled(u16 command)
> +{
> +	return command & PCI_COMMAND_MEMORY;
> +}
> +
> +static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
> +{
> +	return __pci__memory_space_enabled(pci_hdr->command);
> +}
> +
> +static inline bool __pci__io_space_enabled(u16 command)
> +{
> +	return command & PCI_COMMAND_IO;
> +}
> +
> +static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
> +{
> +	return __pci__io_space_enabled(pci_hdr->command);
> +}
> +
> +static inline bool __pci__bar_is_io(u32 bar)
> +{
> +	return bar & PCI_BASE_ADDRESS_SPACE_IO;
> +}
> +
> +static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
> +}
> +
> +static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return !pci__bar_is_io(pci_hdr, bar_num);
> +}
> +
> +static inline u32 __pci__bar_address(u32 bar)
> +{
> +	if (__pci__bar_is_io(bar))
> +		return bar & PCI_BASE_ADDRESS_IO_MASK;
> +	return bar & PCI_BASE_ADDRESS_MEM_MASK;
> +}
> +
> +static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return __pci__bar_address(pci_hdr->bar[bar_num]);
> +}
> +
> +static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return pci_hdr->bar_size[bar_num];
> +}
> +
>  #endif /* KVM__PCI_H */
> diff --git a/pci.c b/pci.c
> index b6892d974c08..7399c76c0819 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	 * size, it will write the address back.
>  	 */
>  	if (bar < 6) {
> -		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
> +		if (pci__bar_is_io(pci_hdr, bar))
>  			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>  		else
>  			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> @@ -211,7 +211,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  		 */
>  		memcpy(&value, data, size);
>  		if (value == 0xffffffff)
> -			value = ~(pci_hdr->bar_size[bar] - 1);
> +			value = ~(pci__bar_size(pci_hdr, bar) - 1);
>  		/* Preserve the special bits. */
>  		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
>  		memcpy(base + offset, &value, size);
> diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
> index a15f7d895a46..7be44d950acb 100644
> --- a/powerpc/spapr_pci.c
> +++ b/powerpc/spapr_pci.c
> @@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
>  				of_pci_b_ddddd(devid) |
>  				of_pci_b_fff(fn) |
>  				of_pci_b_rrrrrrrr(bars[i]));
> -			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
> +			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
>  			reg[n+1].addr = 0;
>  
>  			assigned_addresses[n].phys_hi = cpu_to_be32(
> 



* Re: [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs
  2020-03-26 15:24 ` [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
@ 2020-04-02  8:33   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-04-02  8:33 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> When we want to map a device region into the guest address space, first we
> perform an mmap on the device fd. The resulting VMA is a mapping between
> host userspace addresses and physical addresses associated with the device.
> Next, we create a memslot, which populates the stage 2 table with the
> mappings between guest physical addresses and the device physical addresses.
> 
> However, when we want to unmap the device from the guest address space, we
> only call munmap, which destroys the VMA and the stage 2 mappings, but
> doesn't destroy the memslot and kvmtool's internal mem_bank structure
> associated with the memslot.
> 
> This has been perfectly fine so far, because we only unmap a device region
> when we exit kvmtool. This will change when we add support for
> reassignable BARs, and we will have to unmap vfio regions as the guest
> kernel writes new addresses in the BARs. This can lead to two possible
> problems:
> 
> - We refuse to create a valid BAR mapping because of a stale mem_bank
>   structure which belonged to a previously unmapped region.
> 
> - It is possible that the mmap in vfio_map_region returns the same address
>   that was used to create a memslot, but was unmapped by vfio_unmap_region.
>   Guest accesses to the device memory will fault because the stage 2
>   mappings are missing, and this can lead to performance degradation.
> 
> Let's do the right thing and destroy the memslot and the mem_bank struct
> associated with it when we unmap a vfio region. Set host_addr to NULL after
> the munmap call so we won't try to unmap an address which is currently used
> by the process for something else if vfio_unmap_region gets called twice.
> 

Thanks for adding the locking here, looks alright now.

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  include/kvm/kvm.h |   4 ++
>  kvm.c             | 101 ++++++++++++++++++++++++++++++++++++++++------
>  vfio/core.c       |   6 +++
>  3 files changed, 99 insertions(+), 12 deletions(-)
> 
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 50119a8672eb..9428f57a1c0c 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -1,6 +1,7 @@
>  #ifndef KVM__KVM_H
>  #define KVM__KVM_H
>  
> +#include "kvm/mutex.h"
>  #include "kvm/kvm-arch.h"
>  #include "kvm/kvm-config.h"
>  #include "kvm/util-init.h"
> @@ -56,6 +57,7 @@ struct kvm_mem_bank {
>  	void			*host_addr;
>  	u64			size;
>  	enum kvm_mem_type	type;
> +	u32			slot;
>  };
>  
>  struct kvm {
> @@ -72,6 +74,7 @@ struct kvm {
>  	u64			ram_size;
>  	void			*ram_start;
>  	u64			ram_pagesize;
> +	struct mutex		mem_banks_lock;
>  	struct list_head	mem_banks;
>  
>  	bool			nmi_disabled;
> @@ -106,6 +109,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>  		      enum kvm_mem_type type);
>  static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
> diff --git a/kvm.c b/kvm.c
> index 57c4ff98ec4c..26f6b9bc58a3 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -157,6 +157,7 @@ struct kvm *kvm__new(void)
>  	if (!kvm)
>  		return ERR_PTR(-ENOMEM);
>  
> +	mutex_init(&kvm->mem_banks_lock);
>  	kvm->sys_fd = -1;
>  	kvm->vm_fd = -1;
>  
> @@ -183,20 +184,84 @@ int kvm__exit(struct kvm *kvm)
>  }
>  core_exit(kvm__exit);
>  
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
> +		     void *userspace_addr)
> +{
> +	struct kvm_userspace_memory_region mem;
> +	struct kvm_mem_bank *bank;
> +	int ret;
> +
> +	mutex_lock(&kvm->mem_banks_lock);
> +	list_for_each_entry(bank, &kvm->mem_banks, list)
> +		if (bank->guest_phys_addr == guest_phys &&
> +		    bank->size == size && bank->host_addr == userspace_addr)
> +			break;
> +
> +	if (&bank->list == &kvm->mem_banks) {
> +		pr_err("Region [%llx-%llx] not found", guest_phys,
> +		       guest_phys + size - 1);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (bank->type == KVM_MEM_TYPE_RESERVED) {
> +		pr_err("Cannot delete reserved region [%llx-%llx]",
> +		       guest_phys, guest_phys + size - 1);
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	mem = (struct kvm_userspace_memory_region) {
> +		.slot			= bank->slot,
> +		.guest_phys_addr	= guest_phys,
> +		.memory_size		= 0,
> +		.userspace_addr		= (unsigned long)userspace_addr,
> +	};
> +
> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
> +	if (ret < 0) {
> +		ret = -errno;
> +		goto out;
> +	}
> +
> +	list_del(&bank->list);
> +	free(bank);
> +	kvm->mem_slots--;
> +	ret = 0;
> +
> +out:
> +	mutex_unlock(&kvm->mem_banks_lock);
> +	return ret;
> +}
> +
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  		      void *userspace_addr, enum kvm_mem_type type)
>  {
>  	struct kvm_userspace_memory_region mem;
>  	struct kvm_mem_bank *merged = NULL;
>  	struct kvm_mem_bank *bank;
> +	struct list_head *prev_entry;
> +	u32 slot;
>  	int ret;
>  
> -	/* Check for overlap */
> +	mutex_lock(&kvm->mem_banks_lock);
> +	/* Check for overlap and find first empty slot. */
> +	slot = 0;
> +	prev_entry = &kvm->mem_banks;
>  	list_for_each_entry(bank, &kvm->mem_banks, list) {
>  		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>  		u64 end = guest_phys + size - 1;
> -		if (guest_phys > bank_end || end < bank->guest_phys_addr)
> +		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
> +			/*
> +			 * Keep the banks sorted ascending by slot, so it's
> +			 * easier for us to find a free slot.
> +			 */
> +			if (bank->slot == slot) {
> +				slot++;
> +				prev_entry = &bank->list;
> +			}
>  			continue;
> +		}
>  
>  		/* Merge overlapping reserved regions */
>  		if (bank->type == KVM_MEM_TYPE_RESERVED &&
> @@ -226,38 +291,50 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  		       kvm_mem_type_to_string(bank->type), bank->guest_phys_addr,
>  		       bank->guest_phys_addr + bank->size - 1);
>  
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out;
>  	}
>  
> -	if (merged)
> -		return 0;
> +	if (merged) {
> +		ret = 0;
> +		goto out;
> +	}
>  
>  	bank = malloc(sizeof(*bank));
> -	if (!bank)
> -		return -ENOMEM;
> +	if (!bank) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
>  
>  	INIT_LIST_HEAD(&bank->list);
>  	bank->guest_phys_addr		= guest_phys;
>  	bank->host_addr			= userspace_addr;
>  	bank->size			= size;
>  	bank->type			= type;
> +	bank->slot			= slot;
>  
>  	if (type != KVM_MEM_TYPE_RESERVED) {
>  		mem = (struct kvm_userspace_memory_region) {
> -			.slot			= kvm->mem_slots++,
> +			.slot			= slot,
>  			.guest_phys_addr	= guest_phys,
>  			.memory_size		= size,
>  			.userspace_addr		= (unsigned long)userspace_addr,
>  		};
>  
>  		ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
> -		if (ret < 0)
> -			return -errno;
> +		if (ret < 0) {
> +			ret = -errno;
> +			goto out;
> +		}
>  	}
>  
> -	list_add(&bank->list, &kvm->mem_banks);
> +	list_add(&bank->list, prev_entry);
> +	kvm->mem_slots++;
> +	ret = 0;
>  
> -	return 0;
> +out:
> +	mutex_unlock(&kvm->mem_banks_lock);
> +	return ret;
>  }
>  
>  void *guest_flat_to_host(struct kvm *kvm, u64 offset)
> diff --git a/vfio/core.c b/vfio/core.c
> index 0ed1e6fee6bf..b80ebf3bb913 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>  
>  void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>  {
> +	u64 map_size;
> +
>  	if (region->host_addr) {
> +		map_size = ALIGN(region->info.size, PAGE_SIZE);
> +		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
> +				 region->host_addr);
>  		munmap(region->host_addr, region->info.size);
> +		region->host_addr = NULL;
>  	} else if (region->is_ioport) {
>  		ioport__unregister(kvm, region->port_base);
>  	} else {
> 



* Re: [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits
  2020-03-26 15:24 ` [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
@ 2020-04-02  8:34   ` André Przywara
  2020-05-04 13:00     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-02  8:34 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus
> Extension":
> 
> "The bandwidth requirements for I/O and configuration transactions cannot
> justify the added complexity, and, therefore, only memory transactions
> support 64-bit data transfers".
> 
> Further down, the spec also describes the possible responses of a target
> which has been requested to do a 64-bit transaction. Limit the transaction
> to the lower 32 bits, to match the second accepted behaviour.

That looks like a reasonable behaviour.
AFAICS there is one code path from powerpc/spapr_pci.c which isn't
covered by those limitations (rtas_ibm_write_pci_config() ->
pci__config_wr() -> cfg_ops.write() -> vfio_pci_cfg_write()).
Same for read.

I am not sure we really care, maybe you can fix it if you like.

Either way this patch seems right, so:

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  pci.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/pci.c b/pci.c
> index 7399c76c0819..611e2c0bf1da 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -119,6 +119,9 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>  {
>  	union pci_config_address pci_config_address;
>  
> +	if (size > 4)
> +		size = 4;
> +
>  	pci_config_address.w = ioport__read32(&pci_config_address_bits);
>  	/*
>  	 * If someone accesses PCI configuration space offsets that are not
> @@ -135,6 +138,9 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>  {
>  	union pci_config_address pci_config_address;
>  
> +	if (size > 4)
> +		size = 4;
> +
>  	pci_config_address.w = ioport__read32(&pci_config_address_bits);
>  	/*
>  	 * If someone accesses PCI configuration space offsets that are not
> @@ -248,6 +254,9 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>  	cfg_addr.w		= (u32)addr;
>  	cfg_addr.enable_bit	= 1;
>  
> +	if (len > 4)
> +		len = 4;
> +
>  	if (is_write)
>  		pci__config_wr(kvm, cfg_addr, data, len);
>  	else
> 



* Re: [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice
  2020-03-26 15:24 ` [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-04-02  8:35   ` André Przywara
  2020-05-04 13:44     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-02  8:35 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> After writing to the device fd as part of the PCI configuration space
> emulation, we read back from the device to make sure that the write
> finished. The value is read back into the PCI configuration space and
> afterwards, the same value is copied by the PCI emulation code. Let's
> read from the device fd into a temporary variable, to prevent this
> double write.
> 
> The double write is harmless in itself. But when we implement
> reassignable BARs, we need to keep track of the old BAR value, and the
> VFIO code is overwriting it.
> 

It still seems a bit fragile, since we rely on code in other places to
limit "sz" to 4 or less, but in practice we should be covered.

Can you maybe add an assert here to prevent accidents on the stack?

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Cheers,
Andre

> ---
>  vfio/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index fe02574390f6..8b2a0c8dbac3 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
>  	struct vfio_device *vdev;
> -	void *base = pci_hdr;
> +	u32 tmp;
>  
>  	if (offset == PCI_ROM_ADDRESS)
>  		return;
> @@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>  
> -	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
> +	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>  			      sz, offset);
>  }
> 



* Re: [PATCH v3 kvmtool 26/32] vesa: Create device exactly once
  2020-03-26 15:24 ` [PATCH v3 kvmtool 26/32] vesa: Create device exactly once Alexandru Elisei
@ 2020-04-02  8:58   ` André Przywara
  2020-05-05 13:02     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-02  8:58 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> A vesa device is used by the SDL, GTK or VNC framebuffers. Don't allow the
> user to specify more than one of these options because kvmtool will create
> identical devices and bad things will happen:
> 
> $ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk
>   # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159
>   Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff]
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> ======= Backtrace: =========
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
> (+0x777e5)[0x7fae0ed447e5]
> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
> /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c]
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
> ======= Backtrace: =========
> Aborted (core dumped)
> 
> The vesa device is explicitly created during the initialization phase of
> the above framebuffers. Remove the superfluous check for their existence.
> 

Not really happy about this pointer comparison, but I don't see a better
way, and it's surely good enough for that purpose.

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index dd59a112330b..8071ad153f27 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -61,8 +61,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
>  	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
>  
> -	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
> -		return NULL;
> +	if (device__find_dev(vesa_device.bus_type, vesa_device.dev_num) == &vesa_device) {
> +		r = -EEXIST;
> +		goto out_error;
> +	}
> +
>  	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
>  	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
>  	if (r < 0)
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation
  2020-03-26 15:24 ` [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-04-03 11:57   ` André Przywara
  2020-04-03 18:14     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-03 11:57 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	Alexandru Elisei

On 26/03/2020 15:24, Alexandru Elisei wrote:

Hi,

> From: Alexandru Elisei <alexandru.elisei@gmail.com>
> 
> Implement callbacks for activating and deactivating emulation for a BAR
> region. This is in preparation for allowing a guest operating system to
> enable and disable access to I/O or memory space, or to reassign the
> BARs.
> 
> The emulated vesa device framebuffer isn't designed to allow stopping and
> restarting at arbitrary points in the guest execution. Furthermore, on x86,
> the kernel will not change the BAR addresses, which on bare metal are
> programmed by the firmware, so take the easy way out and refuse to
> activate/deactivate emulation for the BAR regions. We also take this
> opportunity to make the vesa emulation code more consistent by moving all
> static variable definitions in one place, at the top of the file.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c         |  70 ++++++++++++++++++++------------
>  include/kvm/pci.h |  18 ++++++++-
>  pci.c             |  44 ++++++++++++++++++++
>  vfio/pci.c        | 100 ++++++++++++++++++++++++++++++++++++++--------
>  virtio/pci.c      |  90 ++++++++++++++++++++++++++++++-----------
>  5 files changed, 254 insertions(+), 68 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index 8071ad153f27..31c2d16ae4de 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -18,6 +18,31 @@
>  #include <inttypes.h>
>  #include <unistd.h>
>  
> +static struct pci_device_header vesa_pci_device = {
> +	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> +	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
> +	.header_type	= PCI_HEADER_TYPE_NORMAL,
> +	.revision_id	= 0,
> +	.class[2]	= 0x03,
> +	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> +	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> +	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> +	.bar_size[1]	= VESA_MEM_SIZE,
> +};
> +
> +static struct device_header vesa_device = {
> +	.bus_type	= DEVICE_BUS_PCI,
> +	.data		= &vesa_pci_device,
> +};
> +
> +static struct framebuffer vesafb = {
> +	.width		= VESA_WIDTH,
> +	.height		= VESA_HEIGHT,
> +	.depth		= VESA_BPP,
> +	.mem_addr	= VESA_MEM_ADDR,
> +	.mem_size	= VESA_MEM_SIZE,
> +};
> +
>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
>  	return true;
> @@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
>  	.io_out			= vesa_pci_io_out,
>  };
>  
> -static struct pci_device_header vesa_pci_device = {
> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
> -	.revision_id		= 0,
> -	.class[2]		= 0x03,
> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> -	.bar_size[1]		= VESA_MEM_SIZE,
> -};
> -
> -static struct device_header vesa_device = {
> -	.bus_type	= DEVICE_BUS_PCI,
> -	.data		= &vesa_pci_device,
> -};
> +static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      int bar_num, void *data)
> +{
> +	/* We don't support remapping of the framebuffer. */
> +	return 0;
> +}
>  
> -static struct framebuffer vesafb;
> +static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +				int bar_num, void *data)
> +{
> +	/* We don't support remapping of the framebuffer. */
> +	return -EINVAL;
> +}
>  
>  struct framebuffer *vesa__init(struct kvm *kvm)
>  {
> @@ -73,6 +93,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>  	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
> +	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
> +				      vesa__bar_deactivate, NULL);
> +	if (r < 0)
> +		goto unregister_ioport;
> +
>  	r = device__register(&vesa_device);
>  	if (r < 0)
>  		goto unregister_ioport;
> @@ -87,15 +112,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	if (r < 0)
>  		goto unmap_dev;
>  
> -	vesafb = (struct framebuffer) {
> -		.width			= VESA_WIDTH,
> -		.height			= VESA_HEIGHT,
> -		.depth			= VESA_BPP,
> -		.mem			= mem,
> -		.mem_addr		= VESA_MEM_ADDR,
> -		.mem_size		= VESA_MEM_SIZE,
> -		.kvm			= kvm,
> -	};
> +	vesafb.mem = mem;
> +	vesafb.kvm = kvm;
>  	return fb__register(&vesafb);
>  
>  unmap_dev:

Those transformations look correct to me.

> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index adb4b5c082d5..1d7d4c0cea5a 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_device_header;
> +
> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data);
> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data);
> +
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>  #define PCI_DEV_CFG_SIZE	256
>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
> -struct pci_device_header;
> -
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  		      u8 offset, void *data, int sz);
> @@ -136,6 +143,9 @@ struct pci_device_header {
>  
>  	/* Private to lkvm */
>  	u32		bar_size[6];
> +	bar_activate_fn_t	bar_activate_fn;
> +	bar_deactivate_fn_t	bar_deactivate_fn;
> +	void *data;
>  	struct pci_config_operations	cfg_ops;
>  	/*
>  	 * PCI INTx# are level-triggered, but virtual device often feature
> @@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
> +
>  static inline bool __pci__memory_space_enabled(u16 command)
>  {
>  	return command & PCI_COMMAND_MEMORY;
> diff --git a/pci.c b/pci.c
> index 611e2c0bf1da..4ace190898f2 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>  }
>  
> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return pci__bar_size(pci_hdr, bar_num);
> +}
> +
>  static void *pci_config_address_ptr(u16 port)
>  {
>  	unsigned long offset;
> @@ -273,6 +278,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>  	return hdr->data;
>  }
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
> +{
> +	int i, r;
> +	bool has_bar_regions = false;
> +
> +	assert(bar_activate_fn && bar_deactivate_fn);
> +
> +	pci_hdr->bar_activate_fn = bar_activate_fn;
> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
> +	pci_hdr->data = data;
> +
> +	for (i = 0; i < 6; i++) {
> +		if (!pci_bar_is_implemented(pci_hdr, i))
> +			continue;
> +
> +		has_bar_regions = true;
> +
> +		if (pci__bar_is_io(pci_hdr, i) &&
> +		    pci__io_space_enabled(pci_hdr)) {
> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
> +				if (r < 0)
> +					return r;
> +			}

Indentation seems to be off here, I think the last 4 lines need to have
one tab removed.

> +
> +		if (pci__bar_is_memory(pci_hdr, i) &&
> +		    pci__memory_space_enabled(pci_hdr)) {
> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
> +				if (r < 0)
> +					return r;
> +			}

Same indentation issue here.

> +	}
> +
> +	assert(has_bar_regions);

Is assert() here really a good idea? I see that it makes sense for our
emulated devices, but is that a valid check for VFIO?
From briefly looking, I can't find a requirement for having at least one
valid BAR in general, and even if there were, I think we should return an
error rather than abort the guest here - or ignore it altogether.
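For illustration, the check could be made recoverable along these lines (a sketch only; "pci_hdr_stub" is a hypothetical stand-in for struct pci_device_header, and the loop mirrors pci__register_bar_regions() above):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct pci_hdr_stub {
	uint32_t bar_size[6];
};

/* Return -EINVAL instead of aborting when a device registers BAR
 * callbacks without implementing any BAR. */
static int check_has_bar_regions(const struct pci_hdr_stub *hdr)
{
	bool has_bar_regions = false;
	int i;

	for (i = 0; i < 6; i++)
		if (hdr->bar_size[i])
			has_bar_regions = true;

	return has_bar_regions ? 0 : -EINVAL;
}
```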

> +
> +	return 0;
> +}
> +
>  int pci__init(struct kvm *kvm)
>  {
>  	int r;
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 8b2a0c8dbac3..18e22a8c5320 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -8,6 +8,8 @@
>  #include <sys/resource.h>
>  #include <sys/time.h>
>  
> +#include <assert.h>
> +
>  /* Wrapper around UAPI vfio_irq_set */
>  union vfio_irq_eventfd {
>  	struct vfio_irq_set	irq;
> @@ -446,6 +448,81 @@ out_unlock:
>  	mutex_unlock(&pdev->msi.mutex);
>  }
>  
> +static int vfio_pci_bar_activate(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region;
> +	bool has_msix;
> +	int ret;
> +
> +	assert((u32)bar_num < vdev->info.num_regions);
> +
> +	region = &vdev->regions[bar_num];
> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
> +
> +	if (has_msix && (u32)bar_num == table->bar) {
> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
> +					 table->size, false,
> +					 vfio_pci_msix_table_access, pdev);
> +		if (ret < 0 || table->bar != pba->bar)

I think this second expression deserves some comment.
If I understand correctly, this would register the PBA trap handler
separately below if both the MSIX table and the PBA share a BAR?

> +			goto out;

Is there any particular reason you are using goto here? I find it more
confusing if the "out:" label has just a return statement, without any
cleanup or lock dropping. Just a "return ret;" here would be much
cleaner, I think. Same for other occasions in this function and
elsewhere in this patch.

Or do you plan on adding some code here later? I don't see it in this
series though.

> +	}
> +
> +	if (has_msix && (u32)bar_num == pba->bar) {
> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
> +					 pba->size, false,
> +					 vfio_pci_msix_pba_access, pdev);
> +		goto out;
> +	}
> +
> +	ret = vfio_map_region(kvm, vdev, region);
> +out:
> +	return ret;
> +}
> +
> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region;
> +	bool has_msix, success;
> +	int ret;
> +
> +	assert((u32)bar_num < vdev->info.num_regions);
> +
> +	region = &vdev->regions[bar_num];
> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
> +
> +	if (has_msix && (u32)bar_num == table->bar) {
> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
> +		/* kvm__deregister_mmio fails when the region is not found. */
> +		ret = (success ? 0 : -ENOENT);
> +		if (ret < 0 || table->bar != pba->bar)
> +			goto out;
> +	}
> +
> +	if (has_msix && (u32)bar_num == pba->bar) {
> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
> +		ret = (success ? 0 : -ENOENT);
> +		goto out;
> +	}
> +
> +	vfio_unmap_region(kvm, region);
> +	ret = 0;
> +
> +out:
> +	return ret;
> +}
> +
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  			      u8 offset, void *data, int sz)
>  {
> @@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  		ret = -ENOMEM;
>  		goto out_free;
>  	}
> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
> -
> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
> -				 false, vfio_pci_msix_table_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
>  
>  	/*
>  	 * We could map the physical PBA directly into the guest, but it's
> @@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>  	 * virtual PBA.
>  	 */
> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
> -				 vfio_pci_msix_pba_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>  
>  	pdev->msix.entries = entries;
>  	pdev->msix.nr_entries = nr_entries;
> @@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>  	}
>  
> -	/* Map the BARs into the guest or setup a trap region. */
> -	ret = vfio_map_region(kvm, vdev, region);
> -	if (ret)
> -		return ret;
> -
>  	return 0;
>  }
>  
> @@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>  	}
>  
>  	/* We've configured the BARs, fake up a Configuration Space */
> -	return vfio_pci_fixup_cfg_space(vdev);
> +	ret = vfio_pci_fixup_cfg_space(vdev);
> +	if (ret)
> +		return ret;
> +
> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
> +					 vfio_pci_bar_deactivate, vdev);
>  }
>  
>  /*
> diff --git a/virtio/pci.c b/virtio/pci.c
> index d111dc499f5e..598da699c241 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -11,6 +11,7 @@
>  #include <sys/ioctl.h>
>  #include <linux/virtio_pci.h>
>  #include <linux/byteorder.h>
> +#include <assert.h>
>  #include <string.h>
>  
>  static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
> @@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>  }
>  
> +static int virtio_pci__bar_activate(struct kvm *kvm,
> +				    struct pci_device_header *pci_hdr,
> +				    int bar_num, void *data)
> +{
> +	struct virtio_device *vdev = data;
> +	u32 bar_addr, bar_size;
> +	int r = -EINVAL;
> +
> +	assert(bar_num <= 2);
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
> +				     bar_size, vdev);
> +		if (r > 0)
> +			r = 0;
> +		break;
> +	case 1:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__io_mmio_callback, vdev);
> +		break;
> +	case 2:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__msix_mmio_callback, vdev);

I think adding a break; here looks nicer.

Cheers,
Andre


> +	}
> +
> +	return r;
> +}
> +
> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
> +				      struct pci_device_header *pci_hdr,
> +				      int bar_num, void *data)
> +{
> +	u32 bar_addr;
> +	bool success;
> +	int r = -EINVAL;
> +
> +	assert(bar_num <= 2);
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__unregister(kvm, bar_addr);
> +		break;
> +	case 1:
> +	case 2:
> +		success = kvm__deregister_mmio(kvm, bar_addr);
> +		/* kvm__deregister_mmio fails when the region is not found. */
> +		r = (success ? 0 : -ENOENT);
> +	}
> +
> +	return r;
> +}
> +
>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		     int device_id, int subsys_id, int class)
>  {
> @@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
> -			     vdev);
> -	if (r < 0)
> -		return r;
> -	port_addr = (u16)r;
> -
>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
> -			       virtio_pci__io_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_ioport;
> -
>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
> -			       virtio_pci__msix_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_mmio;
>  
>  	vpci->pci_hdr = (struct pci_device_header) {
>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> @@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>  	};
>  
> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
> +				      virtio_pci__bar_activate,
> +				      virtio_pci__bar_deactivate, vdev);
> +	if (r < 0)
> +		return r;
> +
>  	vpci->dev_hdr = (struct device_header) {
>  		.bus_type		= DEVICE_BUS_PCI,
>  		.data			= &vpci->pci_hdr,
> @@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  
>  	r = device__register(&vpci->dev_hdr);
>  	if (r < 0)
> -		goto free_msix_mmio;
> +		return r;
>  
>  	/* save the IRQ that device__register() has allocated */
>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>  
>  	return 0;
> -
> -free_msix_mmio:
> -	kvm__deregister_mmio(kvm, msix_io_block);
> -free_mmio:
> -	kvm__deregister_mmio(kvm, mmio_addr);
> -free_ioport:
> -	ioport__unregister(kvm, port_addr);
> -	return r;
>  }
>  
>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation
  2020-04-03 11:57   ` André Przywara
@ 2020-04-03 18:14     ` Alexandru Elisei
  2020-04-03 19:08       ` André Przywara
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-04-03 18:14 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	Alexandru Elisei

Hi,

On 4/3/20 12:57 PM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>
> Hi,
>
>> From: Alexandru Elisei <alexandru.elisei@gmail.com>
>>
>> Implement callbacks for activating and deactivating emulation for a BAR
>> region. This is in preparation for allowing a guest operating system to
>> enable and disable access to I/O or memory space, or to reassign the
>> BARs.
>>
>> The emulated vesa device framebuffer isn't designed to allow stopping and
>> restarting at arbitrary points in the guest execution. Furthermore, on x86,
>> the kernel will not change the BAR addresses, which on bare metal are
>> programmed by the firmware, so take the easy way out and refuse to
>> activate/deactivate emulation for the BAR regions. We also take this
>> opportunity to make the vesa emulation code more consistent by moving all
>> static variable definitions in one place, at the top of the file.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  hw/vesa.c         |  70 ++++++++++++++++++++------------
>>  include/kvm/pci.h |  18 ++++++++-
>>  pci.c             |  44 ++++++++++++++++++++
>>  vfio/pci.c        | 100 ++++++++++++++++++++++++++++++++++++++--------
>>  virtio/pci.c      |  90 ++++++++++++++++++++++++++++++-----------
>>  5 files changed, 254 insertions(+), 68 deletions(-)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index 8071ad153f27..31c2d16ae4de 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -18,6 +18,31 @@
>>  #include <inttypes.h>
>>  #include <unistd.h>
>>  
>> +static struct pci_device_header vesa_pci_device = {
>> +	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> +	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
>> +	.header_type	= PCI_HEADER_TYPE_NORMAL,
>> +	.revision_id	= 0,
>> +	.class[2]	= 0x03,
>> +	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>> +	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>> +	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>> +	.bar_size[1]	= VESA_MEM_SIZE,
>> +};
>> +
>> +static struct device_header vesa_device = {
>> +	.bus_type	= DEVICE_BUS_PCI,
>> +	.data		= &vesa_pci_device,
>> +};
>> +
>> +static struct framebuffer vesafb = {
>> +	.width		= VESA_WIDTH,
>> +	.height		= VESA_HEIGHT,
>> +	.depth		= VESA_BPP,
>> +	.mem_addr	= VESA_MEM_ADDR,
>> +	.mem_size	= VESA_MEM_SIZE,
>> +};
>> +
>>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>>  {
>>  	return true;
>> @@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
>>  	.io_out			= vesa_pci_io_out,
>>  };
>>  
>> -static struct pci_device_header vesa_pci_device = {
>> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
>> -	.revision_id		= 0,
>> -	.class[2]		= 0x03,
>> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>> -	.bar_size[1]		= VESA_MEM_SIZE,
>> -};
>> -
>> -static struct device_header vesa_device = {
>> -	.bus_type	= DEVICE_BUS_PCI,
>> -	.data		= &vesa_pci_device,
>> -};
>> +static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      int bar_num, void *data)
>> +{
>> +	/* We don't support remapping of the framebuffer. */
>> +	return 0;
>> +}
>>  
>> -static struct framebuffer vesafb;
>> +static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +				int bar_num, void *data)
>> +{
>> +	/* We don't support remapping of the framebuffer. */
>> +	return -EINVAL;
>> +}
>>  
>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>  {
>> @@ -73,6 +93,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  
>>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>>  	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>> +	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
>> +				      vesa__bar_deactivate, NULL);
>> +	if (r < 0)
>> +		goto unregister_ioport;
>> +
>>  	r = device__register(&vesa_device);
>>  	if (r < 0)
>>  		goto unregister_ioport;
>> @@ -87,15 +112,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  	if (r < 0)
>>  		goto unmap_dev;
>>  
>> -	vesafb = (struct framebuffer) {
>> -		.width			= VESA_WIDTH,
>> -		.height			= VESA_HEIGHT,
>> -		.depth			= VESA_BPP,
>> -		.mem			= mem,
>> -		.mem_addr		= VESA_MEM_ADDR,
>> -		.mem_size		= VESA_MEM_SIZE,
>> -		.kvm			= kvm,
>> -	};
>> +	vesafb.mem = mem;
>> +	vesafb.kvm = kvm;
>>  	return fb__register(&vesafb);
>>  
>>  unmap_dev:
> Those transformations look correct to me.
>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index adb4b5c082d5..1d7d4c0cea5a 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pci_device_header;
>> +
>> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>> +				 struct pci_device_header *pci_hdr,
>> +				 int bar_num, void *data);
>> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>> +				   struct pci_device_header *pci_hdr,
>> +				   int bar_num, void *data);
>> +
>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>  #define PCI_DEV_CFG_SIZE	256
>>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>> -struct pci_device_header;
>> -
>>  struct pci_config_operations {
>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>  		      u8 offset, void *data, int sz);
>> @@ -136,6 +143,9 @@ struct pci_device_header {
>>  
>>  	/* Private to lkvm */
>>  	u32		bar_size[6];
>> +	bar_activate_fn_t	bar_activate_fn;
>> +	bar_deactivate_fn_t	bar_deactivate_fn;
>> +	void *data;
>>  	struct pci_config_operations	cfg_ops;
>>  	/*
>>  	 * PCI INTx# are level-triggered, but virtual device often feature
>> @@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>>  
>>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>>  
>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      bar_activate_fn_t bar_activate_fn,
>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
>> +
>>  static inline bool __pci__memory_space_enabled(u16 command)
>>  {
>>  	return command & PCI_COMMAND_MEMORY;
>> diff --git a/pci.c b/pci.c
>> index 611e2c0bf1da..4ace190898f2 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>>  }
>>  
>> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return pci__bar_size(pci_hdr, bar_num);
>> +}
>> +
>>  static void *pci_config_address_ptr(u16 port)
>>  {
>>  	unsigned long offset;
>> @@ -273,6 +278,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>>  	return hdr->data;
>>  }
>>  
>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      bar_activate_fn_t bar_activate_fn,
>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
>> +{
>> +	int i, r;
>> +	bool has_bar_regions = false;
>> +
>> +	assert(bar_activate_fn && bar_deactivate_fn);
>> +
>> +	pci_hdr->bar_activate_fn = bar_activate_fn;
>> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
>> +	pci_hdr->data = data;
>> +
>> +	for (i = 0; i < 6; i++) {
>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>> +			continue;
>> +
>> +		has_bar_regions = true;
>> +
>> +		if (pci__bar_is_io(pci_hdr, i) &&
>> +		    pci__io_space_enabled(pci_hdr)) {
>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +				if (r < 0)
>> +					return r;
>> +			}
> Indentation seems to be off here, I think the last 4 lines need to have
> one tab removed.
>
>> +
>> +		if (pci__bar_is_memory(pci_hdr, i) &&
>> +		    pci__memory_space_enabled(pci_hdr)) {
>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +				if (r < 0)
>> +					return r;
>> +			}
> Same indentation issue here.

Nicely spotted, I'll fix it.

>
>> +	}
>> +
>> +	assert(has_bar_regions);
> Is assert() here really a good idea? I see that it makes sense for our
> emulated devices, but is that a valid check for VFIO?
> From briefly looking, I can't find a requirement for having at least one
> valid BAR in general, and even if there were, I think we should return an
> error rather than abort the guest here - or ignore it altogether.

The assert here is meant to catch coding errors in devices, not in the PCI
emulation. Calling pci__register_bar_regions() and providing callbacks for when
BAR access is toggled, but *without* any valid BARs, looks like a coding error in
the device emulation code to me.

As for VFIO, I'm struggling to find a valid reason for someone to build a device
that uses PCI but doesn't have any BARs. Isn't that the entire point of PCI? I'm
perfectly happy to remove the assert if you can provide a rationale for building
such a device.

>
>> +
>> +	return 0;
>> +}
>> +
>>  int pci__init(struct kvm *kvm)
>>  {
>>  	int r;
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 8b2a0c8dbac3..18e22a8c5320 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -8,6 +8,8 @@
>>  #include <sys/resource.h>
>>  #include <sys/time.h>
>>  
>> +#include <assert.h>
>> +
>>  /* Wrapper around UAPI vfio_irq_set */
>>  union vfio_irq_eventfd {
>>  	struct vfio_irq_set	irq;
>> @@ -446,6 +448,81 @@ out_unlock:
>>  	mutex_unlock(&pdev->msi.mutex);
>>  }
>>  
>> +static int vfio_pci_bar_activate(struct kvm *kvm,
>> +				 struct pci_device_header *pci_hdr,
>> +				 int bar_num, void *data)
>> +{
>> +	struct vfio_device *vdev = data;
>> +	struct vfio_pci_device *pdev = &vdev->pci;
>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>> +	struct vfio_region *region;
>> +	bool has_msix;
>> +	int ret;
>> +
>> +	assert((u32)bar_num < vdev->info.num_regions);
>> +
>> +	region = &vdev->regions[bar_num];
>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>> +
>> +	if (has_msix && (u32)bar_num == table->bar) {
>> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>> +					 table->size, false,
>> +					 vfio_pci_msix_table_access, pdev);
>> +		if (ret < 0 || table->bar != pba->bar)
> I think this second expression deserves some comment.
> If I understand correctly, this would register the PBA trap handler
> separately below if both the MSIX table and the PBA share a BAR?

The MSIX table and the PBA structure can share the same BAR for the base address
(that's why the MSIX capability has an offset field for both of them), but we
register different regions for mmio emulation because we don't want to have a
generic handler and always check if the mmio access was to the MSIX table or the
PBA structure. I can add a comment stating that, sure.

>
>> +			goto out;
> Is there any particular reason you are using goto here? I find it more
> confusing if the "out:" label has just a return statement, without any
> cleanup or lock dropping. Just a "return ret;" here would be much
> cleaner I think. Same for other occasions in this function and
> elsewhere in this patch.
>
> Or do you plan on adding some code here later? I don't see it in this
> series though.

The reason I'm doing this is because I prefer one exit point from the function,
instead of return statements at arbitrary points in the function body. As a point
of reference, the pattern is recommended in the MISRA C standard for safety, in
section 17.4 "No more than one return statement", and is also used in the Linux
kernel. I think it comes down to personal preference, so unless Will or Julien
has a strong preference against it, I would rather keep it.

>
>> +	}
>> +
>> +	if (has_msix && (u32)bar_num == pba->bar) {
>> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>> +					 pba->size, false,
>> +					 vfio_pci_msix_pba_access, pdev);
>> +		goto out;
>> +	}
>> +
>> +	ret = vfio_map_region(kvm, vdev, region);
>> +out:
>> +	return ret;
>> +}
>> +
>> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
>> +				   struct pci_device_header *pci_hdr,
>> +				   int bar_num, void *data)
>> +{
>> +	struct vfio_device *vdev = data;
>> +	struct vfio_pci_device *pdev = &vdev->pci;
>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>> +	struct vfio_region *region;
>> +	bool has_msix, success;
>> +	int ret;
>> +
>> +	assert((u32)bar_num < vdev->info.num_regions);
>> +
>> +	region = &vdev->regions[bar_num];
>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>> +
>> +	if (has_msix && (u32)bar_num == table->bar) {
>> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
>> +		/* kvm__deregister_mmio fails when the region is not found. */
>> +		ret = (success ? 0 : -ENOENT);
>> +		if (ret < 0 || table->bar!= pba->bar)
>> +			goto out;
>> +	}
>> +
>> +	if (has_msix && (u32)bar_num == pba->bar) {
>> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
>> +		ret = (success ? 0 : -ENOENT);
>> +		goto out;
>> +	}
>> +
>> +	vfio_unmap_region(kvm, region);
>> +	ret = 0;
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>  			      u8 offset, void *data, int sz)
>>  {
>> @@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>  		ret = -ENOMEM;
>>  		goto out_free;
>>  	}
>> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>> -
>> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
>> -				 false, vfio_pci_msix_table_access, pdev);
>> -	if (ret < 0)
>> -		goto out_free;
>>  
>>  	/*
>>  	 * We could map the physical PBA directly into the guest, but it's
>> @@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>>  	 * virtual PBA.
>>  	 */
>> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
>> -				 vfio_pci_msix_pba_access, pdev);
>> -	if (ret < 0)
>> -		goto out_free;
>> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>  
>>  	pdev->msix.entries = entries;
>>  	pdev->msix.nr_entries = nr_entries;
>> @@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>>  	}
>>  
>> -	/* Map the BARs into the guest or setup a trap region. */
>> -	ret = vfio_map_region(kvm, vdev, region);
>> -	if (ret)
>> -		return ret;
>> -
>>  	return 0;
>>  }
>>  
>> @@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>>  	}
>>  
>>  	/* We've configured the BARs, fake up a Configuration Space */
>> -	return vfio_pci_fixup_cfg_space(vdev);
>> +	ret = vfio_pci_fixup_cfg_space(vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
>> +					 vfio_pci_bar_deactivate, vdev);
>>  }
>>  
>>  /*
>> diff --git a/virtio/pci.c b/virtio/pci.c
>> index d111dc499f5e..598da699c241 100644
>> --- a/virtio/pci.c
>> +++ b/virtio/pci.c
>> @@ -11,6 +11,7 @@
>>  #include <sys/ioctl.h>
>>  #include <linux/virtio_pci.h>
>>  #include <linux/byteorder.h>
>> +#include <assert.h>
>>  #include <string.h>
>>  
>>  static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
>> @@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>>  }
>>  
>> +static int virtio_pci__bar_activate(struct kvm *kvm,
>> +				    struct pci_device_header *pci_hdr,
>> +				    int bar_num, void *data)
>> +{
>> +	struct virtio_device *vdev = data;
>> +	u32 bar_addr, bar_size;
>> +	int r = -EINVAL;
>> +
>> +	assert(bar_num <= 2);
>> +
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>> +
>> +	switch (bar_num) {
>> +	case 0:
>> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
>> +				     bar_size, vdev);
>> +		if (r > 0)
>> +			r = 0;
>> +		break;
>> +	case 1:
>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>> +					virtio_pci__io_mmio_callback, vdev);
>> +		break;
>> +	case 2:
>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>> +					virtio_pci__msix_mmio_callback, vdev);
> I think adding a break; here looks nicer.

Sure, it will make the function look more consistent.

Thanks,
Alex
>
> Cheers,
> Andre
>
>
>> +	}
>> +
>> +	return r;
>> +}
>> +
>> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
>> +				      struct pci_device_header *pci_hdr,
>> +				      int bar_num, void *data)
>> +{
>> +	u32 bar_addr;
>> +	bool success;
>> +	int r = -EINVAL;
>> +
>> +	assert(bar_num <= 2);
>> +
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +
>> +	switch (bar_num) {
>> +	case 0:
>> +		r = ioport__unregister(kvm, bar_addr);
>> +		break;
>> +	case 1:
>> +	case 2:
>> +		success = kvm__deregister_mmio(kvm, bar_addr);
>> +		/* kvm__deregister_mmio fails when the region is not found. */
>> +		r = (success ? 0 : -ENOENT);
>> +	}
>> +
>> +	return r;
>> +}
>> +
>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  		     int device_id, int subsys_id, int class)
>>  {
>> @@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>>  
>>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
>> -			     vdev);
>> -	if (r < 0)
>> -		return r;
>> -	port_addr = (u16)r;
>> -
>>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>> -			       virtio_pci__io_mmio_callback, vdev);
>> -	if (r < 0)
>> -		goto free_ioport;
>> -
>>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>> -			       virtio_pci__msix_mmio_callback, vdev);
>> -	if (r < 0)
>> -		goto free_mmio;
>>  
>>  	vpci->pci_hdr = (struct pci_device_header) {
>>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> @@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>>  	};
>>  
>> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
>> +				      virtio_pci__bar_activate,
>> +				      virtio_pci__bar_deactivate, vdev);
>> +	if (r < 0)
>> +		return r;
>> +
>>  	vpci->dev_hdr = (struct device_header) {
>>  		.bus_type		= DEVICE_BUS_PCI,
>>  		.data			= &vpci->pci_hdr,
>> @@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  
>>  	r = device__register(&vpci->dev_hdr);
>>  	if (r < 0)
>> -		goto free_msix_mmio;
>> +		return r;
>>  
>>  	/* save the IRQ that device__register() has allocated */
>>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>>  
>>  	return 0;
>> -
>> -free_msix_mmio:
>> -	kvm__deregister_mmio(kvm, msix_io_block);
>> -free_mmio:
>> -	kvm__deregister_mmio(kvm, mmio_addr);
>> -free_ioport:
>> -	ioport__unregister(kvm, port_addr);
>> -	return r;
>>  }
>>  
>>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation
  2020-04-03 18:14     ` Alexandru Elisei
@ 2020-04-03 19:08       ` André Przywara
  2020-05-05 13:30         ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-03 19:08 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	Alexandru Elisei

On 03/04/2020 19:14, Alexandru Elisei wrote:
> Hi,
> 
> On 4/3/20 12:57 PM, André Przywara wrote:
>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>
>> Hi,
>>
>>> From: Alexandru Elisei <alexandru.elisei@gmail.com>
>>>
>>> Implement callbacks for activating and deactivating emulation for a BAR
>>> region. This is in preparation for allowing a guest operating system to
>>> enable and disable access to I/O or memory space, or to reassign the
>>> BARs.
>>>
>>> The emulated vesa device framebuffer isn't designed to allow stopping and
>>> restarting at arbitrary points in the guest execution. Furthermore, on x86,
>>> the kernel will not change the BAR addresses, which on bare metal are
>>> programmed by the firmware, so take the easy way out and refuse to
>>> activate/deactivate emulation for the BAR regions. We also take this
>>> opportunity to make the vesa emulation code more consistent by moving all
>>> static variable definitions to one place, at the top of the file.
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>  hw/vesa.c         |  70 ++++++++++++++++++++------------
>>>  include/kvm/pci.h |  18 ++++++++-
>>>  pci.c             |  44 ++++++++++++++++++++
>>>  vfio/pci.c        | 100 ++++++++++++++++++++++++++++++++++++++--------
>>>  virtio/pci.c      |  90 ++++++++++++++++++++++++++++++-----------
>>>  5 files changed, 254 insertions(+), 68 deletions(-)
>>>
>>> diff --git a/hw/vesa.c b/hw/vesa.c
>>> index 8071ad153f27..31c2d16ae4de 100644
>>> --- a/hw/vesa.c
>>> +++ b/hw/vesa.c
>>> @@ -18,6 +18,31 @@
>>>  #include <inttypes.h>
>>>  #include <unistd.h>
>>>  
>>> +static struct pci_device_header vesa_pci_device = {
>>> +	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> +	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
>>> +	.header_type	= PCI_HEADER_TYPE_NORMAL,
>>> +	.revision_id	= 0,
>>> +	.class[2]	= 0x03,
>>> +	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>>> +	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>>> +	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>>> +	.bar_size[1]	= VESA_MEM_SIZE,
>>> +};
>>> +
>>> +static struct device_header vesa_device = {
>>> +	.bus_type	= DEVICE_BUS_PCI,
>>> +	.data		= &vesa_pci_device,
>>> +};
>>> +
>>> +static struct framebuffer vesafb = {
>>> +	.width		= VESA_WIDTH,
>>> +	.height		= VESA_HEIGHT,
>>> +	.depth		= VESA_BPP,
>>> +	.mem_addr	= VESA_MEM_ADDR,
>>> +	.mem_size	= VESA_MEM_SIZE,
>>> +};
>>> +
>>>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>>>  {
>>>  	return true;
>>> @@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
>>>  	.io_out			= vesa_pci_io_out,
>>>  };
>>>  
>>> -static struct pci_device_header vesa_pci_device = {
>>> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>>> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
>>> -	.revision_id		= 0,
>>> -	.class[2]		= 0x03,
>>> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>>> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>>> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>>> -	.bar_size[1]		= VESA_MEM_SIZE,
>>> -};
>>> -
>>> -static struct device_header vesa_device = {
>>> -	.bus_type	= DEVICE_BUS_PCI,
>>> -	.data		= &vesa_pci_device,
>>> -};
>>> +static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +			      int bar_num, void *data)
>>> +{
>>> +	/* We don't support remapping of the framebuffer. */
>>> +	return 0;
>>> +}
>>>  
>>> -static struct framebuffer vesafb;
>>> +static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +				int bar_num, void *data)
>>> +{
>>> +	/* We don't support remapping of the framebuffer. */
>>> +	return -EINVAL;
>>> +}
>>>  
>>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>>  {
>>> @@ -73,6 +93,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>>  
>>>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>>>  	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>>> +	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
>>> +				      vesa__bar_deactivate, NULL);
>>> +	if (r < 0)
>>> +		goto unregister_ioport;
>>> +
>>>  	r = device__register(&vesa_device);
>>>  	if (r < 0)
>>>  		goto unregister_ioport;
>>> @@ -87,15 +112,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>>  	if (r < 0)
>>>  		goto unmap_dev;
>>>  
>>> -	vesafb = (struct framebuffer) {
>>> -		.width			= VESA_WIDTH,
>>> -		.height			= VESA_HEIGHT,
>>> -		.depth			= VESA_BPP,
>>> -		.mem			= mem,
>>> -		.mem_addr		= VESA_MEM_ADDR,
>>> -		.mem_size		= VESA_MEM_SIZE,
>>> -		.kvm			= kvm,
>>> -	};
>>> +	vesafb.mem = mem;
>>> +	vesafb.kvm = kvm;
>>>  	return fb__register(&vesafb);
>>>  
>>>  unmap_dev:
>> Those transformations look correct to me.
>>
>>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>>> index adb4b5c082d5..1d7d4c0cea5a 100644
>>> --- a/include/kvm/pci.h
>>> +++ b/include/kvm/pci.h
>>> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>>>  	u8	next;
>>>  };
>>>  
>>> +struct pci_device_header;
>>> +
>>> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>>> +				 struct pci_device_header *pci_hdr,
>>> +				 int bar_num, void *data);
>>> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>> +				   struct pci_device_header *pci_hdr,
>>> +				   int bar_num, void *data);
>>> +
>>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>>  #define PCI_DEV_CFG_SIZE	256
>>>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>  
>>> -struct pci_device_header;
>>> -
>>>  struct pci_config_operations {
>>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>  		      u8 offset, void *data, int sz);
>>> @@ -136,6 +143,9 @@ struct pci_device_header {
>>>  
>>>  	/* Private to lkvm */
>>>  	u32		bar_size[6];
>>> +	bar_activate_fn_t	bar_activate_fn;
>>> +	bar_deactivate_fn_t	bar_deactivate_fn;
>>> +	void *data;
>>>  	struct pci_config_operations	cfg_ops;
>>>  	/*
>>>  	 * PCI INTx# are level-triggered, but virtual device often feature
>>> @@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>>>  
>>>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>>>  
>>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +			      bar_activate_fn_t bar_activate_fn,
>>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
>>> +
>>>  static inline bool __pci__memory_space_enabled(u16 command)
>>>  {
>>>  	return command & PCI_COMMAND_MEMORY;
>>> diff --git a/pci.c b/pci.c
>>> index 611e2c0bf1da..4ace190898f2 100644
>>> --- a/pci.c
>>> +++ b/pci.c
>>> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>>>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>>>  }
>>>  
>>> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>>> +{
>>> +	return pci__bar_size(pci_hdr, bar_num);
>>> +}
>>> +
>>>  static void *pci_config_address_ptr(u16 port)
>>>  {
>>>  	unsigned long offset;
>>> @@ -273,6 +278,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>>>  	return hdr->data;
>>>  }
>>>  
>>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +			      bar_activate_fn_t bar_activate_fn,
>>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
>>> +{
>>> +	int i, r;
>>> +	bool has_bar_regions = false;
>>> +
>>> +	assert(bar_activate_fn && bar_deactivate_fn);
>>> +
>>> +	pci_hdr->bar_activate_fn = bar_activate_fn;
>>> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
>>> +	pci_hdr->data = data;
>>> +
>>> +	for (i = 0; i < 6; i++) {
>>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>>> +			continue;
>>> +
>>> +		has_bar_regions = true;
>>> +
>>> +		if (pci__bar_is_io(pci_hdr, i) &&
>>> +		    pci__io_space_enabled(pci_hdr)) {
>>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>>> +				if (r < 0)
>>> +					return r;
>>> +			}
>> Indentation seems to be off here, I think the last 4 lines need to have
>> one tab removed.
>>
>>> +
>>> +		if (pci__bar_is_memory(pci_hdr, i) &&
>>> +		    pci__memory_space_enabled(pci_hdr)) {
>>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>>> +				if (r < 0)
>>> +					return r;
>>> +			}
>> Same indentation issue here.
> 
> Nicely spotted, I'll fix it.
> 
>>
>>> +	}
>>> +
>>> +	assert(has_bar_regions);
>> Is assert() here really a good idea? I see that it makes sense for our
>> emulated devices, but is that a valid check for VFIO?
>> From briefly looking I can't find a requirement for having at least one
>> valid BAR in general, and even if - I think we should rather return an
>> error than aborting the guest here - or ignore it altogether.
> 
> The assert here is to discover coding errors with devices, not with the PCI
> emulation. Calling pci__register_bar_regions and providing callbacks for when BAR
> access is toggled, but *without* any valid BARs looks like a coding error in the
> device emulation code to me.

As I said, I totally see the point for our emulated devices, but it
looks like we use this code also for VFIO? Where we are not in control
of what the device exposes.

> As for VFIO, I'm struggling to find a valid reason for someone to build a device
> that uses PCI, but doesn't have any BARs. Isn't that the entire point of PCI? I'm
> perfectly happy to remove the assert if you can provide a rationale for building
> such a device.

IIRC you have an AMD box, check the "Northbridge" PCI device there,
devices 0:18.x. They provide chipset registers via (extended) config
space only, they don't have any valid BARs. Also I found some SMBus
device without BARs.
Not the most prominent use case (especially for pass through), but
apparently valid.
I think the rationale for using this was to use a well established,
supported and discoverable interface.

>>
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  int pci__init(struct kvm *kvm)
>>>  {
>>>  	int r;
>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>> index 8b2a0c8dbac3..18e22a8c5320 100644
>>> --- a/vfio/pci.c
>>> +++ b/vfio/pci.c
>>> @@ -8,6 +8,8 @@
>>>  #include <sys/resource.h>
>>>  #include <sys/time.h>
>>>  
>>> +#include <assert.h>
>>> +
>>>  /* Wrapper around UAPI vfio_irq_set */
>>>  union vfio_irq_eventfd {
>>>  	struct vfio_irq_set	irq;
>>> @@ -446,6 +448,81 @@ out_unlock:
>>>  	mutex_unlock(&pdev->msi.mutex);
>>>  }
>>>  
>>> +static int vfio_pci_bar_activate(struct kvm *kvm,
>>> +				 struct pci_device_header *pci_hdr,
>>> +				 int bar_num, void *data)
>>> +{
>>> +	struct vfio_device *vdev = data;
>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>> +	struct vfio_region *region;
>>> +	bool has_msix;
>>> +	int ret;
>>> +
>>> +	assert((u32)bar_num < vdev->info.num_regions);
>>> +
>>> +	region = &vdev->regions[bar_num];
>>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>> +
>>> +	if (has_msix && (u32)bar_num == table->bar) {
>>> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>> +					 table->size, false,
>>> +					 vfio_pci_msix_table_access, pdev);
>>> +		if (ret < 0 || table->bar != pba->bar)
>> I think this second expression deserves some comment.
>> If I understand correctly, this would register the PBA trap handler
>> separately below if both the MSIX table and the PBA share a BAR?
> 
> The MSIX table and the PBA structure can share the same BAR for the base address
> (that's why the MSIX capability has an offset field for both of them), but we
> register different regions for mmio emulation because we don't want to have a
> generic handler and always check if the mmio access was to the MSIX table or the
> PBA structure. I can add a comment stating that, sure.

Yes, thanks for the explanation!

>>
>>> +			goto out;
>> Is there any particular reason you are using goto here? I find it more
>> confusing if the "out:" label has just a return statement, without any
>> cleanup or lock dropping. Just a "return ret;" here would be much
>> cleaner I think. Same for other occasions in this function and
>> elsewhere in this patch.
>>
>> Or do you plan on adding some code here later? I don't see it in this
>> series though.
> 
> The reason I'm doing this is because I prefer one exit point from the function,
> instead of return statements at arbitrary points in the function body. As a point
> of reference, the pattern is recommended in the MISRA C standard for safety, in
> section 17.4 "No more than one return statement", and is also used in the Linux
> kernel. I think it comes down to personal preference, so unless Will or Julien
> has a strong preference against it, I would rather keep it.

Fair enough, your decision. Just to point out that I can't find this
practice mandated in the kernel; in fact, the documentation says otherwise:

Documentation/process/coding-style.rst, section "7) Centralized exiting
of functions":
"The goto statement comes in handy when a function exits from multiple
locations and some common work such as cleanup has to be done.  If there
is no cleanup needed then just return directly."


Thanks!
Andre.

> 
>>
>>> +	}
>>> +
>>> +	if (has_msix && (u32)bar_num == pba->bar) {
>>> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>> +					 pba->size, false,
>>> +					 vfio_pci_msix_pba_access, pdev);
>>> +		goto out;
>>> +	}
>>> +
>>> +	ret = vfio_map_region(kvm, vdev, region);
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
>>> +				   struct pci_device_header *pci_hdr,
>>> +				   int bar_num, void *data)
>>> +{
>>> +	struct vfio_device *vdev = data;
>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>> +	struct vfio_region *region;
>>> +	bool has_msix, success;
>>> +	int ret;
>>> +
>>> +	assert((u32)bar_num < vdev->info.num_regions);
>>> +
>>> +	region = &vdev->regions[bar_num];
>>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>> +
>>> +	if (has_msix && (u32)bar_num == table->bar) {
>>> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
>>> +		/* kvm__deregister_mmio fails when the region is not found. */
>>> +		ret = (success ? 0 : -ENOENT);
>>> +		if (ret < 0 || table->bar!= pba->bar)
>>> +			goto out;
>>> +	}
>>> +
>>> +	if (has_msix && (u32)bar_num == pba->bar) {
>>> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
>>> +		ret = (success ? 0 : -ENOENT);
>>> +		goto out;
>>> +	}
>>> +
>>> +	vfio_unmap_region(kvm, region);
>>> +	ret = 0;
>>> +
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>  			      u8 offset, void *data, int sz)
>>>  {
>>> @@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>  		ret = -ENOMEM;
>>>  		goto out_free;
>>>  	}
>>> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>> -
>>> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
>>> -				 false, vfio_pci_msix_table_access, pdev);
>>> -	if (ret < 0)
>>> -		goto out_free;
>>>  
>>>  	/*
>>>  	 * We could map the physical PBA directly into the guest, but it's
>>> @@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>>>  	 * virtual PBA.
>>>  	 */
>>> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
>>> -				 vfio_pci_msix_pba_access, pdev);
>>> -	if (ret < 0)
>>> -		goto out_free;
>>> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>>  
>>>  	pdev->msix.entries = entries;
>>>  	pdev->msix.nr_entries = nr_entries;
>>> @@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>>>  	}
>>>  
>>> -	/* Map the BARs into the guest or setup a trap region. */
>>> -	ret = vfio_map_region(kvm, vdev, region);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>>  	return 0;
>>>  }
>>>  
>>> @@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>>>  	}
>>>  
>>>  	/* We've configured the BARs, fake up a Configuration Space */
>>> -	return vfio_pci_fixup_cfg_space(vdev);
>>> +	ret = vfio_pci_fixup_cfg_space(vdev);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
>>> +					 vfio_pci_bar_deactivate, vdev);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virtio/pci.c b/virtio/pci.c
>>> index d111dc499f5e..598da699c241 100644
>>> --- a/virtio/pci.c
>>> +++ b/virtio/pci.c
>>> @@ -11,6 +11,7 @@
>>>  #include <sys/ioctl.h>
>>>  #include <linux/virtio_pci.h>
>>>  #include <linux/byteorder.h>
>>> +#include <assert.h>
>>>  #include <string.h>
>>>  
>>>  static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
>>> @@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>>>  }
>>>  
>>> +static int virtio_pci__bar_activate(struct kvm *kvm,
>>> +				    struct pci_device_header *pci_hdr,
>>> +				    int bar_num, void *data)
>>> +{
>>> +	struct virtio_device *vdev = data;
>>> +	u32 bar_addr, bar_size;
>>> +	int r = -EINVAL;
>>> +
>>> +	assert(bar_num <= 2);
>>> +
>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>>> +
>>> +	switch (bar_num) {
>>> +	case 0:
>>> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
>>> +				     bar_size, vdev);
>>> +		if (r > 0)
>>> +			r = 0;
>>> +		break;
>>> +	case 1:
>>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>> +					virtio_pci__io_mmio_callback, vdev);
>>> +		break;
>>> +	case 2:
>>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>> +					virtio_pci__msix_mmio_callback, vdev);
>> I think adding a break; here looks nicer.
> 
> Sure, it will make the function look more consistent.
> 
> Thanks,
> Alex
>>
>> Cheers,
>> Andre
>>
>>
>>> +	}
>>> +
>>> +	return r;
>>> +}
>>> +
>>> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
>>> +				      struct pci_device_header *pci_hdr,
>>> +				      int bar_num, void *data)
>>> +{
>>> +	u32 bar_addr;
>>> +	bool success;
>>> +	int r = -EINVAL;
>>> +
>>> +	assert(bar_num <= 2);
>>> +
>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>> +
>>> +	switch (bar_num) {
>>> +	case 0:
>>> +		r = ioport__unregister(kvm, bar_addr);
>>> +		break;
>>> +	case 1:
>>> +	case 2:
>>> +		success = kvm__deregister_mmio(kvm, bar_addr);
>>> +		/* kvm__deregister_mmio fails when the region is not found. */
>>> +		r = (success ? 0 : -ENOENT);
>>> +	}
>>> +
>>> +	return r;
>>> +}
>>> +
>>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  		     int device_id, int subsys_id, int class)
>>>  {
>>> @@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>>>  
>>>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
>>> -			     vdev);
>>> -	if (r < 0)
>>> -		return r;
>>> -	port_addr = (u16)r;
>>> -
>>>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>>> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>>> -			       virtio_pci__io_mmio_callback, vdev);
>>> -	if (r < 0)
>>> -		goto free_ioport;
>>> -
>>>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>>> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>>> -			       virtio_pci__msix_mmio_callback, vdev);
>>> -	if (r < 0)
>>> -		goto free_mmio;
>>>  
>>>  	vpci->pci_hdr = (struct pci_device_header) {
>>>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> @@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>>>  	};
>>>  
>>> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
>>> +				      virtio_pci__bar_activate,
>>> +				      virtio_pci__bar_deactivate, vdev);
>>> +	if (r < 0)
>>> +		return r;
>>> +
>>>  	vpci->dev_hdr = (struct device_header) {
>>>  		.bus_type		= DEVICE_BUS_PCI,
>>>  		.data			= &vpci->pci_hdr,
>>> @@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  
>>>  	r = device__register(&vpci->dev_hdr);
>>>  	if (r < 0)
>>> -		goto free_msix_mmio;
>>> +		return r;
>>>  
>>>  	/* save the IRQ that device__register() has allocated */
>>>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>>>  
>>>  	return 0;
>>> -
>>> -free_msix_mmio:
>>> -	kvm__deregister_mmio(kvm, msix_io_block);
>>> -free_mmio:
>>> -	kvm__deregister_mmio(kvm, mmio_addr);
>>> -free_ioport:
>>> -	ioport__unregister(kvm, port_addr);
>>> -	return r;
>>>  }
>>>  
>>>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
>>>



* Re: [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation
  2020-03-26 15:24 ` [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
@ 2020-04-06 14:03   ` André Przywara
  0 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-04-06 14:03 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> During configuration of the BAR addresses, a Linux guest disables and
> enables access to I/O and memory space. When access is disabled, we don't
> stop emulating the memory regions described by the BARs. Now that we have
> callbacks for activating and deactivating emulation for a BAR region,
> let's use that to stop emulation when access is disabled, and
> re-activate it when access is re-enabled.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/pci.c b/pci.c
> index 4ace190898f2..c2860e6707fe 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -163,6 +163,42 @@ static struct ioport_operations pci_config_data_ops = {
>  	.io_out	= pci_config_data_out,
>  };
>  
> +static void pci_config_command_wr(struct kvm *kvm,
> +				  struct pci_device_header *pci_hdr,
> +				  u16 new_command)
> +{
> +	int i;
> +	bool toggle_io, toggle_mem;
> +
> +	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
> +	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
> +
> +	for (i = 0; i < 6; i++) {
> +		if (!pci_bar_is_implemented(pci_hdr, i))
> +			continue;
> +
> +		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
> +			if (__pci__io_space_enabled(new_command))
> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> +							 pci_hdr->data);
> +			else
> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> +							   pci_hdr->data);
> +		}
> +
> +		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
> +			if (__pci__memory_space_enabled(new_command))
> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> +							 pci_hdr->data);
> +			else
> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> +							   pci_hdr->data);
> +		}
> +	}
> +
> +	pci_hdr->command = new_command;
> +}
> +
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> @@ -188,6 +224,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	if (*(u32 *)(base + offset) == 0)
>  		return;
>  
> +	if (offset == PCI_COMMAND) {
> +		memcpy(&value, data, size);
> +		pci_config_command_wr(kvm, pci_hdr, (u16)value);
> +		return;
> +	}
> +
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>  
>  	/*
> 



* Re: [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs
  2020-03-26 15:24 ` [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-04-06 14:05   ` André Przywara
  2020-05-06 12:05     ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-04-06 14:05 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> BARs are used by the guest to configure the access to the PCI device by
> writing the address to which the device will respond. The basic idea for
> adding support for reassignable BARs is straightforward: deactivate
> emulation for the memory region described by the old BAR value, and
> activate emulation for the new region.
> 
> BAR reassignment can be done while device access is enabled and memory
> regions for different devices can overlap as long as no access is made
> to the overlapping memory regions. This means that it is legal for the
> BARs of two distinct devices to point to an overlapping memory region,
> and indeed, this is how Linux does resource assignment at boot. To
> account for this situation, the simple algorithm described above is
> enhanced to scan for all devices and:
> 
> - Deactivate emulation for any BARs that might overlap with the new BAR
>   value.
> 
> - Enable emulation for any BARs that were overlapping with the old value
>   after the BAR has been updated.
> 
> Activating/deactivating emulation of a memory region has side effects.
> In order to prevent the execution of the same callback twice we now keep
> track of the state of the region emulation. For example, this can happen
> if we program a BAR with an address that overlaps a second BAR, thus
> deactivating emulation for the second BAR, and then we disable all
> region accesses to the second BAR by writing to the command register.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/pci.h |  14 ++-
>  pci.c             | 273 +++++++++++++++++++++++++++++++++++++---------
>  vfio/pci.c        |  12 ++
>  3 files changed, 244 insertions(+), 55 deletions(-)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 1d7d4c0cea5a..be75f77fd2cb 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -11,6 +11,17 @@
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
>  
> +#define pci_dev_err(pci_hdr, fmt, ...) \
> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_warn(pci_hdr, fmt, ...) \
> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_info(pci_hdr, fmt, ...) \
> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_die(pci_hdr, fmt, ...) \
> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +
>  /*
>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
> @@ -142,7 +153,8 @@ struct pci_device_header {
>  	};
>  
>  	/* Private to lkvm */
> -	u32		bar_size[6];
> +	u32			bar_size[6];
> +	bool			bar_active[6];
>  	bar_activate_fn_t	bar_activate_fn;
>  	bar_deactivate_fn_t	bar_deactivate_fn;
>  	void *data;
> diff --git a/pci.c b/pci.c
> index c2860e6707fe..68ece65441a6 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
>  	return pci__bar_size(pci_hdr, bar_num);
>  }
>  
> +static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return  pci_hdr->bar_active[bar_num];
> +}
> +
>  static void *pci_config_address_ptr(u16 port)
>  {
>  	unsigned long offset;
> @@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
>  	.io_out	= pci_config_data_out,
>  };
>  
> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			    int bar_num)
> +{
> +	int r = 0;
> +
> +	if (pci_bar_is_active(pci_hdr, bar_num))
> +		goto out;
> +
> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
> +			    bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_active[bar_num] = true;
> +
> +out:
> +	return r;
> +}
> +
> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      int bar_num)
> +{
> +	int r = 0;
> +
> +	if (!pci_bar_is_active(pci_hdr, bar_num))
> +		goto out;
> +
> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
> +			    bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_active[bar_num] = false;
> +
> +out:
> +	return r;
> +}
> +
>  static void pci_config_command_wr(struct kvm *kvm,
>  				  struct pci_device_header *pci_hdr,
>  				  u16 new_command)
> @@ -179,26 +224,179 @@ static void pci_config_command_wr(struct kvm *kvm,
>  
>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>  			if (__pci__io_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
>  			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  
>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>  			if (__pci__memory_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
>  			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  	}
>  
>  	pci_hdr->command = new_command;
>  }
>  
> +static int pci_deactivate_bar_regions(struct kvm *kvm,
> +				      struct pci_device_header *pci_hdr,

What is this argument for? I don't see it used, also this function is
not used as a callback. A leftover from some refactoring?

> +				      u32 start, u32 size)
> +{
> +	struct device_header *dev_hdr;
> +	struct pci_device_header *tmp_hdr;
> +	u32 tmp_addr, tmp_size;
> +	int i, r;
> +
> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
> +	while (dev_hdr) {
> +		tmp_hdr = dev_hdr->data;
> +		for (i = 0; i < 6; i++) {
> +			if (!pci_bar_is_implemented(tmp_hdr, i))
> +				continue;
> +
> +			tmp_addr = pci__bar_address(tmp_hdr, i);
> +			tmp_size = pci__bar_size(tmp_hdr, i);
> +
> +			if (tmp_addr + tmp_size <= start ||
> +			    tmp_addr >= start + size)
> +				continue;
> +
> +			r = pci_deactivate_bar(kvm, tmp_hdr, i);

This and the function below are identical, apart from this line here.
Any chance we can merge this?

> +			if (r < 0)
> +				return r;
> +		}
> +		dev_hdr = device__next_dev(dev_hdr);
> +	}
> +
> +	return 0;
> +}
> +
> +static int pci_activate_bar_regions(struct kvm *kvm,
> +				    struct pci_device_header *pci_hdr,

Same question about this parameter, we don't seem to need it. But you
should merge those two functions anyway.

The rest seems to look alright. The pci_config_bar_wr() looks a bit
lengthy, but I guess there is no way around it, since this is the gut of
the algorithm explained in the commit message. Can you maybe add a
condensed version of that explanation to the code?

Thanks,
Andre


> +				    u32 start, u32 size)
> +{
> +	struct device_header *dev_hdr;
> +	struct pci_device_header *tmp_hdr;
> +	u32 tmp_addr, tmp_size;
> +	int i, r;
> +
> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
> +	while (dev_hdr) {
> +		tmp_hdr = dev_hdr->data;
> +		for (i = 0; i < 6; i++) {
> +			if (!pci_bar_is_implemented(tmp_hdr, i))
> +				continue;
> +
> +			tmp_addr = pci__bar_address(tmp_hdr, i);
> +			tmp_size = pci__bar_size(tmp_hdr, i);
> +
> +			if (tmp_addr + tmp_size <= start ||
> +			    tmp_addr >= start + size)
> +				continue;
> +
> +			r = pci_activate_bar(kvm, tmp_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
> +		dev_hdr = device__next_dev(dev_hdr);
> +	}
> +
> +	return 0;
> +}
> +
> +static void pci_config_bar_wr(struct kvm *kvm,
> +			      struct pci_device_header *pci_hdr, int bar_num,
> +			      u32 value)
> +{
> +	u32 old_addr, new_addr, bar_size;
> +	u32 mask;
> +	int r;
> +
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> +	else
> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> +
> +	/*
> +	 * If the kernel masks the BAR, it will expect to find the size of the
> +	 * BAR there next time it reads from it. After the kernel reads the
> +	 * size, it will write the address back.
> +	 *
> +	 * According to the PCI local bus specification REV 3.0: The number of
> +	 * upper bits that a device actually implements depends on how much of
> +	 * the address space the device will respond to. A device that wants a 1
> +	 * MB memory address space (using a 32-bit base address register) would
> +	 * build the top 12 bits of the address register, hardwiring the other
> +	 * bits to 0.
> +	 *
> +	 * Furthermore, software can determine how much address space the device
> +	 * requires by writing a value of all 1's to the register and then
> +	 * reading the value back. The device will return 0's in all don't-care
> +	 * address bits, effectively specifying the address space required.
> +	 *
> +	 * Software computes the size of the address space with the formula
> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
> +	 * the BAR. This means that the BAR value that kvmtool should return is
> +	 * B = ~(S - 1).
> +	 */
> +	if (value == 0xffffffff) {
> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
> +		/* Preserve the special bits. */
> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +
> +	/* Don't toggle emulation when region type access is disabled. */
> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
> +	    !pci__io_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
> +	    !pci__memory_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	old_addr = pci__bar_address(pci_hdr, bar_num);
> +	new_addr = __pci__bar_address(value);
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
> +
> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0)
> +		return;
> +
> +	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
> +	if (r < 0) {
> +		/*
> +		 * We cannot update the BAR because of an overlapping region
> +		 * that failed to deactivate emulation, so keep the old BAR
> +		 * value and re-activate emulation for it.
> +		 */
> +		pci_activate_bar(kvm, pci_hdr, bar_num);
> +		return;
> +	}
> +
> +	pci_hdr->bar[bar_num] = value;
> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0) {
> +		/*
> +		 * New region cannot be emulated, re-enable the regions that
> +		 * were overlapping.
> +		 */
> +		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
> +		return;
> +	}
> +
> +	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
> +}
> +
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> @@ -206,7 +404,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> -	u32 mask;
>  
>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>  		return;
> @@ -231,46 +428,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	}
>  
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
> -
> -	/*
> -	 * If the kernel masks the BAR, it will expect to find the size of the
> -	 * BAR there next time it reads from it. After the kernel reads the
> -	 * size, it will write the address back.
> -	 */
>  	if (bar < 6) {
> -		if (pci__bar_is_io(pci_hdr, bar))
> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> -		else
> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> -		/*
> -		 * According to the PCI local bus specification REV 3.0:
> -		 * The number of upper bits that a device actually implements
> -		 * depends on how much of the address space the device will
> -		 * respond to. A device that wants a 1 MB memory address space
> -		 * (using a 32-bit base address register) would build the top
> -		 * 12 bits of the address register, hardwiring the other bits
> -		 * to 0.
> -		 *
> -		 * Furthermore, software can determine how much address space
> -		 * the device requires by writing a value of all 1's to the
> -		 * register and then reading the value back. The device will
> -		 * return 0's in all don't-care address bits, effectively
> -		 * specifying the address space required.
> -		 *
> -		 * Software computes the size of the address space with the
> -		 * formula S = ~B + 1, where S is the memory size and B is the
> -		 * value read from the BAR. This means that the BAR value that
> -		 * kvmtool should return is B = ~(S - 1).
> -		 */
>  		memcpy(&value, data, size);
> -		if (value == 0xffffffff)
> -			value = ~(pci__bar_size(pci_hdr, bar) - 1);
> -		/* Preserve the special bits. */
> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
> -		memcpy(base + offset, &value, size);
> -	} else {
> -		memcpy(base + offset, data, size);
> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
> +		return;
>  	}
> +
> +	memcpy(base + offset, data, size);
>  }
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> @@ -338,20 +502,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>  			continue;
>  
>  		has_bar_regions = true;
> +		assert(!pci_bar_is_active(pci_hdr, i));
>  
>  		if (pci__bar_is_io(pci_hdr, i) &&
>  		    pci__io_space_enabled(pci_hdr)) {
> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
> -				if (r < 0)
> -					return r;
> -			}
> +			r = pci_activate_bar(kvm, pci_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
>  
>  		if (pci__bar_is_memory(pci_hdr, i) &&
>  		    pci__memory_space_enabled(pci_hdr)) {
> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
> -				if (r < 0)
> -					return r;
> -			}
> +			r = pci_activate_bar(kvm, pci_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
>  	}
>  
>  	assert(has_bar_regions);
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 18e22a8c5320..2b891496547d 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -457,6 +457,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>  	struct vfio_region *region;
> +	u32 bar_addr;
>  	bool has_msix;
>  	int ret;
>  
> @@ -465,7 +466,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	region = &vdev->regions[bar_num];
>  	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>  
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		region->port_base = bar_addr;
> +	else
> +		region->guest_phys_addr = bar_addr;
> +
>  	if (has_msix && (u32)bar_num == table->bar) {
> +		table->guest_phys_addr = region->guest_phys_addr;
>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>  					 table->size, false,
>  					 vfio_pci_msix_table_access, pdev);
> @@ -474,6 +482,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	}
>  
>  	if (has_msix && (u32)bar_num == pba->bar) {
> +		if (pba->bar == table->bar)
> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
> +		else
> +			pba->guest_phys_addr = region->guest_phys_addr;
>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>  					 pba->size, false,
>  					 vfio_pci_msix_pba_access, pdev);
> 



* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-03-26 15:24 ` [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2020-04-06 14:06   ` André Przywara
  2020-04-14  8:56     ` Alexandru Elisei
  2020-05-06 13:51     ` Alexandru Elisei
  0 siblings, 2 replies; 68+ messages in thread
From: André Przywara @ 2020-04-06 14:06 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:
> PCI Express comes with an extended addressing scheme, which directly
> translated into a bigger device configuration space (256->4096 bytes)
> and bigger PCI configuration space (16->256 MB), as well as mandatory
> capabilities (power management [1] and PCI Express capability [2]).
> 
> However, our virtio PCI implementation implements version 0.9 of the
> protocol and it still uses transitional PCI device IDs, so we have
> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> management and PCI Express capability are left for a subsequent patch.
> 
> [1] PCI Express Base Specification Revision 1.1, section 7.6
> [2] PCI Express Base Specification Revision 1.1, section 7.8
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/include/arm-common/kvm-arch.h |  4 +-
>  arm/pci.c                         |  2 +-
>  builtin-run.c                     |  1 +
>  include/kvm/kvm-config.h          |  2 +-
>  include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
>  pci.c                             |  5 +-
>  vfio/pci.c                        | 26 +++++++----
>  7 files changed, 96 insertions(+), 20 deletions(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index b9d486d5eac2..13c55fa3dc29 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -23,7 +23,7 @@
>  
>  #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
>  #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)

The existence of this symbol seems somewhat odd, there should be no ARM
specific version of the config space size.
Do you know why we don't use the generic PCI_CFG_SIZE here?
At the very least I would expect something like:
#define ARM_PCI_CFG_SIZE	PCI_EXP_CFG_SIZE

>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>  
> @@ -50,6 +50,8 @@
>  
>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>  
> +#define ARCH_HAS_PCI_EXP	1
> +
>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>  {
>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> diff --git a/arm/pci.c b/arm/pci.c
> index ed325fa4a811..2251f627d8b5 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>  
>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> diff --git a/builtin-run.c b/builtin-run.c
> index 9cb8c75300eb..def8a1f803ad 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -27,6 +27,7 @@
>  #include "kvm/irq.h"
>  #include "kvm/kvm.h"
>  #include "kvm/pci.h"
> +#include "kvm/vfio.h"
>  #include "kvm/rtc.h"
>  #include "kvm/sdl.h"
>  #include "kvm/vnc.h"
> diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
> index a052b0bc7582..a1012c57b7a7 100644
> --- a/include/kvm/kvm-config.h
> +++ b/include/kvm/kvm-config.h
> @@ -2,7 +2,6 @@
>  #define KVM_CONFIG_H_
>  
>  #include "kvm/disk-image.h"
> -#include "kvm/vfio.h"
>  #include "kvm/kvm-config-arch.h"
>  
>  #define DEFAULT_KVM_DEV		"/dev/kvm"
> @@ -18,6 +17,7 @@
>  #define MIN_RAM_SIZE_MB		(64ULL)
>  #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
>  
> +struct vfio_device_params;
>  struct kvm_config {
>  	struct kvm_config_arch arch;
>  	struct disk_image_params disk_image[MAX_DISK_IMAGES];
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index be75f77fd2cb..71ee9d8cb01f 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -10,6 +10,7 @@
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
> +#include "kvm.h"
>  
>  #define pci_dev_err(pci_hdr, fmt, ...) \
>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> @@ -32,9 +33,41 @@
>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>  #define PCI_IO_SIZE		0x100
>  #define PCI_IOPORT_START	0x6200
> -#define PCI_CFG_SIZE		(1ULL << 24)
>  
> -struct kvm;
> +#define PCIE_CAP_REG_VER	0x1
> +#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
> +#define PM_CAP_VER		0x3
> +
> +#ifdef ARCH_HAS_PCI_EXP
> +#define PCI_CFG_SIZE		(1ULL << 28)
> +#define PCI_DEV_CFG_SIZE	4096

Maybe use PCI_CFG_SPACE_EXP_SIZE from pci_regs.h?

> +
> +union pci_config_address {
> +	struct {
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	enable_bit	: 1;		/* 31       */
> +#else
> +		unsigned	enable_bit	: 1;		/* 31       */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +#endif

Just for the records:
I think we agreed on this before, but using a C bitfield to model
hardware defined bits is broken, because the C standard doesn't
guarantee those bits to be consecutive and laid out like we hope it would.
But since we have this issue already with the legacy config space, and
it seems to work (TM), we can fix this later.

> +	};
> +	u32 w;
> +};
> +
> +#else
> +#define PCI_CFG_SIZE		(1ULL << 24)
> +#define PCI_DEV_CFG_SIZE	256

Shall we use PCI_CFG_SPACE_SIZE from the kernel headers here?

>  
>  union pci_config_address {
>  	struct {
> @@ -58,6 +91,8 @@ union pci_config_address {
>  	};
>  	u32 w;
>  };
> +#endif
> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct msix_table {
>  	struct msi_msg msg;
> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pcie_cap {

I guess this is meant to map to the PCI Express Capability Structure as
described in the PCIe spec?
We would need to add the "packed" attribute then. But actually I am not
a fan of using C language constructs to model specified register
arrangements, the kernel tries to avoid this as well.

Actually, looking closer: why do we need this in the first place? I
removed this and struct pm_cap, and it still compiles.
So can we lose those two structures altogether? And move the discussion and
implementation (for VirtIO 1.0?) to a later series?

> +	u8 cap;
> +	u8 next;
> +	u16 cap_reg;
> +	u32 dev_cap;
> +	u16 dev_ctrl;
> +	u16 dev_status;
> +	u32 link_cap;
> +	u16 link_ctrl;
> +	u16 link_status;
> +	u32 slot_cap;
> +	u16 slot_ctrl;
> +	u16 slot_status;
> +	u16 root_ctrl;
> +	u16 root_cap;
> +	u32 root_status;
> +};
> +
> +struct pm_cap {
> +	u8 cap;
> +	u8 next;
> +	u16 pmc;
> +	u16 pmcsr;
> +	u8 pmcsr_bse;
> +	u8 data;
> +};
> +
>  struct pci_device_header;
>  
>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> @@ -110,14 +172,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>  				   int bar_num, void *data);
>  
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> -#define PCI_DEV_CFG_SIZE	256
> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		      u8 offset, void *data, int sz);
> +		      u16 offset, void *data, int sz);
>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		     u8 offset, void *data, int sz);
> +		     u16 offset, void *data, int sz);
>  };
>  
>  struct pci_device_header {
> @@ -147,6 +207,10 @@ struct pci_device_header {
>  			u8		min_gnt;
>  			u8		max_lat;
>  			struct msix_cap msix;
> +#ifdef ARCH_HAS_PCI_EXP
> +			struct pm_cap pm;
> +			struct pcie_cap pcie;
> +#endif
>  		} __attribute__((packed));
>  		/* Pad to PCI config space size */
>  		u8	__pad[PCI_DEV_CFG_SIZE];
> diff --git a/pci.c b/pci.c
> index 68ece65441a6..b471209a6efc 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -400,7 +400,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> -	u8 bar, offset;
> +	u8 bar;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> @@ -439,7 +440,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
> -	u8 offset;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 2b891496547d..6b8726227ea0 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -311,7 +311,7 @@ out_unlock:
>  }
>  
>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> -				    struct vfio_device *vdev, u8 off,
> +				    struct vfio_device *vdev, u16 off,
>  				    void *data, int sz)
>  {
>  	struct vfio_pci_device *pdev = &vdev->pci;
> @@ -343,7 +343,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>  }
>  
>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> -				     u8 off, u8 *data, u32 sz)
> +				     u16 off, u8 *data, u32 sz)
>  {
>  	size_t i;
>  	u32 mask = 0;
> @@ -391,7 +391,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>  }
>  
>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> -				   u8 off, u8 *data, u32 sz)
> +				   u16 off, u8 *data, u32 sz)
>  {
>  	u8 ctrl;
>  	struct msi_msg msg;
> @@ -536,7 +536,7 @@ out:
>  }
>  
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			      u8 offset, void *data, int sz)
> +			      u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -554,7 +554,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>  }
>  
>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			       u8 offset, void *data, int sz)
> +			       u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -638,15 +638,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
>  	int ret;
>  	size_t size;
> -	u8 pos, next;
> +	u16 pos, next;
>  	struct pci_cap_hdr *cap;
> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> +	u8 *virt_hdr;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> +	if (!virt_hdr)
> +		return -errno;

There are two places where we return in this function, we don't seem to
free virt_hdr in those cases. Looks like a job for your beloved goto ;-)

>  
>  	pos = pdev->hdr.capabilities & ~3;
>  
> @@ -682,6 +684,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>  
> +	free(virt_hdr);
> +
>  	return 0;
>  }
>  
> @@ -792,7 +796,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  
>  	/* Install our fake Configuration Space */
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> -	hdr_sz = PCI_DEV_CFG_SIZE;
> +	/*
> +	 * We don't touch the extended configuration space, let's be cautious
> +	 * and not overwrite it all with zeros, or bad things might happen.
> +	 */
> +	hdr_sz = PCI_CFG_SPACE_SIZE;

This breaks compilation on x86, do we miss to include a header file?
Works on arm64, though...

Cheers,
Andre.

>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>  			     hdr_sz);
> 



* Re: [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support
  2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (31 preceding siblings ...)
  2020-03-26 15:24 ` [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2020-04-06 14:14 ` André Przywara
  32 siblings, 0 replies; 68+ messages in thread
From: André Przywara @ 2020-04-06 14:14 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 26/03/2020 15:24, Alexandru Elisei wrote:

Hi Will, Alex,

> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
> it from trying to reassign the BARs. Let's make the BARs reassignable so we
> can get rid of this band-aid.
> 
> Let's also extend the legacy PCI emulation, which came out in 1992, so we
> can properly emulate the PCI Express version 1.1 protocol, which is new in
> comparison, being published in 2005.
> 
> During device configuration, Linux can assign a region resource to a BAR
> that temporarily overlaps with another device. This means that the code
> that emulates BAR reassignment must be aware of all the registered PCI
> devices. Because of this, and to avoid re-implementing the same code for
> each emulated device, the algorithm for activating/deactivating emulation
> as BAR addresses change lives completely inside the PCI code. Each device
> registers two callback functions which are called when device emulation is
> activated (for example, to activate emulation for a newly assigned BAR
> address), respectively, when device emulation is deactivated (a previous
> BAR address is changed, and emulation for that region must be deactivated).

[ ... ]

> 
> Patches 1-18 are fixes and cleanups, and can be merged independently. They
> are pretty straightforward, so if the size of the series looks off-putting,
> please review these first. I am aware that the series has grown quite a
> lot, I am willing to split the fixes from the rest of the patches, or
> whatever else can make reviewing easier.

I am done with the review of this series. I had only one small comment
on the first 18 patches, so I would suggest that Alex send a new
version of only those fix & cleanup patches, so we can fix those bugs and
also make the rest of the series easier to digest.

In general, this series looks to be in really good shape; I don't see
anything major anymore.

Alex, many thanks for the accurate and tedious work on this, surely not
the most gratifying of tasks.

Cheers,
Andre.

> 
> Changes in v3:
> * Patches 18, 24 and 26 are new.
> * Dropped #9 "arm/pci: Fix PCI IO region" for reasons explained above.
> * Reworded commit message for #12 "vfio/pci: Ignore expansion ROM BAR
>   writes" to make it clear that kvmtool's implementation of VFIO doesn't
>   support expansion BAR emulation.
> * Moved variable declaration at the start of the function for #13
>   "vfio/pci: Don't access unallocated regions".
> * Patch #15 "Don't ignore errors registering a device, ioport or mmio
>   emulation" uses ioport__remove consistently.
> * Reworked error cleanup for #16 "hw/vesa: Don't ignore fatal errors".
> * Reworded commit message for #17 "hw/vesa: Set the size for BAR 0".
> * Reworked #19 "ioport: mmio: Use a mutex and reference counting for
>   locking" to also use reference counting to prevent races (and updated the
>   commit message in the process).
> * Added the function pci__bar_size to #20 "pci: Add helpers for BAR values
>   and memory/IO space access".
> * Protect mem_banks list with a mutex in #22 "vfio: Destroy memslot when
>   unmapping the associated VAs"; also moved the munmap after the slot is
>   destroyed, as per the KVM api.
> * Dropped the rework of the vesa device in patch #27 "pci: Implement
>   callbacks for toggling BAR emulation". Also added a few assert statements
>   in some callbacks to make sure that they don't get called with an
>   unexpected BAR number (which can only result from a coding error). Also
>   return -ENOENT when kvm__deregister_mmio fails, to match ioport behavior
>   and for better diagnostics.
> * Dropped the PCI configuration write callback from the vesa device in #28
>   "pci: Toggle BAR I/O and memory space emulation". Apparently, if we don't
>   allow the guest kernel to disable BAR access, it treats the VESA device
>   as a VGA boot device, instead of a regular VGA device, and Xorg cannot
>   use it.
> * Dropped struct bar_info from #29 "pci: Implement reassignable BARs". Also
>   changed pci_dev_err to pci_dev_warn in pci_{activate,deactivate}_bar,
>   because the errors can be benign (they can happen because two vcpus race
>   against each other to deactivate memory or I/O access, for example).
> * Replaced the PCI device configuration space define with the actual
>   number in #32 "arm/arm64: Add PCI Express 1.1 support". On some distros
>   the defines weren't present in /usr/include/linux/pci_regs.h.
> * Implemented other minor review comments.
> * Gathered Reviewed-by tags.
> 
> Changes in v2:
> * Patches 2, 11-18, 20, 22-27, 29 are new.
> * Patches 11, 13, and 14 have been dropped.
> * Reworked the way BAR reassignment is implemented.
> * The patch "Add PCI Express 1.1 support" has been reworked to apply only
>   to arm64. For x86 we would need ACPI support in order to advertise the
>   location of the ECAM space.
> * Gathered Reviewed-by tags.
> * Implemented review comments.
> 
> [1] https://www.spinics.net/lists/kvm/msg209178.html
> [2] https://www.spinics.net/lists/kvm/msg209178.html
> [3] https://www.spinics.net/lists/arm-kernel/msg778623.html
> 
> 
> Alexandru Elisei (27):
>   Makefile: Use correct objcopy binary when cross-compiling for x86_64
>   hw/i8042: Compile only for x86
>   Remove pci-shmem device
>   Check that a PCI device's memory size is power of two
>   arm/pci: Advertise only PCI bus 0 in the DT
>   vfio/pci: Allocate correct size for MSIX table and PBA BARs
>   vfio/pci: Don't assume that only even numbered BARs are 64bit
>   vfio/pci: Ignore expansion ROM BAR writes
>   vfio/pci: Don't access unallocated regions
>   virtio: Don't ignore initialization failures
>   Don't ignore errors registering a device, ioport or mmio emulation
>   hw/vesa: Don't ignore fatal errors
>   hw/vesa: Set the size for BAR 0
>   ioport: Fail when registering overlapping ports
>   ioport: mmio: Use a mutex and reference counting for locking
>   pci: Add helpers for BAR values and memory/IO space access
>   virtio/pci: Get emulated region address from BARs
>   vfio: Destroy memslot when unmapping the associated VAs
>   vfio: Reserve ioports when configuring the BAR
>   pci: Limit configuration transaction size to 32 bits
>   vfio/pci: Don't write configuration value twice
>   vesa: Create device exactly once
>   pci: Implement callbacks for toggling BAR emulation
>   pci: Toggle BAR I/O and memory space emulation
>   pci: Implement reassignable BARs
>   vfio: Trap MMIO access to BAR addresses which aren't page aligned
>   arm/arm64: Add PCI Express 1.1 support
> 
> Julien Thierry (4):
>   ioport: pci: Move port allocations to PCI devices
>   pci: Fix ioport allocation size
>   virtio/pci: Make memory and IO BARs independent
>   arm/fdt: Remove 'linux,pci-probe-only' property
> 
> Sami Mujawar (1):
>   pci: Fix BAR resource sizing arbitration
> 
>  Makefile                          |   6 +-
>  arm/fdt.c                         |   1 -
>  arm/include/arm-common/kvm-arch.h |   4 +-
>  arm/ioport.c                      |   3 +-
>  arm/pci.c                         |   4 +-
>  builtin-run.c                     |   6 +-
>  hw/i8042.c                        |  14 +-
>  hw/pci-shmem.c                    | 400 ------------------------------
>  hw/vesa.c                         | 113 ++++++---
>  include/kvm/devices.h             |   3 +-
>  include/kvm/ioport.h              |  12 +-
>  include/kvm/kvm-config.h          |   2 +-
>  include/kvm/kvm.h                 |  11 +-
>  include/kvm/pci-shmem.h           |  32 ---
>  include/kvm/pci.h                 | 163 +++++++++++-
>  include/kvm/rbtree-interval.h     |   4 +-
>  include/kvm/util.h                |   2 +
>  include/kvm/vesa.h                |   6 +-
>  include/kvm/virtio-pci.h          |   3 -
>  include/kvm/virtio.h              |   7 +-
>  include/linux/compiler.h          |   2 +-
>  ioport.c                          | 108 ++++----
>  kvm.c                             | 101 +++++++-
>  mips/kvm.c                        |   3 +-
>  mmio.c                            |  91 +++++--
>  pci.c                             | 334 +++++++++++++++++++++++--
>  powerpc/include/kvm/kvm-arch.h    |   2 +-
>  powerpc/ioport.c                  |   3 +-
>  powerpc/spapr_pci.c               |   2 +-
>  vfio/core.c                       |  22 +-
>  vfio/pci.c                        | 233 +++++++++++++----
>  virtio/9p.c                       |   9 +-
>  virtio/balloon.c                  |  10 +-
>  virtio/blk.c                      |  14 +-
>  virtio/console.c                  |  11 +-
>  virtio/core.c                     |   9 +-
>  virtio/mmio.c                     |  13 +-
>  virtio/net.c                      |  32 +--
>  virtio/pci.c                      | 206 ++++++++++-----
>  virtio/scsi.c                     |  14 +-
>  x86/include/kvm/kvm-arch.h        |   2 +-
>  x86/ioport.c                      |  66 +++--
>  42 files changed, 1287 insertions(+), 796 deletions(-)
>  delete mode 100644 hw/pci-shmem.c
>  delete mode 100644 include/kvm/pci-shmem.h
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-04-06 14:06   ` André Przywara
@ 2020-04-14  8:56     ` Alexandru Elisei
  2020-05-06 13:51     ` Alexandru Elisei
  1 sibling, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-04-14  8:56 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/6/20 3:06 PM, André Przywara wrote:
>
>> @@ -792,7 +796,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>  
>>  	/* Install our fake Configuration Space */
>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>> -	hdr_sz = PCI_DEV_CFG_SIZE;
>> +	/*
>> +	 * We don't touch the extended configuration space, let's be cautious
>> +	 * and not overwrite it all with zeros, or bad things might happen.
>> +	 */
>> +	hdr_sz = PCI_CFG_SPACE_SIZE;
> This breaks compilation on x86; are we missing a header file include?
> Works on arm64, though...

I've been able to reproduce it on an x86 laptop with Ubuntu 16.04. You've stumbled
upon the reason I switched to using numbers for PCI_DEV_CFG_SIZE instead of using
the defines PCI_CFG_SPACE_SIZE and PCI_CFG_SPACE_EXP_SIZE from linux/pci_regs.h:
those defines don't exist on some distros. When that happened to Sami on arm64, I
made the change to PCI_DEV_CFG_SIZE, but I forgot to change it here. I'll switch
to using 256.

For the record, x86 compiles fine on the x86 machine which I used for testing.

Thanks,
Alex
>
> Cheers,
> Andre.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures
  2020-03-30  9:30   ` André Przywara
@ 2020-04-14 10:27     ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-04-14 10:27 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 3/30/20 10:30 AM, André Przywara wrote:
> Hi,
>
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> Don't ignore an error in the bus specific initialization function in
>> virtio_init; don't ignore the result of virtio_init; and don't return 0
>> in virtio_blk__init and virtio_scsi__init when we encounter an error.
>> Hopefully this will save some developer's time debugging faulty virtio
>> devices in a guest.
>>
>> To take advantage of the cleanup function virtio_blk__exit, we have
>> moved appending the new device to the list before the call to
>> virtio_init.
>>
>> To safeguard against this in the future, virtio_init has been annotated
>> with the compiler attribute warn_unused_result.
> This is a good idea, but actually triggers an unrelated, long standing
> bug in vesa.c (on x86):
> hw/vesa.c: In function ‘vesa__init’:
> hw/vesa.c:77:3: error: ignoring return value of ‘ERR_PTR’, declared with
> attribute warn_unused_result [-Werror=unused-result]
>    ERR_PTR(-errno);
>    ^
> cc1: all warnings being treated as errors
>
> So could you add the missing "return" statement in that line, to fix
> that bug?
> I see that this gets rectified two patches later, but for the sake of
> bisect-ability it would be good to keep this compilable all the way through.

You're right, I'll fix it here.

>
>> diff --git a/virtio/net.c b/virtio/net.c
>> index 091406912a24..425c13ba1136 100644
>> --- a/virtio/net.c
>> +++ b/virtio/net.c
>> @@ -910,7 +910,7 @@ done:
>>  
>>  static int virtio_net__init_one(struct virtio_net_params *params)
>>  {
>> -	int i, err;
>> +	int i, r;
>>  	struct net_dev *ndev;
>>  	struct virtio_ops *ops;
>>  	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
>> @@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>>  		return -ENOMEM;
>>  
>>  	ops = malloc(sizeof(*ops));
>> -	if (ops == NULL) {
>> -		err = -ENOMEM;
>> -		goto err_free_ndev;
>> -	}
>> +	if (ops == NULL)
>> +		return -ENOMEM;
>>  
>>  	list_add_tail(&ndev->list, &ndevs);
> As mentioned in the reply to the v2 version, I think this is still
> leaking memory here.

Yes, the leak is there. I didn't realize that virtio_net__exit doesn't do any
freeing at all.

I find it a bit strange that we copy net_dev_virtio_ops for each device instance.
I suspect that passing the pointer to the static struct to virtio_init would have
been enough (this is what blk does), but I might be missing something. I don't
want to introduce a regression, so I'll keep the memory allocation and free it on
error.
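
For what it's worth, the error-path cleanup described here usually takes the familiar goto-unwind shape. The sketch below uses stubbed-out types and a dummy virtio_init_stub, so it only illustrates the ordering, not kvmtool's real code:

```c
#include <errno.h>
#include <stdlib.h>

/* Stubbed-out stand-ins for kvmtool's struct net_dev / struct virtio_ops. */
struct net_dev { int dummy; };
struct virtio_ops { int dummy; };

static int virtio_init_stub(struct virtio_ops *ops)
{
	(void)ops;
	return 0;	/* pretend initialization succeeded */
}

static int net_init_one(void)
{
	int r;
	struct net_dev *ndev;
	struct virtio_ops *ops;

	ndev = calloc(1, sizeof(*ndev));
	if (!ndev)
		return -ENOMEM;

	ops = malloc(sizeof(*ops));
	if (!ops) {
		r = -ENOMEM;
		goto err_free_ndev;
	}

	r = virtio_init_stub(ops);
	if (r < 0)
		goto err_free_ops;	/* unwind in reverse allocation order */

	/* In the real code both allocations stay live on the device list;
	 * in this sketch they simply leak on success. */
	return 0;

err_free_ops:
	free(ops);
err_free_ndev:
	free(ndev);
	return r;
}
```

The point is that each allocation gets a matching label, so a failure at any step frees exactly what was allocated before it.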

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking
  2020-03-31 11:51   ` André Przywara
@ 2020-05-01 15:30     ` Alexandru Elisei
  2020-05-06 17:40       ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-01 15:30 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 3/31/20 12:51 PM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>
> Hi,
>
>> kvmtool uses brlock for protecting accesses to the ioport and mmio
>> red-black trees. brlock allows concurrent reads, but only one writer, which
>> is assumed not to be a VCPU thread (for more information see commit
>> 0b907ed2eaec ("kvm tools: Add a brlock")). This is done by issuing a
>> compiler barrier on read and pausing the entire virtual machine on writes.
>> When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write
>> lock.
>>
>> When we will implement reassignable BARs, the mmio or ioport mapping will
>> be done as a result of a VCPU mmio access. When brlock is a pthread
>> read/write lock, it means that we will try to acquire a write lock with the
>> read lock already held by the same VCPU and we will deadlock. When it's
>> not, a VCPU will have to call kvm__pause, which means the virtual machine
>> will stay paused forever.
>>
>> Let's avoid all this by using a mutex and reference counting the red-black
>> tree entries. This way we can guarantee that we won't unregister a node
>> that another thread is currently using for emulation.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  include/kvm/ioport.h          |  2 +
>>  include/kvm/rbtree-interval.h |  4 +-
>>  ioport.c                      | 64 +++++++++++++++++-------
>>  mmio.c                        | 91 +++++++++++++++++++++++++----------
>>  4 files changed, 118 insertions(+), 43 deletions(-)
>>
>> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
>> index 62a719327e3f..039633f76bdd 100644
>> --- a/include/kvm/ioport.h
>> +++ b/include/kvm/ioport.h
>> @@ -22,6 +22,8 @@ struct ioport {
>>  	struct ioport_operations	*ops;
>>  	void				*priv;
>>  	struct device_header		dev_hdr;
>> +	u32				refcount;
>> +	bool				remove;
> The use of this extra "remove" variable seems somehow odd. I think
> normally you would initialise the refcount to 1, and let the unregister
> operation do a put as well, with the removal code triggered if the count
> reaches zero. At least this is what kref does, can we do the same here?
> Or is there anything that would prevent it? I think it's a good idea to
> stick to existing design patterns for things like refcounts.
>
> Cheers,
> Andre.
>
You're totally right, it didn't cross my mind to initialize refcount to 1, it's a
great idea, I'll do it like that.
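
The kref-style scheme suggested here (initialize the refcount to 1, make unregister drop that initial reference, and free only when the count reaches zero) could look roughly like this. The node type and function names are illustrative, not the actual kvmtool code:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical node guarded by a mutex-protected refcount; the real code
 * would attach this to the ioport/mmio red-black tree entries. */
struct node {
	unsigned int refcount;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static struct node *node_register(void)
{
	struct node *n = malloc(sizeof(*n));
	if (n)
		n->refcount = 1;	/* the tree holds the initial reference */
	return n;
}

static void node_get(struct node *n)
{
	pthread_mutex_lock(&lock);
	n->refcount++;
	pthread_mutex_unlock(&lock);
}

/* Returns 1 when this put released the last reference and freed the node. */
static int node_put(struct node *n)
{
	int freed = 0;

	pthread_mutex_lock(&lock);
	if (--n->refcount == 0)
		freed = 1;
	pthread_mutex_unlock(&lock);
	if (freed)
		free(n);
	return freed;
}

/* Unregistering just drops the tree's initial reference; the node survives
 * until the last in-flight emulation calls node_put(). */
static int node_unregister(struct node *n)
{
	return node_put(n);
}
```

A VCPU doing emulation would take a reference with node_get() before calling the handler and drop it with node_put() afterwards, so a concurrent unregister can never free a node that is still in use.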

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits
  2020-04-02  8:34   ` André Przywara
@ 2020-05-04 13:00     ` Alexandru Elisei
  2020-05-04 13:37       ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-04 13:00 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/2/20 9:34 AM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus
>> Extension":
>>
>> "The bandwidth requirements for I/O and configuration transactions cannot
>> justify the added complexity, and, therefore, only memory transactions
>> support 64-bit data transfers".
>>
>> Further down, the spec also describes the possible responses of a target
>> which has been requested to do a 64-bit transaction. Limit the transaction
>> to the lower 32 bits, to match the second accepted behaviour.
> That looks like a reasonable behaviour.
> AFAICS there is one code path from powerpc/spapr_pci.c which isn't
> covered by those limitations (rtas_ibm_write_pci_config() ->
> pci__config_wr() -> cfg_ops.write() -> vfio_pci_cfg_write()).
> Same for read.

The code compares the access size to 1, 2 and 4, so I think powerpc doesn't expect
64-bit accesses either. The change looks straightforward; I'll do it for consistency.

>
> I am not sure we really care, maybe you can fix it if you like.
>
> Either way this patch seems right, so:
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks!

Alex

>
> Cheers,
> Andre
>
>> ---
>>  pci.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/pci.c b/pci.c
>> index 7399c76c0819..611e2c0bf1da 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -119,6 +119,9 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>>  {
>>  	union pci_config_address pci_config_address;
>>  
>> +	if (size > 4)
>> +		size = 4;
>> +
>>  	pci_config_address.w = ioport__read32(&pci_config_address_bits);
>>  	/*
>>  	 * If someone accesses PCI configuration space offsets that are not
>> @@ -135,6 +138,9 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>>  {
>>  	union pci_config_address pci_config_address;
>>  
>> +	if (size > 4)
>> +		size = 4;
>> +
>>  	pci_config_address.w = ioport__read32(&pci_config_address_bits);
>>  	/*
>>  	 * If someone accesses PCI configuration space offsets that are not
>> @@ -248,6 +254,9 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>>  	cfg_addr.w		= (u32)addr;
>>  	cfg_addr.enable_bit	= 1;
>>  
>> +	if (len > 4)
>> +		len = 4;
>> +
>>  	if (is_write)
>>  		pci__config_wr(kvm, cfg_addr, data, len);
>>  	else
>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits
  2020-05-04 13:00     ` Alexandru Elisei
@ 2020-05-04 13:37       ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-04 13:37 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 5/4/20 2:00 PM, Alexandru Elisei wrote:
> Hi,
>
> On 4/2/20 9:34 AM, André Przywara wrote:
>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>> From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus
>>> Extension":
>>>
>>> "The bandwidth requirements for I/O and configuration transactions cannot
>>> justify the added complexity, and, therefore, only memory transactions
>>> support 64-bit data transfers".
>>>
>>> Further down, the spec also describes the possible responses of a target
>>> which has been requested to do a 64-bit transaction. Limit the transaction
>>> to the lower 32 bits, to match the second accepted behaviour.
>> That looks like a reasonable behaviour.
>> AFAICS there is one code path from powerpc/spapr_pci.c which isn't
>> covered by those limitations (rtas_ibm_write_pci_config() ->
>> pci__config_wr() -> cfg_ops.write() -> vfio_pci_cfg_write()).
>> Same for read.
> The code compares the access size to 1, 2 and 4, so I think powerpc doesn't expect
> 64 bit accesses either. The change looks straightforward, I'll do it for consistency.
>
I read the code more carefully, and powerpc already limits the access size to 4 bytes:

static void rtas_ibm_read_pci_config(struct kvm_cpu *vcpu,
                     uint32_t token, uint32_t nargs,
                     target_ulong args,
                     uint32_t nret, target_ulong rets)
{
    [..]
    if (buid != phb.buid || !dev || (size > 4)) {
        phb_dprintf("- cfgRd buid 0x%lx cfg addr 0x%x size %d not found\n",
                buid, addr.w, size);

        rtas_st(vcpu->kvm, rets, 0, -1);
        return;
    }
    pci__config_rd(vcpu->kvm, addr, &val, size);
    [..]
}

It's the same for all the functions where pci__config_{rd,wr} are called directly.
So no changes are needed.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice
  2020-04-02  8:35   ` André Przywara
@ 2020-05-04 13:44     ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-04 13:44 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/2/20 9:35 AM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> After writing to the device fd as part of the PCI configuration space
>> emulation, we read back from the device to make sure that the write
>> finished. The value is read back into the PCI configuration space and
>> afterwards, the same value is copied by the PCI emulation code. Let's
>> read from the device fd into a temporary variable, to prevent this
>> double write.
>>
>> The double write is harmless in itself. But when we implement
>> reassignable BARs, we need to keep track of the old BAR value, and the
>> VFIO code is overwriting it.
>>
> It still seems a bit fragile, since we rely on code in other places to
> limit "sz" to 4 or less, but in practice we should be covered.
>
> Can you maybe add an assert here to prevent accidents on the stack?

Sure, sounds reasonable.

Thanks,
Alex
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Cheers,
> Andre
>
>> ---
>>  vfio/pci.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index fe02574390f6..8b2a0c8dbac3 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>>  	struct vfio_device *vdev;
>> -	void *base = pci_hdr;
>> +	u32 tmp;
>>  
>>  	if (offset == PCI_ROM_ADDRESS)
>>  		return;
>> @@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>>  
>> -	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
>> +	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
>>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>>  			      sz, offset);
>>  }
>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 26/32] vesa: Create device exactly once
  2020-04-02  8:58   ` André Przywara
@ 2020-05-05 13:02     ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-05 13:02 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/2/20 9:58 AM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> A vesa device is used by the SDL, GTK or VNC framebuffers. Don't allow the
>> user to specify more than one of these options because kvmtool will create
>> identical devices and bad things will happen:
>>
>> $ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk
>>   # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159
>>   Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff]
>> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
>> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
>> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
>> ======= Backtrace: =========
>> ======= Backtrace: =========
>> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
>> ======= Backtrace: =========
>> /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
>> (+0x777e5)[0x7fae0ed447e5]
>> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
>> /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
>> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c]
>> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
>> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
>> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
>> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
>> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
>> ======= Backtrace: =========
>> Aborted (core dumped)
>>
>> The vesa device is explicitly created during the initialization phase of
>> the above framebuffers. Remove the superfluous check for their existence.
>>
> Not really happy about this pointer comparison, but I don't see a better
> way, and it's surely good enough for that purpose.

I've been giving this patch some thought. I think the fix is wrong. The problem is
that the user is allowed to specify more than one of --gtk, --sdl and --vnc. I'll
rewrite the patch to fix that, and print an error message that is actually useful
for the user.
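
The check proposed here could be as simple as counting the framebuffer flags up front, before any device is created. The struct and function names below are made up for illustration; kvmtool's real configuration lives in struct kvm_config:

```c
/* Illustrative config struct with the three framebuffer options. */
struct cfg {
	int vnc, sdl, gtk;
};

/* Returns 0 when at most one framebuffer option was given, -1 otherwise
 * (where the real code could print a helpful error and exit). */
static int validate_fb_options(const struct cfg *cfg)
{
	int n = !!cfg->vnc + !!cfg->sdl + !!cfg->gtk;

	if (n > 1)
		return -1;	/* e.g. "only one of --vnc, --sdl, --gtk allowed" */
	return 0;
}
```

Rejecting the combination at option-parsing time sidesteps the duplicate-device problem entirely, instead of detecting the collision later via the device pointer comparison.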

Thanks,
Alex
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
>
> Cheers,
> Andre
>
>> ---
>>  hw/vesa.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index dd59a112330b..8071ad153f27 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -61,8 +61,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
>>  	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
>>  
>> -	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>> -		return NULL;
>> +	if (device__find_dev(vesa_device.bus_type, vesa_device.dev_num) == &vesa_device) {
>> +		r = -EEXIST;
>> +		goto out_error;
>> +	}
>> +
>>  	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>  	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
>>  	if (r < 0)
>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation
  2020-04-03 19:08       ` André Przywara
@ 2020-05-05 13:30         ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-05 13:30 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	Alexandru Elisei

Hi,

On 4/3/20 8:08 PM, André Przywara wrote:
> On 03/04/2020 19:14, Alexandru Elisei wrote:
>> Hi,
>>
>> On 4/3/20 12:57 PM, André Przywara wrote:
>>>> +	}
>>>> +
>>>> +	assert(has_bar_regions);
>>> Is assert() here really a good idea? I see that it makes sense for our
>>> emulated devices, but is that a valid check for VFIO?
>>> From briefly looking I can't find a requirement for having at least one
>>> valid BAR in general, and even if - I think we should rather return an
>>> error than aborting the guest here - or ignore it altogether.
>> The assert here is to discover coding errors with devices, not with the PCI
>> emulation. Calling pci__register_bar_regions and providing callbacks for when BAR
>> access is toggled, but *without* any valid BARs looks like a coding error in the
>> device emulation code to me.
> As I said, I totally see the point for our emulated devices, but it
> looks like we use this code also for VFIO? Where we are not in control
> of what the device exposes.
>
>> As for VFIO, I'm struggling to find a valid reason for someone to build a device
>> that uses PCI, but doesn't have any BARs. Isn't that the entire point of PCI? I'm
>> perfectly happy to remove the assert if you can provide a rationale for building
>> such a device.
> IIRC you have an AMD box, check the "Northbridge" PCI device there,
> devices 0:18.x. They provide chipset registers via (extended) config
> space only, they don't have any valid BARs. Also I found some SMBus
> device without BARs.
> Not the most prominent use case (especially for pass through), but
> apparently valid.
> I think the rationale for using this was to use a well established,
> supported and discoverable interface.

I see your point. I'll remove the assert.

>
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  int pci__init(struct kvm *kvm)
>>>>  {
>>>>  	int r;
>>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>>> index 8b2a0c8dbac3..18e22a8c5320 100644
>>>> --- a/vfio/pci.c
>>>> +++ b/vfio/pci.c
>>>> @@ -8,6 +8,8 @@
>>>>  #include <sys/resource.h>
>>>>  #include <sys/time.h>
>>>>  
>>>> +#include <assert.h>
>>>> +
>>>>  /* Wrapper around UAPI vfio_irq_set */
>>>>  union vfio_irq_eventfd {
>>>>  	struct vfio_irq_set	irq;
>>>> @@ -446,6 +448,81 @@ out_unlock:
>>>>  	mutex_unlock(&pdev->msi.mutex);
>>>>  }
>>>>  
>>>> +static int vfio_pci_bar_activate(struct kvm *kvm,
>>>> +				 struct pci_device_header *pci_hdr,
>>>> +				 int bar_num, void *data)
>>>> +{
>>>> +	struct vfio_device *vdev = data;
>>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>>> +	struct vfio_region *region;
>>>> +	bool has_msix;
>>>> +	int ret;
>>>> +
>>>> +	assert((u32)bar_num < vdev->info.num_regions);
>>>> +
>>>> +	region = &vdev->regions[bar_num];
>>>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>>> +
>>>> +	if (has_msix && (u32)bar_num == table->bar) {
>>>> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>>> +					 table->size, false,
>>>> +					 vfio_pci_msix_table_access, pdev);
>>>> +		if (ret < 0 || table->bar != pba->bar)
>>> I think this second expression deserves some comment.
>>> If I understand correctly, this would register the PBA trap handler
>>> separetely below if both the MSIX table and the PBA share a BAR?
>> The MSIX table and the PBA structure can share the same BAR for the base address
>> (that's why the MSIX capability has an offset field for both of them), but we
>> register different regions for mmio emulation because we don't want to have a
>> generic handler and always check if the mmio access was to the MSIX table or the
>> PBA structure. I can add a comment stating that, sure.
> Yes, thanks for the explanation!
>
>>>> +			goto out;
>>> Is there any particular reason you are using goto here? I find it more
>>> confusing if the "out:" label has just a return statement, without any
>>> cleanup or lock dropping. Just a "return ret;" here would be much
>>> cleaner I think. Same for other occassions in this function and
>>> elsewhere in this patch.
>>>
>>> Or do you plan on adding some code here later? I don't see it in this
>>> series though.
>> The reason I'm doing this is because I prefer one exit point from the function,
>> instead of return statements at arbitrary points in the function body. As a point
>> of reference, the pattern is recommended in the MISRA C standard for safety, in
>> section 17.4 "No more than one return statement", and is also used in the Linux
>> kernel. I think it comes down to personal preference, so unless Will or Julien
>> has a strong preference against it, I would rather keep it.
> Fair enough, your decision. Just to point out that I can't find this
> practice in the kernel, also:
>
> Documentation/process/coding-style.rst, section "7) Centralized exiting
> of functions":
> "The goto statement comes in handy when a function exits from multiple
> locations and some common work such as cleanup has to be done.  If there
> is no cleanup needed then just return directly."
>
>
> Thanks!
> Andre.
>
>>>> +	}
>>>> +
>>>> +	if (has_msix && (u32)bar_num == pba->bar) {
>>>> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>>> +					 pba->size, false,
>>>> +					 vfio_pci_msix_pba_access, pdev);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	ret = vfio_map_region(kvm, vdev, region);
>>>> +out:
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
>>>> +				   struct pci_device_header *pci_hdr,
>>>> +				   int bar_num, void *data)
>>>> +{
>>>> +	struct vfio_device *vdev = data;
>>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>>> +	struct vfio_region *region;
>>>> +	bool has_msix, success;
>>>> +	int ret;
>>>> +
>>>> +	assert((u32)bar_num < vdev->info.num_regions);
>>>> +
>>>> +	region = &vdev->regions[bar_num];
>>>> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>>> +
>>>> +	if (has_msix && (u32)bar_num == table->bar) {
>>>> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
>>>> +		/* kvm__deregister_mmio fails when the region is not found. */
>>>> +		ret = (success ? 0 : -ENOENT);
>>>> +		if (ret < 0 || table->bar != pba->bar)
>>>> +			goto out;
>>>> +	}
>>>> +
>>>> +	if (has_msix && (u32)bar_num == pba->bar) {
>>>> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
>>>> +		ret = (success ? 0 : -ENOENT);
>>>> +		goto out;
>>>> +	}
>>>> +
>>>> +	vfio_unmap_region(kvm, region);
>>>> +	ret = 0;
>>>> +
>>>> +out:
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>>  			      u8 offset, void *data, int sz)
>>>>  {
>>>> @@ -805,12 +882,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>>  		ret = -ENOMEM;
>>>>  		goto out_free;
>>>>  	}
>>>> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>>> -
>>>> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
>>>> -				 false, vfio_pci_msix_table_access, pdev);
>>>> -	if (ret < 0)
>>>> -		goto out_free;
>>>>  
>>>>  	/*
>>>>  	 * We could map the physical PBA directly into the guest, but it's
>>>> @@ -820,10 +891,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>>>>  	 * virtual PBA.
>>>>  	 */
>>>> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
>>>> -				 vfio_pci_msix_pba_access, pdev);
>>>> -	if (ret < 0)
>>>> -		goto out_free;
>>>> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>>>  
>>>>  	pdev->msix.entries = entries;
>>>>  	pdev->msix.nr_entries = nr_entries;
>>>> @@ -894,11 +962,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>>>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>>>>  	}
>>>>  
>>>> -	/* Map the BARs into the guest or setup a trap region. */
>>>> -	ret = vfio_map_region(kvm, vdev, region);
>>>> -	if (ret)
>>>> -		return ret;
>>>> -
>>>>  	return 0;
>>>>  }
>>>>  
>>>> @@ -945,7 +1008,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>>>>  	}
>>>>  
>>>>  	/* We've configured the BARs, fake up a Configuration Space */
>>>> -	return vfio_pci_fixup_cfg_space(vdev);
>>>> +	ret = vfio_pci_fixup_cfg_space(vdev);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
>>>> +					 vfio_pci_bar_deactivate, vdev);
>>>>  }
>>>>  
>>>>  /*
>>>> diff --git a/virtio/pci.c b/virtio/pci.c
>>>> index d111dc499f5e..598da699c241 100644
>>>> --- a/virtio/pci.c
>>>> +++ b/virtio/pci.c
>>>> @@ -11,6 +11,7 @@
>>>>  #include <sys/ioctl.h>
>>>>  #include <linux/virtio_pci.h>
>>>>  #include <linux/byteorder.h>
>>>> +#include <assert.h>
>>>>  #include <string.h>
>>>>  
>>>>  static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
>>>> @@ -462,6 +463,64 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>>>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>>>>  }
>>>>  
>>>> +static int virtio_pci__bar_activate(struct kvm *kvm,
>>>> +				    struct pci_device_header *pci_hdr,
>>>> +				    int bar_num, void *data)
>>>> +{
>>>> +	struct virtio_device *vdev = data;
>>>> +	u32 bar_addr, bar_size;
>>>> +	int r = -EINVAL;
>>>> +
>>>> +	assert(bar_num <= 2);
>>>> +
>>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>>>> +
>>>> +	switch (bar_num) {
>>>> +	case 0:
>>>> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
>>>> +				     bar_size, vdev);
>>>> +		if (r > 0)
>>>> +			r = 0;
>>>> +		break;
>>>> +	case 1:
>>>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>>> +					virtio_pci__io_mmio_callback, vdev);
>>>> +		break;
>>>> +	case 2:
>>>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>>> +					virtio_pci__msix_mmio_callback, vdev);
>>> I think adding a break; here looks nicer.
>> Sure, it will make the function look more consistent.
>>
>> Thanks,
>> Alex
>>> Cheers,
>>> Andre
>>>
>>>
>>>> +	}
>>>> +
>>>> +	return r;
>>>> +}
>>>> +
>>>> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
>>>> +				      struct pci_device_header *pci_hdr,
>>>> +				      int bar_num, void *data)
>>>> +{
>>>> +	u32 bar_addr;
>>>> +	bool success;
>>>> +	int r = -EINVAL;
>>>> +
>>>> +	assert(bar_num <= 2);
>>>> +
>>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>>> +
>>>> +	switch (bar_num) {
>>>> +	case 0:
>>>> +		r = ioport__unregister(kvm, bar_addr);
>>>> +		break;
>>>> +	case 1:
>>>> +	case 2:
>>>> +		success = kvm__deregister_mmio(kvm, bar_addr);
>>>> +		/* kvm__deregister_mmio fails when the region is not found. */
>>>> +		r = (success ? 0 : -ENOENT);
>>>> +	}
>>>> +
>>>> +	return r;
>>>> +}
>>>> +
>>>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>>  		     int device_id, int subsys_id, int class)
>>>>  {
>>>> @@ -476,23 +535,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>>>>  
>>>>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>>> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
>>>> -			     vdev);
>>>> -	if (r < 0)
>>>> -		return r;
>>>> -	port_addr = (u16)r;
>>>> -
>>>>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>>>> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>>>> -			       virtio_pci__io_mmio_callback, vdev);
>>>> -	if (r < 0)
>>>> -		goto free_ioport;
>>>> -
>>>>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>>>> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>>>> -			       virtio_pci__msix_mmio_callback, vdev);
>>>> -	if (r < 0)
>>>> -		goto free_mmio;
>>>>  
>>>>  	vpci->pci_hdr = (struct pci_device_header) {
>>>>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>>> @@ -518,6 +562,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>>>>  	};
>>>>  
>>>> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
>>>> +				      virtio_pci__bar_activate,
>>>> +				      virtio_pci__bar_deactivate, vdev);
>>>> +	if (r < 0)
>>>> +		return r;
>>>> +
>>>>  	vpci->dev_hdr = (struct device_header) {
>>>>  		.bus_type		= DEVICE_BUS_PCI,
>>>>  		.data			= &vpci->pci_hdr,
>>>> @@ -548,20 +598,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>>  
>>>>  	r = device__register(&vpci->dev_hdr);
>>>>  	if (r < 0)
>>>> -		goto free_msix_mmio;
>>>> +		return r;
>>>>  
>>>>  	/* save the IRQ that device__register() has allocated */
>>>>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>>>>  
>>>>  	return 0;
>>>> -
>>>> -free_msix_mmio:
>>>> -	kvm__deregister_mmio(kvm, msix_io_block);
>>>> -free_mmio:
>>>> -	kvm__deregister_mmio(kvm, mmio_addr);
>>>> -free_ioport:
>>>> -	ioport__unregister(kvm, port_addr);
>>>> -	return r;
>>>>  }
>>>>  
>>>>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
>>>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs
  2020-04-06 14:05   ` André Przywara
@ 2020-05-06 12:05     ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-06 12:05 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/6/20 3:05 PM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> BARs are used by the guest to configure the access to the PCI device by
>> writing the address to which the device will respond. The basic idea for
>> adding support for reassignable BARs is straightforward: deactivate
>> emulation for the memory region described by the old BAR value, and
>> activate emulation for the new region.
>>
>> BAR reassignement can be done while device access is enabled and memory
>> regions for different devices can overlap as long as no access is made
>> to the overlapping memory regions. This means that it is legal for the
>> BARs of two distinct devices to point to an overlapping memory region,
>> and indeed, this is how Linux does resource assignment at boot. To
>> account for this situation, the simple algorithm described above is
>> enhanced to scan for all devices and:
>>
>> - Deactivate emulation for any BARs that might overlap with the new BAR
>>   value.
>>
>> - Enable emulation for any BARs that were overlapping with the old value
>>   after the BAR has been updated.
>>
>> Activating/deactivating emulation of a memory region has side effects.
>> In order to prevent the execution of the same callback twice we now keep
>> track of the state of the region emulation. For example, this can happen
>> if we program a BAR with an address that overlaps a second BAR, thus
>> deactivating emulation for the second BAR, and then we disable all
>> region accesses to the second BAR by writing to the command register.
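The scan-and-toggle algorithm described above can be condensed into a small standalone sketch. The flat device array and helper names here are invented for illustration; kvmtool walks the real PCI device list instead:

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_DEVS 2

/* Toy stand-in for one BAR of one device. */
struct fake_bar {
	uint32_t addr;
	uint32_t size;
	bool active;	/* is emulation for this region enabled? */
};

struct fake_bar devs[NR_DEVS];

static bool overlaps(struct fake_bar *b, uint32_t start, uint32_t size)
{
	return b->addr < start + size && start < b->addr + b->size;
}

static void toggle_overlapping(uint32_t start, uint32_t size, bool active)
{
	for (int i = 0; i < NR_DEVS; i++)
		if (overlaps(&devs[i], start, size))
			devs[i].active = active;
}

void reassign_bar(int dev, uint32_t new_addr)
{
	struct fake_bar *b = &devs[dev];
	uint32_t old_addr = b->addr;

	b->active = false;			/* stop emulating the old region */
	toggle_overlapping(new_addr, b->size, false); /* silence BARs overlapping the new one */
	b->addr = new_addr;
	b->active = true;			/* emulate the new region */
	toggle_overlapping(old_addr, b->size, true);  /* re-enable BARs that overlapped the old one */
}
```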
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  include/kvm/pci.h |  14 ++-
>>  pci.c             | 273 +++++++++++++++++++++++++++++++++++++---------
>>  vfio/pci.c        |  12 ++
>>  3 files changed, 244 insertions(+), 55 deletions(-)
>>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index 1d7d4c0cea5a..be75f77fd2cb 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -11,6 +11,17 @@
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>>  
>> +#define pci_dev_err(pci_hdr, fmt, ...) \
>> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_warn(pci_hdr, fmt, ...) \
>> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_info(pci_hdr, fmt, ...) \
>> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
>> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_die(pci_hdr, fmt, ...) \
>> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +
>>  /*
>>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
>> @@ -142,7 +153,8 @@ struct pci_device_header {
>>  	};
>>  
>>  	/* Private to lkvm */
>> -	u32		bar_size[6];
>> +	u32			bar_size[6];
>> +	bool			bar_active[6];
>>  	bar_activate_fn_t	bar_activate_fn;
>>  	bar_deactivate_fn_t	bar_deactivate_fn;
>>  	void *data;
>> diff --git a/pci.c b/pci.c
>> index c2860e6707fe..68ece65441a6 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
>>  	return pci__bar_size(pci_hdr, bar_num);
>>  }
>>  
>> +static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return  pci_hdr->bar_active[bar_num];
>> +}
>> +
>>  static void *pci_config_address_ptr(u16 port)
>>  {
>>  	unsigned long offset;
>> @@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
>>  	.io_out	= pci_config_data_out,
>>  };
>>  
>> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			    int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (pci_bar_is_active(pci_hdr, bar_num))
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
>> +			    bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_active[bar_num] = true;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (!pci_bar_is_active(pci_hdr, bar_num))
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
>> +			    bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_active[bar_num] = false;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>>  static void pci_config_command_wr(struct kvm *kvm,
>>  				  struct pci_device_header *pci_hdr,
>>  				  u16 new_command)
>> @@ -179,26 +224,179 @@ static void pci_config_command_wr(struct kvm *kvm,
>>  
>>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>>  			if (__pci__io_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>>  			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  
>>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>>  			if (__pci__memory_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>>  			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  	}
>>  
>>  	pci_hdr->command = new_command;
>>  }
>>  
>> +static int pci_deactivate_bar_regions(struct kvm *kvm,
>> +				      struct pci_device_header *pci_hdr,
> What is this argument for? I don't see it used, also this function is
> not used as a callback. A leftover from some refactoring?

Nicely spotted, it's a leftover from refactoring, I'll remove it.

>
>> +				      u32 start, u32 size)
>> +{
>> +	struct device_header *dev_hdr;
>> +	struct pci_device_header *tmp_hdr;
>> +	u32 tmp_addr, tmp_size;
>> +	int i, r;
>> +
>> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
>> +	while (dev_hdr) {
>> +		tmp_hdr = dev_hdr->data;
>> +		for (i = 0; i < 6; i++) {
>> +			if (!pci_bar_is_implemented(tmp_hdr, i))
>> +				continue;
>> +
>> +			tmp_addr = pci__bar_address(tmp_hdr, i);
>> +			tmp_size = pci__bar_size(tmp_hdr, i);
>> +
>> +			if (tmp_addr + tmp_size <= start ||
>> +			    tmp_addr >= start + size)
>> +				continue;
>> +
>> +			r = pci_deactivate_bar(kvm, tmp_hdr, i);
> This and the function below are identical, apart from this line here.
> Any chance we can merge this?

Good idea, I'll merge them.

>
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		dev_hdr = device__next_dev(dev_hdr);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int pci_activate_bar_regions(struct kvm *kvm,
>> +				    struct pci_device_header *pci_hdr,
> Same question about this parameter, we don't seem to need it. But you
> should merge those two functions anyway.
>
> The rest seems to look alright. The pci_config_bar_wr() looks a bit
> lengthy, but I guess there is no way around it, since this is the gut of
> the algorithm explained in the commit message. Can you maybe add a
> condensed version of that explanation to the code?

I can try adding a comment, but it will make the function even more lengthy. I'll
give it a go and see how it looks.

Thanks,
Alex
>
> Thanks,
> Andre
>
>
>> +				    u32 start, u32 size)
>> +{
>> +	struct device_header *dev_hdr;
>> +	struct pci_device_header *tmp_hdr;
>> +	u32 tmp_addr, tmp_size;
>> +	int i, r;
>> +
>> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
>> +	while (dev_hdr) {
>> +		tmp_hdr = dev_hdr->data;
>> +		for (i = 0; i < 6; i++) {
>> +			if (!pci_bar_is_implemented(tmp_hdr, i))
>> +				continue;
>> +
>> +			tmp_addr = pci__bar_address(tmp_hdr, i);
>> +			tmp_size = pci__bar_size(tmp_hdr, i);
>> +
>> +			if (tmp_addr + tmp_size <= start ||
>> +			    tmp_addr >= start + size)
>> +				continue;
>> +
>> +			r = pci_activate_bar(kvm, tmp_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		dev_hdr = device__next_dev(dev_hdr);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void pci_config_bar_wr(struct kvm *kvm,
>> +			      struct pci_device_header *pci_hdr, int bar_num,
>> +			      u32 value)
>> +{
>> +	u32 old_addr, new_addr, bar_size;
>> +	u32 mask;
>> +	int r;
>> +
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> +	else
>> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> +
>> +	/*
>> +	 * If the kernel masks the BAR, it will expect to find the size of the
>> +	 * BAR there next time it reads from it. After the kernel reads the
>> +	 * size, it will write the address back.
>> +	 *
>> +	 * According to the PCI local bus specification REV 3.0: The number of
>> +	 * upper bits that a device actually implements depends on how much of
>> +	 * the address space the device will respond to. A device that wants a 1
>> +	 * MB memory address space (using a 32-bit base address register) would
>> +	 * build the top 12 bits of the address register, hardwiring the other
>> +	 * bits to 0.
>> +	 *
>> +	 * Furthermore, software can determine how much address space the device
>> +	 * requires by writing a value of all 1's to the register and then
>> +	 * reading the value back. The device will return 0's in all don't-care
>> +	 * address bits, effectively specifying the address space required.
>> +	 *
>> +	 * Software computes the size of the address space with the formula
>> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
>> +	 * the BAR. This means that the BAR value that kvmtool should return is
>> +	 * B = ~(S - 1).
>> +	 */
>> +	if (value == 0xffffffff) {
>> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
>> +		/* Preserve the special bits. */
>> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +
>> +	/* Don't toggle emulation when region type access is disabled. */
>> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
>> +	    !pci__io_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
>> +	    !pci__memory_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	old_addr = pci__bar_address(pci_hdr, bar_num);
>> +	new_addr = __pci__bar_address(value);
>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>> +
>> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0)
>> +		return;
>> +
>> +	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
>> +	if (r < 0) {
>> +		/*
>> +		 * We cannot update the BAR because of an overlapping region
>> +		 * that failed to deactivate emulation, so keep the old BAR
>> +		 * value and re-activate emulation for it.
>> +		 */
>> +		pci_activate_bar(kvm, pci_hdr, bar_num);
>> +		return;
>> +	}
>> +
>> +	pci_hdr->bar[bar_num] = value;
>> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0) {
>> +		/*
>> +		 * New region cannot be emulated, re-enable the regions that
>> +		 * were overlapping.
>> +		 */
>> +		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
>> +		return;
>> +	}
>> +
>> +	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
>> +}
>> +
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> @@ -206,7 +404,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> -	u32 mask;
>>  
>>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>>  		return;
>> @@ -231,46 +428,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	}
>>  
>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>> -
>> -	/*
>> -	 * If the kernel masks the BAR, it will expect to find the size of the
>> -	 * BAR there next time it reads from it. After the kernel reads the
>> -	 * size, it will write the address back.
>> -	 */
>>  	if (bar < 6) {
>> -		if (pci__bar_is_io(pci_hdr, bar))
>> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> -		else
>> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> -		/*
>> -		 * According to the PCI local bus specification REV 3.0:
>> -		 * The number of upper bits that a device actually implements
>> -		 * depends on how much of the address space the device will
>> -		 * respond to. A device that wants a 1 MB memory address space
>> -		 * (using a 32-bit base address register) would build the top
>> -		 * 12 bits of the address register, hardwiring the other bits
>> -		 * to 0.
>> -		 *
>> -		 * Furthermore, software can determine how much address space
>> -		 * the device requires by writing a value of all 1's to the
>> -		 * register and then reading the value back. The device will
>> -		 * return 0's in all don't-care address bits, effectively
>> -		 * specifying the address space required.
>> -		 *
>> -		 * Software computes the size of the address space with the
>> -		 * formula S = ~B + 1, where S is the memory size and B is the
>> -		 * value read from the BAR. This means that the BAR value that
>> -		 * kvmtool should return is B = ~(S - 1).
>> -		 */
>>  		memcpy(&value, data, size);
>> -		if (value == 0xffffffff)
>> -			value = ~(pci__bar_size(pci_hdr, bar) - 1);
>> -		/* Preserve the special bits. */
>> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
>> -		memcpy(base + offset, &value, size);
>> -	} else {
>> -		memcpy(base + offset, data, size);
>> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
>> +		return;
>>  	}
>> +
>> +	memcpy(base + offset, data, size);
>>  }
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>> @@ -338,20 +502,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  			continue;
>>  
>>  		has_bar_regions = true;
>> +		assert(!pci_bar_is_active(pci_hdr, i));
>>  
>>  		if (pci__bar_is_io(pci_hdr, i) &&
>>  		    pci__io_space_enabled(pci_hdr)) {
>> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> -				if (r < 0)
>> -					return r;
>> -			}
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>>  
>>  		if (pci__bar_is_memory(pci_hdr, i) &&
>>  		    pci__memory_space_enabled(pci_hdr)) {
>> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> -				if (r < 0)
>> -					return r;
>> -			}
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>>  	}
>>  
>>  	assert(has_bar_regions);
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 18e22a8c5320..2b891496547d 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -457,6 +457,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>  	struct vfio_region *region;
>> +	u32 bar_addr;
>>  	bool has_msix;
>>  	int ret;
>>  
>> @@ -465,7 +466,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	region = &vdev->regions[bar_num];
>>  	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>  
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		region->port_base = bar_addr;
>> +	else
>> +		region->guest_phys_addr = bar_addr;
>> +
>>  	if (has_msix && (u32)bar_num == table->bar) {
>> +		table->guest_phys_addr = region->guest_phys_addr;
>>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>  					 table->size, false,
>>  					 vfio_pci_msix_table_access, pdev);
>> @@ -474,6 +482,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	}
>>  
>>  	if (has_msix && (u32)bar_num == pba->bar) {
>> +		if (pba->bar == table->bar)
>> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
>> +		else
>> +			pba->guest_phys_addr = region->guest_phys_addr;
>>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>  					 pba->size, false,
>>  					 vfio_pci_msix_pba_access, pdev);
>>
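As an aside, the sizing handshake spelled out in the comment quoted earlier (S = ~B + 1, with B = ~(S - 1) returned to the guest) is easy to sanity-check in isolation. This standalone sketch uses the generic memory-BAR mask and an invented 1 MB device:

```c
#include <stdint.h>

#define MEM_MASK 0xfffffff0u	/* PCI_BASE_ADDRESS_MEM_MASK */

/*
 * What the emulated device returns after the guest writes all 1s to a
 * memory BAR of the given size: B = ~(S - 1), with the read-only low
 * "special" bits preserved.
 */
uint32_t bar_size_read(uint32_t size, uint32_t flags)
{
	uint32_t value = ~(size - 1);

	return (value & MEM_MASK) | (flags & ~MEM_MASK);
}

/* What software computes from the value it read back: S = ~B + 1. */
uint32_t bar_size_decode(uint32_t value)
{
	return ~(value & MEM_MASK) + 1;
}
```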

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-04-06 14:06   ` André Przywara
  2020-04-14  8:56     ` Alexandru Elisei
@ 2020-05-06 13:51     ` Alexandru Elisei
  2020-05-12 14:17       ` André Przywara
  1 sibling, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-06 13:51 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 4/6/20 3:06 PM, André Przywara wrote:
> On 26/03/2020 15:24, Alexandru Elisei wrote:
>> PCI Express comes with an extended addressing scheme, which directly
>> translates into a bigger device configuration space (256->4096 bytes)
>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>> capabilities (power management [1] and PCI Express capability [2]).
>>
>> However, our virtio PCI implementation implements version 0.9 of the
>> protocol and it still uses transitional PCI device ID's, so we have
>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>> management and PCI Express capability are left for a subsequent patch.
>>
>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/include/arm-common/kvm-arch.h |  4 +-
>>  arm/pci.c                         |  2 +-
>>  builtin-run.c                     |  1 +
>>  include/kvm/kvm-config.h          |  2 +-
>>  include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
>>  pci.c                             |  5 +-
>>  vfio/pci.c                        | 26 +++++++----
>>  7 files changed, 96 insertions(+), 20 deletions(-)
>>
>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>> index b9d486d5eac2..13c55fa3dc29 100644
>> --- a/arm/include/arm-common/kvm-arch.h
>> +++ b/arm/include/arm-common/kvm-arch.h
>> @@ -23,7 +23,7 @@
>>  
>>  #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
>>  #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
> The existence of this symbol seems somewhat odd, there should be no ARM
> specific version of the config space size.

The existence of the symbol is required because at the moment PCI Express support
is not available for all architectures, and this file needs to be included in
pci.h, not the other way around. I agree it's not ideal, but that's the way it is
right now.

> Do you know why we don't use the generic PCI_CFG_SIZE here?
> At the very least I would expect something like:
> #define ARM_PCI_CFG_SIZE	PCI_EXP_CFG_SIZE

We don't use PCI_EXP_CFG_SIZE because it might not be available for all
architectures. It is possible it's not there on x86, like the error you have
encountered showed, so instead of waiting for someone to report a compilation
failure, I would rather use a number from the start.
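For what it's worth, the 1 << 28 figure drops straight out of the ECAM geometry, independent of any header symbol (a quick arithmetic check using the standard PCI numbers only):

```c
#include <stdint.h>

/* ECAM: 256 buses x 32 devices x 8 functions, 4 KB of config space each. */
uint64_t ecam_size(void)
{
	return 256ULL * 32 * 8 * 4096;
}

/* Legacy CAM: same topology, but only 256 bytes of config space per function. */
uint64_t cam_size(void)
{
	return 256ULL * 32 * 8 * 256;
}
```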

>
>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>>  
>> @@ -50,6 +50,8 @@
>>  
>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>  
>> +#define ARCH_HAS_PCI_EXP	1
>> +
>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>  {
>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>> diff --git a/arm/pci.c b/arm/pci.c
>> index ed325fa4a811..2251f627d8b5 100644
>> --- a/arm/pci.c
>> +++ b/arm/pci.c
>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>  
>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>> diff --git a/builtin-run.c b/builtin-run.c
>> index 9cb8c75300eb..def8a1f803ad 100644
>> --- a/builtin-run.c
>> +++ b/builtin-run.c
>> @@ -27,6 +27,7 @@
>>  #include "kvm/irq.h"
>>  #include "kvm/kvm.h"
>>  #include "kvm/pci.h"
>> +#include "kvm/vfio.h"
>>  #include "kvm/rtc.h"
>>  #include "kvm/sdl.h"
>>  #include "kvm/vnc.h"
>> diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
>> index a052b0bc7582..a1012c57b7a7 100644
>> --- a/include/kvm/kvm-config.h
>> +++ b/include/kvm/kvm-config.h
>> @@ -2,7 +2,6 @@
>>  #define KVM_CONFIG_H_
>>  
>>  #include "kvm/disk-image.h"
>> -#include "kvm/vfio.h"
>>  #include "kvm/kvm-config-arch.h"
>>  
>>  #define DEFAULT_KVM_DEV		"/dev/kvm"
>> @@ -18,6 +17,7 @@
>>  #define MIN_RAM_SIZE_MB		(64ULL)
>>  #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
>>  
>> +struct vfio_device_params;
>>  struct kvm_config {
>>  	struct kvm_config_arch arch;
>>  	struct disk_image_params disk_image[MAX_DISK_IMAGES];
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index be75f77fd2cb..71ee9d8cb01f 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -10,6 +10,7 @@
>>  #include "kvm/devices.h"
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>> +#include "kvm.h"
>>  
>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> @@ -32,9 +33,41 @@
>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>  #define PCI_IO_SIZE		0x100
>>  #define PCI_IOPORT_START	0x6200
>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>  
>> -struct kvm;
>> +#define PCIE_CAP_REG_VER	0x1
>> +#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
>> +#define PM_CAP_VER		0x3
>> +
>> +#ifdef ARCH_HAS_PCI_EXP
>> +#define PCI_CFG_SIZE		(1ULL << 28)
>> +#define PCI_DEV_CFG_SIZE	4096
> Maybe use PCI_CFG_SPACE_EXP_SIZE from pci_regs.h?

I cannot do that because it's not available on all distros. I used
PCI_CFG_SPACE_EXP_SIZE in the previous iteration, but I changed it when Sami
reported a compilation failure on his particular setup (it's mentioned in the
changelog).

>
>> +
>> +union pci_config_address {
>> +	struct {
>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +#else
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +#endif
> Just for the records:
> I think we agreed on this before, but using a C bitfield to model
> hardware defined bits is broken, because the C standard doesn't
> guarantee those bits to be consecutive and laid out like we hope it would.
> But since we have this issue already with the legacy config space, and
> it seems to work (TM), we can fix this later.

Still on the record, it's broken for big endian, because only the byte order
differs, not the order of the individual bits. But other things are broken for
big endian as well, so it's not a big deal.
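For reference, the portable alternative is to decode the CONFIG_ADDRESS dword with explicit shifts and masks rather than a bitfield, whose layout is implementation-defined (C11 6.7.2.1). A minimal sketch, with made-up helper names (this is not kvmtool code), following the field positions in the comments above:

```c
#include <stdint.h>

/* Decode the ECAM-style CONFIG_ADDRESS dword field by field. */
static inline uint32_t cfg_reg_offset(uint32_t w) { return w & 0x3; }          /* 1  .. 0  */
static inline uint32_t cfg_register(uint32_t w)   { return (w >> 2) & 0x3ff; } /* 11 .. 2  */
static inline uint32_t cfg_function(uint32_t w)   { return (w >> 12) & 0x7; }  /* 14 .. 12 */
static inline uint32_t cfg_device(uint32_t w)     { return (w >> 15) & 0x1f; } /* 19 .. 15 */
static inline uint32_t cfg_bus(uint32_t w)        { return (w >> 20) & 0xff; } /* 27 .. 20 */
static inline uint32_t cfg_enable(uint32_t w)     { return w >> 31; }          /* 31       */
```

The same layout on every compiler and endianness, since only value arithmetic is involved.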

>
>> +	};
>> +	u32 w;
>> +};
>> +
>> +#else
>> +#define PCI_CFG_SIZE		(1ULL << 24)
>> +#define PCI_DEV_CFG_SIZE	256
> Shall we use PCI_CFG_SPACE_SIZE from the kernel headers here?

I would rather not, for the reasons explained above.

>
>>  
>>  union pci_config_address {
>>  	struct {
>> @@ -58,6 +91,8 @@ union pci_config_address {
>>  	};
>>  	u32 w;
>>  };
>> +#endif
>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct msix_table {
>>  	struct msi_msg msg;
>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pcie_cap {
> I guess this is meant to map to the PCI Express Capability Structure as
> described in the PCIe spec?
> We would need to add the "packed" attribute then. But actually I am not
> a fan of using C language constructs to model specified register
> arrangements, the kernel tries to avoid this as well.

I'm not sure what you are suggesting. Should we rewrite the entire PCI emulation
code in kvmtool then?

>
> Actually, looking closer: why do we need this in the first place? I
> removed this and struct pm_cap, and it still compiles.
> So can we lose those two structures at all? And move the discussion and
> implementation (for VirtIO 1.0?) to a later series?

I've answered both points in v2 of the series [1].

[1] https://www.spinics.net/lists/kvm/msg209601.html

>
>> +	u8 cap;
>> +	u8 next;
>> +	u16 cap_reg;
>> +	u32 dev_cap;
>> +	u16 dev_ctrl;
>> +	u16 dev_status;
>> +	u32 link_cap;
>> +	u16 link_ctrl;
>> +	u16 link_status;
>> +	u32 slot_cap;
>> +	u16 slot_ctrl;
>> +	u16 slot_status;
>> +	u16 root_ctrl;
>> +	u16 root_cap;
>> +	u32 root_status;
>> +};
>> +
>> +struct pm_cap {
>> +	u8 cap;
>> +	u8 next;
>> +	u16 pmc;
>> +	u16 pmcsr;
>> +	u8 pmcsr_bse;
>> +	u8 data;
>> +};
>> +
>>  struct pci_device_header;
>>  
>>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>> @@ -110,14 +172,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>  				   int bar_num, void *data);
>>  
>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>> -#define PCI_DEV_CFG_SIZE	256
>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct pci_config_operations {
>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		      u8 offset, void *data, int sz);
>> +		      u16 offset, void *data, int sz);
>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -		     u8 offset, void *data, int sz);
>> +		     u16 offset, void *data, int sz);
>>  };
>>  
>>  struct pci_device_header {
>> @@ -147,6 +207,10 @@ struct pci_device_header {
>>  			u8		min_gnt;
>>  			u8		max_lat;
>>  			struct msix_cap msix;
>> +#ifdef ARCH_HAS_PCI_EXP
>> +			struct pm_cap pm;
>> +			struct pcie_cap pcie;
>> +#endif
>>  		} __attribute__((packed));
>>  		/* Pad to PCI config space size */
>>  		u8	__pad[PCI_DEV_CFG_SIZE];
>> diff --git a/pci.c b/pci.c
>> index 68ece65441a6..b471209a6efc 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -400,7 +400,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> -	u8 bar, offset;
>> +	u8 bar;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> @@ -439,7 +440,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>> -	u8 offset;
>> +	u16 offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 2b891496547d..6b8726227ea0 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -311,7 +311,7 @@ out_unlock:
>>  }
>>  
>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>> -				    struct vfio_device *vdev, u8 off,
>> +				    struct vfio_device *vdev, u16 off,
>>  				    void *data, int sz)
>>  {
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>> @@ -343,7 +343,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>  }
>>  
>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				     u8 off, u8 *data, u32 sz)
>> +				     u16 off, u8 *data, u32 sz)
>>  {
>>  	size_t i;
>>  	u32 mask = 0;
>> @@ -391,7 +391,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>  }
>>  
>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>> -				   u8 off, u8 *data, u32 sz)
>> +				   u16 off, u8 *data, u32 sz)
>>  {
>>  	u8 ctrl;
>>  	struct msi_msg msg;
>> @@ -536,7 +536,7 @@ out:
>>  }
>>  
>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			      u8 offset, void *data, int sz)
>> +			      u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -554,7 +554,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  }
>>  
>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -			       u8 offset, void *data, int sz)
>> +			       u16 offset, void *data, int sz)
>>  {
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>> @@ -638,15 +638,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  {
>>  	int ret;
>>  	size_t size;
>> -	u8 pos, next;
>> +	u16 pos, next;
>>  	struct pci_cap_hdr *cap;
>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>> +	u8 *virt_hdr;
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>  
>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>  		return 0;
>>  
>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>> +	if (!virt_hdr)
>> +		return -errno;
> There are two places where we return in this function, we don't seem to
> free virt_hdr in those cases. Looks like a job for your beloved goto ;-)
>
Indeed, I'll do that.
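A minimal sketch of that cleanup pattern, with an illustrative function body (not the actual vfio code): every exit path funnels through one label that frees the buffer.

```c
#include <errno.h>
#include <stdlib.h>

#define DEV_CFG_SIZE 4096	/* assumed PCIe per-device config space size */

/* Illustrative only: all returns after the allocation go through out_free. */
static int parse_caps_sketch(void)
{
	int ret = 0;
	unsigned char *virt_hdr;

	virt_hdr = calloc(1, DEV_CFG_SIZE);
	if (!virt_hdr)
		return -ENOMEM;

	/* ... walk the capability list, filling virt_hdr ... */
	if (0 /* some error condition while parsing */) {
		ret = -EINVAL;
		goto out_free;
	}

	/* ... copy virt_hdr into the virtual config space ... */

out_free:
	free(virt_hdr);
	return ret;
}
```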

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking
  2020-05-01 15:30     ` Alexandru Elisei
@ 2020-05-06 17:40       ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-06 17:40 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 5/1/20 4:30 PM, Alexandru Elisei wrote:
> Hi,
>
> On 3/31/20 12:51 PM, André Przywara wrote:
>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>
>> Hi,
>>
>>> kvmtool uses brlock for protecting accesses to the ioport and mmio
>>> red-black trees. brlock allows concurrent reads, but only one writer, which
>>> is assumed not to be a VCPU thread (for more information see commit
>>> 0b907ed2eaec ("kvm tools: Add a brlock)). This is done by issuing a
>>> compiler barrier on read and pausing the entire virtual machine on writes.
>>> When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write
>>> lock.
>>>
>>> When we will implement reassignable BARs, the mmio or ioport mapping will
>>> be done as a result of a VCPU mmio access. When brlock is a pthread
>>> read/write lock, it means that we will try to acquire a write lock with the
>>> read lock already held by the same VCPU and we will deadlock. When it's
>>> not, a VCPU will have to call kvm__pause, which means the virtual machine
>>> will stay paused forever.
>>>
>>> Let's avoid all this by using a mutex and reference counting the red-black
>>> tree entries. This way we can guarantee that we won't unregister a node
>>> that another thread is currently using for emulation.
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>  include/kvm/ioport.h          |  2 +
>>>  include/kvm/rbtree-interval.h |  4 +-
>>>  ioport.c                      | 64 +++++++++++++++++-------
>>>  mmio.c                        | 91 +++++++++++++++++++++++++----------
>>>  4 files changed, 118 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
>>> index 62a719327e3f..039633f76bdd 100644
>>> --- a/include/kvm/ioport.h
>>> +++ b/include/kvm/ioport.h
>>> @@ -22,6 +22,8 @@ struct ioport {
>>>  	struct ioport_operations	*ops;
>>>  	void				*priv;
>>>  	struct device_header		dev_hdr;
>>> +	u32				refcount;
>>> +	bool				remove;
>> The use of this extra "remove" variable seems somehow odd. I think
>> normally you would initialise the refcount to 1, and let the unregister
>> operation do a put as well, with the removal code triggered if the count
>> reaches zero. At least this is what kref does, can we do the same here?
>> Or is there anything that would prevent it? I think it's a good idea to
>> stick to existing design patterns for things like refcounts.
>>
>> Cheers,
>> Andre.
>>
> You're totally right, it didn't cross my mind to initialize refcount to 1, it's a
> great idea, I'll do it like that.

I take that back, I've tested your proposal and it's not working. Let's consider
the scenario with 3 threads where an mmio node is created with a refcount of 1,
as you suggested (please excuse any bad formatting):

          1             |          2           |          3
------------------------+----------------------+----------------------
1. mmio_get()           |                      |
2. refcount++ (== 2)    |                      |
                        | 1. deactivate BAR    |
                        | 2. BAR active        |
                        |                      | 1. deactivate BAR
                        |                      | 2. BAR active
                        |                      | 3. refcount-- (== 1)
                        | 3. refcount-- (== 0) |
                        | 4. remove mmio node  |
3. mmio->mmio_fn()      |                      |

When Thread 1 calls mmio->mmio_fn(), the mmio node has already been freed. My original
version works because the only place where I decrement refcount is after the call
to mmio_fn. I really don't want to use locking when calling pci_deactivate_bar. Do
you have any ideas?
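For context, the scheme I described (decrementing only after mmio_fn has run) looks roughly like the hedged sketch below; all names are invented, this is not the actual patch:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct mmio_node {
	unsigned int	refcount;	/* guarded by mmio_lock */
	bool		remove;		/* set by unregister; freed on last put */
	void		(*mmio_fn)(struct mmio_node *node);
};

static pthread_mutex_t mmio_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference under the lock before using the node. */
static struct mmio_node *mmio_get(struct mmio_node *node)
{
	pthread_mutex_lock(&mmio_lock);
	if (node)
		node->refcount++;
	pthread_mutex_unlock(&mmio_lock);
	return node;
}

/* Drop the reference; free only once unregistered AND unreferenced. */
static void mmio_put(struct mmio_node *node)
{
	bool free_it;

	pthread_mutex_lock(&mmio_lock);
	free_it = (--node->refcount == 0) && node->remove;
	pthread_mutex_unlock(&mmio_lock);
	if (free_it)
		free(node);
}

/*
 * Emulation path: the reference is dropped only after mmio_fn returns,
 * so a concurrent unregister can at most mark the node for removal:
 *
 *	node = mmio_get(mmio_search(addr));
 *	if (node) {
 *		node->mmio_fn(node);
 *		mmio_put(node);
 *	}
 */
```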

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-05-06 13:51     ` Alexandru Elisei
@ 2020-05-12 14:17       ` André Przywara
  2020-05-12 15:44         ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-05-12 14:17 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 06/05/2020 14:51, Alexandru Elisei wrote:

Hi,

> On 4/6/20 3:06 PM, André Przywara wrote:
>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>> PCI Express comes with an extended addressing scheme, which directly
>>> translated into a bigger device configuration space (256->4096 bytes)
>>> and bigger PCI configuration space (16->256 MB), as well as mandatory
>>> capabilities (power management [1] and PCI Express capability [2]).
>>>
>>> However, our virtio PCI implementation implements version 0.9 of the
>>> protocol and it still uses transitional PCI device ID's, so we have
>>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>>> management and PCI Express capability are left for a subsequent patch.
>>>
>>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>  arm/include/arm-common/kvm-arch.h |  4 +-
>>>  arm/pci.c                         |  2 +-
>>>  builtin-run.c                     |  1 +
>>>  include/kvm/kvm-config.h          |  2 +-
>>>  include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
>>>  pci.c                             |  5 +-
>>>  vfio/pci.c                        | 26 +++++++----
>>>  7 files changed, 96 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>>> index b9d486d5eac2..13c55fa3dc29 100644
>>> --- a/arm/include/arm-common/kvm-arch.h
>>> +++ b/arm/include/arm-common/kvm-arch.h
>>> @@ -23,7 +23,7 @@
>>>  
>>>  #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
>>>  #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
>>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>> The existence of this symbol seems somewhat odd, there should be no ARM
>> specific version of the config space size.
> 
> The existence of the symbol is required because at the moment PCI Express support
> is not available for all architectures, and this file needs to be included in
> pci.h, not the other way around. I agree it's not ideal, but that's the way it is
> right now.
> 
>> Do you know why we don't use the generic PCI_CFG_SIZE here?
>> At the very least I would expect something like:
>> #define ARM_PCI_CFG_SIZE	PCI_EXP_CFG_SIZE
> 
> We don't use PCI_EXP_CFG_SIZE because it might not be available for all
> architectures. It is possible it's not there on x86, as the error you
> encountered showed, so instead of waiting for someone to report a compilation
> failure, I would rather use a number from the start.

Ah, OK, fair enough then.
I just wanted to avoid the impression that this is something
architecture specific.
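As an aside, both constants fall straight out of the config space geometry; a quick sanity check (plain C, nothing kvmtool-specific):

```c
#include <stdint.h>

/* ECAM: 256 buses x 32 devices x 8 functions, 4 KiB per function. */
static inline uint64_t ecam_size(void)
{
	return 256ULL * 32 * 8 * 4096;
}

/* Legacy CAM: same bus/device/function space, 256 bytes per function. */
static inline uint64_t cam_size(void)
{
	return 256ULL * 32 * 8 * 256;
}
```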

>>
>>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>>>  
>>> @@ -50,6 +50,8 @@
>>>  
>>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>>  
>>> +#define ARCH_HAS_PCI_EXP	1
>>> +
>>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>>  {
>>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>>> diff --git a/arm/pci.c b/arm/pci.c
>>> index ed325fa4a811..2251f627d8b5 100644
>>> --- a/arm/pci.c
>>> +++ b/arm/pci.c
>>> @@ -62,7 +62,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>>  
>>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>>> diff --git a/builtin-run.c b/builtin-run.c
>>> index 9cb8c75300eb..def8a1f803ad 100644
>>> --- a/builtin-run.c
>>> +++ b/builtin-run.c
>>> @@ -27,6 +27,7 @@
>>>  #include "kvm/irq.h"
>>>  #include "kvm/kvm.h"
>>>  #include "kvm/pci.h"
>>> +#include "kvm/vfio.h"
>>>  #include "kvm/rtc.h"
>>>  #include "kvm/sdl.h"
>>>  #include "kvm/vnc.h"
>>> diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
>>> index a052b0bc7582..a1012c57b7a7 100644
>>> --- a/include/kvm/kvm-config.h
>>> +++ b/include/kvm/kvm-config.h
>>> @@ -2,7 +2,6 @@
>>>  #define KVM_CONFIG_H_
>>>  
>>>  #include "kvm/disk-image.h"
>>> -#include "kvm/vfio.h"
>>>  #include "kvm/kvm-config-arch.h"
>>>  
>>>  #define DEFAULT_KVM_DEV		"/dev/kvm"
>>> @@ -18,6 +17,7 @@
>>>  #define MIN_RAM_SIZE_MB		(64ULL)
>>>  #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
>>>  
>>> +struct vfio_device_params;
>>>  struct kvm_config {
>>>  	struct kvm_config_arch arch;
>>>  	struct disk_image_params disk_image[MAX_DISK_IMAGES];
>>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>>> index be75f77fd2cb..71ee9d8cb01f 100644
>>> --- a/include/kvm/pci.h
>>> +++ b/include/kvm/pci.h
>>> @@ -10,6 +10,7 @@
>>>  #include "kvm/devices.h"
>>>  #include "kvm/msi.h"
>>>  #include "kvm/fdt.h"
>>> +#include "kvm.h"
>>>  
>>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>>> @@ -32,9 +33,41 @@
>>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>>  #define PCI_IO_SIZE		0x100
>>>  #define PCI_IOPORT_START	0x6200
>>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>>  
>>> -struct kvm;
>>> +#define PCIE_CAP_REG_VER	0x1
>>> +#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
>>> +#define PM_CAP_VER		0x3
>>> +
>>> +#ifdef ARCH_HAS_PCI_EXP
>>> +#define PCI_CFG_SIZE		(1ULL << 28)
>>> +#define PCI_DEV_CFG_SIZE	4096
>> Maybe use PCI_CFG_SPACE_EXP_SIZE from pci_regs.h?
> 
> I cannot do that because it's not available on all distros. I used
> PCI_CFG_SPACE_EXP_SIZE in the previous iteration, but I changed it when Sami
> reported a compilation failure on his particular setup (it's mentioned in the
> changelog).

OK.

>>
>>> +
>>> +union pci_config_address {
>>> +	struct {
>>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>> +#else
>>> +		unsigned	enable_bit	: 1;		/* 31       */
>>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>>> +#endif
>> Just for the records:
>> I think we agreed on this before, but using a C bitfield to model
>> hardware defined bits is broken, because the C standard doesn't
>> guarantee those bits to be consecutive and laid out like we hope it would.
>> But since we have this issue already with the legacy config space, and
>> it seems to work (TM), we can fix this later.
> 
> Still on the record, it's broken for big endian because only the byte order is
> different, not the individual bits. But there are other things that are broken for
> big endian, so not a big deal.
> 
>>
>>> +	};
>>> +	u32 w;
>>> +};
>>> +
>>> +#else
>>> +#define PCI_CFG_SIZE		(1ULL << 24)
>>> +#define PCI_DEV_CFG_SIZE	256
>> Shall we use PCI_CFG_SPACE_SIZE from the kernel headers here?
> 
> I would rather not, for the reasons explained above.
> 
>>
>>>  
>>>  union pci_config_address {
>>>  	struct {
>>> @@ -58,6 +91,8 @@ union pci_config_address {
>>>  	};
>>>  	u32 w;
>>>  };
>>> +#endif
>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>  
>>>  struct msix_table {
>>>  	struct msi_msg msg;
>>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>>  	u8	next;
>>>  };
>>>  
>>> +struct pcie_cap {
>> I guess this is meant to map to the PCI Express Capability Structure as
>> described in the PCIe spec?
>> We would need to add the "packed" attribute then. But actually I am not
>> a fan of using C language constructs to model specified register
>> arrangements, the kernel tries to avoid this as well.
> 
> I'm not sure what you are suggesting. Should we rewrite the entire PCI emulation
> code in kvmtool then?

At least not add more of that problematic code, especially if we don't
need it. Maybe there is a better solution for the operations we will
need (byte array?); that's hard to say without seeing the code.
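To make the byte-array idea concrete, one possible shape (helper names are illustrative, not kvmtool code): keep the config space as raw bytes and go through offset accessors, so no struct layout assumptions are involved. PCI_EXP_DEVCTL is the Device Control register offset within the PCIe capability, per the spec.

```c
#include <stdint.h>
#include <string.h>

#define PCI_EXP_DEVCTL	8	/* Device Control offset in the PCIe capability */

/* memcpy keeps the access alignment-safe for any offset. */
static inline uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	uint16_t v;

	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

static inline void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t v)
{
	memcpy(cfg + off, &v, sizeof(v));
}
```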

>> Actually, looking closer: why do we need this in the first place? I
>> removed this and struct pm_cap, and it still compiles.
>> So can we lose those two structures at all? And move the discussion and
>> implementation (for VirtIO 1.0?) to a later series?
> 
> I've answered both points in v2 of the series [1].
> 
> [1] https://www.spinics.net/lists/kvm/msg209601.html:

From there:
>> But more importantly: Do we actually need those definitions? We
>> don't seem to use them, do we?
>> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended
>> storage space a guest would expect?
>
> Yes, we don't use them for the reasons I explained in the commit
> message. I would rather keep them, because they are required by the
> PCIE spec.

I don't get the point of adding code / data structures that we don't
need, especially if it has issues. I understand it's mandatory as per
the spec, but just adding a struct here doesn't fix this or make this
better.

Cheers,
Andre

> 
>>
>>> +	u8 cap;
>>> +	u8 next;
>>> +	u16 cap_reg;
>>> +	u32 dev_cap;
>>> +	u16 dev_ctrl;
>>> +	u16 dev_status;
>>> +	u32 link_cap;
>>> +	u16 link_ctrl;
>>> +	u16 link_status;
>>> +	u32 slot_cap;
>>> +	u16 slot_ctrl;
>>> +	u16 slot_status;
>>> +	u16 root_ctrl;
>>> +	u16 root_cap;
>>> +	u32 root_status;
>>> +};
>>> +
>>> +struct pm_cap {
>>> +	u8 cap;
>>> +	u8 next;
>>> +	u16 pmc;
>>> +	u16 pmcsr;
>>> +	u8 pmcsr_bse;
>>> +	u8 data;
>>> +};
>>> +
>>>  struct pci_device_header;
>>>  
>>>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>>> @@ -110,14 +172,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>>  				   int bar_num, void *data);
>>>  
>>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>> -#define PCI_DEV_CFG_SIZE	256
>>> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>  
>>>  struct pci_config_operations {
>>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> -		      u8 offset, void *data, int sz);
>>> +		      u16 offset, void *data, int sz);
>>>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> -		     u8 offset, void *data, int sz);
>>> +		     u16 offset, void *data, int sz);
>>>  };
>>>  
>>>  struct pci_device_header {
>>> @@ -147,6 +207,10 @@ struct pci_device_header {
>>>  			u8		min_gnt;
>>>  			u8		max_lat;
>>>  			struct msix_cap msix;
>>> +#ifdef ARCH_HAS_PCI_EXP
>>> +			struct pm_cap pm;
>>> +			struct pcie_cap pcie;
>>> +#endif
>>>  		} __attribute__((packed));
>>>  		/* Pad to PCI config space size */
>>>  		u8	__pad[PCI_DEV_CFG_SIZE];
>>> diff --git a/pci.c b/pci.c
>>> index 68ece65441a6..b471209a6efc 100644
>>> --- a/pci.c
>>> +++ b/pci.c
>>> @@ -400,7 +400,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>  {
>>>  	void *base;
>>> -	u8 bar, offset;
>>> +	u8 bar;
>>> +	u16 offset;
>>>  	struct pci_device_header *pci_hdr;
>>>  	u8 dev_num = addr.device_number;
>>>  	u32 value = 0;
>>> @@ -439,7 +440,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>>  
>>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>  {
>>> -	u8 offset;
>>> +	u16 offset;
>>>  	struct pci_device_header *pci_hdr;
>>>  	u8 dev_num = addr.device_number;
>>>  
>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>> index 2b891496547d..6b8726227ea0 100644
>>> --- a/vfio/pci.c
>>> +++ b/vfio/pci.c
>>> @@ -311,7 +311,7 @@ out_unlock:
>>>  }
>>>  
>>>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>> -				    struct vfio_device *vdev, u8 off,
>>> +				    struct vfio_device *vdev, u16 off,
>>>  				    void *data, int sz)
>>>  {
>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>> @@ -343,7 +343,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>>>  }
>>>  
>>>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>> -				     u8 off, u8 *data, u32 sz)
>>> +				     u16 off, u8 *data, u32 sz)
>>>  {
>>>  	size_t i;
>>>  	u32 mask = 0;
>>> @@ -391,7 +391,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>>>  }
>>>  
>>>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
>>> -				   u8 off, u8 *data, u32 sz)
>>> +				   u16 off, u8 *data, u32 sz)
>>>  {
>>>  	u8 ctrl;
>>>  	struct msi_msg msg;
>>> @@ -536,7 +536,7 @@ out:
>>>  }
>>>  
>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> -			      u8 offset, void *data, int sz)
>>> +			      u16 offset, void *data, int sz)
>>>  {
>>>  	struct vfio_region_info *info;
>>>  	struct vfio_pci_device *pdev;
>>> @@ -554,7 +554,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>>>  }
>>>  
>>>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> -			       u8 offset, void *data, int sz)
>>> +			       u16 offset, void *data, int sz)
>>>  {
>>>  	struct vfio_region_info *info;
>>>  	struct vfio_pci_device *pdev;
>>> @@ -638,15 +638,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>>  {
>>>  	int ret;
>>>  	size_t size;
>>> -	u8 pos, next;
>>> +	u16 pos, next;
>>>  	struct pci_cap_hdr *cap;
>>> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
>>> +	u8 *virt_hdr;
>>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>>  
>>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>>  		return 0;
>>>  
>>> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
>>> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
>>> +	if (!virt_hdr)
>>> +		return -errno;
>> There are two places where we return in this function, we don't seem to
>> free virt_hdr in those cases. Looks like a job for your beloved goto ;-)
>>
> Indeed, I'll do that.
> 
> Thanks,
> Alex
> 


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-05-12 14:17       ` André Przywara
@ 2020-05-12 15:44         ` Alexandru Elisei
  2020-05-13  8:17           ` André Przywara
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-12 15:44 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 5/12/20 3:17 PM, André Przywara wrote:
> On 06/05/2020 14:51, Alexandru Elisei wrote:
>
> Hi,
>
>> On 4/6/20 3:06 PM, André Przywara wrote:
>>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>>
>>>>  
>>>>  union pci_config_address {
>>>>  	struct {
>>>> @@ -58,6 +91,8 @@ union pci_config_address {
>>>>  	};
>>>>  	u32 w;
>>>>  };
>>>> +#endif
>>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>  
>>>>  struct msix_table {
>>>>  	struct msi_msg msg;
>>>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>>>  	u8	next;
>>>>  };
>>>>  
>>>> +struct pcie_cap {
>>> I guess this is meant to map to the PCI Express Capability Structure as
>>> described in the PCIe spec?
>>> We would need to add the "packed" attribute then. But actually I am not
>>> a fan of using C language constructs to model specified register
>>> arrangements, the kernel tries to avoid this as well.
>> I'm not sure what you are suggesting. Should we rewrite the entire PCI emulation
>> code in kvmtool then?
> At least not add more of that problematic code, especially if we don't

I don't see how the code is problematic. Did I miss it in a previous comment?

> need it. Maybe there is a better solution for the operations we will
> need (byte array?), that's hard to say without seeing the code.
>
>>> Actually, looking closer: why do we need this in the first place? I
>>> removed this and struct pm_cap, and it still compiles.
>>> So can we lose those two structures at all? And move the discussion and
>>> implementation (for VirtIO 1.0?) to a later series?
>> I've answered both points in v2 of the series [1].
>>
>> [1] https://www.spinics.net/lists/kvm/msg209601.html:
> From there:
>>> But more importantly: Do we actually need those definitions? We
>>> don't seem to use them, do we?
>>> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended
>>> storage space a guest would expect?
>> Yes, we don't use them for the reasons I explained in the commit
>> message. I would rather keep them, because they are required by the
>> PCIE spec.
> I don't get the point of adding code / data structures that we don't
> need, especially if it has issues. I understand it's mandatory as per
> the spec, but just adding a struct here doesn't fix this or makes this
> better.

Sure, I can remove the unused structs, especially if they have issues. But I don't
see what issues they have; would you mind expanding on that?

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-05-12 15:44         ` Alexandru Elisei
@ 2020-05-13  8:17           ` André Przywara
  2020-05-13 14:42             ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: André Przywara @ 2020-05-13  8:17 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On 12/05/2020 16:44, Alexandru Elisei wrote:

Hi,

> On 5/12/20 3:17 PM, André Przywara wrote:
>> On 06/05/2020 14:51, Alexandru Elisei wrote:
>>
>> Hi,
>>
>>> On 4/6/20 3:06 PM, André Przywara wrote:
>>>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>>>
>>>>>  
>>>>>  union pci_config_address {
>>>>>  	struct {
>>>>> @@ -58,6 +91,8 @@ union pci_config_address {
>>>>>  	};
>>>>>  	u32 w;
>>>>>  };
>>>>> +#endif
>>>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>>  
>>>>>  struct msix_table {
>>>>>  	struct msi_msg msg;
>>>>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>>>>  	u8	next;
>>>>>  };
>>>>>  
>>>>> +struct pcie_cap {
>>>> I guess this is meant to map to the PCI Express Capability Structure as
>>>> described in the PCIe spec?
>>>> We would need to add the "packed" attribute then. But actually I am not
>>>> a fan of using C language constructs to model specified register
>>>> arrangements, the kernel tries to avoid this as well.
>>> I'm not sure what you are suggesting. Should we rewrite the entire PCI emulation
>>> code in kvmtool then?
>> At least not add more of that problematic code, especially if we don't
> 
> I don't see how the code is problematic. Did I miss it in a previous comment?

I was referring to the point that modelling hardware-defined registers
using a C struct is somewhat dodgy. As the C standard says, in section
6.7.2.1, end of paragraph 15:
"There may be unnamed padding within a structure object, but not at its
beginning."
Yes, GCC and other implementations offer "packed" to somewhat overcome
this, but this is a compiler extension and has other issues, like if you
have non-aligned members and take pointers to them.

I think our case here is more sane, since we always seem to use it on
normal memory (do we?), and are not mapping it to some actual device memory.

So I leave this up to you, but I am still opposed to the idea of adding
code that is not used.

Cheers,
Andre.

>> need it. Maybe there is a better solution for the operations we will
>> need (byte array?), that's hard to say without seeing the code.
>>
>>>> Actually, looking closer: why do we need this in the first place? I
>>>> removed this and struct pm_cap, and it still compiles.
>>>> So can we lose those two structures at all? And move the discussion and
>>>> implementation (for VirtIO 1.0?) to a later series?
>>> I've answered both points in v2 of the series [1].
>>>
>>> [1] https://www.spinics.net/lists/kvm/msg209601.html:
>> From there:
>>>> But more importantly: Do we actually need those definitions? We
>>>> don't seem to use them, do we?
>>>> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended
>>>> storage space a guest would expect?
>>> Yes, we don't use them for the reasons I explained in the commit
>>> message. I would rather keep them, because they are required by the
>>> PCIE spec.
>> I don't get the point of adding code / data structures that we don't
>> need, especially if it has issues. I understand it's mandatory as per
>> the spec, but just adding a struct here doesn't fix this or make this
>> better.
> 
> Sure, I can remove the unused structs, especially if they have issues. But I don't
> see what issues they have, would you mind expanding on that?

The best code is the one not written. ;-)

Cheers,
Andre

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-05-13  8:17           ` André Przywara
@ 2020-05-13 14:42             ` Alexandru Elisei
  2021-06-04 16:46               ` Alexandru Elisei
  0 siblings, 1 reply; 68+ messages in thread
From: Alexandru Elisei @ 2020-05-13 14:42 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

On 5/13/20 9:17 AM, André Przywara wrote:
> On 12/05/2020 16:44, Alexandru Elisei wrote:
>
> Hi,
>
>> On 5/12/20 3:17 PM, André Przywara wrote:
>>> On 06/05/2020 14:51, Alexandru Elisei wrote:
>>>
>>> Hi,
>>>
>>>> On 4/6/20 3:06 PM, André Przywara wrote:
>>>>> On 26/03/2020 15:24, Alexandru Elisei wrote:
>>>>>
>>>>>>  
>>>>>>  union pci_config_address {
>>>>>>  	struct {
>>>>>> @@ -58,6 +91,8 @@ union pci_config_address {
>>>>>>  	};
>>>>>>  	u32 w;
>>>>>>  };
>>>>>> +#endif
>>>>>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>>>>  
>>>>>>  struct msix_table {
>>>>>>  	struct msi_msg msg;
>>>>>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>>>>>  	u8	next;
>>>>>>  };
>>>>>>  
>>>>>> +struct pcie_cap {
>>>>> I guess this is meant to map to the PCI Express Capability Structure as
>>>>> described in the PCIe spec?
>>>>> We would need to add the "packed" attribute then. But actually I am not
>>>>> a fan of using C language constructs to model specified register
>>>>> arrangements, the kernel tries to avoid this as well.
>>>> I'm not sure what you are suggesting. Should we rewrite the entire PCI emulation
>>>> code in kvmtool then?
>>> At least not add more of that problematic code, especially if we don't
>> I don't see how the code is problematic. Did I miss it in a previous comment?
> I was referring to that point that modelling hardware defined registers
> using a C struct is somewhat dodgy. As the C standard says, in section
> 6.7.2.1, end of paragraph 15:
> "There may be unnamed padding within a structure object, but not at its
> beginning."
> Yes, GCC and other implementations offer "packed" to somewhat overcome
> this, but this is a compiler extension and has other issues, like if you
> have non-aligned members and take pointers to them.

The struct memory layout is dictated by aapcs64 (ARM IHI 0055B, page 12):

1. The alignment of an aggregate shall be the alignment of its most-aligned member.
2. The size of an aggregate shall be the smallest multiple of its alignment that
is sufficient to hold all of its members.

aapcs also specifies the alignment for each fundamental data type. The exact
wording is also used for armv7 (ARM IHI 0042F). Add to that the fact that the C
standard prohibits struct member reordering, and I don't think it's possible to
have two different layouts that respect the above rules (I'm happy to be proven
wrong, and I'm sure the people who wrote the specification would like to hear
about it). I would be surprised if other architectures didn't have similar rules.

As for the "packed" attribute, I'm also pretty sure it's a red herring. The PCI
spec specifies the registers in such a way that there's no gaps between the
registers and each register starts at a naturally aligned offset.

I do agree that taking pointers to struct members in this case can be problematic
if the register offset is not a multiple of 8.
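To make that concrete, here is a sketch (the field names are illustrative, not
necessarily the ones in the patch) of the first registers of the PCI Express
Capability Structure. Every member sits at a naturally aligned offset, so the
aapcs64/SysV layout rules above reproduce exactly the spec offsets with no
"packed" attribute, which the compile-time asserts check:

```c
#include <stddef.h>
#include <stdint.h>

/* Start of the PCIe Capability Structure in config space.
 * Field names are illustrative. Each member is naturally aligned,
 * so the ABI layout rules forbid any internal padding. */
struct pcie_cap {
	uint8_t  cap_id;	/* 0x00: capability ID (0x10 for PCIe) */
	uint8_t  next;		/* 0x01: next capability pointer */
	uint16_t cap_reg;	/* 0x02: PCI Express Capabilities Register */
	uint32_t dev_cap;	/* 0x04: Device Capabilities */
	uint16_t dev_ctrl;	/* 0x08: Device Control */
	uint16_t dev_status;	/* 0x0a: Device Status */
	uint32_t link_cap;	/* 0x0c: Link Capabilities */
	uint16_t link_ctrl;	/* 0x10: Link Control */
	uint16_t link_status;	/* 0x12: Link Status */
};

/* Layout matches the spec offsets without "packed". */
_Static_assert(offsetof(struct pcie_cap, dev_cap) == 0x04, "dev_cap");
_Static_assert(offsetof(struct pcie_cap, link_status) == 0x12, "link_status");
_Static_assert(sizeof(struct pcie_cap) == 0x14, "size");
```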

> I think our case here is more sane, since we always seem to use it on
> normal memory (do we?), and are not mapping it to some actual device memory.

All the PCI configuration registers live in userspace memory, so normal memory, yes.

>
> So I leave this up to you, but I am still opposed to the idea of adding
> code that is not used.
>
> Cheers,
> Andre.
>
>>> need it. Maybe there is a better solution for the operations we will
>>> need (byte array?), that's hard to say without seeing the code.
>>>
>>>>> Actually, looking closer: why do we need this in the first place? I
>>>>> removed this and struct pm_cap, and it still compiles.
>>>>> So can we lose those two structures at all? And move the discussion and
>>>>> implementation (for VirtIO 1.0?) to a later series?
>>>> I've answered both points in v2 of the series [1].
>>>>
>>>> [1] https://www.spinics.net/lists/kvm/msg209601.html:
>>> From there:
>>>>> But more importantly: Do we actually need those definitions? We
>>>>> don't seem to use them, do we?
>>>>> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended
>>>>> storage space a guest would expect?
>>>> Yes, we don't use them for the reasons I explained in the commit
>>>> message. I would rather keep them, because they are required by the
>>>> PCIE spec.
>>> I don't get the point of adding code / data structures that we don't
>>> need, especially if it has issues. I understand it's mandatory as per
>>> the spec, but just adding a struct here doesn't fix this or make this
>>> better.
>> Sure, I can remove the unused structs, especially if they have issues. But I don't
>> see what issues they have, would you mind expanding on that?
> The best code is the one not written. ;-)

Truer words were never spoken.

That settles it then, I'll remove the unused structs.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support
  2020-05-13 14:42             ` Alexandru Elisei
@ 2021-06-04 16:46               ` Alexandru Elisei
  0 siblings, 0 replies; 68+ messages in thread
From: Alexandru Elisei @ 2021-06-04 16:46 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi Andre,

On 5/13/20 3:42 PM, Alexandru Elisei wrote:
> Hi,
>
> On 5/13/20 9:17 AM, André Przywara wrote:
>> On 12/05/2020 16:44, Alexandru Elisei wrote:
>>
>> Hi,
>>
>>> On 5/12/20 3:17 PM, André Przywara wrote:
>>>> On 06/05/2020 14:51, Alexandru Elisei wrote:
>>>>
>>>> Hi,
>>>>
>>>>> On 4/6/20 3:06 PM, André Przywara wrote:
>>>>> [..]
>>>>>> Actually, looking closer: why do we need this in the first place? I
>>>>>> removed this and struct pm_cap, and it still compiles.
>>>>>> So can we lose those two structures at all? And move the discussion and
>>>>>> implementation (for VirtIO 1.0?) to a later series?
>>>>> I've answered both points in v2 of the series [1].
>>>>>
>>>>> [1] https://www.spinics.net/lists/kvm/msg209601.html:
>>>> From there:
>>>>>> But more importantly: Do we actually need those definitions? We
>>>>>> don't seem to use them, do we?
>>>>>> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended
>>>>>> storage space a guest would expect?
>>>>> Yes, we don't use them for the reasons I explained in the commit
>>>>> message. I would rather keep them, because they are required by the
>>>>> PCIE spec.
>>>> I don't get the point of adding code / data structures that we don't
>>>> need, especially if it has issues. I understand it's mandatory as per
>>>> the spec, but just adding a struct here doesn't fix this or make this
>>>> better.
>>> Sure, I can remove the unused structs, especially if they have issues. But I don't
>>> see what issues they have, would you mind expanding on that?
>> The best code is the one not written. ;-)
> Truer words were never spoken.
>
> That settles it then, I'll remove the unused structs.

Coming back to this. Without the PCIE capability, Linux will incorrectly set the
size of the PCI configuration space for a device to the legacy PCI size (256 bytes
instead of 4096). This is done in drivers/pci/probe.c::pci_setup_device, where
dev->cfg_size = pci_cfg_space_size(dev). And pci_cfg_space_size() is:

int pci_cfg_space_size(struct pci_dev *dev)
{
    int pos;
    u32 status;
    u16 class;

#ifdef CONFIG_PCI_IOV
    /*
     * Per the SR-IOV specification (rev 1.1, sec 3.5), VFs are required to
     * implement a PCIe capability and therefore must implement extended
     * config space.  We can skip the NO_EXTCFG test below and the
     * reachability/aliasing test in pci_cfg_space_size_ext() by virtue of
     * the fact that the SR-IOV capability on the PF resides in extended
     * config space and must be accessible and non-aliased to have enabled
     * support for this VF.  This is a micro performance optimization for
     * systems supporting many VFs.
     */
    if (dev->is_virtfn)
        return PCI_CFG_SPACE_EXP_SIZE;
#endif

    if (dev->bus->bus_flags & PCI_BUS_FLAGS_NO_EXTCFG)
        return PCI_CFG_SPACE_SIZE;

    class = dev->class >> 8;
    if (class == PCI_CLASS_BRIDGE_HOST)
        return pci_cfg_space_size_ext(dev);

    if (pci_is_pcie(dev))
        return pci_cfg_space_size_ext(dev);

    pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);
    if (!pos)
        return PCI_CFG_SPACE_SIZE;

    pci_read_config_dword(dev, pos + PCI_X_STATUS, &status);
    if (status & (PCI_X_STATUS_266MHZ | PCI_X_STATUS_533MHZ))
        return pci_cfg_space_size_ext(dev);

    return PCI_CFG_SPACE_SIZE;
}

And pci_is_pcie(dev) returns true if the device has a valid PCIE capability.
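(For reference, pci_is_pcie() just tests the PCIe capability offset that was
cached in dev->pcie_cap during device probe; a minimal standalone sketch, with
struct pci_dev pared down to that one field:)

```c
#include <stdbool.h>
#include <stdint.h>

/* Pared-down stand-in for the kernel's struct pci_dev: only the
 * cached PCIe capability offset matters here (0 means not found
 * during probe). */
struct pci_dev {
	uint8_t pcie_cap;	/* config-space offset of the PCIe cap, 0 if absent */
};

static bool pci_is_pcie(struct pci_dev *dev)
{
	return dev->pcie_cap != 0;
}
```

So a device whose emulated config space lacks the PCIe capability never takes
the pci_cfg_space_size_ext() path above and is stuck with 256 bytes.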

Why do we care? The RTL8168 driver checks if the NIC is PCIE capable, and if not
it prints the error message below and falls back to another (proprietary?) method
of device configuration (in
drivers/net/ethernet/realtek/r8169_main.c::rtl_csi_access_enable):

[    1.490530] r8169 0000:00:00.0 enp0s0: No native access to PCI extended config
space, falling back to CSI
[    1.500201] r8169 0000:00:00.0 enp0s0: Link is Down

I'll keep the PCIE cap and drop the power management cap.

Thanks,

Alex


^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2021-06-04 16:46 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-26 15:24 [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 01/32] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 02/32] hw/i8042: Compile only for x86 Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 03/32] pci: Fix BAR resource sizing arbitration Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 04/32] Remove pci-shmem device Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 05/32] Check that a PCI device's memory size is power of two Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 06/32] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 07/32] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 08/32] pci: Fix ioport allocation size Alexandru Elisei
2020-03-30  9:27   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 09/32] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 10/32] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 11/32] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 12/32] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
2020-03-30  9:29   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 13/32] vfio/pci: Don't access unallocated regions Alexandru Elisei
2020-03-30  9:29   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 14/32] virtio: Don't ignore initialization failures Alexandru Elisei
2020-03-30  9:30   ` André Przywara
2020-04-14 10:27     ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 15/32] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 16/32] hw/vesa: Don't ignore fatal errors Alexandru Elisei
2020-03-30  9:30   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 17/32] hw/vesa: Set the size for BAR 0 Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 18/32] ioport: Fail when registering overlapping ports Alexandru Elisei
2020-03-30 11:13   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 19/32] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
2020-03-31 11:51   ` André Przywara
2020-05-01 15:30     ` Alexandru Elisei
2020-05-06 17:40       ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 20/32] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
2020-04-02  8:33   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 21/32] virtio/pci: Get emulated region address from BARs Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 22/32] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
2020-04-02  8:33   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 23/32] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 24/32] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
2020-04-02  8:34   ` André Przywara
2020-05-04 13:00     ` Alexandru Elisei
2020-05-04 13:37       ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 25/32] vfio/pci: Don't write configuration value twice Alexandru Elisei
2020-04-02  8:35   ` André Przywara
2020-05-04 13:44     ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 26/32] vesa: Create device exactly once Alexandru Elisei
2020-04-02  8:58   ` André Przywara
2020-05-05 13:02     ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 27/32] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
2020-04-03 11:57   ` André Przywara
2020-04-03 18:14     ` Alexandru Elisei
2020-04-03 19:08       ` André Przywara
2020-05-05 13:30         ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 28/32] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
2020-04-06 14:03   ` André Przywara
2020-03-26 15:24 ` [PATCH v3 kvmtool 29/32] pci: Implement reassignable BARs Alexandru Elisei
2020-04-06 14:05   ` André Przywara
2020-05-06 12:05     ` Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 30/32] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 31/32] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
2020-03-26 15:24 ` [PATCH v3 kvmtool 32/32] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
2020-04-06 14:06   ` André Przywara
2020-04-14  8:56     ` Alexandru Elisei
2020-05-06 13:51     ` Alexandru Elisei
2020-05-12 14:17       ` André Przywara
2020-05-12 15:44         ` Alexandru Elisei
2020-05-13  8:17           ` André Przywara
2020-05-13 14:42             ` Alexandru Elisei
2021-06-04 16:46               ` Alexandru Elisei
2020-04-06 14:14 ` [PATCH v3 kvmtool 00/32] Add reassignable BARs and PCIE " André Przywara
