* [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support
@ 2020-01-23 13:47 Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 01/30] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
                   ` (31 more replies)
  0 siblings, 32 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

kvmtool uses the Linux-only DT property 'linux,pci-probe-only' to prevent
the guest kernel from trying to reassign the BARs. Let's make the BARs
reassignable so we can get rid of this band-aid.

Let's also extend the legacy PCI emulation, which came out in 1992, so we
can properly emulate the PCI Express version 1.1 protocol, which is
relatively new, being published in 2005.

For this iteration, I have completely reworked the way BARs are
reassigned. As I was adding support for reassignable BARs to more devices,
it became clear that I was duplicating the same code over and over again.
Furthermore, during device configuration, Linux can assign a region
resource to a BAR that temporarily overlaps with another device. With my
original approach, that meant that every device had to be aware of the
BAR values of all the other devices.

With this new approach, the algorithm for activating/deactivating emulation
as BAR addresses change lives completely inside the PCI code. Each device
registers two callbacks: one called when device emulation is activated
(for example, to activate emulation for a newly assigned BAR address), and
one called when device emulation is deactivated (a previously assigned BAR
address is changed, so emulation for that region must be torn down).
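As a rough sketch of the idea (the names below are illustrative only and
do not match the actual kvmtool API), the PCI core tears down emulation
at the old address before bringing it up at the new one, so the device
never needs to know about other devices' BARs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: these names do not match the kvmtool API. */
struct bar_ops {
	void (*activate)(int bar, uint32_t addr);
	void (*deactivate)(int bar, uint32_t addr);
};

struct fake_device {
	uint32_t bar_addr[6];
	bool bar_active[6];
	const struct bar_ops *ops;
};

static int activations, deactivations;

static void demo_activate(int bar, uint32_t addr)   { activations++; }
static void demo_deactivate(int bar, uint32_t addr) { deactivations++; }

static const struct bar_ops demo_ops = {
	.activate	= demo_activate,
	.deactivate	= demo_deactivate,
};

/*
 * PCI core side: when the guest writes a new BAR address, deactivate
 * emulation at the old address first, then activate it at the new one.
 */
static void bar_update(struct fake_device *dev, int bar, uint32_t new_addr)
{
	if (dev->bar_active[bar]) {
		dev->ops->deactivate(bar, dev->bar_addr[bar]);
		dev->bar_active[bar] = false;
	}
	dev->bar_addr[bar] = new_addr;
	dev->ops->activate(bar, new_addr);
	dev->bar_active[bar] = true;
}
```

Because the deactivate callback always runs before the activate one, a
temporarily overlapping assignment only ever affects the device being
reconfigured.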

I also tried to do better at testing the patches. I have tested VFIO with
virtio-pci on an arm64 and an x86 machine:

1. AMD Seattle: Intel 82574L Gigabit Ethernet card, Samsung 970 Pro NVMe
(the controller registers are in the same BAR region as the MSI-X table
and PBA; I wrote a nasty hack to make it work and will try to upstream
something after this series), Realtek 8168 Gigabit Ethernet card, NVIDIA
Quadro P400 (device detection only), AMD FirePro W2100 (the amdgpu driver
fails probing because of missing expansion ROM emulation in kvmtool, I
will send patches for this too), Myricom 10 Gigabit Ethernet card,
Seagate Barracuda 1000GB drive.

2. Ryzen 3900X + Gigabyte X570 Aorus Master (BIOS F10): Realtek 8168
Gigabit Ethernet card, AMD FirePro W2100 (same issue as on Seattle).

Using the CFI flash emulation for kvmtool [1] and a hacked version of EDK2
as the firmware for the virtual machine, I was able to download an official
Debian arm64 installation ISO, install Debian and then run it. EDK2 patches
for kvmtool will be posted soon.

You will notice from the changelog that there are a lot of new patches
(17!), but most of them are fixes for stuff that I found while testing.

Patches 1-18 are fixes and cleanups, and can be merged independently. They
are pretty straightforward, so if the size of the series looks off-putting,
please review these first. I am aware that the series has grown quite a
lot; I am willing to split the fixes from the rest of the patches, or do
whatever else would make reviewing easier.

Changes in v2:
* Patches 2, 11-18, 20, 22-27, 29 are new.
* Patches 11, 13, and 14 from v1 have been dropped.
* Reworked the way BAR reassignment is implemented.
* The patch "Add PCI Express 1.1 support" has been reworked to apply only
  to arm64. For x86 we would need ACPI support in order to advertise the
  location of the ECAM space.
* Gathered Reviewed-by tags.
* Implemented review comments.

[1] https://www.spinics.net/lists/arm-kernel/msg778623.html

Alexandru Elisei (24):
  Makefile: Use correct objcopy binary when cross-compiling for x86_64
  hw/i8042: Compile only for x86
  Remove pci-shmem device
  Check that a PCI device's memory size is power of two
  arm/pci: Advertise only PCI bus 0 in the DT
  vfio/pci: Allocate correct size for MSIX table and PBA BARs
  vfio/pci: Don't assume that only even numbered BARs are 64bit
  vfio/pci: Ignore expansion ROM BAR writes
  vfio/pci: Don't access potentially unallocated regions
  virtio: Don't ignore initialization failures
  Don't ignore errors registering a device, ioport or mmio emulation
  hw/vesa: Don't ignore fatal errors
  hw/vesa: Set the size for BAR 0
  Use independent read/write locks for ioport and mmio
  pci: Add helpers for BAR values and memory/IO space access
  virtio/pci: Get emulated region address from BARs
  vfio: Destroy memslot when unmapping the associated VAs
  vfio: Reserve ioports when configuring the BAR
  vfio/pci: Don't write configuration value twice
  pci: Implement callbacks for toggling BAR emulation
  pci: Toggle BAR I/O and memory space emulation
  pci: Implement reassignable BARs
  vfio: Trap MMIO access to BAR addresses which aren't page aligned
  arm/arm64: Add PCI Express 1.1 support

Julien Thierry (5):
  ioport: pci: Move port allocations to PCI devices
  pci: Fix ioport allocation size
  arm/pci: Fix PCI IO region
  virtio/pci: Make memory and IO BARs independent
  arm/fdt: Remove 'linux,pci-probe-only' property

Sami Mujawar (1):
  pci: Fix BAR resource sizing arbitration

 Makefile                          |   6 +-
 arm/fdt.c                         |   1 -
 arm/include/arm-common/kvm-arch.h |   4 +-
 arm/include/arm-common/pci.h      |   1 +
 arm/ioport.c                      |   3 +-
 arm/kvm.c                         |   3 +
 arm/pci.c                         |  25 +-
 builtin-run.c                     |   6 +-
 hw/i8042.c                        |  14 +-
 hw/pci-shmem.c                    | 400 ------------------------------
 hw/vesa.c                         | 132 +++++++---
 include/kvm/devices.h             |   3 +-
 include/kvm/ioport.h              |  10 +-
 include/kvm/kvm-config.h          |   2 +-
 include/kvm/kvm.h                 |   9 +-
 include/kvm/pci-shmem.h           |  32 ---
 include/kvm/pci.h                 | 168 ++++++++++++-
 include/kvm/util.h                |   2 +
 include/kvm/vesa.h                |   6 +-
 include/kvm/virtio-pci.h          |   3 -
 include/kvm/virtio.h              |   7 +-
 include/linux/compiler.h          |   2 +-
 ioport.c                          |  57 ++---
 kvm.c                             |  65 ++++-
 mips/kvm.c                        |   3 +-
 mmio.c                            |  26 +-
 pci.c                             | 320 ++++++++++++++++++++++--
 powerpc/include/kvm/kvm-arch.h    |   2 +-
 powerpc/ioport.c                  |   3 +-
 powerpc/spapr_pci.c               |   2 +-
 vfio/core.c                       |  22 +-
 vfio/pci.c                        | 231 +++++++++++++----
 virtio/9p.c                       |   9 +-
 virtio/balloon.c                  |  10 +-
 virtio/blk.c                      |  14 +-
 virtio/console.c                  |  11 +-
 virtio/core.c                     |   9 +-
 virtio/mmio.c                     |  13 +-
 virtio/net.c                      |  32 +--
 virtio/pci.c                      | 220 +++++++++++-----
 virtio/scsi.c                     |  14 +-
 x86/include/kvm/kvm-arch.h        |   2 +-
 x86/ioport.c                      |  66 +++--
 43 files changed, 1217 insertions(+), 753 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 01/30] Makefile: Use correct objcopy binary when cross-compiling for x86_64
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86 Alexandru Elisei
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Use the compiler toolchain version of objcopy instead of the native one
when cross-compiling for the x86_64 architecture.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index b76d844f2e01..6d6880dd4f8a 100644
--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,7 @@ CC	:= $(CROSS_COMPILE)gcc
 CFLAGS	:=
 LD	:= $(CROSS_COMPILE)ld
 LDFLAGS	:=
+OBJCOPY	:= $(CROSS_COMPILE)objcopy
 
 FIND	:= find
 CSCOPE	:= cscope
@@ -479,7 +480,7 @@ x86/bios/bios.bin.elf: x86/bios/entry.S x86/bios/e820.c x86/bios/int10.c x86/bio
 
 x86/bios/bios.bin: x86/bios/bios.bin.elf
 	$(E) "  OBJCOPY " $@
-	$(Q) objcopy -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
+	$(Q) $(OBJCOPY) -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
 
 x86/bios/bios-rom.o: x86/bios/bios-rom.S x86/bios/bios.bin x86/bios/bios-rom.h
 	$(E) "  CC      " $@
-- 
2.20.1



* [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 01/30] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-27 18:07   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration Alexandru Elisei
                   ` (29 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

The initialization function for the emulated i8042 device does nothing
on all architectures except x86. As a result, the device is usable only
on x86, so let's make the file an architecture-specific object file.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile   | 2 +-
 hw/i8042.c | 4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 6d6880dd4f8a..33eddcbb4d66 100644
--- a/Makefile
+++ b/Makefile
@@ -103,7 +103,6 @@ OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 OBJS	+= builtin-sandbox.o
 OBJS	+= virtio/mmio.o
-OBJS	+= hw/i8042.o
 
 # Translate uname -m into ARCH string
 ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \
@@ -124,6 +123,7 @@ endif
 #x86
 ifeq ($(ARCH),x86)
 	DEFINES += -DCONFIG_X86
+	OBJS	+= hw/i8042.o
 	OBJS	+= x86/boot.o
 	OBJS	+= x86/cpuid.o
 	OBJS	+= x86/interrupt.o
diff --git a/hw/i8042.c b/hw/i8042.c
index 288b7d1108ac..2d8c96e9c7e6 100644
--- a/hw/i8042.c
+++ b/hw/i8042.c
@@ -349,10 +349,6 @@ static struct ioport_operations kbd_ops = {
 
 int kbd__init(struct kvm *kvm)
 {
-#ifndef CONFIG_X86
-	return 0;
-#endif
-
 	kbd_reset();
 	state.kvm = kvm;
 	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
-- 
2.20.1



* [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 01/30] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86 Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-27 18:07   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 04/30] Remove pci-shmem device Alexandru Elisei
                   ` (28 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Sami Mujawar <sami.mujawar@arm.com>

According to the 'PCI Local Bus Specification, Revision 3.0,
February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'

    "Software saves the original value of the Base Address register,
    writes 0 FFFF FFFFh to the register, then reads it back. Size
    calculation can be done from the 32-bit value read by first
    clearing encoding information bits (bit 0 for I/O, bits 0-3 for
    memory), inverting all 32 bits (logical NOT), then incrementing
    by 1. The resultant 32-bit value is the memory/I/O range size
    decoded by the register. Note that the upper 16 bits of the result
    is ignored if the Base Address register is for I/O and bits 16-31
    returned zero upon read."

kvmtool was returning the actual BAR resource size, which is incorrect:
the software drivers invert all 32 bits (logical NOT), then increment by
1, and end up with a very large resource size (in some cases more than
4GB), due to which drivers assert/fail to work.

e.g. if the BAR resource size was 0x1000, kvmtool would return 0x1000
instead of 0xFFFFF000.
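The sizing arithmetic can be checked in isolation; this is not kvmtool
code, just the B = ~(S - 1) / S = ~B + 1 relationship described above
(the low encoding bits are ignored for simplicity):

```c
#include <assert.h>
#include <stdint.h>

/*
 * B = ~(S - 1): the value the device model should return after software
 * writes all 1s to a BAR whose region size is S (encoding bits ignored).
 */
static uint32_t bar_value_for_size(uint32_t size)
{
	return ~(size - 1);
}

/* S = ~B + 1: how software recovers the region size from the read-back. */
static uint32_t bar_size_from_value(uint32_t value)
{
	return ~value + 1;
}
```

For a 0x1000-byte BAR this yields 0xFFFFF000, and inverting plus one
recovers 0x1000, matching the spec's Implementation Notes.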

Fix pci__config_wr() to return the size of the BAR in accordance with
the PCI Local Bus Specification, Implementation Notes.

Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Reworked algorithm, removed power-of-two check]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/pci.c b/pci.c
index 689869cb79a3..3198732935eb 100644
--- a/pci.c
+++ b/pci.c
@@ -149,6 +149,8 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	u8 bar, offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
+	u32 value = 0;
+	u32 mask;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -169,13 +171,41 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-	 * If the kernel masks the BAR it would expect to find the size of the
-	 * BAR there next time it reads from it. When the kernel got the size it
-	 * would write the address back.
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
 	 */
-	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
-		u32 sz = pci_hdr->bar_size[bar];
-		memcpy(base + offset, &sz, sizeof(sz));
+	if (bar < 6) {
+		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
+			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
+		else
+			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+		/*
+		 * According to the PCI local bus specification REV 3.0:
+		 * The number of upper bits that a device actually implements
+		 * depends on how much of the address space the device will
+		 * respond to. A device that wants a 1 MB memory address space
+		 * (using a 32-bit base address register) would build the top
+		 * 12 bits of the address register, hardwiring the other bits
+		 * to 0.
+		 *
+		 * Furthermore, software can determine how much address space
+		 * the device requires by writing a value of all 1's to the
+		 * register and then reading the value back. The device will
+		 * return 0's in all don't-care address bits, effectively
+		 * specifying the address space required.
+		 *
+		 * Software computes the size of the address space with the
+		 * formula S = ~B + 1, where S is the memory size and B is the
+		 * value read from the BAR. This means that the BAR value that
+		 * kvmtool should return is B = ~(S - 1).
+		 */
+		memcpy(&value, data, size);
+		if (value == 0xffffffff)
+			value = ~(pci_hdr->bar_size[bar] - 1);
+		/* Preserve the special bits. */
+		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
+		memcpy(base + offset, &value, size);
 	} else {
 		memcpy(base + offset, data, size);
 	}
-- 
2.20.1



* [PATCH v2 kvmtool 04/30] Remove pci-shmem device
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two Alexandru Elisei
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

The pci-shmem emulated device ("ivshmem") was created by QEMU for
cross-VM data sharing. The only Linux driver that uses this device is
the Android Virtual System on a Chip staging driver, which also mentions
a character device driver implemented on top of shmem; that driver was
removed from Linux.

On the kvmtool side, the only commits touching the pci-shmem device
since it was introduced in 2012 were made when refactoring various
kvmtool subsystems. Let's reduce the maintenance burden on the kvmtool
maintainers and remove this unused device.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile                |   1 -
 builtin-run.c           |   5 -
 hw/pci-shmem.c          | 400 ----------------------------------------
 include/kvm/pci-shmem.h |  32 ----
 4 files changed, 438 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

diff --git a/Makefile b/Makefile
index 33eddcbb4d66..f75413e74819 100644
--- a/Makefile
+++ b/Makefile
@@ -99,7 +99,6 @@ OBJS	+= util/read-write.o
 OBJS	+= util/util.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
-OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 OBJS	+= builtin-sandbox.o
 OBJS	+= virtio/mmio.o
diff --git a/builtin-run.c b/builtin-run.c
index f8dc6c7229b0..9cb8c75300eb 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -31,7 +31,6 @@
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
 #include "kvm/guest_compat.h"
-#include "kvm/pci-shmem.h"
 #include "kvm/kvm-ipc.h"
 #include "kvm/builtin-debug.h"
 
@@ -99,10 +98,6 @@ void kvm_run_set_wrapper_sandbox(void)
 	OPT_INTEGER('c', "cpus", &(cfg)->nrcpus, "Number of CPUs"),	\
 	OPT_U64('m', "mem", &(cfg)->ram_size, "Virtual machine memory"	\
 		" size in MiB."),					\
-	OPT_CALLBACK('\0', "shmem", NULL,				\
-		     "[pci:]<addr>:<size>[:handle=<handle>][:create]",	\
-		     "Share host shmem with guest via pci device",	\
-		     shmem_parser, NULL),				\
 	OPT_CALLBACK('d', "disk", kvm, "image or rootfs_dir", "Disk "	\
 			" image or rootfs directory", img_name_parser,	\
 			kvm),						\
diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
deleted file mode 100644
index f92bc75544d7..000000000000
--- a/hw/pci-shmem.c
+++ /dev/null
@@ -1,400 +0,0 @@
-#include "kvm/devices.h"
-#include "kvm/pci-shmem.h"
-#include "kvm/virtio-pci-dev.h"
-#include "kvm/irq.h"
-#include "kvm/kvm.h"
-#include "kvm/pci.h"
-#include "kvm/util.h"
-#include "kvm/ioport.h"
-#include "kvm/ioeventfd.h"
-
-#include <linux/kvm.h>
-#include <linux/byteorder.h>
-#include <sys/ioctl.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-
-#define MB_SHIFT (20)
-#define KB_SHIFT (10)
-#define GB_SHIFT (30)
-
-static struct pci_device_header pci_shmem_pci_device = {
-	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id	= cpu_to_le16(0x1110),
-	.header_type	= PCI_HEADER_TYPE_NORMAL,
-	.class[2]	= 0xFF,	/* misc pci device */
-	.status		= cpu_to_le16(PCI_STATUS_CAP_LIST),
-	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
-	.msix.cap	= PCI_CAP_ID_MSIX,
-	.msix.ctrl	= cpu_to_le16(1),
-	.msix.table_offset = cpu_to_le32(1),		/* Use BAR 1 */
-	.msix.pba_offset = cpu_to_le32(0x1001),		/* Use BAR 1 */
-};
-
-static struct device_header pci_shmem_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &pci_shmem_pci_device,
-};
-
-/* registers for the Inter-VM shared memory device */
-enum ivshmem_registers {
-	INTRMASK = 0,
-	INTRSTATUS = 4,
-	IVPOSITION = 8,
-	DOORBELL = 12,
-};
-
-static struct shmem_info *shmem_region;
-static u16 ivshmem_registers;
-static int local_fd;
-static u32 local_id;
-static u64 msix_block;
-static u64 msix_pba;
-static struct msix_table msix_table[2];
-
-int pci_shmem__register_mem(struct shmem_info *si)
-{
-	if (!shmem_region) {
-		shmem_region = si;
-	} else {
-		pr_warning("only single shmem currently avail. ignoring.\n");
-		free(si);
-	}
-	return 0;
-}
-
-static bool shmem_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		ioport__write32(data, local_id);
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static bool shmem_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static struct ioport_operations shmem_pci__io_ops = {
-	.io_in	= shmem_pci__io_in,
-	.io_out	= shmem_pci__io_out,
-};
-
-static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
-{
-	void *mem;
-
-	if (addr - msix_block < 0x1000)
-		mem = &msix_table;
-	else
-		mem = &msix_pba;
-
-	if (is_write)
-		memcpy(mem + addr - msix_block, data, len);
-	else
-		memcpy(data, mem + addr - msix_block, len);
-}
-
-/*
- * Return an irqfd which can be used by other guests to signal this guest
- * whenever they need to poke it
- */
-int pci_shmem__get_local_irqfd(struct kvm *kvm)
-{
-	int fd, gsi, r;
-
-	if (local_fd == 0) {
-		fd = eventfd(0, 0);
-		if (fd < 0)
-			return fd;
-
-		if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
-			gsi = irq__add_msix_route(kvm, &msix_table[0].msg,
-						  pci_shmem_device.dev_num << 3);
-			if (gsi < 0)
-				return gsi;
-		} else {
-			gsi = pci_shmem_pci_device.irq_line;
-		}
-
-		r = irq__add_irqfd(kvm, gsi, fd, -1);
-		if (r < 0)
-			return r;
-
-		local_fd = fd;
-	}
-
-	return local_fd;
-}
-
-/*
- * Connect a new client to ivshmem by adding the appropriate datamatch
- * to the DOORBELL
- */
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.fd		= fd,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-/*
- * Remove a client connected to ivshmem by removing the appropriate datamatch
- * from the DOORBELL
- */
-int pci_shmem__remove_client(struct kvm *kvm, u32 id)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO
-				| KVM_IOEVENTFD_FLAG_DATAMATCH
-				| KVM_IOEVENTFD_FLAG_DEASSIGN,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-static void *setup_shmem(const char *key, size_t len, int creating)
-{
-	int fd;
-	int rtn;
-	void *mem;
-	int flag = O_RDWR;
-
-	if (creating)
-		flag |= O_CREAT;
-
-	fd = shm_open(key, flag, S_IRUSR | S_IWUSR);
-	if (fd < 0) {
-		pr_warning("Failed to open shared memory file %s\n", key);
-		return NULL;
-	}
-
-	if (creating) {
-		rtn = ftruncate(fd, (off_t) len);
-		if (rtn < 0)
-			pr_warning("Can't ftruncate(fd,%zu)\n", len);
-	}
-	mem = mmap(NULL, len,
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
-	if (mem == MAP_FAILED) {
-		pr_warning("Failed to mmap shared memory file");
-		mem = NULL;
-	}
-	close(fd);
-
-	return mem;
-}
-
-int shmem_parser(const struct option *opt, const char *arg, int unset)
-{
-	const u64 default_size = SHMEM_DEFAULT_SIZE;
-	const u64 default_phys_addr = SHMEM_DEFAULT_ADDR;
-	const char *default_handle = SHMEM_DEFAULT_HANDLE;
-	struct shmem_info *si = malloc(sizeof(struct shmem_info));
-	u64 phys_addr;
-	u64 size;
-	char *handle = NULL;
-	int create = 0;
-	const char *p = arg;
-	char *next;
-	int base = 10;
-	int verbose = 0;
-
-	const int skip_pci = strlen("pci:");
-	if (verbose)
-		pr_info("shmem_parser(%p,%s,%d)", opt, arg, unset);
-	/* parse out optional addr family */
-	if (strcasestr(p, "pci:")) {
-		p += skip_pci;
-	} else if (strcasestr(p, "mem:")) {
-		die("I can't add to E820 map yet.\n");
-	}
-	/* parse out physical addr */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	phys_addr = strtoll(p, &next, base);
-	if (next == p && phys_addr == 0) {
-		pr_info("shmem: no physical addr specified, using default.");
-		phys_addr = default_phys_addr;
-	}
-	if (*next != ':' && *next != '\0')
-		die("shmem: unexpected chars after phys addr.\n");
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out size */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	size = strtoll(p, &next, base);
-	if (next == p && size == 0) {
-		pr_info("shmem: no size specified, using default.");
-		size = default_size;
-	}
-	/* look for [KMGkmg][Bb]*  uses base 2. */
-	int skip_B = 0;
-	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
-		if (*(next + 1) == 'B' || *(next + 1) == 'b')
-			skip_B = 1;
-		switch (*next) {
-		case 'K':
-		case 'k':
-			size = size << KB_SHIFT;
-			break;
-		case 'M':
-		case 'm':
-			size = size << MB_SHIFT;
-			break;
-		case 'G':
-		case 'g':
-			size = size << GB_SHIFT;
-			break;
-		default:
-			die("shmem: bug in detecting size prefix.");
-			break;
-		}
-		next += 1 + skip_B;
-	}
-	if (*next != ':' && *next != '\0') {
-		die("shmem: unexpected chars after phys size. <%c><%c>\n",
-		    *next, *p);
-	}
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out optional shmem handle */
-	const int skip_handle = strlen("handle=");
-	next = strcasestr(p, "handle=");
-	if (*p && next) {
-		if (p != next)
-			die("unexpected chars before handle\n");
-		p += skip_handle;
-		next = strchrnul(p, ':');
-		if (next - p) {
-			handle = malloc(next - p + 1);
-			strncpy(handle, p, next - p);
-			handle[next - p] = '\0';	/* just in case. */
-		}
-		if (*next == '\0')
-			p = next;
-		else
-			p = next + 1;
-	}
-	/* parse optional create flag to see if we should create shm seg. */
-	if (*p && strcasestr(p, "create")) {
-		create = 1;
-		p += strlen("create");
-	}
-	if (*p != '\0')
-		die("shmem: unexpected trailing chars\n");
-	if (handle == NULL) {
-		handle = malloc(strlen(default_handle) + 1);
-		strcpy(handle, default_handle);
-	}
-	if (verbose) {
-		pr_info("shmem: phys_addr = %llx",
-			(unsigned long long)phys_addr);
-		pr_info("shmem: size      = %llx", (unsigned long long)size);
-		pr_info("shmem: handle    = %s", handle);
-		pr_info("shmem: create    = %d", create);
-	}
-
-	si->phys_addr = phys_addr;
-	si->size = size;
-	si->handle = handle;
-	si->create = create;
-	pci_shmem__register_mem(si);	/* ownership of si, etc. passed on. */
-	return 0;
-}
-
-int pci_shmem__init(struct kvm *kvm)
-{
-	char *mem;
-	int r;
-
-	if (shmem_region == NULL)
-		return 0;
-
-	/* Register MMIO space for MSI-X */
-	r = ioport__register(kvm, IOPORT_EMPTY, &shmem_pci__io_ops, IOPORT_SIZE, NULL);
-	if (r < 0)
-		return r;
-	ivshmem_registers = (u16)r;
-
-	msix_block = pci_get_io_space_block(0x1010);
-	kvm__register_mmio(kvm, msix_block, 0x1010, false, callback_mmio_msix, NULL);
-
-	/*
-	 * This registers 3 BARs:
-	 *
-	 * 0 - ivshmem registers
-	 * 1 - MSI-X MMIO space
-	 * 2 - Shared memory block
-	 */
-	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
-	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[1] = 0x1010;
-	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
-
-	device__register(&pci_shmem_device);
-
-	/* Open shared memory and plug it into the guest */
-	mem = setup_shmem(shmem_region->handle, shmem_region->size,
-				shmem_region->create);
-	if (mem == NULL)
-		return -EINVAL;
-
-	kvm__register_dev_mem(kvm, shmem_region->phys_addr, shmem_region->size,
-			      mem);
-	return 0;
-}
-dev_init(pci_shmem__init);
-
-int pci_shmem__exit(struct kvm *kvm)
-{
-	return 0;
-}
-dev_exit(pci_shmem__exit);
diff --git a/include/kvm/pci-shmem.h b/include/kvm/pci-shmem.h
deleted file mode 100644
index 6cff2b85bfd3..000000000000
--- a/include/kvm/pci-shmem.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef KVM__PCI_SHMEM_H
-#define KVM__PCI_SHMEM_H
-
-#include <linux/types.h>
-#include <linux/list.h>
-
-#include "kvm/parse-options.h"
-
-#define SHMEM_DEFAULT_SIZE (16 << MB_SHIFT)
-#define SHMEM_DEFAULT_ADDR (0xc8000000)
-#define SHMEM_DEFAULT_HANDLE "/kvm_shmem"
-
-struct kvm;
-struct shmem_info;
-
-struct shmem_info {
-	u64 phys_addr;
-	u64 size;
-	char *handle;
-	int create;
-};
-
-int pci_shmem__init(struct kvm *kvm);
-int pci_shmem__exit(struct kvm *kvm);
-int pci_shmem__register_mem(struct shmem_info *si);
-int shmem_parser(const struct option *opt, const char *arg, int unset);
-
-int pci_shmem__get_local_irqfd(struct kvm *kvm);
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd);
-int pci_shmem__remove_client(struct kvm *kvm, u32 id);
-
-#endif
-- 
2.20.1



* [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (3 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 04/30] Remove pci-shmem device Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-27 18:07   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

According to the PCI local bus specification [1], a device's memory size
must be a power of two. This is also implicit in the mechanism that a CPU
uses to get the memory size requirement for a PCI device.

The vesa device requests a memory size that isn't a power of two.
According to the same spec [1], a device is allowed to consume more
memory than it actually requires; as a result, the amount of memory that
the vesa device reserves has been increased.

To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
in places where the memory size is known at compile time.

[1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1
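The numbers work out like this (a standalone check; the macro is the one
this patch adds to include/kvm/util.h, while the rounding helper is
illustrative and not kvmtool code):

```c
#include <assert.h>

/* Same check the patch adds to include/kvm/util.h: a positive x is a
 * power of two iff it has exactly one bit set. */
#define is_power_of_two(x)	((x) > 0 ? ((x) & ((x) - 1)) == 0 : 0)

/* Round up to the next power of two (illustrative helper). */
static unsigned long roundup_pow2(unsigned long x)
{
	unsigned long p = 1;

	while (p < x)
		p <<= 1;
	return p;
}
```

The old vesa framebuffer size, VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT =
4 * 640 * 480 bytes, is not a power of two; rounding it up gives 1 << 21,
the new VESA_MEM_SIZE.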

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c          | 3 +++
 include/kvm/util.h | 2 ++
 include/kvm/vesa.h | 6 +++++-
 virtio/pci.c       | 3 +++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index f3c5114cf4fe..d75b4b316a1e 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -58,6 +58,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	char *mem;
 	int r;
 
+	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
+	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
+
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
 
diff --git a/include/kvm/util.h b/include/kvm/util.h
index 4ca7aa9392b6..199724c4018c 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -104,6 +104,8 @@ static inline unsigned long roundup_pow_of_two(unsigned long x)
 	return x ? 1UL << fls_long(x - 1) : 0;
 }
 
+#define is_power_of_two(x)	((x) > 0 ? ((x) & ((x) - 1)) == 0 : 0)
+
 struct kvm;
 void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
diff --git a/include/kvm/vesa.h b/include/kvm/vesa.h
index 0fac11ab5a9f..e7d971343642 100644
--- a/include/kvm/vesa.h
+++ b/include/kvm/vesa.h
@@ -5,8 +5,12 @@
 #define VESA_HEIGHT	480
 
 #define VESA_MEM_ADDR	0xd0000000
-#define VESA_MEM_SIZE	(4*VESA_WIDTH*VESA_HEIGHT)
 #define VESA_BPP	32
+/*
+ * We actually only need VESA_BPP/8*VESA_WIDTH*VESA_HEIGHT bytes. But the memory
+ * size must be a power of 2, so we round up.
+ */
+#define VESA_MEM_SIZE	(1 << 21)
 
 struct kvm;
 struct biosregs;
diff --git a/virtio/pci.c b/virtio/pci.c
index 99653cad2c0f..04e801827df9 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -435,6 +435,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
+	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
+	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
+
 	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (4 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-27 18:08   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
                   ` (25 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

The "bus-range" property encodes the PCI bus number of the PCI
controller and the largest bus number of any PCI buses that are
subordinate to this node [1]. kvmtool emulates only PCI bus 0.
Advertise this in the PCI DT node by setting "bus-range" to <0,0>.

[1] IEEE Std 1275-1994, Section 3 "Bus Nodes Properties and Methods"

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arm/pci.c b/arm/pci.c
index 557cfa98938d..ed325fa4a811 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -30,7 +30,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
 	unsigned nentries = 0;
 	/* Bus range */
-	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
+	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(0), };
 	/* Configuration Space */
 	u64 cfg_reg_prop[] = { cpu_to_fdt64(KVM_PCI_CFG_AREA),
 			       cpu_to_fdt64(ARM_PCI_CFG_SIZE), };
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (5 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-07 17:02   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 08/30] pci: Fix ioport allocation size Alexandru Elisei
                   ` (24 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

The dynamic ioport allocation with IOPORT_EMPTY is currently only used
by PCI devices. Other devices use fixed ports for which they request
registration to the ioport API.

PCI ports need to be in the PCI I/O space. There is no reason the
ioport API should have to know that a port being allocated belongs to a
PCI device and must be placed in the PCI I/O space; this currently just
happens to be the case.

Move the responsibility for the dynamic allocation of ioports from the
ioport API to PCI.

In the future, if other types of devices also need dynamic ioport
allocation, they'll have to figure out the range of ports they are
allowed to use.
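The allocator this patch moves into pci.c is a simple bump allocator. A standalone sketch, simplified from the diff (no locking, constants copied from the patch):

```c
#include <assert.h>
#include <stdint.h>

#define PCI_IOPORT_START	0x6200
#define IOPORT_SIZE		0x400

/* Simplified model of pci_get_io_port_block(): round the cursor up to
 * the allocation granularity, hand out a block of 'size' ports, and bump
 * the cursor past it. */
uint16_t io_port_blocks = PCI_IOPORT_START;

uint16_t pci_get_io_port_block(uint32_t size)
{
	uint16_t port = (io_port_blocks + IOPORT_SIZE - 1) & ~(IOPORT_SIZE - 1);

	io_port_blocks = port + size;
	return port;
}
```

Successive allocations are aligned to IOPORT_SIZE, so two 0x100-byte requests land at 0x6400 and 0x6800; patch 08 in this series later tightens the granularity to PCI_IO_SIZE.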

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Renamed functions for clarity]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c                      |  4 ++--
 include/kvm/ioport.h           |  3 ---
 include/kvm/pci.h              |  4 +++-
 ioport.c                       | 18 ------------------
 pci.c                          | 17 +++++++++++++----
 powerpc/include/kvm/kvm-arch.h |  2 +-
 vfio/core.c                    |  6 ++++--
 vfio/pci.c                     |  4 ++--
 virtio/pci.c                   |  7 ++++---
 x86/include/kvm/kvm-arch.h     |  2 +-
 10 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index d75b4b316a1e..24fb46faad3b 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-
-	r = ioport__register(kvm, IOPORT_EMPTY, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index db52a479742b..b10fcd5b4412 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,11 +14,8 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_START			0x6200
 #define IOPORT_SIZE			0x400
 
-#define IOPORT_EMPTY			USHRT_MAX
-
 struct kvm;
 
 struct ioport {
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index a86c15a70e6d..ccb155e3e8fe 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -19,6 +19,7 @@
 #define PCI_CONFIG_DATA		0xcfc
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
+#define PCI_IOPORT_START	0x6200
 #define PCI_CFG_SIZE		(1ULL << 24)
 
 struct kvm;
@@ -152,7 +153,8 @@ struct pci_device_header {
 int pci__init(struct kvm *kvm);
 int pci__exit(struct kvm *kvm);
 struct pci_device_header *pci__find_dev(u8 dev_num);
-u32 pci_get_io_space_block(u32 size);
+u32 pci_get_mmio_block(u32 size);
+u16 pci_get_io_port_block(u32 size);
 void pci__assign_irq(struct device_header *dev_hdr);
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
diff --git a/ioport.c b/ioport.c
index a6dc65e3e6c6..a72e4035881a 100644
--- a/ioport.c
+++ b/ioport.c
@@ -16,24 +16,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
-DEFINE_MUTEX(ioport_mutex);
-
-static u16			free_io_port_idx; /* protected by ioport_mutex */
-
 static struct rb_root		ioport_tree = RB_ROOT;
 
-static u16 ioport__find_free_port(void)
-{
-	u16 free_port;
-
-	mutex_lock(&ioport_mutex);
-	free_port = IOPORT_START + free_io_port_idx * IOPORT_SIZE;
-	free_io_port_idx++;
-	mutex_unlock(&ioport_mutex);
-
-	return free_port;
-}
-
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
 {
 	struct rb_int_node *node;
@@ -85,8 +69,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	int r;
 
 	br_write_lock(kvm);
-	if (port == IOPORT_EMPTY)
-		port = ioport__find_free_port();
 
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
diff --git a/pci.c b/pci.c
index 3198732935eb..80b5c5d3d7f3 100644
--- a/pci.c
+++ b/pci.c
@@ -15,15 +15,24 @@ static u32 pci_config_address_bits;
  * (That's why it can still 32bit even with 64bit guests-- 64bit
  * PCI isn't currently supported.)
  */
-static u32 io_space_blocks		= KVM_PCI_MMIO_AREA;
+static u32 mmio_blocks			= KVM_PCI_MMIO_AREA;
+static u16 io_port_blocks		= PCI_IOPORT_START;
+
+u16 pci_get_io_port_block(u32 size)
+{
+	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+
+	io_port_blocks = port + size;
+	return port;
+}
 
 /*
  * BARs must be naturally aligned, so enforce this in the allocator.
  */
-u32 pci_get_io_space_block(u32 size)
+u32 pci_get_mmio_block(u32 size)
 {
-	u32 block = ALIGN(io_space_blocks, size);
-	io_space_blocks = block + size;
+	u32 block = ALIGN(mmio_blocks, size);
+	mmio_blocks = block + size;
 	return block;
 }
 
diff --git a/powerpc/include/kvm/kvm-arch.h b/powerpc/include/kvm/kvm-arch.h
index 8126b96cb66a..26d440b22bdd 100644
--- a/powerpc/include/kvm/kvm-arch.h
+++ b/powerpc/include/kvm/kvm-arch.h
@@ -34,7 +34,7 @@
 #define KVM_MMIO_START			PPC_MMIO_START
 
 /*
- * This is the address that pci_get_io_space_block() starts allocating
+ * This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address.
  */
 #define KVM_IOPORT_AREA			0x0
diff --git a/vfio/core.c b/vfio/core.c
index 17b5b0cfc9ac..0ed1e6fee6bf 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,8 +202,10 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = ioport__register(kvm, IOPORT_EMPTY, &vfio_ioport_ops,
-					    region->info.size, region);
+		int port = pci_get_io_port_block(region->info.size);
+
+		port = ioport__register(kvm, port, &vfio_ioport_ops,
+					region->info.size, region);
 		if (port < 0)
 			return port;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 76e24c156906..8e5d8572bc0c 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -750,7 +750,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
 	 * powers of two.
 	 */
 	mmio_size = roundup_pow_of_two(table->size + pba->size);
-	table->guest_phys_addr = pci_get_io_space_block(mmio_size);
+	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
 	if (!table->guest_phys_addr) {
 		pr_err("cannot allocate IO space");
 		ret = -ENOMEM;
@@ -846,7 +846,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	if (!region->is_ioport) {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
-		region->guest_phys_addr = pci_get_io_space_block(map_size);
+		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
 	/* Map the BARs into the guest or setup a trap region. */
diff --git a/virtio/pci.c b/virtio/pci.c
index 04e801827df9..d73414abde05 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -438,18 +438,19 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_io_space_block(IOPORT_SIZE);
+	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_io_space_block(PCI_IO_SIZE * 2);
+	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vpci);
 	if (r < 0)
diff --git a/x86/include/kvm/kvm-arch.h b/x86/include/kvm/kvm-arch.h
index bfdd3438a9de..85cd336c7577 100644
--- a/x86/include/kvm/kvm-arch.h
+++ b/x86/include/kvm/kvm-arch.h
@@ -16,7 +16,7 @@
 
 #define KVM_MMIO_START		KVM_32BIT_GAP_START
 
-/* This is the address that pci_get_io_space_block() starts allocating
+/* This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address (though same on x86).
  */
 #define KVM_IOPORT_AREA		0x0
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 08/30] pci: Fix ioport allocation size
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (6 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-23 13:47 ` [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region Alexandru Elisei
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

The PCI Local Bus Specification, Rev. 3.0,
Section 6.2.5.1. "Address Maps" states:
"Devices that map control functions into I/O Space must not consume more
than 256 bytes per I/O Base Address register."

Yet all the PCI devices allocate IO port regions of IOPORT_SIZE
(= 1024 bytes).

Fix this by having PCI devices use 256-byte regions for their IO BARs.

There is no hard requirement on the size of the memory region described
by memory BARs. Since BAR 1 is supposed to offer the same functionality as
IO ports, let's make its size match BAR 0.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Added rationale for changing BAR1 size to PCI_IO_SIZE]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c            |  4 ++--
 include/kvm/ioport.h |  1 -
 pci.c                |  2 +-
 virtio/pci.c         | 15 +++++++--------
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 24fb46faad3b..d8d91aa9c873 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index b10fcd5b4412..8c86b7151f25 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,7 +14,6 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_SIZE			0x400
 
 struct kvm;
 
diff --git a/pci.c b/pci.c
index 80b5c5d3d7f3..b6892d974c08 100644
--- a/pci.c
+++ b/pci.c
@@ -20,7 +20,7 @@ static u16 io_port_blocks		= PCI_IOPORT_START;
 
 u16 pci_get_io_port_block(u32 size)
 {
-	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+	u16 port = ALIGN(io_port_blocks, PCI_IO_SIZE);
 
 	io_port_blocks = port + size;
 	return port;
diff --git a/virtio/pci.c b/virtio/pci.c
index d73414abde05..eeb5b5efa6e1 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -421,7 +421,7 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_pci *vpci = ptr;
 	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (IOPORT_SIZE - 1));
+	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
 
 	kvm__emulate_io(vcpu, port, data, direction, len, 1);
 }
@@ -435,17 +435,16 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
-	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
+	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
@@ -475,8 +474,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
-		.bar_size[0]		= cpu_to_le32(IOPORT_SIZE),
-		.bar_size[1]		= cpu_to_le32(IOPORT_SIZE),
+		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
+		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (7 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 08/30] pci: Fix ioport allocation size Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-29 18:16   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
                   ` (22 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

The PCI IO region that is currently exposed through the DT contains
ports reserved by non-PCI devices.

Use the proper PCI IO start so that the region exposed through DT can
actually be used to reassign device BARs.
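The effect of the alignment done in pci__arm_init() can be checked with the constants from this series (a sketch; ALIGN mirrors the kernel macro):

```c
#include <assert.h>

/* Round the port allocator's start up to 4K so the PCI I/O region
 * advertised in the DT begins at a page-aligned bus address.
 * Constants taken from the patches in this series. */
#define SZ_4K			0x1000
#define PCI_IOPORT_START	0x6200
#define ALIGN(x, a)		(((x) + ((a) - 1)) & ~((a) - 1))

#define ARM_PCI_IO_START	ALIGN(PCI_IOPORT_START, SZ_4K)
#define ALIGN_PAD		(ARM_PCI_IO_START - PCI_IOPORT_START)
```

The 0xe00 bytes of padding are consumed by the dummy pci_get_io_port_block(align_pad) call, so the first real PCI port allocation starts at 0x7000.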

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/pci.h |  1 +
 arm/kvm.c                    |  3 +++
 arm/pci.c                    | 21 ++++++++++++++++++---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arm/include/arm-common/pci.h b/arm/include/arm-common/pci.h
index 9008a0ed072e..aea42b8895e9 100644
--- a/arm/include/arm-common/pci.h
+++ b/arm/include/arm-common/pci.h
@@ -1,6 +1,7 @@
 #ifndef ARM_COMMON__PCI_H
 #define ARM_COMMON__PCI_H
 
+void pci__arm_init(struct kvm *kvm);
 void pci__generate_fdt_nodes(void *fdt);
 
 #endif /* ARM_COMMON__PCI_H */
diff --git a/arm/kvm.c b/arm/kvm.c
index 1f85fc60588f..5c30ec1e0515 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -6,6 +6,7 @@
 #include "kvm/fdt.h"
 
 #include "arm-common/gic.h"
+#include "arm-common/pci.h"
 
 #include <linux/kernel.h>
 #include <linux/kvm.h>
@@ -86,6 +87,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	/* Create the virtual GIC. */
 	if (gic__create(kvm, kvm->cfg.arch.irqchip))
 		die("Failed to create virtual GIC");
+
+	pci__arm_init(kvm);
 }
 
 #define FDT_ALIGN	SZ_2M
diff --git a/arm/pci.c b/arm/pci.c
index ed325fa4a811..1c0949a22408 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -1,3 +1,5 @@
+#include "linux/sizes.h"
+
 #include "kvm/devices.h"
 #include "kvm/fdt.h"
 #include "kvm/kvm.h"
@@ -7,6 +9,11 @@
 
 #include "arm-common/pci.h"
 
+#define ARM_PCI_IO_START ALIGN(PCI_IOPORT_START, SZ_4K)
+
+/* Must be a multiple of 4k */
+#define ARM_PCI_IO_SIZE ((ARM_MMIO_AREA - ARM_PCI_IO_START) & ~(SZ_4K - 1))
+
 /*
  * An entry in the interrupt-map table looks like:
  * <pci unit address> <pci interrupt pin> <gic phandle> <gic interrupt>
@@ -24,6 +31,14 @@ struct of_interrupt_map_entry {
 	struct of_gic_irq		gic_irq;
 } __attribute__((packed));
 
+void pci__arm_init(struct kvm *kvm)
+{
+	u32 align_pad = ARM_PCI_IO_START - PCI_IOPORT_START;
+
+	/* Make PCI port allocation start at a properly aligned address */
+	pci_get_io_port_block(align_pad);
+}
+
 void pci__generate_fdt_nodes(void *fdt)
 {
 	struct device_header *dev_hdr;
@@ -40,10 +55,10 @@ void pci__generate_fdt_nodes(void *fdt)
 			.pci_addr = {
 				.hi	= cpu_to_fdt32(of_pci_b_ss(OF_PCI_SS_IO)),
 				.mid	= 0,
-				.lo	= 0,
+				.lo	= cpu_to_fdt32(ARM_PCI_IO_START),
 			},
-			.cpu_addr	= cpu_to_fdt64(KVM_IOPORT_AREA),
-			.length		= cpu_to_fdt64(ARM_IOPORT_SIZE),
+			.cpu_addr	= cpu_to_fdt64(ARM_PCI_IO_START),
+			.length		= cpu_to_fdt64(ARM_PCI_IO_SIZE),
 		},
 		{
 			.pci_addr = {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (8 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-29 18:16   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
                   ` (21 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

Currently, callbacks for memory BAR 1 call the IO port emulation.  This
means that the memory BAR needs I/O Space to be enabled whenever Memory
Space is enabled.

Refactor the code so the two types of BARs are independent. Also, unify
the ioport/mmio callback arguments so that they all receive a
virtio_device.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 virtio/pci.c | 71 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/virtio/pci.c b/virtio/pci.c
index eeb5b5efa6e1..6723a1f3a84d 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -87,8 +87,8 @@ static inline bool virtio_pci__msix_enabled(struct virtio_pci *vpci)
 	return vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE);
 }
 
-static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_in(struct kvm *kvm, struct virtio_device *vdev,
+					 void *data, int size, unsigned long offset)
 {
 	u32 config_offset;
 	struct virtio_pci *vpci = vdev->virtio;
@@ -117,20 +117,17 @@ static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vd
 	return false;
 }
 
-static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_in(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct virt_queue *vq;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_HOST_FEATURES:
@@ -154,13 +151,26 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 		vpci->isr = VIRTIO_IRQ_LOW;
 		break;
 	default:
-		ret = virtio_pci__specific_io_in(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_in(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	unsigned long offset;
+	struct virtio_device *vdev;
+	struct virtio_pci *vpci;
+
+	vdev = ioport->priv;
+	vpci = vdev->virtio;
+	offset = port - vpci->port_addr;
+
+	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
+}
+
 static void update_msix_map(struct virtio_pci *vpci,
 			    struct msix_table *msix_entry, u32 vecnum)
 {
@@ -185,8 +195,8 @@ static void update_msix_map(struct virtio_pci *vpci,
 	irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
 }
 
-static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_out(struct kvm *kvm, struct virtio_device *vdev,
+					  void *data, int size, unsigned long offset)
 {
 	struct virtio_pci *vpci = vdev->virtio;
 	u32 config_offset, vec;
@@ -259,19 +269,16 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	return false;
 }
 
-static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				 unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_GUEST_FEATURES:
@@ -304,13 +311,26 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 		virtio_notify_status(kvm, vdev, vpci->dev, vpci->status);
 		break;
 	default:
-		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_out(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	unsigned long offset;
+	struct virtio_device *vdev;
+	struct virtio_pci *vpci;
+
+	vdev = ioport->priv;
+	vpci = vdev->virtio;
+	offset = port - vpci->port_addr;
+
+	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
+}
+
 static struct ioport_operations virtio_pci__io_ops = {
 	.io_in	= virtio_pci__io_in,
 	.io_out	= virtio_pci__io_out,
@@ -320,7 +340,8 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 					   u64 addr, u8 *data, u32 len,
 					   u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
 	int vecnum;
 	size_t offset;
@@ -419,11 +440,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 					 u64 addr, u8 *data, u32 len,
 					 u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
-	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 
-	kvm__emulate_io(vcpu, port, data, direction, len, 1);
+	if (!is_write)
+		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
+				    data, len);
+	else
+		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
+				     data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
@@ -445,13 +470,13 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vpci);
+			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
 	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vpci);
+			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (9 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-29 18:16   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
                   ` (20 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

kvmtool assumes that the BAR which holds the address of the MSIX table
and PBA structure has a size equal to their combined size, and it
allocates memory from MMIO space accordingly. However, when initializing
the BARs, the BAR size is set to the region size reported by VFIO. When
the physical BAR size is greater than the MMIO space that kvmtool
allocates, the BAR can end up overlapping with another BAR, in which
case kvmtool will fail to map the memory. This was found when trying to
do PCI passthrough with a PCIe Realtek r8168 NIC while the guest was
also using virtio-block and virtio-net devices:

[..]
[    0.197926] PCI: OF: PROBE_ONLY enabled
[    0.198454] pci-host-generic 40000000.pci: host bridge /pci ranges:
[    0.199291] pci-host-generic 40000000.pci:    IO 0x00007000..0x0000ffff -> 0x00007000
[    0.200331] pci-host-generic 40000000.pci:   MEM 0x41000000..0x7fffffff -> 0x41000000
[    0.201480] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x40ffffff] for [bus 00]
[    0.202635] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
[    0.203535] pci_bus 0000:00: root bus resource [bus 00]
[    0.204227] pci_bus 0000:00: root bus resource [io  0x0000-0x8fff] (bus address [0x7000-0xffff])
[    0.205483] pci_bus 0000:00: root bus resource [mem 0x41000000-0x7fffffff]
[    0.206456] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
[    0.207399] pci 0000:00:00.0: reg 0x10: [io  0x0000-0x00ff]
[    0.208252] pci 0000:00:00.0: reg 0x18: [mem 0x41002000-0x41002fff]
[    0.209233] pci 0000:00:00.0: reg 0x20: [mem 0x41000000-0x41003fff]
[    0.210481] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
[    0.211349] pci 0000:00:01.0: reg 0x10: [io  0x0100-0x01ff]
[    0.212118] pci 0000:00:01.0: reg 0x14: [mem 0x41003000-0x410030ff]
[    0.212982] pci 0000:00:01.0: reg 0x18: [mem 0x41003200-0x410033ff]
[    0.214247] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
[    0.215096] pci 0000:00:02.0: reg 0x10: [io  0x0200-0x02ff]
[    0.215863] pci 0000:00:02.0: reg 0x14: [mem 0x41003400-0x410034ff]
[    0.216723] pci 0000:00:02.0: reg 0x18: [mem 0x41003600-0x410037ff]
[    0.218105] pci 0000:00:00.0: can't claim BAR 4 [mem 0x41000000-0x41003fff]: address conflict with 0000:00:00.0 [mem 0x41002000-0x41002fff]
[..]

Guest output of lspci -vv:

00:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
	Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 0000 [size=256]
	Region 2: Memory at 41002000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at 41000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00001000

Let's fix this by allocating an amount of MMIO memory equal to the size
of the BAR(s) that contain the MSI-X table and/or PBA.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 68 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 8e5d8572bc0c..bbb8469c8d93 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -715,17 +715,44 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 	return 0;
 }
 
-static int vfio_pci_create_msix_table(struct kvm *kvm,
-				      struct vfio_pci_device *pdev)
+static int vfio_pci_get_region_info(struct vfio_device *vdev, u32 index,
+				    struct vfio_region_info *info)
+{
+	int ret;
+
+	*info = (struct vfio_region_info) {
+		.argsz = sizeof(*info),
+		.index = index,
+	};
+
+	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+	if (ret) {
+		ret = -errno;
+		vfio_dev_err(vdev, "cannot get info for BAR %u", index);
+		return ret;
+	}
+
+	if (info->size && !is_power_of_two(info->size)) {
+		vfio_dev_err(vdev, "region is not power of two: 0x%llx",
+				info->size);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 {
 	int ret;
 	size_t i;
-	size_t mmio_size;
+	size_t map_size;
 	size_t nr_entries;
 	struct vfio_pci_msi_entry *entries;
+	struct vfio_pci_device *pdev = &vdev->pci;
 	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
 	struct vfio_pci_msix_table *table = &pdev->msix_table;
 	struct msix_cap *msix = PCI_CAP(&pdev->hdr, pdev->msix.pos);
+	struct vfio_region_info info;
 
 	table->bar = msix->table_offset & PCI_MSIX_TABLE_BIR;
 	pba->bar = msix->pba_offset & PCI_MSIX_TABLE_BIR;
@@ -744,15 +771,31 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
 	for (i = 0; i < nr_entries; i++)
 		entries[i].config.ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT;
 
+	ret = vfio_pci_get_region_info(vdev, table->bar, &info);
+	if (ret)
+		return ret;
+	if (!info.size)
+		return -EINVAL;
+	map_size = info.size;
+
+	if (table->bar != pba->bar) {
+		ret = vfio_pci_get_region_info(vdev, pba->bar, &info);
+		if (ret)
+			return ret;
+		if (!info.size)
+			return -EINVAL;
+		map_size += info.size;
+	}
+
 	/*
 	 * To ease MSI-X cap configuration in case they share the same BAR,
 	 * collapse table and pending array. The size of the BAR regions must be
 	 * powers of two.
 	 */
-	mmio_size = roundup_pow_of_two(table->size + pba->size);
-	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
+	map_size = ALIGN(map_size, PAGE_SIZE);
+	table->guest_phys_addr = pci_get_mmio_block(map_size);
 	if (!table->guest_phys_addr) {
-		pr_err("cannot allocate IO space");
+		pr_err("cannot allocate MMIO space");
 		ret = -ENOMEM;
 		goto out_free;
 	}
@@ -816,17 +859,10 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 
 	region->vdev = vdev;
 	region->is_ioport = !!(bar & PCI_BASE_ADDRESS_SPACE_IO);
-	region->info = (struct vfio_region_info) {
-		.argsz = sizeof(region->info),
-		.index = nr,
-	};
 
-	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &region->info);
-	if (ret) {
-		ret = -errno;
-		vfio_dev_err(vdev, "cannot get info for BAR %zu", nr);
+	ret = vfio_pci_get_region_info(vdev, nr, &region->info);
+	if (ret)
 		return ret;
-	}
 
 	/* Ignore invalid or unimplemented regions */
 	if (!region->info.size)
@@ -871,7 +907,7 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 		return ret;
 
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) {
-		ret = vfio_pci_create_msix_table(kvm, pdev);
+		ret = vfio_pci_create_msix_table(kvm, vdev);
 		if (ret)
 			return ret;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (10 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-30 14:50   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
                   ` (19 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Not all devices have the bottom 32 bits of a 64-bit BAR in an even
numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3 are
64-bit. Remove this assumption.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index bbb8469c8d93..1bdc20038411 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -920,8 +920,10 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 
 	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
 		/* Ignore top half of 64-bit BAR */
-		if (i % 2 && is_64bit)
+		if (is_64bit) {
+			is_64bit = false;
 			continue;
+		}
 
 		ret = vfio_pci_configure_bar(kvm, vdev, i);
 		if (ret)
-- 
2.20.1



* [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (11 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-30 14:50   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions Alexandru Elisei
                   ` (18 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

To get the size of the expansion ROM, software writes 0xfffff800 to the
expansion ROM BAR in the PCI configuration space. PCI emulation executes
the optional configuration space write callback that a device can
implement before emulating this write.

VFIO doesn't support emulating expansion ROMs. However, the callback
writes the guest value to the hardware BAR, then reads it back into
the emulated BAR to make sure the write has taken effect.

After this, we return to regular PCI emulation and, because the BAR is
no longer 0, we write back to the BAR the value that the guest used to
size the ROM. As a result, the guest will read back a ROM size of
0x800 and we end up unintentionally exposing to the guest a BAR which
we don't emulate.

Let's fix this by ignoring writes to the expansion ROM BAR.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/vfio/pci.c b/vfio/pci.c
index 1bdc20038411..1f38f90c3ae9 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -472,6 +472,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_device *vdev;
 	void *base = pci_hdr;
 
+	if (offset == PCI_ROM_ADDRESS)
+		return;
+
 	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
 	vdev = container_of(pdev, struct vfio_device, pci);
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-- 
2.20.1



* [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (12 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-29 18:17   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures Alexandru Elisei
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Don't try to configure a BAR if there is no region associated with it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 1f38f90c3ae9..f86a7d9b7032 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -652,6 +652,8 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 
 	/* Initialise the BARs */
 	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
+		if ((u32)i == vdev->info.num_regions)
+			break;
 		u64 base;
 		struct vfio_region *region = &vdev->regions[i];
 
@@ -853,11 +855,12 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	u32 bar;
 	size_t map_size;
 	struct vfio_pci_device *pdev = &vdev->pci;
-	struct vfio_region *region = &vdev->regions[nr];
+	struct vfio_region *region;
 
 	if (nr >= vdev->info.num_regions)
 		return 0;
 
+	region = &vdev->regions[nr];
 	bar = pdev->hdr.bar[nr];
 
 	region->vdev = vdev;
-- 
2.20.1



* [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (13 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-30 14:51   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Don't ignore an error in the bus specific initialization function in
virtio_init; don't ignore the result of virtio_init; and don't return 0
in virtio_blk__init and virtio_scsi__init when we encounter an error.
Hopefully this will save some developer's time debugging faulty virtio
devices in a guest.

To take advantage of the cleanup function virtio_blk__exit, we have
moved appending the new device to the list before the call to
virtio_init.

To safeguard against this in the future, virtio_init has been annotated
with the compiler attribute warn_unused_result.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/kvm.h        |  1 +
 include/kvm/virtio.h     |  7 ++++---
 include/linux/compiler.h |  2 +-
 virtio/9p.c              |  9 ++++++---
 virtio/balloon.c         | 10 +++++++---
 virtio/blk.c             | 14 +++++++++-----
 virtio/console.c         | 11 ++++++++---
 virtio/core.c            |  9 +++++----
 virtio/net.c             | 32 ++++++++++++++++++--------------
 virtio/scsi.c            | 14 +++++++++-----
 10 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 7a738183d67a..c6dc6ef72d11 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -8,6 +8,7 @@
 
 #include <stdbool.h>
 #include <linux/types.h>
+#include <linux/compiler.h>
 #include <time.h>
 #include <signal.h>
 #include <sys/prctl.h>
diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 19b913732cd5..3a311f54f2a5 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -7,6 +7,7 @@
 #include <linux/virtio_pci.h>
 
 #include <linux/types.h>
+#include <linux/compiler.h>
 #include <linux/virtio_config.h>
 #include <sys/uio.h>
 
@@ -204,9 +205,9 @@ struct virtio_ops {
 	int (*reset)(struct kvm *kvm, struct virtio_device *vdev);
 };
 
-int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
-		struct virtio_ops *ops, enum virtio_trans trans,
-		int device_id, int subsys_id, int class);
+int __must_check virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
+			     struct virtio_ops *ops, enum virtio_trans trans,
+			     int device_id, int subsys_id, int class);
 int virtio_compat_add_message(const char *device, const char *config);
 const char* virtio_trans_name(enum virtio_trans trans);
 
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 898420b81aec..a662ba0a5c68 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -14,7 +14,7 @@
 #define __packed	__attribute__((packed))
 #define __iomem
 #define __force
-#define __must_check
+#define __must_check	__attribute__((warn_unused_result))
 #define unlikely
 
 #endif
diff --git a/virtio/9p.c b/virtio/9p.c
index ac70dbc31207..b78f2b3f0e09 100644
--- a/virtio/9p.c
+++ b/virtio/9p.c
@@ -1551,11 +1551,14 @@ int virtio_9p_img_name_parser(const struct option *opt, const char *arg, int uns
 int virtio_9p__init(struct kvm *kvm)
 {
 	struct p9_dev *p9dev;
+	int r;
 
 	list_for_each_entry(p9dev, &devs, list) {
-		virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
-			    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
-			    VIRTIO_ID_9P, PCI_CLASS_9P);
+		r = virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
+				VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
+				VIRTIO_ID_9P, PCI_CLASS_9P);
+		if (r < 0)
+			return r;
 	}
 
 	return 0;
diff --git a/virtio/balloon.c b/virtio/balloon.c
index 0bd16703dfee..8e8803fed607 100644
--- a/virtio/balloon.c
+++ b/virtio/balloon.c
@@ -264,6 +264,8 @@ struct virtio_ops bln_dev_virtio_ops = {
 
 int virtio_bln__init(struct kvm *kvm)
 {
+	int r;
+
 	if (!kvm->cfg.balloon)
 		return 0;
 
@@ -273,9 +275,11 @@ int virtio_bln__init(struct kvm *kvm)
 	bdev.stat_waitfd	= eventfd(0, 0);
 	memset(&bdev.config, 0, sizeof(struct virtio_balloon_config));
 
-	virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
-		    VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
+	r = virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
+			VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
+	if (r < 0)
+		return r;
 
 	if (compat_id == -1)
 		compat_id = virtio_compat_add_message("virtio-balloon", "CONFIG_VIRTIO_BALLOON");
diff --git a/virtio/blk.c b/virtio/blk.c
index f267be1563dc..4d02d101af81 100644
--- a/virtio/blk.c
+++ b/virtio/blk.c
@@ -306,6 +306,7 @@ static struct virtio_ops blk_dev_virtio_ops = {
 static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
 {
 	struct blk_dev *bdev;
+	int r;
 
 	if (!disk)
 		return -EINVAL;
@@ -323,12 +324,14 @@ static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
 		.kvm			= kvm,
 	};
 
-	virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
-		    VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
-
 	list_add_tail(&bdev->list, &bdevs);
 
+	r = virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
+			VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
+	if (r < 0)
+		return r;
+
 	disk_image__set_callback(bdev->disk, virtio_blk_complete);
 
 	if (compat_id == -1)
@@ -359,7 +362,8 @@ int virtio_blk__init(struct kvm *kvm)
 
 	return 0;
 cleanup:
-	return virtio_blk__exit(kvm);
+	virtio_blk__exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_blk__init);
 
diff --git a/virtio/console.c b/virtio/console.c
index f1be02549222..e0b98df37965 100644
--- a/virtio/console.c
+++ b/virtio/console.c
@@ -230,12 +230,17 @@ static struct virtio_ops con_dev_virtio_ops = {
 
 int virtio_console__init(struct kvm *kvm)
 {
+	int r;
+
 	if (kvm->cfg.active_console != CONSOLE_VIRTIO)
 		return 0;
 
-	virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
-		    VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
+	r = virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
+			VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
+	if (r < 0)
+		return r;
+
 	if (compat_id == -1)
 		compat_id = virtio_compat_add_message("virtio-console", "CONFIG_VIRTIO_CONSOLE");
 
diff --git a/virtio/core.c b/virtio/core.c
index e10ec362e1ea..f5b3c07fc100 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -259,6 +259,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		int device_id, int subsys_id, int class)
 {
 	void *virtio;
+	int r;
 
 	switch (trans) {
 	case VIRTIO_PCI:
@@ -272,7 +273,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		vdev->ops->init			= virtio_pci__init;
 		vdev->ops->exit			= virtio_pci__exit;
 		vdev->ops->reset		= virtio_pci__reset;
-		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
+		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
 		break;
 	case VIRTIO_MMIO:
 		virtio = calloc(sizeof(struct virtio_mmio), 1);
@@ -285,13 +286,13 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		vdev->ops->init			= virtio_mmio_init;
 		vdev->ops->exit			= virtio_mmio_exit;
 		vdev->ops->reset		= virtio_mmio_reset;
-		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
+		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
 		break;
 	default:
-		return -1;
+		r = -1;
 	};
 
-	return 0;
+	return r;
 }
 
 int virtio_compat_add_message(const char *device, const char *config)
diff --git a/virtio/net.c b/virtio/net.c
index 091406912a24..425c13ba1136 100644
--- a/virtio/net.c
+++ b/virtio/net.c
@@ -910,7 +910,7 @@ done:
 
 static int virtio_net__init_one(struct virtio_net_params *params)
 {
-	int i, err;
+	int i, r;
 	struct net_dev *ndev;
 	struct virtio_ops *ops;
 	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
@@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		return -ENOMEM;
 
 	ops = malloc(sizeof(*ops));
-	if (ops == NULL) {
-		err = -ENOMEM;
-		goto err_free_ndev;
-	}
+	if (ops == NULL)
+		return -ENOMEM;
 
 	list_add_tail(&ndev->list, &ndevs);
 
@@ -969,8 +967,10 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 				   virtio_trans_name(trans));
 	}
 
-	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
-		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+	r = virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
+			PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+	if (r < 0)
+		return r;
 
 	if (params->vhost)
 		virtio_net__vhost_init(params->kvm, ndev);
@@ -979,19 +979,17 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 		compat_id = virtio_compat_add_message("virtio-net", "CONFIG_VIRTIO_NET");
 
 	return 0;
-
-err_free_ndev:
-	free(ndev);
-	return err;
 }
 
 int virtio_net__init(struct kvm *kvm)
 {
-	int i;
+	int i, r;
 
 	for (i = 0; i < kvm->cfg.num_net_devices; i++) {
 		kvm->cfg.net_params[i].kvm = kvm;
-		virtio_net__init_one(&kvm->cfg.net_params[i]);
+		r = virtio_net__init_one(&kvm->cfg.net_params[i]);
+		if (r < 0)
+			goto cleanup;
 	}
 
 	if (kvm->cfg.num_net_devices == 0 && kvm->cfg.no_net == 0) {
@@ -1007,10 +1005,16 @@ int virtio_net__init(struct kvm *kvm)
 		str_to_mac(kvm->cfg.guest_mac, net_params.guest_mac);
 		str_to_mac(kvm->cfg.host_mac, net_params.host_mac);
 
-		virtio_net__init_one(&net_params);
+		r = virtio_net__init_one(&net_params);
+		if (r < 0)
+			goto cleanup;
 	}
 
 	return 0;
+
+cleanup:
+	virtio_net__exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_net__init);
 
diff --git a/virtio/scsi.c b/virtio/scsi.c
index 1ec78fe0945a..16a86cb7e0e6 100644
--- a/virtio/scsi.c
+++ b/virtio/scsi.c
@@ -234,6 +234,7 @@ static void virtio_scsi_vhost_init(struct kvm *kvm, struct scsi_dev *sdev)
 static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
 {
 	struct scsi_dev *sdev;
+	int r;
 
 	if (!disk)
 		return -EINVAL;
@@ -260,12 +261,14 @@ static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
 	strlcpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
 	sdev->target.vhost_tpgt = strtol(disk->tpgt, NULL, 0);
 
-	virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
-		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
-		    VIRTIO_ID_SCSI, PCI_CLASS_BLK);
-
 	list_add_tail(&sdev->list, &sdevs);
 
+	r = virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
+			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
+			VIRTIO_ID_SCSI, PCI_CLASS_BLK);
+	if (r < 0)
+		return r;
+
 	virtio_scsi_vhost_init(kvm, sdev);
 
 	if (compat_id == -1)
@@ -302,7 +305,8 @@ int virtio_scsi_init(struct kvm *kvm)
 
 	return 0;
 cleanup:
-	return virtio_scsi_exit(kvm);
+	virtio_scsi_exit(kvm);
+	return r;
 }
 virtio_dev_init(virtio_scsi_init);
 
-- 
2.20.1



* [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (14 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-30 14:51   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors Alexandru Elisei
                   ` (15 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

An error returned by device__register, kvm__register_mmio or
ioport__register means that the device will not be emulated properly.
Annotate the functions with __must_check, so we get a compiler warning
when such an error is ignored.

And fix several instances where the caller returns 0 even if the
function failed.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/ioport.c          |  3 +-
 hw/i8042.c            | 12 ++++++--
 hw/vesa.c             |  4 ++-
 include/kvm/devices.h |  3 +-
 include/kvm/ioport.h  |  6 ++--
 include/kvm/kvm.h     |  6 ++--
 ioport.c              | 23 ++++++++-------
 mips/kvm.c            |  3 +-
 powerpc/ioport.c      |  3 +-
 virtio/mmio.c         | 13 +++++++--
 x86/ioport.c          | 66 ++++++++++++++++++++++++++++++++-----------
 11 files changed, 100 insertions(+), 42 deletions(-)

diff --git a/arm/ioport.c b/arm/ioport.c
index bdd30b6fe812..2f0feb9ab69f 100644
--- a/arm/ioport.c
+++ b/arm/ioport.c
@@ -1,8 +1,9 @@
 #include "kvm/ioport.h"
 #include "kvm/irq.h"
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	return 0;
 }
 
 void ioport__map_irq(u8 *irq)
diff --git a/hw/i8042.c b/hw/i8042.c
index 2d8c96e9c7e6..37a99a2dc6b8 100644
--- a/hw/i8042.c
+++ b/hw/i8042.c
@@ -349,10 +349,18 @@ static struct ioport_operations kbd_ops = {
 
 int kbd__init(struct kvm *kvm)
 {
+	int r;
+
 	kbd_reset();
 	state.kvm = kvm;
-	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
-	ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
+	r = ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
+	if (r < 0)
+		return r;
+	r = ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
+	if (r < 0) {
+		ioport__unregister(kvm, I8042_DATA_REG);
+		return r;
+	}
 
 	return 0;
 }
diff --git a/hw/vesa.c b/hw/vesa.c
index d8d91aa9c873..b92cc990b730 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -70,7 +70,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_base_addr			= (u16)r;
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
-	device__register(&vesa_device);
+	r = device__register(&vesa_device);
+	if (r < 0)
+		return ERR_PTR(r);
 
 	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
 	if (mem == MAP_FAILED)
diff --git a/include/kvm/devices.h b/include/kvm/devices.h
index 405f19521977..e445db6f56b1 100644
--- a/include/kvm/devices.h
+++ b/include/kvm/devices.h
@@ -3,6 +3,7 @@
 
 #include <linux/rbtree.h>
 #include <linux/types.h>
+#include <linux/compiler.h>
 
 enum device_bus_type {
 	DEVICE_BUS_PCI,
@@ -18,7 +19,7 @@ struct device_header {
 	struct rb_node		node;
 };
 
-int device__register(struct device_header *dev);
+int __must_check device__register(struct device_header *dev);
 void device__unregister(struct device_header *dev);
 struct device_header *device__find_dev(enum device_bus_type bus_type,
 				       u8 dev_num);
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index 8c86b7151f25..62a719327e3f 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -33,11 +33,11 @@ struct ioport_operations {
 							    enum irq_type));
 };
 
-void ioport__setup_arch(struct kvm *kvm);
+int ioport__setup_arch(struct kvm *kvm);
 void ioport__map_irq(u8 *irq);
 
-int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
-			int count, void *param);
+int __must_check ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
+				  int count, void *param);
 int ioport__unregister(struct kvm *kvm, u16 port);
 int ioport__init(struct kvm *kvm);
 int ioport__exit(struct kvm *kvm);
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index c6dc6ef72d11..50119a8672eb 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -128,9 +128,9 @@ static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
 				 KVM_MEM_TYPE_RESERVED);
 }
 
-int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
-		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
-			void *ptr);
+int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
+				    void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
+				    void *ptr);
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
 void kvm__reboot(struct kvm *kvm);
 void kvm__pause(struct kvm *kvm);
diff --git a/ioport.c b/ioport.c
index a72e4035881a..d224819c6e43 100644
--- a/ioport.c
+++ b/ioport.c
@@ -91,16 +91,21 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	};
 
 	r = ioport_insert(&ioport_tree, entry);
-	if (r < 0) {
-		free(entry);
-		br_write_unlock(kvm);
-		return r;
-	}
-
-	device__register(&entry->dev_hdr);
+	if (r < 0)
+		goto out_free;
+	r = device__register(&entry->dev_hdr);
+	if (r < 0)
+		goto out_erase;
 	br_write_unlock(kvm);
 
 	return port;
+
+out_erase:
+	rb_int_erase(&ioport_tree, &entry->node);
+out_free:
+	free(entry);
+	br_write_unlock(kvm);
+	return r;
 }
 
 int ioport__unregister(struct kvm *kvm, u16 port)
@@ -196,9 +201,7 @@ out:
 
 int ioport__init(struct kvm *kvm)
 {
-	ioport__setup_arch(kvm);
-
-	return 0;
+	return ioport__setup_arch(kvm);
 }
 dev_base_init(ioport__init);
 
diff --git a/mips/kvm.c b/mips/kvm.c
index 211770da0d85..26355930d3b6 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -100,8 +100,9 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 		die_perror("KVM_IRQ_LINE ioctl");
 }
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	return 0;
 }
 
 bool kvm__arch_cpu_supports_vm(void)
diff --git a/powerpc/ioport.c b/powerpc/ioport.c
index 58dc625c54fe..0c188b61a51a 100644
--- a/powerpc/ioport.c
+++ b/powerpc/ioport.c
@@ -12,9 +12,10 @@
 
 #include <stdlib.h>
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
 	/* PPC has no legacy ioports to set up */
+	return 0;
 }
 
 void ioport__map_irq(u8 *irq)
diff --git a/virtio/mmio.c b/virtio/mmio.c
index 03cecc366292..5537c39367d6 100644
--- a/virtio/mmio.c
+++ b/virtio/mmio.c
@@ -292,13 +292,16 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
 	struct virtio_mmio *vmmio = vdev->virtio;
+	int r;
 
 	vmmio->addr	= virtio_mmio_get_io_space_block(VIRTIO_MMIO_IO_SIZE);
 	vmmio->kvm	= kvm;
 	vmmio->dev	= dev;
 
-	kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
-			   false, virtio_mmio_mmio_callback, vdev);
+	r = kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
+			       false, virtio_mmio_mmio_callback, vdev);
+	if (r < 0)
+		return r;
 
 	vmmio->hdr = (struct virtio_mmio_hdr) {
 		.magic		= {'v', 'i', 'r', 't'},
@@ -313,7 +316,11 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.data		= generate_virtio_mmio_fdt_node,
 	};
 
-	device__register(&vmmio->dev_hdr);
+	r = device__register(&vmmio->dev_hdr);
+	if (r < 0) {
+		kvm__deregister_mmio(kvm, vmmio->addr);
+		return r;
+	}
 
 	/*
 	 * Instantiate guest virtio-mmio devices using kernel command line
diff --git a/x86/ioport.c b/x86/ioport.c
index 8572c758ed4f..7ad7b8f3f497 100644
--- a/x86/ioport.c
+++ b/x86/ioport.c
@@ -69,50 +69,84 @@ void ioport__map_irq(u8 *irq)
 {
 }
 
-void ioport__setup_arch(struct kvm *kvm)
+int ioport__setup_arch(struct kvm *kvm)
 {
+	int r;
+
 	/* Legacy ioport setup */
 
 	/* 0000 - 001F - DMA1 controller */
-	ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
+	r = ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x0020 - 0x003F - 8259A PIC 1 */
-	ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0040-005F - PIT - PROGRAMMABLE INTERVAL TIMER (8253, 8254) */
-	ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
+	r = ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0092 - PS/2 system control port A */
-	ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
+	r = ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x00A0 - 0x00AF - 8259A PIC 2 */
-	ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* 00C0 - 001F - DMA2 controller */
-	ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
+	r = ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 00E0-00EF are 'motherboard specific' so we use them for our
 	   internal debugging purposes.  */
-	ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
+	r = ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 00ED - DUMMY PORT FOR DELAY??? */
-	ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
+	r = ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0x00F0 - 0x00FF - Math co-processor */
-	ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0278-027A - PARALLEL PRINTER PORT (usually LPT1, sometimes LPT2) */
-	ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
+	r = ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 0378-037A - PARALLEL PRINTER PORT (usually LPT2, sometimes LPT3) */
-	ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
+	r = ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
+	if (r < 0)
+		return r;
 
 	/* PORT 03D4-03D5 - COLOR VIDEO - CRT CONTROL REGISTERS */
-	ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
-	ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
+	r = ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
+	r = ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
-	ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
+	r = ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
+	if (r < 0)
+		return r;
 
 	/* 0510 - QEMU BIOS configuration register */
-	ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
+	r = ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
+	if (r < 0)
+		return r;
+
+	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (15 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-01-30 14:52   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0 Alexandru Elisei
                   ` (14 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Failing an mmap call or failing to create a memslot means that device
emulation will not work, so don't ignore these errors.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index b92cc990b730..a665736a76d7 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -76,9 +76,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
 	if (mem == MAP_FAILED)
-		ERR_PTR(-errno);
+		return ERR_PTR(-errno);
 
-	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	if (r < 0)
+		return ERR_PTR(r);
 
 	vesafb = (struct framebuffer) {
 		.width			= VESA_WIDTH,
-- 
2.20.1



* [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (16 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-03 12:20   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio Alexandru Elisei
                   ` (13 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

BAR 0 is an I/O BAR and is registered as an ioport region. Let's set its
size, so a guest can actually use it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/vesa.c b/hw/vesa.c
index a665736a76d7..e988c0425946 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -70,6 +70,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_base_addr			= (u16)r;
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
+	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
 	r = device__register(&vesa_device);
 	if (r < 0)
 		return ERR_PTR(r);
-- 
2.20.1



* [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (17 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0 Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-03 12:23   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

kvmtool uses brlock for protecting accesses to the ioport and mmio
red-black trees. brlock allows concurrent reads, but only one writer,
which is assumed not to be a VCPU thread. This is done by issuing a
compiler barrier on read and pausing the entire virtual machine on
writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread
read/write lock.

When we implement reassignable BARs, the mmio or ioport mapping will be
done as a result of a VCPU mmio access. When brlock is a read/write
lock, this means that we will try to acquire the write lock with the
read lock already held by the same VCPU and we will deadlock. When it's
not, a VCPU will have to call kvm__pause, which means the virtual
machine will stay paused forever.

Let's avoid all this by using separate pthread_rwlock_t locks for the
mmio and the ioport red-black trees and carefully choosing our read
critical region such that modification as a result of a guest mmio
access doesn't deadlock.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 ioport.c | 20 +++++++++++---------
 mmio.c   | 26 +++++++++++++++++---------
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/ioport.c b/ioport.c
index d224819c6e43..c044a80dd763 100644
--- a/ioport.c
+++ b/ioport.c
@@ -2,9 +2,9 @@
 
 #include "kvm/kvm.h"
 #include "kvm/util.h"
-#include "kvm/brlock.h"
 #include "kvm/rbtree-interval.h"
 #include "kvm/mutex.h"
+#include "kvm/rwsem.h"
 
 #include <linux/kvm.h>	/* for KVM_EXIT_* */
 #include <linux/types.h>
@@ -16,6 +16,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
+static DECLARE_RWSEM(ioport_lock);
+
 static struct rb_root		ioport_tree = RB_ROOT;
 
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
@@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
+	down_write(&ioport_lock);
 
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
@@ -96,7 +98,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	r = device__register(&entry->dev_hdr);
 	if (r < 0)
 		goto out_erase;
-	br_write_unlock(kvm);
+	up_write(&ioport_lock);
 
 	return port;
 
@@ -104,7 +106,7 @@ out_erase:
 	rb_int_erase(&ioport_tree, &entry->node);
 out_free:
 	free(entry);
-	br_write_unlock(kvm);
+	up_write(&ioport_lock);
 	return r;
 }
 
@@ -113,7 +115,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
+	down_write(&ioport_lock);
 
 	r = -ENOENT;
 	entry = ioport_search(&ioport_tree, port);
@@ -128,7 +130,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
 	r = 0;
 
 done:
-	br_write_unlock(kvm);
+	up_write(&ioport_lock);
 
 	return r;
 }
@@ -171,8 +173,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	void *ptr = data;
 	struct kvm *kvm = vcpu->kvm;
 
-	br_read_lock(kvm);
+	down_read(&ioport_lock);
 	entry = ioport_search(&ioport_tree, port);
+	up_read(&ioport_lock);
+
 	if (!entry)
 		goto out;
 
@@ -188,8 +192,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	}
 
 out:
-	br_read_unlock(kvm);
-
 	if (ret)
 		return true;
 
diff --git a/mmio.c b/mmio.c
index 61e1d47a587d..4e0ff830c738 100644
--- a/mmio.c
+++ b/mmio.c
@@ -1,7 +1,7 @@
 #include "kvm/kvm.h"
 #include "kvm/kvm-cpu.h"
 #include "kvm/rbtree-interval.h"
-#include "kvm/brlock.h"
+#include "kvm/rwsem.h"
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -15,6 +15,8 @@
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
+static DECLARE_RWSEM(mmio_lock);
+
 struct mmio_mapping {
 	struct rb_int_node	node;
 	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
@@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
 
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
-			void *ptr)
+		       void *ptr)
 {
 	struct mmio_mapping *mmio;
 	struct kvm_coalesced_mmio_zone zone;
@@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 			return -errno;
 		}
 	}
-	br_write_lock(kvm);
+	down_write(&mmio_lock);
 	ret = mmio_insert(&mmio_tree, mmio);
-	br_write_unlock(kvm);
+	up_write(&mmio_lock);
 
 	return ret;
 }
@@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 	struct mmio_mapping *mmio;
 	struct kvm_coalesced_mmio_zone zone;
 
-	br_write_lock(kvm);
+	down_write(&mmio_lock);
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
 	if (mmio == NULL) {
-		br_write_unlock(kvm);
+		up_write(&mmio_lock);
 		return false;
 	}
 
@@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
 
 	rb_int_erase(&mmio_tree, &mmio->node);
-	br_write_unlock(kvm);
+	up_write(&mmio_lock);
 
 	free(mmio);
 	return true;
@@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 {
 	struct mmio_mapping *mmio;
 
-	br_read_lock(vcpu->kvm);
+	/*
+	 * The callback might call kvm__register_mmio which takes a write lock,
+	 * so avoid deadlocks by protecting only the node search with a reader
+	 * lock. Note that there is still a small time window for a node to be
+	 * deleted by another vcpu before mmio_fn gets called.
+	 */
+	down_read(&mmio_lock);
 	mmio = mmio_search(&mmio_tree, phys_addr, len);
+	up_read(&mmio_lock);
 
 	if (mmio)
 		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
@@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 				to_direction(is_write),
 				(unsigned long long)phys_addr, len);
 	}
-	br_read_unlock(vcpu->kvm);
 
 	return true;
 }
-- 
2.20.1



* [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (18 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-05 17:00   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs Alexandru Elisei
                   ` (11 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

When we add support for reassignable BARs, we will frequently need to
check the BAR type, the address written to it and whether access to
memory or I/O space is enabled. Add helpers for these checks.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 pci.c             |  2 +-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index ccb155e3e8fe..235cd82fff3c 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -5,6 +5,7 @@
 #include <linux/kvm.h>
 #include <linux/pci_regs.h>
 #include <endian.h>
+#include <stdbool.h>
 
 #include "kvm/devices.h"
 #include "kvm/msi.h"
@@ -161,4 +162,51 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+static inline bool __pci__memory_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_MEMORY;
+}
+
+static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__memory_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__io_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_IO;
+}
+
+static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__io_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__bar_is_io(u32 bar)
+{
+	return bar & PCI_BASE_ADDRESS_SPACE_IO;
+}
+
+static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
+}
+
+static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return !pci__bar_is_io(pci_hdr, bar_num);
+}
+
+static inline u32 __pci__bar_address(u32 bar)
+{
+	if (__pci__bar_is_io(bar))
+		return bar & PCI_BASE_ADDRESS_IO_MASK;
+	return bar & PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_address(pci_hdr->bar[bar_num]);
+}
+
 #endif /* KVM__PCI_H */
diff --git a/pci.c b/pci.c
index b6892d974c08..4f7b863298f6 100644
--- a/pci.c
+++ b/pci.c
@@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	 * size, it will write the address back.
 	 */
 	if (bar < 6) {
-		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
+		if (pci__bar_is_io(pci_hdr, bar))
 			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
 		else
 			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
-- 
2.20.1



* [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (19 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-05 17:01   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

The struct virtio_pci fields port_addr, mmio_addr and msix_io_block
represent the same addresses that are written in the corresponding BARs.
Remove this duplication of information and always use the address from the
BAR. This will make our life a lot easier when we add support for
reassignable BARs, because we won't have to update the fields on each BAR
change.

No functional changes.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/virtio-pci.h |  3 --
 virtio/pci.c             | 86 ++++++++++++++++++++++++++--------------
 2 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index 278a25950d8b..959b4b81c871 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -24,8 +24,6 @@ struct virtio_pci {
 	void			*dev;
 	struct kvm		*kvm;
 
-	u16			port_addr;
-	u32			mmio_addr;
 	u8			status;
 	u8			isr;
 	u32			features;
@@ -43,7 +41,6 @@ struct virtio_pci {
 	u32			config_gsi;
 	u32			vq_vector[VIRTIO_PCI_MAX_VQ];
 	u32			gsis[VIRTIO_PCI_MAX_VQ];
-	u32			msix_io_block;
 	u64			msix_pba;
 	struct msix_table	msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];
 
diff --git a/virtio/pci.c b/virtio/pci.c
index 6723a1f3a84d..c4822514856c 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -13,6 +13,21 @@
 #include <linux/byteorder.h>
 #include <string.h>
 
+static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 0);
+}
+
+static u32 virtio_pci__mmio_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 1);
+}
+
+static u32 virtio_pci__msix_io_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 2);
+}
+
 static void virtio_pci__ioevent_callback(struct kvm *kvm, void *param)
 {
 	struct virtio_pci_ioevent_param *ioeventfd = param;
@@ -25,6 +40,8 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 {
 	struct ioevent ioevent;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 	int r, flags = 0;
 	int fd;
 
@@ -48,7 +65,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		flags |= IOEVENTFD_FLAG_USER_POLL;
 
 	/* ioport */
-	ioevent.io_addr	= vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= fd = eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags | IOEVENTFD_FLAG_PIO);
@@ -56,7 +73,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		return r;
 
 	/* mmio */
-	ioevent.io_addr	= vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags);
@@ -68,7 +85,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 	return 0;
 
 free_ioport_evt:
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	return r;
 }
 
@@ -76,9 +93,11 @@ static void virtio_pci_exit_vq(struct kvm *kvm, struct virtio_device *vdev,
 			       int vq)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 
-	ioeventfd__del_event(vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	virtio_exit_vq(kvm, vdev, vpci->dev, vq);
 }
 
@@ -163,10 +182,12 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 	unsigned long offset;
 	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
+	u16 port_addr;
 
 	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
+	port_addr = virtio_pci__port_addr(vpci);
+	offset = port - port_addr;
 
 	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
 }
@@ -323,10 +344,12 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 	unsigned long offset;
 	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
+	u16 port_addr;
 
 	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
+	port_addr = virtio_pci__port_addr(vpci);
+	offset = port - port_addr;
 
 	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
 }
@@ -343,17 +366,18 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
+	u32 msix_io_addr = virtio_pci__msix_io_addr(vpci);
 	int vecnum;
 	size_t offset;
 
-	if (addr > vpci->msix_io_block + PCI_IO_SIZE) {
+	if (addr > msix_io_addr + PCI_IO_SIZE) {
 		if (is_write)
 			return;
 		table  = (struct msix_table *)&vpci->msix_pba;
-		offset = addr - (vpci->msix_io_block + PCI_IO_SIZE);
+		offset = addr - (msix_io_addr + PCI_IO_SIZE);
 	} else {
 		table  = vpci->msix_table;
-		offset = addr - vpci->msix_io_block;
+		offset = addr - msix_io_addr;
 	}
 	vecnum = offset / sizeof(struct msix_table);
 	offset = offset % sizeof(struct msix_table);
@@ -442,19 +466,20 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
 
 	if (!is_write)
-		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
-				    data, len);
+		virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
 	else
-		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
-				     data, len);
+		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr, msix_io_block;
+	u16 port_addr;
 	int r;
 
 	vpci->kvm = kvm;
@@ -462,20 +487,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
+	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
+			     vdev);
 	if (r < 0)
 		return r;
-	vpci->port_addr = (u16)r;
+	port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
+	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
+	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
+	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
@@ -491,11 +517,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.class[2]		= (class >> 16) & 0xff,
 		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
 		.subsys_id		= cpu_to_le16(subsys_id),
-		.bar[0]			= cpu_to_le32(vpci->port_addr
+		.bar[0]			= cpu_to_le32(port_addr
 							| PCI_BASE_ADDRESS_SPACE_IO),
-		.bar[1]			= cpu_to_le32(vpci->mmio_addr
+		.bar[1]			= cpu_to_le32(mmio_addr
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
-		.bar[2]			= cpu_to_le32(vpci->msix_io_block
+		.bar[2]			= cpu_to_le32(msix_io_block
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
@@ -542,11 +568,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	return 0;
 
 free_msix_mmio:
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
+	kvm__deregister_mmio(kvm, msix_io_block);
 free_mmio:
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
+	kvm__deregister_mmio(kvm, mmio_addr);
 free_ioport:
-	ioport__unregister(kvm, vpci->port_addr);
+	ioport__unregister(kvm, port_addr);
 	return r;
 }
 
@@ -566,9 +592,9 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev)
 	struct virtio_pci *vpci = vdev->virtio;
 
 	virtio_pci__reset(kvm, vdev);
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
-	ioport__unregister(kvm, vpci->port_addr);
+	kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci));
+	kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci));
+	ioport__unregister(kvm, virtio_pci__port_addr(vpci));
 
 	return 0;
 }
-- 
2.20.1



* [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (20 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-05 17:01   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
                   ` (9 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

When we want to map a device region into the guest address space, first
we perform an mmap on the device fd. The resulting VMA is a mapping
between host userspace addresses and physical addresses associated with
the device. Next, we create a memslot, which populates the stage 2 table
with the mappings between guest physical addresses and the device
physical addresses.

However, when we want to unmap the device from the guest address space,
we only call munmap, which destroys the VMA and the stage 2 mappings,
but doesn't destroy the memslot and kvmtool's internal mem_bank
structure associated with the memslot.

This has been perfectly fine so far, because we only unmap a device
region when we exit kvmtool. This will change when we add support for
reassignable BARs, and we will have to unmap vfio regions as the guest
kernel writes new addresses in the BARs. This can lead to two possible
problems:

- We refuse to create a valid BAR mapping because of a stale mem_bank
  structure which belonged to a previously unmapped region.

- It is possible that the mmap in vfio_map_region returns the same
  address that was used to create a memslot, but was unmapped by
  vfio_unmap_region. Guest accesses to the device memory will fault
  because the stage 2 mappings are missing, and this can lead to
  performance degradation.

Let's do the right thing and destroy the memslot and the mem_bank struct
associated with it when we unmap a vfio region. Set host_addr to NULL
after the munmap call so we won't try to unmap an address which is
currently used if vfio_unmap_region gets called twice.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/kvm.h |  2 ++
 kvm.c             | 65 ++++++++++++++++++++++++++++++++++++++++++++---
 vfio/core.c       |  6 +++++
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 50119a8672eb..c7e57b890cdd 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -56,6 +56,7 @@ struct kvm_mem_bank {
 	void			*host_addr;
 	u64			size;
 	enum kvm_mem_type	type;
+	u32			slot;
 };
 
 struct kvm {
@@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
 void kvm__irq_trigger(struct kvm *kvm, int irq);
 bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
 bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
+int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
 		      enum kvm_mem_type type);
 static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
diff --git a/kvm.c b/kvm.c
index 57c4ff98ec4c..afcf55c7bf45 100644
--- a/kvm.c
+++ b/kvm.c
@@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
 }
 core_exit(kvm__exit);
 
+int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
+		     void *userspace_addr)
+{
+	struct kvm_userspace_memory_region mem;
+	struct kvm_mem_bank *bank;
+	int ret;
+
+	list_for_each_entry(bank, &kvm->mem_banks, list)
+		if (bank->guest_phys_addr == guest_phys &&
+		    bank->size == size && bank->host_addr == userspace_addr)
+			break;
+
+	if (&bank->list == &kvm->mem_banks) {
+		pr_err("Region [%llx-%llx] not found", guest_phys,
+		       guest_phys + size - 1);
+		return -EINVAL;
+	}
+
+	if (bank->type == KVM_MEM_TYPE_RESERVED) {
+		pr_err("Cannot delete reserved region [%llx-%llx]",
+		       guest_phys, guest_phys + size - 1);
+		return -EINVAL;
+	}
+
+	mem = (struct kvm_userspace_memory_region) {
+		.slot			= bank->slot,
+		.guest_phys_addr	= guest_phys,
+		.memory_size		= 0,
+		.userspace_addr		= (unsigned long)userspace_addr,
+	};
+
+	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+	if (ret < 0)
+		return -errno;
+
+	list_del(&bank->list);
+	free(bank);
+	kvm->mem_slots--;
+
+	return 0;
+}
+
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		      void *userspace_addr, enum kvm_mem_type type)
 {
 	struct kvm_userspace_memory_region mem;
 	struct kvm_mem_bank *merged = NULL;
 	struct kvm_mem_bank *bank;
+	struct list_head *prev_entry;
+	u32 slot;
 	int ret;
 
-	/* Check for overlap */
+	/* Check for overlap and find first empty slot. */
+	slot = 0;
+	prev_entry = &kvm->mem_banks;
 	list_for_each_entry(bank, &kvm->mem_banks, list) {
 		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
 		u64 end = guest_phys + size - 1;
-		if (guest_phys > bank_end || end < bank->guest_phys_addr)
+		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
+			/*
+			 * Keep the banks sorted ascending by slot, so it's
+			 * easier for us to find a free slot.
+			 */
+			if (bank->slot == slot) {
+				slot++;
+				prev_entry = &bank->list;
+			}
 			continue;
+		}
 
 		/* Merge overlapping reserved regions */
 		if (bank->type == KVM_MEM_TYPE_RESERVED &&
@@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 	bank->host_addr			= userspace_addr;
 	bank->size			= size;
 	bank->type			= type;
+	bank->slot			= slot;
 
 	if (type != KVM_MEM_TYPE_RESERVED) {
 		mem = (struct kvm_userspace_memory_region) {
-			.slot			= kvm->mem_slots++,
+			.slot			= slot,
 			.guest_phys_addr	= guest_phys,
 			.memory_size		= size,
 			.userspace_addr		= (unsigned long)userspace_addr,
@@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 			return -errno;
 	}
 
-	list_add(&bank->list, &kvm->mem_banks);
+	list_add(&bank->list, prev_entry);
+	kvm->mem_slots++;
 
 	return 0;
 }
diff --git a/vfio/core.c b/vfio/core.c
index 0ed1e6fee6bf..73fdac8be675 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 
 void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
 {
+	u64 map_size;
+
 	if (region->host_addr) {
+		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		munmap(region->host_addr, region->info.size);
+		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
+				 region->host_addr);
+		region->host_addr = NULL;
 	} else if (region->is_ioport) {
 		ioport__unregister(kvm, region->port_base);
 	} else {
-- 
2.20.1



* [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (21 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-05 18:34   ` Andre Przywara
  2020-01-23 13:47 ` [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice Alexandru Elisei
                   ` (8 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Let's be consistent and reserve ioports when we are configuring the BAR,
not when we map it, just like we do with MMIO regions.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++------
 vfio/pci.c  | 4 +++-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/vfio/core.c b/vfio/core.c
index 73fdac8be675..6b9b58ea8d2f 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,14 +202,11 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = pci_get_io_port_block(region->info.size);
-
-		port = ioport__register(kvm, port, &vfio_ioport_ops,
-					region->info.size, region);
+		int port = ioport__register(kvm, region->port_base,
+					   &vfio_ioport_ops, region->info.size,
+					   region);
 		if (port < 0)
 			return port;
-
-		region->port_base = port;
 		return 0;
 	}
 
diff --git a/vfio/pci.c b/vfio/pci.c
index f86a7d9b7032..abde16dc8693 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -885,7 +885,9 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		}
 	}
 
-	if (!region->is_ioport) {
+	if (region->is_ioport) {
+		region->port_base = pci_get_io_port_block(region->info.size);
+	} else {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (22 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
@ 2020-01-23 13:47 ` Alexandru Elisei
  2020-02-05 18:35   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
                   ` (7 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:47 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

After writing to the device fd as part of the PCI configuration space
emulation, we read back from the device to make sure that the write
finished. The value is read back into the PCI configuration space and,
afterwards, the same value is written again by the common PCI emulation
code. Let's read from the device fd into a temporary variable instead,
to prevent this double write.

The double write is harmless in itself, but when we implement
reassignable BARs we will need to keep track of the old BAR value, and
the VFIO code would overwrite it.
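
To illustrate in isolation why the read-back must go into a scratch
variable, here is a minimal sketch with hypothetical helper names;
device_readback() stands in for the pread() from the device fd:

```c
#include <assert.h>

/* Hypothetical stand-in for the pread() from the VFIO device fd. */
static unsigned int device_readback(void)
{
	return 0xfebc0000;
}

/* Old behaviour: the value read back from the device lands on top of
 * the shadow configuration space, clobbering the value we may still
 * need (e.g. the old BAR address). */
static unsigned int readback_in_place(unsigned int shadow)
{
	shadow = device_readback();
	return shadow;
}

/* New behaviour: the read goes into a temporary, the shadow survives. */
static unsigned int readback_scratch(unsigned int shadow)
{
	unsigned int tmp = device_readback();

	(void)tmp;	/* only read to make sure the write completed */
	return shadow;
}
```

With the in-place variant the shadow value is lost; with the scratch
variant it survives for a later comparison against the new BAR value.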

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index abde16dc8693..8a775a4a4a54 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
 	struct vfio_device *vdev;
-	void *base = pci_hdr;
+	u32 tmp;
 
 	if (offset == PCI_ROM_ADDRESS)
 		return;
@@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
 		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
 
-	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
+	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
 		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
 			      sz, offset);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (23 preceding siblings ...)
  2020-01-23 13:47 ` [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-06 18:21   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
                   ` (6 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Implement callbacks for activating and deactivating emulation for a BAR
region. This is in preparation for allowing a guest operating system to
enable and disable access to I/O or memory space, or to reassign the
BARs.

The emulated vesa device has been refactored in the process and the static
variables were removed in order to make using the callbacks less painful.
The framebuffer isn't designed to allow stopping and restarting at
arbitrary points in the guest execution. Furthermore, on x86, the kernel
will not change the BAR addresses, which on bare metal are programmed by
the firmware, so take the easy way out and refuse to deactivate emulation
for the BAR regions.
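
The registration scheme can be sketched in miniature as follows; the
struct fields and function names are cut down from the real kvmtool
ones, so treat this as an illustration of the callback flow rather than
the actual implementation:

```c
#include <assert.h>
#include <stddef.h>

struct kvm;			/* opaque in this sketch */
struct pci_device_header;

typedef int (*bar_activate_fn_t)(struct kvm *kvm,
				 struct pci_device_header *pci_hdr,
				 int bar_num, void *data);

struct pci_device_header {
	unsigned int		bar_size[6];
	bar_activate_fn_t	bar_activate_fn;
	void			*data;
};

/* A BAR is implemented iff it has a non-zero size. */
static int bar_is_implemented(struct pci_device_header *hdr, int bar_num)
{
	return bar_num < 6 && hdr->bar_size[bar_num];
}

/* Record the callback, then run it once for every implemented BAR. */
static int register_bar_regions(struct kvm *kvm,
				struct pci_device_header *hdr,
				bar_activate_fn_t activate, void *data)
{
	int i, r;

	hdr->bar_activate_fn = activate;
	hdr->data = data;

	for (i = 0; i < 6; i++) {
		if (!bar_is_implemented(hdr, i))
			continue;
		r = activate(kvm, hdr, i, data);
		if (r < 0)
			return r;
	}
	return 0;
}

static int count_activations(struct kvm *kvm,
			     struct pci_device_header *hdr,
			     int bar_num, void *data)
{
	(void)kvm; (void)hdr; (void)bar_num;
	(*(int *)data)++;
	return 0;
}

/* Two implemented BARs -> the callback fires twice. */
static int demo(void)
{
	struct pci_device_header hdr = {
		.bar_size = { 0x100, 0x1000 },
	};
	int count = 0;

	if (register_bar_regions(NULL, &hdr, count_activations, &count) < 0)
		return -1;
	return count;
}
```

The `data` pointer plays the same role as in the patch: it lets each
device hand its private state back to itself when the PCI core invokes
the callback.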

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c         | 120 ++++++++++++++++++++++++++++++++--------------
 include/kvm/pci.h |  19 +++++++-
 pci.c             |  44 +++++++++++++++++
 vfio/pci.c        | 100 +++++++++++++++++++++++++++++++-------
 virtio/pci.c      |  90 ++++++++++++++++++++++++----------
 5 files changed, 294 insertions(+), 79 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index e988c0425946..74ebebbefa6b 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -18,6 +18,12 @@
 #include <inttypes.h>
 #include <unistd.h>
 
+struct vesa_dev {
+	struct pci_device_header	pci_hdr;
+	struct device_header		dev_hdr;
+	struct framebuffer		fb;
+};
+
 static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
 {
 	return true;
@@ -33,29 +39,52 @@ static struct ioport_operations vesa_io_ops = {
 	.io_out			= vesa_pci_io_out,
 };
 
-static struct pci_device_header vesa_pci_device = {
-	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
-	.header_type		= PCI_HEADER_TYPE_NORMAL,
-	.revision_id		= 0,
-	.class[2]		= 0x03,
-	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
-	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
-	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
-	.bar_size[1]		= VESA_MEM_SIZE,
-};
+static int vesa__bar_activate(struct kvm *kvm,
+			      struct pci_device_header *pci_hdr,
+			      int bar_num, void *data)
+{
+	struct vesa_dev *vdev = data;
+	u32 bar_addr, bar_size;
+	char *mem;
+	int r;
 
-static struct device_header vesa_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &vesa_pci_device,
-};
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	bar_size = pci_hdr->bar_size[bar_num];
 
-static struct framebuffer vesafb;
+	switch (bar_num) {
+	case 0:
+		r = ioport__register(kvm, bar_addr, &vesa_io_ops, bar_size,
+				     NULL);
+		break;
+	case 1:
+		mem = mmap(NULL, bar_size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
+		if (mem == MAP_FAILED) {
+			r = -errno;
+			break;
+		}
+		r = kvm__register_dev_mem(kvm, bar_addr, bar_size, mem);
+		if (r < 0)
+			break;
+		vdev->fb.mem = mem;
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
+static int vesa__bar_deactivate(struct kvm *kvm,
+				struct pci_device_header *pci_hdr,
+				int bar_num, void *data)
+{
+	return -EINVAL;
+}
 
 struct framebuffer *vesa__init(struct kvm *kvm)
 {
-	u16 vesa_base_addr;
-	char *mem;
+	struct vesa_dev *vdev;
+	u16 port_addr;
 	int r;
 
 	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
@@ -63,34 +92,51 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
-	if (r < 0)
-		return ERR_PTR(r);
 
-	vesa_base_addr			= (u16)r;
-	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
-	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
-	r = device__register(&vesa_device);
-	if (r < 0)
-		return ERR_PTR(r);
+	vdev = calloc(1, sizeof(*vdev));
+	if (vdev == NULL)
+		return ERR_PTR(-ENOMEM);
 
-	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
-	if (mem == MAP_FAILED)
-		return ERR_PTR(-errno);
+	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
 
-	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
-	if (r < 0)
-		return ERR_PTR(r);
+	vdev->pci_hdr = (struct pci_device_header) {
+		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+		.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
+		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
+		.header_type		= PCI_HEADER_TYPE_NORMAL,
+		.revision_id		= 0,
+		.class[2]		= 0x03,
+		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
+		.bar_size[0]		= PCI_IO_SIZE,
+		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
+		.bar_size[1]		= VESA_MEM_SIZE,
+	};
 
-	vesafb = (struct framebuffer) {
+	vdev->fb = (struct framebuffer) {
 		.width			= VESA_WIDTH,
 		.height			= VESA_HEIGHT,
 		.depth			= VESA_BPP,
-		.mem			= mem,
+		.mem			= NULL,
 		.mem_addr		= VESA_MEM_ADDR,
 		.mem_size		= VESA_MEM_SIZE,
 		.kvm			= kvm,
 	};
-	return fb__register(&vesafb);
+
+	r = pci__register_bar_regions(kvm, &vdev->pci_hdr, vesa__bar_activate,
+				      vesa__bar_deactivate, vdev);
+	if (r < 0)
+		return ERR_PTR(r);
+
+	vdev->dev_hdr = (struct device_header) {
+		.bus_type       = DEVICE_BUS_PCI,
+		.data           = &vdev->pci_hdr,
+	};
+
+	r = device__register(&vdev->dev_hdr);
+	if (r < 0)
+		return ERR_PTR(r);
+
+	return fb__register(&vdev->fb);
 }
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 235cd82fff3c..bf42f497168f 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -89,12 +89,19 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_device_header;
+
+typedef int (*bar_activate_fn_t)(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data);
+typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data);
+
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
 #define PCI_DEV_CFG_SIZE	256
 #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
-struct pci_device_header;
-
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
 		      u8 offset, void *data, int sz);
@@ -136,6 +143,9 @@ struct pci_device_header {
 
 	/* Private to lkvm */
 	u32		bar_size[6];
+	bar_activate_fn_t	bar_activate_fn;
+	bar_deactivate_fn_t	bar_deactivate_fn;
+	void *data;
 	struct pci_config_operations	cfg_ops;
 	/*
 	 * PCI INTx# are level-triggered, but virtual device often feature
@@ -160,8 +170,13 @@ void pci__assign_irq(struct device_header *dev_hdr);
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 
+
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
+
 static inline bool __pci__memory_space_enabled(u16 command)
 {
 	return command & PCI_COMMAND_MEMORY;
diff --git a/pci.c b/pci.c
index 4f7b863298f6..5412f2defa2e 100644
--- a/pci.c
+++ b/pci.c
@@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
 		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
 }
 
+static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
+}
+
 static void *pci_config_address_ptr(u16 port)
 {
 	unsigned long offset;
@@ -264,6 +269,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
 	return hdr->data;
 }
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
+{
+	int i, r;
+	bool has_bar_regions = false;
+
+	assert(bar_activate_fn && bar_deactivate_fn);
+
+	pci_hdr->bar_activate_fn = bar_activate_fn;
+	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
+	pci_hdr->data = data;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		has_bar_regions = true;
+
+		if (pci__bar_is_io(pci_hdr, i) &&
+		    pci__io_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+
+		if (pci__bar_is_memory(pci_hdr, i) &&
+		    pci__memory_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+	}
+
+	assert(has_bar_regions);
+
+	return 0;
+}
+
 int pci__init(struct kvm *kvm)
 {
 	int r;
diff --git a/vfio/pci.c b/vfio/pci.c
index 8a775a4a4a54..9e595562180b 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -446,6 +446,83 @@ out_unlock:
 	mutex_unlock(&pdev->msi.mutex);
 }
 
+static int vfio_pci_bar_activate(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region = &vdev->regions[bar_num];
+	int ret;
+
+	if (!region->info.size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
+	    (u32)bar_num == table->bar) {
+		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
+					 table->size, false,
+					 vfio_pci_msix_table_access, pdev);
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
+	    (u32)bar_num == pba->bar) {
+		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
+					 pba->size, false,
+					 vfio_pci_msix_pba_access, pdev);
+		goto out;
+	}
+
+	ret = vfio_map_region(kvm, vdev, region);
+out:
+	return ret;
+}
+
+static int vfio_pci_bar_deactivate(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region = &vdev->regions[bar_num];
+	int ret;
+	bool success;
+
+	if (!region->info.size) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
+	    (u32)bar_num == table->bar) {
+		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
+		ret = (success ? 0 : -EINVAL);
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
+	    (u32)bar_num == pba->bar) {
+		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
+		ret = (success ? 0 : -EINVAL);
+		goto out;
+	}
+
+	vfio_unmap_region(kvm, region);
+	ret = 0;
+
+out:
+	return ret;
+}
+
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			      u8 offset, void *data, int sz)
 {
@@ -804,12 +881,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 		ret = -ENOMEM;
 		goto out_free;
 	}
-	pba->guest_phys_addr = table->guest_phys_addr + table->size;
-
-	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
-				 false, vfio_pci_msix_table_access, pdev);
-	if (ret < 0)
-		goto out_free;
 
 	/*
 	 * We could map the physical PBA directly into the guest, but it's
@@ -819,10 +890,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 	 * between MSI-X table and PBA. For the sake of isolation, create a
 	 * virtual PBA.
 	 */
-	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
-				 vfio_pci_msix_pba_access, pdev);
-	if (ret < 0)
-		goto out_free;
+	pba->guest_phys_addr = table->guest_phys_addr + table->size;
 
 	pdev->msix.entries = entries;
 	pdev->msix.nr_entries = nr_entries;
@@ -893,11 +961,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
-	/* Map the BARs into the guest or setup a trap region. */
-	ret = vfio_map_region(kvm, vdev, region);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -944,7 +1007,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 	}
 
 	/* We've configured the BARs, fake up a Configuration Space */
-	return vfio_pci_fixup_cfg_space(vdev);
+	ret = vfio_pci_fixup_cfg_space(vdev);
+	if (ret)
+		return ret;
+
+	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
+					 vfio_pci_bar_deactivate, vdev);
 }
 
 /*
diff --git a/virtio/pci.c b/virtio/pci.c
index c4822514856c..5a3cc6f1e943 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -474,6 +474,65 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
+static int virtio_pci__bar_activate(struct kvm *kvm,
+				    struct pci_device_header *pci_hdr,
+				    int bar_num, void *data)
+{
+	struct virtio_device *vdev = data;
+	u32 bar_addr, bar_size;
+	int r;
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	bar_size = pci_hdr->bar_size[bar_num];
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
+				     bar_size, vdev);
+		if (r > 0)
+			r = 0;
+		break;
+	case 1:
+		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__io_mmio_callback, vdev);
+		break;
+	case 2:
+		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__msix_mmio_callback, vdev);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
+static int virtio_pci__bar_deactivate(struct kvm *kvm,
+				      struct pci_device_header *pci_hdr,
+				      int bar_num, void *data)
+{
+	u32 bar_addr;
+	bool success;
+	int r;
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__unregister(kvm, bar_addr);
+		break;
+	case 1:
+	case 2:
+		success = kvm__deregister_mmio(kvm, bar_addr);
+		r = (success ? 0 : -EINVAL);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	return r;
+}
+
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
@@ -488,23 +547,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
 	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
-			     vdev);
-	if (r < 0)
-		return r;
-	port_addr = (u16)r;
-
 	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vdev);
-	if (r < 0)
-		goto free_ioport;
-
 	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vdev);
-	if (r < 0)
-		goto free_mmio;
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -530,6 +574,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
+	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
+				      virtio_pci__bar_activate,
+				      virtio_pci__bar_deactivate, vdev);
+	if (r < 0)
+		return r;
+
 	vpci->dev_hdr = (struct device_header) {
 		.bus_type		= DEVICE_BUS_PCI,
 		.data			= &vpci->pci_hdr,
@@ -560,20 +610,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	r = device__register(&vpci->dev_hdr);
 	if (r < 0)
-		goto free_msix_mmio;
+		return r;
 
 	/* save the IRQ that device__register() has allocated */
 	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
 
 	return 0;
-
-free_msix_mmio:
-	kvm__deregister_mmio(kvm, msix_io_block);
-free_mmio:
-	kvm__deregister_mmio(kvm, mmio_addr);
-free_ioport:
-	ioport__unregister(kvm, port_addr);
-	return r;
 }
 
 int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (24 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-06 18:21   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs Alexandru Elisei
                   ` (5 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

During configuration of the BAR addresses, a Linux guest disables and
enables access to I/O and memory space. When access is disabled, we don't
stop emulating the memory regions described by the BARs. Now that we have
callbacks for activating and deactivating emulation for a BAR region,
let's use that to stop emulation when access is disabled, and
re-activate it when access is re-enabled.

The vesa emulation hasn't been designed with toggling on and off in
mind, so refuse writes to the PCI command register that disable memory
or IO access.
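
The toggle detection added to the PCI core boils down to XOR-ing the old
and new command values and masking the decode bit of interest; a
standalone sketch (the macro values match the PCI command register bits):

```c
#include <assert.h>

#define PCI_COMMAND_IO		0x1
#define PCI_COMMAND_MEMORY	0x2

/* A decode bit "toggled" iff it differs between the old and new value. */
static int io_toggled(unsigned short old_cmd, unsigned short new_cmd)
{
	return !!((old_cmd ^ new_cmd) & PCI_COMMAND_IO);
}

static int mem_toggled(unsigned short old_cmd, unsigned short new_cmd)
{
	return !!((old_cmd ^ new_cmd) & PCI_COMMAND_MEMORY);
}
```

Only when a bit actually flips does the emulation need to call the
activate or deactivate callback; rewriting an unchanged command value is
a no-op.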

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c | 16 ++++++++++++++++
 pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/hw/vesa.c b/hw/vesa.c
index 74ebebbefa6b..3044a86078fb 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
 	return -EINVAL;
 }
 
+static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
+				u8 offset, void *data, int sz)
+{
+	u32 value;
+
+	if (offset == PCI_COMMAND) {
+		memcpy(&value, data, sz);
+		value |= (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
+		memcpy(data, &value, sz);
+	}
+}
+
 struct framebuffer *vesa__init(struct kvm *kvm)
 {
 	struct vesa_dev *vdev;
@@ -114,6 +126,10 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 		.bar_size[1]		= VESA_MEM_SIZE,
 	};
 
+	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
+		.write	= vesa__pci_cfg_write,
+	};
+
 	vdev->fb = (struct framebuffer) {
 		.width			= VESA_WIDTH,
 		.height			= VESA_HEIGHT,
diff --git a/pci.c b/pci.c
index 5412f2defa2e..98331a1fc205 100644
--- a/pci.c
+++ b/pci.c
@@ -157,6 +157,42 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static void pci_config_command_wr(struct kvm *kvm,
+				  struct pci_device_header *pci_hdr,
+				  u16 new_command)
+{
+	int i;
+	bool toggle_io, toggle_mem;
+
+	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
+	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
+			if (__pci__io_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+
+		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
+			if (__pci__memory_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+	}
+
+	pci_hdr->command = new_command;
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -182,6 +218,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	if (*(u32 *)(base + offset) == 0)
 		return;
 
+	if (offset == PCI_COMMAND) {
+		memcpy(&value, data, size);
+		pci_config_command_wr(kvm, pci_hdr, (u16)value);
+		return;
+	}
+
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (25 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-07 16:50   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
                   ` (4 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

BARs are used by the guest to configure access to the PCI device by
writing the address to which the device will respond. The basic idea for
adding support for reassignable BARs is straightforward: deactivate
emulation for the memory region described by the old BAR value, and
activate emulation for the new region.

BAR reassignment can be done while device access is enabled, and memory
regions for different devices can overlap as long as no access is made
to the overlapping memory regions. This means that it is legal for the
BARs of two distinct devices to point to an overlapping memory region,
and indeed, this is how Linux does resource assignment at boot. To
account for this situation, the simple algorithm described above is
enhanced to scan all devices and:

- Deactivate emulation for any BARs that might overlap with the new BAR
  value.

- Enable emulation for any BARs that were overlapping with the old value
  after the BAR has been updated.

Activating or deactivating emulation of a memory region has side
effects. In order to prevent the same callback from being executed
twice, we now keep track of the emulation state of each region. A
double deactivation could otherwise happen if, for example, we program
a BAR with an address that overlaps a second BAR, thus deactivating
emulation for the second BAR, and then disable all region accesses to
the second BAR by writing to the command register.
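
The overlap test used while scanning the other devices' BARs is the
standard half-open interval check; sketched on its own (hypothetical
function name, same condition as in the patch):

```c
#include <assert.h>

/* Two [start, start + size) ranges overlap unless one ends at or
 * before the point where the other begins. */
static int ranges_overlap(unsigned int a_start, unsigned int a_size,
			  unsigned int b_start, unsigned int b_size)
{
	return !(a_start + a_size <= b_start ||
		 a_start >= b_start + b_size);
}
```

Every BAR for which this predicate is true against the newly programmed
region gets its emulation deactivated before the new region is
activated, and re-activated afterwards against the old region.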

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c           |   6 +-
 include/kvm/pci.h   |  23 +++-
 pci.c               | 274 +++++++++++++++++++++++++++++++++++---------
 powerpc/spapr_pci.c |   2 +-
 vfio/pci.c          |  15 ++-
 virtio/pci.c        |   8 +-
 6 files changed, 261 insertions(+), 67 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 3044a86078fb..aca938f79c82 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -49,7 +49,7 @@ static int vesa__bar_activate(struct kvm *kvm,
 	int r;
 
 	bar_addr = pci__bar_address(pci_hdr, bar_num);
-	bar_size = pci_hdr->bar_size[bar_num];
+	bar_size = pci__bar_size(pci_hdr, bar_num);
 
 	switch (bar_num) {
 	case 0:
@@ -121,9 +121,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
 		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
 		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
-		.bar_size[0]		= PCI_IO_SIZE,
+		.bar_info[0]		= (struct pci_bar_info) {.size = PCI_IO_SIZE},
 		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
-		.bar_size[1]		= VESA_MEM_SIZE,
+		.bar_info[1]		= (struct pci_bar_info) {.size = VESA_MEM_SIZE},
 	};
 
 	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index bf42f497168f..ae71ef33237c 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -11,6 +11,17 @@
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
 
+#define pci_dev_err(pci_hdr, fmt, ...) \
+	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_warn(pci_hdr, fmt, ...) \
+	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_info(pci_hdr, fmt, ...) \
+	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_dbg(pci_hdr, fmt, ...) \
+	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_die(pci_hdr, fmt, ...) \
+	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
@@ -89,6 +100,11 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_bar_info {
+	u32 size;
+	bool active;
+};
+
 struct pci_device_header;
 
 typedef int (*bar_activate_fn_t)(struct kvm *kvm,
@@ -142,7 +158,7 @@ struct pci_device_header {
 	};
 
 	/* Private to lkvm */
-	u32		bar_size[6];
+	struct pci_bar_info	bar_info[6];
 	bar_activate_fn_t	bar_activate_fn;
 	bar_deactivate_fn_t	bar_deactivate_fn;
 	void *data;
@@ -224,4 +240,9 @@ static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_nu
 	return __pci__bar_address(pci_hdr->bar[bar_num]);
 }
 
+static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci_hdr->bar_info[bar_num].size;
+}
+
 #endif /* KVM__PCI_H */
diff --git a/pci.c b/pci.c
index 98331a1fc205..1e9791250bc3 100644
--- a/pci.c
+++ b/pci.c
@@ -68,7 +68,7 @@ void pci__assign_irq(struct device_header *dev_hdr)
 
 static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
 {
-	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
+	return  bar_num < 6 && pci__bar_size(pci_hdr, bar_num);
 }
 
 static void *pci_config_address_ptr(u16 port)
@@ -157,6 +157,46 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			    int bar_num)
+{
+	int r = 0;
+
+	if (pci_hdr->bar_info[bar_num].active)
+		goto out;
+
+	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_err(pci_hdr, "Error activating emulation for BAR %d",
+			    bar_num);
+		goto out;
+	}
+	pci_hdr->bar_info[bar_num].active = true;
+
+out:
+	return r;
+}
+
+static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      int bar_num)
+{
+	int r = 0;
+
+	if (!pci_hdr->bar_info[bar_num].active)
+		goto out;
+
+	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_err(pci_hdr, "Error deactivating emulation for BAR %d",
+			    bar_num);
+		goto out;
+	}
+	pci_hdr->bar_info[bar_num].active = false;
+
+out:
+	return r;
+}
+
 static void pci_config_command_wr(struct kvm *kvm,
 				  struct pci_device_header *pci_hdr,
 				  u16 new_command)
@@ -173,26 +213,179 @@ static void pci_config_command_wr(struct kvm *kvm,
 
 		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
 			if (__pci__io_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
-			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
+			if (!__pci__io_space_enabled(new_command))
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 
 		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
 			if (__pci__memory_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
-			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
+			if (!__pci__memory_space_enabled(new_command))
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 	}
 
 	pci_hdr->command = new_command;
 }
 
+static int pci_deactivate_bar_regions(struct kvm *kvm,
+				      struct pci_device_header *pci_hdr,
+				      u32 start, u32 size)
+{
+	struct device_header *dev_hdr;
+	struct pci_device_header *tmp_hdr;
+	u32 tmp_addr, tmp_size;
+	int i, r;
+
+	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+	while (dev_hdr) {
+		tmp_hdr = dev_hdr->data;
+		for (i = 0; i < 6; i++) {
+			if (!pci_bar_is_implemented(tmp_hdr, i))
+				continue;
+
+			tmp_addr = pci__bar_address(tmp_hdr, i);
+			tmp_size = pci__bar_size(tmp_hdr, i);
+
+			if (tmp_addr + tmp_size <= start ||
+			    tmp_addr >= start + size)
+				continue;
+
+			r = pci_deactivate_bar(kvm, tmp_hdr, i);
+			if (r < 0)
+				return r;
+		}
+		dev_hdr = device__next_dev(dev_hdr);
+	}
+
+	return 0;
+}
+
+static int pci_activate_bar_regions(struct kvm *kvm,
+				    struct pci_device_header *pci_hdr,
+				    u32 start, u32 size)
+{
+	struct device_header *dev_hdr;
+	struct pci_device_header *tmp_hdr;
+	u32 tmp_addr, tmp_size;
+	int i, r;
+
+	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+	while (dev_hdr) {
+		tmp_hdr = dev_hdr->data;
+		for (i = 0; i < 6; i++) {
+			if (!pci_bar_is_implemented(tmp_hdr, i))
+				continue;
+
+			tmp_addr = pci__bar_address(tmp_hdr, i);
+			tmp_size = pci__bar_size(tmp_hdr, i);
+
+			if (tmp_addr + tmp_size <= start ||
+			    tmp_addr >= start + size)
+				continue;
+
+			r = pci_activate_bar(kvm, tmp_hdr, i);
+			if (r < 0)
+				return r;
+		}
+		dev_hdr = device__next_dev(dev_hdr);
+	}
+
+	return 0;
+}
+
+static void pci_config_bar_wr(struct kvm *kvm,
+			      struct pci_device_header *pci_hdr, int bar_num,
+			      u32 value)
+{
+	u32 old_addr, new_addr, bar_size;
+	u32 mask;
+	int r;
+
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
+	else
+		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+
+	/*
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
+	 *
+	 * According to the PCI local bus specification REV 3.0: The number of
+	 * upper bits that a device actually implements depends on how much of
+	 * the address space the device will respond to. A device that wants a 1
+	 * MB memory address space (using a 32-bit base address register) would
+	 * build the top 12 bits of the address register, hardwiring the other
+	 * bits to 0.
+	 *
+	 * Furthermore, software can determine how much address space the device
+	 * requires by writing a value of all 1's to the register and then
+	 * reading the value back. The device will return 0's in all don't-care
+	 * address bits, effectively specifying the address space required.
+	 *
+	 * Software computes the size of the address space with the formula
+	 * S =  ~B + 1, where S is the memory size and B is the value read from
+	 * the BAR. This means that the BAR value that kvmtool should return is
+	 * B = ~(S - 1).
+	 */
+	if (value == 0xffffffff) {
+		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
+		/* Preserve the special bits. */
+		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+
+	/* Don't toggle emulation when region type access is disabled. */
+	if (pci__bar_is_io(pci_hdr, bar_num) &&
+	    !pci__io_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	if (pci__bar_is_memory(pci_hdr, bar_num) &&
+	    !pci__memory_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	old_addr = pci__bar_address(pci_hdr, bar_num);
+	new_addr = __pci__bar_address(value);
+	bar_size = pci__bar_size(pci_hdr, bar_num);
+
+	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0)
+		return;
+
+	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
+	if (r < 0) {
+		/*
+		 * We cannot update the BAR because of an overlapping region
+		 * that failed to deactivate emulation, so keep the old BAR
+		 * value and re-activate emulation for it.
+		 */
+		pci_activate_bar(kvm, pci_hdr, bar_num);
+		return;
+	}
+
+	pci_hdr->bar[bar_num] = value;
+	r = pci_activate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0) {
+		/*
+		 * New region cannot be emulated, re-enable the regions that
+		 * were overlapping.
+		 */
+		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
+		return;
+	}
+
+	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -200,7 +393,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
-	u32 mask;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -225,46 +417,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	}
 
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
-
-	/*
-	 * If the kernel masks the BAR, it will expect to find the size of the
-	 * BAR there next time it reads from it. After the kernel reads the
-	 * size, it will write the address back.
-	 */
 	if (bar < 6) {
-		if (pci__bar_is_io(pci_hdr, bar))
-			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
-		else
-			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
-		/*
-		 * According to the PCI local bus specification REV 3.0:
-		 * The number of upper bits that a device actually implements
-		 * depends on how much of the address space the device will
-		 * respond to. A device that wants a 1 MB memory address space
-		 * (using a 32-bit base address register) would build the top
-		 * 12 bits of the address register, hardwiring the other bits
-		 * to 0.
-		 *
-		 * Furthermore, software can determine how much address space
-		 * the device requires by writing a value of all 1's to the
-		 * register and then reading the value back. The device will
-		 * return 0's in all don't-care address bits, effectively
-		 * specifying the address space required.
-		 *
-		 * Software computes the size of the address space with the
-		 * formula S = ~B + 1, where S is the memory size and B is the
-		 * value read from the BAR. This means that the BAR value that
-		 * kvmtool should return is B = ~(S - 1).
-		 */
 		memcpy(&value, data, size);
-		if (value == 0xffffffff)
-			value = ~(pci_hdr->bar_size[bar] - 1);
-		/* Preserve the special bits. */
-		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
-		memcpy(base + offset, &value, size);
-	} else {
-		memcpy(base + offset, data, size);
+		pci_config_bar_wr(kvm, pci_hdr, bar, value);
+		return;
 	}
+
+	memcpy(base + offset, data, size);
 }
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
@@ -329,20 +488,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
 			continue;
 
 		has_bar_regions = true;
+		assert(!pci_hdr->bar_info[i].active);
 
 		if (pci__bar_is_io(pci_hdr, i) &&
 		    pci__io_space_enabled(pci_hdr)) {
-				r = bar_activate_fn(kvm, pci_hdr, i, data);
-				if (r < 0)
-					return r;
-			}
+			r = pci_activate_bar(kvm, pci_hdr, i);
+			if (r < 0)
+				return r;
+		}
 
 		if (pci__bar_is_memory(pci_hdr, i) &&
 		    pci__memory_space_enabled(pci_hdr)) {
-				r = bar_activate_fn(kvm, pci_hdr, i, data);
-				if (r < 0)
-					return r;
-			}
+			r = pci_activate_bar(kvm, pci_hdr, i);
+			if (r < 0)
+				return r;
+		}
 	}
 
 	assert(has_bar_regions);
diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
index a15f7d895a46..7be44d950acb 100644
--- a/powerpc/spapr_pci.c
+++ b/powerpc/spapr_pci.c
@@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
 				of_pci_b_ddddd(devid) |
 				of_pci_b_fff(fn) |
 				of_pci_b_rrrrrrrr(bars[i]));
-			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
+			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
 			reg[n+1].addr = 0;
 
 			assigned_addresses[n].phys_hi = cpu_to_be32(
diff --git a/vfio/pci.c b/vfio/pci.c
index 9e595562180b..3a641e72e574 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -455,6 +455,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
 	struct vfio_pci_msix_table *table = &pdev->msix_table;
 	struct vfio_region *region = &vdev->regions[bar_num];
+	u32 bar_addr;
 	int ret;
 
 	if (!region->info.size) {
@@ -462,8 +463,11 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 		goto out;
 	}
 
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+
 	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
 	    (u32)bar_num == table->bar) {
+		table->guest_phys_addr = region->guest_phys_addr = bar_addr;
 		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
 					 table->size, false,
 					 vfio_pci_msix_table_access, pdev);
@@ -473,13 +477,22 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 
 	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
 	    (u32)bar_num == pba->bar) {
+		if (pba->bar == table->bar)
+			pba->guest_phys_addr = table->guest_phys_addr + table->size;
+		else
+			pba->guest_phys_addr = region->guest_phys_addr = bar_addr;
 		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
 					 pba->size, false,
 					 vfio_pci_msix_pba_access, pdev);
 		goto out;
 	}
 
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		region->port_base = bar_addr;
+	else
+		region->guest_phys_addr = bar_addr;
 	ret = vfio_map_region(kvm, vdev, region);
+
 out:
 	return ret;
 }
@@ -749,7 +762,7 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 		if (!base)
 			continue;
 
-		pdev->hdr.bar_size[i] = region->info.size;
+		pdev->hdr.bar_info[i].size = region->info.size;
 	}
 
 	/* I really can't be bothered to support cardbus. */
diff --git a/virtio/pci.c b/virtio/pci.c
index 5a3cc6f1e943..e02430881394 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -483,7 +483,7 @@ static int virtio_pci__bar_activate(struct kvm *kvm,
 	int r;
 
 	bar_addr = pci__bar_address(pci_hdr, bar_num);
-	bar_size = pci_hdr->bar_size[bar_num];
+	bar_size = pci__bar_size(pci_hdr, bar_num);
 
 	switch (bar_num) {
 	case 0:
@@ -569,9 +569,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
-		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
-		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
-		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
+		.bar_info[0]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE)},
+		.bar_info[1]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE)},
+		.bar_info[2]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE*2)},
 	};
 
 	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread
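The sizing handshake implemented in pci_config_bar_wr() above (the device returns B = ~(S - 1) after a write of all 1's, and software recovers S = ~B + 1) can be sketched in isolation. This is a hedged illustration, not kvmtool code: the helper names are invented, and the mask is hardcoded to the 32-bit memory BAR case (PCI_BASE_ADDRESS_MEM_MASK).

```c
#include <assert.h>
#include <stdint.h>

/* Address bits of a 32-bit memory BAR (the low 4 bits are read-only
 * type/prefetch encoding bits). */
#define MEM_MASK 0xfffffff0u

/*
 * Device side: the value the emulated BAR returns after the guest has
 * written 0xffffffff to it. B = ~(S - 1), with the read-only encoding
 * bits preserved, exactly as pci_config_bar_wr() does.
 */
static uint32_t bar_sizing_read(uint32_t bar_size, uint32_t low_bits)
{
	return (~(bar_size - 1) & MEM_MASK) | (low_bits & ~MEM_MASK);
}

/*
 * Guest side: clear the encoding bits, invert, add one, per the
 * PCI Local Bus Specification rev 3.0 sizing algorithm.
 */
static uint32_t bar_decode_size(uint32_t value)
{
	return ~(value & MEM_MASK) + 1;
}
```

For a 4 KB BAR, the device returns 0xfffff000 and the guest decodes 0x1000 back, matching the "0xFFFFF00x" example in patch 03's commit message.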

* [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (26 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-07 16:51   ` Andre Przywara
  2020-02-07 17:38   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
                   ` (3 subsequent siblings)
  31 siblings, 2 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

PCI now supports configurable BARs. Get rid of the no longer needed,
Linux-only fdt property.

Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index c80e6da323b6..02091e9e0bee 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
 
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
-	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
 
 	/* Pass on our amended command line to a Linux kernel only. */
 	if (kvm->cfg.firmware_filename) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (27 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-07 16:51   ` Andre Przywara
  2020-01-23 13:48 ` [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
                   ` (2 subsequent siblings)
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

KVM_SET_USER_MEMORY_REGION will fail if the guest physical address is
not aligned to the page size. However, it is legal for a guest to
program an address which isn't aligned to the page size. Trap and
emulate MMIO accesses to the region when that happens.

Without this patch, when assigning a Seagate Barracuda hard drive to a
VM I was seeing these errors:

[    0.286029] pci 0000:00:00.0: BAR 0: assigned [mem 0x41004600-0x4100467f]
  Error: 0000:01:00.0: failed to register region with KVM
  Error: [1095:3132] Error activating emulation for BAR 0
[..]
[   10.561794] irq 13: nobody cared (try booting with the "irqpoll" option)
[   10.563122] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-seattle-00009-g909b20467ed1 #133
[   10.563124] Hardware name: linux,dummy-virt (DT)
[   10.563126] Call trace:
[   10.563134]  dump_backtrace+0x0/0x140
[   10.563137]  show_stack+0x14/0x20
[   10.563141]  dump_stack+0xbc/0x100
[   10.563146]  __report_bad_irq+0x48/0xd4
[   10.563148]  note_interrupt+0x288/0x378
[   10.563151]  handle_irq_event_percpu+0x80/0x88
[   10.563153]  handle_irq_event+0x44/0xc8
[   10.563155]  handle_fasteoi_irq+0xb4/0x160
[   10.563157]  generic_handle_irq+0x24/0x38
[   10.563159]  __handle_domain_irq+0x60/0xb8
[   10.563162]  gic_handle_irq+0x50/0xa0
[   10.563164]  el1_irq+0xb8/0x180
[   10.563166]  arch_cpu_idle+0x10/0x18
[   10.563170]  do_idle+0x204/0x290
[   10.563172]  cpu_startup_entry+0x20/0x40
[   10.563175]  rest_init+0xd4/0xe0
[   10.563180]  arch_call_rest_init+0xc/0x14
[   10.563182]  start_kernel+0x420/0x44c
[   10.563183] handlers:
[   10.563650] [<000000001e474803>] sil24_interrupt
[   10.564559] Disabling IRQ #13
[..]
[   11.832916] ata1: spurious interrupt (slot_stat 0x0 active_tag -84148995 sactive 0x0)
[   12.045444] ata_ratelimit: 1 callbacks suppressed

With this patch, I don't see the errors and the device works as
expected.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/vfio/core.c b/vfio/core.c
index 6b9b58ea8d2f..b23e77c54771 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -226,6 +226,15 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 	if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP))
 		return vfio_setup_trap_region(kvm, vdev, region);
 
+	/*
+	 * KVM_SET_USER_MEMORY_REGION will fail because the guest physical
+	 * address isn't page aligned, let's emulate the region ourselves.
+	 */
+	if (region->guest_phys_addr & (PAGE_SIZE - 1))
+		return kvm__register_mmio(kvm, region->guest_phys_addr,
+					  region->info.size, false,
+					  vfio_mmio_access, region);
+
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_READ)
 		prot |= PROT_READ;
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread
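The check the patch above adds to vfio_map_region() is a power-of-two alignment test. A minimal sketch, assuming a 4 KB page size and with an invented helper name (needs_mmio_trap() is not a kvmtool function):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * A guest physical address that is not page aligned cannot be handed
 * to KVM_SET_USER_MEMORY_REGION, so the region must be trapped and
 * emulated via kvm__register_mmio() instead.
 */
static bool needs_mmio_trap(uint64_t guest_phys_addr)
{
	return (guest_phys_addr & (PAGE_SIZE - 1)) != 0;
}
```

The BAR from the log above, assigned at 0x41004600, fails this test (offset 0x600 into a page), which is why the region registration used to fail before this patch.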

* [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (28 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
@ 2020-01-23 13:48 ` Alexandru Elisei
  2020-02-07 16:51   ` Andre Przywara
  2020-02-07 17:02 ` [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE " Andre Przywara
  2020-05-13 14:56 ` Marc Zyngier
  31 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-23 13:48 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

PCI Express comes with an extended addressing scheme, which directly
translates into a bigger device configuration space (256->4096 bytes)
and a bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management [1] and the PCI Express capability [2]).

However, our virtio PCI implementation implements version 0.9 of the
protocol and still uses transitional PCI device IDs, so we have opted
to omit the mandatory PCI Express capabilities. For VFIO, the power
management and PCI Express capabilities are left for a subsequent patch.

[1] PCI Express Base Specification Revision 1.1, section 7.6
[2] PCI Express Base Specification Revision 1.1, section 7.8

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/kvm-arch.h |  4 +-
 arm/pci.c                         |  2 +-
 builtin-run.c                     |  1 +
 hw/vesa.c                         |  2 +-
 include/kvm/kvm-config.h          |  2 +-
 include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
 pci.c                             |  5 +-
 vfio/pci.c                        | 26 +++++++----
 8 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b9d486d5eac2..13c55fa3dc29 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -23,7 +23,7 @@
 
 #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
 #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
-#define ARM_PCI_CFG_SIZE	(1ULL << 24)
+#define ARM_PCI_CFG_SIZE	(1ULL << 28)
 #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
 				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
 
@@ -50,6 +50,8 @@
 
 #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
 
+#define ARCH_HAS_PCI_EXP	1
+
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
 	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
diff --git a/arm/pci.c b/arm/pci.c
index 1c0949a22408..eec9f3d936a5 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -77,7 +77,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
-	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
+	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
 	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
 
 	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
diff --git a/builtin-run.c b/builtin-run.c
index 9cb8c75300eb..def8a1f803ad 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -27,6 +27,7 @@
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
+#include "kvm/vfio.h"
 #include "kvm/rtc.h"
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
diff --git a/hw/vesa.c b/hw/vesa.c
index aca938f79c82..4321cfbb6ddc 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -82,7 +82,7 @@ static int vesa__bar_deactivate(struct kvm *kvm,
 }
 
 static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
-				u8 offset, void *data, int sz)
+				u16 offset, void *data, int sz)
 {
 	u32 value;
 
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index a052b0bc7582..a1012c57b7a7 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -2,7 +2,6 @@
 #define KVM_CONFIG_H_
 
 #include "kvm/disk-image.h"
-#include "kvm/vfio.h"
 #include "kvm/kvm-config-arch.h"
 
 #define DEFAULT_KVM_DEV		"/dev/kvm"
@@ -18,6 +17,7 @@
 #define MIN_RAM_SIZE_MB		(64ULL)
 #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
 
+struct vfio_device_params;
 struct kvm_config {
 	struct kvm_config_arch arch;
 	struct disk_image_params disk_image[MAX_DISK_IMAGES];
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index ae71ef33237c..0c3c74b82626 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -10,6 +10,7 @@
 #include "kvm/devices.h"
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
+#include "kvm.h"
 
 #define pci_dev_err(pci_hdr, fmt, ...) \
 	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
@@ -32,9 +33,41 @@
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
 #define PCI_IOPORT_START	0x6200
-#define PCI_CFG_SIZE		(1ULL << 24)
 
-struct kvm;
+#define PCIE_CAP_REG_VER	0x1
+#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
+#define PM_CAP_VER		0x3
+
+#ifdef ARCH_HAS_PCI_EXP
+#define PCI_CFG_SIZE		(1ULL << 28)
+#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_EXP_SIZE
+
+union pci_config_address {
+	struct {
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	enable_bit	: 1;		/* 31       */
+#else
+		unsigned	enable_bit	: 1;		/* 31       */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	function_number	: 3;		/* 14 .. 12 */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	reg_offset	: 2;		/* 1  .. 0  */
+#endif
+	};
+	u32 w;
+};
+
+#else
+#define PCI_CFG_SIZE		(1ULL << 24)
+#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_SIZE
 
 union pci_config_address {
 	struct {
@@ -58,6 +91,8 @@ union pci_config_address {
 	};
 	u32 w;
 };
+#endif
+#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct msix_table {
 	struct msi_msg msg;
@@ -100,6 +135,33 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pcie_cap {
+	u8 cap;
+	u8 next;
+	u16 cap_reg;
+	u32 dev_cap;
+	u16 dev_ctrl;
+	u16 dev_status;
+	u32 link_cap;
+	u16 link_ctrl;
+	u16 link_status;
+	u32 slot_cap;
+	u16 slot_ctrl;
+	u16 slot_status;
+	u16 root_ctrl;
+	u16 root_cap;
+	u32 root_status;
+};
+
+struct pm_cap {
+	u8 cap;
+	u8 next;
+	u16 pmc;
+	u16 pmcsr;
+	u8 pmcsr_bse;
+	u8 data;
+};
+
 struct pci_bar_info {
 	u32 size;
 	bool active;
@@ -115,14 +177,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
 				   int bar_num, void *data);
 
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
-#define PCI_DEV_CFG_SIZE	256
-#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		      u8 offset, void *data, int sz);
+		      u16 offset, void *data, int sz);
 	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
-		     u8 offset, void *data, int sz);
+		     u16 offset, void *data, int sz);
 };
 
 struct pci_device_header {
@@ -152,6 +212,10 @@ struct pci_device_header {
 			u8		min_gnt;
 			u8		max_lat;
 			struct msix_cap msix;
+#ifdef ARCH_HAS_PCI_EXP
+			struct pm_cap pm;
+			struct pcie_cap pcie;
+#endif
 		} __attribute__((packed));
 		/* Pad to PCI config space size */
 		u8	__pad[PCI_DEV_CFG_SIZE];
diff --git a/pci.c b/pci.c
index 1e9791250bc3..ea3df8d2e28a 100644
--- a/pci.c
+++ b/pci.c
@@ -389,7 +389,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
-	u8 bar, offset;
+	u8 bar;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
@@ -428,7 +429,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 offset;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 3a641e72e574..05e8b54e77ac 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -309,7 +309,7 @@ out_unlock:
 }
 
 static void vfio_pci_msix_cap_write(struct kvm *kvm,
-				    struct vfio_device *vdev, u8 off,
+				    struct vfio_device *vdev, u16 off,
 				    void *data, int sz)
 {
 	struct vfio_pci_device *pdev = &vdev->pci;
@@ -341,7 +341,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
 }
 
 static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
-				     u8 off, u8 *data, u32 sz)
+				     u16 off, u8 *data, u32 sz)
 {
 	size_t i;
 	u32 mask = 0;
@@ -389,7 +389,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
 }
 
 static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
-				   u8 off, u8 *data, u32 sz)
+				   u16 off, u8 *data, u32 sz)
 {
 	u8 ctrl;
 	struct msi_msg msg;
@@ -537,7 +537,7 @@ out:
 }
 
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			      u8 offset, void *data, int sz)
+			      u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -555,7 +555,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
 }
 
 static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
-			       u8 offset, void *data, int sz)
+			       u16 offset, void *data, int sz)
 {
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
@@ -639,15 +639,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 {
 	int ret;
 	size_t size;
-	u8 pos, next;
+	u16 pos, next;
 	struct pci_cap_hdr *cap;
-	u8 virt_hdr[PCI_DEV_CFG_SIZE];
+	u8 *virt_hdr;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
 	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
 		return 0;
 
-	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
+	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
+	if (!virt_hdr)
+		return -errno;
 
 	pos = pdev->hdr.capabilities & ~3;
 
@@ -683,6 +685,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
 	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
 
+	free(virt_hdr);
+
 	return 0;
 }
 
@@ -792,7 +796,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 
 	/* Install our fake Configuration Space */
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
-	hdr_sz = PCI_DEV_CFG_SIZE;
+	/*
+	 * We don't touch the extended configuration space, let's be cautious
+	 * and not overwrite it all with zeros, or bad things might happen.
+	 */
+	hdr_sz = PCI_CFG_SPACE_SIZE;
 	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
 		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
 			     hdr_sz);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 88+ messages in thread
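With the window grown to 1 << 28, a configuration space offset decodes per the ECAM layout that the ARCH_HAS_PCI_EXP bitfields in the patch describe: bus in bits 27..20, device in 19..15, function in 14..12, and a 12-bit register offset (register_number plus reg_offset). A hedged sketch of that decoding; the struct and function names here are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an ECAM config space offset. */
struct ecam_addr {
	uint8_t  bus;	/* bits 27..20 */
	uint8_t  dev;	/* bits 19..15 */
	uint8_t  fn;	/* bits 14..12 */
	uint16_t reg;	/* bits 11..0, register within the 4 KB space */
};

static struct ecam_addr ecam_decode(uint32_t offset)
{
	return (struct ecam_addr) {
		.bus = (offset >> 20) & 0xff,
		.dev = (offset >> 15) & 0x1f,
		.fn  = (offset >> 12) & 0x7,
		.reg = offset & 0xfff,
	};
}
```

Each function gets 4 KB of register space (PCI_CFG_SPACE_EXP_SIZE), which is why the per-device config size and the overall window both grow relative to the legacy CAM layout.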

* Re: [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86
  2020-01-23 13:47 ` [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86 Alexandru Elisei
@ 2020-01-27 18:07   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-27 18:07 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:37 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> The initialization function for the i8042 emulated device does exactly
> nothing for all architectures, except for x86. As a result, the device
> is usable only for x86, so let's make the file an architecture specific
> object file.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

> ---
>  Makefile   | 2 +-
>  hw/i8042.c | 4 ----
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 6d6880dd4f8a..33eddcbb4d66 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -103,7 +103,6 @@ OBJS	+= hw/pci-shmem.o
>  OBJS	+= kvm-ipc.o
>  OBJS	+= builtin-sandbox.o
>  OBJS	+= virtio/mmio.o
> -OBJS	+= hw/i8042.o
>  
>  # Translate uname -m into ARCH string
>  ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \
> @@ -124,6 +123,7 @@ endif
>  #x86
>  ifeq ($(ARCH),x86)
>  	DEFINES += -DCONFIG_X86
> +	OBJS	+= hw/i8042.o
>  	OBJS	+= x86/boot.o
>  	OBJS	+= x86/cpuid.o
>  	OBJS	+= x86/interrupt.o
> diff --git a/hw/i8042.c b/hw/i8042.c
> index 288b7d1108ac..2d8c96e9c7e6 100644
> --- a/hw/i8042.c
> +++ b/hw/i8042.c
> @@ -349,10 +349,6 @@ static struct ioport_operations kbd_ops = {
>  
>  int kbd__init(struct kvm *kvm)
>  {
> -#ifndef CONFIG_X86
> -	return 0;
> -#endif
> -
>  	kbd_reset();
>  	state.kvm = kvm;
>  	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration
  2020-01-23 13:47 ` [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration Alexandru Elisei
@ 2020-01-27 18:07   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-27 18:07 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:47:38 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Sami Mujawar <sami.mujawar@arm.com>
> 
> According to the 'PCI Local Bus Specification, Revision 3.0,
> February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'
> 
>     "Software saves the original value of the Base Address register,
>     writes 0 FFFF FFFFh to the register, then reads it back. Size
>     calculation can be done from the 32-bit value read by first
>     clearing encoding information bits (bit 0 for I/O, bits 0-3 for
>     memory), inverting all 32 bits (logical NOT), then incrementing
>     by 1. The resultant 32-bit value is the memory/I/O range size
>     decoded by the register. Note that the upper 16 bits of the result
>     is ignored if the Base Address register is for I/O and bits 16-31
>     returned zero upon read."
> 
> kvmtool was returning the actual BAR resource size, which would be
> incorrect as the software drivers would invert all 32 bits
> (logical NOT), then increment by 1. This ends up with a very large
> resource size (in some cases more than 4GB) due to which drivers
> assert/fail to work.
> 
> e.g if the BAR resource size was 0x1000, kvmtool would return 0x1000
> instead of 0xFFFFF00x.
> 
> Fixed pci__config_wr() to return the size of the BAR in accordance with
> the PCI Local Bus specification, Implementation Notes.
> 
> Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> [Reworked algorithm, removed power-of-two check]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

That looks correct now - after realising mask is not what one thinks it is ;-)

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  pci.c | 42 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/pci.c b/pci.c
> index 689869cb79a3..3198732935eb 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -149,6 +149,8 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	u8 bar, offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
> +	u32 value = 0;
> +	u32 mask;
>  
>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>  		return;
> @@ -169,13 +171,41 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>  
>  	/*
> -	 * If the kernel masks the BAR it would expect to find the size of the
> -	 * BAR there next time it reads from it. When the kernel got the size it
> -	 * would write the address back.
> +	 * If the kernel masks the BAR, it will expect to find the size of the
> +	 * BAR there next time it reads from it. After the kernel reads the
> +	 * size, it will write the address back.
>  	 */
> -	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
> -		u32 sz = pci_hdr->bar_size[bar];
> -		memcpy(base + offset, &sz, sizeof(sz));
> +	if (bar < 6) {
> +		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
> +			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> +		else
> +			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> +		/*
> +		 * According to the PCI local bus specification REV 3.0:
> +		 * The number of upper bits that a device actually implements
> +		 * depends on how much of the address space the device will
> +		 * respond to. A device that wants a 1 MB memory address space
> +		 * (using a 32-bit base address register) would build the top
> +		 * 12 bits of the address register, hardwiring the other bits
> +		 * to 0.
> +		 *
> +		 * Furthermore, software can determine how much address space
> +		 * the device requires by writing a value of all 1's to the
> +		 * register and then reading the value back. The device will
> +		 * return 0's in all don't-care address bits, effectively
> +		 * specifying the address space required.
> +		 *
> +		 * Software computes the size of the address space with the
> +		 * formula S = ~B + 1, where S is the memory size and B is the
> +		 * value read from the BAR. This means that the BAR value that
> +		 * kvmtool should return is B = ~(S - 1).
> +		 */
> +		memcpy(&value, data, size);
> +		if (value == 0xffffffff)
> +			value = ~(pci_hdr->bar_size[bar] - 1);
> +		/* Preserve the special bits. */
> +		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
> +		memcpy(base + offset, &value, size);
>  	} else {
>  		memcpy(base + offset, data, size);
>  	}


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two
  2020-01-23 13:47 ` [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two Alexandru Elisei
@ 2020-01-27 18:07   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-27 18:07 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:40 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> According to the PCI local bus specification [1], a device's memory size
> must be a power of two. This is also implicit in the mechanism that a CPU
> uses to get the memory size requirement for a PCI device.
> 
> The vesa device requests a memory size that isn't a power of two.
> According to the same spec [1], a device is allowed to consume more memory
> than it actually requires. As a result, the amount of memory that the vesa
> device now reserves has been increased.
> 
> To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
> in places where the memory size is known at compile time.
> 
> [1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c          | 3 +++
>  include/kvm/util.h | 2 ++
>  include/kvm/vesa.h | 6 +++++-
>  virtio/pci.c       | 3 +++
>  4 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index f3c5114cf4fe..d75b4b316a1e 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -58,6 +58,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	char *mem;
>  	int r;
>  
> +	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
> +	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
> +
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
>  
> diff --git a/include/kvm/util.h b/include/kvm/util.h
> index 4ca7aa9392b6..199724c4018c 100644
> --- a/include/kvm/util.h
> +++ b/include/kvm/util.h
> @@ -104,6 +104,8 @@ static inline unsigned long roundup_pow_of_two(unsigned long x)
>  	return x ? 1UL << fls_long(x - 1) : 0;
>  }
>  
> +#define is_power_of_two(x)	((x) > 0 ? ((x) & ((x) - 1)) == 0 : 0)
> +
>  struct kvm;
>  void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
>  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
> diff --git a/include/kvm/vesa.h b/include/kvm/vesa.h
> index 0fac11ab5a9f..e7d971343642 100644
> --- a/include/kvm/vesa.h
> +++ b/include/kvm/vesa.h
> @@ -5,8 +5,12 @@
>  #define VESA_HEIGHT	480
>  
>  #define VESA_MEM_ADDR	0xd0000000
> -#define VESA_MEM_SIZE	(4*VESA_WIDTH*VESA_HEIGHT)
>  #define VESA_BPP	32
> +/*
> + * We actually only need VESA_BPP/8*VESA_WIDTH*VESA_HEIGHT bytes. But the memory
> + * size must be a power of 2, so we round up.
> + */
> +#define VESA_MEM_SIZE	(1 << 21)
>  
>  struct kvm;
>  struct biosregs;
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 99653cad2c0f..04e801827df9 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -435,6 +435,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	vpci->kvm = kvm;
>  	vpci->dev = dev;
>  
> +	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
> +	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
> +
>  	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
>  	if (r < 0)
>  		return r;


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT
  2020-01-23 13:47 ` [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
@ 2020-01-27 18:08   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-27 18:08 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:41 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> The "bus-range" property encodes the PCI bus number of the PCI
> controller and the largest bus number of any PCI buses that are
> subordinate to this node [1]. kvmtool emulates only PCI bus 0.

I am wondering if that ever becomes a limitation, but in the current context it looks like the right thing to do.

> Advertise this in the PCI DT node by setting "bus-range" to <0,0>.
> 
> [1] IEEE Std 1275-1994, Section 3 "Bus Nodes Properties and Methods"
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  arm/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arm/pci.c b/arm/pci.c
> index 557cfa98938d..ed325fa4a811 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -30,7 +30,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
>  	unsigned nentries = 0;
>  	/* Bus range */
> -	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
> +	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(0), };
>  	/* Configuration Space */
>  	u64 cfg_reg_prop[] = { cpu_to_fdt64(KVM_PCI_CFG_AREA),
>  			       cpu_to_fdt64(ARM_PCI_CFG_SIZE), };


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region
  2020-01-23 13:47 ` [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region Alexandru Elisei
@ 2020-01-29 18:16   ` Andre Przywara
  2020-03-04 16:20     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-29 18:16 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:47:44 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Julien Thierry <julien.thierry@arm.com>
> 
> Current PCI IO region that is exposed through the DT contains ports that
> are reserved by non-PCI devices.
> 
> Use the proper PCI IO start so that the region exposed through DT can
> actually be used to reassign device BARs.

I guess the majority of the patch is about the fact that the current allocation starts at 0x6200, which is not 4K aligned?
It would be nice if we could mention this in the commit message.

Actually, silly question: It seems like this 0x6200 is rather arbitrary, can't we just change that to a 4K aligned value and drop that patch here?
If something on the x86 side relies on that value, it should rather be explicit than by chance.
(Because while this patch here seems correct, it's also quite convoluted.)

Cheers,
Andre.

> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/include/arm-common/pci.h |  1 +
>  arm/kvm.c                    |  3 +++
>  arm/pci.c                    | 21 ++++++++++++++++++---
>  3 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/arm/include/arm-common/pci.h b/arm/include/arm-common/pci.h
> index 9008a0ed072e..aea42b8895e9 100644
> --- a/arm/include/arm-common/pci.h
> +++ b/arm/include/arm-common/pci.h
> @@ -1,6 +1,7 @@
>  #ifndef ARM_COMMON__PCI_H
>  #define ARM_COMMON__PCI_H
>  
> +void pci__arm_init(struct kvm *kvm);
>  void pci__generate_fdt_nodes(void *fdt);
>  
>  #endif /* ARM_COMMON__PCI_H */
> diff --git a/arm/kvm.c b/arm/kvm.c
> index 1f85fc60588f..5c30ec1e0515 100644
> --- a/arm/kvm.c
> +++ b/arm/kvm.c
> @@ -6,6 +6,7 @@
>  #include "kvm/fdt.h"
>  
>  #include "arm-common/gic.h"
> +#include "arm-common/pci.h"
>  
>  #include <linux/kernel.h>
>  #include <linux/kvm.h>
> @@ -86,6 +87,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
>  	/* Create the virtual GIC. */
>  	if (gic__create(kvm, kvm->cfg.arch.irqchip))
>  		die("Failed to create virtual GIC");
> +
> +	pci__arm_init(kvm);
>  }
>  
>  #define FDT_ALIGN	SZ_2M
> diff --git a/arm/pci.c b/arm/pci.c
> index ed325fa4a811..1c0949a22408 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -1,3 +1,5 @@
> +#include "linux/sizes.h"
> +
>  #include "kvm/devices.h"
>  #include "kvm/fdt.h"
>  #include "kvm/kvm.h"
> @@ -7,6 +9,11 @@
>  
>  #include "arm-common/pci.h"
>  
> +#define ARM_PCI_IO_START ALIGN(PCI_IOPORT_START, SZ_4K)
> +
> +/* Must be a multiple of 4k */
> +#define ARM_PCI_IO_SIZE ((ARM_MMIO_AREA - ARM_PCI_IO_START) & ~(SZ_4K - 1))
> +
>  /*
>   * An entry in the interrupt-map table looks like:
>   * <pci unit address> <pci interrupt pin> <gic phandle> <gic interrupt>
> @@ -24,6 +31,14 @@ struct of_interrupt_map_entry {
>  	struct of_gic_irq		gic_irq;
>  } __attribute__((packed));
>  
> +void pci__arm_init(struct kvm *kvm)
> +{
> +	u32 align_pad = ARM_PCI_IO_START - PCI_IOPORT_START;
> +
> +	/* Make PCI port allocation start at a properly aligned address */
> +	pci_get_io_port_block(align_pad);
> +}
> +
>  void pci__generate_fdt_nodes(void *fdt)
>  {
>  	struct device_header *dev_hdr;
> @@ -40,10 +55,10 @@ void pci__generate_fdt_nodes(void *fdt)
>  			.pci_addr = {
>  				.hi	= cpu_to_fdt32(of_pci_b_ss(OF_PCI_SS_IO)),
>  				.mid	= 0,
> -				.lo	= 0,
> +				.lo	= cpu_to_fdt32(ARM_PCI_IO_START),
>  			},
> -			.cpu_addr	= cpu_to_fdt64(KVM_IOPORT_AREA),
> -			.length		= cpu_to_fdt64(ARM_IOPORT_SIZE),
> +			.cpu_addr	= cpu_to_fdt64(ARM_PCI_IO_START),
> +			.length		= cpu_to_fdt64(ARM_PCI_IO_SIZE),
>  		},
>  		{
>  			.pci_addr = {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent
  2020-01-23 13:47 ` [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
@ 2020-01-29 18:16   ` Andre Przywara
  2020-03-05 15:41     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-29 18:16 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:47:45 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Julien Thierry <julien.thierry@arm.com>
> 
> Currently, callbacks for memory BAR 1 call the IO port emulation.  This
> means that the memory BAR needs I/O Space to be enabled whenever Memory
> Space is enabled.
> 
> Refactor the code so the two types of BARs are independent. Also, unify
> ioport/mmio callback arguments so that they all receive a virtio_device.

That's a nice cleanup, I like that it avoids shoehorning everything as legacy I/O into the emulation.

Just a nit below, but nevertheless:
 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

> ---
>  virtio/pci.c | 71 +++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 48 insertions(+), 23 deletions(-)
> 
> diff --git a/virtio/pci.c b/virtio/pci.c
> index eeb5b5efa6e1..6723a1f3a84d 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -87,8 +87,8 @@ static inline bool virtio_pci__msix_enabled(struct virtio_pci *vpci)
>  	return vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE);
>  }
>  
> -static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vdev, u16 port,
> -					void *data, int size, int offset)
> +static bool virtio_pci__specific_data_in(struct kvm *kvm, struct virtio_device *vdev,
> +					 void *data, int size, unsigned long offset)
>  {
>  	u32 config_offset;
>  	struct virtio_pci *vpci = vdev->virtio;
> @@ -117,20 +117,17 @@ static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vd
>  	return false;
>  }
>  
> -static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +static bool virtio_pci__data_in(struct kvm_cpu *vcpu, struct virtio_device *vdev,
> +				unsigned long offset, void *data, int size)
>  {
> -	unsigned long offset;
>  	bool ret = true;
> -	struct virtio_device *vdev;
>  	struct virtio_pci *vpci;
>  	struct virt_queue *vq;
>  	struct kvm *kvm;
>  	u32 val;
>  
>  	kvm = vcpu->kvm;
> -	vdev = ioport->priv;
>  	vpci = vdev->virtio;
> -	offset = port - vpci->port_addr;
>  
>  	switch (offset) {
>  	case VIRTIO_PCI_HOST_FEATURES:
> @@ -154,13 +151,26 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
>  		vpci->isr = VIRTIO_IRQ_LOW;
>  		break;
>  	default:
> -		ret = virtio_pci__specific_io_in(kvm, vdev, port, data, size, offset);
> +		ret = virtio_pci__specific_data_in(kvm, vdev, data, size, offset);
>  		break;
>  	};
>  
>  	return ret;
>  }
>  
> +static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +{
> +	unsigned long offset;
> +	struct virtio_device *vdev;
> +	struct virtio_pci *vpci;
> +
> +	vdev = ioport->priv;
> +	vpci = vdev->virtio;
> +	offset = port - vpci->port_addr;

You could initialise the variables directly at their declaration, which looks nicer and underlines that they are just helper variables.
Same below.

Cheers,
Andre.

> +
> +	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
> +}
> +
>  static void update_msix_map(struct virtio_pci *vpci,
>  			    struct msix_table *msix_entry, u32 vecnum)
>  {
> @@ -185,8 +195,8 @@ static void update_msix_map(struct virtio_pci *vpci,
>  	irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
>  }
>  
> -static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
> -					void *data, int size, int offset)
> +static bool virtio_pci__specific_data_out(struct kvm *kvm, struct virtio_device *vdev,
> +					  void *data, int size, unsigned long offset)
>  {
>  	struct virtio_pci *vpci = vdev->virtio;
>  	u32 config_offset, vec;
> @@ -259,19 +269,16 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
>  	return false;
>  }
>  
> -static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vdev,
> +				 unsigned long offset, void *data, int size)
>  {
> -	unsigned long offset;
>  	bool ret = true;
> -	struct virtio_device *vdev;
>  	struct virtio_pci *vpci;
>  	struct kvm *kvm;
>  	u32 val;
>  
>  	kvm = vcpu->kvm;
> -	vdev = ioport->priv;
>  	vpci = vdev->virtio;
> -	offset = port - vpci->port_addr;
>  
>  	switch (offset) {
>  	case VIRTIO_PCI_GUEST_FEATURES:
> @@ -304,13 +311,26 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>  		virtio_notify_status(kvm, vdev, vpci->dev, vpci->status);
>  		break;
>  	default:
> -		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
> +		ret = virtio_pci__specific_data_out(kvm, vdev, data, size, offset);
>  		break;
>  	};
>  
>  	return ret;
>  }
>  
> +static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> +{
> +	unsigned long offset;
> +	struct virtio_device *vdev;
> +	struct virtio_pci *vpci;
> +
> +	vdev = ioport->priv;
> +	vpci = vdev->virtio;
> +	offset = port - vpci->port_addr;
> +
> +	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
> +}
> +
>  static struct ioport_operations virtio_pci__io_ops = {
>  	.io_in	= virtio_pci__io_in,
>  	.io_out	= virtio_pci__io_out,
> @@ -320,7 +340,8 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
>  					   u64 addr, u8 *data, u32 len,
>  					   u8 is_write, void *ptr)
>  {
> -	struct virtio_pci *vpci = ptr;
> +	struct virtio_device *vdev = ptr;
> +	struct virtio_pci *vpci = vdev->virtio;
>  	struct msix_table *table;
>  	int vecnum;
>  	size_t offset;
> @@ -419,11 +440,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  					 u64 addr, u8 *data, u32 len,
>  					 u8 is_write, void *ptr)
>  {
> -	struct virtio_pci *vpci = ptr;
> -	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
> -	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
> +	struct virtio_device *vdev = ptr;
> +	struct virtio_pci *vpci = vdev->virtio;
>  
> -	kvm__emulate_io(vcpu, port, data, direction, len, 1);
> +	if (!is_write)
> +		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
> +				    data, len);
> +	else
> +		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
> +				     data, len);
>  }
>  
>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
> @@ -445,13 +470,13 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  
>  	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>  	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
> -			       virtio_pci__io_mmio_callback, vpci);
> +			       virtio_pci__io_mmio_callback, vdev);
>  	if (r < 0)
>  		goto free_ioport;
>  
>  	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>  	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
> -			       virtio_pci__msix_mmio_callback, vpci);
> +			       virtio_pci__msix_mmio_callback, vdev);
>  	if (r < 0)
>  		goto free_mmio;
>  


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs
  2020-01-23 13:47 ` [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
@ 2020-01-29 18:16   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-29 18:16 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:46 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> kvmtool assumes that the BAR that holds the address for the MSIX table
> and PBA structure has a size which is equal to their total size and it
> allocates memory from MMIO space accordingly.  However, when
> initializing the BARs, the BAR size is set to the region size reported
> by VFIO. When the physical BAR size is greater than the mmio space that
> kvmtool allocates, we can have a situation where the BAR overlaps with
> another BAR, in which case kvmtool will fail to map the memory. This was
> found when trying to do PCI passthrough with a PCIe Realtek r8168 NIC,
> when the guest was also using virtio-block and virtio-net devices:

Good catch!

> 
> [..]
> [    0.197926] PCI: OF: PROBE_ONLY enabled
> [    0.198454] pci-host-generic 40000000.pci: host bridge /pci ranges:
> [    0.199291] pci-host-generic 40000000.pci:    IO 0x00007000..0x0000ffff -> 0x00007000
> [    0.200331] pci-host-generic 40000000.pci:   MEM 0x41000000..0x7fffffff -> 0x41000000
> [    0.201480] pci-host-generic 40000000.pci: ECAM at [mem 0x40000000-0x40ffffff] for [bus 00]
> [    0.202635] pci-host-generic 40000000.pci: PCI host bridge to bus 0000:00
> [    0.203535] pci_bus 0000:00: root bus resource [bus 00]
> [    0.204227] pci_bus 0000:00: root bus resource [io  0x0000-0x8fff] (bus address [0x7000-0xffff])
> [    0.205483] pci_bus 0000:00: root bus resource [mem 0x41000000-0x7fffffff]
> [    0.206456] pci 0000:00:00.0: [10ec:8168] type 00 class 0x020000
> [    0.207399] pci 0000:00:00.0: reg 0x10: [io  0x0000-0x00ff]
> [    0.208252] pci 0000:00:00.0: reg 0x18: [mem 0x41002000-0x41002fff]
> [    0.209233] pci 0000:00:00.0: reg 0x20: [mem 0x41000000-0x41003fff]
> [    0.210481] pci 0000:00:01.0: [1af4:1000] type 00 class 0x020000
> [    0.211349] pci 0000:00:01.0: reg 0x10: [io  0x0100-0x01ff]
> [    0.212118] pci 0000:00:01.0: reg 0x14: [mem 0x41003000-0x410030ff]
> [    0.212982] pci 0000:00:01.0: reg 0x18: [mem 0x41003200-0x410033ff]
> [    0.214247] pci 0000:00:02.0: [1af4:1001] type 00 class 0x018000
> [    0.215096] pci 0000:00:02.0: reg 0x10: [io  0x0200-0x02ff]
> [    0.215863] pci 0000:00:02.0: reg 0x14: [mem 0x41003400-0x410034ff]
> [    0.216723] pci 0000:00:02.0: reg 0x18: [mem 0x41003600-0x410037ff]
> [    0.218105] pci 0000:00:00.0: can't claim BAR 4 [mem 0x41000000-0x41003fff]: address conflict with 0000:00:00.0 [mem 0x41002000-0x41002fff]
> [..]
> 
> Guest output of lspci -vv:
> 
> 00:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
> 	Subsystem: TP-LINK Technologies Co., Ltd. TG-3468 Gigabit PCI Express Network Adapter
> 	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Interrupt: pin A routed to IRQ 16
> 	Region 0: I/O ports at 0000 [size=256]
> 	Region 2: Memory at 41002000 (64-bit, non-prefetchable) [size=4K]
> 	Region 4: Memory at 41000000 (64-bit, prefetchable) [size=16K]
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
> 		Vector table: BAR=4 offset=00000000
> 		PBA: BAR=4 offset=00001000
> 
> Let's fix this by allocating an amount of MMIO memory equal to the size
> of the BAR that contains the MSIX table and/or PBA.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Looks alright to me:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  vfio/pci.c | 68 +++++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 52 insertions(+), 16 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 8e5d8572bc0c..bbb8469c8d93 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -715,17 +715,44 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  	return 0;
>  }
>  
> -static int vfio_pci_create_msix_table(struct kvm *kvm,
> -				      struct vfio_pci_device *pdev)
> +static int vfio_pci_get_region_info(struct vfio_device *vdev, u32 index,
> +				    struct vfio_region_info *info)
> +{
> +	int ret;
> +
> +	*info = (struct vfio_region_info) {
> +		.argsz = sizeof(*info),
> +		.index = index,
> +	};
> +
> +	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +	if (ret) {
> +		ret = -errno;
> +		vfio_dev_err(vdev, "cannot get info for BAR %u", index);
> +		return ret;
> +	}
> +
> +	if (info->size && !is_power_of_two(info->size)) {
> +		vfio_dev_err(vdev, "region is not power of two: 0x%llx",
> +				info->size);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  {
>  	int ret;
>  	size_t i;
> -	size_t mmio_size;
> +	size_t map_size;
>  	size_t nr_entries;
>  	struct vfio_pci_msi_entry *entries;
> +	struct vfio_pci_device *pdev = &vdev->pci;
>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>  	struct msix_cap *msix = PCI_CAP(&pdev->hdr, pdev->msix.pos);
> +	struct vfio_region_info info;
>  
>  	table->bar = msix->table_offset & PCI_MSIX_TABLE_BIR;
>  	pba->bar = msix->pba_offset & PCI_MSIX_TABLE_BIR;
> @@ -744,15 +771,31 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
>  	for (i = 0; i < nr_entries; i++)
>  		entries[i].config.ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT;
>  
> +	ret = vfio_pci_get_region_info(vdev, table->bar, &info);
> +	if (ret)
> +		return ret;
> +	if (!info.size)
> +		return -EINVAL;
> +	map_size = info.size;
> +
> +	if (table->bar != pba->bar) {
> +		ret = vfio_pci_get_region_info(vdev, pba->bar, &info);
> +		if (ret)
> +			return ret;
> +		if (!info.size)
> +			return -EINVAL;
> +		map_size += info.size;
> +	}
> +
>  	/*
>  	 * To ease MSI-X cap configuration in case they share the same BAR,
>  	 * collapse table and pending array. The size of the BAR regions must be
>  	 * powers of two.
>  	 */
> -	mmio_size = roundup_pow_of_two(table->size + pba->size);
> -	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
> +	map_size = ALIGN(map_size, PAGE_SIZE);
> +	table->guest_phys_addr = pci_get_mmio_block(map_size);
>  	if (!table->guest_phys_addr) {
> -		pr_err("cannot allocate IO space");
> +		pr_err("cannot allocate MMIO space");
>  		ret = -ENOMEM;
>  		goto out_free;
>  	}
> @@ -816,17 +859,10 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  
>  	region->vdev = vdev;
>  	region->is_ioport = !!(bar & PCI_BASE_ADDRESS_SPACE_IO);
> -	region->info = (struct vfio_region_info) {
> -		.argsz = sizeof(region->info),
> -		.index = nr,
> -	};
>  
> -	ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, &region->info);
> -	if (ret) {
> -		ret = -errno;
> -		vfio_dev_err(vdev, "cannot get info for BAR %zu", nr);
> +	ret = vfio_pci_get_region_info(vdev, nr, &region->info);
> +	if (ret)
>  		return ret;
> -	}
>  
>  	/* Ignore invalid or unimplemented regions */
>  	if (!region->info.size)
> @@ -871,7 +907,7 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>  		return ret;
>  
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) {
> -		ret = vfio_pci_create_msix_table(kvm, pdev);
> +		ret = vfio_pci_create_msix_table(kvm, vdev);
>  		if (ret)
>  			return ret;
>  	}


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions
  2020-01-23 13:47 ` [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions Alexandru Elisei
@ 2020-01-29 18:17   ` Andre Przywara
  2020-03-06 10:54     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-29 18:17 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:49 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> Don't try to configure a BAR if there is no region associated with it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  vfio/pci.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 1f38f90c3ae9..f86a7d9b7032 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -652,6 +652,8 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  
>  	/* Initialise the BARs */
>  	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
> +		if ((u32)i == vdev->info.num_regions)
> +			break;

My inner check-patch complains that we should not have code before declarations.
Can we solve this the same way as below?

Cheers,
Andre


>  		u64 base;
>  		struct vfio_region *region = &vdev->regions[i];
>  
> @@ -853,11 +855,12 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  	u32 bar;
>  	size_t map_size;
>  	struct vfio_pci_device *pdev = &vdev->pci;
> -	struct vfio_region *region = &vdev->regions[nr];
> +	struct vfio_region *region;
>  
>  	if (nr >= vdev->info.num_regions)
>  		return 0;
>  
> +	region = &vdev->regions[nr];
>  	bar = pdev->hdr.bar[nr];
>  
>  	region->vdev = vdev;


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit
  2020-01-23 13:47 ` [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
@ 2020-01-30 14:50   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-01-30 14:50 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, Sami Mujawar, Lorenzo Pieralisi, maz

On Thu, 23 Jan 2020 13:47:47 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Not all devices have the bottom 32 bits of a 64 bit BAR in an even
> numbered BAR. For example, on an NVIDIA Quadro P400, BARs 1 and 3 are
> 64bit. Remove this assumption.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  vfio/pci.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/vfio/pci.c b/vfio/pci.c
> index bbb8469c8d93..1bdc20038411 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -920,8 +920,10 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>
>  	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
>  		/* Ignore top half of 64-bit BAR */
> -		if (i % 2 && is_64bit)
> +		if (is_64bit) {
> +			is_64bit = false;
>  			continue;
> +		}
>
>  		ret = vfio_pci_configure_bar(kvm, vdev, i);
>  		if (ret)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes
  2020-01-23 13:47 ` [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
@ 2020-01-30 14:50   ` Andre Przywara
  2020-01-30 15:52     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-30 14:50 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:48 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> To get the size of the expansion ROM, software writes 0xfffff800 to the
> expansion ROM BAR in the PCI configuration space. PCI emulation executes
> the optional configuration space write callback that a device can
> implement before emulating this write.
> 
> VFIO doesn't have support for emulating expansion ROMs.

With "VFIO doesn't have support" you mean kvmtool's VFIO implementation or the kernel's VFIO driver?
Because to me it looks like it should work in the kernel, at least for the BAR sizing on the expansion ROM BAR:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/vfio/pci/vfio_pci_config.c#n477

Am I missing something here?

QEMU seems to have code to load the ROM from the device and present that to the guest, but I am not sure exactly why.

Cheers,
Andre

> However, the
> callback writes the guest value to the hardware BAR, and then it reads
> it back from the hardware BAR to make sure the write has completed successfully.
> 
> After this, we return to regular PCI emulation and because the BAR is
> no longer 0, we write back to the BAR the value that the guest used to
> get the size. As a result, the guest will think that the ROM size is
> 0x800 after the subsequent read and we end up unintentionally exposing
> to the guest a BAR which we don't emulate.
> 
> Let's fix this by ignoring writes to the expansion ROM BAR.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  vfio/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 1bdc20038411..1f38f90c3ae9 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -472,6 +472,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_device *vdev;
>  	void *base = pci_hdr;
>  
> +	if (offset == PCI_ROM_ADDRESS)
> +		return;
> +
>  	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
>  	vdev = container_of(pdev, struct vfio_device, pci);
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
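The sizing handshake the commit message describes (write 0xfffff800, read back, derive the size from the read-only zero bits) can be modelled without any hardware. A sketch with a hypothetical 32 KiB ROM register, not a real device model:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical expansion ROM BAR on a device with a 32 KiB ROM: the
 * low address bits are hardwired to zero and bit 0 is the enable bit,
 * so writes only stick in address bits [31:15] and the enable bit.
 */
#define ROM_SIZE	0x8000u			/* 32 KiB */
#define ROM_ADDR_MASK	(~(ROM_SIZE - 1))

static uint32_t rom_bar_write(uint32_t value)
{
	return value & (ROM_ADDR_MASK | 0x1u);	/* address bits + enable */
}

static uint32_t rom_bar_size(void)
{
	/* Software writes 0xfffff800: all address bits from bit 11 up */
	uint32_t readback = rom_bar_write(0xfffff800);

	/*
	 * The read-only zero bits reveal the size: 0xffff8000 -> 0x8000.
	 * If the emulation instead echoed the raw guest value back, the
	 * guest would compute ~0xfffff800 + 1 = 0x800, the bogus size
	 * the commit message describes.
	 */
	return ~(readback & ROM_ADDR_MASK) + 1;
}
```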


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures
  2020-01-23 13:47 ` [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures Alexandru Elisei
@ 2020-01-30 14:51   ` Andre Przywara
  2020-03-06 11:20     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-30 14:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:50 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> Don't ignore an error in the bus specific initialization function in
> virtio_init; don't ignore the result of virtio_init; and don't return 0
> in virtio_blk__init and virtio_scsi__init when we encounter an error.
> Hopefully this will save some developer's time debugging faulty virtio
> devices in a guest.

Seems like the right thing to do, but I was wondering how you triggered this? AFAICS virtio_init only fails when calloc() fails or you pass an illegal transport, with the latter looking like being hard coded to one of the two supported.

One minor thing below ...

> 
> To take advantage of the cleanup function virtio_blk__exit, we have
> moved appending the new device to the list before the call to
> virtio_init.
> 
> To safeguard against this in the future, virtio_init has been annotated
> with the compiler attribute warn_unused_result.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/kvm.h        |  1 +
>  include/kvm/virtio.h     |  7 ++++---
>  include/linux/compiler.h |  2 +-
>  virtio/9p.c              |  9 ++++++---
>  virtio/balloon.c         | 10 +++++++---
>  virtio/blk.c             | 14 +++++++++-----
>  virtio/console.c         | 11 ++++++++---
>  virtio/core.c            |  9 +++++----
>  virtio/net.c             | 32 ++++++++++++++++++--------------
>  virtio/scsi.c            | 14 +++++++++-----
>  10 files changed, 68 insertions(+), 41 deletions(-)
> 
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 7a738183d67a..c6dc6ef72d11 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -8,6 +8,7 @@
>  
>  #include <stdbool.h>
>  #include <linux/types.h>
> +#include <linux/compiler.h>
>  #include <time.h>
>  #include <signal.h>
>  #include <sys/prctl.h>
> diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
> index 19b913732cd5..3a311f54f2a5 100644
> --- a/include/kvm/virtio.h
> +++ b/include/kvm/virtio.h
> @@ -7,6 +7,7 @@
>  #include <linux/virtio_pci.h>
>  
>  #include <linux/types.h>
> +#include <linux/compiler.h>
>  #include <linux/virtio_config.h>
>  #include <sys/uio.h>
>  
> @@ -204,9 +205,9 @@ struct virtio_ops {
>  	int (*reset)(struct kvm *kvm, struct virtio_device *vdev);
>  };
>  
> -int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
> -		struct virtio_ops *ops, enum virtio_trans trans,
> -		int device_id, int subsys_id, int class);
> +int __must_check virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
> +			     struct virtio_ops *ops, enum virtio_trans trans,
> +			     int device_id, int subsys_id, int class);
>  int virtio_compat_add_message(const char *device, const char *config);
>  const char* virtio_trans_name(enum virtio_trans trans);
>  
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 898420b81aec..a662ba0a5c68 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -14,7 +14,7 @@
>  #define __packed	__attribute__((packed))
>  #define __iomem
>  #define __force
> -#define __must_check
> +#define __must_check	__attribute__((warn_unused_result))
>  #define unlikely
>  
>  #endif
> diff --git a/virtio/9p.c b/virtio/9p.c
> index ac70dbc31207..b78f2b3f0e09 100644
> --- a/virtio/9p.c
> +++ b/virtio/9p.c
> @@ -1551,11 +1551,14 @@ int virtio_9p_img_name_parser(const struct option *opt, const char *arg, int uns
>  int virtio_9p__init(struct kvm *kvm)
>  {
>  	struct p9_dev *p9dev;
> +	int r;
>  
>  	list_for_each_entry(p9dev, &devs, list) {
> -		virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
> -			    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
> -			    VIRTIO_ID_9P, PCI_CLASS_9P);
> +		r = virtio_init(kvm, p9dev, &p9dev->vdev, &p9_dev_virtio_ops,
> +				VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_9P,
> +				VIRTIO_ID_9P, PCI_CLASS_9P);
> +		if (r < 0)
> +			return r;
>  	}
>  
>  	return 0;
> diff --git a/virtio/balloon.c b/virtio/balloon.c
> index 0bd16703dfee..8e8803fed607 100644
> --- a/virtio/balloon.c
> +++ b/virtio/balloon.c
> @@ -264,6 +264,8 @@ struct virtio_ops bln_dev_virtio_ops = {
>  
>  int virtio_bln__init(struct kvm *kvm)
>  {
> +	int r;
> +
>  	if (!kvm->cfg.balloon)
>  		return 0;
>  
> @@ -273,9 +275,11 @@ int virtio_bln__init(struct kvm *kvm)
>  	bdev.stat_waitfd	= eventfd(0, 0);
>  	memset(&bdev.config, 0, sizeof(struct virtio_balloon_config));
>  
> -	virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
> -		    VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
> +	r = virtio_init(kvm, &bdev, &bdev.vdev, &bln_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLN,
> +			VIRTIO_ID_BALLOON, PCI_CLASS_BLN);
> +	if (r < 0)
> +		return r;
>  
>  	if (compat_id == -1)
>  		compat_id = virtio_compat_add_message("virtio-balloon", "CONFIG_VIRTIO_BALLOON");
> diff --git a/virtio/blk.c b/virtio/blk.c
> index f267be1563dc..4d02d101af81 100644
> --- a/virtio/blk.c
> +++ b/virtio/blk.c
> @@ -306,6 +306,7 @@ static struct virtio_ops blk_dev_virtio_ops = {
>  static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
>  {
>  	struct blk_dev *bdev;
> +	int r;
>  
>  	if (!disk)
>  		return -EINVAL;
> @@ -323,12 +324,14 @@ static int virtio_blk__init_one(struct kvm *kvm, struct disk_image *disk)
>  		.kvm			= kvm,
>  	};
>  
> -	virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
> -		    VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
> -
>  	list_add_tail(&bdev->list, &bdevs);
>  
> +	r = virtio_init(kvm, bdev, &bdev->vdev, &blk_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_BLK,
> +			VIRTIO_ID_BLOCK, PCI_CLASS_BLK);
> +	if (r < 0)
> +		return r;
> +
>  	disk_image__set_callback(bdev->disk, virtio_blk_complete);
>  
>  	if (compat_id == -1)
> @@ -359,7 +362,8 @@ int virtio_blk__init(struct kvm *kvm)
>  
>  	return 0;
>  cleanup:
> -	return virtio_blk__exit(kvm);
> +	virtio_blk__exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_blk__init);
>  
> diff --git a/virtio/console.c b/virtio/console.c
> index f1be02549222..e0b98df37965 100644
> --- a/virtio/console.c
> +++ b/virtio/console.c
> @@ -230,12 +230,17 @@ static struct virtio_ops con_dev_virtio_ops = {
>  
>  int virtio_console__init(struct kvm *kvm)
>  {
> +	int r;
> +
>  	if (kvm->cfg.active_console != CONSOLE_VIRTIO)
>  		return 0;
>  
> -	virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
> -		    VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
> +	r = virtio_init(kvm, &cdev, &cdev.vdev, &con_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_CONSOLE,
> +			VIRTIO_ID_CONSOLE, PCI_CLASS_CONSOLE);
> +	if (r < 0)
> +		return r;
> +
>  	if (compat_id == -1)
>  		compat_id = virtio_compat_add_message("virtio-console", "CONFIG_VIRTIO_CONSOLE");
>  
> diff --git a/virtio/core.c b/virtio/core.c
> index e10ec362e1ea..f5b3c07fc100 100644
> --- a/virtio/core.c
> +++ b/virtio/core.c
> @@ -259,6 +259,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		int device_id, int subsys_id, int class)
>  {
>  	void *virtio;
> +	int r;
>  
>  	switch (trans) {
>  	case VIRTIO_PCI:
> @@ -272,7 +273,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		vdev->ops->init			= virtio_pci__init;
>  		vdev->ops->exit			= virtio_pci__exit;
>  		vdev->ops->reset		= virtio_pci__reset;
> -		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
> +		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
>  		break;
>  	case VIRTIO_MMIO:
>  		virtio = calloc(sizeof(struct virtio_mmio), 1);
> @@ -285,13 +286,13 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		vdev->ops->init			= virtio_mmio_init;
>  		vdev->ops->exit			= virtio_mmio_exit;
>  		vdev->ops->reset		= virtio_mmio_reset;
> -		vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
> +		r = vdev->ops->init(kvm, dev, vdev, device_id, subsys_id, class);
>  		break;
>  	default:
> -		return -1;
> +		r = -1;
>  	};
>  
> -	return 0;
> +	return r;
>  }
>  
>  int virtio_compat_add_message(const char *device, const char *config)
> diff --git a/virtio/net.c b/virtio/net.c
> index 091406912a24..425c13ba1136 100644
> --- a/virtio/net.c
> +++ b/virtio/net.c
> @@ -910,7 +910,7 @@ done:
>  
>  static int virtio_net__init_one(struct virtio_net_params *params)
>  {
> -	int i, err;
> +	int i, r;
>  	struct net_dev *ndev;
>  	struct virtio_ops *ops;
>  	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
> @@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  		return -ENOMEM;
>  
>  	ops = malloc(sizeof(*ops));
> -	if (ops == NULL) {
> -		err = -ENOMEM;
> -		goto err_free_ndev;
> -	}
> +	if (ops == NULL)
> +		return -ENOMEM;

Doesn't that leave struct net_dev allocated? I am happy with removing the goto, but we should free(ndev) before we return, I think.

Cheers,
Andre.

>  
>  	list_add_tail(&ndev->list, &ndevs);
>  
> @@ -969,8 +967,10 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  				   virtio_trans_name(trans));
>  	}
>  
> -	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
> -		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
> +	r = virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
> +			PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
> +	if (r < 0)
> +		return r;
>  
>  	if (params->vhost)
>  		virtio_net__vhost_init(params->kvm, ndev);
> @@ -979,19 +979,17 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>  		compat_id = virtio_compat_add_message("virtio-net", "CONFIG_VIRTIO_NET");
>  
>  	return 0;
> -
> -err_free_ndev:
> -	free(ndev);
> -	return err;
>  }
>  
>  int virtio_net__init(struct kvm *kvm)
>  {
> -	int i;
> +	int i, r;
>  
>  	for (i = 0; i < kvm->cfg.num_net_devices; i++) {
>  		kvm->cfg.net_params[i].kvm = kvm;
> -		virtio_net__init_one(&kvm->cfg.net_params[i]);
> +		r = virtio_net__init_one(&kvm->cfg.net_params[i]);
> +		if (r < 0)
> +			goto cleanup;
>  	}
>  
>  	if (kvm->cfg.num_net_devices == 0 && kvm->cfg.no_net == 0) {
> @@ -1007,10 +1005,16 @@ int virtio_net__init(struct kvm *kvm)
>  		str_to_mac(kvm->cfg.guest_mac, net_params.guest_mac);
>  		str_to_mac(kvm->cfg.host_mac, net_params.host_mac);
>  
> -		virtio_net__init_one(&net_params);
> +		r = virtio_net__init_one(&net_params);
> +		if (r < 0)
> +			goto cleanup;
>  	}
>  
>  	return 0;
> +
> +cleanup:
> +	virtio_net__exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_net__init);
>  
> diff --git a/virtio/scsi.c b/virtio/scsi.c
> index 1ec78fe0945a..16a86cb7e0e6 100644
> --- a/virtio/scsi.c
> +++ b/virtio/scsi.c
> @@ -234,6 +234,7 @@ static void virtio_scsi_vhost_init(struct kvm *kvm, struct scsi_dev *sdev)
>  static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
>  {
>  	struct scsi_dev *sdev;
> +	int r;
>  
>  	if (!disk)
>  		return -EINVAL;
> @@ -260,12 +261,14 @@ static int virtio_scsi_init_one(struct kvm *kvm, struct disk_image *disk)
>  	strlcpy((char *)&sdev->target.vhost_wwpn, disk->wwpn, sizeof(sdev->target.vhost_wwpn));
>  	sdev->target.vhost_tpgt = strtol(disk->tpgt, NULL, 0);
>  
> -	virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
> -		    VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
> -		    VIRTIO_ID_SCSI, PCI_CLASS_BLK);
> -
>  	list_add_tail(&sdev->list, &sdevs);
>  
> +	r = virtio_init(kvm, sdev, &sdev->vdev, &scsi_dev_virtio_ops,
> +			VIRTIO_DEFAULT_TRANS(kvm), PCI_DEVICE_ID_VIRTIO_SCSI,
> +			VIRTIO_ID_SCSI, PCI_CLASS_BLK);
> +	if (r < 0)
> +		return r;
> +
>  	virtio_scsi_vhost_init(kvm, sdev);
>  
>  	if (compat_id == -1)
> @@ -302,7 +305,8 @@ int virtio_scsi_init(struct kvm *kvm)
>  
>  	return 0;
>  cleanup:
> -	return virtio_scsi_exit(kvm);
> +	virtio_scsi_exit(kvm);
> +	return r;
>  }
>  virtio_dev_init(virtio_scsi_init);
>  
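The `__must_check` change above is easy to demonstrate in isolation: once the macro expands to `warn_unused_result`, GCC/Clang warn on any caller that drops the return value, which forces the error-propagation style the patch adds. A sketch with a stub in place of `virtio_init()` (the stub and its failure mode are made up):

```c
#include <assert.h>
#include <errno.h>

/* Same attribute the patch enables for __must_check: the compiler
 * warns whenever a caller discards the return value. */
#define __must_check	__attribute__((warn_unused_result))

/* Stub standing in for virtio_init(); the failure is simulated. */
static int __must_check virtio_init_stub(int fail)
{
	return fail ? -ENOMEM : 0;
}

static int device_init(int fail)
{
	/* Writing just "virtio_init_stub(fail);" here would trigger
	 * -Wunused-result, so the error has to be propagated, exactly
	 * as the patch now does in virtio_9p__init() and friends. */
	int r = virtio_init_stub(fail);

	if (r < 0)
		return r;
	return 0;
}
```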


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation
  2020-01-23 13:47 ` [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
@ 2020-01-30 14:51   ` Andre Przywara
  2020-03-06 11:28     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-30 14:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:51 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> An error returned by device__register, kvm__register_mmio and
> ioport__register means that the device will
> not be emulated properly. Annotate the functions with __must_check, so we
> get a compiler warning when this error is ignored.
> 
> And fix several instances where the caller returns 0 even if the
> function failed.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Looks alright, one minor nit below, with that fixed:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

> ---
>  arm/ioport.c          |  3 +-
>  hw/i8042.c            | 12 ++++++--
>  hw/vesa.c             |  4 ++-
>  include/kvm/devices.h |  3 +-
>  include/kvm/ioport.h  |  6 ++--
>  include/kvm/kvm.h     |  6 ++--
>  ioport.c              | 23 ++++++++-------
>  mips/kvm.c            |  3 +-
>  powerpc/ioport.c      |  3 +-
>  virtio/mmio.c         | 13 +++++++--
>  x86/ioport.c          | 66 ++++++++++++++++++++++++++++++++-----------
>  11 files changed, 100 insertions(+), 42 deletions(-)
> 
> diff --git a/arm/ioport.c b/arm/ioport.c
> index bdd30b6fe812..2f0feb9ab69f 100644
> --- a/arm/ioport.c
> +++ b/arm/ioport.c
> @@ -1,8 +1,9 @@
>  #include "kvm/ioport.h"
>  #include "kvm/irq.h"
>  
> -void ioport__setup_arch(struct kvm *kvm)
> +int ioport__setup_arch(struct kvm *kvm)
>  {
> +	return 0;
>  }
>  
>  void ioport__map_irq(u8 *irq)
> diff --git a/hw/i8042.c b/hw/i8042.c
> index 2d8c96e9c7e6..37a99a2dc6b8 100644
> --- a/hw/i8042.c
> +++ b/hw/i8042.c
> @@ -349,10 +349,18 @@ static struct ioport_operations kbd_ops = {
>  
>  int kbd__init(struct kvm *kvm)
>  {
> +	int r;
> +
>  	kbd_reset();
>  	state.kvm = kvm;
> -	ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
> -	ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
> +	r = ioport__register(kvm, I8042_DATA_REG, &kbd_ops, 2, NULL);
> +	if (r < 0)
> +		return r;
> +	r = ioport__register(kvm, I8042_COMMAND_REG, &kbd_ops, 2, NULL);
> +	if (r < 0) {
> +		ioport__unregister(kvm, I8042_DATA_REG);
> +		return r;
> +	}
>  
>  	return 0;
>  }
> diff --git a/hw/vesa.c b/hw/vesa.c
> index d8d91aa9c873..b92cc990b730 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -70,7 +70,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	vesa_base_addr			= (u16)r;
>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
> -	device__register(&vesa_device);
> +	r = device__register(&vesa_device);
> +	if (r < 0)
> +		return ERR_PTR(r);
>  
>  	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>  	if (mem == MAP_FAILED)
> diff --git a/include/kvm/devices.h b/include/kvm/devices.h
> index 405f19521977..e445db6f56b1 100644
> --- a/include/kvm/devices.h
> +++ b/include/kvm/devices.h
> @@ -3,6 +3,7 @@
>  
>  #include <linux/rbtree.h>
>  #include <linux/types.h>
> +#include <linux/compiler.h>
>  
>  enum device_bus_type {
>  	DEVICE_BUS_PCI,
> @@ -18,7 +19,7 @@ struct device_header {
>  	struct rb_node		node;
>  };
>  
> -int device__register(struct device_header *dev);
> +int __must_check device__register(struct device_header *dev);
>  void device__unregister(struct device_header *dev);
>  struct device_header *device__find_dev(enum device_bus_type bus_type,
>  				       u8 dev_num);
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index 8c86b7151f25..62a719327e3f 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -33,11 +33,11 @@ struct ioport_operations {
>  							    enum irq_type));
>  };
>  
> -void ioport__setup_arch(struct kvm *kvm);
> +int ioport__setup_arch(struct kvm *kvm);
>  void ioport__map_irq(u8 *irq);
>  
> -int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
> -			int count, void *param);
> +int __must_check ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops,
> +				  int count, void *param);
>  int ioport__unregister(struct kvm *kvm, u16 port);
>  int ioport__init(struct kvm *kvm);
>  int ioport__exit(struct kvm *kvm);
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index c6dc6ef72d11..50119a8672eb 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -128,9 +128,9 @@ static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
>  				 KVM_MEM_TYPE_RESERVED);
>  }
>  
> -int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
> -		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
> -			void *ptr);
> +int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
> +				    void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
> +				    void *ptr);
>  bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
>  void kvm__reboot(struct kvm *kvm);
>  void kvm__pause(struct kvm *kvm);
> diff --git a/ioport.c b/ioport.c
> index a72e4035881a..d224819c6e43 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -91,16 +91,21 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	};
>  
>  	r = ioport_insert(&ioport_tree, entry);
> -	if (r < 0) {
> -		free(entry);
> -		br_write_unlock(kvm);
> -		return r;
> -	}
> -
> -	device__register(&entry->dev_hdr);
> +	if (r < 0)
> +		goto out_free;
> +	r = device__register(&entry->dev_hdr);
> +	if (r < 0)
> +		goto out_erase;
>  	br_write_unlock(kvm);
>  
>  	return port;
> +
> +out_erase:
> +	rb_int_erase(&ioport_tree, &entry->node);

To keep the abstraction, shouldn't that rather be ioport_remove() instead?

Cheers,
Andre.

> +out_free:
> +	free(entry);
> +	br_write_unlock(kvm);
> +	return r;
>  }
>  
>  int ioport__unregister(struct kvm *kvm, u16 port)
> @@ -196,9 +201,7 @@ out:
>  
>  int ioport__init(struct kvm *kvm)
>  {
> -	ioport__setup_arch(kvm);
> -
> -	return 0;
> +	return ioport__setup_arch(kvm);
>  }
>  dev_base_init(ioport__init);
>  
> diff --git a/mips/kvm.c b/mips/kvm.c
> index 211770da0d85..26355930d3b6 100644
> --- a/mips/kvm.c
> +++ b/mips/kvm.c
> @@ -100,8 +100,9 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
>  		die_perror("KVM_IRQ_LINE ioctl");
>  }
>  
> -void ioport__setup_arch(struct kvm *kvm)
> +int ioport__setup_arch(struct kvm *kvm)
>  {
> +	return 0;
>  }
>  
>  bool kvm__arch_cpu_supports_vm(void)
> diff --git a/powerpc/ioport.c b/powerpc/ioport.c
> index 58dc625c54fe..0c188b61a51a 100644
> --- a/powerpc/ioport.c
> +++ b/powerpc/ioport.c
> @@ -12,9 +12,10 @@
>  
>  #include <stdlib.h>
>  
> -void ioport__setup_arch(struct kvm *kvm)
> +int ioport__setup_arch(struct kvm *kvm)
>  {
>  	/* PPC has no legacy ioports to set up */
> +	return 0;
>  }
>  
>  void ioport__map_irq(u8 *irq)
> diff --git a/virtio/mmio.c b/virtio/mmio.c
> index 03cecc366292..5537c39367d6 100644
> --- a/virtio/mmio.c
> +++ b/virtio/mmio.c
> @@ -292,13 +292,16 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		     int device_id, int subsys_id, int class)
>  {
>  	struct virtio_mmio *vmmio = vdev->virtio;
> +	int r;
>  
>  	vmmio->addr	= virtio_mmio_get_io_space_block(VIRTIO_MMIO_IO_SIZE);
>  	vmmio->kvm	= kvm;
>  	vmmio->dev	= dev;
>  
> -	kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
> -			   false, virtio_mmio_mmio_callback, vdev);
> +	r = kvm__register_mmio(kvm, vmmio->addr, VIRTIO_MMIO_IO_SIZE,
> +			       false, virtio_mmio_mmio_callback, vdev);
> +	if (r < 0)
> +		return r;
>  
>  	vmmio->hdr = (struct virtio_mmio_hdr) {
>  		.magic		= {'v', 'i', 'r', 't'},
> @@ -313,7 +316,11 @@ int virtio_mmio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		.data		= generate_virtio_mmio_fdt_node,
>  	};
>  
> -	device__register(&vmmio->dev_hdr);
> +	r = device__register(&vmmio->dev_hdr);
> +	if (r < 0) {
> +		kvm__deregister_mmio(kvm, vmmio->addr);
> +		return r;
> +	}
>  
>  	/*
>  	 * Instantiate guest virtio-mmio devices using kernel command line
> diff --git a/x86/ioport.c b/x86/ioport.c
> index 8572c758ed4f..7ad7b8f3f497 100644
> --- a/x86/ioport.c
> +++ b/x86/ioport.c
> @@ -69,50 +69,84 @@ void ioport__map_irq(u8 *irq)
>  {
>  }
>  
> -void ioport__setup_arch(struct kvm *kvm)
> +int ioport__setup_arch(struct kvm *kvm)
>  {
> +	int r;
> +
>  	/* Legacy ioport setup */
>  
>  	/* 0000 - 001F - DMA1 controller */
> -	ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
> +	r = ioport__register(kvm, 0x0000, &dummy_read_write_ioport_ops, 32, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 0x0020 - 0x003F - 8259A PIC 1 */
> -	ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
> +	r = ioport__register(kvm, 0x0020, &dummy_read_write_ioport_ops, 2, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 0040-005F - PIT - PROGRAMMABLE INTERVAL TIMER (8253, 8254) */
> -	ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
> +	r = ioport__register(kvm, 0x0040, &dummy_read_write_ioport_ops, 4, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 0092 - PS/2 system control port A */
> -	ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
> +	r = ioport__register(kvm, 0x0092, &ps2_control_a_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 0x00A0 - 0x00AF - 8259A PIC 2 */
> -	ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
> +	r = ioport__register(kvm, 0x00A0, &dummy_read_write_ioport_ops, 2, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 00C0 - 001F - DMA2 controller */
> -	ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
> +	r = ioport__register(kvm, 0x00C0, &dummy_read_write_ioport_ops, 32, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 00E0-00EF are 'motherboard specific' so we use them for our
>  	   internal debugging purposes.  */
> -	ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
> +	r = ioport__register(kvm, IOPORT_DBG, &debug_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 00ED - DUMMY PORT FOR DELAY??? */
> -	ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
> +	r = ioport__register(kvm, 0x00ED, &dummy_write_only_ioport_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 0x00F0 - 0x00FF - Math co-processor */
> -	ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
> +	r = ioport__register(kvm, 0x00F0, &dummy_write_only_ioport_ops, 2, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 0278-027A - PARALLEL PRINTER PORT (usually LPT1, sometimes LPT2) */
> -	ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
> +	r = ioport__register(kvm, 0x0278, &dummy_read_write_ioport_ops, 3, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 0378-037A - PARALLEL PRINTER PORT (usually LPT2, sometimes LPT3) */
> -	ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
> +	r = ioport__register(kvm, 0x0378, &dummy_read_write_ioport_ops, 3, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* PORT 03D4-03D5 - COLOR VIDEO - CRT CONTROL REGISTERS */
> -	ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
> -	ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
> +	r = ioport__register(kvm, 0x03D4, &dummy_read_write_ioport_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
> +	r = ioport__register(kvm, 0x03D5, &dummy_write_only_ioport_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
>  
> -	ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
> +	r = ioport__register(kvm, 0x402, &seabios_debug_ops, 1, NULL);
> +	if (r < 0)
> +		return r;
>  
>  	/* 0510 - QEMU BIOS configuration register */
> -	ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
> +	r = ioport__register(kvm, 0x510, &dummy_read_write_ioport_ops, 2, NULL);
> +	if (r < 0)
> +		return r;
> +
> +	return 0;
>  }
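The lock-then-unwind shape of the reworked `ioport__register()` can be sketched with a toy registry. Stubs stand in for the rbtree and `device__register()`, the failure is simulated, and `tree_erase()` plays the role Andre suggests giving to `ioport_remove()`:

```c
#include <assert.h>
#include <errno.h>

static int tree_slot;	/* toy stand-in for the ioport rbtree */

static int tree_insert(int port)	{ tree_slot = port; return 0; }
static void tree_erase(void)		{ tree_slot = 0; }
static int dev_register(int port)	{ (void)port; return -EEXIST; }

static int register_port(int port)
{
	int r;

	/* br_write_lock(kvm) would be taken here */
	r = tree_insert(port);
	if (r < 0)
		goto out_unlock;
	r = dev_register(port);
	if (r < 0)
		goto out_erase;
	/* br_write_unlock(kvm) */
	return port;

out_erase:
	tree_erase();		/* undo the insert before dropping the lock */
out_unlock:
	/* br_write_unlock(kvm) */
	return r;
}
```

A later step that fails must undo every earlier one, in reverse order, before the lock is released.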


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors
  2020-01-23 13:47 ` [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors Alexandru Elisei
@ 2020-01-30 14:52   ` Andre Przywara
  2020-03-06 12:33     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-01-30 14:52 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:52 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Failing an mmap call or creating a memslot means that device emulation
> will not work, so don't ignore it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index b92cc990b730..a665736a76d7 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -76,9 +76,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>  	if (mem == MAP_FAILED)
> -		ERR_PTR(-errno);
> +		return ERR_PTR(-errno);
>  
> -	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
> +	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
> +	if (r < 0)
> +		return ERR_PTR(r);

For the sake of correctness, we should munmap here, I think.
With that fixed:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

>  
>  	vesafb = (struct framebuffer) {
>  		.width			= VESA_WIDTH,


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes
  2020-01-30 14:50   ` Andre Przywara
@ 2020-01-30 15:52     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-01-30 15:52 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 1/30/20 2:50 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:48 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> To get the size of the expansion ROM, software writes 0xfffff800 to the
>> expansion ROM BAR in the PCI configuration space. PCI emulation executes
>> the optional configuration space write callback that a device can
>> implement before emulating this write.
>>
>> VFIO doesn't have support for emulating expansion ROMs.
> With "VFIO doesn't have support" you mean kvmtool's VFIO implementation or the kernel's VFIO driver?
> Because to me it looks like it should work in the kernel, at least for the BAR sizing on the expansion ROM BAR:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/vfio/pci/vfio_pci_config.c#n477
>
> Am I missing something here?

kvmtool's implementation of VFIO doesn't have support for expansion ROMs, sorry
for the confusion. VFIO definitely has support for expansion ROMs, I actually
have some patches that I wrote in a previous iteration of this series that
enable kvmtool to use it (I'll come back to them after this series gets merged).

>
> QEMU seems to have code to load the ROM from the device and present that to the guest, but I am not sure exactly why.

Same here, I don't know why qemu does that, I would have imagined that the
KVM_MEM_READONLY flag for KVM_SET_USER_MEMORY_REGION is a perfect use case for
expansion ROM emulation. To sanitize accesses? To make it possible for the user
to provide its own firmware file for the device? Either way, I think this is a
discussion for another time.
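For reference, the KVM_MEM_READONLY idea mentioned above boils down to registering the host copy of the ROM as a read-only memslot, so guest writes to it exit to userspace as MMIO instead of being absorbed. A sketch of the memslot setup only: the slot number and guest address are hypothetical, and the actual `ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region)` is omitted since it needs a live VM fd.

```c
#include <assert.h>
#include <string.h>
#include <linux/kvm.h>

static char rom_copy[0x10000];	/* host copy of the device ROM (64 KiB) */

/* Only the struct layout and the KVM_MEM_READONLY flag come from the
 * Linux KVM UAPI; everything else is illustrative. */
static struct kvm_userspace_memory_region rom_region(void)
{
	struct kvm_userspace_memory_region region;

	memset(&region, 0, sizeof(region));
	region.slot		= 1;			/* hypothetical */
	region.flags		= KVM_MEM_READONLY;	/* guest writes exit as MMIO */
	region.guest_phys_addr	= 0x40000000;		/* hypothetical ROM BAR address */
	region.memory_size	= sizeof(rom_copy);
	region.userspace_addr	= (unsigned long)rom_copy;

	return region;
}
```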

To summarize, I'll reword the commit to remove the confusion - kvmtool's
implementation of VFIO doesn't have support for expansion ROM bars and emulation.

Thanks,
Alex
>
> Cheers,
> Andre
>
>> However, the
>> callback writes the guest value to the hardware BAR, and then it reads
>> it back from the hardware BAR to make sure the write has completed successfully.
>>
>> After this, we return to regular PCI emulation and because the BAR is
>> no longer 0, we write back to the BAR the value that the guest used to
>> get the size. As a result, the guest will think that the ROM size is
>> 0x800 after the subsequent read and we end up unintentionally exposing
>> to the guest a BAR which we don't emulate.
>>
>> Let's fix this by ignoring writes to the expansion ROM BAR.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  vfio/pci.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 1bdc20038411..1f38f90c3ae9 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -472,6 +472,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>>  	struct vfio_device *vdev;
>>  	void *base = pci_hdr;
>>  
>> +	if (offset == PCI_ROM_ADDRESS)
>> +		return;
>> +
>>  	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
>>  	vdev = container_of(pdev, struct vfio_device, pci);
>>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0
  2020-01-23 13:47 ` [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0 Alexandru Elisei
@ 2020-02-03 12:20   ` Andre Przywara
  2020-02-03 12:27     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-03 12:20 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:53 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> BAR 0 is an I/O BAR and is registered as an ioport region. Let's set its
> size, so a guest can actually use it.

Well, the whole I/O BAR is emulated as RAZ/WI, so I would be curious how the guest would actually use it, but specifying the size is surely a good thing, so:
 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index a665736a76d7..e988c0425946 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -70,6 +70,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	vesa_base_addr			= (u16)r;
>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
> +	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>  	r = device__register(&vesa_device);
>  	if (r < 0)
>  		return ERR_PTR(r);



* Re: [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio
  2020-01-23 13:47 ` [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio Alexandru Elisei
@ 2020-02-03 12:23   ` Andre Przywara
  2020-02-05 11:25     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-03 12:23 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:54 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> kvmtool uses brlock for protecting accesses to the ioport and mmio
> red-black trees. brlock allows concurrent reads, but only one writer,
> which is assumed not to be a VCPU thread. This is done by issuing a
> compiler barrier on read and pausing the entire virtual machine on
> writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread
> read/write lock.
> 
> When we will implement reassignable BARs, the mmio or ioport mapping
> will be done as a result of a VCPU mmio access. When brlock is a
> read/write lock, it means that we will try to acquire a write lock with
> the read lock already held by the same VCPU and we will deadlock. When
> it's not, a VCPU will have to call kvm__pause, which means the virtual
> machine will stay paused forever.
> 
> Let's avoid all this by using separate pthread_rwlock_t locks for the
> mmio and the ioport red-black trees and carefully choosing our read
> critical region such that modification as a result of a guest mmio
> access doesn't deadlock.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  ioport.c | 20 +++++++++++---------
>  mmio.c   | 26 +++++++++++++++++---------
>  2 files changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/ioport.c b/ioport.c
> index d224819c6e43..c044a80dd763 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -2,9 +2,9 @@
>  
>  #include "kvm/kvm.h"
>  #include "kvm/util.h"
> -#include "kvm/brlock.h"
>  #include "kvm/rbtree-interval.h"
>  #include "kvm/mutex.h"
> +#include "kvm/rwsem.h"
>  
>  #include <linux/kvm.h>	/* for KVM_EXIT_* */
>  #include <linux/types.h>
> @@ -16,6 +16,8 @@
>  
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>  
> +static DECLARE_RWSEM(ioport_lock);
> +
>  static struct rb_root		ioport_tree = RB_ROOT;
>  
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> @@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	struct ioport *entry;
>  	int r;
>  
> -	br_write_lock(kvm);
> +	down_write(&ioport_lock);
>  
>  	entry = ioport_search(&ioport_tree, port);
>  	if (entry) {
> @@ -96,7 +98,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	r = device__register(&entry->dev_hdr);
>  	if (r < 0)
>  		goto out_erase;
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>  
>  	return port;
>  
> @@ -104,7 +106,7 @@ out_erase:
>  	rb_int_erase(&ioport_tree, &entry->node);
>  out_free:
>  	free(entry);
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>  	return r;
>  }
>  
> @@ -113,7 +115,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>  	struct ioport *entry;
>  	int r;
>  
> -	br_write_lock(kvm);
> +	down_write(&ioport_lock);
>  
>  	r = -ENOENT;
>  	entry = ioport_search(&ioport_tree, port);
> @@ -128,7 +130,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>  	r = 0;
>  
>  done:
> -	br_write_unlock(kvm);
> +	up_write(&ioport_lock);
>  
>  	return r;
>  }
> @@ -171,8 +173,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	void *ptr = data;
>  	struct kvm *kvm = vcpu->kvm;
>  
> -	br_read_lock(kvm);
> +	down_read(&ioport_lock);
>  	entry = ioport_search(&ioport_tree, port);
> +	up_read(&ioport_lock);
> +
>  	if (!entry)
>  		goto out;

I don't think it's valid to drop the lock that early. A concurrent ioport_unregister would free the entry pointer, so we have a potential use-after-free here.
I guess you are thinking about an x86 CF8/CFC config space access here, which in turn would take the write lock when updating an I/O BAR?

So I think the same comment that you added below on kvm__emulate_mmio() applies here? More on this below, then ...

>  
> @@ -188,8 +192,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	}
>  
>  out:
> -	br_read_unlock(kvm);
> -
>  	if (ret)
>  		return true;
>  
> diff --git a/mmio.c b/mmio.c
> index 61e1d47a587d..4e0ff830c738 100644
> --- a/mmio.c
> +++ b/mmio.c
> @@ -1,7 +1,7 @@
>  #include "kvm/kvm.h"
>  #include "kvm/kvm-cpu.h"
>  #include "kvm/rbtree-interval.h"
> -#include "kvm/brlock.h"
> +#include "kvm/rwsem.h"
>  
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -15,6 +15,8 @@
>  
>  #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>  
> +static DECLARE_RWSEM(mmio_lock);
> +
>  struct mmio_mapping {
>  	struct rb_int_node	node;
>  	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
> @@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
>  
>  int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>  		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
> -			void *ptr)
> +		       void *ptr)
>  {
>  	struct mmio_mapping *mmio;
>  	struct kvm_coalesced_mmio_zone zone;
> @@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  			return -errno;
>  		}
>  	}
> -	br_write_lock(kvm);
> +	down_write(&mmio_lock);
>  	ret = mmio_insert(&mmio_tree, mmio);
> -	br_write_unlock(kvm);
> +	up_write(&mmio_lock);
>  
>  	return ret;
>  }
> @@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  	struct mmio_mapping *mmio;
>  	struct kvm_coalesced_mmio_zone zone;
>  
> -	br_write_lock(kvm);
> +	down_write(&mmio_lock);
>  	mmio = mmio_search_single(&mmio_tree, phys_addr);
>  	if (mmio == NULL) {
> -		br_write_unlock(kvm);
> +		up_write(&mmio_lock);
>  		return false;
>  	}
>  
> @@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
>  
>  	rb_int_erase(&mmio_tree, &mmio->node);
> -	br_write_unlock(kvm);
> +	up_write(&mmio_lock);
>  
>  	free(mmio);
>  	return true;
> @@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  {
>  	struct mmio_mapping *mmio;
>  
> -	br_read_lock(vcpu->kvm);
> +	/*
> +	 * The callback might call kvm__register_mmio which takes a write lock,
> +	 * so avoid deadlocks by protecting only the node search with a reader
> +	 * lock. Note that there is still a small time window for a node to be
> +	 * deleted by another vcpu before mmio_fn gets called.
> +	 */

Do I get this right that this means the locking is not "fully" correct?
I don't think we should tolerate this. The underlying problem seems to be that the lock protects two separate things: the RB tree used to find the handler, but also the handlers and their data structures themselves. So far this was feasible, but it doesn't work any longer.

I think refcounting would be the answer here: Once mmio_search() returns an entry, a ref counter increases, preventing this entry from being removed by kvm__deregister_mmio(). If the emulation has finished, we decrement the counter, and trigger the free operation if it has reached zero.

Does that make sense?

Cheers,
Andre.

> +	down_read(&mmio_lock);
>  	mmio = mmio_search(&mmio_tree, phys_addr, len);
> +	up_read(&mmio_lock);
>  
>  	if (mmio)
>  		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> @@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  				to_direction(is_write),
>  				(unsigned long long)phys_addr, len);
>  	}
> -	br_read_unlock(vcpu->kvm);
>  
>  	return true;
>  }



* Re: [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0
  2020-02-03 12:20   ` Andre Przywara
@ 2020-02-03 12:27     ` Alexandru Elisei
  2020-02-05 17:00       ` Andre Przywara
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-03 12:27 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi Andre,

On 2/3/20 12:20 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:53 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> BAR 0 is an I/O BAR and is registered as an ioport region. Let's set its
>> size, so a guest can actually use it.
> Well, the whole I/O bar emulates as RAZ/WI, so I would be curious how the guest would actually use it, but specifying the size is surely a good thing, so:

Yeah, you're right, I was thinking about ARM, where ioports are MMIO and you need to
map those addresses. I'll remove the part about the guest being able to actually use
it in the next iteration of the series. Is it OK if I keep your Reviewed-by?

>  
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara>
>
> Cheers,
> Andre
>
>> ---
>>  hw/vesa.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index a665736a76d7..e988c0425946 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -70,6 +70,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  
>>  	vesa_base_addr			= (u16)r;
>>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>> +	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>>  	r = device__register(&vesa_device);
>>  	if (r < 0)
>>  		return ERR_PTR(r);


* Re: [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio
  2020-02-03 12:23   ` Andre Przywara
@ 2020-02-05 11:25     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-05 11:25 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/3/20 12:23 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:54 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> kvmtool uses brlock for protecting accesses to the ioport and mmio
>> red-black trees. brlock allows concurrent reads, but only one writer,
>> which is assumed not to be a VCPU thread. This is done by issuing a
>> compiler barrier on read and pausing the entire virtual machine on
>> writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread
>> read/write lock.
>>
>> When we will implement reassignable BARs, the mmio or ioport mapping
>> will be done as a result of a VCPU mmio access. When brlock is a
>> read/write lock, it means that we will try to acquire a write lock with
>> the read lock already held by the same VCPU and we will deadlock. When
>> it's not, a VCPU will have to call kvm__pause, which means the virtual
>> machine will stay paused forever.
>>
>> Let's avoid all this by using separate pthread_rwlock_t locks for the
>> mmio and the ioport red-black trees and carefully choosing our read
>> critical region such that modification as a result of a guest mmio
>> access doesn't deadlock.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  ioport.c | 20 +++++++++++---------
>>  mmio.c   | 26 +++++++++++++++++---------
>>  2 files changed, 28 insertions(+), 18 deletions(-)
>>
>> diff --git a/ioport.c b/ioport.c
>> index d224819c6e43..c044a80dd763 100644
>> --- a/ioport.c
>> +++ b/ioport.c
>> @@ -2,9 +2,9 @@
>>  
>>  #include "kvm/kvm.h"
>>  #include "kvm/util.h"
>> -#include "kvm/brlock.h"
>>  #include "kvm/rbtree-interval.h"
>>  #include "kvm/mutex.h"
>> +#include "kvm/rwsem.h"
>>  
>>  #include <linux/kvm.h>	/* for KVM_EXIT_* */
>>  #include <linux/types.h>
>> @@ -16,6 +16,8 @@
>>  
>>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>>  
>> +static DECLARE_RWSEM(ioport_lock);
>> +
>>  static struct rb_root		ioport_tree = RB_ROOT;
>>  
>>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
>> @@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>>  	struct ioport *entry;
>>  	int r;
>>  
>> -	br_write_lock(kvm);
>> +	down_write(&ioport_lock);
>>  
>>  	entry = ioport_search(&ioport_tree, port);
>>  	if (entry) {
>> @@ -96,7 +98,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>>  	r = device__register(&entry->dev_hdr);
>>  	if (r < 0)
>>  		goto out_erase;
>> -	br_write_unlock(kvm);
>> +	up_write(&ioport_lock);
>>  
>>  	return port;
>>  
>> @@ -104,7 +106,7 @@ out_erase:
>>  	rb_int_erase(&ioport_tree, &entry->node);
>>  out_free:
>>  	free(entry);
>> -	br_write_unlock(kvm);
>> +	up_write(&ioport_lock);
>>  	return r;
>>  }
>>  
>> @@ -113,7 +115,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>>  	struct ioport *entry;
>>  	int r;
>>  
>> -	br_write_lock(kvm);
>> +	down_write(&ioport_lock);
>>  
>>  	r = -ENOENT;
>>  	entry = ioport_search(&ioport_tree, port);
>> @@ -128,7 +130,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>>  	r = 0;
>>  
>>  done:
>> -	br_write_unlock(kvm);
>> +	up_write(&ioport_lock);
>>  
>>  	return r;
>>  }
>> @@ -171,8 +173,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>>  	void *ptr = data;
>>  	struct kvm *kvm = vcpu->kvm;
>>  
>> -	br_read_lock(kvm);
>> +	down_read(&ioport_lock);
>>  	entry = ioport_search(&ioport_tree, port);
>> +	up_read(&ioport_lock);
>> +
>>  	if (!entry)
>>  		goto out;
> I don't think it's valid to drop the lock that early. A concurrent ioport_unregister would free the entry pointer, so we have a potential use-after-free here.
> I guess you are thinking about an x86 CF8/CFC config space access here, that in turn would take the write lock when updating an I/O BAR?
>
> So I think the same comment that you added below on kvm__emulate_mmio() applies here? More on this below then ....

Yes, it applies. More on this below.

>
>>  
>> @@ -188,8 +192,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>>  	}
>>  
>>  out:
>> -	br_read_unlock(kvm);
>> -
>>  	if (ret)
>>  		return true;
>>  
>> diff --git a/mmio.c b/mmio.c
>> index 61e1d47a587d..4e0ff830c738 100644
>> --- a/mmio.c
>> +++ b/mmio.c
>> @@ -1,7 +1,7 @@
>>  #include "kvm/kvm.h"
>>  #include "kvm/kvm-cpu.h"
>>  #include "kvm/rbtree-interval.h"
>> -#include "kvm/brlock.h"
>> +#include "kvm/rwsem.h"
>>  
>>  #include <stdio.h>
>>  #include <stdlib.h>
>> @@ -15,6 +15,8 @@
>>  
>>  #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>>  
>> +static DECLARE_RWSEM(mmio_lock);
>> +
>>  struct mmio_mapping {
>>  	struct rb_int_node	node;
>>  	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
>> @@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
>>  
>>  int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>>  		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
>> -			void *ptr)
>> +		       void *ptr)
>>  {
>>  	struct mmio_mapping *mmio;
>>  	struct kvm_coalesced_mmio_zone zone;
>> @@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>>  			return -errno;
>>  		}
>>  	}
>> -	br_write_lock(kvm);
>> +	down_write(&mmio_lock);
>>  	ret = mmio_insert(&mmio_tree, mmio);
>> -	br_write_unlock(kvm);
>> +	up_write(&mmio_lock);
>>  
>>  	return ret;
>>  }
>> @@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>>  	struct mmio_mapping *mmio;
>>  	struct kvm_coalesced_mmio_zone zone;
>>  
>> -	br_write_lock(kvm);
>> +	down_write(&mmio_lock);
>>  	mmio = mmio_search_single(&mmio_tree, phys_addr);
>>  	if (mmio == NULL) {
>> -		br_write_unlock(kvm);
>> +		up_write(&mmio_lock);
>>  		return false;
>>  	}
>>  
>> @@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>>  	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
>>  
>>  	rb_int_erase(&mmio_tree, &mmio->node);
>> -	br_write_unlock(kvm);
>> +	up_write(&mmio_lock);
>>  
>>  	free(mmio);
>>  	return true;
>> @@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>>  {
>>  	struct mmio_mapping *mmio;
>>  
>> -	br_read_lock(vcpu->kvm);
>> +	/*
>> +	 * The callback might call kvm__register_mmio which takes a write lock,
>> +	 * so avoid deadlocks by protecting only the node search with a reader
>> +	 * lock. Note that there is still a small time window for a node to be
>> +	 * deleted by another vcpu before mmio_fn gets called.
>> +	 */
> Do I get this right that this means the locking is not "fully" correct?
> I don't think we should tolerate this. The underlying problem seems to be that the lock protects two separate things: namely the RB tree to find the handler, but also the handlers and their data structures itself. So far this was feasible, but this doesn't work any longer.
>
> I think refcounting would be the answer here: Once mmio_search() returns an entry, a ref counter increases, preventing this entry from being removed by kvm__deregister_mmio(). If the emulation has finished, we decrement the counter, and trigger the free operation if it has reached zero.
>
> Does that make sense?

The only situation where you end up with a use-after-free is if there's a race
inside the guest between one thread which reprograms the BAR address/disables
access to memory BARs, and another thread which tries to access the memory region
described by the BAR. My reasoning for putting the comment there instead of fixing
the race was that the guest is broken in this case and won't function correctly
regardless of what kvmtool does. And having this use-after-free error in kvmtool
might actually help with debugging the guest.

Adding a refcounter to prevent that from happening should be fairly straightforward.

Thanks,
Alex
>
> Cheers,
> Andre.
>
>> +	down_read(&mmio_lock);
>>  	mmio = mmio_search(&mmio_tree, phys_addr, len);
>> +	up_read(&mmio_lock);
>>  
>>  	if (mmio)
>>  		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
>> @@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>>  				to_direction(is_write),
>>  				(unsigned long long)phys_addr, len);
>>  	}
>> -	br_read_unlock(vcpu->kvm);
>>  
>>  	return true;
>>  }


* Re: [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0
  2020-02-03 12:27     ` Alexandru Elisei
@ 2020-02-05 17:00       ` Andre Przywara
  2020-03-06 12:40         ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 17:00 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Mon, 3 Feb 2020 12:27:55 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Hi Andre,
> 
> On 2/3/20 12:20 PM, Andre Przywara wrote:
> > On Thu, 23 Jan 2020 13:47:53 +0000
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >  
> >> BAR 0 is an I/O BAR and is registered as an ioport region. Let's set its
> >> size, so a guest can actually use it.  
> > Well, the whole I/O bar emulates as RAZ/WI, so I would be curious how the guest would actually use it, but specifying the size is surely a good thing, so:  
> 
> Yeah, you're right, I was thinking about ARM where ioport are MMIO and you need to
> map those address. I'll remove the part about the guest being able to actually use
> it in the next iteration of the series.. Is it OK if I keep your Reviewed-by?

Sure, as I mentioned the patch itself is fine.

Thanks,
Andre.

> >    
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>  
> > Reviewed-by: Andre Przywara <andre.przywara@arm.com>
> >
> > Cheers,
> > Andre
> >  
> >> ---
> >>  hw/vesa.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/hw/vesa.c b/hw/vesa.c
> >> index a665736a76d7..e988c0425946 100644
> >> --- a/hw/vesa.c
> >> +++ b/hw/vesa.c
> >> @@ -70,6 +70,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
> >>  
> >>  	vesa_base_addr			= (u16)r;
> >>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
> >> +	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
> >>  	r = device__register(&vesa_device);
> >>  	if (r < 0)
> >>  		return ERR_PTR(r);  



* Re: [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access
  2020-01-23 13:47 ` [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
@ 2020-02-05 17:00   ` Andre Przywara
  2020-02-05 17:02     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 17:00 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:55 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> We're going to be checking the BAR type, the address written to it, and
> whether access to memory or I/O space is enabled quite often when we add
> support for reassignable BARs; add helpers for these checks.

I am not a particular fan of these double underscores inside identifiers, but I guess that is too late now, since it's already all over the place. So:

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

> ---
>  include/kvm/pci.h | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>  pci.c             |  2 +-
>  2 files changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index ccb155e3e8fe..235cd82fff3c 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -5,6 +5,7 @@
>  #include <linux/kvm.h>
>  #include <linux/pci_regs.h>
>  #include <endian.h>
> +#include <stdbool.h>
>  
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
> @@ -161,4 +162,51 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>  
> +static inline bool __pci__memory_space_enabled(u16 command)
> +{
> +	return command & PCI_COMMAND_MEMORY;
> +}
> +
> +static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
> +{
> +	return __pci__memory_space_enabled(pci_hdr->command);
> +}
> +
> +static inline bool __pci__io_space_enabled(u16 command)
> +{
> +	return command & PCI_COMMAND_IO;
> +}
> +
> +static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
> +{
> +	return __pci__io_space_enabled(pci_hdr->command);
> +}
> +
> +static inline bool __pci__bar_is_io(u32 bar)
> +{
> +	return bar & PCI_BASE_ADDRESS_SPACE_IO;
> +}
> +
> +static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
> +}
> +
> +static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return !pci__bar_is_io(pci_hdr, bar_num);
> +}
> +
> +static inline u32 __pci__bar_address(u32 bar)
> +{
> +	if (__pci__bar_is_io(bar))
> +		return bar & PCI_BASE_ADDRESS_IO_MASK;
> +	return bar & PCI_BASE_ADDRESS_MEM_MASK;
> +}
> +
> +static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return __pci__bar_address(pci_hdr->bar[bar_num]);
> +}
> +
>  #endif /* KVM__PCI_H */
> diff --git a/pci.c b/pci.c
> index b6892d974c08..4f7b863298f6 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	 * size, it will write the address back.
>  	 */
>  	if (bar < 6) {
> -		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
> +		if (pci__bar_is_io(pci_hdr, bar))
>  			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>  		else
>  			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;



* Re: [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs
  2020-01-23 13:47 ` [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs Alexandru Elisei
@ 2020-02-05 17:01   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 17:01 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:56 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> The struct virtio_pci fields port_addr, mmio_addr and msix_io_block
> represent the same addresses that are written in the corresponding BARs.
> Remove this duplication of information and always use the address from the
> BAR. This will make our life a lot easier when we add support for
> reassignable BARs, because we won't have to update the fields on each BAR
> change.
> 
> No functional changes.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  include/kvm/virtio-pci.h |  3 --
>  virtio/pci.c             | 86 ++++++++++++++++++++++++++--------------
>  2 files changed, 56 insertions(+), 33 deletions(-)
> 
> diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
> index 278a25950d8b..959b4b81c871 100644
> --- a/include/kvm/virtio-pci.h
> +++ b/include/kvm/virtio-pci.h
> @@ -24,8 +24,6 @@ struct virtio_pci {
>  	void			*dev;
>  	struct kvm		*kvm;
>  
> -	u16			port_addr;
> -	u32			mmio_addr;
>  	u8			status;
>  	u8			isr;
>  	u32			features;
> @@ -43,7 +41,6 @@ struct virtio_pci {
>  	u32			config_gsi;
>  	u32			vq_vector[VIRTIO_PCI_MAX_VQ];
>  	u32			gsis[VIRTIO_PCI_MAX_VQ];
> -	u32			msix_io_block;
>  	u64			msix_pba;
>  	struct msix_table	msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];
>  
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 6723a1f3a84d..c4822514856c 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -13,6 +13,21 @@
>  #include <linux/byteorder.h>
>  #include <string.h>
>  
> +static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
> +{
> +	return pci__bar_address(&vpci->pci_hdr, 0);
> +}
> +
> +static u32 virtio_pci__mmio_addr(struct virtio_pci *vpci)
> +{
> +	return pci__bar_address(&vpci->pci_hdr, 1);
> +}
> +
> +static u32 virtio_pci__msix_io_addr(struct virtio_pci *vpci)
> +{
> +	return pci__bar_address(&vpci->pci_hdr, 2);
> +}
> +
>  static void virtio_pci__ioevent_callback(struct kvm *kvm, void *param)
>  {
>  	struct virtio_pci_ioevent_param *ioeventfd = param;
> @@ -25,6 +40,8 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
>  {
>  	struct ioevent ioevent;
>  	struct virtio_pci *vpci = vdev->virtio;
> +	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
> +	u16 port_addr = virtio_pci__port_addr(vpci);
>  	int r, flags = 0;
>  	int fd;
>  
> @@ -48,7 +65,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
>  		flags |= IOEVENTFD_FLAG_USER_POLL;
>  
>  	/* ioport */
> -	ioevent.io_addr	= vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
> +	ioevent.io_addr	= port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
>  	ioevent.io_len	= sizeof(u16);
>  	ioevent.fd	= fd = eventfd(0, 0);
>  	r = ioeventfd__add_event(&ioevent, flags | IOEVENTFD_FLAG_PIO);
> @@ -56,7 +73,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
>  		return r;
>  
>  	/* mmio */
> -	ioevent.io_addr	= vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
> +	ioevent.io_addr	= mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
>  	ioevent.io_len	= sizeof(u16);
>  	ioevent.fd	= eventfd(0, 0);
>  	r = ioeventfd__add_event(&ioevent, flags);
> @@ -68,7 +85,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
>  	return 0;
>  
>  free_ioport_evt:
> -	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
> +	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
>  	return r;
>  }
>  
> @@ -76,9 +93,11 @@ static void virtio_pci_exit_vq(struct kvm *kvm, struct virtio_device *vdev,
>  			       int vq)
>  {
>  	struct virtio_pci *vpci = vdev->virtio;
> +	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
> +	u16 port_addr = virtio_pci__port_addr(vpci);
>  
> -	ioeventfd__del_event(vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
> -	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
> +	ioeventfd__del_event(mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
> +	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
>  	virtio_exit_vq(kvm, vdev, vpci->dev, vq);
>  }
>  
> @@ -163,10 +182,12 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
>  	unsigned long offset;
>  	struct virtio_device *vdev;
>  	struct virtio_pci *vpci;
> +	u16 port_addr;
>  
>  	vdev = ioport->priv;
>  	vpci = vdev->virtio;
> -	offset = port - vpci->port_addr;
> +	port_addr = virtio_pci__port_addr(vpci);
> +	offset = port - port_addr;
>  
>  	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
>  }
> @@ -323,10 +344,12 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>  	unsigned long offset;
>  	struct virtio_device *vdev;
>  	struct virtio_pci *vpci;
> +	u16 port_addr;
>  
>  	vdev = ioport->priv;
>  	vpci = vdev->virtio;
> -	offset = port - vpci->port_addr;
> +	port_addr = virtio_pci__port_addr(vpci);
> +	offset = port - port_addr;
>  
>  	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
>  }
> @@ -343,17 +366,18 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
>  	struct virtio_device *vdev = ptr;
>  	struct virtio_pci *vpci = vdev->virtio;
>  	struct msix_table *table;
> +	u32 msix_io_addr = virtio_pci__msix_io_addr(vpci);
>  	int vecnum;
>  	size_t offset;
>  
> -	if (addr > vpci->msix_io_block + PCI_IO_SIZE) {
> +	if (addr > msix_io_addr + PCI_IO_SIZE) {
>  		if (is_write)
>  			return;
>  		table  = (struct msix_table *)&vpci->msix_pba;
> -		offset = addr - (vpci->msix_io_block + PCI_IO_SIZE);
> +		offset = addr - (msix_io_addr + PCI_IO_SIZE);
>  	} else {
>  		table  = vpci->msix_table;
> -		offset = addr - vpci->msix_io_block;
> +		offset = addr - msix_io_addr;
>  	}
>  	vecnum = offset / sizeof(struct msix_table);
>  	offset = offset % sizeof(struct msix_table);
> @@ -442,19 +466,20 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  {
>  	struct virtio_device *vdev = ptr;
>  	struct virtio_pci *vpci = vdev->virtio;
> +	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
>  
>  	if (!is_write)
> -		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
> -				    data, len);
> +		virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
>  	else
> -		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
> -				     data, len);
> +		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>  }
>  
>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		     int device_id, int subsys_id, int class)
>  {
>  	struct virtio_pci *vpci = vdev->virtio;
> +	u32 mmio_addr, msix_io_block;
> +	u16 port_addr;
>  	int r;
>  
>  	vpci->kvm = kvm;
> @@ -462,20 +487,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
> -	r = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
> +	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
> +	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
> +			     vdev);
>  	if (r < 0)
>  		return r;
> -	vpci->port_addr = (u16)r;
> +	port_addr = (u16)r;
>  
> -	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> -	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
> +	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> +	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>  			       virtio_pci__io_mmio_callback, vdev);
>  	if (r < 0)
>  		goto free_ioport;
>  
> -	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
> -	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
> +	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
> +	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>  			       virtio_pci__msix_mmio_callback, vdev);
>  	if (r < 0)
>  		goto free_mmio;
> @@ -491,11 +517,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		.class[2]		= (class >> 16) & 0xff,
>  		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>  		.subsys_id		= cpu_to_le16(subsys_id),
> -		.bar[0]			= cpu_to_le32(vpci->port_addr
> +		.bar[0]			= cpu_to_le32(port_addr
>  							| PCI_BASE_ADDRESS_SPACE_IO),
> -		.bar[1]			= cpu_to_le32(vpci->mmio_addr
> +		.bar[1]			= cpu_to_le32(mmio_addr
>  							| PCI_BASE_ADDRESS_SPACE_MEMORY),
> -		.bar[2]			= cpu_to_le32(vpci->msix_io_block
> +		.bar[2]			= cpu_to_le32(msix_io_block
>  							| PCI_BASE_ADDRESS_SPACE_MEMORY),
>  		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
>  		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
> @@ -542,11 +568,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	return 0;
>  
>  free_msix_mmio:
> -	kvm__deregister_mmio(kvm, vpci->msix_io_block);
> +	kvm__deregister_mmio(kvm, msix_io_block);
>  free_mmio:
> -	kvm__deregister_mmio(kvm, vpci->mmio_addr);
> +	kvm__deregister_mmio(kvm, mmio_addr);
>  free_ioport:
> -	ioport__unregister(kvm, vpci->port_addr);
> +	ioport__unregister(kvm, port_addr);
>  	return r;
>  }
>  
> @@ -566,9 +592,9 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev)
>  	struct virtio_pci *vpci = vdev->virtio;
>  
>  	virtio_pci__reset(kvm, vdev);
> -	kvm__deregister_mmio(kvm, vpci->mmio_addr);
> -	kvm__deregister_mmio(kvm, vpci->msix_io_block);
> -	ioport__unregister(kvm, vpci->port_addr);
> +	kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci));
> +	kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci));
> +	ioport__unregister(kvm, virtio_pci__port_addr(vpci));
>  
>  	return 0;
>  }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs
  2020-01-23 13:47 ` [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
@ 2020-02-05 17:01   ` Andre Przywara
  2020-03-09 12:38     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 17:01 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:57 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> When we want to map a device region into the guest address space, first
> we perform an mmap on the device fd. The resulting VMA is a mapping
> between host userspace addresses and physical addresses associated with
> the device. Next, we create a memslot, which populates the stage 2 table
> with the mappings between guest physical addresses and the device
> physical addresses.
> 
> However, when we want to unmap the device from the guest address space,
> we only call munmap, which destroys the VMA and the stage 2 mappings,
> but doesn't destroy the memslot and kvmtool's internal mem_bank
> structure associated with the memslot.
> 
> This has been perfectly fine so far, because we only unmap a device
> region when we exit kvmtool. This will change when we add support for
> reassignable BARs, and we will have to unmap vfio regions as the guest
> kernel writes new addresses in the BARs. This can lead to two possible
> problems:
> 
> - We refuse to create a valid BAR mapping because of a stale mem_bank
>   structure which belonged to a previously unmapped region.
> 
> - It is possible that the mmap in vfio_map_region returns the same
>   address that was used to create a memslot, but was unmapped by
>   vfio_unmap_region. Guest accesses to the device memory will fault
>   because the stage 2 mappings are missing, and this can lead to
>   performance degradation.
> 
> Let's do the right thing and destroy the memslot and the mem_bank struct
> associated with it when we unmap a vfio region. Set host_addr to NULL
> after the munmap call so we won't try to unmap an address which is
> currently used if vfio_unmap_region gets called twice.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  include/kvm/kvm.h |  2 ++
>  kvm.c             | 65 ++++++++++++++++++++++++++++++++++++++++++++---
>  vfio/core.c       |  6 +++++
>  3 files changed, 69 insertions(+), 4 deletions(-)
> 
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 50119a8672eb..c7e57b890cdd 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -56,6 +56,7 @@ struct kvm_mem_bank {
>  	void			*host_addr;
>  	u64			size;
>  	enum kvm_mem_type	type;
> +	u32			slot;
>  };
>  
>  struct kvm {
> @@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>  		      enum kvm_mem_type type);
>  static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
> diff --git a/kvm.c b/kvm.c
> index 57c4ff98ec4c..afcf55c7bf45 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
>  }
>  core_exit(kvm__exit);
>  
> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
> +		     void *userspace_addr)
> +{
> +	struct kvm_userspace_memory_region mem;
> +	struct kvm_mem_bank *bank;
> +	int ret;
> +
> +	list_for_each_entry(bank, &kvm->mem_banks, list)
> +		if (bank->guest_phys_addr == guest_phys &&
> +		    bank->size == size && bank->host_addr == userspace_addr)
> +			break;

Shouldn't we protect the list with some lock? I am actually not sure we have this problem already, but at least now a guest could reassign BARs concurrently on different VCPUs, in which case multiple kvm__destroy_mem() and kvm__register_dev_mem() calls might race against each other.
I think so far we got away with it because of the currently static nature of the memslot assignment.
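For illustration, a minimal sketch of the locking suggested here (the `mem_banks_lock` name is hypothetical and not part of kvmtool; the list walk and ioctl are elided):

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical: a mutex guarding kvm->mem_banks and kvm->mem_slots,
 * so concurrent BAR reassignments on different VCPUs cannot race. */
static pthread_mutex_t mem_banks_lock = PTHREAD_MUTEX_INITIALIZER;
static int mem_slots;

static int register_mem_locked(void)
{
	pthread_mutex_lock(&mem_banks_lock);
	/* ... find a free slot, insert the bank, issue the ioctl ... */
	mem_slots++;
	pthread_mutex_unlock(&mem_banks_lock);
	return 0;
}

static int destroy_mem_locked(void)
{
	pthread_mutex_lock(&mem_banks_lock);
	/* ... find the bank, delete the memslot, free the bank ... */
	mem_slots--;
	pthread_mutex_unlock(&mem_banks_lock);
	return 0;
}
```

Both paths take the same lock, so a lookup in kvm__destroy_mem() can never observe a half-inserted bank from kvm__register_dev_mem().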

> +
> +	if (&bank->list == &kvm->mem_banks) {
> +		pr_err("Region [%llx-%llx] not found", guest_phys,
> +		       guest_phys + size - 1);
> +		return -EINVAL;
> +	}
> +
> +	if (bank->type == KVM_MEM_TYPE_RESERVED) {
> +		pr_err("Cannot delete reserved region [%llx-%llx]",
> +		       guest_phys, guest_phys + size - 1);
> +		return -EINVAL;
> +	}
> +
> +	mem = (struct kvm_userspace_memory_region) {
> +		.slot			= bank->slot,
> +		.guest_phys_addr	= guest_phys,
> +		.memory_size		= 0,
> +		.userspace_addr		= (unsigned long)userspace_addr,
> +	};
> +
> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
> +	if (ret < 0)
> +		return -errno;
> +
> +	list_del(&bank->list);
> +	free(bank);
> +	kvm->mem_slots--;
> +
> +	return 0;
> +}
> +
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  		      void *userspace_addr, enum kvm_mem_type type)
>  {
>  	struct kvm_userspace_memory_region mem;
>  	struct kvm_mem_bank *merged = NULL;
>  	struct kvm_mem_bank *bank;
> +	struct list_head *prev_entry;
> +	u32 slot;
>  	int ret;
>  
> -	/* Check for overlap */
> +	/* Check for overlap and find first empty slot. */
> +	slot = 0;
> +	prev_entry = &kvm->mem_banks;
>  	list_for_each_entry(bank, &kvm->mem_banks, list) {
>  		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>  		u64 end = guest_phys + size - 1;
> -		if (guest_phys > bank_end || end < bank->guest_phys_addr)
> +		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
> +			/*
> +			 * Keep the banks sorted ascending by slot, so it's
> +			 * easier for us to find a free slot.
> +			 */
> +			if (bank->slot == slot) {
> +				slot++;
> +				prev_entry = &bank->list;
> +			}
>  			continue;
> +		}
>  
>  		/* Merge overlapping reserved regions */
>  		if (bank->type == KVM_MEM_TYPE_RESERVED &&
> @@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  	bank->host_addr			= userspace_addr;
>  	bank->size			= size;
>  	bank->type			= type;
> +	bank->slot			= slot;
>  
>  	if (type != KVM_MEM_TYPE_RESERVED) {
>  		mem = (struct kvm_userspace_memory_region) {
> -			.slot			= kvm->mem_slots++,
> +			.slot			= slot,
>  			.guest_phys_addr	= guest_phys,
>  			.memory_size		= size,
>  			.userspace_addr		= (unsigned long)userspace_addr,
> @@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>  			return -errno;
>  	}
>  
> -	list_add(&bank->list, &kvm->mem_banks);
> +	list_add(&bank->list, prev_entry);
> +	kvm->mem_slots++;
>  
>  	return 0;
>  }
> diff --git a/vfio/core.c b/vfio/core.c
> index 0ed1e6fee6bf..73fdac8be675 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>  
>  void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>  {
> +	u64 map_size;
> +
>  	if (region->host_addr) {
> +		map_size = ALIGN(region->info.size, PAGE_SIZE);
>  		munmap(region->host_addr, region->info.size);
> +		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
> +				 region->host_addr);

Shouldn't we destroy the memslot first, then unmap? Because in the current version we are giving a no longer valid userland address to the ioctl. I actually wonder how that passes the access_ok() check in the kernel's KVM_SET_USER_MEMORY_REGION handler.
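A toy model of the ordering concern (names are illustrative, not kvmtool's): KVM_SET_USER_MEMORY_REGION validates the userspace address, so the memslot must be destroyed while the VMA still exists, i.e. before munmap().

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated state: the VMA created by mmap() and the KVM memslot. */
static bool vma_mapped = true;
static bool memslot_present = true;

static int destroy_memslot(void)
{
	if (!vma_mapped)
		return -1;	/* access_ok() on a stale address fails */
	memslot_present = false;
	return 0;
}

static int unmap_region(void)
{
	int ret = destroy_memslot();	/* first: drop the memslot */
	vma_mapped = false;		/* then: munmap() the VMA */
	return ret;
}
```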

Cheers,
Andre

> +		region->host_addr = NULL;
>  	} else if (region->is_ioport) {
>  		ioport__unregister(kvm, region->port_base);
>  	} else {

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access
  2020-02-05 17:00   ` Andre Przywara
@ 2020-02-05 17:02     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-05 17:02 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/5/20 5:00 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:55 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> We're going to be checking the BAR type, the address written to it, and
>> whether access to memory or I/O space is enabled quite often when we add
>> support for reassignable BARs, so add helpers for these checks.
> I am not a particular fan of these double underscores inside identifiers, but I guess that is too late now, since it's already all over the place. So:

Me neither, I was going for consistency.

Thanks,
Alex
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
>
> Cheers,
> Andre.
>
>> ---
>>  include/kvm/pci.h | 48 +++++++++++++++++++++++++++++++++++++++++++++++
>>  pci.c             |  2 +-
>>  2 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index ccb155e3e8fe..235cd82fff3c 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -5,6 +5,7 @@
>>  #include <linux/kvm.h>
>>  #include <linux/pci_regs.h>
>>  #include <endian.h>
>> +#include <stdbool.h>
>>  
>>  #include "kvm/devices.h"
>>  #include "kvm/msi.h"
>> @@ -161,4 +162,51 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>>  
>>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>>  
>> +static inline bool __pci__memory_space_enabled(u16 command)
>> +{
>> +	return command & PCI_COMMAND_MEMORY;
>> +}
>> +
>> +static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
>> +{
>> +	return __pci__memory_space_enabled(pci_hdr->command);
>> +}
>> +
>> +static inline bool __pci__io_space_enabled(u16 command)
>> +{
>> +	return command & PCI_COMMAND_IO;
>> +}
>> +
>> +static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
>> +{
>> +	return __pci__io_space_enabled(pci_hdr->command);
>> +}
>> +
>> +static inline bool __pci__bar_is_io(u32 bar)
>> +{
>> +	return bar & PCI_BASE_ADDRESS_SPACE_IO;
>> +}
>> +
>> +static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
>> +}
>> +
>> +static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return !pci__bar_is_io(pci_hdr, bar_num);
>> +}
>> +
>> +static inline u32 __pci__bar_address(u32 bar)
>> +{
>> +	if (__pci__bar_is_io(bar))
>> +		return bar & PCI_BASE_ADDRESS_IO_MASK;
>> +	return bar & PCI_BASE_ADDRESS_MEM_MASK;
>> +}
>> +
>> +static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return __pci__bar_address(pci_hdr->bar[bar_num]);
>> +}
>> +
>>  #endif /* KVM__PCI_H */
>> diff --git a/pci.c b/pci.c
>> index b6892d974c08..4f7b863298f6 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	 * size, it will write the address back.
>>  	 */
>>  	if (bar < 6) {
>> -		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
>> +		if (pci__bar_is_io(pci_hdr, bar))
>>  			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>>  		else
>>  			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR
  2020-01-23 13:47 ` [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
@ 2020-02-05 18:34   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 18:34 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:58 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> Let's be consistent and reserve ioports when we are configuring the BAR,
> not when we map it, just like we do with mmio regions.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks,
Andre

> ---
>  vfio/core.c | 9 +++------
>  vfio/pci.c  | 4 +++-
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/vfio/core.c b/vfio/core.c
> index 73fdac8be675..6b9b58ea8d2f 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -202,14 +202,11 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
>  				  struct vfio_region *region)
>  {
>  	if (region->is_ioport) {
> -		int port = pci_get_io_port_block(region->info.size);
> -
> -		port = ioport__register(kvm, port, &vfio_ioport_ops,
> -					region->info.size, region);
> +		int port = ioport__register(kvm, region->port_base,
> +					   &vfio_ioport_ops, region->info.size,
> +					   region);
>  		if (port < 0)
>  			return port;
> -
> -		region->port_base = port;
>  		return 0;
>  	}
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index f86a7d9b7032..abde16dc8693 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -885,7 +885,9 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  		}
>  	}
>  
> -	if (!region->is_ioport) {
> +	if (region->is_ioport) {
> +		region->port_base = pci_get_io_port_block(region->info.size);
> +	} else {
>  		/* Grab some MMIO space in the guest */
>  		map_size = ALIGN(region->info.size, PAGE_SIZE);
>  		region->guest_phys_addr = pci_get_mmio_block(map_size);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice
  2020-01-23 13:47 ` [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-02-05 18:35   ` Andre Przywara
  2020-03-09 15:21     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-05 18:35 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:59 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> After writing to the device fd as part of the PCI configuration space
> emulation, we read back from the device to make sure that the write
> finished. The value is read back into the PCI configuration space and
> afterwards, the same value is copied by the PCI emulation code. Let's
> read from the device fd into a temporary variable, to prevent this
> double write.
> 
> The double write is harmless in itself. But when we implement
> reassignable BARs, we need to keep track of the old BAR value, and the
> VFIO code is overwriting it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  vfio/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index abde16dc8693..8a775a4a4a54 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
>  	struct vfio_device *vdev;
> -	void *base = pci_hdr;
> +	u32 tmp;

Can we make this a u64, please? I am not sure if 64-bit MMIO is allowed for PCI config space accesses, but a guest could do it anyway, and it looks like it would overwrite the vdev pointer on the stack here in this case.
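A self-contained illustration of the overflow being described (hypothetical helper, not kvmtool code): pread()ing sz bytes into a u32 temporary writes past it when the guest issues an 8-byte config space access, clobbering whatever sits next on the stack. A u64 temporary, or an explicit bounds check, keeps the read in range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a config space read-back into a temporary, refusing sizes
 * larger than the temporary instead of overflowing it. */
static int read_back(void *dst, size_t dst_size, const void *src, size_t sz)
{
	if (sz > dst_size)
		return -1;	/* would overflow the temporary */
	memcpy(dst, src, sz);
	return 0;
}
```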

Cheers,
Andre.

>  
>  	if (offset == PCI_ROM_ADDRESS)
>  		return;
> @@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>  
> -	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
> +	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>  			      sz, offset);
>  }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation
  2020-01-23 13:48 ` [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-02-06 18:21   ` Andre Przywara
  2020-02-07 10:12     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-06 18:21 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:48:00 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> Implement callbacks for activating and deactivating emulation for a BAR
> region. This is in preparation for allowing a guest operating system to
> enable and disable access to I/O or memory space, or to reassign the
> BARs.
> 
> The emulated vesa device has been refactored in the process and the static
> variables were removed in order to make using the callbacks less painful.
> The framebuffer isn't designed to allow stopping and restarting at
> arbitrary points in the guest execution. Furthermore, on x86, the kernel
> will not change the BAR addresses, which on bare metal are programmed by
> the firmware, so take the easy way out and refuse to deactivate emulation
> for the BAR regions.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c         | 120 ++++++++++++++++++++++++++++++++--------------
>  include/kvm/pci.h |  19 +++++++-
>  pci.c             |  44 +++++++++++++++++
>  vfio/pci.c        | 100 +++++++++++++++++++++++++++++++-------
>  virtio/pci.c      |  90 ++++++++++++++++++++++++----------
>  5 files changed, 294 insertions(+), 79 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index e988c0425946..74ebebbefa6b 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -18,6 +18,12 @@
>  #include <inttypes.h>
>  #include <unistd.h>
>  
> +struct vesa_dev {
> +	struct pci_device_header	pci_hdr;
> +	struct device_header		dev_hdr;
> +	struct framebuffer		fb;
> +};
> +
>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
>  	return true;
> @@ -33,29 +39,52 @@ static struct ioport_operations vesa_io_ops = {
>  	.io_out			= vesa_pci_io_out,
>  };
>  
> -static struct pci_device_header vesa_pci_device = {
> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
> -	.revision_id		= 0,
> -	.class[2]		= 0x03,
> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> -	.bar_size[1]		= VESA_MEM_SIZE,
> -};
> +static int vesa__bar_activate(struct kvm *kvm,
> +			      struct pci_device_header *pci_hdr,
> +			      int bar_num, void *data)
> +{
> +	struct vesa_dev *vdev = data;
> +	u32 bar_addr, bar_size;
> +	char *mem;
> +	int r;
>  
> -static struct device_header vesa_device = {
> -	.bus_type	= DEVICE_BUS_PCI,
> -	.data		= &vesa_pci_device,
> -};
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	bar_size = pci_hdr->bar_size[bar_num];
>  
> -static struct framebuffer vesafb;
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__register(kvm, bar_addr, &vesa_io_ops, bar_size,
> +				     NULL);
> +		break;
> +	case 1:
> +		mem = mmap(NULL, bar_size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> +		if (mem == MAP_FAILED) {
> +			r = -errno;
> +			break;
> +		}
> +		r = kvm__register_dev_mem(kvm, bar_addr, bar_size, mem);
> +		if (r < 0)
> +			break;
> +		vdev->fb.mem = mem;
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
> +static int vesa__bar_deactivate(struct kvm *kvm,
> +				struct pci_device_header *pci_hdr,
> +				int bar_num, void *data)
> +{
> +	return -EINVAL;
> +}
>  
>  struct framebuffer *vesa__init(struct kvm *kvm)
>  {
> -	u16 vesa_base_addr;
> -	char *mem;
> +	struct vesa_dev *vdev;
> +	u16 port_addr;
>  	int r;
>  
>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
> @@ -63,34 +92,51 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
> -	r = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
> -	if (r < 0)
> -		return ERR_PTR(r);
>  
> -	vesa_base_addr			= (u16)r;
> -	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
> -	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
> -	r = device__register(&vesa_device);
> -	if (r < 0)
> -		return ERR_PTR(r);
> +	vdev = calloc(1, sizeof(*vdev));
> +	if (vdev == NULL)
> +		return ERR_PTR(-ENOMEM);

Is it really necessary to allocate this here? You never free this, and I don't see how you could actually do this. AFAICS conceptually there can be only one VESA device? So maybe have a static variable above and use that instead of passing the pointer around? Or use &vdev if you need a pointer argument for the callbacks.
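A sketch of the alternative being suggested, with a reduced struct for illustration: since only one VESA device can exist, a file-scope instance avoids the unfreeable calloc() entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the real kvmtool structs. */
struct framebuffer { void *mem; };
struct vesa_dev { struct framebuffer fb; };

/* Single static instance: no allocation, no free path needed. */
static struct vesa_dev vesa_dev_instance;

static struct vesa_dev *vesa_dev_get(void)
{
	return &vesa_dev_instance;
}
```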

>  
> -	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
> -	if (mem == MAP_FAILED)
> -		return ERR_PTR(-errno);
> +	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>  
> -	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
> -	if (r < 0)
> -		return ERR_PTR(r);
> +	vdev->pci_hdr = (struct pci_device_header) {
> +		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> +		.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
> +		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
> +		.header_type		= PCI_HEADER_TYPE_NORMAL,
> +		.revision_id		= 0,
> +		.class[2]		= 0x03,
> +		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> +		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> +		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
> +		.bar_size[0]		= PCI_IO_SIZE,
> +		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> +		.bar_size[1]		= VESA_MEM_SIZE,
> +	};
>  
> -	vesafb = (struct framebuffer) {
> +	vdev->fb = (struct framebuffer) {
>  		.width			= VESA_WIDTH,
>  		.height			= VESA_HEIGHT,
>  		.depth			= VESA_BPP,
> -		.mem			= mem,
> +		.mem			= NULL,
>  		.mem_addr		= VESA_MEM_ADDR,
>  		.mem_size		= VESA_MEM_SIZE,
>  		.kvm			= kvm,
>  	};
> -	return fb__register(&vesafb);
> +
> +	r = pci__register_bar_regions(kvm, &vdev->pci_hdr, vesa__bar_activate,
> +				      vesa__bar_deactivate, vdev);
> +	if (r < 0)
> +		return ERR_PTR(r);
> +
> +	vdev->dev_hdr = (struct device_header) {
> +		.bus_type       = DEVICE_BUS_PCI,
> +		.data           = &vdev->pci_hdr,
> +	};
> +
> +	r = device__register(&vdev->dev_hdr);
> +	if (r < 0)
> +		return ERR_PTR(r);
> +
> +	return fb__register(&vdev->fb);
>  }
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 235cd82fff3c..bf42f497168f 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_device_header;
> +
> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data);
> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data);
> +
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>  #define PCI_DEV_CFG_SIZE	256
>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
> -struct pci_device_header;
> -
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  		      u8 offset, void *data, int sz);
> @@ -136,6 +143,9 @@ struct pci_device_header {
>  
>  	/* Private to lkvm */
>  	u32		bar_size[6];
> +	bar_activate_fn_t	bar_activate_fn;
> +	bar_deactivate_fn_t	bar_deactivate_fn;
> +	void *data;
>  	struct pci_config_operations	cfg_ops;
>  	/*
>  	 * PCI INTx# are level-triggered, but virtual device often feature
> @@ -160,8 +170,13 @@ void pci__assign_irq(struct device_header *dev_hdr);
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>  
> +

Stray empty line?

Cheers,
Andre

>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
> +
>  static inline bool __pci__memory_space_enabled(u16 command)
>  {
>  	return command & PCI_COMMAND_MEMORY;
> diff --git a/pci.c b/pci.c
> index 4f7b863298f6..5412f2defa2e 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>  }
>  
> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
> +}
> +
>  static void *pci_config_address_ptr(u16 port)
>  {
>  	unsigned long offset;
> @@ -264,6 +269,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>  	return hdr->data;
>  }
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
> +{
> +	int i, r;
> +	bool has_bar_regions = false;
> +
> +	assert(bar_activate_fn && bar_deactivate_fn);
> +
> +	pci_hdr->bar_activate_fn = bar_activate_fn;
> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
> +	pci_hdr->data = data;
> +
> +	for (i = 0; i < 6; i++) {
> +		if (!pci_bar_is_implemented(pci_hdr, i))
> +			continue;
> +
> +		has_bar_regions = true;
> +
> +		if (pci__bar_is_io(pci_hdr, i) &&
> +		    pci__io_space_enabled(pci_hdr)) {
> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
> +				if (r < 0)
> +					return r;
> +			}
> +
> +		if (pci__bar_is_memory(pci_hdr, i) &&
> +		    pci__memory_space_enabled(pci_hdr)) {
> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
> +				if (r < 0)
> +					return r;
> +			}
> +	}
> +
> +	assert(has_bar_regions);
> +
> +	return 0;
> +}
> +
>  int pci__init(struct kvm *kvm)
>  {
>  	int r;
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 8a775a4a4a54..9e595562180b 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -446,6 +446,83 @@ out_unlock:
>  	mutex_unlock(&pdev->msi.mutex);
>  }
>  
> +static int vfio_pci_bar_activate(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region = &vdev->regions[bar_num];
> +	int ret;
> +
> +	if (!region->info.size) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
> +	    (u32)bar_num == table->bar) {
> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
> +					 table->size, false,
> +					 vfio_pci_msix_table_access, pdev);
> +		if (ret < 0 || table->bar!= pba->bar)
> +			goto out;
> +	}
> +
> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
> +	    (u32)bar_num == pba->bar) {
> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
> +					 pba->size, false,
> +					 vfio_pci_msix_pba_access, pdev);
> +		goto out;
> +	}
> +
> +	ret = vfio_map_region(kvm, vdev, region);
> +out:
> +	return ret;
> +}
> +
> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region = &vdev->regions[bar_num];
> +	int ret;
> +	bool success;
> +
> +	if (!region->info.size) {
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
> +	    (u32)bar_num == table->bar) {
> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
> +		ret = (success ? 0 : -EINVAL);
> +		if (ret < 0 || table->bar!= pba->bar)
> +			goto out;
> +	}
> +
> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
> +	    (u32)bar_num == pba->bar) {
> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
> +		ret = (success ? 0 : -EINVAL);
> +		goto out;
> +	}
> +
> +	vfio_unmap_region(kvm, region);
> +	ret = 0;
> +
> +out:
> +	return ret;
> +}
> +
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  			      u8 offset, void *data, int sz)
>  {
> @@ -804,12 +881,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  		ret = -ENOMEM;
>  		goto out_free;
>  	}
> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
> -
> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
> -				 false, vfio_pci_msix_table_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
>  
>  	/*
>  	 * We could map the physical PBA directly into the guest, but it's
> @@ -819,10 +890,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>  	 * virtual PBA.
>  	 */
> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
> -				 vfio_pci_msix_pba_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>  
>  	pdev->msix.entries = entries;
>  	pdev->msix.nr_entries = nr_entries;
> @@ -893,11 +961,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>  	}
>  
> -	/* Map the BARs into the guest or setup a trap region. */
> -	ret = vfio_map_region(kvm, vdev, region);
> -	if (ret)
> -		return ret;
> -
>  	return 0;
>  }
>  
> @@ -944,7 +1007,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>  	}
>  
>  	/* We've configured the BARs, fake up a Configuration Space */
> -	return vfio_pci_fixup_cfg_space(vdev);
> +	ret = vfio_pci_fixup_cfg_space(vdev);
> +	if (ret)
> +		return ret;
> +
> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
> +					 vfio_pci_bar_deactivate, vdev);
>  }
>  
>  /*
> diff --git a/virtio/pci.c b/virtio/pci.c
> index c4822514856c..5a3cc6f1e943 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -474,6 +474,65 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>  }
>  
> +static int virtio_pci__bar_activate(struct kvm *kvm,
> +				    struct pci_device_header *pci_hdr,
> +				    int bar_num, void *data)
> +{
> +	struct virtio_device *vdev = data;
> +	u32 bar_addr, bar_size;
> +	int r;
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	bar_size = pci_hdr->bar_size[bar_num];
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
> +				     bar_size, vdev);
> +		if (r > 0)
> +			r = 0;
> +		break;
> +	case 1:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__io_mmio_callback, vdev);
> +		break;
> +	case 2:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__msix_mmio_callback, vdev);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
> +				      struct pci_device_header *pci_hdr,
> +				      int bar_num, void *data)
> +{
> +	u32 bar_addr;
> +	bool success;
> +	int r;
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__unregister(kvm, bar_addr);
> +		break;
> +	case 1:
> +	case 2:
> +		success = kvm__deregister_mmio(kvm, bar_addr);
> +		r = (success ? 0 : -EINVAL);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	return r;
> +}
> +
>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		     int device_id, int subsys_id, int class)
>  {
> @@ -488,23 +547,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
> -			     vdev);
> -	if (r < 0)
> -		return r;
> -	port_addr = (u16)r;
> -
>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
> -			       virtio_pci__io_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_ioport;
> -
>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
> -			       virtio_pci__msix_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_mmio;
>  
>  	vpci->pci_hdr = (struct pci_device_header) {
>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> @@ -530,6 +574,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>  	};
>  
> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
> +				      virtio_pci__bar_activate,
> +				      virtio_pci__bar_deactivate, vdev);
> +	if (r < 0)
> +		return r;
> +
>  	vpci->dev_hdr = (struct device_header) {
>  		.bus_type		= DEVICE_BUS_PCI,
>  		.data			= &vpci->pci_hdr,
> @@ -560,20 +610,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  
>  	r = device__register(&vpci->dev_hdr);
>  	if (r < 0)
> -		goto free_msix_mmio;
> +		return r;
>  
>  	/* save the IRQ that device__register() has allocated */
>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>  
>  	return 0;
> -
> -free_msix_mmio:
> -	kvm__deregister_mmio(kvm, msix_io_block);
> -free_mmio:
> -	kvm__deregister_mmio(kvm, mmio_addr);
> -free_ioport:
> -	ioport__unregister(kvm, port_addr);
> -	return r;
>  }
>  
>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-01-23 13:48 ` [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
@ 2020-02-06 18:21   ` Andre Przywara
  2020-02-07 11:08     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-06 18:21 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:48:01 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> During configuration of the BAR addresses, a Linux guest disables and
> enables access to I/O and memory space. When access is disabled, we don't
> stop emulating the memory regions described by the BARs. Now that we have
> callbacks for activating and deactivating emulation for a BAR region,
> let's use that to stop emulation when access is disabled, and
> re-activate it when access is re-enabled.
> 
> The vesa emulation hasn't been designed with toggling on and off in
> mind, so refuse writes to the PCI command register that disable memory
> or IO access.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c | 16 ++++++++++++++++
>  pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index 74ebebbefa6b..3044a86078fb 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>  	return -EINVAL;
>  }
>  
> +static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +				u8 offset, void *data, int sz)
> +{
> +	u32 value;

I guess the same comment as on the other patch applies: using u64 looks safer to me. Also you should clear it, to avoid nasty surprises in case of a short write (1 or 2 bytes only).

The rest looks alright.

Cheers,
Andre

> +
> +	if (offset == PCI_COMMAND) {
> +		memcpy(&value, data, sz);
> +		value |= (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
> +		memcpy(data, &value, sz);
> +	}
> +}
> +
>  struct framebuffer *vesa__init(struct kvm *kvm)
>  {
>  	struct vesa_dev *vdev;
> @@ -114,6 +126,10 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  		.bar_size[1]		= VESA_MEM_SIZE,
>  	};
>  
> +	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
> +		.write	= vesa__pci_cfg_write,
> +	};
> +
>  	vdev->fb = (struct framebuffer) {
>  		.width			= VESA_WIDTH,
>  		.height			= VESA_HEIGHT,
> diff --git a/pci.c b/pci.c
> index 5412f2defa2e..98331a1fc205 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -157,6 +157,42 @@ static struct ioport_operations pci_config_data_ops = {
>  	.io_out	= pci_config_data_out,
>  };
>  
> +static void pci_config_command_wr(struct kvm *kvm,
> +				  struct pci_device_header *pci_hdr,
> +				  u16 new_command)
> +{
> +	int i;
> +	bool toggle_io, toggle_mem;
> +
> +	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
> +	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
> +
> +	for (i = 0; i < 6; i++) {
> +		if (!pci_bar_is_implemented(pci_hdr, i))
> +			continue;
> +
> +		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
> +			if (__pci__io_space_enabled(new_command))
> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> +							 pci_hdr->data);
> +			else
> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> +							   pci_hdr->data);
> +		}
> +
> +		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
> +			if (__pci__memory_space_enabled(new_command))
> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> +							 pci_hdr->data);
> +			else
> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> +							   pci_hdr->data);
> +		}
> +	}
> +
> +	pci_hdr->command = new_command;
> +}
> +
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> @@ -182,6 +218,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	if (*(u32 *)(base + offset) == 0)
>  		return;
>  
> +	if (offset == PCI_COMMAND) {
> +		memcpy(&value, data, size);
> +		pci_config_command_wr(kvm, pci_hdr, (u16)value);
> +		return;
> +	}
> +
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>  
>  	/*



* Re: [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation
  2020-02-06 18:21   ` Andre Przywara
@ 2020-02-07 10:12     ` Alexandru Elisei
  2020-02-07 15:39       ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-07 10:12 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/6/20 6:21 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:48:00 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> Implement callbacks for activating and deactivating emulation for a BAR
>> region. This is in preparation for allowing a guest operating system to
>> enable and disable access to I/O or memory space, or to reassign the
>> BARs.
>>
>> The emulated vesa device has been refactored in the process and the static
>> variables were removed in order to make using the callbacks less painful.
>> The framebuffer isn't designed to allow stopping and restarting at
>> arbitrary points in the guest execution. Furthermore, on x86, the kernel
>> will not change the BAR addresses, which on bare metal are programmed by
>> the firmware, so take the easy way out and refuse to deactivate emulation
>> for the BAR regions.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  hw/vesa.c         | 120 ++++++++++++++++++++++++++++++++--------------
>>  include/kvm/pci.h |  19 +++++++-
>>  pci.c             |  44 +++++++++++++++++
>>  vfio/pci.c        | 100 +++++++++++++++++++++++++++++++-------
>>  virtio/pci.c      |  90 ++++++++++++++++++++++++----------
>>  5 files changed, 294 insertions(+), 79 deletions(-)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index e988c0425946..74ebebbefa6b 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -18,6 +18,12 @@
>>  #include <inttypes.h>
>>  #include <unistd.h>
>>  
>> +struct vesa_dev {
>> +	struct pci_device_header	pci_hdr;
>> +	struct device_header		dev_hdr;
>> +	struct framebuffer		fb;
>> +};
>> +
>>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>>  {
>>  	return true;
>> @@ -33,29 +39,52 @@ static struct ioport_operations vesa_io_ops = {
>>  	.io_out			= vesa_pci_io_out,
>>  };
>>  
>> -static struct pci_device_header vesa_pci_device = {
>> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
>> -	.revision_id		= 0,
>> -	.class[2]		= 0x03,
>> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>> -	.bar_size[1]		= VESA_MEM_SIZE,
>> -};
>> +static int vesa__bar_activate(struct kvm *kvm,
>> +			      struct pci_device_header *pci_hdr,
>> +			      int bar_num, void *data)
>> +{
>> +	struct vesa_dev *vdev = data;
>> +	u32 bar_addr, bar_size;
>> +	char *mem;
>> +	int r;
>>  
>> -static struct device_header vesa_device = {
>> -	.bus_type	= DEVICE_BUS_PCI,
>> -	.data		= &vesa_pci_device,
>> -};
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +	bar_size = pci_hdr->bar_size[bar_num];
>>  
>> -static struct framebuffer vesafb;
>> +	switch (bar_num) {
>> +	case 0:
>> +		r = ioport__register(kvm, bar_addr, &vesa_io_ops, bar_size,
>> +				     NULL);
>> +		break;
>> +	case 1:
>> +		mem = mmap(NULL, bar_size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>> +		if (mem == MAP_FAILED) {
>> +			r = -errno;
>> +			break;
>> +		}
>> +		r = kvm__register_dev_mem(kvm, bar_addr, bar_size, mem);
>> +		if (r < 0)
>> +			break;
>> +		vdev->fb.mem = mem;
>> +		break;
>> +	default:
>> +		r = -EINVAL;
>> +	}
>> +
>> +	return r;
>> +}
>> +
>> +static int vesa__bar_deactivate(struct kvm *kvm,
>> +				struct pci_device_header *pci_hdr,
>> +				int bar_num, void *data)
>> +{
>> +	return -EINVAL;
>> +}
>>  
>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>  {
>> -	u16 vesa_base_addr;
>> -	char *mem;
>> +	struct vesa_dev *vdev;
>> +	u16 port_addr;
>>  	int r;
>>  
>>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
>> @@ -63,34 +92,51 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  
>>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>>  		return NULL;
>> -	r = pci_get_io_port_block(PCI_IO_SIZE);
>> -	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
>> -	if (r < 0)
>> -		return ERR_PTR(r);
>>  
>> -	vesa_base_addr			= (u16)r;
>> -	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>> -	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>> -	r = device__register(&vesa_device);
>> -	if (r < 0)
>> -		return ERR_PTR(r);
>> +	vdev = calloc(1, sizeof(*vdev));
>> +	if (vdev == NULL)
>> +		return ERR_PTR(-ENOMEM);
> Is it really necessary to allocate this here? You never free this, and I don't see how you could actually do this. AFAICS conceptually there can be only one VESA device? So maybe have a static variable above and use that instead of passing the pointer around? Or use &vdev if you need a pointer argument for the callbacks.

As far as I can tell, there can be only one VESA device, yes. I was following the
same pattern from virtio/{net,blk,rng,scsi,9p}.c, which I prefer because it makes
explicit which functions can access the device. What's wrong with passing the
pointer around? The entire PCI emulation code works like that.

>
>>  
>> -	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>> -	if (mem == MAP_FAILED)
>> -		return ERR_PTR(-errno);
>> +	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>  
>> -	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
>> -	if (r < 0)
>> -		return ERR_PTR(r);
>> +	vdev->pci_hdr = (struct pci_device_header) {
>> +		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> +		.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>> +		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
>> +		.header_type		= PCI_HEADER_TYPE_NORMAL,
>> +		.revision_id		= 0,
>> +		.class[2]		= 0x03,
>> +		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>> +		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>> +		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
>> +		.bar_size[0]		= PCI_IO_SIZE,
>> +		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>> +		.bar_size[1]		= VESA_MEM_SIZE,
>> +	};
>>  
>> -	vesafb = (struct framebuffer) {
>> +	vdev->fb = (struct framebuffer) {
>>  		.width			= VESA_WIDTH,
>>  		.height			= VESA_HEIGHT,
>>  		.depth			= VESA_BPP,
>> -		.mem			= mem,
>> +		.mem			= NULL,
>>  		.mem_addr		= VESA_MEM_ADDR,
>>  		.mem_size		= VESA_MEM_SIZE,
>>  		.kvm			= kvm,
>>  	};
>> -	return fb__register(&vesafb);
>> +
>> +	r = pci__register_bar_regions(kvm, &vdev->pci_hdr, vesa__bar_activate,
>> +				      vesa__bar_deactivate, vdev);
>> +	if (r < 0)
>> +		return ERR_PTR(r);
>> +
>> +	vdev->dev_hdr = (struct device_header) {
>> +		.bus_type       = DEVICE_BUS_PCI,
>> +		.data           = &vdev->pci_hdr,
>> +	};
>> +
>> +	r = device__register(&vdev->dev_hdr);
>> +	if (r < 0)
>> +		return ERR_PTR(r);
>> +
>> +	return fb__register(&vdev->fb);
>>  }
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index 235cd82fff3c..bf42f497168f 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pci_device_header;
>> +
>> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>> +				 struct pci_device_header *pci_hdr,
>> +				 int bar_num, void *data);
>> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>> +				   struct pci_device_header *pci_hdr,
>> +				   int bar_num, void *data);
>> +
>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>  #define PCI_DEV_CFG_SIZE	256
>>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>> -struct pci_device_header;
>> -
>>  struct pci_config_operations {
>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>  		      u8 offset, void *data, int sz);
>> @@ -136,6 +143,9 @@ struct pci_device_header {
>>  
>>  	/* Private to lkvm */
>>  	u32		bar_size[6];
>> +	bar_activate_fn_t	bar_activate_fn;
>> +	bar_deactivate_fn_t	bar_deactivate_fn;
>> +	void *data;
>>  	struct pci_config_operations	cfg_ops;
>>  	/*
>>  	 * PCI INTx# are level-triggered, but virtual device often feature
>> @@ -160,8 +170,13 @@ void pci__assign_irq(struct device_header *dev_hdr);
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>>  
>> +
> Stray empty line?

Indeed, will get rid of it.

Thanks,
Alex
>
> Cheers,
> Andre
>
>>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>>  
>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      bar_activate_fn_t bar_activate_fn,
>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
>> +
>>  static inline bool __pci__memory_space_enabled(u16 command)
>>  {
>>  	return command & PCI_COMMAND_MEMORY;
>> diff --git a/pci.c b/pci.c
>> index 4f7b863298f6..5412f2defa2e 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>>  }
>>  
>> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
>> +}
>> +
>>  static void *pci_config_address_ptr(u16 port)
>>  {
>>  	unsigned long offset;
>> @@ -264,6 +269,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>>  	return hdr->data;
>>  }
>>  
>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      bar_activate_fn_t bar_activate_fn,
>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
>> +{
>> +	int i, r;
>> +	bool has_bar_regions = false;
>> +
>> +	assert(bar_activate_fn && bar_deactivate_fn);
>> +
>> +	pci_hdr->bar_activate_fn = bar_activate_fn;
>> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
>> +	pci_hdr->data = data;
>> +
>> +	for (i = 0; i < 6; i++) {
>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>> +			continue;
>> +
>> +		has_bar_regions = true;
>> +
>> +		if (pci__bar_is_io(pci_hdr, i) &&
>> +		    pci__io_space_enabled(pci_hdr)) {
>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +				if (r < 0)
>> +					return r;
>> +			}
>> +
>> +		if (pci__bar_is_memory(pci_hdr, i) &&
>> +		    pci__memory_space_enabled(pci_hdr)) {
>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +				if (r < 0)
>> +					return r;
>> +			}
>> +	}
>> +
>> +	assert(has_bar_regions);
>> +
>> +	return 0;
>> +}
>> +
>>  int pci__init(struct kvm *kvm)
>>  {
>>  	int r;
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 8a775a4a4a54..9e595562180b 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -446,6 +446,83 @@ out_unlock:
>>  	mutex_unlock(&pdev->msi.mutex);
>>  }
>>  
>> +static int vfio_pci_bar_activate(struct kvm *kvm,
>> +				 struct pci_device_header *pci_hdr,
>> +				 int bar_num, void *data)
>> +{
>> +	struct vfio_device *vdev = data;
>> +	struct vfio_pci_device *pdev = &vdev->pci;
>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>> +	struct vfio_region *region = &vdev->regions[bar_num];
>> +	int ret;
>> +
>> +	if (!region->info.size) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>> +	    (u32)bar_num == table->bar) {
>> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>> +					 table->size, false,
>> +					 vfio_pci_msix_table_access, pdev);
>> +		if (ret < 0 || table->bar!= pba->bar)
>> +			goto out;
>> +	}
>> +
>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>> +	    (u32)bar_num == pba->bar) {
>> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>> +					 pba->size, false,
>> +					 vfio_pci_msix_pba_access, pdev);
>> +		goto out;
>> +	}
>> +
>> +	ret = vfio_map_region(kvm, vdev, region);
>> +out:
>> +	return ret;
>> +}
>> +
>> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
>> +				   struct pci_device_header *pci_hdr,
>> +				   int bar_num, void *data)
>> +{
>> +	struct vfio_device *vdev = data;
>> +	struct vfio_pci_device *pdev = &vdev->pci;
>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>> +	struct vfio_region *region = &vdev->regions[bar_num];
>> +	int ret;
>> +	bool success;
>> +
>> +	if (!region->info.size) {
>> +		ret = -EINVAL;
>> +		goto out;
>> +	}
>> +
>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>> +	    (u32)bar_num == table->bar) {
>> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
>> +		ret = (success ? 0 : -EINVAL);
>> +		if (ret < 0 || table->bar!= pba->bar)
>> +			goto out;
>> +	}
>> +
>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>> +	    (u32)bar_num == pba->bar) {
>> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
>> +		ret = (success ? 0 : -EINVAL);
>> +		goto out;
>> +	}
>> +
>> +	vfio_unmap_region(kvm, region);
>> +	ret = 0;
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>  			      u8 offset, void *data, int sz)
>>  {
>> @@ -804,12 +881,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>  		ret = -ENOMEM;
>>  		goto out_free;
>>  	}
>> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>> -
>> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
>> -				 false, vfio_pci_msix_table_access, pdev);
>> -	if (ret < 0)
>> -		goto out_free;
>>  
>>  	/*
>>  	 * We could map the physical PBA directly into the guest, but it's
>> @@ -819,10 +890,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>>  	 * virtual PBA.
>>  	 */
>> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
>> -				 vfio_pci_msix_pba_access, pdev);
>> -	if (ret < 0)
>> -		goto out_free;
>> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>  
>>  	pdev->msix.entries = entries;
>>  	pdev->msix.nr_entries = nr_entries;
>> @@ -893,11 +961,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>>  	}
>>  
>> -	/* Map the BARs into the guest or setup a trap region. */
>> -	ret = vfio_map_region(kvm, vdev, region);
>> -	if (ret)
>> -		return ret;
>> -
>>  	return 0;
>>  }
>>  
>> @@ -944,7 +1007,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>>  	}
>>  
>>  	/* We've configured the BARs, fake up a Configuration Space */
>> -	return vfio_pci_fixup_cfg_space(vdev);
>> +	ret = vfio_pci_fixup_cfg_space(vdev);
>> +	if (ret)
>> +		return ret;
>> +
>> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
>> +					 vfio_pci_bar_deactivate, vdev);
>>  }
>>  
>>  /*
>> diff --git a/virtio/pci.c b/virtio/pci.c
>> index c4822514856c..5a3cc6f1e943 100644
>> --- a/virtio/pci.c
>> +++ b/virtio/pci.c
>> @@ -474,6 +474,65 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>>  }
>>  
>> +static int virtio_pci__bar_activate(struct kvm *kvm,
>> +				    struct pci_device_header *pci_hdr,
>> +				    int bar_num, void *data)
>> +{
>> +	struct virtio_device *vdev = data;
>> +	u32 bar_addr, bar_size;
>> +	int r;
>> +
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +	bar_size = pci_hdr->bar_size[bar_num];
>> +
>> +	switch (bar_num) {
>> +	case 0:
>> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
>> +				     bar_size, vdev);
>> +		if (r > 0)
>> +			r = 0;
>> +		break;
>> +	case 1:
>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>> +					virtio_pci__io_mmio_callback, vdev);
>> +		break;
>> +	case 2:
>> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
>> +					virtio_pci__msix_mmio_callback, vdev);
>> +		break;
>> +	default:
>> +		r = -EINVAL;
>> +	}
>> +
>> +	return r;
>> +}
>> +
>> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
>> +				      struct pci_device_header *pci_hdr,
>> +				      int bar_num, void *data)
>> +{
>> +	u32 bar_addr;
>> +	bool success;
>> +	int r;
>> +
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +
>> +	switch (bar_num) {
>> +	case 0:
>> +		r = ioport__unregister(kvm, bar_addr);
>> +		break;
>> +	case 1:
>> +	case 2:
>> +		success = kvm__deregister_mmio(kvm, bar_addr);
>> +		r = (success ? 0 : -EINVAL);
>> +		break;
>> +	default:
>> +		r = -EINVAL;
>> +	}
>> +
>> +	return r;
>> +}
>> +
>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  		     int device_id, int subsys_id, int class)
>>  {
>> @@ -488,23 +547,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>>  
>>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
>> -			     vdev);
>> -	if (r < 0)
>> -		return r;
>> -	port_addr = (u16)r;
>> -
>>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>> -			       virtio_pci__io_mmio_callback, vdev);
>> -	if (r < 0)
>> -		goto free_ioport;
>> -
>>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>> -			       virtio_pci__msix_mmio_callback, vdev);
>> -	if (r < 0)
>> -		goto free_mmio;
>>  
>>  	vpci->pci_hdr = (struct pci_device_header) {
>>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>> @@ -530,6 +574,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>>  	};
>>  
>> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
>> +				      virtio_pci__bar_activate,
>> +				      virtio_pci__bar_deactivate, vdev);
>> +	if (r < 0)
>> +		return r;
>> +
>>  	vpci->dev_hdr = (struct device_header) {
>>  		.bus_type		= DEVICE_BUS_PCI,
>>  		.data			= &vpci->pci_hdr,
>> @@ -560,20 +610,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  
>>  	r = device__register(&vpci->dev_hdr);
>>  	if (r < 0)
>> -		goto free_msix_mmio;
>> +		return r;
>>  
>>  	/* save the IRQ that device__register() has allocated */
>>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>>  
>>  	return 0;
>> -
>> -free_msix_mmio:
>> -	kvm__deregister_mmio(kvm, msix_io_block);
>> -free_mmio:
>> -	kvm__deregister_mmio(kvm, mmio_addr);
>> -free_ioport:
>> -	ioport__unregister(kvm, port_addr);
>> -	return r;
>>  }
>>  
>>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)


* Re: [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-02-06 18:21   ` Andre Przywara
@ 2020-02-07 11:08     ` Alexandru Elisei
  2020-02-07 11:36       ` Andre Przywara
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-07 11:08 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/6/20 6:21 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:48:01 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> During configuration of the BAR addresses, a Linux guest disables and
>> enables access to I/O and memory space. When access is disabled, we don't
>> stop emulating the memory regions described by the BARs. Now that we have
>> callbacks for activating and deactivating emulation for a BAR region,
>> let's use that to stop emulation when access is disabled, and
>> re-activate it when access is re-enabled.
>>
>> The vesa emulation hasn't been designed with toggling on and off in
>> mind, so refuse writes to the PCI command register that disable memory
>> or IO access.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  hw/vesa.c | 16 ++++++++++++++++
>>  pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index 74ebebbefa6b..3044a86078fb 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>>  	return -EINVAL;
>>  }
>>  
>> +static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +				u8 offset, void *data, int sz)
>> +{
>> +	u32 value;
> I guess the same comment as on the other patch applies: using u64 looks safer to me. Also you should clear it, to avoid nasty surprises in case of a short write (1 or 2 bytes only).

I was under the impression that the maximum size for a write to the PCI CAM or
ECAM space is 32 bits. This is certainly what I've seen when running Linux, and
it is the assumption in the PCI emulation code, which has been working since
2010. I'm trying to dig out more information about this.

If it's not, then we have a bigger problem, because the PCI emulation code
doesn't support it, and we would need to add a fair amount of logic to deal
with it: what if a write hits the command register and an adjacent register?
What if a write hits two BARs? A BAR and a regular register before/after it?
Part of a BAR and two registers before/after it? You can see where this is
going.

Until we find exactly where a PCI spec says that 64-bit writes to the
configuration space are allowed, I would rather avoid all this complexity and
assume that the guest is sane and will only write 32-bit values.
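To make that sanity assumption explicit, a guard along these lines could reject
anything other than a naturally aligned 1-, 2- or 4-byte config access. This is
a hedged sketch, not part of the patch, and the helper name is made up:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch, not kvmtool code: accept only the access sizes
 * the PCI configuration mechanisms define (1, 2 or 4 bytes). A
 * naturally aligned access of those sizes can never straddle two
 * 32-bit configuration registers, so none of the overlap cases above
 * can arise.
 */
static bool cfg_access_ok(unsigned int offset, int size)
{
	if (size != 1 && size != 2 && size != 4)
		return false;	/* e.g. a 64-bit guest write is rejected */

	/* natural alignment: offset must be a multiple of the size */
	return (offset & (unsigned int)(size - 1)) == 0;
}
```

With such a filter in front of pci__config_wr(), the "write hits two BARs"
cases never need to be handled at all.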

Thanks,
Alex
>
> The rest looks alright.
>
> Cheers,
> Andre
>
>> +
>> +	if (offset == PCI_COMMAND) {
>> +		memcpy(&value, data, sz);
>> +		value |= (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
>> +		memcpy(data, &value, sz);
>> +	}
>> +}
>> +
>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>  {
>>  	struct vesa_dev *vdev;
>> @@ -114,6 +126,10 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  		.bar_size[1]		= VESA_MEM_SIZE,
>>  	};
>>  
>> +	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
>> +		.write	= vesa__pci_cfg_write,
>> +	};
>> +
>>  	vdev->fb = (struct framebuffer) {
>>  		.width			= VESA_WIDTH,
>>  		.height			= VESA_HEIGHT,
>> diff --git a/pci.c b/pci.c
>> index 5412f2defa2e..98331a1fc205 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -157,6 +157,42 @@ static struct ioport_operations pci_config_data_ops = {
>>  	.io_out	= pci_config_data_out,
>>  };
>>  
>> +static void pci_config_command_wr(struct kvm *kvm,
>> +				  struct pci_device_header *pci_hdr,
>> +				  u16 new_command)
>> +{
>> +	int i;
>> +	bool toggle_io, toggle_mem;
>> +
>> +	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
>> +	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
>> +
>> +	for (i = 0; i < 6; i++) {
>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>> +			continue;
>> +
>> +		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>> +			if (__pci__io_space_enabled(new_command))
>> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> +							 pci_hdr->data);
>> +			else
>> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> +							   pci_hdr->data);
>> +		}
>> +
>> +		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>> +			if (__pci__memory_space_enabled(new_command))
>> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> +							 pci_hdr->data);
>> +			else
>> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> +							   pci_hdr->data);
>> +		}
>> +	}
>> +
>> +	pci_hdr->command = new_command;
>> +}
>> +
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> @@ -182,6 +218,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	if (*(u32 *)(base + offset) == 0)
>>  		return;
>>  
>> +	if (offset == PCI_COMMAND) {
>> +		memcpy(&value, data, size);
>> +		pci_config_command_wr(kvm, pci_hdr, (u16)value);
>> +		return;
>> +	}
>> +
>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>>  
>>  	/*

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-02-07 11:08     ` Alexandru Elisei
@ 2020-02-07 11:36       ` Andre Przywara
  2020-02-07 11:44         ` Alexandru Elisei
  2020-03-09 14:54         ` Alexandru Elisei
  0 siblings, 2 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 11:36 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Fri, 7 Feb 2020 11:08:19 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> On 2/6/20 6:21 PM, Andre Przywara wrote:
> > On Thu, 23 Jan 2020 13:48:01 +0000
> > Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >
> > Hi,
> >  
> >> During configuration of the BAR addresses, a Linux guest disables and
> >> enables access to I/O and memory space. When access is disabled, we don't
> >> stop emulating the memory regions described by the BARs. Now that we have
> >> callbacks for activating and deactivating emulation for a BAR region,
> >> let's use that to stop emulation when access is disabled, and
> >> re-activate it when access is re-enabled.
> >>
> >> The vesa emulation hasn't been designed with toggling on and off in
> >> mind, so refuse writes to the PCI command register that disable memory
> >> or IO access.
> >>
> >> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >> ---
> >>  hw/vesa.c | 16 ++++++++++++++++
> >>  pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 58 insertions(+)
> >>
> >> diff --git a/hw/vesa.c b/hw/vesa.c
> >> index 74ebebbefa6b..3044a86078fb 100644
> >> --- a/hw/vesa.c
> >> +++ b/hw/vesa.c
> >> @@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
> >>  	return -EINVAL;
> >>  }
> >>  
> >> +static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> >> +				u8 offset, void *data, int sz)
> >> +{
> >> +	u32 value;  
> > I guess the same comment as on the other patch applies: using u64 looks safer to me. Also you should clear it, to avoid nasty surprises in case of a short write (1 or 2 bytes only).  
> 
> I was under the impression that the maximum size for a write to the PCI CAM or
> ECAM space is 32 bits. This is certainly what I've seen when running Linux, and
> it is the assumption in the PCI emulation code, which has been working since
> 2010. I'm trying to dig out more information about this.
> 
> If it's not, then we have a bigger problem, because the PCI emulation code
> doesn't support it, and we would need to add a fair amount of logic to deal
> with it: what if a write hits the command register and an adjacent register?
> What if a write hits two BARs? A BAR and a regular register before/after it?
> Part of a BAR and two registers before/after it? You can see where this is
> going.
> 
> Until we find exactly where a PCI spec says that 64-bit writes to the
> configuration space are allowed, I would rather avoid all this complexity and
> assume that the guest is sane and will only write 32-bit values.

I don't think it's allowed, but that's not the point here:
If a (malicious?) guest does a 64-bit write, it will overwrite kvmtool's stack. We should not allow that. We don't need to behave correctly, but the guest should not be able to affect the host (VMM). All it should take is to have "u64 value = 0;" to fix that.

Another possibility would be to filter for legal MMIO lengths earlier.
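The "u64 value = 0;" fix can be sketched in isolation: the zero initialization
keeps short writes predictable, and a size clamp means an over-long guest write
cannot overrun the VMM's stack buffer. The helper name is hypothetical, not
kvmtool code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy a guest-supplied config-space write into a
 * zero-initialized 64-bit buffer. Zero-init keeps the bytes untouched
 * by a short (1- or 2-byte) write well defined; the clamp stops a
 * larger-than-expected size from overwriting the host's stack.
 */
static uint64_t read_guest_value(const void *data, int size)
{
	uint64_t value = 0;	/* the "u64 value = 0;" suggested above */

	if (size < 0)
		return 0;
	if (size > (int)sizeof(value))
		size = sizeof(value);	/* clamp: never copy past the buffer */
	memcpy(&value, data, (size_t)size);
	return value;
}
```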

Cheers,
Andre.

> 
> Thanks,
> Alex
> >
> > The rest looks alright.
> >
> > Cheers,
> > Andre
> >  
> >> +
> >> +	if (offset == PCI_COMMAND) {
> >> +		memcpy(&value, data, sz);
> >> +		value |= (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
> >> +		memcpy(data, &value, sz);
> >> +	}
> >> +}
> >> +
> >>  struct framebuffer *vesa__init(struct kvm *kvm)
> >>  {
> >>  	struct vesa_dev *vdev;
> >> @@ -114,6 +126,10 @@ struct framebuffer *vesa__init(struct kvm *kvm)
> >>  		.bar_size[1]		= VESA_MEM_SIZE,
> >>  	};
> >>  
> >> +	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
> >> +		.write	= vesa__pci_cfg_write,
> >> +	};
> >> +
> >>  	vdev->fb = (struct framebuffer) {
> >>  		.width			= VESA_WIDTH,
> >>  		.height			= VESA_HEIGHT,
> >> diff --git a/pci.c b/pci.c
> >> index 5412f2defa2e..98331a1fc205 100644
> >> --- a/pci.c
> >> +++ b/pci.c
> >> @@ -157,6 +157,42 @@ static struct ioport_operations pci_config_data_ops = {
> >>  	.io_out	= pci_config_data_out,
> >>  };
> >>  
> >> +static void pci_config_command_wr(struct kvm *kvm,
> >> +				  struct pci_device_header *pci_hdr,
> >> +				  u16 new_command)
> >> +{
> >> +	int i;
> >> +	bool toggle_io, toggle_mem;
> >> +
> >> +	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
> >> +	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
> >> +
> >> +	for (i = 0; i < 6; i++) {
> >> +		if (!pci_bar_is_implemented(pci_hdr, i))
> >> +			continue;
> >> +
> >> +		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
> >> +			if (__pci__io_space_enabled(new_command))
> >> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> >> +							 pci_hdr->data);
> >> +			else
> >> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> >> +							   pci_hdr->data);
> >> +		}
> >> +
> >> +		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
> >> +			if (__pci__memory_space_enabled(new_command))
> >> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> >> +							 pci_hdr->data);
> >> +			else
> >> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> >> +							   pci_hdr->data);
> >> +		}
> >> +	}
> >> +
> >> +	pci_hdr->command = new_command;
> >> +}
> >> +
> >>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> >>  {
> >>  	void *base;
> >> @@ -182,6 +218,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
> >>  	if (*(u32 *)(base + offset) == 0)
> >>  		return;
> >>  
> >> +	if (offset == PCI_COMMAND) {
> >> +		memcpy(&value, data, size);
> >> +		pci_config_command_wr(kvm, pci_hdr, (u16)value);
> >> +		return;
> >> +	}
> >> +
> >>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
> >>  
> >>  	/*  


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-02-07 11:36       ` Andre Przywara
@ 2020-02-07 11:44         ` Alexandru Elisei
  2020-03-09 14:54         ` Alexandru Elisei
  1 sibling, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-07 11:44 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/7/20 11:36 AM, Andre Przywara wrote:
> On Fri, 7 Feb 2020 11:08:19 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> On 2/6/20 6:21 PM, Andre Przywara wrote:
>>> On Thu, 23 Jan 2020 13:48:01 +0000
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>
>>> Hi,
>>>  
>>>> During configuration of the BAR addresses, a Linux guest disables and
>>>> enables access to I/O and memory space. When access is disabled, we don't
>>>> stop emulating the memory regions described by the BARs. Now that we have
>>>> callbacks for activating and deactivating emulation for a BAR region,
>>>> let's use that to stop emulation when access is disabled, and
>>>> re-activate it when access is re-enabled.
>>>>
>>>> The vesa emulation hasn't been designed with toggling on and off in
>>>> mind, so refuse writes to the PCI command register that disable memory
>>>> or IO access.
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>> ---
>>>>  hw/vesa.c | 16 ++++++++++++++++
>>>>  pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 58 insertions(+)
>>>>
>>>> diff --git a/hw/vesa.c b/hw/vesa.c
>>>> index 74ebebbefa6b..3044a86078fb 100644
>>>> --- a/hw/vesa.c
>>>> +++ b/hw/vesa.c
>>>> @@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>>>>  	return -EINVAL;
>>>>  }
>>>>  
>>>> +static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> +				u8 offset, void *data, int sz)
>>>> +{
>>>> +	u32 value;  
>>> I guess the same comment as on the other patch applies: using u64 looks safer to me. Also you should clear it, to avoid nasty surprises in case of a short write (1 or 2 bytes only).  
>> I was under the impression that the maximum size for a write to the PCI CAM or
>> ECAM space is 32 bits. This is certainly what I've seen when running Linux, and
>> it is the assumption in the PCI emulation code, which has been working since
>> 2010. I'm trying to dig out more information about this.
>>
>> If it's not, then we have a bigger problem, because the PCI emulation code
>> doesn't support it, and we would need to add a fair amount of logic to deal
>> with it: what if a write hits the command register and an adjacent register?
>> What if a write hits two BARs? A BAR and a regular register before/after it?
>> Part of a BAR and two registers before/after it? You can see where this is
>> going.
>>
>> Until we find exactly where a PCI spec says that 64-bit writes to the
>> configuration space are allowed, I would rather avoid all this complexity and
>> assume that the guest is sane and will only write 32-bit values.
> I don't think it's allowed, but that's not the point here:
> If a (malicious?) guest does a 64-bit write, it will overwrite kvmtool's stack. We should not allow that. We don't need to behave correctly, but the guest should not be able to affect the host (VMM). All it should take is to have "u64 value = 0;" to fix that.
>
> Another possibility would be to filter for legal MMIO lengths earlier.

I would rather respond with a MASTER ABORT to accesses to the PCI config space
that are wider than 32 bits. I think this would be more robust.
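For reference, the guest-visible effect of a master abort on a configuration
read is well defined: the write is dropped and the read completes with all
ones. A minimal sketch of the read side (hypothetical helper, not kvmtool
code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: emulate what the guest observes when a config
 * read terminates with a master abort -- the returned data is all
 * ones, regardless of the access width.
 */
static void cfg_read_master_abort(void *data, int size)
{
	memset(data, 0xff, (size_t)size);
}
```

A rejected over-wide write would simply be discarded, with no state change.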

Thanks,
Alex
>
> Cheers,
> Andre.
>
>> Thanks,
>> Alex
>>> The rest looks alright.
>>>
>>> Cheers,
>>> Andre
>>>  
>>>> +
>>>> +	if (offset == PCI_COMMAND) {
>>>> +		memcpy(&value, data, sz);
>>>> +		value |= (PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
>>>> +		memcpy(data, &value, sz);
>>>> +	}
>>>> +}
>>>> +
>>>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>>>  {
>>>>  	struct vesa_dev *vdev;
>>>> @@ -114,6 +126,10 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>>>  		.bar_size[1]		= VESA_MEM_SIZE,
>>>>  	};
>>>>  
>>>> +	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
>>>> +		.write	= vesa__pci_cfg_write,
>>>> +	};
>>>> +
>>>>  	vdev->fb = (struct framebuffer) {
>>>>  		.width			= VESA_WIDTH,
>>>>  		.height			= VESA_HEIGHT,
>>>> diff --git a/pci.c b/pci.c
>>>> index 5412f2defa2e..98331a1fc205 100644
>>>> --- a/pci.c
>>>> +++ b/pci.c
>>>> @@ -157,6 +157,42 @@ static struct ioport_operations pci_config_data_ops = {
>>>>  	.io_out	= pci_config_data_out,
>>>>  };
>>>>  
>>>> +static void pci_config_command_wr(struct kvm *kvm,
>>>> +				  struct pci_device_header *pci_hdr,
>>>> +				  u16 new_command)
>>>> +{
>>>> +	int i;
>>>> +	bool toggle_io, toggle_mem;
>>>> +
>>>> +	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
>>>> +	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
>>>> +
>>>> +	for (i = 0; i < 6; i++) {
>>>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>>>> +			continue;
>>>> +
>>>> +		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>>>> +			if (__pci__io_space_enabled(new_command))
>>>> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>>>> +							 pci_hdr->data);
>>>> +			else
>>>> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>>>> +							   pci_hdr->data);
>>>> +		}
>>>> +
>>>> +		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>>>> +			if (__pci__memory_space_enabled(new_command))
>>>> +				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>>>> +							 pci_hdr->data);
>>>> +			else
>>>> +				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>>>> +							   pci_hdr->data);
>>>> +		}
>>>> +	}
>>>> +
>>>> +	pci_hdr->command = new_command;
>>>> +}
>>>> +
>>>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>>>  {
>>>>  	void *base;
>>>> @@ -182,6 +218,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>>>  	if (*(u32 *)(base + offset) == 0)
>>>>  		return;
>>>>  
>>>> +	if (offset == PCI_COMMAND) {
>>>> +		memcpy(&value, data, size);
>>>> +		pci_config_command_wr(kvm, pci_hdr, (u16)value);
>>>> +		return;
>>>> +	}
>>>> +
>>>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>>>>  
>>>>  	/*  

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation
  2020-02-07 10:12     ` Alexandru Elisei
@ 2020-02-07 15:39       ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-02-07 15:39 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/7/20 10:12 AM, Alexandru Elisei wrote:
> Hi,
>
> On 2/6/20 6:21 PM, Andre Przywara wrote:
>> On Thu, 23 Jan 2020 13:48:00 +0000
>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>
>> Hi,
>>
>>> Implement callbacks for activating and deactivating emulation for a BAR
>>> region. This is in preparation for allowing a guest operating system to
>>> enable and disable access to I/O or memory space, or to reassign the
>>> BARs.
>>>
>>> The emulated vesa device has been refactored in the process and the static
>>> variables were removed in order to make using the callbacks less painful.
>>> The framebuffer isn't designed to allow stopping and restarting at
>>> arbitrary points in the guest execution. Furthermore, on x86, the kernel
>>> will not change the BAR addresses, which on bare metal are programmed by
>>> the firmware, so take the easy way out and refuse to deactivate emulation
>>> for the BAR regions.
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>  hw/vesa.c         | 120 ++++++++++++++++++++++++++++++++--------------
>>>  include/kvm/pci.h |  19 +++++++-
>>>  pci.c             |  44 +++++++++++++++++
>>>  vfio/pci.c        | 100 +++++++++++++++++++++++++++++++-------
>>>  virtio/pci.c      |  90 ++++++++++++++++++++++++----------
>>>  5 files changed, 294 insertions(+), 79 deletions(-)
>>>
>>> diff --git a/hw/vesa.c b/hw/vesa.c
>>> index e988c0425946..74ebebbefa6b 100644
>>> --- a/hw/vesa.c
>>> +++ b/hw/vesa.c
>>> @@ -18,6 +18,12 @@
>>>  #include <inttypes.h>
>>>  #include <unistd.h>
>>>  
>>> +struct vesa_dev {
>>> +	struct pci_device_header	pci_hdr;
>>> +	struct device_header		dev_hdr;
>>> +	struct framebuffer		fb;
>>> +};
>>> +
>>>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>>>  {
>>>  	return true;
>>> @@ -33,29 +39,52 @@ static struct ioport_operations vesa_io_ops = {
>>>  	.io_out			= vesa_pci_io_out,
>>>  };
>>>  
>>> -static struct pci_device_header vesa_pci_device = {
>>> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>>> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
>>> -	.revision_id		= 0,
>>> -	.class[2]		= 0x03,
>>> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>>> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>>> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>>> -	.bar_size[1]		= VESA_MEM_SIZE,
>>> -};
>>> +static int vesa__bar_activate(struct kvm *kvm,
>>> +			      struct pci_device_header *pci_hdr,
>>> +			      int bar_num, void *data)
>>> +{
>>> +	struct vesa_dev *vdev = data;
>>> +	u32 bar_addr, bar_size;
>>> +	char *mem;
>>> +	int r;
>>>  
>>> -static struct device_header vesa_device = {
>>> -	.bus_type	= DEVICE_BUS_PCI,
>>> -	.data		= &vesa_pci_device,
>>> -};
>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>> +	bar_size = pci_hdr->bar_size[bar_num];
>>>  
>>> -static struct framebuffer vesafb;
>>> +	switch (bar_num) {
>>> +	case 0:
>>> +		r = ioport__register(kvm, bar_addr, &vesa_io_ops, bar_size,
>>> +				     NULL);
>>> +		break;
>>> +	case 1:
>>> +		mem = mmap(NULL, bar_size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>>> +		if (mem == MAP_FAILED) {
>>> +			r = -errno;
>>> +			break;
>>> +		}
>>> +		r = kvm__register_dev_mem(kvm, bar_addr, bar_size, mem);
>>> +		if (r < 0)
>>> +			break;
>>> +		vdev->fb.mem = mem;
>>> +		break;
>>> +	default:
>>> +		r = -EINVAL;
>>> +	}
>>> +
>>> +	return r;
>>> +}
>>> +
>>> +static int vesa__bar_deactivate(struct kvm *kvm,
>>> +				struct pci_device_header *pci_hdr,
>>> +				int bar_num, void *data)
>>> +{
>>> +	return -EINVAL;
>>> +}
>>>  
>>>  struct framebuffer *vesa__init(struct kvm *kvm)
>>>  {
>>> -	u16 vesa_base_addr;
>>> -	char *mem;
>>> +	struct vesa_dev *vdev;
>>> +	u16 port_addr;
>>>  	int r;
>>>  
>>>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
>>> @@ -63,34 +92,51 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>>  
>>>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>>>  		return NULL;
>>> -	r = pci_get_io_port_block(PCI_IO_SIZE);
>>> -	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
>>> -	if (r < 0)
>>> -		return ERR_PTR(r);
>>>  
>>> -	vesa_base_addr			= (u16)r;
>>> -	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>>> -	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>>> -	r = device__register(&vesa_device);
>>> -	if (r < 0)
>>> -		return ERR_PTR(r);
>>> +	vdev = calloc(1, sizeof(*vdev));
>>> +	if (vdev == NULL)
>>> +		return ERR_PTR(-ENOMEM);
>> Is it really necessary to allocate this here? You never free this, and I don't see how you could actually do this. AFAICS conceptually there can be only one VESA device? So maybe have a static variable above and use that instead of passing the pointer around? Or use &vdev if you need a pointer argument for the callbacks.
> As far as I can tell, there can be only one VESA device, yes. I was following the
> same pattern from virtio/{net,blk,rng,scsi,9p}.c, which I prefer because it's
> explicit what function can access the device. What's wrong with passing the
> pointer around? The entire PCI emulation code works like that.

Coming back to this, I did some testing on my x86 machine and kvmtool breaks
spectacularly if you specify more than one UI option (more than one of --sdl,
--gtk, --vnc). I'm not sure what the original intent was, but right now specifying
only one option (and having one VESA device) is the only configuration that works.
I'll write a patch to make sure that the user specifies only one option.

I also looked at virtio/console.c and virtio/balloon.c, because only one instance
can be created for a VM. They too declare the device struct as static, and in
console.c the usage is inconsistent: in some callbacks, they use the device
argument, in others they use the static device directly. This is exactly the kind
of thing that I am trying to avoid (for this patch and future patches).
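The pattern being argued for here -- handing the device to each callback
through its data pointer instead of reaching for a file-scope static -- can be
sketched like this (names are illustrative, not kvmtool's):

```c
#include <assert.h>

struct kvm;	/* opaque stand-in for kvmtool's struct kvm */

struct demo_dev {
	int bar_active;
};

/*
 * The callback only touches state reachable through 'data', so it is
 * explicit which device instance it operates on, and a second instance
 * would work without any code changes -- unlike a static device, which
 * invites the inconsistent usage described above.
 */
static int demo_bar_activate(struct kvm *kvm, int bar_num, void *data)
{
	struct demo_dev *dev = data;

	(void)kvm;
	(void)bar_num;
	dev->bar_active = 1;
	return 0;
}
```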

Thanks,
Alex
>
>>>  
>>> -	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>>> -	if (mem == MAP_FAILED)
>>> -		return ERR_PTR(-errno);
>>> +	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>>  
>>> -	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
>>> -	if (r < 0)
>>> -		return ERR_PTR(r);
>>> +	vdev->pci_hdr = (struct pci_device_header) {
>>> +		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> +		.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
>>> +		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
>>> +		.header_type		= PCI_HEADER_TYPE_NORMAL,
>>> +		.revision_id		= 0,
>>> +		.class[2]		= 0x03,
>>> +		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>>> +		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>>> +		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
>>> +		.bar_size[0]		= PCI_IO_SIZE,
>>> +		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>>> +		.bar_size[1]		= VESA_MEM_SIZE,
>>> +	};
>>>  
>>> -	vesafb = (struct framebuffer) {
>>> +	vdev->fb = (struct framebuffer) {
>>>  		.width			= VESA_WIDTH,
>>>  		.height			= VESA_HEIGHT,
>>>  		.depth			= VESA_BPP,
>>> -		.mem			= mem,
>>> +		.mem			= NULL,
>>>  		.mem_addr		= VESA_MEM_ADDR,
>>>  		.mem_size		= VESA_MEM_SIZE,
>>>  		.kvm			= kvm,
>>>  	};
>>> -	return fb__register(&vesafb);
>>> +
>>> +	r = pci__register_bar_regions(kvm, &vdev->pci_hdr, vesa__bar_activate,
>>> +				      vesa__bar_deactivate, vdev);
>>> +	if (r < 0)
>>> +		return ERR_PTR(r);
>>> +
>>> +	vdev->dev_hdr = (struct device_header) {
>>> +		.bus_type       = DEVICE_BUS_PCI,
>>> +		.data           = &vdev->pci_hdr,
>>> +	};
>>> +
>>> +	r = device__register(&vdev->dev_hdr);
>>> +	if (r < 0)
>>> +		return ERR_PTR(r);
>>> +
>>> +	return fb__register(&vdev->fb);
>>>  }
>>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>>> index 235cd82fff3c..bf42f497168f 100644
>>> --- a/include/kvm/pci.h
>>> +++ b/include/kvm/pci.h
>>> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>>>  	u8	next;
>>>  };
>>>  
>>> +struct pci_device_header;
>>> +
>>> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>>> +				 struct pci_device_header *pci_hdr,
>>> +				 int bar_num, void *data);
>>> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>>> +				   struct pci_device_header *pci_hdr,
>>> +				   int bar_num, void *data);
>>> +
>>>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>>>  #define PCI_DEV_CFG_SIZE	256
>>>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>>  
>>> -struct pci_device_header;
>>> -
>>>  struct pci_config_operations {
>>>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>  		      u8 offset, void *data, int sz);
>>> @@ -136,6 +143,9 @@ struct pci_device_header {
>>>  
>>>  	/* Private to lkvm */
>>>  	u32		bar_size[6];
>>> +	bar_activate_fn_t	bar_activate_fn;
>>> +	bar_deactivate_fn_t	bar_deactivate_fn;
>>> +	void *data;
>>>  	struct pci_config_operations	cfg_ops;
>>>  	/*
>>>  	 * PCI INTx# are level-triggered, but virtual device often feature
>>> @@ -160,8 +170,13 @@ void pci__assign_irq(struct device_header *dev_hdr);
>>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>>>  
>>> +
>> Stray empty line?
> Indeed, will get rid of it.
>
> Thanks,
> Alex
>> Cheers,
>> Andre
>>
>>>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>>>  
>>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +			      bar_activate_fn_t bar_activate_fn,
>>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
>>> +
>>>  static inline bool __pci__memory_space_enabled(u16 command)
>>>  {
>>>  	return command & PCI_COMMAND_MEMORY;
>>> diff --git a/pci.c b/pci.c
>>> index 4f7b863298f6..5412f2defa2e 100644
>>> --- a/pci.c
>>> +++ b/pci.c
>>> @@ -66,6 +66,11 @@ void pci__assign_irq(struct device_header *dev_hdr)
>>>  		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
>>>  }
>>>  
>>> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>>> +{
>>> +	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
>>> +}
>>> +
>>>  static void *pci_config_address_ptr(u16 port)
>>>  {
>>>  	unsigned long offset;
>>> @@ -264,6 +269,45 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>>>  	return hdr->data;
>>>  }
>>>  
>>> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>> +			      bar_activate_fn_t bar_activate_fn,
>>> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
>>> +{
>>> +	int i, r;
>>> +	bool has_bar_regions = false;
>>> +
>>> +	assert(bar_activate_fn && bar_deactivate_fn);
>>> +
>>> +	pci_hdr->bar_activate_fn = bar_activate_fn;
>>> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
>>> +	pci_hdr->data = data;
>>> +
>>> +	for (i = 0; i < 6; i++) {
>>> +		if (!pci_bar_is_implemented(pci_hdr, i))
>>> +			continue;
>>> +
>>> +		has_bar_regions = true;
>>> +
>>> +		if (pci__bar_is_io(pci_hdr, i) &&
>>> +		    pci__io_space_enabled(pci_hdr)) {
>>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>>> +				if (r < 0)
>>> +					return r;
>>> +			}
>>> +
>>> +		if (pci__bar_is_memory(pci_hdr, i) &&
>>> +		    pci__memory_space_enabled(pci_hdr)) {
>>> +				r = bar_activate_fn(kvm, pci_hdr, i, data);
>>> +				if (r < 0)
>>> +					return r;
>>> +			}
>>> +	}
>>> +
>>> +	assert(has_bar_regions);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  int pci__init(struct kvm *kvm)
>>>  {
>>>  	int r;
>>> diff --git a/vfio/pci.c b/vfio/pci.c
>>> index 8a775a4a4a54..9e595562180b 100644
>>> --- a/vfio/pci.c
>>> +++ b/vfio/pci.c
>>> @@ -446,6 +446,83 @@ out_unlock:
>>>  	mutex_unlock(&pdev->msi.mutex);
>>>  }
>>>  
>>> +static int vfio_pci_bar_activate(struct kvm *kvm,
>>> +				 struct pci_device_header *pci_hdr,
>>> +				 int bar_num, void *data)
>>> +{
>>> +	struct vfio_device *vdev = data;
>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>> +	struct vfio_region *region = &vdev->regions[bar_num];
>>> +	int ret;
>>> +
>>> +	if (!region->info.size) {
>>> +		ret = -EINVAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>> +	    (u32)bar_num == table->bar) {
>>> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>> +					 table->size, false,
>>> +					 vfio_pci_msix_table_access, pdev);
>>> +		if (ret < 0 || table->bar!= pba->bar)
>>> +			goto out;
>>> +	}
>>> +
>>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>> +	    (u32)bar_num == pba->bar) {
>>> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>> +					 pba->size, false,
>>> +					 vfio_pci_msix_pba_access, pdev);
>>> +		goto out;
>>> +	}
>>> +
>>> +	ret = vfio_map_region(kvm, vdev, region);
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
>>> +				   struct pci_device_header *pci_hdr,
>>> +				   int bar_num, void *data)
>>> +{
>>> +	struct vfio_device *vdev = data;
>>> +	struct vfio_pci_device *pdev = &vdev->pci;
>>> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>> +	struct vfio_region *region = &vdev->regions[bar_num];
>>> +	int ret;
>>> +	bool success;
>>> +
>>> +	if (!region->info.size) {
>>> +		ret = -EINVAL;
>>> +		goto out;
>>> +	}
>>> +
>>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>> +	    (u32)bar_num == table->bar) {
>>> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
>>> +		ret = (success ? 0 : -EINVAL);
>>> +		if (ret < 0 || table->bar!= pba->bar)
>>> +			goto out;
>>> +	}
>>> +
>>> +	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>> +	    (u32)bar_num == pba->bar) {
>>> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
>>> +		ret = (success ? 0 : -EINVAL);
>>> +		goto out;
>>> +	}
>>> +
>>> +	vfio_unmap_region(kvm, region);
>>> +	ret = 0;
>>> +
>>> +out:
>>> +	return ret;
>>> +}
>>> +
>>>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>  			      u8 offset, void *data, int sz)
>>>  {
>>> @@ -804,12 +881,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>  		ret = -ENOMEM;
>>>  		goto out_free;
>>>  	}
>>> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>> -
>>> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
>>> -				 false, vfio_pci_msix_table_access, pdev);
>>> -	if (ret < 0)
>>> -		goto out_free;
>>>  
>>>  	/*
>>>  	 * We could map the physical PBA directly into the guest, but it's
>>> @@ -819,10 +890,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>>>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>>>  	 * virtual PBA.
>>>  	 */
>>> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
>>> -				 vfio_pci_msix_pba_access, pdev);
>>> -	if (ret < 0)
>>> -		goto out_free;
>>> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>>>  
>>>  	pdev->msix.entries = entries;
>>>  	pdev->msix.nr_entries = nr_entries;
>>> @@ -893,11 +961,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>>>  	}
>>>  
>>> -	/* Map the BARs into the guest or setup a trap region. */
>>> -	ret = vfio_map_region(kvm, vdev, region);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>>  	return 0;
>>>  }
>>>  
>>> @@ -944,7 +1007,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>>>  	}
>>>  
>>>  	/* We've configured the BARs, fake up a Configuration Space */
>>> -	return vfio_pci_fixup_cfg_space(vdev);
>>> +	ret = vfio_pci_fixup_cfg_space(vdev);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
>>> +					 vfio_pci_bar_deactivate, vdev);
>>>  }
>>>  
>>>  /*
>>> diff --git a/virtio/pci.c b/virtio/pci.c
>>> index c4822514856c..5a3cc6f1e943 100644
>>> --- a/virtio/pci.c
>>> +++ b/virtio/pci.c
>>> @@ -474,6 +474,65 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>>>  }
>>>  
>>> +static int virtio_pci__bar_activate(struct kvm *kvm,
>>> +				    struct pci_device_header *pci_hdr,
>>> +				    int bar_num, void *data)
>>> +{
>>> +	struct virtio_device *vdev = data;
>>> +	u32 bar_addr, bar_size;
>>> +	int r;
>>> +
>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>> +	bar_size = pci_hdr->bar_size[bar_num];
>>> +
>>> +	switch (bar_num) {
>>> +	case 0:
>>> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
>>> +				     bar_size, vdev);
>>> +		if (r > 0)
>>> +			r = 0;
>>> +		break;
>>> +	case 1:
>>> +		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>> +					virtio_pci__io_mmio_callback, vdev);
>>> +		break;
>>> +	case 2:
>>> +		r = kvm__register_mmio(kvm, bar_addr, bar_size, false,
>>> +					virtio_pci__msix_mmio_callback, vdev);
>>> +		break;
>>> +	default:
>>> +		r = -EINVAL;
>>> +	}
>>> +
>>> +	return r;
>>> +}
>>> +
>>> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
>>> +				      struct pci_device_header *pci_hdr,
>>> +				      int bar_num, void *data)
>>> +{
>>> +	u32 bar_addr;
>>> +	bool success;
>>> +	int r;
>>> +
>>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>>> +
>>> +	switch (bar_num) {
>>> +	case 0:
>>> +		r = ioport__unregister(kvm, bar_addr);
>>> +		break;
>>> +	case 1:
>>> +	case 2:
>>> +		success = kvm__deregister_mmio(kvm, bar_addr);
>>> +		r = (success ? 0 : -EINVAL);
>>> +		break;
>>> +	default:
>>> +		r = -EINVAL;
>>> +	}
>>> +
>>> +	return r;
>>> +}
>>> +
>>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  		     int device_id, int subsys_id, int class)
>>>  {
>>> @@ -488,23 +547,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>>>  
>>>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
>>> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
>>> -			     vdev);
>>> -	if (r < 0)
>>> -		return r;
>>> -	port_addr = (u16)r;
>>> -
>>>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>>> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
>>> -			       virtio_pci__io_mmio_callback, vdev);
>>> -	if (r < 0)
>>> -		goto free_ioport;
>>> -
>>>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>>> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
>>> -			       virtio_pci__msix_mmio_callback, vdev);
>>> -	if (r < 0)
>>> -		goto free_mmio;
>>>  
>>>  	vpci->pci_hdr = (struct pci_device_header) {
>>>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
>>> @@ -530,6 +574,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>>>  	};
>>>  
>>> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
>>> +				      virtio_pci__bar_activate,
>>> +				      virtio_pci__bar_deactivate, vdev);
>>> +	if (r < 0)
>>> +		return r;
>>> +
>>>  	vpci->dev_hdr = (struct device_header) {
>>>  		.bus_type		= DEVICE_BUS_PCI,
>>>  		.data			= &vpci->pci_hdr,
>>> @@ -560,20 +610,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>>  
>>>  	r = device__register(&vpci->dev_hdr);
>>>  	if (r < 0)
>>> -		goto free_msix_mmio;
>>> +		return r;
>>>  
>>>  	/* save the IRQ that device__register() has allocated */
>>>  	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
>>>  
>>>  	return 0;
>>> -
>>> -free_msix_mmio:
>>> -	kvm__deregister_mmio(kvm, msix_io_block);
>>> -free_mmio:
>>> -	kvm__deregister_mmio(kvm, mmio_addr);
>>> -free_ioport:
>>> -	ioport__unregister(kvm, port_addr);
>>> -	return r;
>>>  }
>>>  
>>>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs
  2020-01-23 13:48 ` [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-02-07 16:50   ` Andre Przywara
  2020-03-10 14:17     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 16:50 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:48:02 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> BARs are used by the guest to configure the access to the PCI device by
> writing the address to which the device will respond. The basic idea for
> adding support for reassignable BARs is straightforward: deactivate
> emulation for the memory region described by the old BAR value, and
> activate emulation for the new region.
> 
> BAR reassignment can be done while device access is enabled and memory
> regions for different devices can overlap as long as no access is made
> to the overlapping memory regions. This means that it is legal for the
> BARs of two distinct devices to point to an overlapping memory region,
> and indeed, this is how Linux does resource assignment at boot. To
> account for this situation, the simple algorithm described above is
> enhanced to scan for all devices and:
> 
> - Deactivate emulation for any BARs that might overlap with the new BAR
>   value.
> 
> - Enable emulation for any BARs that were overlapping with the old value
>   after the BAR has been updated.
> 
> Activating/deactivating emulation of a memory region has side effects.
> In order to prevent the same callback from being executed twice, we now
> keep track of the emulation state of each region. Running a callback
> twice can happen, for example, if we program a BAR with an address that
> overlaps a second BAR, thus deactivating emulation for the second BAR,
> and then disable all region accesses to the second BAR by writing to the
> command register.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c           |   6 +-
>  include/kvm/pci.h   |  23 +++-
>  pci.c               | 274 +++++++++++++++++++++++++++++++++++---------
>  powerpc/spapr_pci.c |   2 +-
>  vfio/pci.c          |  15 ++-
>  virtio/pci.c        |   8 +-
>  6 files changed, 261 insertions(+), 67 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index 3044a86078fb..aca938f79c82 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -49,7 +49,7 @@ static int vesa__bar_activate(struct kvm *kvm,
>  	int r;
>  
>  	bar_addr = pci__bar_address(pci_hdr, bar_num);
> -	bar_size = pci_hdr->bar_size[bar_num];
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>  
>  	switch (bar_num) {
>  	case 0:
> @@ -121,9 +121,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>  		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>  		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
> -		.bar_size[0]		= PCI_IO_SIZE,
> +		.bar_info[0]		= (struct pci_bar_info) {.size = PCI_IO_SIZE},
>  		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> -		.bar_size[1]		= VESA_MEM_SIZE,
> +		.bar_info[1]		= (struct pci_bar_info) {.size = VESA_MEM_SIZE},
>  	};
>  
>  	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index bf42f497168f..ae71ef33237c 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -11,6 +11,17 @@
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
>  
> +#define pci_dev_err(pci_hdr, fmt, ...) \
> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_warn(pci_hdr, fmt, ...) \
> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_info(pci_hdr, fmt, ...) \
> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_die(pci_hdr, fmt, ...) \
> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +
>  /*
>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
> @@ -89,6 +100,11 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_bar_info {
> +	u32 size;
> +	bool active;
> +};

Do we really need this data structure above?
There is the "32-bit plus 1-bit" annoyance, but a lot of the changes in this patch also revolve around it, making the code less pretty.
So what about introducing a bitmap in struct pci_device_header below? I think we inherited the neat set_bit/test_bit functions from the kernel, so we could use those by just adding something like an "unsigned long bar_enabled;" there.
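To make the suggestion concrete, here is a rough, untested sketch of what the one-word tracking could look like. The struct and helper names (pci_device_header_sketch, bar_is_active, bar_set_active) are made up for illustration, not kvmtool code, and plain bit operations stand in for set_bit/test_bit:

```c
#include <assert.h>
#include <stdbool.h>

/* Untested sketch: track per-BAR activation state in a single word
 * instead of a bool inside a per-BAR struct. Names are illustrative. */
struct pci_device_header_sketch {
	unsigned long	bar_active;	/* bit i set => BAR i emulation active */
	unsigned int	bar_size[6];
};

static bool bar_is_active(struct pci_device_header_sketch *hdr, int bar_num)
{
	return hdr->bar_active & (1UL << bar_num);
}

static void bar_set_active(struct pci_device_header_sketch *hdr, int bar_num,
			   bool active)
{
	if (active)
		hdr->bar_active |= 1UL << bar_num;
	else
		hdr->bar_active &= ~(1UL << bar_num);
}
```

That would keep bar_size[] as a plain u32 array and avoid touching every user of the size field.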

> +
>  struct pci_device_header;
>  
>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> @@ -142,7 +158,7 @@ struct pci_device_header {
>  	};
>  
>  	/* Private to lkvm */
> -	u32		bar_size[6];
> +	struct pci_bar_info	bar_info[6];
>  	bar_activate_fn_t	bar_activate_fn;
>  	bar_deactivate_fn_t	bar_deactivate_fn;
>  	void *data;
> @@ -224,4 +240,9 @@ static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_nu
>  	return __pci__bar_address(pci_hdr->bar[bar_num]);
>  }
>  
> +static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return pci_hdr->bar_info[bar_num].size;
> +}
> +
>  #endif /* KVM__PCI_H */
> diff --git a/pci.c b/pci.c
> index 98331a1fc205..1e9791250bc3 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -68,7 +68,7 @@ void pci__assign_irq(struct device_header *dev_hdr)
>  
>  static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>  {
> -	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
> +	return  bar_num < 6 && pci__bar_size(pci_hdr, bar_num);
>  }
>  
>  static void *pci_config_address_ptr(u16 port)
> @@ -157,6 +157,46 @@ static struct ioport_operations pci_config_data_ops = {
>  	.io_out	= pci_config_data_out,
>  };
>  
> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			    int bar_num)
> +{
> +	int r = 0;
> +
> +	if (pci_hdr->bar_info[bar_num].active)
> +		goto out;
> +
> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_err(pci_hdr, "Error activating emulation for BAR %d",
> +			    bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_info[bar_num].active = true;
> +
> +out:
> +	return r;
> +}
> +
> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      int bar_num)
> +{
> +	int r = 0;
> +
> +	if (!pci_hdr->bar_info[bar_num].active)
> +		goto out;
> +
> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_err(pci_hdr, "Error deactivating emulation for BAR %d",
> +			    bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_info[bar_num].active = false;
> +
> +out:
> +	return r;
> +}
> +
>  static void pci_config_command_wr(struct kvm *kvm,
>  				  struct pci_device_header *pci_hdr,
>  				  u16 new_command)
> @@ -173,26 +213,179 @@ static void pci_config_command_wr(struct kvm *kvm,
>  
>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>  			if (__pci__io_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> -			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
> +			if (!__pci__io_space_enabled(new_command))

Isn't that just "else", as before?

> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  
>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>  			if (__pci__memory_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> -			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
> +			if (!__pci__memory_space_enabled(new_command))

Same here?

> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  	}
>  
>  	pci_hdr->command = new_command;
>  }
>  
> +static int pci_deactivate_bar_regions(struct kvm *kvm,
> +				      struct pci_device_header *pci_hdr,
> +				      u32 start, u32 size)
> +{
> +	struct device_header *dev_hdr;
> +	struct pci_device_header *tmp_hdr;
> +	u32 tmp_addr, tmp_size;
> +	int i, r;
> +
> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
> +	while (dev_hdr) {
> +		tmp_hdr = dev_hdr->data;
> +		for (i = 0; i < 6; i++) {
> +			if (!pci_bar_is_implemented(tmp_hdr, i))
> +				continue;
> +
> +			tmp_addr = pci__bar_address(tmp_hdr, i);
> +			tmp_size = pci__bar_size(tmp_hdr, i);
> +
> +			if (tmp_addr + tmp_size <= start ||
> +			    tmp_addr >= start + size)
> +				continue;
> +
> +			r = pci_deactivate_bar(kvm, tmp_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
> +		dev_hdr = device__next_dev(dev_hdr);
> +	}
> +
> +	return 0;
> +}
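The skip condition in the loop above is the usual half-open interval overlap test, just negated. As a standalone sketch (the helper name is mine, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two half-open ranges [a, a+as) and [b, b+bs) overlap unless one
 * ends at or before the start of the other -- the negation of the
 * "continue" condition in the loop above. */
static bool bar_regions_overlap(uint32_t addr_a, uint32_t size_a,
				uint32_t addr_b, uint32_t size_b)
{
	return !(addr_a + size_a <= addr_b || addr_b + size_b <= addr_a);
}
```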
> +
> +static int pci_activate_bar_regions(struct kvm *kvm,
> +				    struct pci_device_header *pci_hdr,
> +				    u32 start, u32 size)
> +{
> +	struct device_header *dev_hdr;
> +	struct pci_device_header *tmp_hdr;
> +	u32 tmp_addr, tmp_size;
> +	int i, r;
> +
> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
> +	while (dev_hdr) {
> +		tmp_hdr = dev_hdr->data;
> +		for (i = 0; i < 6; i++) {
> +			if (!pci_bar_is_implemented(tmp_hdr, i))
> +				continue;
> +
> +			tmp_addr = pci__bar_address(tmp_hdr, i);
> +			tmp_size = pci__bar_size(tmp_hdr, i);
> +
> +			if (tmp_addr + tmp_size <= start ||
> +			    tmp_addr >= start + size)
> +				continue;
> +
> +			r = pci_activate_bar(kvm, tmp_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
> +		dev_hdr = device__next_dev(dev_hdr);
> +	}
> +
> +	return 0;
> +}
> +
> +static void pci_config_bar_wr(struct kvm *kvm,
> +			      struct pci_device_header *pci_hdr, int bar_num,
> +			      u32 value)
> +{
> +	u32 old_addr, new_addr, bar_size;
> +	u32 mask;
> +	int r;
> +
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> +	else
> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> +
> +	/*
> +	 * If the kernel masks the BAR, it will expect to find the size of the
> +	 * BAR there next time it reads from it. After the kernel reads the
> +	 * size, it will write the address back.
> +	 *
> +	 * According to the PCI local bus specification REV 3.0: The number of
> +	 * upper bits that a device actually implements depends on how much of
> +	 * the address space the device will respond to. A device that wants a 1
> +	 * MB memory address space (using a 32-bit base address register) would
> +	 * build the top 12 bits of the address register, hardwiring the other
> +	 * bits to 0.
> +	 *
> +	 * Furthermore, software can determine how much address space the device
> +	 * requires by writing a value of all 1's to the register and then
> +	 * reading the value back. The device will return 0's in all don't-care
> +	 * address bits, effectively specifying the address space required.
> +	 *
> +	 * Software computes the size of the address space with the formula
> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
> +	 * the BAR. This means that the BAR value that kvmtool should return is
> +	 * B = ~(S - 1).
> +	 */
> +	if (value == 0xffffffff) {
> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
> +		/* Preserve the special bits. */
> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
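As a quick aside, the S = ~B + 1 arithmetic in the comment above can be made concrete with a small sketch. The helper names are made up; mask stands for the appropriate PCI_BASE_ADDRESS_*_MASK:

```c
#include <assert.h>
#include <stdint.h>

/* Value the device returns after the guest writes all 1s to the BAR:
 * B = ~(S - 1), with the don't-care low bits cleared by the mask. */
static uint32_t bar_sizing_value(uint32_t bar_size, uint32_t mask)
{
	return ~(bar_size - 1) & mask;
}

/* Size the guest recovers from the value it reads back: S = ~B + 1. */
static uint32_t bar_size_from_value(uint32_t value, uint32_t mask)
{
	return ~(value & mask) + 1;
}
```

For a 1 MB memory BAR, the device reports 0xfff00000 (top 12 bits implemented) and the guest recovers 0x100000 from it.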
> +
> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +
> +	/* Don't toggle emulation when region type access is disabled. */
> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
> +	    !pci__io_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
> +	    !pci__memory_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	old_addr = pci__bar_address(pci_hdr, bar_num);
> +	new_addr = __pci__bar_address(value);
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
> +
> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0)
> +		return;
> +
> +	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
> +	if (r < 0) {
> +		/*
> +		 * We cannot update the BAR because of an overlapping region
> +		 * that failed to deactivate emulation, so keep the old BAR
> +		 * value and re-activate emulation for it.
> +		 */
> +		pci_activate_bar(kvm, pci_hdr, bar_num);
> +		return;
> +	}
> +
> +	pci_hdr->bar[bar_num] = value;
> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0) {
> +		/*
> +		 * New region cannot be emulated, re-enable the regions that
> +		 * were overlapping.
> +		 */
> +		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
> +		return;
> +	}
> +
> +	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
> +}
> +
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> @@ -200,7 +393,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> -	u32 mask;
>  
>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>  		return;
> @@ -225,46 +417,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	}
>  
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
> -
> -	/*
> -	 * If the kernel masks the BAR, it will expect to find the size of the
> -	 * BAR there next time it reads from it. After the kernel reads the
> -	 * size, it will write the address back.
> -	 */
>  	if (bar < 6) {
> -		if (pci__bar_is_io(pci_hdr, bar))
> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> -		else
> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> -		/*
> -		 * According to the PCI local bus specification REV 3.0:
> -		 * The number of upper bits that a device actually implements
> -		 * depends on how much of the address space the device will
> -		 * respond to. A device that wants a 1 MB memory address space
> -		 * (using a 32-bit base address register) would build the top
> -		 * 12 bits of the address register, hardwiring the other bits
> -		 * to 0.
> -		 *
> -		 * Furthermore, software can determine how much address space
> -		 * the device requires by writing a value of all 1's to the
> -		 * register and then reading the value back. The device will
> -		 * return 0's in all don't-care address bits, effectively
> -		 * specifying the address space required.
> -		 *
> -		 * Software computes the size of the address space with the
> -		 * formula S = ~B + 1, where S is the memory size and B is the
> -		 * value read from the BAR. This means that the BAR value that
> -		 * kvmtool should return is B = ~(S - 1).
> -		 */
>  		memcpy(&value, data, size);
> -		if (value == 0xffffffff)
> -			value = ~(pci_hdr->bar_size[bar] - 1);
> -		/* Preserve the special bits. */
> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
> -		memcpy(base + offset, &value, size);
> -	} else {
> -		memcpy(base + offset, data, size);
> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
> +		return;
>  	}
> +
> +	memcpy(base + offset, data, size);
>  }
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> @@ -329,20 +488,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>  			continue;
>  
>  		has_bar_regions = true;
> +		assert(!pci_hdr->bar_info[i].active);
>  
>  		if (pci__bar_is_io(pci_hdr, i) &&
>  		    pci__io_space_enabled(pci_hdr)) {
> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
> -				if (r < 0)
> -					return r;
> -			}
> +			r = pci_activate_bar(kvm, pci_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
>  
>  		if (pci__bar_is_memory(pci_hdr, i) &&
>  		    pci__memory_space_enabled(pci_hdr)) {
> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
> -				if (r < 0)
> -					return r;
> -			}
> +			r = pci_activate_bar(kvm, pci_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
>  	}
>  
>  	assert(has_bar_regions);
> diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
> index a15f7d895a46..7be44d950acb 100644
> --- a/powerpc/spapr_pci.c
> +++ b/powerpc/spapr_pci.c
> @@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
>  				of_pci_b_ddddd(devid) |
>  				of_pci_b_fff(fn) |
>  				of_pci_b_rrrrrrrr(bars[i]));
> -			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
> +			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
>  			reg[n+1].addr = 0;
>  
>  			assigned_addresses[n].phys_hi = cpu_to_be32(
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 9e595562180b..3a641e72e574 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -455,6 +455,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>  	struct vfio_region *region = &vdev->regions[bar_num];
> +	u32 bar_addr;
>  	int ret;
>  
>  	if (!region->info.size) {
> @@ -462,8 +463,11 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  		goto out;
>  	}
>  
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +
>  	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>  	    (u32)bar_num == table->bar) {
> +		table->guest_phys_addr = region->guest_phys_addr = bar_addr;

I think those double assignments are a bit frowned upon, at least in Linux coding style. It would probably be cleaner to assign the region member after the error check.

>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>  					 table->size, false,
>  					 vfio_pci_msix_table_access, pdev);
> @@ -473,13 +477,22 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  
>  	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>  	    (u32)bar_num == pba->bar) {
> +		if (pba->bar == table->bar)
> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
> +		else
> +			pba->guest_phys_addr = region->guest_phys_addr = bar_addr;

same here with the double assignment

>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>  					 pba->size, false,
>  					 vfio_pci_msix_pba_access, pdev);
>  		goto out;
>  	}
>  
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		region->port_base = bar_addr;
> +	else
> +		region->guest_phys_addr = bar_addr;

Isn't that redundant with those double assignments above? Maybe you can get rid of those altogether?

Cheers,
Andre

>  	ret = vfio_map_region(kvm, vdev, region);
> +
>  out:
>  	return ret;
>  }
> @@ -749,7 +762,7 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  		if (!base)
>  			continue;
>  
> -		pdev->hdr.bar_size[i] = region->info.size;
> +		pdev->hdr.bar_info[i].size = region->info.size;
>  	}
>  
>  	/* I really can't be bothered to support cardbus. */
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 5a3cc6f1e943..e02430881394 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -483,7 +483,7 @@ static int virtio_pci__bar_activate(struct kvm *kvm,
>  	int r;
>  
>  	bar_addr = pci__bar_address(pci_hdr, bar_num);
> -	bar_size = pci_hdr->bar_size[bar_num];
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>  
>  	switch (bar_num) {
>  	case 0:
> @@ -569,9 +569,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  							| PCI_BASE_ADDRESS_SPACE_MEMORY),
>  		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
>  		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
> -		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
> -		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
> -		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
> +		.bar_info[0]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE)},
> +		.bar_info[1]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE)},
> +		.bar_info[2]		= (struct pci_bar_info) {.size = cpu_to_le32(PCI_IO_SIZE*2)},
>  	};
>  
>  	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-01-23 13:48 ` [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
@ 2020-02-07 16:51   ` Andre Przywara
  2020-02-07 17:38   ` Andre Przywara
  1 sibling, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 16:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:48:03 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Julien Thierry <julien.thierry@arm.com>
> 
> PCI now supports configurable BARs. Get rid of the no longer needed,
> Linux-only, fdt property.

\o/

> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

> ---
>  arm/fdt.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arm/fdt.c b/arm/fdt.c
> index c80e6da323b6..02091e9e0bee 100644
> --- a/arm/fdt.c
> +++ b/arm/fdt.c
> @@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
>  
>  	/* /chosen */
>  	_FDT(fdt_begin_node(fdt, "chosen"));
> -	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
>  
>  	/* Pass on our amended command line to a Linux kernel only. */
>  	if (kvm->cfg.firmware_filename) {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned
  2020-01-23 13:48 ` [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
@ 2020-02-07 16:51   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 16:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:48:04 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> KVM_SET_USER_MEMORY_REGION will fail if the guest physical address is
> not aligned to the page size. However, it is legal for a guest to
> program an address which isn't aligned to the page size. Trap and
> emulate MMIO accesses to the region when that happens.
> 
> Without this patch, when assigning a Seagate Barracuda hard drive to a
> VM I was seeing these errors:
> 
> [    0.286029] pci 0000:00:00.0: BAR 0: assigned [mem 0x41004600-0x4100467f]
>   Error: 0000:01:00.0: failed to register region with KVM
>   Error: [1095:3132] Error activating emulation for BAR 0
> [..]
> [   10.561794] irq 13: nobody cared (try booting with the "irqpoll" option)
> [   10.563122] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-seattle-00009-g909b20467ed1 #133
> [   10.563124] Hardware name: linux,dummy-virt (DT)
> [   10.563126] Call trace:
> [   10.563134]  dump_backtrace+0x0/0x140
> [   10.563137]  show_stack+0x14/0x20
> [   10.563141]  dump_stack+0xbc/0x100
> [   10.563146]  __report_bad_irq+0x48/0xd4
> [   10.563148]  note_interrupt+0x288/0x378
> [   10.563151]  handle_irq_event_percpu+0x80/0x88
> [   10.563153]  handle_irq_event+0x44/0xc8
> [   10.563155]  handle_fasteoi_irq+0xb4/0x160
> [   10.563157]  generic_handle_irq+0x24/0x38
> [   10.563159]  __handle_domain_irq+0x60/0xb8
> [   10.563162]  gic_handle_irq+0x50/0xa0
> [   10.563164]  el1_irq+0xb8/0x180
> [   10.563166]  arch_cpu_idle+0x10/0x18
> [   10.563170]  do_idle+0x204/0x290
> [   10.563172]  cpu_startup_entry+0x20/0x40
> [   10.563175]  rest_init+0xd4/0xe0
> [   10.563180]  arch_call_rest_init+0xc/0x14
> [   10.563182]  start_kernel+0x420/0x44c
> [   10.563183] handlers:
> [   10.563650] [<000000001e474803>] sil24_interrupt
> [   10.564559] Disabling IRQ #13
> [..]
> [   11.832916] ata1: spurious interrupt (slot_stat 0x0 active_tag -84148995 sactive 0x0)
> [   12.045444] ata_ratelimit: 1 callbacks suppressed
> 
> With this patch, I don't see the errors and the device works as
> expected.

Pretty neat and easy fix for that nasty problem!

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

> ---
>  vfio/core.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/vfio/core.c b/vfio/core.c
> index 6b9b58ea8d2f..b23e77c54771 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -226,6 +226,15 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>  	if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP))
>  		return vfio_setup_trap_region(kvm, vdev, region);
>  
> +	/*
> +	 * KVM_SET_USER_MEMORY_REGION will fail because the guest physical
> +	 * address isn't page aligned, so let's emulate the region ourselves.
> +	 */
> +	if (region->guest_phys_addr & (PAGE_SIZE - 1))
> +		return kvm__register_mmio(kvm, region->guest_phys_addr,
> +					  region->info.size, false,
> +					  vfio_mmio_access, region);
> +
>  	if (region->info.flags & VFIO_REGION_INFO_FLAG_READ)
>  		prot |= PROT_READ;
>  	if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
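For reference, the alignment test above just checks the low bits of the address. A sketch (this hardcodes 4K pages for illustration, whereas the patch uses the PAGE_SIZE macro); the BAR address 0x41004600 from the commit message log indeed fails it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096UL	/* illustrative 4K page size */

/* An address is page aligned iff all bits below the page size are clear. */
static bool is_page_aligned(uint64_t guest_phys_addr)
{
	return (guest_phys_addr & (PAGE_SIZE_SKETCH - 1)) == 0;
}
```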


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support
  2020-01-23 13:48 ` [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2020-02-07 16:51   ` Andre Przywara
  2020-03-10 16:28     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 16:51 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:48:05 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> PCI Express comes with an extended addressing scheme, which directly
> translates into a bigger device configuration space (256->4096 bytes)
> and a bigger PCI configuration space (16->256 MB), as well as mandatory
> capabilities (power management [1] and PCI Express capability [2]).
> 
> However, our virtio PCI implementation follows version 0.9 of the
> protocol and still uses transitional PCI device IDs, so we have
> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
> management and PCI Express capabilities are left for a subsequent patch.
> 
> [1] PCI Express Base Specification Revision 1.1, section 7.6
> [2] PCI Express Base Specification Revision 1.1, section 7.8
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/include/arm-common/kvm-arch.h |  4 +-
>  arm/pci.c                         |  2 +-
>  builtin-run.c                     |  1 +
>  hw/vesa.c                         |  2 +-
>  include/kvm/kvm-config.h          |  2 +-
>  include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
>  pci.c                             |  5 +-
>  vfio/pci.c                        | 26 +++++++----
>  8 files changed, 97 insertions(+), 21 deletions(-)
> 
> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
> index b9d486d5eac2..13c55fa3dc29 100644
> --- a/arm/include/arm-common/kvm-arch.h
> +++ b/arm/include/arm-common/kvm-arch.h
> @@ -23,7 +23,7 @@
>  
>  #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
>  #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>  
> @@ -50,6 +50,8 @@
>  
>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>  
> +#define ARCH_HAS_PCI_EXP	1
> +
>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>  {
>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
> diff --git a/arm/pci.c b/arm/pci.c
> index 1c0949a22408..eec9f3d936a5 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -77,7 +77,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>  
>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
> diff --git a/builtin-run.c b/builtin-run.c
> index 9cb8c75300eb..def8a1f803ad 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -27,6 +27,7 @@
>  #include "kvm/irq.h"
>  #include "kvm/kvm.h"
>  #include "kvm/pci.h"
> +#include "kvm/vfio.h"
>  #include "kvm/rtc.h"
>  #include "kvm/sdl.h"
>  #include "kvm/vnc.h"
> diff --git a/hw/vesa.c b/hw/vesa.c
> index aca938f79c82..4321cfbb6ddc 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -82,7 +82,7 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>  }
>  
>  static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -				u8 offset, void *data, int sz)
> +				u16 offset, void *data, int sz)
>  {
>  	u32 value;
>  
> diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
> index a052b0bc7582..a1012c57b7a7 100644
> --- a/include/kvm/kvm-config.h
> +++ b/include/kvm/kvm-config.h
> @@ -2,7 +2,6 @@
>  #define KVM_CONFIG_H_
>  
>  #include "kvm/disk-image.h"
> -#include "kvm/vfio.h"
>  #include "kvm/kvm-config-arch.h"
>  
>  #define DEFAULT_KVM_DEV		"/dev/kvm"
> @@ -18,6 +17,7 @@
>  #define MIN_RAM_SIZE_MB		(64ULL)
>  #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
>  
> +struct vfio_device_params;
>  struct kvm_config {
>  	struct kvm_config_arch arch;
>  	struct disk_image_params disk_image[MAX_DISK_IMAGES];
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index ae71ef33237c..0c3c74b82626 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -10,6 +10,7 @@
>  #include "kvm/devices.h"
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
> +#include "kvm.h"
>  
>  #define pci_dev_err(pci_hdr, fmt, ...) \
>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> @@ -32,9 +33,41 @@
>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>  #define PCI_IO_SIZE		0x100
>  #define PCI_IOPORT_START	0x6200
> -#define PCI_CFG_SIZE		(1ULL << 24)
>  
> -struct kvm;
> +#define PCIE_CAP_REG_VER	0x1
> +#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
> +#define PM_CAP_VER		0x3
> +
> +#ifdef ARCH_HAS_PCI_EXP
> +#define PCI_CFG_SIZE		(1ULL << 28)
> +#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_EXP_SIZE
> +
> +union pci_config_address {
> +	struct {
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */

Meeh, using C struct bitfields and expecting them to map to certain bits is not guaranteed by the C standard. But I see that you are merely the messenger here, as we already use this for the CAM mapping. So we keep this fix for another time ...

> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	enable_bit	: 1;		/* 31       */
> +#else
> +		unsigned	enable_bit	: 1;		/* 31       */
> +		unsigned	reserved	: 3;		/* 30 .. 28 */
> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
> +		unsigned	device_number	: 5;		/* 19 .. 15 */
> +		unsigned	function_number	: 3;		/* 14 .. 12 */
> +		unsigned	register_number	: 10;		/* 11 .. 2  */
> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> +#endif
> +	};
> +	u32 w;
> +};
> +
> +#else
> +#define PCI_CFG_SIZE		(1ULL << 24)
> +#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_SIZE
>  
>  union pci_config_address {
>  	struct {
> @@ -58,6 +91,8 @@ union pci_config_address {
>  	};
>  	u32 w;
>  };
> +#endif
> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct msix_table {
>  	struct msi_msg msg;
> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pcie_cap {
> +	u8 cap;
> +	u8 next;
> +	u16 cap_reg;
> +	u32 dev_cap;
> +	u16 dev_ctrl;
> +	u16 dev_status;
> +	u32 link_cap;
> +	u16 link_ctrl;
> +	u16 link_status;
> +	u32 slot_cap;
> +	u16 slot_ctrl;
> +	u16 slot_status;
> +	u16 root_ctrl;
> +	u16 root_cap;
> +	u32 root_status;
> +};

Wouldn't you need those to be defined as packed as well, if you include them below in a packed struct?

But more importantly: Do we actually need those definitions? We don't seem to use them, do we?
And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended storage space a guest would expect?

The rest looks alright.

Cheers,
Andre.

> +
> +struct pm_cap {
> +	u8 cap;
> +	u8 next;
> +	u16 pmc;
> +	u16 pmcsr;
> +	u8 pmcsr_bse;
> +	u8 data;
> +};
> +
>  struct pci_bar_info {
>  	u32 size;
>  	bool active;
> @@ -115,14 +177,12 @@ typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
>  				   int bar_num, void *data);
>  
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
> -#define PCI_DEV_CFG_SIZE	256
> -#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		      u8 offset, void *data, int sz);
> +		      u16 offset, void *data, int sz);
>  	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -		     u8 offset, void *data, int sz);
> +		     u16 offset, void *data, int sz);
>  };
>  
>  struct pci_device_header {
> @@ -152,6 +212,10 @@ struct pci_device_header {
>  			u8		min_gnt;
>  			u8		max_lat;
>  			struct msix_cap msix;
> +#ifdef ARCH_HAS_PCI_EXP
> +			struct pm_cap pm;
> +			struct pcie_cap pcie;
> +#endif
>  		} __attribute__((packed));
>  		/* Pad to PCI config space size */
>  		u8	__pad[PCI_DEV_CFG_SIZE];
> diff --git a/pci.c b/pci.c
> index 1e9791250bc3..ea3df8d2e28a 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -389,7 +389,8 @@ static void pci_config_bar_wr(struct kvm *kvm,
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> -	u8 bar, offset;
> +	u8 bar;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> @@ -428,7 +429,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
> -	u8 offset;
> +	u16 offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 3a641e72e574..05e8b54e77ac 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -309,7 +309,7 @@ out_unlock:
>  }
>  
>  static void vfio_pci_msix_cap_write(struct kvm *kvm,
> -				    struct vfio_device *vdev, u8 off,
> +				    struct vfio_device *vdev, u16 off,
>  				    void *data, int sz)
>  {
>  	struct vfio_pci_device *pdev = &vdev->pci;
> @@ -341,7 +341,7 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
>  }
>  
>  static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
> -				     u8 off, u8 *data, u32 sz)
> +				     u16 off, u8 *data, u32 sz)
>  {
>  	size_t i;
>  	u32 mask = 0;
> @@ -389,7 +389,7 @@ static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
>  }
>  
>  static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
> -				   u8 off, u8 *data, u32 sz)
> +				   u16 off, u8 *data, u32 sz)
>  {
>  	u8 ctrl;
>  	struct msi_msg msg;
> @@ -537,7 +537,7 @@ out:
>  }
>  
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			      u8 offset, void *data, int sz)
> +			      u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -555,7 +555,7 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>  }
>  
>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
> -			       u8 offset, void *data, int sz)
> +			       u16 offset, void *data, int sz)
>  {
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
> @@ -639,15 +639,17 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
>  	int ret;
>  	size_t size;
> -	u8 pos, next;
> +	u16 pos, next;
>  	struct pci_cap_hdr *cap;
> -	u8 virt_hdr[PCI_DEV_CFG_SIZE];
> +	u8 *virt_hdr;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> -	memset(virt_hdr, 0, PCI_DEV_CFG_SIZE);
> +	virt_hdr = calloc(1, PCI_DEV_CFG_SIZE);
> +	if (!virt_hdr)
> +		return -errno;
>  
>  	pos = pdev->hdr.capabilities & ~3;
>  
> @@ -683,6 +685,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  	size = PCI_DEV_CFG_SIZE - PCI_STD_HEADER_SIZEOF;
>  	memcpy((void *)&pdev->hdr + pos, virt_hdr + pos, size);
>  
> +	free(virt_hdr);
> +
>  	return 0;
>  }
>  
> @@ -792,7 +796,11 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>  
>  	/* Install our fake Configuration Space */
>  	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> -	hdr_sz = PCI_DEV_CFG_SIZE;
> +	/*
> +	 * We don't touch the extended configuration space, let's be cautious
> +	 * and not overwrite it all with zeros, or bad things might happen.
> +	 */
> +	hdr_sz = PCI_CFG_SPACE_SIZE;
>  	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
>  		vfio_dev_err(vdev, "failed to write %zd bytes to Config Space",
>  			     hdr_sz);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices
  2020-01-23 13:47 ` [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
@ 2020-02-07 17:02   ` Andre Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 17:02 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:47:42 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> From: Julien Thierry <julien.thierry@arm.com>
> 
> The dynamic ioport allocation with IOPORT_EMPTY is currently only used
> by PCI devices. Other devices use fixed ports for which they request
> registration to the ioport API.
> 
> PCI ports need to be in the PCI IO space and there is no reason ioport
> API should know a PCI port is being allocated and needs to be placed in
> PCI IO space. This currently just happens to be the case.
> 
> Move the responsibility of dynamic allocation of ioports from the ioport
> API to PCI.
> 
> In the future, if other types of devices also need dynamic ioport
> allocation, they'll have to figure out the range of ports they are
> allowed to use.
> 
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> [Renamed functions for clarity]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

I replied to the wrong series version of this patch before, so for the sake of completeness, here on the right thread:

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c                      |  4 ++--
>  include/kvm/ioport.h           |  3 ---
>  include/kvm/pci.h              |  4 +++-
>  ioport.c                       | 18 ------------------
>  pci.c                          | 17 +++++++++++++----
>  powerpc/include/kvm/kvm-arch.h |  2 +-
>  vfio/core.c                    |  6 ++++--
>  vfio/pci.c                     |  4 ++--
>  virtio/pci.c                   |  7 ++++---
>  x86/include/kvm/kvm-arch.h     |  2 +-
>  10 files changed, 30 insertions(+), 37 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index d75b4b316a1e..24fb46faad3b 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -63,8 +63,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
> -
> -	r = ioport__register(kvm, IOPORT_EMPTY, &vesa_io_ops, IOPORT_SIZE, NULL);
> +	r = pci_get_io_port_block(IOPORT_SIZE);
> +	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
>  	if (r < 0)
>  		return ERR_PTR(r);
>  
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index db52a479742b..b10fcd5b4412 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -14,11 +14,8 @@
>  
>  /* some ports we reserve for own use */
>  #define IOPORT_DBG			0xe0
> -#define IOPORT_START			0x6200
>  #define IOPORT_SIZE			0x400
>  
> -#define IOPORT_EMPTY			USHRT_MAX
> -
>  struct kvm;
>  
>  struct ioport {
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index a86c15a70e6d..ccb155e3e8fe 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -19,6 +19,7 @@
>  #define PCI_CONFIG_DATA		0xcfc
>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>  #define PCI_IO_SIZE		0x100
> +#define PCI_IOPORT_START	0x6200
>  #define PCI_CFG_SIZE		(1ULL << 24)
>  
>  struct kvm;
> @@ -152,7 +153,8 @@ struct pci_device_header {
>  int pci__init(struct kvm *kvm);
>  int pci__exit(struct kvm *kvm);
>  struct pci_device_header *pci__find_dev(u8 dev_num);
> -u32 pci_get_io_space_block(u32 size);
> +u32 pci_get_mmio_block(u32 size);
> +u16 pci_get_io_port_block(u32 size);
>  void pci__assign_irq(struct device_header *dev_hdr);
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
> diff --git a/ioport.c b/ioport.c
> index a6dc65e3e6c6..a72e4035881a 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -16,24 +16,8 @@
>  
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>  
> -DEFINE_MUTEX(ioport_mutex);
> -
> -static u16			free_io_port_idx; /* protected by ioport_mutex */
> -
>  static struct rb_root		ioport_tree = RB_ROOT;
>  
> -static u16 ioport__find_free_port(void)
> -{
> -	u16 free_port;
> -
> -	mutex_lock(&ioport_mutex);
> -	free_port = IOPORT_START + free_io_port_idx * IOPORT_SIZE;
> -	free_io_port_idx++;
> -	mutex_unlock(&ioport_mutex);
> -
> -	return free_port;
> -}
> -
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
>  {
>  	struct rb_int_node *node;
> @@ -85,8 +69,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  	int r;
>  
>  	br_write_lock(kvm);
> -	if (port == IOPORT_EMPTY)
> -		port = ioport__find_free_port();
>  
>  	entry = ioport_search(&ioport_tree, port);
>  	if (entry) {
> diff --git a/pci.c b/pci.c
> index 3198732935eb..80b5c5d3d7f3 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -15,15 +15,24 @@ static u32 pci_config_address_bits;
>   * (That's why it can still 32bit even with 64bit guests-- 64bit
>   * PCI isn't currently supported.)
>   */
> -static u32 io_space_blocks		= KVM_PCI_MMIO_AREA;
> +static u32 mmio_blocks			= KVM_PCI_MMIO_AREA;
> +static u16 io_port_blocks		= PCI_IOPORT_START;
> +
> +u16 pci_get_io_port_block(u32 size)
> +{
> +	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
> +
> +	io_port_blocks = port + size;
> +	return port;
> +}
>  
>  /*
>   * BARs must be naturally aligned, so enforce this in the allocator.
>   */
> -u32 pci_get_io_space_block(u32 size)
> +u32 pci_get_mmio_block(u32 size)
>  {
> -	u32 block = ALIGN(io_space_blocks, size);
> -	io_space_blocks = block + size;
> +	u32 block = ALIGN(mmio_blocks, size);
> +	mmio_blocks = block + size;
>  	return block;
>  }
>  
> diff --git a/powerpc/include/kvm/kvm-arch.h b/powerpc/include/kvm/kvm-arch.h
> index 8126b96cb66a..26d440b22bdd 100644
> --- a/powerpc/include/kvm/kvm-arch.h
> +++ b/powerpc/include/kvm/kvm-arch.h
> @@ -34,7 +34,7 @@
>  #define KVM_MMIO_START			PPC_MMIO_START
>  
>  /*
> - * This is the address that pci_get_io_space_block() starts allocating
> + * This is the address that pci_get_io_port_block() starts allocating
>   * from.  Note that this is a PCI bus address.
>   */
>  #define KVM_IOPORT_AREA			0x0
> diff --git a/vfio/core.c b/vfio/core.c
> index 17b5b0cfc9ac..0ed1e6fee6bf 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -202,8 +202,10 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
>  				  struct vfio_region *region)
>  {
>  	if (region->is_ioport) {
> -		int port = ioport__register(kvm, IOPORT_EMPTY, &vfio_ioport_ops,
> -					    region->info.size, region);
> +		int port = pci_get_io_port_block(region->info.size);
> +
> +		port = ioport__register(kvm, port, &vfio_ioport_ops,
> +					region->info.size, region);
>  		if (port < 0)
>  			return port;
>  
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 76e24c156906..8e5d8572bc0c 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -750,7 +750,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
>  	 * powers of two.
>  	 */
>  	mmio_size = roundup_pow_of_two(table->size + pba->size);
> -	table->guest_phys_addr = pci_get_io_space_block(mmio_size);
> +	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
>  	if (!table->guest_phys_addr) {
>  		pr_err("cannot allocate IO space");
>  		ret = -ENOMEM;
> @@ -846,7 +846,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  	if (!region->is_ioport) {
>  		/* Grab some MMIO space in the guest */
>  		map_size = ALIGN(region->info.size, PAGE_SIZE);
> -		region->guest_phys_addr = pci_get_io_space_block(map_size);
> +		region->guest_phys_addr = pci_get_mmio_block(map_size);
>  	}
>  
>  	/* Map the BARs into the guest or setup a trap region. */
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 04e801827df9..d73414abde05 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -438,18 +438,19 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
> -	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
> +	r = pci_get_io_port_block(IOPORT_SIZE);
> +	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
>  	if (r < 0)
>  		return r;
>  	vpci->port_addr = (u16)r;
>  
> -	vpci->mmio_addr = pci_get_io_space_block(IOPORT_SIZE);
> +	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
>  	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
>  			       virtio_pci__io_mmio_callback, vpci);
>  	if (r < 0)
>  		goto free_ioport;
>  
> -	vpci->msix_io_block = pci_get_io_space_block(PCI_IO_SIZE * 2);
> +	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>  	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
>  			       virtio_pci__msix_mmio_callback, vpci);
>  	if (r < 0)
> diff --git a/x86/include/kvm/kvm-arch.h b/x86/include/kvm/kvm-arch.h
> index bfdd3438a9de..85cd336c7577 100644
> --- a/x86/include/kvm/kvm-arch.h
> +++ b/x86/include/kvm/kvm-arch.h
> @@ -16,7 +16,7 @@
>  
>  #define KVM_MMIO_START		KVM_32BIT_GAP_START
>  
> -/* This is the address that pci_get_io_space_block() starts allocating
> +/* This is the address that pci_get_io_port_block() starts allocating
>   * from.  Note that this is a PCI bus address (though same on x86).
>   */
>  #define KVM_IOPORT_AREA		0x0


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (29 preceding siblings ...)
  2020-01-23 13:48 ` [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
@ 2020-02-07 17:02 ` Andre Przywara
  2020-05-13 14:56 ` Marc Zyngier
  31 siblings, 0 replies; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 17:02 UTC (permalink / raw)
  To: will, julien.thierry.kdev
  Cc: Alexandru Elisei, kvm, sami.mujawar, lorenzo.pieralisi, maz

On Thu, 23 Jan 2020 13:47:35 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi Will,

I am done with reviewing this series. The first patches, up to and including 07/30, are good to go. Since they are fixes, I would be delighted to see them merged ASAP. You might want to add my R-b: tags to those.

For the rest of the series I had some comments, but apart from two locking issues I don't see any showstoppers.
If I understand correctly, Alex would send a new version soonish. Merging the first fixes now would help to make v3 less daunting regarding the number of patches ;-)

Thanks,
Andre

> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
> it from trying to reassign the BARs. Let's make the BARs reassignable so
> we can get rid of this band-aid.
> 
> Let's also extend the legacy PCI emulation, which came out in 1992, so we
> can properly emulate the PCI Express version 1.1 protocol, which is
> relatively new, being published in 2005.
> 
> For this iteration, I have completely reworked the way BARs are
> reassigned. As I was adding support for reassignable BARs to more devices,
> it became clear to me that I was duplicating the same code over and over
> again.  Furthermore, during device configuration, Linux can assign a region
> resource to a BAR that temporarily overlaps with another device. With my
> original approach, that meant that every device must be aware of the BAR
> values for all the other devices.
> 
> With this new approach, the algorithm for activating/deactivating emulation
> as BAR addresses change lives completely inside the PCI code. Each device
> registers two callback functions which are called when device emulation is
> activated (for example, to activate emulation for a newly assigned BAR
> address), respectively, when device emulation is deactivated (a previous
> BAR address is changed, and emulation for that region must be deactivated).
> 
> I also tried to do better at testing the patches. I have tested VFIO with
> virtio-pci on an arm64 and a x86 machine:
> 
> 1. AMD Seattle: Intel 82574L Gigabit Ethernet card, Samsung 970 Pro NVME
> (controller registers are in the same BAR region as the MSIX table and PBA,
> I wrote a nasty hack to make it work, will try to upstream something after
> this series), Realtek 8168 Gigabit Ethernet card, NVIDIA Quadro P400 (only
> device detection), AMD Firepro W2100 (amdgpu driver fails probing
> because of missing expansion ROM emulation in kvmtool, I will send patches
> for this too), Myricom 10 Gigabit Ethernet card, Seagate Barracuda 1000GB
> drive.
> 
> 2. Ryzen 3900x + Gigabyte x570 Aorus Master (bios F10): Realtek 8168
> Gigabit Ethernet card, AMD Firepro W2100 (same issue as on Seattle).
> 
> Using the CFI flash emulation for kvmtool [1] and a hacked version of EDK2
> as the firmware for the virtual machine, I was able to download an official
> debian arm64 installation iso, install debian and then run it. EDK2 patches
> for kvmtool will be posted soon.
> 
> You will notice from the changelog that there are a lot of new patches
> (17!), but most of them are fixes for stuff that I found while testing.
> 
> Patches 1-18 are fixes and cleanups, and can be merged independently. They
> are pretty straightforward, so if the size of the series looks off-putting,
> please review these first. I am aware that the series has grown quite a
> lot, I am willing to split the fixes from the rest of the patches, or
> whatever else can make reviewing easier.
> 
> Changes in v2:
> * Patches 2, 11-18, 20, 22-27, 29 are new.
> * Patches 11, 13, and 14 have been dropped.
> * Reworked the way BAR reassignment is implemented.
> * The patch "Add PCI Express 1.1 support" has been reworked to apply only
>   to arm64. For x86 we would need ACPI support in order to advertise the
>   location of the ECAM space.
> * Gathered Reviewed-by tags.
> * Implemented review comments.
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg778623.html
> 
> Alexandru Elisei (24):
>   Makefile: Use correct objcopy binary when cross-compiling for x86_64
>   hw/i8042: Compile only for x86
>   Remove pci-shmem device
>   Check that a PCI device's memory size is power of two
>   arm/pci: Advertise only PCI bus 0 in the DT
>   vfio/pci: Allocate correct size for MSIX table and PBA BARs
>   vfio/pci: Don't assume that only even numbered BARs are 64bit
>   vfio/pci: Ignore expansion ROM BAR writes
>   vfio/pci: Don't access potentially unallocated regions
>   virtio: Don't ignore initialization failures
>   Don't ignore errors registering a device, ioport or mmio emulation
>   hw/vesa: Don't ignore fatal errors
>   hw/vesa: Set the size for BAR 0
>   Use independent read/write locks for ioport and mmio
>   pci: Add helpers for BAR values and memory/IO space access
>   virtio/pci: Get emulated region address from BARs
>   vfio: Destroy memslot when unmapping the associated VAs
>   vfio: Reserve ioports when configuring the BAR
>   vfio/pci: Don't write configuration value twice
>   pci: Implement callbacks for toggling BAR emulation
>   pci: Toggle BAR I/O and memory space emulation
>   pci: Implement reassignable BARs
>   vfio: Trap MMIO access to BAR addresses which aren't page aligned
>   arm/arm64: Add PCI Express 1.1 support
> 
> Julien Thierry (5):
>   ioport: pci: Move port allocations to PCI devices
>   pci: Fix ioport allocation size
>   arm/pci: Fix PCI IO region
>   virtio/pci: Make memory and IO BARs independent
>   arm/fdt: Remove 'linux,pci-probe-only' property
> 
> Sami Mujawar (1):
>   pci: Fix BAR resource sizing arbitration
> 
>  Makefile                          |   6 +-
>  arm/fdt.c                         |   1 -
>  arm/include/arm-common/kvm-arch.h |   4 +-
>  arm/include/arm-common/pci.h      |   1 +
>  arm/ioport.c                      |   3 +-
>  arm/kvm.c                         |   3 +
>  arm/pci.c                         |  25 +-
>  builtin-run.c                     |   6 +-
>  hw/i8042.c                        |  14 +-
>  hw/pci-shmem.c                    | 400 ------------------------------
>  hw/vesa.c                         | 132 +++++++---
>  include/kvm/devices.h             |   3 +-
>  include/kvm/ioport.h              |  10 +-
>  include/kvm/kvm-config.h          |   2 +-
>  include/kvm/kvm.h                 |   9 +-
>  include/kvm/pci-shmem.h           |  32 ---
>  include/kvm/pci.h                 | 168 ++++++++++++-
>  include/kvm/util.h                |   2 +
>  include/kvm/vesa.h                |   6 +-
>  include/kvm/virtio-pci.h          |   3 -
>  include/kvm/virtio.h              |   7 +-
>  include/linux/compiler.h          |   2 +-
>  ioport.c                          |  57 ++---
>  kvm.c                             |  65 ++++-
>  mips/kvm.c                        |   3 +-
>  mmio.c                            |  26 +-
>  pci.c                             | 320 ++++++++++++++++++++++--
>  powerpc/include/kvm/kvm-arch.h    |   2 +-
>  powerpc/ioport.c                  |   3 +-
>  powerpc/spapr_pci.c               |   2 +-
>  vfio/core.c                       |  22 +-
>  vfio/pci.c                        | 231 +++++++++++++----
>  virtio/9p.c                       |   9 +-
>  virtio/balloon.c                  |  10 +-
>  virtio/blk.c                      |  14 +-
>  virtio/console.c                  |  11 +-
>  virtio/core.c                     |   9 +-
>  virtio/mmio.c                     |  13 +-
>  virtio/net.c                      |  32 +--
>  virtio/pci.c                      | 220 +++++++++++-----
>  virtio/scsi.c                     |  14 +-
>  x86/include/kvm/kvm-arch.h        |   2 +-
>  x86/ioport.c                      |  66 +++--
>  43 files changed, 1217 insertions(+), 753 deletions(-)
>  delete mode 100644 hw/pci-shmem.c
>  delete mode 100644 include/kvm/pci-shmem.h
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-01-23 13:48 ` [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
  2020-02-07 16:51   ` Andre Przywara
@ 2020-02-07 17:38   ` Andre Przywara
  2020-03-10 16:04     ` Alexandru Elisei
  1 sibling, 1 reply; 88+ messages in thread
From: Andre Przywara @ 2020-02-07 17:38 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

On Thu, 23 Jan 2020 13:48:03 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Julien Thierry <julien.thierry@arm.com>
> 
> PCI now supports configurable BARs. Get rid of the no longer needed,
> Linux-only, fdt property.

I was just wondering: what is the x86 story here?
Does the x86 kernel never reassign BARs? Or is this dependent on something else?
I see tons of pci kernel command line parameters for pci=, maybe one of them would explicitly allow reassigning?

Cheers,
Andre

> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/fdt.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arm/fdt.c b/arm/fdt.c
> index c80e6da323b6..02091e9e0bee 100644
> --- a/arm/fdt.c
> +++ b/arm/fdt.c
> @@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
>  
>  	/* /chosen */
>  	_FDT(fdt_begin_node(fdt, "chosen"));
> -	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
>  
>  	/* Pass on our amended command line to a Linux kernel only. */
>  	if (kvm->cfg.firmware_filename) {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region
  2020-01-29 18:16   ` Andre Przywara
@ 2020-03-04 16:20     ` Alexandru Elisei
  2020-03-05 13:06       ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-04 16:20 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

Hi,

On 1/29/20 6:16 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:44 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> From: Julien Thierry <julien.thierry@arm.com>
>>
>> Current PCI IO region that is exposed through the DT contains ports that
>> are reserved by non-PCI devices.
>>
>> Use the proper PCI IO start so that the region exposed through DT can
>> actually be used to reassign device BARs.
> I guess the majority of the patch is about the fact that the current allocation starts at 0x6200, which is not 4K aligned?
> It would be nice if we could mention this in the commit message.
>
> Actually, silly question: It seems like this 0x6200 is rather arbitrary, can't we just change that to a 4K aligned value and drop that patch here?
> If something on the x86 side relies on that value, it should rather be explicit than by chance.
> (Because while this patch here seems correct, it's also quite convoluted.)

I've taken a closer look at this patch, and to be honest right now it seems at
best redundant. I don't really understand why the start of the PCI ioport region
must be aligned to 4K - a Linux guest has no problem assigning address 0x1100 for
ioports without this patch, but with the rest of the series applied. On the
kvmtool side, arm doesn't have any fixed I/O device addresses like x86 does, so
it's safe to use the entire region starting at 0 for ioport allocation.
Even without any of the patches from this series, I haven't encountered any
instances of Linux complaining.

I'll test this some more before posting v3, but right now it looks to me like the
best course of action will be to drop the patch.

Thanks,
Alex
>
> Cheers,
> Andre.
>
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/include/arm-common/pci.h |  1 +
>>  arm/kvm.c                    |  3 +++
>>  arm/pci.c                    | 21 ++++++++++++++++++---
>>  3 files changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/arm/include/arm-common/pci.h b/arm/include/arm-common/pci.h
>> index 9008a0ed072e..aea42b8895e9 100644
>> --- a/arm/include/arm-common/pci.h
>> +++ b/arm/include/arm-common/pci.h
>> @@ -1,6 +1,7 @@
>>  #ifndef ARM_COMMON__PCI_H
>>  #define ARM_COMMON__PCI_H
>>  
>> +void pci__arm_init(struct kvm *kvm);
>>  void pci__generate_fdt_nodes(void *fdt);
>>  
>>  #endif /* ARM_COMMON__PCI_H */
>> diff --git a/arm/kvm.c b/arm/kvm.c
>> index 1f85fc60588f..5c30ec1e0515 100644
>> --- a/arm/kvm.c
>> +++ b/arm/kvm.c
>> @@ -6,6 +6,7 @@
>>  #include "kvm/fdt.h"
>>  
>>  #include "arm-common/gic.h"
>> +#include "arm-common/pci.h"
>>  
>>  #include <linux/kernel.h>
>>  #include <linux/kvm.h>
>> @@ -86,6 +87,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
>>  	/* Create the virtual GIC. */
>>  	if (gic__create(kvm, kvm->cfg.arch.irqchip))
>>  		die("Failed to create virtual GIC");
>> +
>> +	pci__arm_init(kvm);
>>  }
>>  
>>  #define FDT_ALIGN	SZ_2M
>> diff --git a/arm/pci.c b/arm/pci.c
>> index ed325fa4a811..1c0949a22408 100644
>> --- a/arm/pci.c
>> +++ b/arm/pci.c
>> @@ -1,3 +1,5 @@
>> +#include "linux/sizes.h"
>> +
>>  #include "kvm/devices.h"
>>  #include "kvm/fdt.h"
>>  #include "kvm/kvm.h"
>> @@ -7,6 +9,11 @@
>>  
>>  #include "arm-common/pci.h"
>>  
>> +#define ARM_PCI_IO_START ALIGN(PCI_IOPORT_START, SZ_4K)
>> +
>> +/* Must be a multiple of 4k */
>> +#define ARM_PCI_IO_SIZE ((ARM_MMIO_AREA - ARM_PCI_IO_START) & ~(SZ_4K - 1))
>> +
>>  /*
>>   * An entry in the interrupt-map table looks like:
>>   * <pci unit address> <pci interrupt pin> <gic phandle> <gic interrupt>
>> @@ -24,6 +31,14 @@ struct of_interrupt_map_entry {
>>  	struct of_gic_irq		gic_irq;
>>  } __attribute__((packed));
>>  
>> +void pci__arm_init(struct kvm *kvm)
>> +{
>> +	u32 align_pad = ARM_PCI_IO_START - PCI_IOPORT_START;
>> +
>> +	/* Make PCI port allocation start at a properly aligned address */
>> +	pci_get_io_port_block(align_pad);
>> +}
>> +
>>  void pci__generate_fdt_nodes(void *fdt)
>>  {
>>  	struct device_header *dev_hdr;
>> @@ -40,10 +55,10 @@ void pci__generate_fdt_nodes(void *fdt)
>>  			.pci_addr = {
>>  				.hi	= cpu_to_fdt32(of_pci_b_ss(OF_PCI_SS_IO)),
>>  				.mid	= 0,
>> -				.lo	= 0,
>> +				.lo	= cpu_to_fdt32(ARM_PCI_IO_START),
>>  			},
>> -			.cpu_addr	= cpu_to_fdt64(KVM_IOPORT_AREA),
>> -			.length		= cpu_to_fdt64(ARM_IOPORT_SIZE),
>> +			.cpu_addr	= cpu_to_fdt64(ARM_PCI_IO_START),
>> +			.length		= cpu_to_fdt64(ARM_PCI_IO_SIZE),
>>  		},
>>  		{
>>  			.pci_addr = {

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region
  2020-03-04 16:20     ` Alexandru Elisei
@ 2020-03-05 13:06       ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-05 13:06 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

Hi,

On 3/4/20 4:20 PM, Alexandru Elisei wrote:
> Hi,
>
> On 1/29/20 6:16 PM, Andre Przywara wrote:
>> On Thu, 23 Jan 2020 13:47:44 +0000
>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>
>> Hi,
>>
>>> From: Julien Thierry <julien.thierry@arm.com>
>>>
>>> Current PCI IO region that is exposed through the DT contains ports that
>>> are reserved by non-PCI devices.
>>>
>>> Use the proper PCI IO start so that the region exposed through DT can
>>> actually be used to reassign device BARs.
>> I guess the majority of the patch is about that the current allocation starts at 0x6200, which is not 4K aligned?
>> It would be nice if we could mention this in the commit message.
>>
>> Actually, silly question: It seems like this 0x6200 is rather arbitrary, can't we just change that to a 4K aligned value and drop that patch here?
>> If something on the x86 side relies on that value, it should rather be explicit than by chance.
>> (Because while this patch here seems correct, it's also quite convoluted.)
> I've taken a closer look at this patch, and to be honest right now it seems at
> best redundant. I don't really understand why the start of the PCI ioport region
> must be aligned to 4K - a Linux guest has no problem assigning address 0x1100 for
> ioports without this patch, but with the rest of the series applied. On the
> kvmtool side, arm doesn't have any fixed I/O device addresses like x86 does, so
> it's safe to use the entire region starting at 0 for ioport allocation.
> Even without any of the patches from this series, I haven't encountered any
> instances of Linux complaining.
>
> I'll test this some more before posting v3, but right now it looks to me like the
> best course of action will be to drop the patch.

I spoke too soon, the problem is more subtle than that. I forgot about the uart,
which is accessible at addresses 0x{2,3}{e,f}8. In practice, having the uart
overlap the PCI ioports region works by chance because the uart driver claims the
memory resource before the PCI driver. My first idea was to have the PCI ioport
region start at 0x6200 (that's where the PCI code starts allocating ioports), but
I get this splat with a 5.5 Linux guest:

[    0.523407] ------------[ cut here ]------------
[    0.524059] kernel BUG at lib/ioremap.c:74!
[    0.524597] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[    0.525306] Modules linked in:
[    0.525706] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0 #1
[    0.526456] Hardware name: linux,dummy-virt (DT)
[    0.527047] pstate: 80000005 (Nzcv daif -PAN -UAO)
[    0.527761] pc : ioremap_page_range+0x30c/0x3a0
[    0.528351] lr : pci_remap_iospace+0xb0/0x118
[    0.528913] sp : ffff800011ccf9b0
[    0.529334] x29: ffff800011ccf9b0 x28: ffff8000113701f8
[    0.530016] x27: ffffffdfffe00000 x26: 0400000000000001
[    0.530697] x25: ffff8000118dc8d8 x24: 0000000000000041
[    0.531423] x23: ffffffdffec09000 x22: ffffffdffec09000
[    0.532135] x21: ffffffdffec09000 x20: ffff000001acff00
[    0.532848] x19: 0000000000000001 x18: 0000000000000010
[    0.533561] x17: 00000000ab887d9b x16: 00000000f86b3432
[    0.534273] x15: ffffffffffffffff x14: 303530307830203e
[    0.534978] x13: 0000000020000000 x12: 0000000000007000
[    0.535704] x11: 3030303030303035 x10: 30307830204d454d
[    0.536407] x9 : 0000000000007000 x8 : ffff0000c13da500
[    0.537097] x7 : ffffffdffec00000 x6 : ffff000001abf7f8
[    0.537786] x5 : ffff8000118dc000 x4 : ffffffdffec00000
[    0.538506] x3 : 0068000000000f07 x2 : 0140000000000000
[    0.539243] x1 : 0000002001400000 x0 : 006800017ff20f13
[    0.539977] Call trace:
[    0.540304]  ioremap_page_range+0x30c/0x3a0
[    0.540854]  pci_remap_iospace+0xb0/0x118
[    0.541376]  devm_pci_remap_iospace+0x48/0x98
[    0.541946]  pci_parse_request_of_pci_ranges+0x148/0x1c0
[    0.542650]  pci_host_common_probe+0x68/0x1d0
[    0.543180]  gen_pci_probe+0x2c/0x38
[    0.543665]  platform_drv_probe+0x50/0xa0
[    0.544186]  really_probe+0xd4/0x308
[    0.544632]  driver_probe_device+0x54/0xe8
[    0.545138]  device_driver_attach+0x6c/0x78
[    0.545654]  __driver_attach+0x54/0xd0
[    0.546115]  bus_for_each_dev+0x70/0xc0
[    0.546588]  driver_attach+0x20/0x28
[    0.547029]  bus_add_driver+0x178/0x1d8
[    0.547554]  driver_register+0x60/0x110
[    0.548032]  __platform_driver_register+0x44/0x50
[    0.548612]  gen_pci_driver_init+0x18/0x20
[    0.549137]  do_one_initcall+0x74/0x1a8
[    0.549612]  kernel_init_freeable+0x190/0x1f4
[    0.550150]  kernel_init+0x10/0x100
[    0.550581]  ret_from_fork+0x10/0x18
[    0.551023] Code: a9446bf9 a94573fb a8cb7bfd d65f03c0 (d4210000)
[    0.551839] ---[ end trace c04d8b733115ba34 ]---
[    0.552411] note: swapper/0[1] exited with preempt_count 1
[    0.553088] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    0.554029] SMP: stopping secondary CPUs
[    0.554631] Kernel Offset: disabled
[    0.555063] CPU features: 0x00002,20802008
[    0.555587] Memory Limit: none
[    0.556147] ---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b ]---

The reason is that the address 0x6200 is not page aligned, and it is
already mapped because it happened to overlap with a previous page allocation. I
changed it to be 4k aligned to make sure it's not already mapped, and it worked.
However, even with the address aligned to 4k, a Linux guest which uses 64k pages
still gets the above splat. The ioports region is 64k, so we cannot get away with
simply rounding it up to the nearest 64k multiple. Instead, I'm going to try
moving the entire ioports region from [0, 64k) to [64k, 128k) and making
ARM_MMIO_AREA smaller by 64k. I'll send a new patch when I respin the series.

Thanks,
Alex
>
> Thanks,
> Alex
>> Cheers,
>> Andre.
>>
>>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>>  arm/include/arm-common/pci.h |  1 +
>>>  arm/kvm.c                    |  3 +++
>>>  arm/pci.c                    | 21 ++++++++++++++++++---
>>>  3 files changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arm/include/arm-common/pci.h b/arm/include/arm-common/pci.h
>>> index 9008a0ed072e..aea42b8895e9 100644
>>> --- a/arm/include/arm-common/pci.h
>>> +++ b/arm/include/arm-common/pci.h
>>> @@ -1,6 +1,7 @@
>>>  #ifndef ARM_COMMON__PCI_H
>>>  #define ARM_COMMON__PCI_H
>>>  
>>> +void pci__arm_init(struct kvm *kvm);
>>>  void pci__generate_fdt_nodes(void *fdt);
>>>  
>>>  #endif /* ARM_COMMON__PCI_H */
>>> diff --git a/arm/kvm.c b/arm/kvm.c
>>> index 1f85fc60588f..5c30ec1e0515 100644
>>> --- a/arm/kvm.c
>>> +++ b/arm/kvm.c
>>> @@ -6,6 +6,7 @@
>>>  #include "kvm/fdt.h"
>>>  
>>>  #include "arm-common/gic.h"
>>> +#include "arm-common/pci.h"
>>>  
>>>  #include <linux/kernel.h>
>>>  #include <linux/kvm.h>
>>> @@ -86,6 +87,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
>>>  	/* Create the virtual GIC. */
>>>  	if (gic__create(kvm, kvm->cfg.arch.irqchip))
>>>  		die("Failed to create virtual GIC");
>>> +
>>> +	pci__arm_init(kvm);
>>>  }
>>>  
>>>  #define FDT_ALIGN	SZ_2M
>>> diff --git a/arm/pci.c b/arm/pci.c
>>> index ed325fa4a811..1c0949a22408 100644
>>> --- a/arm/pci.c
>>> +++ b/arm/pci.c
>>> @@ -1,3 +1,5 @@
>>> +#include "linux/sizes.h"
>>> +
>>>  #include "kvm/devices.h"
>>>  #include "kvm/fdt.h"
>>>  #include "kvm/kvm.h"
>>> @@ -7,6 +9,11 @@
>>>  
>>>  #include "arm-common/pci.h"
>>>  
>>> +#define ARM_PCI_IO_START ALIGN(PCI_IOPORT_START, SZ_4K)
>>> +
>>> +/* Must be a multiple of 4k */
>>> +#define ARM_PCI_IO_SIZE ((ARM_MMIO_AREA - ARM_PCI_IO_START) & ~(SZ_4K - 1))
>>> +
>>>  /*
>>>   * An entry in the interrupt-map table looks like:
>>>   * <pci unit address> <pci interrupt pin> <gic phandle> <gic interrupt>
>>> @@ -24,6 +31,14 @@ struct of_interrupt_map_entry {
>>>  	struct of_gic_irq		gic_irq;
>>>  } __attribute__((packed));
>>>  
>>> +void pci__arm_init(struct kvm *kvm)
>>> +{
>>> +	u32 align_pad = ARM_PCI_IO_START - PCI_IOPORT_START;
>>> +
>>> +	/* Make PCI port allocation start at a properly aligned address */
>>> +	pci_get_io_port_block(align_pad);
>>> +}
>>> +
>>>  void pci__generate_fdt_nodes(void *fdt)
>>>  {
>>>  	struct device_header *dev_hdr;
>>> @@ -40,10 +55,10 @@ void pci__generate_fdt_nodes(void *fdt)
>>>  			.pci_addr = {
>>>  				.hi	= cpu_to_fdt32(of_pci_b_ss(OF_PCI_SS_IO)),
>>>  				.mid	= 0,
>>> -				.lo	= 0,
>>> +				.lo	= cpu_to_fdt32(ARM_PCI_IO_START),
>>>  			},
>>> -			.cpu_addr	= cpu_to_fdt64(KVM_IOPORT_AREA),
>>> -			.length		= cpu_to_fdt64(ARM_IOPORT_SIZE),
>>> +			.cpu_addr	= cpu_to_fdt64(ARM_PCI_IO_START),
>>> +			.length		= cpu_to_fdt64(ARM_PCI_IO_SIZE),
>>>  		},
>>>  		{
>>>  			.pci_addr = {

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent
  2020-01-29 18:16   ` Andre Przywara
@ 2020-03-05 15:41     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-05 15:41 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

Hi,

On 1/29/20 6:16 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:45 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> From: Julien Thierry <julien.thierry@arm.com>
>>
>> Currently, callbacks for memory BAR 1 call the IO port emulation.  This
>> means that the memory BAR needs I/O Space to be enabled whenever Memory
>> Space is enabled.
>>
>> Refactor the code so the two types of BARs are independent. Also, unify
>> ioport/mmio callback arguments so that they all receive a virtio_device.
> That's a nice cleanup, I like that it avoids shoehorning everything as legacy I/O into the emulation.
>
> Just a nit below, but nevertheless:
>  
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thank you!

[..]

>
>> +static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>> +{
>> +	unsigned long offset;
>> +	struct virtio_device *vdev;
>> +	struct virtio_pci *vpci;
>> +
>> +	vdev = ioport->priv;
>> +	vpci = vdev->virtio;
>> +	offset = port - vpci->port_addr;
> You could initialise the variables directly at their declaration, which looks nicer and underlines that they are just helper variables.
> Same below.

Sure, makes sense. I'll make the change for virtio_pci__io_{in,out}.

Thanks, Alex

>
> Cheers,
> Andre.
>
>> +
>> +	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
>> +}
>> +
>>  static void update_msix_map(struct virtio_pci *vpci,
>>  			    struct msix_table *msix_entry, u32 vecnum)
>>  {
>> @@ -185,8 +195,8 @@ static void update_msix_map(struct virtio_pci *vpci,
>>  	irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
>>  }
>>  
>> -static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
>> -					void *data, int size, int offset)
>> +static bool virtio_pci__specific_data_out(struct kvm *kvm, struct virtio_device *vdev,
>> +					  void *data, int size, unsigned long offset)
>>  {
>>  	struct virtio_pci *vpci = vdev->virtio;
>>  	u32 config_offset, vec;
>> @@ -259,19 +269,16 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
>>  	return false;
>>  }
>>  
>> -static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>> +static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vdev,
>> +				 unsigned long offset, void *data, int size)
>>  {
>> -	unsigned long offset;
>>  	bool ret = true;
>> -	struct virtio_device *vdev;
>>  	struct virtio_pci *vpci;
>>  	struct kvm *kvm;
>>  	u32 val;
>>  
>>  	kvm = vcpu->kvm;
>> -	vdev = ioport->priv;
>>  	vpci = vdev->virtio;
>> -	offset = port - vpci->port_addr;
>>  
>>  	switch (offset) {
>>  	case VIRTIO_PCI_GUEST_FEATURES:
>> @@ -304,13 +311,26 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
>>  		virtio_notify_status(kvm, vdev, vpci->dev, vpci->status);
>>  		break;
>>  	default:
>> -		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
>> +		ret = virtio_pci__specific_data_out(kvm, vdev, data, size, offset);
>>  		break;
>>  	};
>>  
>>  	return ret;
>>  }
>>  
>> +static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>> +{
>> +	unsigned long offset;
>> +	struct virtio_device *vdev;
>> +	struct virtio_pci *vpci;
>> +
>> +	vdev = ioport->priv;
>> +	vpci = vdev->virtio;
>> +	offset = port - vpci->port_addr;
>> +
>> +	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
>> +}
>> +
>>  static struct ioport_operations virtio_pci__io_ops = {
>>  	.io_in	= virtio_pci__io_in,
>>  	.io_out	= virtio_pci__io_out,
>> @@ -320,7 +340,8 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
>>  					   u64 addr, u8 *data, u32 len,
>>  					   u8 is_write, void *ptr)
>>  {
>> -	struct virtio_pci *vpci = ptr;
>> +	struct virtio_device *vdev = ptr;
>> +	struct virtio_pci *vpci = vdev->virtio;
>>  	struct msix_table *table;
>>  	int vecnum;
>>  	size_t offset;
>> @@ -419,11 +440,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>>  					 u64 addr, u8 *data, u32 len,
>>  					 u8 is_write, void *ptr)
>>  {
>> -	struct virtio_pci *vpci = ptr;
>> -	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
>> -	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
>> +	struct virtio_device *vdev = ptr;
>> +	struct virtio_pci *vpci = vdev->virtio;
>>  
>> -	kvm__emulate_io(vcpu, port, data, direction, len, 1);
>> +	if (!is_write)
>> +		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
>> +				    data, len);
>> +	else
>> +		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
>> +				     data, len);
>>  }
>>  
>>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>> @@ -445,13 +470,13 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>>  
>>  	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
>>  	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
>> -			       virtio_pci__io_mmio_callback, vpci);
>> +			       virtio_pci__io_mmio_callback, vdev);
>>  	if (r < 0)
>>  		goto free_ioport;
>>  
>>  	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
>>  	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
>> -			       virtio_pci__msix_mmio_callback, vpci);
>> +			       virtio_pci__msix_mmio_callback, vdev);
>>  	if (r < 0)
>>  		goto free_mmio;
>>  

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions
  2020-01-29 18:17   ` Andre Przywara
@ 2020-03-06 10:54     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-06 10:54 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 1/29/20 6:17 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:49 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> Don't try to configure a BAR if there is no region associated with it.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  vfio/pci.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 1f38f90c3ae9..f86a7d9b7032 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -652,6 +652,8 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
>>  
>>  	/* Initialise the BARs */
>>  	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
>> +		if ((u32)i == vdev->info.num_regions)
>> +			break;
> My inner check-patch complains that we should not have code before declarations.
> Can we solve this the same way as below?

Sure, I'll change it and update the commit message accordingly.

Thanks, Alex

>
> Cheers,
> Andre
>
>
>>  		u64 base;
>>  		struct vfio_region *region = &vdev->regions[i];
>>  
>> @@ -853,11 +855,12 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>>  	u32 bar;
>>  	size_t map_size;
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>> -	struct vfio_region *region = &vdev->regions[nr];
>> +	struct vfio_region *region;
>>  
>>  	if (nr >= vdev->info.num_regions)
>>  		return 0;
>>  
>> +	region = &vdev->regions[nr];
>>  	bar = pdev->hdr.bar[nr];
>>  
>>  	region->vdev = vdev;

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures
  2020-01-30 14:51   ` Andre Przywara
@ 2020-03-06 11:20     ` Alexandru Elisei
  2020-03-30  9:27       ` André Przywara
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-06 11:20 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 1/30/20 2:51 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:50 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> Don't ignore an error in the bus specific initialization function in
>> virtio_init; don't ignore the result of virtio_init; and don't return 0
>> in virtio_blk__init and virtio_scsi__init when we encounter an error.
>> Hopefully this will save some developer's time debugging faulty virtio
>> devices in a guest.
> Seems like the right thing to do, but I was wondering how you triggered this? AFAICS virtio_init only fails when calloc() fails or you pass an illegal transport, with the latter looking like being hard coded to one of the two supported.

I haven't triggered it. I found it by inspection. The transport-specific
initialization functions can fail for various reasons (ioport_register or
kvm__register_mmio can fail because some device emulation claimed all the MMIO
space or the MMIO space was configured incorrectly in the kvm-arch.h header file;
or memory allocation failed, etc) and this is the reason they return an int.
Because of this, virtio_init can fail and this is the reason it too returns an
int. It makes sense to check that the protocol that your device uses is actually
working.

>
> One minor thing below ...

[..]

>> diff --git a/virtio/net.c b/virtio/net.c
>> index 091406912a24..425c13ba1136 100644
>> --- a/virtio/net.c
>> +++ b/virtio/net.c
>> @@ -910,7 +910,7 @@ done:
>>  
>>  static int virtio_net__init_one(struct virtio_net_params *params)
>>  {
>> -	int i, err;
>> +	int i, r;
>>  	struct net_dev *ndev;
>>  	struct virtio_ops *ops;
>>  	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
>> @@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>>  		return -ENOMEM;
>>  
>>  	ops = malloc(sizeof(*ops));
>> -	if (ops == NULL) {
>> -		err = -ENOMEM;
>> -		goto err_free_ndev;
>> -	}
>> +	if (ops == NULL)
>> +		return -ENOMEM;
> Doesn't that leave struct net_dev allocated? I am happy with removing the goto, but we should free(ndev) before we return, I think.

Nope, the cleanup routine in virtio_net__exit takes care of deallocating it (you
get there from virtio_net__init if virtio_net__init_one fails).

Thanks,
Alex

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation
  2020-01-30 14:51   ` Andre Przywara
@ 2020-03-06 11:28     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-06 11:28 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 1/30/20 2:51 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:51 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> An error returned by device__register, kvm__register_mmio and
>> ioport__register means that the device will
>> not be emulated properly. Annotate the functions with __must_check, so we
>> get a compiler warning when this error is ignored.
>>
>> And fix several instances where the caller returns 0 even if the
>> function failed.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Looks alright, one minor nit below, with that fixed:
>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

[..]

>> diff --git a/ioport.c b/ioport.c
>> index a72e4035881a..d224819c6e43 100644
>> --- a/ioport.c
>> +++ b/ioport.c
>> @@ -91,16 +91,21 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>>  	};
>>  
>>  	r = ioport_insert(&ioport_tree, entry);
>> -	if (r < 0) {
>> -		free(entry);
>> -		br_write_unlock(kvm);
>> -		return r;
>> -	}
>> -
>> -	device__register(&entry->dev_hdr);
>> +	if (r < 0)
>> +		goto out_free;
>> +	r = device__register(&entry->dev_hdr);
>> +	if (r < 0)
>> +		goto out_erase;
>>  	br_write_unlock(kvm);
>>  
>>  	return port;
>> +
>> +out_erase:
>> +	rb_int_erase(&ioport_tree, &entry->node);
> To keep the abstraction, shouldn't that rather be ioport_remove() instead?

ioport__register already uses rb_int_erase to remove a node (at the beginning, if
the requested port is already allocated). But you're right, it should use
ioport_remove in both cases, like ioport__unregister{,_all} does.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors
  2020-01-30 14:52   ` Andre Przywara
@ 2020-03-06 12:33     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-06 12:33 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 1/30/20 2:52 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:52 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> Failing an mmap call or creating a memslot means that device emulation
>> will not work, don't ignore it.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  hw/vesa.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index b92cc990b730..a665736a76d7 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -76,9 +76,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  
>>  	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
>>  	if (mem == MAP_FAILED)
>> -		ERR_PTR(-errno);
>> +		return ERR_PTR(-errno);
>>  
>> -	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
>> +	r = kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
>> +	if (r < 0)
>> +		return ERR_PTR(r);
> For the sake of correctness, we should munmap here, I think.
> With that fixed:
>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Actually, I think the correct cleanup order should be munmap(mem) ->
device__unregister(vesa_device) -> ioport__unregister(vesa_base_addr). I'll drop
your R-b.

Thanks,
Alex
>
> Cheers,
> Andre.
>
>>  
>>  	vesafb = (struct framebuffer) {
>>  		.width			= VESA_WIDTH,

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0
  2020-02-05 17:00       ` Andre Przywara
@ 2020-03-06 12:40         ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-06 12:40 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/5/20 5:00 PM, Andre Przywara wrote:
> On Mon, 3 Feb 2020 12:27:55 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
>> Hi Andre,
>>
>> On 2/3/20 12:20 PM, Andre Przywara wrote:
>>> On Thu, 23 Jan 2020 13:47:53 +0000
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>  
>>>> BAR 0 is an I/O BAR and is registered as an ioport region. Let's set its
>>>> size, so a guest can actually use it.  
>>> Well, the whole I/O bar emulates as RAZ/WI, so I would be curious how the guest would actually use it, but specifying the size is surely a good thing, so:  
>> Yeah, you're right, I was thinking about ARM where ioports are MMIO and you need to
>> map those addresses. I'll remove the part about the guest being able to actually use
>> it in the next iteration of the series. Is it OK if I keep your Reviewed-by?
> Sure, as I mentioned the patch itself is fine.
>
> Thanks,
> Andre.
>
>>>    
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>  
>>> Reviewed-by: Andre Przywara <andre.przywara>

I'll assume it was a typo and that you meant andre.przywara@arm.com when posting
the next iteration of this series. Please let me know if I got it wrong.

Thanks,
Alex
>>>
>>> Cheers,
>>> Andre
>>>  
>>>> ---
>>>>  hw/vesa.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/hw/vesa.c b/hw/vesa.c
>>>> index a665736a76d7..e988c0425946 100644
>>>> --- a/hw/vesa.c
>>>> +++ b/hw/vesa.c
>>>> @@ -70,6 +70,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>>>  
>>>>  	vesa_base_addr			= (u16)r;
>>>>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>>>> +	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
>>>>  	r = device__register(&vesa_device);
>>>>  	if (r < 0)
>>>>  		return ERR_PTR(r);  

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs
  2020-02-05 17:01   ` Andre Przywara
@ 2020-03-09 12:38     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-09 12:38 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/5/20 5:01 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:57 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> When we want to map a device region into the guest address space, first
>> we perform an mmap on the device fd. The resulting VMA is a mapping
>> between host userspace addresses and physical addresses associated with
>> the device. Next, we create a memslot, which populates the stage 2 table
>> with the mappings between guest physical addresses and the device
>> physical addresses.
>>
>> However, when we want to unmap the device from the guest address space,
>> we only call munmap, which destroys the VMA and the stage 2 mappings,
>> but doesn't destroy the memslot and kvmtool's internal mem_bank
>> structure associated with the memslot.
>>
>> This has been perfectly fine so far, because we only unmap a device
>> region when we exit kvmtool. This will change when we add support for
>> reassignable BARs, and we will have to unmap vfio regions as the guest
>> kernel writes new addresses in the BARs. This can lead to two possible
>> problems:
>>
>> - We refuse to create a valid BAR mapping because of a stale mem_bank
>>   structure which belonged to a previously unmapped region.
>>
>> - It is possible that the mmap in vfio_map_region returns the same
>>   address that was used to create a memslot, but was unmapped by
>>   vfio_unmap_region. Guest accesses to the device memory will fault
>>   because the stage 2 mappings are missing, and this can lead to
>>   performance degradation.
>>
>> Let's do the right thing and destroy the memslot and the mem_bank struct
>> associated with it when we unmap a vfio region. Set host_addr to NULL
>> after the munmap call so we won't try to unmap an address which is
>> currently used if vfio_unmap_region gets called twice.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  include/kvm/kvm.h |  2 ++
>>  kvm.c             | 65 ++++++++++++++++++++++++++++++++++++++++++++---
>>  vfio/core.c       |  6 +++++
>>  3 files changed, 69 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
>> index 50119a8672eb..c7e57b890cdd 100644
>> --- a/include/kvm/kvm.h
>> +++ b/include/kvm/kvm.h
>> @@ -56,6 +56,7 @@ struct kvm_mem_bank {
>>  	void			*host_addr;
>>  	u64			size;
>>  	enum kvm_mem_type	type;
>> +	u32			slot;
>>  };
>>  
>>  struct kvm {
>> @@ -106,6 +107,7 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>>  		      enum kvm_mem_type type);
>>  static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
>> diff --git a/kvm.c b/kvm.c
>> index 57c4ff98ec4c..afcf55c7bf45 100644
>> --- a/kvm.c
>> +++ b/kvm.c
>> @@ -183,20 +183,75 @@ int kvm__exit(struct kvm *kvm)
>>  }
>>  core_exit(kvm__exit);
>>  
>> +int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>> +		     void *userspace_addr)
>> +{
>> +	struct kvm_userspace_memory_region mem;
>> +	struct kvm_mem_bank *bank;
>> +	int ret;
>> +
>> +	list_for_each_entry(bank, &kvm->mem_banks, list)
>> +		if (bank->guest_phys_addr == guest_phys &&
>> +		    bank->size == size && bank->host_addr == userspace_addr)
>> +			break;
> Shouldn't we protect the list with some lock? I am actually not sure we have this problem already, but at least now a guest could reassign BARs concurrently on different VCPUs, in which case multiple kvm__destroy_mem() and kvm__register_dev_mem() calls might race against each other.
> I think so far we got away with it because of the currently static nature of the memslot assignment.

And the fact that I haven't tested PCI passthrough with more than one device :)
I'll protect changes to the memory banks with a lock.
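A minimal sketch of that locking, assuming a pthread mutex guarding the bank list; the list and the function names below are illustrative stand-ins, not kvmtool's actual list_head-based code:

```c
#include <pthread.h>

/* Illustrative only: a hand-rolled singly linked bank list guarded by a
 * mutex. kvmtool's real code uses list_head and kvm->mem_banks. */
struct mem_bank {
	unsigned long long guest_phys_addr;
	unsigned long long size;
	struct mem_bank *next;
};

static struct mem_bank *mem_banks;
static pthread_mutex_t mem_banks_lock = PTHREAD_MUTEX_INITIALIZER;

static int register_bank(struct mem_bank *bank)
{
	pthread_mutex_lock(&mem_banks_lock);
	bank->next = mem_banks;	/* insert at head, under the lock */
	mem_banks = bank;
	pthread_mutex_unlock(&mem_banks_lock);
	return 0;
}

static int destroy_bank(unsigned long long guest_phys)
{
	struct mem_bank **p;

	pthread_mutex_lock(&mem_banks_lock);
	for (p = &mem_banks; *p; p = &(*p)->next) {
		if ((*p)->guest_phys_addr == guest_phys) {
			*p = (*p)->next;	/* unlink under the same lock */
			pthread_mutex_unlock(&mem_banks_lock);
			return 0;
		}
	}
	pthread_mutex_unlock(&mem_banks_lock);
	return -1;	/* not found, mirroring the -EINVAL path above */
}
```

The point of the sketch: with concurrent BAR reassignment on different VCPUs, the lookup and the unlink must happen under one lock acquisition, otherwise a second kvm__destroy_mem() can race in between them.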

>
>> +
>> +	if (&bank->list == &kvm->mem_banks) {
>> +		pr_err("Region [%llx-%llx] not found", guest_phys,
>> +		       guest_phys + size - 1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (bank->type == KVM_MEM_TYPE_RESERVED) {
>> +		pr_err("Cannot delete reserved region [%llx-%llx]",
>> +		       guest_phys, guest_phys + size - 1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mem = (struct kvm_userspace_memory_region) {
>> +		.slot			= bank->slot,
>> +		.guest_phys_addr	= guest_phys,
>> +		.memory_size		= 0,
>> +		.userspace_addr		= (unsigned long)userspace_addr,
>> +	};
>> +
>> +	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
>> +	if (ret < 0)
>> +		return -errno;
>> +
>> +	list_del(&bank->list);
>> +	free(bank);
>> +	kvm->mem_slots--;
>> +
>> +	return 0;
>> +}
>> +
>>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  		      void *userspace_addr, enum kvm_mem_type type)
>>  {
>>  	struct kvm_userspace_memory_region mem;
>>  	struct kvm_mem_bank *merged = NULL;
>>  	struct kvm_mem_bank *bank;
>> +	struct list_head *prev_entry;
>> +	u32 slot;
>>  	int ret;
>>  
>> -	/* Check for overlap */
>> +	/* Check for overlap and find first empty slot. */
>> +	slot = 0;
>> +	prev_entry = &kvm->mem_banks;
>>  	list_for_each_entry(bank, &kvm->mem_banks, list) {
>>  		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
>>  		u64 end = guest_phys + size - 1;
>> -		if (guest_phys > bank_end || end < bank->guest_phys_addr)
>> +		if (guest_phys > bank_end || end < bank->guest_phys_addr) {
>> +			/*
>> +			 * Keep the banks sorted ascending by slot, so it's
>> +			 * easier for us to find a free slot.
>> +			 */
>> +			if (bank->slot == slot) {
>> +				slot++;
>> +				prev_entry = &bank->list;
>> +			}
>>  			continue;
>> +		}
>>  
>>  		/* Merge overlapping reserved regions */
>>  		if (bank->type == KVM_MEM_TYPE_RESERVED &&
>> @@ -241,10 +296,11 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  	bank->host_addr			= userspace_addr;
>>  	bank->size			= size;
>>  	bank->type			= type;
>> +	bank->slot			= slot;
>>  
>>  	if (type != KVM_MEM_TYPE_RESERVED) {
>>  		mem = (struct kvm_userspace_memory_region) {
>> -			.slot			= kvm->mem_slots++,
>> +			.slot			= slot,
>>  			.guest_phys_addr	= guest_phys,
>>  			.memory_size		= size,
>>  			.userspace_addr		= (unsigned long)userspace_addr,
>> @@ -255,7 +311,8 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
>>  			return -errno;
>>  	}
>>  
>> -	list_add(&bank->list, &kvm->mem_banks);
>> +	list_add(&bank->list, prev_entry);
>> +	kvm->mem_slots++;
>>  
>>  	return 0;
>>  }
>> diff --git a/vfio/core.c b/vfio/core.c
>> index 0ed1e6fee6bf..73fdac8be675 100644
>> --- a/vfio/core.c
>> +++ b/vfio/core.c
>> @@ -256,8 +256,14 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
>>  
>>  void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>>  {
>> +	u64 map_size;
>> +
>>  	if (region->host_addr) {
>> +		map_size = ALIGN(region->info.size, PAGE_SIZE);
>>  		munmap(region->host_addr, region->info.size);
>> +		kvm__destroy_mem(kvm, region->guest_phys_addr, map_size,
>> +				 region->host_addr);
> Shouldn't we destroy the memslot first, then unmap? Because in the current version we are giving a no longer valid userland address to the ioctl. I actually wonder how that passes the access_ok() check in the kernel's KVM_SET_USER_MEMORY_REGION handler.

Yes, you're right. From Documentation/virt/kvm/api.txt, section 4.35
KVM_SET_USER_MEMORY_REGION:

"[..] Memory for the region is taken starting at the address denoted by the field
userspace_addr, which must point at user addressable memory for the entire memory
slot size."

I'll put the munmap after the ioctl.
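The agreed ordering can be sketched as follows; the two stubs only record call order and stand in for the real KVM_SET_USER_MEMORY_REGION ioctl and munmap():

```c
/* Illustrative only: check that the memslot is destroyed while the
 * userspace mapping is still valid, and the mapping is torn down after. */
static char order[3];
static int nr_calls;

static void kvm__destroy_mem_stub(void)
{
	order[nr_calls++] = 'd';	/* KVM_SET_USER_MEMORY_REGION, size 0 */
}

static void munmap_stub(void)
{
	order[nr_calls++] = 'm';	/* munmap(region->host_addr, ...) */
}

static void vfio_unmap_region_fixed(void)
{
	/* Memslot first: userspace_addr must still be addressable. */
	kvm__destroy_mem_stub();
	/* Only now drop the host mapping. */
	munmap_stub();
}
```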

Thanks,
Alex
>
> Cheers,
> Andre
>
>> +		region->host_addr = NULL;
>>  	} else if (region->is_ioport) {
>>  		ioport__unregister(kvm, region->port_base);
>>  	} else {

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation
  2020-02-07 11:36       ` Andre Przywara
  2020-02-07 11:44         ` Alexandru Elisei
@ 2020-03-09 14:54         ` Alexandru Elisei
  1 sibling, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-09 14:54 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/7/20 11:36 AM, Andre Przywara wrote:
> On Fri, 7 Feb 2020 11:08:19 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> On 2/6/20 6:21 PM, Andre Przywara wrote:
>>> On Thu, 23 Jan 2020 13:48:01 +0000
>>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>>
>>> Hi,
>>>  
>>>> During configuration of the BAR addresses, a Linux guest disables and
>>>> enables access to I/O and memory space. When access is disabled, we don't
>>>> stop emulating the memory regions described by the BARs. Now that we have
>>>> callbacks for activating and deactivating emulation for a BAR region,
>>>> let's use that to stop emulation when access is disabled, and
>>>> re-activate it when access is re-enabled.
>>>>
>>>> The vesa emulation hasn't been designed with toggling on and off in
>>>> mind, so refuse writes to the PCI command register that disable memory
>>>> or IO access.
>>>>
>>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>>> ---
>>>>  hw/vesa.c | 16 ++++++++++++++++
>>>>  pci.c     | 42 ++++++++++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 58 insertions(+)
>>>>
>>>> diff --git a/hw/vesa.c b/hw/vesa.c
>>>> index 74ebebbefa6b..3044a86078fb 100644
>>>> --- a/hw/vesa.c
>>>> +++ b/hw/vesa.c
>>>> @@ -81,6 +81,18 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>>>>  	return -EINVAL;
>>>>  }
>>>>  
>>>> +static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>>>> +				u8 offset, void *data, int sz)
>>>> +{
>>>> +	u32 value;  
>>> I guess the same comment as on the other patch applies: using u64 looks safer to me. Also you should clear it, to avoid nasty surprises in case of a short write (1 or 2 bytes only).  
>> I was under the impression that the maximum size for a write to the PCI CAM or
>> ECAM space is 32 bits. This is certainly what I've seen when running Linux, and
>> it is the assumption in the PCI emulation code, which has been working since 2010. I'm
>> trying to dig out more information about this.
>>
>> If it's not, then we have a bigger problem because the PCI emulation code doesn't
>> support it, and to account for it we would need to add a certain amount of logic
>> to the code to deal with it: what if a write hits the command register and another
>> adjacent register? what if a write hits two BARs? A BAR and a regular register
>> before/after it? Part of a BAR and two registers before/after? You can see where
>> this is going.
>>
>> Until we find exactly where in a PCI spec says that 64 bit writes to the
>> configuration space are allowed, I would rather avoid all this complexity and
>> assume that the guest is sane and will only write 32 bit values.
> I don't think it's allowed, but that's not the point here:
> If a (malicious?) guest does a 64-bit write, it will overwrite kvmtool's stack. We should not allow that. We don't need to behave correctly, but the guest should not be able to affect the host (VMM). All it should take is to have "u64 value = 0;" to fix that.

Did a lot of digging about this. From PCI Local Bus 3.0, section 3.8:

"The bandwidth requirements for I/O and *configuration* transactions cannot
justify the added complexity, and, therefore, only memory transactions support
64-bit data transfers" (emphasis added).

So 64-bit data transfers are *not* allowed for configuration space accesses.

From PCI Express Base Specification Revision 1.1, section 7.2.2:

"Because Root Complex implementations are not required to support the generation
of Configuration Requests from memory space accesses that cross DW boundaries, or
that use locked semantics, software should take care not to cause the generation
of such requests when using the memory-mapped configuration access mechanism
unless it is known that the Root Complex implementation being used will support
the translation"

So the PCI Express spec clearly states that only particular implementations
support 64-bit accesses to the configuration space; not generic implementations.

I'll modify the PCI emulation layer to forbid accesses wider than 32-bit.
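A sketch of such a guard (the function name is made up for illustration); rejecting dword-crossing accesses as well follows from the PCIe wording quoted above, though the emulation may choose to check only the size:

```c
#include <stdbool.h>

/* Return true if a config space access of sz bytes at offset is one the
 * emulation is willing to handle: at most one dword, not crossing a
 * dword boundary. */
static bool pci_config_access_ok(unsigned int offset, int sz)
{
	if (sz != 1 && sz != 2 && sz != 4)
		return false;	/* forbid 64-bit (and odd-sized) accesses */
	if ((offset & 3) + sz > 4)
		return false;	/* forbid accesses crossing a DW boundary */
	return true;
}
```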

Thanks,
Alex


* Re: [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice
  2020-02-05 18:35   ` Andre Przywara
@ 2020-03-09 15:21     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-09 15:21 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/5/20 6:35 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:59 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> After writing to the device fd as part of the PCI configuration space
>> emulation, we read back from the device to make sure that the write
>> finished. The value is read back into the PCI configuration space and
>> afterwards, the same value is copied by the PCI emulation code. Let's
>> read from the device fd into a temporary variable, to prevent this
>> double write.
>>
>> The double write is harmless in itself. But when we implement
>> reassignable BARs, we need to keep track of the old BAR value, and the
>> VFIO code is overwriting it.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  vfio/pci.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index abde16dc8693..8a775a4a4a54 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -470,7 +470,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>>  	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev;
>>  	struct vfio_device *vdev;
>> -	void *base = pci_hdr;
>> +	u32 tmp;
> Can we make this a u64, please? I am not sure if 64-bit MMIO is allowed for PCI config space accesses, but a guest could do it anyway, and it looks like it would overwrite the vdev pointer on the stack here in this case.

See my replies to the next patch in the series.

Thanks,
Alex
>
> Cheers,
> Andre.
>
>>  
>>  	if (offset == PCI_ROM_ADDRESS)
>>  		return;
>> @@ -490,7 +490,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>>  
>> -	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
>> +	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
>>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>>  			      sz, offset);
>>  }


* Re: [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs
  2020-02-07 16:50   ` Andre Przywara
@ 2020-03-10 14:17     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-10 14:17 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/7/20 4:50 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:48:02 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> BARs are used by the guest to configure the access to the PCI device by
>> writing the address to which the device will respond. The basic idea for
>> adding support for reassignable BARs is straightforward: deactivate
>> emulation for the memory region described by the old BAR value, and
>> activate emulation for the new region.
>>
>> BAR reassignment can be done while device access is enabled and memory
>> regions for different devices can overlap as long as no access is made
>> to the overlapping memory regions. This means that it is legal for the
>> BARs of two distinct devices to point to an overlapping memory region,
>> and indeed, this is how Linux does resource assignment at boot. To
>> account for this situation, the simple algorithm described above is
>> enhanced to scan for all devices and:
>>
>> - Deactivate emulation for any BARs that might overlap with the new BAR
>>   value.
>>
>> - Enable emulation for any BARs that were overlapping with the old value
>>   after the BAR has been updated.
>>
>> Activating/deactivating emulation of a memory region has side effects.
>> In order to prevent the execution of the same callback twice we now keep
>> track of the state of the region emulation. For example, this can happen
>> if we program a BAR with an address that overlaps a second BAR, thus
>> deactivating emulation for the second BAR, and then we disable all
>> region accesses to the second BAR by writing to the command register.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  hw/vesa.c           |   6 +-
>>  include/kvm/pci.h   |  23 +++-
>>  pci.c               | 274 +++++++++++++++++++++++++++++++++++---------
>>  powerpc/spapr_pci.c |   2 +-
>>  vfio/pci.c          |  15 ++-
>>  virtio/pci.c        |   8 +-
>>  6 files changed, 261 insertions(+), 67 deletions(-)
>>
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index 3044a86078fb..aca938f79c82 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -49,7 +49,7 @@ static int vesa__bar_activate(struct kvm *kvm,
>>  	int r;
>>  
>>  	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> -	bar_size = pci_hdr->bar_size[bar_num];
>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>>  
>>  	switch (bar_num) {
>>  	case 0:
>> @@ -121,9 +121,9 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>>  		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
>>  		.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
>>  		.bar[0]			= cpu_to_le32(port_addr | PCI_BASE_ADDRESS_SPACE_IO),
>> -		.bar_size[0]		= PCI_IO_SIZE,
>> +		.bar_info[0]		= (struct pci_bar_info) {.size = PCI_IO_SIZE},
>>  		.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
>> -		.bar_size[1]		= VESA_MEM_SIZE,
>> +		.bar_info[1]		= (struct pci_bar_info) {.size = VESA_MEM_SIZE},
>>  	};
>>  
>>  	vdev->pci_hdr.cfg_ops = (struct pci_config_operations) {
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index bf42f497168f..ae71ef33237c 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -11,6 +11,17 @@
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>>  
>> +#define pci_dev_err(pci_hdr, fmt, ...) \
>> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_warn(pci_hdr, fmt, ...) \
>> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_info(pci_hdr, fmt, ...) \
>> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
>> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_die(pci_hdr, fmt, ...) \
>> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +
>>  /*
>>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
>> @@ -89,6 +100,11 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pci_bar_info {
>> +	u32 size;
>> +	bool active;
>> +};
> Do we really need this data structure above?
> There is this "32-bit plus 1-bit" annoyance, but also a lot of changes in this patch are about this, making the code less pretty.
> So what about we introduce a bitmap, below in struct pci_device_header? I think we inherited the neat set_bit/test_bit functions from the kernel, so can we use that by just adding something like an "unsigned long bar_enabled;" below?

I think I understand what you are saying. I don't want to use a bitmap, because I
think that's even uglier. I'll try adding an array of bools to struct
pci_device_header, keeping the bar_size member, and see how that looks.
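For comparison, the bitmap variant suggested above could look roughly like this; set_bar_active() and friends are minimal stand-ins for the kernel-derived set_bit/test_bit helpers kvmtool inherits:

```c
#include <stdbool.h>

/* One bit per BAR (the 6 BARs fit easily in an unsigned long), instead
 * of a bool inside a per-BAR struct. */
static inline void set_bar_active(unsigned long *map, int bar)
{
	*map |= 1UL << bar;
}

static inline void clear_bar_active(unsigned long *map, int bar)
{
	*map &= ~(1UL << bar);
}

static inline bool bar_is_active(unsigned long *map, int bar)
{
	return *map & (1UL << bar);
}
```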

>
>> +
>>  struct pci_device_header;
>>  
>>  typedef int (*bar_activate_fn_t)(struct kvm *kvm,
>> @@ -142,7 +158,7 @@ struct pci_device_header {
>>  	};
>>  
>>  	/* Private to lkvm */
>> -	u32		bar_size[6];
>> +	struct pci_bar_info	bar_info[6];
>>  	bar_activate_fn_t	bar_activate_fn;
>>  	bar_deactivate_fn_t	bar_deactivate_fn;
>>  	void *data;
>> @@ -224,4 +240,9 @@ static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_nu
>>  	return __pci__bar_address(pci_hdr->bar[bar_num]);
>>  }
>>  
>> +static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return pci_hdr->bar_info[bar_num].size;
>> +}
>> +
>>  #endif /* KVM__PCI_H */
>> diff --git a/pci.c b/pci.c
>> index 98331a1fc205..1e9791250bc3 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -68,7 +68,7 @@ void pci__assign_irq(struct device_header *dev_hdr)
>>  
>>  static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
>>  {
>> -	return  bar_num < 6 && pci_hdr->bar_size[bar_num];
>> +	return  bar_num < 6 && pci__bar_size(pci_hdr, bar_num);
>>  }
>>  
>>  static void *pci_config_address_ptr(u16 port)
>> @@ -157,6 +157,46 @@ static struct ioport_operations pci_config_data_ops = {
>>  	.io_out	= pci_config_data_out,
>>  };
>>  
>> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			    int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (pci_hdr->bar_info[bar_num].active)
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_err(pci_hdr, "Error activating emulation for BAR %d",
>> +			    bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_info[bar_num].active = true;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (!pci_hdr->bar_info[bar_num].active)
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_err(pci_hdr, "Error deactivating emulation for BAR %d",
>> +			    bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_info[bar_num].active = false;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>>  static void pci_config_command_wr(struct kvm *kvm,
>>  				  struct pci_device_header *pci_hdr,
>>  				  u16 new_command)
>> @@ -173,26 +213,179 @@ static void pci_config_command_wr(struct kvm *kvm,
>>  
>>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>>  			if (__pci__io_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> -			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>> +			if (!__pci__io_space_enabled(new_command))
> Isn't that just "else", as before?
>
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  
>>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>>  			if (__pci__memory_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> -			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>> +			if (!__pci__memory_space_enabled(new_command))
> Same here?

You're right (same as above).

>
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  	}
>>  
>>  	pci_hdr->command = new_command;
>>  }
>>  
>> +static int pci_deactivate_bar_regions(struct kvm *kvm,
>> +				      struct pci_device_header *pci_hdr,
>> +				      u32 start, u32 size)
>> +{
>> +	struct device_header *dev_hdr;
>> +	struct pci_device_header *tmp_hdr;
>> +	u32 tmp_addr, tmp_size;
>> +	int i, r;
>> +
>> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
>> +	while (dev_hdr) {
>> +		tmp_hdr = dev_hdr->data;
>> +		for (i = 0; i < 6; i++) {
>> +			if (!pci_bar_is_implemented(tmp_hdr, i))
>> +				continue;
>> +
>> +			tmp_addr = pci__bar_address(tmp_hdr, i);
>> +			tmp_size = pci__bar_size(tmp_hdr, i);
>> +
>> +			if (tmp_addr + tmp_size <= start ||
>> +			    tmp_addr >= start + size)
>> +				continue;
>> +
>> +			r = pci_deactivate_bar(kvm, tmp_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		dev_hdr = device__next_dev(dev_hdr);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int pci_activate_bar_regions(struct kvm *kvm,
>> +				    struct pci_device_header *pci_hdr,
>> +				    u32 start, u32 size)
>> +{
>> +	struct device_header *dev_hdr;
>> +	struct pci_device_header *tmp_hdr;
>> +	u32 tmp_addr, tmp_size;
>> +	int i, r;
>> +
>> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
>> +	while (dev_hdr) {
>> +		tmp_hdr = dev_hdr->data;
>> +		for (i = 0; i < 6; i++) {
>> +			if (!pci_bar_is_implemented(tmp_hdr, i))
>> +				continue;
>> +
>> +			tmp_addr = pci__bar_address(tmp_hdr, i);
>> +			tmp_size = pci__bar_size(tmp_hdr, i);
>> +
>> +			if (tmp_addr + tmp_size <= start ||
>> +			    tmp_addr >= start + size)
>> +				continue;
>> +
>> +			r = pci_activate_bar(kvm, tmp_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		dev_hdr = device__next_dev(dev_hdr);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void pci_config_bar_wr(struct kvm *kvm,
>> +			      struct pci_device_header *pci_hdr, int bar_num,
>> +			      u32 value)
>> +{
>> +	u32 old_addr, new_addr, bar_size;
>> +	u32 mask;
>> +	int r;
>> +
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> +	else
>> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> +
>> +	/*
>> +	 * If the kernel masks the BAR, it will expect to find the size of the
>> +	 * BAR there next time it reads from it. After the kernel reads the
>> +	 * size, it will write the address back.
>> +	 *
>> +	 * According to the PCI local bus specification REV 3.0: The number of
>> +	 * upper bits that a device actually implements depends on how much of
>> +	 * the address space the device will respond to. A device that wants a 1
>> +	 * MB memory address space (using a 32-bit base address register) would
>> +	 * build the top 12 bits of the address register, hardwiring the other
>> +	 * bits to 0.
>> +	 *
>> +	 * Furthermore, software can determine how much address space the device
>> +	 * requires by writing a value of all 1's to the register and then
>> +	 * reading the value back. The device will return 0's in all don't-care
>> +	 * address bits, effectively specifying the address space required.
>> +	 *
>> +	 * Software computes the size of the address space with the formula
>> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
>> +	 * the BAR. This means that the BAR value that kvmtool should return is
>> +	 * B = ~(S - 1).
>> +	 */
>> +	if (value == 0xffffffff) {
>> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
>> +		/* Preserve the special bits. */
>> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +
>> +	/* Don't toggle emulation when region type access is disabled. */
>> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
>> +	    !pci__io_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
>> +	    !pci__memory_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	old_addr = pci__bar_address(pci_hdr, bar_num);
>> +	new_addr = __pci__bar_address(value);
>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>> +
>> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0)
>> +		return;
>> +
>> +	r = pci_deactivate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
>> +	if (r < 0) {
>> +		/*
>> +		 * We cannot update the BAR because of an overlapping region
>> +		 * that failed to deactivate emulation, so keep the old BAR
>> +		 * value and re-activate emulation for it.
>> +		 */
>> +		pci_activate_bar(kvm, pci_hdr, bar_num);
>> +		return;
>> +	}
>> +
>> +	pci_hdr->bar[bar_num] = value;
>> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0) {
>> +		/*
>> +		 * New region cannot be emulated, re-enable the regions that
>> +		 * were overlapping.
>> +		 */
>> +		pci_activate_bar_regions(kvm, pci_hdr, new_addr, bar_size);
>> +		return;
>> +	}
>> +
>> +	pci_activate_bar_regions(kvm, pci_hdr, old_addr, bar_size);
>> +}
>> +
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> @@ -200,7 +393,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> -	u32 mask;
>>  
>>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>>  		return;
>> @@ -225,46 +417,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	}
>>  
>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>> -
>> -	/*
>> -	 * If the kernel masks the BAR, it will expect to find the size of the
>> -	 * BAR there next time it reads from it. After the kernel reads the
>> -	 * size, it will write the address back.
>> -	 */
>>  	if (bar < 6) {
>> -		if (pci__bar_is_io(pci_hdr, bar))
>> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> -		else
>> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> -		/*
>> -		 * According to the PCI local bus specification REV 3.0:
>> -		 * The number of upper bits that a device actually implements
>> -		 * depends on how much of the address space the device will
>> -		 * respond to. A device that wants a 1 MB memory address space
>> -		 * (using a 32-bit base address register) would build the top
>> -		 * 12 bits of the address register, hardwiring the other bits
>> -		 * to 0.
>> -		 *
>> -		 * Furthermore, software can determine how much address space
>> -		 * the device requires by writing a value of all 1's to the
>> -		 * register and then reading the value back. The device will
>> -		 * return 0's in all don't-care address bits, effectively
>> -		 * specifying the address space required.
>> -		 *
>> -		 * Software computes the size of the address space with the
>> -		 * formula S = ~B + 1, where S is the memory size and B is the
>> -		 * value read from the BAR. This means that the BAR value that
>> -		 * kvmtool should return is B = ~(S - 1).
>> -		 */
>>  		memcpy(&value, data, size);
>> -		if (value == 0xffffffff)
>> -			value = ~(pci_hdr->bar_size[bar] - 1);
>> -		/* Preserve the special bits. */
>> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
>> -		memcpy(base + offset, &value, size);
>> -	} else {
>> -		memcpy(base + offset, data, size);
>> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
>> +		return;
>>  	}
>> +
>> +	memcpy(base + offset, data, size);
>>  }
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>> @@ -329,20 +488,21 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  			continue;
>>  
>>  		has_bar_regions = true;
>> +		assert(!pci_hdr->bar_info[i].active);
>>  
>>  		if (pci__bar_is_io(pci_hdr, i) &&
>>  		    pci__io_space_enabled(pci_hdr)) {
>> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> -				if (r < 0)
>> -					return r;
>> -			}
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>>  
>>  		if (pci__bar_is_memory(pci_hdr, i) &&
>>  		    pci__memory_space_enabled(pci_hdr)) {
>> -				r = bar_activate_fn(kvm, pci_hdr, i, data);
>> -				if (r < 0)
>> -					return r;
>> -			}
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>>  	}
>>  
>>  	assert(has_bar_regions);
>> diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
>> index a15f7d895a46..7be44d950acb 100644
>> --- a/powerpc/spapr_pci.c
>> +++ b/powerpc/spapr_pci.c
>> @@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
>>  				of_pci_b_ddddd(devid) |
>>  				of_pci_b_fff(fn) |
>>  				of_pci_b_rrrrrrrr(bars[i]));
>> -			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
>> +			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
>>  			reg[n+1].addr = 0;
>>  
>>  			assigned_addresses[n].phys_hi = cpu_to_be32(
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 9e595562180b..3a641e72e574 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -455,6 +455,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>  	struct vfio_region *region = &vdev->regions[bar_num];
>> +	u32 bar_addr;
>>  	int ret;
>>  
>>  	if (!region->info.size) {
>> @@ -462,8 +463,11 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  		goto out;
>>  	}
>>  
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +
>>  	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>  	    (u32)bar_num == table->bar) {
>> +		table->guest_phys_addr = region->guest_phys_addr = bar_addr;
> I think those double assignments are a bit frowned upon, at least in Linux coding style. It would probably be cleaner to assign the region member after the error check.
>
>>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>  					 table->size, false,
>>  					 vfio_pci_msix_table_access, pdev);
>> @@ -473,13 +477,22 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  
>>  	if ((pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) &&
>>  	    (u32)bar_num == pba->bar) {
>> +		if (pba->bar == table->bar)
>> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
>> +		else
>> +			pba->guest_phys_addr = region->guest_phys_addr = bar_addr;
> same here with the double assignment

Ok, I'll split it.

>
>>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>  					 pba->size, false,
>>  					 vfio_pci_msix_pba_access, pdev);
>>  		goto out;
>>  	}
>>  
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		region->port_base = bar_addr;
>> +	else
>> +		region->guest_phys_addr = bar_addr;
> Isn't that redundant with those double assignments above? Maybe you can get rid of those altogether?

I don't think it's redundant, because the double assignments above only happen
when specific conditions are met.
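As an aside, the range check that pci_deactivate_bar_regions() and pci_activate_bar_regions() in the patch both rely on reduces to a standalone overlap predicate: two ranges overlap unless one ends at or before the start of the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open ranges [a, a + as) and [b, b + bs); mirrors the patch's
 * "tmp_addr + tmp_size <= start || tmp_addr >= start + size" test. */
static bool bar_ranges_overlap(uint32_t a, uint32_t as, uint32_t b, uint32_t bs)
{
	return !(a + as <= b || b + bs <= a);
}
```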

Thanks,
Alex


* Re: [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-02-07 17:38   ` Andre Przywara
@ 2020-03-10 16:04     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-10 16:04 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi,
	maz, Julien Thierry

Hi,

On 2/7/20 5:38 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:48:03 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> From: Julien Thierry <julien.thierry@arm.com>
>>
>> PCI now supports configurable BARs. Get rid of the no longer needed,
>> Linux-only, fdt property.
> I was just wondering: what is the x86 story here?
> Does the x86 kernel never reassign BARs? Or is this dependent on something else?
> I see tons of pci kernel command line parameters for pci=, maybe one of them would explicitly allow reassigning?

I only see pci=conf1, can you post your kernel command line? Here's mine:

[    0.000000] Command line: noapic noacpi pci=conf1 reboot=k panic=1
i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 earlyprintk=serial i8042.noaux=1
console=ttyS0 earlycon root=/dev/vda1

Just for pci=conf1, from Documentation/admin-guide/kernel-parameters.txt:

"conf1        [X86] Force use of PCI Configuration Access
                Mechanism 1 (config address in IO port 0xCF8,
                data in IO port 0xCFC, both 32-bit)."

But you have a point, I haven't seen an x86 guest reassign BARs, I assumed it's
because it trusts the BIOS allocation. I'll try to figure out why this happens
(maybe I need a special kernel parameter).

Thanks,
Alex
>
> Cheers,
> Andre
>
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/fdt.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/arm/fdt.c b/arm/fdt.c
>> index c80e6da323b6..02091e9e0bee 100644
>> --- a/arm/fdt.c
>> +++ b/arm/fdt.c
>> @@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
>>  
>>  	/* /chosen */
>>  	_FDT(fdt_begin_node(fdt, "chosen"));
>> -	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
>>  
>>  	/* Pass on our amended command line to a Linux kernel only. */
>>  	if (kvm->cfg.firmware_filename) {


* Re: [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support
  2020-02-07 16:51   ` Andre Przywara
@ 2020-03-10 16:28     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-03-10 16:28 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 2/7/20 4:51 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:48:05 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> PCI Express comes with an extended addressing scheme, which directly
>> translates into a bigger device configuration space (256->4096 bytes)
>> and a bigger PCI configuration space (16->256 MB), as well as mandatory
>> capabilities (power management [1] and the PCI Express capability [2]).
>>
>> However, our virtio PCI implementation implements version 0.9 of the
>> protocol and still uses transitional PCI device IDs, so we have
>> opted to omit the mandatory PCI Express capabilities. For VFIO, the power
>> management and PCI Express capabilities are left for a subsequent patch.
>>
>> [1] PCI Express Base Specification Revision 1.1, section 7.6
>> [2] PCI Express Base Specification Revision 1.1, section 7.8
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  arm/include/arm-common/kvm-arch.h |  4 +-
>>  arm/pci.c                         |  2 +-
>>  builtin-run.c                     |  1 +
>>  hw/vesa.c                         |  2 +-
>>  include/kvm/kvm-config.h          |  2 +-
>>  include/kvm/pci.h                 | 76 ++++++++++++++++++++++++++++---
>>  pci.c                             |  5 +-
>>  vfio/pci.c                        | 26 +++++++----
>>  8 files changed, 97 insertions(+), 21 deletions(-)
>>
>> diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
>> index b9d486d5eac2..13c55fa3dc29 100644
>> --- a/arm/include/arm-common/kvm-arch.h
>> +++ b/arm/include/arm-common/kvm-arch.h
>> @@ -23,7 +23,7 @@
>>  
>>  #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
>>  #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
>> -#define ARM_PCI_CFG_SIZE	(1ULL << 24)
>> +#define ARM_PCI_CFG_SIZE	(1ULL << 28)
>>  #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
>>  				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
>>  
>> @@ -50,6 +50,8 @@
>>  
>>  #define VIRTIO_RING_ENDIAN	(VIRTIO_ENDIAN_LE | VIRTIO_ENDIAN_BE)
>>  
>> +#define ARCH_HAS_PCI_EXP	1
>> +
>>  static inline bool arm_addr_in_ioport_region(u64 phys_addr)
>>  {
>>  	u64 limit = KVM_IOPORT_AREA + ARM_IOPORT_SIZE;
>> diff --git a/arm/pci.c b/arm/pci.c
>> index 1c0949a22408..eec9f3d936a5 100644
>> --- a/arm/pci.c
>> +++ b/arm/pci.c
>> @@ -77,7 +77,7 @@ void pci__generate_fdt_nodes(void *fdt)
>>  	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
>>  	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
>>  	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
>> -	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
>> +	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
>>  	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
>>  
>>  	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
>> diff --git a/builtin-run.c b/builtin-run.c
>> index 9cb8c75300eb..def8a1f803ad 100644
>> --- a/builtin-run.c
>> +++ b/builtin-run.c
>> @@ -27,6 +27,7 @@
>>  #include "kvm/irq.h"
>>  #include "kvm/kvm.h"
>>  #include "kvm/pci.h"
>> +#include "kvm/vfio.h"
>>  #include "kvm/rtc.h"
>>  #include "kvm/sdl.h"
>>  #include "kvm/vnc.h"
>> diff --git a/hw/vesa.c b/hw/vesa.c
>> index aca938f79c82..4321cfbb6ddc 100644
>> --- a/hw/vesa.c
>> +++ b/hw/vesa.c
>> @@ -82,7 +82,7 @@ static int vesa__bar_deactivate(struct kvm *kvm,
>>  }
>>  
>>  static void vesa__pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> -				u8 offset, void *data, int sz)
>> +				u16 offset, void *data, int sz)
>>  {
>>  	u32 value;
>>  
>> diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
>> index a052b0bc7582..a1012c57b7a7 100644
>> --- a/include/kvm/kvm-config.h
>> +++ b/include/kvm/kvm-config.h
>> @@ -2,7 +2,6 @@
>>  #define KVM_CONFIG_H_
>>  
>>  #include "kvm/disk-image.h"
>> -#include "kvm/vfio.h"
>>  #include "kvm/kvm-config-arch.h"
>>  
>>  #define DEFAULT_KVM_DEV		"/dev/kvm"
>> @@ -18,6 +17,7 @@
>>  #define MIN_RAM_SIZE_MB		(64ULL)
>>  #define MIN_RAM_SIZE_BYTE	(MIN_RAM_SIZE_MB << MB_SHIFT)
>>  
>> +struct vfio_device_params;
>>  struct kvm_config {
>>  	struct kvm_config_arch arch;
>>  	struct disk_image_params disk_image[MAX_DISK_IMAGES];
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index ae71ef33237c..0c3c74b82626 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -10,6 +10,7 @@
>>  #include "kvm/devices.h"
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>> +#include "kvm.h"
>>  
>>  #define pci_dev_err(pci_hdr, fmt, ...) \
>>  	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> @@ -32,9 +33,41 @@
>>  #define PCI_CONFIG_BUS_FORWARD	0xcfa
>>  #define PCI_IO_SIZE		0x100
>>  #define PCI_IOPORT_START	0x6200
>> -#define PCI_CFG_SIZE		(1ULL << 24)
>>  
>> -struct kvm;
>> +#define PCIE_CAP_REG_VER	0x1
>> +#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
>> +#define PM_CAP_VER		0x3
>> +
>> +#ifdef ARCH_HAS_PCI_EXP
>> +#define PCI_CFG_SIZE		(1ULL << 28)
>> +#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_EXP_SIZE
>> +
>> +union pci_config_address {
>> +	struct {
>> +#if __BYTE_ORDER == __LITTLE_ENDIAN
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
> Meeh, using C struct bitfields and expect them to map to certain bits is not within the C standard. But I see that you are merely the messenger here, as we use this already for the CAM mapping. So we keep this fix for another time ...
>
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +#else
>> +		unsigned	enable_bit	: 1;		/* 31       */
>> +		unsigned	reserved	: 3;		/* 30 .. 28 */
>> +		unsigned	bus_number	: 8;		/* 27 .. 20 */
>> +		unsigned	device_number	: 5;		/* 19 .. 15 */
>> +		unsigned	function_number	: 3;		/* 14 .. 12 */
>> +		unsigned	register_number	: 10;		/* 11 .. 2  */
>> +		unsigned	reg_offset	: 2;		/* 1  .. 0  */
>> +#endif
>> +	};
>> +	u32 w;
>> +};
>> +
>> +#else
>> +#define PCI_CFG_SIZE		(1ULL << 24)
>> +#define PCI_DEV_CFG_SIZE	PCI_CFG_SPACE_SIZE
>>  
>>  union pci_config_address {
>>  	struct {
>> @@ -58,6 +91,8 @@ union pci_config_address {
>>  	};
>>  	u32 w;
>>  };
>> +#endif
>> +#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>>  
>>  struct msix_table {
>>  	struct msi_msg msg;
>> @@ -100,6 +135,33 @@ struct pci_cap_hdr {
>>  	u8	next;
>>  };
>>  
>> +struct pcie_cap {
>> +	u8 cap;
>> +	u8 next;
>> +	u16 cap_reg;
>> +	u32 dev_cap;
>> +	u16 dev_ctrl;
>> +	u16 dev_status;
>> +	u32 link_cap;
>> +	u16 link_ctrl;
>> +	u16 link_status;
>> +	u32 slot_cap;
>> +	u16 slot_ctrl;
>> +	u16 slot_status;
>> +	u16 root_ctrl;
>> +	u16 root_cap;
>> +	u32 root_status;
>> +};
> Wouldn't you need those to be defined as packed as well, if you include them below in a packed struct?

No. For gcc-8.4 and gcc-4.0.2 (and I assume everything in between):

"Specifying the `packed` attribute for `struct` and `union` types is
equivalent to specifying the `packed` attribute on each of the structure
or union members."
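As a quick compile-time sanity check of the layout, something like the following could be used (a sketch assuming GCC/Clang `__attribute__((packed))` and C11 `_Static_assert`; not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as the quoted struct pcie_cap, with fixed-width types. */
struct pcie_cap {
	uint8_t  cap;
	uint8_t  next;
	uint16_t cap_reg;
	uint32_t dev_cap;
	uint16_t dev_ctrl;
	uint16_t dev_status;
	uint32_t link_cap;
	uint16_t link_ctrl;
	uint16_t link_status;
	uint32_t slot_cap;
	uint16_t slot_ctrl;
	uint16_t slot_status;
	uint16_t root_ctrl;
	uint16_t root_cap;
	uint32_t root_status;
} __attribute__((packed));

/* The PCIe 1.1 capability registers through Root Status occupy
 * offsets 0x00..0x23, i.e. 36 bytes. */
_Static_assert(sizeof(struct pcie_cap) == 36, "pcie_cap layout");
```

Every member here happens to be naturally aligned, so `packed` does not change the size of the struct on its own; it matters once the struct is embedded at an arbitrary offset inside the packed config-space header.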

> But more importantly: Do we actually need those definitions? We don't seem to use them, do we?
> And the u8 __pad[PCI_DEV_CFG_SIZE] below should provide the extended storage space a guest would expect?

Indeed, we don't use them, for the reasons I explained in the commit message. I
would rather keep them, because they are required by the PCIe spec.
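As an aside on the new `1ULL << 28` config space size in this patch: it falls directly out of the ECAM geometry, 256 buses × 32 devices × 8 functions × 4096 bytes of config space each. A portable way to pick the fields out of an ECAM offset without C bitfields (whose layout, as Andre notes, is implementation-defined) is plain shifts and masks; a hypothetical sketch, not kvmtool code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * ECAM config-space offset layout (PCIe spec):
 *   bus[27:20] device[19:15] function[14:12] register[11:0]
 * Plain shifts and masks give the same layout on every compiler
 * and endianness.
 */
static inline uint8_t  ecam_bus(uint32_t addr) { return (addr >> 20) & 0xff; }
static inline uint8_t  ecam_dev(uint32_t addr) { return (addr >> 15) & 0x1f; }
static inline uint8_t  ecam_fn(uint32_t addr)  { return (addr >> 12) & 0x07; }
static inline uint16_t ecam_reg(uint32_t addr) { return addr & 0xfff; }
```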

Thanks,
Alex
>
> The rest looks alright.
>
> Cheers,
> Andre.


* Re: [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures
  2020-03-06 11:20     ` Alexandru Elisei
@ 2020-03-30  9:27       ` André Przywara
  0 siblings, 0 replies; 88+ messages in thread
From: André Przywara @ 2020-03-30  9:27 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 06/03/2020 11:20, Alexandru Elisei wrote:

Hi,

replying here after reviewing the v3 patch, and still seeing the problem.

> On 1/30/20 2:51 PM, Andre Przywara wrote:
>> On Thu, 23 Jan 2020 13:47:50 +0000
>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>
>> Hi,
>>
>>> Don't ignore an error in the bus specific initialization function in
>>> virtio_init; don't ignore the result of virtio_init; and don't return 0
>>> in virtio_blk__init and virtio_scsi__init when we encounter an error.
>>> Hopefully this will save some developer's time debugging faulty virtio
>>> devices in a guest.
>> Seems like the right thing to do, but I was wondering how you triggered this? AFAICS virtio_init only fails when calloc() fails or you pass an illegal transport, and the latter looks to be hard-coded to one of the two supported values.
> 
> I haven't triggered it. I found it by inspection. The transport-specific
> initialization functions can fail for various reasons (ioport_register or
> kvm__register_mmio can fail because some device emulation claimed all the MMIO
> space or the MMIO space was configured incorrectly in the kvm-arch.h header file;
> or memory allocation failed, etc) and this is the reason they return an int.
> Because of this, virtio_init can fail and this is the reason it too returns an
> int. It makes sense to check that the protocol that your device uses is actually
> working.
> 
>>
>> One minor thing below ...
> 
> [..]
> 
>>> diff --git a/virtio/net.c b/virtio/net.c
>>> index 091406912a24..425c13ba1136 100644
>>> --- a/virtio/net.c
>>> +++ b/virtio/net.c
>>> @@ -910,7 +910,7 @@ done:
>>>  
>>>  static int virtio_net__init_one(struct virtio_net_params *params)
>>>  {
>>> -	int i, err;
>>> +	int i, r;
>>>  	struct net_dev *ndev;
>>>  	struct virtio_ops *ops;
>>>  	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);
>>> @@ -920,10 +920,8 @@ static int virtio_net__init_one(struct virtio_net_params *params)
>>>  		return -ENOMEM;
>>>  
>>>  	ops = malloc(sizeof(*ops));
>>> -	if (ops == NULL) {
>>> -		err = -ENOMEM;
>>> -		goto err_free_ndev;
>>> -	}
>>> +	if (ops == NULL)
>>> +		return -ENOMEM;
>> Doesn't that leave struct net_dev allocated? I am happy with removing the goto, but we should free(ndev) before we return, I think.
> 
> Nope, the cleanup routine in virtio_net__exit takes care of deallocating it (you
> get there from virtio_net__init if virtio_net__init_one fails).

First, I don't see where we actually deallocate the struct net_dev
storage for each network device in __exit() - it seems to only call the
downscript, if needed, but frees nothing.

But more importantly, even that would only happen if this structure
would be already part of the list, which happens only *after* the check
for the ops malloc() return value. If we return prematurely due to this
malloc() failing, the ndev pointer is lost on the stack.

So I guess you need to free this here. As mentioned, you should still
drop the goto, since there is only one user.
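A minimal, self-contained model of the error path under discussion (names echo virtio/net.c but the code is purely illustrative): when the ops allocation fails, ndev has not yet been linked into the ndevs list, so virtio_net__exit() cannot reach it and it must be freed locally.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-ins for the real kvmtool structures; illustrative only. */
struct net_dev { int placeholder; };
struct virtio_ops { int placeholder; };

/* fail_ops simulates malloc() failing for the ops allocation. */
static int virtio_net_init_one_model(int fail_ops)
{
	struct net_dev *ndev;
	struct virtio_ops *ops;

	ndev = calloc(1, sizeof(*ndev));
	if (ndev == NULL)
		return -ENOMEM;

	ops = fail_ops ? NULL : malloc(sizeof(*ops));
	if (ops == NULL) {
		/* ndev is not on the ndevs list yet: free it here,
		 * or the pointer is lost when we return. */
		free(ndev);
		return -ENOMEM;
	}

	/* ... from here on, ndev would be linked into the ndevs list,
	 * and cleanup would be virtio_net__exit()'s job ... */
	free(ops);
	free(ndev);
	return 0;
}
```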

Cheers,
Andre.


* Re: [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support
  2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (30 preceding siblings ...)
  2020-02-07 17:02 ` [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE " Andre Przywara
@ 2020-05-13 14:56 ` Marc Zyngier
  2020-05-13 15:15   ` Alexandru Elisei
  31 siblings, 1 reply; 88+ messages in thread
From: Marc Zyngier @ 2020-05-13 14:56 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Hi all,

On 2020-01-23 13:47, Alexandru Elisei wrote:
> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to 
> prevent
> it from trying to reassign the BARs. Let's make the BARs reassignable 
> so
> we can get rid of this band-aid.

Is there anything holding up this series? I'd really like to see it
merged in mainline kvmtool, as the EDK2 port seems to have surfaced
(and there are environments where running QEMU is just overkill).

It'd be good if it could be rebased and reposted.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support
  2020-05-13 14:56 ` Marc Zyngier
@ 2020-05-13 15:15   ` Alexandru Elisei
  2020-05-13 16:41     ` Alexandru Elisei
  0 siblings, 1 reply; 88+ messages in thread
From: Alexandru Elisei @ 2020-05-13 15:15 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Hi,

On 5/13/20 3:56 PM, Marc Zyngier wrote:
> Hi all,
>
> On 2020-01-23 13:47, Alexandru Elisei wrote:
>> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
>> it from trying to reassign the BARs. Let's make the BARs reassignable so
>> we can get rid of this band-aid.
>
> Is there anything holding up this series? I'd really like to see it
> merged in mainline kvmtool, as the EDK2 port seems to have surfaced
> (and there are environments where running QEMU is just overkill).
>
> It'd be good if it could be rebased and reposted.

Thank you for the interest, v3 is already out there, by the way, and the first 18
patches are already merged.

I finished working on v4 and I was just getting ready to run the final battery
of tests. If I don't discover any bugs (fingers crossed!), I'll send v4 tomorrow.

Thanks,
Alex


* Re: [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support
  2020-05-13 15:15   ` Alexandru Elisei
@ 2020-05-13 16:41     ` Alexandru Elisei
  0 siblings, 0 replies; 88+ messages in thread
From: Alexandru Elisei @ 2020-05-13 16:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Hi,

On 5/13/20 4:15 PM, Alexandru Elisei wrote:
> Hi,
>
> On 5/13/20 3:56 PM, Marc Zyngier wrote:
>> Hi all,
>>
>> On 2020-01-23 13:47, Alexandru Elisei wrote:
>>> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
>>> it from trying to reassign the BARs. Let's make the BARs reassignable so
>>> we can get rid of this band-aid.
>> Is there anything holding up this series? I'd really like to see it
>> merged in mainline kvmtool, as the EDK2 port seems to have surfaced
>> (and there are environments where running QEMU is just overkill).
>>
>> It'd be good if it could be rebased and reposted.
> Thank you for the interest, v3 is already out there, by the way, and the first 18
> patches are already merged.

v3 can be found at link [1]. The cover letter mentions that I had to drop Julien
Thierry's patch that fixed the UART overlapping with the PCI I/O region because it
broke guests that used 64k pages. Which means that EDK2 + PCI still doesn't work
with kvmtool, even if the series gets merged. On the plus side, CFI flash
emulation is merged and EDK2 works right now with kvmtool, as long as you stick to
virtio-mmio (which unfortunately means no passthrough as well). I tested this with
the EDK2 firmware posted by Ard [2].

[1] https://www.spinics.net/lists/kvm/msg211272.html
[2] https://www.spinics.net/lists/kvm/msg213842.html
>
> I finished working on v4 and I was just getting ready to run the final battery
> of tests. If I don't discover any bugs (fingers crossed!), I'll send v4 tomorrow.

Just as I feared, the last patch in the series, the one that adds PCIe support,
breaks EDK2. EDK2 doesn't know about legacy PCI so the aforementioned overlap is
not an issue. But as soon as I advertise support for PCIe EDK2 breaks because of
it. I think I'll just drop the PCIe support patch from the series (so I don't
regress EDK2 + virtio-mmio support) and re-send it after this entire issue gets
sorted.

Thanks,
Alex


end of thread, other threads:[~2020-05-13 16:41 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-23 13:47 [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE 1.1 support Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 01/30] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 02/30] hw/i8042: Compile only for x86 Alexandru Elisei
2020-01-27 18:07   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 03/30] pci: Fix BAR resource sizing arbitration Alexandru Elisei
2020-01-27 18:07   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 04/30] Remove pci-shmem device Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 05/30] Check that a PCI device's memory size is power of two Alexandru Elisei
2020-01-27 18:07   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 06/30] arm/pci: Advertise only PCI bus 0 in the DT Alexandru Elisei
2020-01-27 18:08   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 07/30] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
2020-02-07 17:02   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 08/30] pci: Fix ioport allocation size Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 09/30] arm/pci: Fix PCI IO region Alexandru Elisei
2020-01-29 18:16   ` Andre Przywara
2020-03-04 16:20     ` Alexandru Elisei
2020-03-05 13:06       ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 10/30] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
2020-01-29 18:16   ` Andre Przywara
2020-03-05 15:41     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 11/30] vfio/pci: Allocate correct size for MSIX table and PBA BARs Alexandru Elisei
2020-01-29 18:16   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 12/30] vfio/pci: Don't assume that only even numbered BARs are 64bit Alexandru Elisei
2020-01-30 14:50   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 13/30] vfio/pci: Ignore expansion ROM BAR writes Alexandru Elisei
2020-01-30 14:50   ` Andre Przywara
2020-01-30 15:52     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 14/30] vfio/pci: Don't access potentially unallocated regions Alexandru Elisei
2020-01-29 18:17   ` Andre Przywara
2020-03-06 10:54     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 15/30] virtio: Don't ignore initialization failures Alexandru Elisei
2020-01-30 14:51   ` Andre Przywara
2020-03-06 11:20     ` Alexandru Elisei
2020-03-30  9:27       ` André Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 16/30] Don't ignore errors registering a device, ioport or mmio emulation Alexandru Elisei
2020-01-30 14:51   ` Andre Przywara
2020-03-06 11:28     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 17/30] hw/vesa: Don't ignore fatal errors Alexandru Elisei
2020-01-30 14:52   ` Andre Przywara
2020-03-06 12:33     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 18/30] hw/vesa: Set the size for BAR 0 Alexandru Elisei
2020-02-03 12:20   ` Andre Przywara
2020-02-03 12:27     ` Alexandru Elisei
2020-02-05 17:00       ` Andre Przywara
2020-03-06 12:40         ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio Alexandru Elisei
2020-02-03 12:23   ` Andre Przywara
2020-02-05 11:25     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 20/30] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
2020-02-05 17:00   ` Andre Przywara
2020-02-05 17:02     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 21/30] virtio/pci: Get emulated region address from BARs Alexandru Elisei
2020-02-05 17:01   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 22/30] vfio: Destroy memslot when unmapping the associated VAs Alexandru Elisei
2020-02-05 17:01   ` Andre Przywara
2020-03-09 12:38     ` Alexandru Elisei
2020-01-23 13:47 ` [PATCH v2 kvmtool 23/30] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
2020-02-05 18:34   ` Andre Przywara
2020-01-23 13:47 ` [PATCH v2 kvmtool 24/30] vfio/pci: Don't write configuration value twice Alexandru Elisei
2020-02-05 18:35   ` Andre Przywara
2020-03-09 15:21     ` Alexandru Elisei
2020-01-23 13:48 ` [PATCH v2 kvmtool 25/30] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
2020-02-06 18:21   ` Andre Przywara
2020-02-07 10:12     ` Alexandru Elisei
2020-02-07 15:39       ` Alexandru Elisei
2020-01-23 13:48 ` [PATCH v2 kvmtool 26/30] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
2020-02-06 18:21   ` Andre Przywara
2020-02-07 11:08     ` Alexandru Elisei
2020-02-07 11:36       ` Andre Przywara
2020-02-07 11:44         ` Alexandru Elisei
2020-03-09 14:54         ` Alexandru Elisei
2020-01-23 13:48 ` [PATCH v2 kvmtool 27/30] pci: Implement reassignable BARs Alexandru Elisei
2020-02-07 16:50   ` Andre Przywara
2020-03-10 14:17     ` Alexandru Elisei
2020-01-23 13:48 ` [PATCH v2 kvmtool 28/30] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
2020-02-07 16:51   ` Andre Przywara
2020-02-07 17:38   ` Andre Przywara
2020-03-10 16:04     ` Alexandru Elisei
2020-01-23 13:48 ` [PATCH v2 kvmtool 29/30] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
2020-02-07 16:51   ` Andre Przywara
2020-01-23 13:48 ` [PATCH v2 kvmtool 30/30] arm/arm64: Add PCI Express 1.1 support Alexandru Elisei
2020-02-07 16:51   ` Andre Przywara
2020-03-10 16:28     ` Alexandru Elisei
2020-02-07 17:02 ` [PATCH v2 kvmtool 00/30] Add reassignable BARs and PCIE " Andre Przywara
2020-05-13 14:56 ` Marc Zyngier
2020-05-13 15:15   ` Alexandru Elisei
2020-05-13 16:41     ` Alexandru Elisei
