KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support
@ 2019-11-25 10:30 Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
                   ` (16 more replies)
  0 siblings, 17 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
it from trying to reprogram the BARs. Let's make the BARs writable so we
can get rid of this band-aid.

Let's also extend the legacy PCI emulation, which came out in 1992, so we
can properly emulate the PCI Express version 1.1 protocol, which is
relatively newer, being published in 2005.

With these two changes, we are very close to running EDK2 as the firmware
for the virtual machine; the only thing that is missing is flash emulation
for storing firmware variables.

Summary of the patches:
* Patches 1-12 are fixes or enhancements needed to support reprogramable
  BARs.
* Patches 13-15 add support for reprogramable BARs and remove the dt
  property.
* Patch 16 adds support for PCIE 1.1.

Based on the series at [1].

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/034964.html

Alexandru Elisei (8):
  Makefile: Use correct objcopy binary when cross-compiling for x86_64
  Remove pci-shmem device
  Check that a PCI device's memory size is power of two
  arm: pci.c: Advertise only PCI bus 0 in the DT
  virtio/pci: Ignore MMIO and I/O accesses when they are disabled
  Use independent read/write locks for ioport and mmio
  virtio/pci: Add support for BAR configuration
  Add PCI Express 1.1 support

Julien Thierry (7):
  ioport: pci: Move port allocations to PCI devices
  pci: Fix ioport allocation size
  arm/pci: Fix PCI IO region
  arm/pci: Do not use first PCI IO space bytes for devices
  virtio/pci: Make memory and IO BARs independent
  vfio: Add support for BAR configuration
  arm/fdt: Remove 'linux,pci-probe-only' property

Sami Mujawar (1):
  pci: Fix BAR resource sizing arbitration

 Makefile                          |   4 +-
 arm/fdt.c                         |   1 -
 arm/include/arm-common/kvm-arch.h |   2 +-
 arm/include/arm-common/pci.h      |   1 +
 arm/kvm.c                         |   3 +
 arm/pci.c                         |  28 ++-
 builtin-run.c                     |   5 -
 hw/pci-shmem.c                    | 400 ------------------------------
 hw/vesa.c                         |  21 +-
 include/kvm/ioport.h              |   4 -
 include/kvm/pci-shmem.h           |  32 ---
 include/kvm/pci.h                 |  61 ++++-
 include/kvm/util.h                |   2 +
 include/kvm/vesa.h                |   6 +-
 include/kvm/virtio-pci.h          |   1 +
 ioport.c                          |  36 +--
 mmio.c                            |  26 +-
 pci.c                             |  64 ++++-
 powerpc/include/kvm/kvm-arch.h    |   2 +-
 vfio/core.c                       |   6 +-
 vfio/pci.c                        |  97 +++++++-
 virtio/pci.c                      | 355 ++++++++++++++++++++------
 x86/include/kvm/kvm-arch.h        |   2 +-
 23 files changed, 562 insertions(+), 597 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-27 18:23   ` Andre Przywara
  2019-11-25 10:30 ` [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration Alexandru Elisei
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Use the compiler toolchain version of objcopy instead of the native one
when cross-compiling for the x86_64 architecture.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index b76d844f2e01..6d6880dd4f8a 100644
--- a/Makefile
+++ b/Makefile
@@ -22,6 +22,7 @@ CC	:= $(CROSS_COMPILE)gcc
 CFLAGS	:=
 LD	:= $(CROSS_COMPILE)ld
 LDFLAGS	:=
+OBJCOPY	:= $(CROSS_COMPILE)objcopy
 
 FIND	:= find
 CSCOPE	:= cscope
@@ -479,7 +480,7 @@ x86/bios/bios.bin.elf: x86/bios/entry.S x86/bios/e820.c x86/bios/int10.c x86/bio
 
 x86/bios/bios.bin: x86/bios/bios.bin.elf
 	$(E) "  OBJCOPY " $@
-	$(Q) objcopy -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
+	$(Q) $(OBJCOPY) -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
 
 x86/bios/bios-rom.o: x86/bios/bios-rom.S x86/bios/bios.bin x86/bios/bios-rom.h
 	$(E) "  CC      " $@
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-27 18:24   ` Andre Przywara
  2019-11-25 10:30 ` [PATCH kvmtool 03/16] Remove pci-shmem device Alexandru Elisei
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Sami Mujawar <sami.mujawar@arm.com>

According to the 'PCI Local Bus Specification, Revision 3.0,
February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'

    "Software saves the original value of the Base Address register,
    writes 0 FFFF FFFFh to the register, then reads it back. Size
    calculation can be done from the 32-bit value read by first
    clearing encoding information bits (bit 0 for I/O, bits 0-3 for
    memory), inverting all 32 bits (logical NOT), then incrementing
    by 1. The resultant 32-bit value is the memory/I/O range size
    decoded by the register. Note that the upper 16 bits of the result
    is ignored if the Base Address register is for I/O and bits 16-31
    returned zero upon read."

kvmtool was returning the actual BAR resource size which would be
incorrect as the software software drivers would invert all 32 bits
(logical NOT), then incrementing by 1. This ends up with a very large
resource size (in some cases more than 4GB) due to which drivers
assert/fail to work.

e.g if the BAR resource size was 0x1000, kvmtool would return 0x1000
instead of 0xFFFFF00x.

Fixed pci__config_wr() to return the size of the BAR in accordance with
the PCI Local Bus specification, Implementation Notes.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Reworked algorithm, removed power-of-two check]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/pci.c b/pci.c
index 689869cb79a3..e1b57325bdeb 100644
--- a/pci.c
+++ b/pci.c
@@ -149,6 +149,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	u8 bar, offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
+	u32 value = 0;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -169,13 +170,42 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-	 * If the kernel masks the BAR it would expect to find the size of the
-	 * BAR there next time it reads from it. When the kernel got the size it
-	 * would write the address back.
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
 	 */
-	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
-		u32 sz = pci_hdr->bar_size[bar];
-		memcpy(base + offset, &sz, sizeof(sz));
+	if (bar < 6) {
+		memcpy(&value, data, size);
+		if (value == 0xffffffff)
+			/*
+			 * According to the PCI local bus specification REV 3.0:
+			 * The number of upper bits that a device actually implements
+			 * depends on how much of the address space the device will
+			 * respond to. A device that wants a 1 MB memory address space
+			 * (using a 32-bit base address register) would build the top
+			 * 12 bits of the address register, hardwiring the other bits
+			 * to 0.
+			 *
+			 * Furthermore, software can determine how much address space
+			 * the device requires by writing a value of all 1's to the
+			 * register and then reading the value back. The device will
+			 * return 0's in all don't-care address bits, effectively
+			 * specifying the address space required.
+			 *
+			 * Software computes the size of the address space with the
+			 * formula S = ~B + 1, where S is the memory size and B is the
+			 * value read from the BAR. This means that the BAR value that
+			 * kvmtool should return is B = ~(S - 1).
+			 */
+			value = ~(pci_hdr->bar_size[bar] - 1);
+		if (pci_hdr->bar[bar] & 0x1)
+			value = (value & PCI_BASE_ADDRESS_IO_MASK) | \
+				PCI_BASE_ADDRESS_SPACE_IO;
+		else
+			/* Memory Space BAR, preserve the first 4 bits. */
+			value = (value & PCI_BASE_ADDRESS_MEM_MASK) | \
+				(pci_hdr->bar[bar] & ~PCI_BASE_ADDRESS_MEM_MASK);
+		memcpy(base + offset, &value, size);
 	} else {
 		memcpy(base + offset, data, size);
 	}
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 03/16] Remove pci-shmem device
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-27 18:24   ` Andre Przywara
  2019-11-25 10:30 ` [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two Alexandru Elisei
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The pci-shmem emulated device ("ivshmem") was created by QEMU for
cross-VM data sharing. The only Linux driver that uses this device is
the Android Virtual System on a Chip staging driver, which also mentions
a character device driver implemented on top of shmem, which was removed
from Linux.

On the kvmtool side, the only commits touching the pci-shmem device
since it was introduced in 2012 were made when refactoring various
kvmtool subsystems. Let's remove the maintenance burden on the kvmtool
maintainers and remove this unused device.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 Makefile                |   1 -
 builtin-run.c           |   5 -
 hw/pci-shmem.c          | 400 ----------------------------------------
 include/kvm/pci-shmem.h |  32 ----
 4 files changed, 438 deletions(-)
 delete mode 100644 hw/pci-shmem.c
 delete mode 100644 include/kvm/pci-shmem.h

diff --git a/Makefile b/Makefile
index 6d6880dd4f8a..99c6a9e24d72 100644
--- a/Makefile
+++ b/Makefile
@@ -99,7 +99,6 @@ OBJS	+= util/read-write.o
 OBJS	+= util/util.o
 OBJS	+= virtio/9p.o
 OBJS	+= virtio/9p-pdu.o
-OBJS	+= hw/pci-shmem.o
 OBJS	+= kvm-ipc.o
 OBJS	+= builtin-sandbox.o
 OBJS	+= virtio/mmio.o
diff --git a/builtin-run.c b/builtin-run.c
index f8dc6c7229b0..9cb8c75300eb 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -31,7 +31,6 @@
 #include "kvm/sdl.h"
 #include "kvm/vnc.h"
 #include "kvm/guest_compat.h"
-#include "kvm/pci-shmem.h"
 #include "kvm/kvm-ipc.h"
 #include "kvm/builtin-debug.h"
 
@@ -99,10 +98,6 @@ void kvm_run_set_wrapper_sandbox(void)
 	OPT_INTEGER('c', "cpus", &(cfg)->nrcpus, "Number of CPUs"),	\
 	OPT_U64('m', "mem", &(cfg)->ram_size, "Virtual machine memory"	\
 		" size in MiB."),					\
-	OPT_CALLBACK('\0', "shmem", NULL,				\
-		     "[pci:]<addr>:<size>[:handle=<handle>][:create]",	\
-		     "Share host shmem with guest via pci device",	\
-		     shmem_parser, NULL),				\
 	OPT_CALLBACK('d', "disk", kvm, "image or rootfs_dir", "Disk "	\
 			" image or rootfs directory", img_name_parser,	\
 			kvm),						\
diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
deleted file mode 100644
index f92bc75544d7..000000000000
--- a/hw/pci-shmem.c
+++ /dev/null
@@ -1,400 +0,0 @@
-#include "kvm/devices.h"
-#include "kvm/pci-shmem.h"
-#include "kvm/virtio-pci-dev.h"
-#include "kvm/irq.h"
-#include "kvm/kvm.h"
-#include "kvm/pci.h"
-#include "kvm/util.h"
-#include "kvm/ioport.h"
-#include "kvm/ioeventfd.h"
-
-#include <linux/kvm.h>
-#include <linux/byteorder.h>
-#include <sys/ioctl.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-
-#define MB_SHIFT (20)
-#define KB_SHIFT (10)
-#define GB_SHIFT (30)
-
-static struct pci_device_header pci_shmem_pci_device = {
-	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id	= cpu_to_le16(0x1110),
-	.header_type	= PCI_HEADER_TYPE_NORMAL,
-	.class[2]	= 0xFF,	/* misc pci device */
-	.status		= cpu_to_le16(PCI_STATUS_CAP_LIST),
-	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
-	.msix.cap	= PCI_CAP_ID_MSIX,
-	.msix.ctrl	= cpu_to_le16(1),
-	.msix.table_offset = cpu_to_le32(1),		/* Use BAR 1 */
-	.msix.pba_offset = cpu_to_le32(0x1001),		/* Use BAR 1 */
-};
-
-static struct device_header pci_shmem_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &pci_shmem_pci_device,
-};
-
-/* registers for the Inter-VM shared memory device */
-enum ivshmem_registers {
-	INTRMASK = 0,
-	INTRSTATUS = 4,
-	IVPOSITION = 8,
-	DOORBELL = 12,
-};
-
-static struct shmem_info *shmem_region;
-static u16 ivshmem_registers;
-static int local_fd;
-static u32 local_id;
-static u64 msix_block;
-static u64 msix_pba;
-static struct msix_table msix_table[2];
-
-int pci_shmem__register_mem(struct shmem_info *si)
-{
-	if (!shmem_region) {
-		shmem_region = si;
-	} else {
-		pr_warning("only single shmem currently avail. ignoring.\n");
-		free(si);
-	}
-	return 0;
-}
-
-static bool shmem_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		ioport__write32(data, local_id);
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static bool shmem_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
-{
-	u16 offset = port - ivshmem_registers;
-
-	switch (offset) {
-	case INTRMASK:
-		break;
-	case INTRSTATUS:
-		break;
-	case IVPOSITION:
-		break;
-	case DOORBELL:
-		break;
-	};
-
-	return true;
-}
-
-static struct ioport_operations shmem_pci__io_ops = {
-	.io_in	= shmem_pci__io_in,
-	.io_out	= shmem_pci__io_out,
-};
-
-static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
-{
-	void *mem;
-
-	if (addr - msix_block < 0x1000)
-		mem = &msix_table;
-	else
-		mem = &msix_pba;
-
-	if (is_write)
-		memcpy(mem + addr - msix_block, data, len);
-	else
-		memcpy(data, mem + addr - msix_block, len);
-}
-
-/*
- * Return an irqfd which can be used by other guests to signal this guest
- * whenever they need to poke it
- */
-int pci_shmem__get_local_irqfd(struct kvm *kvm)
-{
-	int fd, gsi, r;
-
-	if (local_fd == 0) {
-		fd = eventfd(0, 0);
-		if (fd < 0)
-			return fd;
-
-		if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
-			gsi = irq__add_msix_route(kvm, &msix_table[0].msg,
-						  pci_shmem_device.dev_num << 3);
-			if (gsi < 0)
-				return gsi;
-		} else {
-			gsi = pci_shmem_pci_device.irq_line;
-		}
-
-		r = irq__add_irqfd(kvm, gsi, fd, -1);
-		if (r < 0)
-			return r;
-
-		local_fd = fd;
-	}
-
-	return local_fd;
-}
-
-/*
- * Connect a new client to ivshmem by adding the appropriate datamatch
- * to the DOORBELL
- */
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.fd		= fd,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-/*
- * Remove a client connected to ivshmem by removing the appropriate datamatch
- * from the DOORBELL
- */
-int pci_shmem__remove_client(struct kvm *kvm, u32 id)
-{
-	struct kvm_ioeventfd ioevent;
-
-	ioevent = (struct kvm_ioeventfd) {
-		.addr		= ivshmem_registers + DOORBELL,
-		.len		= sizeof(u32),
-		.datamatch	= id,
-		.flags		= KVM_IOEVENTFD_FLAG_PIO
-				| KVM_IOEVENTFD_FLAG_DATAMATCH
-				| KVM_IOEVENTFD_FLAG_DEASSIGN,
-	};
-
-	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
-}
-
-static void *setup_shmem(const char *key, size_t len, int creating)
-{
-	int fd;
-	int rtn;
-	void *mem;
-	int flag = O_RDWR;
-
-	if (creating)
-		flag |= O_CREAT;
-
-	fd = shm_open(key, flag, S_IRUSR | S_IWUSR);
-	if (fd < 0) {
-		pr_warning("Failed to open shared memory file %s\n", key);
-		return NULL;
-	}
-
-	if (creating) {
-		rtn = ftruncate(fd, (off_t) len);
-		if (rtn < 0)
-			pr_warning("Can't ftruncate(fd,%zu)\n", len);
-	}
-	mem = mmap(NULL, len,
-		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
-	if (mem == MAP_FAILED) {
-		pr_warning("Failed to mmap shared memory file");
-		mem = NULL;
-	}
-	close(fd);
-
-	return mem;
-}
-
-int shmem_parser(const struct option *opt, const char *arg, int unset)
-{
-	const u64 default_size = SHMEM_DEFAULT_SIZE;
-	const u64 default_phys_addr = SHMEM_DEFAULT_ADDR;
-	const char *default_handle = SHMEM_DEFAULT_HANDLE;
-	struct shmem_info *si = malloc(sizeof(struct shmem_info));
-	u64 phys_addr;
-	u64 size;
-	char *handle = NULL;
-	int create = 0;
-	const char *p = arg;
-	char *next;
-	int base = 10;
-	int verbose = 0;
-
-	const int skip_pci = strlen("pci:");
-	if (verbose)
-		pr_info("shmem_parser(%p,%s,%d)", opt, arg, unset);
-	/* parse out optional addr family */
-	if (strcasestr(p, "pci:")) {
-		p += skip_pci;
-	} else if (strcasestr(p, "mem:")) {
-		die("I can't add to E820 map yet.\n");
-	}
-	/* parse out physical addr */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	phys_addr = strtoll(p, &next, base);
-	if (next == p && phys_addr == 0) {
-		pr_info("shmem: no physical addr specified, using default.");
-		phys_addr = default_phys_addr;
-	}
-	if (*next != ':' && *next != '\0')
-		die("shmem: unexpected chars after phys addr.\n");
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out size */
-	base = 10;
-	if (strcasestr(p, "0x"))
-		base = 16;
-	size = strtoll(p, &next, base);
-	if (next == p && size == 0) {
-		pr_info("shmem: no size specified, using default.");
-		size = default_size;
-	}
-	/* look for [KMGkmg][Bb]*  uses base 2. */
-	int skip_B = 0;
-	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
-		if (*(next + 1) == 'B' || *(next + 1) == 'b')
-			skip_B = 1;
-		switch (*next) {
-		case 'K':
-		case 'k':
-			size = size << KB_SHIFT;
-			break;
-		case 'M':
-		case 'm':
-			size = size << MB_SHIFT;
-			break;
-		case 'G':
-		case 'g':
-			size = size << GB_SHIFT;
-			break;
-		default:
-			die("shmem: bug in detecting size prefix.");
-			break;
-		}
-		next += 1 + skip_B;
-	}
-	if (*next != ':' && *next != '\0') {
-		die("shmem: unexpected chars after phys size. <%c><%c>\n",
-		    *next, *p);
-	}
-	if (*next == '\0')
-		p = next;
-	else
-		p = next + 1;
-	/* parse out optional shmem handle */
-	const int skip_handle = strlen("handle=");
-	next = strcasestr(p, "handle=");
-	if (*p && next) {
-		if (p != next)
-			die("unexpected chars before handle\n");
-		p += skip_handle;
-		next = strchrnul(p, ':');
-		if (next - p) {
-			handle = malloc(next - p + 1);
-			strncpy(handle, p, next - p);
-			handle[next - p] = '\0';	/* just in case. */
-		}
-		if (*next == '\0')
-			p = next;
-		else
-			p = next + 1;
-	}
-	/* parse optional create flag to see if we should create shm seg. */
-	if (*p && strcasestr(p, "create")) {
-		create = 1;
-		p += strlen("create");
-	}
-	if (*p != '\0')
-		die("shmem: unexpected trailing chars\n");
-	if (handle == NULL) {
-		handle = malloc(strlen(default_handle) + 1);
-		strcpy(handle, default_handle);
-	}
-	if (verbose) {
-		pr_info("shmem: phys_addr = %llx",
-			(unsigned long long)phys_addr);
-		pr_info("shmem: size      = %llx", (unsigned long long)size);
-		pr_info("shmem: handle    = %s", handle);
-		pr_info("shmem: create    = %d", create);
-	}
-
-	si->phys_addr = phys_addr;
-	si->size = size;
-	si->handle = handle;
-	si->create = create;
-	pci_shmem__register_mem(si);	/* ownership of si, etc. passed on. */
-	return 0;
-}
-
-int pci_shmem__init(struct kvm *kvm)
-{
-	char *mem;
-	int r;
-
-	if (shmem_region == NULL)
-		return 0;
-
-	/* Register MMIO space for MSI-X */
-	r = ioport__register(kvm, IOPORT_EMPTY, &shmem_pci__io_ops, IOPORT_SIZE, NULL);
-	if (r < 0)
-		return r;
-	ivshmem_registers = (u16)r;
-
-	msix_block = pci_get_io_space_block(0x1010);
-	kvm__register_mmio(kvm, msix_block, 0x1010, false, callback_mmio_msix, NULL);
-
-	/*
-	 * This registers 3 BARs:
-	 *
-	 * 0 - ivshmem registers
-	 * 1 - MSI-X MMIO space
-	 * 2 - Shared memory block
-	 */
-	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
-	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
-	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[1] = 0x1010;
-	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
-	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
-
-	device__register(&pci_shmem_device);
-
-	/* Open shared memory and plug it into the guest */
-	mem = setup_shmem(shmem_region->handle, shmem_region->size,
-				shmem_region->create);
-	if (mem == NULL)
-		return -EINVAL;
-
-	kvm__register_dev_mem(kvm, shmem_region->phys_addr, shmem_region->size,
-			      mem);
-	return 0;
-}
-dev_init(pci_shmem__init);
-
-int pci_shmem__exit(struct kvm *kvm)
-{
-	return 0;
-}
-dev_exit(pci_shmem__exit);
diff --git a/include/kvm/pci-shmem.h b/include/kvm/pci-shmem.h
deleted file mode 100644
index 6cff2b85bfd3..000000000000
--- a/include/kvm/pci-shmem.h
+++ /dev/null
@@ -1,32 +0,0 @@
-#ifndef KVM__PCI_SHMEM_H
-#define KVM__PCI_SHMEM_H
-
-#include <linux/types.h>
-#include <linux/list.h>
-
-#include "kvm/parse-options.h"
-
-#define SHMEM_DEFAULT_SIZE (16 << MB_SHIFT)
-#define SHMEM_DEFAULT_ADDR (0xc8000000)
-#define SHMEM_DEFAULT_HANDLE "/kvm_shmem"
-
-struct kvm;
-struct shmem_info;
-
-struct shmem_info {
-	u64 phys_addr;
-	u64 size;
-	char *handle;
-	int create;
-};
-
-int pci_shmem__init(struct kvm *kvm);
-int pci_shmem__exit(struct kvm *kvm);
-int pci_shmem__register_mem(struct shmem_info *si);
-int shmem_parser(const struct option *opt, const char *arg, int unset);
-
-int pci_shmem__get_local_irqfd(struct kvm *kvm);
-int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd);
-int pci_shmem__remove_client(struct kvm *kvm, u32 id);
-
-#endif
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (2 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 03/16] Remove pci-shmem device Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-27 18:25   ` Andre Przywara
  2019-11-25 10:30 ` [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT Alexandru Elisei
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

According to the PCI local bus specification [1], a device's memory size
must be a power of two. This is also implicit in the mechanism that a CPU
uses to get the memory size requirement for a PCI device.

The vesa device requests a memory size that isn't a power of two.
According to the same spec [1], a device is allowed to consume more memory
than it actually requires. As a result, the amount of memory that the vesa
device now reserves has been increased.

To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
in places where the memory size is known at compile time.

[1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c          | 2 ++
 include/kvm/util.h | 2 ++
 include/kvm/vesa.h | 6 +++++-
 vfio/pci.c         | 5 +++++
 virtio/pci.c       | 3 +++
 5 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index f3c5114cf4fe..75670a51be5f 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -58,6 +58,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	char *mem;
 	int r;
 
+	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
+
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
 
diff --git a/include/kvm/util.h b/include/kvm/util.h
index 4ca7aa9392b6..e90f1c2db39f 100644
--- a/include/kvm/util.h
+++ b/include/kvm/util.h
@@ -104,6 +104,8 @@ static inline unsigned long roundup_pow_of_two(unsigned long x)
 	return x ? 1UL << fls_long(x - 1) : 0;
 }
 
+#define is_power_of_two(x)	((x) ? ((x) & ((x) - 1)) == 0 : 0)
+
 struct kvm;
 void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
 void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
diff --git a/include/kvm/vesa.h b/include/kvm/vesa.h
index 0fac11ab5a9f..e7d971343642 100644
--- a/include/kvm/vesa.h
+++ b/include/kvm/vesa.h
@@ -5,8 +5,12 @@
 #define VESA_HEIGHT	480
 
 #define VESA_MEM_ADDR	0xd0000000
-#define VESA_MEM_SIZE	(4*VESA_WIDTH*VESA_HEIGHT)
 #define VESA_BPP	32
+/*
+ * We actually only need VESA_BPP/8*VESA_WIDTH*VESA_HEIGHT bytes. But the memory
+ * size must be a power of 2, so we round up.
+ */
+#define VESA_MEM_SIZE	(1 << 21)
 
 struct kvm;
 struct biosregs;
diff --git a/vfio/pci.c b/vfio/pci.c
index 76e24c156906..914732cc6897 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -831,6 +831,11 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	/* Ignore invalid or unimplemented regions */
 	if (!region->info.size)
 		return 0;
+	if (!is_power_of_two(region->info.size)) {
+		vfio_dev_err(vdev, "region is not power of two: 0x%llx",
+			     region->info.size);
+		return -EINVAL;
+	}
 
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) {
 		/* Trap and emulate MSI-X table */
diff --git a/virtio/pci.c b/virtio/pci.c
index 99653cad2c0f..04e801827df9 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -435,6 +435,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
+	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
+	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
+
 	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (3 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-28 17:43   ` Andre Przywara
  2019-11-25 10:30 ` [PATCH kvmtool 06/16] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

The "bus-range" property encodes the first and last bus number. kvmtool
uses bus 0 for PCI and bus 1 for MMIO.  Advertise only the PCI bus in
the PCI DT node by setting "bus-range" to <0, 0>.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arm/pci.c b/arm/pci.c
index 557cfa98938d..ed325fa4a811 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -30,7 +30,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
 	unsigned nentries = 0;
 	/* Bus range */
-	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
+	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(0), };
 	/* Configuration Space */
 	u64 cfg_reg_prop[] = { cpu_to_fdt64(KVM_PCI_CFG_AREA),
 			       cpu_to_fdt64(ARM_PCI_CFG_SIZE), };
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 06/16] ioport: pci: Move port allocations to PCI devices
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (4 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 07/16] pci: Fix ioport allocation size Alexandru Elisei
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

The dynamic ioport allocation with IOPORT_EMPTY is currently only used
by PCI devices. Other devices use fixed ports for which they request
registration to the ioport API.

PCI ports need to be in the PCI IO space and there is no reason ioport
API should know a PCI port is being allocated and needs to be placed in
PCI IO space. This currently just happens to be the case.

Move the responsability of dynamic allocation of ioports from the ioport
API to PCI.

In the future, if other types of devices also need dynamic ioport
allocation, they'll have to figure out the range of ports they are
allowed to use.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Renamed functions for clarity]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c                      |  4 ++--
 include/kvm/ioport.h           |  3 ---
 include/kvm/pci.h              |  4 +++-
 ioport.c                       | 18 ------------------
 pci.c                          | 17 +++++++++++++----
 powerpc/include/kvm/kvm-arch.h |  2 +-
 vfio/core.c                    |  6 ++++--
 vfio/pci.c                     |  4 ++--
 virtio/pci.c                   |  7 ++++---
 x86/include/kvm/kvm-arch.h     |  2 +-
 10 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 75670a51be5f..70ab59974f76 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -62,8 +62,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-
-	r = ioport__register(kvm, IOPORT_EMPTY, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index db52a479742b..b10fcd5b4412 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,11 +14,8 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_START			0x6200
 #define IOPORT_SIZE			0x400
 
-#define IOPORT_EMPTY			USHRT_MAX
-
 struct kvm;
 
 struct ioport {
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index a86c15a70e6d..ccb155e3e8fe 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -19,6 +19,7 @@
 #define PCI_CONFIG_DATA		0xcfc
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
+#define PCI_IOPORT_START	0x6200
 #define PCI_CFG_SIZE		(1ULL << 24)
 
 struct kvm;
@@ -152,7 +153,8 @@ struct pci_device_header {
 int pci__init(struct kvm *kvm);
 int pci__exit(struct kvm *kvm);
 struct pci_device_header *pci__find_dev(u8 dev_num);
-u32 pci_get_io_space_block(u32 size);
+u32 pci_get_mmio_block(u32 size);
+u16 pci_get_io_port_block(u32 size);
 void pci__assign_irq(struct device_header *dev_hdr);
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
diff --git a/ioport.c b/ioport.c
index a6dc65e3e6c6..a72e4035881a 100644
--- a/ioport.c
+++ b/ioport.c
@@ -16,24 +16,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
-DEFINE_MUTEX(ioport_mutex);
-
-static u16			free_io_port_idx; /* protected by ioport_mutex */
-
 static struct rb_root		ioport_tree = RB_ROOT;
 
-static u16 ioport__find_free_port(void)
-{
-	u16 free_port;
-
-	mutex_lock(&ioport_mutex);
-	free_port = IOPORT_START + free_io_port_idx * IOPORT_SIZE;
-	free_io_port_idx++;
-	mutex_unlock(&ioport_mutex);
-
-	return free_port;
-}
-
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
 {
 	struct rb_int_node *node;
@@ -85,8 +69,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	int r;
 
 	br_write_lock(kvm);
-	if (port == IOPORT_EMPTY)
-		port = ioport__find_free_port();
 
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
diff --git a/pci.c b/pci.c
index e1b57325bdeb..32a07335a765 100644
--- a/pci.c
+++ b/pci.c
@@ -15,15 +15,24 @@ static u32 pci_config_address_bits;
  * (That's why it can still 32bit even with 64bit guests-- 64bit
  * PCI isn't currently supported.)
  */
-static u32 io_space_blocks		= KVM_PCI_MMIO_AREA;
+static u32 mmio_blocks			= KVM_PCI_MMIO_AREA;
+static u16 io_port_blocks		= PCI_IOPORT_START;
+
+u16 pci_get_io_port_block(u32 size)
+{
+	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+
+	io_port_blocks = port + size;
+	return port;
+}
 
 /*
  * BARs must be naturally aligned, so enforce this in the allocator.
  */
-u32 pci_get_io_space_block(u32 size)
+u32 pci_get_mmio_block(u32 size)
 {
-	u32 block = ALIGN(io_space_blocks, size);
-	io_space_blocks = block + size;
+	u32 block = ALIGN(mmio_blocks, size);
+	mmio_blocks = block + size;
 	return block;
 }
 
diff --git a/powerpc/include/kvm/kvm-arch.h b/powerpc/include/kvm/kvm-arch.h
index 8126b96cb66a..26d440b22bdd 100644
--- a/powerpc/include/kvm/kvm-arch.h
+++ b/powerpc/include/kvm/kvm-arch.h
@@ -34,7 +34,7 @@
 #define KVM_MMIO_START			PPC_MMIO_START
 
 /*
- * This is the address that pci_get_io_space_block() starts allocating
+ * This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address.
  */
 #define KVM_IOPORT_AREA			0x0
diff --git a/vfio/core.c b/vfio/core.c
index 17b5b0cfc9ac..0ed1e6fee6bf 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,8 +202,10 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = ioport__register(kvm, IOPORT_EMPTY, &vfio_ioport_ops,
-					    region->info.size, region);
+		int port = pci_get_io_port_block(region->info.size);
+
+		port = ioport__register(kvm, port, &vfio_ioport_ops,
+					region->info.size, region);
 		if (port < 0)
 			return port;
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 914732cc6897..bc5a6d452f7a 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -750,7 +750,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm,
 	 * powers of two.
 	 */
 	mmio_size = roundup_pow_of_two(table->size + pba->size);
-	table->guest_phys_addr = pci_get_io_space_block(mmio_size);
+	table->guest_phys_addr = pci_get_mmio_block(mmio_size);
 	if (!table->guest_phys_addr) {
 		pr_err("cannot allocate IO space");
 		ret = -ENOMEM;
@@ -851,7 +851,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	if (!region->is_ioport) {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
-		region->guest_phys_addr = pci_get_io_space_block(map_size);
+		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
 	/* Map the BARs into the guest or setup a trap region. */
diff --git a/virtio/pci.c b/virtio/pci.c
index 04e801827df9..d73414abde05 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -438,18 +438,19 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(IOPORT_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_io_space_block(IOPORT_SIZE);
+	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_io_space_block(PCI_IO_SIZE * 2);
+	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vpci);
 	if (r < 0)
diff --git a/x86/include/kvm/kvm-arch.h b/x86/include/kvm/kvm-arch.h
index bfdd3438a9de..85cd336c7577 100644
--- a/x86/include/kvm/kvm-arch.h
+++ b/x86/include/kvm/kvm-arch.h
@@ -16,7 +16,7 @@
 
 #define KVM_MMIO_START		KVM_32BIT_GAP_START
 
-/* This is the address that pci_get_io_space_block() starts allocating
+/* This is the address that pci_get_io_port_block() starts allocating
  * from.  Note that this is a PCI bus address (though same on x86).
  */
 #define KVM_IOPORT_AREA		0x0
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 07/16] pci: Fix ioport allocation size
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (5 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 06/16] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 08/16] arm/pci: Fix PCI IO region Alexandru Elisei
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

The PCI Local Bus Specification, Rev. 3.0, Section 6.2.5.1. "Address Maps"
states: "Devices that map control functions into I/O Space must not consume
more than 256 bytes per I/O Base Address register."

Yet all the PCI devices allocate IO ports of IOPORT_SIZE (= 1024 bytes).

Fix this by having PCI devices use 256 bytes ports for IO BARs.

There is no hard requirement on the size of the memory region described
by memory BARs. However, the region must be big enough to hold the
virtio common interface described in [1], which is 20 bytes, and other
MSI-X and/or device specific configuration. To be consistent, let's also
limit the memory region described by BAR1 to 256. This is the same size
used by BAR2 for each of the two MSI-X vectors.

[1] VIRTIO Version 1.0 Committee Specification 04, section 4.4.8.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Added rationale for changing BAR1 size to PCI_IO_SIZE]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c            |  4 ++--
 include/kvm/ioport.h |  1 -
 pci.c                |  2 +-
 virtio/pci.c         | 15 +++++++--------
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index 70ab59974f76..0191e9264666 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -62,8 +62,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
 		return NULL;
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &vesa_io_ops, IOPORT_SIZE, NULL);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
 		return ERR_PTR(r);
 
diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index b10fcd5b4412..8c86b7151f25 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -14,7 +14,6 @@
 
 /* some ports we reserve for own use */
 #define IOPORT_DBG			0xe0
-#define IOPORT_SIZE			0x400
 
 struct kvm;
 
diff --git a/pci.c b/pci.c
index 32a07335a765..b4677434c50c 100644
--- a/pci.c
+++ b/pci.c
@@ -20,7 +20,7 @@ static u16 io_port_blocks		= PCI_IOPORT_START;
 
 u16 pci_get_io_port_block(u32 size)
 {
-	u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
+	u16 port = ALIGN(io_port_blocks, PCI_IO_SIZE);
 
 	io_port_blocks = port + size;
 	return port;
diff --git a/virtio/pci.c b/virtio/pci.c
index d73414abde05..eeb5b5efa6e1 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -421,7 +421,7 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_pci *vpci = ptr;
 	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (IOPORT_SIZE - 1));
+	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
 
 	kvm__emulate_io(vcpu, port, data, direction, len, 1);
 }
@@ -435,17 +435,16 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->kvm = kvm;
 	vpci->dev = dev;
 
-	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(IOPORT_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
+	r = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
 	if (r < 0)
 		return r;
 	vpci->port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(IOPORT_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
+	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vpci);
 	if (r < 0)
 		goto free_ioport;
@@ -475,8 +474,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
-		.bar_size[0]		= cpu_to_le32(IOPORT_SIZE),
-		.bar_size[1]		= cpu_to_le32(IOPORT_SIZE),
+		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
+		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 08/16] arm/pci: Fix PCI IO region
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (6 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 07/16] pci: Fix ioport allocation size Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices Alexandru Elisei
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

Current PCI IO region that is exposed through the DT contains ports that
are reserved by non-PCI devices.

Use the proper PCI IO start so that the region exposed through DT can
actually be used to reassign device BARs.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/pci.h |  1 +
 arm/kvm.c                    |  3 +++
 arm/pci.c                    | 21 ++++++++++++++++++---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arm/include/arm-common/pci.h b/arm/include/arm-common/pci.h
index 9008a0ed072e..aea42b8895e9 100644
--- a/arm/include/arm-common/pci.h
+++ b/arm/include/arm-common/pci.h
@@ -1,6 +1,7 @@
 #ifndef ARM_COMMON__PCI_H
 #define ARM_COMMON__PCI_H
 
+void pci__arm_init(struct kvm *kvm);
 void pci__generate_fdt_nodes(void *fdt);
 
 #endif /* ARM_COMMON__PCI_H */
diff --git a/arm/kvm.c b/arm/kvm.c
index 1f85fc60588f..5c30ec1e0515 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -6,6 +6,7 @@
 #include "kvm/fdt.h"
 
 #include "arm-common/gic.h"
+#include "arm-common/pci.h"
 
 #include <linux/kernel.h>
 #include <linux/kvm.h>
@@ -86,6 +87,8 @@ void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 	/* Create the virtual GIC. */
 	if (gic__create(kvm, kvm->cfg.arch.irqchip))
 		die("Failed to create virtual GIC");
+
+	pci__arm_init(kvm);
 }
 
 #define FDT_ALIGN	SZ_2M
diff --git a/arm/pci.c b/arm/pci.c
index ed325fa4a811..1c0949a22408 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -1,3 +1,5 @@
+#include "linux/sizes.h"
+
 #include "kvm/devices.h"
 #include "kvm/fdt.h"
 #include "kvm/kvm.h"
@@ -7,6 +9,11 @@
 
 #include "arm-common/pci.h"
 
+#define ARM_PCI_IO_START ALIGN(PCI_IOPORT_START, SZ_4K)
+
+/* Must be a multiple of 4k */
+#define ARM_PCI_IO_SIZE ((ARM_MMIO_AREA - ARM_PCI_IO_START) & ~(SZ_4K - 1))
+
 /*
  * An entry in the interrupt-map table looks like:
  * <pci unit address> <pci interrupt pin> <gic phandle> <gic interrupt>
@@ -24,6 +31,14 @@ struct of_interrupt_map_entry {
 	struct of_gic_irq		gic_irq;
 } __attribute__((packed));
 
+void pci__arm_init(struct kvm *kvm)
+{
+	u32 align_pad = ARM_PCI_IO_START - PCI_IOPORT_START;
+
+	/* Make PCI port allocation start at a properly aligned address */
+	pci_get_io_port_block(align_pad);
+}
+
 void pci__generate_fdt_nodes(void *fdt)
 {
 	struct device_header *dev_hdr;
@@ -40,10 +55,10 @@ void pci__generate_fdt_nodes(void *fdt)
 			.pci_addr = {
 				.hi	= cpu_to_fdt32(of_pci_b_ss(OF_PCI_SS_IO)),
 				.mid	= 0,
-				.lo	= 0,
+				.lo	= cpu_to_fdt32(ARM_PCI_IO_START),
 			},
-			.cpu_addr	= cpu_to_fdt64(KVM_IOPORT_AREA),
-			.length		= cpu_to_fdt64(ARM_IOPORT_SIZE),
+			.cpu_addr	= cpu_to_fdt64(ARM_PCI_IO_START),
+			.length		= cpu_to_fdt64(ARM_PCI_IO_SIZE),
 		},
 		{
 			.pci_addr = {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (7 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 08/16] arm/pci: Fix PCI IO region Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-12-02 12:15   ` Lorenzo Pieralisi
  2019-11-25 10:30 ` [PATCH kvmtool 10/16] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

Linux has this convention that the lower 0x1000 bytes of the IO space
should not be used. (cf PCIBIOS_MIN_IO).

Just allocate those bytes to prevent future allocation assigning it to
devices.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arm/pci.c b/arm/pci.c
index 1c0949a22408..4e6467357ce8 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -37,6 +37,9 @@ void pci__arm_init(struct kvm *kvm)
 
 	/* Make PCI port allocation start at a properly aligned address */
 	pci_get_io_port_block(align_pad);
+
+	/* Convention, don't allocate first 0x1000 bytes of PCI IO */
+	pci_get_io_port_block(0x1000);
 }
 
 void pci__generate_fdt_nodes(void *fdt)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 10/16] virtio/pci: Make memory and IO BARs independent
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (8 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 11/16] virtio/pci: Ignore MMIO and I/O accesses when they are disabled Alexandru Elisei
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

Currently, callbacks for memory BAR 1 call the IO port emulation.  This
means that the memory BAR needs I/O Space to be enabled whenever Memory
Space is enabled.

Refactor the code so the two type of  BARs are independent. Also, unify
ioport/mmio callback arguments so that they all receive a virtio_device.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 virtio/pci.c | 71 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/virtio/pci.c b/virtio/pci.c
index eeb5b5efa6e1..6723a1f3a84d 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -87,8 +87,8 @@ static inline bool virtio_pci__msix_enabled(struct virtio_pci *vpci)
 	return vpci->pci_hdr.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE);
 }
 
-static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_in(struct kvm *kvm, struct virtio_device *vdev,
+					 void *data, int size, unsigned long offset)
 {
 	u32 config_offset;
 	struct virtio_pci *vpci = vdev->virtio;
@@ -117,20 +117,17 @@ static bool virtio_pci__specific_io_in(struct kvm *kvm, struct virtio_device *vd
 	return false;
 }
 
-static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_in(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct virt_queue *vq;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_HOST_FEATURES:
@@ -154,13 +151,26 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 		vpci->isr = VIRTIO_IRQ_LOW;
 		break;
 	default:
-		ret = virtio_pci__specific_io_in(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_in(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	unsigned long offset;
+	struct virtio_device *vdev;
+	struct virtio_pci *vpci;
+
+	vdev = ioport->priv;
+	vpci = vdev->virtio;
+	offset = port - vpci->port_addr;
+
+	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
+}
+
 static void update_msix_map(struct virtio_pci *vpci,
 			    struct msix_table *msix_entry, u32 vecnum)
 {
@@ -185,8 +195,8 @@ static void update_msix_map(struct virtio_pci *vpci,
 	irq__update_msix_route(vpci->kvm, gsi, &msix_entry->msg);
 }
 
-static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *vdev, u16 port,
-					void *data, int size, int offset)
+static bool virtio_pci__specific_data_out(struct kvm *kvm, struct virtio_device *vdev,
+					  void *data, int size, unsigned long offset)
 {
 	struct virtio_pci *vpci = vdev->virtio;
 	u32 config_offset, vec;
@@ -259,19 +269,16 @@ static bool virtio_pci__specific_io_out(struct kvm *kvm, struct virtio_device *v
 	return false;
 }
 
-static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+static bool virtio_pci__data_out(struct kvm_cpu *vcpu, struct virtio_device *vdev,
+				 unsigned long offset, void *data, int size)
 {
-	unsigned long offset;
 	bool ret = true;
-	struct virtio_device *vdev;
 	struct virtio_pci *vpci;
 	struct kvm *kvm;
 	u32 val;
 
 	kvm = vcpu->kvm;
-	vdev = ioport->priv;
 	vpci = vdev->virtio;
-	offset = port - vpci->port_addr;
 
 	switch (offset) {
 	case VIRTIO_PCI_GUEST_FEATURES:
@@ -304,13 +311,26 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 		virtio_notify_status(kvm, vdev, vpci->dev, vpci->status);
 		break;
 	default:
-		ret = virtio_pci__specific_io_out(kvm, vdev, port, data, size, offset);
+		ret = virtio_pci__specific_data_out(kvm, vdev, data, size, offset);
 		break;
 	};
 
 	return ret;
 }
 
+static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
+{
+	unsigned long offset;
+	struct virtio_device *vdev;
+	struct virtio_pci *vpci;
+
+	vdev = ioport->priv;
+	vpci = vdev->virtio;
+	offset = port - vpci->port_addr;
+
+	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
+}
+
 static struct ioport_operations virtio_pci__io_ops = {
 	.io_in	= virtio_pci__io_in,
 	.io_out	= virtio_pci__io_out,
@@ -320,7 +340,8 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 					   u64 addr, u8 *data, u32 len,
 					   u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
 	int vecnum;
 	size_t offset;
@@ -419,11 +440,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 					 u64 addr, u8 *data, u32 len,
 					 u8 is_write, void *ptr)
 {
-	struct virtio_pci *vpci = ptr;
-	int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
-	u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
+	struct virtio_device *vdev = ptr;
+	struct virtio_pci *vpci = vdev->virtio;
 
-	kvm__emulate_io(vcpu, port, data, direction, len, 1);
+	if (!is_write)
+		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
+				    data, len);
+	else
+		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
+				     data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
@@ -445,13 +470,13 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
 	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vpci);
+			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
 	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
 	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vpci);
+			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
 
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 11/16] virtio/pci: Ignore MMIO and I/O accesses when they are disabled
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (9 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 10/16] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 12/16] Use independent read/write locks for ioport and mmio Alexandru Elisei
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

A device's response to memory or I/O accesses is disabled when Memory
Space, respectively I/O Space, is set to 0 in the Command register.
According to the PCI Local Bus Specification Revision 3.0, those two
bits reset to 0.

Let's respect the specifiction, so set Command and I/O Space to 0 on
reset, and ignore accesses to a device's respective regions when they
are disabled.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 virtio/pci.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/virtio/pci.c b/virtio/pci.c
index 6723a1f3a84d..9f86bb7b6f93 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -168,6 +168,9 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 	vpci = vdev->virtio;
 	offset = port - vpci->port_addr;
 
+	if (!(vpci->pci_hdr.command & PCI_COMMAND_IO))
+		return true;
+
 	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
 }
 
@@ -328,6 +331,9 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 	vpci = vdev->virtio;
 	offset = port - vpci->port_addr;
 
+	if (!(vpci->pci_hdr.command & PCI_COMMAND_IO))
+		return true;
+
 	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
 }
 
@@ -346,6 +352,9 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 	int vecnum;
 	size_t offset;
 
+	if (!(vpci->pci_hdr.command & PCI_COMMAND_MEMORY))
+		return;
+
 	if (addr > vpci->msix_io_block + PCI_IO_SIZE) {
 		if (is_write)
 			return;
@@ -443,6 +452,9 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
 
+	if (!(vpci->pci_hdr.command & PCI_COMMAND_MEMORY))
+		return;
+
 	if (!is_write)
 		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
 				    data, len);
@@ -483,7 +495,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
 		.device_id		= cpu_to_le16(device_id),
-		.command		= PCI_COMMAND_IO | PCI_COMMAND_MEMORY,
+		/* The Command register is 0 on RST. */
+		.command		= 0,
 		.header_type		= PCI_HEADER_TYPE_NORMAL,
 		.revision_id		= 0,
 		.class[0]		= class & 0xff,
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 12/16] Use independent read/write locks for ioport and mmio
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (10 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 11/16] virtio/pci: Ignore MMIO and I/O accesses when they are disabled Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 13/16] vfio: Add support for BAR configuration Alexandru Elisei
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

kvmtool uses brlock for protecting accesses to the ioport and mmio
red-black trees. brlock allows concurrent reads, but only one writer,
which is assumed not to be a VCPU thread. This is done by issuing a
compiler barrier on read and pausing the entire virtual machine on
writes. When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread
read/write lock.

When we will implement reassignable BARs, the mmio or ioport mapping
will be done as a result of a VCPU mmio access. When brlock is a
read/write lock, it means that we will try to acquire a write lock with
the read lock already held by the same VCPU and we will deadlock. When
it's not, a VCPU will have to call kvm__pause, which means the virtual
machine will stay paused forever.

Let's avoid all this by using separate pthread_rwlock_t locks for the
mmio and the ioport red-black trees and carefully choosing our read
critical region such that modification as a result of a guest mmio
access doesn't deadlock.

In theory, this leaves us with a small window of opportunity for a VCPU
to modify a node used by another VCPU. Inserting in the trees is done by
the main thread before starting the virtual machine, and deleting is
done after the virtual machine has been paused to be destroyed, so in
practice this can only happen if the guest is bugged.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 ioport.c | 20 +++++++++++---------
 mmio.c   | 26 +++++++++++++++++---------
 2 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/ioport.c b/ioport.c
index a72e4035881a..3802d8300191 100644
--- a/ioport.c
+++ b/ioport.c
@@ -2,9 +2,9 @@
 
 #include "kvm/kvm.h"
 #include "kvm/util.h"
-#include "kvm/brlock.h"
 #include "kvm/rbtree-interval.h"
 #include "kvm/mutex.h"
+#include "kvm/rwsem.h"
 
 #include <linux/kvm.h>	/* for KVM_EXIT_* */
 #include <linux/types.h>
@@ -16,6 +16,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
+static DECLARE_RWSEM(ioport_lock);
+
 static struct rb_root		ioport_tree = RB_ROOT;
 
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
@@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
+	down_write(&ioport_lock);
 
 	entry = ioport_search(&ioport_tree, port);
 	if (entry) {
@@ -93,12 +95,12 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 	r = ioport_insert(&ioport_tree, entry);
 	if (r < 0) {
 		free(entry);
-		br_write_unlock(kvm);
+		up_write(&ioport_lock);
 		return r;
 	}
 
 	device__register(&entry->dev_hdr);
-	br_write_unlock(kvm);
+	up_write(&ioport_lock);
 
 	return port;
 }
@@ -108,7 +110,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
 	struct ioport *entry;
 	int r;
 
-	br_write_lock(kvm);
+	down_write(&ioport_lock);
 
 	r = -ENOENT;
 	entry = ioport_search(&ioport_tree, port);
@@ -123,7 +125,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
 	r = 0;
 
 done:
-	br_write_unlock(kvm);
+	up_write(&ioport_lock);
 
 	return r;
 }
@@ -166,8 +168,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	void *ptr = data;
 	struct kvm *kvm = vcpu->kvm;
 
-	br_read_lock(kvm);
+	down_read(&ioport_lock);
 	entry = ioport_search(&ioport_tree, port);
+	up_read(&ioport_lock);
+
 	if (!entry)
 		goto out;
 
@@ -183,8 +187,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	}
 
 out:
-	br_read_unlock(kvm);
-
 	if (ret)
 		return true;
 
diff --git a/mmio.c b/mmio.c
index 61e1d47a587d..4e0ff830c738 100644
--- a/mmio.c
+++ b/mmio.c
@@ -1,7 +1,7 @@
 #include "kvm/kvm.h"
 #include "kvm/kvm-cpu.h"
 #include "kvm/rbtree-interval.h"
-#include "kvm/brlock.h"
+#include "kvm/rwsem.h"
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -15,6 +15,8 @@
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
+static DECLARE_RWSEM(mmio_lock);
+
 struct mmio_mapping {
 	struct rb_int_node	node;
 	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
@@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
 
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
-			void *ptr)
+		       void *ptr)
 {
 	struct mmio_mapping *mmio;
 	struct kvm_coalesced_mmio_zone zone;
@@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 			return -errno;
 		}
 	}
-	br_write_lock(kvm);
+	down_write(&mmio_lock);
 	ret = mmio_insert(&mmio_tree, mmio);
-	br_write_unlock(kvm);
+	up_write(&mmio_lock);
 
 	return ret;
 }
@@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 	struct mmio_mapping *mmio;
 	struct kvm_coalesced_mmio_zone zone;
 
-	br_write_lock(kvm);
+	down_write(&mmio_lock);
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
 	if (mmio == NULL) {
-		br_write_unlock(kvm);
+		up_write(&mmio_lock);
 		return false;
 	}
 
@@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
 
 	rb_int_erase(&mmio_tree, &mmio->node);
-	br_write_unlock(kvm);
+	up_write(&mmio_lock);
 
 	free(mmio);
 	return true;
@@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 {
 	struct mmio_mapping *mmio;
 
-	br_read_lock(vcpu->kvm);
+	/*
+	 * The callback might call kvm__register_mmio which takes a write lock,
+	 * so avoid deadlocks by protecting only the node search with a reader
+	 * lock. Note that there is still a small time window for a node to be
+	 * deleted by another vcpu before mmio_fn gets called.
+	 */
+	down_read(&mmio_lock);
 	mmio = mmio_search(&mmio_tree, phys_addr, len);
+	up_read(&mmio_lock);
 
 	if (mmio)
 		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
@@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 				to_direction(is_write),
 				(unsigned long long)phys_addr, len);
 	}
-	br_read_unlock(vcpu->kvm);
 
 	return true;
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 13/16] vfio: Add support for BAR configuration
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (11 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 12/16] Use independent read/write locks for ioport and mmio Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-29 17:05   ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 14/16] virtio/pci: " Alexandru Elisei
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

When a guest can reassign BARs, kvmtool needs to maintain the vfio_region
consistent with their corresponding BARs. Take the new updated addresses
from the PCI header read back from the vfio driver.

Also, to modify the BARs, it is expected that guests will disable
IO/Memory response in the PCI command. Support this by mapping/unmapping
regions when the corresponding response gets enabled/disabled.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
[Fixed BAR selection]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c |  8 ++---
 vfio/pci.c  | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 87 insertions(+), 9 deletions(-)

diff --git a/vfio/core.c b/vfio/core.c
index 0ed1e6fee6bf..b554897fc8c1 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,14 +202,13 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = pci_get_io_port_block(region->info.size);
+		int port = ioport__register(kvm, region->port_base,
+					    &vfio_ioport_ops,
+					    region->info.size, region);
 
-		port = ioport__register(kvm, port, &vfio_ioport_ops,
-					region->info.size, region);
 		if (port < 0)
 			return port;
 
-		region->port_base = port;
 		return 0;
 	}
 
@@ -258,6 +257,7 @@ void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
 {
 	if (region->host_addr) {
 		munmap(region->host_addr, region->info.size);
+		region->host_addr = NULL;
 	} else if (region->is_ioport) {
 		ioport__unregister(kvm, region->port_base);
 	} else {
diff --git a/vfio/pci.c b/vfio/pci.c
index bc5a6d452f7a..28f895c06b27 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -1,3 +1,4 @@
+#include "kvm/ioport.h"
 #include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/kvm-cpu.h"
@@ -464,6 +465,67 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
 			      sz, offset);
 }
 
+static void vfio_pci_cfg_handle_command(struct kvm *kvm, struct vfio_device *vdev,
+					void *data, int sz)
+{
+	struct pci_device_header *hdr = &vdev->pci.hdr;
+	bool toggle_io;
+	bool toggle_mem;
+	u16 cmd;
+	int i;
+
+	cmd = ioport__read16(data);
+	toggle_io = !!((cmd ^ hdr->command) & PCI_COMMAND_IO);
+	toggle_mem = !!((cmd ^ hdr->command) & PCI_COMMAND_MEMORY);
+
+	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
+		struct vfio_region *region = &vdev->regions[i];
+
+		if (region->is_ioport && toggle_io) {
+			if (cmd & PCI_COMMAND_IO)
+				vfio_map_region(kvm, vdev, region);
+			else
+				vfio_unmap_region(kvm, region);
+		}
+
+		if (!region->is_ioport && toggle_mem) {
+			if (cmd & PCI_COMMAND_MEMORY)
+				vfio_map_region(kvm, vdev, region);
+			else
+				vfio_unmap_region(kvm, region);
+		}
+	}
+}
+
+static void vfio_pci_cfg_update_bar(struct kvm *kvm, struct vfio_device *vdev,
+				    int bar_num, void *data, int sz)
+{
+	struct pci_device_header *hdr = &vdev->pci.hdr;
+	struct vfio_region *region;
+	uint32_t bar;
+
+	region = &vdev->regions[bar_num + VFIO_PCI_BAR0_REGION_INDEX];
+	bar = ioport__read32(data);
+
+	if (region->is_ioport) {
+		if (hdr->command & PCI_COMMAND_IO)
+			vfio_unmap_region(kvm, region);
+
+		region->port_base = bar & PCI_BASE_ADDRESS_IO_MASK;
+
+		if (hdr->command & PCI_COMMAND_IO)
+			vfio_map_region(kvm, vdev, region);
+	} else {
+		if (hdr->command & PCI_COMMAND_MEMORY)
+			vfio_unmap_region(kvm, region);
+
+		region->guest_phys_addr = bar & PCI_BASE_ADDRESS_MEM_MASK;
+
+		if (hdr->command & PCI_COMMAND_MEMORY)
+			vfio_map_region(kvm, vdev, region);
+	}
+}
+
 static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			       u8 offset, void *data, int sz)
 {
@@ -471,6 +533,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_pci_device *pdev;
 	struct vfio_device *vdev;
 	void *base = pci_hdr;
+	int bar_num;
 
 	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
 	vdev = container_of(pdev, struct vfio_device, pci);
@@ -487,9 +550,17 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
 		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
 
+	if (offset == PCI_COMMAND)
+		vfio_pci_cfg_handle_command(kvm, vdev, data, sz);
+
 	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
 		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
 			      sz, offset);
+
+	if (offset >= PCI_BASE_ADDRESS_0 && offset <= PCI_BASE_ADDRESS_5) {
+		bar_num = (offset - PCI_BASE_ADDRESS_0) / sizeof(u32);
+		vfio_pci_cfg_update_bar(kvm, vdev, bar_num, data, sz);
+	}
 }
 
 static ssize_t vfio_pci_msi_cap_size(struct msi_cap_64 *cap_hdr)
@@ -808,6 +879,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 	size_t map_size;
 	struct vfio_pci_device *pdev = &vdev->pci;
 	struct vfio_region *region = &vdev->regions[nr];
+	bool map_now;
 
 	if (nr >= vdev->info.num_regions)
 		return 0;
@@ -848,16 +920,22 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		}
 	}
 
-	if (!region->is_ioport) {
+	if (region->is_ioport) {
+		region->port_base = pci_get_io_port_block(region->info.size);
+		map_now = !!(pdev->hdr.command & PCI_COMMAND_IO);
+	} else {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
+		map_now = !!(pdev->hdr.command & PCI_COMMAND_MEMORY);
 	}
 
-	/* Map the BARs into the guest or setup a trap region. */
-	ret = vfio_map_region(kvm, vdev, region);
-	if (ret)
-		return ret;
+	if (map_now) {
+		/* Map the BARs into the guest or setup a trap region. */
+		ret = vfio_map_region(kvm, vdev, region);
+		if (ret)
+			return ret;
+	}
 
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 14/16] virtio/pci: Add support for BAR configuration
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (12 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 13/16] vfio: Add support for BAR configuration Alexandru Elisei
@ 2019-11-25 10:30 ` " Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 15/16] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

A device's Memory or I/O space address can be written by software in a
Base Address Register (BAR). Allow the BARs to be programable by
registering the mmio or ioport emulation when access is enabled for that
region, not when the virtual machine is created.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/virtio-pci.h |   1 +
 virtio/pci.c             | 208 +++++++++++++++++++++++++++++++++------
 2 files changed, 179 insertions(+), 30 deletions(-)

diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index 278a25950d8b..9b3ad3d04d41 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -22,6 +22,7 @@ struct virtio_pci {
 	struct pci_device_header pci_hdr;
 	struct device_header	dev_hdr;
 	void			*dev;
+	struct virtio_device	*vdev;
 	struct kvm		*kvm;
 
 	u16			port_addr;
diff --git a/virtio/pci.c b/virtio/pci.c
index 9f86bb7b6f93..dadb796e6d62 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -463,6 +463,170 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 				     data, len);
 }
 
+static void virtio_pci__update_command(struct kvm *kvm,
+				       struct pci_device_header *pci_hdr,
+				       void *data, int sz)
+{
+	struct virtio_pci *vpci = container_of(pci_hdr, struct virtio_pci, pci_hdr);
+	u32 addr, mem_size;
+	u16 cmd;
+	int r;
+	bool toggle_io, toggle_mem;
+
+	cmd = ioport__read16(data);
+
+	toggle_io = !!((cmd ^ pci_hdr->command) & PCI_COMMAND_IO);
+	toggle_mem = !!((cmd ^ pci_hdr->command) & PCI_COMMAND_MEMORY);
+
+	if (toggle_io && (cmd & PCI_COMMAND_IO)) {
+		addr = pci_hdr->bar[0] & PCI_BASE_ADDRESS_IO_MASK;
+		mem_size = pci_hdr->bar_size[0];
+
+		r = ioport__register(kvm, addr, &virtio_pci__io_ops, mem_size,
+				     vpci->vdev);
+		if (r < 0) {
+			pr_err("ioport__register failed for memory region 0x%x@0x%x\n",
+			       mem_size, addr);
+			/* Drop writes. */
+			memcpy(data, (void *)&pci_hdr->command, sz);
+			goto mem_setup;
+		}
+		vpci->port_addr = (u16)r;
+	}
+
+	if (toggle_io && !(cmd & PCI_COMMAND_IO)) {
+		/* Same as for the memory BARs. */
+		memcpy((void *)&pci_hdr->command, data, sz);
+		ioport__unregister(kvm, vpci->port_addr);
+	}
+
+mem_setup:
+	if (toggle_mem && (cmd & PCI_COMMAND_MEMORY)) {
+		addr = pci_hdr->bar[1] & PCI_BASE_ADDRESS_MEM_MASK;
+		mem_size = pci_hdr->bar_size[1];
+		r = kvm__register_mmio(kvm, addr, mem_size, false,
+				       virtio_pci__io_mmio_callback, vpci->vdev);
+		if (r < 0) {
+			pr_err("kvm__register_mmio failed for memory region 0x%x@0x%x\n",
+			       mem_size, addr);
+			/* Treat it like a Master Abort and drop writes. */
+			memcpy(data, (void *)&pci_hdr->command, sz);
+			return;
+		}
+		vpci->mmio_addr = addr;
+
+		addr = pci_hdr->bar[2] & PCI_BASE_ADDRESS_MEM_MASK;
+		mem_size = pci_hdr->bar_size[2];
+		r = kvm__register_mmio(kvm, addr, mem_size, false,
+				       virtio_pci__msix_mmio_callback, vpci->vdev);
+		if (r < 0) {
+			pr_err("kvm__register_mmio failed for memory region 0x%x@0x%x\n",
+			       mem_size, addr);
+			kvm__deregister_mmio(kvm, pci_hdr->bar[1] & PCI_BASE_ADDRESS_MEM_MASK);
+			/* Drop writes. */
+			memcpy(data, (void *)&pci_hdr->command, sz);
+			return;
+		}
+		vpci->msix_io_block = addr;
+	}
+
+	if (toggle_mem && !(cmd & PCI_COMMAND_MEMORY)) {
+		/*
+		 * This is to prevent races - an access by another thread
+		 * initiated before the region is unregistered, but performed
+		 * after that, will see that memory access is disabled and won't
+		 * do any emulation.
+		 */
+		memcpy((void *)&pci_hdr->command, data, sz);
+		kvm__deregister_mmio(kvm, vpci->mmio_addr);
+		kvm__deregister_mmio(kvm, vpci->msix_io_block);
+	}
+}
+
+static void virtio_pci__update_bar(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data, int sz)
+{
+	struct virtio_pci *vpci = container_of(pci_hdr, struct virtio_pci, pci_hdr);
+	u32 new_bar, old_bar;
+	u32 new_addr, mem_size;
+	int r;
+
+	if (bar_num > 2) {
+		/* BARs not implemented, ignore writes. */
+		bzero(data, sz);
+		return;
+	}
+
+	/*
+	 * The BARs are being configured when device access is disabled, the
+	 * memory will be registered for mmio/ioport emulation when the device
+	 * is actually enabled.
+	 */
+	if (bar_num == 0 && !(pci_hdr->command & PCI_COMMAND_IO))
+		return;
+	if (bar_num >= 1 && !(pci_hdr->command & PCI_COMMAND_MEMORY))
+		return;
+
+	new_bar = ioport__read32(data);
+	if (bar_num == 0)
+		new_addr = new_bar & PCI_BASE_ADDRESS_IO_MASK;
+	else
+		new_addr = new_bar & PCI_BASE_ADDRESS_MEM_MASK;
+	old_bar = pci_hdr->bar[bar_num];
+	mem_size = pci_hdr->bar_size[bar_num];
+
+	if (bar_num == 0) {
+		ioport__unregister(kvm, old_bar & PCI_BASE_ADDRESS_IO_MASK);
+		r = ioport__register(kvm, new_addr, &virtio_pci__io_ops,
+				     mem_size, vpci->vdev);
+		if (r < 0) {
+			pr_err("ioport__register failed for memory region 0x%x@0x%x\n",
+			       mem_size, new_addr);
+			/* Treat it like a Master Abort and drop writes. */
+			memcpy(data, (void *)&old_bar, sz);
+			return;
+		}
+		vpci->port_addr = (u16)r;
+	} else {
+		kvm__deregister_mmio(kvm, old_bar & PCI_BASE_ADDRESS_MEM_MASK);
+		if (bar_num == 1)
+			r = kvm__register_mmio(kvm, new_addr, mem_size, false,
+					       virtio_pci__io_mmio_callback,
+					       vpci->vdev);
+		else
+			r = kvm__register_mmio(kvm, new_addr, mem_size, false,
+					       virtio_pci__msix_mmio_callback,
+					       vpci->vdev);
+		if (r < 0) {
+			pr_err("kvm__register_mmio failed for memory region 0x%x@0x%x\n",
+			       mem_size, new_addr);
+			/* Drop writes. */
+			memcpy(data, (void *)&old_bar, sz);
+			return;
+		}
+		if (bar_num == 1)
+			vpci->mmio_addr = new_addr;
+		else
+			vpci->msix_io_block = new_addr;
+	}
+}
+
+static void virtio_pci__config_wr(struct kvm *kvm,
+                                 struct pci_device_header *pci_hdr,
+                                 u8 offset, void *data, int sz)
+{
+	int bar_num;
+
+	if (offset == PCI_COMMAND)
+		virtio_pci__update_command(kvm, pci_hdr, data, sz);
+
+	if (offset >= PCI_BASE_ADDRESS_0 && offset <= PCI_BASE_ADDRESS_5) {
+		bar_num = (offset - PCI_BASE_ADDRESS_0) / sizeof(u32);
+		virtio_pci__update_bar(kvm, pci_hdr, bar_num, data, sz);
+	}
+}
+
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
@@ -471,26 +635,14 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	vpci->kvm = kvm;
 	vpci->dev = dev;
+	vpci->vdev = vdev;
 
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
-	if (r < 0)
-		return r;
-	vpci->port_addr = (u16)r;
-
+	/* Let's have some sensible initial values for the BARs. */
+	vpci->port_addr = (u16)pci_get_io_port_block(PCI_IO_SIZE);
 	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vdev);
-	if (r < 0)
-		goto free_ioport;
-
 	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vdev);
-	if (r < 0)
-		goto free_mmio;
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -504,17 +656,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.class[2]		= (class >> 16) & 0xff,
 		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
 		.subsys_id		= cpu_to_le16(subsys_id),
-		.bar[0]			= cpu_to_le32(vpci->port_addr
-							| PCI_BASE_ADDRESS_SPACE_IO),
-		.bar[1]			= cpu_to_le32(vpci->mmio_addr
-							| PCI_BASE_ADDRESS_SPACE_MEMORY),
-		.bar[2]			= cpu_to_le32(vpci->msix_io_block
-							| PCI_BASE_ADDRESS_SPACE_MEMORY),
+		/*
+		 * According to PCI Local Bus 3.0, read accesses to reserved or
+		 * unimplemented registers must be completed normally and a data
+		 * value of 0 returned. Set the bars to sensible addresses so
+		 * the guest does not think that they are not implemented.
+		 */
+		.bar[0]                 = vpci->port_addr | PCI_BASE_ADDRESS_SPACE_IO,
+		.bar[1]                 = vpci->mmio_addr | PCI_BASE_ADDRESS_SPACE_MEMORY,
+		.bar[2]                 = vpci->msix_io_block | PCI_BASE_ADDRESS_SPACE_MEMORY,
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
 		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
+		.cfg_ops.write		= virtio_pci__config_wr,
 	};
 
 	vpci->dev_hdr = (struct device_header) {
@@ -547,20 +703,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	r = device__register(&vpci->dev_hdr);
 	if (r < 0)
-		goto free_msix_mmio;
+		return r;
 
 	/* save the IRQ that device__register() has allocated */
 	vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
 
 	return 0;
-
-free_msix_mmio:
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
-free_mmio:
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
-free_ioport:
-	ioport__unregister(kvm, vpci->port_addr);
-	return r;
 }
 
 int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 15/16] arm/fdt: Remove 'linux,pci-probe-only' property
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (13 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 14/16] virtio/pci: " Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-25 10:30 ` [PATCH kvmtool 16/16] Add PCI Express 1.1 support Alexandru Elisei
  2019-11-28 17:41 ` [PATCH kvmtool 00/16] Add writable BARs and PCIE " Lorenzo Pieralisi
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

From: Julien Thierry <julien.thierry@arm.com>

PCI now supports configurable BARs. Get rid of the no longer needed,
Linux-only, fdt property.

Cc: julien.thierry.kdev@gmail.com
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index c80e6da323b6..02091e9e0bee 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
 
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
-	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
 
 	/* Pass on our amended command line to a Linux kernel only. */
 	if (kvm->cfg.firmware_filename) {
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH kvmtool 16/16] Add PCI Express 1.1 support
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (14 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 15/16] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
@ 2019-11-25 10:30 ` Alexandru Elisei
  2019-11-28 17:41 ` [PATCH kvmtool 00/16] Add writable BARs and PCIE " Lorenzo Pieralisi
  16 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-25 10:30 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

PCI Express comes with an extended addressing scheme, which directly
translated into a bigger device configuration space (256->4096 bytes)
and bigger PCI configuration space (16->256 MB), as well as mandatory
capabilities (power management and PCI Express capabilities) [1].

Add support for PCI Express in preparation for running EDK2.

We take this opportunity to rewrite the initialization steps in
virtio_pci__init to make it more consistent and easy to follow.

[1] PCI Express Base Specification Revision 1.1

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/include/arm-common/kvm-arch.h |  2 +-
 arm/pci.c                         |  2 +-
 hw/vesa.c                         | 15 ++++++++
 include/kvm/pci.h                 | 57 +++++++++++++++++++++++-------
 pci.c                             |  5 +--
 virtio/pci.c                      | 58 +++++++++++++++++++++----------
 6 files changed, 104 insertions(+), 35 deletions(-)

diff --git a/arm/include/arm-common/kvm-arch.h b/arm/include/arm-common/kvm-arch.h
index b9d486d5eac2..aafb8bb797cb 100644
--- a/arm/include/arm-common/kvm-arch.h
+++ b/arm/include/arm-common/kvm-arch.h
@@ -23,7 +23,7 @@
 
 #define ARM_IOPORT_SIZE		(ARM_MMIO_AREA - ARM_IOPORT_AREA)
 #define ARM_VIRTIO_MMIO_SIZE	(ARM_AXI_AREA - (ARM_MMIO_AREA + ARM_GIC_SIZE))
-#define ARM_PCI_CFG_SIZE	(1ULL << 24)
+#define ARM_PCI_CFG_SIZE	(1ULL << 28)
 #define ARM_PCI_MMIO_SIZE	(ARM_MEMORY_AREA - \
 				(ARM_AXI_AREA + ARM_PCI_CFG_SIZE))
 
diff --git a/arm/pci.c b/arm/pci.c
index 4e6467357ce8..0c8e3aecc20b 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -80,7 +80,7 @@ void pci__generate_fdt_nodes(void *fdt)
 	_FDT(fdt_property_cell(fdt, "#address-cells", 0x3));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", 0x1));
-	_FDT(fdt_property_string(fdt, "compatible", "pci-host-cam-generic"));
+	_FDT(fdt_property_string(fdt, "compatible", "pci-host-ecam-generic"));
 	_FDT(fdt_property(fdt, "dma-coherent", NULL, 0));
 
 	_FDT(fdt_property(fdt, "bus-range", bus_range, sizeof(bus_range)));
diff --git a/hw/vesa.c b/hw/vesa.c
index 0191e9264666..d0caecf0b1b4 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -69,6 +69,21 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_base_addr			= (u16)r;
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
+	vesa_pci_device.capabilities	= (void *)&vesa_pci_device.pm - (void *)&vesa_pci_device;
+
+	vesa_pci_device.pm = (struct pm_cap) {
+		.cap	= PCI_CAP_ID_PM,
+		.next	= (void *)&vesa_pci_device.pcie - (void *)&vesa_pci_device,
+		.pmc	= PM_CAP_VER,
+		.pmcsr	= PCI_PM_CTRL_NO_SOFT_RESET,
+	};
+
+	vesa_pci_device.pcie = (struct pcie_cap) {
+		.cap		= PCI_CAP_ID_EXP,
+		.next		= 0,
+		.cap_reg	= PCIE_CAP_REG_DEV_LEGACY | PCIE_CAP_REG_VER,
+	};
+
 	device__register(&vesa_device);
 
 	mem = mmap(NULL, VESA_MEM_SIZE, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index ccb155e3e8fe..1ac7809e80ed 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -20,7 +20,11 @@
 #define PCI_CONFIG_BUS_FORWARD	0xcfa
 #define PCI_IO_SIZE		0x100
 #define PCI_IOPORT_START	0x6200
-#define PCI_CFG_SIZE		(1ULL << 24)
+#define PCI_CFG_SIZE		(1ULL << 28)
+
+#define PCIE_CAP_REG_VER	0x1
+#define PCIE_CAP_REG_DEV_LEGACY	(1 << 4)
+#define PM_CAP_VER		0x3
 
 struct kvm;
 
@@ -28,19 +32,19 @@ union pci_config_address {
 	struct {
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 		unsigned	reg_offset	: 2;		/* 1  .. 0  */
-		unsigned	register_number	: 6;		/* 7  .. 2  */
-		unsigned	function_number	: 3;		/* 10 .. 8  */
-		unsigned	device_number	: 5;		/* 15 .. 11 */
-		unsigned	bus_number	: 8;		/* 23 .. 16 */
-		unsigned	reserved	: 7;		/* 30 .. 24 */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
+		unsigned	function_number	: 3;		/* 14 .. 12  */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
 		unsigned	enable_bit	: 1;		/* 31       */
 #else
 		unsigned	enable_bit	: 1;		/* 31       */
-		unsigned	reserved	: 7;		/* 30 .. 24 */
-		unsigned	bus_number	: 8;		/* 23 .. 16 */
-		unsigned	device_number	: 5;		/* 15 .. 11 */
-		unsigned	function_number	: 3;		/* 10 .. 8  */
-		unsigned	register_number	: 6;		/* 7  .. 2  */
+		unsigned	reserved	: 3;		/* 30 .. 28 */
+		unsigned	bus_number	: 8;		/* 27 .. 20 */
+		unsigned	device_number	: 5;		/* 19 .. 15 */
+		unsigned	function_number	: 3;		/* 14 .. 12  */
+		unsigned	register_number	: 10;		/* 11 .. 2  */
 		unsigned	reg_offset	: 2;		/* 1  .. 0  */
 #endif
 	};
@@ -88,8 +92,35 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pcie_cap {
+	u8 cap;
+	u8 next;
+	u16 cap_reg;
+	u32 dev_cap;
+	u16 dev_ctrl;
+	u16 dev_status;
+	u32 link_cap;
+	u16 link_ctrl;
+	u16 link_status;
+	u32 slot_cap;
+	u16 slot_ctrl;
+	u16 slot_status;
+	u16 root_ctrl;
+	u16 root_cap;
+	u32 root_status;
+};
+
+struct pm_cap {
+	u8 cap;
+	u8 next;
+	u16 pmc;
+	u16 pmcsr;
+	u8 pmcsr_bse;
+	u8 data;
+};
+
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
-#define PCI_DEV_CFG_SIZE	256
+#define PCI_DEV_CFG_SIZE	4096
 #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
 struct pci_device_header;
@@ -127,6 +158,8 @@ struct pci_device_header {
 			u8		irq_pin;
 			u8		min_gnt;
 			u8		max_lat;
+			struct pm_cap pm;
+			struct pcie_cap pcie;
 			struct msix_cap msix;
 		} __attribute__((packed));
 		/* Pad to PCI config space size */
diff --git a/pci.c b/pci.c
index b4677434c50c..f1bfa5a8d157 100644
--- a/pci.c
+++ b/pci.c
@@ -155,7 +155,8 @@ static struct ioport_operations pci_config_data_ops = {
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
-	u8 bar, offset;
+	u8 bar;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
@@ -222,7 +223,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 offset;
+	u16 offset;
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 
diff --git a/virtio/pci.c b/virtio/pci.c
index dadb796e6d62..004a136af672 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -666,36 +666,56 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.bar[1]                 = vpci->mmio_addr | PCI_BASE_ADDRESS_SPACE_MEMORY,
 		.bar[2]                 = vpci->msix_io_block | PCI_BASE_ADDRESS_SPACE_MEMORY,
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
-		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
+		.capabilities		= (void *)&vpci->pci_hdr.pm - (void *)&vpci->pci_hdr,
 		.bar_size[0]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[1]		= cpu_to_le32(PCI_IO_SIZE),
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 		.cfg_ops.write		= virtio_pci__config_wr,
 	};
 
+	vpci->pci_hdr.pm = (struct pm_cap) {
+		.cap	= PCI_CAP_ID_PM,
+		.next	= (void *)&vpci->pci_hdr.pcie - (void *)&vpci->pci_hdr,
+		.pmc	= PM_CAP_VER,
+		/*
+		 * We don't do device state emulation, let the driver know that
+		 * going from D0->D3hot->D0 won't change the internal state of
+		 * the device.
+		 */
+		.pmcsr	= PCI_PM_CTRL_NO_SOFT_RESET,
+	};
+
+	vpci->pci_hdr.pcie = (struct pcie_cap) {
+		.cap		= PCI_CAP_ID_EXP,
+		.next		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
+		.cap_reg	= PCIE_CAP_REG_DEV_LEGACY | PCIE_CAP_REG_VER,
+	};
+
+	vpci->pci_hdr.msix = (struct msix_cap) {
+		.cap		= PCI_CAP_ID_MSIX,
+		.next		= 0,
+		/*
+		 * We at most have VIRTIO_PCI_MAX_VQ entries for virt queue,
+		 * VIRTIO_PCI_MAX_CONFIG entries for config.
+		 *
+		 * To quote the PCI spec:
+		 *
+		 * System software reads this field to determine the
+		 * MSI-X Table Size N, which is encoded as N-1.
+		 * For example, a returned value of "00000000011"
+		 * indicates a table size of 4.
+		 */
+		.ctrl		= cpu_to_le16(VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG - 1),
+		/* Both table and PBA are mapped to the same BAR (2) */
+		.table_offset	= cpu_to_le32(2),
+		.pba_offset	= cpu_to_le32(2 | PCI_IO_SIZE),
+	};
+
 	vpci->dev_hdr = (struct device_header) {
 		.bus_type		= DEVICE_BUS_PCI,
 		.data			= &vpci->pci_hdr,
 	};
 
-	vpci->pci_hdr.msix.cap = PCI_CAP_ID_MSIX;
-	vpci->pci_hdr.msix.next = 0;
-	/*
-	 * We at most have VIRTIO_PCI_MAX_VQ entries for virt queue,
-	 * VIRTIO_PCI_MAX_CONFIG entries for config.
-	 *
-	 * To quote the PCI spec:
-	 *
-	 * System software reads this field to determine the
-	 * MSI-X Table Size N, which is encoded as N-1.
-	 * For example, a returned value of "00000000011"
-	 * indicates a table size of 4.
-	 */
-	vpci->pci_hdr.msix.ctrl = cpu_to_le16(VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG - 1);
-
-	/* Both table and PBA are mapped to the same BAR (2) */
-	vpci->pci_hdr.msix.table_offset = cpu_to_le32(2);
-	vpci->pci_hdr.msix.pba_offset = cpu_to_le32(2 | PCI_IO_SIZE);
 	vpci->config_vector = 0;
 
 	if (irq__can_signal_msi(kvm))
-- 
2.20.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64
  2019-11-25 10:30 ` [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
@ 2019-11-27 18:23   ` Andre Przywara
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Przywara @ 2019-11-27 18:23 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On Mon, 25 Nov 2019 10:30:18 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> Use the compiler toolchain version of objcopy instead of the native one
> when cross-compiling for the x86_64 architecture.

Indeed, confirmed to fix aarch64->x86_64 cross builds (that's a thing!).

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index b76d844f2e01..6d6880dd4f8a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -22,6 +22,7 @@ CC	:= $(CROSS_COMPILE)gcc
>  CFLAGS	:=
>  LD	:= $(CROSS_COMPILE)ld
>  LDFLAGS	:=
> +OBJCOPY	:= $(CROSS_COMPILE)objcopy
>  
>  FIND	:= find
>  CSCOPE	:= cscope
> @@ -479,7 +480,7 @@ x86/bios/bios.bin.elf: x86/bios/entry.S x86/bios/e820.c x86/bios/int10.c x86/bio
>  
>  x86/bios/bios.bin: x86/bios/bios.bin.elf
>  	$(E) "  OBJCOPY " $@
> -	$(Q) objcopy -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
> +	$(Q) $(OBJCOPY) -O binary -j .text x86/bios/bios.bin.elf x86/bios/bios.bin
>  
>  x86/bios/bios-rom.o: x86/bios/bios-rom.S x86/bios/bios.bin x86/bios/bios-rom.h
>  	$(E) "  CC      " $@


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration
  2019-11-25 10:30 ` [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration Alexandru Elisei
@ 2019-11-27 18:24   ` Andre Przywara
  2019-11-28  9:37     ` Alexandru Elisei
  0 siblings, 1 reply; 26+ messages in thread
From: Andre Przywara @ 2019-11-27 18:24 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On Mon, 25 Nov 2019 10:30:19 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> From: Sami Mujawar <sami.mujawar@arm.com>
> 
> According to the 'PCI Local Bus Specification, Revision 3.0,
> February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'
> 
>     "Software saves the original value of the Base Address register,
>     writes 0 FFFF FFFFh to the register, then reads it back. Size
>     calculation can be done from the 32-bit value read by first
>     clearing encoding information bits (bit 0 for I/O, bits 0-3 for
>     memory), inverting all 32 bits (logical NOT), then incrementing
>     by 1. The resultant 32-bit value is the memory/I/O range size
>     decoded by the register. Note that the upper 16 bits of the result
>     is ignored if the Base Address register is for I/O and bits 16-31
>     returned zero upon read."
> 
> kvmtool was returning the actual BAR resource size which would be
> incorrect as the software software drivers would invert all 32 bits
> (logical NOT), then incrementing by 1. This ends up with a very large
> resource size (in some cases more than 4GB) due to which drivers
> assert/fail to work.
> 
> e.g if the BAR resource size was 0x1000, kvmtool would return 0x1000
> instead of 0xFFFFF00x.
> 
> Fixed pci__config_wr() to return the size of the BAR in accordance with
> the PCI Local Bus specification, Implementation Notes.
> 
> Cc: julien.thierry.kdev@gmail.com
> Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> [Reworked algorithm, removed power-of-two check]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  pci.c | 42 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/pci.c b/pci.c
> index 689869cb79a3..e1b57325bdeb 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -149,6 +149,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	u8 bar, offset;
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
> +	u32 value = 0;
>  
>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>  		return;
> @@ -169,13 +170,42 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>  
>  	/*
> -	 * If the kernel masks the BAR it would expect to find the size of the
> -	 * BAR there next time it reads from it. When the kernel got the size it
> -	 * would write the address back.
> +	 * If the kernel masks the BAR, it will expect to find the size of the
> +	 * BAR there next time it reads from it. After the kernel reads the
> +	 * size, it will write the address back.
>  	 */
> -	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
> -		u32 sz = pci_hdr->bar_size[bar];
> -		memcpy(base + offset, &sz, sizeof(sz));
> +	if (bar < 6) {
> +		memcpy(&value, data, size);
> +		if (value == 0xffffffff)
> +			/*
> +			 * According to the PCI local bus specification REV 3.0:

So this whole comment breaks 80 columns. Can we just move it one level up/left, putting it before the if statement? This also fixes the confusing indentation below.

> +			 * The number of upper bits that a device actually implements
> +			 * depends on how much of the address space the device will
> +			 * respond to. A device that wants a 1 MB memory address space
> +			 * (using a 32-bit base address register) would build the top
> +			 * 12 bits of the address register, hardwiring the other bits
> +			 * to 0.
> +			 *
> +			 * Furthermore, software can determine how much address space
> +			 * the device requires by writing a value of all 1's to the
> +			 * register and then reading the value back. The device will
> +			 * return 0's in all don't-care address bits, effectively
> +			 * specifying the address space required.
> +			 *
> +			 * Software computes the size of the address space with the
> +			 * formula S = ~B + 1, where S is the memory size and B is the
> +			 * value read from the BAR. This means that the BAR value that
> +			 * kvmtool should return is B = ~(S - 1).
> +			 */
> +			value = ~(pci_hdr->bar_size[bar] - 1);
> +		if (pci_hdr->bar[bar] & 0x1)

Should this be PCI_BASE_ADDRESS_SPACE_IO, for clarity?

> +			value = (value & PCI_BASE_ADDRESS_IO_MASK) | \

backslash not needed

> +				PCI_BASE_ADDRESS_SPACE_IO;
> +		else
> +			/* Memory Space BAR, preserve the first 4 bits. */
> +			value = (value & PCI_BASE_ADDRESS_MEM_MASK) | \

same here.

> +				(pci_hdr->bar[bar] & ~PCI_BASE_ADDRESS_MEM_MASK);
> +		memcpy(base + offset, &value, size);
>  	} else {
>  		memcpy(base + offset, data, size);
>  	}

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 03/16] Remove pci-shmem device
  2019-11-25 10:30 ` [PATCH kvmtool 03/16] Remove pci-shmem device Alexandru Elisei
@ 2019-11-27 18:24   ` Andre Przywara
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Przywara @ 2019-11-27 18:24 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On Mon, 25 Nov 2019 10:30:20 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> The pci-shmem emulated device ("ivshmem") was created by QEMU for
> cross-VM data sharing. The only Linux driver that uses this device is
> the Android Virtual System on a Chip staging driver, which also mentions
> a character device driver implemented on top of shmem, which was removed
> from Linux.
> 
> On the kvmtool side, the only commits touching the pci-shmem device
> since it was introduced in 2012 were made when refactoring various
> kvmtool subsystems. Let's remove the maintenance burden on the kvmtool
> maintainers and remove this unused device.

Yeah, sometimes you just have to let it go ;-)

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  Makefile                |   1 -
>  builtin-run.c           |   5 -
>  hw/pci-shmem.c          | 400 ----------------------------------------
>  include/kvm/pci-shmem.h |  32 ----
>  4 files changed, 438 deletions(-)
>  delete mode 100644 hw/pci-shmem.c
>  delete mode 100644 include/kvm/pci-shmem.h
> 
> diff --git a/Makefile b/Makefile
> index 6d6880dd4f8a..99c6a9e24d72 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -99,7 +99,6 @@ OBJS	+= util/read-write.o
>  OBJS	+= util/util.o
>  OBJS	+= virtio/9p.o
>  OBJS	+= virtio/9p-pdu.o
> -OBJS	+= hw/pci-shmem.o
>  OBJS	+= kvm-ipc.o
>  OBJS	+= builtin-sandbox.o
>  OBJS	+= virtio/mmio.o
> diff --git a/builtin-run.c b/builtin-run.c
> index f8dc6c7229b0..9cb8c75300eb 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -31,7 +31,6 @@
>  #include "kvm/sdl.h"
>  #include "kvm/vnc.h"
>  #include "kvm/guest_compat.h"
> -#include "kvm/pci-shmem.h"
>  #include "kvm/kvm-ipc.h"
>  #include "kvm/builtin-debug.h"
>  
> @@ -99,10 +98,6 @@ void kvm_run_set_wrapper_sandbox(void)
>  	OPT_INTEGER('c', "cpus", &(cfg)->nrcpus, "Number of CPUs"),	\
>  	OPT_U64('m', "mem", &(cfg)->ram_size, "Virtual machine memory"	\
>  		" size in MiB."),					\
> -	OPT_CALLBACK('\0', "shmem", NULL,				\
> -		     "[pci:]<addr>:<size>[:handle=<handle>][:create]",	\
> -		     "Share host shmem with guest via pci device",	\
> -		     shmem_parser, NULL),				\
>  	OPT_CALLBACK('d', "disk", kvm, "image or rootfs_dir", "Disk "	\
>  			" image or rootfs directory", img_name_parser,	\
>  			kvm),						\
> diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
> deleted file mode 100644
> index f92bc75544d7..000000000000
> --- a/hw/pci-shmem.c
> +++ /dev/null
> @@ -1,400 +0,0 @@
> -#include "kvm/devices.h"
> -#include "kvm/pci-shmem.h"
> -#include "kvm/virtio-pci-dev.h"
> -#include "kvm/irq.h"
> -#include "kvm/kvm.h"
> -#include "kvm/pci.h"
> -#include "kvm/util.h"
> -#include "kvm/ioport.h"
> -#include "kvm/ioeventfd.h"
> -
> -#include <linux/kvm.h>
> -#include <linux/byteorder.h>
> -#include <sys/ioctl.h>
> -#include <fcntl.h>
> -#include <sys/mman.h>
> -
> -#define MB_SHIFT (20)
> -#define KB_SHIFT (10)
> -#define GB_SHIFT (30)
> -
> -static struct pci_device_header pci_shmem_pci_device = {
> -	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> -	.device_id	= cpu_to_le16(0x1110),
> -	.header_type	= PCI_HEADER_TYPE_NORMAL,
> -	.class[2]	= 0xFF,	/* misc pci device */
> -	.status		= cpu_to_le16(PCI_STATUS_CAP_LIST),
> -	.capabilities	= (void *)&pci_shmem_pci_device.msix - (void *)&pci_shmem_pci_device,
> -	.msix.cap	= PCI_CAP_ID_MSIX,
> -	.msix.ctrl	= cpu_to_le16(1),
> -	.msix.table_offset = cpu_to_le32(1),		/* Use BAR 1 */
> -	.msix.pba_offset = cpu_to_le32(0x1001),		/* Use BAR 1 */
> -};
> -
> -static struct device_header pci_shmem_device = {
> -	.bus_type	= DEVICE_BUS_PCI,
> -	.data		= &pci_shmem_pci_device,
> -};
> -
> -/* registers for the Inter-VM shared memory device */
> -enum ivshmem_registers {
> -	INTRMASK = 0,
> -	INTRSTATUS = 4,
> -	IVPOSITION = 8,
> -	DOORBELL = 12,
> -};
> -
> -static struct shmem_info *shmem_region;
> -static u16 ivshmem_registers;
> -static int local_fd;
> -static u32 local_id;
> -static u64 msix_block;
> -static u64 msix_pba;
> -static struct msix_table msix_table[2];
> -
> -int pci_shmem__register_mem(struct shmem_info *si)
> -{
> -	if (!shmem_region) {
> -		shmem_region = si;
> -	} else {
> -		pr_warning("only single shmem currently avail. ignoring.\n");
> -		free(si);
> -	}
> -	return 0;
> -}
> -
> -static bool shmem_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> -	u16 offset = port - ivshmem_registers;
> -
> -	switch (offset) {
> -	case INTRMASK:
> -		break;
> -	case INTRSTATUS:
> -		break;
> -	case IVPOSITION:
> -		ioport__write32(data, local_id);
> -		break;
> -	case DOORBELL:
> -		break;
> -	};
> -
> -	return true;
> -}
> -
> -static bool shmem_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
> -{
> -	u16 offset = port - ivshmem_registers;
> -
> -	switch (offset) {
> -	case INTRMASK:
> -		break;
> -	case INTRSTATUS:
> -		break;
> -	case IVPOSITION:
> -		break;
> -	case DOORBELL:
> -		break;
> -	};
> -
> -	return true;
> -}
> -
> -static struct ioport_operations shmem_pci__io_ops = {
> -	.io_in	= shmem_pci__io_in,
> -	.io_out	= shmem_pci__io_out,
> -};
> -
> -static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
> -{
> -	void *mem;
> -
> -	if (addr - msix_block < 0x1000)
> -		mem = &msix_table;
> -	else
> -		mem = &msix_pba;
> -
> -	if (is_write)
> -		memcpy(mem + addr - msix_block, data, len);
> -	else
> -		memcpy(data, mem + addr - msix_block, len);
> -}
> -
> -/*
> - * Return an irqfd which can be used by other guests to signal this guest
> - * whenever they need to poke it
> - */
> -int pci_shmem__get_local_irqfd(struct kvm *kvm)
> -{
> -	int fd, gsi, r;
> -
> -	if (local_fd == 0) {
> -		fd = eventfd(0, 0);
> -		if (fd < 0)
> -			return fd;
> -
> -		if (pci_shmem_pci_device.msix.ctrl & cpu_to_le16(PCI_MSIX_FLAGS_ENABLE)) {
> -			gsi = irq__add_msix_route(kvm, &msix_table[0].msg,
> -						  pci_shmem_device.dev_num << 3);
> -			if (gsi < 0)
> -				return gsi;
> -		} else {
> -			gsi = pci_shmem_pci_device.irq_line;
> -		}
> -
> -		r = irq__add_irqfd(kvm, gsi, fd, -1);
> -		if (r < 0)
> -			return r;
> -
> -		local_fd = fd;
> -	}
> -
> -	return local_fd;
> -}
> -
> -/*
> - * Connect a new client to ivshmem by adding the appropriate datamatch
> - * to the DOORBELL
> - */
> -int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd)
> -{
> -	struct kvm_ioeventfd ioevent;
> -
> -	ioevent = (struct kvm_ioeventfd) {
> -		.addr		= ivshmem_registers + DOORBELL,
> -		.len		= sizeof(u32),
> -		.datamatch	= id,
> -		.fd		= fd,
> -		.flags		= KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
> -	};
> -
> -	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
> -}
> -
> -/*
> - * Remove a client connected to ivshmem by removing the appropriate datamatch
> - * from the DOORBELL
> - */
> -int pci_shmem__remove_client(struct kvm *kvm, u32 id)
> -{
> -	struct kvm_ioeventfd ioevent;
> -
> -	ioevent = (struct kvm_ioeventfd) {
> -		.addr		= ivshmem_registers + DOORBELL,
> -		.len		= sizeof(u32),
> -		.datamatch	= id,
> -		.flags		= KVM_IOEVENTFD_FLAG_PIO
> -				| KVM_IOEVENTFD_FLAG_DATAMATCH
> -				| KVM_IOEVENTFD_FLAG_DEASSIGN,
> -	};
> -
> -	return ioctl(kvm->vm_fd, KVM_IOEVENTFD, &ioevent);
> -}
> -
> -static void *setup_shmem(const char *key, size_t len, int creating)
> -{
> -	int fd;
> -	int rtn;
> -	void *mem;
> -	int flag = O_RDWR;
> -
> -	if (creating)
> -		flag |= O_CREAT;
> -
> -	fd = shm_open(key, flag, S_IRUSR | S_IWUSR);
> -	if (fd < 0) {
> -		pr_warning("Failed to open shared memory file %s\n", key);
> -		return NULL;
> -	}
> -
> -	if (creating) {
> -		rtn = ftruncate(fd, (off_t) len);
> -		if (rtn < 0)
> -			pr_warning("Can't ftruncate(fd,%zu)\n", len);
> -	}
> -	mem = mmap(NULL, len,
> -		   PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
> -	if (mem == MAP_FAILED) {
> -		pr_warning("Failed to mmap shared memory file");
> -		mem = NULL;
> -	}
> -	close(fd);
> -
> -	return mem;
> -}
> -
> -int shmem_parser(const struct option *opt, const char *arg, int unset)
> -{
> -	const u64 default_size = SHMEM_DEFAULT_SIZE;
> -	const u64 default_phys_addr = SHMEM_DEFAULT_ADDR;
> -	const char *default_handle = SHMEM_DEFAULT_HANDLE;
> -	struct shmem_info *si = malloc(sizeof(struct shmem_info));
> -	u64 phys_addr;
> -	u64 size;
> -	char *handle = NULL;
> -	int create = 0;
> -	const char *p = arg;
> -	char *next;
> -	int base = 10;
> -	int verbose = 0;
> -
> -	const int skip_pci = strlen("pci:");
> -	if (verbose)
> -		pr_info("shmem_parser(%p,%s,%d)", opt, arg, unset);
> -	/* parse out optional addr family */
> -	if (strcasestr(p, "pci:")) {
> -		p += skip_pci;
> -	} else if (strcasestr(p, "mem:")) {
> -		die("I can't add to E820 map yet.\n");
> -	}
> -	/* parse out physical addr */
> -	base = 10;
> -	if (strcasestr(p, "0x"))
> -		base = 16;
> -	phys_addr = strtoll(p, &next, base);
> -	if (next == p && phys_addr == 0) {
> -		pr_info("shmem: no physical addr specified, using default.");
> -		phys_addr = default_phys_addr;
> -	}
> -	if (*next != ':' && *next != '\0')
> -		die("shmem: unexpected chars after phys addr.\n");
> -	if (*next == '\0')
> -		p = next;
> -	else
> -		p = next + 1;
> -	/* parse out size */
> -	base = 10;
> -	if (strcasestr(p, "0x"))
> -		base = 16;
> -	size = strtoll(p, &next, base);
> -	if (next == p && size == 0) {
> -		pr_info("shmem: no size specified, using default.");
> -		size = default_size;
> -	}
> -	/* look for [KMGkmg][Bb]*  uses base 2. */
> -	int skip_B = 0;
> -	if (strspn(next, "KMGkmg")) {	/* might have a prefix */
> -		if (*(next + 1) == 'B' || *(next + 1) == 'b')
> -			skip_B = 1;
> -		switch (*next) {
> -		case 'K':
> -		case 'k':
> -			size = size << KB_SHIFT;
> -			break;
> -		case 'M':
> -		case 'm':
> -			size = size << MB_SHIFT;
> -			break;
> -		case 'G':
> -		case 'g':
> -			size = size << GB_SHIFT;
> -			break;
> -		default:
> -			die("shmem: bug in detecting size prefix.");
> -			break;
> -		}
> -		next += 1 + skip_B;
> -	}
> -	if (*next != ':' && *next != '\0') {
> -		die("shmem: unexpected chars after phys size. <%c><%c>\n",
> -		    *next, *p);
> -	}
> -	if (*next == '\0')
> -		p = next;
> -	else
> -		p = next + 1;
> -	/* parse out optional shmem handle */
> -	const int skip_handle = strlen("handle=");
> -	next = strcasestr(p, "handle=");
> -	if (*p && next) {
> -		if (p != next)
> -			die("unexpected chars before handle\n");
> -		p += skip_handle;
> -		next = strchrnul(p, ':');
> -		if (next - p) {
> -			handle = malloc(next - p + 1);
> -			strncpy(handle, p, next - p);
> -			handle[next - p] = '\0';	/* just in case. */
> -		}
> -		if (*next == '\0')
> -			p = next;
> -		else
> -			p = next + 1;
> -	}
> -	/* parse optional create flag to see if we should create shm seg. */
> -	if (*p && strcasestr(p, "create")) {
> -		create = 1;
> -		p += strlen("create");
> -	}
> -	if (*p != '\0')
> -		die("shmem: unexpected trailing chars\n");
> -	if (handle == NULL) {
> -		handle = malloc(strlen(default_handle) + 1);
> -		strcpy(handle, default_handle);
> -	}
> -	if (verbose) {
> -		pr_info("shmem: phys_addr = %llx",
> -			(unsigned long long)phys_addr);
> -		pr_info("shmem: size      = %llx", (unsigned long long)size);
> -		pr_info("shmem: handle    = %s", handle);
> -		pr_info("shmem: create    = %d", create);
> -	}
> -
> -	si->phys_addr = phys_addr;
> -	si->size = size;
> -	si->handle = handle;
> -	si->create = create;
> -	pci_shmem__register_mem(si);	/* ownership of si, etc. passed on. */
> -	return 0;
> -}
> -
> -int pci_shmem__init(struct kvm *kvm)
> -{
> -	char *mem;
> -	int r;
> -
> -	if (shmem_region == NULL)
> -		return 0;
> -
> -	/* Register MMIO space for MSI-X */
> -	r = ioport__register(kvm, IOPORT_EMPTY, &shmem_pci__io_ops, IOPORT_SIZE, NULL);
> -	if (r < 0)
> -		return r;
> -	ivshmem_registers = (u16)r;
> -
> -	msix_block = pci_get_io_space_block(0x1010);
> -	kvm__register_mmio(kvm, msix_block, 0x1010, false, callback_mmio_msix, NULL);
> -
> -	/*
> -	 * This registers 3 BARs:
> -	 *
> -	 * 0 - ivshmem registers
> -	 * 1 - MSI-X MMIO space
> -	 * 2 - Shared memory block
> -	 */
> -	pci_shmem_pci_device.bar[0] = cpu_to_le32(ivshmem_registers | PCI_BASE_ADDRESS_SPACE_IO);
> -	pci_shmem_pci_device.bar_size[0] = shmem_region->size;
> -	pci_shmem_pci_device.bar[1] = cpu_to_le32(msix_block | PCI_BASE_ADDRESS_SPACE_MEMORY);
> -	pci_shmem_pci_device.bar_size[1] = 0x1010;
> -	pci_shmem_pci_device.bar[2] = cpu_to_le32(shmem_region->phys_addr | PCI_BASE_ADDRESS_SPACE_MEMORY);
> -	pci_shmem_pci_device.bar_size[2] = shmem_region->size;
> -
> -	device__register(&pci_shmem_device);
> -
> -	/* Open shared memory and plug it into the guest */
> -	mem = setup_shmem(shmem_region->handle, shmem_region->size,
> -				shmem_region->create);
> -	if (mem == NULL)
> -		return -EINVAL;
> -
> -	kvm__register_dev_mem(kvm, shmem_region->phys_addr, shmem_region->size,
> -			      mem);
> -	return 0;
> -}
> -dev_init(pci_shmem__init);
> -
> -int pci_shmem__exit(struct kvm *kvm)
> -{
> -	return 0;
> -}
> -dev_exit(pci_shmem__exit);
> diff --git a/include/kvm/pci-shmem.h b/include/kvm/pci-shmem.h
> deleted file mode 100644
> index 6cff2b85bfd3..000000000000
> --- a/include/kvm/pci-shmem.h
> +++ /dev/null
> @@ -1,32 +0,0 @@
> -#ifndef KVM__PCI_SHMEM_H
> -#define KVM__PCI_SHMEM_H
> -
> -#include <linux/types.h>
> -#include <linux/list.h>
> -
> -#include "kvm/parse-options.h"
> -
> -#define SHMEM_DEFAULT_SIZE (16 << MB_SHIFT)
> -#define SHMEM_DEFAULT_ADDR (0xc8000000)
> -#define SHMEM_DEFAULT_HANDLE "/kvm_shmem"
> -
> -struct kvm;
> -struct shmem_info;
> -
> -struct shmem_info {
> -	u64 phys_addr;
> -	u64 size;
> -	char *handle;
> -	int create;
> -};
> -
> -int pci_shmem__init(struct kvm *kvm);
> -int pci_shmem__exit(struct kvm *kvm);
> -int pci_shmem__register_mem(struct shmem_info *si);
> -int shmem_parser(const struct option *opt, const char *arg, int unset);
> -
> -int pci_shmem__get_local_irqfd(struct kvm *kvm);
> -int pci_shmem__add_client(struct kvm *kvm, u32 id, int fd);
> -int pci_shmem__remove_client(struct kvm *kvm, u32 id);
> -
> -#endif


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two
  2019-11-25 10:30 ` [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two Alexandru Elisei
@ 2019-11-27 18:25   ` Andre Przywara
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Przywara @ 2019-11-27 18:25 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On Mon, 25 Nov 2019 10:30:21 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

Hi,

> According to the PCI local bus specification [1], a device's memory size
> must be a power of two. This is also implicit in the mechanism that a CPU
> uses to get the memory size requirement for a PCI device.
> 
> The vesa device requests a memory size that isn't a power of two.
> According to the same spec [1], a device is allowed to consume more memory
> than it actually requires. As a result, the amount of memory that the vesa
> device now reserves has been increased.
> 
> To prevent slip-ups in the future, a few BUILD_BUG_ON statements were added
> in places where the memory size is known at compile time.
> 
> [1] PCI Local Bus Specification Revision 3.0, section 6.2.5.1
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  hw/vesa.c          | 2 ++
>  include/kvm/util.h | 2 ++
>  include/kvm/vesa.h | 6 +++++-
>  vfio/pci.c         | 5 +++++
>  virtio/pci.c       | 3 +++
>  5 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index f3c5114cf4fe..75670a51be5f 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -58,6 +58,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	char *mem;
>  	int r;
>  
> +	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
> +
>  	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>  		return NULL;
>  
> diff --git a/include/kvm/util.h b/include/kvm/util.h
> index 4ca7aa9392b6..e90f1c2db39f 100644
> --- a/include/kvm/util.h
> +++ b/include/kvm/util.h
> @@ -104,6 +104,8 @@ static inline unsigned long roundup_pow_of_two(unsigned long x)
>  	return x ? 1UL << fls_long(x - 1) : 0;
>  }
>  
> +#define is_power_of_two(x)	((x) ? ((x) & ((x) - 1)) == 0 : 0)

This gives weird results for negative values (which the kernel avoids by having this a static inline and using an unsigned type).
Not sure we care, but maybe (x > 0) ? ... would fix this?

> +
>  struct kvm;
>  void *mmap_hugetlbfs(struct kvm *kvm, const char *htlbfs_path, u64 size);
>  void *mmap_anon_or_hugetlbfs(struct kvm *kvm, const char *hugetlbfs_path, u64 size);
> diff --git a/include/kvm/vesa.h b/include/kvm/vesa.h
> index 0fac11ab5a9f..e7d971343642 100644
> --- a/include/kvm/vesa.h
> +++ b/include/kvm/vesa.h
> @@ -5,8 +5,12 @@
>  #define VESA_HEIGHT	480
>  
>  #define VESA_MEM_ADDR	0xd0000000
> -#define VESA_MEM_SIZE	(4*VESA_WIDTH*VESA_HEIGHT)
>  #define VESA_BPP	32
> +/*
> + * We actually only need VESA_BPP/8*VESA_WIDTH*VESA_HEIGHT bytes. But the memory
> + * size must be a power of 2, so we round up.
> + */
> +#define VESA_MEM_SIZE	(1 << 21)

I don't think it's worth calculating the value and rounding it up, but can we have a BUILD_BUG checking that VESA_MEM_SIZE is big enough to hold the framebuffer?

Cheers,
Andre

>  
>  struct kvm;
>  struct biosregs;
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 76e24c156906..914732cc6897 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -831,6 +831,11 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  	/* Ignore invalid or unimplemented regions */
>  	if (!region->info.size)
>  		return 0;
> +	if (!is_power_of_two(region->info.size)) {
> +		vfio_dev_err(vdev, "region is not power of two: 0x%llx",
> +			     region->info.size);
> +		return -EINVAL;
> +	}
>  
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX) {
>  		/* Trap and emulate MSI-X table */
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 99653cad2c0f..04e801827df9 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -435,6 +435,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	vpci->kvm = kvm;
>  	vpci->dev = dev;
>  
> +	BUILD_BUG_ON(!is_power_of_two(IOPORT_SIZE));
> +	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
> +
>  	r = ioport__register(kvm, IOPORT_EMPTY, &virtio_pci__io_ops, IOPORT_SIZE, vdev);
>  	if (r < 0)
>  		return r;


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration
  2019-11-27 18:24   ` Andre Przywara
@ 2019-11-28  9:37     ` Alexandru Elisei
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-28  9:37 UTC (permalink / raw)
  To: Andre Przywara
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

Hi,

Thank you for reviewing this!

On 11/27/19 6:24 PM, Andre Przywara wrote:
> On Mon, 25 Nov 2019 10:30:19 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> From: Sami Mujawar <sami.mujawar@arm.com>
>>
>> According to the 'PCI Local Bus Specification, Revision 3.0,
>> February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'
>>
>>     "Software saves the original value of the Base Address register,
>>     writes 0 FFFF FFFFh to the register, then reads it back. Size
>>     calculation can be done from the 32-bit value read by first
>>     clearing encoding information bits (bit 0 for I/O, bits 0-3 for
>>     memory), inverting all 32 bits (logical NOT), then incrementing
>>     by 1. The resultant 32-bit value is the memory/I/O range size
>>     decoded by the register. Note that the upper 16 bits of the result
>>     is ignored if the Base Address register is for I/O and bits 16-31
>>     returned zero upon read."
>>
>> kvmtool was returning the actual BAR resource size which would be
>> incorrect as the software software drivers would invert all 32 bits
>> (logical NOT), then incrementing by 1. This ends up with a very large
>> resource size (in some cases more than 4GB) due to which drivers
>> assert/fail to work.
>>
>> e.g if the BAR resource size was 0x1000, kvmtool would return 0x1000
>> instead of 0xFFFFF00x.
>>
>> Fixed pci__config_wr() to return the size of the BAR in accordance with
>> the PCI Local Bus specification, Implementation Notes.
>>
>> Cc: julien.thierry.kdev@gmail.com
>> Signed-off-by: Sami Mujawar <sami.mujawar@arm.com>
>> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
>> [Reworked algorithm, removed power-of-two check]
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>>  pci.c | 42 ++++++++++++++++++++++++++++++++++++------
>>  1 file changed, 36 insertions(+), 6 deletions(-)
>>
>> diff --git a/pci.c b/pci.c
>> index 689869cb79a3..e1b57325bdeb 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -149,6 +149,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	u8 bar, offset;
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>> +	u32 value = 0;
>>  
>>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>>  		return;
>> @@ -169,13 +170,42 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>>  
>>  	/*
>> -	 * If the kernel masks the BAR it would expect to find the size of the
>> -	 * BAR there next time it reads from it. When the kernel got the size it
>> -	 * would write the address back.
>> +	 * If the kernel masks the BAR, it will expect to find the size of the
>> +	 * BAR there next time it reads from it. After the kernel reads the
>> +	 * size, it will write the address back.
>>  	 */
>> -	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
>> -		u32 sz = pci_hdr->bar_size[bar];
>> -		memcpy(base + offset, &sz, sizeof(sz));
>> +	if (bar < 6) {
>> +		memcpy(&value, data, size);
>> +		if (value == 0xffffffff)
>> +			/*
>> +			 * According to the PCI local bus specification REV 3.0:
> So this whole comment breaks 80 columns. Can we just move it one level up/left, putting it before the if statement? This also fixes the confusing indentation below.

Good idea, will do.

>
>> +			 * The number of upper bits that a device actually implements
>> +			 * depends on how much of the address space the device will
>> +			 * respond to. A device that wants a 1 MB memory address space
>> +			 * (using a 32-bit base address register) would build the top
>> +			 * 12 bits of the address register, hardwiring the other bits
>> +			 * to 0.
>> +			 *
>> +			 * Furthermore, software can determine how much address space
>> +			 * the device requires by writing a value of all 1's to the
>> +			 * register and then reading the value back. The device will
>> +			 * return 0's in all don't-care address bits, effectively
>> +			 * specifying the address space required.
>> +			 *
>> +			 * Software computes the size of the address space with the
>> +			 * formula S = ~B + 1, where S is the memory size and B is the
>> +			 * value read from the BAR. This means that the BAR value that
>> +			 * kvmtool should return is B = ~(S - 1).
>> +			 */
>> +			value = ~(pci_hdr->bar_size[bar] - 1);
>> +		if (pci_hdr->bar[bar] & 0x1)
> Should this be PCI_BASE_ADDRESS_SPACE_IO, for clarity?

Yes, that's better, will change.

>
>> +			value = (value & PCI_BASE_ADDRESS_IO_MASK) | \
> backslash not needed
>
>> +				PCI_BASE_ADDRESS_SPACE_IO;
>> +		else
>> +			/* Memory Space BAR, preserve the first 4 bits. */
>> +			value = (value & PCI_BASE_ADDRESS_MEM_MASK) | \
> same here.

I'll remove both.

Thanks,
Alex
>> +				(pci_hdr->bar[bar] & ~PCI_BASE_ADDRESS_MEM_MASK);
>> +		memcpy(base + offset, &value, size);
>>  	} else {
>>  		memcpy(base + offset, data, size);
>>  	}
> Cheers,
> Andre.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support
  2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
                   ` (15 preceding siblings ...)
  2019-11-25 10:30 ` [PATCH kvmtool 16/16] Add PCI Express 1.1 support Alexandru Elisei
@ 2019-11-28 17:41 ` " Lorenzo Pieralisi
  16 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2019-11-28 17:41 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, andre.przywara, sami.mujawar

On Mon, Nov 25, 2019 at 10:30:17AM +0000, Alexandru Elisei wrote:
> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
> it from trying to reprogram the BARs. Let's make the BARs writable so we
> can get rid of this band-aid.

For the sake of precision, PCI BARs are *always* writable and are indeed
written in eg Linux kernel enumeration code regardless of what DT
firmware property is present - in order to size them.

This series - which is very welcome - allows something different, namely
it allows changing the BARs value from a default address space
configuration aka reassigning BARs in Linux kernel speak.

Thanks,
Lorenzo

> Let's also extend the legacy PCI emulation, which came out in 1992, so we
> can properly emulate the PCI Express version 1.1 protocol, which is
> relatively newer, being published in 2005.
> 
> With these two changes, we are very close to running EDK2 as the firmware
> for the virtual machine; the only thing that is missing is flash emulation
> for storing firmware variables.
> 
> Summary of the patches:
> * Patches 1-12 are fixes or enhancements needed to support reprogramable
>   BARs.
> * Patches 13-15 add support for reprogramable BARs and remove the dt
>   property.
> * Patch 16 adds support for PCIE 1.1.
> 
> Based on the series at [1].
> 
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/034964.html
> 
> Alexandru Elisei (8):
>   Makefile: Use correct objcopy binary when cross-compiling for x86_64
>   Remove pci-shmem device
>   Check that a PCI device's memory size is power of two
>   arm: pci.c: Advertise only PCI bus 0 in the DT
>   virtio/pci: Ignore MMIO and I/O accesses when they are disabled
>   Use independent read/write locks for ioport and mmio
>   virtio/pci: Add support for BAR configuration
>   Add PCI Express 1.1 support
> 
> Julien Thierry (7):
>   ioport: pci: Move port allocations to PCI devices
>   pci: Fix ioport allocation size
>   arm/pci: Fix PCI IO region
>   arm/pci: Do not use first PCI IO space bytes for devices
>   virtio/pci: Make memory and IO BARs independent
>   vfio: Add support for BAR configuration
>   arm/fdt: Remove 'linux,pci-probe-only' property
> 
> Sami Mujawar (1):
>   pci: Fix BAR resource sizing arbitration
> 
>  Makefile                          |   4 +-
>  arm/fdt.c                         |   1 -
>  arm/include/arm-common/kvm-arch.h |   2 +-
>  arm/include/arm-common/pci.h      |   1 +
>  arm/kvm.c                         |   3 +
>  arm/pci.c                         |  28 ++-
>  builtin-run.c                     |   5 -
>  hw/pci-shmem.c                    | 400 ------------------------------
>  hw/vesa.c                         |  21 +-
>  include/kvm/ioport.h              |   4 -
>  include/kvm/pci-shmem.h           |  32 ---
>  include/kvm/pci.h                 |  61 ++++-
>  include/kvm/util.h                |   2 +
>  include/kvm/vesa.h                |   6 +-
>  include/kvm/virtio-pci.h          |   1 +
>  ioport.c                          |  36 +--
>  mmio.c                            |  26 +-
>  pci.c                             |  64 ++++-
>  powerpc/include/kvm/kvm-arch.h    |   2 +-
>  vfio/core.c                       |   6 +-
>  vfio/pci.c                        |  97 +++++++-
>  virtio/pci.c                      | 355 ++++++++++++++++++++------
>  x86/include/kvm/kvm-arch.h        |   2 +-
>  23 files changed, 562 insertions(+), 597 deletions(-)
>  delete mode 100644 hw/pci-shmem.c
>  delete mode 100644 include/kvm/pci-shmem.h
> 
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT
  2019-11-25 10:30 ` [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT Alexandru Elisei
@ 2019-11-28 17:43   ` Andre Przywara
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Przywara @ 2019-11-28 17:43 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi

On Mon, 25 Nov 2019 10:30:22 +0000
Alexandru Elisei <alexandru.elisei@arm.com> wrote:

> The "bus-range" property encodes the first and last bus number. kvmtool
> uses bus 0 for PCI and bus 1 for MMIO. 

Mmmh, but this DT setting is about (guest visible) PCI busses, not kvmtool busses, isn't it?
So for the PCI devices we *emulate* that's probably correct, since we don't have any PCI bridge functionality among them, but wouldn't forwarding a PCI device with a bridge require more than one bus? And the guest OS' enumeration code would try to create a new bus, but fails, because there is only one?

So I agree that the [0, 1] looks somewhat arbitrary, but shouldn't we set it to [0, 255] instead, to not limit things?
I think this setting should correspond to the PCIe config space size we provide, which should be: 4096 bytes * 8 fns * 32 devs * nr_busses (for PCIe).

At least that's my understanding of these things, please correct me if I am wrong.

Cheers,
Andre.

> Advertise only the PCI bus in
> the PCI DT node by setting "bus-range" to <0, 0>.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arm/pci.c b/arm/pci.c
> index 557cfa98938d..ed325fa4a811 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -30,7 +30,7 @@ void pci__generate_fdt_nodes(void *fdt)
>  	struct of_interrupt_map_entry irq_map[OF_PCI_IRQ_MAP_MAX];
>  	unsigned nentries = 0;
>  	/* Bus range */
> -	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(1), };
> +	u32 bus_range[] = { cpu_to_fdt32(0), cpu_to_fdt32(0), };
>  	/* Configuration Space */
>  	u64 cfg_reg_prop[] = { cpu_to_fdt64(KVM_PCI_CFG_AREA),
>  			       cpu_to_fdt64(ARM_PCI_CFG_SIZE), };


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 13/16] vfio: Add support for BAR configuration
  2019-11-25 10:30 ` [PATCH kvmtool 13/16] vfio: Add support for BAR configuration Alexandru Elisei
@ 2019-11-29 17:05   ` Alexandru Elisei
  0 siblings, 0 replies; 26+ messages in thread
From: Alexandru Elisei @ 2019-11-29 17:05 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi

Hi,

On 11/25/19 10:30 AM, Alexandru Elisei wrote:
> From: Julien Thierry <julien.thierry@arm.com>
>
> When a guest can reassign BARs, kvmtool needs to maintain the vfio_region
> consistent with their corresponding BARs. Take the new updated addresses
> from the PCI header read back from the vfio driver.
>
> Also, to modify the BARs, it is expected that guests will disable
> IO/Memory response in the PCI command. Support this by mapping/unmapping
> regions when the corresponding response gets enabled/disabled.
>
> Cc: julien.thierry.kdev@gmail.com
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> [Fixed BAR selection]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  vfio/core.c |  8 ++---
>  vfio/pci.c  | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 87 insertions(+), 9 deletions(-)

I've finally had the chance to do more testing for PCI passthrough, and this patch
is pretty broken, so far I've found that: kvmtool does trap-and-emulate for the
BAR(s) dedicated to the MSI-X table and MSI-X PBA structure, and we don't take
that into account; vfio_unmap_region doesn't destroy the memslot; when the guest
enables memory or I/O accesses, we call vfio_map_region for all 6 BARs, even
though some of them might be unimplemented (their value is 0).

Thanks,
Alex
> diff --git a/vfio/core.c b/vfio/core.c
> index 0ed1e6fee6bf..b554897fc8c1 100644
> --- a/vfio/core.c
> +++ b/vfio/core.c
> @@ -202,14 +202,13 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
>  				  struct vfio_region *region)
>  {
>  	if (region->is_ioport) {
> -		int port = pci_get_io_port_block(region->info.size);
> +		int port = ioport__register(kvm, region->port_base,
> +					    &vfio_ioport_ops,
> +					    region->info.size, region);
>  
> -		port = ioport__register(kvm, port, &vfio_ioport_ops,
> -					region->info.size, region);
>  		if (port < 0)
>  			return port;
>  
> -		region->port_base = port;
>  		return 0;
>  	}
>  
> @@ -258,6 +257,7 @@ void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
>  {
>  	if (region->host_addr) {
>  		munmap(region->host_addr, region->info.size);
> +		region->host_addr = NULL;
>  	} else if (region->is_ioport) {
>  		ioport__unregister(kvm, region->port_base);
>  	} else {
> diff --git a/vfio/pci.c b/vfio/pci.c
> index bc5a6d452f7a..28f895c06b27 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -1,3 +1,4 @@
> +#include "kvm/ioport.h"
>  #include "kvm/irq.h"
>  #include "kvm/kvm.h"
>  #include "kvm/kvm-cpu.h"
> @@ -464,6 +465,67 @@ static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr
>  			      sz, offset);
>  }
>  
> +static void vfio_pci_cfg_handle_command(struct kvm *kvm, struct vfio_device *vdev,
> +					void *data, int sz)
> +{
> +	struct pci_device_header *hdr = &vdev->pci.hdr;
> +	bool toggle_io;
> +	bool toggle_mem;
> +	u16 cmd;
> +	int i;
> +
> +	cmd = ioport__read16(data);
> +	toggle_io = !!((cmd ^ hdr->command) & PCI_COMMAND_IO);
> +	toggle_mem = !!((cmd ^ hdr->command) & PCI_COMMAND_MEMORY);
> +
> +	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
> +		struct vfio_region *region = &vdev->regions[i];
> +
> +		if (region->is_ioport && toggle_io) {
> +			if (cmd & PCI_COMMAND_IO)
> +				vfio_map_region(kvm, vdev, region);
> +			else
> +				vfio_unmap_region(kvm, region);
> +		}
> +
> +		if (!region->is_ioport && toggle_mem) {
> +			if (cmd & PCI_COMMAND_MEMORY)
> +				vfio_map_region(kvm, vdev, region);
> +			else
> +				vfio_unmap_region(kvm, region);
> +		}
> +	}
> +}
> +
> +static void vfio_pci_cfg_update_bar(struct kvm *kvm, struct vfio_device *vdev,
> +				    int bar_num, void *data, int sz)
> +{
> +	struct pci_device_header *hdr = &vdev->pci.hdr;
> +	struct vfio_region *region;
> +	uint32_t bar;
> +
> +	region = &vdev->regions[bar_num + VFIO_PCI_BAR0_REGION_INDEX];
> +	bar = ioport__read32(data);
> +
> +	if (region->is_ioport) {
> +		if (hdr->command & PCI_COMMAND_IO)
> +			vfio_unmap_region(kvm, region);
> +
> +		region->port_base = bar & PCI_BASE_ADDRESS_IO_MASK;
> +
> +		if (hdr->command & PCI_COMMAND_IO)
> +			vfio_map_region(kvm, vdev, region);
> +	} else {
> +		if (hdr->command & PCI_COMMAND_MEMORY)
> +			vfio_unmap_region(kvm, region);
> +
> +		region->guest_phys_addr = bar & PCI_BASE_ADDRESS_MEM_MASK;
> +
> +		if (hdr->command & PCI_COMMAND_MEMORY)
> +			vfio_map_region(kvm, vdev, region);
> +	}
> +}
> +
>  static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  			       u8 offset, void *data, int sz)
>  {
> @@ -471,6 +533,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_pci_device *pdev;
>  	struct vfio_device *vdev;
>  	void *base = pci_hdr;
> +	int bar_num;
>  
>  	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
>  	vdev = container_of(pdev, struct vfio_device, pci);
> @@ -487,9 +550,17 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>  
> +	if (offset == PCI_COMMAND)
> +		vfio_pci_cfg_handle_command(kvm, vdev, data, sz);
> +
>  	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>  			      sz, offset);
> +
> +	if (offset >= PCI_BASE_ADDRESS_0 && offset <= PCI_BASE_ADDRESS_5) {
> +		bar_num = (offset - PCI_BASE_ADDRESS_0) / sizeof(u32);
> +		vfio_pci_cfg_update_bar(kvm, vdev, bar_num, data, sz);
> +	}
>  }
>  
>  static ssize_t vfio_pci_msi_cap_size(struct msi_cap_64 *cap_hdr)
> @@ -808,6 +879,7 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  	size_t map_size;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  	struct vfio_region *region = &vdev->regions[nr];
> +	bool map_now;
>  
>  	if (nr >= vdev->info.num_regions)
>  		return 0;
> @@ -848,16 +920,22 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  		}
>  	}
>  
> -	if (!region->is_ioport) {
> +	if (region->is_ioport) {
> +		region->port_base = pci_get_io_port_block(region->info.size);
> +		map_now = !!(pdev->hdr.command & PCI_COMMAND_IO);
> +	} else {
>  		/* Grab some MMIO space in the guest */
>  		map_size = ALIGN(region->info.size, PAGE_SIZE);
>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
> +		map_now = !!(pdev->hdr.command & PCI_COMMAND_MEMORY);
>  	}
>  
> -	/* Map the BARs into the guest or setup a trap region. */
> -	ret = vfio_map_region(kvm, vdev, region);
> -	if (ret)
> -		return ret;
> +	if (map_now) {
> +		/* Map the BARs into the guest or setup a trap region. */
> +		ret = vfio_map_region(kvm, vdev, region);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices
  2019-11-25 10:30 ` [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices Alexandru Elisei
@ 2019-12-02 12:15   ` Lorenzo Pieralisi
  0 siblings, 0 replies; 26+ messages in thread
From: Lorenzo Pieralisi @ 2019-12-02 12:15 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvm, will, julien.thierry.kdev, andre.przywara, sami.mujawar

On Mon, Nov 25, 2019 at 10:30:26AM +0000, Alexandru Elisei wrote:
> From: Julien Thierry <julien.thierry@arm.com>
> 
> Linux has this convention that the lower 0x1000 bytes of the IO space
> should not be used. (cf PCIBIOS_MIN_IO).
> 
> Just allocate those bytes to prevent future allocation assigning it to
> devices.

I do not understand what this means; if the kernel reassigns IO BARs
within the IO window kvmtool provides (through DT host controller
bindings - ranges property) this patch should not be needed.

Furthermore I don't think there should be any dependency in kvmtool
to the Linux PCIBIOS_MIN_IO offset (which happens to be 0x1000 but
kvmtool must not rely on that).

To sum it up: kvmtool should assign sensible IO ports default values
to BARs (even though that's *not* mandatory) and let the OS reassign
values according to the IO port windows provided through DT bindings
(ie ranges property).

It is likely there is something I am missing in this patch logic,
apologies for asking but I don't think this patch should be required.

Thanks,
Lorenzo

> Cc: julien.thierry.kdev@gmail.com
> Signed-off-by: Julien Thierry <julien.thierry@arm.com>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arm/pci.c b/arm/pci.c
> index 1c0949a22408..4e6467357ce8 100644
> --- a/arm/pci.c
> +++ b/arm/pci.c
> @@ -37,6 +37,9 @@ void pci__arm_init(struct kvm *kvm)
>  
>  	/* Make PCI port allocation start at a properly aligned address */
>  	pci_get_io_port_block(align_pad);
> +
> +	/* Convention, don't allocate first 0x1000 bytes of PCI IO */
> +	pci_get_io_port_block(0x1000);
>  }
>  
>  void pci__generate_fdt_nodes(void *fdt)
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, back to index

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-25 10:30 [PATCH kvmtool 00/16] Add writable BARs and PCIE 1.1 support Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 01/16] Makefile: Use correct objcopy binary when cross-compiling for x86_64 Alexandru Elisei
2019-11-27 18:23   ` Andre Przywara
2019-11-25 10:30 ` [PATCH kvmtool 02/16] pci: Fix BAR resource sizing arbitration Alexandru Elisei
2019-11-27 18:24   ` Andre Przywara
2019-11-28  9:37     ` Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 03/16] Remove pci-shmem device Alexandru Elisei
2019-11-27 18:24   ` Andre Przywara
2019-11-25 10:30 ` [PATCH kvmtool 04/16] Check that a PCI device's memory size is power of two Alexandru Elisei
2019-11-27 18:25   ` Andre Przywara
2019-11-25 10:30 ` [PATCH kvmtool 05/16] arm: pci.c: Advertise only PCI bus 0 in the DT Alexandru Elisei
2019-11-28 17:43   ` Andre Przywara
2019-11-25 10:30 ` [PATCH kvmtool 06/16] ioport: pci: Move port allocations to PCI devices Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 07/16] pci: Fix ioport allocation size Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 08/16] arm/pci: Fix PCI IO region Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 09/16] arm/pci: Do not use first PCI IO space bytes for devices Alexandru Elisei
2019-12-02 12:15   ` Lorenzo Pieralisi
2019-11-25 10:30 ` [PATCH kvmtool 10/16] virtio/pci: Make memory and IO BARs independent Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 11/16] virtio/pci: Ignore MMIO and I/O accesses when they are disabled Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 12/16] Use independent read/write locks for ioport and mmio Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 13/16] vfio: Add support for BAR configuration Alexandru Elisei
2019-11-29 17:05   ` Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 14/16] virtio/pci: " Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 15/16] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
2019-11-25 10:30 ` [PATCH kvmtool 16/16] Add PCI Express 1.1 support Alexandru Elisei
2019-11-28 17:41 ` [PATCH kvmtool 00/16] Add writable BARs and PCIE " Lorenzo Pieralisi

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git