* [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO
@ 2017-06-22 17:05 Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 01/10] pci: add config operations callbacks on the PCI header Jean-Philippe Brucker
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

This series implements PCI passthrough using VFIO in kvmtool. We
introduce a new parameter to lkvm run, --vfio-group, which takes an IOMMU
group number as its argument and passes all devices in that group to the
guest.

This implementation is missing quite a few features. Most notably:

* Apart from MSI/MSI-X, all capabilities are hidden. For instance, we
  don't support PCIe, which would require ECAM (we only do CAM). This also
  means we can't assign virtio 1.0 devices, since they rely on the
  vendor-specific capability.

* IOMMU reserved regions might clash with the hardcoded guest address
  space. Fixing this would require a complete rewrite of address space
  allocation. We would initially register reserved regions, then
  subsystems would register their addressing capabilities (MMIO, config
  space, RAM), and finally devices would request chunks of memory as
  needed. Memory regions would be presented to the guest using DT (ARM &
  PPC), kernel parameters (mips) or e820 map (x86).
  
  For now kvmtool simply aborts initialization when there is an overlap
  between reserved regions and the guest memory map. Users can change the
  kvmtool memory map manually to work around it, or simply comment out the
  check if the guest is unlikely to trigger DMA to that particular region.

* Non-ARM architectures are only build-tested. I don't have any hardware
  for x86 (though I might try QEMU when I find time), and I lack the
  knowledge to use VFIO on PowerPC.

Since the last version [1], I had to rewrite the MSI-X code, which didn't
work with legacy virtio devices. I also refactored vfio.c to prepare for
vfio-platform support, which I'll send later. Detailed changes since the
last version follow. You can pull the patches from [2].

Thanks,
Jean


  Patch 3:
* Fixed IRQFD deassignment.

  Patch 6:
* Split vfio.c into vfio/core.c and vfio/pci.c, preparing for platform
  support.
* Cleanup on exit.
* Fixed multifunction devices (hopefully).
* Added dev_* printf helpers.
* Fixed the powerpc and mips builds.

  Patches 7-8:
* Made all MSI routing setup lazy. We now closely track the virtual and
  physical state of MSI capabilities and vectors.
* Added handling of mask/unmask MSI vector.
* Merged MSI and MSI-X code where possible.
* Added rlimit check, to reserve enough fds for MSIs upfront.

[1] http://www.spinics.net/lists/kvm/msg147624.html
[2] git://linux-arm.org/kvmtool-jpb.git vfio/v2 

Jean-Philippe Brucker (10):
  pci: add config operations callbacks on the PCI header
  pci: allow to specify IRQ type for PCI devices
  irq: add irqfd helpers
  Extend memory bank API with memory types
  pci: add capability helpers
  Add PCI device passthrough using VFIO
  vfio-pci: add MSI-X support
  vfio-pci: add MSI support
  Introduce reserved memory regions
  vfio: check reserved regions before mapping DMA

 Makefile                     |    2 +
 arm/gic.c                    |   74 ++-
 arm/include/arm-common/gic.h |    6 +
 arm/kvm.c                    |    2 +-
 arm/pci.c                    |    4 +-
 builtin-run.c                |    5 +
 hw/pci-shmem.c               |   12 +-
 hw/vesa.c                    |    2 +-
 include/kvm/irq.h            |   17 +
 include/kvm/kvm-config.h     |    3 +
 include/kvm/kvm.h            |   54 +-
 include/kvm/pci.h            |  116 ++++-
 include/kvm/vfio.h           |  111 +++++
 irq.c                        |   24 +
 kvm.c                        |   99 +++-
 mips/kvm.c                   |    6 +-
 pci.c                        |  105 ++--
 powerpc/kvm.c                |    2 +-
 vfio/core.c                  |  430 ++++++++++++++++
 vfio/pci.c                   | 1108 ++++++++++++++++++++++++++++++++++++++++++
 virtio/net.c                 |    9 +-
 virtio/scsi.c                |   10 +-
 x86/kvm.c                    |    6 +-
 23 files changed, 2084 insertions(+), 123 deletions(-)
 create mode 100644 include/kvm/vfio.h
 create mode 100644 vfio/core.c
 create mode 100644 vfio/pci.c

-- 
2.13.1


* [PATCH v2 kvmtool 01/10] pci: add config operations callbacks on the PCI header
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 02/10] pci: allow to specify IRQ type for PCI devices Jean-Philippe Brucker
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

When implementing PCI device passthrough, we will need to forward config
accesses from a guest to the VFIO driver. Add a private cfg_ops structure
to the PCI header, and use it in the PCI config access functions.

A read from the guest first calls into the device's cfg_ops.read, to let
the backend update the local header before we fill the guest register. The
same happens for a write: we let the backend perform the write and replace
the guest-provided value with whatever sticks, before updating the local
header.

Try to untangle the PCI config access logic while we're at it.
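
As a rough illustration (not part of this patch, and the my_dev_* names
are made up), a backend would plug into the new callbacks along these
lines:

  static void my_dev_cfg_read(struct kvm *kvm,
                              struct pci_device_header *pci_hdr,
                              u8 offset, void *data, int sz)
  {
          /* Refresh the local header before pci__config_rd() copies from it */
  }

  static void my_dev_cfg_write(struct kvm *kvm,
                               struct pci_device_header *pci_hdr,
                               u8 offset, void *data, int sz)
  {
          /*
           * Perform the write in the backend and replace @data with the
           * value that stuck; pci__config_wr() then stores it into the
           * local header.
           */
  }

  static void my_dev_init(struct pci_device_header *pci_hdr)
  {
          pci_hdr->cfg_ops = (struct pci_config_operations) {
                  .read   = my_dev_cfg_read,
                  .write  = my_dev_cfg_write,
          };
  }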

Signed-off-by: Will Deacon <will.deacon@arm.com>
[JPB: moved to a separate patch]
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/kvm/pci.h | 72 ++++++++++++++++++++++++++++----------------
 pci.c             | 89 ++++++++++++++++++++++++++-----------------------------
 2 files changed, 89 insertions(+), 72 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index b0c28a10..56649d87 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -57,33 +57,55 @@ struct msix_cap {
 	u32 pba_offset;
 };
 
+#define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
+#define PCI_DEV_CFG_SIZE	256
+#define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
+
+struct pci_device_header;
+
+struct pci_config_operations {
+	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
+		      u8 offset, void *data, int sz);
+	void (*read)(struct kvm *kvm, struct pci_device_header *pci_hdr,
+		     u8 offset, void *data, int sz);
+};
+
 struct pci_device_header {
-	u16		vendor_id;
-	u16		device_id;
-	u16		command;
-	u16		status;
-	u8		revision_id;
-	u8		class[3];
-	u8		cacheline_size;
-	u8		latency_timer;
-	u8		header_type;
-	u8		bist;
-	u32		bar[6];
-	u32		card_bus;
-	u16		subsys_vendor_id;
-	u16		subsys_id;
-	u32		exp_rom_bar;
-	u8		capabilities;
-	u8		reserved1[3];
-	u32		reserved2;
-	u8		irq_line;
-	u8		irq_pin;
-	u8		min_gnt;
-	u8		max_lat;
-	struct msix_cap msix;
-	u8		empty[136]; /* Rest of PCI config space */
+	/* Configuration space, as seen by the guest */
+	union {
+		struct {
+			u16		vendor_id;
+			u16		device_id;
+			u16		command;
+			u16		status;
+			u8		revision_id;
+			u8		class[3];
+			u8		cacheline_size;
+			u8		latency_timer;
+			u8		header_type;
+			u8		bist;
+			u32		bar[6];
+			u32		card_bus;
+			u16		subsys_vendor_id;
+			u16		subsys_id;
+			u32		exp_rom_bar;
+			u8		capabilities;
+			u8		reserved1[3];
+			u32		reserved2;
+			u8		irq_line;
+			u8		irq_pin;
+			u8		min_gnt;
+			u8		max_lat;
+			struct msix_cap msix;
+		} __attribute__((packed));
+		/* Pad to PCI config space size */
+		u8	__pad[PCI_DEV_CFG_SIZE];
+	};
+
+	/* Private to lkvm */
 	u32		bar_size[6];
-} __attribute__((packed));
+	struct pci_config_operations	cfg_ops;
+};
 
 int pci__init(struct kvm *kvm);
 int pci__exit(struct kvm *kvm);
diff --git a/pci.c b/pci.c
index 3a6696c5..e48e24b8 100644
--- a/pci.c
+++ b/pci.c
@@ -8,8 +8,6 @@
 #include <linux/err.h>
 #include <assert.h>
 
-#define PCI_BAR_OFFSET(b)		(offsetof(struct pci_device_header, bar[b]))
-
 static u32 pci_config_address_bits;
 
 /* This is within our PCI gap - in an unused area.
@@ -131,59 +129,56 @@ static struct ioport_operations pci_config_data_ops = {
 
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 dev_num;
-
-	dev_num	= addr.device_number;
-
-	if (pci_device_exists(0, dev_num, 0)) {
-		unsigned long offset;
-
-		offset = addr.w & 0xff;
-		if (offset < sizeof(struct pci_device_header)) {
-			void *p = device__find_dev(DEVICE_BUS_PCI, dev_num)->data;
-			struct pci_device_header *hdr = p;
-			u8 bar = (offset - PCI_BAR_OFFSET(0)) / (sizeof(u32));
-			u32 sz = cpu_to_le32(PCI_IO_SIZE);
-
-			if (bar < 6 && hdr->bar_size[bar])
-				sz = hdr->bar_size[bar];
-
-			/*
-			 * If the kernel masks the BAR it would expect to find the
-			 * size of the BAR there next time it reads from it.
-			 * When the kernel got the size it would write the address
-			 * back.
-			 */
-			if (*(u32 *)(p + offset)) {
-				/* See if kernel tries to mask one of the BARs */
-				if ((offset >= PCI_BAR_OFFSET(0)) &&
-				    (offset <= PCI_BAR_OFFSET(6)) &&
-				    (ioport__read32(data)  == 0xFFFFFFFF))
-					memcpy(p + offset, &sz, sizeof(sz));
-				    else
-					memcpy(p + offset, data, size);
-			}
-		}
+	void *base;
+	u8 bar, offset;
+	struct pci_device_header *pci_hdr;
+	u8 dev_num = addr.device_number;
+
+	if (!pci_device_exists(addr.bus_number, dev_num, 0))
+		return;
+
+	offset = addr.w & PCI_DEV_CFG_MASK;
+	base = pci_hdr = device__find_dev(DEVICE_BUS_PCI, dev_num)->data;
+
+	if (pci_hdr->cfg_ops.write)
+		pci_hdr->cfg_ops.write(kvm, pci_hdr, offset, data, size);
+
+	/*
+	 * legacy hack: ignore writes to uninitialized regions (e.g. ROM BAR).
+	 * Not very nice but has been working so far.
+	 */
+	if (*(u32 *)(base + offset) == 0)
+		return;
+
+	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
+
+	/*
+	 * If the kernel masks the BAR it would expect to find the size of the
+	 * BAR there next time it reads from it. When the kernel got the size it
+	 * would write the address back.
+	 */
+	if (bar < 6 && ioport__read32(data) == 0xFFFFFFFF) {
+		u32 sz = pci_hdr->bar_size[bar];
+		memcpy(base + offset, &sz, sizeof(sz));
+	} else {
+		memcpy(base + offset, data, size);
 	}
 }
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
-	u8 dev_num;
-
-	dev_num	= addr.device_number;
+	u8 offset;
+	struct pci_device_header *pci_hdr;
+	u8 dev_num = addr.device_number;
 
-	if (pci_device_exists(0, dev_num, 0)) {
-		unsigned long offset;
+	if (pci_device_exists(addr.bus_number, dev_num, 0)) {
+		pci_hdr = device__find_dev(DEVICE_BUS_PCI, dev_num)->data;
+		offset = addr.w & PCI_DEV_CFG_MASK;
 
-		offset = addr.w & 0xff;
-		if (offset < sizeof(struct pci_device_header)) {
-			void *p = device__find_dev(DEVICE_BUS_PCI, dev_num)->data;
+		if (pci_hdr->cfg_ops.read)
+			pci_hdr->cfg_ops.read(kvm, pci_hdr, offset, data, size);
 
-			memcpy(data, p + offset, size);
-		} else {
-			memset(data, 0x00, size);
-		}
+		memcpy(data, (void *)pci_hdr + offset, size);
 	} else {
 		memset(data, 0xff, size);
 	}
-- 
2.13.1


* [PATCH v2 kvmtool 02/10] pci: allow to specify IRQ type for PCI devices
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 01/10] pci: add config operations callbacks on the PCI header Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 03/10] irq: add irqfd helpers Jean-Philippe Brucker
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Currently all our virtual device interrupts are edge-triggered. But we're
going to need level-triggered interrupts when passing through physical
devices. Let each device configure its interrupt trigger type, and keep
edge-triggered as the default to avoid changing existing users.
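
For example (a sketch, not part of the patch), a passthrough device can
request a level-triggered line simply by setting the new field; devices
that leave it at zero keep today's behaviour:

  /*
   * Hypothetical passthrough device: ask for a level-triggered INTx.
   * pci__assign_irq() only applies the edge-triggered default while
   * irq_type is still zero, and arm/pci.c copies the value into the
   * device's interrupt-map property when generating the FDT.
   */
  pci_hdr->irq_type = IRQ_TYPE_LEVEL_HIGH;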

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 arm/pci.c         | 3 ++-
 include/kvm/pci.h | 6 ++++++
 pci.c             | 3 +++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arm/pci.c b/arm/pci.c
index 813df26a..744b14c2 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -77,6 +77,7 @@ void pci__generate_fdt_nodes(void *fdt)
 		u8 dev_num = dev_hdr->dev_num;
 		u8 pin = pci_hdr->irq_pin;
 		u8 irq = pci_hdr->irq_line;
+		u32 irq_flags = pci_hdr->irq_type;
 
 		*entry = (struct of_interrupt_map_entry) {
 			.pci_irq_mask = {
@@ -93,7 +94,7 @@ void pci__generate_fdt_nodes(void *fdt)
 			.gic_irq = {
 				.type	= cpu_to_fdt32(GIC_FDT_IRQ_TYPE_SPI),
 				.num	= cpu_to_fdt32(irq - GIC_SPI_IRQ_BASE),
-				.flags	= cpu_to_fdt32(IRQ_TYPE_EDGE_RISING),
+				.flags	= cpu_to_fdt32(irq_flags),
 			},
 		};
 
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 56649d87..5d9c0f3b 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -9,6 +9,7 @@
 #include "kvm/devices.h"
 #include "kvm/kvm.h"
 #include "kvm/msi.h"
+#include "kvm/fdt.h"
 
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
@@ -105,6 +106,11 @@ struct pci_device_header {
 	/* Private to lkvm */
 	u32		bar_size[6];
 	struct pci_config_operations	cfg_ops;
+	/*
+	 * PCI INTx# are level-triggered, but virtual devices often feature
+	 * edge-triggered INTx# for convenience.
+	 */
+	enum irq_type	irq_type;
 };
 
 int pci__init(struct kvm *kvm);
diff --git a/pci.c b/pci.c
index e48e24b8..5a8c2ef4 100644
--- a/pci.c
+++ b/pci.c
@@ -39,6 +39,9 @@ void pci__assign_irq(struct device_header *dev_hdr)
 	 */
 	pci_hdr->irq_pin	= 1;
 	pci_hdr->irq_line	= irq__alloc_line();
+
+	if (!pci_hdr->irq_type)
+		pci_hdr->irq_type = IRQ_TYPE_EDGE_RISING;
 }
 
 static void *pci_config_address_ptr(u16 port)
-- 
2.13.1


* [PATCH v2 kvmtool 03/10] irq: add irqfd helpers
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 01/10] pci: add config operations callbacks on the PCI header Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 02/10] pci: allow to specify IRQ type for PCI devices Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-07-31 17:55   ` Punit Agrawal
  2017-06-22 17:05 ` [PATCH v2 kvmtool 04/10] Extend memory bank API with memory types Jean-Philippe Brucker
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Add helpers to add and remove IRQFD routing for both irqchips and MSIs.
We have to make a special case of IRQ lines on ARM where the
initialisation order goes like this:

 (1) Devices reserve their IRQ lines
 (2) VGIC is setup with VGIC_CTRL_INIT (in a late_init call)
 (3) MSIs are reserved lazily, when the guest needs them

Since we cannot set up IRQFDs before (2), store the IRQFD routing for IRQ
lines temporarily until we're ready to submit them.
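
As an illustration (a sketch only, with a hypothetical my_dev_setup_irqfd
helper and error paths trimmed), an edge-triggered user simply passes -1
as the resample fd:

  #include <errno.h>
  #include <sys/eventfd.h>

  #include "kvm/irq.h"
  #include "kvm/kvm.h"

  static int my_dev_setup_irqfd(struct kvm *kvm, unsigned int gsi)
  {
          /* Edge-triggered: no resample eventfd, so pass -1 */
          int fd = eventfd(0, 0);

          if (fd < 0)
                  return -errno;

          /* On ARM, the routing is queued until the VGIC is initialised */
          return irq__add_irqfd(kvm, gsi, fd, -1);
  }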

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 arm/gic.c                    | 74 +++++++++++++++++++++++++++++++++++++++++++-
 arm/include/arm-common/gic.h |  6 ++++
 hw/pci-shmem.c               |  8 +----
 include/kvm/irq.h            | 17 ++++++++++
 irq.c                        | 24 ++++++++++++++
 virtio/net.c                 |  9 ++----
 virtio/scsi.c                | 10 ++----
 7 files changed, 126 insertions(+), 22 deletions(-)

diff --git a/arm/gic.c b/arm/gic.c
index 9de6a9c9..e53f932e 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -17,6 +17,14 @@ static u64 gic_redists_base;
 static u64 gic_redists_size;
 static u64 gic_msi_base;
 static u64 gic_msi_size = 0;
+static bool vgic_is_init = false;
+
+struct kvm_irqfd_line {
+	struct kvm_irqfd	irqfd;
+	struct list_head	list;
+};
+
+static LIST_HEAD(irqfd_lines);
 
 int irqchip_parser(const struct option *opt, const char *arg, int unset)
 {
@@ -36,6 +44,25 @@ int irqchip_parser(const struct option *opt, const char *arg, int unset)
 	return 0;
 }
 
+static int irq__setup_irqfd_lines(struct kvm *kvm)
+{
+	int ret;
+	struct kvm_irqfd_line *line, *tmp;
+
+	list_for_each_entry_safe(line, tmp, &irqfd_lines, list) {
+		ret = ioctl(kvm->vm_fd, KVM_IRQFD, &line->irqfd);
+		if (ret < 0) {
+			pr_err("Failed to register IRQFD");
+			return ret;
+		}
+
+		list_del(&line->list);
+		free(line);
+	}
+
+	return 0;
+}
+
 static int irq__routing_init(struct kvm *kvm)
 {
 	int r;
@@ -282,7 +309,9 @@ static int gic__init_gic(struct kvm *kvm)
 	kvm->msix_needs_devid = kvm__supports_vm_extension(kvm,
 							   KVM_CAP_MSI_DEVID);
 
-	return 0;
+	vgic_is_init = true;
+
+	return irq__setup_irqfd_lines(kvm);
 }
 late_init(gic__init_gic)
 
@@ -359,3 +388,46 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
 	kvm__irq_line(kvm, irq, VIRTIO_IRQ_HIGH);
 	kvm__irq_line(kvm, irq, VIRTIO_IRQ_LOW);
 }
+
+int gic__add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
+		   int resample_fd)
+{
+	struct kvm_irqfd_line *line;
+
+	if (vgic_is_init)
+		return irq__common_add_irqfd(kvm, gsi, trigger_fd, resample_fd);
+
+	/* Postpone the routing setup until we have a distributor */
+	line = calloc(1, sizeof(*line));
+	if (!line)
+		return -ENOMEM;
+
+	line->irqfd = (struct kvm_irqfd) {
+		.fd = trigger_fd,
+		.gsi = gsi,
+		.flags = resample_fd > 0 ? KVM_IRQFD_FLAG_RESAMPLE : 0,
+		.resamplefd = resample_fd,
+	};
+	list_add(&line->list, &irqfd_lines);
+
+	return 0;
+}
+
+void gic__del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd)
+{
+	struct kvm_irqfd_line *line;
+
+	if (vgic_is_init) {
+		irq__common_del_irqfd(kvm, gsi, trigger_fd);
+		return;
+	}
+
+	list_for_each_entry(line, &irqfd_lines, list) {
+		if (line->irqfd.gsi != gsi)
+			continue;
+
+		list_del(&line->list);
+		free(line);
+		break;
+	}
+}
diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
index 433dd237..0c279aca 100644
--- a/arm/include/arm-common/gic.h
+++ b/arm/include/arm-common/gic.h
@@ -33,4 +33,10 @@ int gic__alloc_irqnum(void);
 int gic__create(struct kvm *kvm, enum irqchip_type type);
 void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type);
 
+int gic__add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
+		   int resample_fd);
+void gic__del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd);
+#define irq__add_irqfd gic__add_irqfd
+#define irq__del_irqfd gic__del_irqfd
+
 #endif /* ARM_COMMON__GIC_H */
diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
index 512b5b06..107043e9 100644
--- a/hw/pci-shmem.c
+++ b/hw/pci-shmem.c
@@ -127,7 +127,6 @@ static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len
 int pci_shmem__get_local_irqfd(struct kvm *kvm)
 {
 	int fd, gsi, r;
-	struct kvm_irqfd irqfd;
 
 	if (local_fd == 0) {
 		fd = eventfd(0, 0);
@@ -143,12 +142,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
 			gsi = pci_shmem_pci_device.irq_line;
 		}
 
-		irqfd = (struct kvm_irqfd) {
-			.fd = fd,
-			.gsi = gsi,
-		};
-
-		r = ioctl(kvm->vm_fd, KVM_IRQFD, &irqfd);
+		r = irq__add_irqfd(kvm, gsi, fd, -1);
 		if (r < 0)
 			return r;
 
diff --git a/include/kvm/irq.h b/include/kvm/irq.h
index ee059e31..b24385b1 100644
--- a/include/kvm/irq.h
+++ b/include/kvm/irq.h
@@ -6,6 +6,7 @@
 #include <linux/list.h>
 #include <linux/kvm.h>
 
+#include "kvm/kvm-arch.h"
 #include "kvm/msi.h"
 
 struct kvm;
@@ -23,4 +24,20 @@ int irq__allocate_routing_entry(void);
 int irq__add_msix_route(struct kvm *kvm, struct msi_msg *msg, u32 device_id);
 void irq__update_msix_route(struct kvm *kvm, u32 gsi, struct msi_msg *msg);
 
+/*
+ * The function takes two eventfd arguments, trigger_fd and resample_fd. If
+ * resample_fd is <= 0, resampling is disabled and the IRQ is edge-triggered
+ */
+int irq__common_add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
+			   int resample_fd);
+void irq__common_del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd);
+
+#ifndef irq__add_irqfd
+#define irq__add_irqfd irq__common_add_irqfd
+#endif
+
+#ifndef irq__del_irqfd
+#define irq__del_irqfd irq__common_del_irqfd
+#endif
+
 #endif
diff --git a/irq.c b/irq.c
index 44040626..25fa5727 100644
--- a/irq.c
+++ b/irq.c
@@ -133,6 +133,30 @@ void irq__update_msix_route(struct kvm *kvm, u32 gsi, struct msi_msg *msg)
 		die_perror("KVM_SET_GSI_ROUTING");
 }
 
+int irq__common_add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
+			   int resample_fd)
+{
+	struct kvm_irqfd irqfd = {
+		.fd		= trigger_fd,
+		.gsi		= gsi,
+		.flags		= resample_fd > 0 ? KVM_IRQFD_FLAG_RESAMPLE : 0,
+		.resamplefd	= resample_fd,
+	};
+
+	return ioctl(kvm->vm_fd, KVM_IRQFD, &irqfd);
+}
+
+void irq__common_del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd)
+{
+	struct kvm_irqfd irqfd = {
+		.fd		= trigger_fd,
+		.gsi		= gsi,
+		.flags		= KVM_IRQFD_FLAG_DEASSIGN,
+	};
+
+	ioctl(kvm->vm_fd, KVM_IRQFD, &irqfd);
+}
+
 int __attribute__((weak)) irq__exit(struct kvm *kvm)
 {
 	free(irq_routing);
diff --git a/virtio/net.c b/virtio/net.c
index 9fb9f1ee..52645481 100644
--- a/virtio/net.c
+++ b/virtio/net.c
@@ -558,23 +558,18 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align,
 static void notify_vq_gsi(struct kvm *kvm, void *dev, u32 vq, u32 gsi)
 {
 	struct net_dev *ndev = dev;
-	struct kvm_irqfd irq;
 	struct vhost_vring_file file;
 	int r;
 
 	if (ndev->vhost_fd == 0)
 		return;
 
-	irq = (struct kvm_irqfd) {
-		.gsi	= gsi,
-		.fd	= eventfd(0, 0),
-	};
 	file = (struct vhost_vring_file) {
 		.index	= vq,
-		.fd	= irq.fd,
+		.fd	= eventfd(0, 0),
 	};
 
-	r = ioctl(kvm->vm_fd, KVM_IRQFD, &irq);
+	r = irq__add_irqfd(kvm, gsi, file.fd, -1);
 	if (r < 0)
 		die_perror("KVM_IRQFD failed");
 
diff --git a/virtio/scsi.c b/virtio/scsi.c
index 58d2353a..a429ac85 100644
--- a/virtio/scsi.c
+++ b/virtio/scsi.c
@@ -1,6 +1,7 @@
 #include "kvm/virtio-scsi.h"
 #include "kvm/virtio-pci-dev.h"
 #include "kvm/disk-image.h"
+#include "kvm/irq.h"
 #include "kvm/kvm.h"
 #include "kvm/pci.h"
 #include "kvm/ioeventfd.h"
@@ -97,22 +98,17 @@ static void notify_vq_gsi(struct kvm *kvm, void *dev, u32 vq, u32 gsi)
 {
 	struct vhost_vring_file file;
 	struct scsi_dev *sdev = dev;
-	struct kvm_irqfd irq;
 	int r;
 
 	if (sdev->vhost_fd == 0)
 		return;
 
-	irq = (struct kvm_irqfd) {
-		.gsi	= gsi,
-		.fd	= eventfd(0, 0),
-	};
 	file = (struct vhost_vring_file) {
 		.index	= vq,
-		.fd	= irq.fd,
+		.fd	= eventfd(0, 0),
 	};
 
-	r = ioctl(kvm->vm_fd, KVM_IRQFD, &irq);
+	r = irq__add_irqfd(kvm, gsi, file.fd, -1);
 	if (r < 0)
 		die_perror("KVM_IRQFD failed");
 
-- 
2.13.1


* [PATCH v2 kvmtool 04/10] Extend memory bank API with memory types
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (2 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 03/10] irq: add irqfd helpers Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 05/10] pci: add capability helpers Jean-Philippe Brucker
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Introduce memory types RAM and DEVICE, along with a way for subsystems to
query the global memory banks. This is required by VFIO, which will need
to pin and map guest RAM so that assigned devices can safely do DMA to it.
Depending on the architecture, the physical map is made of either one or
two RAM regions. In addition, this new memory type API paves the way for
the reserved memory regions introduced in a subsequent patch.

For the moment we put vesa and ivshmem memory into the DEVICE category, so
they don't have to be pinned. This means that physical devices assigned
with VFIO won't be able to DMA to the vesa frame buffer or ivshmem. To
allow that, simply changing the type to "RAM" would work. But to keep the
types consistent, it would be better to introduce a flag such as
KVM_MEM_TYPE_DMA that complements both the RAM and DEVICE types. We could
then reuse the API for generating firmware information (that is, the x86
BIOS; DT already supports reserved-memory descriptions).
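
To illustrate the new iterator (a sketch, not part of the patch; sum_ram
and guest_ram_bytes are made-up helpers):

  #include "kvm/kvm.h"

  static int sum_ram(struct kvm *kvm, struct kvm_mem_bank *bank, void *data)
  {
          *(u64 *)data += bank->size;

          return 0; /* a non-zero return would stop the iteration */
  }

  static u64 guest_ram_bytes(struct kvm *kvm)
  {
          u64 total = 0;

          /* Only visit banks registered with kvm__register_ram() */
          kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, sum_ram, &total);

          return total;
  }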

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 arm/kvm.c         |  2 +-
 hw/pci-shmem.c    |  4 ++--
 hw/vesa.c         |  2 +-
 include/kvm/kvm.h | 44 +++++++++++++++++++++++++++++++++++++++++++-
 kvm.c             | 31 ++++++++++++++++++++++++++++++-
 mips/kvm.c        |  6 +++---
 powerpc/kvm.c     |  2 +-
 x86/kvm.c         |  6 +++---
 8 files changed, 84 insertions(+), 13 deletions(-)

diff --git a/arm/kvm.c b/arm/kvm.c
index 3cfa90ab..77d98281 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -34,7 +34,7 @@ void kvm__init_ram(struct kvm *kvm)
 	phys_size	= kvm->ram_size;
 	host_mem	= kvm->ram_start;
 
-	err = kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+	err = kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 	if (err)
 		die("Failed to register %lld bytes of memory at physical "
 		    "address 0x%llx [err %d]", phys_size, phys_start, err);
diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
index 107043e9..f92bc755 100644
--- a/hw/pci-shmem.c
+++ b/hw/pci-shmem.c
@@ -387,8 +387,8 @@ int pci_shmem__init(struct kvm *kvm)
 	if (mem == NULL)
 		return -EINVAL;
 
-	kvm__register_mem(kvm, shmem_region->phys_addr, shmem_region->size,
-			  mem);
+	kvm__register_dev_mem(kvm, shmem_region->phys_addr, shmem_region->size,
+			      mem);
 	return 0;
 }
 dev_init(pci_shmem__init);
diff --git a/hw/vesa.c b/hw/vesa.c
index a9a1d3e2..f3c5114c 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -73,7 +73,7 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	if (mem == MAP_FAILED)
 		ERR_PTR(-errno);
 
-	kvm__register_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
+	kvm__register_dev_mem(kvm, VESA_MEM_ADDR, VESA_MEM_SIZE, mem);
 
 	vesafb = (struct framebuffer) {
 		.width			= VESA_WIDTH,
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 90463b8c..57061566 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -34,6 +34,14 @@ enum {
 	KVM_VMSTATE_PAUSED,
 };
 
+enum kvm_mem_type {
+	KVM_MEM_TYPE_RAM	= 1 << 1,
+	KVM_MEM_TYPE_DEVICE	= 1 << 2,
+
+	KVM_MEM_TYPE_ALL	= KVM_MEM_TYPE_RAM
+				| KVM_MEM_TYPE_DEVICE
+};
+
 struct kvm_ext {
 	const char *name;
 	int code;
@@ -44,6 +52,7 @@ struct kvm_mem_bank {
 	u64			guest_phys_addr;
 	void			*host_addr;
 	u64			size;
+	enum kvm_mem_type	type;
 };
 
 struct kvm {
@@ -90,7 +99,22 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
 void kvm__irq_trigger(struct kvm *kvm, int irq);
 bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
 bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
-int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
+int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
+		      enum kvm_mem_type type);
+static inline int kvm__register_ram(struct kvm *kvm, u64 guest_phys, u64 size,
+				    void *userspace_addr)
+{
+	return kvm__register_mem(kvm, guest_phys, size, userspace_addr,
+				 KVM_MEM_TYPE_RAM);
+}
+
+static inline int kvm__register_dev_mem(struct kvm *kvm, u64 guest_phys,
+					u64 size, void *userspace_addr)
+{
+	return kvm__register_mem(kvm, guest_phys, size, userspace_addr,
+				 KVM_MEM_TYPE_DEVICE);
+}
+
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
 			void *ptr);
@@ -117,6 +141,24 @@ u64 host_to_guest_flat(struct kvm *kvm, void *ptr);
 bool kvm__arch_load_kernel_image(struct kvm *kvm, int fd_kernel, int fd_initrd,
 				 const char *kernel_cmdline);
 
+static inline const char *kvm_mem_type_to_string(enum kvm_mem_type type)
+{
+	switch (type) {
+	case KVM_MEM_TYPE_ALL:
+		return "(all)";
+	case KVM_MEM_TYPE_RAM:
+		return "RAM";
+	case KVM_MEM_TYPE_DEVICE:
+		return "device";
+	}
+
+	return "???";
+}
+
+int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
+			   int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
+			   void *data);
+
 /*
  * Debugging
  */
diff --git a/kvm.c b/kvm.c
index 665ed148..39ce88c4 100644
--- a/kvm.c
+++ b/kvm.c
@@ -182,7 +182,8 @@ core_exit(kvm__exit);
  * memory regions to it. Therefore, be careful if you use this function for
  * registering memory regions for emulating hardware.
  */
-int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr)
+int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
+		      void *userspace_addr, enum kvm_mem_type type)
 {
 	struct kvm_userspace_memory_region mem;
 	struct kvm_mem_bank *bank;
@@ -196,6 +197,7 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace
 	bank->guest_phys_addr		= guest_phys;
 	bank->host_addr			= userspace_addr;
 	bank->size			= size;
+	bank->type			= type;
 
 	mem = (struct kvm_userspace_memory_region) {
 		.slot			= kvm->mem_slots++,
@@ -245,6 +247,33 @@ u64 host_to_guest_flat(struct kvm *kvm, void *ptr)
 	return 0;
 }
 
+/*
+ * Iterate over each registered memory bank. Call @fun for each bank with @data
+ * as argument. @type is a bitmask that allows to filter banks according to
+ * their type.
+ *
+ * If one call to @fun returns a non-zero value, stop iterating and return the
+ * value. Otherwise, return zero.
+ */
+int kvm__for_each_mem_bank(struct kvm *kvm, enum kvm_mem_type type,
+			   int (*fun)(struct kvm *kvm, struct kvm_mem_bank *bank, void *data),
+			   void *data)
+{
+	int ret = 0;
+	struct kvm_mem_bank *bank;
+
+	list_for_each_entry(bank, &kvm->mem_banks, list) {
+		if (type != KVM_MEM_TYPE_ALL && !(bank->type & type))
+			continue;
+
+		ret = fun(kvm, bank, data);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 int kvm__recommended_cpus(struct kvm *kvm)
 {
 	int ret;
diff --git a/mips/kvm.c b/mips/kvm.c
index 24bd6507..211770da 100644
--- a/mips/kvm.c
+++ b/mips/kvm.c
@@ -28,21 +28,21 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 	} else {
 		/* one region for memory that fits below MMIO range */
 		phys_start = 0;
 		phys_size  = KVM_MMIO_START;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 
 		/* one region for rest of memory */
 		phys_start = KVM_MMIO_START + KVM_MMIO_SIZE;
 		phys_size  = kvm->ram_size - KVM_MMIO_START;
 		host_mem   = kvm->ram_start + KVM_MMIO_START;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 	}
 }
 
diff --git a/powerpc/kvm.c b/powerpc/kvm.c
index c738c1d6..702d67dc 100644
--- a/powerpc/kvm.c
+++ b/powerpc/kvm.c
@@ -79,7 +79,7 @@ void kvm__init_ram(struct kvm *kvm)
 		    "overlaps MMIO!\n",
 		    phys_size);
 
-	kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+	kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 }
 
 void kvm__arch_set_cmdline(char *cmdline, bool video)
diff --git a/x86/kvm.c b/x86/kvm.c
index bfa04b81..8ee58e83 100644
--- a/x86/kvm.c
+++ b/x86/kvm.c
@@ -98,7 +98,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 	} else {
 		/* First RAM range from zero to the PCI gap: */
 
@@ -106,7 +106,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = KVM_32BIT_GAP_START;
 		host_mem   = kvm->ram_start;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 
 		/* Second RAM range from 4GB to the end of RAM: */
 
@@ -114,7 +114,7 @@ void kvm__init_ram(struct kvm *kvm)
 		phys_size  = kvm->ram_size - phys_start;
 		host_mem   = kvm->ram_start + phys_start;
 
-		kvm__register_mem(kvm, phys_start, phys_size, host_mem);
+		kvm__register_ram(kvm, phys_start, phys_size, host_mem);
 	}
 }
 
-- 
2.13.1


* [PATCH v2 kvmtool 05/10] pci: add capability helpers
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (3 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 04/10] Extend memory bank API with memory types Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO Jean-Philippe Brucker
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Add a way to iterate over all capabilities in a config space. Add a search
function for getting a specific capability.
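
For instance (a sketch, not part of the patch; my_count_caps and
my_find_msix are made-up helpers):

  #include <linux/pci_regs.h>

  #include "kvm/pci.h"

  static int my_count_caps(struct pci_device_header *hdr)
  {
          u8 pos;
          struct pci_cap_hdr *cap;
          int n = 0;

          /* Walks the list starting at hdr->capabilities */
          pci_for_each_cap(pos, cap, hdr)
                  n++;

          return n;
  }

  static struct msix_cap *my_find_msix(struct pci_device_header *hdr)
  {
          return pci_find_cap(hdr, PCI_CAP_ID_MSIX);
  }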

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/kvm/pci.h | 12 ++++++++++++
 pci.c             | 13 +++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 5d9c0f3b..2950bb10 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -58,6 +58,11 @@ struct msix_cap {
 	u32 pba_offset;
 };
 
+struct pci_cap_hdr {
+	u8	type;
+	u8	next;
+};
+
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
 #define PCI_DEV_CFG_SIZE	256
 #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
@@ -113,6 +118,11 @@ struct pci_device_header {
 	enum irq_type	irq_type;
 };
 
+#define pci_for_each_cap(pos, cap, hdr)				\
+	for ((pos) = (hdr)->capabilities & ~3;			\
+	     (cap) = (void *)(hdr) + (pos), (pos) != 0;		\
+	     (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
+
 int pci__init(struct kvm *kvm);
 int pci__exit(struct kvm *kvm);
 struct pci_device_header *pci__find_dev(u8 dev_num);
@@ -121,4 +131,6 @@ void pci__assign_irq(struct device_header *dev_hdr);
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size);
 
+void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
+
 #endif /* KVM__PCI_H */
diff --git a/pci.c b/pci.c
index 5a8c2ef4..689869cb 100644
--- a/pci.c
+++ b/pci.c
@@ -27,6 +27,19 @@ u32 pci_get_io_space_block(u32 size)
 	return block;
 }
 
+void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type)
+{
+	u8 pos;
+	struct pci_cap_hdr *cap;
+
+	pci_for_each_cap(pos, cap, hdr) {
+		if (cap->type == cap_type)
+			return cap;
+	}
+
+	return NULL;
+}
+
 void pci__assign_irq(struct device_header *dev_hdr)
 {
 	struct pci_device_header *pci_hdr = dev_hdr->data;
-- 
2.13.1


* [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (4 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 05/10] pci: add capability helpers Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-07-31 17:52   ` Punit Agrawal
  2017-06-22 17:05 ` [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support Jean-Philippe Brucker
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Assigning devices using VFIO allows the guest to have direct access to the
device, whilst filtering accesses to sensitive areas by trapping config
space accesses and mapping DMA with an IOMMU.

This patch adds a new option to lkvm run: --vfio-group=<group_number>.
Before assigning a device to a VM, some preparation is required. As
described in Linux Documentation/vfio.txt, the device needs to be bound to
the vfio-pci driver:

  $ dev=0000:00:00.0

  $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
  $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
  $ echo $dev > /sys/bus/pci/drivers_probe
  $ readlink /sys/bus/pci/devices/$dev/iommu_group
  ../../../kernel/iommu_groups/5

Adding --vfio-group=5 to lkvm run will pass the device to the guest.
Multiple groups can be passed to the guest by adding more --vfio-group
parameters.

This patch only implements PCI with INTx. MSI-X routing will be added in a
subsequent patch, and at some point we might add support for passing
platform devices to guests.
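
For reference, the container/group handshake implemented by vfio/core.c
below boils down to the following sequence (a condensed sketch with error
handling omitted, not a substitute for the code in this patch):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/vfio.h>

  static int my_vfio_open_device(unsigned long group_id, const char *name)
  {
          char path[32];
          int container, group;
          struct vfio_group_status status = { .argsz = sizeof(status) };

          /* One container holds all the groups assigned to this VM */
          container = open("/dev/vfio/vfio", O_RDWR);
          ioctl(container, VFIO_GET_API_VERSION); /* expect VFIO_API_VERSION */

          snprintf(path, sizeof(path), "/dev/vfio/%lu", group_id);
          group = open(path, O_RDWR);
          ioctl(group, VFIO_GROUP_GET_STATUS, &status); /* must be VIABLE */
          ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
          ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

          /*
           * Guest RAM can now be mapped with VFIO_IOMMU_MAP_DMA, and each
           * device in the group is reached through its own fd:
           */
          return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, name);
  }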

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 Makefile                 |   2 +
 arm/pci.c                |   1 +
 builtin-run.c            |   5 +
 include/kvm/kvm-config.h |   3 +
 include/kvm/pci.h        |   3 +-
 include/kvm/vfio.h       |  57 +++++++
 vfio/core.c              | 395 +++++++++++++++++++++++++++++++++++++++++++++++
 vfio/pci.c               | 365 +++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 830 insertions(+), 1 deletion(-)
 create mode 100644 include/kvm/vfio.h
 create mode 100644 vfio/core.c
 create mode 100644 vfio/pci.c

diff --git a/Makefile b/Makefile
index 57714815..caae6f07 100644
--- a/Makefile
+++ b/Makefile
@@ -59,6 +59,8 @@ OBJS	+= main.o
 OBJS	+= mmio.o
 OBJS	+= pci.o
 OBJS	+= term.o
+OBJS	+= vfio/core.o
+OBJS	+= vfio/pci.o
 OBJS	+= virtio/blk.o
 OBJS	+= virtio/scsi.o
 OBJS	+= virtio/console.o
diff --git a/arm/pci.c b/arm/pci.c
index 744b14c2..557cfa98 100644
--- a/arm/pci.c
+++ b/arm/pci.c
@@ -1,5 +1,6 @@
 #include "kvm/devices.h"
 #include "kvm/fdt.h"
+#include "kvm/kvm.h"
 #include "kvm/of_pci.h"
 #include "kvm/pci.h"
 #include "kvm/util.h"
diff --git a/builtin-run.c b/builtin-run.c
index 72b878dc..3ee735d9 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -146,6 +146,11 @@ void kvm_run_set_wrapper_sandbox(void)
 	OPT_BOOLEAN('\0', "no-dhcp", &(cfg)->no_dhcp, "Disable kernel"	\
 			" DHCP in rootfs mode"),			\
 									\
+	OPT_GROUP("VFIO options:"),					\
+	OPT_CALLBACK('\0', "vfio-group", NULL, "group number",		\
+			"Assign a VFIO group to the virtual machine",	\
+			vfio_group_parser, kvm),			\
+									\
 	OPT_GROUP("Debug options:"),					\
 	OPT_BOOLEAN('\0', "debug", &do_debug_print,			\
 			"Enable debug messages"),			\
diff --git a/include/kvm/kvm-config.h b/include/kvm/kvm-config.h
index 386fa8c5..62dc6a2f 100644
--- a/include/kvm/kvm-config.h
+++ b/include/kvm/kvm-config.h
@@ -2,6 +2,7 @@
 #define KVM_CONFIG_H_
 
 #include "kvm/disk-image.h"
+#include "kvm/vfio.h"
 #include "kvm/kvm-config-arch.h"
 
 #define DEFAULT_KVM_DEV		"/dev/kvm"
@@ -20,9 +21,11 @@
 struct kvm_config {
 	struct kvm_config_arch arch;
 	struct disk_image_params disk_image[MAX_DISK_IMAGES];
+	struct vfio_group vfio_group[MAX_VFIO_GROUPS];
 	u64 ram_size;
 	u8  image_count;
 	u8 num_net_devices;
+	u8 num_vfio_groups;
 	bool virtio_rng;
 	int active_console;
 	int debug_iodelay;
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 2950bb10..44e5adff 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -7,7 +7,6 @@
 #include <endian.h>
 
 #include "kvm/devices.h"
-#include "kvm/kvm.h"
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
 
@@ -22,6 +21,8 @@
 #define PCI_IO_SIZE		0x100
 #define PCI_CFG_SIZE		(1ULL << 24)
 
+struct kvm;
+
 union pci_config_address {
 	struct {
 #if __BYTE_ORDER == __LITTLE_ENDIAN
diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
new file mode 100644
index 00000000..060f32a3
--- /dev/null
+++ b/include/kvm/vfio.h
@@ -0,0 +1,57 @@
+#ifndef KVM__VFIO_H
+#define KVM__VFIO_H
+
+#include "kvm/parse-options.h"
+#include "kvm/pci.h"
+
+#include <linux/vfio.h>
+
+#include <dirent.h>
+
+#define dev_err(vdev, fmt, ...)		pr_err("%s: " fmt, vdev->name, ##__VA_ARGS__)
+#define dev_warn(vdev, fmt, ...)	pr_warning("%s: " fmt, vdev->name, ##__VA_ARGS__)
+#define dev_info(vdev, fmt, ...)	pr_info("%s: " fmt, vdev->name, ##__VA_ARGS__)
+#define dev_die(vdev, fmt, ...)		die("%s: " fmt, vdev->name, ##__VA_ARGS__)
+
+#define MAX_VFIO_GROUPS			16
+
+struct vfio_pci_device {
+	struct pci_device_header	hdr;
+};
+
+struct vfio_region {
+	struct vfio_region_info		info;
+	u64				guest_phys_addr;
+	void				*host_addr;
+};
+
+struct vfio_device {
+	struct device_header		dev_hdr;
+
+	int				fd;
+	struct vfio_device_info		info;
+	struct vfio_irq_info		irq_info;
+	struct vfio_region		*regions;
+
+	char				*name;
+	char				*sysfs_path;
+
+	struct hlist_node		list;
+
+	struct vfio_pci_device		pci;
+};
+
+struct vfio_group {
+	unsigned long			id; /* iommu_group number in sysfs */
+	int				fd;
+	struct hlist_head		devices;
+};
+
+int vfio_group_parser(const struct option *opt, const char *arg, int unset);
+int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
+		    struct vfio_region *region);
+void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region);
+int vfio_pci_setup_device(struct kvm *kvm, struct vfio_device *device);
+void vfio_pci_teardown_device(struct kvm *kvm, struct vfio_device *vdev);
+
+#endif /* KVM__VFIO_H */
diff --git a/vfio/core.c b/vfio/core.c
new file mode 100644
index 00000000..7e1ba789
--- /dev/null
+++ b/vfio/core.c
@@ -0,0 +1,395 @@
+#include "kvm/kvm.h"
+#include "kvm/vfio.h"
+
+#include <linux/list.h>
+
+#define VFIO_DEV_DIR		"/dev/vfio"
+#define VFIO_DEV_NODE		VFIO_DEV_DIR "/vfio"
+#define IOMMU_GROUP_DIR		"/sys/kernel/iommu_groups"
+
+#define VFIO_PATH_MAX_LEN	16
+
+static int vfio_container;
+
+int vfio_group_parser(const struct option *opt, const char *arg, int unset)
+{
+	char *cur, *buf = strdup(arg);
+	static int idx = 0;
+	struct kvm *kvm = opt->ptr;
+	struct vfio_group *group = &kvm->cfg.vfio_group[idx];
+
+	if (idx >= MAX_VFIO_GROUPS) {
+		if (idx++ == MAX_VFIO_GROUPS)
+			pr_warning("Too many VFIO groups");
+		free(buf);
+		return 0;
+	}
+
+	cur = strtok(buf, ",");
+	group->id = strtoul(cur, NULL, 0);
+
+	kvm->cfg.num_vfio_groups = ++idx;
+	free(buf);
+
+	return 0;
+}
+
+int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
+		    struct vfio_region *region)
+{
+	void *base;
+	int ret, prot = 0;
+	/* KVM needs page-aligned regions */
+	u64 map_size = ALIGN(region->info.size, PAGE_SIZE);
+
+	/*
+	 * We don't want to mess about trapping config accesses, so require that
+	 * they can be mmap'd. Note that for PCI, this precludes the use of I/O
+	 * BARs in the guest (we will hide them from Configuration Space, which
+	 * is trapped).
+	 */
+	if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP)) {
+		dev_info(vdev, "ignoring region %u, as it can't be mmap'd",
+			 region->info.index);
+		return 0;
+	}
+
+	if (region->info.flags & VFIO_REGION_INFO_FLAG_READ)
+		prot |= PROT_READ;
+	if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
+		prot |= PROT_WRITE;
+
+	base = mmap(NULL, region->info.size, prot, MAP_SHARED, vdev->fd,
+		    region->info.offset);
+	if (base == MAP_FAILED) {
+		ret = -errno;
+		dev_err(vdev, "failed to mmap region %u (0x%llx bytes)",
+			region->info.index, region->info.size);
+		return ret;
+	}
+	region->host_addr = base;
+
+	ret = kvm__register_dev_mem(kvm, region->guest_phys_addr, map_size,
+				    region->host_addr);
+	if (ret) {
+		dev_err(vdev, "failed to register region with KVM");
+		return ret;
+	}
+
+	return 0;
+}
+
+void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
+{
+	munmap(region->host_addr, region->info.size);
+}
+
+static int vfio_configure_device(struct kvm *kvm, struct vfio_group *group,
+				 const char *dirpath, const char *name)
+{
+	u32 num_regions;
+	int ret = -ENOMEM;
+	char fullpath[PATH_MAX];
+	struct vfio_device *vdev;
+
+	snprintf(fullpath, PATH_MAX, "%s/%s", dirpath, name);
+
+	vdev = calloc(1, sizeof(*vdev));
+	if (!vdev)
+		return -ENOMEM;
+
+	vdev->name = strdup(name);
+	if (!vdev->name)
+		goto err_free_device;
+
+	vdev->sysfs_path = strndup(fullpath, PATH_MAX);
+	if (!vdev->sysfs_path)
+		goto err_free_name;
+
+	vdev->fd = ioctl(group->fd, VFIO_GROUP_GET_DEVICE_FD, name);
+	if (vdev->fd < 0) {
+		dev_err(vdev, "failed to get fd");
+
+		/* The device might be a bridge without an fd */
+		ret = 0;
+		goto err_free_path;
+	}
+
+	vdev->info.argsz = sizeof(vdev->info);
+	if (ioctl(vdev->fd, VFIO_DEVICE_GET_INFO, &vdev->info)) {
+		ret = -errno;
+		dev_err(vdev, "failed to get info");
+		goto err_close_device;
+	}
+
+	if (vdev->info.flags & VFIO_DEVICE_FLAGS_RESET &&
+	    ioctl(vdev->fd, VFIO_DEVICE_RESET) < 0)
+		dev_warn(vdev, "failed to reset device");
+
+	num_regions = vdev->info.num_regions;
+
+	vdev->regions = calloc(num_regions, sizeof(*vdev->regions));
+	if (!vdev->regions) {
+		ret = -ENOMEM;
+		goto err_close_device;
+	}
+
+	/* Now for the bus-specific initialization... */
+	if (vdev->info.flags & VFIO_DEVICE_FLAGS_PCI) {
+		ret = vfio_pci_setup_device(kvm, vdev);
+	} else {
+		dev_warn(vdev, "only vfio-pci is supported");
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		goto err_free_regions;
+
+	dev_info(vdev, "assigned to device number 0x%x in group %lu",
+		 vdev->dev_hdr.dev_num, group->id);
+
+	hlist_add_head(&vdev->list, &group->devices);
+
+	return 0;
+
+err_free_regions:
+	free(vdev->regions);
+err_close_device:
+	close(vdev->fd);
+err_free_path:
+	free((void *)vdev->sysfs_path);
+err_free_name:
+	free((void *)vdev->name);
+err_free_device:
+	free(vdev);
+
+	return ret;
+}
+
+static int vfio_configure_iommu_groups(struct kvm *kvm)
+{
+	int i, ret;
+
+	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
+		DIR *dir;
+		struct dirent *dirent;
+		char dirpath[PATH_MAX];
+		struct vfio_group *group = &kvm->cfg.vfio_group[i];
+
+		snprintf(dirpath, PATH_MAX, IOMMU_GROUP_DIR "/%lu/devices",
+			 group->id);
+
+		dir = opendir(dirpath);
+		if (!dir) {
+			ret = -errno;
+			pr_err("Failed to open IOMMU group %s", dirpath);
+			return ret;
+		}
+
+		while ((dirent = readdir(dir))) {
+			if (dirent->d_type != DT_LNK)
+				continue;
+
+			ret = vfio_configure_device(kvm, group, dirpath,
+						    dirent->d_name);
+			if (ret)
+				return ret;
+		}
+
+		if (closedir(dir))
+			pr_warning("Failed to close IOMMU group %s", dirpath);
+	}
+
+	return 0;
+}
+
+static int vfio_get_iommu_type(void)
+{
+	if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_NESTING_IOMMU))
+		return VFIO_TYPE1_NESTING_IOMMU;
+
+	if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU))
+		return VFIO_TYPE1v2_IOMMU;
+
+	if (ioctl(vfio_container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
+		return VFIO_TYPE1_IOMMU;
+
+	return -ENODEV;
+}
+
+static int vfio_map_mem_bank(struct kvm *kvm, struct kvm_mem_bank *bank, void *data)
+{
+	int ret = 0;
+	struct vfio_iommu_type1_dma_map dma_map = {
+		.argsz	= sizeof(dma_map),
+		.flags	= VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
+		.vaddr	= (unsigned long)bank->host_addr,
+		.iova	= (u64)bank->guest_phys_addr,
+		.size	= bank->size,
+	};
+
+	/* Map the guest memory for DMA (i.e. provide isolation) */
+	if (ioctl(vfio_container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
+		ret = -errno;
+		pr_err("Failed to map 0x%llx -> 0x%llx (%llu) for DMA",
+		       dma_map.iova, dma_map.vaddr, dma_map.size);
+	}
+
+	return ret;
+}
+
+static int vfio_unmap_mem_bank(struct kvm *kvm, struct kvm_mem_bank *bank, void *data)
+{
+	struct vfio_iommu_type1_dma_unmap dma_unmap = {
+		.argsz = sizeof(dma_unmap),
+		.size = bank->size,
+		.iova = bank->guest_phys_addr,
+	};
+
+	ioctl(vfio_container, VFIO_IOMMU_UNMAP_DMA, &dma_unmap);
+
+	return 0;
+}
+
+static int vfio_group_init(struct kvm *kvm, struct vfio_group *group)
+{
+	int ret;
+	char group_node[VFIO_PATH_MAX_LEN];
+	struct vfio_group_status group_status = {
+		.argsz = sizeof(group_status),
+	};
+
+	INIT_HLIST_HEAD(&group->devices);
+
+	snprintf(group_node, VFIO_PATH_MAX_LEN, VFIO_DEV_DIR "/%lu",
+		 group->id);
+
+	group->fd = open(group_node, O_RDWR);
+	if (group->fd == -1) {
+		ret = -errno;
+		pr_err("Failed to open IOMMU group %s", group_node);
+		return ret;
+	}
+
+	if (ioctl(group->fd, VFIO_GROUP_GET_STATUS, &group_status)) {
+		ret = -errno;
+		pr_err("Failed to determine status of IOMMU group %s",
+		       group_node);
+		return ret;
+	}
+
+	if (!(group_status.flags & VFIO_GROUP_FLAGS_VIABLE)) {
+		pr_err("IOMMU group %s is not viable", group_node);
+		return -EINVAL;
+	}
+
+	if (ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &vfio_container)) {
+		ret = -errno;
+		pr_err("Failed to add IOMMU group %s to VFIO container",
+		       group_node);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void vfio_group_exit(struct kvm *kvm, struct vfio_group *group)
+{
+	int fd = group->fd;
+	struct hlist_node *next;
+	struct vfio_device *vdev;
+
+	hlist_for_each_entry_safe(vdev, next, &group->devices, list) {
+		if (vdev->info.flags & VFIO_DEVICE_FLAGS_PCI)
+			vfio_pci_teardown_device(kvm, vdev);
+
+		close(vdev->fd);
+
+		free(vdev->regions);
+		free(vdev->name);
+		free(vdev->sysfs_path);
+		free(vdev);
+	}
+
+	ioctl(fd, VFIO_GROUP_UNSET_CONTAINER);
+	close(fd);
+}
+
+static int vfio_container_init(struct kvm *kvm)
+{
+	int api, i, ret, iommu_type;
+
+	/* Create a container for our IOMMU groups */
+	vfio_container = open(VFIO_DEV_NODE, O_RDWR);
+	if (vfio_container == -1) {
+		ret = errno;
+		pr_err("Failed to open %s", VFIO_DEV_NODE);
+		return ret;
+	}
+
+	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
+	if (api != VFIO_API_VERSION) {
+		pr_err("Unknown VFIO API version %d", api);
+		return -ENODEV;
+	}
+
+	iommu_type = vfio_get_iommu_type();
+	if (iommu_type < 0) {
+		pr_err("VFIO type-1 IOMMU not supported on this platform");
+		return iommu_type;
+	}
+
+	/* Sanity check our groups and add them to the container */
+	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
+		ret = vfio_group_init(kvm, &kvm->cfg.vfio_group[i]);
+		if (ret)
+			return ret;
+	}
+
+	/* Finalise the container */
+	if (ioctl(vfio_container, VFIO_SET_IOMMU, iommu_type)) {
+		ret = -errno;
+		pr_err("Failed to set IOMMU type %d for VFIO container",
+		       iommu_type);
+		return ret;
+	} else {
+		pr_info("Using IOMMU type %d for VFIO container", iommu_type);
+	}
+
+	return kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, vfio_map_mem_bank,
+				      NULL);
+}
+
+static int vfio__init(struct kvm *kvm)
+{
+	int ret;
+
+	if (!kvm->cfg.num_vfio_groups)
+		return 0;
+
+	ret = vfio_container_init(kvm);
+	if (ret)
+		return ret;
+
+	ret = vfio_configure_iommu_groups(kvm);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+dev_base_init(vfio__init);
+
+static int vfio__exit(struct kvm *kvm)
+{
+	int i;
+
+	if (!kvm->cfg.num_vfio_groups)
+		return 0;
+
+	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i)
+		vfio_group_exit(kvm, &kvm->cfg.vfio_group[i]);
+
+	kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, vfio_unmap_mem_bank, NULL);
+	return close(vfio_container);
+}
+dev_base_exit(vfio__exit);
diff --git a/vfio/pci.c b/vfio/pci.c
new file mode 100644
index 00000000..aca43431
--- /dev/null
+++ b/vfio/pci.c
@@ -0,0 +1,365 @@
+#include "kvm/irq.h"
+#include "kvm/kvm.h"
+#include "kvm/kvm-cpu.h"
+#include "kvm/vfio.h"
+
+#include <sys/ioctl.h>
+#include <sys/eventfd.h>
+
+/* Wrapper around UAPI vfio_irq_set */
+struct vfio_irq_eventfd {
+	struct vfio_irq_set	irq;
+	int			fd;
+};
+
+static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      u8 offset, void *data, int sz)
+{
+	struct vfio_region_info *info;
+	struct vfio_pci_device *pdev;
+	struct vfio_device *vdev;
+	char base[sz];
+
+	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
+	vdev = container_of(pdev, struct vfio_device, pci);
+	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
+
+	/* Dummy read in case of side-effects */
+	if (pread(vdev->fd, base, sz, info->offset + offset) != sz)
+		dev_warn(vdev, "failed to read %d bytes from Configuration Space at 0x%x",
+			 sz, offset);
+}
+
+static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			       u8 offset, void *data, int sz)
+{
+	struct vfio_region_info *info;
+	struct vfio_pci_device *pdev;
+	struct vfio_device *vdev;
+	void *base = pci_hdr;
+
+	pdev = container_of(pci_hdr, struct vfio_pci_device, hdr);
+	vdev = container_of(pdev, struct vfio_device, pci);
+	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
+
+	if (pwrite(vdev->fd, data, sz, info->offset + offset) != sz)
+		dev_warn(vdev, "Failed to write %d bytes to Configuration Space at 0x%x",
+			 sz, offset);
+
+	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
+		dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
+			 sz, offset);
+}
+
+static int vfio_pci_parse_caps(struct vfio_device *vdev)
+{
+	struct vfio_pci_device *pdev = &vdev->pci;
+
+	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
+		return 0;
+
+	pdev->hdr.status &= ~PCI_STATUS_CAP_LIST;
+	pdev->hdr.capabilities = 0;
+
+	/* TODO: install virtual capabilities */
+
+	return 0;
+}
+
+static int vfio_pci_parse_cfg_space(struct vfio_device *vdev)
+{
+	struct vfio_region_info *info;
+	ssize_t sz = PCI_DEV_CFG_SIZE;
+	struct vfio_pci_device *pdev = &vdev->pci;
+
+	if (vdev->info.num_regions < VFIO_PCI_CONFIG_REGION_INDEX) {
+		dev_err(vdev, "Config Space not found");
+		return -ENODEV;
+	}
+
+	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
+	*info = (struct vfio_region_info) {
+			.argsz = sizeof(*info),
+			.index = VFIO_PCI_CONFIG_REGION_INDEX,
+	};
+
+	ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+	if (!info->size) {
+		dev_err(vdev, "Config Space has size zero?!");
+		return -EINVAL;
+	}
+
+	if (pread(vdev->fd, &pdev->hdr, sz, info->offset) != sz) {
+		dev_err(vdev, "failed to read %zd bytes of Config Space", sz);
+		return -EIO;
+	}
+
+	/* Strip bit 7, that indicates multifunction */
+	pdev->hdr.header_type &= 0x7f;
+
+	if (pdev->hdr.header_type != PCI_HEADER_TYPE_NORMAL) {
+		dev_err(vdev, "unsupported header type %u",
+			pdev->hdr.header_type);
+		return -EOPNOTSUPP;
+	}
+
+	vfio_pci_parse_caps(vdev);
+
+	return 0;
+}
+
+static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
+{
+	int i;
+	ssize_t hdr_sz;
+	struct vfio_region_info *info;
+	struct vfio_pci_device *pdev = &vdev->pci;
+
+	/* Enable exclusively MMIO and bus mastering */
+	pdev->hdr.command &= ~PCI_COMMAND_IO;
+	pdev->hdr.command |= PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
+
+	/* Initialise the BARs */
+	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
+		struct vfio_region *region = &vdev->regions[i];
+		u64 base = region->guest_phys_addr;
+
+		if (!base)
+			continue;
+
+		pdev->hdr.bar_size[i] = region->info.size;
+
+		/* Construct a fake reg to match what we've mapped. */
+		pdev->hdr.bar[i] = (base & PCI_BASE_ADDRESS_MEM_MASK) |
+					PCI_BASE_ADDRESS_SPACE_MEMORY |
+					PCI_BASE_ADDRESS_MEM_TYPE_32;
+	}
+
+	/* I really can't be bothered to support cardbus. */
+	pdev->hdr.card_bus = 0;
+
+	/*
+	 * Nuke the expansion ROM for now. If we want to do this properly,
+	 * we need to save its size somewhere and map into the guest.
+	 */
+	pdev->hdr.exp_rom_bar = 0;
+
+	/* Install our fake Configuration Space, without the caps */
+	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
+	hdr_sz = offsetof(struct pci_device_header, msix);
+	if (pwrite(vdev->fd, &pdev->hdr, hdr_sz, info->offset) != hdr_sz) {
+		dev_err(vdev, "failed to write %zd bytes to Config Space",
+			hdr_sz);
+		return -EIO;
+	}
+
+	/* TODO: install virtual capability */
+
+	/* Register callbacks for cfg accesses */
+	pdev->hdr.cfg_ops = (struct pci_config_operations) {
+		.read	= vfio_pci_cfg_read,
+		.write	= vfio_pci_cfg_write,
+	};
+
+	pdev->hdr.irq_type = IRQ_TYPE_LEVEL_HIGH;
+
+	return 0;
+}
+
+static int vfio_pci_configure_dev_regions(struct kvm *kvm,
+					  struct vfio_device *vdev)
+{
+	u32 i;
+	int ret;
+	size_t map_size;
+
+	ret = vfio_pci_parse_cfg_space(vdev);
+	if (ret)
+		return ret;
+
+	/* First of all, map the BARs directly into the guest */
+	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
+		struct vfio_region *region = &vdev->regions[i];
+
+		if (i >= vdev->info.num_regions)
+			break;
+
+		region->info = (struct vfio_region_info) {
+			.argsz = sizeof(region->info),
+			.index = i,
+		};
+
+		ret = ioctl(vdev->fd, VFIO_DEVICE_GET_REGION_INFO,
+			    &region->info);
+		if (ret) {
+			ret = -errno;
+			dev_err(vdev, "cannot get info for region %u", i);
+			return ret;
+		}
+
+		/* Ignore invalid or unimplemented regions */
+		if (!region->info.size)
+			continue;
+
+		/* Grab some MMIO space in the guest */
+		map_size = ALIGN(region->info.size, PAGE_SIZE);
+		region->guest_phys_addr = pci_get_io_space_block(map_size);
+
+		/*
+		 * Map the BARs into the guest. We'll later need to update
+		 * configuration space to reflect our allocation.
+		 */
+		ret = vfio_map_region(kvm, vdev, region);
+		if (ret)
+			return ret;
+	}
+
+	/* We've configured the BARs, fake up a Configuration Space */
+	return vfio_pci_fixup_cfg_space(vdev);
+}
+
+static int vfio_pci_init_irqfd(struct kvm *kvm, int devfd, int gsi)
+{
+	int ret;
+	int trigger_fd, unmask_fd;
+	struct vfio_irq_eventfd	trigger;
+	struct vfio_irq_eventfd	unmask;
+
+	/*
+	 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
+	 * signals an interrupt from host to guest, and unmask_fd signals the
+	 * deassertion of the line from guest to host.
+	 */
+	trigger_fd = eventfd(0, 0);
+	if (trigger_fd < 0) {
+		pr_err("Failed to create trigger eventfd");
+		return trigger_fd;
+	}
+
+	unmask_fd = eventfd(0, 0);
+	if (unmask_fd < 0) {
+		pr_err("Failed to create unmask eventfd");
+		close(trigger_fd);
+		return unmask_fd;
+	}
+
+	ret = irq__add_irqfd(kvm, gsi, trigger_fd, unmask_fd);
+	if (ret)
+		goto err_close;
+
+	trigger.irq = (struct vfio_irq_set) {
+		.argsz	= sizeof(trigger),
+		.flags	= VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER,
+		.index	= VFIO_PCI_INTX_IRQ_INDEX,
+		.start	= 0,
+		.count	= 1,
+	};
+	trigger.fd = trigger_fd;
+
+	ret = ioctl(devfd, VFIO_DEVICE_SET_IRQS, &trigger);
+	if (ret < 0) {
+		pr_err("Failed to setup VFIO IRQ");
+		goto err_delete_line;
+	}
+
+	unmask.irq = (struct vfio_irq_set) {
+		.argsz	= sizeof(unmask),
+		.flags	= VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_UNMASK,
+		.index	= VFIO_PCI_INTX_IRQ_INDEX,
+		.start	= 0,
+		.count	= 1,
+	};
+	unmask.fd = unmask_fd;
+
+	ret = ioctl(devfd, VFIO_DEVICE_SET_IRQS, &unmask);
+	if (ret < 0) {
+		pr_err("Failed to setup unmask IRQ");
+		goto err_remove_event;
+	}
+
+	return 0;
+
+err_remove_event:
+	/* Remove trigger event */
+	trigger.irq.flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER;
+	ioctl(devfd, VFIO_DEVICE_SET_IRQS, &trigger.irq);
+
+err_delete_line:
+	irq__del_irqfd(kvm, gsi, trigger_fd);
+
+err_close:
+	close(trigger_fd);
+	close(unmask_fd);
+	return ret;
+}
+
+static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev)
+{
+	struct vfio_pci_device *pdev = &vdev->pci;
+	int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
+
+	vdev->irq_info = (struct vfio_irq_info) {
+		.argsz = sizeof(vdev->irq_info),
+	};
+
+	ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &vdev->irq_info);
+	if (vdev->irq_info.count == 0) {
+		dev_err(vdev, "no interrupt found by VFIO");
+		return -ENODEV;
+	}
+
+	if (!(vdev->irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
+		dev_err(vdev, "interrupt not EVENTFD capable");
+		return -EINVAL;
+	}
+
+	/* TODO: add MSI support */
+	dev_err(vdev, "MSI-X not available, falling back to INTx");
+
+	if (!(vdev->irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
+		dev_err(vdev, "INTx interrupt not AUTOMASKED");
+		return -EINVAL;
+	}
+
+	return vfio_pci_init_irqfd(kvm, vdev->fd, gsi);
+}
+
+int vfio_pci_setup_device(struct kvm *kvm, struct vfio_device *vdev)
+{
+	int ret;
+
+	ret = vfio_pci_configure_dev_regions(kvm, vdev);
+	if (ret) {
+		dev_err(vdev, "failed to configure regions");
+		return ret;
+	}
+
+	vdev->dev_hdr = (struct device_header) {
+		.bus_type	= DEVICE_BUS_PCI,
+		.data		= &vdev->pci.hdr,
+	};
+
+	ret = device__register(&vdev->dev_hdr);
+	if (ret) {
+		dev_err(vdev, "failed to register VFIO device");
+		return ret;
+	}
+
+	ret = vfio_pci_configure_dev_irqs(kvm, vdev);
+	if (ret) {
+		dev_err(vdev, "failed to configure IRQs");
+		return ret;
+	}
+
+	return 0;
+}
+
+void vfio_pci_teardown_device(struct kvm *kvm, struct vfio_device *vdev)
+{
+	size_t i;
+
+	for (i = 0; i < vdev->info.num_regions; i++)
+		vfio_unmap_region(kvm, &vdev->regions[i]);
+
+	device__unregister(&vdev->dev_hdr);
+}
-- 
2.13.1

* [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (5 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-07-31 17:49   ` Punit Agrawal
  2017-08-18 17:42   ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 08/10] vfio-pci: add MSI support Jean-Philippe Brucker
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
the kernel inject MSIs from a physical device directly into the guest.

It would be tempting to create the MSI routes at init time before starting
vCPUs, when we can afford to exit gracefully. But some of it must be
initialized when the guest requests it.

* On the KVM side, MSIs must be enabled after devices allocate their IRQ
  lines and irqchips are operational, which cannot happen until late_init.

* On the VFIO side, hardware state of devices may be updated when setting
  up MSIs. For example, when passing a virtio-pci-legacy device to the
  guest:

  (1) The device-specific configuration layout (in BAR0) depends on
      whether MSIs are enabled or not in the device. If they are enabled,
      the device-specific configuration starts at offset 24, otherwise it
      starts at offset 20.
  (2) Linux guest assumes that MSIs are initially disabled (doesn't
      actually check the capability). So it reads the device config at
      offset 20.
  (3) Had we enabled MSIs early, host would have enabled the MSI-X
      capability and device would return the config at offset 24.
  (4) The guest would read junk and explode.

Therefore we have to create MSI-X routes when the guest requests MSIs, and
enable/disable them in VFIO when the guest pokes the MSI-X capability. We
have to follow both physical and virtual state of the capability, which
makes the state machine a bit complex, but I think it works.
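
As an illustration of steps (1)-(4), here is a minimal sketch, not part of
this patch, of where a legacy virtio guest driver expects the
device-specific configuration. The 20/24 values follow the legacy virtio
PCI layout (cf. VIRTIO_PCI_CONFIG_OFF() in Linux):

  #include <stdbool.h>

  /* Sketch only: offset of the device-specific config in BAR0 for legacy
   * virtio-pci. It moves when the MSI-X capability is enabled. */
  static inline unsigned int virtio_pci_legacy_config_off(bool msix_enabled)
  {
          return msix_enabled ? 24 : 20;
  }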

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/kvm/vfio.h |  54 +++++
 vfio/pci.c         | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 664 insertions(+), 9 deletions(-)

diff --git a/include/kvm/vfio.h b/include/kvm/vfio.h
index 060f32a3..3bb9c116 100644
--- a/include/kvm/vfio.h
+++ b/include/kvm/vfio.h
@@ -1,6 +1,7 @@
 #ifndef KVM__VFIO_H
 #define KVM__VFIO_H
 
+#include "kvm/mutex.h"
 #include "kvm/parse-options.h"
 #include "kvm/pci.h"
 
@@ -15,8 +16,61 @@
 
 #define MAX_VFIO_GROUPS			16
 
+enum vfio_pci_irq_type {
+	VFIO_PCI_IRQ_INTX		= VFIO_PCI_INTX_IRQ_INDEX,
+	VFIO_PCI_IRQ_MSI		= VFIO_PCI_MSI_IRQ_INDEX,
+	VFIO_PCI_IRQ_MSIX		= VFIO_PCI_MSIX_IRQ_INDEX,
+};
+
+/* MSI/MSI-X capability enabled */
+#define VFIO_PCI_MSI_STATE_ENABLED	(1 << 0)
+/* MSI/MSI-X capability or individual vector masked */
+#define VFIO_PCI_MSI_STATE_MASKED	(1 << 1)
+/* MSI-X capability has no vector enabled yet */
+#define VFIO_PCI_MSI_STATE_EMPTY	(1 << 2)
+
+struct vfio_pci_msi_entry {
+	struct msix_table		config;
+	int				gsi;
+	int				eventfd;
+	u8				phys_state;
+	u8				virt_state;
+};
+
+struct vfio_pci_msix_table {
+	size_t				size;
+	unsigned int			bar;
+	u32				guest_phys_addr;
+};
+
+struct vfio_pci_msix_pba {
+	size_t				size;
+	off_t				offset; /* in VFIO device fd */
+	unsigned int			bar;
+	u32				guest_phys_addr;
+};
+
+struct vfio_pci_msix_cap {
+	struct vfio_pci_msix_table	table;
+	struct vfio_pci_msix_pba	pba;
+};
+
+struct vfio_pci_msi_common {
+	off_t				pos;
+	u8				virt_state;
+	u8				phys_state;
+	struct mutex			mutex;
+	struct vfio_irq_set		*irq_set;
+	size_t				nr_entries;
+	struct vfio_pci_msi_entry	*entries;
+};
+
 struct vfio_pci_device {
 	struct pci_device_header	hdr;
+
+	enum vfio_pci_irq_type		irq_type;
+	struct vfio_pci_msi_common	msi;
+	struct vfio_pci_msix_cap	msix;
 };
 
 struct vfio_region {
diff --git a/vfio/pci.c b/vfio/pci.c
index aca43431..a273fabb 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -5,6 +5,8 @@
 
 #include <sys/ioctl.h>
 #include <sys/eventfd.h>
+#include <sys/resource.h>
+#include <sys/time.h>
 
 /* Wrapper around UAPI vfio_irq_set */
 struct vfio_irq_eventfd {
@@ -12,6 +14,289 @@ struct vfio_irq_eventfd {
 	int			fd;
 };
 
+#define msi_is_enabled(state)		((state) & VFIO_PCI_MSI_STATE_ENABLED)
+#define msi_is_masked(state)		((state) & VFIO_PCI_MSI_STATE_MASKED)
+#define msi_is_empty(state)		((state) & VFIO_PCI_MSI_STATE_EMPTY)
+
+#define msi_update_state(state, val, bit)				\
+	(state) = (val) ? (state) | bit : (state) & ~bit;
+#define msi_set_enabled(state, val)					\
+	msi_update_state(state, val, VFIO_PCI_MSI_STATE_ENABLED)
+#define msi_set_masked(state, val)					\
+	msi_update_state(state, val, VFIO_PCI_MSI_STATE_MASKED)
+#define msi_set_empty(state, val)					\
+	msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
+
+static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev)
+{
+	size_t i;
+	int ret = 0;
+	int *eventfds;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_irq_eventfd single = {
+		.irq = {
+			.argsz	= sizeof(single),
+			.flags	= VFIO_IRQ_SET_DATA_EVENTFD |
+				  VFIO_IRQ_SET_ACTION_TRIGGER,
+			.index	= pdev->irq_type,
+			.count	= 1,
+		},
+	};
+
+	if (!msi_is_enabled(pdev->msi.virt_state))
+		return 0;
+
+	eventfds = (void *)pdev->msi.irq_set + sizeof(struct vfio_irq_set);
+
+	/*
+	 * Initial registration of the full range. This enables the physical
+	 * MSI/MSI-X capability.
+	 *
+	 * As an optimization, only update MSIs when guest unmasks the
+	 * capability. This greatly reduces the initialization time for Linux
+	 * guest with 2048+ MSIs. Linux guest starts by enabling the MSI-X cap
+	 * masked, then fills individual vectors, then unmasks the whole
+	 * function. So we only do one VFIO ioctl when enabling for the first
+	 * time, and then one when unmasking.
+	 *
+	 * phys_state is empty when it is enabled but no vector has been
+	 * registered via SET_IRQS yet.
+	 */
+	if (!msi_is_enabled(pdev->msi.phys_state) ||
+	    (!msi_is_masked(pdev->msi.virt_state) &&
+	     msi_is_empty(pdev->msi.phys_state))) {
+		bool empty = true;
+
+		for (i = 0; i < pdev->msi.nr_entries; i++) {
+			eventfds[i] = pdev->msi.entries[i].gsi >= 0 ?
+				      pdev->msi.entries[i].eventfd : -1;
+
+			if (eventfds[i] >= 0)
+				empty = false;
+		}
+
+		ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, pdev->msi.irq_set);
+		if (ret < 0) {
+			perror("VFIO_DEVICE_SET_IRQS(multi)");
+			return ret;
+		}
+
+		msi_set_enabled(pdev->msi.phys_state, true);
+		msi_set_empty(pdev->msi.phys_state, empty);
+
+		return 0;
+	}
+
+	if (msi_is_masked(pdev->msi.virt_state)) {
+		/* TODO: if phys_state is not empty nor masked, mask all vectors */
+		return 0;
+	}
+
+	/* Update individual vectors to avoid breaking those in use */
+	for (i = 0; i < pdev->msi.nr_entries; i++) {
+		struct vfio_pci_msi_entry *entry = &pdev->msi.entries[i];
+		int fd = entry->gsi >= 0 ? entry->eventfd : -1;
+
+		if (fd == eventfds[i])
+			continue;
+
+		single.irq.start = i;
+		single.fd = fd;
+
+		ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &single);
+		if (ret < 0) {
+			perror("VFIO_DEVICE_SET_IRQS(single)");
+			break;
+		}
+
+		eventfds[i] = fd;
+
+		if (msi_is_empty(pdev->msi.phys_state) && fd >= 0)
+		    msi_set_empty(pdev->msi.phys_state, false);
+	}
+
+	return ret;
+}
+
+static int vfio_pci_disable_msis(struct kvm *kvm, struct vfio_device *vdev)
+{
+	int ret;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_irq_set irq_set = {
+		.argsz	= sizeof(irq_set),
+		.flags 	= VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER,
+		.index 	= pdev->irq_type,
+		.start 	= 0,
+		.count	= 0,
+	};
+
+	if (!msi_is_enabled(pdev->msi.phys_state) ||
+	    msi_is_enabled(pdev->msi.virt_state))
+		return 0;
+
+	ret = ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+	if (ret < 0) {
+		perror("VFIO_DEVICE_SET_IRQS(NONE)");
+		return ret;
+	}
+
+	msi_set_enabled(pdev->msi.phys_state, false);
+	msi_set_empty(pdev->msi.phys_state, true);
+
+	return 0;
+}
+
+static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device *vdev,
+				     struct vfio_pci_msi_entry *entry)
+{
+	int ret;
+
+	if (entry->eventfd < 0) {
+		entry->eventfd = eventfd(0, 0);
+		if (entry->eventfd < 0) {
+			ret = -errno;
+			dev_err(vdev, "cannot create eventfd");
+			return ret;
+		}
+	}
+
+	/* Allocate IRQ if necessary */
+	if (entry->gsi < 0) {
+		int ret = irq__add_msix_route(kvm, &entry->config.msg,
+					      vdev->dev_hdr.dev_num << 3);
+		if (ret < 0) {
+			dev_err(vdev, "cannot create MSI-X route");
+			return ret;
+		}
+		entry->gsi = ret;
+	} else {
+		irq__update_msix_route(kvm, entry->gsi, &entry->config.msg);
+	}
+
+	/*
+	 * MSI masking is unimplemented in VFIO, so we have to handle it by
+	 * disabling/enabling IRQ route instead. We do it on the KVM side rather
+	 * than VFIO, because:
+	 * - it is 8x faster
+	 * - it allows to decouple masking logic from capability state.
+	 * - in masked state, after removing irqfd route, we could easily plug
+	 *   the eventfd in a local handler, in order to serve Pending Bit reads
+	 *   to the guest.
+	 *
+	 * So entry->phys_state is masked when there is no active irqfd route.
+	 */
+	if (msi_is_masked(entry->virt_state) == msi_is_masked(entry->phys_state))
+		return 0;
+
+	if (msi_is_masked(entry->phys_state)) {
+		ret = irq__add_irqfd(kvm, entry->gsi, entry->eventfd, -1);
+		if (ret < 0) {
+			dev_err(vdev, "cannot setup irqfd");
+			return ret;
+		}
+	} else {
+		irq__del_irqfd(kvm, entry->gsi, entry->eventfd);
+	}
+
+	msi_set_masked(entry->phys_state, msi_is_masked(entry->virt_state));
+
+	return 0;
+}
+
+static void vfio_pci_msix_pba_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
+				     u32 len, u8 is_write, void *ptr)
+{
+	struct vfio_pci_device *pdev = ptr;
+	struct vfio_pci_msix_pba *pba = &pdev->msix.pba;
+	u64 offset = addr - pba->guest_phys_addr;
+	struct vfio_device *vdev = container_of(pdev, struct vfio_device, pci);
+
+	if (is_write)
+		return;
+
+	/*
+	 * TODO: emulate PBA. Hardware MSI-X is never masked, so reading the PBA
+	 * is completely useless here. Note that Linux doesn't use PBA.
+	 */
+	if (pread(vdev->fd, data, len, pba->offset + offset) != (ssize_t)len)
+		dev_err(vdev, "cannot access MSIX PBA\n");
+}
+
+static void vfio_pci_msix_table_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
+				       u32 len, u8 is_write, void *ptr)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vfio_pci_msi_entry *entry;
+	struct vfio_pci_device *pdev = ptr;
+	struct vfio_device *vdev = container_of(pdev, struct vfio_device, pci);
+
+	u64 offset = addr - pdev->msix.table.guest_phys_addr;
+
+	size_t vector = offset / PCI_MSIX_ENTRY_SIZE;
+	/* PCI spec says that software must use aligned 4 or 8 bytes accesses */
+	off_t field = offset % PCI_MSIX_ENTRY_SIZE;
+	entry = &pdev->msi.entries[vector];
+
+	mutex_lock(&pdev->msi.mutex);
+
+	if (!is_write) {
+		memcpy(data, (void *)&entry->config + field, len);
+		goto out_unlock;
+	}
+
+	memcpy((void *)&entry->config + field, data, len);
+
+	if (field != PCI_MSIX_ENTRY_VECTOR_CTRL)
+		goto out_unlock;
+
+	msi_set_masked(entry->virt_state, entry->config.ctrl &
+		       PCI_MSIX_ENTRY_CTRL_MASKBIT);
+
+	if (vfio_pci_update_msi_entry(kvm, vdev, entry) < 0)
+		/* Not much we can do here. */
+		dev_err(vdev, "failed to configure MSIX vector %zu", vector);
+
+	/* Update the physical capability if necessary */
+	if (vfio_pci_enable_msis(kvm, vdev))
+		dev_err(vdev, "cannot enable MSIX");
+
+out_unlock:
+	mutex_unlock(&pdev->msi.mutex);
+}
+
+static void vfio_pci_msix_cap_write(struct kvm *kvm,
+				    struct vfio_device *vdev, u8 off,
+				    void *data, int sz)
+{
+	struct vfio_pci_device *pdev = &vdev->pci;
+	off_t enable_pos = PCI_MSIX_FLAGS + 1;
+	bool enable;
+	u16 flags;
+
+	off -= pdev->msi.pos;
+
+	/* Check if access intersects with the MSI-X Enable bit */
+	if (off > enable_pos || off + sz <= enable_pos)
+		return;
+
+	/* Read byte that contains the Enable bit */
+	flags = *(u8 *)(data + enable_pos - off) << 8;
+
+	msi_set_masked(pdev->msi.virt_state, flags & PCI_MSIX_FLAGS_MASKALL);
+	msi_set_enabled(pdev->msi.virt_state, flags & PCI_MSIX_FLAGS_ENABLE);
+
+	enable = flags & PCI_MSIX_FLAGS_ENABLE;
+
+	mutex_lock(&pdev->msi.mutex);
+
+	if (enable && vfio_pci_enable_msis(kvm, vdev))
+		dev_err(vdev, "cannot enable MSIX");
+	else if (!enable && vfio_pci_disable_msis(kvm, vdev))
+		dev_err(vdev, "cannot disable MSIX");
+
+	mutex_unlock(&pdev->msi.mutex);
+}
+
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			      u8 offset, void *data, int sz)
 {
@@ -46,22 +331,112 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 		dev_warn(vdev, "Failed to write %d bytes to Configuration Space at 0x%x",
 			 sz, offset);
 
+	/* Handle MSI write now, since it might update the hardware capability */
+	switch (pdev->irq_type) {
+	case VFIO_PCI_IRQ_MSIX:
+		vfio_pci_msix_cap_write(kvm, vdev, offset, data, sz);
+		break;
+	default:
+		break;
+	}
+
 	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
 		dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
 			 sz, offset);
 }
 
+static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
+{
+	switch (cap_hdr->type) {
+	case PCI_CAP_ID_MSIX:
+		return PCI_CAP_MSIX_SIZEOF;
+	default:
+		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
+		return 0;
+	}
+}
+
+/*
+ * Copy capability from physical header into virtual header, and add it to the
+ * virtual capability list.
+ *
+ * @fd_offset: offset of pci header into vfio device fd
+ * @pos: offset of capability from start of header
+ */
+static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr,
+			    off_t fd_offset, off_t pos)
+{
+	int i;
+	ssize_t size = vfio_pci_cap_size(cap_hdr);
+	struct pci_device_header *hdr = &vdev->pci.hdr;
+	struct pci_cap_hdr *out = (void *)hdr + pos;
+
+	if (pread(vdev->fd, out, size, fd_offset + pos) != size)
+		return -errno;
+
+	out->next = 0;
+
+	if (!hdr->capabilities) {
+		hdr->capabilities = pos;
+		hdr->status |= PCI_STATUS_CAP_LIST;
+	} else {
+		/* Add cap at end of list */
+		struct pci_cap_hdr *last;
+
+		pci_for_each_cap(i, last, hdr)
+			;
+		last->next = pos;
+	}
+
+	return 0;
+}
+
 static int vfio_pci_parse_caps(struct vfio_device *vdev)
 {
+	u8 pos;
+	int ret;
+	struct pci_cap_hdr cap;
+	ssize_t sz = sizeof(cap);
+	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
 	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
 		return 0;
 
+	pos = pdev->hdr.capabilities & ~3;
+	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
+
 	pdev->hdr.status &= ~PCI_STATUS_CAP_LIST;
 	pdev->hdr.capabilities = 0;
 
-	/* TODO: install virtual capabilities */
+	for (; pos; pos = cap.next) {
+		if (pos >= PCI_DEV_CFG_SIZE) {
+			dev_warn(vdev, "ignoring cap outside of config space");
+			return -EINVAL;
+		}
+
+		if (pread(vdev->fd, &cap, sz, info->offset + pos) != sz) {
+			dev_warn(vdev, "failed to read from capabilities pointer (0x%x)",
+				 pos);
+			return -EINVAL;
+		}
+
+		switch (cap.type) {
+		case PCI_CAP_ID_MSIX:
+			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
+			if (ret) {
+				dev_warn(vdev, "failed to read capability structure %x",
+					 cap.type);
+				return ret;
+			}
+
+			pdev->msi.pos = pos;
+			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
+			break;
+
+			/* Any other capability is hidden */
+		}
+	}
 
 	return 0;
 }
@@ -103,6 +478,8 @@ static int vfio_pci_parse_cfg_space(struct vfio_device *vdev)
 		return -EOPNOTSUPP;
 	}
 
+	pdev->irq_type = VFIO_PCI_IRQ_INTX;
+
 	vfio_pci_parse_caps(vdev);
 
 	return 0;
@@ -111,7 +488,11 @@ static int vfio_pci_parse_cfg_space(struct vfio_device *vdev)
 static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 {
 	int i;
+	int pos;
 	ssize_t hdr_sz;
+	ssize_t cap_sz;
+	struct msix_cap *msix;
+	struct pci_cap_hdr *cap;
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev = &vdev->pci;
 
@@ -144,6 +525,22 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 	 */
 	pdev->hdr.exp_rom_bar = 0;
 
+	/* Plumb in our fake MSI-X capability, if we have it. */
+	msix = pci_find_cap(&pdev->hdr, PCI_CAP_ID_MSIX);
+	if (msix) {
+		/* Add a shortcut to the PBA region for the MMIO handler */
+		int pba_index = VFIO_PCI_BAR0_REGION_INDEX + pdev->msix.pba.bar;
+		pdev->msix.pba.offset = vdev->regions[pba_index].info.offset +
+					(msix->pba_offset & PCI_MSIX_PBA_OFFSET);
+
+		/* Tidy up the capability */
+		msix->table_offset &= PCI_MSIX_TABLE_BIR;
+		msix->pba_offset &= PCI_MSIX_PBA_BIR;
+		if (pdev->msix.table.bar == pdev->msix.pba.bar)
+			msix->pba_offset |= pdev->msix.table.size &
+					    PCI_MSIX_PBA_OFFSET;
+	}
+
 	/* Install our fake Configuration Space, without the caps */
 	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
 	hdr_sz = offsetof(struct pci_device_header, msix);
@@ -153,7 +550,16 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 		return -EIO;
 	}
 
-	/* TODO: install virtual capability */
+	/* Install the fake capability list */
+	pci_for_each_cap(pos, cap, &pdev->hdr) {
+		cap_sz = vfio_pci_cap_size(cap);
+
+		if (pwrite(vdev->fd, cap, cap_sz, info->offset + pos) !=
+		    cap_sz) {
+			dev_err(vdev, "failed to write capability %u", cap->type);
+			return -EIO;
+		}
+	}
 
 	/* Register callbacks for cfg accesses */
 	pdev->hdr.cfg_ops = (struct pci_config_operations) {
@@ -166,17 +572,103 @@ static int vfio_pci_fixup_cfg_space(struct vfio_device *vdev)
 	return 0;
 }
 
+static int vfio_pci_create_msix_table(struct kvm *kvm,
+				      struct vfio_pci_device *pdev)
+{
+	int ret;
+	size_t i;
+	size_t nr_entries;
+	size_t table_size;
+	struct vfio_pci_msi_entry *entries;
+	struct vfio_pci_msix_pba *pba = &pdev->msix.pba;
+	struct vfio_pci_msix_table *table = &pdev->msix.table;
+	struct msix_cap *msix = (void *)&pdev->hdr + pdev->msi.pos;
+
+	table->bar = msix->table_offset & PCI_MSIX_TABLE_BIR;
+	pba->bar = msix->pba_offset & PCI_MSIX_TABLE_BIR;
+
+	/*
+	 * KVM needs memory regions to be multiple of and aligned on PAGE_SIZE.
+	 */
+	nr_entries = (msix->ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
+	table_size = ALIGN(nr_entries * PCI_MSIX_ENTRY_SIZE, PAGE_SIZE);
+
+	entries = calloc(nr_entries, sizeof(struct vfio_pci_msi_entry));
+	if (!entries)
+		return -ENOMEM;
+
+	for (i = 0; i < nr_entries; i++)
+		entries[i].config.ctrl = PCI_MSIX_ENTRY_CTRL_MASKBIT;
+
+	/*
+	 * To ease MSI-X cap configuration in case they share the same BAR,
+	 * collapse table and pending array. According to PCI, address spaces
+	 * must be power of two. Since nr_entries is a power of two, and PBA
+	 * size is less than table_size, reserve 2*table_size.
+	 */
+	table->guest_phys_addr = pci_get_io_space_block(2 * table_size);
+	if (!table->guest_phys_addr) {
+		pr_err("cannot allocate IO space");
+		ret = -ENOMEM;
+		goto out_free;
+	}
+	pba->guest_phys_addr = table->guest_phys_addr + table_size;
+
+	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table_size, false,
+				 vfio_pci_msix_table_access, pdev);
+	if (ret < 0)
+		goto out_free;
+
+	/*
+	 * We could map the physical PBA directly into the guest, but it's
+	 * likely smaller than a page, and we can only hand full pages to the
+	 * guest. Even though the PCI spec disallows sharing a page used for
+	 * MSI-X with any other resource, it allows to share the same page
+	 * between MSI-X table and PBA. For the sake of isolation, create a
+	 * virtual PBA.
+	 */
+	pba->size = nr_entries / 8;
+
+	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
+				 vfio_pci_msix_pba_access, pdev);
+	if (ret < 0)
+		goto out_free;
+
+	pdev->msi.entries = entries;
+	pdev->msi.nr_entries = nr_entries;
+	table->size = table_size;
+
+	return 0;
+
+out_free:
+	free(entries);
+
+	return ret;
+}
+
 static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 					  struct vfio_device *vdev)
 {
 	u32 i;
 	int ret;
 	size_t map_size;
+	struct vfio_pci_device *pdev = &vdev->pci;
 
 	ret = vfio_pci_parse_cfg_space(vdev);
 	if (ret)
 		return ret;
 
+	switch (pdev->irq_type) {
+	case VFIO_PCI_IRQ_MSIX:
+		ret = vfio_pci_create_msix_table(kvm, pdev);
+		break;
+	default:
+		break;
+	}
+
+	if (ret)
+		return ret;
+
 	/* First of all, map the BARs directly into the guest */
 	for (i = VFIO_PCI_BAR0_REGION_INDEX; i <= VFIO_PCI_BAR5_REGION_INDEX; ++i) {
 		struct vfio_region *region = &vdev->regions[i];
@@ -201,6 +693,17 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 		if (!region->info.size)
 			continue;
 
+		if (pdev->irq_type == VFIO_PCI_IRQ_MSIX) {
+			/* Trap and emulate MSI-X table */
+			if (i == pdev->msix.table.bar) {
+				region->guest_phys_addr = pdev->msix.table.guest_phys_addr;
+				continue;
+			} else if (i == pdev->msix.pba.bar) {
+				region->guest_phys_addr = pdev->msix.pba.guest_phys_addr;
+				continue;
+			}
+		}
+
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		region->guest_phys_addr = pci_get_io_space_block(map_size);
@@ -218,6 +721,87 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 	return vfio_pci_fixup_cfg_space(vdev);
 }
 
+/*
+ * Attempt to update the FD limit, if opening an eventfd for each IRQ vector
+ * would hit the limit. Which is likely to happen when a device uses 2048 MSIs.
+ */
+static int vfio_pci_reserve_irq_fds(size_t num)
+{
+	/*
+	 * I counted around 27 fds under normal load. Let's add 100 for good
+	 * measure.
+	 */
+	static size_t needed = 128;
+	struct rlimit fd_limit, new_limit;
+
+	needed += num;
+
+	if (getrlimit(RLIMIT_NOFILE, &fd_limit)) {
+		perror("getrlimit(RLIMIT_NOFILE)");
+		return 0;
+	}
+
+	if (fd_limit.rlim_cur >= needed)
+		return 0;
+
+	new_limit.rlim_cur = needed;
+
+	if (fd_limit.rlim_max < needed)
+		/* Try to bump hard limit (root only) */
+		new_limit.rlim_max = needed;
+	else
+		new_limit.rlim_max = fd_limit.rlim_max;
+
+	if (setrlimit(RLIMIT_NOFILE, &new_limit)) {
+		perror("setrlimit(RLIMIT_NOFILE)");
+		pr_warning("not enough FDs for full MSI-X support (estimated need: %zu)",
+			   (size_t)(needed - fd_limit.rlim_cur));
+	}
+
+	return 0;
+}
+
+static int vfio_pci_init_msi(struct kvm *kvm, struct vfio_pci_device *pdev)
+{
+	size_t i;
+	int *eventfds;
+	size_t irq_set_size;
+	struct vfio_pci_msi_entry *entry;
+	size_t nr_entries = pdev->msi.nr_entries;
+
+	mutex_init(&pdev->msi.mutex);
+
+	vfio_pci_reserve_irq_fds(nr_entries);
+
+	irq_set_size = sizeof(struct vfio_irq_set) + nr_entries * sizeof(int);
+
+	pdev->msi.irq_set = malloc(irq_set_size);
+	if (!pdev->msi.irq_set)
+		return -ENOMEM;
+
+	*pdev->msi.irq_set = (struct vfio_irq_set) {
+		.argsz	= irq_set_size,
+		.flags 	= VFIO_IRQ_SET_DATA_EVENTFD |
+			  VFIO_IRQ_SET_ACTION_TRIGGER,
+		.index 	= pdev->irq_type,
+		.start 	= 0,
+		.count 	= nr_entries,
+	};
+
+	eventfds = (void *)pdev->msi.irq_set + sizeof(struct vfio_irq_set);
+
+	for (i = 0; i < nr_entries; i++) {
+		entry = &pdev->msi.entries[i];
+		entry->gsi = -1;
+		entry->eventfd = -1;
+		msi_set_masked(entry->virt_state, true);
+		msi_set_masked(entry->phys_state, true);
+		eventfds[i] = -1;
+	}
+
+	return 0;
+}
+
 static int vfio_pci_init_irqfd(struct kvm *kvm, int devfd, int gsi)
 {
 	int ret;
@@ -225,6 +809,8 @@ static int vfio_pci_init_irqfd(struct kvm *kvm, int devfd, int gsi)
 	struct vfio_irq_eventfd	trigger;
 	struct vfio_irq_eventfd	unmask;
 
+	vfio_pci_reserve_irq_fds(2);
+
 	/*
 	 * PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
 	 * signals an interrupt from host to guest, and unmask_fd signals the
@@ -295,11 +881,12 @@ err_close:
 
 static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev)
 {
+	int ret;
 	struct vfio_pci_device *pdev = &vdev->pci;
-	int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
 
 	vdev->irq_info = (struct vfio_irq_info) {
 		.argsz = sizeof(vdev->irq_info),
+		.index = pdev->irq_type,
 	};
 
 	ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, &vdev->irq_info);
@@ -313,15 +900,25 @@ static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev
 		return -EINVAL;
 	}
 
-	/* TODO: add MSI support */
-	dev_err(vdev, "MSI-X not available, falling back to INTx");
+	if (pdev->irq_type == VFIO_PCI_IRQ_MSIX) {
+		if (vdev->irq_info.count != pdev->msi.nr_entries) {
+			dev_err(vdev, "invalid number of MSIs reported by VFIO");
+			return -EINVAL;
+		}
 
-	if (!(vdev->irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
-		dev_err(vdev, "INTx interrupt not AUTOMASKED");
-		return -EINVAL;
+		ret = vfio_pci_init_msi(kvm, &vdev->pci);
+	} else {
+		int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
+
+		if (!(vdev->irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
+			dev_err(vdev, "INTx interrupt not AUTOMASKED");
+			return -EINVAL;
+		}
+
+		ret = vfio_pci_init_irqfd(kvm, vdev->fd, gsi);
 	}
 
-	return vfio_pci_init_irqfd(kvm, vdev->fd, gsi);
+	return ret;
 }
 
 int vfio_pci_setup_device(struct kvm *kvm, struct vfio_device *vdev)
@@ -357,9 +954,13 @@ int vfio_pci_setup_device(struct kvm *kvm, struct vfio_device *vdev)
 void vfio_pci_teardown_device(struct kvm *kvm, struct vfio_device *vdev)
 {
 	size_t i;
+	struct vfio_pci_device *pdev = &vdev->pci;
 
 	for (i = 0; i < vdev->info.num_regions; i++)
 		vfio_unmap_region(kvm, &vdev->regions[i]);
 
 	device__unregister(&vdev->dev_hdr);
+
+	free(pdev->msi.irq_set);
+	free(pdev->msi.entries);
 }
-- 
2.13.1

* [PATCH v2 kvmtool 08/10] vfio-pci: add MSI support
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (6 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 09/10] Introduce reserved memory regions Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 10/10] vfio: check reserved regions before mapping DMA Jean-Philippe Brucker
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

When a device has MSI capability but not MSI-X, use it.

This patch is untested. Consider it broken unless proven otherwise.
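
For reference, a quick sketch, not part of this patch, of how the MSI
Message Control fields used below decode, assuming the PCI_MSI_FLAGS_*
definitions from linux/pci_regs.h:

  #include <stdint.h>
  #include <linux/pci_regs.h>

  /* Multiple Message Capable: how many vectors the function supports. */
  static inline unsigned int msi_nr_capable(uint16_t ctrl)
  {
          return 1 << ((ctrl & PCI_MSI_FLAGS_QMASK) >> 1);
  }

  /* Multiple Message Enable: how many vectors the guest asked for. */
  static inline unsigned int msi_nr_enabled(uint16_t ctrl)
  {
          return 1 << ((ctrl & PCI_MSI_FLAGS_QSIZE) >> 4);
  }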

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/kvm/pci.h |  23 +++++++++
 vfio/pci.c        | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 44e5adff..c5fc8254 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -59,6 +59,29 @@ struct msix_cap {
 	u32 pba_offset;
 };
 
+struct msi_cap_64 {
+	u8 cap;
+	u8 next;
+	u16 ctrl;
+	u32 address_lo;
+	u32 address_hi;
+	u16 data;
+	u16 _align;
+	u32 mask_bits;
+	u32 pend_bits;
+};
+
+struct msi_cap_32 {
+	u8 cap;
+	u8 next;
+	u16 ctrl;
+	u32 address_lo;
+	u16 data;
+	u16 _align;
+	u32 mask_bits;
+	u32 pend_bits;
+};
+
 struct pci_cap_hdr {
 	u8	type;
 	u8	next;
diff --git a/vfio/pci.c b/vfio/pci.c
index a273fabb..9e45d30b 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -297,6 +297,112 @@ static void vfio_pci_msix_cap_write(struct kvm *kvm,
 	mutex_unlock(&pdev->msi.mutex);
 }
 
+static int vfio_pci_msi_vector_write(struct kvm *kvm, struct vfio_device *vdev,
+				     u8 off, u8 *data, u32 sz)
+{
+	size_t i;
+	u32 mask = 0;
+	size_t mask_pos, start, limit;
+	struct vfio_pci_msi_entry *entry;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct msi_cap_64 *msi_cap_64 = (void *)&pdev->hdr + pdev->msi.pos;
+
+	if (!(msi_cap_64->ctrl & PCI_MSI_FLAGS_MASKBIT))
+		return 0;
+
+	if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT)
+		mask_pos = PCI_MSI_MASK_64;
+	else
+		mask_pos = PCI_MSI_MASK_32;
+
+	if (off >= mask_pos + 4 || off + sz <= mask_pos)
+		return 0;
+
+	/* Set mask to current state */
+	for (i = 0; i < pdev->msi.nr_entries; i++) {
+		entry = &pdev->msi.entries[i];
+		mask |= !!msi_is_masked(entry->virt_state) << i;
+	}
+
+	/* Update mask following the intersection of access and register */
+	start = max_t(size_t, off, mask_pos);
+	limit = min_t(size_t, off + sz, mask_pos + 4);
+
+	memcpy((void *)&mask + start - mask_pos, data + start - off,
+	       limit - start);
+
+	/* Update states if necessary */
+	for (i = 0; i < pdev->msi.nr_entries; i++) {
+		bool masked = mask & (1 << i);
+
+		entry = &pdev->msi.entries[i];
+		if (masked != msi_is_masked(entry->virt_state)) {
+			msi_set_masked(entry->virt_state, masked);
+			vfio_pci_update_msi_entry(kvm, vdev, entry);
+		}
+	}
+
+	return 1;
+}
+
+static void vfio_pci_msi_cap_write(struct kvm *kvm, struct vfio_device *vdev,
+				   u8 off, u8 *data, u32 sz)
+{
+	u8 ctrl;
+	struct msi_msg msg;
+	size_t i, nr_vectors;
+	struct vfio_pci_msi_entry *entry;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct msi_cap_64 *msi_cap_64 = (void *)&pdev->hdr + pdev->msi.pos;
+
+	off -= pdev->msi.pos;
+
+	/* Check if the guest is trying to update mask bits */
+	if (vfio_pci_msi_vector_write(kvm, vdev, off, data, sz))
+		return;
+
+	/* Only modify routes when guest pokes the enable bit */
+	if (off > PCI_MSI_FLAGS || off + sz <= PCI_MSI_FLAGS)
+		return;
+
+	ctrl = *(u8 *)(data + PCI_MSI_FLAGS - off);
+
+	mutex_lock(&pdev->msi.mutex);
+
+	msi_set_enabled(pdev->msi.virt_state, ctrl & PCI_MSI_FLAGS_ENABLE);
+
+	if (!msi_is_enabled(pdev->msi.virt_state)) {
+		vfio_pci_disable_msis(kvm, vdev);
+		mutex_unlock(&pdev->msi.mutex);
+		return;
+	}
+
+	/* Create routes for the requested vectors */
+	nr_vectors = 1 << ((ctrl & PCI_MSI_FLAGS_QSIZE) >> 4);
+
+	msg.address_lo = msi_cap_64->address_lo;
+	if (msi_cap_64->ctrl & PCI_MSI_FLAGS_64BIT) {
+		msg.address_hi = msi_cap_64->address_hi;
+		msg.data = msi_cap_64->data;
+	} else {
+		struct msi_cap_32 *msi_cap_32 = (void *)msi_cap_64;
+		msg.address_hi = 0;
+		msg.data = msi_cap_32->data;
+	}
+
+	for (i = 0; i < nr_vectors; i++) {
+		entry = &pdev->msi.entries[i];
+		entry->config.msg = msg;
+		vfio_pci_update_msi_entry(kvm, vdev, entry);
+	}
+
+	/* Update the physical capability if necessary */
+	if (vfio_pci_enable_msis(kvm, vdev))
+		dev_err(vdev, "cannot enable MSIX");
+
+	mutex_unlock(&pdev->msi.mutex);
+}
+
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			      u8 offset, void *data, int sz)
 {
@@ -333,6 +439,9 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 
 	/* Handle MSI write now, since it might update the hardware capability */
 	switch (pdev->irq_type) {
+	case VFIO_PCI_IRQ_MSI:
+		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
+		break;
 	case VFIO_PCI_IRQ_MSIX:
 		vfio_pci_msix_cap_write(kvm, vdev, offset, data, sz);
 		break;
@@ -345,11 +454,25 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 			 sz, offset);
 }
 
+static ssize_t vfio_pci_msi_cap_size(struct msi_cap_64 *cap_hdr)
+{
+	size_t size = 10;
+
+	if (cap_hdr->ctrl & PCI_MSI_FLAGS_64BIT)
+		size += 4;
+	if (cap_hdr->ctrl & PCI_MSI_FLAGS_MASKBIT)
+		size += 10;
+
+	return size;
+}
+
 static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
 {
 	switch (cap_hdr->type) {
 	case PCI_CAP_ID_MSIX:
 		return PCI_CAP_MSIX_SIZEOF;
+	case PCI_CAP_ID_MSI:
+		return vfio_pci_msi_cap_size((void *)cap_hdr);
 	default:
 		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
 		return 0;
@@ -423,6 +546,7 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 
 		switch (cap.type) {
 		case PCI_CAP_ID_MSIX:
+		case PCI_CAP_ID_MSI:
 			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
 			if (ret) {
 				dev_warn(vdev, "failed to read capability structure %x",
@@ -431,7 +555,8 @@ static int vfio_pci_parse_caps(struct vfio_device *vdev)
 			}
 
 			pdev->msi.pos = pos;
-			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
+			pdev->irq_type = cap.type == PCI_CAP_ID_MSIX ?
+				VFIO_PCI_IRQ_MSIX : VFIO_PCI_IRQ_MSI;
 			break;
 
 			/* Any other capability is hidden */
@@ -646,6 +771,19 @@ out_free:
 	return ret;
 }
 
+static int vfio_pci_create_msi_cap(struct kvm *kvm, struct vfio_pci_device *pdev)
+{
+	struct msi_cap_64 *cap = (void *)&pdev->hdr + pdev->msi.pos;
+
+	pdev->msi.nr_entries = 1 << ((cap->ctrl & PCI_MSI_FLAGS_QMASK) >> 1);
+	pdev->msi.entries = calloc(pdev->msi.nr_entries,
+				   sizeof(struct vfio_pci_msi_entry));
+	if (!pdev->msi.entries)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 					  struct vfio_device *vdev)
 {
@@ -662,6 +800,9 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 	case VFIO_PCI_IRQ_MSIX:
 		ret = vfio_pci_create_msix_table(kvm, pdev);
 		break;
+	case VFIO_PCI_IRQ_MSI:
+		ret = vfio_pci_create_msi_cap(kvm, pdev);
+		break;
 	default:
 		break;
 	}
@@ -900,7 +1041,8 @@ static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device *vdev
 		return -EINVAL;
 	}
 
-	if (pdev->irq_type == VFIO_PCI_IRQ_MSIX) {
+	if (pdev->irq_type == VFIO_PCI_IRQ_MSIX ||
+	    pdev->irq_type == VFIO_PCI_IRQ_MSI) {
 		if (vdev->irq_info.count != pdev->msi.nr_entries) {
 			dev_err(vdev, "invalid number of MSIs reported by VFIO");
 			return -EINVAL;
-- 
2.13.1

* [PATCH v2 kvmtool 09/10] Introduce reserved memory regions
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (7 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 08/10] vfio-pci: add MSI support Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  2017-06-22 17:05 ` [PATCH v2 kvmtool 10/10] vfio: check reserved regions before mapping DMA Jean-Philippe Brucker
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

When passing devices to the guest, there might be address ranges
unavailable to the device. For instance, if address 0x10000000 corresponds
to an MSI doorbell, any transaction from a device to that address will be
directed to the MSI controller and might not even reach the IOMMU.
0x10000000 is therefore reserved by the physical IOMMU in the guest's
physical space.

This patch introduces a simple API to register reserved ranges of
addresses that should not or cannot be provided to the guest. For the
moment it only checks that a reserved range does not overlap any user
memory (we don't consider MMIO) and aborts otherwise.

It should be possible instead to poke holes in the guest-physical memory
map and report them via the architecture's preferred route:
* ARM and PowerPC can add reserved-memory nodes to the DT they provide to
  the guest.
* x86 could poke holes in the memory map reported with e820. This requires
  postponing creation of the memory map until at least VFIO is initialized.
* MIPS could describe the reserved ranges with the "memmap=mm$ss" kernel
  parameter.

This would also require calling KVM_SET_USER_MEMORY_REGION for all memory
regions at the end of kvmtool initialisation. Extra care should be taken
to ensure we don't break any architecture, since they currently rely on
having a linear address space with at most two memory blocks.

This patch doesn't implement any address space carving. If an abort is
encountered, the user can try rebuilding kvmtool with different addresses or
changing the IOMMU reserved regions if possible.
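
As a usage sketch (the doorbell address and the helper below are made up
for illustration; only kvm__reserve_mem() and the overlap check come from
this patch):

  #include "kvm/kvm.h"
  #include "kvm/util.h"

  /* Reserve a hypothetical 64kB MSI doorbell at 0x10000000 before RAM is
   * registered. Any later kvm__register_mem() call overlapping this range
   * now prints an error and returns -EINVAL instead of creating a slot. */
  static int reserve_msi_doorbell(struct kvm *kvm)
  {
          int ret = kvm__reserve_mem(kvm, 0x10000000, 0x10000);

          if (ret)
                  pr_err("cannot reserve MSI doorbell region");

          return ret;
  }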

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 include/kvm/kvm.h | 12 +++++++++-
 kvm.c             | 68 +++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 57061566..8546787a 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -35,10 +35,12 @@ enum {
 };
 
 enum kvm_mem_type {
+	KVM_MEM_TYPE_RESERVED	= 1 << 0,
 	KVM_MEM_TYPE_RAM	= 1 << 1,
 	KVM_MEM_TYPE_DEVICE	= 1 << 2,
 
-	KVM_MEM_TYPE_ALL	= KVM_MEM_TYPE_RAM
+	KVM_MEM_TYPE_ALL	= KVM_MEM_TYPE_RESERVED
+				| KVM_MEM_TYPE_RAM
 				| KVM_MEM_TYPE_DEVICE
 };
 
@@ -115,6 +117,12 @@ static inline int kvm__register_dev_mem(struct kvm *kvm, u64 guest_phys,
 				 KVM_MEM_TYPE_DEVICE);
 }
 
+static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
+{
+	return kvm__register_mem(kvm, guest_phys, size, NULL,
+				 KVM_MEM_TYPE_RESERVED);
+}
+
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
 			void *ptr);
@@ -150,6 +158,8 @@ static inline const char *kvm_mem_type_to_string(enum kvm_mem_type type)
 		return "RAM";
 	case KVM_MEM_TYPE_DEVICE:
 		return "device";
+	case KVM_MEM_TYPE_RESERVED:
+		return "reserved";
 	}
 
 	return "???";
diff --git a/kvm.c b/kvm.c
index 39ce88c4..9078a026 100644
--- a/kvm.c
+++ b/kvm.c
@@ -177,18 +177,55 @@ int kvm__exit(struct kvm *kvm)
 }
 core_exit(kvm__exit);
 
-/*
- * Note: KVM_SET_USER_MEMORY_REGION assumes that we don't pass overlapping
- * memory regions to it. Therefore, be careful if you use this function for
- * registering memory regions for emulating hardware.
- */
 int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 		      void *userspace_addr, enum kvm_mem_type type)
 {
 	struct kvm_userspace_memory_region mem;
+	struct kvm_mem_bank *merged = NULL;
 	struct kvm_mem_bank *bank;
 	int ret;
 
+	/* Check for overlap */
+	list_for_each_entry(bank, &kvm->mem_banks, list) {
+		u64 bank_end = bank->guest_phys_addr + bank->size - 1;
+		u64 end = guest_phys + size - 1;
+		if (guest_phys > bank_end || end < bank->guest_phys_addr)
+			continue;
+
+		/* Merge overlapping reserved regions */
+		if (bank->type == KVM_MEM_TYPE_RESERVED &&
+		    type == KVM_MEM_TYPE_RESERVED) {
+			bank->guest_phys_addr = min(bank->guest_phys_addr, guest_phys);
+			bank->size = max(bank_end, end) - bank->guest_phys_addr + 1;
+
+			if (merged) {
+				/*
+				 * This is at least the second merge, remove
+				 * previous result.
+				 */
+				list_del(&merged->list);
+				free(merged);
+			}
+
+			guest_phys = bank->guest_phys_addr;
+			size = bank->size;
+			merged = bank;
+
+			/* Keep checking that we don't overlap another region */
+			continue;
+		}
+
+		pr_err("%s region [%llx-%llx] would overlap %s region [%llx-%llx]",
+		       kvm_mem_type_to_string(type), guest_phys, guest_phys + size - 1,
+		       kvm_mem_type_to_string(bank->type), bank->guest_phys_addr,
+		       bank->guest_phys_addr + bank->size - 1);
+
+		return -EINVAL;
+	}
+
+	if (merged)
+		return 0;
+
 	bank = malloc(sizeof(*bank));
 	if (!bank)
 		return -ENOMEM;
@@ -199,18 +236,21 @@ int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size,
 	bank->size			= size;
 	bank->type			= type;
 
-	mem = (struct kvm_userspace_memory_region) {
-		.slot			= kvm->mem_slots++,
-		.guest_phys_addr	= guest_phys,
-		.memory_size		= size,
-		.userspace_addr		= (unsigned long)userspace_addr,
-	};
+	if (type != KVM_MEM_TYPE_RESERVED) {
+		mem = (struct kvm_userspace_memory_region) {
+			.slot			= kvm->mem_slots++,
+			.guest_phys_addr	= guest_phys,
+			.memory_size		= size,
+			.userspace_addr		= (unsigned long)userspace_addr,
+		};
 
-	ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
-	if (ret < 0)
-		return -errno;
+		ret = ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
+		if (ret < 0)
+			return -errno;
+	}
 
 	list_add(&bank->list, &kvm->mem_banks);
+
 	return 0;
 }
 
-- 
2.13.1

* [PATCH v2 kvmtool 10/10] vfio: check reserved regions before mapping DMA
  2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
                   ` (8 preceding siblings ...)
  2017-06-22 17:05 ` [PATCH v2 kvmtool 09/10] Introduce reserved memory regions Jean-Philippe Brucker
@ 2017-06-22 17:05 ` Jean-Philippe Brucker
  9 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-06-22 17:05 UTC (permalink / raw)
  To: kvm; +Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Use the new reserved_regions API to ensure that RAM doesn't overlap any
reserved region. This prevents, for instance, an MSI doorbell from being
mapped into the guest IPA space. For the moment we reject any overlap. In the
future, we might carve reserved regions out of the guest physical
space.
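
For reference, the reserved_regions file parsed below contains one range
per line in the form "<start> <end> <type>". Example contents (made-up
addresses):

  0x0000000008000000 0x00000000080fffff msi
  0x00000000e8000000 0x00000000e8ffffff reserved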

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
---
 vfio/core.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/vfio/core.c b/vfio/core.c
index 7e1ba789..10b67f6d 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -84,6 +84,37 @@ void vfio_unmap_region(struct kvm *kvm, struct vfio_region *region)
 	munmap(region->host_addr, region->info.size);
 }
 
+static int vfio_configure_reserved_regions(struct kvm *kvm,
+					   struct vfio_group *group)
+{
+	FILE *file;
+	int ret = 0;
+	char type[9];
+	char filename[PATH_MAX];
+	unsigned long long start, end;
+
+	snprintf(filename, PATH_MAX, IOMMU_GROUP_DIR "/%lu/reserved_regions",
+		 group->id);
+
+	/* reserved_regions might not be present on older systems */
+	if (access(filename, F_OK))
+		return 0;
+
+	file = fopen(filename, "r");
+	if (!file)
+		return -errno;
+
+	while (fscanf(file, "0x%llx 0x%llx %8s\n", &start, &end, type) == 3) {
+		ret = kvm__reserve_mem(kvm, start, end - start + 1);
+		if (ret)
+			break;
+	}
+
+	fclose(file);
+
+	return ret;
+}
+
 static int vfio_configure_device(struct kvm *kvm, struct vfio_group *group,
 				 const char *dirpath, const char *name)
 {
@@ -196,6 +227,10 @@ static int vfio_configure_iommu_groups(struct kvm *kvm)
 				return ret;
 		}
 
+		ret = vfio_configure_reserved_regions(kvm, group);
+		if (ret)
+			return ret;
+
 		if (closedir(dir))
 			pr_warning("Failed to close IOMMU group %s", dirpath);
 	}
-- 
2.13.1

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-06-22 17:05 ` [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support Jean-Philippe Brucker
@ 2017-07-31 17:49   ` Punit Agrawal
  2017-08-01 16:04     ` Punit Agrawal
  2017-08-18 17:42   ` Jean-Philippe Brucker
  1 sibling, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-07-31 17:49 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Hi Jean-Philippe,

I ran into an issue while testing these series on a Seattle based
system. More below...

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
> the kernel inject MSIs from a physical device directly into the guest.
>
> It would be tempting to create the MSI routes at init time before starting
> vCPUs, when we can afford to exit gracefully. But some of it must be
> initialized when the guest requests it.
>
> * On the KVM side, MSIs must be enabled after devices allocate their IRQ
>   lines and irqchips are operational, which can happen until late_init.
>
> * On the VFIO side, hardware state of devices may be updated when setting
>   up MSIs. For example, when passing a virtio-pci-legacy device to the
>   guest:
>
>   (1) The device-specific configuration layout (in BAR0) depends on
>       whether MSIs are enabled or not in the device. If they are enabled,
>       the device-specific configuration starts at offset 24, otherwise it
>       starts at offset 20.
>   (2) Linux guest assumes that MSIs are initially disabled (doesn't
>       actually check the capability). So it reads the device config at
>       offset 20.
>   (3) Had we enabled MSIs early, host would have enabled the MSI-X
>       capability and device would return the config at offset 24.
>   (4) The guest would read junk and explode.
>
> Therefore we have to create MSI-X routes when the guest requests MSIs, and
> enable/disable them in VFIO when the guest pokes the MSI-X capability. We
> have to follow both physical and virtual state of the capability, which
> makes the state machine a bit complex, but I think it works.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  include/kvm/vfio.h |  54 +++++
>  vfio/pci.c         | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 664 insertions(+), 9 deletions(-)
>

[...]

> +static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
> +{
> +	switch (cap_hdr->type) {
> +	case PCI_CAP_ID_MSIX:
> +		return PCI_CAP_MSIX_SIZEOF;
> +	default:
> +		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
> +		return 0;
> +	}
> +}
> +
> +/*
> + * Copy capability from physical header into virtual header, and add it to the
> + * virtual capability list.
> + *
> + * @fd_offset: offset of pci header into vfio device fd
> + * @pos: offset of capability from start of header
> + */
> +static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr,
> +			    off_t fd_offset, off_t pos)
> +{
> +	int i;
> +	ssize_t size = vfio_pci_cap_size(cap_hdr);
> +	struct pci_device_header *hdr = &vdev->pci.hdr;
> +	struct pci_cap_hdr *out = (void *)hdr + pos;
> +
> +	if (pread(vdev->fd, out, size, fd_offset + pos) != size)
> +		return -errno;
> +
> +	out->next = 0;
> +
> +	if (!hdr->capabilities) {
> +		hdr->capabilities = pos;
> +		hdr->status |= PCI_STATUS_CAP_LIST;
> +	} else {
> +		/* Add cap at end of list */
> +		struct pci_cap_hdr *last;
> +
> +		pci_for_each_cap(i, last, hdr)
> +			;
> +		last->next = pos;

Setting the next capability is somehow overwriting the vendor id in the
virtual header. This leads to the device not being probed by the
appropriate driver.

I did a quick hack to prevent the vendor_id override and was able to
proceed with probing the device. But there are still some issues with
setting up the capabilities.

Can you take a look to see what's going wrong?

Thanks,
Punit

> +	}
> +
> +	return 0;
> +}
> +
>  static int vfio_pci_parse_caps(struct vfio_device *vdev)
>  {
> +	u8 pos;
> +	int ret;
> +	struct pci_cap_hdr cap;
> +	ssize_t sz = sizeof(cap);
> +	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev = &vdev->pci;
>  
>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>  		return 0;
>  
> +	pos = pdev->hdr.capabilities & ~3;
> +	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
> +
>  	pdev->hdr.status &= ~PCI_STATUS_CAP_LIST;
>  	pdev->hdr.capabilities = 0;
>  
> -	/* TODO: install virtual capabilities */
> +	for (; pos; pos = cap.next) {
> +		if (pos >= PCI_DEV_CFG_SIZE) {
> +			dev_warn(vdev, "ignoring cap outside of config space");
> +			return -EINVAL;
> +		}
> +
> +		if (pread(vdev->fd, &cap, sz, info->offset + pos) != sz) {
> +			dev_warn(vdev, "failed to read from capabilities pointer (0x%x)",
> +				 pos);
> +			return -EINVAL;
> +		}
> +
> +		switch (cap.type) {
> +		case PCI_CAP_ID_MSIX:
> +			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
> +			if (ret) {
> +				dev_warn(vdev, "failed to read capability structure %x",
> +					 cap.type);
> +				return ret;
> +			}
> +
> +			pdev->msi.pos = pos;
> +			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
> +			break;
> +
> +			/* Any other capability is hidden */
> +		}
> +	}
>  
>  	return 0;
>  }

[...]

* Re: [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO
  2017-06-22 17:05 ` [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO Jean-Philippe Brucker
@ 2017-07-31 17:52   ` Punit Agrawal
  2017-08-02 15:17     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-07-31 17:52 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Hi Jean-Philippe,

A couple of queries below -

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> Assigning devices using VFIO allows the guest to have direct access to the
> device, whilst filtering accesses to sensitive areas by trapping config
> space accesses and mapping DMA with an IOMMU.
>
> This patch adds a new option to lkvm run: --vfio-group=<group_number>.
> Before assigning a device to a VM, some preparation is required. As
> described in Linux Documentation/vfio.txt, the device driver need to be

Nitpick: "needs"

> changed to vfio-pci:
>
>   $ dev=0000:00:00.0
>
>   $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
>   $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
>   $ echo $dev > /sys/bus/pci/drivers_probe
>   $ readlink /sys/bus/pci/devices/$dev/iommu_group
>   ../../../kernel/iommu_groups/5
>
> Adding --vfio[-group]=5 to lkvm-run will pass the device to the guest.
> Multiple groups can be passed to the guest by adding more --vfio
> parameters.
>
> This patch only implements PCI with INTx. MSI-X routing will be added in a
> subsequent patch, and at some point we might add support for passing
> platform devices to guests.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  Makefile                 |   2 +
>  arm/pci.c                |   1 +
>  builtin-run.c            |   5 +
>  include/kvm/kvm-config.h |   3 +
>  include/kvm/pci.h        |   3 +-
>  include/kvm/vfio.h       |  57 +++++++
>  vfio/core.c              | 395 +++++++++++++++++++++++++++++++++++++++++++++++
>  vfio/pci.c               | 365 +++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 830 insertions(+), 1 deletion(-)
>  create mode 100644 include/kvm/vfio.h
>  create mode 100644 vfio/core.c
>  create mode 100644 vfio/pci.c
>

[...]

> +static int vfio_container_init(struct kvm *kvm)
> +{
> +	int api, i, ret, iommu_type;;
> +
> +	/* Create a container for our IOMMU groups */
> +	vfio_container = open(VFIO_DEV_NODE, O_RDWR);
> +	if (vfio_container == -1) {
> +		ret = errno;
> +		pr_err("Failed to open %s", VFIO_DEV_NODE);
> +		return ret;
> +	}
> +
> +	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
> +	if (api != VFIO_API_VERSION) {
> +		pr_err("Unknown VFIO API version %d", api);
> +		return -ENODEV;
> +	}

We are using the VFIO_API_VERSION pulled in from linux/vfio.h. Will
kvmtool be in trouble if for some reason the API version is incompatibly
incremented in the future?

> +
> +	iommu_type = vfio_get_iommu_type();
> +	if (iommu_type < 0) {
> +		pr_err("VFIO type-1 IOMMU not supported on this platform");
> +		return iommu_type;
> +	}
> +
> +	/* Sanity check our groups and add them to the container */
> +	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
> +		ret = vfio_group_init(kvm, &kvm->cfg.vfio_group[i]);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	/* Finalise the container */
> +	if (ioctl(vfio_container, VFIO_SET_IOMMU, iommu_type)) {
> +		ret = -errno;
> +		pr_err("Failed to set IOMMU type %d for VFIO container",
> +		       iommu_type);
> +		return ret;

Just checking - is there a need to remove the groups from the containers
(added by VFIO_GROUP_SET_CONTAINER in vfio_group_init()) before erroring
out here? I imagine freeing of the vfio_container when kvmtool is
cleaning up after itself will do the right thing.

Thanks,
Punit

> +	} else {
> +		pr_info("Using IOMMU type %d for VFIO container", iommu_type);
> +	}
> +
> +	return kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, vfio_map_mem_bank,
> +				      NULL);
> +}
> +
> +static int vfio__init(struct kvm *kvm)
> +{
> +	int ret;
> +
> +	if (!kvm->cfg.num_vfio_groups)
> +		return 0;
> +
> +	ret = vfio_container_init(kvm);
> +	if (ret)
> +		return ret;
> +
> +	ret = vfio_configure_iommu_groups(kvm);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +dev_base_init(vfio__init);
> +
> +static int vfio__exit(struct kvm *kvm)
> +{
> +	int i;
> +
> +	if (!kvm->cfg.num_vfio_groups)
> +		return 0;
> +
> +	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i)
> +		vfio_group_exit(kvm, &kvm->cfg.vfio_group[i]);
> +
> +	kvm__for_each_mem_bank(kvm, KVM_MEM_TYPE_RAM, vfio_unmap_mem_bank, NULL);
> +	return close(vfio_container);
> +}
> +dev_base_exit(vfio__exit);

[...]

* Re: [PATCH v2 kvmtool 03/10] irq: add irqfd helpers
  2017-06-22 17:05 ` [PATCH v2 kvmtool 03/10] irq: add irqfd helpers Jean-Philippe Brucker
@ 2017-07-31 17:55   ` Punit Agrawal
  2017-08-02 15:17     ` Jean-Philippe Brucker
  0 siblings, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-07-31 17:55 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> Add helpers to add and remove IRQFD routing for both irqchips and MSIs.
> We have to make a special case of IRQ lines on ARM where the
> initialisation order goes like this:
>
>  (1) Devices reserve their IRQ lines
>  (2) VGIC is setup with VGIC_CTRL_INIT (in a late_init call)
>  (3) MSIs are reserved lazily, when the guest needs them
>
> Since we cannot setup IRQFD before (2), store the IRQFD routing for IRQ
> lines temporarily until we're ready to submit them.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
>  arm/gic.c                    | 74 +++++++++++++++++++++++++++++++++++++++++++-
>  arm/include/arm-common/gic.h |  6 ++++
>  hw/pci-shmem.c               |  8 +----
>  include/kvm/irq.h            | 17 ++++++++++
>  irq.c                        | 24 ++++++++++++++
>  virtio/net.c                 |  9 ++----
>  virtio/scsi.c                | 10 ++----
>  7 files changed, 126 insertions(+), 22 deletions(-)
>

[...]


> diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
> index 433dd237..0c279aca 100644
> --- a/arm/include/arm-common/gic.h
> +++ b/arm/include/arm-common/gic.h
> @@ -33,4 +33,10 @@ int gic__alloc_irqnum(void);
>  int gic__create(struct kvm *kvm, enum irqchip_type type);
>  void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type);
>  
> +int gic__add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
> +		   int resample_fd);
> +void gic__del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd);
> +#define irq__add_irqfd gic__add_irqfd
> +#define irq__del_irqfd gic__del_irqfd
> +
>  #endif /* ARM_COMMON__GIC_H */
> diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
> index 512b5b06..107043e9 100644
> --- a/hw/pci-shmem.c
> +++ b/hw/pci-shmem.c
> @@ -127,7 +127,6 @@ static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len
>  int pci_shmem__get_local_irqfd(struct kvm *kvm)
>  {
>  	int fd, gsi, r;
> -	struct kvm_irqfd irqfd;
>  
>  	if (local_fd == 0) {
>  		fd = eventfd(0, 0);
> @@ -143,12 +142,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
>  			gsi = pci_shmem_pci_device.irq_line;
>  		}
>  
> -		irqfd = (struct kvm_irqfd) {
> -			.fd = fd,
> -			.gsi = gsi,
> -		};
> -
> -		r = ioctl(kvm->vm_fd, KVM_IRQFD, &irqfd);
> +		r = irq__add_irqfd(kvm, gsi, fd, -1);
>  		if (r < 0)
>  			return r;
>  

Not something you've introduced but I couldn't find any users of
pci_shmem__get_local_irqfd(). Could this function be dropped - either as
a pre-patch or separately?

[...]

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-07-31 17:49   ` Punit Agrawal
@ 2017-08-01 16:04     ` Punit Agrawal
  2017-08-02 15:18       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-08-01 16:04 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Punit Agrawal <punit.agrawal@arm.com> writes:

> Hi Jean-Philippe,
>
> I ran into an issue while testing these series on a Seattle based
> system. More below...
>
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
>
>> Add virtual MSI-X tables for PCI devices, and create IRQFD routes to let
>> the kernel inject MSIs from a physical device directly into the guest.
>>
>> It would be tempting to create the MSI routes at init time before starting
>> vCPUs, when we can afford to exit gracefully. But some of it must be
>> initialized when the guest requests it.
>>
>> * On the KVM side, MSIs must be enabled after devices allocate their IRQ
>>   lines and irqchips are operational, which can happen until late_init.
>>
>> * On the VFIO side, hardware state of devices may be updated when setting
>>   up MSIs. For example, when passing a virtio-pci-legacy device to the
>>   guest:
>>
>>   (1) The device-specific configuration layout (in BAR0) depends on
>>       whether MSIs are enabled or not in the device. If they are enabled,
>>       the device-specific configuration starts at offset 24, otherwise it
>>       starts at offset 20.
>>   (2) Linux guest assumes that MSIs are initially disabled (doesn't
>>       actually check the capability). So it reads the device config at
>>       offset 20.
>>   (3) Had we enabled MSIs early, host would have enabled the MSI-X
>>       capability and device would return the config at offset 24.
>>   (4) The guest would read junk and explode.
>>
>> Therefore we have to create MSI-X routes when the guest requests MSIs, and
>> enable/disable them in VFIO when the guest pokes the MSI-X capability. We
>> have to follow both physical and virtual state of the capability, which
>> makes the state machine a bit complex, but I think it works.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>>  include/kvm/vfio.h |  54 +++++
>>  vfio/pci.c         | 619 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 664 insertions(+), 9 deletions(-)
>>
>
> [...]
>
>> +static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
>> +{
>> +	switch (cap_hdr->type) {
>> +	case PCI_CAP_ID_MSIX:
>> +		return PCI_CAP_MSIX_SIZEOF;
>> +	default:
>> +		pr_err("unknown PCI capability 0x%x", cap_hdr->type);
>> +		return 0;
>> +	}
>> +}
>> +
>> +/*
>> + * Copy capability from physical header into virtual header, and add it to the
>> + * virtual capability list.
>> + *
>> + * @fd_offset: offset of pci header into vfio device fd
>> + * @pos: offset of capability from start of header
>> + */
>> +static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr,
>> +			    off_t fd_offset, off_t pos)
>> +{
>> +	int i;
>> +	ssize_t size = vfio_pci_cap_size(cap_hdr);
>> +	struct pci_device_header *hdr = &vdev->pci.hdr;
>> +	struct pci_cap_hdr *out = (void *)hdr + pos;
>> +
>> +	if (pread(vdev->fd, out, size, fd_offset + pos) != size)
>> +		return -errno;
>> +
>> +	out->next = 0;
>> +
>> +	if (!hdr->capabilities) {
>> +		hdr->capabilities = pos;
>> +		hdr->status |= PCI_STATUS_CAP_LIST;
>> +	} else {
>> +		/* Add cap at end of list */
>> +		struct pci_cap_hdr *last;
>> +
>> +		pci_for_each_cap(i, last, hdr)
>> +			;
>> +		last->next = pos;
>
> Setting the next capability is somehow overwriting the vendor id in the
> virtual header. This leads to the device not being probed by the
> appropriate driver.
>
> I did a quick hack to prevent the vendor_id override and was able to
> proceed with probing the device. But there are still some issues with
> setting up the capabilities.
>
> Can you take a look to see what's going wrong?

I think I've figured out what's going wrong. pci_for_each_cap is defined
as - 

#define pci_for_each_cap(pos, cap, hdr)                         \
        for ((pos) = (hdr)->capabilities & ~3;                  \
             (cap) = (void *)(hdr) + (pos), (pos) != 0;         \
             (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)

Here cap is being assigned before the pos != 0 check. So in the last
iteration through the loop, cap will point to the start of the PCI
header - which is then used to set "last->next = pos". Below change got
me the right vendor id and the driver was able to probe the device. But
I don't seem to be getting any interrupts. I'll have a look to see if I
can figure out what's going wrong.

diff --git a/vfio/pci.c b/vfio/pci.c
index 9e45d30..ad03325 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -489,7 +489,6 @@ static ssize_t vfio_pci_cap_size(struct pci_cap_hdr *cap_hdr)
 static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hdr,
 			    off_t fd_offset, off_t pos)
 {
-	int i;
 	ssize_t size = vfio_pci_cap_size(cap_hdr);
 	struct pci_device_header *hdr = &vdev->pci.hdr;
 	struct pci_cap_hdr *out = (void *)hdr + pos;
@@ -505,9 +504,12 @@ static int vfio_pci_add_cap(struct vfio_device *vdev, struct pci_cap_hdr *cap_hd
 	} else {
 		/* Add cap at end of list */
 		struct pci_cap_hdr *last;
+		int i = hdr->capabilities & ~3;
 
-		pci_for_each_cap(i, last, hdr)
-			;
+		do {
+			last = (void *)hdr + i;
+			i = last->next;
+		} while (i);
 		last->next = pos;
 	}

>
> Thanks,
> Punit
>
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  static int vfio_pci_parse_caps(struct vfio_device *vdev)
>>  {
>> +	u8 pos;
>> +	int ret;
>> +	struct pci_cap_hdr cap;
>> +	ssize_t sz = sizeof(cap);
>> +	struct vfio_region_info *info;
>>  	struct vfio_pci_device *pdev = &vdev->pci;
>>  
>>  	if (!(pdev->hdr.status & PCI_STATUS_CAP_LIST))
>>  		return 0;
>>  
>> +	pos = pdev->hdr.capabilities & ~3;
>> +	info = &vdev->regions[VFIO_PCI_CONFIG_REGION_INDEX].info;
>> +
>>  	pdev->hdr.status &= ~PCI_STATUS_CAP_LIST;
>>  	pdev->hdr.capabilities = 0;
>>  
>> -	/* TODO: install virtual capabilities */
>> +	for (; pos; pos = cap.next) {
>> +		if (pos >= PCI_DEV_CFG_SIZE) {
>> +			dev_warn(vdev, "ignoring cap outside of config space");
>> +			return -EINVAL;
>> +		}
>> +
>> +		if (pread(vdev->fd, &cap, sz, info->offset + pos) != sz) {
>> +			dev_warn(vdev, "failed to read from capabilities pointer (0x%x)",
>> +				 pos);
>> +			return -EINVAL;
>> +		}
>> +
>> +		switch (cap.type) {
>> +		case PCI_CAP_ID_MSIX:
>> +			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
>> +			if (ret) {
>> +				dev_warn(vdev, "failed to read capability structure %x",
>> +					 cap.type);
>> +				return ret;
>> +			}
>> +
>> +			pdev->msi.pos = pos;
>> +			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
>> +			break;
>> +
>> +			/* Any other capability is hidden */
>> +		}
>> +	}
>>  
>>  	return 0;
>>  }
>
> [...]

* Re: [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO
  2017-07-31 17:52   ` Punit Agrawal
@ 2017-08-02 15:17     ` Jean-Philippe Brucker
  2017-08-03  9:36       ` Punit Agrawal
  0 siblings, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-02 15:17 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Hi Punit,

Thanks for reviewing and testing

On 31/07/17 18:52, Punit Agrawal wrote:
> Hi Jean-Philippe,
> 
> A couple of queries below -
> 
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
> 
>> Assigning devices using VFIO allows the guest to have direct access to the
>> device, whilst filtering accesses to sensitive areas by trapping config
>> space accesses and mapping DMA with an IOMMU.
>>
>> This patch adds a new option to lkvm run: --vfio-group=<group_number>.
>> Before assigning a device to a VM, some preparation is required. As
>> described in Linux Documentation/vfio.txt, the device driver need to be
> 
> Nitpick: "needs"
> 
>> changed to vfio-pci:
>>
>>   $ dev=0000:00:00.0
>>
>>   $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
>>   $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
>>   $ echo $dev > /sys/bus/pci/drivers_probe
>>   $ readlink /sys/bus/pci/devices/$dev/iommu_group
>>   ../../../kernel/iommu_groups/5
>>
>> Adding --vfio[-group]=5 to lkvm-run will pass the device to the guest.
>> Multiple groups can be passed to the guest by adding more --vfio
>> parameters.
>>
>> This patch only implements PCI with INTx. MSI-X routing will be added in a
>> subsequent patch, and at some point we might add support for passing
>> platform devices to guests.
>>
>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>>  Makefile                 |   2 +
>>  arm/pci.c                |   1 +
>>  builtin-run.c            |   5 +
>>  include/kvm/kvm-config.h |   3 +
>>  include/kvm/pci.h        |   3 +-
>>  include/kvm/vfio.h       |  57 +++++++
>>  vfio/core.c              | 395 +++++++++++++++++++++++++++++++++++++++++++++++
>>  vfio/pci.c               | 365 +++++++++++++++++++++++++++++++++++++++++++
>>  8 files changed, 830 insertions(+), 1 deletion(-)
>>  create mode 100644 include/kvm/vfio.h
>>  create mode 100644 vfio/core.c
>>  create mode 100644 vfio/pci.c
>>
> 
> [...]
> 
>> +static int vfio_container_init(struct kvm *kvm)
>> +{
>> +	int api, i, ret, iommu_type;;
>> +
>> +	/* Create a container for our IOMMU groups */
>> +	vfio_container = open(VFIO_DEV_NODE, O_RDWR);
>> +	if (vfio_container == -1) {
>> +		ret = errno;
>> +		pr_err("Failed to open %s", VFIO_DEV_NODE);
>> +		return ret;
>> +	}
>> +
>> +	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
>> +	if (api != VFIO_API_VERSION) {
>> +		pr_err("Unknown VFIO API version %d", api);
>> +		return -ENODEV;
>> +	}
> 
> We are using the VFIO_API_VERSION pulled in from linux/vfio.h. Will
> kvmtool be in trouble if for some reason the API version is incompatibly
> incremented in the future?

There are no precedents for this, as the VFIO version is still 0. Additions
to the API are made in a backward-compatible way (for example by adding new
flags in the info structures and new fields at the end). If something
significant enough happened that required increasing the version number,
the next version would probably be incompatible, and kvmtool would have to
support it separately.
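
As an illustration of that convention (this is not code from this series,
just a rough sketch): new info fields are guarded by flags, so userspace
only trusts what the kernel explicitly declares valid, for example:

	struct vfio_iommu_type1_info info = {
		.argsz = sizeof(info),
	};

	/* Only meaningful once VFIO_SET_IOMMU has been done on the container */
	if (!ioctl(vfio_container, VFIO_IOMMU_GET_INFO, &info) &&
	    (info.flags & VFIO_IOMMU_INFO_PGSIZES))
		pr_info("supported IOMMU page sizes: 0x%llx",
			(unsigned long long)info.iova_pgsizes);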

>> +
>> +	iommu_type = vfio_get_iommu_type();
>> +	if (iommu_type < 0) {
>> +		pr_err("VFIO type-1 IOMMU not supported on this platform");
>> +		return iommu_type;
>> +	}
>> +
>> +	/* Sanity check our groups and add them to the container */
>> +	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
>> +		ret = vfio_group_init(kvm, &kvm->cfg.vfio_group[i]);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	/* Finalise the container */
>> +	if (ioctl(vfio_container, VFIO_SET_IOMMU, iommu_type)) {
>> +		ret = -errno;
>> +		pr_err("Failed to set IOMMU type %d for VFIO container",
>> +		       iommu_type);
>> +		return ret;
> 
> Just checking - is there a need to remove the groups from the containers
> (added by VFIO_GROUP_SET_CONTAINER in vfio_group_init()) before erroring
> out here? I imagine freeing of the vfio_container when kvmtool is
> cleaning up after itself will do the right thing.

Yes, VFIO cleans everything up when the container and group file
descriptors are released.

Thanks,
Jean

* Re: [PATCH v2 kvmtool 03/10] irq: add irqfd helpers
  2017-07-31 17:55   ` Punit Agrawal
@ 2017-08-02 15:17     ` Jean-Philippe Brucker
  0 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-02 15:17 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

On 31/07/17 18:55, Punit Agrawal wrote:
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
> 
>> Add helpers to add and remove IRQFD routing for both irqchips and MSIs.
>> We have to make a special case of IRQ lines on ARM where the
>> initialisation order goes like this:
>>
>>  (1) Devices reserve their IRQ lines
>>  (2) VGIC is setup with VGIC_CTRL_INIT (in a late_init call)
>>  (3) MSIs are reserved lazily, when the guest needs them
>>
>> Since we cannot setup IRQFD before (2), store the IRQFD routing for IRQ
>> lines temporarily until we're ready to submit them.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>>  arm/gic.c                    | 74 +++++++++++++++++++++++++++++++++++++++++++-
>>  arm/include/arm-common/gic.h |  6 ++++
>>  hw/pci-shmem.c               |  8 +----
>>  include/kvm/irq.h            | 17 ++++++++++
>>  irq.c                        | 24 ++++++++++++++
>>  virtio/net.c                 |  9 ++----
>>  virtio/scsi.c                | 10 ++----
>>  7 files changed, 126 insertions(+), 22 deletions(-)
>>
> 
> [...]
> 
> 
>> diff --git a/arm/include/arm-common/gic.h b/arm/include/arm-common/gic.h
>> index 433dd237..0c279aca 100644
>> --- a/arm/include/arm-common/gic.h
>> +++ b/arm/include/arm-common/gic.h
>> @@ -33,4 +33,10 @@ int gic__alloc_irqnum(void);
>>  int gic__create(struct kvm *kvm, enum irqchip_type type);
>>  void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type);
>>  
>> +int gic__add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
>> +		   int resample_fd);
>> +void gic__del_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd);
>> +#define irq__add_irqfd gic__add_irqfd
>> +#define irq__del_irqfd gic__del_irqfd
>> +
>>  #endif /* ARM_COMMON__GIC_H */
>> diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
>> index 512b5b06..107043e9 100644
>> --- a/hw/pci-shmem.c
>> +++ b/hw/pci-shmem.c
>> @@ -127,7 +127,6 @@ static void callback_mmio_msix(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len
>>  int pci_shmem__get_local_irqfd(struct kvm *kvm)
>>  {
>>  	int fd, gsi, r;
>> -	struct kvm_irqfd irqfd;
>>  
>>  	if (local_fd == 0) {
>>  		fd = eventfd(0, 0);
>> @@ -143,12 +142,7 @@ int pci_shmem__get_local_irqfd(struct kvm *kvm)
>>  			gsi = pci_shmem_pci_device.irq_line;
>>  		}
>>  
>> -		irqfd = (struct kvm_irqfd) {
>> -			.fd = fd,
>> -			.gsi = gsi,
>> -		};
>> -
>> -		r = ioctl(kvm->vm_fd, KVM_IRQFD, &irqfd);
>> +		r = irq__add_irqfd(kvm, gsi, fd, -1);
>>  		if (r < 0)
>>  			return r;
>>  
> 
> Not something you've introduced but I couldn't find any users of
> pci_shmem__get_local_irqfd(). Could this function be dropped - either as
> a pre-patch or separately?

Heh, indeed. The situation is the same for pci_shmem__add/remove_client. I
suppose it's an interface for another device that has never been added.

Cleaning up pci_shmem is another task altogether. It's not used and there
is no driver upstream. It should probably be removed altogether [1], but I
have enough patches in the pipe right now :)

Thanks,
Jean

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2015-June/015270.html

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-08-01 16:04     ` Punit Agrawal
@ 2017-08-02 15:18       ` Jean-Philippe Brucker
  2017-08-03 10:25         ` Punit Agrawal
  0 siblings, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-02 15:18 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

On 01/08/17 17:04, Punit Agrawal wrote:
[...]
> I think I've figured out what's going wrong. pci_for_each_cap is defined
> as - 
> 
> #define pci_for_each_cap(pos, cap, hdr)                         \
>         for ((pos) = (hdr)->capabilities & ~3;                  \
>              (cap) = (void *)(hdr) + (pos), (pos) != 0;         \
>              (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
> 
> Here cap is being assigned before the pos != 0 check. So in the last
> iteration through the loop, cap will point to the start of the PCI
> header - which is then used to set "last->next = pos".

Ah right, sorry about that. How about moving the pos check in front? Does
the following work for you?

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index c5fc8254..37a65956 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -142,9 +142,9 @@ struct pci_device_header {
 	enum irq_type	irq_type;
 };
 
-#define pci_for_each_cap(pos, cap, hdr)				\
-	for ((pos) = (hdr)->capabilities & ~3;			\
-	     (cap) = (void *)(hdr) + (pos), (pos) != 0;		\
+#define pci_for_each_cap(pos, cap, hdr)					\
+	for ((pos) = (hdr)->capabilities & ~3;				\
+	     (pos) != 0 && ({ (cap) = (void *)(hdr) + (pos); 1; });	\
 	     (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
 
 int pci__init(struct kvm *kvm);

* Re: [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO
  2017-08-02 15:17     ` Jean-Philippe Brucker
@ 2017-08-03  9:36       ` Punit Agrawal
  2017-08-03 11:24         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-08-03  9:36 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> Hi Punit,
>
> Thanks for reviewing and testing
>
> On 31/07/17 18:52, Punit Agrawal wrote:
>> Hi Jean-Philippe,
>> 
>> A couple of queries below -
>> 
>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
>> 
>>> Assigning devices using VFIO allows the guest to have direct access to the
>>> device, whilst filtering accesses to sensitive areas by trapping config
>>> space accesses and mapping DMA with an IOMMU.
>>>
>>> This patch adds a new option to lkvm run: --vfio-group=<group_number>.
>>> Before assigning a device to a VM, some preparation is required. As
>>> described in Linux Documentation/vfio.txt, the device driver need to be
>> 
>> Nitpick: "needs"
>> 
>>> changed to vfio-pci:
>>>
>>>   $ dev=0000:00:00.0
>>>
>>>   $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
>>>   $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
>>>   $ echo $dev > /sys/bus/pci/drivers_probe
>>>   $ readlink /sys/bus/pci/devices/$dev/iommu_group
>>>   ../../../kernel/iommu_groups/5
>>>
>>> Adding --vfio[-group]=5 to lkvm-run will pass the device to the guest.
>>> Multiple groups can be passed to the guest by adding more --vfio
>>> parameters.
>>>
>>> This patch only implements PCI with INTx. MSI-X routing will be added in a
>>> subsequent patch, and at some point we might add support for passing
>>> platform devices to guests.
>>>
>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>> ---
>>>  Makefile                 |   2 +
>>>  arm/pci.c                |   1 +
>>>  builtin-run.c            |   5 +
>>>  include/kvm/kvm-config.h |   3 +
>>>  include/kvm/pci.h        |   3 +-
>>>  include/kvm/vfio.h       |  57 +++++++
>>>  vfio/core.c              | 395 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  vfio/pci.c               | 365 +++++++++++++++++++++++++++++++++++++++++++
>>>  8 files changed, 830 insertions(+), 1 deletion(-)
>>>  create mode 100644 include/kvm/vfio.h
>>>  create mode 100644 vfio/core.c
>>>  create mode 100644 vfio/pci.c
>>>
>> 
>> [...]
>> 
>>> +static int vfio_container_init(struct kvm *kvm)
>>> +{
>>> +	int api, i, ret, iommu_type;;
>>> +
>>> +	/* Create a container for our IOMMU groups */
>>> +	vfio_container = open(VFIO_DEV_NODE, O_RDWR);
>>> +	if (vfio_container == -1) {
>>> +		ret = errno;
>>> +		pr_err("Failed to open %s", VFIO_DEV_NODE);
>>> +		return ret;
>>> +	}
>>> +
>>> +	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
>>> +	if (api != VFIO_API_VERSION) {
>>> +		pr_err("Unknown VFIO API version %d", api);
>>> +		return -ENODEV;
>>> +	}
>> 
>> We are using the VFIO_API_VERSION pulled in from linux/vfio.h. Will
>> kvmtool be in trouble if for some reason the API version is incompatibly
>> incremented in the future?
>
> There are no precedents for this, as the VFIO version is still 0. Additions
> to the API are made in a backward-compatible way (for example by adding new
> flags in the info structures and new fields at the end). If something
> significant enough happened that required increasing the version number,
> the next version would probably be incompatible, and kvmtool would have to
> support it separately.

In that case, we should use a local version number macro for the
comparison, so that if the VFIO API ever changes, kvmtool explicitly
complains instead of silently dealing with an unknown version.
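
Something along these lines, for instance (just a sketch, the macro name
is made up):

#define VFIO_EXPECTED_API_VERSION	0

	/* Refuse to run against an API version kvmtool wasn't written for */
	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
	if (api != VFIO_EXPECTED_API_VERSION) {
		pr_err("Unsupported VFIO API version %d (expected %d)",
		       api, VFIO_EXPECTED_API_VERSION);
		return -ENODEV;
	}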

>
>>> +
>>> +	iommu_type = vfio_get_iommu_type();
>>> +	if (iommu_type < 0) {
>>> +		pr_err("VFIO type-1 IOMMU not supported on this platform");
>>> +		return iommu_type;
>>> +	}
>>> +
>>> +	/* Sanity check our groups and add them to the container */
>>> +	for (i = 0; i < kvm->cfg.num_vfio_groups; ++i) {
>>> +		ret = vfio_group_init(kvm, &kvm->cfg.vfio_group[i]);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	/* Finalise the container */
>>> +	if (ioctl(vfio_container, VFIO_SET_IOMMU, iommu_type)) {
>>> +		ret = -errno;
>>> +		pr_err("Failed to set IOMMU type %d for VFIO container",
>>> +		       iommu_type);
>>> +		return ret;
>> 
>> Just checking - is there a need to remove the groups from the containers
>> (added by VFIO_GROUP_SET_CONTAINER in vfio_group_init()) before erroring
>> out here? I imagine freeing of the vfio_container when kvmtool is
>> cleaning up after itself will do the right thing.
>
> Yes, VFIO cleans everything up when the container and group file
> descriptors are released.

Thanks for confirming!

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-08-02 15:18       ` Jean-Philippe Brucker
@ 2017-08-03 10:25         ` Punit Agrawal
  2017-08-03 10:53           ` Jean-Philippe Brucker
  0 siblings, 1 reply; 24+ messages in thread
From: Punit Agrawal @ 2017-08-03 10:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> On 01/08/17 17:04, Punit Agrawal wrote:
> [...]
>> I think I've figured out what's going wrong. pci_for_each_cap is defined
>> as - 
>> 
>> #define pci_for_each_cap(pos, cap, hdr)                         \
>>         for ((pos) = (hdr)->capabilities & ~3;                  \
>>              (cap) = (void *)(hdr) + (pos), (pos) != 0;         \
>>              (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
>> 
>> Here cap is being assigned before the pos != 0 check. So in the last
>> iteration through the loop, cap will point to the start of the PCI
>> header - which is then used to set "last->next = pos".
>
> Ah right, sorry about that. How about moving the pos check in front? Does
> the following work for you?
>
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index c5fc8254..37a65956 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -142,9 +142,9 @@ struct pci_device_header {
>  	enum irq_type	irq_type;
>  };
>  
> -#define pci_for_each_cap(pos, cap, hdr)				\
> -	for ((pos) = (hdr)->capabilities & ~3;			\
> -	     (cap) = (void *)(hdr) + (pos), (pos) != 0;		\
> +#define pci_for_each_cap(pos, cap, hdr)					\
> +	for ((pos) = (hdr)->capabilities & ~3;				\
> +	     (pos) != 0 && ({ (cap) = (void *)(hdr) + (pos); 1; });	\
>  	     (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
>  
>  int pci__init(struct kvm *kvm);

The compiler throws a warning with the above change and fails to compile
(as all warnings are treated as errors by the kvmtool build) -

vfio/pci.c: In function ‘vfio_pci_setup_device’:
vfio/pci.c:511:14: error: ‘last’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   last->next = pos;
   ~~~~~~~~~~~^~~~~
vfio/pci.c:507:23: note: ‘last’ was declared here
   struct pci_cap_hdr *last;
                       ^~~~

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-08-03 10:25         ` Punit Agrawal
@ 2017-08-03 10:53           ` Jean-Philippe Brucker
  0 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-03 10:53 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

On 03/08/17 11:25, Punit Agrawal wrote:
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
> 
>> On 01/08/17 17:04, Punit Agrawal wrote:
>> [...]
>>> I think I've figured out what's going wrong. pci_for_each_cap is defined
>>> as - 
>>>
>>> #define pci_for_each_cap(pos, cap, hdr)                         \
>>>         for ((pos) = (hdr)->capabilities & ~3;                  \
>>>              (cap) = (void *)(hdr) + (pos), (pos) != 0;         \
>>>              (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
>>>
>>> Here cap is being assigned before the pos != 0 check. So in the last
>>> iteration through the loop, cap will point to the start of the PCI
>>> header - which is then used to set "last->next = pos".
>>
>> Ah right, sorry about that. How about moving the pos check in front? Does
>> the following work for you?
>>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index c5fc8254..37a65956 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -142,9 +142,9 @@ struct pci_device_header {
>>  	enum irq_type	irq_type;
>>  };
>>  
>> -#define pci_for_each_cap(pos, cap, hdr)				\
>> -	for ((pos) = (hdr)->capabilities & ~3;			\
>> -	     (cap) = (void *)(hdr) + (pos), (pos) != 0;		\
>> +#define pci_for_each_cap(pos, cap, hdr)					\
>> +	for ((pos) = (hdr)->capabilities & ~3;				\
>> +	     (pos) != 0 && ({ (cap) = (void *)(hdr) + (pos); 1; });	\
>>  	     (pos) = ((struct pci_cap_hdr *)(cap))->next & ~3)
>>  
>>  int pci__init(struct kvm *kvm);
> 
> The compiler throws a warning with the above change and fails to compile
> (as all warnings are treated as errors by the kvmtool build) -
> 
> vfio/pci.c: In function ‘vfio_pci_setup_device’:
> vfio/pci.c:511:14: error: ‘last’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>    last->next = pos;
>    ~~~~~~~~~~~^~~~~
> vfio/pci.c:507:23: note: ‘last’ was declared here
>    struct pci_cap_hdr *last;
>                        ^~~~

Ok, fair enough. In practice this wouldn't happen, as there is at least
one capability in the chain when we get there (the compiler might get
confused by the "& ~3" masking, which doesn't matter for our virtual
values). But the specialization you proposed to get the last capability
should be fine; I'll go with that in the next version.

Thanks,
Jean

* Re: [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO
  2017-08-03  9:36       ` Punit Agrawal
@ 2017-08-03 11:24         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-03 11:24 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

On 03/08/17 10:36, Punit Agrawal wrote:
> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
> 
>> Hi Punit,
>>
>> Thanks for reviewing and testing
>>
>> On 31/07/17 18:52, Punit Agrawal wrote:
>>> Hi Jean-Philippe,
>>>
>>> A couple of queries below -
>>>
>>> Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:
>>>
>>>> Assigning devices using VFIO allows the guest to have direct access to the
>>>> device, whilst filtering accesses to sensitive areas by trapping config
>>>> space accesses and mapping DMA with an IOMMU.
>>>>
>>>> This patch adds a new option to lkvm run: --vfio-group=<group_number>.
>>>> Before assigning a device to a VM, some preparation is required. As
>>>> described in Linux Documentation/vfio.txt, the device driver need to be
>>>
>>> Nitpick: "needs"
>>>
>>>> changed to vfio-pci:
>>>>
>>>>   $ dev=0000:00:00.0
>>>>
>>>>   $ echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
>>>>   $ echo vfio-pci > /sys/bus/pci/devices/$dev/driver_override
>>>>   $ echo $dev > /sys/bus/pci/drivers_probe
>>>>   $ readlink /sys/bus/pci/devices/$dev/iommu_group
>>>>   ../../../kernel/iommu_groups/5
>>>>
>>>> Adding --vfio[-group]=5 to lkvm-run will pass the device to the guest.
>>>> Multiple groups can be passed to the guest by adding more --vfio
>>>> parameters.
>>>>
>>>> This patch only implements PCI with INTx. MSI-X routing will be added in a
>>>> subsequent patch, and at some point we might add support for passing
>>>> platform devices to guests.
>>>>
>>>> Signed-off-by: Will Deacon <will.deacon@arm.com>
>>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>>>> ---
>>>>  Makefile                 |   2 +
>>>>  arm/pci.c                |   1 +
>>>>  builtin-run.c            |   5 +
>>>>  include/kvm/kvm-config.h |   3 +
>>>>  include/kvm/pci.h        |   3 +-
>>>>  include/kvm/vfio.h       |  57 +++++++
>>>>  vfio/core.c              | 395 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  vfio/pci.c               | 365 +++++++++++++++++++++++++++++++++++++++++++
>>>>  8 files changed, 830 insertions(+), 1 deletion(-)
>>>>  create mode 100644 include/kvm/vfio.h
>>>>  create mode 100644 vfio/core.c
>>>>  create mode 100644 vfio/pci.c
>>>>
>>>
>>> [...]
>>>
>>>> +static int vfio_container_init(struct kvm *kvm)
>>>> +{
>>>> +	int api, i, ret, iommu_type;;
>>>> +
>>>> +	/* Create a container for our IOMMU groups */
>>>> +	vfio_container = open(VFIO_DEV_NODE, O_RDWR);
>>>> +	if (vfio_container == -1) {
>>>> +		ret = errno;
>>>> +		pr_err("Failed to open %s", VFIO_DEV_NODE);
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +	api = ioctl(vfio_container, VFIO_GET_API_VERSION);
>>>> +	if (api != VFIO_API_VERSION) {
>>>> +		pr_err("Unknown VFIO API version %d", api);
>>>> +		return -ENODEV;
>>>> +	}
>>>
>>> We are using the VFIO_API_VERSION pulled in from linux/vfio.h. Will
>>> kvmtool be in trouble if for some reason the API version is incompatibly
>>> incremented in the future?
>>
>> There are no precedents for this, as the VFIO version is still 0. Additions
>> to the API are made in a backward-compatible way (for example by adding new
>> flags in the info structures and new fields at the end). If something
>> significant enough happened that required increasing the version number,
>> the next version would probably be incompatible, and kvmtool would have to
>> support it separately.
> 
> In that case, we should use a local version number macro for the
> comparison, so that if the VFIO API ever changes, kvmtool explicitly
> complains instead of silently dealing with an unknown version.

Good idea. I think I'll go one step further and pull the UAPI headers into
include/linux/vfio.h. It seems to be what we usually do for this kind of
situation (and I'll need it anyway when working with SVM virtualization).

Thanks,
Jean

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-06-22 17:05 ` [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support Jean-Philippe Brucker
  2017-07-31 17:49   ` Punit Agrawal
@ 2017-08-18 17:42   ` Jean-Philippe Brucker
  2017-08-22 11:25     ` Punit Agrawal
  1 sibling, 1 reply; 24+ messages in thread
From: Jean-Philippe Brucker @ 2017-08-18 17:42 UTC (permalink / raw)
  To: kvm
  Cc: will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier,
	Punit Agrawal

On 22/06/17 18:05, Jean-Philippe Brucker wrote:
> +		switch (cap.type) {
> +		case PCI_CAP_ID_MSIX:
> +			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
> +			if (ret) {
> +				dev_warn(vdev, "failed to read capability structure %x",
> +					 cap.type);
> +				return ret;
> +			}
> +
> +			pdev->msi.pos = pos;
> +			pdev->irq_type = VFIO_PCI_IRQ_MSIX;

A few more issues noticed when testing this code with an IGB endpoint on
AMD Seattle:

* Selecting only one irq_type seems like the wrong thing to do. If the
device has an MSI-X capability, then kvmtool will only initialize it and
ignore INTx altogether. But if the virtual irqchip doesn't support MSIs,
then the guest will fall back to INTx and fail to receive any interrupt.

* If we force selection of INTx over MSI-X, then kvmtool will not treat
the MSI-X BAR as special; it will attempt to map it and get rejected by
VFIO.

* The next patch relies on the order of capabilities in the config space
to decide whether to use MSI-X or MSIs, which isn't nice. Same as before,
the discarded capability will be advertised to the guest but won't work.

So I think irq_type should instead be a bitfield, and we should emulate
all available INTx/MSI/MSI-X capabilities, then let the guest choose what
to use.
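
Roughly something like this (a sketch only; the irq_modes field and the
mode macros are invented for illustration and are not in the current
patches):

	/* Bitfield of the interrupt modes a device advertises */
	#define VFIO_PCI_IRQ_MODE_INTX	(1 << 0)
	#define VFIO_PCI_IRQ_MODE_MSI	(1 << 1)
	#define VFIO_PCI_IRQ_MODE_MSIX	(1 << 2)

	/* While parsing capabilities, accumulate instead of overwriting */
	switch (cap.type) {
	case PCI_CAP_ID_MSIX:
		pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSIX;
		break;
	case PCI_CAP_ID_MSI:
		pdev->irq_modes |= VFIO_PCI_IRQ_MODE_MSI;
		break;
	}

	/* INTx stays emulated as a fallback */
	pdev->irq_modes |= VFIO_PCI_IRQ_MODE_INTX;

Each mode would then only be enabled in VFIO when the guest pokes the
corresponding capability (or the legacy interrupt pin), rather than
picking a single winner at parse time.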

Thanks,
Jean

* Re: [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support
  2017-08-18 17:42   ` Jean-Philippe Brucker
@ 2017-08-22 11:25     ` Punit Agrawal
  0 siblings, 0 replies; 24+ messages in thread
From: Punit Agrawal @ 2017-08-22 11:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: kvm, will.deacon, robin.murphy, lorenzo.pieralisi, marc.zyngier

Jean-Philippe Brucker <jean-philippe.brucker@arm.com> writes:

> On 22/06/17 18:05, Jean-Philippe Brucker wrote:
>> +		switch (cap.type) {
>> +		case PCI_CAP_ID_MSIX:
>> +			ret = vfio_pci_add_cap(vdev, &cap, info->offset, pos);
>> +			if (ret) {
>> +				dev_warn(vdev, "failed to read capability structure %x",
>> +					 cap.type);
>> +				return ret;
>> +			}
>> +
>> +			pdev->msi.pos = pos;
>> +			pdev->irq_type = VFIO_PCI_IRQ_MSIX;
>
> A few more issues noticed when testing this code with an IGB endpoint on
> AMD Seattle:
>
> * Selecting only one irq_type seems like the wrong thing to do. If the
> device has an MSI-X capability, then kvmtool will only initialize it and
> ignore INTx altogether. But if the virtual irqchip doesn't support MSIs,
> then the guest will fall back to INTx and fail to receive any interrupt.

Ah right! I'd missed the fact that we were choosing a single interrupt
type for the device. That explains why I couldn't get legacy interrupts
to work with passthrough for a device advertising MSI/MSI-X.

>
> * If we force selection of INTx over MSI-X, then kvmtool will not treat
> the MSI-X BAR as special; it will attempt to map it and get rejected by
> VFIO.
>
> * The next patch relies on the order of capabilities in the config space
> to decide whether to use MSI-X or MSIs, which isn't nice. Same as before,
> the discarded capability will be advertised to the guest but won't work.
>
> So I think irq_type should instead be a bitfield, and we should emulate
> all available INTx/MSI/MSI-X capabilities, then let the guest choose what
> to use.

Makes sense. I'll look out for the next update.

Thanks for looking into this.

Punit

>
> Thanks,
> Jean

Thread overview: 24+ messages
2017-06-22 17:05 [PATCH v2 kvmtool 00/10] Add PCI passthrough support with VFIO Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 01/10] pci: add config operations callbacks on the PCI header Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 02/10] pci: allow to specify IRQ type for PCI devices Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 03/10] irq: add irqfd helpers Jean-Philippe Brucker
2017-07-31 17:55   ` Punit Agrawal
2017-08-02 15:17     ` Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 04/10] Extend memory bank API with memory types Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 05/10] pci: add capability helpers Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 06/10] Add PCI device passthrough using VFIO Jean-Philippe Brucker
2017-07-31 17:52   ` Punit Agrawal
2017-08-02 15:17     ` Jean-Philippe Brucker
2017-08-03  9:36       ` Punit Agrawal
2017-08-03 11:24         ` Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 07/10] vfio-pci: add MSI-X support Jean-Philippe Brucker
2017-07-31 17:49   ` Punit Agrawal
2017-08-01 16:04     ` Punit Agrawal
2017-08-02 15:18       ` Jean-Philippe Brucker
2017-08-03 10:25         ` Punit Agrawal
2017-08-03 10:53           ` Jean-Philippe Brucker
2017-08-18 17:42   ` Jean-Philippe Brucker
2017-08-22 11:25     ` Punit Agrawal
2017-06-22 17:05 ` [PATCH v2 kvmtool 08/10] vfio-pci: add MSI support Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 09/10] Introduce reserved memory regions Jean-Philippe Brucker
2017-06-22 17:05 ` [PATCH v2 kvmtool 10/10] vfio: check reserved regions before mapping DMA Jean-Philippe Brucker
