* [PATCH v4 kvmtool 00/12] Add reassignable BARs
@ 2020-05-14 15:38 Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
                   ` (13 more replies)
  0 siblings, 14 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

kvmtool uses the Linux-only devicetree property 'linux,pci-probe-only' to
prevent the guest kernel from trying to reassign the BARs. Let's make the BARs
reassignable so we can get rid of this band-aid.
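
For reference, this property ends up in the guest devicetree under the /chosen
node, roughly like this (an illustrative sketch, not a verbatim dump of the DT
that kvmtool generates):

	chosen {
		linux,pci-probe-only = <1>;
	};

When the property is present, Linux keeps the BAR values programmed by the
"firmware" (here, kvmtool) instead of reassigning them.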

During device configuration, Linux can assign a region resource to a BAR that
temporarily overlaps with another device's region. This means that the code
that emulates BAR reassignment must be aware of all the registered PCI
devices. Because of this, and to avoid re-implementing the same code for each
emulated device, the algorithm for activating/deactivating emulation as BAR
addresses change lives entirely inside the PCI code. Each device registers two
callback functions: one which is called when emulation for a BAR region must
be activated (for example, after the guest assigns a new address to the BAR),
and one which is called when emulation must be deactivated (for example, when
a previously assigned BAR address is about to change).
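
For reference, the callback types introduced by patch 8 look like this (the
typedefs below are copied from the include/kvm/pci.h hunk later in this
series):

	typedef int (*bar_activate_fn_t)(struct kvm *kvm,
					 struct pci_device_header *pci_hdr,
					 int bar_num, void *data);
	typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
					   struct pci_device_header *pci_hdr,
					   int bar_num, void *data);

A device passes both callbacks, plus an opaque data pointer, to
pci__register_bar_regions(), and the PCI configuration space emulation invokes
them whenever the guest toggles memory/I/O decoding or moves a BAR.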

Two important changes in this iteration: the first 18 patches and patch 22
from v3 have been merged, and I have dropped the last patch, the one that adds
support for PCI Express 1.1 (hence the subject change). The CFI patches have
been merged and EDK2 can run right now under kvmtool with virtio-mmio, and PCI
Express breaks that. EDK2 doesn't have support for legacy PCI, so when running
under kvmtool, the PCI bus doesn't exist as far as EDK2 is concerned. As soon
as we add support for PCI Express, EDK2 will run into the bug that I described
in v3 [1]. I'll resend that patch along with the fixes that make EDK2 work
with PCI Express.

Like before, I have tested the patches on an x86_64 machine and an arm64 machine:

1. On AMD Seattle. Tried PCI passthrough using two different guests in each
case: one with 4k pages, and one with 64k pages (the CPU doesn't support 16k
pages):
- Intel 82574L Gigabit Ethernet card
- Samsung 970 Pro NVMe (the controller registers are in the same BAR region as
  the MSIX table and PBA; I wrote a nasty hack to make it work and will try to
  upstream something after this series)
- Realtek 8168 Gigabit Ethernet card
- NVIDIA Quadro P400 (the nouveau probe fails because expansion ROM emulation
  is not implemented in kvmtool; I will write a fix for that)
- AMD FirePro W2100 (the amdgpu driver probe fails just like on the NVIDIA
  card)
- Seagate Barracuda 1000GB drive and Crucial MX500 SSD attached to a
  PCIe-to-SATA card.

I wrote a kvm-unit-tests test that stresses the MMIO emulation locking scheme
that I implemented, by spawning 4 VCPUs (the Seattle machine has 8 physical
CPUs) that toggle memory access for a device, and another 4 VCPUs that read
from the memory region described by one of its BARs. I ran this under
valgrind, and no deadlocks or use-after-free errors were detected.
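
For illustration only, the guest side of such a stress test could look roughly
like the sketch below. This is not the actual kvm-unit-tests code (which isn't
part of this series); the pci_cmd_read()/pci_cmd_write() helpers and the
bar_base pointer are assumed to exist just to keep the example short:

	#include <stdint.h>

	#define PCI_COMMAND_MEMORY	0x2

	/* Assumed helpers: config space access to the device under test. */
	extern uint16_t pci_cmd_read(void);
	extern void pci_cmd_write(uint16_t cmd);
	/* Assumed: guest mapping of the device's memory BAR. */
	extern volatile uint32_t *bar_base;

	/*
	 * Run on 4 VCPUs: repeatedly flip memory space decoding, which makes
	 * kvmtool activate/deactivate MMIO emulation for the BAR.
	 */
	void toggle_memory_access(void)
	{
		for (;;)
			pci_cmd_write(pci_cmd_read() ^ PCI_COMMAND_MEMORY);
	}

	/*
	 * Run on the other 4 VCPUs: keep reading from the BAR region, racing
	 * against the togglers and exercising the mutex/refcount scheme from
	 * patch 1.
	 */
	void read_bar(void)
	{
		for (;;)
			(void)*bar_base;
	}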

I've also installed an official Debian ISO in a virtual machine by using the
EDK2 binary that Ard posted [2] and virtio-mmio.

2. Ryzen 3900X + Gigabyte X570 Aorus Master (BIOS F10). Tried PCI passthrough
with a Realtek 8168 and an Intel 82574L Gigabit Ethernet card at the same time,
plus running with --sdl enabled to test VESA device emulation. I also managed
to get the guest to do BAR reassignment for one device by using the kernel
command line parameter pci.resource_alignment=16@pci:10ec:8168.

Changes in v4:
* Patches 1-18 and 22 have been merged.
* Gathered Reviewed-by tags. Thank you!
* Patch #1 (was #19): added comments explaining why refcount starts at 0
  and we use a separate variable for deletion; minor changes to
  ioport__unregister to make it more similar to kvm__deregister_mmio.
* Patch #6 (was #25): assert that size is less than or equal to 4 to
  prevent any stack overruns.
* Patch #7 (was #26): rewrote it, now we prohibit the user from requesting
  more than one of --gtk, --vnc and --sdl.
* Patch #8 (was #27): fixed coding style issues, added a comment explaining
  that the MSIX table and PBA structure can share the same BAR and removed
  the assert from pci__register_bar_regions.
* Patch #10 (was #29): removed unused parameter from the functions that
  activate/deactivate BAR emulation and consolidated them into one
  function; added a comment summarizing the algorithm for BAR reassignment.
* Dropped patch #13 (was #32) for the reasons explained above.

Changes in v3:
* Patches 18, 24 and 26 are new.
* Dropped #9 "arm/pci: Fix PCI IO region" for reasons explained above.
* Reworded commit message for #12 "vfio/pci: Ignore expansion ROM BAR
  writes" to make it clear that kvmtool's implementation of VFIO doesn't
  support expansion BAR emulation.
* Moved the variable declaration to the start of the function in #13
  "vfio/pci: Don't access unallocated regions".
* Patch #15 "Don't ignore errors registering a device, ioport or mmio
  emulation" uses ioport__remove consistently.
* Reworked error cleanup for #16 "hw/vesa: Don't ignore fatal errors".
* Reworded commit message for #17 "hw/vesa: Set the size for BAR 0".
* Reworked #19 "ioport: mmio: Use a mutex and reference counting for
  locking" to also use reference counting to prevent races (and updated the
  commit message in the process).
* Added the function pci__bar_size to #20 "pci: Add helpers for BAR values
  and memory/IO space access".
* Protect mem_banks list with a mutex in #22 "vfio: Destroy memslot when
  unmapping the associated VAs"; also moved the munmap after the slot is
  destroyed, as per the KVM api.
* Dropped the rework of the vesa device in patch #27 "pci: Implement
  callbacks for toggling BAR emulation". Also added a few assert statements
  in some callbacks to make sure that they don't get called with an
  unexpected BAR number (which can only result from a coding error). Also
  return -ENOENT when kvm__deregister_mmio fails, to match ioport behavior
  and for better diagnostics.
* Dropped the PCI configuration write callback from the vesa device in #28
  "pci: Toggle BAR I/O and memory space emulation". Apparently, if we don't
  allow the guest kernel to disable BAR access, it treats the VESA device
  as a VGA boot device, instead of a regular VGA device, and Xorg cannot
  use it.
* Dropped struct bar_info from #29 "pci: Implement reassignable BARs". Also
  changed pci_dev_err to pci_dev_warn in pci_{activate,deactivate}_bar,
  because the errors can be benign (they can happen because two vcpus race
  against each other to deactivate memory or I/O access, for example).
* Replaced the PCI device configuration space define with the actual
  number in #32 "arm/arm64: Add PCI Express 1.1 support". On some distros
  the defines weren't present in /usr/include/linux/pci_regs.h.
* Implemented other minor review comments.
* Gathered Reviewed-by tags.

Changes in v2:
* Patches 2, 11-18, 20, 22-27, 29 are new.
* Patches 11, 13, and 14 have been dropped.
* Reworked the way BAR reassignment is implemented.
* The patch "Add PCI Express 1.1 support" has been reworked to apply only
  to arm64. For x86 we would need ACPI support in order to advertise the
  location of the ECAM space.
* Gathered Reviewed-by tags.
* Implemented review comments.

[1] https://www.spinics.net/lists/kvm/msg211272.html
[2] https://www.spinics.net/lists/kvm/msg213842.html


Alexandru Elisei (11):
  ioport: mmio: Use a mutex and reference counting for locking
  pci: Add helpers for BAR values and memory/IO space access
  virtio/pci: Get emulated region address from BARs
  vfio: Reserve ioports when configuring the BAR
  pci: Limit configuration transaction size to 32 bits
  vfio/pci: Don't write configuration value twice
  Don't allow more than one framebuffers
  pci: Implement callbacks for toggling BAR emulation
  pci: Toggle BAR I/O and memory space emulation
  pci: Implement reassignable BARs
  vfio: Trap MMIO access to BAR addresses which aren't page aligned

Julien Thierry (1):
  arm/fdt: Remove 'linux,pci-probe-only' property

 arm/fdt.c                     |   1 -
 builtin-run.c                 |   5 +
 hw/vesa.c                     |  72 +++++---
 include/kvm/ioport.h          |   2 +
 include/kvm/pci.h             |  85 ++++++++-
 include/kvm/rbtree-interval.h |   4 +-
 include/kvm/virtio-pci.h      |   3 -
 ioport.c                      |  85 ++++++---
 mmio.c                        | 107 +++++++++---
 pci.c                         | 320 ++++++++++++++++++++++++++++++----
 powerpc/spapr_pci.c           |   2 +-
 vfio/core.c                   |  18 +-
 vfio/pci.c                    | 132 ++++++++++++--
 virtio/pci.c                  | 158 ++++++++++++-----
 14 files changed, 794 insertions(+), 200 deletions(-)

-- 
2.26.2



* [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-15 10:13   ` André Przywara
  2020-05-14 15:38 ` [PATCH v4 kvmtool 02/12] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

kvmtool uses brlock for protecting accesses to the ioport and mmio red-black
trees. brlock allows concurrent reads, but only one writer, which is assumed
not to be a VCPU thread (for more information see commit 0b907ed2eaec ("kvm
tools: Add a brlock")). This is done by issuing a compiler barrier on read and
pausing the entire virtual machine on writes. When KVM_BRLOCK_DEBUG is
defined, brlock instead uses a pthread read/write lock.

When we implement reassignable BARs, the mmio or ioport mapping will be done
as a result of a VCPU mmio access. When brlock is a pthread read/write lock,
this means that we will try to acquire the write lock with the read lock
already held by the same VCPU, and we will deadlock. When it's not, a VCPU
will have to call kvm__pause, which means the virtual machine will stay paused
forever.

Let's avoid all this by using a mutex and reference counting the red-black
tree entries. This way we can guarantee that we won't unregister a node
that another thread is currently using for emulation.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/ioport.h          |   2 +
 include/kvm/rbtree-interval.h |   4 +-
 ioport.c                      |  85 ++++++++++++++++++++++-----------
 mmio.c                        | 107 ++++++++++++++++++++++++++++++++----------
 4 files changed, 143 insertions(+), 55 deletions(-)

diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
index 62a719327e3f..039633f76bdd 100644
--- a/include/kvm/ioport.h
+++ b/include/kvm/ioport.h
@@ -22,6 +22,8 @@ struct ioport {
 	struct ioport_operations	*ops;
 	void				*priv;
 	struct device_header		dev_hdr;
+	u32				refcount;
+	bool				remove;
 };
 
 struct ioport_operations {
diff --git a/include/kvm/rbtree-interval.h b/include/kvm/rbtree-interval.h
index 730eb5e8551d..17cd3b5f3199 100644
--- a/include/kvm/rbtree-interval.h
+++ b/include/kvm/rbtree-interval.h
@@ -6,7 +6,9 @@
 
 #define RB_INT_INIT(l, h) \
 	(struct rb_int_node){.low = l, .high = h}
-#define rb_int(n) rb_entry(n, struct rb_int_node, node)
+#define rb_int(n)	rb_entry(n, struct rb_int_node, node)
+#define rb_int_start(n)	((n)->low)
+#define rb_int_end(n)	((n)->low + (n)->high - 1)
 
 struct rb_int_node {
 	struct rb_node	node;
diff --git a/ioport.c b/ioport.c
index d9f2e8ea3c3b..844a832d25a4 100644
--- a/ioport.c
+++ b/ioport.c
@@ -2,7 +2,6 @@
 
 #include "kvm/kvm.h"
 #include "kvm/util.h"
-#include "kvm/brlock.h"
 #include "kvm/rbtree-interval.h"
 #include "kvm/mutex.h"
 
@@ -16,6 +15,8 @@
 
 #define ioport_node(n) rb_entry(n, struct ioport, node)
 
+static DEFINE_MUTEX(ioport_lock);
+
 static struct rb_root		ioport_tree = RB_ROOT;
 
 static struct ioport *ioport_search(struct rb_root *root, u64 addr)
@@ -39,6 +40,36 @@ static void ioport_remove(struct rb_root *root, struct ioport *data)
 	rb_int_erase(root, &data->node);
 }
 
+static struct ioport *ioport_get(struct rb_root *root, u64 addr)
+{
+	struct ioport *ioport;
+
+	mutex_lock(&ioport_lock);
+	ioport = ioport_search(root, addr);
+	if (ioport)
+		ioport->refcount++;
+	mutex_unlock(&ioport_lock);
+
+	return ioport;
+}
+
+/* Called with ioport_lock held. */
+static void ioport_unregister(struct rb_root *root, struct ioport *data)
+{
+	device__unregister(&data->dev_hdr);
+	ioport_remove(root, data);
+	free(data);
+}
+
+static void ioport_put(struct rb_root *root, struct ioport *data)
+{
+	mutex_lock(&ioport_lock);
+	data->refcount--;
+	if (data->remove && data->refcount == 0)
+		ioport_unregister(root, data);
+	mutex_unlock(&ioport_lock);
+}
+
 #ifdef CONFIG_HAS_LIBFDT
 static void generate_ioport_fdt_node(void *fdt,
 				     struct device_header *dev_hdr,
@@ -80,16 +111,22 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
 			.bus_type	= DEVICE_BUS_IOPORT,
 			.data		= generate_ioport_fdt_node,
 		},
+		/*
+		 * Start from 0 because ioport__unregister() doesn't decrement
+		 * the reference count.
+		 */
+		.refcount	= 0,
+		.remove		= false,
 	};
 
-	br_write_lock(kvm);
+	mutex_lock(&ioport_lock);
 	r = ioport_insert(&ioport_tree, entry);
 	if (r < 0)
 		goto out_free;
 	r = device__register(&entry->dev_hdr);
 	if (r < 0)
 		goto out_remove;
-	br_write_unlock(kvm);
+	mutex_unlock(&ioport_lock);
 
 	return port;
 
@@ -97,33 +134,28 @@ out_remove:
 	ioport_remove(&ioport_tree, entry);
 out_free:
 	free(entry);
-	br_write_unlock(kvm);
+	mutex_unlock(&ioport_lock);
 	return r;
 }
 
 int ioport__unregister(struct kvm *kvm, u16 port)
 {
 	struct ioport *entry;
-	int r;
-
-	br_write_lock(kvm);
 
-	r = -ENOENT;
+	mutex_lock(&ioport_lock);
 	entry = ioport_search(&ioport_tree, port);
-	if (!entry)
-		goto done;
-
-	device__unregister(&entry->dev_hdr);
-	ioport_remove(&ioport_tree, entry);
-
-	free(entry);
-
-	r = 0;
-
-done:
-	br_write_unlock(kvm);
+	if (!entry) {
+		mutex_unlock(&ioport_lock);
+		return -ENOENT;
+	}
+	/* The same reasoning from kvm__deregister_mmio() applies. */
+	if (entry->refcount == 0)
+		ioport_unregister(&ioport_tree, entry);
+	else
+		entry->remove = true;
+	mutex_unlock(&ioport_lock);
 
-	return r;
+	return 0;
 }
 
 static void ioport__unregister_all(void)
@@ -136,9 +168,7 @@ static void ioport__unregister_all(void)
 	while (rb) {
 		rb_node = rb_int(rb);
 		entry = ioport_node(rb_node);
-		device__unregister(&entry->dev_hdr);
-		ioport_remove(&ioport_tree, entry);
-		free(entry);
+		ioport_unregister(&ioport_tree, entry);
 		rb = rb_first(&ioport_tree);
 	}
 }
@@ -164,8 +194,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 	void *ptr = data;
 	struct kvm *kvm = vcpu->kvm;
 
-	br_read_lock(kvm);
-	entry = ioport_search(&ioport_tree, port);
+	entry = ioport_get(&ioport_tree, port);
 	if (!entry)
 		goto out;
 
@@ -180,9 +209,9 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
 		ptr += size;
 	}
 
-out:
-	br_read_unlock(kvm);
+	ioport_put(&ioport_tree, entry);
 
+out:
 	if (ret)
 		return true;
 
diff --git a/mmio.c b/mmio.c
index 61e1d47a587d..cd141cd30750 100644
--- a/mmio.c
+++ b/mmio.c
@@ -1,7 +1,7 @@
 #include "kvm/kvm.h"
 #include "kvm/kvm-cpu.h"
 #include "kvm/rbtree-interval.h"
-#include "kvm/brlock.h"
+#include "kvm/mutex.h"
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -15,10 +15,14 @@
 
 #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
 
+static DEFINE_MUTEX(mmio_lock);
+
 struct mmio_mapping {
 	struct rb_int_node	node;
 	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
 	void			*ptr;
+	u32			refcount;
+	bool			remove;
 };
 
 static struct rb_root mmio_tree = RB_ROOT;
@@ -51,6 +55,11 @@ static int mmio_insert(struct rb_root *root, struct mmio_mapping *data)
 	return rb_int_insert(root, &data->node);
 }
 
+static void mmio_remove(struct rb_root *root, struct mmio_mapping *data)
+{
+	rb_int_erase(root, &data->node);
+}
+
 static const char *to_direction(u8 is_write)
 {
 	if (is_write)
@@ -59,6 +68,41 @@ static const char *to_direction(u8 is_write)
 	return "read";
 }
 
+static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
+{
+	struct mmio_mapping *mmio;
+
+	mutex_lock(&mmio_lock);
+	mmio = mmio_search(root, phys_addr, len);
+	if (mmio)
+		mmio->refcount++;
+	mutex_unlock(&mmio_lock);
+
+	return mmio;
+}
+
+/* Called with mmio_lock held. */
+static void mmio_deregister(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
+{
+	struct kvm_coalesced_mmio_zone zone = (struct kvm_coalesced_mmio_zone) {
+		.addr	= rb_int_start(&mmio->node),
+		.size	= 1,
+	};
+	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
+
+	mmio_remove(root, mmio);
+	free(mmio);
+}
+
+static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
+{
+	mutex_lock(&mmio_lock);
+	mmio->refcount--;
+	if (mmio->remove && mmio->refcount == 0)
+		mmio_deregister(kvm, root, mmio);
+	mutex_unlock(&mmio_lock);
+}
+
 int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
 		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
 			void *ptr)
@@ -72,9 +116,15 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 		return -ENOMEM;
 
 	*mmio = (struct mmio_mapping) {
-		.node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
-		.mmio_fn = mmio_fn,
-		.ptr	= ptr,
+		.node		= RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
+		.mmio_fn	= mmio_fn,
+		.ptr		= ptr,
+		/*
+		 * Start from 0 because kvm__deregister_mmio() doesn't decrement
+		 * the reference count.
+		 */
+		.refcount	= 0,
+		.remove		= false,
 	};
 
 	if (coalesce) {
@@ -88,9 +138,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 			return -errno;
 		}
 	}
-	br_write_lock(kvm);
+	mutex_lock(&mmio_lock);
 	ret = mmio_insert(&mmio_tree, mmio);
-	br_write_unlock(kvm);
+	mutex_unlock(&mmio_lock);
 
 	return ret;
 }
@@ -98,25 +148,30 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
 bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
 {
 	struct mmio_mapping *mmio;
-	struct kvm_coalesced_mmio_zone zone;
 
-	br_write_lock(kvm);
+	mutex_lock(&mmio_lock);
 	mmio = mmio_search_single(&mmio_tree, phys_addr);
 	if (mmio == NULL) {
-		br_write_unlock(kvm);
+		mutex_unlock(&mmio_lock);
 		return false;
 	}
+	/*
+	 * The PCI emulation code calls this function when memory access is
+	 * disabled for a device, or when a BAR has a new address assigned. PCI
+	 * emulation doesn't use any locks and as a result we can end up in a
+	 * situation where we have called mmio_get() to do emulation on one VCPU
+	 * thread (let's call it VCPU0), and several other VCPU threads have
+	 * called kvm__deregister_mmio(). In this case, if we decrement refcount
+	 * in kvm__deregister_mmio() (either directly, or by calling mmio_put()),
+	 * refcount will reach 0 and we will free the mmio node before VCPU0 has
+	 * called mmio_put(). This will trigger use-after-free errors on VCPU0.
+	 */
+	if (mmio->refcount == 0)
+		mmio_deregister(kvm, &mmio_tree, mmio);
+	else
+		mmio->remove = true;
+	mutex_unlock(&mmio_lock);
 
-	zone = (struct kvm_coalesced_mmio_zone) {
-		.addr	= phys_addr,
-		.size	= 1,
-	};
-	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
-
-	rb_int_erase(&mmio_tree, &mmio->node);
-	br_write_unlock(kvm);
-
-	free(mmio);
 	return true;
 }
 
@@ -124,18 +179,18 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
 {
 	struct mmio_mapping *mmio;
 
-	br_read_lock(vcpu->kvm);
-	mmio = mmio_search(&mmio_tree, phys_addr, len);
-
-	if (mmio)
-		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
-	else {
+	mmio = mmio_get(&mmio_tree, phys_addr, len);
+	if (!mmio) {
 		if (vcpu->kvm->cfg.mmio_debug)
 			fprintf(stderr,	"Warning: Ignoring MMIO %s at %016llx (length %u)\n",
 				to_direction(is_write),
 				(unsigned long long)phys_addr, len);
+		goto out;
 	}
-	br_read_unlock(vcpu->kvm);
 
+	mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
+	mmio_put(vcpu->kvm, &mmio_tree, mmio);
+
+out:
 	return true;
 }
-- 
2.7.4



* [PATCH v4 kvmtool 02/12] pci: Add helpers for BAR values and memory/IO space access
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 03/12] virtio/pci: Get emulated region address from BARs Alexandru Elisei
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

We're going to be checking the BAR type, the address written to it and
whether access to memory or I/O space is enabled quite often when we add
support for reassignable BARs; make our life easier by adding helpers for it.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h   | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 pci.c               |  4 ++--
 powerpc/spapr_pci.c |  2 +-
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 2c29c094d4cb..0f0815b36157 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -5,6 +5,7 @@
 #include <linux/kvm.h>
 #include <linux/pci_regs.h>
 #include <endian.h>
+#include <stdbool.h>
 
 #include "kvm/devices.h"
 #include "kvm/msi.h"
@@ -161,4 +162,56 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+static inline bool __pci__memory_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_MEMORY;
+}
+
+static inline bool pci__memory_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__memory_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__io_space_enabled(u16 command)
+{
+	return command & PCI_COMMAND_IO;
+}
+
+static inline bool pci__io_space_enabled(struct pci_device_header *pci_hdr)
+{
+	return __pci__io_space_enabled(pci_hdr->command);
+}
+
+static inline bool __pci__bar_is_io(u32 bar)
+{
+	return bar & PCI_BASE_ADDRESS_SPACE_IO;
+}
+
+static inline bool pci__bar_is_io(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_is_io(pci_hdr->bar[bar_num]);
+}
+
+static inline bool pci__bar_is_memory(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return !pci__bar_is_io(pci_hdr, bar_num);
+}
+
+static inline u32 __pci__bar_address(u32 bar)
+{
+	if (__pci__bar_is_io(bar))
+		return bar & PCI_BASE_ADDRESS_IO_MASK;
+	return bar & PCI_BASE_ADDRESS_MEM_MASK;
+}
+
+static inline u32 pci__bar_address(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return __pci__bar_address(pci_hdr->bar[bar_num]);
+}
+
+static inline u32 pci__bar_size(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci_hdr->bar_size[bar_num];
+}
+
 #endif /* KVM__PCI_H */
diff --git a/pci.c b/pci.c
index 3ecdd0f9c75c..81e9cec918fb 100644
--- a/pci.c
+++ b/pci.c
@@ -185,7 +185,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	 * size, it will write the address back.
 	 */
 	if (bar < 6) {
-		if (pci_hdr->bar[bar] & PCI_BASE_ADDRESS_SPACE_IO)
+		if (pci__bar_is_io(pci_hdr, bar))
 			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
 		else
 			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
@@ -211,7 +211,7 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 		 */
 		memcpy(&value, data, size);
 		if (value == 0xffffffff)
-			value = ~(pci_hdr->bar_size[bar] - 1);
+			value = ~(pci__bar_size(pci_hdr, bar) - 1);
 		/* Preserve the special bits. */
 		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
 		memcpy(base + offset, &value, size);
diff --git a/powerpc/spapr_pci.c b/powerpc/spapr_pci.c
index a15f7d895a46..7be44d950acb 100644
--- a/powerpc/spapr_pci.c
+++ b/powerpc/spapr_pci.c
@@ -369,7 +369,7 @@ int spapr_populate_pci_devices(struct kvm *kvm,
 				of_pci_b_ddddd(devid) |
 				of_pci_b_fff(fn) |
 				of_pci_b_rrrrrrrr(bars[i]));
-			reg[n+1].size = cpu_to_be64(hdr->bar_size[i]);
+			reg[n+1].size = cpu_to_be64(pci__bar_size(hdr, i));
 			reg[n+1].addr = 0;
 
 			assigned_addresses[n].phys_hi = cpu_to_be32(
-- 
2.7.4



* [PATCH v4 kvmtool 03/12] virtio/pci: Get emulated region address from BARs
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 02/12] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 04/12] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

The struct virtio_pci fields port_addr, mmio_addr and msix_io_block
represent the same addresses that are written in the corresponding BARs.
Remove this duplication of information and always use the address from the
BAR. This will make our life a lot easier when we add support for
reassignable BARs, because we won't have to update the fields on each BAR
change.

No functional changes.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/virtio-pci.h |  3 --
 virtio/pci.c             | 82 ++++++++++++++++++++++++++++++------------------
 2 files changed, 52 insertions(+), 33 deletions(-)

diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index 278a25950d8b..959b4b81c871 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -24,8 +24,6 @@ struct virtio_pci {
 	void			*dev;
 	struct kvm		*kvm;
 
-	u16			port_addr;
-	u32			mmio_addr;
 	u8			status;
 	u8			isr;
 	u32			features;
@@ -43,7 +41,6 @@ struct virtio_pci {
 	u32			config_gsi;
 	u32			vq_vector[VIRTIO_PCI_MAX_VQ];
 	u32			gsis[VIRTIO_PCI_MAX_VQ];
-	u32			msix_io_block;
 	u64			msix_pba;
 	struct msix_table	msix_table[VIRTIO_PCI_MAX_VQ + VIRTIO_PCI_MAX_CONFIG];
 
diff --git a/virtio/pci.c b/virtio/pci.c
index c6529493f06f..eded8685e1b3 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -13,6 +13,21 @@
 #include <linux/byteorder.h>
 #include <string.h>
 
+static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 0);
+}
+
+static u32 virtio_pci__mmio_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 1);
+}
+
+static u32 virtio_pci__msix_io_addr(struct virtio_pci *vpci)
+{
+	return pci__bar_address(&vpci->pci_hdr, 2);
+}
+
 static void virtio_pci__ioevent_callback(struct kvm *kvm, void *param)
 {
 	struct virtio_pci_ioevent_param *ioeventfd = param;
@@ -25,6 +40,8 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 {
 	struct ioevent ioevent;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 	int r, flags = 0;
 	int fd;
 
@@ -48,7 +65,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		flags |= IOEVENTFD_FLAG_USER_POLL;
 
 	/* ioport */
-	ioevent.io_addr	= vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= port_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= fd = eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags | IOEVENTFD_FLAG_PIO);
@@ -56,7 +73,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 		return r;
 
 	/* mmio */
-	ioevent.io_addr	= vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
+	ioevent.io_addr	= mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY;
 	ioevent.io_len	= sizeof(u16);
 	ioevent.fd	= eventfd(0, 0);
 	r = ioeventfd__add_event(&ioevent, flags);
@@ -68,7 +85,7 @@ static int virtio_pci__init_ioeventfd(struct kvm *kvm, struct virtio_device *vde
 	return 0;
 
 free_ioport_evt:
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	return r;
 }
 
@@ -76,9 +93,11 @@ static void virtio_pci_exit_vq(struct kvm *kvm, struct virtio_device *vdev,
 			       int vq)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
+	u16 port_addr = virtio_pci__port_addr(vpci);
 
-	ioeventfd__del_event(vpci->mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
-	ioeventfd__del_event(vpci->port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(mmio_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
+	ioeventfd__del_event(port_addr + VIRTIO_PCI_QUEUE_NOTIFY, vq);
 	virtio_exit_vq(kvm, vdev, vpci->dev, vq);
 }
 
@@ -162,7 +181,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 p
 {
 	struct virtio_device *vdev = ioport->priv;
 	struct virtio_pci *vpci = vdev->virtio;
-	unsigned long offset = port - vpci->port_addr;
+	unsigned long offset = port - virtio_pci__port_addr(vpci);
 
 	return virtio_pci__data_in(vcpu, vdev, offset, data, size);
 }
@@ -318,7 +337,7 @@ static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	struct virtio_device *vdev = ioport->priv;
 	struct virtio_pci *vpci = vdev->virtio;
-	unsigned long offset = port - vpci->port_addr;
+	unsigned long offset = port - virtio_pci__port_addr(vpci);
 
 	return virtio_pci__data_out(vcpu, vdev, offset, data, size);
 }
@@ -335,17 +354,18 @@ static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
 	struct msix_table *table;
+	u32 msix_io_addr = virtio_pci__msix_io_addr(vpci);
 	int vecnum;
 	size_t offset;
 
-	if (addr > vpci->msix_io_block + PCI_IO_SIZE) {
+	if (addr > msix_io_addr + PCI_IO_SIZE) {
 		if (is_write)
 			return;
 		table  = (struct msix_table *)&vpci->msix_pba;
-		offset = addr - (vpci->msix_io_block + PCI_IO_SIZE);
+		offset = addr - (msix_io_addr + PCI_IO_SIZE);
 	} else {
 		table  = vpci->msix_table;
-		offset = addr - vpci->msix_io_block;
+		offset = addr - msix_io_addr;
 	}
 	vecnum = offset / sizeof(struct msix_table);
 	offset = offset % sizeof(struct msix_table);
@@ -434,19 +454,20 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 {
 	struct virtio_device *vdev = ptr;
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr = virtio_pci__mmio_addr(vpci);
 
 	if (!is_write)
-		virtio_pci__data_in(vcpu, vdev, addr - vpci->mmio_addr,
-				    data, len);
+		virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
 	else
-		virtio_pci__data_out(vcpu, vdev, addr - vpci->mmio_addr,
-				     data, len);
+		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
 	struct virtio_pci *vpci = vdev->virtio;
+	u32 mmio_addr, msix_io_block;
+	u16 port_addr;
 	int r;
 
 	vpci->kvm = kvm;
@@ -454,20 +475,21 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
-	r = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, r, &virtio_pci__io_ops, PCI_IO_SIZE, vdev);
+	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
+	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
+			     vdev);
 	if (r < 0)
 		return r;
-	vpci->port_addr = (u16)r;
+	port_addr = (u16)r;
 
-	vpci->mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
+	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
+	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
 			       virtio_pci__io_mmio_callback, vdev);
 	if (r < 0)
 		goto free_ioport;
 
-	vpci->msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, vpci->msix_io_block, PCI_IO_SIZE * 2, false,
+	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
+	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
 			       virtio_pci__msix_mmio_callback, vdev);
 	if (r < 0)
 		goto free_mmio;
@@ -483,11 +505,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.class[2]		= (class >> 16) & 0xff,
 		.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
 		.subsys_id		= cpu_to_le16(subsys_id),
-		.bar[0]			= cpu_to_le32(vpci->port_addr
+		.bar[0]			= cpu_to_le32(port_addr
 							| PCI_BASE_ADDRESS_SPACE_IO),
-		.bar[1]			= cpu_to_le32(vpci->mmio_addr
+		.bar[1]			= cpu_to_le32(mmio_addr
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
-		.bar[2]			= cpu_to_le32(vpci->msix_io_block
+		.bar[2]			= cpu_to_le32(msix_io_block
 							| PCI_BASE_ADDRESS_SPACE_MEMORY),
 		.status			= cpu_to_le16(PCI_STATUS_CAP_LIST),
 		.capabilities		= (void *)&vpci->pci_hdr.msix - (void *)&vpci->pci_hdr,
@@ -533,11 +555,11 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	return 0;
 
 free_msix_mmio:
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
+	kvm__deregister_mmio(kvm, msix_io_block);
 free_mmio:
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
+	kvm__deregister_mmio(kvm, mmio_addr);
 free_ioport:
-	ioport__unregister(kvm, vpci->port_addr);
+	ioport__unregister(kvm, port_addr);
 	return r;
 }
 
@@ -557,9 +579,9 @@ int virtio_pci__exit(struct kvm *kvm, struct virtio_device *vdev)
 	struct virtio_pci *vpci = vdev->virtio;
 
 	virtio_pci__reset(kvm, vdev);
-	kvm__deregister_mmio(kvm, vpci->mmio_addr);
-	kvm__deregister_mmio(kvm, vpci->msix_io_block);
-	ioport__unregister(kvm, vpci->port_addr);
+	kvm__deregister_mmio(kvm, virtio_pci__mmio_addr(vpci));
+	kvm__deregister_mmio(kvm, virtio_pci__msix_io_addr(vpci));
+	ioport__unregister(kvm, virtio_pci__port_addr(vpci));
 
 	return 0;
 }
-- 
2.7.4



* [PATCH v4 kvmtool 04/12] vfio: Reserve ioports when configuring the BAR
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (2 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 03/12] virtio/pci: Get emulated region address from BARs Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 05/12] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Let's be consistent and reserve ioports when we are configuring the BAR,
not when we map it, just like we do with mmio regions.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++------
 vfio/pci.c  | 4 +++-
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/vfio/core.c b/vfio/core.c
index b80ebf3bb913..bad3c7c8cd26 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -202,14 +202,11 @@ static int vfio_setup_trap_region(struct kvm *kvm, struct vfio_device *vdev,
 				  struct vfio_region *region)
 {
 	if (region->is_ioport) {
-		int port = pci_get_io_port_block(region->info.size);
-
-		port = ioport__register(kvm, port, &vfio_ioport_ops,
-					region->info.size, region);
+		int port = ioport__register(kvm, region->port_base,
+					   &vfio_ioport_ops, region->info.size,
+					   region);
 		if (port < 0)
 			return port;
-
-		region->port_base = port;
 		return 0;
 	}
 
diff --git a/vfio/pci.c b/vfio/pci.c
index 89d29b89db43..0b548e4bf9e2 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -894,7 +894,9 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		}
 	}
 
-	if (!region->is_ioport) {
+	if (region->is_ioport) {
+		region->port_base = pci_get_io_port_block(region->info.size);
+	} else {
 		/* Grab some MMIO space in the guest */
 		map_size = ALIGN(region->info.size, PAGE_SIZE);
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
-- 
2.7.4



* [PATCH v4 kvmtool 05/12] pci: Limit configuration transaction size to 32 bits
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (3 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 04/12] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice Alexandru Elisei
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

From PCI Local Bus Specification Revision 3.0. section 3.8 "64-Bit Bus
Extension":

"The bandwidth requirements for I/O and configuration transactions cannot
justify the added complexity, and, therefore, only memory transactions
support 64-bit data transfers".

Further down, the spec also describes the possible responses of a target
which has been requested to do a 64-bit transaction. Limit the transaction
to the lower 32 bits, to match the second accepted behaviour.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/pci.c b/pci.c
index 81e9cec918fb..eb0bb366a291 100644
--- a/pci.c
+++ b/pci.c
@@ -119,6 +119,9 @@ static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	union pci_config_address pci_config_address;
 
+	if (size > 4)
+		size = 4;
+
 	pci_config_address.w = ioport__read32(&pci_config_address_bits);
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -135,6 +138,9 @@ static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16
 {
 	union pci_config_address pci_config_address;
 
+	if (size > 4)
+		size = 4;
+
 	pci_config_address.w = ioport__read32(&pci_config_address_bits);
 	/*
 	 * If someone accesses PCI configuration space offsets that are not
@@ -248,6 +254,9 @@ static void pci_config_mmio_access(struct kvm_cpu *vcpu, u64 addr, u8 *data,
 	cfg_addr.w		= (u32)addr;
 	cfg_addr.enable_bit	= 1;
 
+	if (len > 4)
+		len = 4;
+
 	if (is_write)
 		pci__config_wr(kvm, cfg_addr, data, len);
 	else
-- 
2.7.4



* [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (4 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 05/12] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 16:55   ` André Przywara
  2020-05-14 15:38 ` [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers Alexandru Elisei
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

After writing to the device fd as part of the PCI configuration space
emulation, we read back from the device to make sure that the write
finished. The value is read back into the PCI configuration space and
afterwards, the same value is copied by the PCI emulation code. Let's
read from the device fd into a temporary variable, to prevent this
double write.

The double write is harmless in itself. But when we implement reassignable
BARs, we need to keep track of the old BAR value, and the VFIO code is
overwriting it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/pci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/vfio/pci.c b/vfio/pci.c
index 0b548e4bf9e2..2de893407574 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -3,6 +3,8 @@
 #include "kvm/kvm-cpu.h"
 #include "kvm/vfio.h"
 
+#include <assert.h>
+
 #include <sys/ioctl.h>
 #include <sys/eventfd.h>
 #include <sys/resource.h>
@@ -478,7 +480,10 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	struct vfio_region_info *info;
 	struct vfio_pci_device *pdev;
 	struct vfio_device *vdev;
-	void *base = pci_hdr;
+	u32 tmp;
+
+	/* Make sure a larger size will not overrun tmp on the stack. */
+	assert(sz <= 4);
 
 	if (offset == PCI_ROM_ADDRESS)
 		return;
@@ -498,7 +503,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
 	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
 		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
 
-	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
+	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
 		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
 			      sz, offset);
 }
-- 
2.7.4



* [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (5 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 16:56   ` André Przywara
  2020-05-14 15:38 ` [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

A vesa device is used by the SDL, GTK or VNC framebuffers. Don't allow the
user to specify more than one of these options because kvmtool will create
identical vesa devices and bad things will happen:

$ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk
  # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159
  Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff]
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
======= Backtrace: =========
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
(+0x777e5)[0x7fae0ed447e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c]
*** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
/lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
/usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
/usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
======= Backtrace: =========
Aborted (core dumped)

The vesa device is explicitly created during the initialization phase of
the above framebuffers. Also remove the superfluous check for their
existence.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 builtin-run.c | 5 +++++
 hw/vesa.c     | 2 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin-run.c b/builtin-run.c
index 03b119e29995..c23e7a2e5ecd 100644
--- a/builtin-run.c
+++ b/builtin-run.c
@@ -543,6 +543,11 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
 		kvm->cfg.console = DEFAULT_CONSOLE;
 
 	video = kvm->cfg.vnc || kvm->cfg.sdl || kvm->cfg.gtk;
+	if (video) {
+		if ((kvm->cfg.vnc && (kvm->cfg.sdl || kvm->cfg.gtk)) ||
+		    (kvm->cfg.sdl && kvm->cfg.gtk))
+			die("Only one of --vnc, --sdl or --gtk can be specified");
+	}
 
 	if (!strncmp(kvm->cfg.console, "virtio", 6))
 		kvm->cfg.active_console  = CONSOLE_VIRTIO;
diff --git a/hw/vesa.c b/hw/vesa.c
index dd59a112330b..a4ec7e16e1e6 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -61,8 +61,6 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
 	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
 
-	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
-		return NULL;
 	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
 	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
 	if (r < 0)
-- 
2.7.4



* [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (6 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 16:56   ` André Przywara
  2020-05-14 15:38 ` [PATCH v4 kvmtool 09/12] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

Implement callbacks for activating and deactivating emulation for a BAR
region. This is in preparation for allowing a guest operating system to
enable and disable access to I/O or memory space, or to reassign the
BARs.

The emulated vesa device framebuffer isn't designed to allow stopping and
restarting at arbitrary points in the guest execution. Furthermore, on x86,
the kernel will not change the BAR addresses, which on bare metal are
programmed by the firmware, so take the easy way out and refuse to
activate/deactivate emulation for the BAR regions. We also take this
opportunity to make the vesa emulation code more consistent by moving all
static variable definitions into one place, at the top of the file.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 hw/vesa.c         |  70 ++++++++++++++++++++++-------------
 include/kvm/pci.h |  18 ++++++++-
 pci.c             |  39 ++++++++++++++++++++
 vfio/pci.c        | 107 ++++++++++++++++++++++++++++++++++++++++++++++--------
 virtio/pci.c      |  92 ++++++++++++++++++++++++++++++++++------------
 5 files changed, 258 insertions(+), 68 deletions(-)

diff --git a/hw/vesa.c b/hw/vesa.c
index a4ec7e16e1e6..8659a0028d31 100644
--- a/hw/vesa.c
+++ b/hw/vesa.c
@@ -18,6 +18,31 @@
 #include <inttypes.h>
 #include <unistd.h>
 
+static struct pci_device_header vesa_pci_device = {
+	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
+	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
+	.header_type	= PCI_HEADER_TYPE_NORMAL,
+	.revision_id	= 0,
+	.class[2]	= 0x03,
+	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
+	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
+	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
+	.bar_size[1]	= VESA_MEM_SIZE,
+};
+
+static struct device_header vesa_device = {
+	.bus_type	= DEVICE_BUS_PCI,
+	.data		= &vesa_pci_device,
+};
+
+static struct framebuffer vesafb = {
+	.width		= VESA_WIDTH,
+	.height		= VESA_HEIGHT,
+	.depth		= VESA_BPP,
+	.mem_addr	= VESA_MEM_ADDR,
+	.mem_size	= VESA_MEM_SIZE,
+};
+
 static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
 {
 	return true;
@@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
 	.io_out			= vesa_pci_io_out,
 };
 
-static struct pci_device_header vesa_pci_device = {
-	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
-	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
-	.header_type		= PCI_HEADER_TYPE_NORMAL,
-	.revision_id		= 0,
-	.class[2]		= 0x03,
-	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
-	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
-	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
-	.bar_size[1]		= VESA_MEM_SIZE,
-};
-
-static struct device_header vesa_device = {
-	.bus_type	= DEVICE_BUS_PCI,
-	.data		= &vesa_pci_device,
-};
+static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      int bar_num, void *data)
+{
+	/* We don't support remapping of the framebuffer. */
+	return 0;
+}
 
-static struct framebuffer vesafb;
+static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
+				int bar_num, void *data)
+{
+	/* We don't support remapping of the framebuffer. */
+	return -EINVAL;
+}
 
 struct framebuffer *vesa__init(struct kvm *kvm)
 {
@@ -68,6 +88,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 
 	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
 	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
+	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
+				      vesa__bar_deactivate, NULL);
+	if (r < 0)
+		goto unregister_ioport;
+
 	r = device__register(&vesa_device);
 	if (r < 0)
 		goto unregister_ioport;
@@ -82,15 +107,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
 	if (r < 0)
 		goto unmap_dev;
 
-	vesafb = (struct framebuffer) {
-		.width			= VESA_WIDTH,
-		.height			= VESA_HEIGHT,
-		.depth			= VESA_BPP,
-		.mem			= mem,
-		.mem_addr		= VESA_MEM_ADDR,
-		.mem_size		= VESA_MEM_SIZE,
-		.kvm			= kvm,
-	};
+	vesafb.mem = mem;
+	vesafb.kvm = kvm;
 	return fb__register(&vesafb);
 
 unmap_dev:
diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 0f0815b36157..73e06d76d244 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -89,12 +89,19 @@ struct pci_cap_hdr {
 	u8	next;
 };
 
+struct pci_device_header;
+
+typedef int (*bar_activate_fn_t)(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data);
+typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data);
+
 #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
 #define PCI_DEV_CFG_SIZE	256
 #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
 
-struct pci_device_header;
-
 struct pci_config_operations {
 	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
 		      u8 offset, void *data, int sz);
@@ -136,6 +143,9 @@ struct pci_device_header {
 
 	/* Private to lkvm */
 	u32		bar_size[6];
+	bar_activate_fn_t	bar_activate_fn;
+	bar_deactivate_fn_t	bar_deactivate_fn;
+	void *data;
 	struct pci_config_operations	cfg_ops;
 	/*
 	 * PCI INTx# are level-triggered, but virtual device often feature
@@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
 
 void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
+
 static inline bool __pci__memory_space_enabled(u16 command)
 {
 	return command & PCI_COMMAND_MEMORY;
diff --git a/pci.c b/pci.c
index eb0bb366a291..b8e71b5f8d6c 100644
--- a/pci.c
+++ b/pci.c
@@ -66,6 +66,11 @@ int pci__assign_irq(struct pci_device_header *pci_hdr)
 	return pci_hdr->irq_line;
 }
 
+static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return pci__bar_size(pci_hdr, bar_num);
+}
+
 static void *pci_config_address_ptr(u16 port)
 {
 	unsigned long offset;
@@ -273,6 +278,40 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
 	return hdr->data;
 }
 
+int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      bar_activate_fn_t bar_activate_fn,
+			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
+{
+	int i, r;
+
+	assert(bar_activate_fn && bar_deactivate_fn);
+
+	pci_hdr->bar_activate_fn = bar_activate_fn;
+	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
+	pci_hdr->data = data;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		if (pci__bar_is_io(pci_hdr, i) &&
+		    pci__io_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+
+		if (pci__bar_is_memory(pci_hdr, i) &&
+		    pci__memory_space_enabled(pci_hdr)) {
+			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			if (r < 0)
+				return r;
+		}
+	}
+
+	return 0;
+}
+
 int pci__init(struct kvm *kvm)
 {
 	int r;
diff --git a/vfio/pci.c b/vfio/pci.c
index 2de893407574..34f19792765e 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -10,6 +10,8 @@
 #include <sys/resource.h>
 #include <sys/time.h>
 
+#include <assert.h>
+
 /* Wrapper around UAPI vfio_irq_set */
 union vfio_irq_eventfd {
 	struct vfio_irq_set	irq;
@@ -456,6 +458,88 @@ out_unlock:
 	mutex_unlock(&pdev->msi.mutex);
 }
 
+static int vfio_pci_bar_activate(struct kvm *kvm,
+				 struct pci_device_header *pci_hdr,
+				 int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region;
+	bool has_msix;
+	int ret;
+
+	assert((u32)bar_num < vdev->info.num_regions);
+
+	region = &vdev->regions[bar_num];
+	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
+
+	if (has_msix && (u32)bar_num == table->bar) {
+		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
+					 table->size, false,
+					 vfio_pci_msix_table_access, pdev);
+		/*
+		 * The MSIX table and the PBA structure can share the same BAR,
+		 * but for convenience we register different regions for mmio
+		 * emulation. We want to update both if they share the same
+		 * BAR.
+		 */
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if (has_msix && (u32)bar_num == pba->bar) {
+		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
+					 pba->size, false,
+					 vfio_pci_msix_pba_access, pdev);
+		goto out;
+	}
+
+	ret = vfio_map_region(kvm, vdev, region);
+out:
+	return ret;
+}
+
+static int vfio_pci_bar_deactivate(struct kvm *kvm,
+				   struct pci_device_header *pci_hdr,
+				   int bar_num, void *data)
+{
+	struct vfio_device *vdev = data;
+	struct vfio_pci_device *pdev = &vdev->pci;
+	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
+	struct vfio_pci_msix_table *table = &pdev->msix_table;
+	struct vfio_region *region;
+	bool has_msix, success;
+	int ret;
+
+	assert((u32)bar_num < vdev->info.num_regions);
+
+	region = &vdev->regions[bar_num];
+	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
+
+	if (has_msix && (u32)bar_num == table->bar) {
+		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
+		/* kvm__deregister_mmio fails when the region is not found. */
+		ret = (success ? 0 : -ENOENT);
+		/* See vfio_pci_bar_activate(). */
+		if (ret < 0 || table->bar != pba->bar)
+			goto out;
+	}
+
+	if (has_msix && (u32)bar_num == pba->bar) {
+		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
+		ret = (success ? 0 : -ENOENT);
+		goto out;
+	}
+
+	vfio_unmap_region(kvm, region);
+	ret = 0;
+
+out:
+	return ret;
+}
+
 static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
 			      u8 offset, void *data, int sz)
 {
@@ -818,12 +902,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 		ret = -ENOMEM;
 		goto out_free;
 	}
-	pba->guest_phys_addr = table->guest_phys_addr + table->size;
-
-	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
-				 false, vfio_pci_msix_table_access, pdev);
-	if (ret < 0)
-		goto out_free;
 
 	/*
 	 * We could map the physical PBA directly into the guest, but it's
@@ -833,10 +911,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
 	 * between MSI-X table and PBA. For the sake of isolation, create a
 	 * virtual PBA.
 	 */
-	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
-				 vfio_pci_msix_pba_access, pdev);
-	if (ret < 0)
-		goto out_free;
+	pba->guest_phys_addr = table->guest_phys_addr + table->size;
 
 	pdev->msix.entries = entries;
 	pdev->msix.nr_entries = nr_entries;
@@ -907,11 +982,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
 		region->guest_phys_addr = pci_get_mmio_block(map_size);
 	}
 
-	/* Map the BARs into the guest or setup a trap region. */
-	ret = vfio_map_region(kvm, vdev, region);
-	if (ret)
-		return ret;
-
 	return 0;
 }
 
@@ -958,7 +1028,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
 	}
 
 	/* We've configured the BARs, fake up a Configuration Space */
-	return vfio_pci_fixup_cfg_space(vdev);
+	ret = vfio_pci_fixup_cfg_space(vdev);
+	if (ret)
+		return ret;
+
+	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
+					 vfio_pci_bar_deactivate, vdev);
 }
 
 /*
diff --git a/virtio/pci.c b/virtio/pci.c
index eded8685e1b3..6eea6c686695 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -11,6 +11,7 @@
 #include <sys/ioctl.h>
 #include <linux/virtio_pci.h>
 #include <linux/byteorder.h>
+#include <assert.h>
 #include <string.h>
 
 static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
@@ -462,6 +463,66 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
 		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
 }
 
+static int virtio_pci__bar_activate(struct kvm *kvm,
+				    struct pci_device_header *pci_hdr,
+				    int bar_num, void *data)
+{
+	struct virtio_device *vdev = data;
+	u32 bar_addr, bar_size;
+	int r = -EINVAL;
+
+	assert(bar_num <= 2);
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	bar_size = pci__bar_size(pci_hdr, bar_num);
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
+				     bar_size, vdev);
+		if (r > 0)
+			r = 0;
+		break;
+	case 1:
+		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__io_mmio_callback, vdev);
+		break;
+	case 2:
+		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
+					virtio_pci__msix_mmio_callback, vdev);
+		break;
+	}
+
+	return r;
+}
+
+static int virtio_pci__bar_deactivate(struct kvm *kvm,
+				      struct pci_device_header *pci_hdr,
+				      int bar_num, void *data)
+{
+	u32 bar_addr;
+	bool success;
+	int r = -EINVAL;
+
+	assert(bar_num <= 2);
+
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+
+	switch (bar_num) {
+	case 0:
+		r = ioport__unregister(kvm, bar_addr);
+		break;
+	case 1:
+	case 2:
+		success = kvm__deregister_mmio(kvm, bar_addr);
+		/* kvm__deregister_mmio fails when the region is not found. */
+		r = (success ? 0 : -ENOENT);
+		break;
+	}
+
+	return r;
+}
+
 int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		     int device_id, int subsys_id, int class)
 {
@@ -476,23 +537,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
 
 	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
-	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
-			     vdev);
-	if (r < 0)
-		return r;
-	port_addr = (u16)r;
-
 	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
-	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
-			       virtio_pci__io_mmio_callback, vdev);
-	if (r < 0)
-		goto free_ioport;
-
 	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
-	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
-			       virtio_pci__msix_mmio_callback, vdev);
-	if (r < 0)
-		goto free_mmio;
 
 	vpci->pci_hdr = (struct pci_device_header) {
 		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
@@ -518,6 +564,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
 	};
 
+	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
+				      virtio_pci__bar_activate,
+				      virtio_pci__bar_deactivate, vdev);
+	if (r < 0)
+		return r;
+
 	vpci->dev_hdr = (struct device_header) {
 		.bus_type		= DEVICE_BUS_PCI,
 		.data			= &vpci->pci_hdr,
@@ -550,17 +602,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 
 	r = device__register(&vpci->dev_hdr);
 	if (r < 0)
-		goto free_msix_mmio;
+		return r;
 
 	return 0;
-
-free_msix_mmio:
-	kvm__deregister_mmio(kvm, msix_io_block);
-free_mmio:
-	kvm__deregister_mmio(kvm, mmio_addr);
-free_ioport:
-	ioport__unregister(kvm, port_addr);
-	return r;
 }
 
 int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 kvmtool 09/12] pci: Toggle BAR I/O and memory space emulation
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (7 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs Alexandru Elisei
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

During configuration of the BAR addresses, a Linux guest disables and
enables access to I/O and memory space. When access is disabled, we don't
stop emulating the memory regions described by the BARs. Now that we have
callbacks for activating and deactivating emulation for a BAR region,
let's use that to stop emulation when access is disabled, and
re-activate it when access is re-enabled.
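
For reference, the guest-side sequence that exercises this path looks
roughly like the sketch below (a heavily simplified version of what Linux
does around BAR assignment; "dev" stands for the struct pci_dev being
configured):

	u16 cmd;

	/* Disable decoding before sizing/reprogramming the BARs ... */
	pci_read_config_word(dev, PCI_COMMAND, &cmd);
	pci_write_config_word(dev, PCI_COMMAND,
			      cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));

	/* ... size and program the BARs here ... */

	/* ... then restore the decode bits. */
	pci_write_config_word(dev, PCI_COMMAND, cmd);

Both command register writes land in pci_config_command_wr() below, which
toggles emulation only for the BARs affected by the changed decode bits.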

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 pci.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/pci.c b/pci.c
index b8e71b5f8d6c..96239160110c 100644
--- a/pci.c
+++ b/pci.c
@@ -163,6 +163,42 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static void pci_config_command_wr(struct kvm *kvm,
+				  struct pci_device_header *pci_hdr,
+				  u16 new_command)
+{
+	int i;
+	bool toggle_io, toggle_mem;
+
+	toggle_io = (pci_hdr->command ^ new_command) & PCI_COMMAND_IO;
+	toggle_mem = (pci_hdr->command ^ new_command) & PCI_COMMAND_MEMORY;
+
+	for (i = 0; i < 6; i++) {
+		if (!pci_bar_is_implemented(pci_hdr, i))
+			continue;
+
+		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
+			if (__pci__io_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+
+		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
+			if (__pci__memory_space_enabled(new_command))
+				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
+							 pci_hdr->data);
+			else
+				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
+							   pci_hdr->data);
+		}
+	}
+
+	pci_hdr->command = new_command;
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -188,6 +224,12 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	if (*(u32 *)(base + offset) == 0)
 		return;
 
+	if (offset == PCI_COMMAND) {
+		memcpy(&value, data, size);
+		pci_config_command_wr(kvm, pci_hdr, (u16)value);
+		return;
+	}
+
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
 
 	/*
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (8 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 09/12] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 16:56   ` André Przywara
  2020-05-14 15:38 ` [PATCH v4 kvmtool 11/12] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

BARs are used by the guest to configure the access to the PCI device by
writing the address to which the device will respond. The basic idea for
adding support for reassignable BARs is straightforward: deactivate
emulation for the memory region described by the old BAR value, and
activate emulation for the new region.

BAR reassignment can be done while device access is enabled and memory
regions for different devices can overlap as long as no access is made to
the overlapping memory regions. This means that it is legal for the BARs of
two distinct devices to point to an overlapping memory region, and indeed,
this is how Linux does resource assignment at boot. To account for this
situation, the simple algorithm described above is enhanced to scan for all
devices and:

- Deactivate emulation for any BARs that might overlap with the new BAR
  value.

- Enable emulation for any BARs that were overlapping with the old value
  after the BAR has been updated.

Activating/deactivating emulation of a memory region has side effects.  In
order to prevent the execution of the same callback twice we now keep track
of the state of the region emulation. For example, this can happen if we
program a BAR with an address that overlaps a second BAR, thus deactivating
emulation for the second BAR, and then we disable all region accesses to
the second BAR by writing to the command register.
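
Expressed as code, the BAR write path added by this patch boils down to
roughly the following (simplified from pci_config_bar_wr() in the diff
below; the sizing write, the decode-disabled shortcuts and the error
handling are omitted):

	old_addr = pci__bar_address(pci_hdr, bar_num);
	new_addr = __pci__bar_address(value);
	bar_size = pci__bar_size(pci_hdr, bar_num);

	/* Stop decoding the old region. */
	pci_deactivate_bar(kvm, pci_hdr, bar_num);
	/* Stop decoding anything that overlaps the new region. */
	pci_deactivate_bar_regions(kvm, new_addr, bar_size);

	pci_hdr->bar[bar_num] = value;

	/* Start decoding the new region. */
	pci_activate_bar(kvm, pci_hdr, bar_num);
	/* Re-enable whatever was overlapping the old region. */
	pci_activate_bar_regions(kvm, old_addr, bar_size);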

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 include/kvm/pci.h |  14 ++-
 pci.c             | 250 +++++++++++++++++++++++++++++++++++++++++++-----------
 vfio/pci.c        |  12 +++
 3 files changed, 227 insertions(+), 49 deletions(-)

diff --git a/include/kvm/pci.h b/include/kvm/pci.h
index 73e06d76d244..bf81323d83b7 100644
--- a/include/kvm/pci.h
+++ b/include/kvm/pci.h
@@ -11,6 +11,17 @@
 #include "kvm/msi.h"
 #include "kvm/fdt.h"
 
+#define pci_dev_err(pci_hdr, fmt, ...) \
+	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_warn(pci_hdr, fmt, ...) \
+	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_info(pci_hdr, fmt, ...) \
+	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_dbg(pci_hdr, fmt, ...) \
+	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+#define pci_dev_die(pci_hdr, fmt, ...) \
+	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
+
 /*
  * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
  * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
@@ -142,7 +153,8 @@ struct pci_device_header {
 	};
 
 	/* Private to lkvm */
-	u32		bar_size[6];
+	u32			bar_size[6];
+	bool			bar_active[6];
 	bar_activate_fn_t	bar_activate_fn;
 	bar_deactivate_fn_t	bar_deactivate_fn;
 	void *data;
diff --git a/pci.c b/pci.c
index 96239160110c..2e2c0270a166 100644
--- a/pci.c
+++ b/pci.c
@@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
 	return pci__bar_size(pci_hdr, bar_num);
 }
 
+static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
+{
+	return  pci_hdr->bar_active[bar_num];
+}
+
 static void *pci_config_address_ptr(u16 port)
 {
 	unsigned long offset;
@@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
 	.io_out	= pci_config_data_out,
 };
 
+static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			    int bar_num)
+{
+	int r = 0;
+
+	if (pci_bar_is_active(pci_hdr, bar_num))
+		goto out;
+
+	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
+			     bar_num);
+		goto out;
+	}
+	pci_hdr->bar_active[bar_num] = true;
+
+out:
+	return r;
+}
+
+static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
+			      int bar_num)
+{
+	int r = 0;
+
+	if (!pci_bar_is_active(pci_hdr, bar_num))
+		goto out;
+
+	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
+	if (r < 0) {
+		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
+			     bar_num);
+		goto out;
+	}
+	pci_hdr->bar_active[bar_num] = false;
+
+out:
+	return r;
+}
+
 static void pci_config_command_wr(struct kvm *kvm,
 				  struct pci_device_header *pci_hdr,
 				  u16 new_command)
@@ -179,26 +224,167 @@ static void pci_config_command_wr(struct kvm *kvm,
 
 		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
 			if (__pci__io_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
 			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 
 		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
 			if (__pci__memory_space_enabled(new_command))
-				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
-							 pci_hdr->data);
+				pci_activate_bar(kvm, pci_hdr, i);
 			else
-				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
-							   pci_hdr->data);
+				pci_deactivate_bar(kvm, pci_hdr, i);
 		}
 	}
 
 	pci_hdr->command = new_command;
 }
 
+static int pci_toggle_bar_regions(bool activate, struct kvm *kvm, u32 start, u32 size)
+{
+	struct device_header *dev_hdr;
+	struct pci_device_header *tmp_hdr;
+	u32 tmp_start, tmp_size;
+	int i, r;
+
+	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
+	while (dev_hdr) {
+		tmp_hdr = dev_hdr->data;
+		for (i = 0; i < 6; i++) {
+			if (!pci_bar_is_implemented(tmp_hdr, i))
+				continue;
+
+			tmp_start = pci__bar_address(tmp_hdr, i);
+			tmp_size = pci__bar_size(tmp_hdr, i);
+			if (tmp_start + tmp_size <= start ||
+			    tmp_start >= start + size)
+				continue;
+
+			if (activate)
+				r = pci_activate_bar(kvm, tmp_hdr, i);
+			else
+				r = pci_deactivate_bar(kvm, tmp_hdr, i);
+			if (r < 0)
+				return r;
+		}
+		dev_hdr = device__next_dev(dev_hdr);
+	}
+
+	return 0;
+}
+
+static inline int pci_activate_bar_regions(struct kvm *kvm, u32 start, u32 size)
+{
+	return pci_toggle_bar_regions(true, kvm, start, size);
+}
+
+static inline int pci_deactivate_bar_regions(struct kvm *kvm, u32 start, u32 size)
+{
+	return pci_toggle_bar_regions(false, kvm, start, size);
+}
+
+static void pci_config_bar_wr(struct kvm *kvm,
+			      struct pci_device_header *pci_hdr, int bar_num,
+			      u32 value)
+{
+	u32 old_addr, new_addr, bar_size;
+	u32 mask;
+	int r;
+
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
+	else
+		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+
+	/*
+	 * If the kernel masks the BAR, it will expect to find the size of the
+	 * BAR there next time it reads from it. After the kernel reads the
+	 * size, it will write the address back.
+	 *
+	 * According to the PCI local bus specification REV 3.0: The number of
+	 * upper bits that a device actually implements depends on how much of
+	 * the address space the device will respond to. A device that wants a 1
+	 * MB memory address space (using a 32-bit base address register) would
+	 * build the top 12 bits of the address register, hardwiring the other
+	 * bits to 0.
+	 *
+	 * Furthermore, software can determine how much address space the device
+	 * requires by writing a value of all 1's to the register and then
+	 * reading the value back. The device will return 0's in all don't-care
+	 * address bits, effectively specifying the address space required.
+	 *
+	 * Software computes the size of the address space with the formula
+	 * S =  ~B + 1, where S is the memory size and B is the value read from
+	 * the BAR. This means that the BAR value that kvmtool should return is
+	 * B = ~(S - 1).
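+	 *
+	 * For example, a 1 MB BAR has S = 0x00100000, so after a sizing write
+	 * the BAR holds B = ~0x000fffff = 0xfff00000: only the top 12 address
+	 * bits are writable.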
+	 */
+	if (value == 0xffffffff) {
+		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
+		/* Preserve the special bits. */
+		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
+
+	/* Don't toggle emulation when region type access is disabled. */
+	if (pci__bar_is_io(pci_hdr, bar_num) &&
+	    !pci__io_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	if (pci__bar_is_memory(pci_hdr, bar_num) &&
+	    !pci__memory_space_enabled(pci_hdr)) {
+		pci_hdr->bar[bar_num] = value;
+		return;
+	}
+
+	/*
+	 * BAR reassignment can be done while device access is enabled and
+	 * memory regions for different devices can overlap as long as no access
+	 * is made to the overlapping memory regions. To implement BAR
+	 * reassignment, we deactivate emulation for the region described by the
+	 * BAR value that the guest is changing, we disable emulation for the
+	 * regions that overlap with the new one (by scanning through all PCI
+	 * devices), we enable emulation for the new BAR value and finally we
+	 * enable emulation for all device regions that were overlapping with
+	 * the old value.
+	 */
+	old_addr = pci__bar_address(pci_hdr, bar_num);
+	new_addr = __pci__bar_address(value);
+	bar_size = pci__bar_size(pci_hdr, bar_num);
+
+	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0)
+		return;
+
+	r = pci_deactivate_bar_regions(kvm, new_addr, bar_size);
+	if (r < 0) {
+		/*
+		 * We cannot update the BAR because of an overlapping region
+		 * that failed to deactivate emulation, so keep the old BAR
+		 * value and re-activate emulation for it.
+		 */
+		pci_activate_bar(kvm, pci_hdr, bar_num);
+		return;
+	}
+
+	pci_hdr->bar[bar_num] = value;
+	r = pci_activate_bar(kvm, pci_hdr, bar_num);
+	if (r < 0) {
+		/*
+		 * New region cannot be emulated, re-enable the regions that
+		 * were overlapping.
+		 */
+		pci_activate_bar_regions(kvm, new_addr, bar_size);
+		return;
+	}
+
+	pci_activate_bar_regions(kvm, old_addr, bar_size);
+}
+
 void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
 {
 	void *base;
@@ -206,7 +392,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	struct pci_device_header *pci_hdr;
 	u8 dev_num = addr.device_number;
 	u32 value = 0;
-	u32 mask;
 
 	if (!pci_device_exists(addr.bus_number, dev_num, 0))
 		return;
@@ -231,46 +416,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
 	}
 
 	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
-
-	/*
-	 * If the kernel masks the BAR, it will expect to find the size of the
-	 * BAR there next time it reads from it. After the kernel reads the
-	 * size, it will write the address back.
-	 */
 	if (bar < 6) {
-		if (pci__bar_is_io(pci_hdr, bar))
-			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
-		else
-			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
-		/*
-		 * According to the PCI local bus specification REV 3.0:
-		 * The number of upper bits that a device actually implements
-		 * depends on how much of the address space the device will
-		 * respond to. A device that wants a 1 MB memory address space
-		 * (using a 32-bit base address register) would build the top
-		 * 12 bits of the address register, hardwiring the other bits
-		 * to 0.
-		 *
-		 * Furthermore, software can determine how much address space
-		 * the device requires by writing a value of all 1's to the
-		 * register and then reading the value back. The device will
-		 * return 0's in all don't-care address bits, effectively
-		 * specifying the address space required.
-		 *
-		 * Software computes the size of the address space with the
-		 * formula S = ~B + 1, where S is the memory size and B is the
-		 * value read from the BAR. This means that the BAR value that
-		 * kvmtool should return is B = ~(S - 1).
-		 */
 		memcpy(&value, data, size);
-		if (value == 0xffffffff)
-			value = ~(pci__bar_size(pci_hdr, bar) - 1);
-		/* Preserve the special bits. */
-		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
-		memcpy(base + offset, &value, size);
-	} else {
-		memcpy(base + offset, data, size);
+		pci_config_bar_wr(kvm, pci_hdr, bar, value);
+		return;
 	}
+
+	memcpy(base + offset, data, size);
 }
 
 void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
@@ -336,16 +488,18 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
 		if (!pci_bar_is_implemented(pci_hdr, i))
 			continue;
 
+		assert(!pci_bar_is_active(pci_hdr, i));
+
 		if (pci__bar_is_io(pci_hdr, i) &&
 		    pci__io_space_enabled(pci_hdr)) {
-			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			r = pci_activate_bar(kvm, pci_hdr, i);
 			if (r < 0)
 				return r;
 		}
 
 		if (pci__bar_is_memory(pci_hdr, i) &&
 		    pci__memory_space_enabled(pci_hdr)) {
-			r = bar_activate_fn(kvm, pci_hdr, i, data);
+			r = pci_activate_bar(kvm, pci_hdr, i);
 			if (r < 0)
 				return r;
 		}
diff --git a/vfio/pci.c b/vfio/pci.c
index 34f19792765e..49ecd12a38cd 100644
--- a/vfio/pci.c
+++ b/vfio/pci.c
@@ -467,6 +467,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
 	struct vfio_pci_msix_table *table = &pdev->msix_table;
 	struct vfio_region *region;
+	u32 bar_addr;
 	bool has_msix;
 	int ret;
 
@@ -475,7 +476,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	region = &vdev->regions[bar_num];
 	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
 
+	bar_addr = pci__bar_address(pci_hdr, bar_num);
+	if (pci__bar_is_io(pci_hdr, bar_num))
+		region->port_base = bar_addr;
+	else
+		region->guest_phys_addr = bar_addr;
+
 	if (has_msix && (u32)bar_num == table->bar) {
+		table->guest_phys_addr = region->guest_phys_addr;
 		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
 					 table->size, false,
 					 vfio_pci_msix_table_access, pdev);
@@ -490,6 +498,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
 	}
 
 	if (has_msix && (u32)bar_num == pba->bar) {
+		if (pba->bar == table->bar)
+			pba->guest_phys_addr = table->guest_phys_addr + table->size;
+		else
+			pba->guest_phys_addr = region->guest_phys_addr;
 		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
 					 pba->size, false,
 					 vfio_pci_msix_pba_access, pdev);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 kvmtool 11/12] arm/fdt: Remove 'linux,pci-probe-only' property
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (9 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-14 15:38 ` [PATCH v4 kvmtool 12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz, Julien Thierry

From: Julien Thierry <julien.thierry@arm.com>

PCI now supports configurable BARs. Get rid of the no longer needed,
Linux-only, fdt property.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/fdt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arm/fdt.c b/arm/fdt.c
index c80e6da323b6..02091e9e0bee 100644
--- a/arm/fdt.c
+++ b/arm/fdt.c
@@ -130,7 +130,6 @@ static int setup_fdt(struct kvm *kvm)
 
 	/* /chosen */
 	_FDT(fdt_begin_node(fdt, "chosen"));
-	_FDT(fdt_property_cell(fdt, "linux,pci-probe-only", 1));
 
 	/* Pass on our amended command line to a Linux kernel only. */
 	if (kvm->cfg.firmware_filename) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 kvmtool 12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (10 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 11/12] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
@ 2020-05-14 15:38 ` Alexandru Elisei
  2020-05-15 15:38 ` [PATCH v4 kvmtool 00/12] Add reassignable BARs André Przywara
  2020-05-19 16:46 ` Will Deacon
  13 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-14 15:38 UTC (permalink / raw)
  To: kvm
  Cc: will, julien.thierry.kdev, andre.przywara, sami.mujawar,
	lorenzo.pieralisi, maz

KVM_SET_USER_MEMORY_REGION will fail if the guest physical address is
not aligned to the page size. However, it is legal for a guest to
program an address which isn't aligned to the page size. Trap and
emulate MMIO accesses to the region when that happens.
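
For context, the memslot registration that fails in this case looks
roughly like the sketch below (vm_fd, slot and base are placeholders for
what kvmtool tracks internally):

	struct kvm_userspace_memory_region mem = {
		.slot		 = slot,
		.guest_phys_addr = region->guest_phys_addr, /* must be page aligned */
		.memory_size	 = region->info.size,
		.userspace_addr	 = (u64)base,
	};

	if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem) < 0)
		return -errno;	/* EINVAL for an unaligned guest_phys_addr */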

Without this patch, when assigning a Seagate Barracuda hard drive to a
VM I was seeing these errors:

[    0.286029] pci 0000:00:00.0: BAR 0: assigned [mem 0x41004600-0x4100467f]
  Error: 0000:01:00.0: failed to register region with KVM
  Error: [1095:3132] Error activating emulation for BAR 0
[..]
[   10.561794] irq 13: nobody cared (try booting with the "irqpoll" option)
[   10.563122] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-seattle-00009-g909b20467ed1 #133
[   10.563124] Hardware name: linux,dummy-virt (DT)
[   10.563126] Call trace:
[   10.563134]  dump_backtrace+0x0/0x140
[   10.563137]  show_stack+0x14/0x20
[   10.563141]  dump_stack+0xbc/0x100
[   10.563146]  __report_bad_irq+0x48/0xd4
[   10.563148]  note_interrupt+0x288/0x378
[   10.563151]  handle_irq_event_percpu+0x80/0x88
[   10.563153]  handle_irq_event+0x44/0xc8
[   10.563155]  handle_fasteoi_irq+0xb4/0x160
[   10.563157]  generic_handle_irq+0x24/0x38
[   10.563159]  __handle_domain_irq+0x60/0xb8
[   10.563162]  gic_handle_irq+0x50/0xa0
[   10.563164]  el1_irq+0xb8/0x180
[   10.563166]  arch_cpu_idle+0x10/0x18
[   10.563170]  do_idle+0x204/0x290
[   10.563172]  cpu_startup_entry+0x20/0x40
[   10.563175]  rest_init+0xd4/0xe0
[   10.563180]  arch_call_rest_init+0xc/0x14
[   10.563182]  start_kernel+0x420/0x44c
[   10.563183] handlers:
[   10.563650] [<000000001e474803>] sil24_interrupt
[   10.564559] Disabling IRQ #13
[..]
[   11.832916] ata1: spurious interrupt (slot_stat 0x0 active_tag -84148995 sactive 0x0)
[   12.045444] ata_ratelimit: 1 callbacks suppressed

With this patch, I don't see the errors and the device works as
expected.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 vfio/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/vfio/core.c b/vfio/core.c
index bad3c7c8cd26..0b45e78b40b4 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -226,6 +226,15 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
 	if (!(region->info.flags & VFIO_REGION_INFO_FLAG_MMAP))
 		return vfio_setup_trap_region(kvm, vdev, region);
 
+	/*
+	 * KVM_SET_USER_MEMORY_REGION will fail because the guest physical
+	 * address isn't page aligned, let's emulate the region ourselves.
+	 */
+	if (region->guest_phys_addr & (PAGE_SIZE - 1))
+		return kvm__register_mmio(kvm, region->guest_phys_addr,
+					  region->info.size, false,
+					  vfio_mmio_access, region);
+
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_READ)
 		prot |= PROT_READ;
 	if (region->info.flags & VFIO_REGION_INFO_FLAG_WRITE)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice
  2020-05-14 15:38 ` [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice Alexandru Elisei
@ 2020-05-14 16:55   ` André Przywara
  0 siblings, 0 replies; 22+ messages in thread
From: André Przywara @ 2020-05-14 16:55 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:
> After writing to the device fd as part of the PCI configuration space
> emulation, we read back from the device to make sure that the write
> finished. The value is read back into the PCI configuration space and
> afterwards, the same value is copied by the PCI emulation code. Let's
> read from the device fd into a temporary variable, to prevent this
> double write.
> 
> The double write is harmless in itself. But when we implement
> reassignable BARs, we need to keep track of the old BAR value, and the
> VFIO code is overwriting it.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Thanks for the changes!

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  vfio/pci.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 0b548e4bf9e2..2de893407574 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -3,6 +3,8 @@
>  #include "kvm/kvm-cpu.h"
>  #include "kvm/vfio.h"
>  
> +#include <assert.h>
> +
>  #include <sys/ioctl.h>
>  #include <sys/eventfd.h>
>  #include <sys/resource.h>
> @@ -478,7 +480,10 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	struct vfio_region_info *info;
>  	struct vfio_pci_device *pdev;
>  	struct vfio_device *vdev;
> -	void *base = pci_hdr;
> +	u32 tmp;
> +
> +	/* Make sure a larger size will not overrun tmp on the stack. */
> +	assert(sz <= 4);
>  
>  	if (offset == PCI_ROM_ADDRESS)
>  		return;
> @@ -498,7 +503,7 @@ static void vfio_pci_cfg_write(struct kvm *kvm, struct pci_device_header *pci_hd
>  	if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSI)
>  		vfio_pci_msi_cap_write(kvm, vdev, offset, data, sz);
>  
> -	if (pread(vdev->fd, base + offset, sz, info->offset + offset) != sz)
> +	if (pread(vdev->fd, &tmp, sz, info->offset + offset) != sz)
>  		vfio_dev_warn(vdev, "Failed to read %d bytes from Configuration Space at 0x%x",
>  			      sz, offset);
>  }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation
  2020-05-14 15:38 ` [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
@ 2020-05-14 16:56   ` André Przywara
  0 siblings, 0 replies; 22+ messages in thread
From: André Przywara @ 2020-05-14 16:56 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:
> Implement callbacks for activating and deactivating emulation for a BAR
> region. This is in preparation for allowing a guest operating system to
> enable and disable access to I/O or memory space, or to reassign the
> BARs.
> 
> The emulated vesa device framebuffer isn't designed to allow stopping and
> restarting at arbitrary points in the guest execution. Furthermore, on x86,
> the kernel will not change the BAR addresses, which on bare metal are
> programmed by the firmware, so take the easy way out and refuse to
> activate/deactivate emulation for the BAR regions. We also take this
> opportunity to make the vesa emulation code more consistent by moving all
> static variable definitions in one place, at the top of the file.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  hw/vesa.c         |  70 ++++++++++++++++++++++-------------
>  include/kvm/pci.h |  18 ++++++++-
>  pci.c             |  39 ++++++++++++++++++++
>  vfio/pci.c        | 107 ++++++++++++++++++++++++++++++++++++++++++++++--------
>  virtio/pci.c      |  92 ++++++++++++++++++++++++++++++++++------------
>  5 files changed, 258 insertions(+), 68 deletions(-)
> 
> diff --git a/hw/vesa.c b/hw/vesa.c
> index a4ec7e16e1e6..8659a0028d31 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -18,6 +18,31 @@
>  #include <inttypes.h>
>  #include <unistd.h>
>  
> +static struct pci_device_header vesa_pci_device = {
> +	.vendor_id	= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> +	.device_id	= cpu_to_le16(PCI_DEVICE_ID_VESA),
> +	.header_type	= PCI_HEADER_TYPE_NORMAL,
> +	.revision_id	= 0,
> +	.class[2]	= 0x03,
> +	.subsys_vendor_id = cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> +	.subsys_id	= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> +	.bar[1]		= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> +	.bar_size[1]	= VESA_MEM_SIZE,
> +};
> +
> +static struct device_header vesa_device = {
> +	.bus_type	= DEVICE_BUS_PCI,
> +	.data		= &vesa_pci_device,
> +};
> +
> +static struct framebuffer vesafb = {
> +	.width		= VESA_WIDTH,
> +	.height		= VESA_HEIGHT,
> +	.depth		= VESA_BPP,
> +	.mem_addr	= VESA_MEM_ADDR,
> +	.mem_size	= VESA_MEM_SIZE,
> +};
> +
>  static bool vesa_pci_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, void *data, int size)
>  {
>  	return true;
> @@ -33,24 +58,19 @@ static struct ioport_operations vesa_io_ops = {
>  	.io_out			= vesa_pci_io_out,
>  };
>  
> -static struct pci_device_header vesa_pci_device = {
> -	.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> -	.device_id		= cpu_to_le16(PCI_DEVICE_ID_VESA),
> -	.header_type		= PCI_HEADER_TYPE_NORMAL,
> -	.revision_id		= 0,
> -	.class[2]		= 0x03,
> -	.subsys_vendor_id	= cpu_to_le16(PCI_SUBSYSTEM_VENDOR_ID_REDHAT_QUMRANET),
> -	.subsys_id		= cpu_to_le16(PCI_SUBSYSTEM_ID_VESA),
> -	.bar[1]			= cpu_to_le32(VESA_MEM_ADDR | PCI_BASE_ADDRESS_SPACE_MEMORY),
> -	.bar_size[1]		= VESA_MEM_SIZE,
> -};
> -
> -static struct device_header vesa_device = {
> -	.bus_type	= DEVICE_BUS_PCI,
> -	.data		= &vesa_pci_device,
> -};
> +static int vesa__bar_activate(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      int bar_num, void *data)
> +{
> +	/* We don't support remapping of the framebuffer. */
> +	return 0;
> +}
>  
> -static struct framebuffer vesafb;
> +static int vesa__bar_deactivate(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +				int bar_num, void *data)
> +{
> +	/* We don't support remapping of the framebuffer. */
> +	return -EINVAL;
> +}
>  
>  struct framebuffer *vesa__init(struct kvm *kvm)
>  {
> @@ -68,6 +88,11 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>  	vesa_pci_device.bar[0]		= cpu_to_le32(vesa_base_addr | PCI_BASE_ADDRESS_SPACE_IO);
>  	vesa_pci_device.bar_size[0]	= PCI_IO_SIZE;
> +	r = pci__register_bar_regions(kvm, &vesa_pci_device, vesa__bar_activate,
> +				      vesa__bar_deactivate, NULL);
> +	if (r < 0)
> +		goto unregister_ioport;
> +
>  	r = device__register(&vesa_device);
>  	if (r < 0)
>  		goto unregister_ioport;
> @@ -82,15 +107,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	if (r < 0)
>  		goto unmap_dev;
>  
> -	vesafb = (struct framebuffer) {
> -		.width			= VESA_WIDTH,
> -		.height			= VESA_HEIGHT,
> -		.depth			= VESA_BPP,
> -		.mem			= mem,
> -		.mem_addr		= VESA_MEM_ADDR,
> -		.mem_size		= VESA_MEM_SIZE,
> -		.kvm			= kvm,
> -	};
> +	vesafb.mem = mem;
> +	vesafb.kvm = kvm;
>  	return fb__register(&vesafb);
>  
>  unmap_dev:
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 0f0815b36157..73e06d76d244 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -89,12 +89,19 @@ struct pci_cap_hdr {
>  	u8	next;
>  };
>  
> +struct pci_device_header;
> +
> +typedef int (*bar_activate_fn_t)(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data);
> +typedef int (*bar_deactivate_fn_t)(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data);
> +
>  #define PCI_BAR_OFFSET(b)	(offsetof(struct pci_device_header, bar[b]))
>  #define PCI_DEV_CFG_SIZE	256
>  #define PCI_DEV_CFG_MASK	(PCI_DEV_CFG_SIZE - 1)
>  
> -struct pci_device_header;
> -
>  struct pci_config_operations {
>  	void (*write)(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  		      u8 offset, void *data, int sz);
> @@ -136,6 +143,9 @@ struct pci_device_header {
>  
>  	/* Private to lkvm */
>  	u32		bar_size[6];
> +	bar_activate_fn_t	bar_activate_fn;
> +	bar_deactivate_fn_t	bar_deactivate_fn;
> +	void *data;
>  	struct pci_config_operations	cfg_ops;
>  	/*
>  	 * PCI INTx# are level-triggered, but virtual device often feature
> @@ -162,6 +172,10 @@ void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data,
>  
>  void *pci_find_cap(struct pci_device_header *hdr, u8 cap_type);
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data);
> +
>  static inline bool __pci__memory_space_enabled(u16 command)
>  {
>  	return command & PCI_COMMAND_MEMORY;
> diff --git a/pci.c b/pci.c
> index eb0bb366a291..b8e71b5f8d6c 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -66,6 +66,11 @@ int pci__assign_irq(struct pci_device_header *pci_hdr)
>  	return pci_hdr->irq_line;
>  }
>  
> +static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return pci__bar_size(pci_hdr, bar_num);
> +}
> +
>  static void *pci_config_address_ptr(u16 port)
>  {
>  	unsigned long offset;
> @@ -273,6 +278,40 @@ struct pci_device_header *pci__find_dev(u8 dev_num)
>  	return hdr->data;
>  }
>  
> +int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      bar_activate_fn_t bar_activate_fn,
> +			      bar_deactivate_fn_t bar_deactivate_fn, void *data)
> +{
> +	int i, r;
> +
> +	assert(bar_activate_fn && bar_deactivate_fn);
> +
> +	pci_hdr->bar_activate_fn = bar_activate_fn;
> +	pci_hdr->bar_deactivate_fn = bar_deactivate_fn;
> +	pci_hdr->data = data;
> +
> +	for (i = 0; i < 6; i++) {
> +		if (!pci_bar_is_implemented(pci_hdr, i))
> +			continue;
> +
> +		if (pci__bar_is_io(pci_hdr, i) &&
> +		    pci__io_space_enabled(pci_hdr)) {
> +			r = bar_activate_fn(kvm, pci_hdr, i, data);
> +			if (r < 0)
> +				return r;
> +		}
> +
> +		if (pci__bar_is_memory(pci_hdr, i) &&
> +		    pci__memory_space_enabled(pci_hdr)) {
> +			r = bar_activate_fn(kvm, pci_hdr, i, data);
> +			if (r < 0)
> +				return r;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  int pci__init(struct kvm *kvm)
>  {
>  	int r;
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 2de893407574..34f19792765e 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -10,6 +10,8 @@
>  #include <sys/resource.h>
>  #include <sys/time.h>
>  
> +#include <assert.h>
> +
>  /* Wrapper around UAPI vfio_irq_set */
>  union vfio_irq_eventfd {
>  	struct vfio_irq_set	irq;
> @@ -456,6 +458,88 @@ out_unlock:
>  	mutex_unlock(&pdev->msi.mutex);
>  }
>  
> +static int vfio_pci_bar_activate(struct kvm *kvm,
> +				 struct pci_device_header *pci_hdr,
> +				 int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region;
> +	bool has_msix;
> +	int ret;
> +
> +	assert((u32)bar_num < vdev->info.num_regions);
> +
> +	region = &vdev->regions[bar_num];
> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
> +
> +	if (has_msix && (u32)bar_num == table->bar) {
> +		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
> +					 table->size, false,
> +					 vfio_pci_msix_table_access, pdev);
> +		/*
> +		 * The MSIX table and the PBA structure can share the same BAR,
> +		 * but for convenience we register different regions for mmio
> +		 * emulation. We want to update both if they share the same
> +		 * BAR.
> +		 */
> +		if (ret < 0 || table->bar != pba->bar)
> +			goto out;
> +	}
> +
> +	if (has_msix && (u32)bar_num == pba->bar) {
> +		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
> +					 pba->size, false,
> +					 vfio_pci_msix_pba_access, pdev);
> +		goto out;
> +	}
> +
> +	ret = vfio_map_region(kvm, vdev, region);
> +out:
> +	return ret;
> +}
> +
> +static int vfio_pci_bar_deactivate(struct kvm *kvm,
> +				   struct pci_device_header *pci_hdr,
> +				   int bar_num, void *data)
> +{
> +	struct vfio_device *vdev = data;
> +	struct vfio_pci_device *pdev = &vdev->pci;
> +	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
> +	struct vfio_pci_msix_table *table = &pdev->msix_table;
> +	struct vfio_region *region;
> +	bool has_msix, success;
> +	int ret;
> +
> +	assert((u32)bar_num < vdev->info.num_regions);
> +
> +	region = &vdev->regions[bar_num];
> +	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
> +
> +	if (has_msix && (u32)bar_num == table->bar) {
> +		success = kvm__deregister_mmio(kvm, table->guest_phys_addr);
> +		/* kvm__deregister_mmio fails when the region is not found. */
> +		ret = (success ? 0 : -ENOENT);
> +		/* See vfio_pci_bar_activate(). */
> +		if (ret < 0 || table->bar != pba->bar)
> +			goto out;
> +	}
> +
> +	if (has_msix && (u32)bar_num == pba->bar) {
> +		success = kvm__deregister_mmio(kvm, pba->guest_phys_addr);
> +		ret = (success ? 0 : -ENOENT);
> +		goto out;
> +	}
> +
> +	vfio_unmap_region(kvm, region);
> +	ret = 0;
> +
> +out:
> +	return ret;
> +}
> +
>  static void vfio_pci_cfg_read(struct kvm *kvm, struct pci_device_header *pci_hdr,
>  			      u8 offset, void *data, int sz)
>  {
> @@ -818,12 +902,6 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  		ret = -ENOMEM;
>  		goto out_free;
>  	}
> -	pba->guest_phys_addr = table->guest_phys_addr + table->size;
> -
> -	ret = kvm__register_mmio(kvm, table->guest_phys_addr, table->size,
> -				 false, vfio_pci_msix_table_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
>  
>  	/*
>  	 * We could map the physical PBA directly into the guest, but it's
> @@ -833,10 +911,7 @@ static int vfio_pci_create_msix_table(struct kvm *kvm, struct vfio_device *vdev)
>  	 * between MSI-X table and PBA. For the sake of isolation, create a
>  	 * virtual PBA.
>  	 */
> -	ret = kvm__register_mmio(kvm, pba->guest_phys_addr, pba->size, false,
> -				 vfio_pci_msix_pba_access, pdev);
> -	if (ret < 0)
> -		goto out_free;
> +	pba->guest_phys_addr = table->guest_phys_addr + table->size;
>  
>  	pdev->msix.entries = entries;
>  	pdev->msix.nr_entries = nr_entries;
> @@ -907,11 +982,6 @@ static int vfio_pci_configure_bar(struct kvm *kvm, struct vfio_device *vdev,
>  		region->guest_phys_addr = pci_get_mmio_block(map_size);
>  	}
>  
> -	/* Map the BARs into the guest or setup a trap region. */
> -	ret = vfio_map_region(kvm, vdev, region);
> -	if (ret)
> -		return ret;
> -
>  	return 0;
>  }
>  
> @@ -958,7 +1028,12 @@ static int vfio_pci_configure_dev_regions(struct kvm *kvm,
>  	}
>  
>  	/* We've configured the BARs, fake up a Configuration Space */
> -	return vfio_pci_fixup_cfg_space(vdev);
> +	ret = vfio_pci_fixup_cfg_space(vdev);
> +	if (ret)
> +		return ret;
> +
> +	return pci__register_bar_regions(kvm, &pdev->hdr, vfio_pci_bar_activate,
> +					 vfio_pci_bar_deactivate, vdev);
>  }
>  
>  /*
> diff --git a/virtio/pci.c b/virtio/pci.c
> index eded8685e1b3..6eea6c686695 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -11,6 +11,7 @@
>  #include <sys/ioctl.h>
>  #include <linux/virtio_pci.h>
>  #include <linux/byteorder.h>
> +#include <assert.h>
>  #include <string.h>
>  
>  static u16 virtio_pci__port_addr(struct virtio_pci *vpci)
> @@ -462,6 +463,66 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu *vcpu,
>  		virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
>  }
>  
> +static int virtio_pci__bar_activate(struct kvm *kvm,
> +				    struct pci_device_header *pci_hdr,
> +				    int bar_num, void *data)
> +{
> +	struct virtio_device *vdev = data;
> +	u32 bar_addr, bar_size;
> +	int r = -EINVAL;
> +
> +	assert(bar_num <= 2);
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__register(kvm, bar_addr, &virtio_pci__io_ops,
> +				     bar_size, vdev);
> +		if (r > 0)
> +			r = 0;
> +		break;
> +	case 1:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__io_mmio_callback, vdev);
> +		break;
> +	case 2:
> +		r =  kvm__register_mmio(kvm, bar_addr, bar_size, false,
> +					virtio_pci__msix_mmio_callback, vdev);
> +		break;
> +	}
> +
> +	return r;
> +}
> +
> +static int virtio_pci__bar_deactivate(struct kvm *kvm,
> +				      struct pci_device_header *pci_hdr,
> +				      int bar_num, void *data)
> +{
> +	u32 bar_addr;
> +	bool success;
> +	int r = -EINVAL;
> +
> +	assert(bar_num <= 2);
> +
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +
> +	switch (bar_num) {
> +	case 0:
> +		r = ioport__unregister(kvm, bar_addr);
> +		break;
> +	case 1:
> +	case 2:
> +		success = kvm__deregister_mmio(kvm, bar_addr);
> +		/* kvm__deregister_mmio fails when the region is not found. */
> +		r = (success ? 0 : -ENOENT);
> +		break;
> +	}
> +
> +	return r;
> +}
> +
>  int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		     int device_id, int subsys_id, int class)
>  {
> @@ -476,23 +537,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  	BUILD_BUG_ON(!is_power_of_two(PCI_IO_SIZE));
>  
>  	port_addr = pci_get_io_port_block(PCI_IO_SIZE);
> -	r = ioport__register(kvm, port_addr, &virtio_pci__io_ops, PCI_IO_SIZE,
> -			     vdev);
> -	if (r < 0)
> -		return r;
> -	port_addr = (u16)r;
> -
>  	mmio_addr = pci_get_mmio_block(PCI_IO_SIZE);
> -	r = kvm__register_mmio(kvm, mmio_addr, PCI_IO_SIZE, false,
> -			       virtio_pci__io_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_ioport;
> -
>  	msix_io_block = pci_get_mmio_block(PCI_IO_SIZE * 2);
> -	r = kvm__register_mmio(kvm, msix_io_block, PCI_IO_SIZE * 2, false,
> -			       virtio_pci__msix_mmio_callback, vdev);
> -	if (r < 0)
> -		goto free_mmio;
>  
>  	vpci->pci_hdr = (struct pci_device_header) {
>  		.vendor_id		= cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET),
> @@ -518,6 +564,12 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  		.bar_size[2]		= cpu_to_le32(PCI_IO_SIZE*2),
>  	};
>  
> +	r = pci__register_bar_regions(kvm, &vpci->pci_hdr,
> +				      virtio_pci__bar_activate,
> +				      virtio_pci__bar_deactivate, vdev);
> +	if (r < 0)
> +		return r;
> +
>  	vpci->dev_hdr = (struct device_header) {
>  		.bus_type		= DEVICE_BUS_PCI,
>  		.data			= &vpci->pci_hdr,
> @@ -550,17 +602,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
>  
>  	r = device__register(&vpci->dev_hdr);
>  	if (r < 0)
> -		goto free_msix_mmio;
> +		return r;
>  
>  	return 0;
> -
> -free_msix_mmio:
> -	kvm__deregister_mmio(kvm, msix_io_block);
> -free_mmio:
> -	kvm__deregister_mmio(kvm, mmio_addr);
> -free_ioport:
> -	ioport__unregister(kvm, port_addr);
> -	return r;
>  }
>  
>  int virtio_pci__reset(struct kvm *kvm, struct virtio_device *vdev)
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers
  2020-05-14 15:38 ` [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers Alexandru Elisei
@ 2020-05-14 16:56   ` André Przywara
  0 siblings, 0 replies; 22+ messages in thread
From: André Przywara @ 2020-05-14 16:56 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:
> A vesa device is used by the SDL, GTK or VNC framebuffers. Don't allow the
> user to specify more than one of these options because kvmtool will create
> identical vesa devices and bad things will happen:
> 
> $ ./lkvm run -c2 -m2048 -k bzImage --sdl --gtk
>   # lkvm run -k bzImage -m 2048 -c 2 --name guest-10159
>   Error: device region [d0000000-d012bfff] would overlap device region [d0000000-d012bfff]
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> ======= Backtrace: =========
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
> (+0x777e5)[0x7fae0ed447e5]
> /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fae0ed447e5]
> /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fae0ed4d37a]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fae0ed5153c]
> *** Error in `./lkvm': free(): invalid pointer: 0x00007fad78002e40 ***
> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
> /lib/x86_64-linux-gnu/libglib-2.0.so.0(g_string_free+0x3b)[0x7fae0f814dab]
> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
> /usr/lib/x86_64-linux-gnu/libgtk-3.so.0(+0x21121c)[0x7fae1023321c]
> ======= Backtrace: =========
> Aborted (core dumped)
> 
> The vesa device is explicitly created during the initialization phase of
> the above framebuffers. Also remove the superfluous check for their
> existence.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  builtin-run.c | 5 +++++
>  hw/vesa.c     | 2 --
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin-run.c b/builtin-run.c
> index 03b119e29995..c23e7a2e5ecd 100644
> --- a/builtin-run.c
> +++ b/builtin-run.c
> @@ -543,6 +543,11 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
>  		kvm->cfg.console = DEFAULT_CONSOLE;
>  
>  	video = kvm->cfg.vnc || kvm->cfg.sdl || kvm->cfg.gtk;
> +	if (video) {
> +		if ((kvm->cfg.vnc && (kvm->cfg.sdl || kvm->cfg.gtk)) ||
> +		    (kvm->cfg.sdl && kvm->cfg.gtk))
> +			die("Only one of --vnc, --sdl or --gtk can be specified");
> +	}
>  
>  	if (!strncmp(kvm->cfg.console, "virtio", 6))
>  		kvm->cfg.active_console  = CONSOLE_VIRTIO;
> diff --git a/hw/vesa.c b/hw/vesa.c
> index dd59a112330b..a4ec7e16e1e6 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -61,8 +61,6 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  	BUILD_BUG_ON(!is_power_of_two(VESA_MEM_SIZE));
>  	BUILD_BUG_ON(VESA_MEM_SIZE < VESA_BPP/8 * VESA_WIDTH * VESA_HEIGHT);
>  
> -	if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
> -		return NULL;
>  	vesa_base_addr = pci_get_io_port_block(PCI_IO_SIZE);
>  	r = ioport__register(kvm, vesa_base_addr, &vesa_io_ops, PCI_IO_SIZE, NULL);
>  	if (r < 0)
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs
  2020-05-14 15:38 ` [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs Alexandru Elisei
@ 2020-05-14 16:56   ` André Przywara
  2020-05-15 13:25     ` Alexandru Elisei
  0 siblings, 1 reply; 22+ messages in thread
From: André Przywara @ 2020-05-14 16:56 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:

Hi,

> BARs are used by the guest to configure the access to the PCI device by
> writing the address to which the device will respond. The basic idea for
> adding support for reassignable BARs is straightforward: deactivate
> emulation for the memory region described by the old BAR value, and
> activate emulation for the new region.
> 
> BAR reassignment can be done while device access is enabled and memory
> regions for different devices can overlap as long as no access is made to
> the overlapping memory regions. This means that it is legal for the BARs of
> two distinct devices to point to an overlapping memory region, and indeed,
> this is how Linux does resource assignment at boot. To account for this
> situation, the simple algorithm described above is enhanced to scan for all
> devices and:
> 
> - Deactivate emulation for any BARs that might overlap with the new BAR
>   value.
> 
> - Enable emulation for any BARs that were overlapping with the old value
>   after the BAR has been updated.
> 
> Activating/deactivating emulation of a memory region has side effects.  In
> order to prevent the execution of the same callback twice we now keep track
> of the state of the region emulation. For example, this can happen if we
> program a BAR with an address that overlaps a second BAR, thus deactivating
> emulation for the second BAR, and then we disable all region accesses to
> the second BAR by writing to the command register.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Many thanks for the changes and the added comments!

Reviewed-by: Andre Przywara <andre.przywara@arm.com>


One minor hint below, but that's not critical.

> ---
>  include/kvm/pci.h |  14 ++-
>  pci.c             | 250 +++++++++++++++++++++++++++++++++++++++++++-----------
>  vfio/pci.c        |  12 +++
>  3 files changed, 227 insertions(+), 49 deletions(-)
> 
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index 73e06d76d244..bf81323d83b7 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -11,6 +11,17 @@
>  #include "kvm/msi.h"
>  #include "kvm/fdt.h"
>  
> +#define pci_dev_err(pci_hdr, fmt, ...) \
> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_warn(pci_hdr, fmt, ...) \
> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_info(pci_hdr, fmt, ...) \
> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +#define pci_dev_die(pci_hdr, fmt, ...) \
> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
> +
>  /*
>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
> @@ -142,7 +153,8 @@ struct pci_device_header {
>  	};
>  
>  	/* Private to lkvm */
> -	u32		bar_size[6];
> +	u32			bar_size[6];
> +	bool			bar_active[6];
>  	bar_activate_fn_t	bar_activate_fn;
>  	bar_deactivate_fn_t	bar_deactivate_fn;
>  	void *data;
> diff --git a/pci.c b/pci.c
> index 96239160110c..2e2c0270a166 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
>  	return pci__bar_size(pci_hdr, bar_num);
>  }
>  
> +static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
> +{
> +	return  pci_hdr->bar_active[bar_num];
> +}
> +
>  static void *pci_config_address_ptr(u16 port)
>  {
>  	unsigned long offset;
> @@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
>  	.io_out	= pci_config_data_out,
>  };
>  
> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			    int bar_num)
> +{
> +	int r = 0;
> +
> +	if (pci_bar_is_active(pci_hdr, bar_num))
> +		goto out;
> +
> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
> +			     bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_active[bar_num] = true;
> +
> +out:
> +	return r;
> +}
> +
> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
> +			      int bar_num)
> +{
> +	int r = 0;
> +
> +	if (!pci_bar_is_active(pci_hdr, bar_num))
> +		goto out;
> +
> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
> +	if (r < 0) {
> +		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
> +			     bar_num);
> +		goto out;
> +	}
> +	pci_hdr->bar_active[bar_num] = false;
> +
> +out:
> +	return r;
> +}
> +
>  static void pci_config_command_wr(struct kvm *kvm,
>  				  struct pci_device_header *pci_hdr,
>  				  u16 new_command)
> @@ -179,26 +224,167 @@ static void pci_config_command_wr(struct kvm *kvm,
>  
>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>  			if (__pci__io_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
>  			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  
>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>  			if (__pci__memory_space_enabled(new_command))
> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
> -							 pci_hdr->data);
> +				pci_activate_bar(kvm, pci_hdr, i);
>  			else
> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
> -							   pci_hdr->data);
> +				pci_deactivate_bar(kvm, pci_hdr, i);
>  		}
>  	}
>  
>  	pci_hdr->command = new_command;
>  }
>  
> +static int pci_toggle_bar_regions(bool activate, struct kvm *kvm, u32 start, u32 size)
> +{
> +	struct device_header *dev_hdr;
> +	struct pci_device_header *tmp_hdr;
> +	u32 tmp_start, tmp_size;
> +	int i, r;
> +
> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
> +	while (dev_hdr) {
> +		tmp_hdr = dev_hdr->data;
> +		for (i = 0; i < 6; i++) {
> +			if (!pci_bar_is_implemented(tmp_hdr, i))
> +				continue;
> +
> +			tmp_start = pci__bar_address(tmp_hdr, i);
> +			tmp_size = pci__bar_size(tmp_hdr, i);
> +			if (tmp_start + tmp_size <= start ||
> +			    tmp_start >= start + size)
> +				continue;
> +
> +			if (activate)
> +				r = pci_activate_bar(kvm, tmp_hdr, i);
> +			else
> +				r = pci_deactivate_bar(kvm, tmp_hdr, i);
> +			if (r < 0)
> +				return r;
> +		}
> +		dev_hdr = device__next_dev(dev_hdr);
> +	}
> +
> +	return 0;
> +}
> +
> +static inline int pci_activate_bar_regions(struct kvm *kvm, u32 start, u32 size)

This inline is not needed. It's only a hint anyway, which the compiler
may or may not observe. The compiler knows best; if it doesn't inline
the function, it has a reason.

There is a case for "static inline" in *header files*, because it
prevents warnings about unused functions.

Cheers,
Andre.

> +{
> +	return pci_toggle_bar_regions(true, kvm, start, size);
> +}
> +
> +static inline int pci_deactivate_bar_regions(struct kvm *kvm, u32 start, u32 size)
> +{
> +	return pci_toggle_bar_regions(false, kvm, start, size);
> +}
> +
> +static void pci_config_bar_wr(struct kvm *kvm,
> +			      struct pci_device_header *pci_hdr, int bar_num,
> +			      u32 value)
> +{
> +	u32 old_addr, new_addr, bar_size;
> +	u32 mask;
> +	int r;
> +
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> +	else
> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> +
> +	/*
> +	 * If the kernel masks the BAR, it will expect to find the size of the
> +	 * BAR there next time it reads from it. After the kernel reads the
> +	 * size, it will write the address back.
> +	 *
> +	 * According to the PCI local bus specification REV 3.0: The number of
> +	 * upper bits that a device actually implements depends on how much of
> +	 * the address space the device will respond to. A device that wants a 1
> +	 * MB memory address space (using a 32-bit base address register) would
> +	 * build the top 12 bits of the address register, hardwiring the other
> +	 * bits to 0.
> +	 *
> +	 * Furthermore, software can determine how much address space the device
> +	 * requires by writing a value of all 1's to the register and then
> +	 * reading the value back. The device will return 0's in all don't-care
> +	 * address bits, effectively specifying the address space required.
> +	 *
> +	 * Software computes the size of the address space with the formula
> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
> +	 * the BAR. This means that the BAR value that kvmtool should return is
> +	 * B = ~(S - 1).
> +	 */
> +	if (value == 0xffffffff) {
> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
> +		/* Preserve the special bits. */
> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
> +
> +	/* Don't toggle emulation when region type access is disabled. */
> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
> +	    !pci__io_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
> +	    !pci__memory_space_enabled(pci_hdr)) {
> +		pci_hdr->bar[bar_num] = value;
> +		return;
> +	}
> +
> +	/*
> +	 * BAR reassignment can be done while device access is enabled and
> +	 * memory regions for different devices can overlap as long as no access
> +	 * is made to the overlapping memory regions. To implement BAR
> +	 * reassignment, we deactivate emulation for the region described by the
> +	 * BAR value that the guest is changing, we disable emulation for the
> +	 * regions that overlap with the new one (by scanning through all PCI
> +	 * devices), we enable emulation for the new BAR value and finally we
> +	 * enable emulation for all device regions that were overlapping with
> +	 * the old value.
> +	 */
> +	old_addr = pci__bar_address(pci_hdr, bar_num);
> +	new_addr = __pci__bar_address(value);
> +	bar_size = pci__bar_size(pci_hdr, bar_num);
> +
> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0)
> +		return;
> +
> +	r = pci_deactivate_bar_regions(kvm, new_addr, bar_size);
> +	if (r < 0) {
> +		/*
> +		 * We cannot update the BAR because of an overlapping region
> +		 * that failed to deactivate emulation, so keep the old BAR
> +		 * value and re-activate emulation for it.
> +		 */
> +		pci_activate_bar(kvm, pci_hdr, bar_num);
> +		return;
> +	}
> +
> +	pci_hdr->bar[bar_num] = value;
> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
> +	if (r < 0) {
> +		/*
> +		 * New region cannot be emulated, re-enable the regions that
> +		 * were overlapping.
> +		 */
> +		pci_activate_bar_regions(kvm, new_addr, bar_size);
> +		return;
> +	}
> +
> +	pci_activate_bar_regions(kvm, old_addr, bar_size);
> +}
> +
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>  {
>  	void *base;
> @@ -206,7 +392,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	struct pci_device_header *pci_hdr;
>  	u8 dev_num = addr.device_number;
>  	u32 value = 0;
> -	u32 mask;
>  
>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>  		return;
> @@ -231,46 +416,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>  	}
>  
>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
> -
> -	/*
> -	 * If the kernel masks the BAR, it will expect to find the size of the
> -	 * BAR there next time it reads from it. After the kernel reads the
> -	 * size, it will write the address back.
> -	 */
>  	if (bar < 6) {
> -		if (pci__bar_is_io(pci_hdr, bar))
> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
> -		else
> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> -		/*
> -		 * According to the PCI local bus specification REV 3.0:
> -		 * The number of upper bits that a device actually implements
> -		 * depends on how much of the address space the device will
> -		 * respond to. A device that wants a 1 MB memory address space
> -		 * (using a 32-bit base address register) would build the top
> -		 * 12 bits of the address register, hardwiring the other bits
> -		 * to 0.
> -		 *
> -		 * Furthermore, software can determine how much address space
> -		 * the device requires by writing a value of all 1's to the
> -		 * register and then reading the value back. The device will
> -		 * return 0's in all don't-care address bits, effectively
> -		 * specifying the address space required.
> -		 *
> -		 * Software computes the size of the address space with the
> -		 * formula S = ~B + 1, where S is the memory size and B is the
> -		 * value read from the BAR. This means that the BAR value that
> -		 * kvmtool should return is B = ~(S - 1).
> -		 */
>  		memcpy(&value, data, size);
> -		if (value == 0xffffffff)
> -			value = ~(pci__bar_size(pci_hdr, bar) - 1);
> -		/* Preserve the special bits. */
> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
> -		memcpy(base + offset, &value, size);
> -	} else {
> -		memcpy(base + offset, data, size);
> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
> +		return;
>  	}
> +
> +	memcpy(base + offset, data, size);
>  }
>  
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
> @@ -336,16 +488,18 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>  		if (!pci_bar_is_implemented(pci_hdr, i))
>  			continue;
>  
> +		assert(!pci_bar_is_active(pci_hdr, i));
> +
>  		if (pci__bar_is_io(pci_hdr, i) &&
>  		    pci__io_space_enabled(pci_hdr)) {
> -			r = bar_activate_fn(kvm, pci_hdr, i, data);
> +			r = pci_activate_bar(kvm, pci_hdr, i);
>  			if (r < 0)
>  				return r;
>  		}
>  
>  		if (pci__bar_is_memory(pci_hdr, i) &&
>  		    pci__memory_space_enabled(pci_hdr)) {
> -			r = bar_activate_fn(kvm, pci_hdr, i, data);
> +			r = pci_activate_bar(kvm, pci_hdr, i);
>  			if (r < 0)
>  				return r;
>  		}
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 34f19792765e..49ecd12a38cd 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -467,6 +467,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>  	struct vfio_region *region;
> +	u32 bar_addr;
>  	bool has_msix;
>  	int ret;
>  
> @@ -475,7 +476,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	region = &vdev->regions[bar_num];
>  	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>  
> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
> +	if (pci__bar_is_io(pci_hdr, bar_num))
> +		region->port_base = bar_addr;
> +	else
> +		region->guest_phys_addr = bar_addr;
> +
>  	if (has_msix && (u32)bar_num == table->bar) {
> +		table->guest_phys_addr = region->guest_phys_addr;
>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>  					 table->size, false,
>  					 vfio_pci_msix_table_access, pdev);
> @@ -490,6 +498,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>  	}
>  
>  	if (has_msix && (u32)bar_num == pba->bar) {
> +		if (pba->bar == table->bar)
> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
> +		else
> +			pba->guest_phys_addr = region->guest_phys_addr;
>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>  					 pba->size, false,
>  					 vfio_pci_msix_pba_access, pdev);
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking
  2020-05-14 15:38 ` [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
@ 2020-05-15 10:13   ` André Przywara
  2020-05-15 13:18     ` Alexandru Elisei
  0 siblings, 1 reply; 22+ messages in thread
From: André Przywara @ 2020-05-15 10:13 UTC (permalink / raw)
  To: Alexandru Elisei, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:

Hi,

> kvmtool uses brlock for protecting accesses to the ioport and mmio
> red-black trees. brlock allows concurrent reads, but only one writer, which
> is assumed not to be a VCPU thread (for more information see commit
> 0b907ed2eaec ("kvm tools: Add a brlock")). This is done by issuing a
> compiler barrier on read and pausing the entire virtual machine on writes.
> When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write
> lock.
> 
> When we implement reassignable BARs, the mmio or ioport mapping will
> be done as a result of a VCPU mmio access. When brlock is a pthread
> read/write lock, it means that we will try to acquire a write lock with the
> read lock already held by the same VCPU and we will deadlock. When it's
> not, a VCPU will have to call kvm__pause, which means the virtual machine
> will stay paused forever.
> 
> Let's avoid all this by using a mutex and reference counting the red-black
> tree entries. This way we can guarantee that we won't unregister a node
> that another thread is currently using for emulation.

It's a pity that we can't use the traditional refcount pattern
(initialise to 1) here, but I trust your analysis and testing that this
is needed.
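
For reference, the traditional pattern would look roughly like the sketch
below (illustrative only, made-up names, locking omitted): the object is
born with one reference owned by the registration, and unregistering it
simply drops that reference.

#include <stdlib.h>

/* Minimal sketch of the "init to 1" pattern; not the code in this patch. */
struct obj {
	unsigned int refcount;		/* starts at 1 when registered */
};

static struct obj *obj_register(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcount = 1;	/* reference owned by the registration */
	return o;
}

static void obj_get(struct obj *o)
{
	o->refcount++;			/* e.g. taken for one emulation access */
}

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		free(o);		/* the last reference frees the object */
}

static void obj_unregister(struct obj *o)
{
	obj_put(o);			/* unregister just drops its reference */
}

The scheme in the patch below instead starts the count at 0 and defers the
free to the last put once removal has been requested, which is what the
refcount/remove pair implements.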

As far as my brain can process, this looks alright now:

> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Just a note: the usage of return and goto seems now somewhat
inconsistent here: there are cases where we do "unlock;return;" in the
middle of the function (kvm__deregister_mmio), and other cases where we
use a "goto out", even though there is just a return at that label
(kvm__emulate_mmio). In ioport__unregister() you seem to switch the scheme.
But this is just cosmetic, and we can fix this up later, should we
care, so this doesn't hold back this patch.

Cheers,
Andre

> ---
>  include/kvm/ioport.h          |   2 +
>  include/kvm/rbtree-interval.h |   4 +-
>  ioport.c                      |  85 ++++++++++++++++++++++-----------
>  mmio.c                        | 107 ++++++++++++++++++++++++++++++++----------
>  4 files changed, 143 insertions(+), 55 deletions(-)
> 
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index 62a719327e3f..039633f76bdd 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -22,6 +22,8 @@ struct ioport {
>  	struct ioport_operations	*ops;
>  	void				*priv;
>  	struct device_header		dev_hdr;
> +	u32				refcount;
> +	bool				remove;
>  };
>  
>  struct ioport_operations {
> diff --git a/include/kvm/rbtree-interval.h b/include/kvm/rbtree-interval.h
> index 730eb5e8551d..17cd3b5f3199 100644
> --- a/include/kvm/rbtree-interval.h
> +++ b/include/kvm/rbtree-interval.h
> @@ -6,7 +6,9 @@
>  
>  #define RB_INT_INIT(l, h) \
>  	(struct rb_int_node){.low = l, .high = h}
> -#define rb_int(n) rb_entry(n, struct rb_int_node, node)
> +#define rb_int(n)	rb_entry(n, struct rb_int_node, node)
> +#define rb_int_start(n)	((n)->low)
> +#define rb_int_end(n)	((n)->low + (n)->high - 1)
>  
>  struct rb_int_node {
>  	struct rb_node	node;
> diff --git a/ioport.c b/ioport.c
> index d9f2e8ea3c3b..844a832d25a4 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -2,7 +2,6 @@
>  
>  #include "kvm/kvm.h"
>  #include "kvm/util.h"
> -#include "kvm/brlock.h"
>  #include "kvm/rbtree-interval.h"
>  #include "kvm/mutex.h"
>  
> @@ -16,6 +15,8 @@
>  
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>  
> +static DEFINE_MUTEX(ioport_lock);
> +
>  static struct rb_root		ioport_tree = RB_ROOT;
>  
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> @@ -39,6 +40,36 @@ static void ioport_remove(struct rb_root *root, struct ioport *data)
>  	rb_int_erase(root, &data->node);
>  }
>  
> +static struct ioport *ioport_get(struct rb_root *root, u64 addr)
> +{
> +	struct ioport *ioport;
> +
> +	mutex_lock(&ioport_lock);
> +	ioport = ioport_search(root, addr);
> +	if (ioport)
> +		ioport->refcount++;
> +	mutex_unlock(&ioport_lock);
> +
> +	return ioport;
> +}
> +
> +/* Called with ioport_lock held. */
> +static void ioport_unregister(struct rb_root *root, struct ioport *data)
> +{
> +	device__unregister(&data->dev_hdr);
> +	ioport_remove(root, data);
> +	free(data);
> +}
> +
> +static void ioport_put(struct rb_root *root, struct ioport *data)
> +{
> +	mutex_lock(&ioport_lock);
> +	data->refcount--;
> +	if (data->remove && data->refcount == 0)
> +		ioport_unregister(root, data);
> +	mutex_unlock(&ioport_lock);
> +}
> +
>  #ifdef CONFIG_HAS_LIBFDT
>  static void generate_ioport_fdt_node(void *fdt,
>  				     struct device_header *dev_hdr,
> @@ -80,16 +111,22 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>  			.bus_type	= DEVICE_BUS_IOPORT,
>  			.data		= generate_ioport_fdt_node,
>  		},
> +		/*
> +		 * Start from 0 because ioport__unregister() doesn't decrement
> +		 * the reference count.
> +		 */
> +		.refcount	= 0,
> +		.remove		= false,
>  	};
>  
> -	br_write_lock(kvm);
> +	mutex_lock(&ioport_lock);
>  	r = ioport_insert(&ioport_tree, entry);
>  	if (r < 0)
>  		goto out_free;
>  	r = device__register(&entry->dev_hdr);
>  	if (r < 0)
>  		goto out_remove;
> -	br_write_unlock(kvm);
> +	mutex_unlock(&ioport_lock);
>  
>  	return port;
>  
> @@ -97,33 +134,28 @@ out_remove:
>  	ioport_remove(&ioport_tree, entry);
>  out_free:
>  	free(entry);
> -	br_write_unlock(kvm);
> +	mutex_unlock(&ioport_lock);
>  	return r;
>  }
>  
>  int ioport__unregister(struct kvm *kvm, u16 port)
>  {
>  	struct ioport *entry;
> -	int r;
> -
> -	br_write_lock(kvm);
>  
> -	r = -ENOENT;
> +	mutex_lock(&ioport_lock);
>  	entry = ioport_search(&ioport_tree, port);
> -	if (!entry)
> -		goto done;
> -
> -	device__unregister(&entry->dev_hdr);
> -	ioport_remove(&ioport_tree, entry);
> -
> -	free(entry);
> -
> -	r = 0;
> -
> -done:
> -	br_write_unlock(kvm);
> +	if (!entry) {
> +		mutex_unlock(&ioport_lock);
> +		return -ENOENT;
> +	}
> +	/* The same reasoning from kvm__deregister_mmio() applies. */
> +	if (entry->refcount == 0)
> +		ioport_unregister(&ioport_tree, entry);
> +	else
> +		entry->remove = true;
> +	mutex_unlock(&ioport_lock);
>  
> -	return r;
> +	return 0;
>  }
>  
>  static void ioport__unregister_all(void)
> @@ -136,9 +168,7 @@ static void ioport__unregister_all(void)
>  	while (rb) {
>  		rb_node = rb_int(rb);
>  		entry = ioport_node(rb_node);
> -		device__unregister(&entry->dev_hdr);
> -		ioport_remove(&ioport_tree, entry);
> -		free(entry);
> +		ioport_unregister(&ioport_tree, entry);
>  		rb = rb_first(&ioport_tree);
>  	}
>  }
> @@ -164,8 +194,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  	void *ptr = data;
>  	struct kvm *kvm = vcpu->kvm;
>  
> -	br_read_lock(kvm);
> -	entry = ioport_search(&ioport_tree, port);
> +	entry = ioport_get(&ioport_tree, port);
>  	if (!entry)
>  		goto out;
>  
> @@ -180,9 +209,9 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>  		ptr += size;
>  	}
>  
> -out:
> -	br_read_unlock(kvm);
> +	ioport_put(&ioport_tree, entry);
>  
> +out:
>  	if (ret)
>  		return true;
>  
> diff --git a/mmio.c b/mmio.c
> index 61e1d47a587d..cd141cd30750 100644
> --- a/mmio.c
> +++ b/mmio.c
> @@ -1,7 +1,7 @@
>  #include "kvm/kvm.h"
>  #include "kvm/kvm-cpu.h"
>  #include "kvm/rbtree-interval.h"
> -#include "kvm/brlock.h"
> +#include "kvm/mutex.h"
>  
>  #include <stdio.h>
>  #include <stdlib.h>
> @@ -15,10 +15,14 @@
>  
>  #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>  
> +static DEFINE_MUTEX(mmio_lock);
> +
>  struct mmio_mapping {
>  	struct rb_int_node	node;
>  	void			(*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
>  	void			*ptr;
> +	u32			refcount;
> +	bool			remove;
>  };
>  
>  static struct rb_root mmio_tree = RB_ROOT;
> @@ -51,6 +55,11 @@ static int mmio_insert(struct rb_root *root, struct mmio_mapping *data)
>  	return rb_int_insert(root, &data->node);
>  }
>  
> +static void mmio_remove(struct rb_root *root, struct mmio_mapping *data)
> +{
> +	rb_int_erase(root, &data->node);
> +}
> +
>  static const char *to_direction(u8 is_write)
>  {
>  	if (is_write)
> @@ -59,6 +68,41 @@ static const char *to_direction(u8 is_write)
>  	return "read";
>  }
>  
> +static struct mmio_mapping *mmio_get(struct rb_root *root, u64 phys_addr, u32 len)
> +{
> +	struct mmio_mapping *mmio;
> +
> +	mutex_lock(&mmio_lock);
> +	mmio = mmio_search(root, phys_addr, len);
> +	if (mmio)
> +		mmio->refcount++;
> +	mutex_unlock(&mmio_lock);
> +
> +	return mmio;
> +}
> +
> +/* Called with mmio_lock held. */
> +static void mmio_deregister(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
> +{
> +	struct kvm_coalesced_mmio_zone zone = (struct kvm_coalesced_mmio_zone) {
> +		.addr	= rb_int_start(&mmio->node),
> +		.size	= 1,
> +	};
> +	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> +
> +	mmio_remove(root, mmio);
> +	free(mmio);
> +}
> +
> +static void mmio_put(struct kvm *kvm, struct rb_root *root, struct mmio_mapping *mmio)
> +{
> +	mutex_lock(&mmio_lock);
> +	mmio->refcount--;
> +	if (mmio->remove && mmio->refcount == 0)
> +		mmio_deregister(kvm, root, mmio);
> +	mutex_unlock(&mmio_lock);
> +}
> +
>  int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>  		       void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
>  			void *ptr)
> @@ -72,9 +116,15 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  		return -ENOMEM;
>  
>  	*mmio = (struct mmio_mapping) {
> -		.node = RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
> -		.mmio_fn = mmio_fn,
> -		.ptr	= ptr,
> +		.node		= RB_INT_INIT(phys_addr, phys_addr + phys_addr_len),
> +		.mmio_fn	= mmio_fn,
> +		.ptr		= ptr,
> +		/*
> +		 * Start from 0 because kvm__deregister_mmio() doesn't decrement
> +		 * the reference count.
> +		 */
> +		.refcount	= 0,
> +		.remove		= false,
>  	};
>  
>  	if (coalesce) {
> @@ -88,9 +138,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  			return -errno;
>  		}
>  	}
> -	br_write_lock(kvm);
> +	mutex_lock(&mmio_lock);
>  	ret = mmio_insert(&mmio_tree, mmio);
> -	br_write_unlock(kvm);
> +	mutex_unlock(&mmio_lock);
>  
>  	return ret;
>  }
> @@ -98,25 +148,30 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>  bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>  {
>  	struct mmio_mapping *mmio;
> -	struct kvm_coalesced_mmio_zone zone;
>  
> -	br_write_lock(kvm);
> +	mutex_lock(&mmio_lock);
>  	mmio = mmio_search_single(&mmio_tree, phys_addr);
>  	if (mmio == NULL) {
> -		br_write_unlock(kvm);
> +		mutex_unlock(&mmio_lock);
>  		return false;
>  	}
> +	/*
> +	 * The PCI emulation code calls this function when memory access is
> +	 * disabled for a device, or when a BAR has a new address assigned. PCI
> +	 * emulation doesn't use any locks and as a result we can end up in a
> +	 * situation where we have called mmio_get() to do emulation on one VCPU
> +	 * thread (let's call it VCPU0), and several other VCPU threads have
> +	 * called kvm__deregister_mmio(). In this case, if we decrement refcount
> +	 * in kvm__deregister_mmio() (either directly, or by calling mmio_put()),
> +	 * refcount will reach 0 and we will free the mmio node before VCPU0 has
> +	 * called mmio_put(). This will trigger use-after-free errors on VCPU0.
> +	 */
> +	if (mmio->refcount == 0)
> +		mmio_deregister(kvm, &mmio_tree, mmio);
> +	else
> +		mmio->remove = true;
> +	mutex_unlock(&mmio_lock);
>  
> -	zone = (struct kvm_coalesced_mmio_zone) {
> -		.addr	= phys_addr,
> -		.size	= 1,
> -	};
> -	ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
> -
> -	rb_int_erase(&mmio_tree, &mmio->node);
> -	br_write_unlock(kvm);
> -
> -	free(mmio);
>  	return true;
>  }
>  
> @@ -124,18 +179,18 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>  {
>  	struct mmio_mapping *mmio;
>  
> -	br_read_lock(vcpu->kvm);
> -	mmio = mmio_search(&mmio_tree, phys_addr, len);
> -
> -	if (mmio)
> -		mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> -	else {
> +	mmio = mmio_get(&mmio_tree, phys_addr, len);
> +	if (!mmio) {
>  		if (vcpu->kvm->cfg.mmio_debug)
>  			fprintf(stderr,	"Warning: Ignoring MMIO %s at %016llx (length %u)\n",
>  				to_direction(is_write),
>  				(unsigned long long)phys_addr, len);
> +		goto out;
>  	}
> -	br_read_unlock(vcpu->kvm);
>  
> +	mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
> +	mmio_put(vcpu->kvm, &mmio_tree, mmio);
> +
> +out:
>  	return true;
>  }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking
  2020-05-15 10:13   ` André Przywara
@ 2020-05-15 13:18     ` Alexandru Elisei
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-15 13:18 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 5/15/20 11:13 AM, André Przywara wrote:
> On 14/05/2020 16:38, Alexandru Elisei wrote:
>
> Hi,
>
>> kvmtool uses brlock for protecting accesses to the ioport and mmio
>> red-black trees. brlock allows concurrent reads, but only one writer, which
>> is assumed not to be a VCPU thread (for more information see commit
>> 0b907ed2eaec ("kvm tools: Add a brlock")). This is done by issuing a
>> compiler barrier on read and pausing the entire virtual machine on writes.
>> When KVM_BRLOCK_DEBUG is defined, brlock uses instead a pthread read/write
>> lock.
>>
>> When we implement reassignable BARs, the mmio or ioport mapping will
>> be done as a result of a VCPU mmio access. When brlock is a pthread
>> read/write lock, it means that we will try to acquire a write lock with the
>> read lock already held by the same VCPU and we will deadlock. When it's
>> not, a VCPU will have to call kvm__pause, which means the virtual machine
>> will stay paused forever.
>>
>> Let's avoid all this by using a mutex and reference counting the red-black
>> tree entries. This way we can guarantee that we won't unregister a node
>> that another thread is currently using for emulation.
> It's a pity that we can't use the traditional refcount pattern
> (initialise to 1) here, but I trust your analysis and testing that this
> is needed.

If it helps, the pattern is somewhat similar to kobject, which has a kobject_del
function (kref doesn't). kobject_del decrements the reference count only if the
parent is not NULL, then sets the parent to NULL. The patch does what kobject_del
would have done if kobject didn't have a kref embedded in it.
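
Condensed to its essentials, the scheme in the patch is roughly the sketch
below (illustrative only; locking and the rbtree plumbing are stripped out,
the real code is the ioport_get/put and mmio_get/put pairs in the patch):

#include <stdbool.h>
#include <stdlib.h>

/* Sketch of the refcount/remove pair; not the actual patch code. */
struct node {
	unsigned int refcount;	/* starts at 0 on registration */
	bool remove;		/* set when unregister races with a user */
};

static void node_get(struct node *n)
{
	n->refcount++;		/* held for the duration of one emulation */
}

static void node_put(struct node *n)
{
	n->refcount--;
	if (n->remove && n->refcount == 0)
		free(n);	/* deferred removal happens on the last put */
}

static void node_unregister(struct node *n)
{
	if (n->refcount == 0)
		free(n);	/* nobody is using the node, remove it now */
	else
		n->remove = true;	/* the last node_put() will free it */
}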

>
> As far as my brain can process, this looks alright now:
>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
>
> Just a note: the usage of return and goto seems now somewhat
> inconsistent here: there are cases where we do "unlock;return;" in the
> middle of the function (kvm__deregister_mmio), and other cases where we
> use a "goto out", even though there is just a return at that label
> (kvm__emulate_mmio). In ioport__unregister() you seem to switch the scheme.
> But this is just cosmetic, and we can fix this up later, should we
> care, so this doesn't hold back this patch.

There are a number of inconsistencies between the two files: mmio doesn't have
its own header file while ioport does, kvm__deregister_mmio should say
"unregister" (like ioport), it returns bool instead of an int (like ioport),
when unregistering an mmio region we call KVM_UNREGISTER_COALESCED_MMIO
unconditionally, ioport unregisters all ioports on exit while mmio doesn't,
ioport creates a new device for each call to ioport__register, and so on. Not
to mention the fact that ioport is used for architectures that don't have the
notion of I/O ports. I think these issues deserve their own cleanup series.

Thanks,
Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs
  2020-05-14 16:56   ` André Przywara
@ 2020-05-15 13:25     ` Alexandru Elisei
  0 siblings, 0 replies; 22+ messages in thread
From: Alexandru Elisei @ 2020-05-15 13:25 UTC (permalink / raw)
  To: André Przywara, kvm
  Cc: will, julien.thierry.kdev, sami.mujawar, lorenzo.pieralisi, maz

Hi,

On 5/14/20 5:56 PM, André Przywara wrote:
> On 14/05/2020 16:38, Alexandru Elisei wrote:
>
> Hi,
>
>> BARs are used by the guest to configure the access to the PCI device by
>> writing the address to which the device will respond. The basic idea for
>> adding support for reassignable BARs is straightforward: deactivate
>> emulation for the memory region described by the old BAR value, and
>> activate emulation for the new region.
>>
>> BAR reassignment can be done while device access is enabled and memory
>> regions for different devices can overlap as long as no access is made to
>> the overlapping memory regions. This means that it is legal for the BARs of
>> two distinct devices to point to an overlapping memory region, and indeed,
>> this is how Linux does resource assignment at boot. To account for this
>> situation, the simple algorithm described above is enhanced to scan for all
>> devices and:
>>
>> - Deactivate emulation for any BARs that might overlap with the new BAR
>>   value.
>>
>> - Enable emulation for any BARs that were overlapping with the old value
>>   after the BAR has been updated.
>>
>> Activating/deactivating emulation of a memory region has side effects.  In
>> order to prevent the execution of the same callback twice we now keep track
>> of the state of the region emulation. For example, this can happen if we
>> program a BAR with an address that overlaps a second BAR, thus deactivating
>> emulation for the second BAR, and then we disable all region accesses to
>> the second BAR by writing to the command register.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
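
To make the ordering above concrete, here is a self-contained toy model of
the sequence (purely illustrative, not the patch's code; the real
implementation walks the PCI device list and toggles emulation through the
per-device activate/deactivate callbacks):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Two fake BARs; "active" stands in for "emulation enabled". */
struct toy_bar {
	uint32_t start;
	uint32_t size;
	bool	 active;
};

/* [a, a+as) and [b, b+bs) overlap unless one ends before the other starts. */
static bool overlaps(uint32_t a, uint32_t as, uint32_t b, uint32_t bs)
{
	return !(a + as <= b || a >= b + bs);
}

static void toggle_overlapping(struct toy_bar *bars, int n,
			       uint32_t start, uint32_t size, bool activate)
{
	int i;

	for (i = 0; i < n; i++)
		if (overlaps(bars[i].start, bars[i].size, start, size))
			bars[i].active = activate;
}

int main(void)
{
	struct toy_bar bars[] = {
		{ 0x1000, 0x1000, true },	/* the BAR being reprogrammed */
		{ 0x3000, 0x1000, true },	/* another device's BAR */
	};
	uint32_t old_addr = bars[0].start, new_addr = 0x3000, size = bars[0].size;

	bars[0].active = false;				    /* deactivate the old region */
	toggle_overlapping(bars, 2, new_addr, size, false); /* deactivate overlaps with the new one */
	bars[0].start = new_addr;			    /* commit the new address */
	bars[0].active = true;				    /* activate the new region */
	toggle_overlapping(bars, 2, old_addr, size, true);  /* re-activate overlaps with the old one */

	printf("bar0 @%#x active=%d, bar1 @%#x active=%d\n",
	       (unsigned)bars[0].start, bars[0].active,
	       (unsigned)bars[1].start, bars[1].active);
	return 0;
}

After running this, bar1 ends up inactive even though the two toy BARs now
overlap, which is exactly the intermediate state the guest is allowed to
create as long as it doesn't access the overlapping region.
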
> Many thanks for the changes and the added comments!

Thank you for the review! :D

>
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>
>
>
> One minor hint below, but that's not critical.
>
>> ---
>>  include/kvm/pci.h |  14 ++-
>>  pci.c             | 250 +++++++++++++++++++++++++++++++++++++++++++-----------
>>  vfio/pci.c        |  12 +++
>>  3 files changed, 227 insertions(+), 49 deletions(-)
>>
>> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
>> index 73e06d76d244..bf81323d83b7 100644
>> --- a/include/kvm/pci.h
>> +++ b/include/kvm/pci.h
>> @@ -11,6 +11,17 @@
>>  #include "kvm/msi.h"
>>  #include "kvm/fdt.h"
>>  
>> +#define pci_dev_err(pci_hdr, fmt, ...) \
>> +	pr_err("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_warn(pci_hdr, fmt, ...) \
>> +	pr_warning("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_info(pci_hdr, fmt, ...) \
>> +	pr_info("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_dbg(pci_hdr, fmt, ...) \
>> +	pr_debug("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +#define pci_dev_die(pci_hdr, fmt, ...) \
>> +	die("[%04x:%04x] " fmt, pci_hdr->vendor_id, pci_hdr->device_id, ##__VA_ARGS__)
>> +
>>  /*
>>   * PCI Configuration Mechanism #1 I/O ports. See Section 3.7.4.1.
>>   * ("Configuration Mechanism #1") of the PCI Local Bus Specification 2.1 for
>> @@ -142,7 +153,8 @@ struct pci_device_header {
>>  	};
>>  
>>  	/* Private to lkvm */
>> -	u32		bar_size[6];
>> +	u32			bar_size[6];
>> +	bool			bar_active[6];
>>  	bar_activate_fn_t	bar_activate_fn;
>>  	bar_deactivate_fn_t	bar_deactivate_fn;
>>  	void *data;
>> diff --git a/pci.c b/pci.c
>> index 96239160110c..2e2c0270a166 100644
>> --- a/pci.c
>> +++ b/pci.c
>> @@ -71,6 +71,11 @@ static bool pci_bar_is_implemented(struct pci_device_header *pci_hdr, int bar_nu
>>  	return pci__bar_size(pci_hdr, bar_num);
>>  }
>>  
>> +static bool pci_bar_is_active(struct pci_device_header *pci_hdr, int bar_num)
>> +{
>> +	return  pci_hdr->bar_active[bar_num];
>> +}
>> +
>>  static void *pci_config_address_ptr(u16 port)
>>  {
>>  	unsigned long offset;
>> @@ -163,6 +168,46 @@ static struct ioport_operations pci_config_data_ops = {
>>  	.io_out	= pci_config_data_out,
>>  };
>>  
>> +static int pci_activate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			    int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (pci_bar_is_active(pci_hdr, bar_num))
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_activate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_warn(pci_hdr, "Error activating emulation for BAR %d",
>> +			     bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_active[bar_num] = true;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>> +static int pci_deactivate_bar(struct kvm *kvm, struct pci_device_header *pci_hdr,
>> +			      int bar_num)
>> +{
>> +	int r = 0;
>> +
>> +	if (!pci_bar_is_active(pci_hdr, bar_num))
>> +		goto out;
>> +
>> +	r = pci_hdr->bar_deactivate_fn(kvm, pci_hdr, bar_num, pci_hdr->data);
>> +	if (r < 0) {
>> +		pci_dev_warn(pci_hdr, "Error deactivating emulation for BAR %d",
>> +			     bar_num);
>> +		goto out;
>> +	}
>> +	pci_hdr->bar_active[bar_num] = false;
>> +
>> +out:
>> +	return r;
>> +}
>> +
>>  static void pci_config_command_wr(struct kvm *kvm,
>>  				  struct pci_device_header *pci_hdr,
>>  				  u16 new_command)
>> @@ -179,26 +224,167 @@ static void pci_config_command_wr(struct kvm *kvm,
>>  
>>  		if (toggle_io && pci__bar_is_io(pci_hdr, i)) {
>>  			if (__pci__io_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>>  			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  
>>  		if (toggle_mem && pci__bar_is_memory(pci_hdr, i)) {
>>  			if (__pci__memory_space_enabled(new_command))
>> -				pci_hdr->bar_activate_fn(kvm, pci_hdr, i,
>> -							 pci_hdr->data);
>> +				pci_activate_bar(kvm, pci_hdr, i);
>>  			else
>> -				pci_hdr->bar_deactivate_fn(kvm, pci_hdr, i,
>> -							   pci_hdr->data);
>> +				pci_deactivate_bar(kvm, pci_hdr, i);
>>  		}
>>  	}
>>  
>>  	pci_hdr->command = new_command;
>>  }
>>  
>> +static int pci_toggle_bar_regions(bool activate, struct kvm *kvm, u32 start, u32 size)
>> +{
>> +	struct device_header *dev_hdr;
>> +	struct pci_device_header *tmp_hdr;
>> +	u32 tmp_start, tmp_size;
>> +	int i, r;
>> +
>> +	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
>> +	while (dev_hdr) {
>> +		tmp_hdr = dev_hdr->data;
>> +		for (i = 0; i < 6; i++) {
>> +			if (!pci_bar_is_implemented(tmp_hdr, i))
>> +				continue;
>> +
>> +			tmp_start = pci__bar_address(tmp_hdr, i);
>> +			tmp_size = pci__bar_size(tmp_hdr, i);
>> +			if (tmp_start + tmp_size <= start ||
>> +			    tmp_start >= start + size)
>> +				continue;
>> +
>> +			if (activate)
>> +				r = pci_activate_bar(kvm, tmp_hdr, i);
>> +			else
>> +				r = pci_deactivate_bar(kvm, tmp_hdr, i);
>> +			if (r < 0)
>> +				return r;
>> +		}
>> +		dev_hdr = device__next_dev(dev_hdr);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static inline int pci_activate_bar_regions(struct kvm *kvm, u32 start, u32 size)
> This inline is not needed. It's only a hint anyway, which the compiler
> may or may not observe. The compiler knows best; if it doesn't inline
> the function, it has a reason.
>
> There is a case for "static inline" in *header files*, because it
> prevents warnings about unused functions.

I wanted a higher chance of the compiler inlining the function, to avoid the
argument shuffling that would have to be done in this function ({x0..x2} ->
{x1..x3}). That's probably a premature and pointless optimization, because the
function is not in a fast path and "inline" doesn't guarantee inlining anyway.
I'm fine with dropping the attribute.

Thanks,
Alex
>
> Cheers,
> Andre.
>
>> +{
>> +	return pci_toggle_bar_regions(true, kvm, start, size);
>> +}
>> +
>> +static inline int pci_deactivate_bar_regions(struct kvm *kvm, u32 start, u32 size)
>> +{
>> +	return pci_toggle_bar_regions(false, kvm, start, size);
>> +}
>> +
>> +static void pci_config_bar_wr(struct kvm *kvm,
>> +			      struct pci_device_header *pci_hdr, int bar_num,
>> +			      u32 value)
>> +{
>> +	u32 old_addr, new_addr, bar_size;
>> +	u32 mask;
>> +	int r;
>> +
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> +	else
>> +		mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> +
>> +	/*
>> +	 * If the kernel masks the BAR, it will expect to find the size of the
>> +	 * BAR there next time it reads from it. After the kernel reads the
>> +	 * size, it will write the address back.
>> +	 *
>> +	 * According to the PCI local bus specification REV 3.0: The number of
>> +	 * upper bits that a device actually implements depends on how much of
>> +	 * the address space the device will respond to. A device that wants a 1
>> +	 * MB memory address space (using a 32-bit base address register) would
>> +	 * build the top 12 bits of the address register, hardwiring the other
>> +	 * bits to 0.
>> +	 *
>> +	 * Furthermore, software can determine how much address space the device
>> +	 * requires by writing a value of all 1's to the register and then
>> +	 * reading the value back. The device will return 0's in all don't-care
>> +	 * address bits, effectively specifying the address space required.
>> +	 *
>> +	 * Software computes the size of the address space with the formula
>> +	 * S =  ~B + 1, where S is the memory size and B is the value read from
>> +	 * the BAR. This means that the BAR value that kvmtool should return is
>> +	 * B = ~(S - 1).
>> +	 */
>> +	if (value == 0xffffffff) {
>> +		value = ~(pci__bar_size(pci_hdr, bar_num) - 1);
>> +		/* Preserve the special bits. */
>> +		value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	value = (value & mask) | (pci_hdr->bar[bar_num] & ~mask);
>> +
>> +	/* Don't toggle emulation when region type access is disabled. */
>> +	if (pci__bar_is_io(pci_hdr, bar_num) &&
>> +	    !pci__io_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	if (pci__bar_is_memory(pci_hdr, bar_num) &&
>> +	    !pci__memory_space_enabled(pci_hdr)) {
>> +		pci_hdr->bar[bar_num] = value;
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * BAR reassignment can be done while device access is enabled and
>> +	 * memory regions for different devices can overlap as long as no access
>> +	 * is made to the overlapping memory regions. To implement BAR
>> +	 * reassignment, we deactivate emulation for the region described by the
>> +	 * BAR value that the guest is changing, we disable emulation for the
>> +	 * regions that overlap with the new one (by scanning through all PCI
>> +	 * devices), we enable emulation for the new BAR value and finally we
>> +	 * enable emulation for all device regions that were overlapping with
>> +	 * the old value.
>> +	 */
>> +	old_addr = pci__bar_address(pci_hdr, bar_num);
>> +	new_addr = __pci__bar_address(value);
>> +	bar_size = pci__bar_size(pci_hdr, bar_num);
>> +
>> +	r = pci_deactivate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0)
>> +		return;
>> +
>> +	r = pci_deactivate_bar_regions(kvm, new_addr, bar_size);
>> +	if (r < 0) {
>> +		/*
>> +		 * We cannot update the BAR because of an overlapping region
>> +		 * that failed to deactivate emulation, so keep the old BAR
>> +		 * value and re-activate emulation for it.
>> +		 */
>> +		pci_activate_bar(kvm, pci_hdr, bar_num);
>> +		return;
>> +	}
>> +
>> +	pci_hdr->bar[bar_num] = value;
>> +	r = pci_activate_bar(kvm, pci_hdr, bar_num);
>> +	if (r < 0) {
>> +		/*
>> +		 * New region cannot be emulated, re-enable the regions that
>> +		 * were overlapping.
>> +		 */
>> +		pci_activate_bar_regions(kvm, new_addr, bar_size);
>> +		return;
>> +	}
>> +
>> +	pci_activate_bar_regions(kvm, old_addr, bar_size);
>> +}
>> +
>>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>>  {
>>  	void *base;
>> @@ -206,7 +392,6 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	struct pci_device_header *pci_hdr;
>>  	u8 dev_num = addr.device_number;
>>  	u32 value = 0;
>> -	u32 mask;
>>  
>>  	if (!pci_device_exists(addr.bus_number, dev_num, 0))
>>  		return;
>> @@ -231,46 +416,13 @@ void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void *data,
>>  	}
>>  
>>  	bar = (offset - PCI_BAR_OFFSET(0)) / sizeof(u32);
>> -
>> -	/*
>> -	 * If the kernel masks the BAR, it will expect to find the size of the
>> -	 * BAR there next time it reads from it. After the kernel reads the
>> -	 * size, it will write the address back.
>> -	 */
>>  	if (bar < 6) {
>> -		if (pci__bar_is_io(pci_hdr, bar))
>> -			mask = (u32)PCI_BASE_ADDRESS_IO_MASK;
>> -		else
>> -			mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
>> -		/*
>> -		 * According to the PCI local bus specification REV 3.0:
>> -		 * The number of upper bits that a device actually implements
>> -		 * depends on how much of the address space the device will
>> -		 * respond to. A device that wants a 1 MB memory address space
>> -		 * (using a 32-bit base address register) would build the top
>> -		 * 12 bits of the address register, hardwiring the other bits
>> -		 * to 0.
>> -		 *
>> -		 * Furthermore, software can determine how much address space
>> -		 * the device requires by writing a value of all 1's to the
>> -		 * register and then reading the value back. The device will
>> -		 * return 0's in all don't-care address bits, effectively
>> -		 * specifying the address space required.
>> -		 *
>> -		 * Software computes the size of the address space with the
>> -		 * formula S = ~B + 1, where S is the memory size and B is the
>> -		 * value read from the BAR. This means that the BAR value that
>> -		 * kvmtool should return is B = ~(S - 1).
>> -		 */
>>  		memcpy(&value, data, size);
>> -		if (value == 0xffffffff)
>> -			value = ~(pci__bar_size(pci_hdr, bar) - 1);
>> -		/* Preserve the special bits. */
>> -		value = (value & mask) | (pci_hdr->bar[bar] & ~mask);
>> -		memcpy(base + offset, &value, size);
>> -	} else {
>> -		memcpy(base + offset, data, size);
>> +		pci_config_bar_wr(kvm, pci_hdr, bar, value);
>> +		return;
>>  	}
>> +
>> +	memcpy(base + offset, data, size);
>>  }
>>  
>>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void *data, int size)
>> @@ -336,16 +488,18 @@ int pci__register_bar_regions(struct kvm *kvm, struct pci_device_header *pci_hdr
>>  		if (!pci_bar_is_implemented(pci_hdr, i))
>>  			continue;
>>  
>> +		assert(!pci_bar_is_active(pci_hdr, i));
>> +
>>  		if (pci__bar_is_io(pci_hdr, i) &&
>>  		    pci__io_space_enabled(pci_hdr)) {
>> -			r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>>  			if (r < 0)
>>  				return r;
>>  		}
>>  
>>  		if (pci__bar_is_memory(pci_hdr, i) &&
>>  		    pci__memory_space_enabled(pci_hdr)) {
>> -			r = bar_activate_fn(kvm, pci_hdr, i, data);
>> +			r = pci_activate_bar(kvm, pci_hdr, i);
>>  			if (r < 0)
>>  				return r;
>>  		}
>> diff --git a/vfio/pci.c b/vfio/pci.c
>> index 34f19792765e..49ecd12a38cd 100644
>> --- a/vfio/pci.c
>> +++ b/vfio/pci.c
>> @@ -467,6 +467,7 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	struct vfio_pci_msix_pba *pba = &pdev->msix_pba;
>>  	struct vfio_pci_msix_table *table = &pdev->msix_table;
>>  	struct vfio_region *region;
>> +	u32 bar_addr;
>>  	bool has_msix;
>>  	int ret;
>>  
>> @@ -475,7 +476,14 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	region = &vdev->regions[bar_num];
>>  	has_msix = pdev->irq_modes & VFIO_PCI_IRQ_MODE_MSIX;
>>  
>> +	bar_addr = pci__bar_address(pci_hdr, bar_num);
>> +	if (pci__bar_is_io(pci_hdr, bar_num))
>> +		region->port_base = bar_addr;
>> +	else
>> +		region->guest_phys_addr = bar_addr;
>> +
>>  	if (has_msix && (u32)bar_num == table->bar) {
>> +		table->guest_phys_addr = region->guest_phys_addr;
>>  		ret = kvm__register_mmio(kvm, table->guest_phys_addr,
>>  					 table->size, false,
>>  					 vfio_pci_msix_table_access, pdev);
>> @@ -490,6 +498,10 @@ static int vfio_pci_bar_activate(struct kvm *kvm,
>>  	}
>>  
>>  	if (has_msix && (u32)bar_num == pba->bar) {
>> +		if (pba->bar == table->bar)
>> +			pba->guest_phys_addr = table->guest_phys_addr + table->size;
>> +		else
>> +			pba->guest_phys_addr = region->guest_phys_addr;
>>  		ret = kvm__register_mmio(kvm, pba->guest_phys_addr,
>>  					 pba->size, false,
>>  					 vfio_pci_msix_pba_access, pdev);
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 00/12] Add reassignable BARs
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (11 preceding siblings ...)
  2020-05-14 15:38 ` [PATCH v4 kvmtool 12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
@ 2020-05-15 15:38 ` André Przywara
  2020-05-19 16:46 ` Will Deacon
  13 siblings, 0 replies; 22+ messages in thread
From: André Przywara @ 2020-05-15 15:38 UTC (permalink / raw)
  To: will
  Cc: Alexandru Elisei, kvm, julien.thierry.kdev, sami.mujawar,
	lorenzo.pieralisi, maz

On 14/05/2020 16:38, Alexandru Elisei wrote:

Hi Will,

> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
> it from trying to reassign the BARs. Let's make the BARs reassignable so we
> can get rid of this band-aid.

I have covered this series now, posting Reviewed-by: tags for the
patches that were still missing one.

From my point of view this is good to go now. There is still work to do,
but that needs to happen on top of this series anyway.

Thanks!
Andre.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 kvmtool 00/12] Add reassignable BARs
  2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
                   ` (12 preceding siblings ...)
  2020-05-15 15:38 ` [PATCH v4 kvmtool 00/12] Add reassignable BARs André Przywara
@ 2020-05-19 16:46 ` Will Deacon
  13 siblings, 0 replies; 22+ messages in thread
From: Will Deacon @ 2020-05-19 16:46 UTC (permalink / raw)
  To: kvm, Alexandru Elisei
  Cc: catalin.marinas, Will Deacon, sami.mujawar, andre.przywara,
	lorenzo.pieralisi, julien.thierry.kdev, maz

On Thu, 14 May 2020 16:38:17 +0100, Alexandru Elisei wrote:
> kvmtool uses the Linux-only dt property 'linux,pci-probe-only' to prevent
> it from trying to reassign the BARs. Let's make the BARs reassignable so we
> can get rid of this band-aid.
> 
> During device configuration, Linux can assign a region resource to a BAR
> that temporarily overlaps with another device. This means that the code
> that emulates BAR reassignment must be aware of all the registered PCI
> devices. Because of this, and to avoid re-implementing the same code for
> each emulated device, the algorithm for activating/deactivating emulation
> as BAR addresses change lives completely inside the PCI code. Each device
> registers two callback functions which are called when device emulation is
> activated (for example, to activate emulation for a newly assigned BAR
> address), respectively, when device emulation is deactivated (a previous
> BAR address is changed, and emulation for that region must be deactivated).
> 
> [...]

Applied to kvmtool (master), thanks!

[01/12] ioport: mmio: Use a mutex and reference counting for locking
        https://git.kernel.org/will/kvmtool/c/09577ac58fef
[02/12] pci: Add helpers for BAR values and memory/IO space access
        https://git.kernel.org/will/kvmtool/c/2f6384f924d7
[03/12] virtio/pci: Get emulated region address from BARs
        https://git.kernel.org/will/kvmtool/c/e539f3e425fb
[04/12] vfio: Reserve ioports when configuring the BAR
        https://git.kernel.org/will/kvmtool/c/a05e576f7fd8
[05/12] pci: Limit configuration transaction size to 32 bits
        https://git.kernel.org/will/kvmtool/c/6ea32ebdb84b
[06/12] vfio/pci: Don't write configuration value twice
        https://git.kernel.org/will/kvmtool/c/e1d0285c89ae
[07/12] Don't allow more than one framebuffers
        https://git.kernel.org/will/kvmtool/c/cbf3d16fccba
[08/12] pci: Implement callbacks for toggling BAR emulation
        https://git.kernel.org/will/kvmtool/c/5a8e4f25dd7b
[09/12] pci: Toggle BAR I/O and memory space emulation
        https://git.kernel.org/will/kvmtool/c/46e04130d264
[10/12] pci: Implement reassignable BARs
        https://git.kernel.org/will/kvmtool/c/465edc9d0fab
[11/12] arm/fdt: Remove 'linux,pci-probe-only' property
        https://git.kernel.org/will/kvmtool/c/ad5e9056de0c
[12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned
        https://git.kernel.org/will/kvmtool/c/b4fc4f605fc6

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-05-19 16:47 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-14 15:38 [PATCH v4 kvmtool 00/12] Add reassignable BARs Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 01/12] ioport: mmio: Use a mutex and reference counting for locking Alexandru Elisei
2020-05-15 10:13   ` André Przywara
2020-05-15 13:18     ` Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 02/12] pci: Add helpers for BAR values and memory/IO space access Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 03/12] virtio/pci: Get emulated region address from BARs Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 04/12] vfio: Reserve ioports when configuring the BAR Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 05/12] pci: Limit configuration transaction size to 32 bits Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 06/12] vfio/pci: Don't write configuration value twice Alexandru Elisei
2020-05-14 16:55   ` André Przywara
2020-05-14 15:38 ` [PATCH v4 kvmtool 07/12] Don't allow more than one framebuffers Alexandru Elisei
2020-05-14 16:56   ` André Przywara
2020-05-14 15:38 ` [PATCH v4 kvmtool 08/12] pci: Implement callbacks for toggling BAR emulation Alexandru Elisei
2020-05-14 16:56   ` André Przywara
2020-05-14 15:38 ` [PATCH v4 kvmtool 09/12] pci: Toggle BAR I/O and memory space emulation Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 10/12] pci: Implement reassignable BARs Alexandru Elisei
2020-05-14 16:56   ` André Przywara
2020-05-15 13:25     ` Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 11/12] arm/fdt: Remove 'linux,pci-probe-only' property Alexandru Elisei
2020-05-14 15:38 ` [PATCH v4 kvmtool 12/12] vfio: Trap MMIO access to BAR addresses which aren't page aligned Alexandru Elisei
2020-05-15 15:38 ` [PATCH v4 kvmtool 00/12] Add reassignable BARs André Przywara
2020-05-19 16:46 ` Will Deacon
