All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override
@ 2017-06-20 15:47 Alex Williamson
  2017-06-20 15:47 ` [PATCH v3 1/9] vfio: Fix group release deadlock Alex Williamson
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:47 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux-kernel, linux

v3:

 * Fix Alexey's nit in 2/, which becomes a bug in 3/.  I posted the
   intended correction for this, but 0-day builds broke on it and I'd
   like to be sure we get all the automated testing possible, so v3.
   Added Alexey's Rb.

Thanks,

Alex

v2:

 * Added received acks and reviews, thanks!
 * Rebased and resolved conflict in patch 2/, dropped reviews due
   to changes and added Alexey to cc as spapr code is moved too
 * Added stable tag for patches 1-3
 * Resolved comment typo Eric noted in patch 1
 * Split AMBA out to patches 8 & 9 as Eric noted amba_bustype is
   not exported.  These can be separate follow-up patches if delayed

Please re-ack/review patch 2.  Eric, I'm happy to add your Tested-by
to the whole series if appropriate as well.  Thanks,

Alex


v1:

VM hotplug testing reveals a number of races in the vfio device,
group, container shutdown path, some attributed to libvirt's ask/take
unplug behavior and some long standing with groups potentially
composed of multiple devices, where each device can be independently
bound to drivers.  Libvirt's ask/take behavior is a result of the
asynchronous nature of PCI hotplug, libvirt registers a hot-unplug
request (ask), which is acknowledged almost immediately and then
proceeds to try to unbind the device from the vfio bus driver (take).
This sets us off on racing paths where we allow the device to be
released from the group much like would happen in groups with multiple
devices, while the group and container are torn down separately.
These races are addressed in the first 3 patches of this series.

The long standing issue with removing devices from in-use groups is
that we feel that the system is compromised if we allow user and host
devices within the same non-isolated group.  This triggers a BUG_ON
when we detect this condition after the rogue driver binding.  Since
that code was put in place we've added driver_override support for
all of the physical buses supported by vfio, giving us a way to block
binding to such compromising drivers.  We finally enable that in the
latter 4 patches of this series, minding that we need to allow
re-binding to non-compromising drivers, and also noting that a small
synchronization stall is effective in eliminating the need for this
blocking in the more common singleton device group case.

Reviews, comments, and acks appreciated.  Thanks,

Alex

---

Alex Williamson (9):
      vfio: Fix group release deadlock
      kvm-vfio: Decouple only when we match a group
      vfio: New external user group/file match
      iommu: Add driver-not-bound notification
      vfio: Create interface for vfio bus drivers to register
      vfio: Register pci, platform, amba, and mdev bus drivers
      vfio: Use driver_override to avert binding to compromising drivers
      amba: Export amba_bustype
      vfio: Add AMBA driver_override support


 drivers/amba/bus.c                    |    1 
 drivers/iommu/iommu.c                 |    3 
 drivers/vfio/mdev/vfio_mdev.c         |   13 ++
 drivers/vfio/pci/vfio_pci.c           |    7 +
 drivers/vfio/platform/vfio_amba.c     |   24 +++
 drivers/vfio/platform/vfio_platform.c |   24 +++
 drivers/vfio/vfio.c                   |  252 ++++++++++++++++++++++++++++++++-
 include/linux/iommu.h                 |    1 
 include/linux/vfio.h                  |    5 +
 virt/kvm/vfio.c                       |   40 +++--
 10 files changed, 344 insertions(+), 26 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/9] vfio: Fix group release deadlock
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
@ 2017-06-20 15:47 ` Alex Williamson
  2017-06-20 15:47 ` [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group Alex Williamson
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:47 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux-kernel, stable, linux

If vfio_iommu_group_notifier() acquires a group reference and that
reference becomes the last reference to the group, then vfio_group_put
introduces a deadlock code path where we're trying to unregister from
the iommu notifier chain from within a callout of that chain.  Use a
work_struct to release this reference asynchronously.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/vfio/vfio.c |   37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6a49485eb49d..54dd2fbf83d9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -423,6 +423,34 @@ static void vfio_group_put(struct vfio_group *group)
 	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
 }
 
+struct vfio_group_put_work {
+	struct work_struct work;
+	struct vfio_group *group;
+};
+
+static void vfio_group_put_bg(struct work_struct *work)
+{
+	struct vfio_group_put_work *do_work;
+
+	do_work = container_of(work, struct vfio_group_put_work, work);
+
+	vfio_group_put(do_work->group);
+	kfree(do_work);
+}
+
+static void vfio_group_schedule_put(struct vfio_group *group)
+{
+	struct vfio_group_put_work *do_work;
+
+	do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
+	if (WARN_ON(!do_work))
+		return;
+
+	INIT_WORK(&do_work->work, vfio_group_put_bg);
+	do_work->group = group;
+	schedule_work(&do_work->work);
+}
+
 /* Assume group_lock or group reference is held */
 static void vfio_group_get(struct vfio_group *group)
 {
@@ -762,7 +790,14 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 		break;
 	}
 
-	vfio_group_put(group);
+	/*
+	 * If we're the last reference to the group, the group will be
+	 * released, which includes unregistering the iommu group notifier.
+	 * We hold a read-lock on that notifier list, unregistering needs
+	 * a write-lock... deadlock.  Release our reference asynchronously
+	 * to avoid that situation.
+	 */
+	vfio_group_schedule_put(group);
 	return NOTIFY_OK;
 }
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
  2017-06-20 15:47 ` [PATCH v3 1/9] vfio: Fix group release deadlock Alex Williamson
@ 2017-06-20 15:47 ` Alex Williamson
  2017-06-26  7:30   ` Auger Eric
  2017-06-28 17:37   ` Paolo Bonzini
  2017-06-20 15:47 ` [PATCH v3 3/9] vfio: New external user group/file match Alex Williamson
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:47 UTC (permalink / raw)
  To: kvm
  Cc: Alexey Kardashevskiy, linux-kernel, stable, linux, eric.auger,
	alex.williamson, Paolo Bonzini

Unset-KVM and decrement-assignment only when we find the group in our
list.  Otherwise we can get out of sync if the user triggers this for
groups that aren't currently on our list.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: stable@vger.kernel.org
---
 virt/kvm/vfio.c |   13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 37d9118fd84b..6e002d0f3191 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -246,21 +246,20 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 				continue;
 
 			list_del(&kvg->node);
+			kvm_arch_end_assignment(dev->kvm);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+			kvm_spapr_tce_release_vfio_group(dev->kvm,
+							 kvg->vfio_group);
+#endif
+			kvm_vfio_group_set_kvm(kvg->vfio_group, NULL);
 			kvm_vfio_group_put_external_user(kvg->vfio_group);
 			kfree(kvg);
 			ret = 0;
 			break;
 		}
 
-		kvm_arch_end_assignment(dev->kvm);
-
 		mutex_unlock(&kv->lock);
 
-#ifdef CONFIG_SPAPR_TCE_IOMMU
-		kvm_spapr_tce_release_vfio_group(dev->kvm, vfio_group);
-#endif
-		kvm_vfio_group_set_kvm(vfio_group, NULL);
-
 		kvm_vfio_group_put_external_user(vfio_group);
 
 		kvm_vfio_update_coherency(dev);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/9] vfio: New external user group/file match
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
  2017-06-20 15:47 ` [PATCH v3 1/9] vfio: Fix group release deadlock Alex Williamson
  2017-06-20 15:47 ` [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group Alex Williamson
@ 2017-06-20 15:47 ` Alex Williamson
  2017-06-20 15:48 ` [PATCH v3 4/9] iommu: Add driver-not-bound notification Alex Williamson
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:47 UTC (permalink / raw)
  To: kvm
  Cc: linux, stable, linux-kernel, eric.auger, alex.williamson, Paolo Bonzini

At the point where the kvm-vfio pseudo device wants to release its
vfio group reference, we can't always acquire a new reference to make
that happen.  The group can be in a state where we wouldn't allow a
new reference to be added.  This new helper function allows a caller
to match a file to a group to facilitate this.  Given a file and
group, report if they match.  Thus the caller needs to already have a
group reference to match to the file.  This allows the deletion of a
group without acquiring a new reference.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
---
 drivers/vfio/vfio.c  |    9 +++++++++
 include/linux/vfio.h |    2 ++
 virt/kvm/vfio.c      |   27 +++++++++++++++++++--------
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 54dd2fbf83d9..7597a377eb4e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1776,6 +1776,15 @@ void vfio_group_put_external_user(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
 
+bool vfio_external_group_match_file(struct vfio_group *test_group,
+				    struct file *filep)
+{
+	struct vfio_group *group = filep->private_data;
+
+	return (filep->f_op == &vfio_group_fops) && (group == test_group);
+}
+EXPORT_SYMBOL_GPL(vfio_external_group_match_file);
+
 int vfio_external_user_iommu_id(struct vfio_group *group)
 {
 	return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index edf9b2cad277..9b34d0af5d27 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -97,6 +97,8 @@ extern void vfio_unregister_iommu_driver(
  */
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
+extern bool vfio_external_group_match_file(struct vfio_group *group,
+					   struct file *filep);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
 					  unsigned long arg);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 6e002d0f3191..d99850c462a1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -51,6 +51,22 @@ static struct vfio_group *kvm_vfio_group_get_external_user(struct file *filep)
 	return vfio_group;
 }
 
+static bool kvm_vfio_external_group_match_file(struct vfio_group *group,
+					       struct file *filep)
+{
+	bool ret, (*fn)(struct vfio_group *, struct file *);
+
+	fn = symbol_get(vfio_external_group_match_file);
+	if (!fn)
+		return false;
+
+	ret = fn(group, filep);
+
+	symbol_put(vfio_external_group_match_file);
+
+	return ret;
+}
+
 static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 {
 	void (*fn)(struct vfio_group *);
@@ -231,18 +247,13 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 		if (!f.file)
 			return -EBADF;
 
-		vfio_group = kvm_vfio_group_get_external_user(f.file);
-		fdput(f);
-
-		if (IS_ERR(vfio_group))
-			return PTR_ERR(vfio_group);
-
 		ret = -ENOENT;
 
 		mutex_lock(&kv->lock);
 
 		list_for_each_entry(kvg, &kv->group_list, node) {
-			if (kvg->vfio_group != vfio_group)
+			if (!kvm_vfio_external_group_match_file(kvg->vfio_group,
+								f.file))
 				continue;
 
 			list_del(&kvg->node);
@@ -260,7 +271,7 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
 
 		mutex_unlock(&kv->lock);
 
-		kvm_vfio_group_put_external_user(vfio_group);
+		fdput(f);
 
 		kvm_vfio_update_coherency(dev);
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/9] iommu: Add driver-not-bound notification
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (2 preceding siblings ...)
  2017-06-20 15:47 ` [PATCH v3 3/9] vfio: New external user group/file match Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-20 15:48 ` [PATCH v3 5/9] vfio: Create interface for vfio bus drivers to register Alex Williamson
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, Joerg Roedel, linux-kernel, linux

The driver core supports a BUS_NOTIFY_DRIVER_NOT_BOUND notification
sent if a driver fails to bind to a device.  Extend IOMMU group
notifications to include a version of this.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/iommu.c |    3 +++
 include/linux/iommu.h |    1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf7ca7e70777..1a59b3626ab2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1139,6 +1139,9 @@ static int iommu_bus_notifier(struct notifier_block *nb,
 	case BUS_NOTIFY_UNBOUND_DRIVER:
 		group_action = IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER;
 		break;
+	case BUS_NOTIFY_DRIVER_NOT_BOUND:
+		group_action = IOMMU_GROUP_NOTIFY_DRIVER_NOT_BOUND;
+		break;
 	}
 
 	if (group_action)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54adc4a33..5f6b01398bc1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -271,6 +271,7 @@ static inline void iommu_device_set_fwnode(struct iommu_device *iommu,
 #define IOMMU_GROUP_NOTIFY_BOUND_DRIVER		4 /* Post Driver bind */
 #define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER	5 /* Pre Driver unbind */
 #define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER	6 /* Post Driver unbind */
+#define IOMMU_GROUP_NOTIFY_DRIVER_NOT_BOUND	7 /* Driver bind failed */
 
 extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops);
 extern bool iommu_present(struct bus_type *bus);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/9] vfio: Create interface for vfio bus drivers to register
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (3 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 4/9] iommu: Add driver-not-bound notification Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-20 15:48 ` [PATCH v3 6/9] vfio: Register pci, platform, amba, and mdev bus drivers Alex Williamson
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux-kernel, linux

Generally we don't know about vfio bus drivers until a device is
added to the vfio-core with vfio_add_group_dev(), this optional
registration with vfio_register_bus_driver() allows vfio-core to
track known drivers.  Our current use for this information is to
know whether a driver is vfio compatible during a bind operation.
For devices on buses with driver_override support, we can use this
linkage to block non-vfio drivers from binding to devices where
the iommu group state would trigger a BUG to avoid host/user
integrity issues.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio.c  |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |    3 +++
 2 files changed, 58 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 7597a377eb4e..f40d1508d368 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -43,6 +43,8 @@
 	struct class			*class;
 	struct list_head		iommu_drivers_list;
 	struct mutex			iommu_drivers_lock;
+	struct list_head		bus_drivers_list;
+	struct mutex			bus_drivers_lock;
 	struct list_head		group_list;
 	struct idr			group_idr;
 	struct mutex			group_lock;
@@ -51,6 +53,11 @@
 	wait_queue_head_t		release_q;
 } vfio;
 
+struct vfio_bus_driver {
+	struct device_driver		*drv;
+	struct list_head		vfio_next;
+};
+
 struct vfio_iommu_driver {
 	const struct vfio_iommu_driver_ops	*ops;
 	struct list_head			vfio_next;
@@ -2243,6 +2250,52 @@ int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
 }
 EXPORT_SYMBOL(vfio_unregister_notifier);
 
+int vfio_register_bus_driver(struct device_driver *drv)
+{
+	struct vfio_bus_driver *driver, *tmp;
+
+	driver = kzalloc(sizeof(*driver), GFP_KERNEL);
+	if (!driver)
+		return -ENOMEM;
+
+	driver->drv = drv;
+
+	mutex_lock(&vfio.bus_drivers_lock);
+
+	/* Check for duplicates */
+	list_for_each_entry(tmp, &vfio.bus_drivers_list, vfio_next) {
+		if (tmp->drv == drv) {
+			mutex_unlock(&vfio.bus_drivers_lock);
+			kfree(driver);
+			return -EINVAL;
+		}
+	}
+
+	list_add(&driver->vfio_next, &vfio.bus_drivers_list);
+
+	mutex_unlock(&vfio.bus_drivers_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_register_bus_driver);
+
+void vfio_unregister_bus_driver(struct device_driver *drv)
+{
+	struct vfio_bus_driver *driver;
+
+	mutex_lock(&vfio.bus_drivers_lock);
+	list_for_each_entry(driver, &vfio.bus_drivers_list, vfio_next) {
+		if (driver->drv == drv) {
+			list_del(&driver->vfio_next);
+			mutex_unlock(&vfio.bus_drivers_lock);
+			kfree(driver);
+			return;
+		}
+	}
+	mutex_unlock(&vfio.bus_drivers_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_unregister_bus_driver);
+
 /**
  * Module/class support
  */
@@ -2266,8 +2319,10 @@ static int __init vfio_init(void)
 	idr_init(&vfio.group_idr);
 	mutex_init(&vfio.group_lock);
 	mutex_init(&vfio.iommu_drivers_lock);
+	mutex_init(&vfio.bus_drivers_lock);
 	INIT_LIST_HEAD(&vfio.group_list);
 	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
+	INIT_LIST_HEAD(&vfio.bus_drivers_list);
 	init_waitqueue_head(&vfio.release_q);
 
 	ret = misc_register(&vfio_dev);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 9b34d0af5d27..dab0f8105e4a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -92,6 +92,9 @@ struct vfio_iommu_driver_ops {
 extern void vfio_unregister_iommu_driver(
 				const struct vfio_iommu_driver_ops *ops);
 
+extern int vfio_register_bus_driver(struct device_driver *drv);
+extern void vfio_unregister_bus_driver(struct device_driver *drv);
+
 /*
  * External user API
  */

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/9] vfio: Register pci, platform, amba, and mdev bus drivers
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (4 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 5/9] vfio: Create interface for vfio bus drivers to register Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-20 15:48 ` [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers Alex Williamson
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm
  Cc: Baptiste Reynal, Kirti Wankhede, linux-kernel, eric.auger,
	alex.williamson, linux

Hook into vfio bus driver register/unregister support.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Cc: Baptiste Reynal <b.reynal@virtualopensystems.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>
---
 drivers/vfio/mdev/vfio_mdev.c         |   13 ++++++++++++-
 drivers/vfio/pci/vfio_pci.c           |    7 +++++++
 drivers/vfio/platform/vfio_amba.c     |   24 +++++++++++++++++++++++-
 drivers/vfio/platform/vfio_platform.c |   24 +++++++++++++++++++++++-
 4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
index fa848a701b8b..b73d6f3e8ad5 100644
--- a/drivers/vfio/mdev/vfio_mdev.c
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -131,11 +131,22 @@ struct mdev_driver vfio_mdev_driver = {
 
 static int __init vfio_mdev_init(void)
 {
-	return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
+	int ret;
+
+	ret = mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
+	if (ret)
+		return ret;
+
+	ret = vfio_register_bus_driver(&vfio_mdev_driver.driver);
+	if (ret)
+		mdev_unregister_driver(&vfio_mdev_driver);
+
+	return ret;
 }
 
 static void __exit vfio_mdev_exit(void)
 {
+	vfio_unregister_bus_driver(&vfio_mdev_driver.driver);
 	mdev_unregister_driver(&vfio_mdev_driver);
 }
 
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 063c1ce6fa42..b9ac5b8c53e6 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1397,6 +1397,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 
 static void __exit vfio_pci_cleanup(void)
 {
+	vfio_unregister_bus_driver(&vfio_pci_driver.driver);
 	pci_unregister_driver(&vfio_pci_driver);
 	vfio_pci_uninit_perm_bits();
 }
@@ -1456,6 +1457,12 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
+	ret = vfio_register_bus_driver(&vfio_pci_driver.driver);
+	if (ret) {
+		pci_unregister_driver(&vfio_pci_driver);
+		goto out_driver;
+	}
+
 	vfio_pci_fill_ids();
 
 	return 0;
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 31372fbf6c5b..7fd9cb4a6756 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -109,7 +109,29 @@ static int vfio_amba_remove(struct amba_device *adev)
 	},
 };
 
-module_amba_driver(vfio_amba_driver);
+static void __exit vfio_amba_exit(void)
+{
+	vfio_unregister_bus_driver(&vfio_amba_driver.drv);
+	amba_driver_unregister(&vfio_amba_driver);
+}
+
+static int __init vfio_amba_init(void)
+{
+	int ret;
+
+	ret = amba_driver_register(&vfio_amba_driver);
+	if (ret)
+		return ret;
+
+	ret = vfio_register_bus_driver(&vfio_amba_driver.drv);
+	if (ret)
+		amba_driver_unregister(&vfio_amba_driver);
+
+	return ret;
+}
+
+module_init(vfio_amba_init);
+module_exit(vfio_amba_exit);
 
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 6561751a1063..3974dc65e6dc 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -100,7 +100,29 @@ static int vfio_platform_remove(struct platform_device *pdev)
 	},
 };
 
-module_platform_driver(vfio_platform_driver);
+static void __exit vfio_platform_exit(void)
+{
+	vfio_unregister_bus_driver(&vfio_platform_driver.driver);
+	platform_driver_unregister(&vfio_platform_driver);
+}
+
+static int __init vfio_platform_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&vfio_platform_driver);
+	if (ret)
+		return ret;
+
+	ret = vfio_register_bus_driver(&vfio_platform_driver.driver);
+	if (ret)
+		platform_driver_unregister(&vfio_platform_driver);
+
+	return ret;
+}
+
+module_init(vfio_platform_init);
+module_exit(vfio_platform_exit);
 
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL v2");

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (5 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 6/9] vfio: Register pci, platform, amba, and mdev bus drivers Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-26  9:08   ` Russell King - ARM Linux
  2017-06-20 15:48 ` [PATCH v3 8/9] amba: Export amba_bustype Alex Williamson
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux-kernel, linux

If a device is bound to a non-vfio, non-whitelisted driver while a
group is in use, then the integrity of the group is compromised and
will result in hitting a BUG_ON.  This code tries to avoid this case
by mangling driver_override to force a no-match for the driver.  The
driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
or BOUND_DRIVER, at which point we can remove the driver_override
mangling.

A complication here is that even though the window between these
notifications is expected to be extremely small, the vfio group could
be removed, which would prevent us from finding the group again to
remove the driver_override.  We therefore take a group reference when
adding to driver_override and release it when removed.  A second
complication is that driver_override can be modified by the system
admin through sysfs.  To avoid trivial interference, we add a non-
user-visible UUID to the group and use this as part of the mangle
string.

The above blocks binding to a driver that would compromise the host,
but we'd also like to avoid reaching that step if possible.  For this
we add a wait_event_timeout() with a short, 1 second timeout, which is
highly effective in allowing empty groups to finish cleanup.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/vfio/vfio.c |  145 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index f40d1508d368..20e57fecf652 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -25,6 +25,7 @@
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/platform_device.h>
 #include <linux/pci.h>
 #include <linux/rwsem.h>
 #include <linux/sched.h>
@@ -32,6 +33,7 @@
 #include <linux/stat.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/uuid.h>
 #include <linux/vfio.h>
 #include <linux/wait.h>
 
@@ -95,6 +97,7 @@ struct vfio_group {
 	bool				noiommu;
 	struct kvm			*kvm;
 	struct blocking_notifier_head	notifier;
+	unsigned char			uuid[16];
 };
 
 struct vfio_device {
@@ -352,6 +355,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group)
 
 	group->nb.notifier_call = vfio_iommu_group_notifier;
 
+	generate_random_uuid(group->uuid);
+
 	/*
 	 * blocking notifiers acquire a rwsem around registering and hold
 	 * it around callback.  Therefore, need to register outside of
@@ -728,6 +733,111 @@ static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
 	return vfio_dev_viable(dev, group);
 }
 
+#define VFIO_TAG_PREFIX "#vfio_group:"
+
+static char **vfio_find_driver_override(struct device *dev)
+{
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+		return &pdev->driver_override;
+	} else if (dev->bus == &platform_bus_type) {
+		struct platform_device *pdev = to_platform_device(dev);
+		return &pdev->driver_override;
+	}
+
+	return NULL;
+}
+
+/*
+ * If we're about to bind to something other than a known whitelisted driver
+ * or known vfio bus driver, try to avert it with driver_override.
+ */
+static void vfio_group_nb_pre_bind(struct vfio_group *group, struct device *dev)
+{
+	struct vfio_bus_driver *driver;
+	struct device_driver *drv = ACCESS_ONCE(dev->driver);
+	char **driver_override;
+
+	if (vfio_dev_whitelisted(dev, drv))
+		return; /* Binding to known "innocuous" device/driver */
+
+	mutex_lock(&vfio.bus_drivers_lock);
+	list_for_each_entry(driver, &vfio.bus_drivers_list, vfio_next) {
+		if (driver->drv == drv) {
+			mutex_unlock(&vfio.bus_drivers_lock);
+			return; /* Binding to known vfio bus driver, ok */
+		}
+	}
+	mutex_unlock(&vfio.bus_drivers_lock);
+
+	/* Can we stall slightly to let users fall off? */
+	if (list_empty(&group->device_list)) {
+		if (wait_event_timeout(vfio.release_q,
+				!atomic_read(&group->container_users), HZ))
+			return;
+	}
+
+	driver_override = vfio_find_driver_override(dev);
+	if (driver_override) {
+		char tag[50], *new = NULL, *old = *driver_override;
+
+		snprintf(tag, sizeof(tag), "%s%pU",
+			 VFIO_TAG_PREFIX, group->uuid);
+
+		if (old && strstr(old, tag))
+			return; /* block already in place */
+
+		new = kasprintf(GFP_KERNEL, "%s%s", old ? old : "", tag);
+		if (new) {
+			*driver_override = new;
+			kfree(old);
+			vfio_group_get(group);
+			dev_warn(dev, "vfio: Blocking unsafe driver bind\n");
+			return;
+		}
+	}
+
+	dev_warn(dev, "vfio: Unsafe driver binding to in-use group!\n");
+}
+
+/* If we've mangled driver_override, remove it */
+static void vfio_group_nb_post_bind(struct vfio_group *group,
+				    struct device *dev)
+{
+	char **driver_override = vfio_find_driver_override(dev);
+
+	if (driver_override && *driver_override) {
+		char tag[50], *new, *start, *end, *old = *driver_override;
+
+		snprintf(tag, sizeof(tag), "%s%pU",
+			 VFIO_TAG_PREFIX, group->uuid);
+
+		start = strstr(old, tag);
+		if (start) {
+			end = start + strlen(tag);
+
+			if (old + strlen(old) > end)
+				memmove(start, end,
+					strlen(old) - (end - old) + 1);
+			else
+				*start = 0;
+
+			if (strlen(old)) {
+				new = kasprintf(GFP_KERNEL, "%s", old);
+				if (new) {
+					*driver_override = new;
+					kfree(old);
+				} /* else, in-place terminated, ok */
+			} else {
+				*driver_override = NULL;
+				kfree(old);
+			}
+
+			vfio_group_put(group);
+		}
+	}
+}
+
 static int vfio_iommu_group_notifier(struct notifier_block *nb,
 				     unsigned long action, void *data)
 {
@@ -757,14 +867,23 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 		 */
 		break;
 	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		pr_debug("%s: Device %s, group %d binding to driver\n",
+		pr_debug("%s: Device %s, group %d binding to driver %s\n",
 			 __func__, dev_name(dev),
-			 iommu_group_id(group->iommu_group));
+			 iommu_group_id(group->iommu_group), dev->driver->name);
+		if (vfio_group_nb_verify(group, dev))
+			vfio_group_nb_pre_bind(group, dev);
+		break;
+	case IOMMU_GROUP_NOTIFY_DRIVER_NOT_BOUND:
+		pr_debug("%s: Device %s, group %d binding fail to driver %s\n",
+			 __func__, dev_name(dev),
+			 iommu_group_id(group->iommu_group), dev->driver->name);
+		vfio_group_nb_post_bind(group, dev);
 		break;
 	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
 		pr_debug("%s: Device %s, group %d bound to driver %s\n",
 			 __func__, dev_name(dev),
 			 iommu_group_id(group->iommu_group), dev->driver->name);
+		vfio_group_nb_post_bind(group, dev);
 		BUG_ON(vfio_group_nb_verify(group, dev));
 		break;
 	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
@@ -1351,6 +1470,7 @@ static int vfio_group_unset_container(struct vfio_group *group)
 	if (users != 1)
 		return -EBUSY;
 
+	wake_up(&vfio.release_q);
 	__vfio_group_unset_container(group);
 
 	return 0;
@@ -1364,7 +1484,11 @@ static int vfio_group_unset_container(struct vfio_group *group)
  */
 static void vfio_group_try_dissolve_container(struct vfio_group *group)
 {
-	if (0 == atomic_dec_if_positive(&group->container_users))
+	int users = atomic_dec_if_positive(&group->container_users);
+
+	wake_up(&vfio.release_q);
+
+	if (!users)
 		__vfio_group_unset_container(group);
 }
 
@@ -1433,19 +1557,26 @@ static bool vfio_group_viable(struct vfio_group *group)
 
 static int vfio_group_add_container_user(struct vfio_group *group)
 {
+	int ret;
+
 	if (!atomic_inc_not_zero(&group->container_users))
 		return -EINVAL;
 
 	if (group->noiommu) {
-		atomic_dec(&group->container_users);
-		return -EPERM;
+		ret = -EPERM;
+		goto out;
 	}
 	if (!group->container->iommu_driver || !vfio_group_viable(group)) {
-		atomic_dec(&group->container_users);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
 	return 0;
+
+out:
+	atomic_dec(&group->container_users);
+	wake_up(&vfio.release_q);
+	return ret;
 }
 
 static const struct file_operations vfio_device_fops;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 8/9] amba: Export amba_bustype
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (6 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-26  7:30   ` Auger Eric
  2017-06-20 15:48 ` [PATCH v3 9/9] vfio: Add AMBA driver_override support Alex Williamson
  2017-06-26  7:31 ` [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Auger Eric
  9 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux, linux-kernel

This allows modules to match struct device.bus to amba_bustype for the
purpose of casting the device to an amba_device with to_amba_device().

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Eric Auger <eric.auger@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
---
 drivers/amba/bus.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index a56fa2a1e9aa..4e118cd3ddf3 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -197,6 +197,7 @@ struct bus_type amba_bustype = {
 	.uevent		= amba_uevent,
 	.pm		= &amba_pm,
 };
+EXPORT_SYMBOL_GPL(amba_bustype);
 
 static int __init amba_init(void)
 {

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 9/9] vfio: Add AMBA driver_override support
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (7 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 8/9] amba: Export amba_bustype Alex Williamson
@ 2017-06-20 15:48 ` Alex Williamson
  2017-06-26  7:30   ` Auger Eric
  2017-06-26  7:31 ` [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Auger Eric
  9 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-06-20 15:48 UTC (permalink / raw)
  To: kvm; +Cc: eric.auger, alex.williamson, linux-kernel, linux

AMBA also supports driver_override, but amba_bustype was not exported
to be able to identify an amba device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 20e57fecf652..36f0fcfded0b 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -36,6 +36,7 @@
 #include <linux/uuid.h>
 #include <linux/vfio.h>
 #include <linux/wait.h>
+#include <linux/amba/bus.h>
 
 #define DRIVER_VERSION	"0.3"
 #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
@@ -743,6 +744,11 @@ static char **vfio_find_driver_override(struct device *dev)
 	} else if (dev->bus == &platform_bus_type) {
 		struct platform_device *pdev = to_platform_device(dev);
 		return &pdev->driver_override;
+#ifdef CONFIG_ARM_AMBA
+	} else if (dev->bus == &amba_bustype) {
+		struct amba_device *adev = to_amba_device(dev);
+		return &adev->driver_override;
+#endif
 	}
 
 	return NULL;

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 8/9] amba: Export amba_bustype
  2017-06-20 15:48 ` [PATCH v3 8/9] amba: Export amba_bustype Alex Williamson
@ 2017-06-26  7:30   ` Auger Eric
  0 siblings, 0 replies; 23+ messages in thread
From: Auger Eric @ 2017-06-26  7:30 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux, linux-kernel

Hi Alex,

On 20/06/2017 17:48, Alex Williamson wrote:
> This allows modules to match struct device.bus to amba_bustype for the
> purpose of casting the device to an amba_device with to_amba_device().
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Reported-by: Eric Auger <eric.auger@redhat.com>
> Cc: Russell King <linux@armlinux.org.uk>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  drivers/amba/bus.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> index a56fa2a1e9aa..4e118cd3ddf3 100644
> --- a/drivers/amba/bus.c
> +++ b/drivers/amba/bus.c
> @@ -197,6 +197,7 @@ struct bus_type amba_bustype = {
>  	.uevent		= amba_uevent,
>  	.pm		= &amba_pm,
>  };
> +EXPORT_SYMBOL_GPL(amba_bustype);
>  
>  static int __init amba_init(void)
>  {
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 9/9] vfio: Add AMBA driver_override support
  2017-06-20 15:48 ` [PATCH v3 9/9] vfio: Add AMBA driver_override support Alex Williamson
@ 2017-06-26  7:30   ` Auger Eric
  0 siblings, 0 replies; 23+ messages in thread
From: Auger Eric @ 2017-06-26  7:30 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux-kernel, linux

Hi Alex,

On 20/06/2017 17:48, Alex Williamson wrote:
> AMBA also supports driver_override, but amba_bustype was not exported
> to be able to identify an amba device.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  drivers/vfio/vfio.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 20e57fecf652..36f0fcfded0b 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -36,6 +36,7 @@
>  #include <linux/uuid.h>
>  #include <linux/vfio.h>
>  #include <linux/wait.h>
> +#include <linux/amba/bus.h>
>  
>  #define DRIVER_VERSION	"0.3"
>  #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
> @@ -743,6 +744,11 @@ static char **vfio_find_driver_override(struct device *dev)
>  	} else if (dev->bus == &platform_bus_type) {
>  		struct platform_device *pdev = to_platform_device(dev);
>  		return &pdev->driver_override;
> +#ifdef CONFIG_ARM_AMBA
> +	} else if (dev->bus == &amba_bustype) {
> +		struct amba_device *adev = to_amba_device(dev);
> +		return &adev->driver_override;
> +#endif
>  	}
>  
>  	return NULL;
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group
  2017-06-20 15:47 ` [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group Alex Williamson
@ 2017-06-26  7:30   ` Auger Eric
  2017-06-28 17:37   ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Auger Eric @ 2017-06-26  7:30 UTC (permalink / raw)
  To: Alex Williamson, kvm
  Cc: Alexey Kardashevskiy, linux-kernel, stable, linux, Paolo Bonzini

Hi,
On 20/06/2017 17:47, Alex Williamson wrote:
> Unset-KVM and decrement-assignment only when we find the group in our
> list.  Otherwise we can get out of sync if the user triggers this for
> groups that aren't currently on our list.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: stable@vger.kernel.org
Reviewed-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric
> ---
>  virt/kvm/vfio.c |   13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 37d9118fd84b..6e002d0f3191 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -246,21 +246,20 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  				continue;
>  
>  			list_del(&kvg->node);
> +			kvm_arch_end_assignment(dev->kvm);
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> +			kvm_spapr_tce_release_vfio_group(dev->kvm,
> +							 kvg->vfio_group);
> +#endif
> +			kvm_vfio_group_set_kvm(kvg->vfio_group, NULL);
>  			kvm_vfio_group_put_external_user(kvg->vfio_group);
>  			kfree(kvg);
>  			ret = 0;
>  			break;
>  		}
>  
> -		kvm_arch_end_assignment(dev->kvm);
> -
>  		mutex_unlock(&kv->lock);
>  
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> -		kvm_spapr_tce_release_vfio_group(dev->kvm, vfio_group);
> -#endif
> -		kvm_vfio_group_set_kvm(vfio_group, NULL);
> -
>  		kvm_vfio_group_put_external_user(vfio_group);
>  
>  		kvm_vfio_update_coherency(dev);
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override
  2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
                   ` (8 preceding siblings ...)
  2017-06-20 15:48 ` [PATCH v3 9/9] vfio: Add AMBA driver_override support Alex Williamson
@ 2017-06-26  7:31 ` Auger Eric
  9 siblings, 0 replies; 23+ messages in thread
From: Auger Eric @ 2017-06-26  7:31 UTC (permalink / raw)
  To: Alex Williamson, kvm; +Cc: linux-kernel, linux

Hi Alex,

On 20/06/2017 17:47, Alex Williamson wrote:
> v3:
> 
>  * Fix Alexey's nit in 2/, which becomes a bug in 3/.  I posted the
>    intended correction for this, but 0-day builds broke on it and I'd
>    like to be sure we get all the automated testing possible, so v3.
>    Added Alexey's Rb.

I tested non regression with vfio-platform.

"vfio: Use driver_override to avert binding to compromising drivers"
modality was tested with vfio-pci on aarch64 AMD Overdrive system (2 igb
PFs in the same group).

Feel free to add my Tested-by: Eric Auger <eric.auger@redhat.com>

Thanks

Eric

> 
> Thanks,
> 
> Alex
> 
> v2:
> 
>  * Added received acks and reviews, thanks!
>  * Rebased and resolved conflict in patch 2/, dropped reviews due
>    to changes and added Alexey to cc as spapr code is moved too
>  * Added stable tag for patches 1-3
>  * Resolved comment typo Eric noted in patch 1
>  * Split AMBA out to patches 8 & 9 as Eric noted amba_bustype is
>    not exported.  These can be separate follow-up patches if delayed
> 
> Please re-ack/review patch 2.  Eric, I'm happy to add your Tested-by
> to the whole series if appropriate as well.  Thanks,
> 
> Alex
> 
> 
> v1:
> 
> VM hotplug testing reveals a number of races in the vfio device,
> group, container shutdown path, some attributed to libvirt's ask/take
> unplug behavior and some long standing with groups potentially
> composed of multiple devices, where each device can be independently
> bound to drivers.  Libvirt's ask/take behavior is a result of the
> asynchronous nature of PCI hotplug, libvirt registers a hot-unplug
> request (ask), which is acknowledged almost immediately and then
> proceeds to try to unbind the device from the vfio bus driver (take).
> This sets us off on racing paths where we allow the device to be
> released from the group much like would happen in groups with multiple
> devices, while the group and container are torn down separately.
> These races are addressed in the first 3 patches of this series.
> 
> The long standing issue with removing devices from in-use groups is
> that we feel that the system is compromised if we allow user and host
> devices within the same non-isolated group.  This triggers a BUG_ON
> when we detect this condition after the rogue driver binding.  Since
> that code was put in place we've added driver_override support for
> all of the physical buses supported by vfio, giving us a way to block
> binding to such compromising drivers.  We finally enable that in the
> latter 4 patches of this series, minding that we need to allow
> re-binding to non-compromising drivers, and also noting that a small
> synchronization stall is effective in eliminating the need for this
> blocking in the more common singleton device group case.
> 
> Reviews, comments, and acks appreciated.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (9):
>       vfio: Fix group release deadlock
>       kvm-vfio: Decouple only when we match a group
>       vfio: New external user group/file match
>       iommu: Add driver-not-bound notification
>       vfio: Create interface for vfio bus drivers to register
>       vfio: Register pci, platform, amba, and mdev bus drivers
>       vfio: Use driver_override to avert binding to compromising drivers
>       amba: Export amba_bustype
>       vfio: Add AMBA driver_override support
> 
> 
>  drivers/amba/bus.c                    |    1 
>  drivers/iommu/iommu.c                 |    3 
>  drivers/vfio/mdev/vfio_mdev.c         |   13 ++
>  drivers/vfio/pci/vfio_pci.c           |    7 +
>  drivers/vfio/platform/vfio_amba.c     |   24 +++
>  drivers/vfio/platform/vfio_platform.c |   24 +++
>  drivers/vfio/vfio.c                   |  252 ++++++++++++++++++++++++++++++++-
>  include/linux/iommu.h                 |    1 
>  include/linux/vfio.h                  |    5 +
>  virt/kvm/vfio.c                       |   40 +++--
>  10 files changed, 344 insertions(+), 26 deletions(-)
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-06-20 15:48 ` [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers Alex Williamson
@ 2017-06-26  9:08   ` Russell King - ARM Linux
  2017-06-26 19:39     ` Alex Williamson
  2017-07-10 21:34     ` Alex Williamson
  0 siblings, 2 replies; 23+ messages in thread
From: Russell King - ARM Linux @ 2017-06-26  9:08 UTC (permalink / raw)
  To: Alex Williamson, Greg KH; +Cc: kvm, eric.auger, linux-kernel

On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> If a device is bound to a non-vfio, non-whitelisted driver while a
> group is in use, then the integrity of the group is compromised and
> will result in hitting a BUG_ON.  This code tries to avoid this case
> by mangling driver_override to force a no-match for the driver.  The
> driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> or BOUND_DRIVER, at which point we can remove the driver_override
> mangling.

Rather than mangling the driver override string to prevent driver binding,
I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
notifier to fail the device probe?

The driver override strings are, after all, exposed to userspace, and
it strikes me that this kind of mangling is racy - userspace can read
or change the override string at any time.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-06-26  9:08   ` Russell King - ARM Linux
@ 2017-06-26 19:39     ` Alex Williamson
  2017-07-10 21:34     ` Alex Williamson
  1 sibling, 0 replies; 23+ messages in thread
From: Alex Williamson @ 2017-06-26 19:39 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Greg KH, kvm, eric.auger, linux-kernel

On Mon, 26 Jun 2017 10:08:55 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > If a device is bound to a non-vfio, non-whitelisted driver while a
> > group is in use, then the integrity of the group is compromised and
> > will result in hitting a BUG_ON.  This code tries to avoid this case
> > by mangling driver_override to force a no-match for the driver.  The
> > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > or BOUND_DRIVER, at which point we can remove the driver_override
> > mangling.  
> 
> Rather than mangling the driver override string to prevent driver binding,
> I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> notifier to fail the device probe?
> 
> The driver override strings are, after all, exposed to userspace, and
> it strikes me that this kind of mangling is racy - userspace can read
> or change the override string at any time.

Indeed, that looks easier.  I sent and RFC, let's see what Greg has to
say.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group
  2017-06-20 15:47 ` [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group Alex Williamson
  2017-06-26  7:30   ` Auger Eric
@ 2017-06-28 17:37   ` Paolo Bonzini
  1 sibling, 0 replies; 23+ messages in thread
From: Paolo Bonzini @ 2017-06-28 17:37 UTC (permalink / raw)
  To: Alex Williamson, kvm
  Cc: Alexey Kardashevskiy, linux-kernel, stable, linux, eric.auger



On 20/06/2017 17:47, Alex Williamson wrote:
> Unset-KVM and decrement-assignment only when we find the group in our
> list.  Otherwise we can get out of sync if the user triggers this for
> groups that aren't currently on our list.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: stable@vger.kernel.org
> ---
>  virt/kvm/vfio.c |   13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 37d9118fd84b..6e002d0f3191 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -246,21 +246,20 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>  				continue;
>  
>  			list_del(&kvg->node);
> +			kvm_arch_end_assignment(dev->kvm);
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> +			kvm_spapr_tce_release_vfio_group(dev->kvm,
> +							 kvg->vfio_group);
> +#endif
> +			kvm_vfio_group_set_kvm(kvg->vfio_group, NULL);
>  			kvm_vfio_group_put_external_user(kvg->vfio_group);
>  			kfree(kvg);
>  			ret = 0;
>  			break;
>  		}
>  
> -		kvm_arch_end_assignment(dev->kvm);
> -
>  		mutex_unlock(&kv->lock);
>  
> -#ifdef CONFIG_SPAPR_TCE_IOMMU
> -		kvm_spapr_tce_release_vfio_group(dev->kvm, vfio_group);
> -#endif
> -		kvm_vfio_group_set_kvm(vfio_group, NULL);
> -
>  		kvm_vfio_group_put_external_user(vfio_group);
>  
>  		kvm_vfio_update_coherency(dev);
> 
> 

Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-06-26  9:08   ` Russell King - ARM Linux
  2017-06-26 19:39     ` Alex Williamson
@ 2017-07-10 21:34     ` Alex Williamson
  2017-07-11  9:46       ` Greg KH
  1 sibling, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-07-10 21:34 UTC (permalink / raw)
  To: Russell King - ARM Linux; +Cc: Greg KH, kvm, eric.auger, linux-kernel

On Mon, 26 Jun 2017 10:08:55 +0100
Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > If a device is bound to a non-vfio, non-whitelisted driver while a
> > group is in use, then the integrity of the group is compromised and
> > will result in hitting a BUG_ON.  This code tries to avoid this case
> > by mangling driver_override to force a no-match for the driver.  The
> > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > or BOUND_DRIVER, at which point we can remove the driver_override
> > mangling.  
> 
> Rather than mangling the driver override string to prevent driver binding,
> I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> notifier to fail the device probe?

Well, it seemed like a good idea, but I don't think we're getting any
traction here, the thread has gone cold:

https://lkml.org/lkml/2017/6/27/1002

Greg, any further comments?
 
> The driver override strings are, after all, exposed to userspace, and
> it strikes me that this kind of mangling is racy - userspace can read
> or change the override string at any time.

As an alternative, I think we can make this not racy.  BIND_DRIVER is
notified through device_bind_driver() which specifies that the device
lock is held.  This covers not only BIND_DRIVER, but also BOUND_DRIVER
and DRIVER_NOT_BOUND.  So if the user entry points in sysfs were to
require the device lock, we could easily mangle and de-mangle without
interference from a user.  It also seems like a rather good idea in
general to exclude the user from changing driver_override while we're
evaluating a match using it.  Do you still have an objection to
mangling driver_override if we can avoid the user race?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-07-10 21:34     ` Alex Williamson
@ 2017-07-11  9:46       ` Greg KH
  2017-07-11 16:41         ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2017-07-11  9:46 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Russell King - ARM Linux, kvm, eric.auger, linux-kernel

On Mon, Jul 10, 2017 at 03:34:12PM -0600, Alex Williamson wrote:
> On Mon, 26 Jun 2017 10:08:55 +0100
> Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
> 
> > On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:
> > > If a device is bound to a non-vfio, non-whitelisted driver while a
> > > group is in use, then the integrity of the group is compromised and
> > > will result in hitting a BUG_ON.  This code tries to avoid this case
> > > by mangling driver_override to force a no-match for the driver.  The
> > > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > > or BOUND_DRIVER, at which point we can remove the driver_override
> > > mangling.  
> > 
> > Rather than mangling the driver override string to prevent driver binding,
> > I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> > notifier to fail the device probe?
> 
> Well, it seemed like a good idea, but I don't think we're getting any
> traction here, the thread has gone cold:
> 
> https://lkml.org/lkml/2017/6/27/1002
> 
> Greg, any further comments?

I still think your drivers should be fixed, adding
yet-another-odd-interaction with the driver core is ripe for added
complexity...

And, as there's no real patch for me to do anything with (hint, I can't
apply RFC patches), I don't know what I can do here...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-07-11  9:46       ` Greg KH
@ 2017-07-11 16:41         ` Alex Williamson
  2017-07-13  8:23           ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-07-11 16:41 UTC (permalink / raw)
  To: Greg KH; +Cc: Russell King - ARM Linux, kvm, eric.auger, linux-kernel

On Tue, 11 Jul 2017 11:46:27 +0200
Greg KH <greg@kroah.com> wrote:

> On Mon, Jul 10, 2017 at 03:34:12PM -0600, Alex Williamson wrote:
> > On Mon, 26 Jun 2017 10:08:55 +0100
> > Russell King - ARM Linux <linux@armlinux.org.uk> wrote:
> >   
> > > On Tue, Jun 20, 2017 at 09:48:31AM -0600, Alex Williamson wrote:  
> > > > If a device is bound to a non-vfio, non-whitelisted driver while a
> > > > group is in use, then the integrity of the group is compromised and
> > > > will result in hitting a BUG_ON.  This code tries to avoid this case
> > > > by mangling driver_override to force a no-match for the driver.  The
> > > > driver-core will either follow-up with a DRIVER_NOT_BOUND (preferred)
> > > > or BOUND_DRIVER, at which point we can remove the driver_override
> > > > mangling.    
> > > 
> > > Rather than mangling the driver override string to prevent driver binding,
> > > I wonder if it would make more sense to allow the BUS_NOTIFY_BIND_DRIVER
> > > notifier to fail the device probe?  
> > 
> > Well, it seemed like a good idea, but I don't think we're getting any
> > traction here, the thread has gone cold:
> > 
> > https://lkml.org/lkml/2017/6/27/1002
> > 
> > Greg, any further comments?  
> 
> I still think your drivers should be fixed, adding
> yet-another-odd-interaction with the driver core is ripe for added
> complexity...

Hi Greg,

Let me give a concrete scenario, I have a dual-port conventional PCI
e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
NIC functions are masked behind the requester ID of a PCIe-to-PCI
bridge.  We cannot have the e1000 driver managing one function and a
user managing the other (via vfio-pci).  In this case, not only is the
DMA not isolated but the functions share the same IOMMU context.
Therefore in order to allow the user access to one function via
vfio-pci, the other function needs to be in a known state, either also
bound to vfio-pci, bound to an innocuous driver like pci-stub, or
unbound from any driver.  Given this state, user now has access to one
function of the device, but how can we fix our driver to manage the
other function?

If the other function is also bound to vfio-pci, the driver core does
not allow us to refuse a driver remove request, the best we can do is
block for a while, but we best not do that too long so we end up in the
device unbound state.

Likewise, if the other function was bound to pci-stub, this driver won't
block remove, so the device for the other port can transition to an
unbound state.

Once in an unbound state, how would fixing either the vfio-pci or the
core vfio driver prevent the scenario which can now happen of the
unbound device being bound to the host e1000 driver?  This can happen
in pure PCIe topologies as well where perhaps the IOMMU context is not
shared, but the devices still lack DMA isolation within the group.

The only tool we currently have to manage this scenario is that the
vfio core driver can pull BUG_ON after the fact of the other device
being bound to a host driver.  Understandably, users aren't so keen on
this, which is why I'm trying to allow vfio to block binding of that
other device before it happens.  None of this really seems to fall
within the capabilities of the existing driver core, so simply fixing
my driver doesn't seem to be a well defined option.  Is there a simple
solution I'm missing?  We're not concerned only with auto-probing, we
need to protect against explicit bind attempts as well.
 
> And, as there's no real patch for me to do anything with (hint, I can't
> apply RFC patches), I don't know what I can do here...

Certainly continuing the discussion is all I'm asking for at this
point.  The RFC didn't tickle your fancy, but the reply also didn't
convey an appreciation of the circumstances.  I hope that perhaps this
gets us a step closer so we can decide which way to go.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-07-11 16:41         ` Alex Williamson
@ 2017-07-13  8:23           ` Greg KH
  2017-07-14 16:03             ` Alex Williamson
  0 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2017-07-13  8:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Russell King - ARM Linux, kvm, eric.auger, linux-kernel

On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> Let me give a concrete scenario, I have a dual-port conventional PCI
> e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> NIC functions are masked behind the requester ID of a PCIe-to-PCI
> bridge.  We cannot have the e1000 driver managing one function and a
> user managing the other (via vfio-pci).

Agreed, but really, if a user asks to do such a thing, they deserve the
pieces the kernel ends up in, right?

> In this case, not only is the
> DMA not isolated but the functions share the same IOMMU context.
> Therefore in order to allow the user access to one function via
> vfio-pci, the other function needs to be in a known state, either also
> bound to vfio-pci, bound to an innocuous driver like pci-stub, or
> unbound from any driver.  Given this state, user now has access to one
> function of the device, but how can we fix our driver to manage the
> other function?

We have USB drivers that do this all the time, due to crazy USB specs
that required it.  The cdc-acm driver is one example, and I think there
are a number of USB sound devices that also have this issue.  Just have
the driver of the "first" device grab the second one as well and "know"
about the resources involved, as you are doing today.

But, then you somehow seem to have the requirement to prevent userspace
from mucking around in your driver bindings, and really, you shouldn't
care about this, because again, if it messes up here, all bets are off.

> If the other function is also bound to vfio-pci, the driver core does
> not allow us to refuse a driver remove request, the best we can do is
> block for a while, but we best not do that too long so we end up in the
> device unbound state.
> 
> Likewise, if the other function was bound to pci-stub, this driver won't
> block remove, so the device for the other port can transition to an
> unbound state.
> 
> Once in an unbound state, how would fixing either the vfio-pci or the
> core vfio driver prevent the scenario which can now happen of the
> unbound device being bound to the host e1000 driver?  This can happen
> in pure PCIe topologies as well where perhaps the IOMMU context is not
> shared, but the devices still lack DMA isolation within the group.
> 
> The only tool we currently have to manage this scenario is that the
> vfio core driver can pull BUG_ON after the fact of the other device
> being bound to a host driver.

Well, how about just locking the device down, don't crash the kernel.
That at least gives userspace a chance to figure out what they did was
wrong.

> Understandably, users aren't so keen on
> this, which is why I'm trying to allow vfio to block binding of that
> other device before it happens.  None of this really seems to fall
> within the capabilities of the existing driver core, so simply fixing
> my driver doesn't seem to be a well defined option.  Is there a simple
> solution I'm missing?  We're not concerned only with auto-probing, we
> need to protect against explicit bind attempts as well.

Again, if userspace does an explicit bind/unbind attempt, it _HAS_ to
know what it is doing, we can't protect ourselves from that, that's
always been the case.  The bind/unbind was done as a way for people to
say "I know more about what is going on than the kernel does right now,
so I'm going to override it." and we have to trust that they do know
that.  Don't spend a lot of time and energy trying to protect yourself
from that please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-07-13  8:23           ` Greg KH
@ 2017-07-14 16:03             ` Alex Williamson
  2017-07-14 20:09               ` Greg KH
  0 siblings, 1 reply; 23+ messages in thread
From: Alex Williamson @ 2017-07-14 16:03 UTC (permalink / raw)
  To: Greg KH; +Cc: Russell King - ARM Linux, kvm, eric.auger, linux-kernel

Hi Greg,

On Thu, 13 Jul 2017 10:23:14 +0200
Greg KH <greg@kroah.com> wrote:

> On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> > Let me give a concrete scenario, I have a dual-port conventional PCI
> > e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> > NIC functions are masked behind the requester ID of a PCIe-to-PCI
> > bridge.  We cannot have the e1000 driver managing one function and a
> > user managing the other (via vfio-pci).  
> 
> Agreed, but really, if a user asks to do such a thing, they deserve the
> pieces the kernel ends up in, right?

I think that's asking a fair bit from users to understand these
nuances; at one point in time they can bind the device to e1000 and it
works, at another point in time the same operation crashes the system.
Perhaps the user is not even using manual binding, maybe the e1000
driver is freshly loaded and probes any devices that are not bound.
Maybe the device is passed through /sys/bus/pci/drivers_probe.

If the user attempts to bind a device to the wrong driver we don't
intentionally run the system into the ground to make that work.  Of
course if the user is overriding a match with dynamic IDs or
driver_override, then I fully agree, the user should be responsible for
the action they've dictated.  I tend to think of this more towards the
former than the latter.

> > In this case, not only is the
> > DMA not isolated but the functions share the same IOMMU context.
> > Therefore in order to allow the user access to one function via
> > vfio-pci, the other function needs to be in a known state, either also
> > bound to vfio-pci, bound to an innocuous driver like pci-stub, or
> > unbound from any driver.  Given this state, user now has access to one
> > function of the device, but how can we fix our driver to manage the
> > other function?  
> 
> We have USB drivers that do this all the time, due to crazy USB specs
> that required it.  The cdc-acm driver is one example, and I think there
> are a number of USB sound devices that also have this issue.  Just have
> the driver of the "first" device grab the second one as well and "know"
> about the resources involved, as you are doing today.
> 
> But, then you somehow seem to have the requirement to prevent userspace
> from mucking around in your driver bindings, and really, you shouldn't
> care about this, because again, if it messes up here, all bets are off.
> 
> > If the other function is also bound to vfio-pci, the driver core does
> > not allow us to refuse a driver remove request, the best we can do is
> > block for a while, but we best not do that too long so we end up in the
> > device unbound state.
> > 
> > Likewise, if the other function was bound to pci-stub, this driver won't
> > block remove, so the device for the other port can transition to an
> > unbound state.
> > 
> > Once in an unbound state, how would fixing either the vfio-pci or the
> > core vfio driver prevent the scenario which can now happen of the
> > unbound device being bound to the host e1000 driver?  This can happen
> > in pure PCIe topologies as well where perhaps the IOMMU context is not
> > shared, but the devices still lack DMA isolation within the group.
> > 
> > The only tool we currently have to manage this scenario is that the
> > vfio core driver can pull BUG_ON after the fact of the other device
> > being bound to a host driver.  
> 
> Well, how about just locking the device down, don't crash the kernel.
> That at least gives userspace a chance to figure out what they did was
> wrong.

Locking down the device is an interesting strategy, I'll look into this
more.  I think we'd be locking down the user/vfio owned device,
locking down the device for non-vfio use is essentially what I've been
trying to do with blocking driver binding.  With PCI devices we have the
bus-master bit that we could clear and prevent the user from changing to
theoretically stop all DMA for the device.  We'd also need to destroy
the IOMMU domain or else the IOMMU driver is going to break because the
wrong domain type is setup for the host owned device.  However, if we
manage to do all of this perfectly, the result would be that the user
owned device quietly stops working, perhaps only a dmesg log entry to
identify what happened.  I can imagine many bugs filed and much time
wasted looking for that magic breadcrumb.  Perhaps the better approach
is to try to request the conflicting device(s) back from the user, and
failing that kill the user process, ultimately falling back to the
existing BUG_ON if we haven't resolved the compromising scenario.  A
harsher approach, but there's certainly the argument that we're trying
to follow the user command of binding to the native host driver.
 
> > Understandably, users aren't so keen on
> > this, which is why I'm trying to allow vfio to block binding of that
> > other device before it happens.  None of this really seems to fall
> > within the capabilities of the existing driver core, so simply fixing
> > my driver doesn't seem to be a well defined option.  Is there a simple
> > solution I'm missing?  We're not concerned only with auto-probing, we
> > need to protect against explicit bind attempts as well.  
> 
> Again, if userspace does an explicit bind/unbind attempt, it _HAS_ to
> know what it is doing, we can't protect ourselves from that, that's
> always been the case.  The bind/unbind was done as a way for people to
> say "I know more about what is going on than the kernel does right now,
> so I'm going to override it." and we have to trust that they do know
> that.  Don't spend a lot of time and energy trying to protect yourself
> from that please.

Thanks for the advice, I appreciate your time on this.  I'd still like
to improve the current behavior, but I'll see what I can do on the side
of imposing user consequences before taking the head shot.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers
  2017-07-14 16:03             ` Alex Williamson
@ 2017-07-14 20:09               ` Greg KH
  0 siblings, 0 replies; 23+ messages in thread
From: Greg KH @ 2017-07-14 20:09 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Russell King - ARM Linux, kvm, eric.auger, linux-kernel

On Fri, Jul 14, 2017 at 10:03:27AM -0600, Alex Williamson wrote:
> Hi Greg,
> 
> On Thu, 13 Jul 2017 10:23:14 +0200
> Greg KH <greg@kroah.com> wrote:
> 
> > On Tue, Jul 11, 2017 at 10:41:16AM -0600, Alex Williamson wrote:
> > > Let me give a concrete scenario, I have a dual-port conventional PCI
> > > e1000 NIC.  The IOMMU operates on PCIe requester IDs and therefore both
> > > NIC functions are masked behind the requester ID of a PCIe-to-PCI
> > > bridge.  We cannot have the e1000 driver managing one function and a
> > > user managing the other (via vfio-pci).  
> > 
> > Agreed, but really, if a user asks to do such a thing, they deserve the
> > pieces the kernel ends up in, right?
> 
> I think that's asking a fair bit from users to understand these
> nuances;

There is no "nuance" here.

> at one point in time they can bind the device to e1000 and it
> works,

Yeah, they got lucky!

> at another point in time the same operation crashes the system.

And now they didn't!  What did they do differently?  Oh look, one other
device needed to be unbound/bound/whatever to get this to work properly,
let's do that instead next time.

> Perhaps the user is not even using manual binding, maybe the e1000
> driver is freshly loaded and probes any devices that are not bound.

No, that's not the issue here at all, that should always work as the
kernel driver is telling us that it can support this device just fine.
Nothing "grey" or "nuanced" here at all.

> Maybe the device is passed through /sys/bus/pci/drivers_probe.

Then the user gets to keep the pieces of the kernel that might get spit
out at them.

Again, doing manual binding is a risk that if a user takes, it might or
might not work.  It's always been that way.

> If the user attempts to bind a device to the wrong driver we don't
> intentionally run the system into the ground to make that work.

Are you kidding?  That happens all the time, try to do it yourself and
bind a device to a driver that doesn't expect to be handling it.  If you
are lucky your kernel will crash, if unlucky, it will limp along and do
odd things to the device.  It's been this way since these sysfs files
were added over a decade ago.

> Of course if the user is overriding a match with dynamic IDs or
> driver_override, then I fully agree, the user should be responsible
> for the action they've dictated.  I tend to think of this more towards
> the former than the latter.

The user is always responsible if they are using sysfs to bind/unbind
devices from drivers.  It's not complex or nuanced at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-07-14 20:09 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-20 15:47 [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Alex Williamson
2017-06-20 15:47 ` [PATCH v3 1/9] vfio: Fix group release deadlock Alex Williamson
2017-06-20 15:47 ` [PATCH v3 2/9] kvm-vfio: Decouple only when we match a group Alex Williamson
2017-06-26  7:30   ` Auger Eric
2017-06-28 17:37   ` Paolo Bonzini
2017-06-20 15:47 ` [PATCH v3 3/9] vfio: New external user group/file match Alex Williamson
2017-06-20 15:48 ` [PATCH v3 4/9] iommu: Add driver-not-bound notification Alex Williamson
2017-06-20 15:48 ` [PATCH v3 5/9] vfio: Create interface for vfio bus drivers to register Alex Williamson
2017-06-20 15:48 ` [PATCH v3 6/9] vfio: Register pci, platform, amba, and mdev bus drivers Alex Williamson
2017-06-20 15:48 ` [PATCH v3 7/9] vfio: Use driver_override to avert binding to compromising drivers Alex Williamson
2017-06-26  9:08   ` Russell King - ARM Linux
2017-06-26 19:39     ` Alex Williamson
2017-07-10 21:34     ` Alex Williamson
2017-07-11  9:46       ` Greg KH
2017-07-11 16:41         ` Alex Williamson
2017-07-13  8:23           ` Greg KH
2017-07-14 16:03             ` Alex Williamson
2017-07-14 20:09               ` Greg KH
2017-06-20 15:48 ` [PATCH v3 8/9] amba: Export amba_bustype Alex Williamson
2017-06-26  7:30   ` Auger Eric
2017-06-20 15:48 ` [PATCH v3 9/9] vfio: Add AMBA driver_override support Alex Williamson
2017-06-26  7:30   ` Auger Eric
2017-06-26  7:31 ` [PATCH v3 0/9] vfio: Fix release ordering races and use driver_override Auger Eric

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.