All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-16  8:14 ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel


This patchset is based on NVidia's "Add Mediated device support" series, version 6:

	http://www.spinics.net/lists/kvm/msg136472.html


Background:

	The patchset from NVidia introduced the Mediated Device support to
	Linux/VFIO. With that series, one can create virtual devices (supporting
	by underlying physical device and vendor driver), and assign them to
	userspace like QEMU/KVM, in the same way as device assignment via VFIO.

	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
	implemented its CCW pass-through.  However, there are limitations
	imposed by current (v6 in particular) mdev framework: the mdev must be
	represented as a PCI device, several vfio capabilities such as
	sparse mmap are not possible, and so forth.

	This series aims to address above limitations and simplify the implementation.


Key Changes:

	- An independent "struct device" was introduced to parent_device, thus
	  a hierarchy in driver core is formed with physical device, parent device
	  and mdev device;

	- Leveraging the mechanism and APIs provided by Linux driver core, it
	  is now safe to remove all refcnts and locks;

	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
	  PCI-specific logic was removed, accesses from userspace are now
	  passed to vendor driver directly, thus guaranteed that full VFIO
	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;

	  With vfio_mdev being BUS-agnostic, it is enough to have only one
	  driver for all mdev devices;

	- UUID was removed from the interface between mdev and vendor driver;


TODO

	remove mdev stuff from vfio.h
	update doc



Jike Song (3):
  mdev: remove uuid from parent_ops
  vfio_mpci: remove PCI-specific logic
  rename 'vfio_mpci' to 'vfio_mdev' since it is now BUS-agnostic

Xiao Guangrong (2):
  mdev: create separate device for parent_device
  mdev: drop management information and utilize driver-core instead

 drivers/vfio/mdev/Kconfig           |  10 +-
 drivers/vfio/mdev/Makefile          |   5 +-
 drivers/vfio/mdev/mdev_core.c       | 483 +++++---------------------------
 drivers/vfio/mdev/mdev_private.h    |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c      |  82 ++----
 drivers/vfio/mdev/vfio_mdev.c       | 168 +++++++++++
 drivers/vfio/mdev/vfio_mpci.c       | 536 ------------------------------------
 drivers/vfio/pci/vfio_pci_private.h |   6 +
 include/linux/mdev.h                |  87 ++----
 include/linux/vfio.h                |   7 -
 10 files changed, 302 insertions(+), 1091 deletions(-)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c
 delete mode 100644 drivers/vfio/mdev/vfio_mpci.c

-- 
1.9.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-16  8:14 ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel


This patchset is based on NVidia's "Add Mediated device support" series, version 6:

	http://www.spinics.net/lists/kvm/msg136472.html


Background:

	The patchset from NVidia introduced the Mediated Device support to
	Linux/VFIO. With that series, one can create virtual devices (supporting
	by underlying physical device and vendor driver), and assign them to
	userspace like QEMU/KVM, in the same way as device assignment via VFIO.

	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
	implemented its CCW pass-through.  However, there are limitations
	imposed by current (v6 in particular) mdev framework: the mdev must be
	represented as a PCI device, several vfio capabilities such as
	sparse mmap are not possible, and so forth.

	This series aims to address above limitations and simplify the implementation.


Key Changes:

	- An independent "struct device" was introduced to parent_device, thus
	  a hierarchy in driver core is formed with physical device, parent device
	  and mdev device;

	- Leveraging the mechanism and APIs provided by Linux driver core, it
	  is now safe to remove all refcnts and locks;

	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
	  PCI-specific logic was removed, accesses from userspace are now
	  passed to vendor driver directly, thus guaranteed that full VFIO
	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;

	  With vfio_mdev being BUS-agnostic, it is enough to have only one
	  driver for all mdev devices;

	- UUID was removed from the interface between mdev and vendor driver;


TODO

	remove mdev stuff from vfio.h
	update doc



Jike Song (3):
  mdev: remove uuid from parent_ops
  vfio_mpci: remove PCI-specific logic
  rename 'vfio_mpci' to 'vfio_mdev' since it is now BUS-agnostic

Xiao Guangrong (2):
  mdev: create separate device for parent_device
  mdev: drop management information and utilize driver-core instead

 drivers/vfio/mdev/Kconfig           |  10 +-
 drivers/vfio/mdev/Makefile          |   5 +-
 drivers/vfio/mdev/mdev_core.c       | 483 +++++---------------------------
 drivers/vfio/mdev/mdev_private.h    |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c      |  82 ++----
 drivers/vfio/mdev/vfio_mdev.c       | 168 +++++++++++
 drivers/vfio/mdev/vfio_mpci.c       | 536 ------------------------------------
 drivers/vfio/pci/vfio_pci_private.h |   6 +
 include/linux/mdev.h                |  87 ++----
 include/linux/vfio.h                |   7 -
 10 files changed, 302 insertions(+), 1091 deletions(-)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c
 delete mode 100644 drivers/vfio/mdev/vfio_mpci.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC v6-based v1 1/5] mdev: create separate device for parent_device
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-16  8:14   ` Jike Song
  -1 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

By introducing a separate device for parent_device, we can have
the parent list and lock removed, letting driver core and sysfs
to deal with the mutual exclusion.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c    | 242 +++++----------------------------------
 drivers/vfio/mdev/mdev_private.h |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  69 +++--------
 drivers/vfio/mdev/vfio_mpci.c    |   2 +-
 include/linux/mdev.h             |   6 +-
 5 files changed, 57 insertions(+), 271 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 90ff073..9138588 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -27,11 +27,6 @@
 #define DRIVER_AUTHOR		"NVIDIA Corporation"
 #define DRIVER_DESC		"Mediated device Core Driver"
 
-#define MDEV_CLASS_NAME		"mdev"
-
-static LIST_HEAD(parent_list);
-static DEFINE_MUTEX(parent_list_lock);
-
 static int mdev_add_attribute_group(struct device *dev,
 				    const struct attribute_group **groups)
 {
@@ -58,55 +53,6 @@ static struct mdev_device *find_mdev_device(struct parent_device *parent,
 	return NULL;
 }
 
-/* Should be called holding parent_list_lock */
-static struct parent_device *find_parent_device(struct device *dev)
-{
-	struct parent_device *parent;
-
-	list_for_each_entry(parent, &parent_list, next) {
-		if (parent->dev == dev)
-			return parent;
-	}
-	return NULL;
-}
-
-static void mdev_release_parent(struct kref *kref)
-{
-	struct parent_device *parent = container_of(kref, struct parent_device,
-						    ref);
-	kfree(parent);
-}
-
-static
-inline struct parent_device *mdev_get_parent(struct parent_device *parent)
-{
-	if (parent)
-		kref_get(&parent->ref);
-
-	return parent;
-}
-
-static inline void mdev_put_parent(struct parent_device *parent)
-{
-	if (parent)
-		kref_put(&parent->ref, mdev_release_parent);
-}
-
-static struct parent_device *mdev_get_parent_by_dev(struct device *dev)
-{
-	struct parent_device *parent = NULL, *p;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(p, &parent_list, next) {
-		if (p->dev == dev) {
-			parent = mdev_get_parent(p);
-			break;
-		}
-	}
-	mutex_unlock(&parent_list_lock);
-	return parent;
-}
-
 static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 {
 	struct parent_device *parent = mdev->parent;
@@ -153,7 +99,6 @@ static void mdev_release_device(struct kref *kref)
 
 	device_unregister(&mdev->dev);
 	wake_up(&parent->release_done);
-	mdev_put_parent(parent);
 }
 
 struct mdev_device *mdev_get_device(struct mdev_device *mdev)
@@ -173,66 +118,6 @@ void mdev_put_device(struct mdev_device *mdev)
 EXPORT_SYMBOL(mdev_put_device);
 
 /*
- * Find first mediated device from given uuid and increment refcount of
- * mediated device. Caller should call mdev_put_device() when the use of
- * mdev_device is done.
- */
-static struct mdev_device *mdev_get_first_device_by_uuid(uuid_le uuid)
-{
-	struct mdev_device *mdev = NULL, *p;
-	struct parent_device *parent;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(parent, &parent_list, next) {
-		mutex_lock(&parent->mdev_list_lock);
-		list_for_each_entry(p, &parent->mdev_list, next) {
-			if (uuid_le_cmp(p->uuid, uuid) == 0) {
-				mdev = mdev_get_device(p);
-				break;
-			}
-		}
-		mutex_unlock(&parent->mdev_list_lock);
-
-		if (mdev)
-			break;
-	}
-	mutex_unlock(&parent_list_lock);
-	return mdev;
-}
-
-/*
- * Find mediated device from given iommu_group and increment refcount of
- * mediated device. Caller should call mdev_put_device() when the use of
- * mdev_device is done.
- */
-struct mdev_device *mdev_get_device_by_group(struct iommu_group *group)
-{
-	struct mdev_device *mdev = NULL, *p;
-	struct parent_device *parent;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(parent, &parent_list, next) {
-		mutex_lock(&parent->mdev_list_lock);
-		list_for_each_entry(p, &parent->mdev_list, next) {
-			if (!p->group)
-				continue;
-
-			if (iommu_group_id(p->group) == iommu_group_id(group)) {
-				mdev = mdev_get_device(p);
-				break;
-			}
-		}
-		mutex_unlock(&parent->mdev_list_lock);
-
-		if (mdev)
-			break;
-	}
-	mutex_unlock(&parent_list_lock);
-	return mdev;
-}
-EXPORT_SYMBOL(mdev_get_device_by_group);
-
-/*
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
  * @ops: Parent device operation structure to be registered.
@@ -252,30 +137,21 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 	if (!ops->create || !ops->destroy)
 		return -EINVAL;
 
-	mutex_lock(&parent_list_lock);
-
-	/* Check for duplicate */
-	parent = find_parent_device(dev);
-	if (parent) {
-		ret = -EEXIST;
-		goto add_dev_err;
-	}
-
 	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
 	if (!parent) {
 		ret = -ENOMEM;
 		goto add_dev_err;
 	}
 
-	kref_init(&parent->ref);
-	list_add(&parent->next, &parent_list);
-
-	parent->dev = dev;
+	parent->dev.parent = dev;
 	parent->ops = ops;
 	mutex_init(&parent->mdev_list_lock);
 	INIT_LIST_HEAD(&parent->mdev_list);
 	init_waitqueue_head(&parent->release_done);
-	mutex_unlock(&parent_list_lock);
+
+	ret = device_register(&parent->dev);
+	if (ret)
+		goto register_error;
 
 	ret = mdev_create_sysfs_files(dev);
 	if (ret)
@@ -291,50 +167,37 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 add_group_error:
 	mdev_remove_sysfs_files(dev);
 add_sysfs_error:
-	mutex_lock(&parent_list_lock);
-	list_del(&parent->next);
-	mutex_unlock(&parent_list_lock);
-	mdev_put_parent(parent);
+	device_unregister(&parent->dev);
+register_error:
 	return ret;
 
 add_dev_err:
-	mutex_unlock(&parent_list_lock);
 	return ret;
 }
 EXPORT_SYMBOL(mdev_register_device);
 
 /*
  * mdev_unregister_device : Unregister a parent device
- * @dev: device structure representing parent device.
+ * @dev: device structure representing parent device which is returned by
+ *       mdev_register_device.
  *
  * Remove device from list of registered parent devices. Give a chance to free
  * existing mediated devices for given device.
  */
-
-void mdev_unregister_device(struct device *dev)
+void mdev_unregister_device(struct parent_device *parent)
 {
-	struct parent_device *parent;
 	struct mdev_device *mdev, *n;
 	int ret;
 
-	mutex_lock(&parent_list_lock);
-	parent = find_parent_device(dev);
-
-	if (!parent) {
-		mutex_unlock(&parent_list_lock);
-		return;
-	}
-	dev_info(dev, "MDEV: Unregistering\n");
+	dev_info(&parent->dev, "MDEV: Unregistering\n");
 
 	/*
 	 * Remove parent from the list and remove create and destroy sysfs
 	 * files so that no new mediated device could be created for this parent
 	 */
-	list_del(&parent->next);
-	mdev_remove_sysfs_files(dev);
-	mutex_unlock(&parent_list_lock);
+	mdev_remove_sysfs_files(&parent->dev);
 
-	mdev_remove_attribute_group(dev,
+	mdev_remove_attribute_group(&parent->dev,
 				    parent->ops->dev_attr_groups);
 
 	mutex_lock(&parent->mdev_list_lock);
@@ -350,14 +213,12 @@ void mdev_unregister_device(struct device *dev)
 		ret = wait_event_interruptible_timeout(parent->release_done,
 				list_empty(&parent->mdev_list), HZ * 10);
 		if (ret == -ERESTARTSYS) {
-			dev_warn(dev, "Mediated devices are in use, task"
+			dev_warn(&parent->dev, "Mediated devices are in use, task"
 				      " \"%s\" (%d) "
 				      "blocked until all are released",
 				      current->comm, task_pid_nr(current));
 		}
 	} while (ret <= 0);
-
-	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
@@ -377,11 +238,7 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 {
 	int ret;
 	struct mdev_device *mdev;
-	struct parent_device *parent;
-
-	parent = mdev_get_parent_by_dev(dev);
-	if (!parent)
-		return -EINVAL;
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	mutex_lock(&parent->mdev_list_lock);
 	/* Check for duplicate */
@@ -403,6 +260,7 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
+	mdev->dev.class = &mdev_class;
 	mdev->dev.bus     = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl-%d", uuid.b, instance);
@@ -429,19 +287,14 @@ create_failed:
 
 create_err:
 	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_parent(parent);
 	return ret;
 }
 
 int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 {
 	struct mdev_device *mdev;
-	struct parent_device *parent;
 	int ret;
-
-	parent = mdev_get_parent_by_dev(dev);
-	if (!parent)
-		return -EINVAL;
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	mutex_lock(&parent->mdev_list_lock);
 	mdev = find_mdev_device(parent, uuid, instance);
@@ -457,12 +310,10 @@ int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 	mutex_unlock(&parent->mdev_list_lock);
 	mdev_put_device(mdev);
 
-	mdev_put_parent(parent);
 	return ret;
 
 destroy_err:
 	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_parent(parent);
 	return ret;
 }
 
@@ -575,72 +426,37 @@ EXPORT_SYMBOL(mdev_del_phys_mapping);
 
 void mdev_device_supported_config(struct device *dev, char *str)
 {
-	struct parent_device *parent;
-
-	parent = mdev_get_parent_by_dev(dev);
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	if (parent) {
 		if (parent->ops->supported_config)
-			parent->ops->supported_config(parent->dev, str);
-		mdev_put_parent(parent);
+			parent->ops->supported_config(&parent->dev, str);
 	}
 }
 
-int mdev_device_start(uuid_le uuid)
+int mdev_device_start(struct device *dev, bool start)
 {
 	int ret = 0;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
-	mdev = mdev_get_first_device_by_uuid(uuid);
-	if (!mdev)
-		return -EINVAL;
-
-	parent = mdev->parent;
-
-	if (parent->ops->start)
+	mdev_get_device(mdev);
+	if (start && parent->ops->start)
 		ret = parent->ops->start(mdev->uuid);
-
-	if (ret)
-		pr_err("mdev_start failed  %d\n", ret);
-	else
-		kobject_uevent(&mdev->dev.kobj, KOBJ_ONLINE);
-
-	mdev_put_device(mdev);
-
-	return ret;
-}
-
-int mdev_device_stop(uuid_le uuid)
-{
-	int ret = 0;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
-
-	mdev = mdev_get_first_device_by_uuid(uuid);
-	if (!mdev)
-		return -EINVAL;
-
-	parent = mdev->parent;
-
-	if (parent->ops->stop)
+	else if (!start && parent->ops->stop)
 		ret = parent->ops->stop(mdev->uuid);
 
 	if (ret)
-		pr_err("mdev stop failed %d\n", ret);
+		pr_err("mdev %s failed  %d\n", start ? "start" : "stop", ret);
 	else
-		kobject_uevent(&mdev->dev.kobj, KOBJ_OFFLINE);
+		kobject_uevent(&mdev->dev.kobj,
+			       start ? KOBJ_ONLINE : KOBJ_OFFLINE);
 
 	mdev_put_device(mdev);
+
 	return ret;
 }
 
-static struct class mdev_class = {
-	.name		= MDEV_CLASS_NAME,
-	.owner		= THIS_MODULE,
-	.class_attrs	= mdev_class_attrs,
-};
-
 static int __init mdev_init(void)
 {
 	int ret;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index ee2db61..3fce40f 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -13,12 +13,16 @@
 #ifndef MDEV_PRIVATE_H
 #define MDEV_PRIVATE_H
 
+#define dev_to_parent_dev(_dev) container_of((_dev),	\
+					     struct parent_device, dev)
+#define dev_to_mdev(_dev) container_of((_dev), struct mdev_device, dev)
+
 int  mdev_bus_register(void);
 void mdev_bus_unregister(void);
 
 /* Function prototypes for mdev_sysfs */
 
-extern struct class_attribute mdev_class_attrs[];
+extern struct class mdev_class;
 
 int  mdev_create_sysfs_files(struct device *dev);
 void mdev_remove_sysfs_files(struct device *dev);
@@ -27,7 +31,6 @@ int  mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 			char *mdev_params);
 int  mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance);
 void mdev_device_supported_config(struct device *dev, char *str);
-int  mdev_device_start(uuid_le uuid);
-int  mdev_device_stop(uuid_le uuid);
+int  mdev_device_start(struct device *dev, bool start);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index e0457e6..3080edc 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -171,65 +171,34 @@ destroy_error:
 	return ret;
 }
 
-ssize_t mdev_start_store(struct class *class, struct class_attribute *attr,
-			 const char *buf, size_t count)
+static ssize_t start_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
 {
-	char *uuid_str, *ptr;
-	uuid_le uuid;
-	int ret;
-
-	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
-
-	if (!uuid_str)
-		return -ENOMEM;
+	int start, ret;
 
-	ret = uuid_le_to_bin(uuid_str, &uuid);
-	if (ret) {
-		pr_err("mdev_start: UUID parse error  %s\n", buf);
-		goto start_error;
-	}
+	ret = kstrtoint(buf, 10, &start);
+	if (ret)
+		goto exit;
 
-	ret = mdev_device_start(uuid);
+	ret = mdev_device_start(dev, !!start);
 	if (ret == 0)
-		ret = count;
-
-start_error:
-	kfree(ptr);
+		ret = len;
+exit:
 	return ret;
 }
+static DEVICE_ATTR_WO(start);
 
-ssize_t mdev_stop_store(struct class *class, struct class_attribute *attr,
-			    const char *buf, size_t count)
-{
-	char *uuid_str, *ptr;
-	uuid_le uuid;
-	int ret;
-
-	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
-
-	if (!uuid_str)
-		return -ENOMEM;
-
-	ret = uuid_le_to_bin(uuid_str, &uuid);
-	if (ret) {
-		pr_err("mdev_stop: UUID parse error %s\n", buf);
-		goto stop_error;
-	}
-
-	ret = mdev_device_stop(uuid);
-	if (ret == 0)
-		ret = count;
-
-stop_error:
-	kfree(ptr);
-	return ret;
+static struct attribute *mdev_device_attrs[] = {
+	&dev_attr_start.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(mdev_device);
 
-}
+#define MDEV_CLASS_NAME		"mdev"
 
-struct class_attribute mdev_class_attrs[] = {
-	__ATTR_WO(mdev_start),
-	__ATTR_WO(mdev_stop),
-	__ATTR_NULL
+struct class mdev_class = {
+	.name		= MDEV_CLASS_NAME,
+	.dev_groups	= mdev_device_groups,
 };
 
 int mdev_create_sysfs_files(struct device *dev)
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 9da94b7..88c0ba6 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -420,7 +420,7 @@ static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		virtaddr = vma->vm_start;
 		req_size = vma->vm_end - vma->vm_start;
 
-		pdev = to_pci_dev(parent->dev);
+		pdev = to_pci_dev(parent->dev.parent);
 		index = VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
 		pgoff = pci_resource_start(pdev, index) >> PAGE_SHIFT;
 	}
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0b41f30..42da41b 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -159,12 +159,10 @@ struct parent_ops {
  */
 
 struct parent_device {
-	struct device		*dev;
+	struct device dev;
 	const struct parent_ops	*ops;
 
 	/* internal */
-	struct kref		ref;
-	struct list_head	next;
 	struct list_head	mdev_list;
 	struct mutex		mdev_list_lock;
 	wait_queue_head_t	release_done;
@@ -214,7 +212,7 @@ extern struct bus_type mdev_bus_type;
 
 extern int  mdev_register_device(struct device *dev,
 				 const struct parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+extern void mdev_unregister_device(struct parent_device *parent);
 
 extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
 extern void mdev_unregister_driver(struct mdev_driver *drv);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 1/5] mdev: create separate device for parent_device
@ 2016-08-16  8:14   ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

By introducing a separate device for parent_device, we can have
the parent list and lock removed, letting driver core and sysfs
to deal with the mutual exclusion.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c    | 242 +++++----------------------------------
 drivers/vfio/mdev/mdev_private.h |   9 +-
 drivers/vfio/mdev/mdev_sysfs.c   |  69 +++--------
 drivers/vfio/mdev/vfio_mpci.c    |   2 +-
 include/linux/mdev.h             |   6 +-
 5 files changed, 57 insertions(+), 271 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 90ff073..9138588 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -27,11 +27,6 @@
 #define DRIVER_AUTHOR		"NVIDIA Corporation"
 #define DRIVER_DESC		"Mediated device Core Driver"
 
-#define MDEV_CLASS_NAME		"mdev"
-
-static LIST_HEAD(parent_list);
-static DEFINE_MUTEX(parent_list_lock);
-
 static int mdev_add_attribute_group(struct device *dev,
 				    const struct attribute_group **groups)
 {
@@ -58,55 +53,6 @@ static struct mdev_device *find_mdev_device(struct parent_device *parent,
 	return NULL;
 }
 
-/* Should be called holding parent_list_lock */
-static struct parent_device *find_parent_device(struct device *dev)
-{
-	struct parent_device *parent;
-
-	list_for_each_entry(parent, &parent_list, next) {
-		if (parent->dev == dev)
-			return parent;
-	}
-	return NULL;
-}
-
-static void mdev_release_parent(struct kref *kref)
-{
-	struct parent_device *parent = container_of(kref, struct parent_device,
-						    ref);
-	kfree(parent);
-}
-
-static
-inline struct parent_device *mdev_get_parent(struct parent_device *parent)
-{
-	if (parent)
-		kref_get(&parent->ref);
-
-	return parent;
-}
-
-static inline void mdev_put_parent(struct parent_device *parent)
-{
-	if (parent)
-		kref_put(&parent->ref, mdev_release_parent);
-}
-
-static struct parent_device *mdev_get_parent_by_dev(struct device *dev)
-{
-	struct parent_device *parent = NULL, *p;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(p, &parent_list, next) {
-		if (p->dev == dev) {
-			parent = mdev_get_parent(p);
-			break;
-		}
-	}
-	mutex_unlock(&parent_list_lock);
-	return parent;
-}
-
 static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 {
 	struct parent_device *parent = mdev->parent;
@@ -153,7 +99,6 @@ static void mdev_release_device(struct kref *kref)
 
 	device_unregister(&mdev->dev);
 	wake_up(&parent->release_done);
-	mdev_put_parent(parent);
 }
 
 struct mdev_device *mdev_get_device(struct mdev_device *mdev)
@@ -173,66 +118,6 @@ void mdev_put_device(struct mdev_device *mdev)
 EXPORT_SYMBOL(mdev_put_device);
 
 /*
- * Find first mediated device from given uuid and increment refcount of
- * mediated device. Caller should call mdev_put_device() when the use of
- * mdev_device is done.
- */
-static struct mdev_device *mdev_get_first_device_by_uuid(uuid_le uuid)
-{
-	struct mdev_device *mdev = NULL, *p;
-	struct parent_device *parent;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(parent, &parent_list, next) {
-		mutex_lock(&parent->mdev_list_lock);
-		list_for_each_entry(p, &parent->mdev_list, next) {
-			if (uuid_le_cmp(p->uuid, uuid) == 0) {
-				mdev = mdev_get_device(p);
-				break;
-			}
-		}
-		mutex_unlock(&parent->mdev_list_lock);
-
-		if (mdev)
-			break;
-	}
-	mutex_unlock(&parent_list_lock);
-	return mdev;
-}
-
-/*
- * Find mediated device from given iommu_group and increment refcount of
- * mediated device. Caller should call mdev_put_device() when the use of
- * mdev_device is done.
- */
-struct mdev_device *mdev_get_device_by_group(struct iommu_group *group)
-{
-	struct mdev_device *mdev = NULL, *p;
-	struct parent_device *parent;
-
-	mutex_lock(&parent_list_lock);
-	list_for_each_entry(parent, &parent_list, next) {
-		mutex_lock(&parent->mdev_list_lock);
-		list_for_each_entry(p, &parent->mdev_list, next) {
-			if (!p->group)
-				continue;
-
-			if (iommu_group_id(p->group) == iommu_group_id(group)) {
-				mdev = mdev_get_device(p);
-				break;
-			}
-		}
-		mutex_unlock(&parent->mdev_list_lock);
-
-		if (mdev)
-			break;
-	}
-	mutex_unlock(&parent_list_lock);
-	return mdev;
-}
-EXPORT_SYMBOL(mdev_get_device_by_group);
-
-/*
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
  * @ops: Parent device operation structure to be registered.
@@ -252,30 +137,21 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 	if (!ops->create || !ops->destroy)
 		return -EINVAL;
 
-	mutex_lock(&parent_list_lock);
-
-	/* Check for duplicate */
-	parent = find_parent_device(dev);
-	if (parent) {
-		ret = -EEXIST;
-		goto add_dev_err;
-	}
-
 	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
 	if (!parent) {
 		ret = -ENOMEM;
 		goto add_dev_err;
 	}
 
-	kref_init(&parent->ref);
-	list_add(&parent->next, &parent_list);
-
-	parent->dev = dev;
+	parent->dev.parent = dev;
 	parent->ops = ops;
 	mutex_init(&parent->mdev_list_lock);
 	INIT_LIST_HEAD(&parent->mdev_list);
 	init_waitqueue_head(&parent->release_done);
-	mutex_unlock(&parent_list_lock);
+
+	ret = device_register(&parent->dev);
+	if (ret)
+		goto register_error;
 
 	ret = mdev_create_sysfs_files(dev);
 	if (ret)
@@ -291,50 +167,37 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 add_group_error:
 	mdev_remove_sysfs_files(dev);
 add_sysfs_error:
-	mutex_lock(&parent_list_lock);
-	list_del(&parent->next);
-	mutex_unlock(&parent_list_lock);
-	mdev_put_parent(parent);
+	device_unregister(&parent->dev);
+register_error:
 	return ret;
 
 add_dev_err:
-	mutex_unlock(&parent_list_lock);
 	return ret;
 }
 EXPORT_SYMBOL(mdev_register_device);
 
 /*
  * mdev_unregister_device : Unregister a parent device
- * @dev: device structure representing parent device.
+ * @dev: device structure representing parent device which is returned by
+ *       mdev_register_device.
  *
  * Remove device from list of registered parent devices. Give a chance to free
  * existing mediated devices for given device.
  */
-
-void mdev_unregister_device(struct device *dev)
+void mdev_unregister_device(struct parent_device *parent)
 {
-	struct parent_device *parent;
 	struct mdev_device *mdev, *n;
 	int ret;
 
-	mutex_lock(&parent_list_lock);
-	parent = find_parent_device(dev);
-
-	if (!parent) {
-		mutex_unlock(&parent_list_lock);
-		return;
-	}
-	dev_info(dev, "MDEV: Unregistering\n");
+	dev_info(&parent->dev, "MDEV: Unregistering\n");
 
 	/*
 	 * Remove parent from the list and remove create and destroy sysfs
 	 * files so that no new mediated device could be created for this parent
 	 */
-	list_del(&parent->next);
-	mdev_remove_sysfs_files(dev);
-	mutex_unlock(&parent_list_lock);
+	mdev_remove_sysfs_files(&parent->dev);
 
-	mdev_remove_attribute_group(dev,
+	mdev_remove_attribute_group(&parent->dev,
 				    parent->ops->dev_attr_groups);
 
 	mutex_lock(&parent->mdev_list_lock);
@@ -350,14 +213,12 @@ void mdev_unregister_device(struct device *dev)
 		ret = wait_event_interruptible_timeout(parent->release_done,
 				list_empty(&parent->mdev_list), HZ * 10);
 		if (ret == -ERESTARTSYS) {
-			dev_warn(dev, "Mediated devices are in use, task"
+			dev_warn(&parent->dev, "Mediated devices are in use, task"
 				      " \"%s\" (%d) "
 				      "blocked until all are released",
 				      current->comm, task_pid_nr(current));
 		}
 	} while (ret <= 0);
-
-	mdev_put_parent(parent);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
@@ -377,11 +238,7 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 {
 	int ret;
 	struct mdev_device *mdev;
-	struct parent_device *parent;
-
-	parent = mdev_get_parent_by_dev(dev);
-	if (!parent)
-		return -EINVAL;
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	mutex_lock(&parent->mdev_list_lock);
 	/* Check for duplicate */
@@ -403,6 +260,7 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
+	mdev->dev.class = &mdev_class;
 	mdev->dev.bus     = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl-%d", uuid.b, instance);
@@ -429,19 +287,14 @@ create_failed:
 
 create_err:
 	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_parent(parent);
 	return ret;
 }
 
 int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 {
 	struct mdev_device *mdev;
-	struct parent_device *parent;
 	int ret;
-
-	parent = mdev_get_parent_by_dev(dev);
-	if (!parent)
-		return -EINVAL;
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	mutex_lock(&parent->mdev_list_lock);
 	mdev = find_mdev_device(parent, uuid, instance);
@@ -457,12 +310,10 @@ int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 	mutex_unlock(&parent->mdev_list_lock);
 	mdev_put_device(mdev);
 
-	mdev_put_parent(parent);
 	return ret;
 
 destroy_err:
 	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_parent(parent);
 	return ret;
 }
 
@@ -575,72 +426,37 @@ EXPORT_SYMBOL(mdev_del_phys_mapping);
 
 void mdev_device_supported_config(struct device *dev, char *str)
 {
-	struct parent_device *parent;
-
-	parent = mdev_get_parent_by_dev(dev);
+	struct parent_device *parent = dev_to_parent_dev(dev);
 
 	if (parent) {
 		if (parent->ops->supported_config)
-			parent->ops->supported_config(parent->dev, str);
-		mdev_put_parent(parent);
+			parent->ops->supported_config(&parent->dev, str);
 	}
 }
 
-int mdev_device_start(uuid_le uuid)
+int mdev_device_start(struct device *dev, bool start)
 {
 	int ret = 0;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
-	mdev = mdev_get_first_device_by_uuid(uuid);
-	if (!mdev)
-		return -EINVAL;
-
-	parent = mdev->parent;
-
-	if (parent->ops->start)
+	mdev_get_device(mdev);
+	if (start && parent->ops->start)
 		ret = parent->ops->start(mdev->uuid);
-
-	if (ret)
-		pr_err("mdev_start failed  %d\n", ret);
-	else
-		kobject_uevent(&mdev->dev.kobj, KOBJ_ONLINE);
-
-	mdev_put_device(mdev);
-
-	return ret;
-}
-
-int mdev_device_stop(uuid_le uuid)
-{
-	int ret = 0;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
-
-	mdev = mdev_get_first_device_by_uuid(uuid);
-	if (!mdev)
-		return -EINVAL;
-
-	parent = mdev->parent;
-
-	if (parent->ops->stop)
+	else if (!start && parent->ops->stop)
 		ret = parent->ops->stop(mdev->uuid);
 
 	if (ret)
-		pr_err("mdev stop failed %d\n", ret);
+		pr_err("mdev %s failed  %d\n", start ? "start" : "stop", ret);
 	else
-		kobject_uevent(&mdev->dev.kobj, KOBJ_OFFLINE);
+		kobject_uevent(&mdev->dev.kobj,
+			       start ? KOBJ_ONLINE : KOBJ_OFFLINE);
 
 	mdev_put_device(mdev);
+
 	return ret;
 }
 
-static struct class mdev_class = {
-	.name		= MDEV_CLASS_NAME,
-	.owner		= THIS_MODULE,
-	.class_attrs	= mdev_class_attrs,
-};
-
 static int __init mdev_init(void)
 {
 	int ret;
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
index ee2db61..3fce40f 100644
--- a/drivers/vfio/mdev/mdev_private.h
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -13,12 +13,16 @@
 #ifndef MDEV_PRIVATE_H
 #define MDEV_PRIVATE_H
 
+#define dev_to_parent_dev(_dev) container_of((_dev),	\
+					     struct parent_device, dev)
+#define dev_to_mdev(_dev) container_of((_dev), struct mdev_device, dev)
+
 int  mdev_bus_register(void);
 void mdev_bus_unregister(void);
 
 /* Function prototypes for mdev_sysfs */
 
-extern struct class_attribute mdev_class_attrs[];
+extern struct class mdev_class;
 
 int  mdev_create_sysfs_files(struct device *dev);
 void mdev_remove_sysfs_files(struct device *dev);
@@ -27,7 +31,6 @@ int  mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 			char *mdev_params);
 int  mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance);
 void mdev_device_supported_config(struct device *dev, char *str);
-int  mdev_device_start(uuid_le uuid);
-int  mdev_device_stop(uuid_le uuid);
+int  mdev_device_start(struct device *dev, bool start);
 
 #endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index e0457e6..3080edc 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -171,65 +171,34 @@ destroy_error:
 	return ret;
 }
 
-ssize_t mdev_start_store(struct class *class, struct class_attribute *attr,
-			 const char *buf, size_t count)
+static ssize_t start_store(struct device *dev, struct device_attribute *attr,
+			  const char *buf, size_t len)
 {
-	char *uuid_str, *ptr;
-	uuid_le uuid;
-	int ret;
-
-	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
-
-	if (!uuid_str)
-		return -ENOMEM;
+	int start, ret;
 
-	ret = uuid_le_to_bin(uuid_str, &uuid);
-	if (ret) {
-		pr_err("mdev_start: UUID parse error  %s\n", buf);
-		goto start_error;
-	}
+	ret = kstrtoint(buf, 10, &start);
+	if (ret)
+		goto exit;
 
-	ret = mdev_device_start(uuid);
+	ret = mdev_device_start(dev, !!start);
 	if (ret == 0)
-		ret = count;
-
-start_error:
-	kfree(ptr);
+		ret = len;
+exit:
 	return ret;
 }
+static DEVICE_ATTR_WO(start);
 
-ssize_t mdev_stop_store(struct class *class, struct class_attribute *attr,
-			    const char *buf, size_t count)
-{
-	char *uuid_str, *ptr;
-	uuid_le uuid;
-	int ret;
-
-	ptr = uuid_str = kstrndup(buf, count, GFP_KERNEL);
-
-	if (!uuid_str)
-		return -ENOMEM;
-
-	ret = uuid_le_to_bin(uuid_str, &uuid);
-	if (ret) {
-		pr_err("mdev_stop: UUID parse error %s\n", buf);
-		goto stop_error;
-	}
-
-	ret = mdev_device_stop(uuid);
-	if (ret == 0)
-		ret = count;
-
-stop_error:
-	kfree(ptr);
-	return ret;
+static struct attribute *mdev_device_attrs[] = {
+	&dev_attr_start.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(mdev_device);
 
-}
+#define MDEV_CLASS_NAME		"mdev"
 
-struct class_attribute mdev_class_attrs[] = {
-	__ATTR_WO(mdev_start),
-	__ATTR_WO(mdev_stop),
-	__ATTR_NULL
+struct class mdev_class = {
+	.name		= MDEV_CLASS_NAME,
+	.dev_groups	= mdev_device_groups,
 };
 
 int mdev_create_sysfs_files(struct device *dev)
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 9da94b7..88c0ba6 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -420,7 +420,7 @@ static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		virtaddr = vma->vm_start;
 		req_size = vma->vm_end - vma->vm_start;
 
-		pdev = to_pci_dev(parent->dev);
+		pdev = to_pci_dev(parent->dev.parent);
 		index = VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
 		pgoff = pci_resource_start(pdev, index) >> PAGE_SHIFT;
 	}
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 0b41f30..42da41b 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -159,12 +159,10 @@ struct parent_ops {
  */
 
 struct parent_device {
-	struct device		*dev;
+	struct device dev;
 	const struct parent_ops	*ops;
 
 	/* internal */
-	struct kref		ref;
-	struct list_head	next;
 	struct list_head	mdev_list;
 	struct mutex		mdev_list_lock;
 	wait_queue_head_t	release_done;
@@ -214,7 +212,7 @@ extern struct bus_type mdev_bus_type;
 
 extern int  mdev_register_device(struct device *dev,
 				 const struct parent_ops *ops);
-extern void mdev_unregister_device(struct device *dev);
+extern void mdev_unregister_device(struct parent_device *parent);
 
 extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
 extern void mdev_unregister_driver(struct mdev_driver *drv);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v6-based v1 2/5] mdev: drop management information and utilize driver-core instead
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-16  8:14   ` Jike Song
  -1 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Now mdev has a separate parent device, so quite a few codes and
information used for managing mdev, can be removed. Linux driver
core will take care of us.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c  | 164 +++++++++++++++--------------------------
 drivers/vfio/mdev/mdev_sysfs.c |  13 ++--
 drivers/vfio/mdev/vfio_mpci.c  |  23 +++---
 include/linux/mdev.h           |  15 +---
 4 files changed, 77 insertions(+), 138 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 9138588..a677db7 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -39,23 +39,42 @@ static void mdev_remove_attribute_group(struct device *dev,
 	sysfs_remove_groups(&dev->kobj, groups);
 }
 
-/* Should be called holding parent->mdev_list_lock */
+struct mdev_info {
+	uuid_le uuid;
+	int instance;
+};
+
+static int __find_mdev_device(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_info *info = data;
+
+	if ((uuid_le_cmp(mdev->uuid, info->uuid) == 0) &&
+	   (mdev->instance == info->instance))
+		return 1;
+
+	return 0;
+}
+
 static struct mdev_device *find_mdev_device(struct parent_device *parent,
 					    uuid_le uuid, int instance)
 {
-	struct mdev_device *mdev;
+	struct device *dev;
+	struct mdev_info info = {
+		.uuid = uuid,
+		.instance = instance,
+	 };
 
-	list_for_each_entry(mdev, &parent->mdev_list, next) {
-		if ((uuid_le_cmp(mdev->uuid, uuid) == 0) &&
-		    (mdev->instance == instance))
-			return mdev;
-	}
-	return NULL;
+	dev = device_find_child(&parent->dev, &info, __find_mdev_device);
+	if (!dev)
+		return NULL;
+
+	return dev_to_mdev(dev);
 }
 
 static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 {
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret;
 
 	ret = parent->ops->create(mdev, mdev_params);
@@ -72,7 +91,7 @@ static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 
 static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
 {
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	/*
@@ -89,34 +108,6 @@ static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
 	return ret;
 }
 
-static void mdev_release_device(struct kref *kref)
-{
-	struct mdev_device *mdev = container_of(kref, struct mdev_device, ref);
-	struct parent_device *parent = mdev->parent;
-
-	list_del(&mdev->next);
-	mutex_unlock(&parent->mdev_list_lock);
-
-	device_unregister(&mdev->dev);
-	wake_up(&parent->release_done);
-}
-
-struct mdev_device *mdev_get_device(struct mdev_device *mdev)
-{
-	kref_get(&mdev->ref);
-	return mdev;
-}
-EXPORT_SYMBOL(mdev_get_device);
-
-void mdev_put_device(struct mdev_device *mdev)
-{
-	struct parent_device *parent = mdev->parent;
-
-	kref_put_mutex(&mdev->ref, mdev_release_device,
-		       &parent->mdev_list_lock);
-}
-EXPORT_SYMBOL(mdev_put_device);
-
 /*
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
@@ -145,9 +136,6 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 
 	parent->dev.parent = dev;
 	parent->ops = ops;
-	mutex_init(&parent->mdev_list_lock);
-	INIT_LIST_HEAD(&parent->mdev_list);
-	init_waitqueue_head(&parent->release_done);
 
 	ret = device_register(&parent->dev);
 	if (ret)
@@ -176,63 +164,53 @@ add_dev_err:
 }
 EXPORT_SYMBOL(mdev_register_device);
 
+static int __mdev_device_destroy(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret, force = (unsigned long)data;
+
+	ret = mdev_device_destroy_ops(mdev, force);
+
+	/* can not fail if foce = true. */
+	WARN_ON(force && ret);
+	if (ret)
+		goto destroy_err;
+
+	device_unregister(&mdev->dev);
+
+destroy_err:
+	return ret;
+}
+
 /*
  * mdev_unregister_device : Unregister a parent device
- * @dev: device structure representing parent device which is returned by
- *       mdev_register_device.
+ * @dev: device structure representing parent device
  *
  * Remove device from list of registered parent devices. Give a chance to free
  * existing mediated devices for given device.
  */
-void mdev_unregister_device(struct parent_device *parent)
+void mdev_unregister_device(struct device *dev)
 {
-	struct mdev_device *mdev, *n;
-	int ret;
-
+	struct parent_device *parent = dev_to_parent_dev(dev);
 	dev_info(&parent->dev, "MDEV: Unregistering\n");
 
 	/*
 	 * Remove parent from the list and remove create and destroy sysfs
 	 * files so that no new mediated device could be created for this parent
 	 */
-	mdev_remove_sysfs_files(&parent->dev);
+	mdev_remove_sysfs_files(dev);
 
-	mdev_remove_attribute_group(&parent->dev,
-				    parent->ops->dev_attr_groups);
+	mdev_remove_attribute_group(dev, parent->ops->dev_attr_groups);
 
-	mutex_lock(&parent->mdev_list_lock);
-	list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) {
-		mdev_device_destroy_ops(mdev, true);
-		mutex_unlock(&parent->mdev_list_lock);
-		mdev_put_device(mdev);
-		mutex_lock(&parent->mdev_list_lock);
-	}
-	mutex_unlock(&parent->mdev_list_lock);
-
-	do {
-		ret = wait_event_interruptible_timeout(parent->release_done,
-				list_empty(&parent->mdev_list), HZ * 10);
-		if (ret == -ERESTARTSYS) {
-			dev_warn(&parent->dev, "Mediated devices are in use, task"
-				      " \"%s\" (%d) "
-				      "blocked until all are released",
-				      current->comm, task_pid_nr(current));
-		}
-	} while (ret <= 0);
+	device_for_each_child(dev, (void *)true, __mdev_device_destroy);
+
+        device_unregister(dev);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
 /*
  * Functions required for mdev_sysfs
  */
-static void mdev_device_release(struct device *dev)
-{
-	struct mdev_device *mdev = to_mdev_device(dev);
-
-	dev_dbg(&mdev->dev, "MDEV: destroying\n");
-	kfree(mdev);
-}
-
 int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 		       char *mdev_params)
 {
@@ -240,7 +218,6 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	struct mdev_device *mdev;
 	struct parent_device *parent = dev_to_parent_dev(dev);
 
-	mutex_lock(&parent->mdev_list_lock);
 	/* Check for duplicate */
 	mdev = find_mdev_device(parent, uuid, instance);
 	if (mdev) {
@@ -256,13 +233,10 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 
 	memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
 	mdev->instance = instance;
-	mdev->parent = parent;
-	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
 	mdev->dev.class = &mdev_class;
 	mdev->dev.bus     = &mdev_bus_type;
-	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl-%d", uuid.b, instance);
 
 	ret = device_register(&mdev->dev);
@@ -275,9 +249,6 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	if (ret)
 		goto create_failed;
 
-	list_add(&mdev->next, &parent->mdev_list);
-	mutex_unlock(&parent->mdev_list_lock);
-
 	dev_dbg(&mdev->dev, "MDEV: created\n");
 
 	return ret;
@@ -286,35 +257,19 @@ create_failed:
 	device_unregister(&mdev->dev);
 
 create_err:
-	mutex_unlock(&parent->mdev_list_lock);
 	return ret;
 }
 
 int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 {
 	struct mdev_device *mdev;
-	int ret;
 	struct parent_device *parent = dev_to_parent_dev(dev);
 
-	mutex_lock(&parent->mdev_list_lock);
 	mdev = find_mdev_device(parent, uuid, instance);
-	if (!mdev) {
-		ret = -EINVAL;
-		goto destroy_err;
-	}
-
-	ret = mdev_device_destroy_ops(mdev, false);
-	if (ret)
-		goto destroy_err;
-
-	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_device(mdev);
-
-	return ret;
+	if (!mdev)
+		return -EINVAL;
 
-destroy_err:
-	mutex_unlock(&parent->mdev_list_lock);
-	return ret;
+	return __mdev_device_destroy(&mdev->dev, (void *)false);
 }
 
 int mdev_device_invalidate_mapping(struct mdev_device *mdev,
@@ -440,7 +395,6 @@ int mdev_device_start(struct device *dev, bool start)
 	struct mdev_device *mdev = dev_to_mdev(dev);
 	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
-	mdev_get_device(mdev);
 	if (start && parent->ops->start)
 		ret = parent->ops->start(mdev->uuid);
 	else if (!start && parent->ops->stop)
@@ -452,8 +406,6 @@ int mdev_device_start(struct device *dev, bool start)
 		kobject_uevent(&mdev->dev.kobj,
 			       start ? KOBJ_ONLINE : KOBJ_OFFLINE);
 
-	mdev_put_device(mdev);
-
 	return ret;
 }
 
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 3080edc..bc81c6f 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -215,17 +215,20 @@ int mdev_create_sysfs_files(struct device *dev)
 	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_create.attr);
 	if (ret) {
 		pr_err("Failed to create mdev_create sysfs entry\n");
-		goto create_sysfs_failed;
+		goto create_failed;
 	}
 
 	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_destroy.attr);
 	if (ret) {
 		pr_err("Failed to create mdev_destroy sysfs entry\n");
-		sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
-	} else
-		return ret;
+		goto destroy_failed;
+	}
 
-create_sysfs_failed:
+	return ret;
+
+destroy_failed:
+	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
+create_failed:
 	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_supported_types.attr);
 	return ret;
 }
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 88c0ba6..502aeb7 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -38,7 +38,8 @@ static int vfio_mpci_open(void *device_data)
 {
 	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
-	struct parent_device *parent = vmdev->mdev->parent;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
@@ -83,7 +84,7 @@ static void vfio_mpci_close(void *device_data)
 static u8 mpci_find_pci_capability(struct mdev_device *mdev, u8 capability)
 {
 	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	u16 status;
 	u8  cap_ptr, cap_id = 0xff;
 
@@ -112,7 +113,7 @@ static int mpci_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
 {
 	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
 		u8 pin;
@@ -161,13 +162,14 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 {
 	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	unsigned long minsz;
 
 	switch (cmd) {
 	case VFIO_DEVICE_GET_INFO:
 	{
 		struct vfio_device_info info;
-		struct parent_device *parent = vmdev->mdev->parent;
 
 		minsz = offsetofend(struct vfio_device_info, num_irqs);
 
@@ -258,8 +260,6 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 	case VFIO_DEVICE_SET_IRQS:
 	{
 		struct vfio_irq_set hdr;
-		struct mdev_device *mdev = vmdev->mdev;
-		struct parent_device *parent = mdev->parent;
 		u8 *data = NULL, *ptr = NULL;
 
 		minsz = offsetofend(struct vfio_irq_set, count);
@@ -302,8 +302,6 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 	}
 	case VFIO_DEVICE_RESET:
 	{
-		struct parent_device *parent = vmdev->mdev->parent;
-
 		if (parent->ops->reset)
 			return parent->ops->reset(vmdev->mdev);
 
@@ -318,7 +316,7 @@ static ssize_t vfio_mpci_read(void *device_data, char __user *buf,
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	if (!count)
@@ -351,7 +349,7 @@ static ssize_t vfio_mpci_write(void *device_data, const char __user *buf,
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	if (!count)
@@ -390,7 +388,7 @@ static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		return -EINVAL;
 
 	mdev = vmdev->mdev;
-	parent  = mdev->parent;
+	parent  = dev_to_parent_dev(mdev->dev.parent);
 
 	pg_prot  = vma->vm_page_prot;
 
@@ -482,7 +480,7 @@ int vfio_mpci_probe(struct device *dev)
 	if (IS_ERR(vmdev))
 		return PTR_ERR(vmdev);
 
-	vmdev->mdev = mdev_get_device(mdev);
+	vmdev->mdev = mdev;
 	vmdev->group = mdev->group;
 	mutex_init(&vmdev->vfio_mdev_lock);
 
@@ -490,7 +488,6 @@ int vfio_mpci_probe(struct device *dev)
 	if (ret)
 		kfree(vmdev);
 
-	mdev_put_device(mdev);
 	return ret;
 }
 
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 42da41b..8488b27 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -35,16 +35,12 @@ struct mdev_phys_mapping {
 
 struct mdev_device {
 	struct device		dev;
-	struct parent_device	*parent;
 	struct iommu_group	*group;
 	uuid_le			uuid;
 	uint32_t		instance;
 	void			*driver_data;
 
 	/* internal only */
-	struct kref		ref;
-	struct list_head	next;
-
 	struct mdev_phys_mapping phys_mappings;
 };
 
@@ -157,15 +153,9 @@ struct parent_ops {
 /*
  * Parent Device
  */
-
 struct parent_device {
 	struct device dev;
 	const struct parent_ops	*ops;
-
-	/* internal */
-	struct list_head	mdev_list;
-	struct mutex		mdev_list_lock;
-	wait_queue_head_t	release_done;
 };
 
 /**
@@ -212,14 +202,11 @@ extern struct bus_type mdev_bus_type;
 
 extern int  mdev_register_device(struct device *dev,
 				 const struct parent_ops *ops);
-extern void mdev_unregister_device(struct parent_device *parent);
+extern void mdev_unregister_device(struct device *dev);
 
 extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
 extern void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct mdev_device *mdev_get_device(struct mdev_device *mdev);
-extern void mdev_put_device(struct mdev_device *mdev);
-
 extern struct mdev_device *mdev_get_device_by_group(struct iommu_group *group);
 
 extern int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 2/5] mdev: drop management information and utilize driver-core instead
@ 2016-08-16  8:14   ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

From: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Now mdev has a separate parent device, so quite a few codes and
information used for managing mdev, can be removed. Linux driver
core will take care of us.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 drivers/vfio/mdev/mdev_core.c  | 164 +++++++++++++++--------------------------
 drivers/vfio/mdev/mdev_sysfs.c |  13 ++--
 drivers/vfio/mdev/vfio_mpci.c  |  23 +++---
 include/linux/mdev.h           |  15 +---
 4 files changed, 77 insertions(+), 138 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 9138588..a677db7 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -39,23 +39,42 @@ static void mdev_remove_attribute_group(struct device *dev,
 	sysfs_remove_groups(&dev->kobj, groups);
 }
 
-/* Should be called holding parent->mdev_list_lock */
+struct mdev_info {
+	uuid_le uuid;
+	int instance;
+};
+
+static int __find_mdev_device(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_info *info = data;
+
+	if ((uuid_le_cmp(mdev->uuid, info->uuid) == 0) &&
+	   (mdev->instance == info->instance))
+		return 1;
+
+	return 0;
+}
+
 static struct mdev_device *find_mdev_device(struct parent_device *parent,
 					    uuid_le uuid, int instance)
 {
-	struct mdev_device *mdev;
+	struct device *dev;
+	struct mdev_info info = {
+		.uuid = uuid,
+		.instance = instance,
+	 };
 
-	list_for_each_entry(mdev, &parent->mdev_list, next) {
-		if ((uuid_le_cmp(mdev->uuid, uuid) == 0) &&
-		    (mdev->instance == instance))
-			return mdev;
-	}
-	return NULL;
+	dev = device_find_child(&parent->dev, &info, __find_mdev_device);
+	if (!dev)
+		return NULL;
+
+	return dev_to_mdev(dev);
 }
 
 static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 {
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret;
 
 	ret = parent->ops->create(mdev, mdev_params);
@@ -72,7 +91,7 @@ static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
 
 static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
 {
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	/*
@@ -89,34 +108,6 @@ static int mdev_device_destroy_ops(struct mdev_device *mdev, bool force)
 	return ret;
 }
 
-static void mdev_release_device(struct kref *kref)
-{
-	struct mdev_device *mdev = container_of(kref, struct mdev_device, ref);
-	struct parent_device *parent = mdev->parent;
-
-	list_del(&mdev->next);
-	mutex_unlock(&parent->mdev_list_lock);
-
-	device_unregister(&mdev->dev);
-	wake_up(&parent->release_done);
-}
-
-struct mdev_device *mdev_get_device(struct mdev_device *mdev)
-{
-	kref_get(&mdev->ref);
-	return mdev;
-}
-EXPORT_SYMBOL(mdev_get_device);
-
-void mdev_put_device(struct mdev_device *mdev)
-{
-	struct parent_device *parent = mdev->parent;
-
-	kref_put_mutex(&mdev->ref, mdev_release_device,
-		       &parent->mdev_list_lock);
-}
-EXPORT_SYMBOL(mdev_put_device);
-
 /*
  * mdev_register_device : Register a device
  * @dev: device structure representing parent device.
@@ -145,9 +136,6 @@ int mdev_register_device(struct device *dev, const struct parent_ops *ops)
 
 	parent->dev.parent = dev;
 	parent->ops = ops;
-	mutex_init(&parent->mdev_list_lock);
-	INIT_LIST_HEAD(&parent->mdev_list);
-	init_waitqueue_head(&parent->release_done);
 
 	ret = device_register(&parent->dev);
 	if (ret)
@@ -176,63 +164,53 @@ add_dev_err:
 }
 EXPORT_SYMBOL(mdev_register_device);
 
+static int __mdev_device_destroy(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret, force = (unsigned long)data;
+
+	ret = mdev_device_destroy_ops(mdev, force);
+
+	/* can not fail if foce = true. */
+	WARN_ON(force && ret);
+	if (ret)
+		goto destroy_err;
+
+	device_unregister(&mdev->dev);
+
+destroy_err:
+	return ret;
+}
+
 /*
  * mdev_unregister_device : Unregister a parent device
- * @dev: device structure representing parent device which is returned by
- *       mdev_register_device.
+ * @dev: device structure representing parent device
  *
  * Remove device from list of registered parent devices. Give a chance to free
  * existing mediated devices for given device.
  */
-void mdev_unregister_device(struct parent_device *parent)
+void mdev_unregister_device(struct device *dev)
 {
-	struct mdev_device *mdev, *n;
-	int ret;
-
+	struct parent_device *parent = dev_to_parent_dev(dev);
 	dev_info(&parent->dev, "MDEV: Unregistering\n");
 
 	/*
 	 * Remove parent from the list and remove create and destroy sysfs
 	 * files so that no new mediated device could be created for this parent
 	 */
-	mdev_remove_sysfs_files(&parent->dev);
+	mdev_remove_sysfs_files(dev);
 
-	mdev_remove_attribute_group(&parent->dev,
-				    parent->ops->dev_attr_groups);
+	mdev_remove_attribute_group(dev, parent->ops->dev_attr_groups);
 
-	mutex_lock(&parent->mdev_list_lock);
-	list_for_each_entry_safe(mdev, n, &parent->mdev_list, next) {
-		mdev_device_destroy_ops(mdev, true);
-		mutex_unlock(&parent->mdev_list_lock);
-		mdev_put_device(mdev);
-		mutex_lock(&parent->mdev_list_lock);
-	}
-	mutex_unlock(&parent->mdev_list_lock);
-
-	do {
-		ret = wait_event_interruptible_timeout(parent->release_done,
-				list_empty(&parent->mdev_list), HZ * 10);
-		if (ret == -ERESTARTSYS) {
-			dev_warn(&parent->dev, "Mediated devices are in use, task"
-				      " \"%s\" (%d) "
-				      "blocked until all are released",
-				      current->comm, task_pid_nr(current));
-		}
-	} while (ret <= 0);
+	device_for_each_child(dev, (void *)true, __mdev_device_destroy);
+
+        device_unregister(dev);
 }
 EXPORT_SYMBOL(mdev_unregister_device);
 
 /*
  * Functions required for mdev_sysfs
  */
-static void mdev_device_release(struct device *dev)
-{
-	struct mdev_device *mdev = to_mdev_device(dev);
-
-	dev_dbg(&mdev->dev, "MDEV: destroying\n");
-	kfree(mdev);
-}
-
 int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 		       char *mdev_params)
 {
@@ -240,7 +218,6 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	struct mdev_device *mdev;
 	struct parent_device *parent = dev_to_parent_dev(dev);
 
-	mutex_lock(&parent->mdev_list_lock);
 	/* Check for duplicate */
 	mdev = find_mdev_device(parent, uuid, instance);
 	if (mdev) {
@@ -256,13 +233,10 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 
 	memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
 	mdev->instance = instance;
-	mdev->parent = parent;
-	kref_init(&mdev->ref);
 
 	mdev->dev.parent  = dev;
 	mdev->dev.class = &mdev_class;
 	mdev->dev.bus     = &mdev_bus_type;
-	mdev->dev.release = mdev_device_release;
 	dev_set_name(&mdev->dev, "%pUl-%d", uuid.b, instance);
 
 	ret = device_register(&mdev->dev);
@@ -275,9 +249,6 @@ int mdev_device_create(struct device *dev, uuid_le uuid, uint32_t instance,
 	if (ret)
 		goto create_failed;
 
-	list_add(&mdev->next, &parent->mdev_list);
-	mutex_unlock(&parent->mdev_list_lock);
-
 	dev_dbg(&mdev->dev, "MDEV: created\n");
 
 	return ret;
@@ -286,35 +257,19 @@ create_failed:
 	device_unregister(&mdev->dev);
 
 create_err:
-	mutex_unlock(&parent->mdev_list_lock);
 	return ret;
 }
 
 int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 {
 	struct mdev_device *mdev;
-	int ret;
 	struct parent_device *parent = dev_to_parent_dev(dev);
 
-	mutex_lock(&parent->mdev_list_lock);
 	mdev = find_mdev_device(parent, uuid, instance);
-	if (!mdev) {
-		ret = -EINVAL;
-		goto destroy_err;
-	}
-
-	ret = mdev_device_destroy_ops(mdev, false);
-	if (ret)
-		goto destroy_err;
-
-	mutex_unlock(&parent->mdev_list_lock);
-	mdev_put_device(mdev);
-
-	return ret;
+	if (!mdev)
+		return -EINVAL;
 
-destroy_err:
-	mutex_unlock(&parent->mdev_list_lock);
-	return ret;
+	return __mdev_device_destroy(&mdev->dev, (void *)false);
 }
 
 int mdev_device_invalidate_mapping(struct mdev_device *mdev,
@@ -440,7 +395,6 @@ int mdev_device_start(struct device *dev, bool start)
 	struct mdev_device *mdev = dev_to_mdev(dev);
 	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
-	mdev_get_device(mdev);
 	if (start && parent->ops->start)
 		ret = parent->ops->start(mdev->uuid);
 	else if (!start && parent->ops->stop)
@@ -452,8 +406,6 @@ int mdev_device_start(struct device *dev, bool start)
 		kobject_uevent(&mdev->dev.kobj,
 			       start ? KOBJ_ONLINE : KOBJ_OFFLINE);
 
-	mdev_put_device(mdev);
-
 	return ret;
 }
 
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
index 3080edc..bc81c6f 100644
--- a/drivers/vfio/mdev/mdev_sysfs.c
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -215,17 +215,20 @@ int mdev_create_sysfs_files(struct device *dev)
 	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_create.attr);
 	if (ret) {
 		pr_err("Failed to create mdev_create sysfs entry\n");
-		goto create_sysfs_failed;
+		goto create_failed;
 	}
 
 	ret = sysfs_create_file(&dev->kobj, &dev_attr_mdev_destroy.attr);
 	if (ret) {
 		pr_err("Failed to create mdev_destroy sysfs entry\n");
-		sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
-	} else
-		return ret;
+		goto destroy_failed;
+	}
 
-create_sysfs_failed:
+	return ret;
+
+destroy_failed:
+	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_create.attr);
+create_failed:
 	sysfs_remove_file(&dev->kobj, &dev_attr_mdev_supported_types.attr);
 	return ret;
 }
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 88c0ba6..502aeb7 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -38,7 +38,8 @@ static int vfio_mpci_open(void *device_data)
 {
 	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
-	struct parent_device *parent = vmdev->mdev->parent;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
@@ -83,7 +84,7 @@ static void vfio_mpci_close(void *device_data)
 static u8 mpci_find_pci_capability(struct mdev_device *mdev, u8 capability)
 {
 	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	u16 status;
 	u8  cap_ptr, cap_id = 0xff;
 
@@ -112,7 +113,7 @@ static int mpci_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
 {
 	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
 	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
 		u8 pin;
@@ -161,13 +162,14 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 {
 	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	unsigned long minsz;
 
 	switch (cmd) {
 	case VFIO_DEVICE_GET_INFO:
 	{
 		struct vfio_device_info info;
-		struct parent_device *parent = vmdev->mdev->parent;
 
 		minsz = offsetofend(struct vfio_device_info, num_irqs);
 
@@ -258,8 +260,6 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 	case VFIO_DEVICE_SET_IRQS:
 	{
 		struct vfio_irq_set hdr;
-		struct mdev_device *mdev = vmdev->mdev;
-		struct parent_device *parent = mdev->parent;
 		u8 *data = NULL, *ptr = NULL;
 
 		minsz = offsetofend(struct vfio_irq_set, count);
@@ -302,8 +302,6 @@ static long vfio_mpci_unlocked_ioctl(void *device_data,
 	}
 	case VFIO_DEVICE_RESET:
 	{
-		struct parent_device *parent = vmdev->mdev->parent;
-
 		if (parent->ops->reset)
 			return parent->ops->reset(vmdev->mdev);
 
@@ -318,7 +316,7 @@ static ssize_t vfio_mpci_read(void *device_data, char __user *buf,
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	if (!count)
@@ -351,7 +349,7 @@ static ssize_t vfio_mpci_write(void *device_data, const char __user *buf,
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = mdev->parent;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 	int ret = 0;
 
 	if (!count)
@@ -390,7 +388,7 @@ static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 		return -EINVAL;
 
 	mdev = vmdev->mdev;
-	parent  = mdev->parent;
+	parent  = dev_to_parent_dev(mdev->dev.parent);
 
 	pg_prot  = vma->vm_page_prot;
 
@@ -482,7 +480,7 @@ int vfio_mpci_probe(struct device *dev)
 	if (IS_ERR(vmdev))
 		return PTR_ERR(vmdev);
 
-	vmdev->mdev = mdev_get_device(mdev);
+	vmdev->mdev = mdev;
 	vmdev->group = mdev->group;
 	mutex_init(&vmdev->vfio_mdev_lock);
 
@@ -490,7 +488,6 @@ int vfio_mpci_probe(struct device *dev)
 	if (ret)
 		kfree(vmdev);
 
-	mdev_put_device(mdev);
 	return ret;
 }
 
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 42da41b..8488b27 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -35,16 +35,12 @@ struct mdev_phys_mapping {
 
 struct mdev_device {
 	struct device		dev;
-	struct parent_device	*parent;
 	struct iommu_group	*group;
 	uuid_le			uuid;
 	uint32_t		instance;
 	void			*driver_data;
 
 	/* internal only */
-	struct kref		ref;
-	struct list_head	next;
-
 	struct mdev_phys_mapping phys_mappings;
 };
 
@@ -157,15 +153,9 @@ struct parent_ops {
 /*
  * Parent Device
  */
-
 struct parent_device {
 	struct device dev;
 	const struct parent_ops	*ops;
-
-	/* internal */
-	struct list_head	mdev_list;
-	struct mutex		mdev_list_lock;
-	wait_queue_head_t	release_done;
 };
 
 /**
@@ -212,14 +202,11 @@ extern struct bus_type mdev_bus_type;
 
 extern int  mdev_register_device(struct device *dev,
 				 const struct parent_ops *ops);
-extern void mdev_unregister_device(struct parent_device *parent);
+extern void mdev_unregister_device(struct device *dev);
 
 extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
 extern void mdev_unregister_driver(struct mdev_driver *drv);
 
-extern struct mdev_device *mdev_get_device(struct mdev_device *mdev);
-extern void mdev_put_device(struct mdev_device *mdev);
-
 extern struct mdev_device *mdev_get_device_by_group(struct iommu_group *group);
 
 extern int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v6-based v1 3/5] mdev: remove uuid from parent_ops
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-16  8:14   ` Jike Song
  -1 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

mdev should be able to start/stop separately, not relying on
grouping in a VM. And even if grouping start/stop needed, it's always
easy for userspace to find out all mdevs belonging to a particular VM.

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/mdev_core.c | 4 ++--
 include/linux/mdev.h          | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index a677db7..934dac9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -396,9 +396,9 @@ int mdev_device_start(struct device *dev, bool start)
 	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
 	if (start && parent->ops->start)
-		ret = parent->ops->start(mdev->uuid);
+		ret = parent->ops->start(mdev);
 	else if (!start && parent->ops->stop)
-		ret = parent->ops->stop(mdev->uuid);
+		ret = parent->ops->stop(mdev);
 
 	if (ret)
 		pr_err("mdev %s failed  %d\n", start ? "start" : "stop", ret);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 8488b27..c06967d 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -79,10 +79,10 @@ struct mdev_device {
  *			Returns integer: success (0) or error (< 0)
  * @start:		Called to initiate mediated device initialization
  *			process in parent device's driver before VMM starts.
- *			@uuid: UUID
+ *			@mdev: mediated device structure
  *			Returns integer: success (0) or error (< 0)
  * @stop:		Called to teardown mediated device related resources
- *			@uuid: UUID
+ *			@mdev: mediated device structure
  *			Returns integer: success (0) or error (< 0)
  * @read:		Read emulation callback
  *			@mdev: mediated device structure
@@ -134,8 +134,8 @@ struct parent_ops {
 	int     (*create)(struct mdev_device *mdev, char *mdev_params);
 	int     (*destroy)(struct mdev_device *mdev);
 	int     (*reset)(struct mdev_device *mdev);
-	int     (*start)(uuid_le uuid);
-	int     (*stop)(uuid_le uuid);
+	int     (*start)(struct mdev_device *mdev);
+	int     (*stop)(struct mdev_device *mdev);
 	ssize_t (*read)(struct mdev_device *mdev, char *buf, size_t count,
 			loff_t pos);
 	ssize_t (*write)(struct mdev_device *mdev, char *buf, size_t count,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 3/5] mdev: remove uuid from parent_ops
@ 2016-08-16  8:14   ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

mdev should be able to start/stop separately, not relying on
grouping in a VM. And even if grouping start/stop needed, it's always
easy for userspace to find out all mdevs belonging to a particular VM.

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/mdev_core.c | 4 ++--
 include/linux/mdev.h          | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index a677db7..934dac9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -396,9 +396,9 @@ int mdev_device_start(struct device *dev, bool start)
 	struct parent_device *parent = dev_to_parent_dev(dev->parent);
 
 	if (start && parent->ops->start)
-		ret = parent->ops->start(mdev->uuid);
+		ret = parent->ops->start(mdev);
 	else if (!start && parent->ops->stop)
-		ret = parent->ops->stop(mdev->uuid);
+		ret = parent->ops->stop(mdev);
 
 	if (ret)
 		pr_err("mdev %s failed  %d\n", start ? "start" : "stop", ret);
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index 8488b27..c06967d 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -79,10 +79,10 @@ struct mdev_device {
  *			Returns integer: success (0) or error (< 0)
  * @start:		Called to initiate mediated device initialization
  *			process in parent device's driver before VMM starts.
- *			@uuid: UUID
+ *			@mdev: mediated device structure
  *			Returns integer: success (0) or error (< 0)
  * @stop:		Called to teardown mediated device related resources
- *			@uuid: UUID
+ *			@mdev: mediated device structure
  *			Returns integer: success (0) or error (< 0)
  * @read:		Read emulation callback
  *			@mdev: mediated device structure
@@ -134,8 +134,8 @@ struct parent_ops {
 	int     (*create)(struct mdev_device *mdev, char *mdev_params);
 	int     (*destroy)(struct mdev_device *mdev);
 	int     (*reset)(struct mdev_device *mdev);
-	int     (*start)(uuid_le uuid);
-	int     (*stop)(uuid_le uuid);
+	int     (*start)(struct mdev_device *mdev);
+	int     (*stop)(struct mdev_device *mdev);
 	ssize_t (*read)(struct mdev_device *mdev, char *buf, size_t count,
 			loff_t pos);
 	ssize_t (*write)(struct mdev_device *mdev, char *buf, size_t count,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v6-based v1 4/5] vfio_mpci: remove PCI-specific logic
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-16  8:14   ` Jike Song
  -1 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

As a generic mdev driver, 'vfio_mpci' should be BUS-agnostic and
as thin as possible, passing through all device operations directly
to underlying physical driver(a.k.a the device-model).

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/mdev_core.c       | 107 --------
 drivers/vfio/mdev/vfio_mpci.c       | 469 ++++--------------------------------
 drivers/vfio/pci/vfio_pci_private.h |   6 +
 include/linux/mdev.h                |  62 +----
 include/linux/vfio.h                |   7 -
 5 files changed, 69 insertions(+), 582 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 934dac9..f824856 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -272,113 +272,6 @@ int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 	return __mdev_device_destroy(&mdev->dev, (void *)false);
 }
 
-int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-				   unsigned long addr, unsigned long size)
-{
-	int ret = -EINVAL;
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc;
-
-	if (!mdev || !mdev->phys_mappings.mapping)
-		return ret;
-
-	phys_mappings = &mdev->phys_mappings;
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-
-		if ((addr > addr_desc->start) &&
-		    (addr + size < addr_desc->start + addr_desc->size)) {
-			unmap_mapping_range(phys_mappings->mapping,
-					    addr, size, 0);
-			ret = 0;
-			goto unlock_exit;
-		}
-	}
-
-unlock_exit:
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-	return ret;
-}
-EXPORT_SYMBOL(mdev_device_invalidate_mapping);
-
-/* Sanity check for the physical mapping list for mediated device */
-
-int mdev_add_phys_mapping(struct mdev_device *mdev,
-			  struct address_space *mapping,
-			  unsigned long addr, unsigned long size)
-{
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc, *new_addr_desc;
-	int ret = 0;
-
-	if (!mdev)
-		return -EINVAL;
-
-	phys_mappings = &mdev->phys_mappings;
-	if (phys_mappings->mapping && (mapping != phys_mappings->mapping))
-		return -EINVAL;
-
-	if (!phys_mappings->mapping) {
-		phys_mappings->mapping = mapping;
-		mutex_init(&phys_mappings->addr_desc_list_lock);
-		INIT_LIST_HEAD(&phys_mappings->addr_desc_list);
-	}
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-		if ((addr + size < addr_desc->start) ||
-		    (addr_desc->start + addr_desc->size) < addr)
-			continue;
-		else {
-			/* should be no overlap */
-			ret = -EINVAL;
-			goto mapping_exit;
-		}
-	}
-
-	/* add the new entry to the list */
-	new_addr_desc = kzalloc(sizeof(*new_addr_desc), GFP_KERNEL);
-
-	if (!new_addr_desc) {
-		ret = -ENOMEM;
-		goto mapping_exit;
-	}
-
-	new_addr_desc->start = addr;
-	new_addr_desc->size = size;
-	list_add(&new_addr_desc->next, &phys_mappings->addr_desc_list);
-
-mapping_exit:
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-	return ret;
-}
-EXPORT_SYMBOL(mdev_add_phys_mapping);
-
-void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr)
-{
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc;
-
-	if (!mdev)
-		return;
-
-	phys_mappings = &mdev->phys_mappings;
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-		if (addr_desc->start == addr) {
-			list_del(&addr_desc->next);
-			kfree(addr_desc);
-			break;
-		}
-	}
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-}
-EXPORT_SYMBOL(mdev_del_phys_mapping);
-
 void mdev_device_supported_config(struct device *dev, char *str)
 {
 	struct parent_device *parent = dev_to_parent_dev(dev);
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 502aeb7..ca47676 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -1,5 +1,5 @@
 /*
- * VFIO based Mediated PCI device driver
+ * VFIO Bus driver for Mediated device
  *
  * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
  *     Author: Neo Jia <cjia@nvidia.com>
@@ -24,453 +24,88 @@
 
 #define DRIVER_VERSION  "0.1"
 #define DRIVER_AUTHOR   "NVIDIA Corporation"
-#define DRIVER_DESC     "VFIO based Mediated PCI device driver"
+#define DRIVER_DESC     "VFIO Bus driver for Mediated device"
 
 struct vfio_mdev {
 	struct iommu_group *group;
 	struct mdev_device *mdev;
-	int		    refcnt;
-	struct vfio_region_info vfio_region_info[VFIO_PCI_NUM_REGIONS];
-	struct mutex	    vfio_mdev_lock;
 };
 
-static int vfio_mpci_open(void *device_data)
+static int vfio_mdev_open(void *device_data)
 {
-	int ret = 0;
-	struct vfio_mdev *vmdev = device_data;
-	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
-	mutex_lock(&vmdev->vfio_mdev_lock);
-	if (!vmdev->refcnt && parent->ops->get_region_info) {
-		int index;
-
-		for (index = VFIO_PCI_BAR0_REGION_INDEX;
-		     index < VFIO_PCI_NUM_REGIONS; index++) {
-			ret = parent->ops->get_region_info(vmdev->mdev, index,
-					      &vmdev->vfio_region_info[index]);
-			if (ret)
-				goto open_error;
-		}
-	}
-
-	vmdev->refcnt++;
-
-open_error:
-	mutex_unlock(&vmdev->vfio_mdev_lock);
-	if (ret)
-		module_put(THIS_MODULE);
-
-	return ret;
-}
-
-static void vfio_mpci_close(void *device_data)
-{
-	struct vfio_mdev *vmdev = device_data;
-
-	mutex_lock(&vmdev->vfio_mdev_lock);
-	vmdev->refcnt--;
-	if (!vmdev->refcnt) {
-		memset(&vmdev->vfio_region_info, 0,
-			sizeof(vmdev->vfio_region_info));
-	}
-	mutex_unlock(&vmdev->vfio_mdev_lock);
-	module_put(THIS_MODULE);
-}
-
-static u8 mpci_find_pci_capability(struct mdev_device *mdev, u8 capability)
-{
-	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	u16 status;
-	u8  cap_ptr, cap_id = 0xff;
-
-	parent->ops->read(mdev, (char *)&status, sizeof(status),
-			  pos + PCI_STATUS);
-	if (!(status & PCI_STATUS_CAP_LIST))
-		return 0;
-
-	parent->ops->read(mdev, &cap_ptr, sizeof(cap_ptr),
-			  pos + PCI_CAPABILITY_LIST);
-
-	do {
-		cap_ptr &= 0xfc;
-		parent->ops->read(mdev, &cap_id, sizeof(cap_id),
-				  pos + cap_ptr + PCI_CAP_LIST_ID);
-		if (cap_id == capability)
-			return cap_ptr;
-		parent->ops->read(mdev, &cap_ptr, sizeof(cap_ptr),
-				  pos + cap_ptr + PCI_CAP_LIST_NEXT);
-	} while (cap_ptr && cap_id != 0xff);
-
 	return 0;
 }
 
-static int mpci_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
+static void vfio_mdev_close(void *device_data)
 {
-	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-
-	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
-		u8 pin;
-
-		parent->ops->read(mdev, &pin, sizeof(pin),
-				  pos + PCI_INTERRUPT_PIN);
-		if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && pin)
-			return 1;
-
-	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
-		u8 cap_ptr;
-		u16 flags;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_MSI);
-		if (cap_ptr) {
-			parent->ops->read(mdev, (char *)&flags, sizeof(flags),
-					pos + cap_ptr + PCI_MSI_FLAGS);
-			return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1);
-		}
-	} else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
-		u8 cap_ptr;
-		u16 flags;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_MSIX);
-		if (cap_ptr) {
-			parent->ops->read(mdev, (char *)&flags, sizeof(flags),
-					pos + cap_ptr + PCI_MSIX_FLAGS);
-
-			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
-		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
-		u8 cap_ptr;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_EXP);
-		if (cap_ptr)
-			return 1;
-	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
-		return 1;
-	}
-
-	return 0;
+	module_put(THIS_MODULE);
 }
 
-static long vfio_mpci_unlocked_ioctl(void *device_data,
+static long vfio_mdev_unlocked_ioctl(void *device_data,
 				     unsigned int cmd, unsigned long arg)
 {
-	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	unsigned long minsz;
-
-	switch (cmd) {
-	case VFIO_DEVICE_GET_INFO:
-	{
-		struct vfio_device_info info;
-
-		minsz = offsetofend(struct vfio_device_info, num_irqs);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		info.flags = VFIO_DEVICE_FLAGS_PCI;
-
-		if (parent->ops->reset)
-			info.flags |= VFIO_DEVICE_FLAGS_RESET;
-
-		info.num_regions = VFIO_PCI_NUM_REGIONS;
-		info.num_irqs = VFIO_PCI_NUM_IRQS;
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_GET_REGION_INFO:
-	{
-		struct vfio_region_info info;
-
-		minsz = offsetofend(struct vfio_region_info, offset);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_CONFIG_REGION_INDEX:
-		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = vmdev->vfio_region_info[info.index].size;
-			if (!info.size) {
-				info.flags = 0;
-				break;
-			}
-
-			info.flags = vmdev->vfio_region_info[info.index].flags;
-			break;
-		case VFIO_PCI_VGA_REGION_INDEX:
-		case VFIO_PCI_ROM_REGION_INDEX:
-		default:
-			return -EINVAL;
-		}
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_GET_IRQ_INFO:
-	{
-		struct vfio_irq_info info;
-
-		minsz = offsetofend(struct vfio_irq_info, count);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSI_IRQ_INDEX:
-		case VFIO_PCI_REQ_IRQ_INDEX:
-			break;
-			/* pass thru to return error */
-		case VFIO_PCI_MSIX_IRQ_INDEX:
-		default:
-			return -EINVAL;
-		}
-
-		info.flags = VFIO_IRQ_INFO_EVENTFD;
-		info.count = mpci_get_irq_count(vmdev, info.index);
-
-		if (info.count == -1)
-			return -EINVAL;
-
-		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
-			info.flags |= (VFIO_IRQ_INFO_MASKABLE |
-					VFIO_IRQ_INFO_AUTOMASKED);
-		else
-			info.flags |= VFIO_IRQ_INFO_NORESIZE;
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_SET_IRQS:
-	{
-		struct vfio_irq_set hdr;
-		u8 *data = NULL, *ptr = NULL;
-
-		minsz = offsetofend(struct vfio_irq_set, count);
-
-		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (hdr.argsz < minsz || hdr.index >= VFIO_PCI_NUM_IRQS ||
-		    hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
-				  VFIO_IRQ_SET_ACTION_TYPE_MASK))
-			return -EINVAL;
-
-		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
-			size_t size;
-			int max = mpci_get_irq_count(vmdev, hdr.index);
-
-			if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL)
-				size = sizeof(uint8_t);
-			else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD)
-				size = sizeof(int32_t);
-			else
-				return -EINVAL;
-
-			if (hdr.argsz - minsz < hdr.count * size ||
-			    hdr.start >= max || hdr.start + hdr.count > max)
-				return -EINVAL;
-
-			ptr = data = memdup_user((void __user *)(arg + minsz),
-						 hdr.count * size);
-			if (IS_ERR(data))
-				return PTR_ERR(data);
-		}
-
-		if (parent->ops->set_irqs)
-			ret = parent->ops->set_irqs(mdev, hdr.flags, hdr.index,
-						    hdr.start, hdr.count, data);
-
-		kfree(ptr);
-		return ret;
-	}
-	case VFIO_DEVICE_RESET:
-	{
-		if (parent->ops->reset)
-			return parent->ops->reset(vmdev->mdev);
-
-		return -EINVAL;
-	}
-	}
-	return -ENOTTY;
+
+	if (parent && parent->ops && parent->ops->ioctl)
+		return parent->ops->ioctl(mdev, cmd, arg);
+
+	return -ENODEV;
 }
 
-static ssize_t vfio_mpci_read(void *device_data, char __user *buf,
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
 			      size_t count, loff_t *ppos)
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	int ret = 0;
-
-	if (!count)
-		return 0;
 
-	if (parent->ops->read) {
-		char *ret_data, *ptr;
+	if (parent && parent->ops && parent->ops->read)
+		return parent->ops->read(mdev, buf, count, ppos);
 
-		ptr = ret_data = kzalloc(count, GFP_KERNEL);
-
-		if (!ret_data)
-			return  -ENOMEM;
-
-		ret = parent->ops->read(mdev, ret_data, count, *ppos);
-
-		if (ret > 0) {
-			if (copy_to_user(buf, ret_data, ret))
-				ret = -EFAULT;
-			else
-				*ppos += ret;
-		}
-		kfree(ptr);
-	}
-
-	return ret;
+	return -ENODEV;
 }
 
-static ssize_t vfio_mpci_write(void *device_data, const char __user *buf,
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
 			       size_t count, loff_t *ppos)
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	int ret = 0;
-
-	if (!count)
-		return 0;
-
-	if (parent->ops->write) {
-		char *usr_data, *ptr;
-
-		ptr = usr_data = memdup_user(buf, count);
-		if (IS_ERR(usr_data))
-			return PTR_ERR(usr_data);
-
-		ret = parent->ops->write(mdev, usr_data, count, *ppos);
-
-		if (ret > 0)
-			*ppos += ret;
-
-		kfree(ptr);
-	}
-
-	return ret;
-}
-
-static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
-	int ret;
-	struct vfio_mdev *vmdev = vma->vm_private_data;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
-	u64 virtaddr = (u64)vmf->virtual_address;
-	unsigned long req_size, pgoff = 0;
-	pgprot_t pg_prot;
-	unsigned int index;
-
-	if (!vmdev && !vmdev->mdev)
-		return -EINVAL;
-
-	mdev = vmdev->mdev;
-	parent  = dev_to_parent_dev(mdev->dev.parent);
-
-	pg_prot  = vma->vm_page_prot;
-
-	if (parent->ops->validate_map_request) {
-		u64 offset;
-		loff_t pos;
-
-		offset   = virtaddr - vma->vm_start;
-		req_size = vma->vm_end - virtaddr;
-		pos = (vma->vm_pgoff << PAGE_SHIFT) + offset;
-
-		ret = parent->ops->validate_map_request(mdev, pos, &virtaddr,
-						&pgoff, &req_size, &pg_prot);
-		if (ret)
-			return ret;
-
-		/*
-		 * Verify pgoff and req_size are valid and virtaddr is within
-		 * vma range
-		 */
-		if (!pgoff || !req_size || (virtaddr < vma->vm_start) ||
-		    ((virtaddr + req_size) >= vma->vm_end))
-			return -EINVAL;
-	} else {
-		struct pci_dev *pdev;
-
-		virtaddr = vma->vm_start;
-		req_size = vma->vm_end - vma->vm_start;
-
-		pdev = to_pci_dev(parent->dev.parent);
-		index = VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
-		pgoff = pci_resource_start(pdev, index) >> PAGE_SHIFT;
-	}
-
-	ret = remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);
-
-	return ret | VM_FAULT_NOPAGE;
-}
 
-void mdev_dev_mmio_close(struct vm_area_struct *vma)
-{
-	struct vfio_mdev *vmdev = vma->vm_private_data;
-	struct mdev_device *mdev = vmdev->mdev;
+	if (parent && parent->ops && parent->ops->write)
+		return parent->ops->write(mdev, buf, count, ppos);
 
-	mdev_del_phys_mapping(mdev, vma->vm_pgoff << PAGE_SHIFT);
+	return -ENODEV;
 }
 
-static const struct vm_operations_struct mdev_dev_mmio_ops = {
-	.fault = mdev_dev_mmio_fault,
-	.close = mdev_dev_mmio_close,
-};
-
-static int vfio_mpci_mmap(void *device_data, struct vm_area_struct *vma)
+static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
 {
-	unsigned int index;
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
-	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
-
-	if (index >= VFIO_PCI_ROM_REGION_INDEX)
-		return -EINVAL;
-
-	vma->vm_private_data = vmdev;
-	vma->vm_ops = &mdev_dev_mmio_ops;
+	if (parent && parent->ops && parent->ops->mmap)
+		return parent->ops->mmap(mdev, vma);
 
-	return mdev_add_phys_mapping(mdev, vma->vm_file->f_mapping,
-				     vma->vm_pgoff << PAGE_SHIFT,
-				     vma->vm_end - vma->vm_start);
+	return -ENODEV;
 }
 
-static const struct vfio_device_ops vfio_mpci_dev_ops = {
-	.name		= "vfio-mpci",
-	.open		= vfio_mpci_open,
-	.release	= vfio_mpci_close,
-	.ioctl		= vfio_mpci_unlocked_ioctl,
-	.read		= vfio_mpci_read,
-	.write		= vfio_mpci_write,
-	.mmap		= vfio_mpci_mmap,
+static const struct vfio_device_ops vfio_mdev_dev_ops = {
+	.name		= "vfio-mdev",
+	.open		= vfio_mdev_open,
+	.release	= vfio_mdev_close,
+	.ioctl		= vfio_mdev_unlocked_ioctl,
+	.read		= vfio_mdev_read,
+	.write		= vfio_mdev_write,
+	.mmap		= vfio_mdev_mmap,
 };
 
-int vfio_mpci_probe(struct device *dev)
+static int vfio_mdev_probe(struct device *dev)
 {
 	struct vfio_mdev *vmdev;
 	struct mdev_device *mdev = to_mdev_device(dev);
@@ -482,16 +117,15 @@ int vfio_mpci_probe(struct device *dev)
 
 	vmdev->mdev = mdev;
 	vmdev->group = mdev->group;
-	mutex_init(&vmdev->vfio_mdev_lock);
 
-	ret = vfio_add_group_dev(dev, &vfio_mpci_dev_ops, vmdev);
+	ret = vfio_add_group_dev(dev, &vfio_mdev_dev_ops, vmdev);
 	if (ret)
 		kfree(vmdev);
 
 	return ret;
 }
 
-void vfio_mpci_remove(struct device *dev)
+static void vfio_mdev_remove(struct device *dev)
 {
 	struct vfio_mdev *vmdev;
 
@@ -499,33 +133,34 @@ void vfio_mpci_remove(struct device *dev)
 	kfree(vmdev);
 }
 
-int vfio_mpci_match(struct device *dev)
+/*
+ * Having one single driver to match all available mdev devices is enough,
+ * for every possible type.
+ */
+static int vfio_mdev_match(struct device *dev)
 {
-	if (dev_is_pci(dev->parent))
-		return 1;
-
-	return 0;
+	return 1;
 }
 
-struct mdev_driver vfio_mpci_driver = {
-	.name	= "vfio_mpci",
-	.probe	= vfio_mpci_probe,
-	.remove	= vfio_mpci_remove,
-	.match	= vfio_mpci_match,
+static struct mdev_driver vfio_mdev_driver = {
+	.name	= "vfio_mdev",
+	.probe	= vfio_mdev_probe,
+	.remove	= vfio_mdev_remove,
+	.match	= vfio_mdev_match,
 };
 
-static int __init vfio_mpci_init(void)
+static int __init vfio_mdev_init(void)
 {
-	return mdev_register_driver(&vfio_mpci_driver, THIS_MODULE);
+	return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
 }
 
-static void __exit vfio_mpci_exit(void)
+static void __exit vfio_mdev_exit(void)
 {
-	mdev_unregister_driver(&vfio_mpci_driver);
+	mdev_unregister_driver(&vfio_mdev_driver);
 }
 
-module_init(vfio_mpci_init)
-module_exit(vfio_mpci_exit)
+module_init(vfio_mdev_init)
+module_exit(vfio_mdev_exit)
 
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 4b69e09..2128de8 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -19,6 +19,12 @@
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
 
+#define VFIO_PCI_OFFSET_SHIFT   40
+
+#define VFIO_PCI_OFFSET_TO_INDEX(off)	(off >> VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
+
 /* Special capability IDs predefined access */
 #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
 #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index c06967d..f99c9e2 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -15,33 +15,17 @@
 
 #include <uapi/linux/vfio.h>
 
-struct parent_device;
 
 /*
  * Mediated device
  */
-
-struct addr_desc {
-	unsigned long start;
-	unsigned long size;
-	struct list_head next;
-};
-
-struct mdev_phys_mapping {
-	struct address_space *mapping;
-	struct list_head addr_desc_list;
-	struct mutex addr_desc_list_lock;
-};
-
+struct parent_device;
 struct mdev_device {
 	struct device		dev;
 	struct iommu_group	*group;
 	uuid_le			uuid;
 	uint32_t		instance;
 	void			*driver_data;
-
-	/* internal only */
-	struct mdev_phys_mapping phys_mappings;
 };
 
 
@@ -96,19 +80,7 @@ struct mdev_device {
  *			@count: number of bytes to be written
  *			@pos: address.
  *			Retuns number on bytes written on success or error.
- * @set_irqs:		Called to send about interrupts configuration
- *			information that VMM sets.
- *			@mdev: mediated device structure
- *			@flags, index, start, count and *data : same as that of
- *			struct vfio_irq_set of VFIO_DEVICE_SET_IRQS API.
- * @get_region_info:	Called to get VFIO region size and flags of mediated
- *			device.
- *			@mdev: mediated device structure
- *			@region_index: VFIO region index
- *			@region_info: output, returns size and flags of
- *				      requested region.
- *			Returns integer: success (0) or error (< 0)
- * @validate_map_request: Validate remap pfn request
+ * @mmap:		Memory Map
  *			@mdev: mediated device structure
  *			@pos: address
  *			@virtaddr: target user address to start at. Vendor
@@ -133,21 +105,18 @@ struct parent_ops {
 	int	(*supported_config)(struct device *dev, char *config);
 	int     (*create)(struct mdev_device *mdev, char *mdev_params);
 	int     (*destroy)(struct mdev_device *mdev);
+
 	int     (*reset)(struct mdev_device *mdev);
 	int     (*start)(struct mdev_device *mdev);
 	int     (*stop)(struct mdev_device *mdev);
-	ssize_t (*read)(struct mdev_device *mdev, char *buf, size_t count,
-			loff_t pos);
-	ssize_t (*write)(struct mdev_device *mdev, char *buf, size_t count,
-			 loff_t pos);
-	int     (*set_irqs)(struct mdev_device *mdev, uint32_t flags,
-			    unsigned int index, unsigned int start,
-			    unsigned int count, void *data);
-	int	(*get_region_info)(struct mdev_device *mdev, int region_index,
-				   struct vfio_region_info *region_info);
-	int	(*validate_map_request)(struct mdev_device *mdev, loff_t pos,
-					u64 *virtaddr, unsigned long *pfn,
-					unsigned long *size, pgprot_t *prot);
+
+	ssize_t (*read)(struct mdev_device *mdev, char __user *buf, size_t count,
+			loff_t *pos);
+	ssize_t (*write)(struct mdev_device *mdev, const char __user *buf,
+			size_t count, loff_t *pos);
+	int	(*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+	long	(*ioctl)(struct mdev_device *mdev, unsigned int cmd,
+			unsigned long arg);
 };
 
 /*
@@ -209,13 +178,4 @@ extern void mdev_unregister_driver(struct mdev_driver *drv);
 
 extern struct mdev_device *mdev_get_device_by_group(struct iommu_group *group);
 
-extern int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-					unsigned long addr, unsigned long size);
-
-extern int mdev_add_phys_mapping(struct mdev_device *mdev,
-				 struct address_space *mapping,
-				 unsigned long addr, unsigned long size);
-
-
-extern void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr);
 #endif /* MDEV_H */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index abae882..f2f0daf 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,13 +19,6 @@
 #include <uapi/linux/vfio.h>
 #include <linux/mdev.h>
 
-#define VFIO_PCI_OFFSET_SHIFT   40
-
-#define VFIO_PCI_OFFSET_TO_INDEX(off)   (off >> VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_OFFSET_MASK    (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
-
-
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 4/5] vfio_mpci: remove PCI-specific logic
@ 2016-08-16  8:14   ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

As a generic mdev driver, 'vfio_mpci' should be BUS-agnostic and
as thin as possible, passing through all device operations directly
to underlying physical driver(a.k.a the device-model).

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/mdev_core.c       | 107 --------
 drivers/vfio/mdev/vfio_mpci.c       | 469 ++++--------------------------------
 drivers/vfio/pci/vfio_pci_private.h |   6 +
 include/linux/mdev.h                |  62 +----
 include/linux/vfio.h                |   7 -
 5 files changed, 69 insertions(+), 582 deletions(-)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 934dac9..f824856 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -272,113 +272,6 @@ int mdev_device_destroy(struct device *dev, uuid_le uuid, uint32_t instance)
 	return __mdev_device_destroy(&mdev->dev, (void *)false);
 }
 
-int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-				   unsigned long addr, unsigned long size)
-{
-	int ret = -EINVAL;
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc;
-
-	if (!mdev || !mdev->phys_mappings.mapping)
-		return ret;
-
-	phys_mappings = &mdev->phys_mappings;
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-
-		if ((addr > addr_desc->start) &&
-		    (addr + size < addr_desc->start + addr_desc->size)) {
-			unmap_mapping_range(phys_mappings->mapping,
-					    addr, size, 0);
-			ret = 0;
-			goto unlock_exit;
-		}
-	}
-
-unlock_exit:
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-	return ret;
-}
-EXPORT_SYMBOL(mdev_device_invalidate_mapping);
-
-/* Sanity check for the physical mapping list for mediated device */
-
-int mdev_add_phys_mapping(struct mdev_device *mdev,
-			  struct address_space *mapping,
-			  unsigned long addr, unsigned long size)
-{
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc, *new_addr_desc;
-	int ret = 0;
-
-	if (!mdev)
-		return -EINVAL;
-
-	phys_mappings = &mdev->phys_mappings;
-	if (phys_mappings->mapping && (mapping != phys_mappings->mapping))
-		return -EINVAL;
-
-	if (!phys_mappings->mapping) {
-		phys_mappings->mapping = mapping;
-		mutex_init(&phys_mappings->addr_desc_list_lock);
-		INIT_LIST_HEAD(&phys_mappings->addr_desc_list);
-	}
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-		if ((addr + size < addr_desc->start) ||
-		    (addr_desc->start + addr_desc->size) < addr)
-			continue;
-		else {
-			/* should be no overlap */
-			ret = -EINVAL;
-			goto mapping_exit;
-		}
-	}
-
-	/* add the new entry to the list */
-	new_addr_desc = kzalloc(sizeof(*new_addr_desc), GFP_KERNEL);
-
-	if (!new_addr_desc) {
-		ret = -ENOMEM;
-		goto mapping_exit;
-	}
-
-	new_addr_desc->start = addr;
-	new_addr_desc->size = size;
-	list_add(&new_addr_desc->next, &phys_mappings->addr_desc_list);
-
-mapping_exit:
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-	return ret;
-}
-EXPORT_SYMBOL(mdev_add_phys_mapping);
-
-void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr)
-{
-	struct mdev_phys_mapping *phys_mappings;
-	struct addr_desc *addr_desc;
-
-	if (!mdev)
-		return;
-
-	phys_mappings = &mdev->phys_mappings;
-
-	mutex_lock(&phys_mappings->addr_desc_list_lock);
-	list_for_each_entry(addr_desc, &phys_mappings->addr_desc_list, next) {
-		if (addr_desc->start == addr) {
-			list_del(&addr_desc->next);
-			kfree(addr_desc);
-			break;
-		}
-	}
-	mutex_unlock(&phys_mappings->addr_desc_list_lock);
-}
-EXPORT_SYMBOL(mdev_del_phys_mapping);
-
 void mdev_device_supported_config(struct device *dev, char *str)
 {
 	struct parent_device *parent = dev_to_parent_dev(dev);
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mpci.c
index 502aeb7..ca47676 100644
--- a/drivers/vfio/mdev/vfio_mpci.c
+++ b/drivers/vfio/mdev/vfio_mpci.c
@@ -1,5 +1,5 @@
 /*
- * VFIO based Mediated PCI device driver
+ * VFIO Bus driver for Mediated device
  *
  * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
  *     Author: Neo Jia <cjia@nvidia.com>
@@ -24,453 +24,88 @@
 
 #define DRIVER_VERSION  "0.1"
 #define DRIVER_AUTHOR   "NVIDIA Corporation"
-#define DRIVER_DESC     "VFIO based Mediated PCI device driver"
+#define DRIVER_DESC     "VFIO Bus driver for Mediated device"
 
 struct vfio_mdev {
 	struct iommu_group *group;
 	struct mdev_device *mdev;
-	int		    refcnt;
-	struct vfio_region_info vfio_region_info[VFIO_PCI_NUM_REGIONS];
-	struct mutex	    vfio_mdev_lock;
 };
 
-static int vfio_mpci_open(void *device_data)
+static int vfio_mdev_open(void *device_data)
 {
-	int ret = 0;
-	struct vfio_mdev *vmdev = device_data;
-	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-
 	if (!try_module_get(THIS_MODULE))
 		return -ENODEV;
 
-	mutex_lock(&vmdev->vfio_mdev_lock);
-	if (!vmdev->refcnt && parent->ops->get_region_info) {
-		int index;
-
-		for (index = VFIO_PCI_BAR0_REGION_INDEX;
-		     index < VFIO_PCI_NUM_REGIONS; index++) {
-			ret = parent->ops->get_region_info(vmdev->mdev, index,
-					      &vmdev->vfio_region_info[index]);
-			if (ret)
-				goto open_error;
-		}
-	}
-
-	vmdev->refcnt++;
-
-open_error:
-	mutex_unlock(&vmdev->vfio_mdev_lock);
-	if (ret)
-		module_put(THIS_MODULE);
-
-	return ret;
-}
-
-static void vfio_mpci_close(void *device_data)
-{
-	struct vfio_mdev *vmdev = device_data;
-
-	mutex_lock(&vmdev->vfio_mdev_lock);
-	vmdev->refcnt--;
-	if (!vmdev->refcnt) {
-		memset(&vmdev->vfio_region_info, 0,
-			sizeof(vmdev->vfio_region_info));
-	}
-	mutex_unlock(&vmdev->vfio_mdev_lock);
-	module_put(THIS_MODULE);
-}
-
-static u8 mpci_find_pci_capability(struct mdev_device *mdev, u8 capability)
-{
-	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	u16 status;
-	u8  cap_ptr, cap_id = 0xff;
-
-	parent->ops->read(mdev, (char *)&status, sizeof(status),
-			  pos + PCI_STATUS);
-	if (!(status & PCI_STATUS_CAP_LIST))
-		return 0;
-
-	parent->ops->read(mdev, &cap_ptr, sizeof(cap_ptr),
-			  pos + PCI_CAPABILITY_LIST);
-
-	do {
-		cap_ptr &= 0xfc;
-		parent->ops->read(mdev, &cap_id, sizeof(cap_id),
-				  pos + cap_ptr + PCI_CAP_LIST_ID);
-		if (cap_id == capability)
-			return cap_ptr;
-		parent->ops->read(mdev, &cap_ptr, sizeof(cap_ptr),
-				  pos + cap_ptr + PCI_CAP_LIST_NEXT);
-	} while (cap_ptr && cap_id != 0xff);
-
 	return 0;
 }
 
-static int mpci_get_irq_count(struct vfio_mdev *vmdev, int irq_type)
+static void vfio_mdev_close(void *device_data)
 {
-	loff_t pos = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_CONFIG_REGION_INDEX);
-	struct mdev_device *mdev = vmdev->mdev;
-	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-
-	if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
-		u8 pin;
-
-		parent->ops->read(mdev, &pin, sizeof(pin),
-				  pos + PCI_INTERRUPT_PIN);
-		if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && pin)
-			return 1;
-
-	} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
-		u8 cap_ptr;
-		u16 flags;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_MSI);
-		if (cap_ptr) {
-			parent->ops->read(mdev, (char *)&flags, sizeof(flags),
-					pos + cap_ptr + PCI_MSI_FLAGS);
-			return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1);
-		}
-	} else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
-		u8 cap_ptr;
-		u16 flags;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_MSIX);
-		if (cap_ptr) {
-			parent->ops->read(mdev, (char *)&flags, sizeof(flags),
-					pos + cap_ptr + PCI_MSIX_FLAGS);
-
-			return (flags & PCI_MSIX_FLAGS_QSIZE) + 1;
-		}
-	} else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) {
-		u8 cap_ptr;
-
-		cap_ptr = mpci_find_pci_capability(mdev, PCI_CAP_ID_EXP);
-		if (cap_ptr)
-			return 1;
-	} else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) {
-		return 1;
-	}
-
-	return 0;
+	module_put(THIS_MODULE);
 }
 
-static long vfio_mpci_unlocked_ioctl(void *device_data,
+static long vfio_mdev_unlocked_ioctl(void *device_data,
 				     unsigned int cmd, unsigned long arg)
 {
-	int ret = 0;
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	unsigned long minsz;
-
-	switch (cmd) {
-	case VFIO_DEVICE_GET_INFO:
-	{
-		struct vfio_device_info info;
-
-		minsz = offsetofend(struct vfio_device_info, num_irqs);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		info.flags = VFIO_DEVICE_FLAGS_PCI;
-
-		if (parent->ops->reset)
-			info.flags |= VFIO_DEVICE_FLAGS_RESET;
-
-		info.num_regions = VFIO_PCI_NUM_REGIONS;
-		info.num_irqs = VFIO_PCI_NUM_IRQS;
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_GET_REGION_INFO:
-	{
-		struct vfio_region_info info;
-
-		minsz = offsetofend(struct vfio_region_info, offset);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_CONFIG_REGION_INDEX:
-		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
-			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
-			info.size = vmdev->vfio_region_info[info.index].size;
-			if (!info.size) {
-				info.flags = 0;
-				break;
-			}
-
-			info.flags = vmdev->vfio_region_info[info.index].flags;
-			break;
-		case VFIO_PCI_VGA_REGION_INDEX:
-		case VFIO_PCI_ROM_REGION_INDEX:
-		default:
-			return -EINVAL;
-		}
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_GET_IRQ_INFO:
-	{
-		struct vfio_irq_info info;
-
-		minsz = offsetofend(struct vfio_irq_info, count);
-
-		if (copy_from_user(&info, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
-			return -EINVAL;
-
-		switch (info.index) {
-		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSI_IRQ_INDEX:
-		case VFIO_PCI_REQ_IRQ_INDEX:
-			break;
-			/* pass thru to return error */
-		case VFIO_PCI_MSIX_IRQ_INDEX:
-		default:
-			return -EINVAL;
-		}
-
-		info.flags = VFIO_IRQ_INFO_EVENTFD;
-		info.count = mpci_get_irq_count(vmdev, info.index);
-
-		if (info.count == -1)
-			return -EINVAL;
-
-		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
-			info.flags |= (VFIO_IRQ_INFO_MASKABLE |
-					VFIO_IRQ_INFO_AUTOMASKED);
-		else
-			info.flags |= VFIO_IRQ_INFO_NORESIZE;
-
-		return copy_to_user((void __user *)arg, &info, minsz);
-	}
-	case VFIO_DEVICE_SET_IRQS:
-	{
-		struct vfio_irq_set hdr;
-		u8 *data = NULL, *ptr = NULL;
-
-		minsz = offsetofend(struct vfio_irq_set, count);
-
-		if (copy_from_user(&hdr, (void __user *)arg, minsz))
-			return -EFAULT;
-
-		if (hdr.argsz < minsz || hdr.index >= VFIO_PCI_NUM_IRQS ||
-		    hdr.flags & ~(VFIO_IRQ_SET_DATA_TYPE_MASK |
-				  VFIO_IRQ_SET_ACTION_TYPE_MASK))
-			return -EINVAL;
-
-		if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) {
-			size_t size;
-			int max = mpci_get_irq_count(vmdev, hdr.index);
-
-			if (hdr.flags & VFIO_IRQ_SET_DATA_BOOL)
-				size = sizeof(uint8_t);
-			else if (hdr.flags & VFIO_IRQ_SET_DATA_EVENTFD)
-				size = sizeof(int32_t);
-			else
-				return -EINVAL;
-
-			if (hdr.argsz - minsz < hdr.count * size ||
-			    hdr.start >= max || hdr.start + hdr.count > max)
-				return -EINVAL;
-
-			ptr = data = memdup_user((void __user *)(arg + minsz),
-						 hdr.count * size);
-			if (IS_ERR(data))
-				return PTR_ERR(data);
-		}
-
-		if (parent->ops->set_irqs)
-			ret = parent->ops->set_irqs(mdev, hdr.flags, hdr.index,
-						    hdr.start, hdr.count, data);
-
-		kfree(ptr);
-		return ret;
-	}
-	case VFIO_DEVICE_RESET:
-	{
-		if (parent->ops->reset)
-			return parent->ops->reset(vmdev->mdev);
-
-		return -EINVAL;
-	}
-	}
-	return -ENOTTY;
+
+	if (parent && parent->ops && parent->ops->ioctl)
+		return parent->ops->ioctl(mdev, cmd, arg);
+
+	return -ENODEV;
 }
 
-static ssize_t vfio_mpci_read(void *device_data, char __user *buf,
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
 			      size_t count, loff_t *ppos)
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	int ret = 0;
-
-	if (!count)
-		return 0;
 
-	if (parent->ops->read) {
-		char *ret_data, *ptr;
+	if (parent && parent->ops && parent->ops->read)
+		return parent->ops->read(mdev, buf, count, ppos);
 
-		ptr = ret_data = kzalloc(count, GFP_KERNEL);
-
-		if (!ret_data)
-			return  -ENOMEM;
-
-		ret = parent->ops->read(mdev, ret_data, count, *ppos);
-
-		if (ret > 0) {
-			if (copy_to_user(buf, ret_data, ret))
-				ret = -EFAULT;
-			else
-				*ppos += ret;
-		}
-		kfree(ptr);
-	}
-
-	return ret;
+	return -ENODEV;
 }
 
-static ssize_t vfio_mpci_write(void *device_data, const char __user *buf,
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
 			       size_t count, loff_t *ppos)
 {
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
 	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
-	int ret = 0;
-
-	if (!count)
-		return 0;
-
-	if (parent->ops->write) {
-		char *usr_data, *ptr;
-
-		ptr = usr_data = memdup_user(buf, count);
-		if (IS_ERR(usr_data))
-			return PTR_ERR(usr_data);
-
-		ret = parent->ops->write(mdev, usr_data, count, *ppos);
-
-		if (ret > 0)
-			*ppos += ret;
-
-		kfree(ptr);
-	}
-
-	return ret;
-}
-
-static int mdev_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
-{
-	int ret;
-	struct vfio_mdev *vmdev = vma->vm_private_data;
-	struct mdev_device *mdev;
-	struct parent_device *parent;
-	u64 virtaddr = (u64)vmf->virtual_address;
-	unsigned long req_size, pgoff = 0;
-	pgprot_t pg_prot;
-	unsigned int index;
-
-	if (!vmdev && !vmdev->mdev)
-		return -EINVAL;
-
-	mdev = vmdev->mdev;
-	parent  = dev_to_parent_dev(mdev->dev.parent);
-
-	pg_prot  = vma->vm_page_prot;
-
-	if (parent->ops->validate_map_request) {
-		u64 offset;
-		loff_t pos;
-
-		offset   = virtaddr - vma->vm_start;
-		req_size = vma->vm_end - virtaddr;
-		pos = (vma->vm_pgoff << PAGE_SHIFT) + offset;
-
-		ret = parent->ops->validate_map_request(mdev, pos, &virtaddr,
-						&pgoff, &req_size, &pg_prot);
-		if (ret)
-			return ret;
-
-		/*
-		 * Verify pgoff and req_size are valid and virtaddr is within
-		 * vma range
-		 */
-		if (!pgoff || !req_size || (virtaddr < vma->vm_start) ||
-		    ((virtaddr + req_size) >= vma->vm_end))
-			return -EINVAL;
-	} else {
-		struct pci_dev *pdev;
-
-		virtaddr = vma->vm_start;
-		req_size = vma->vm_end - vma->vm_start;
-
-		pdev = to_pci_dev(parent->dev.parent);
-		index = VFIO_PCI_OFFSET_TO_INDEX(vma->vm_pgoff << PAGE_SHIFT);
-		pgoff = pci_resource_start(pdev, index) >> PAGE_SHIFT;
-	}
-
-	ret = remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);
-
-	return ret | VM_FAULT_NOPAGE;
-}
 
-void mdev_dev_mmio_close(struct vm_area_struct *vma)
-{
-	struct vfio_mdev *vmdev = vma->vm_private_data;
-	struct mdev_device *mdev = vmdev->mdev;
+	if (parent && parent->ops && parent->ops->write)
+		return parent->ops->write(mdev, buf, count, ppos);
 
-	mdev_del_phys_mapping(mdev, vma->vm_pgoff << PAGE_SHIFT);
+	return -ENODEV;
 }
 
-static const struct vm_operations_struct mdev_dev_mmio_ops = {
-	.fault = mdev_dev_mmio_fault,
-	.close = mdev_dev_mmio_close,
-};
-
-static int vfio_mpci_mmap(void *device_data, struct vm_area_struct *vma)
+static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
 {
-	unsigned int index;
 	struct vfio_mdev *vmdev = device_data;
 	struct mdev_device *mdev = vmdev->mdev;
+	struct parent_device *parent = dev_to_parent_dev(mdev->dev.parent);
 
-	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
-
-	if (index >= VFIO_PCI_ROM_REGION_INDEX)
-		return -EINVAL;
-
-	vma->vm_private_data = vmdev;
-	vma->vm_ops = &mdev_dev_mmio_ops;
+	if (parent && parent->ops && parent->ops->mmap)
+		return parent->ops->mmap(mdev, vma);
 
-	return mdev_add_phys_mapping(mdev, vma->vm_file->f_mapping,
-				     vma->vm_pgoff << PAGE_SHIFT,
-				     vma->vm_end - vma->vm_start);
+	return -ENODEV;
 }
 
-static const struct vfio_device_ops vfio_mpci_dev_ops = {
-	.name		= "vfio-mpci",
-	.open		= vfio_mpci_open,
-	.release	= vfio_mpci_close,
-	.ioctl		= vfio_mpci_unlocked_ioctl,
-	.read		= vfio_mpci_read,
-	.write		= vfio_mpci_write,
-	.mmap		= vfio_mpci_mmap,
+static const struct vfio_device_ops vfio_mdev_dev_ops = {
+	.name		= "vfio-mdev",
+	.open		= vfio_mdev_open,
+	.release	= vfio_mdev_close,
+	.ioctl		= vfio_mdev_unlocked_ioctl,
+	.read		= vfio_mdev_read,
+	.write		= vfio_mdev_write,
+	.mmap		= vfio_mdev_mmap,
 };
 
-int vfio_mpci_probe(struct device *dev)
+static int vfio_mdev_probe(struct device *dev)
 {
 	struct vfio_mdev *vmdev;
 	struct mdev_device *mdev = to_mdev_device(dev);
@@ -482,16 +117,15 @@ int vfio_mpci_probe(struct device *dev)
 
 	vmdev->mdev = mdev;
 	vmdev->group = mdev->group;
-	mutex_init(&vmdev->vfio_mdev_lock);
 
-	ret = vfio_add_group_dev(dev, &vfio_mpci_dev_ops, vmdev);
+	ret = vfio_add_group_dev(dev, &vfio_mdev_dev_ops, vmdev);
 	if (ret)
 		kfree(vmdev);
 
 	return ret;
 }
 
-void vfio_mpci_remove(struct device *dev)
+static void vfio_mdev_remove(struct device *dev)
 {
 	struct vfio_mdev *vmdev;
 
@@ -499,33 +133,34 @@ void vfio_mpci_remove(struct device *dev)
 	kfree(vmdev);
 }
 
-int vfio_mpci_match(struct device *dev)
+/*
+ * Having one single driver to match all available mdev devices is enough,
+ * for every possible type.
+ */
+static int vfio_mdev_match(struct device *dev)
 {
-	if (dev_is_pci(dev->parent))
-		return 1;
-
-	return 0;
+	return 1;
 }
 
-struct mdev_driver vfio_mpci_driver = {
-	.name	= "vfio_mpci",
-	.probe	= vfio_mpci_probe,
-	.remove	= vfio_mpci_remove,
-	.match	= vfio_mpci_match,
+static struct mdev_driver vfio_mdev_driver = {
+	.name	= "vfio_mdev",
+	.probe	= vfio_mdev_probe,
+	.remove	= vfio_mdev_remove,
+	.match	= vfio_mdev_match,
 };
 
-static int __init vfio_mpci_init(void)
+static int __init vfio_mdev_init(void)
 {
-	return mdev_register_driver(&vfio_mpci_driver, THIS_MODULE);
+	return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
 }
 
-static void __exit vfio_mpci_exit(void)
+static void __exit vfio_mdev_exit(void)
 {
-	mdev_unregister_driver(&vfio_mpci_driver);
+	mdev_unregister_driver(&vfio_mdev_driver);
 }
 
-module_init(vfio_mpci_init)
-module_exit(vfio_mpci_exit)
+module_init(vfio_mdev_init)
+module_exit(vfio_mdev_exit)
 
 MODULE_VERSION(DRIVER_VERSION);
 MODULE_LICENSE("GPL");
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 4b69e09..2128de8 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -19,6 +19,12 @@
 #ifndef VFIO_PCI_PRIVATE_H
 #define VFIO_PCI_PRIVATE_H
 
+#define VFIO_PCI_OFFSET_SHIFT   40
+
+#define VFIO_PCI_OFFSET_TO_INDEX(off)	(off >> VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
+#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
+
 /* Special capability IDs predefined access */
 #define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
 #define PCI_CAP_ID_INVALID_VIRT		0xFE	/* default virt access */
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
index c06967d..f99c9e2 100644
--- a/include/linux/mdev.h
+++ b/include/linux/mdev.h
@@ -15,33 +15,17 @@
 
 #include <uapi/linux/vfio.h>
 
-struct parent_device;
 
 /*
  * Mediated device
  */
-
-struct addr_desc {
-	unsigned long start;
-	unsigned long size;
-	struct list_head next;
-};
-
-struct mdev_phys_mapping {
-	struct address_space *mapping;
-	struct list_head addr_desc_list;
-	struct mutex addr_desc_list_lock;
-};
-
+struct parent_device;
 struct mdev_device {
 	struct device		dev;
 	struct iommu_group	*group;
 	uuid_le			uuid;
 	uint32_t		instance;
 	void			*driver_data;
-
-	/* internal only */
-	struct mdev_phys_mapping phys_mappings;
 };
 
 
@@ -96,19 +80,7 @@ struct mdev_device {
  *			@count: number of bytes to be written
  *			@pos: address.
  *			Retuns number on bytes written on success or error.
- * @set_irqs:		Called to send about interrupts configuration
- *			information that VMM sets.
- *			@mdev: mediated device structure
- *			@flags, index, start, count and *data : same as that of
- *			struct vfio_irq_set of VFIO_DEVICE_SET_IRQS API.
- * @get_region_info:	Called to get VFIO region size and flags of mediated
- *			device.
- *			@mdev: mediated device structure
- *			@region_index: VFIO region index
- *			@region_info: output, returns size and flags of
- *				      requested region.
- *			Returns integer: success (0) or error (< 0)
- * @validate_map_request: Validate remap pfn request
+ * @mmap:		Memory Map
  *			@mdev: mediated device structure
  *			@pos: address
  *			@virtaddr: target user address to start at. Vendor
@@ -133,21 +105,18 @@ struct parent_ops {
 	int	(*supported_config)(struct device *dev, char *config);
 	int     (*create)(struct mdev_device *mdev, char *mdev_params);
 	int     (*destroy)(struct mdev_device *mdev);
+
 	int     (*reset)(struct mdev_device *mdev);
 	int     (*start)(struct mdev_device *mdev);
 	int     (*stop)(struct mdev_device *mdev);
-	ssize_t (*read)(struct mdev_device *mdev, char *buf, size_t count,
-			loff_t pos);
-	ssize_t (*write)(struct mdev_device *mdev, char *buf, size_t count,
-			 loff_t pos);
-	int     (*set_irqs)(struct mdev_device *mdev, uint32_t flags,
-			    unsigned int index, unsigned int start,
-			    unsigned int count, void *data);
-	int	(*get_region_info)(struct mdev_device *mdev, int region_index,
-				   struct vfio_region_info *region_info);
-	int	(*validate_map_request)(struct mdev_device *mdev, loff_t pos,
-					u64 *virtaddr, unsigned long *pfn,
-					unsigned long *size, pgprot_t *prot);
+
+	ssize_t (*read)(struct mdev_device *mdev, char __user *buf, size_t count,
+			loff_t *pos);
+	ssize_t (*write)(struct mdev_device *mdev, const char __user *buf,
+			size_t count, loff_t *pos);
+	int	(*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+	long	(*ioctl)(struct mdev_device *mdev, unsigned int cmd,
+			unsigned long arg);
 };
 
 /*
@@ -209,13 +178,4 @@ extern void mdev_unregister_driver(struct mdev_driver *drv);
 
 extern struct mdev_device *mdev_get_device_by_group(struct iommu_group *group);
 
-extern int mdev_device_invalidate_mapping(struct mdev_device *mdev,
-					unsigned long addr, unsigned long size);
-
-extern int mdev_add_phys_mapping(struct mdev_device *mdev,
-				 struct address_space *mapping,
-				 unsigned long addr, unsigned long size);
-
-
-extern void mdev_del_phys_mapping(struct mdev_device *mdev, unsigned long addr);
 #endif /* MDEV_H */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index abae882..f2f0daf 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,13 +19,6 @@
 #include <uapi/linux/vfio.h>
 #include <linux/mdev.h>
 
-#define VFIO_PCI_OFFSET_SHIFT   40
-
-#define VFIO_PCI_OFFSET_TO_INDEX(off)   (off >> VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
-#define VFIO_PCI_OFFSET_MASK    (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
-
-
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
  *
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v6-based v1 5/5] rename 'vfio_mpci' to 'vfio_mdev' since it is now BUS-agnostic
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-16  8:14   ` Jike Song
  -1 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/Kconfig                      | 10 +++++-----
 drivers/vfio/mdev/Makefile                     |  5 ++---
 drivers/vfio/mdev/{vfio_mpci.c => vfio_mdev.c} |  0
 3 files changed, 7 insertions(+), 8 deletions(-)
 rename drivers/vfio/mdev/{vfio_mpci.c => vfio_mdev.c} (100%)

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 431ed59..b2fe0c6 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -1,5 +1,5 @@
 
-config VFIO_MDEV
+config MDEV
     tristate "Mediated device driver framework"
     depends on VFIO
     default n
@@ -9,10 +9,10 @@ config VFIO_MDEV
 
         If you don't know what do here, say N.
 
-config VFIO_MPCI
-    tristate "VFIO support for Mediated PCI devices"
-    depends on VFIO && PCI && VFIO_MDEV
+config VFIO_MDEV
+    tristate "VFIO Bus driver for Mediated devices"
+    depends on VFIO && MDEV
     default n
     help
-        VFIO based driver for mediated PCI devices.
+        VFIO Bus driver for mediated devices.
 
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 264fb03..ee9f89f 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -1,6 +1,5 @@
 
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
-obj-$(CONFIG_VFIO_MDEV) += mdev.o
-obj-$(CONFIG_VFIO_MPCI) += vfio_mpci.o
-
+obj-$(CONFIG_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mdev.c
similarity index 100%
rename from drivers/vfio/mdev/vfio_mpci.c
rename to drivers/vfio/mdev/vfio_mdev.c
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [RFC v6-based v1 5/5] rename 'vfio_mpci' to 'vfio_mdev' since it is now BUS-agnostic
@ 2016-08-16  8:14   ` Jike Song
  0 siblings, 0 replies; 22+ messages in thread
From: Jike Song @ 2016-08-16  8:14 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	jike.song, pbonzini, kraxel

Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/Kconfig                      | 10 +++++-----
 drivers/vfio/mdev/Makefile                     |  5 ++---
 drivers/vfio/mdev/{vfio_mpci.c => vfio_mdev.c} |  0
 3 files changed, 7 insertions(+), 8 deletions(-)
 rename drivers/vfio/mdev/{vfio_mpci.c => vfio_mdev.c} (100%)

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index 431ed59..b2fe0c6 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -1,5 +1,5 @@
 
-config VFIO_MDEV
+config MDEV
     tristate "Mediated device driver framework"
     depends on VFIO
     default n
@@ -9,10 +9,10 @@ config VFIO_MDEV
 
         If you don't know what do here, say N.
 
-config VFIO_MPCI
-    tristate "VFIO support for Mediated PCI devices"
-    depends on VFIO && PCI && VFIO_MDEV
+config VFIO_MDEV
+    tristate "VFIO Bus driver for Mediated devices"
+    depends on VFIO && MDEV
     default n
     help
-        VFIO based driver for mediated PCI devices.
+        VFIO Bus driver for mediated devices.
 
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 264fb03..ee9f89f 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -1,6 +1,5 @@
 
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
-obj-$(CONFIG_VFIO_MDEV) += mdev.o
-obj-$(CONFIG_VFIO_MPCI) += vfio_mpci.o
-
+obj-$(CONFIG_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/vfio_mpci.c b/drivers/vfio/mdev/vfio_mdev.c
similarity index 100%
rename from drivers/vfio/mdev/vfio_mpci.c
rename to drivers/vfio/mdev/vfio_mdev.c
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC v6-based v1 0/5] refine mdev framework
  2016-08-16  8:14 ` [Qemu-devel] " Jike Song
@ 2016-08-17  8:58   ` Dong Jia
  -1 siblings, 0 replies; 22+ messages in thread
From: Dong Jia @ 2016-08-17  8:58 UTC (permalink / raw)
  To: Jike Song
  Cc: kevin.tian, cjia, kvm, kwankhede, zhenyuw, qemu-devel,
	alex.williamson, kraxel, pbonzini, guangrong.xiao, Dong Jia

On Tue, 16 Aug 2016 16:14:12 +0800
Jike Song <jike.song@intel.com> wrote:

> 
> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> 
> 	http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Background:
> 
> 	The patchset from NVidia introduced the Mediated Device support to
> 	Linux/VFIO. With that series, one can create virtual devices (supporting
> 	by underlying physical device and vendor driver), and assign them to
> 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> 
> 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> 	implemented its CCW pass-through.  However, there are limitations
> 	imposed by current (v6 in particular) mdev framework: the mdev must be
> 	represented as a PCI device, several vfio capabilities such as
> 	sparse mmap are not possible, and so forth.
> 
> 	This series aims to address above limitations and simplify the implementation.
> 
> 
> Key Changes:
> 
> 	- An independent "struct device" was introduced to parent_device, thus
> 	  a hierarchy in driver core is formed with physical device, parent device
> 	  and mdev device;
> 
> 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> 	  is now safe to remove all refcnts and locks;
> 
> 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> 	  PCI-specific logic was removed, accesses from userspace are now
> 	  passed to vendor driver directly, thus guaranteed that full VFIO
> 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> 
> 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> 	  driver for all mdev devices;

Hi Jike:

I don't know what happened, but finding out which direction this will
likely go seems my first priority now...

I'd say, either with only the original mdev v6, or patched this series,
vfio-ccw could live. But this series saves my work of mimicing the
vfio-mpci code in my vfio-mccw driver. I like this incremental patches.

I'm wondering if Alex and the Nvidia folks have some comments for this
from their point of views. And I'm really looking forward on their
feedback.

Thanks!

> 
> 	- UUID was removed from the interface between mdev and vendor driver;
> 
> 
> TODO
> 
> 	remove mdev stuff from vfio.h
> 	update doc
> 


--------
Dong Jia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-17  8:58   ` Dong Jia
  0 siblings, 0 replies; 22+ messages in thread
From: Dong Jia @ 2016-08-17  8:58 UTC (permalink / raw)
  To: Jike Song
  Cc: alex.williamson, kwankhede, cjia, qemu-devel, kvm, kevin.tian,
	guangrong.xiao, zhenyuw, pbonzini, kraxel, Dong Jia

On Tue, 16 Aug 2016 16:14:12 +0800
Jike Song <jike.song@intel.com> wrote:

> 
> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> 
> 	http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Background:
> 
> 	The patchset from NVidia introduced the Mediated Device support to
> 	Linux/VFIO. With that series, one can create virtual devices (supporting
> 	by underlying physical device and vendor driver), and assign them to
> 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> 
> 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> 	implemented its CCW pass-through.  However, there are limitations
> 	imposed by current (v6 in particular) mdev framework: the mdev must be
> 	represented as a PCI device, several vfio capabilities such as
> 	sparse mmap are not possible, and so forth.
> 
> 	This series aims to address above limitations and simplify the implementation.
> 
> 
> Key Changes:
> 
> 	- An independent "struct device" was introduced to parent_device, thus
> 	  a hierarchy in driver core is formed with physical device, parent device
> 	  and mdev device;
> 
> 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> 	  is now safe to remove all refcnts and locks;
> 
> 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> 	  PCI-specific logic was removed, accesses from userspace are now
> 	  passed to vendor driver directly, thus guaranteed that full VFIO
> 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> 
> 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> 	  driver for all mdev devices;

Hi Jike:

I don't know what happened, but finding out which direction this will
likely go seems my first priority now...

I'd say, either with only the original mdev v6, or patched this series,
vfio-ccw could live. But this series saves my work of mimicing the
vfio-mpci code in my vfio-mccw driver. I like this incremental patches.

I'm wondering if Alex and the Nvidia folks have some comments for this
from their point of views. And I'm really looking forward on their
feedback.

Thanks!

> 
> 	- UUID was removed from the interface between mdev and vendor driver;
> 
> 
> TODO
> 
> 	remove mdev stuff from vfio.h
> 	update doc
> 


--------
Dong Jia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v6-based v1 0/5] refine mdev framework
  2016-08-17  8:58   ` [Qemu-devel] " Dong Jia
@ 2016-08-17 10:09     ` Neo Jia
  -1 siblings, 0 replies; 22+ messages in thread
From: Neo Jia @ 2016-08-17 10:09 UTC (permalink / raw)
  To: Dong Jia
  Cc: Jike Song, alex.williamson, kwankhede, qemu-devel, kvm,
	kevin.tian, guangrong.xiao, zhenyuw, pbonzini, kraxel

On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:
> On Tue, 16 Aug 2016 16:14:12 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
> > 
> > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > 
> > 	http://www.spinics.net/lists/kvm/msg136472.html
> > 
> > 
> > Background:
> > 
> > 	The patchset from NVidia introduced the Mediated Device support to
> > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > 	by underlying physical device and vendor driver), and assign them to
> > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > 
> > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > 	implemented its CCW pass-through.  However, there are limitations
> > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > 	represented as a PCI device, several vfio capabilities such as
> > 	sparse mmap are not possible, and so forth.
> > 
> > 	This series aims to address above limitations and simplify the implementation.
> > 
> > 
> > Key Changes:
> > 
> > 	- An independent "struct device" was introduced to parent_device, thus
> > 	  a hierarchy in driver core is formed with physical device, parent device
> > 	  and mdev device;
> > 
> > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > 	  is now safe to remove all refcnts and locks;
> > 
> > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > 	  PCI-specific logic was removed, accesses from userspace are now
> > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > 
> > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > 	  driver for all mdev devices;
> 
> Hi Jike:
> 
> I don't know what happened, but finding out which direction this will
> likely go seems my first priority now...

Hi Dong,

Just want to let you know that we are preparing the v7 patches to incorporate
the latest review comments from Intel folks and Alex, for some changes in this
patch set also mentioned in the recent review are already queued up in the new
version.

> 
> I'd say, either with only the original mdev v6, or patched this series,
> vfio-ccw could live. But this series saves my work of mimicing the
> vfio-mpci code in my vfio-mccw driver. I like this incremental patches.

Thanks for sharing your progress and good to know our current v6 solution works 
for you. We are still evaluating the vfio_mdev changes here as I still prefer to
share general VFIO pci handling inside a common VFIO PCI driver, and the
modularization will reduce the impact of future changes and potential regressions
cross architectures - between PCI and CCW.

Thanks,
Neo

> 
> I'm wondering if Alex and the Nvidia folks have some comments for this
> from their point of views. And I'm really looking forward on their
> feedback.
> 
> Thanks!
> 
> > 
> > 	- UUID was removed from the interface between mdev and vendor driver;
> > 
> > 
> > TODO
> > 
> > 	remove mdev stuff from vfio.h
> > 	update doc
> > 
> 
> 
> --------
> Dong Jia
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-17 10:09     ` Neo Jia
  0 siblings, 0 replies; 22+ messages in thread
From: Neo Jia @ 2016-08-17 10:09 UTC (permalink / raw)
  To: Dong Jia
  Cc: Jike Song, alex.williamson, kwankhede, qemu-devel, kvm,
	kevin.tian, guangrong.xiao, zhenyuw, pbonzini, kraxel

On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:
> On Tue, 16 Aug 2016 16:14:12 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
> > 
> > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > 
> > 	http://www.spinics.net/lists/kvm/msg136472.html
> > 
> > 
> > Background:
> > 
> > 	The patchset from NVidia introduced the Mediated Device support to
> > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > 	by underlying physical device and vendor driver), and assign them to
> > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > 
> > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > 	implemented its CCW pass-through.  However, there are limitations
> > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > 	represented as a PCI device, several vfio capabilities such as
> > 	sparse mmap are not possible, and so forth.
> > 
> > 	This series aims to address above limitations and simplify the implementation.
> > 
> > 
> > Key Changes:
> > 
> > 	- An independent "struct device" was introduced to parent_device, thus
> > 	  a hierarchy in driver core is formed with physical device, parent device
> > 	  and mdev device;
> > 
> > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > 	  is now safe to remove all refcnts and locks;
> > 
> > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > 	  PCI-specific logic was removed, accesses from userspace are now
> > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > 
> > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > 	  driver for all mdev devices;
> 
> Hi Jike:
> 
> I don't know what happened, but finding out which direction this will
> likely go seems my first priority now...

Hi Dong,

Just want to let you know that we are preparing the v7 patches to incorporate
the latest review comments from Intel folks and Alex, for some changes in this
patch set also mentioned in the recent review are already queued up in the new
version.

> 
> I'd say, either with only the original mdev v6, or patched this series,
> vfio-ccw could live. But this series saves my work of mimicing the
> vfio-mpci code in my vfio-mccw driver. I like this incremental patches.

Thanks for sharing your progress and good to know our current v6 solution works 
for you. We are still evaluating the vfio_mdev changes here as I still prefer to
share general VFIO pci handling inside a common VFIO PCI driver, and the
modularization will reduce the impact of future changes and potential regressions
cross architectures - between PCI and CCW.

Thanks,
Neo

> 
> I'm wondering if Alex and the Nvidia folks have some comments for this
> from their point of views. And I'm really looking forward on their
> feedback.
> 
> Thanks!
> 
> > 
> > 	- UUID was removed from the interface between mdev and vendor driver;
> > 
> > 
> > TODO
> > 
> > 	remove mdev stuff from vfio.h
> > 	update doc
> > 
> 
> 
> --------
> Dong Jia
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v6-based v1 0/5] refine mdev framework
  2016-08-17 10:09     ` [Qemu-devel] " Neo Jia
@ 2016-08-18  8:42       ` Dong Jia
  -1 siblings, 0 replies; 22+ messages in thread
From: Dong Jia @ 2016-08-18  8:42 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, alex.williamson, kwankhede, qemu-devel, kvm,
	kevin.tian, guangrong.xiao, zhenyuw, pbonzini, kraxel, Dong Jia

On Wed, 17 Aug 2016 03:09:10 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:
> > On Tue, 16 Aug 2016 16:14:12 +0800
> > Jike Song <jike.song@intel.com> wrote:
> > 
> > > 
> > > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > > 
> > > 	http://www.spinics.net/lists/kvm/msg136472.html
> > > 
> > > 
> > > Background:
> > > 
> > > 	The patchset from NVidia introduced the Mediated Device support to
> > > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > > 	by underlying physical device and vendor driver), and assign them to
> > > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > > 
> > > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > > 	implemented its CCW pass-through.  However, there are limitations
> > > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > > 	represented as a PCI device, several vfio capabilities such as
> > > 	sparse mmap are not possible, and so forth.
> > > 
> > > 	This series aims to address above limitations and simplify the implementation.
> > > 
> > > 
> > > Key Changes:
> > > 
> > > 	- An independent "struct device" was introduced to parent_device, thus
> > > 	  a hierarchy in driver core is formed with physical device, parent device
> > > 	  and mdev device;
> > > 
> > > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > > 	  is now safe to remove all refcnts and locks;
> > > 
> > > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > > 	  PCI-specific logic was removed, accesses from userspace are now
> > > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > > 
> > > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > > 	  driver for all mdev devices;
> > 
> > Hi Jike:
> > 
> > I don't know what happened, but finding out which direction this will
> > likely go seems my first priority now...
> 
> Hi Dong,
> 
> Just want to let you know that we are preparing the v7 patches to incorporate
> the latest review comments from Intel folks and Alex, for some changes in this
> patch set also mentioned in the recent review are already queued up in the new
> version.
Hi Neo,

Good to know this. :>

> 
> > 
> > I'd say, either with only the original mdev v6, or patched this series,
> > vfio-ccw could live. But this series saves my work of mimicing the
> > vfio-mpci code in my vfio-mccw driver. I like this incremental patches.
> 
> Thanks for sharing your progress and good to know our current v6 solution works 
> for you. We are still evaluating the vfio_mdev changes here as I still prefer to
> share general VFIO pci handling inside a common VFIO PCI driver, and the
> modularization will reduce the impact of future changes and potential regressions
> cross architectures - between PCI and CCW.
If this is something that Alex and the Intel folks are fine with, I have
no problem with this too. Thanks,

> 
> Thanks,
> Neo

[...]

--------
Dong Jia


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-18  8:42       ` Dong Jia
  0 siblings, 0 replies; 22+ messages in thread
From: Dong Jia @ 2016-08-18  8:42 UTC (permalink / raw)
  To: Neo Jia
  Cc: Jike Song, alex.williamson, kwankhede, qemu-devel, kvm,
	kevin.tian, guangrong.xiao, zhenyuw, pbonzini, kraxel, Dong Jia

On Wed, 17 Aug 2016 03:09:10 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:
> > On Tue, 16 Aug 2016 16:14:12 +0800
> > Jike Song <jike.song@intel.com> wrote:
> > 
> > > 
> > > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > > 
> > > 	http://www.spinics.net/lists/kvm/msg136472.html
> > > 
> > > 
> > > Background:
> > > 
> > > 	The patchset from NVidia introduced the Mediated Device support to
> > > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > > 	by underlying physical device and vendor driver), and assign them to
> > > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > > 
> > > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > > 	implemented its CCW pass-through.  However, there are limitations
> > > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > > 	represented as a PCI device, several vfio capabilities such as
> > > 	sparse mmap are not possible, and so forth.
> > > 
> > > 	This series aims to address above limitations and simplify the implementation.
> > > 
> > > 
> > > Key Changes:
> > > 
> > > 	- An independent "struct device" was introduced to parent_device, thus
> > > 	  a hierarchy in driver core is formed with physical device, parent device
> > > 	  and mdev device;
> > > 
> > > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > > 	  is now safe to remove all refcnts and locks;
> > > 
> > > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > > 	  PCI-specific logic was removed, accesses from userspace are now
> > > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > > 
> > > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > > 	  driver for all mdev devices;
> > 
> > Hi Jike:
> > 
> > I don't know what happened, but finding out which direction this will
> > likely go seems my first priority now...
> 
> Hi Dong,
> 
> Just want to let you know that we are preparing the v7 patches to incorporate
> the latest review comments from Intel folks and Alex, for some changes in this
> patch set also mentioned in the recent review are already queued up in the new
> version.
Hi Neo,

Good to know this. :>

> 
> > 
> > I'd say, either with only the original mdev v6, or patched this series,
> > vfio-ccw could live. But this series saves my work of mimicing the
> > vfio-mpci code in my vfio-mccw driver. I like this incremental patches.
> 
> Thanks for sharing your progress and good to know our current v6 solution works 
> for you. We are still evaluating the vfio_mdev changes here as I still prefer to
> share general VFIO pci handling inside a common VFIO PCI driver, and the
> modularization will reduce the impact of future changes and potential regressions
> cross architectures - between PCI and CCW.
If this is something that Alex and the Intel folks are fine with, I have
no problem with this too. Thanks,

> 
> Thanks,
> Neo

[...]

--------
Dong Jia

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v6-based v1 0/5] refine mdev framework
  2016-08-18  8:42       ` [Qemu-devel] " Dong Jia
@ 2016-08-18 18:25         ` Alex Williamson
  -1 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2016-08-18 18:25 UTC (permalink / raw)
  To: Dong Jia
  Cc: kevin.tian, Neo Jia, kvm, Jike Song, zhenyuw, qemu-devel,
	kwankhede, kraxel, pbonzini, guangrong.xiao

On Thu, 18 Aug 2016 16:42:14 +0800
Dong Jia <bjsdjshi@linux.vnet.ibm.com> wrote:

> On Wed, 17 Aug 2016 03:09:10 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
> > On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:  
> > > On Tue, 16 Aug 2016 16:14:12 +0800
> > > Jike Song <jike.song@intel.com> wrote:
> > >   
> > > > 
> > > > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > > > 
> > > > 	http://www.spinics.net/lists/kvm/msg136472.html
> > > > 
> > > > 
> > > > Background:
> > > > 
> > > > 	The patchset from NVidia introduced the Mediated Device support to
> > > > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > > > 	by underlying physical device and vendor driver), and assign them to
> > > > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > > > 
> > > > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > > > 	implemented its CCW pass-through.  However, there are limitations
> > > > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > > > 	represented as a PCI device, several vfio capabilities such as
> > > > 	sparse mmap are not possible, and so forth.
> > > > 
> > > > 	This series aims to address above limitations and simplify the implementation.
> > > > 
> > > > 
> > > > Key Changes:
> > > > 
> > > > 	- An independent "struct device" was introduced to parent_device, thus
> > > > 	  a hierarchy in driver core is formed with physical device, parent device
> > > > 	  and mdev device;
> > > > 
> > > > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > > > 	  is now safe to remove all refcnts and locks;
> > > > 
> > > > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > > > 	  PCI-specific logic was removed, accesses from userspace are now
> > > > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > > > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > > > 
> > > > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > > > 	  driver for all mdev devices;  
> > > 
> > > Hi Jike:
> > > 
> > > I don't know what happened, but finding out which direction this will
> > > likely go seems my first priority now...  
> > 
> > Hi Dong,
> > 
> > Just want to let you know that we are preparing the v7 patches to incorporate
> > the latest review comments from Intel folks and Alex, for some changes in this
> > patch set also mentioned in the recent review are already queued up in the new
> > version.  
> Hi Neo,
> 
> Good to know this. :>
> 
> >   
> > > 
> > > I'd say, either with only the original mdev v6, or patched this series,
> > > vfio-ccw could live. But this series saves my work of mimicing the
> > > vfio-mpci code in my vfio-mccw driver. I like this incremental patches.  
> > 
> > Thanks for sharing your progress and good to know our current v6 solution works 
> > for you. We are still evaluating the vfio_mdev changes here as I still prefer to
> > share general VFIO pci handling inside a common VFIO PCI driver, and the
> > modularization will reduce the impact of future changes and potential regressions
> > cross architectures - between PCI and CCW.  
> If this is something that Alex and the Intel folks are fine with, I have
> no problem with this too. Thanks,

Overall, I like this a lot.  Creating a proper device hierarchy and
letting the driver core manage the references makes a lot of sense and
the reduction in code volume and complexity speaks for itself.  I like
how the PCI mdev layer goes away, we're not imposing arbitrary
restrictions on the vendor driver in an attempt to insert a common
layer.  We can add helpers for things that do end up being common as we
go.  Using devices rather than uuids for functions is a big
improvement.  I hope that Neo and Kirti will incorporate many of these
changes in their next revision.  Thanks for stepping in with this,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-18 18:25         ` Alex Williamson
  0 siblings, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2016-08-18 18:25 UTC (permalink / raw)
  To: Dong Jia
  Cc: Neo Jia, Jike Song, kwankhede, qemu-devel, kvm, kevin.tian,
	guangrong.xiao, zhenyuw, pbonzini, kraxel

On Thu, 18 Aug 2016 16:42:14 +0800
Dong Jia <bjsdjshi@linux.vnet.ibm.com> wrote:

> On Wed, 17 Aug 2016 03:09:10 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
> > On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:  
> > > On Tue, 16 Aug 2016 16:14:12 +0800
> > > Jike Song <jike.song@intel.com> wrote:
> > >   
> > > > 
> > > > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > > > 
> > > > 	http://www.spinics.net/lists/kvm/msg136472.html
> > > > 
> > > > 
> > > > Background:
> > > > 
> > > > 	The patchset from NVidia introduced the Mediated Device support to
> > > > 	Linux/VFIO. With that series, one can create virtual devices (supporting
> > > > 	by underlying physical device and vendor driver), and assign them to
> > > > 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
> > > > 
> > > > 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
> > > > 	implemented its CCW pass-through.  However, there are limitations
> > > > 	imposed by current (v6 in particular) mdev framework: the mdev must be
> > > > 	represented as a PCI device, several vfio capabilities such as
> > > > 	sparse mmap are not possible, and so forth.
> > > > 
> > > > 	This series aims to address above limitations and simplify the implementation.
> > > > 
> > > > 
> > > > Key Changes:
> > > > 
> > > > 	- An independent "struct device" was introduced to parent_device, thus
> > > > 	  a hierarchy in driver core is formed with physical device, parent device
> > > > 	  and mdev device;
> > > > 
> > > > 	- Leveraging the mechanism and APIs provided by Linux driver core, it
> > > > 	  is now safe to remove all refcnts and locks;
> > > > 
> > > > 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
> > > > 	  PCI-specific logic was removed, accesses from userspace are now
> > > > 	  passed to vendor driver directly, thus guaranteed that full VFIO
> > > > 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
> > > > 
> > > > 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
> > > > 	  driver for all mdev devices;  
> > > 
> > > Hi Jike:
> > > 
> > > I don't know what happened, but finding out which direction this will
> > > likely go seems my first priority now...  
> > 
> > Hi Dong,
> > 
> > Just want to let you know that we are preparing the v7 patches to incorporate
> > the latest review comments from Intel folks and Alex, for some changes in this
> > patch set also mentioned in the recent review are already queued up in the new
> > version.  
> Hi Neo,
> 
> Good to know this. :>
> 
> >   
> > > 
> > > I'd say, either with only the original mdev v6, or patched this series,
> > > vfio-ccw could live. But this series saves my work of mimicing the
> > > vfio-mpci code in my vfio-mccw driver. I like this incremental patches.  
> > 
> > Thanks for sharing your progress and good to know our current v6 solution works 
> > for you. We are still evaluating the vfio_mdev changes here as I still prefer to
> > share general VFIO pci handling inside a common VFIO PCI driver, and the
> > modularization will reduce the impact of future changes and potential regressions
> > cross architectures - between PCI and CCW.  
> If this is something that Alex and the Intel folks are fine with, I have
> no problem with this too. Thanks,

Overall, I like this a lot.  Creating a proper device hierarchy and
letting the driver core manage the references makes a lot of sense and
the reduction in code volume and complexity speaks for itself.  I like
how the PCI mdev layer goes away, we're not imposing arbitrary
restrictions on the vendor driver in an attempt to insert a common
layer.  We can add helpers for things that do end up being common as we
go.  Using devices rather than uuids for functions is a big
improvement.  I hope that Neo and Kirti will incorporate many of these
changes in their next revision.  Thanks for stepping in with this,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v6-based v1 0/5] refine mdev framework
  2016-08-18 18:25         ` [Qemu-devel] " Alex Williamson
@ 2016-08-19 17:29           ` Kirti Wankhede
  -1 siblings, 0 replies; 22+ messages in thread
From: Kirti Wankhede @ 2016-08-19 17:29 UTC (permalink / raw)
  To: Alex Williamson, Dong Jia
  Cc: Neo Jia, Jike Song, qemu-devel, kvm, kevin.tian, guangrong.xiao,
	zhenyuw, pbonzini, kraxel



On 8/18/2016 11:55 PM, Alex Williamson wrote:
> On Thu, 18 Aug 2016 16:42:14 +0800
> Dong Jia <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
>> On Wed, 17 Aug 2016 03:09:10 -0700
>> Neo Jia <cjia@nvidia.com> wrote:
>>
>>> On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:  
>>>> On Tue, 16 Aug 2016 16:14:12 +0800
>>>> Jike Song <jike.song@intel.com> wrote:
>>>>   
>>>>>
>>>>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>>>>
>>>>> 	http://www.spinics.net/lists/kvm/msg136472.html
>>>>>
>>>>>
>>>>> Background:
>>>>>
>>>>> 	The patchset from NVidia introduced the Mediated Device support to
>>>>> 	Linux/VFIO. With that series, one can create virtual devices (supporting
>>>>> 	by underlying physical device and vendor driver), and assign them to
>>>>> 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
>>>>>
>>>>> 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
>>>>> 	implemented its CCW pass-through.  However, there are limitations
>>>>> 	imposed by current (v6 in particular) mdev framework: the mdev must be
>>>>> 	represented as a PCI device, several vfio capabilities such as
>>>>> 	sparse mmap are not possible, and so forth.
>>>>>
>>>>> 	This series aims to address above limitations and simplify the implementation.
>>>>>
>>>>>
>>>>> Key Changes:
>>>>>
>>>>> 	- An independent "struct device" was introduced to parent_device, thus
>>>>> 	  a hierarchy in driver core is formed with physical device, parent device
>>>>> 	  and mdev device;
>>>>>
>>>>> 	- Leveraging the mechanism and APIs provided by Linux driver core, it
>>>>> 	  is now safe to remove all refcnts and locks;
>>>>>
>>>>> 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
>>>>> 	  PCI-specific logic was removed, accesses from userspace are now
>>>>> 	  passed to vendor driver directly, thus guaranteed that full VFIO
>>>>> 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
>>>>>
>>>>> 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
>>>>> 	  driver for all mdev devices;  
>>>>
>>>> Hi Jike:
>>>>
>>>> I don't know what happened, but finding out which direction this will
>>>> likely go seems my first priority now...  
>>>
>>> Hi Dong,
>>>
>>> Just want to let you know that we are preparing the v7 patches to incorporate
>>> the latest review comments from Intel folks and Alex, for some changes in this
>>> patch set also mentioned in the recent review are already queued up in the new
>>> version.  
>> Hi Neo,
>>
>> Good to know this. :>
>>
>>>   
>>>>
>>>> I'd say, either with only the original mdev v6, or patched this series,
>>>> vfio-ccw could live. But this series saves my work of mimicing the
>>>> vfio-mpci code in my vfio-mccw driver. I like this incremental patches.  
>>>
>>> Thanks for sharing your progress and good to know our current v6 solution works 
>>> for you. We are still evaluating the vfio_mdev changes here as I still prefer to
>>> share general VFIO pci handling inside a common VFIO PCI driver, and the
>>> modularization will reduce the impact of future changes and potential regressions
>>> cross architectures - between PCI and CCW.  
>> If this is something that Alex and the Intel folks are fine with, I have
>> no problem with this too. Thanks,
> 
> Overall, I like this a lot.  Creating a proper device hierarchy and
> letting the driver core manage the references makes a lot of sense and
> the reduction in code volume and complexity speaks for itself.  

We are evaluating on this proposed solution. But the proposed patches
are not tested, those have bugs.

+#define dev_to_parent_dev(_dev) container_of((_dev),   \
+                                            struct parent_device, dev)

This macro itself is not correct and causes kernel crash. This macro
doesn't really takes to 'struct parent_device' as it aimed to.

We are evaluating on how this will change sysfs entries and what will be
the impact of fixing all these bugs or will that really going to help.

> I like
> how the PCI mdev layer goes away, we're not imposing arbitrary
> restrictions on the vendor driver in an attempt to insert a common
> layer.

We were trying to make it more and more configurable for vendor driver
while keeping common code at common place so that code is not replicated
in each vendor driver. We believe with the new version of v7, it will
become a common module instead of pci module.


>  We can add helpers for things that do end up being common as we
> go.  Using devices rather than uuids for functions is a big
> improvement.


This is agreed on reviews of v6 version of my patches. Now we are
introducing ‘online’ instead of start()/stop() and all are in agreement
with that, right?

Thanks,
Kirti

> I hope that Neo and Kirti will incorporate many of these
> changes in their next revision.  Thanks for stepping in with this,
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [RFC v6-based v1 0/5] refine mdev framework
@ 2016-08-19 17:29           ` Kirti Wankhede
  0 siblings, 0 replies; 22+ messages in thread
From: Kirti Wankhede @ 2016-08-19 17:29 UTC (permalink / raw)
  To: Alex Williamson, Dong Jia
  Cc: Neo Jia, Jike Song, qemu-devel, kvm, kevin.tian, guangrong.xiao,
	zhenyuw, pbonzini, kraxel



On 8/18/2016 11:55 PM, Alex Williamson wrote:
> On Thu, 18 Aug 2016 16:42:14 +0800
> Dong Jia <bjsdjshi@linux.vnet.ibm.com> wrote:
> 
>> On Wed, 17 Aug 2016 03:09:10 -0700
>> Neo Jia <cjia@nvidia.com> wrote:
>>
>>> On Wed, Aug 17, 2016 at 04:58:14PM +0800, Dong Jia wrote:  
>>>> On Tue, 16 Aug 2016 16:14:12 +0800
>>>> Jike Song <jike.song@intel.com> wrote:
>>>>   
>>>>>
>>>>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>>>>
>>>>> 	http://www.spinics.net/lists/kvm/msg136472.html
>>>>>
>>>>>
>>>>> Background:
>>>>>
>>>>> 	The patchset from NVidia introduced the Mediated Device support to
>>>>> 	Linux/VFIO. With that series, one can create virtual devices (supporting
>>>>> 	by underlying physical device and vendor driver), and assign them to
>>>>> 	userspace like QEMU/KVM, in the same way as device assignment via VFIO.
>>>>>
>>>>> 	Based on that, NVidia and Intel implemented their vGPU solutions, IBM
>>>>> 	implemented its CCW pass-through.  However, there are limitations
>>>>> 	imposed by current (v6 in particular) mdev framework: the mdev must be
>>>>> 	represented as a PCI device, several vfio capabilities such as
>>>>> 	sparse mmap are not possible, and so forth.
>>>>>
>>>>> 	This series aims to address above limitations and simplify the implementation.
>>>>>
>>>>>
>>>>> Key Changes:
>>>>>
>>>>> 	- An independent "struct device" was introduced to parent_device, thus
>>>>> 	  a hierarchy in driver core is formed with physical device, parent device
>>>>> 	  and mdev device;
>>>>>
>>>>> 	- Leveraging the mechanism and APIs provided by Linux driver core, it
>>>>> 	  is now safe to remove all refcnts and locks;
>>>>>
>>>>> 	- vfio_mpci (later renamed to vfio_mdev) was made BUS-agnostic: all
>>>>> 	  PCI-specific logic was removed, accesses from userspace are now
>>>>> 	  passed to vendor driver directly, thus guaranteed that full VFIO
>>>>> 	  capabilities provided: e.g. dynamic regions, sparse mmap, etc.;
>>>>>
>>>>> 	  With vfio_mdev being BUS-agnostic, it is enough to have only one
>>>>> 	  driver for all mdev devices;  
>>>>
>>>> Hi Jike:
>>>>
>>>> I don't know what happened, but finding out which direction this will
>>>> likely go seems my first priority now...  
>>>
>>> Hi Dong,
>>>
>>> Just want to let you know that we are preparing the v7 patches to incorporate
>>> the latest review comments from Intel folks and Alex, for some changes in this
>>> patch set also mentioned in the recent review are already queued up in the new
>>> version.  
>> Hi Neo,
>>
>> Good to know this. :>
>>
>>>   
>>>>
>>>> I'd say, either with only the original mdev v6, or patched this series,
>>>> vfio-ccw could live. But this series saves my work of mimicing the
>>>> vfio-mpci code in my vfio-mccw driver. I like this incremental patches.  
>>>
>>> Thanks for sharing your progress and good to know our current v6 solution works 
>>> for you. We are still evaluating the vfio_mdev changes here as I still prefer to
>>> share general VFIO pci handling inside a common VFIO PCI driver, and the
>>> modularization will reduce the impact of future changes and potential regressions
>>> cross architectures - between PCI and CCW.  
>> If this is something that Alex and the Intel folks are fine with, I have
>> no problem with this too. Thanks,
> 
> Overall, I like this a lot.  Creating a proper device hierarchy and
> letting the driver core manage the references makes a lot of sense and
> the reduction in code volume and complexity speaks for itself.  

We are evaluating on this proposed solution. But the proposed patches
are not tested, those have bugs.

+#define dev_to_parent_dev(_dev) container_of((_dev),   \
+                                            struct parent_device, dev)

This macro itself is not correct and causes kernel crash. This macro
doesn't really takes to 'struct parent_device' as it aimed to.

We are evaluating on how this will change sysfs entries and what will be
the impact of fixing all these bugs or will that really going to help.

> I like
> how the PCI mdev layer goes away, we're not imposing arbitrary
> restrictions on the vendor driver in an attempt to insert a common
> layer.

We were trying to make it more and more configurable for vendor driver
while keeping common code at common place so that code is not replicated
in each vendor driver. We believe with the new version of v7, it will
become a common module instead of pci module.


>  We can add helpers for things that do end up being common as we
> go.  Using devices rather than uuids for functions is a big
> improvement.


This is agreed on reviews of v6 version of my patches. Now we are
introducing ‘online’ instead of start()/stop() and all are in agreement
with that, right?

Thanks,
Kirti

> I hope that Neo and Kirti will incorporate many of these
> changes in their next revision.  Thanks for stepping in with this,
> 
> Alex
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-08-19 17:30 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-16  8:14 [RFC v6-based v1 0/5] refine mdev framework Jike Song
2016-08-16  8:14 ` [Qemu-devel] " Jike Song
2016-08-16  8:14 ` [RFC v6-based v1 1/5] mdev: create separate device for parent_device Jike Song
2016-08-16  8:14   ` [Qemu-devel] " Jike Song
2016-08-16  8:14 ` [RFC v6-based v1 2/5] mdev: drop management information and utilize driver-core instead Jike Song
2016-08-16  8:14   ` [Qemu-devel] " Jike Song
2016-08-16  8:14 ` [RFC v6-based v1 3/5] mdev: remove uuid from parent_ops Jike Song
2016-08-16  8:14   ` [Qemu-devel] " Jike Song
2016-08-16  8:14 ` [RFC v6-based v1 4/5] vfio_mpci: remove PCI-specific logic Jike Song
2016-08-16  8:14   ` [Qemu-devel] " Jike Song
2016-08-16  8:14 ` [RFC v6-based v1 5/5] rename 'vfio_mpci' to 'vfio_mdev' since it is now BUS-agnostic Jike Song
2016-08-16  8:14   ` [Qemu-devel] " Jike Song
2016-08-17  8:58 ` [RFC v6-based v1 0/5] refine mdev framework Dong Jia
2016-08-17  8:58   ` [Qemu-devel] " Dong Jia
2016-08-17 10:09   ` Neo Jia
2016-08-17 10:09     ` [Qemu-devel] " Neo Jia
2016-08-18  8:42     ` Dong Jia
2016-08-18  8:42       ` [Qemu-devel] " Dong Jia
2016-08-18 18:25       ` Alex Williamson
2016-08-18 18:25         ` [Qemu-devel] " Alex Williamson
2016-08-19 17:29         ` Kirti Wankhede
2016-08-19 17:29           ` [Qemu-devel] " Kirti Wankhede

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.