* [RFC v2 0/4] adding mdev bus and vfio support
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: kevin.tian, guangrong.xiao, kvm, qemu-devel, zhenyuw, jike.song,
	zhiyuan.lv, pbonzini, bjsdjshi, kraxel


This patchset is based on NVIDIA's "Add Mediated device support" series, version 6:

	http://www.spinics.net/lists/kvm/msg136472.html


Key Changes from NVIDIA v6:

	- Introduced an independent struct device for the host device, thereby
	  forming a physical-host-mdev hierarchy and heavily reusing Linux
	  driver core support;

	- Added online/offline to mdev_bus_type, leveraging the 'online'
	  attr support from Linux driver core;

	- Removed mdev_class and other unnecessary code;

	/*
	 * Given the changes above, the code volume of the mdev core
	 * driver was reduced by ~50%.
	 */


	- Interfaces between vfio_mdev and the vendor driver are now
	  high-level: e.g. a single ioctl replaces get_irq_info/set_irq_info
	  and reset, start/stop became mdev-oriented, etc.;

	/*
	 * Given the changes above, the code volume of the vfio_mdev
	 * driver was reduced by ~64%.
	 */


Test

	- Tested with KVMGT

TODO

	- Re-implement the attribute group of the host device once the
	  sysfs hierarchy under discussion is finalized;

	- Move common routines from current vfio-pci into a higher location,
	  export them for various VFIO bus drivers and/or mdev vendor drivers;

	- Add implementation examples for vendor drivers to Documentation;

	- Refine IOMMU changes



Jike Song (2):
  Mediated device Core driver
  vfio: VFIO bus driver for MDEV devices

Kirti Wankhede (2):
  vfio iommu: Add support for mediated devices
  docs: Add Documentation for Mediated devices

 Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
 drivers/vfio/Kconfig                   |   1 +
 drivers/vfio/Makefile                  |   1 +
 drivers/vfio/mdev/Kconfig              |  18 ++
 drivers/vfio/mdev/Makefile             |   5 +
 drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
 drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
 drivers/vfio/mdev/mdev_private.h       |  29 ++
 drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
 drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
 drivers/vfio/vfio.c                    |  82 ++++++
 drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
 include/linux/mdev.h                   | 159 +++++++++++
 include/linux/vfio.h                   |  13 +-
 14 files changed, 1709 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/vfio-mediated-device.txt
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode 100644 drivers/vfio/mdev/mdev_driver.c
 create mode 100644 drivers/vfio/mdev/mdev_private.h
 create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c
 create mode 100644 include/linux/mdev.h

-- 
1.9.1


* [RFC v2 1/4] Mediated device Core driver
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: kevin.tian, guangrong.xiao, kvm, qemu-devel, zhenyuw, jike.song,
	zhiyuan.lv, pbonzini, bjsdjshi, kraxel

Design for Mediated Device Driver:
The main purpose of this driver is to provide a common interface for
mediated device management that can be used by different drivers of
different devices.

This module provides a generic interface to create an mdev device, add
it to the mdev bus, add it to an IOMMU group and then to a vfio group.

Below is the high-level block diagram, with NVIDIA, Intel and IBM devices
as examples, since these are the devices expected to actively use this
module as of now.

+---------------+
|               |
| +-----------+ |
| |           | |
| |           | |
| |  MDEV     | |  mdev_register_driver()       +---------------+
| |  Bus      | |<------------------------------+               |
| |  Driver   | |                               |               |
| |           | +------------------------------>| vfio_mdev.ko  |<-> VFIO user
| |           | |     probe()/remove()          |               |    APIs
| |           | |                               +---------------+
| +-----------+ |
|               |
|  MDEV CORE    |
|   MODULE      |
|   mdev.ko     |
|               |
| +-----------+ |  mdev_register_host_device()  +---------------+
| |           | +<------------------------------+               |
| |           | |                               |  nvidia.ko    |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| | Physical/ | |
| | Host      | |  mdev_register_host_device()  +---------------+
| | Device    | |<------------------------------+               |
| | Interface | |                               |  i915.ko      |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| |           | |
| |           | |  mdev_register_host_device()  +---------------+
| |           | +<------------------------------+               |
| |           | |                               | ccw_device.ko |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| +-----------+ |
+---------------+

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/Kconfig             |   1 +
 drivers/vfio/Makefile            |   1 +
 drivers/vfio/mdev/Kconfig        |  10 ++
 drivers/vfio/mdev/Makefile       |   4 +
 drivers/vfio/mdev/mdev_core.c    | 250 +++++++++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/mdev_driver.c  | 155 ++++++++++++++++++++++++
 drivers/vfio/mdev/mdev_private.h |  29 +++++
 drivers/vfio/mdev/mdev_sysfs.c   | 155 ++++++++++++++++++++++++
 include/linux/mdev.h             | 159 +++++++++++++++++++++++++
 9 files changed, 764 insertions(+)
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode 100644 drivers/vfio/mdev/mdev_driver.c
 create mode 100644 drivers/vfio/mdev/mdev_private.h
 create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
 create mode 100644 include/linux/mdev.h

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index da6e2ce..23eced0 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -48,4 +48,5 @@ menuconfig VFIO_NOIOMMU
 
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
+source "drivers/vfio/mdev/Kconfig"
 source "virt/lib/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 7b8a31f..7c70753 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
 obj-$(CONFIG_VFIO_PCI) += pci/
 obj-$(CONFIG_VFIO_PLATFORM) += platform/
+obj-$(CONFIG_MDEV) += mdev/
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
new file mode 100644
index 0000000..d25439f
--- /dev/null
+++ b/drivers/vfio/mdev/Kconfig
@@ -0,0 +1,10 @@
+
+config MDEV
+    tristate "Mediated device driver framework"
+    depends on VFIO
+    default n
+    help
+        Provides a framework to virtualize devices.
+        See Documentation/vfio-mediated-device.txt for more details.
+
+        If you don't know what to do here, say N.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
new file mode 100644
index 0000000..8bd78b5
--- /dev/null
+++ b/drivers/vfio/mdev/Makefile
@@ -0,0 +1,4 @@
+
+mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
+
+obj-$(CONFIG_MDEV) += mdev.o
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
new file mode 100644
index 0000000..cb27ccf
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -0,0 +1,250 @@
+/*
+ * Mediated device Core Driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/uuid.h>
+#include <linux/vfio.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION		"0.2"
+#define DRIVER_AUTHOR		"NVIDIA Corporation"
+#define DRIVER_DESC		"Mediated Device Core Driver"
+
+
+static int __find_mdev_device(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	return (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0);
+}
+
+static struct mdev_device *find_mdev_device(struct mdev_host *host,
+					    uuid_le uuid)
+{
+	struct device *dev;
+
+	dev = device_find_child(&host->dev, &uuid, __find_mdev_device);
+	if (!dev)
+		return NULL;
+
+	return dev_to_mdev(dev);
+}
+
+static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
+{
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	return host->ops->create(mdev, mdev_params);
+}
+
+static void mdev_device_destroy_ops(struct mdev_device *mdev)
+{
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	host->ops->destroy(mdev);
+}
+
+/*
+ * mdev_register_host_device : register an mdev host device
+ * @pdev: device structure of the physical device under which the created
+ *       host device will be.
+ * @ops: host device operations structure to be registered.
+ *
+ * Register an mdev host device as the mediator of mdev devices.
+ * Returns a pointer to the mdev host device structure on success,
+ * NULL on error.
+ */
+struct mdev_host *mdev_register_host_device(struct device *pdev,
+				const struct mdev_host_ops *ops)
+{
+	int rc = 0;
+	struct mdev_host *host;
+
+	if (!pdev || !ops) {
+		pr_warn("mdev: parent device or ops is NULL\n");
+		return NULL;
+	}
+
+	/* check for mandatory ops */
+	if (!ops->create || !ops->destroy) {
+		dev_warn(pdev, "create and destroy methods are necessary\n");
+		return NULL;
+	}
+
+	host = kzalloc(sizeof(*host), GFP_KERNEL);
+	if (!host)
+		return NULL;
+
+	host->dev.parent = pdev;
+	host->ops = ops;
+	dev_set_name(&host->dev, "mdev-host");
+
+	rc = device_register(&host->dev);
+	if (rc)
+		goto register_error;
+
+	rc = mdev_create_sysfs_files(&host->dev);
+	if (rc)
+		goto add_sysfs_error;
+
+	rc = sysfs_create_groups(&host->dev.kobj, ops->hdev_attr_groups);
+	if (rc)
+		goto add_group_error;
+
+	dev_info(&host->dev, "mdev host device registered\n");
+	return host;
+
+add_group_error:
+	mdev_remove_sysfs_files(&host->dev);
+
+add_sysfs_error:
+	device_unregister(&host->dev);
+
+register_error:
+	kfree(host);
+	return NULL;
+}
+EXPORT_SYMBOL(mdev_register_host_device);
+
+static int __mdev_device_destroy(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	mdev_device_destroy_ops(mdev);
+	device_unregister(&mdev->dev);
+
+	return 0;
+}
+
+/*
+ * mdev_unregister_host_device : unregister a mdev host device
+ * @host: the mdev host device structure
+ *
+ * Unregister a mdev host device as the mediator
+ */
+void mdev_unregister_host_device(struct mdev_host *host)
+{
+	if (!host)
+		return;
+
+	dev_info(&host->dev, "mdev host device unregistered\n");
+
+	mdev_remove_sysfs_files(&host->dev);
+	sysfs_remove_groups(&host->dev.kobj, host->ops->hdev_attr_groups);
+	device_for_each_child(&host->dev, NULL, __mdev_device_destroy);
+	device_unregister(&host->dev);
+}
+EXPORT_SYMBOL(mdev_unregister_host_device);
+
+int mdev_device_create(struct device *dev, uuid_le uuid, char *mdev_params)
+{
+	int ret;
+	struct mdev_device *mdev;
+	struct mdev_host *host = dev_to_host(dev);
+
+	/* Check for duplicate */
+	mdev = find_mdev_device(host, uuid);
+	if (mdev) {
+		ret = -EEXIST;
+		goto create_err;
+	}
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev) {
+		ret = -ENOMEM;
+		goto create_err;
+	}
+
+	memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
+
+	mdev->dev.parent = dev;
+	mdev->dev.bus = &mdev_bus_type;
+	mdev->dev.groups = host->ops->mdev_attr_groups;
+	dev_set_name(&mdev->dev, "%pUl", uuid.b);
+
+	ret = device_register(&mdev->dev);
+	if (ret) {
+		put_device(&mdev->dev);
+		goto create_err;
+	}
+
+	ret = mdev_device_create_ops(mdev, mdev_params);
+	if (ret)
+		goto create_failed;
+
+	dev_dbg(&mdev->dev, "MDEV: created\n");
+
+	return ret;
+
+create_failed:
+	device_unregister(&mdev->dev);
+
+create_err:
+	return ret;
+}
+
+int mdev_device_destroy(struct device *dev, uuid_le uuid)
+{
+	struct mdev_device *mdev;
+	struct mdev_host *host = dev_to_host(dev);
+
+	mdev = find_mdev_device(host, uuid);
+	if (!mdev)
+		return -ENODEV;
+
+	return __mdev_device_destroy(&mdev->dev, NULL);
+}
+
+void mdev_device_supported_config(struct device *dev, char *str)
+{
+	struct mdev_host *host = dev_to_host(dev);
+
+	if (host->ops->supported_config)
+		host->ops->supported_config(&host->dev, str);
+}
+
+static int __init mdev_init(void)
+{
+	int ret;
+
+	ret = mdev_bus_register();
+	if (ret)
+		pr_err("failed to register mdev bus: %d\n", ret);
+
+	return ret;
+}
+
+static void __exit mdev_exit(void)
+{
+	mdev_bus_unregister();
+}
+
+module_init(mdev_init)
+module_exit(mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
new file mode 100644
index 0000000..d298aaf
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -0,0 +1,155 @@
+/*
+ * MDEV driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+static int mdev_attach_iommu(struct mdev_device *mdev)
+{
+	int ret;
+	struct iommu_group *group;
+
+	group = iommu_group_alloc();
+	if (IS_ERR(group)) {
+		dev_err(&mdev->dev, "MDEV: failed to allocate group!\n");
+		return PTR_ERR(group);
+	}
+
+	ret = iommu_group_add_device(group, &mdev->dev);
+	if (ret) {
+		dev_err(&mdev->dev, "MDEV: failed to add dev to group!\n");
+		goto attach_fail;
+	}
+
+	mdev->group = group;
+
+	dev_info(&mdev->dev, "MDEV: group_id = %d\n",
+				 iommu_group_id(group));
+attach_fail:
+	iommu_group_put(group);
+	return ret;
+}
+
+static void mdev_detach_iommu(struct mdev_device *mdev)
+{
+	iommu_group_remove_device(&mdev->dev);
+	mdev->group = NULL;
+	dev_info(&mdev->dev, "MDEV: detaching iommu\n");
+}
+
+static int mdev_probe(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret;
+
+	ret = mdev_attach_iommu(mdev);
+	if (ret) {
+		dev_err(dev, "Failed to attach IOMMU\n");
+		return ret;
+	}
+
+	if (drv && drv->probe)
+		ret = drv->probe(dev);
+
+	if (ret)
+		mdev_detach_iommu(mdev);
+
+	return ret;
+}
+
+static int mdev_remove(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	if (drv && drv->remove)
+		drv->remove(dev);
+
+	mdev_detach_iommu(mdev);
+
+	return 0;
+}
+
+static int mdev_online(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+
+	if (drv && drv->online)
+		return drv->online(dev);
+
+	return 0;
+}
+
+static int mdev_offline(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+
+	if (drv && drv->offline)
+		return drv->offline(dev);
+
+	return 0;
+}
+
+struct bus_type mdev_bus_type = {
+	.name		= "mdev",
+	.probe		= mdev_probe,
+	.remove		= mdev_remove,
+	.online		= mdev_online,
+	.offline	= mdev_offline,
+};
+EXPORT_SYMBOL_GPL(mdev_bus_type);
+
+/*
+ * mdev_register_driver - register a new MDEV driver
+ * @drv: the driver to register
+ * @owner: module owner of driver to be registered
+ *
+ * Returns a negative value on error, otherwise 0.
+ */
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner)
+{
+	/* initialize common driver fields */
+	drv->driver.name = drv->name;
+	drv->driver.bus = &mdev_bus_type;
+	drv->driver.owner = owner;
+
+	/* register with core */
+	return driver_register(&drv->driver);
+}
+EXPORT_SYMBOL(mdev_register_driver);
+
+/*
+ * mdev_unregister_driver - unregister MDEV driver
+ * @drv: the driver to unregister
+ *
+ */
+void mdev_unregister_driver(struct mdev_driver *drv)
+{
+	driver_unregister(&drv->driver);
+}
+EXPORT_SYMBOL(mdev_unregister_driver);
+
+int mdev_bus_register(void)
+{
+	return bus_register(&mdev_bus_type);
+}
+
+void mdev_bus_unregister(void)
+{
+	bus_unregister(&mdev_bus_type);
+}
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
new file mode 100644
index 0000000..f153292
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -0,0 +1,29 @@
+/*
+ * Mediated device internal definitions
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef MDEV_PRIVATE_H
+#define MDEV_PRIVATE_H
+
+int  mdev_bus_register(void);
+void mdev_bus_unregister(void);
+
+/* Function prototypes for mdev_sysfs */
+int  mdev_create_sysfs_files(struct device *dev);
+void mdev_remove_sysfs_files(struct device *dev);
+
+int  mdev_device_create(struct device *dev, uuid_le uuid, char *mdev_params);
+int  mdev_device_destroy(struct device *dev, uuid_le uuid);
+void mdev_device_supported_config(struct device *dev, char *str);
+
+#endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
new file mode 100644
index 0000000..7d55188
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -0,0 +1,155 @@
+/*
+ * File attributes for Mediated devices
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/sysfs.h>
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+/* Prototypes */
+static ssize_t mdev_supported_types_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf);
+static DEVICE_ATTR_RO(mdev_supported_types);
+
+static ssize_t mdev_create_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count);
+static DEVICE_ATTR_WO(mdev_create);
+
+static ssize_t mdev_destroy_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count);
+static DEVICE_ATTR_WO(mdev_destroy);
+
+static const struct attribute *mdev_host_attrs[] = {
+	&dev_attr_mdev_supported_types.attr,
+	&dev_attr_mdev_create.attr,
+	&dev_attr_mdev_destroy.attr,
+	NULL,
+};
+
+
+#define SUPPORTED_TYPE_BUFFER_LENGTH	4096
+
+/* mdev host sysfs functions */
+static ssize_t mdev_supported_types_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	char *str, *ptr;
+	ssize_t n;
+
+	str = kzalloc(sizeof(*str) * SUPPORTED_TYPE_BUFFER_LENGTH, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ptr = str;
+	mdev_device_supported_config(dev, str);
+
+	n = sprintf(buf, "%s\n", str);
+	kfree(ptr);
+
+	return n;
+}
+
+static ssize_t mdev_create_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	char *str;
+	char *uuid_str, *params = NULL;
+	uuid_le uuid;
+	int ret;
+
+	str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	uuid_str = strsep(&str, ":");
+	if (!uuid_str) {
+		pr_err("mdev_create: empty UUID string %s\n", buf);
+		ret = -EINVAL;
+		goto create_error;
+	}
+
+	if (str)
+		params = kstrdup(str, GFP_KERNEL);
+
+	ret = uuid_le_to_bin(uuid_str, &uuid);
+	if (ret) {
+		pr_err("mdev_create: UUID parse error %s\n", buf);
+		goto create_error;
+	}
+
+	ret = mdev_device_create(dev, uuid, params);
+	if (ret)
+		pr_err("mdev_create: Failed to create mdev device\n");
+	else
+		ret = count;
+
+create_error:
+	kfree(params);
+	kfree(str);
+	return ret;
+}
+
+static ssize_t mdev_destroy_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	char *str;
+	uuid_le uuid;
+	int ret;
+
+	str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ret = uuid_le_to_bin(str, &uuid);
+	if (ret) {
+		pr_err("mdev_destroy: UUID parse error %s\n", buf);
+		goto destroy_error;
+	}
+
+	ret = mdev_device_destroy(dev, uuid);
+	if (ret == 0)
+		ret = count;
+
+destroy_error:
+	kfree(str);
+	return ret;
+}
+
+int mdev_create_sysfs_files(struct device *dev)
+{
+	int ret;
+
+	ret = sysfs_create_files(&dev->kobj, mdev_host_attrs);
+	if (ret)
+		pr_err("sysfs_create_files failed: %d\n", ret);
+
+	return ret;
+}
+
+void mdev_remove_sysfs_files(struct device *dev)
+{
+	sysfs_remove_files(&dev->kobj, mdev_host_attrs);
+}
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
new file mode 100644
index 0000000..1236200
--- /dev/null
+++ b/include/linux/mdev.h
@@ -0,0 +1,159 @@
+/*
+ * Mediated device definition
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef MDEV_H
+#define MDEV_H
+
+#include <uapi/linux/vfio.h>
+
+
+/* mediated device */
+struct mdev_device {
+	struct device		dev;
+	struct iommu_group	*group;
+	uuid_le			uuid;
+	void			*driver_data;
+};
+
+/**
+ * struct mdev_host_ops - Structure to be registered for each host device
+ * with mdev.
+ *
+ * @owner:		The module owner.
+ * @hdev_attr_groups:	Default attributes of the host device.
+ * @mdev_attr_groups:	Default attributes of the mdev device.
+ * @supported_config:	Called to get information about supported types.
+ *			@dev : device structure of host device.
+ *			@config: should return string listing supported config
+ *			Returns integer: success (0) or error (< 0)
+ * @create:		Called to allocate basic resources in host device's
+ *			driver for a particular mediated device. It is
+ *			mandatory to provide create ops.
+ *			@mdev: mdev_device structure of the mediated device
+ *			      that is being created
+ *			@mdev_params: extra parameters required by host
+ *			device's driver.
+ *			Returns integer: success (0) or error (< 0)
+ * @destroy:		Called to free resources in host device's driver for
+ *			a mediated device instance. It is mandatory to provide
+ *			destroy ops.
+ *			@mdev: mdev_device device structure which is being
+ *				destroyed
+ *			Returns integer: success (0) or error (< 0)
+ *			If a VMM is running and destroy() is called, the mdev
+ *			is being hot-unplugged. Return an error if a VMM is
+ *			running and the driver doesn't support mediated device
+ *			hotplug.
+ * @start:		Called to initiate mediated device initialization
+ *			process in host device's driver before VMM starts.
+ *			@mdev: mediated device structure
+ *			Returns integer: success (0) or error (< 0)
+ * @stop:		Called to teardown mediated device related resources
+ *			@mdev: mediated device structure
+ *			Returns integer: success (0) or error (< 0)
+ * @read:		Read emulation callback
+ *			@mdev: mediated device structure
+ *			@buf: read buffer
+ *			@count: number of bytes to read
+ *			@pos: address.
+ *			Returns number of bytes read on success, or an error.
+ * @write:		Write emulation callback
+ *			@mdev: mediated device structure
+ *			@buf: write buffer
+ *			@count: number of bytes to be written
+ *			@pos: address.
+ *			Returns number of bytes written on success, or an error.
+ * @mmap:		Memory map callback
+ *			@mdev: mediated device structure
+ *			@vma: vma structure describing the requested mapping;
+ *			the vendor driver may adjust the target user address,
+ *			the backing pfn, the size of the mapped area and the
+ *			page protection flags, if required.
+ *			Returns integer: success (0) or error (< 0)
+ * @ioctl:		Ioctl emulation callback
+ *			@mdev: mediated device structure
+ *			@cmd: ioctl command number
+ *			@arg: ioctl argument
+ *			Returns long: success (>= 0) or error (< 0)
+ *
+ * Host devices that support mediated devices should be registered with the
+ * mdev module along with an mdev_host_ops structure.
+ */
+struct mdev_host_ops {
+	struct module *owner;
+	const struct attribute_group **hdev_attr_groups;
+	const struct attribute_group **mdev_attr_groups;
+
+	int (*supported_config)(struct device *dev, char *config);
+	int (*create)(struct mdev_device *mdev, char *mdev_params);
+	void (*destroy)(struct mdev_device *mdev);
+
+	int (*start)(struct mdev_device *mdev);
+	int (*stop)(struct mdev_device *mdev);
+
+	ssize_t (*read)(struct mdev_device *mdev, char __user *buf,
+			size_t count, loff_t *pos);
+	ssize_t (*write)(struct mdev_device *mdev, const char __user *buf,
+			size_t count, loff_t *pos);
+	int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+	long (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
+			unsigned long arg);
+};
+
+/* mdev host device */
+struct mdev_host {
+	struct device dev;
+	const struct mdev_host_ops *ops;
+};
+
+/**
+ * struct mdev_driver - Mediated device driver
+ * @name: driver name
+ * @probe: called when new device created
+ * @remove: called when device removed
+ * @driver: device driver structure
+ **/
+struct mdev_driver {
+	const char *name;
+	int (*probe)(struct device *dev);
+	void (*remove)(struct device *dev);
+	int (*online)(struct device *dev);
+	int (*offline)(struct device *dev);
+	struct device_driver driver;
+};
+
+static inline void *mdev_get_drvdata(struct mdev_device *mdev)
+{
+	return mdev->driver_data;
+}
+
+static inline void mdev_set_drvdata(struct mdev_device *mdev, void *data)
+{
+	mdev->driver_data = data;
+}
+
+extern struct bus_type mdev_bus_type;
+
+#define to_mdev_driver(drv) container_of(drv, struct mdev_driver, driver)
+#define dev_to_host(_dev) container_of((_dev), struct mdev_host, dev)
+#define dev_to_mdev(_dev) container_of((_dev), struct mdev_device, dev)
+
+struct mdev_host *mdev_register_host_device(struct device *dev,
+				 const struct mdev_host_ops *ops);
+void mdev_unregister_host_device(struct mdev_host *host);
+
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
+
+#endif /* MDEV_H */
-- 
1.9.1


* [Qemu-devel] [RFC v2 1/4] Mediated device Core driver
@ 2016-09-02  8:16   ` Jike Song
  0 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	zhiyuan.lv, jike.song, pbonzini, kraxel

Design for the Mediated Device Driver:
The main purpose of this driver is to provide a common interface for
mediated device management that can be used by drivers of different
devices.

This module provides a generic interface to create an mdev device, add
it to the mdev bus, add it to an IOMMU group and then to a VFIO group.

Below is the high-level block diagram, with NVIDIA, Intel and IBM
devices as examples, since these are the devices which will actively
use this module for now.

+---------------+
|               |
| +-----------+ |
| |           | |
| |           | |
| |  MDEV     | |  mdev_register_driver()       +---------------+
| |  Bus      | |<------------------------------+               |
| |  Driver   | |                               |               |
| |           | +------------------------------>| vfio_mdev.ko  |<-> VFIO user
| |           | |     probe()/remove()          |               |    APIs
| |           | |                               +---------------+
| +-----------+ |
|               |
|  MDEV CORE    |
|   MODULE      |
|   mdev.ko     |
|               |
| +-----------+ |  mdev_register_host_device()  +---------------+
| |           | +<------------------------------+               |
| |           | |                               |  nvidia.ko    |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| | Physical/ | |
| | Host      | |  mdev_register_host_device()  +---------------+
| | Device    | |<------------------------------+               |
| | Interface | |                               |  i915.ko      |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| |           | |
| |           | |  mdev_register_host_device()  +---------------+
| |           | +<------------------------------+               |
| |           | |                               | ccw_device.ko |<-> physical
| |           | +------------------------------>+               |    device
| |           | |        callbacks              +---------------+
| +-----------+ |
+---------------+

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/Kconfig             |   1 +
 drivers/vfio/Makefile            |   1 +
 drivers/vfio/mdev/Kconfig        |  10 ++
 drivers/vfio/mdev/Makefile       |   4 +
 drivers/vfio/mdev/mdev_core.c    | 250 +++++++++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/mdev_driver.c  | 155 ++++++++++++++++++++++++
 drivers/vfio/mdev/mdev_private.h |  29 +++++
 drivers/vfio/mdev/mdev_sysfs.c   | 155 ++++++++++++++++++++++++
 include/linux/mdev.h             | 159 +++++++++++++++++++++++++
 9 files changed, 764 insertions(+)
 create mode 100644 drivers/vfio/mdev/Kconfig
 create mode 100644 drivers/vfio/mdev/Makefile
 create mode 100644 drivers/vfio/mdev/mdev_core.c
 create mode 100644 drivers/vfio/mdev/mdev_driver.c
 create mode 100644 drivers/vfio/mdev/mdev_private.h
 create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
 create mode 100644 include/linux/mdev.h

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index da6e2ce..23eced0 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -48,4 +48,5 @@ menuconfig VFIO_NOIOMMU
 
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
+source "drivers/vfio/mdev/Kconfig"
 source "virt/lib/Kconfig"
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 7b8a31f..7c70753 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
 obj-$(CONFIG_VFIO_PCI) += pci/
 obj-$(CONFIG_VFIO_PLATFORM) += platform/
+obj-$(CONFIG_MDEV) += mdev/
diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
new file mode 100644
index 0000000..d25439f
--- /dev/null
+++ b/drivers/vfio/mdev/Kconfig
@@ -0,0 +1,10 @@
+
+config MDEV
+    tristate "Mediated device driver framework"
+    depends on VFIO
+    default n
+    help
+        Provides a framework to virtualize devices.
+	See Documentation/vfio-mediated-device.txt for more details.
+
+        If you don't know what to do here, say N.
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
new file mode 100644
index 0000000..8bd78b5
--- /dev/null
+++ b/drivers/vfio/mdev/Makefile
@@ -0,0 +1,4 @@
+
+mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
+
+obj-$(CONFIG_MDEV) += mdev.o
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
new file mode 100644
index 0000000..cb27ccf
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -0,0 +1,250 @@
+/*
+ * Mediated device Core Driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/uuid.h>
+#include <linux/vfio.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION		"0.2"
+#define DRIVER_AUTHOR		"NVIDIA Corporation"
+#define DRIVER_DESC		"Mediated Device Core Driver"
+
+
+static int __find_mdev_device(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	return (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0);
+}
+
+static struct mdev_device *find_mdev_device(struct mdev_host *host,
+					    uuid_le uuid)
+{
+	struct device *dev;
+
+	dev = device_find_child(&host->dev, &uuid, __find_mdev_device);
+	if (!dev)
+		return NULL;
+
+	return dev_to_mdev(dev);
+}
+
+static int mdev_device_create_ops(struct mdev_device *mdev, char *mdev_params)
+{
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	return host->ops->create(mdev, mdev_params);
+}
+
+static void mdev_device_destroy_ops(struct mdev_device *mdev)
+{
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	host->ops->destroy(mdev);
+}
+
+/*
+ * mdev_register_host_device : register an mdev host device
+ * @pdev: device structure of the physical device under which the created
+ *        host device will sit.
+ * @ops: host device operations structure to be registered.
+ *
+ * Register an mdev host device as the mediator of mdev devices.
+ * Returns a pointer to the mdev host device structure on success,
+ * or NULL on error.
+ */
+struct mdev_host *mdev_register_host_device(struct device *pdev,
+				const struct mdev_host_ops *ops)
+{
+	int rc = 0;
+	struct mdev_host *host;
+
+	if (!pdev || !ops) {
+		/* pdev may be NULL here, so don't dereference it for logging */
+		pr_warn("mdev: dev or ops is NULL\n");
+		return NULL;
+	}
+
+	/* check for mandatory ops */
+	if (!ops->create || !ops->destroy) {
+		dev_warn(pdev, "create and destroy methods are necessary\n");
+		return NULL;
+	}
+
+	host = kzalloc(sizeof(*host), GFP_KERNEL);
+	if (!host)
+		return NULL;
+
+	host->dev.parent = pdev;
+	host->ops = ops;
+	dev_set_name(&host->dev, "mdev-host");
+
+	rc = device_register(&host->dev);
+	if (rc)
+		goto register_error;
+
+	rc = mdev_create_sysfs_files(&host->dev);
+	if (rc)
+		goto add_sysfs_error;
+
+	rc = sysfs_create_groups(&host->dev.kobj, ops->hdev_attr_groups);
+	if (rc)
+		goto add_group_error;
+
+	dev_info(&host->dev, "mdev host device registered\n");
+	return host;
+
+add_group_error:
+	mdev_remove_sysfs_files(&host->dev);
+
+add_sysfs_error:
+	device_unregister(&host->dev);
+
+register_error:
+	kfree(host);
+	return NULL;
+}
+EXPORT_SYMBOL(mdev_register_host_device);
+
+static int __mdev_device_destroy(struct device *dev, void *data)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	mdev_device_destroy_ops(mdev);
+	device_unregister(&mdev->dev);
+
+	return 0;
+}
+
+/*
+ * mdev_unregister_host_device : unregister an mdev host device
+ * @host: the mdev host device structure
+ *
+ * Unregister an mdev host device and destroy all mdev devices under it.
+ */
+void mdev_unregister_host_device(struct mdev_host *host)
+{
+	if (!host)
+		return;
+
+	dev_info(&host->dev, "mdev host device unregistered\n");
+
+	mdev_remove_sysfs_files(&host->dev);
+	sysfs_remove_groups(&host->dev.kobj, host->ops->hdev_attr_groups);
+	device_for_each_child(&host->dev, NULL, __mdev_device_destroy);
+	device_unregister(&host->dev);
+}
+EXPORT_SYMBOL(mdev_unregister_host_device);
+
+int mdev_device_create(struct device *dev, uuid_le uuid, char *mdev_params)
+{
+	int ret;
+	struct mdev_device *mdev;
+	struct mdev_host *host = dev_to_host(dev);
+
+	/* check for duplicate; device_find_child() took a reference */
+	mdev = find_mdev_device(host, uuid);
+	if (mdev) {
+		put_device(&mdev->dev);
+		ret = -EEXIST;
+		goto create_err;
+	}
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev) {
+		ret = -ENOMEM;
+		goto create_err;
+	}
+
+	memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
+
+	mdev->dev.parent = dev;
+	mdev->dev.bus = &mdev_bus_type;
+	mdev->dev.groups = host->ops->mdev_attr_groups;
+	dev_set_name(&mdev->dev, "%pUl", uuid.b);
+
+	ret = device_register(&mdev->dev);
+	if (ret) {
+		put_device(&mdev->dev);
+		goto create_err;
+	}
+
+	ret = mdev_device_create_ops(mdev, mdev_params);
+	if (ret)
+		goto create_failed;
+
+	dev_dbg(&mdev->dev, "MDEV: created\n");
+
+	return ret;
+
+create_failed:
+	device_unregister(&mdev->dev);
+
+create_err:
+	return ret;
+}
+
+int mdev_device_destroy(struct device *dev, uuid_le uuid)
+{
+	struct mdev_device *mdev;
+	struct mdev_host *host = dev_to_host(dev);
+	int ret;
+
+	mdev = find_mdev_device(host, uuid);
+	if (!mdev)
+		return -ENODEV;
+
+	ret = __mdev_device_destroy(&mdev->dev, NULL);
+	/* drop the reference taken by device_find_child() */
+	put_device(&mdev->dev);
+	return ret;
+}
+
+void mdev_device_supported_config(struct device *dev, char *str)
+{
+	struct mdev_host *host = dev_to_host(dev);
+
+	if (host->ops->supported_config)
+		host->ops->supported_config(&host->dev, str);
+}
+
+static int __init mdev_init(void)
+{
+	int ret;
+
+	ret = mdev_bus_register();
+	if (ret)
+		pr_err("failed to register mdev bus: %d\n", ret);
+
+	return ret;
+}
+
+static void __exit mdev_exit(void)
+{
+	mdev_bus_unregister();
+}
+
+module_init(mdev_init)
+module_exit(mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/drivers/vfio/mdev/mdev_driver.c b/drivers/vfio/mdev/mdev_driver.c
new file mode 100644
index 0000000..d298aaf
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_driver.c
@@ -0,0 +1,155 @@
+/*
+ * MDEV driver
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/device.h>
+#include <linux/iommu.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+static int mdev_attach_iommu(struct mdev_device *mdev)
+{
+	int ret;
+	struct iommu_group *group;
+
+	group = iommu_group_alloc();
+	if (IS_ERR(group)) {
+		dev_err(&mdev->dev, "MDEV: failed to allocate group!\n");
+		return PTR_ERR(group);
+	}
+
+	ret = iommu_group_add_device(group, &mdev->dev);
+	if (ret) {
+		dev_err(&mdev->dev, "MDEV: failed to add dev to group!\n");
+		goto attach_fail;
+	}
+
+	mdev->group = group;
+
+	dev_info(&mdev->dev, "MDEV: group_id = %d\n",
+				 iommu_group_id(group));
+attach_fail:
+	iommu_group_put(group);
+	return ret;
+}
+
+static void mdev_detach_iommu(struct mdev_device *mdev)
+{
+	iommu_group_remove_device(&mdev->dev);
+	mdev->group = NULL;
+	dev_info(&mdev->dev, "MDEV: detaching iommu\n");
+}
+
+static int mdev_probe(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret;
+
+	ret = mdev_attach_iommu(mdev);
+	if (ret) {
+		dev_err(dev, "Failed to attach IOMMU\n");
+		return ret;
+	}
+
+	if (drv && drv->probe)
+		ret = drv->probe(dev);
+
+	if (ret)
+		mdev_detach_iommu(mdev);
+
+	return ret;
+}
+
+static int mdev_remove(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+	struct mdev_device *mdev = dev_to_mdev(dev);
+
+	if (drv && drv->remove)
+		drv->remove(dev);
+
+	mdev_detach_iommu(mdev);
+
+	return 0;
+}
+
+static int mdev_online(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+
+	if (drv && drv->online)
+		return drv->online(dev);
+
+	return 0;
+}
+
+static int mdev_offline(struct device *dev)
+{
+	struct mdev_driver *drv = to_mdev_driver(dev->driver);
+
+	if (drv && drv->offline)
+		return drv->offline(dev);
+
+	return 0;
+}
+
+struct bus_type mdev_bus_type = {
+	.name		= "mdev",
+	.probe		= mdev_probe,
+	.remove		= mdev_remove,
+	.online		= mdev_online,
+	.offline	= mdev_offline,
+};
+EXPORT_SYMBOL_GPL(mdev_bus_type);
+
+/*
+ * mdev_register_driver - register a new MDEV driver
+ * @drv: the driver to register
+ * @owner: module owner of driver to be registered
+ *
+ * Returns a negative value on error, otherwise 0.
+ */
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner)
+{
+	/* initialize common driver fields */
+	drv->driver.name = drv->name;
+	drv->driver.bus = &mdev_bus_type;
+	drv->driver.owner = owner;
+
+	/* register with core */
+	return driver_register(&drv->driver);
+}
+EXPORT_SYMBOL(mdev_register_driver);
+
+/*
+ * mdev_unregister_driver - unregister MDEV driver
+ * @drv: the driver to unregister
+ *
+ */
+void mdev_unregister_driver(struct mdev_driver *drv)
+{
+	driver_unregister(&drv->driver);
+}
+EXPORT_SYMBOL(mdev_unregister_driver);
+
+int mdev_bus_register(void)
+{
+	return bus_register(&mdev_bus_type);
+}
+
+void mdev_bus_unregister(void)
+{
+	bus_unregister(&mdev_bus_type);
+}
diff --git a/drivers/vfio/mdev/mdev_private.h b/drivers/vfio/mdev/mdev_private.h
new file mode 100644
index 0000000..f153292
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_private.h
@@ -0,0 +1,29 @@
+/*
+ * Mediated device internal definitions
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef MDEV_PRIVATE_H
+#define MDEV_PRIVATE_H
+
+int  mdev_bus_register(void);
+void mdev_bus_unregister(void);
+
+/* Function prototypes for mdev_sysfs */
+int  mdev_create_sysfs_files(struct device *dev);
+void mdev_remove_sysfs_files(struct device *dev);
+
+int  mdev_device_create(struct device *dev, uuid_le uuid, char *mdev_params);
+int  mdev_device_destroy(struct device *dev, uuid_le uuid);
+void mdev_device_supported_config(struct device *dev, char *str);
+
+#endif /* MDEV_PRIVATE_H */
diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
new file mode 100644
index 0000000..7d55188
--- /dev/null
+++ b/drivers/vfio/mdev/mdev_sysfs.c
@@ -0,0 +1,155 @@
+/*
+ * File attributes for Mediated devices
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/sysfs.h>
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+/* Prototypes */
+static ssize_t mdev_supported_types_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf);
+static DEVICE_ATTR_RO(mdev_supported_types);
+
+static ssize_t mdev_create_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count);
+static DEVICE_ATTR_WO(mdev_create);
+
+static ssize_t mdev_destroy_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count);
+static DEVICE_ATTR_WO(mdev_destroy);
+
+static const struct attribute *mdev_host_attrs[] = {
+	&dev_attr_mdev_supported_types.attr,
+	&dev_attr_mdev_create.attr,
+	&dev_attr_mdev_destroy.attr,
+	NULL,
+};
+
+
+#define SUPPORTED_TYPE_BUFFER_LENGTH	4096
+
+/* mdev host sysfs functions */
+static ssize_t mdev_supported_types_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	char *str, *ptr;
+	ssize_t n;
+
+	str = kzalloc(sizeof(*str) * SUPPORTED_TYPE_BUFFER_LENGTH, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ptr = str;
+	mdev_device_supported_config(dev, str);
+
+	n = sprintf(buf, "%s\n", str);
+	kfree(ptr);
+
+	return n;
+}
+
+static ssize_t mdev_create_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t count)
+{
+	char *str, *buf_start;
+	char *uuid_str, *params = NULL;
+	uuid_le uuid;
+	int ret;
+
+	buf_start = str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	uuid_str = strsep(&str, ":");
+	if (!uuid_str) {
+		pr_err("mdev_create: empty UUID string %s\n", buf);
+		ret = -EINVAL;
+		goto create_error;
+	}
+
+	if (str)
+		params = kstrdup(str, GFP_KERNEL);
+
+	ret = uuid_le_to_bin(uuid_str, &uuid);
+	if (ret) {
+		pr_err("mdev_create: UUID parse error %s\n", buf);
+		goto create_error;
+	}
+
+	ret = mdev_device_create(dev, uuid, params);
+	if (ret)
+		pr_err("mdev_create: failed to create mdev device\n");
+	else
+		ret = count;
+
+create_error:
+	kfree(params);
+	/* strsep() advanced 'str'; free the original allocation */
+	kfree(buf_start);
+	return ret;
+}
+
+static ssize_t mdev_destroy_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	char *str;
+	uuid_le uuid;
+	int ret;
+
+	str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ret = uuid_le_to_bin(str, &uuid);
+	if (ret) {
+		pr_err("mdev_destroy: UUID parse error %s\n", buf);
+		goto destroy_error;
+	}
+
+	ret = mdev_device_destroy(dev, uuid);
+	if (ret == 0)
+		ret = count;
+
+destroy_error:
+	kfree(str);
+	return ret;
+}
+
+int mdev_create_sysfs_files(struct device *dev)
+{
+	int ret;
+
+	ret = sysfs_create_files(&dev->kobj, mdev_host_attrs);
+	if (ret)
+		pr_err("sysfs_create_files failed: %d\n", ret);
+
+	return ret;
+}
+
+void mdev_remove_sysfs_files(struct device *dev)
+{
+	sysfs_remove_files(&dev->kobj, mdev_host_attrs);
+}
diff --git a/include/linux/mdev.h b/include/linux/mdev.h
new file mode 100644
index 0000000..1236200
--- /dev/null
+++ b/include/linux/mdev.h
@@ -0,0 +1,159 @@
+/*
+ * Mediated device definition
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef MDEV_H
+#define MDEV_H
+
+#include <uapi/linux/vfio.h>
+
+
+/* mediated device */
+struct mdev_device {
+	struct device		dev;
+	struct iommu_group	*group;
+	uuid_le			uuid;
+	void			*driver_data;
+};
+
+/**
+ * struct mdev_host_ops - Structure to be registered for each host device
+ * with the mdev core.
+ *
+ * @owner:		The module owner.
+ * @hdev_attr_groups:	Default attributes of the host device.
+ * @mdev_attr_groups:	Default attributes of the mdev device.
+ * @supported_config:	Called to get information about supported types.
+ *			@dev : device structure of host device.
+ *			@config: should return string listing supported config
+ *			Returns integer: success (0) or error (< 0)
+ * @create:		Called to allocate basic resources in the host device's
+ *			driver for a particular mediated device. It is
+ *			mandatory to provide create ops.
+ *			@mdev: mdev_device structure of the mediated device
+ *			      that is being created
+ *			@mdev_params: extra parameters required by the host
+ *			device's driver.
+ *			Returns integer: success (0) or error (< 0)
+ * @destroy:		Called to free resources in the host device's driver for
+ *			a mediated device instance. It is mandatory to provide
+ *			destroy ops.
+ *			@mdev: mdev_device structure which is being
+ *				destroyed
+ *			Returns integer: success (0) or error (< 0)
+ *			If destroy() is called while the VMM is running, the
+ *			mdev is being hot-unplugged. Return an error if the
+ *			VMM is running and the driver doesn't support mediated
+ *			device hotplug.
+ * @start:		Called to initiate the mediated device initialization
+ *			process in the host device's driver before the VMM
+ *			starts.
+ *			@mdev: mediated device structure
+ *			Returns integer: success (0) or error (< 0)
+ * @stop:		Called to tear down mediated device related resources
+ *			@mdev: mediated device structure
+ *			Returns integer: success (0) or error (< 0)
+ * @read:		Read emulation callback
+ *			@mdev: mediated device structure
+ *			@buf: read buffer
+ *			@count: number of bytes to read
+ *			@pos: offset to read from.
+ *			Returns number of bytes read on success, or an error.
+ * @write:		Write emulation callback
+ *			@mdev: mediated device structure
+ *			@buf: write buffer
+ *			@count: number of bytes to be written
+ *			@pos: offset to write at.
+ *			Returns number of bytes written on success, or an
+ *			error.
+ * @mmap:		Memory map callback
+ *			@mdev: mediated device structure
+ *			@vma: vm_area_struct describing the area to be mapped.
+ *			Returns integer: success (0) or error (< 0)
+ * @ioctl:		Ioctl emulation callback
+ *			@mdev: mediated device structure
+ *			@cmd: ioctl command
+ *			@arg: command argument
+ *			Returns long: success (>= 0) or error (< 0)
+ *
+ * Host devices that support mediated devices should be registered with the
+ * mdev module along with an mdev_host_ops structure.
+ */
+struct mdev_host_ops {
+	struct module *owner;
+	const struct attribute_group **hdev_attr_groups;
+	const struct attribute_group **mdev_attr_groups;
+
+	int (*supported_config)(struct device *dev, char *config);
+	int (*create)(struct mdev_device *mdev, char *mdev_params);
+	void (*destroy)(struct mdev_device *mdev);
+
+	int (*start)(struct mdev_device *mdev);
+	int (*stop)(struct mdev_device *mdev);
+
+	ssize_t (*read)(struct mdev_device *mdev, char __user *buf,
+			size_t count, loff_t *pos);
+	ssize_t (*write)(struct mdev_device *mdev, const char __user *buf,
+			size_t count, loff_t *pos);
+	int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma);
+	long (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
+			unsigned long arg);
+};
+
+/* mdev host device */
+struct mdev_host {
+	struct device dev;
+	const struct mdev_host_ops *ops;
+};
+
+/**
+ * struct mdev_driver - Mediated device driver
+ * @name: driver name
+ * @probe: called when a new device is created
+ * @remove: called when a device is removed
+ * @online: called when a device is brought online
+ * @offline: called when a device is taken offline
+ * @driver: device driver structure
+ **/
+struct mdev_driver {
+	const char *name;
+	int (*probe)(struct device *dev);
+	void (*remove)(struct device *dev);
+	int (*online)(struct device *dev);
+	int (*offline)(struct device *dev);
+	struct device_driver driver;
+};
+
+static inline void *mdev_get_drvdata(struct mdev_device *mdev)
+{
+	return mdev->driver_data;
+}
+
+static inline void mdev_set_drvdata(struct mdev_device *mdev, void *data)
+{
+	mdev->driver_data = data;
+}
+
+extern struct bus_type mdev_bus_type;
+
+#define to_mdev_driver(drv) container_of(drv, struct mdev_driver, driver)
+#define dev_to_host(_dev) container_of((_dev), struct mdev_host, dev)
+#define dev_to_mdev(_dev) container_of((_dev), struct mdev_device, dev)
+
+struct mdev_host *mdev_register_host_device(struct device *dev,
+				 const struct mdev_host_ops *ops);
+void mdev_unregister_host_device(struct mdev_host *host);
+
+int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+void mdev_unregister_driver(struct mdev_driver *drv);
+
+#endif /* MDEV_H */
-- 
1.9.1


* [RFC v2 2/4] vfio: VFIO bus driver for MDEV devices
  2016-09-02  8:16 ` [Qemu-devel] " Jike Song
@ 2016-09-02  8:16   ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: kevin.tian, guangrong.xiao, kvm, qemu-devel, zhenyuw, jike.song,
	zhiyuan.lv, pbonzini, bjsdjshi, kraxel

This driver implements VFIO bus support for MDEV devices. Vendor drivers
are expected to register with the MDEV core driver. The MDEV core driver
creates the mediated device and calls the probe routine of this driver,
which then adds the mdev to the VFIO core module.

The main aim of this module is to manage all VFIO APIs for each mediated
PCI device. See Documentation/vfio-mediated-device.txt for details.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/Kconfig     |   8 ++
 drivers/vfio/mdev/Makefile    |   1 +
 drivers/vfio/mdev/vfio_mdev.c | 187 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index d25439f..b2fe0c6 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -8,3 +8,11 @@ config MDEV
 	See Documentation/vfio-mediated-device.txt for more details.
 
         If you don't know what to do here, say N.
+
+config VFIO_MDEV
+    tristate "VFIO Bus driver for Mediated devices"
+    depends on VFIO && MDEV
+    default n
+    help
+        VFIO Bus driver for mediated devices.
+
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 8bd78b5..ee9f89f 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -2,3 +2,4 @@
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
new file mode 100644
index 0000000..c22ebd8
--- /dev/null
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -0,0 +1,187 @@
+/*
+ * VFIO Bus driver for Mediated device
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/vfio.h>
+#include <linux/iommu.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+#define DRIVER_DESC     "VFIO Bus driver for Mediated device"
+
+struct vfio_mdev {
+	struct iommu_group *group;
+	struct mdev_device *mdev;
+};
+
+static int vfio_mdev_open(void *device_data)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void vfio_mdev_close(void *device_data)
+{
+	module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_unlocked_ioctl(void *device_data,
+				     unsigned int cmd, unsigned long arg)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->ioctl)
+		return host->ops->ioctl(mdev, cmd, arg);
+
+	return -ENODEV;
+}
+
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->read)
+		return host->ops->read(mdev, buf, count, ppos);
+
+	return -ENODEV;
+}
+
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->write)
+		return host->ops->write(mdev, buf, count, ppos);
+
+	return -ENODEV;
+}
+
+static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->mmap)
+		return host->ops->mmap(mdev, vma);
+
+	return -ENODEV;
+}
+
+static const struct vfio_device_ops vfio_mdev_dev_ops = {
+	.name		= "vfio-mdev",
+	.open		= vfio_mdev_open,
+	.release	= vfio_mdev_close,
+	.ioctl		= vfio_mdev_unlocked_ioctl,
+	.read		= vfio_mdev_read,
+	.write		= vfio_mdev_write,
+	.mmap		= vfio_mdev_mmap,
+};
+
+static int vfio_mdev_probe(struct device *dev)
+{
+	struct vfio_mdev *vmdev;
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret;
+
+	vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
+	if (!vmdev)
+		return -ENOMEM;
+
+	vmdev->mdev = mdev;
+	vmdev->group = mdev->group;
+
+	ret = vfio_add_group_dev(dev, &vfio_mdev_dev_ops, vmdev);
+	if (ret)
+		kfree(vmdev);
+
+	return ret;
+}
+
+static void vfio_mdev_remove(struct device *dev)
+{
+	struct vfio_mdev *vmdev;
+
+	vmdev = vfio_del_group_dev(dev);
+	kfree(vmdev);
+}
+
+static int vfio_mdev_online(struct device *dev)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->start)
+		return host->ops->start(mdev);
+
+	return -ENOTSUPP;
+}
+
+static int vfio_mdev_offline(struct device *dev)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->stop)
+		return host->ops->stop(mdev);
+
+	return -ENOTSUPP;
+}
+
+static struct mdev_driver vfio_mdev_driver = {
+	.name		= "vfio_mdev",
+	.probe		= vfio_mdev_probe,
+	.remove		= vfio_mdev_remove,
+	.online		= vfio_mdev_online,
+	.offline	= vfio_mdev_offline,
+};
+
+static int __init vfio_mdev_init(void)
+{
+	return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
+}
+
+static void __exit vfio_mdev_exit(void)
+{
+	mdev_unregister_driver(&vfio_mdev_driver);
+}
+
+module_init(vfio_mdev_init)
+module_exit(vfio_mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
1.9.1


* [Qemu-devel] [RFC v2 2/4] vfio: VFIO bus driver for MDEV devices
@ 2016-09-02  8:16   ` Jike Song
  0 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	zhiyuan.lv, jike.song, pbonzini, kraxel

This driver implements VFIO bus support for MDEV devices. Vendor drivers
are expected to register with the MDEV core driver, which creates the
mediated device and calls the probe routine of this driver; the mdev is
then added to the VFIO core module.

The main aim of this module is to manage all VFIO APIs for each mediated
PCI device. See Documentation/vfio-mediated-device.txt for details.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/mdev/Kconfig     |   8 ++
 drivers/vfio/mdev/Makefile    |   1 +
 drivers/vfio/mdev/vfio_mdev.c | 187 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 196 insertions(+)
 create mode 100644 drivers/vfio/mdev/vfio_mdev.c

diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig
index d25439f..b2fe0c6 100644
--- a/drivers/vfio/mdev/Kconfig
+++ b/drivers/vfio/mdev/Kconfig
@@ -8,3 +8,11 @@ config MDEV
 	See Documentation/vfio-mediated-device.txt for more details.
 
        If you don't know what to do here, say N.
+
+config VFIO_MDEV
+	tristate "VFIO Bus driver for Mediated devices"
+	depends on VFIO && MDEV
+	default n
+	help
+	  VFIO Bus driver for mediated devices.
+
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile
index 8bd78b5..ee9f89f 100644
--- a/drivers/vfio/mdev/Makefile
+++ b/drivers/vfio/mdev/Makefile
@@ -2,3 +2,4 @@
 mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o
 
 obj-$(CONFIG_MDEV) += mdev.o
+obj-$(CONFIG_VFIO_MDEV) += vfio_mdev.o
diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
new file mode 100644
index 0000000..c22ebd8
--- /dev/null
+++ b/drivers/vfio/mdev/vfio_mdev.c
@@ -0,0 +1,187 @@
+/*
+ * VFIO Bus driver for Mediated device
+ *
+ * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
+ *     Author: Neo Jia <cjia@nvidia.com>
+ *	       Kirti Wankhede <kwankhede@nvidia.com>
+ *
+ * Copyright (c) 2016 Intel Corporation.
+ * Author:
+ *	Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *	Jike Song <jike.song@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/vfio.h>
+#include <linux/iommu.h>
+#include <linux/mdev.h>
+
+#include "mdev_private.h"
+
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "NVIDIA Corporation"
+#define DRIVER_DESC     "VFIO Bus driver for Mediated device"
+
+struct vfio_mdev {
+	struct iommu_group *group;
+	struct mdev_device *mdev;
+};
+
+static int vfio_mdev_open(void *device_data)
+{
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void vfio_mdev_close(void *device_data)
+{
+	module_put(THIS_MODULE);
+}
+
+static long vfio_mdev_unlocked_ioctl(void *device_data,
+				     unsigned int cmd, unsigned long arg)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->ioctl)
+		return host->ops->ioctl(mdev, cmd, arg);
+
+	return -ENODEV;
+}
+
+static ssize_t vfio_mdev_read(void *device_data, char __user *buf,
+			      size_t count, loff_t *ppos)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->read)
+		return host->ops->read(mdev, buf, count, ppos);
+
+	return -ENODEV;
+}
+
+static ssize_t vfio_mdev_write(void *device_data, const char __user *buf,
+			       size_t count, loff_t *ppos)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->write)
+		return host->ops->write(mdev, buf, count, ppos);
+
+	return -ENODEV;
+}
+
+static int vfio_mdev_mmap(void *device_data, struct vm_area_struct *vma)
+{
+	struct vfio_mdev *vmdev = device_data;
+	struct mdev_device *mdev = vmdev->mdev;
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->mmap)
+		return host->ops->mmap(mdev, vma);
+
+	return -ENODEV;
+}
+
+static const struct vfio_device_ops vfio_mdev_dev_ops = {
+	.name		= "vfio-mdev",
+	.open		= vfio_mdev_open,
+	.release	= vfio_mdev_close,
+	.ioctl		= vfio_mdev_unlocked_ioctl,
+	.read		= vfio_mdev_read,
+	.write		= vfio_mdev_write,
+	.mmap		= vfio_mdev_mmap,
+};
+
+static int vfio_mdev_probe(struct device *dev)
+{
+	struct vfio_mdev *vmdev;
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	int ret;
+
+	vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
+	if (!vmdev)
+		return -ENOMEM;
+
+	vmdev->mdev = mdev;
+	vmdev->group = mdev->group;
+
+	ret = vfio_add_group_dev(dev, &vfio_mdev_dev_ops, vmdev);
+	if (ret)
+		kfree(vmdev);
+
+	return ret;
+}
+
+static void vfio_mdev_remove(struct device *dev)
+{
+	struct vfio_mdev *vmdev;
+
+	vmdev = vfio_del_group_dev(dev);
+	kfree(vmdev);
+}
+
+static int vfio_mdev_online(struct device *dev)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->start)
+		return host->ops->start(mdev);
+
+	return -ENOTSUPP;
+}
+
+static int vfio_mdev_offline(struct device *dev)
+{
+	struct mdev_device *mdev = dev_to_mdev(dev);
+	struct mdev_host *host = dev_to_host(mdev->dev.parent);
+
+	if (host->ops->stop)
+		return host->ops->stop(mdev);
+
+	return -ENOTSUPP;
+}
+
+static struct mdev_driver vfio_mdev_driver = {
+	.name		= "vfio_mdev",
+	.probe		= vfio_mdev_probe,
+	.remove		= vfio_mdev_remove,
+	.online		= vfio_mdev_online,
+	.offline	= vfio_mdev_offline,
+};
+
+static int __init vfio_mdev_init(void)
+{
+	return mdev_register_driver(&vfio_mdev_driver, THIS_MODULE);
+}
+
+static void __exit vfio_mdev_exit(void)
+{
+	mdev_unregister_driver(&vfio_mdev_driver);
+}
+
+module_init(vfio_mdev_init)
+module_exit(vfio_mdev_exit)
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
-- 
1.9.1


* [RFC v2 3/4] vfio iommu: Add support for mediated devices
@ 2016-09-02  8:16   ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: kevin.tian, guangrong.xiao, kvm, qemu-devel, zhenyuw, jike.song,
	zhiyuan.lv, pbonzini, bjsdjshi, kraxel

From: Kirti Wankhede <kwankhede@nvidia.com>

VFIO IOMMU drivers are designed for devices that are IOMMU capable.
A mediated device only uses IOMMU APIs; the underlying hardware can be
managed by an IOMMU domain.

The aim of this change is:
- To reuse most of the code of the TYPE1 IOMMU driver for mediated devices
- To support direct assigned devices and mediated devices in a single module

Added two new callback functions to struct vfio_iommu_driver_ops. A
backend IOMMU module that supports pinning and unpinning pages for mdev
devices should provide these functions.
Added APIs for pinning and unpinning pages to the VFIO module. These
call back into the backend IOMMU module to actually pin and unpin pages.

This change adds pin and unpin support for mediated devices to the TYPE1
IOMMU backend module. More details:
- When an iommu_group of mediated devices is attached, the task structure
  is cached; it is used later for page pinning and accounting.
- It keeps track of pinned pages for the mediated domain. This data is
  used to verify unpinning requests and to unpin any remaining pages
  while detaching.
- Used the existing mechanism for page accounting. If an IOMMU-capable
  domain exists in the container then all pages are already pinned and
  accounted. Accounting for an mdev device is only done if there is no
  IOMMU-capable domain in the container.

Tested by assigning the following combinations of devices to a single VM:
- GPU pass through only
- vGPU device only
- One GPU pass through and one vGPU device
- Two GPU pass through devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/vfio.c             |  82 +++++++
 drivers/vfio/vfio_iommu_type1.c | 499 ++++++++++++++++++++++++++++++++++++----
 include/linux/vfio.h            |  13 +-
 3 files changed, 546 insertions(+), 48 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..397c4be 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1782,6 +1782,88 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
 
+/*
+ * Pin a set of guest PFNs and return their associated host PFNs for the
+ * mediated domain only.
+ * @user_pfn [in]: array of user/guest PFNs
+ * @npage [in]: count of array elements
+ * @prot [in] : protection flags
+ * @phys_pfn[out] : array of host PFNs
+ */
+long vfio_pin_pages(struct mdev_device *mdev, unsigned long *user_pfn,
+		    long npage, int prot, unsigned long *phys_pfn)
+{
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	ssize_t ret = -EINVAL;
+
+	if (!mdev || !user_pfn || !phys_pfn)
+		return -EINVAL;
+
+	device = dev_get_drvdata(&mdev->dev);
+
+	if (!device || !device->group)
+		return -EINVAL;
+
+	container = device->group->container;
+
+	if (!container)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->pin_pages))
+		ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+					     npage, prot, phys_pfn);
+
+	up_read(&container->group_lock);
+
+	return ret;
+
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin a set of host PFNs for the mediated domain only.
+ * @pfn [in] : array of host PFNs to be unpinned.
+ * @npage [in] : count of elements in array, that is, number of pages.
+ */
+long vfio_unpin_pages(struct mdev_device *mdev, unsigned long *pfn, long npage)
+{
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	ssize_t ret = -EINVAL;
+
+	if (!mdev || !pfn)
+		return -EINVAL;
+
+	device = dev_get_drvdata(&mdev->dev);
+
+	if (!device || !device->group)
+		return -EINVAL;
+
+	container = device->group->container;
+
+	if (!container)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->unpin_pages))
+		ret = driver->ops->unpin_pages(container->iommu_data, pfn,
+					       npage);
+
+	up_read(&container->group_lock);
+
+	return ret;
+
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..3cd8098 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -55,18 +55,26 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct vfio_domain	*mediated_domain;
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
 };
 
+struct mdev_addr_space {
+	struct task_struct	*task;
+	struct rb_root		pfn_list;	/* pinned Host pfn list */
+	struct mutex		pfn_list_lock;	/* mutex for pfn_list */
+};
+
 struct vfio_domain {
 	struct iommu_domain	*domain;
 	struct list_head	next;
 	struct list_head	group_list;
 	int			prot;		/* IOMMU_CACHE */
 	bool			fgsp;		/* Fine-grained super pages */
+	struct mdev_addr_space	*mdev_addr_space;
 };
 
 struct vfio_dma {
@@ -83,6 +91,22 @@ struct vfio_group {
 };
 
 /*
+ * Guest RAM pinning working set or DMA target
+ */
+struct vfio_pfn {
+	struct rb_node		node;
+	unsigned long		vaddr;		/* virtual addr */
+	dma_addr_t		iova;		/* IOVA */
+	unsigned long		pfn;		/* Host pfn */
+	size_t			prot;
+	atomic_t		ref_count;
+};
+
+
+#define IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu)	\
+			 (list_empty(&iommu->domain_list) ? false : true)
+
+/*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
  */
@@ -130,6 +154,84 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 	rb_erase(&old->node, &iommu->dma_list);
 }
 
+/*
+ * Helper Functions for host pfn list
+ */
+
+static struct vfio_pfn *vfio_find_pfn(struct vfio_domain *domain,
+				      unsigned long pfn)
+{
+	struct rb_node *node;
+	struct vfio_pfn *vpfn, *ret = NULL;
+
+	node = domain->mdev_addr_space->pfn_list.rb_node;
+
+	while (node) {
+		vpfn = rb_entry(node, struct vfio_pfn, node);
+
+		if (pfn < vpfn->pfn)
+			node = node->rb_left;
+		else if (pfn > vpfn->pfn)
+			node = node->rb_right;
+		else {
+			ret = vpfn;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void vfio_link_pfn(struct vfio_domain *domain, struct vfio_pfn *new)
+{
+	struct rb_node **link, *parent = NULL;
+	struct vfio_pfn *vpfn;
+
+	link = &domain->mdev_addr_space->pfn_list.rb_node;
+	while (*link) {
+		parent = *link;
+		vpfn = rb_entry(parent, struct vfio_pfn, node);
+
+		if (new->pfn < vpfn->pfn)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &domain->mdev_addr_space->pfn_list);
+}
+
+static void vfio_unlink_pfn(struct vfio_domain *domain, struct vfio_pfn *old)
+{
+	rb_erase(&old->node, &domain->mdev_addr_space->pfn_list);
+}
+
+static int vfio_add_to_pfn_list(struct vfio_domain *domain, unsigned long vaddr,
+				dma_addr_t iova, unsigned long pfn, size_t prot)
+{
+	struct vfio_pfn *vpfn;
+
+	vpfn = kzalloc(sizeof(*vpfn), GFP_KERNEL);
+	if (!vpfn)
+		return -ENOMEM;
+
+	vpfn->vaddr = vaddr;
+	vpfn->iova = iova;
+	vpfn->pfn = pfn;
+	vpfn->prot = prot;
+	atomic_set(&vpfn->ref_count, 1);
+	vfio_link_pfn(domain, vpfn);
+	return 0;
+}
+
+static void vfio_remove_from_pfn_list(struct vfio_domain *domain,
+				      struct vfio_pfn *vpfn)
+{
+	vfio_unlink_pfn(domain, vpfn);
+	kfree(vpfn);
+}
+
 struct vwork {
 	struct mm_struct	*mm;
 	long			npage;
@@ -150,17 +252,17 @@ static void vfio_lock_acct_bg(struct work_struct *work)
 	kfree(vwork);
 }
 
-static void vfio_lock_acct(long npage)
+static void vfio_lock_acct(struct task_struct *task, long npage)
 {
 	struct vwork *vwork;
 	struct mm_struct *mm;
 
-	if (!current->mm || !npage)
+	if (!task->mm || !npage)
 		return; /* process exited or nothing to do */
 
-	if (down_write_trylock(&current->mm->mmap_sem)) {
-		current->mm->locked_vm += npage;
-		up_write(&current->mm->mmap_sem);
+	if (down_write_trylock(&task->mm->mmap_sem)) {
+		task->mm->locked_vm += npage;
+		up_write(&task->mm->mmap_sem);
 		return;
 	}
 
@@ -172,7 +274,7 @@ static void vfio_lock_acct(long npage)
 	vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
 	if (!vwork)
 		return;
-	mm = get_task_mm(current);
+	mm = get_task_mm(task);
 	if (!mm) {
 		kfree(vwork);
 		return;
@@ -228,20 +330,31 @@ static int put_pfn(unsigned long pfn, int prot)
 	return 0;
 }
 
-static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
+static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
+			 int prot, unsigned long *pfn)
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct mm_struct *local_mm = mm ? mm : current->mm;
 	int ret = -EFAULT;
 
-	if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
+	if (mm) {
+		down_read(&local_mm->mmap_sem);
+		ret = get_user_pages_remote(NULL, local_mm, vaddr, 1,
+					!!(prot & IOMMU_WRITE), 0, page, NULL);
+		up_read(&local_mm->mmap_sem);
+	} else
+		ret = get_user_pages_fast(vaddr, 1,
+					  !!(prot & IOMMU_WRITE), page);
+
+	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
 		return 0;
 	}
 
-	down_read(&current->mm->mmap_sem);
+	down_read(&local_mm->mmap_sem);
 
-	vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
+	vma = find_vma_intersection(local_mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
 		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
@@ -249,7 +362,7 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 			ret = 0;
 	}
 
-	up_read(&current->mm->mmap_sem);
+	up_read(&local_mm->mmap_sem);
 
 	return ret;
 }
@@ -259,8 +372,8 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
  */
-static long vfio_pin_pages(unsigned long vaddr, long npage,
-			   int prot, unsigned long *pfn_base)
+static long __vfio_pin_pages(unsigned long vaddr, long npage,
+			     int prot, unsigned long *pfn_base)
 {
 	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	bool lock_cap = capable(CAP_IPC_LOCK);
@@ -270,7 +383,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	if (!current->mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(vaddr, prot, pfn_base);
+	ret = vaddr_get_pfn(NULL, vaddr, prot, pfn_base);
 	if (ret)
 		return ret;
 
@@ -285,7 +398,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 
 	if (unlikely(disable_hugepages)) {
 		if (!rsvd)
-			vfio_lock_acct(1);
+			vfio_lock_acct(current, 1);
 		return 1;
 	}
 
@@ -293,7 +406,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
 		unsigned long pfn = 0;
 
-		ret = vaddr_get_pfn(vaddr, prot, &pfn);
+		ret = vaddr_get_pfn(NULL, vaddr, prot, &pfn);
 		if (ret)
 			break;
 
@@ -313,13 +426,13 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	}
 
 	if (!rsvd)
-		vfio_lock_acct(i);
+		vfio_lock_acct(current, i);
 
 	return i;
 }
 
-static long vfio_unpin_pages(unsigned long pfn, long npage,
-			     int prot, bool do_accounting)
+static long __vfio_unpin_pages(unsigned long pfn, long npage, int prot,
+			       bool do_accounting)
 {
 	unsigned long unlocked = 0;
 	long i;
@@ -328,7 +441,188 @@ static long vfio_unpin_pages(unsigned long pfn, long npage,
 		unlocked += put_pfn(pfn++, prot);
 
 	if (do_accounting)
-		vfio_lock_acct(-unlocked);
+		vfio_lock_acct(current, -unlocked);
+	return unlocked;
+}
+
+static long __vfio_pin_pages_for_mdev(struct vfio_domain *domain,
+				      unsigned long vaddr, int prot,
+				      unsigned long *pfn_base,
+				      bool do_accounting)
+{
+	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	bool lock_cap = capable(CAP_IPC_LOCK);
+	long ret;
+	bool rsvd;
+	struct task_struct *task = domain->mdev_addr_space->task;
+
+	if (!task->mm)
+		return -ENODEV;
+
+	ret = vaddr_get_pfn(task->mm, vaddr, prot, pfn_base);
+	if (ret)
+		return ret;
+
+	rsvd = is_invalid_reserved_pfn(*pfn_base);
+
+	if (!rsvd && !lock_cap && task->mm->locked_vm + 1 > limit) {
+		put_pfn(*pfn_base, prot);
+		pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
+			limit << PAGE_SHIFT);
+		return -ENOMEM;
+	}
+
+	if (!rsvd && do_accounting)
+		vfio_lock_acct(task, 1);
+
+	return 1;
+}
+
+static void __vfio_unpin_pages_for_mdev(struct vfio_domain *domain,
+					unsigned long pfn, int prot,
+					bool do_accounting)
+{
+	put_pfn(pfn, prot);
+
+	if (do_accounting)
+		vfio_lock_acct(domain->mdev_addr_space->task, -1);
+}
+
+static int vfio_unpin_pfn(struct vfio_domain *domain,
+			  struct vfio_pfn *vpfn, bool do_accounting)
+{
+	__vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
+				    do_accounting);
+
+	if (atomic_dec_and_test(&vpfn->ref_count))
+		vfio_remove_from_pfn_list(domain, vpfn);
+
+	return 1;
+}
+
+static long vfio_iommu_type1_pin_pages(void *iommu_data,
+				       unsigned long *user_pfn,
+				       long npage, int prot,
+				       unsigned long *phys_pfn)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_domain *domain;
+	int i, j, ret;
+	long retpage;
+	unsigned long remote_vaddr;
+	unsigned long *pfn = phys_pfn;
+	struct vfio_dma *dma;
+	bool do_accounting = false;
+
+	if (!iommu || !user_pfn || !phys_pfn)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (!iommu->mediated_domain) {
+		ret = -EINVAL;
+		goto pin_done;
+	}
+
+	domain = iommu->mediated_domain;
+
+	/*
+	 * If iommu capable domain exist in the container then all pages are
+	 * already pinned and accounted. Accounting should be done if there is no
+	 * iommu capable domain in the container.
+	 */
+	do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
+
+	for (i = 0; i < npage; i++) {
+		struct vfio_pfn *p;
+		dma_addr_t iova;
+
+		iova = user_pfn[i] << PAGE_SHIFT;
+
+		dma = vfio_find_dma(iommu, iova, 0);
+		if (!dma) {
+			ret = -EINVAL;
+			goto pin_unwind;
+		}
+
+		remote_vaddr = dma->vaddr + iova - dma->iova;
+
+		retpage = __vfio_pin_pages_for_mdev(domain, remote_vaddr, prot,
+						    &pfn[i], do_accounting);
+		if (retpage <= 0) {
+			WARN_ON(!retpage);
+			ret = (int)retpage;
+			goto pin_unwind;
+		}
+
+		mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+
+		/* search if pfn already exists */
+		p = vfio_find_pfn(domain, pfn[i]);
+		if (p) {
+			atomic_inc(&p->ref_count);
+			mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+			continue;
+		}
+
+		ret = vfio_add_to_pfn_list(domain, remote_vaddr, iova,
+					   pfn[i], prot);
+		mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+
+		if (ret) {
+			__vfio_unpin_pages_for_mdev(domain, pfn[i], prot,
+						    do_accounting);
+			goto pin_unwind;
+		}
+	}
+
+	ret = i;
+	goto pin_done;
+
+pin_unwind:
+	pfn[i] = 0;
+	mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+	for (j = 0; j < i; j++) {
+		struct vfio_pfn *p;
+
+		p = vfio_find_pfn(domain, pfn[j]);
+		if (p)
+			vfio_unpin_pfn(domain, p, do_accounting);
+
+		pfn[j] = 0;
+	}
+	mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+
+pin_done:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_unpin_pages(void *iommu_data, unsigned long *pfn,
+					 long npage)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_domain *domain = NULL;
+	long unlocked = 0;
+	int i;
+
+	if (!iommu || !pfn || !iommu->mediated_domain)
+		return -EINVAL;
+
+	domain = iommu->mediated_domain;
+
+	for (i = 0; i < npage; i++) {
+		struct vfio_pfn *p;
+
+		mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+
+		/* verify if pfn exists in pfn_list */
+		p = vfio_find_pfn(domain, pfn[i]);
+		if (p)
+			unlocked += vfio_unpin_pfn(domain, p, true);
+
+		mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+	}
 
 	return unlocked;
 }
@@ -341,6 +635,9 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 
 	if (!dma->size)
 		return;
+
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+		return;
 	/*
 	 * We use the IOMMU to track the physical addresses, otherwise we'd
 	 * need a much more complicated tracking system.  Unfortunately that
@@ -382,15 +679,15 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 		if (WARN_ON(!unmapped))
 			break;
 
-		unlocked += vfio_unpin_pages(phys >> PAGE_SHIFT,
-					     unmapped >> PAGE_SHIFT,
-					     dma->prot, false);
+		unlocked += __vfio_unpin_pages(phys >> PAGE_SHIFT,
+					       unmapped >> PAGE_SHIFT,
+					       dma->prot, false);
 		iova += unmapped;
 
 		cond_resched();
 	}
 
-	vfio_lock_acct(-unlocked);
+	vfio_lock_acct(current, -unlocked);
 }
 
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
@@ -611,10 +908,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	/* Insert zero-sized and grow as we map chunks of it */
 	vfio_link_dma(iommu, dma);
 
+	/* Don't pin and map if container doesn't contain IOMMU capable domain */
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu)) {
+		dma->size = size;
+		goto map_done;
+	}
+
 	while (size) {
 		/* Pin a contiguous chunk of memory */
-		npage = vfio_pin_pages(vaddr + dma->size,
-				       size >> PAGE_SHIFT, prot, &pfn);
+		npage = __vfio_pin_pages(vaddr + dma->size,
+					 size >> PAGE_SHIFT, prot, &pfn);
 		if (npage <= 0) {
 			WARN_ON(!npage);
 			ret = (int)npage;
@@ -624,7 +927,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		/* Map it! */
 		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage, prot);
 		if (ret) {
-			vfio_unpin_pages(pfn, npage, prot, true);
+			__vfio_unpin_pages(pfn, npage, prot, true);
 			break;
 		}
 
@@ -635,6 +938,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	if (ret)
 		vfio_remove_dma(iommu, dma);
 
+map_done:
 	mutex_unlock(&iommu->lock);
 	return ret;
 }
@@ -734,11 +1038,24 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain)
 	__free_pages(pages, order);
 }
 
+static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
+				   struct iommu_group *iommu_group)
+{
+	struct vfio_group *g;
+
+	list_for_each_entry(g, &domain->group_list, next) {
+		if (g->iommu_group == iommu_group)
+			return g;
+	}
+
+	return NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
 	struct vfio_iommu *iommu = iommu_data;
-	struct vfio_group *group, *g;
+	struct vfio_group *group;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
 	int ret;
@@ -746,10 +1063,14 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	mutex_lock(&iommu->lock);
 
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		list_for_each_entry(g, &d->group_list, next) {
-			if (g->iommu_group != iommu_group)
-				continue;
+		if (find_iommu_group(d, iommu_group)) {
+			mutex_unlock(&iommu->lock);
+			return -EINVAL;
+		}
+	}
 
+	if (iommu->mediated_domain) {
+		if (find_iommu_group(iommu->mediated_domain, iommu_group)) {
 			mutex_unlock(&iommu->lock);
 			return -EINVAL;
 		}
@@ -769,6 +1090,34 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_free;
 
+#if defined(CONFIG_VFIO_MDEV) || defined(CONFIG_VFIO_MDEV_MODULE)
+	if (!iommu_present(bus) && (bus == &mdev_bus_type)) {
+		if (iommu->mediated_domain) {
+			list_add(&group->next,
+				 &iommu->mediated_domain->group_list);
+			kfree(domain);
+			mutex_unlock(&iommu->lock);
+			return 0;
+		}
+
+		domain->mdev_addr_space = kzalloc(sizeof(*domain->mdev_addr_space),
+						  GFP_KERNEL);
+		if (!domain->mdev_addr_space) {
+			ret = -ENOMEM;
+			goto out_free;
+		}
+
+		domain->mdev_addr_space->task = current;
+		INIT_LIST_HEAD(&domain->group_list);
+		list_add(&group->next, &domain->group_list);
+		domain->mdev_addr_space->pfn_list = RB_ROOT;
+		mutex_init(&domain->mdev_addr_space->pfn_list_lock);
+		iommu->mediated_domain = domain;
+		mutex_unlock(&iommu->lock);
+		return 0;
+	}
+#endif
+
 	domain->domain = iommu_domain_alloc(bus);
 	if (!domain->domain) {
 		ret = -EIO;
@@ -859,6 +1208,18 @@ static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
 }
 
+static void vfio_mdev_unpin_all(struct vfio_domain *domain)
+{
+	struct rb_node *node;
+
+	mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+	while ((node = rb_first(&domain->mdev_addr_space->pfn_list))) {
+		vfio_unpin_pfn(domain,
+				rb_entry(node, struct vfio_pfn, node), false);
+	}
+	mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -868,31 +1229,52 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 
 	mutex_lock(&iommu->lock);
 
-	list_for_each_entry(domain, &iommu->domain_list, next) {
-		list_for_each_entry(group, &domain->group_list, next) {
-			if (group->iommu_group != iommu_group)
-				continue;
+	if (iommu->mediated_domain) {
+		domain = iommu->mediated_domain;
+		group = find_iommu_group(domain, iommu_group);
+		if (group) {
+			list_del(&group->next);
+			kfree(group);
 
+			if (list_empty(&domain->group_list)) {
+				vfio_mdev_unpin_all(domain);
+				if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+					vfio_iommu_unmap_unpin_all(iommu);
+				kfree(domain);
+				iommu->mediated_domain = NULL;
+			}
+			goto detach_group_done;
+		}
+	}
+
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+		goto detach_group_done;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		group = find_iommu_group(domain, iommu_group);
+		if (group) {
 			iommu_detach_group(domain->domain, iommu_group);
 			list_del(&group->next);
 			kfree(group);
 			/*
 			 * Group ownership provides privilege, if the group
 			 * list is empty, the domain goes away.  If it's the
-			 * last domain, then all the mappings go away too.
+			 * last iommu-backed domain and no mediated domain
+			 * exists, then all the mappings go away too.
 			 */
 			if (list_empty(&domain->group_list)) {
-				if (list_is_singular(&iommu->domain_list))
+				if (list_is_singular(&iommu->domain_list) &&
+				    !iommu->mediated_domain)
 					vfio_iommu_unmap_unpin_all(iommu);
 				iommu_domain_free(domain->domain);
 				list_del(&domain->next);
 				kfree(domain);
 			}
-			goto done;
+			break;
 		}
 	}
 
-done:
+detach_group_done:
 	mutex_unlock(&iommu->lock);
 }
 
@@ -924,27 +1306,48 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	return iommu;
 }
 
+static void vfio_release_domain(struct vfio_domain *domain)
+{
+	struct vfio_group *group, *group_tmp;
+
+	list_for_each_entry_safe(group, group_tmp,
+				 &domain->group_list, next) {
+		if (!domain->mdev_addr_space)
+			iommu_detach_group(domain->domain, group->iommu_group);
+		list_del(&group->next);
+		kfree(group);
+	}
+
+	if (domain->mdev_addr_space)
+		vfio_mdev_unpin_all(domain);
+	else
+		iommu_domain_free(domain->domain);
+}
+
 static void vfio_iommu_type1_release(void *iommu_data)
 {
 	struct vfio_iommu *iommu = iommu_data;
 	struct vfio_domain *domain, *domain_tmp;
-	struct vfio_group *group, *group_tmp;
+
+	if (iommu->mediated_domain) {
+		vfio_release_domain(iommu->mediated_domain);

* [Qemu-devel] [RFC v2 3/4] vfio iommu: Add support for mediated devices
@ 2016-09-02  8:16   ` Jike Song
  0 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	zhiyuan.lv, jike.song, pbonzini, kraxel

From: Kirti Wankhede <kwankhede@nvidia.com>

VFIO IOMMU drivers are designed for devices that are IOMMU capable. A
mediated device only uses the IOMMU APIs; the underlying hardware can be
managed by an IOMMU domain.

The aims of this change are:
- To reuse most of the TYPE1 IOMMU driver code for mediated devices
- To support directly assigned devices and mediated devices in a single module

Added two new callback functions to struct vfio_iommu_driver_ops. A backend
IOMMU module that supports pinning and unpinning pages for mdev devices
should provide these functions.
Added APIs for pinning and unpinning pages to the VFIO module. These call back
into the backend IOMMU module to actually pin and unpin pages.

This change adds pin and unpin support for mediated devices to the TYPE1 IOMMU
backend module. More details:
- When an iommu_group of mediated devices is attached, the task structure is
  cached; it is used later for pinning pages and page accounting.
- It keeps track of pinned pages for the mediated domain. This data is used to
  verify unpinning requests and to unpin any remaining pages while detaching.
- The existing mechanism is used for page accounting. If an IOMMU capable
  domain exists in the container then all pages are already pinned and
  accounted. Accounting for an mdev device is only done if there is no IOMMU
  capable domain in the container.

Tested by assigning the below combinations of devices to a single VM:
- GPU pass through only
- vGPU device only
- One GPU pass through and one vGPU device
- Two GPU pass through devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Change-Id: I295d6f0f2e0579b8d9882bfd8fd5a4194b97bd9a
Signed-off-by: Jike Song <jike.song@intel.com>
---
 drivers/vfio/vfio.c             |  82 +++++++
 drivers/vfio/vfio_iommu_type1.c | 499 ++++++++++++++++++++++++++++++++++++----
 include/linux/vfio.h            |  13 +-
 3 files changed, 546 insertions(+), 48 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..397c4be 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1782,6 +1782,88 @@ void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
 }
 EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
 
+/*
+ * Pin a set of guest PFNs and return their associated host PFNs for mediated
+ * domain only.
+ * @user_pfn [in]: array of user/guest PFNs
+ * @npage [in]: count of array elements
+ * @prot [in] : protection flags
+ * @phys_pfn[out] : array of host PFNs
+ */
+long vfio_pin_pages(struct mdev_device *mdev, unsigned long *user_pfn,
+		    long npage, int prot, unsigned long *phys_pfn)
+{
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	ssize_t ret = -EINVAL;
+
+	if (!mdev || !user_pfn || !phys_pfn)
+		return -EINVAL;
+
+	device = dev_get_drvdata(&mdev->dev);
+
+	if (!device || !device->group)
+		return -EINVAL;
+
+	container = device->group->container;
+
+	if (!container)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->pin_pages))
+		ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+					     npage, prot, phys_pfn);
+
+	up_read(&container->group_lock);
+
+	return ret;
+
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin set of host PFNs for mediated domain only.
+ * @pfn [in] : array of host PFNs to be unpinned.
+ * @npage [in] :count of elements in array, that is number of pages.
+ */
+long vfio_unpin_pages(struct mdev_device *mdev, unsigned long *pfn, long npage)
+{
+	struct vfio_device *device;
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	ssize_t ret = -EINVAL;
+
+	if (!mdev || !pfn)
+		return -EINVAL;
+
+	device = dev_get_drvdata(&mdev->dev);
+
+	if (!device || !device->group)
+		return -EINVAL;
+
+	container = device->group->container;
+
+	if (!container)
+		return -EINVAL;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->unpin_pages))
+		ret = driver->ops->unpin_pages(container->iommu_data, pfn,
+					       npage);
+
+	up_read(&container->group_lock);
+
+	return ret;
+
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..3cd8098 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -55,18 +55,26 @@ MODULE_PARM_DESC(disable_hugepages,
 
 struct vfio_iommu {
 	struct list_head	domain_list;
+	struct vfio_domain	*mediated_domain;
 	struct mutex		lock;
 	struct rb_root		dma_list;
 	bool			v2;
 	bool			nesting;
 };
 
+struct mdev_addr_space {
+	struct task_struct	*task;
+	struct rb_root		pfn_list;	/* pinned Host pfn list */
+	struct mutex		pfn_list_lock;	/* mutex for pfn_list */
+};
+
 struct vfio_domain {
 	struct iommu_domain	*domain;
 	struct list_head	next;
 	struct list_head	group_list;
 	int			prot;		/* IOMMU_CACHE */
 	bool			fgsp;		/* Fine-grained super pages */
+	struct mdev_addr_space	*mdev_addr_space;
 };
 
 struct vfio_dma {
@@ -83,6 +91,22 @@ struct vfio_group {
 };
 
 /*
+ * Guest RAM pinning working set or DMA target
+ */
+struct vfio_pfn {
+	struct rb_node		node;
+	unsigned long		vaddr;		/* virtual addr */
+	dma_addr_t		iova;		/* IOVA */
+	unsigned long		pfn;		/* Host pfn */
+	size_t			prot;
+	atomic_t		ref_count;
+};
+
+
+#define IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu)	\
+			 (list_empty(&iommu->domain_list) ? false : true)
+
+/*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
  */
@@ -130,6 +154,84 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 	rb_erase(&old->node, &iommu->dma_list);
 }
 
+/*
+ * Helper Functions for host pfn list
+ */
+
+static struct vfio_pfn *vfio_find_pfn(struct vfio_domain *domain,
+				      unsigned long pfn)
+{
+	struct rb_node *node;
+	struct vfio_pfn *vpfn, *ret = NULL;
+
+	node = domain->mdev_addr_space->pfn_list.rb_node;
+
+	while (node) {
+		vpfn = rb_entry(node, struct vfio_pfn, node);
+
+		if (pfn < vpfn->pfn)
+			node = node->rb_left;
+		else if (pfn > vpfn->pfn)
+			node = node->rb_right;
+		else {
+			ret = vpfn;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static void vfio_link_pfn(struct vfio_domain *domain, struct vfio_pfn *new)
+{
+	struct rb_node **link, *parent = NULL;
+	struct vfio_pfn *vpfn;
+
+	link = &domain->mdev_addr_space->pfn_list.rb_node;
+	while (*link) {
+		parent = *link;
+		vpfn = rb_entry(parent, struct vfio_pfn, node);
+
+		if (new->pfn < vpfn->pfn)
+			link = &(*link)->rb_left;
+		else
+			link = &(*link)->rb_right;
+	}
+
+	rb_link_node(&new->node, parent, link);
+	rb_insert_color(&new->node, &domain->mdev_addr_space->pfn_list);
+}
+
+static void vfio_unlink_pfn(struct vfio_domain *domain, struct vfio_pfn *old)
+{
+	rb_erase(&old->node, &domain->mdev_addr_space->pfn_list);
+}
+
+static int vfio_add_to_pfn_list(struct vfio_domain *domain, unsigned long vaddr,
+				dma_addr_t iova, unsigned long pfn, size_t prot)
+{
+	struct vfio_pfn *vpfn;
+
+	vpfn = kzalloc(sizeof(*vpfn), GFP_KERNEL);
+	if (!vpfn)
+		return -ENOMEM;
+
+	vpfn->vaddr = vaddr;
+	vpfn->iova = iova;
+	vpfn->pfn = pfn;
+	vpfn->prot = prot;
+	atomic_set(&vpfn->ref_count, 1);
+	vfio_link_pfn(domain, vpfn);
+	return 0;
+}
+
+static void vfio_remove_from_pfn_list(struct vfio_domain *domain,
+				      struct vfio_pfn *vpfn)
+{
+	vfio_unlink_pfn(domain, vpfn);
+	kfree(vpfn);
+}
+
 struct vwork {
 	struct mm_struct	*mm;
 	long			npage;
@@ -150,17 +252,17 @@ static void vfio_lock_acct_bg(struct work_struct *work)
 	kfree(vwork);
 }
 
-static void vfio_lock_acct(long npage)
+static void vfio_lock_acct(struct task_struct *task, long npage)
 {
 	struct vwork *vwork;
 	struct mm_struct *mm;
 
-	if (!current->mm || !npage)
+	if (!task->mm || !npage)
 		return; /* process exited or nothing to do */
 
-	if (down_write_trylock(&current->mm->mmap_sem)) {
-		current->mm->locked_vm += npage;
-		up_write(&current->mm->mmap_sem);
+	if (down_write_trylock(&task->mm->mmap_sem)) {
+		task->mm->locked_vm += npage;
+		up_write(&task->mm->mmap_sem);
 		return;
 	}
 
@@ -172,7 +274,7 @@ static void vfio_lock_acct(long npage)
 	vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
 	if (!vwork)
 		return;
-	mm = get_task_mm(current);
+	mm = get_task_mm(task);
 	if (!mm) {
 		kfree(vwork);
 		return;
@@ -228,20 +330,31 @@ static int put_pfn(unsigned long pfn, int prot)
 	return 0;
 }
 
-static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
+static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
+			 int prot, unsigned long *pfn)
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct mm_struct *local_mm = mm ? mm : current->mm;
 	int ret = -EFAULT;
 
-	if (get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE), page) == 1) {
+	if (mm) {
+		down_read(&local_mm->mmap_sem);
+		ret = get_user_pages_remote(NULL, local_mm, vaddr, 1,
+					!!(prot & IOMMU_WRITE), 0, page, NULL);
+		up_read(&local_mm->mmap_sem);
+	} else
+		ret = get_user_pages_fast(vaddr, 1,
+					  !!(prot & IOMMU_WRITE), page);
+
+	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
 		return 0;
 	}
 
-	down_read(&current->mm->mmap_sem);
+	down_read(&local_mm->mmap_sem);
 
-	vma = find_vma_intersection(current->mm, vaddr, vaddr + 1);
+	vma = find_vma_intersection(local_mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
 		*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
@@ -249,7 +362,7 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
 			ret = 0;
 	}
 
-	up_read(&current->mm->mmap_sem);
+	up_read(&local_mm->mmap_sem);
 
 	return ret;
 }
@@ -259,8 +372,8 @@ static int vaddr_get_pfn(unsigned long vaddr, int prot, unsigned long *pfn)
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
  */
-static long vfio_pin_pages(unsigned long vaddr, long npage,
-			   int prot, unsigned long *pfn_base)
+static long __vfio_pin_pages(unsigned long vaddr, long npage,
+			     int prot, unsigned long *pfn_base)
 {
 	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	bool lock_cap = capable(CAP_IPC_LOCK);
@@ -270,7 +383,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	if (!current->mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(vaddr, prot, pfn_base);
+	ret = vaddr_get_pfn(NULL, vaddr, prot, pfn_base);
 	if (ret)
 		return ret;
 
@@ -285,7 +398,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 
 	if (unlikely(disable_hugepages)) {
 		if (!rsvd)
-			vfio_lock_acct(1);
+			vfio_lock_acct(current, 1);
 		return 1;
 	}
 
@@ -293,7 +406,7 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	for (i = 1, vaddr += PAGE_SIZE; i < npage; i++, vaddr += PAGE_SIZE) {
 		unsigned long pfn = 0;
 
-		ret = vaddr_get_pfn(vaddr, prot, &pfn);
+		ret = vaddr_get_pfn(NULL, vaddr, prot, &pfn);
 		if (ret)
 			break;
 
@@ -313,13 +426,13 @@ static long vfio_pin_pages(unsigned long vaddr, long npage,
 	}
 
 	if (!rsvd)
-		vfio_lock_acct(i);
+		vfio_lock_acct(current, i);
 
 	return i;
 }
 
-static long vfio_unpin_pages(unsigned long pfn, long npage,
-			     int prot, bool do_accounting)
+static long __vfio_unpin_pages(unsigned long pfn, long npage, int prot,
+			       bool do_accounting)
 {
 	unsigned long unlocked = 0;
 	long i;
@@ -328,7 +441,188 @@ static long vfio_unpin_pages(unsigned long pfn, long npage,
 		unlocked += put_pfn(pfn++, prot);
 
 	if (do_accounting)
-		vfio_lock_acct(-unlocked);
+		vfio_lock_acct(current, -unlocked);
+	return unlocked;
+}
+
+static long __vfio_pin_pages_for_mdev(struct vfio_domain *domain,
+				      unsigned long vaddr, int prot,
+				      unsigned long *pfn_base,
+				      bool do_accounting)
+{
+	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	bool lock_cap = capable(CAP_IPC_LOCK);
+	long ret;
+	bool rsvd;
+	struct task_struct *task = domain->mdev_addr_space->task;
+
+	if (!task->mm)
+		return -ENODEV;
+
+	ret = vaddr_get_pfn(task->mm, vaddr, prot, pfn_base);
+	if (ret)
+		return ret;
+
+	rsvd = is_invalid_reserved_pfn(*pfn_base);
+
+	if (!rsvd && !lock_cap && task->mm->locked_vm + 1 > limit) {
+		put_pfn(*pfn_base, prot);
+		pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", __func__,
+			limit << PAGE_SHIFT);
+		return -ENOMEM;
+	}
+
+	if (!rsvd && do_accounting)
+		vfio_lock_acct(task, 1);
+
+	return 1;
+}
+
+static void __vfio_unpin_pages_for_mdev(struct vfio_domain *domain,
+					unsigned long pfn, int prot,
+					bool do_accounting)
+{
+	put_pfn(pfn, prot);
+
+	if (do_accounting)
+		vfio_lock_acct(domain->mdev_addr_space->task, -1);
+}
+
+static int vfio_unpin_pfn(struct vfio_domain *domain,
+			  struct vfio_pfn *vpfn, bool do_accounting)
+{
+	__vfio_unpin_pages_for_mdev(domain, vpfn->pfn, vpfn->prot,
+				    do_accounting);
+
+	if (atomic_dec_and_test(&vpfn->ref_count))
+		vfio_remove_from_pfn_list(domain, vpfn);
+
+	return 1;
+}
+
+static long vfio_iommu_type1_pin_pages(void *iommu_data,
+				       unsigned long *user_pfn,
+				       long npage, int prot,
+				       unsigned long *phys_pfn)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_domain *domain;
+	int i, j, ret;
+	long retpage;
+	unsigned long remote_vaddr;
+	unsigned long *pfn = phys_pfn;
+	struct vfio_dma *dma;
+	bool do_accounting = false;
+
+	if (!iommu || !user_pfn || !phys_pfn)
+		return -EINVAL;
+
+	mutex_lock(&iommu->lock);
+
+	if (!iommu->mediated_domain) {
+		ret = -EINVAL;
+		goto pin_done;
+	}
+
+	domain = iommu->mediated_domain;
+
+	/*
+	 * If an iommu capable domain exists in the container then all pages
+	 * are already pinned and accounted. Accounting should be done if there
+	 * is no iommu capable domain in the container.
+	 */
+	do_accounting = !IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu);
+
+	for (i = 0; i < npage; i++) {
+		struct vfio_pfn *p;
+		dma_addr_t iova;
+
+		iova = user_pfn[i] << PAGE_SHIFT;
+
+		dma = vfio_find_dma(iommu, iova, 0);
+		if (!dma) {
+			ret = -EINVAL;
+			goto pin_unwind;
+		}
+
+		remote_vaddr = dma->vaddr + iova - dma->iova;
+
+		retpage = __vfio_pin_pages_for_mdev(domain, remote_vaddr, prot,
+						    &pfn[i], do_accounting);
+		if (retpage <= 0) {
+			WARN_ON(!retpage);
+			ret = (int)retpage;
+			goto pin_unwind;
+		}
+
+		mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+
+		/* search if pfn exists */
+		p = vfio_find_pfn(domain, pfn[i]);
+		if (p) {
+			atomic_inc(&p->ref_count);
+			mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+			continue;
+		}
+
+		ret = vfio_add_to_pfn_list(domain, remote_vaddr, iova,
+					   pfn[i], prot);
+		mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+
+		if (ret) {
+			__vfio_unpin_pages_for_mdev(domain, pfn[i], prot,
+						    do_accounting);
+			goto pin_unwind;
+		}
+	}
+
+	ret = i;
+	goto pin_done;
+
+pin_unwind:
+	pfn[i] = 0;
+	mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+	for (j = 0; j < i; j++) {
+		struct vfio_pfn *p;
+
+		p = vfio_find_pfn(domain, pfn[j]);
+		if (p)
+			vfio_unpin_pfn(domain, p, do_accounting);
+
+		pfn[j] = 0;
+	}
+	mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+
+pin_done:
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
+static long vfio_iommu_type1_unpin_pages(void *iommu_data, unsigned long *pfn,
+					 long npage)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_domain *domain = NULL;
+	long unlocked = 0;
+	int i;
+
+	if (!iommu || !pfn)
+		return -EINVAL;
+
+	domain = iommu->mediated_domain;
+
+	for (i = 0; i < npage; i++) {
+		struct vfio_pfn *p;
+
+		mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+
+		/* verify if pfn exists in pfn_list */
+		p = vfio_find_pfn(domain, pfn[i]);
+		if (p)
+			unlocked += vfio_unpin_pfn(domain, p, true);
+
+		mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+	}
 
 	return unlocked;
 }
@@ -341,6 +635,9 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 
 	if (!dma->size)
 		return;
+
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+		return;
 	/*
 	 * We use the IOMMU to track the physical addresses, otherwise we'd
 	 * need a much more complicated tracking system.  Unfortunately that
@@ -382,15 +679,15 @@ static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
 		if (WARN_ON(!unmapped))
 			break;
 
-		unlocked += vfio_unpin_pages(phys >> PAGE_SHIFT,
-					     unmapped >> PAGE_SHIFT,
-					     dma->prot, false);
+		unlocked += __vfio_unpin_pages(phys >> PAGE_SHIFT,
+					       unmapped >> PAGE_SHIFT,
+					       dma->prot, false);
 		iova += unmapped;
 
 		cond_resched();
 	}
 
-	vfio_lock_acct(-unlocked);
+	vfio_lock_acct(current, -unlocked);
 }
 
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
@@ -611,10 +908,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	/* Insert zero-sized and grow as we map chunks of it */
 	vfio_link_dma(iommu, dma);
 
+	/* Don't pin and map if container doesn't contain IOMMU capable domain */
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu)) {
+		dma->size = size;
+		goto map_done;
+	}
+
 	while (size) {
 		/* Pin a contiguous chunk of memory */
-		npage = vfio_pin_pages(vaddr + dma->size,
-				       size >> PAGE_SHIFT, prot, &pfn);
+		npage = __vfio_pin_pages(vaddr + dma->size,
+					 size >> PAGE_SHIFT, prot, &pfn);
 		if (npage <= 0) {
 			WARN_ON(!npage);
 			ret = (int)npage;
@@ -624,7 +927,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		/* Map it! */
 		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage, prot);
 		if (ret) {
-			vfio_unpin_pages(pfn, npage, prot, true);
+			__vfio_unpin_pages(pfn, npage, prot, true);
 			break;
 		}
 
@@ -635,6 +938,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	if (ret)
 		vfio_remove_dma(iommu, dma);
 
+map_done:
 	mutex_unlock(&iommu->lock);
 	return ret;
 }
@@ -734,11 +1038,24 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain)
 	__free_pages(pages, order);
 }
 
+static struct vfio_group *find_iommu_group(struct vfio_domain *domain,
+				   struct iommu_group *iommu_group)
+{
+	struct vfio_group *g;
+
+	list_for_each_entry(g, &domain->group_list, next) {
+		if (g->iommu_group == iommu_group)
+			return g;
+	}
+
+	return NULL;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 					 struct iommu_group *iommu_group)
 {
 	struct vfio_iommu *iommu = iommu_data;
-	struct vfio_group *group, *g;
+	struct vfio_group *group;
 	struct vfio_domain *domain, *d;
 	struct bus_type *bus = NULL;
 	int ret;
@@ -746,10 +1063,14 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	mutex_lock(&iommu->lock);
 
 	list_for_each_entry(d, &iommu->domain_list, next) {
-		list_for_each_entry(g, &d->group_list, next) {
-			if (g->iommu_group != iommu_group)
-				continue;
+		if (find_iommu_group(d, iommu_group)) {
+			mutex_unlock(&iommu->lock);
+			return -EINVAL;
+		}
+	}
 
+	if (iommu->mediated_domain) {
+		if (find_iommu_group(iommu->mediated_domain, iommu_group)) {
 			mutex_unlock(&iommu->lock);
 			return -EINVAL;
 		}
@@ -769,6 +1090,34 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	if (ret)
 		goto out_free;
 
+#if defined(CONFIG_VFIO_MDEV) || defined(CONFIG_VFIO_MDEV_MODULE)
+	if (!iommu_present(bus) && (bus == &mdev_bus_type)) {
+		if (iommu->mediated_domain) {
+			list_add(&group->next,
+				 &iommu->mediated_domain->group_list);
+			kfree(domain);
+			mutex_unlock(&iommu->lock);
+			return 0;
+		}
+
+		domain->mdev_addr_space = kzalloc(sizeof(*domain->mdev_addr_space),
+						  GFP_KERNEL);
+		if (!domain->mdev_addr_space) {
+			ret = -ENOMEM;
+			goto out_free;
+		}
+
+		domain->mdev_addr_space->task = current;
+		INIT_LIST_HEAD(&domain->group_list);
+		list_add(&group->next, &domain->group_list);
+		domain->mdev_addr_space->pfn_list = RB_ROOT;
+		mutex_init(&domain->mdev_addr_space->pfn_list_lock);
+		iommu->mediated_domain = domain;
+		mutex_unlock(&iommu->lock);
+		return 0;
+	}
+#endif
+
 	domain->domain = iommu_domain_alloc(bus);
 	if (!domain->domain) {
 		ret = -EIO;
@@ -859,6 +1208,18 @@ static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
 }
 
+static void vfio_mdev_unpin_all(struct vfio_domain *domain)
+{
+	struct rb_node *node;
+
+	mutex_lock(&domain->mdev_addr_space->pfn_list_lock);
+	while ((node = rb_first(&domain->mdev_addr_space->pfn_list))) {
+		vfio_unpin_pfn(domain,
+				rb_entry(node, struct vfio_pfn, node), false);
+	}
+	mutex_unlock(&domain->mdev_addr_space->pfn_list_lock);
+}
+
 static void vfio_iommu_type1_detach_group(void *iommu_data,
 					  struct iommu_group *iommu_group)
 {
@@ -868,31 +1229,52 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
 
 	mutex_lock(&iommu->lock);
 
-	list_for_each_entry(domain, &iommu->domain_list, next) {
-		list_for_each_entry(group, &domain->group_list, next) {
-			if (group->iommu_group != iommu_group)
-				continue;
+	if (iommu->mediated_domain) {
+		domain = iommu->mediated_domain;
+		group = find_iommu_group(domain, iommu_group);
+		if (group) {
+			list_del(&group->next);
+			kfree(group);
 
+			if (list_empty(&domain->group_list)) {
+				vfio_mdev_unpin_all(domain);
+				if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+					vfio_iommu_unmap_unpin_all(iommu);
+				kfree(domain);
+				iommu->mediated_domain = NULL;
+			}
+			goto detach_group_done;
+		}
+	}
+
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+		goto detach_group_done;
+
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		group = find_iommu_group(domain, iommu_group);
+		if (group) {
 			iommu_detach_group(domain->domain, iommu_group);
 			list_del(&group->next);
 			kfree(group);
 			/*
 			 * Group ownership provides privilege, if the group
 			 * list is empty, the domain goes away.  If it's the
-			 * last domain, then all the mappings go away too.
+			 * last domain with an iommu and no mediated domain
+			 * exists, then all the mappings go away too.
 			 */
 			if (list_empty(&domain->group_list)) {
-				if (list_is_singular(&iommu->domain_list))
+				if (list_is_singular(&iommu->domain_list) &&
+				    !iommu->mediated_domain)
 					vfio_iommu_unmap_unpin_all(iommu);
 				iommu_domain_free(domain->domain);
 				list_del(&domain->next);
 				kfree(domain);
 			}
-			goto done;
+			break;
 		}
 	}
 
-done:
+detach_group_done:
 	mutex_unlock(&iommu->lock);
 }
 
@@ -924,27 +1306,48 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	return iommu;
 }
 
+static void vfio_release_domain(struct vfio_domain *domain)
+{
+	struct vfio_group *group, *group_tmp;
+
+	list_for_each_entry_safe(group, group_tmp,
+				 &domain->group_list, next) {
+		if (!domain->mdev_addr_space)
+			iommu_detach_group(domain->domain, group->iommu_group);
+		list_del(&group->next);
+		kfree(group);
+	}
+
+	if (domain->mdev_addr_space)
+		vfio_mdev_unpin_all(domain);
+	else
+		iommu_domain_free(domain->domain);
+}
+
 static void vfio_iommu_type1_release(void *iommu_data)
 {
 	struct vfio_iommu *iommu = iommu_data;
 	struct vfio_domain *domain, *domain_tmp;
-	struct vfio_group *group, *group_tmp;
+
+	if (iommu->mediated_domain) {
+		vfio_release_domain(iommu->mediated_domain);
+		kfree(iommu->mediated_domain);
+		iommu->mediated_domain = NULL;
+	}
 
 	vfio_iommu_unmap_unpin_all(iommu);
 
+	if (!IS_IOMMU_CAPABLE_DOMAIN_IN_CONTAINER(iommu))
+		goto release_exit;
+
 	list_for_each_entry_safe(domain, domain_tmp,
 				 &iommu->domain_list, next) {
-		list_for_each_entry_safe(group, group_tmp,
-					 &domain->group_list, next) {
-			iommu_detach_group(domain->domain, group->iommu_group);
-			list_del(&group->next);
-			kfree(group);
-		}
-		iommu_domain_free(domain->domain);
+		vfio_release_domain(domain);
 		list_del(&domain->next);
 		kfree(domain);
 	}
 
+release_exit:
 	kfree(iommu);
 }
 
@@ -1048,6 +1451,8 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.ioctl		= vfio_iommu_type1_ioctl,
 	.attach_group	= vfio_iommu_type1_attach_group,
 	.detach_group	= vfio_iommu_type1_detach_group,
+	.pin_pages	= vfio_iommu_type1_pin_pages,
+	.unpin_pages	= vfio_iommu_type1_unpin_pages,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 0ecae0b..f2f0daf 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -17,6 +17,7 @@
 #include <linux/workqueue.h>
 #include <linux/poll.h>
 #include <uapi/linux/vfio.h>
+#include <linux/mdev.h>
 
 /**
  * struct vfio_device_ops - VFIO bus driver device callbacks
@@ -75,7 +76,11 @@ struct vfio_iommu_driver_ops {
 					struct iommu_group *group);
 	void		(*detach_group)(void *iommu_data,
 					struct iommu_group *group);
-
+	long		(*pin_pages)(void *iommu_data, unsigned long *user_pfn,
+				     long npage, int prot,
+				     unsigned long *phys_pfn);
+	long		(*unpin_pages)(void *iommu_data, unsigned long *pfn,
+				       long npage);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
@@ -127,6 +132,12 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
 }
 #endif /* CONFIG_EEH */
 
+extern long vfio_pin_pages(struct mdev_device *mdev, unsigned long *user_pfn,
+			   long npage, int prot, unsigned long *phys_pfn);
+
+extern long vfio_unpin_pages(struct mdev_device *mdev, unsigned long *pfn,
+			     long npage);
+
 /*
  * IRQfd - generic
  */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC v2 4/4] docs: Add Documentation for Mediated devices
  2016-09-02  8:16 ` [Qemu-devel] " Jike Song
@ 2016-09-02  8:16   ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	zhiyuan.lv, jike.song, pbonzini, kraxel

From: Kirti Wankhede <kwankhede@nvidia.com>

Add file Documentation/vfio-mediated-device.txt that include details of
mediated device framework.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 Documentation/vfio-mediated-device.txt | 203 +++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)
 create mode 100644 Documentation/vfio-mediated-device.txt

diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
new file mode 100644
index 0000000..39bdcd9
--- /dev/null
+++ b/Documentation/vfio-mediated-device.txt
@@ -0,0 +1,203 @@
+VFIO Mediated devices [1]
+-------------------------------------------------------------------------------
+
+There are more and more use cases/demands to virtualize DMA devices which
+don't have SR-IOV capability built in. To do this, drivers of different
+devices had to develop their own management interfaces and sets of APIs, and
+then integrate them into user space software. We have identified common
+requirements and a unified management interface for such devices to make user
+space software integration easier.
+
+The VFIO driver framework provides unified APIs for direct device access. It
+is an IOMMU/device agnostic framework for exposing direct device access to
+user space in a secure, IOMMU protected environment. This framework is used
+for multiple devices such as GPUs, network adapters and compute accelerators.
+With direct device access, virtual machines or user space applications have
+direct access to the physical device. This framework is reused for mediated
+devices.
+
+The mediated core driver provides a common interface for mediated device
+management that can be used by drivers of different devices. This module
+provides a generic interface to create/destroy a mediated device, add/remove
+it to/from the mediated bus driver, and add/remove the device to/from an
+IOMMU group. It also provides an interface to register different types of bus
+drivers; for example, the mediated VFIO PCI driver is designed for mediated
+PCI devices and supports the VFIO APIs. Similarly, a driver can be designed
+to support any type of mediated device and added to this framework. The
+mediated bus driver adds/deletes mediated devices to/from the VFIO group.
+
+Below is the high level block diagram, with NVIDIA, Intel and IBM devices
+as examples, since these are the devices that are going to actively use
+this module for now.
+
+
+     +---------------+
+     |               |
+     | +-----------+ |
+     | |           | |
+     | |           | |
+     | |  mdev     | |    mdev_register_driver()     +---------------+
+     | |  bus      | |<------------------------------+               |
+     | |  driver   | |                               |               |
+     | |           | +------------------------------>| vfio_mdev.ko  |<-> VFIO user
+     | |           | |     probe()/remove()          |               |    APIs
+     | |           | |                               +---------------+
+     | |           | |
+     | +-----------+ |
+     |               |
+     |  MDEV CORE    |
+     |   MODULE      |
+     |   mdev.ko     |
+     |               |
+     | +-----------+ |  mdev_register_host_device()  +---------------+
+     | |           | +<------------------------------+               |
+     | |           | |                               |  nvidia.ko    |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | |           | |
+     | | Physical  | |  mdev_register_host_device()  +---------------+
+     | | Device    | |<------------------------------+               |
+     | | Interface | |                               |  i915.ko      |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | |           | |
+     | |           | |  mdev_register_host_device()  +---------------+
+     | |           | +<------------------------------+               |
+     | |           | |                               | ccw_device.ko |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | +-----------+ |
+     +---------------+
+
+
+Registration Interfaces
+-------------------------------------------------------------------------------
+
+The mediated core driver provides two types of registration interfaces:
+
+1. Registration interface for mediated bus driver:
+-------------------------------------------------
+
+	/**
+	 * struct mdev_driver [2] - Mediated device driver
+	 * @name: driver name
+	 * @probe: called when new device created
+	 * @remove: called when device removed
+	 * @driver: device driver structure
+	 **/
+	struct mdev_driver {
+		const char *name;
+		int (*probe)(struct device *dev);
+		void (*remove)(struct device *dev);
+		int (*online)(struct device *dev);
+		int (*offline)(struct device *dev);
+		struct device_driver driver;
+	};
+
+
+
+A mediated bus driver for mdev should use these interfaces to register with
+and unregister from the core driver, respectively:
+
+extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+extern void mdev_unregister_driver(struct mdev_driver *drv);
+
+The mediated bus driver is responsible for adding mediated devices to, and
+deleting them from, the VFIO group when devices are bound to and unbound from
+the driver.
+
+2. Physical device driver interface:
+-----------------------------------
+This interface [3] provides a set of callbacks to manage physical device
+related work in the vendor driver. The callbacks are:
+
+* hdev_attr_groups: attribute groups of the mdev host device.
+* mdev_attr_groups: attribute groups of the mediated device.
+* supported_config: to provide the list of configurations supported by the driver.
+* create: to allocate basic resources in the driver for a mediated device.
+* destroy: to free resources in the driver when a mediated device is destroyed.
+* start: to start the mediated device.
+* stop: to stop the mediated device.
+* read: read emulation callback.
+* write: write emulation callback.
+* mmap: the mmap callback.
+* ioctl: the ioctl callback.
+
+Vendor drivers should use these interfaces to register a device with and
+unregister it from the mdev core driver:
+
+	struct mdev_host *mdev_register_host_device(struct device *pdev,
+					 const struct mdev_host_ops *ops);
+	void mdev_unregister_device(struct mdev_host *host);
+
+
+Mediated device management interface via sysfs
+-------------------------------------------------------------------------------
+This is the interface that allows user space software, such as libvirt, to
+query and configure mediated devices in a hardware-agnostic fashion. This
+management interface gives the underlying physical device's driver the
+flexibility to support mediated device hotplug, multiple mediated devices per
+virtual machine, multiple mediated devices from different physical devices,
+etc.
+
+For each physical device, an mdev host device is created, which appears in sysfs:
+/sys/bus/pci/devices/0000:05:00.0/mdev-host/
+---------------------------------
+
+* mdev_supported_types: (read only)
+    List the currently supported mediated device types and their details.
+
+* mdev_create: (write only)
+	Create a mediated device on the target physical device.
+	Input syntax: <UUID:params>
+	where,
+		UUID: mediated device's UUID
+		params: extra parameters required by driver
+	Example:
+	# echo "12345678-1234-1234-1234-123456789abc" >
+				 /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/mdev_create
+
+* mdev_destroy: (write only)
+	Destroy a mediated device on a target physical device.
+	Input syntax: <UUID>
+	where,
+		UUID: mediated device's UUID
+	Example:
+	# echo "12345678-1234-1234-1234-123456789abc" >
+			       /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/mdev_destroy
+
+Under mdev sysfs directory:
+/sys/bus/pci/devices/0000:05:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/
+----------------------------------
+
+* online: (read/write)
+	This shows and sets the online status of the mdev.
+	Input syntax: <0|1>
+	To bring the mdev online, echo "1" to the online file; echo "0" to
+	take it offline.
+	Example:
+	# echo 1 > /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/online
+	# echo 0 > /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/online
+
+
+Translation APIs for Mediated device
+------------------------------------------------------------------------------
+
+The following APIs are provided for user PFN to host PFN translation in the
+VFIO driver:
+
+extern long vfio_pin_pages(struct mdev_device *mdev, unsigned long *user_pfn,
+                           long npage, int prot, unsigned long *phys_pfn);
+
+extern long vfio_unpin_pages(struct mdev_device *mdev, unsigned long *pfn,
+                             long npage);
+
+These functions call back into the backend IOMMU module through two callbacks
+of struct vfio_iommu_driver_ops, pin_pages and unpin_pages [4]. Currently these
+are supported by the TYPE1 IOMMU module. To enable the same for other IOMMU
+backend modules, such as the PPC64 sPAPR module, those modules need to provide
+these two callbacks.
+
+References
+-------------------------------------------------------------------------------
+
+[1] See Documentation/vfio.txt for more information on VFIO.
+[2] struct mdev_driver in include/linux/mdev.h
+[3] struct mdev_host_ops in include/linux/mdev.h
+[4] struct vfio_iommu_driver_ops in include/linux/vfio.h
+
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [Qemu-devel] [RFC v2 4/4] docs: Add Documentation for Mediated devices
@ 2016-09-02  8:16   ` Jike Song
  0 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-02  8:16 UTC (permalink / raw)
  To: alex.williamson, kwankhede, cjia
  Cc: qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao, zhenyuw,
	zhiyuan.lv, jike.song, pbonzini, kraxel

From: Kirti Wankhede <kwankhede@nvidia.com>

Add file Documentation/vfio-mediated-device.txt that include details of
mediated device framework.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Neo Jia <cjia@nvidia.com>
Signed-off-by: Jike Song <jike.song@intel.com>
---
 Documentation/vfio-mediated-device.txt | 203 +++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)
 create mode 100644 Documentation/vfio-mediated-device.txt

diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
new file mode 100644
index 0000000..39bdcd9
--- /dev/null
+++ b/Documentation/vfio-mediated-device.txt
@@ -0,0 +1,203 @@
+VFIO Mediated devices [1]
+-------------------------------------------------------------------------------
+
+There are an increasing number of use cases and demands to virtualize DMA
+devices that don't have SR-IOV capability built in. To do this, drivers of
+different devices had to develop their own management interfaces and sets of
+APIs, and then integrate them into user space software. We've identified common
+requirements and a unified management interface for such devices to make user
+space software integration easier.
+
+The VFIO driver framework provides unified APIs for direct device access. It is
+an IOMMU/device-agnostic framework for exposing direct device access to user
+space in a secure, IOMMU-protected environment. This framework is used for
+multiple devices such as GPUs, network adapters and compute accelerators. With
+direct device access, virtual machines or user space applications have direct
+access to the physical device. This framework is reused for mediated devices.
+
+The mediated core driver provides a common interface for mediated device
+management that can be used by drivers of different devices. This module
+provides a generic interface to create and destroy mediated devices, add and
+remove them from a mediated bus driver, and add and remove them from an IOMMU
+group. It also provides an interface to register different types of bus
+drivers; for example, the mediated VFIO PCI driver is designed for mediated PCI
+devices and supports the VFIO APIs. Similarly, a driver can be designed to
+support any other type of mediated device and added to this framework. The
+mediated bus driver adds mediated devices to, and deletes them from, the VFIO
+group.
+
+Below is a high-level block diagram, with NVIDIA, Intel and IBM devices as
+examples, since these are the devices that are going to actively use this
+module initially.
+
+
+     +---------------+
+     |               |
+     | +-----------+ |
+     | |           | |
+     | |           | |
+     | |  mdev     | |    mdev_register_driver()     +---------------+
+     | |  bus      | |<------------------------------+               |
+     | |  driver   | |                               |               |
+     | |           | +------------------------------>| vfio_mdev.ko  |<-> VFIO user
+     | |           | |     probe()/remove()          |               |    APIs
+     | |           | |                               +---------------+
+     | |           | |
+     | +-----------+ |
+     |               |
+     |  MDEV CORE    |
+     |   MODULE      |
+     |   mdev.ko     |
+     |               |
+     | +-----------+ |  mdev_register_host_device()  +---------------+
+     | |           | +<------------------------------+               |
+     | |           | |                               |  nvidia.ko    |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | |           | |
+     | | Physical  | |  mdev_register_host_device()  +---------------+
+     | | Device    | |<------------------------------+               |
+     | | Interface | |                               |  i915.ko      |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | |           | |
+     | |           | |  mdev_register_host_device()  +---------------+
+     | |           | +<------------------------------+               |
+     | |           | |                               | ccw_device.ko |<-> physical
+     | |           | +------------------------------>+               |    device
+     | |           | |        callbacks              +---------------+
+     | +-----------+ |
+     +---------------+
+
+
+Registration Interfaces
+-------------------------------------------------------------------------------
+
+The mediated core driver provides two types of registration interfaces:
+
+1. Registration interface for mediated bus driver:
+-------------------------------------------------
+
+	/**
+	 * struct mdev_driver [2] - Mediated device driver
+	 * @name: driver name
+	 * @probe: called when a new device is created
+	 * @remove: called when the device is removed
+	 * @online: called when the device is brought online
+	 * @offline: called when the device is taken offline
+	 * @driver: device driver structure
+	 **/
+	struct mdev_driver {
+		const char *name;
+		int (*probe)(struct device *dev);
+		void (*remove)(struct device *dev);
+		int (*online)(struct device *dev);
+		int (*offline)(struct device *dev);
+		struct device_driver driver;
+	};
+
+
+
+A mediated bus driver should use these interfaces to register with and
+unregister from the core driver:
+
+extern int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
+extern void mdev_unregister_driver(struct mdev_driver *drv);
+
+The mediated bus driver is responsible for adding mediated devices to, and
+deleting them from, the VFIO group when devices are bound to and unbound from
+the driver.
+
+2. Physical device driver interface:
+-----------------------------------
+This interface [3] provides a set of callbacks to manage physical device
+related work in the vendor driver. The callbacks are:
+
+* hdev_attr_groups: attribute groups of the mdev host device.
+* mdev_attr_groups: attribute groups of the mediated device.
+* supported_config: to provide the list of configurations supported by the driver.
+* create: to allocate basic resources in the driver for a mediated device.
+* destroy: to free resources in the driver when a mediated device is destroyed.
+* start: to start the mediated device.
+* stop: to stop the mediated device.
+* read: read emulation callback.
+* write: write emulation callback.
+* mmap: the mmap callback.
+* ioctl: the ioctl callback.
+
+Vendor drivers should use these interfaces to register a device with and
+unregister it from the mdev core driver:
+
+	struct mdev_host *mdev_register_host_device(struct device *pdev,
+					 const struct mdev_host_ops *ops);
+	void mdev_unregister_device(struct mdev_host *host);
+
+
+Mediated device management interface via sysfs
+-------------------------------------------------------------------------------
+This is the interface that allows user space software, such as libvirt, to
+query and configure mediated devices in a hardware-agnostic fashion. This
+management interface gives the underlying physical device's driver the
+flexibility to support mediated device hotplug, multiple mediated devices per
+virtual machine, multiple mediated devices from different physical devices,
+etc.
+
+For each physical device, an mdev host device is created, which appears in sysfs:
+/sys/bus/pci/devices/0000:05:00.0/mdev-host/
+---------------------------------
+
+* mdev_supported_types: (read only)
+    List the currently supported mediated device types and their details.
+
+* mdev_create: (write only)
+	Create a mediated device on the target physical device.
+	Input syntax: <UUID:params>
+	where,
+		UUID: mediated device's UUID
+		params: extra parameters required by driver
+	Example:
+	# echo "12345678-1234-1234-1234-123456789abc" >
+				 /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/mdev_create
+
+* mdev_destroy: (write only)
+	Destroy a mediated device on a target physical device.
+	Input syntax: <UUID>
+	where,
+		UUID: mediated device's UUID
+	Example:
+	# echo "12345678-1234-1234-1234-123456789abc" >
+			       /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/mdev_destroy
+
+Under mdev sysfs directory:
+/sys/bus/pci/devices/0000:05:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/
+----------------------------------
+
+* online: (read/write)
+	This shows and sets the online status of the mdev.
+	Input syntax: <0|1>
+	To bring the mdev online, echo "1" to the online file; echo "0" to
+	take it offline.
+	Example:
+	# echo 1 > /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/online
+	# echo 0 > /sys/bus/pci/devices/0000\:05\:00.0/mdev-host/12345678-1234-1234-1234-123456789abc/online
+
+
+Translation APIs for Mediated device
+------------------------------------------------------------------------------
+
+The following APIs are provided for user PFN to host PFN translation in the
+VFIO driver:
+
+extern long vfio_pin_pages(struct mdev_device *mdev, unsigned long *user_pfn,
+                           long npage, int prot, unsigned long *phys_pfn);
+
+extern long vfio_unpin_pages(struct mdev_device *mdev, unsigned long *pfn,
+                             long npage);
+
+These functions call back into the backend IOMMU module through two callbacks
+of struct vfio_iommu_driver_ops, pin_pages and unpin_pages [4]. Currently these
+are supported by the TYPE1 IOMMU module. To enable the same for other IOMMU
+backend modules, such as the PPC64 sPAPR module, those modules need to provide
+these two callbacks.
+
+References
+-------------------------------------------------------------------------------
+
+[1] See Documentation/vfio.txt for more information on VFIO.
+[2] struct mdev_driver in include/linux/mdev.h
+[3] struct mdev_host_ops in include/linux/mdev.h
+[4] struct vfio_iommu_driver_ops in include/linux/vfio.h
+
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-02  8:16 ` [Qemu-devel] " Jike Song
@ 2016-09-02 15:03   ` Alex Williamson
  -1 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2016-09-02 15:03 UTC (permalink / raw)
  To: Jike Song
  Cc: kwankhede, cjia, qemu-devel, kvm, bjsdjshi, kevin.tian,
	guangrong.xiao, zhenyuw, zhiyuan.lv, pbonzini, kraxel

On Fri,  2 Sep 2016 16:16:08 +0800
Jike Song <jike.song@intel.com> wrote:

> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> 
> 	http://www.spinics.net/lists/kvm/msg136472.html


Hi Jike,

I'm thrilled by your active participation here, but I'm confused which
versions I should be reviewing and where the primary development is
going.  Kirti sent v7 a week ago, so I would have expected a revision
based on that rather than a re-write based on v6 plus incorporation of a
few of Kirti's patches directly.  I liked the last version of these
changes a lot, but we need to figure out how to combine development
because we do not have infinite cycles for review available :-\  Thanks!

Alex

> 
> 
> Key Changes from Nvidia v6:
> 
> 	- Introduced an independent struct device to host device, thereby
> 	  formed a physical-host-mdev hierarchy, and highly reused Linux
> 	  driver core support;
> 
> 	- Added online/offline to mdev_bus_type, leveraging the 'online'
> 	  attr support from Linux driver core;
> 
> 	- Removed mdev_class and other unecessary stuff;
> 
> 	/*
> 	 * Given the changes above, the code volume of mdev core driver
> 	 * dramatically reduced by ~50%.
> 	 */
> 
> 
> 	- Interfaces between vfio_mdev and vendor driver are high-level,
> 	  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
> 	  start/stop became mdev oriented, etc.;
> 
> 	/*
> 	 * Given the changes above, the code volume of mdev core driver
> 	 * dramatically reduced by ~64%.
> 	 */
> 
> 
> Test
> 
> 	- Tested with KVMGT
> 
> TODO
> 
> 	- Re-implement the attribute group of host device as long as the
> 	  sysfs hierarchy in discussion gets finalized;
> 
> 	- Move common routines from current vfio-pci into a higher location,
> 	  export them for various VFIO bus drivers and/or mdev vendor drivers;
> 
> 	- Add implementation examples for vendor drivers to Documentation;
> 
> 	- Refine IOMMU changes
> 
> 
> 
> Jike Song (2):
>   Mediated device Core driver
>   vfio: VFIO bus driver for MDEV devices
> 
> Kirti Wankhede (2):
>   vfio iommu: Add support for mediated devices
>   docs: Add Documentation for Mediated devices
> 
>  Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
>  drivers/vfio/Kconfig                   |   1 +
>  drivers/vfio/Makefile                  |   1 +
>  drivers/vfio/mdev/Kconfig              |  18 ++
>  drivers/vfio/mdev/Makefile             |   5 +
>  drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
>  drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
>  drivers/vfio/mdev/mdev_private.h       |  29 ++
>  drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
>  drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
>  drivers/vfio/vfio.c                    |  82 ++++++
>  drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
>  include/linux/mdev.h                   | 159 +++++++++++
>  include/linux/vfio.h                   |  13 +-
>  14 files changed, 1709 insertions(+), 48 deletions(-)
>  create mode 100644 Documentation/vfio-mediated-device.txt
>  create mode 100644 drivers/vfio/mdev/Kconfig
>  create mode 100644 drivers/vfio/mdev/Makefile
>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>  create mode 100644 include/linux/mdev.h
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-02 15:03   ` [Qemu-devel] " Alex Williamson
@ 2016-09-02 20:05     ` Neo Jia
  -1 siblings, 0 replies; 28+ messages in thread
From: Neo Jia @ 2016-09-02 20:05 UTC (permalink / raw)
  To: Alex Williamson, Jike Song
  Cc: kwankhede, qemu-devel, kvm, bjsdjshi, kevin.tian, guangrong.xiao,
	zhenyuw, zhiyuan.lv, pbonzini, kraxel

On Fri, Sep 02, 2016 at 09:03:52AM -0600, Alex Williamson wrote:
> On Fri,  2 Sep 2016 16:16:08 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
> > This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> > 
> > 	http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Hi Jike,
> 
> I'm thrilled by your active participation here, but I'm confused which
> versions I should be reviewing and where the primary development is
> going.  Kirti sent v7 a week ago, so I would have expected a revision
> based on that rather than a re-write based on v6 plus incorporation of a
> few of Kirti's patches directly.  I liked the last version of these
> changes a lot, but we need to figure out how to combine development
> because we do not have infinite cycles for review available :-\  Thanks!

Agree with Alex, and the primary development is on Kirti's v7 patches thread.

Jike, could you please join us in the existing code review thread?

I know you are already there with the sysfs discussion recently, but I would
like to see your comments on the rest stuff so we can know how to best
accommodate your requirements and needs in the future revisions.

I believe that would be the best and fastest way to collaborate and that is the 
main purpose of having code review cycles.

Thanks,
Neo

> 
> Alex
> 
> > 
> > 
> > Key Changes from Nvidia v6:
> > 
> > 	- Introduced an independent struct device to host device, thereby
> > 	  formed a physical-host-mdev hierarchy, and highly reused Linux
> > 	  driver core support;
> > 
> > 	- Added online/offline to mdev_bus_type, leveraging the 'online'
> > 	  attr support from Linux driver core;
> > 
> > 	- Removed mdev_class and other unecessary stuff;
> > 
> > 	/*
> > 	 * Given the changes above, the code volume of mdev core driver
> > 	 * dramatically reduced by ~50%.
> > 	 */
> > 
> > 
> > 	- Interfaces between vfio_mdev and vendor driver are high-level,
> > 	  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
> > 	  start/stop became mdev oriented, etc.;
> > 
> > 	/*
> > 	 * Given the changes above, the code volume of mdev core driver
> > 	 * dramatically reduced by ~64%.
> > 	 */
> > 
> > 
> > Test
> > 
> > 	- Tested with KVMGT
> > 
> > TODO
> > 
> > 	- Re-implement the attribute group of host device as long as the
> > 	  sysfs hierarchy in discussion gets finalized;
> > 
> > 	- Move common routines from current vfio-pci into a higher location,
> > 	  export them for various VFIO bus drivers and/or mdev vendor drivers;
> > 
> > 	- Add implementation examples for vendor drivers to Documentation;
> > 
> > 	- Refine IOMMU changes
> > 
> > 
> > 
> > Jike Song (2):
> >   Mediated device Core driver
> >   vfio: VFIO bus driver for MDEV devices
> > 
> > Kirti Wankhede (2):
> >   vfio iommu: Add support for mediated devices
> >   docs: Add Documentation for Mediated devices
> > 
> >  Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
> >  drivers/vfio/Kconfig                   |   1 +
> >  drivers/vfio/Makefile                  |   1 +
> >  drivers/vfio/mdev/Kconfig              |  18 ++
> >  drivers/vfio/mdev/Makefile             |   5 +
> >  drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
> >  drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
> >  drivers/vfio/mdev/mdev_private.h       |  29 ++
> >  drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
> >  drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
> >  drivers/vfio/vfio.c                    |  82 ++++++
> >  drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
> >  include/linux/mdev.h                   | 159 +++++++++++
> >  include/linux/vfio.h                   |  13 +-
> >  14 files changed, 1709 insertions(+), 48 deletions(-)
> >  create mode 100644 Documentation/vfio-mediated-device.txt
> >  create mode 100644 drivers/vfio/mdev/Kconfig
> >  create mode 100644 drivers/vfio/mdev/Makefile
> >  create mode 100644 drivers/vfio/mdev/mdev_core.c
> >  create mode 100644 drivers/vfio/mdev/mdev_driver.c
> >  create mode 100644 drivers/vfio/mdev/mdev_private.h
> >  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
> >  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
> >  create mode 100644 include/linux/mdev.h
> > 
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v2 4/4] docs: Add Documentation for Mediated devices
  2016-09-02  8:16   ` [Qemu-devel] " Jike Song
@ 2016-09-02 22:09     ` Eric Blake
  -1 siblings, 0 replies; 28+ messages in thread
From: Eric Blake @ 2016-09-02 22:09 UTC (permalink / raw)
  To: Jike Song, alex.williamson, kwankhede, cjia
  Cc: kevin.tian, guangrong.xiao, kvm, qemu-devel, zhenyuw, zhiyuan.lv,
	pbonzini, bjsdjshi, kraxel


[-- Attachment #1.1: Type: text/plain, Size: 4822 bytes --]

On 09/02/2016 03:16 AM, Jike Song wrote:
> From: Kirti Wankhede <kwankhede@nvidia.com>
> 
> Add file Documentation/vfio-mediated-device.txt that include details of
> mediated device framework.
> 
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> Signed-off-by: Neo Jia <cjia@nvidia.com>
> Signed-off-by: Jike Song <jike.song@intel.com>
> ---
>  Documentation/vfio-mediated-device.txt | 203 +++++++++++++++++++++++++++++++++
>  1 file changed, 203 insertions(+)
>  create mode 100644 Documentation/vfio-mediated-device.txt
> 
> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> new file mode 100644
> index 0000000..39bdcd9
> --- /dev/null
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -0,0 +1,203 @@
> +VFIO Mediated devices [1]
> +-------------------------------------------------------------------------------

Many files under Documentation trim the ---- decorator to the length of
the line above.

Also, since you have no explicit copyright/license notice, your
documentation is under GPLv2+ per the top level.  Other files do this,
and if you are okay with it, I won't complain; but if you intended
something else, or even if you wanted to make it explicit rather than
implicit, then you may want to copy the example of files that call out a
quick blurb on copyright and licensing.

> +
> +There are more and more use cases/demands to virtualize the DMA devices which
> +doesn't have SR_IOV capability built-in. To do this, drivers of different

s/doesn't/don't/

> +devices had to develop their own management interface and set of APIs and then
> +integrate it to user space software. We've identified common requirements and
> +unified management interface for such devices to make user space software
> +integration easier.
> +
> +The VFIO driver framework provides unified APIs for direct device access. It is
> +an IOMMU/device agnostic framework for exposing direct device access to
> +user space, in a secure, IOMMU protected environment. This framework is
> +used for multiple devices like GPUs, network adapters and compute accelerators.
> +With direct device access, virtual machines or user space applications have
> +direct access of physical device. This framework is reused for mediated devices.
> +
> +Mediated core driver provides a common interface for mediated device management
> +that can be used by drivers of different devices. This module provides a generic
> +interface to create/destroy mediated device, add/remove it to mediated bus

s/mediated/a mediated/ twice

> +driver, add/remove device to IOMMU group. It also provides an interface to

s/add/and add/
s/device to/a device to an/

> +register different types of bus drivers, for example, Mediated VFIO PCI driver
> +is designed for mediated PCI devices and supports VFIO APIs. Similarly, driver

s/driver/the driver/

> +can be designed to support any type of mediated device and added to this
> +framework. Mediated bus driver add/delete mediated device to VFIO Group.

Missing a verb and several articles, but I'm not sure what you meant.
Maybe:

A mediated bus driver can add/delete mediated devices to a VFIO Group.

> +
> +Below is the high level block diagram, with NVIDIA, Intel and IBM devices
> +as examples, since these are the devices which are going to actively use
> +this module as of now.
> +

> +
> +
> +Registration Interfaces
> +-------------------------------------------------------------------------------
> +

Again, rather long separator,

> +Mediated core driver provides two types of registration interfaces:
> +
> +1. Registration interface for mediated bus driver:
> +-------------------------------------------------

while this seems one byte short.  I'll quit pointing it out.

> +
> +	/**
> +	 * struct mdev_driver [2] - Mediated device driver
> +	 * @name: driver name
> +	 * @probe: called when new device created
> +	 * @remove: called when device removed
> +	 * @driver: device driver structure

No mention of online, offline.

> +	 **/
> +	struct mdev_driver {
> +		const char *name;
> +		int (*probe)(struct device *dev);
> +		void (*remove)(struct device *dev);
> +		int (*online)(struct device *dev);
> +		int (*offline)(struct device *dev);
> +		struct device_driver driver;
> +	};
> +
> +
...
> +
> +For echo physical device, there is a mdev host device created, it shows in sysfs:
> +/sys/bus/pci/devices/0000:05:00.0/mdev-host/

Did you mean 'For echoing a' (as in duplicating the device), or maybe
'For echoing to a' (as in duplicating data sent to the device)?  Or is
this a typo s/echo/each/?


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v2 4/4] docs: Add Documentation for Mediated devices
  2016-09-02 22:09     ` Eric Blake
@ 2016-09-02 23:30       ` Neo Jia
  -1 siblings, 0 replies; 28+ messages in thread
From: Neo Jia @ 2016-09-02 23:30 UTC (permalink / raw)
  To: Eric Blake
  Cc: Jike Song, alex.williamson, kwankhede, kevin.tian,
	guangrong.xiao, kvm, qemu-devel, zhenyuw, zhiyuan.lv, pbonzini,
	bjsdjshi, kraxel

On Fri, Sep 02, 2016 at 05:09:46PM -0500, Eric Blake wrote:
> * PGP Signed by an unknown key
> 
> On 09/02/2016 03:16 AM, Jike Song wrote:
> > From: Kirti Wankhede <kwankhede@nvidia.com>
> > 
> > Add file Documentation/vfio-mediated-device.txt that include details of
> > mediated device framework.
> > 
> > Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> > Signed-off-by: Neo Jia <cjia@nvidia.com>
> > Signed-off-by: Jike Song <jike.song@intel.com>
> > ---
> >  Documentation/vfio-mediated-device.txt | 203 +++++++++++++++++++++++++++++++++
> >  1 file changed, 203 insertions(+)
> >  create mode 100644 Documentation/vfio-mediated-device.txt
> > 
> > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > new file mode 100644
> > index 0000000..39bdcd9
> > --- /dev/null
> > +++ b/Documentation/vfio-mediated-device.txt
> > @@ -0,0 +1,203 @@
> > +VFIO Mediated devices [1]
> > +-------------------------------------------------------------------------------
> 
> Many files under Documentation trim the ---- decorator to the length of
> the line above.
> 
> Also, since you have no explicit copyright/license notice, your
> documentation is under GPLv2+ per the top level.  Other files do this,
> and if you are okay with it, I won't complain; but if you intended
> something else, or even if you wanted to make it explicit rather than
> implicit, then you may want to copy the example of files that call out a
> quick blurb on copyright and licensing.
> 

Hi Eric,

Thanks for the review, and really sorry about the extra email threads for
this review; we already have one that has been actively going on for a
while, from the RFC through the current v7.

http://www.spinics.net/lists/kvm/msg137208.html

The related latest v7 document is at:

http://www.spinics.net/lists/kvm/msg137210.html

We will address all your review comments there.

Thanks,
Neo


> > +
> > +There are more and more use cases/demands to virtualize the DMA devices which
> > +doesn't have SR_IOV capability built-in. To do this, drivers of different
> 
> s/doesn't/don't/
> 
> > +devices had to develop their own management interface and set of APIs and then
> > +integrate it to user space software. We've identified common requirements and
> > +unified management interface for such devices to make user space software
> > +integration easier.
> > +
> > +The VFIO driver framework provides unified APIs for direct device access. It is
> > +an IOMMU/device agnostic framework for exposing direct device access to
> > +user space, in a secure, IOMMU protected environment. This framework is
> > +used for multiple devices like GPUs, network adapters and compute accelerators.
> > +With direct device access, virtual machines or user space applications have
> > +direct access of physical device. This framework is reused for mediated devices.
> > +
> > +Mediated core driver provides a common interface for mediated device management
> > +that can be used by drivers of different devices. This module provides a generic
> > +interface to create/destroy mediated device, add/remove it to mediated bus
> 
> s/mediated/a mediated/ twice
> 
> > +driver, add/remove device to IOMMU group. It also provides an interface to
> 
> s/add/and add/
> s/device to/a device to an/
> 
> > +register different types of bus drivers, for example, Mediated VFIO PCI driver
> > +is designed for mediated PCI devices and supports VFIO APIs. Similarly, driver
> 
> s/driver/the driver/
> 
> > +can be designed to support any type of mediated device and added to this
> > +framework. Mediated bus driver add/delete mediated device to VFIO Group.
> 
> Missing a verb and several articles, but I'm not sure what you meant.
> Maybe:
> 
> A mediated bus driver can add/delete mediated devices to a VFIO Group.
> 
> > +
> > +Below is the high level block diagram, with NVIDIA, Intel and IBM devices
> > +as examples, since these are the devices which are going to actively use
> > +this module as of now.
> > +
> 
> > +
> > +
> > +Registration Interfaces
> > +-------------------------------------------------------------------------------
> > +
> 
> Again, rather long separator,
> 
> > +Mediated core driver provides two types of registration interfaces:
> > +
> > +1. Registration interface for mediated bus driver:
> > +-------------------------------------------------
> 
> while this seems one byte short.  I'll quit pointing it out.
> 
> > +
> > +	/**
> > +	 * struct mdev_driver [2] - Mediated device driver
> > +	 * @name: driver name
> > +	 * @probe: called when new device created
> > +	 * @remove: called when device removed
> > +	 * @driver: device driver structure
> 
> No mention of online, offline.
> 
> > +	 **/
> > +	struct mdev_driver {
> > +		const char *name;
> > +		int (*probe)(struct device *dev);
> > +		void (*remove)(struct device *dev);
> > +		int (*online)(struct device *dev);
> > +		int (*offline)(struct device *dev);
> > +		struct device_driver driver;
> > +	};
> > +
> > +
> ...
> > +
> > +For echo physical device, there is a mdev host device created, it shows in sysfs:
> > +/sys/bus/pci/devices/0000:05:00.0/mdev-host/
> 
> Did you mean 'For echoing a' (as in duplicating the device), or maybe
> 'For echoing to a' (as in duplicating data sent to the device)?  Or is
> this a typo s/echo/each/?
> 
> 
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 
> 
> * Unknown Key
> * 0x2527436A

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-02 15:03   ` [Qemu-devel] " Alex Williamson
@ 2016-09-07  2:22     ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-07  2:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kevin.tian, cjia, kvm, qemu-devel, zhenyuw, kwankhede,
	zhiyuan.lv, pbonzini, guangrong.xiao, bjsdjshi, kraxel

On 09/02/2016 11:03 PM, Alex Williamson wrote:
> On Fri,  2 Sep 2016 16:16:08 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>
>> 	http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Hi Jike,
> 
> I'm thrilled by your active participation here, but I'm confused which
> versions I should be reviewing and where the primary development is
> going.  Kirti sent v7 a week ago, so I would have expected a revision
> based on that rather than a re-write based on v6 plus incorporation of a
> few of Kirti's patches directly.

Hi Alex,

[Sorry! I replied to this on Monday but it was silently dropped by our firewall]



The v1 of this patchset was sent as incremental patches, based on Nvidia's v6,
to demonstrate how it is possible and beneficial to:

	1, Introduce an independent device between physical and mdev;
	2, Simplify vfio-mdev and make it the most flexible for vendor drivers;

Unfortunately, neither was understood nor adopted in v7:

	http://www.spinics.net/lists/kvm/msg137081.html

So here comes v2, as a standalone series, to give a complete and
straightforward demonstration. The reasons for still basing it on v6:

	- Addressed all v6 comments (except the iommu part);
	- There are no comments yet for v7 (except the sysfs ones);



> I liked the last version of these
> changes a lot, but we need to figure out how to combine development
> because we do not have infinite cycles for review available :-\  Thanks!

Fully understand.

Here is the dilemma: v6 is an obsolete version to work upon, while v7 is
still not in the direction we prefer. We would be glad and thankful if
Neo/Kirti would adopt this code in their next version, which would
certainly form a simpler and more consolidated base for future
co-development; otherwise, we could at least discuss any concerns.


--
Thanks,
Jike

>>
>>
>> Key Changes from Nvidia v6:
>>
>> 	- Introduced an independent struct device to host device, thereby
>> 	  formed a physical-host-mdev hierarchy, and highly reused Linux
>> 	  driver core support;
>>
>> 	- Added online/offline to mdev_bus_type, leveraging the 'online'
>> 	  attr support from Linux driver core;
>>
>> 	- Removed mdev_class and other unnecessary stuff;
>>
>> 	/*
>> 	 * Given the changes above, the code volume of mdev core driver
>> 	 * dramatically reduced by ~50%.
>> 	 */
>>
>>
>> 	- Interfaces between vfio_mdev and vendor driver are high-level,
>> 	  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
>> 	  start/stop became mdev oriented, etc.;
>>
>> 	/*
>> 	 * Given the changes above, the code volume of mdev core driver
>> 	 * dramatically reduced by ~64%.
>> 	 */
>>
>>
>> Test
>>
>> 	- Tested with KVMGT
>>
>> TODO
>>
>> 	- Re-implement the attribute group of host device as long as the
>> 	  sysfs hierarchy in discussion gets finalized;
>>
>> 	- Move common routines from current vfio-pci into a higher location,
>> 	  export them for various VFIO bus drivers and/or mdev vendor drivers;
>>
>> 	- Add implementation examples for vendor drivers to Documentation;
>>
>> 	- Refine IOMMU changes
>>
>>
>>
>> Jike Song (2):
>>   Mediated device Core driver
>>   vfio: VFIO bus driver for MDEV devices
>>
>> Kirti Wankhede (2):
>>   vfio iommu: Add support for mediated devices
>>   docs: Add Documentation for Mediated devices
>>
>>  Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
>>  drivers/vfio/Kconfig                   |   1 +
>>  drivers/vfio/Makefile                  |   1 +
>>  drivers/vfio/mdev/Kconfig              |  18 ++
>>  drivers/vfio/mdev/Makefile             |   5 +
>>  drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
>>  drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
>>  drivers/vfio/mdev/mdev_private.h       |  29 ++
>>  drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
>>  drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
>>  drivers/vfio/vfio.c                    |  82 ++++++
>>  drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
>>  include/linux/mdev.h                   | 159 +++++++++++
>>  include/linux/vfio.h                   |  13 +-
>>  14 files changed, 1709 insertions(+), 48 deletions(-)
>>  create mode 100644 Documentation/vfio-mediated-device.txt
>>  create mode 100644 drivers/vfio/mdev/Kconfig
>>  create mode 100644 drivers/vfio/mdev/Makefile
>>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>>  create mode 100644 include/linux/mdev.h
>>
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Qemu-devel] [RFC v2 0/4] adding mdev bus and vfio support
@ 2016-09-07  2:22     ` Jike Song
  0 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-07  2:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: kwankhede, cjia, qemu-devel, kvm, bjsdjshi, kevin.tian,
	guangrong.xiao, zhenyuw, zhiyuan.lv, pbonzini, kraxel

On 09/02/2016 11:03 PM, Alex Williamson wrote:
> On Fri,  2 Sep 2016 16:16:08 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>
>> 	http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Hi Jike,
> 
> I'm thrilled by your active participation here, but I'm confused which
> versions I should be reviewing and where the primary development is
> going.  Kirti sent v7 a week ago, so I would have expected a revision
> based on that rather than a re-write based on v6 plus incorporation of a
> few of Kirti's patches directly.

Hi Alex,

[Sorry! replied this on Monday but it was silently dropped by the our firewall]



The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, to
demonstrate how is it possible and beneficial to:

	1, Introduce an independent device between physical and mdev;
	2, Simplify vfio-mdev and make it the most flexible for vendor drivers;

Unfortunately neither was understood or adopted in v7:

	http://www.spinics.net/lists/kvm/msg137081.html

So here came the v2, as a standalone series, to give a whole and straight
demonstration. The reason of still basing on v6:

	- Addressed all v6 comments (except the iommu part);
	- There is no comments yet for v7 (except the sysfs ones);



> I liked the last version of these
> changes a lot, but we need to figure out how to combine development
> because we do not have infinite cycles for review available :-\  Thanks!

Fully understand.

Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
at the direction we prefer. We would be highly glad and thankful if Neo/Kirti
would adopt the code in their next version, which will certainly form a
more simple and consolidated base for future co-development; otherwise
and we could at least discuss the concerns, in case of any.


--
Thanks,
Jike

>>
>>
>> Key Changes from Nvidia v6:
>>
>> 	- Introduced an independent struct device to host device, thereby
>> 	  formed a physical-host-mdev hierarchy, and highly reused Linux
>> 	  driver core support;
>>
>> 	- Added online/offline to mdev_bus_type, leveraging the 'online'
>> 	  attr support from Linux driver core;
>>
>> 	- Removed mdev_class and other unecessary stuff;
>>
>> 	/*
>> 	 * Given the changes above, the code volume of mdev core driver
>> 	 * dramatically reduced by ~50%.
>> 	 */
>>
>>
>> 	- Interfaces between vfio_mdev and vendor driver are high-level,
>> 	  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
>> 	  start/stop became mdev oriented, etc.;
>>
>> 	/*
>> 	 * Given the changes above, the code volume of mdev core driver
>> 	 * dramatically reduced by ~64%.
>> 	 */
>>
>>
>> Test
>>
>> 	- Tested with KVMGT
>>
>> TODO
>>
>> 	- Re-implement the attribute group of host device as long as the
>> 	  sysfs hierarchy in discussion gets finalized;
>>
>> 	- Move common routines from current vfio-pci into a higher location,
>> 	  export them for various VFIO bus drivers and/or mdev vendor drivers;
>>
>> 	- Add implementation examples for vendor drivers to Documentation;
>>
>> 	- Refine IOMMU changes
>>
>>
>>
>> Jike Song (2):
>>   Mediated device Core driver
>>   vfio: VFIO bus driver for MDEV devices
>>
>> Kirti Wankhede (2):
>>   vfio iommu: Add support for mediated devices
>>   docs: Add Documentation for Mediated devices
>>
>>  Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
>>  drivers/vfio/Kconfig                   |   1 +
>>  drivers/vfio/Makefile                  |   1 +
>>  drivers/vfio/mdev/Kconfig              |  18 ++
>>  drivers/vfio/mdev/Makefile             |   5 +
>>  drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
>>  drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
>>  drivers/vfio/mdev/mdev_private.h       |  29 ++
>>  drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
>>  drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
>>  drivers/vfio/vfio.c                    |  82 ++++++
>>  drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
>>  include/linux/mdev.h                   | 159 +++++++++++
>>  include/linux/vfio.h                   |  13 +-
>>  14 files changed, 1709 insertions(+), 48 deletions(-)
>>  create mode 100644 Documentation/vfio-mediated-device.txt
>>  create mode 100644 drivers/vfio/mdev/Kconfig
>>  create mode 100644 drivers/vfio/mdev/Makefile
>>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>>  create mode 100644 include/linux/mdev.h
>>
> 
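
The high-level interface described in the changelog above (a single ioctl-style entry point between vfio_mdev and the vendor driver, rather than fine-grained get_irq_info/set_irq_info hooks) can be sketched as a small userspace model. Every name below is hypothetical and chosen for illustration; it is not the actual kernel API from the series:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative userspace sketch (NOT the actual kernel API): a high-level
 * vendor-ops table in which the vfio_mdev shim forwards whole ioctls to the
 * vendor driver, instead of exposing fine-grained hooks such as
 * get_irq_info/set_irq_info.  All names here are hypothetical.
 */

struct mdev_device;                     /* opaque handle, hypothetical */

struct mdev_vendor_ops {
    int  (*create)(struct mdev_device *mdev);
    void (*destroy)(struct mdev_device *mdev);
    /* single high-level entry point: ioctls pass straight through */
    long (*ioctl)(struct mdev_device *mdev, unsigned int cmd,
                  unsigned long arg);
};

/* toy vendor driver: handles exactly one fake command */
static long toy_ioctl(struct mdev_device *mdev, unsigned int cmd,
                      unsigned long arg)
{
    (void)mdev; (void)arg;
    return cmd == 0x3B65u ? 0 : -25;    /* -ENOTTY for unknown commands */
}

static const struct mdev_vendor_ops toy_ops = { .ioctl = toy_ioctl };

/* roughly what the vfio_mdev shim would do: delegate without interpreting */
static long vfio_mdev_ioctl(const struct mdev_vendor_ops *ops,
                            struct mdev_device *mdev,
                            unsigned int cmd, unsigned long arg)
{
    return ops->ioctl ? ops->ioctl(mdev, cmd, arg) : -25;
}
```

The point of the design is visible in the shim: it needs no knowledge of individual commands, so the vendor driver keeps maximum flexibility.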

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-07  2:22     ` [Qemu-devel] " Jike Song
@ 2016-09-07  3:38       ` Neo Jia
  -1 siblings, 0 replies; 28+ messages in thread
From: Neo Jia @ 2016-09-07  3:38 UTC (permalink / raw)
  To: Jike Song
  Cc: kevin.tian, guangrong.xiao, kvm, kwankhede, zhenyuw, qemu-devel,
	Alex Williamson, zhiyuan.lv, pbonzini, bjsdjshi, kraxel

On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:
> On 09/02/2016 11:03 PM, Alex Williamson wrote:
> > On Fri,  2 Sep 2016 16:16:08 +0800
> > Jike Song <jike.song@intel.com> wrote:
> > 
> >> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> >>
> >> 	http://www.spinics.net/lists/kvm/msg136472.html
> > 
> > 
> > Hi Jike,
> > 
> > I'm thrilled by your active participation here, but I'm confused which
> > versions I should be reviewing and where the primary development is
> > going.  Kirti sent v7 a week ago, so I would have expected a revision
> > based on that rather than a re-write based on v6 plus incorporation of a
> > few of Kirti's patches directly.
> 
> Hi Alex,
> 
> [Sorry! replied this on Monday but it was silently dropped by the our firewall]
> 
> 
> 
> The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, to
> demonstrate how is it possible and beneficial to:
> 
> 	1, Introduce an independent device between physical and mdev;
> 	2, Simplify vfio-mdev and make it the most flexible for vendor drivers;
> 
> Unfortunately neither was understood or adopted in v7:
> 
> 	http://www.spinics.net/lists/kvm/msg137081.html
> 
> So here came the v2, as a standalone series, to give a whole and straight
> demonstration. The reason of still basing on v6:
> 
> 	- Addressed all v6 comments (except the iommu part);
> 	- There is no comments yet for v7 (except the sysfs ones);
> 
> 
> 
> > I liked the last version of these
> > changes a lot, but we need to figure out how to combine development
> > because we do not have infinite cycles for review available :-\  Thanks!
> 
> Fully understand.
> 
> Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
> at the direction we prefer. 

Hi Jike,

I wish I could have met you in person at the KVM Forum a couple of weeks ago so
we could have had a better discussion.

We are trying our best to accommodate almost all requirements and comments from
use cases and code reviews, while making few (or no) architectural changes
between revisions.

> We would be highly glad and thankful if Neo/Kirti
> would adopt the code in their next version, which will certainly form a
> more simple and consolidated base for future co-development; otherwise
> and we could at least discuss the concerns, in case of any.
> 

As I have said in my previous response to you, if you have any questions about
adopting the framework that we have developed, you are very welcome to
comment/speak out on the code review thread like others. And if it is a
reasonable request and won't break other vendors' use cases, we will adopt it
(examples are the online file and removing the mdev PCI dependency).
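
The "online" file mentioned here follows the driver-core pattern of toggling a device by writing 1 or 0 to a sysfs attribute. A rough userspace model of that dispatch (all names hypothetical; the real plumbing lives in the kernel's driver core, not in this form):

```c
#include <assert.h>
#include <string.h>

/*
 * Rough userspace model of a driver-core style "online" sysfs attribute:
 * writing "1" or "0" to the file invokes the bus's online/offline callbacks.
 * Hypothetical sketch only, for illustration.
 */

struct toy_dev {
    int online;
};

static int toy_online(struct toy_dev *d)  { d->online = 1; return 0; }
static int toy_offline(struct toy_dev *d) { d->online = 0; return 0; }

/* roughly what an online_store() handler does with the written string */
static int online_store(struct toy_dev *d, const char *buf)
{
    if (!strcmp(buf, "1") || !strcmp(buf, "1\n"))
        return toy_online(d);
    if (!strcmp(buf, "0") || !strcmp(buf, "0\n"))
        return toy_offline(d);
    return -22;                         /* -EINVAL in the kernel */
}
```

Reusing this existing attribute pattern, rather than inventing a new start/stop mechanism, is what lets the mdev bus lean on the driver core.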

Just an update for you regarding the v7 patches: currently we are actively
trying to lock down the sysfs and management-interface discussion.

So, if you would like to make upstreaming happen sooner, please join us in the
discussion of v7 and the following patches instead of rewriting them.

Thanks,
Neo

> 
> --
> Thanks,
> Jike
> 
> >>
> >>
> >> Key Changes from Nvidia v6:
> >>
> >> 	- Introduced an independent struct device to host device, thereby
> >> 	  formed a physical-host-mdev hierarchy, and highly reused Linux
> >> 	  driver core support;
> >>
> >> 	- Added online/offline to mdev_bus_type, leveraging the 'online'
> >> 	  attr support from Linux driver core;
> >>
> >> 	- Removed mdev_class and other unecessary stuff;
> >>
> >> 	/*
> >> 	 * Given the changes above, the code volume of mdev core driver
> >> 	 * dramatically reduced by ~50%.
> >> 	 */
> >>
> >>
> >> 	- Interfaces between vfio_mdev and vendor driver are high-level,
> >> 	  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
> >> 	  start/stop became mdev oriented, etc.;
> >>
> >> 	/*
> >> 	 * Given the changes above, the code volume of mdev core driver
> >> 	 * dramatically reduced by ~64%.
> >> 	 */
> >>
> >>
> >> Test
> >>
> >> 	- Tested with KVMGT
> >>
> >> TODO
> >>
> >> 	- Re-implement the attribute group of host device as long as the
> >> 	  sysfs hierarchy in discussion gets finalized;
> >>
> >> 	- Move common routines from current vfio-pci into a higher location,
> >> 	  export them for various VFIO bus drivers and/or mdev vendor drivers;
> >>
> >> 	- Add implementation examples for vendor drivers to Documentation;
> >>
> >> 	- Refine IOMMU changes
> >>
> >>
> >>
> >> Jike Song (2):
> >>   Mediated device Core driver
> >>   vfio: VFIO bus driver for MDEV devices
> >>
> >> Kirti Wankhede (2):
> >>   vfio iommu: Add support for mediated devices
> >>   docs: Add Documentation for Mediated devices
> >>
> >>  Documentation/vfio-mediated-device.txt | 203 ++++++++++++++
> >>  drivers/vfio/Kconfig                   |   1 +
> >>  drivers/vfio/Makefile                  |   1 +
> >>  drivers/vfio/mdev/Kconfig              |  18 ++
> >>  drivers/vfio/mdev/Makefile             |   5 +
> >>  drivers/vfio/mdev/mdev_core.c          | 250 +++++++++++++++++
> >>  drivers/vfio/mdev/mdev_driver.c        | 155 ++++++++++
> >>  drivers/vfio/mdev/mdev_private.h       |  29 ++
> >>  drivers/vfio/mdev/mdev_sysfs.c         | 155 ++++++++++
> >>  drivers/vfio/mdev/vfio_mdev.c          | 187 ++++++++++++
> >>  drivers/vfio/vfio.c                    |  82 ++++++
> >>  drivers/vfio/vfio_iommu_type1.c        | 499 +++++++++++++++++++++++++++++----
> >>  include/linux/mdev.h                   | 159 +++++++++++
> >>  include/linux/vfio.h                   |  13 +-
> >>  14 files changed, 1709 insertions(+), 48 deletions(-)
> >>  create mode 100644 Documentation/vfio-mediated-device.txt
> >>  create mode 100644 drivers/vfio/mdev/Kconfig
> >>  create mode 100644 drivers/vfio/mdev/Makefile
> >>  create mode 100644 drivers/vfio/mdev/mdev_core.c
> >>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
> >>  create mode 100644 drivers/vfio/mdev/mdev_private.h
> >>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
> >>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
> >>  create mode 100644 include/linux/mdev.h
> >>
> > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-07  3:38       ` [Qemu-devel] " Neo Jia
@ 2016-09-07  6:42         ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-07  6:42 UTC (permalink / raw)
  To: Neo Jia
  Cc: Alex Williamson, kwankhede, qemu-devel, kvm, bjsdjshi,
	kevin.tian, guangrong.xiao, zhenyuw, zhiyuan.lv, pbonzini,
	kraxel

On 09/07/2016 11:38 AM, Neo Jia wrote:
> On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:
>> On 09/02/2016 11:03 PM, Alex Williamson wrote:
>>> On Fri,  2 Sep 2016 16:16:08 +0800
>>> Jike Song <jike.song@intel.com> wrote:
>>>
>>>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>>>
>>>> 	http://www.spinics.net/lists/kvm/msg136472.html
>>>
>>>
>>> Hi Jike,
>>>
>>> I'm thrilled by your active participation here, but I'm confused which
>>> versions I should be reviewing and where the primary development is
>>> going.  Kirti sent v7 a week ago, so I would have expected a revision
>>> based on that rather than a re-write based on v6 plus incorporation of a
>>> few of Kirti's patches directly.
>>
>> Hi Alex,
>>
>> [Sorry! replied this on Monday but it was silently dropped by the our firewall]
>>
>>
>>
>> The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, to
>> demonstrate how is it possible and beneficial to:
>>
>> 	1, Introduce an independent device between physical and mdev;
>> 	2, Simplify vfio-mdev and make it the most flexible for vendor drivers;
>>
>> Unfortunately neither was understood or adopted in v7:
>>
>> 	http://www.spinics.net/lists/kvm/msg137081.html
>>
>> So here came the v2, as a standalone series, to give a whole and straight
>> demonstration. The reason of still basing on v6:
>>
>> 	- Addressed all v6 comments (except the iommu part);
>> 	- There is no comments yet for v7 (except the sysfs ones);
>>
>>
>>
>>> I liked the last version of these
>>> changes a lot, but we need to figure out how to combine development
>>> because we do not have infinite cycles for review available :-\  Thanks!
>>
>> Fully understand.
>>
>> Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
>> at the direction we prefer. 
> 
> Hi Jike,
> 
> I wish I could meet you in person in KVM forum couple weeks ago so we can have a
> better discussion.

I wish I could have that opportunity, too!

> We are trying our best to accommodate almost all requirements / comments from 
> use cases and code reviews while keeping little (or none) architectural changes
> between revisions.

Yes, I saw that there was little architectural change from v6 to v7;
that's what I will argue for :)

>> We would be highly glad and thankful if Neo/Kirti
>> would adopt the code in their next version, which will certainly form a
>> more simple and consolidated base for future co-development; otherwise
>> and we could at least discuss the concerns, in case of any.
>>
> 
> As I have said in my previous response to you, if you have any questions about
> adopting the framework that we have developed, you are very welcome to
> comment/speak out on the code review thread like others. And if it is reasonable
> request and won't break other vendors' use case, we will adopt it (one example
> is the online file and removing the mdev pci dependency).
> 

Not limited to having questions about adoption, right? :)

We do think the framework itself has too much unnecessary logic and
imposes limitations on vendor drivers; it can also clearly be
simplified.

> Just some update for you regarding the v7 patches, currently we are very
> actively trying to lock down the sysfs and management interfaces discussion.
> 
> So, if you would like to make the upstream happen sooner, please join us in the
> v7 and following patch discussion instead of rewriting them.
> 

So, as you suggested, I will comment on the v7 series to propose both
architectural and implementation changes, hoping this will help more.


--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-07  6:42         ` [Qemu-devel] " Jike Song
@ 2016-09-07 16:56           ` Alex Williamson
  -1 siblings, 0 replies; 28+ messages in thread
From: Alex Williamson @ 2016-09-07 16:56 UTC (permalink / raw)
  To: Jike Song
  Cc: Neo Jia, kwankhede, qemu-devel, kvm, bjsdjshi, kevin.tian,
	guangrong.xiao, zhenyuw, zhiyuan.lv, pbonzini, kraxel

On Wed, 07 Sep 2016 14:42:58 +0800
Jike Song <jike.song@intel.com> wrote:

> On 09/07/2016 11:38 AM, Neo Jia wrote:
> > On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:  
> >> On 09/02/2016 11:03 PM, Alex Williamson wrote:  
> >>> On Fri,  2 Sep 2016 16:16:08 +0800
> >>> Jike Song <jike.song@intel.com> wrote:
> >>>  
> >>>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
> >>>>
> >>>> 	http://www.spinics.net/lists/kvm/msg136472.html  
> >>>
> >>>
> >>> Hi Jike,
> >>>
> >>> I'm thrilled by your active participation here, but I'm confused which
> >>> versions I should be reviewing and where the primary development is
> >>> going.  Kirti sent v7 a week ago, so I would have expected a revision
> >>> based on that rather than a re-write based on v6 plus incorporation of a
> >>> few of Kirti's patches directly.  
> >>
> >> Hi Alex,
> >>
> >> [Sorry! replied this on Monday but it was silently dropped by the our firewall]
> >>
> >>
> >>
> >> The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, to
> >> demonstrate how is it possible and beneficial to:
> >>
> >> 	1, Introduce an independent device between physical and mdev;
> >> 	2, Simplify vfio-mdev and make it the most flexible for vendor drivers;
> >>
> >> Unfortunately neither was understood or adopted in v7:
> >>
> >> 	http://www.spinics.net/lists/kvm/msg137081.html
> >>
> >> So here came the v2, as a standalone series, to give a whole and straight
> >> demonstration. The reason of still basing on v6:
> >>
> >> 	- Addressed all v6 comments (except the iommu part);
> >> 	- There is no comments yet for v7 (except the sysfs ones);
> >>
> >>
> >>  
> >>> I liked the last version of these
> >>> changes a lot, but we need to figure out how to combine development
> >>> because we do not have infinite cycles for review available :-\  Thanks!  
> >>
> >> Fully understand.
> >>
> >> Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
> >> at the direction we prefer.   
> > 
> > Hi Jike,
> > 
> > I wish I could meet you in person in KVM forum couple weeks ago so we can have a
> > better discussion.  
> 
> I wish I could have that opportunity, too!
> 
> > We are trying our best to accommodate almost all requirements / comments from 
> > use cases and code reviews while keeping little (or none) architectural changes
> > between revisions.  
> 
> Yes I saw that, there was little architectural change from v6 to v7,
> that's what I will argue for :)
> 
> >> We would be highly glad and thankful if Neo/Kirti
> >> would adopt the code in their next version, which will certainly form a
> >> more simple and consolidated base for future co-development; otherwise
> >> and we could at least discuss the concerns, in case of any.
> >>  
> > 
> > As I have said in my previous response to you, if you have any questions about
> > adopting the framework that we have developed, you are very welcome to
> > comment/speak out on the code review thread like others. And if it is reasonable
> > request and won't break other vendors' use case, we will adopt it (one example
> > is the online file and removing the mdev pci dependency).
> >   
> 
> Not limited to having questions about adoption, right? :)
> 
> We do think the framework itself had too much unnecessary logic and
> imposed limitations to vendor drivers, also it's clearly possible to be
> simplified.
> 
> > Just some update for you regarding the v7 patches, currently we are very
> > actively trying to lock down the sysfs and management interfaces discussion.
> > 
> > So, if you would like to make the upstream happen sooner, please join us in the
> > v7 and following patch discussion instead of rewriting them.
> >   
> 
> So as you said, I would comment on the v7 series to propose both architectural
> and implementation changes, hoping this will help more.

Please do; I definitely like where you're heading, and I want to see
Kirti and Neo include your ideas.  The challenge is to focus our
efforts into a single development stream.  Please continue to comment
and even submit patches, but let's consider NVIDIA's latest patches to
be the main development stream and request changes against that.  As
you would do upstream, propose changes in small increments so we can
evaluate each step.  It's very difficult for Neo and Kirti to
incorporate bulk rewrites.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC v2 0/4] adding mdev bus and vfio support
  2016-09-07 16:56           ` [Qemu-devel] " Alex Williamson
@ 2016-09-08  8:00             ` Jike Song
  -1 siblings, 0 replies; 28+ messages in thread
From: Jike Song @ 2016-09-08  8:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Neo Jia, kwankhede, qemu-devel, kvm, bjsdjshi, kevin.tian,
	guangrong.xiao, zhenyuw, zhiyuan.lv, pbonzini, kraxel


On 09/08/2016 12:56 AM, Alex Williamson wrote:
> On Wed, 07 Sep 2016 14:42:58 +0800
> Jike Song <jike.song@intel.com> wrote:
> 
>> On 09/07/2016 11:38 AM, Neo Jia wrote:
>>> On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:  
>>>> On 09/02/2016 11:03 PM, Alex Williamson wrote:  
>>>>> On Fri,  2 Sep 2016 16:16:08 +0800
>>>>> Jike Song <jike.song@intel.com> wrote:
>>>>>  
>>>>>> This patchset is based on NVidia's "Add Mediated device support" series, version 6:
>>>>>>
>>>>>> 	http://www.spinics.net/lists/kvm/msg136472.html  
>>>>>
>>>>>
>>>>> Hi Jike,
>>>>>
>>>>> I'm thrilled by your active participation here, but I'm confused which
>>>>> versions I should be reviewing and where the primary development is
>>>>> going.  Kirti sent v7 a week ago, so I would have expected a revision
>>>>> based on that rather than a re-write based on v6 plus incorporation of a
>>>>> few of Kirti's patches directly.  
>>>>
>>>> Hi Alex,
>>>>
>>>> [Sorry! I replied to this on Monday but it was silently dropped by our firewall]
>>>>
>>>>
>>>>
>>>> The v1 of this patchset was sent as incremental patches, based on Nvidia's v6,
>>>> to demonstrate how it is possible and beneficial to:
>>>>
>>>> 	1, Introduce an independent device between physical and mdev;
>>>> 	2, Simplify vfio-mdev and make it as flexible as possible for vendor drivers;
>>>>
>>>> Unfortunately neither was understood or adopted in v7:
>>>>
>>>> 	http://www.spinics.net/lists/kvm/msg137081.html
>>>>
>>>> So here comes v2, as a standalone series, to give a complete and straightforward
>>>> demonstration. The reasons for still basing it on v6:
>>>>
>>>> 	- Addressed all v6 comments (except the iommu part);
>>>> 	- There are no comments yet for v7 (except the sysfs ones);
>>>>
>>>>
>>>>  
>>>>> I liked the last version of these
>>>>> changes a lot, but we need to figure out how to combine development
>>>>> because we do not have infinite cycles for review available :-\  Thanks!  
>>>>
>>>> Fully understand.
>>>>
>>>> Here is the dilemma: v6 is an obsolete version to work upon, while v7 is
>>>> still not headed in the direction we prefer.
>>>
>>> Hi Jike,
>>>
>>> I wish I could have met you in person at the KVM Forum a couple of weeks ago
>>> so we could have had a better discussion.
>>
>> I wish I could have that opportunity, too!
>>
>>> We are trying our best to accommodate almost all requirements / comments from
>>> use cases and code reviews, while making few (or no) architectural changes
>>> between revisions.
>>
>> Yes, I saw that; there was little architectural change from v6 to v7,
>> and that's exactly what I will argue about :)
>>
>>>> We would be glad and thankful if Neo/Kirti
>>>> would adopt the code in their next version, which would certainly form a
>>>> simpler and more consolidated base for future co-development; otherwise,
>>>> we could at least discuss any concerns.
>>>>  
>>>
>>> As I have said in my previous response to you, if you have any questions about
>>> adopting the framework that we have developed, you are very welcome to
>>> comment/speak out on the code review thread like others. And if it is a
>>> reasonable request and won't break other vendors' use cases, we will adopt it
>>> (one example is the online file and removing the mdev pci dependency).
>>>   
>>
>> Not limited to having questions about adoption, right? :)
>>
>> We do think the framework itself has too much unnecessary logic and
>> imposes limitations on vendor drivers; it could clearly be simplified.
>>
>>> Just an update for you regarding the v7 patches: currently we are
>>> actively trying to lock down the sysfs and management interface discussion.
>>>
>>> So, if you would like to make upstreaming happen sooner, please join us in
>>> the v7 and subsequent patch discussions instead of rewriting them.
>>>   
>>
>> As you suggested, I will comment on the v7 series to propose both architectural
>> and implementation changes, hoping this will help more.
> 
> Please do, I definitely like where you're heading and I want to see
> Kirti and Neo include your ideas.  The challenge is to try to focus our
> efforts into a single development stream.  Please continue to comment
> and even submit patches, but let's consider NVIDIA's latest patches to
> be the main development stream and request changes against that.  As
> you would do upstream, propose changes in small increments where we can
> evaluate each step.  It's very difficult for Neo and Kirti to
> incorporate bulk rewrites.  Thanks,
> 

Thanks - We'll try our best to meet that!


--
Thanks,
Jike


end of thread, other threads:[~2016-09-08  8:02 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02  8:16 [RFC v2 0/4] adding mdev bus and vfio support Jike Song
2016-09-02  8:16 ` [Qemu-devel] " Jike Song
2016-09-02  8:16 ` [RFC v2 1/4] Mediated device Core driver Jike Song
2016-09-02  8:16   ` [Qemu-devel] " Jike Song
2016-09-02  8:16 ` [RFC v2 2/4] vfio: VFIO bus driver for MDEV devices Jike Song
2016-09-02  8:16   ` [Qemu-devel] " Jike Song
2016-09-02  8:16 ` [RFC v2 3/4] vfio iommu: Add support for mediated devices Jike Song
2016-09-02  8:16   ` [Qemu-devel] " Jike Song
2016-09-02  8:16 ` [RFC v2 4/4] docs: Add Documentation for Mediated devices Jike Song
2016-09-02  8:16   ` [Qemu-devel] " Jike Song
2016-09-02 22:09   ` Eric Blake
2016-09-02 22:09     ` Eric Blake
2016-09-02 23:30     ` Neo Jia
2016-09-02 23:30       ` Neo Jia
2016-09-02 15:03 ` [RFC v2 0/4] adding mdev bus and vfio support Alex Williamson
2016-09-02 15:03   ` [Qemu-devel] " Alex Williamson
2016-09-02 20:05   ` Neo Jia
2016-09-02 20:05     ` [Qemu-devel] " Neo Jia
2016-09-07  2:22   ` Jike Song
2016-09-07  2:22     ` [Qemu-devel] " Jike Song
2016-09-07  3:38     ` Neo Jia
2016-09-07  3:38       ` [Qemu-devel] " Neo Jia
2016-09-07  6:42       ` Jike Song
2016-09-07  6:42         ` [Qemu-devel] " Jike Song
2016-09-07 16:56         ` Alex Williamson
2016-09-07 16:56           ` [Qemu-devel] " Alex Williamson
2016-09-08  8:00           ` Jike Song
2016-09-08  8:00             ` [Qemu-devel] " Jike Song
