All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file
@ 2022-08-31  1:01 Jason Gunthorpe
  2022-08-31  1:01 ` [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h Jason Gunthorpe
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

This creates an isolated layer around the container FD code and everything
under it, including the VFIO iommu drivers. All this code is placed into
container.c, along with the "struct vfio_container" to compartmentalize
it.

Future patches will provide an iommufd based layer that gives the same API
as the container layer and choose which layer to go to based on how
userspace operates.

The patches continue to split up existing functions and finally the last
patch just moves every function that is a "container" function to the new
file and creates the global symbols to link them together.

Cross-file container functions are prefixed with vfio_container_* for
clarity.

The last patch can be defered and queued during the merge window to manage
conflicts. The earlier patches should be fine immediately conflicts wise.

This is the last big series I have to enable basic iommufd functionality.
As part of the iommufd series the entire container.c becomes conditionally
compiled:

https://github.com/jgunthorpe/linux/commits/vfio_iommufd

This applies on top of the prior two series:
 Break up ioctl dispatch functions to one function per ioctl
 Remove private items from linux/vfio_pci_core.h

Jason Gunthorpe (8):
  vfio: Add header guards and includes to drivers/vfio/vfio.h
  vfio: Rename __vfio_group_unset_container()
  vfio: Split the container logic into vfio_container_attach_group()
  vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU
  vfio: Split out container code from the init/cleanup functions
  vfio: Rename vfio_ioctl_check_extension()
  vfio: Split the register_device ops call into functions
  vfio: Move container code into drivers/vfio/container.c

 drivers/vfio/Makefile    |   1 +
 drivers/vfio/container.c | 680 +++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h      |  57 ++++
 drivers/vfio/vfio_main.c | 709 ++-------------------------------------
 4 files changed, 766 insertions(+), 681 deletions(-)
 create mode 100644 drivers/vfio/container.c


base-commit: 456bc2e671af7852f8a028c8069906b6b6fa1ff9
-- 
2.37.2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
@ 2022-08-31  1:01 ` Jason Gunthorpe
  2022-08-31  8:42   ` Tian, Kevin
  2022-08-31  1:01 ` [PATCH 2/8] vfio: Rename __vfio_group_unset_container() Jason Gunthorpe
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

As is normal for headers.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 503bea6c843d56..093784f1dea7a9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -3,6 +3,14 @@
  * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
  *     Author: Alex Williamson <alex.williamson@redhat.com>
  */
+#ifndef __VFIO_VFIO_H__
+#define __VFIO_VFIO_H__
+
+#include <linux/device.h>
+#include <linux/cdev.h>
+#include <linux/module.h>
+
+struct iommu_group;
 
 enum vfio_group_type {
 	/*
@@ -69,3 +77,5 @@ struct vfio_iommu_driver_ops {
 
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
+
+#endif
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
  2022-08-31  1:01 ` [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h Jason Gunthorpe
@ 2022-08-31  1:01 ` Jason Gunthorpe
  2022-08-31  8:46   ` Tian, Kevin
  2022-08-31  1:01 ` [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group() Jason Gunthorpe
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

To vfio_container_detatch_group(). This function is really a container
function.

Fold the WARN_ON() into it as a precondition assertion.

A following patch will move the vfio_container functions to their own .c
file.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index bfa6119ba47337..e145c87f208f3a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
 /*
  * VFIO Group fd, /dev/vfio/$GROUP
  */
-static void __vfio_group_unset_container(struct vfio_group *group)
+static void vfio_container_detatch_group(struct vfio_group *group)
 {
 	struct vfio_container *container = group->container;
 	struct vfio_iommu_driver *driver;
 
 	lockdep_assert_held_write(&group->group_rwsem);
+	WARN_ON(group->container_users != 1);
 
 	down_write(&container->group_lock);
 
@@ -976,7 +977,7 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
 		return -EINVAL;
 	if (group->container_users != 1)
 		return -EBUSY;
-	__vfio_group_unset_container(group);
+	vfio_container_detatch_group(group);
 	return 0;
 }
 
@@ -1329,10 +1330,8 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	 * is only called when there are no open devices.
 	 */
 	WARN_ON(group->notifier.head);
-	if (group->container) {
-		WARN_ON(group->container_users != 1);
-		__vfio_group_unset_container(group);
-	}
+	if (group->container)
+		vfio_container_detatch_group(group);
 	group->opened_file = NULL;
 	up_write(&group->group_rwsem);
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group()
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
  2022-08-31  1:01 ` [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h Jason Gunthorpe
  2022-08-31  1:01 ` [PATCH 2/8] vfio: Rename __vfio_group_unset_container() Jason Gunthorpe
@ 2022-08-31  1:01 ` Jason Gunthorpe
  2022-08-31  8:47   ` Tian, Kevin
  2022-08-31  1:01 ` [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU Jason Gunthorpe
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

This splits up the ioctl of vfio_group_ioctl_set_container() so it
determines the type of file then invokes a type specific attachment
function. Future patches will add iommufd to this function as an
alternative type.

A following patch will move the vfio_container functions to their own .c
file.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 78 ++++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 30 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e145c87f208f3a..1108ba53fe5c28 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -981,40 +981,29 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
 	return 0;
 }
 
-static int vfio_group_ioctl_set_container(struct vfio_group *group,
-					  int __user *arg)
+static struct vfio_container *vfio_container_from_file(struct file *file)
 {
-	struct fd f;
 	struct vfio_container *container;
-	struct vfio_iommu_driver *driver;
-	int container_fd;
-	int ret = 0;
-
-	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
-	if (get_user(container_fd, arg))
-		return -EFAULT;
-	if (container_fd < 0)
-		return -EINVAL;
-	f = fdget(container_fd);
-	if (!f.file)
-		return -EBADF;
 
 	/* Sanity check, is this really our fd? */
-	if (f.file->f_op != &vfio_fops) {
-		ret = -EINVAL;
-		goto out_fdput;
-	}
-	container = f.file->private_data;
+	if (file->f_op != &vfio_fops)
+		return NULL;
+
+	container = file->private_data;
 	WARN_ON(!container); /* fget ensures we don't race vfio_release */
+	return container;
+}
 
-	down_write(&group->group_rwsem);
+static int vfio_container_attach_group(struct vfio_group *group,
+				       struct vfio_container *container)
+{
+	struct vfio_iommu_driver *driver;
+	int ret = 0;
 
-	if (group->container || WARN_ON(group->container_users)) {
-		ret = -EINVAL;
-		goto out_unlock_group;
-	}
+	lockdep_assert_held_write(&group->group_rwsem);
+
+	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
 
 	down_write(&container->group_lock);
 
@@ -1026,7 +1015,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	}
 
 	if (group->type == VFIO_IOMMU) {
-		ret = iommu_group_claim_dma_owner(group->iommu_group, f.file);
+		ret = iommu_group_claim_dma_owner(group->iommu_group, group);
 		if (ret)
 			goto out_unlock_container;
 	}
@@ -1054,9 +1043,38 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
 out_unlock_container:
 	up_write(&container->group_lock);
-out_unlock_group:
+	return ret;
+}
+
+static int vfio_group_ioctl_set_container(struct vfio_group *group,
+					  int __user *arg)
+{
+	struct vfio_container *container;
+	struct fd f;
+	int ret;
+	int fd;
+
+	if (get_user(fd, arg))
+		return -EFAULT;
+
+	f = fdget(fd);
+	if (!f.file)
+		return -EBADF;
+
+	down_write(&group->group_rwsem);
+	if (group->container || WARN_ON(group->container_users)) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	container = vfio_container_from_file(f.file);
+	ret = -EINVAL;
+	if (container) {
+		ret = vfio_container_attach_group(group, container);
+		goto out_unlock;
+	}
+
+out_unlock:
 	up_write(&group->group_rwsem);
-out_fdput:
 	fdput(f);
 	return ret;
 }
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2022-08-31  1:01 ` [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group() Jason Gunthorpe
@ 2022-08-31  1:01 ` Jason Gunthorpe
  2022-08-31  8:48   ` Tian, Kevin
  2022-08-31  1:01 ` [PATCH 5/8] vfio: Split out container code from the init/cleanup functions Jason Gunthorpe
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

This can all be accomplished using typical IS_ENABLED techniques, drop it
all.

Also rename the variable to vfio_noiommu so this can be made global in
following patches.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 43 ++++++++++++++++++----------------------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 1108ba53fe5c28..f76cea312028a6 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -82,10 +82,12 @@ struct vfio_group {
 };
 
 #ifdef CONFIG_VFIO_NOIOMMU
-static bool noiommu __read_mostly;
+static bool vfio_noiommu __read_mostly;
 module_param_named(enable_unsafe_noiommu_mode,
-		   noiommu, bool, S_IRUGO | S_IWUSR);
+		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
+#else
+enum { vfio_noiommu = false };
 #endif
 
 static DEFINE_XARRAY(vfio_device_set_xa);
@@ -162,7 +164,6 @@ static void vfio_release_device_set(struct vfio_device *device)
 	xa_unlock(&vfio_device_set_xa);
 }
 
-#ifdef CONFIG_VFIO_NOIOMMU
 static void *vfio_noiommu_open(unsigned long arg)
 {
 	if (arg != VFIO_NOIOMMU_IOMMU)
@@ -181,7 +182,7 @@ static long vfio_noiommu_ioctl(void *iommu_data,
 			       unsigned int cmd, unsigned long arg)
 {
 	if (cmd == VFIO_CHECK_EXTENSION)
-		return noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
+		return vfio_noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
 
 	return -ENOTTY;
 }
@@ -211,18 +212,13 @@ static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
  * Only noiommu containers can use vfio-noiommu and noiommu containers can only
  * use vfio-noiommu.
  */
-static inline bool vfio_iommu_driver_allowed(struct vfio_container *container,
-		const struct vfio_iommu_driver *driver)
+static bool vfio_iommu_driver_allowed(struct vfio_container *container,
+				      const struct vfio_iommu_driver *driver)
 {
+	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+		return true;
 	return container->noiommu == (driver->ops == &vfio_noiommu_ops);
 }
-#else
-static inline bool vfio_iommu_driver_allowed(struct vfio_container *container,
-		const struct vfio_iommu_driver *driver)
-{
-	return true;
-}
-#endif /* CONFIG_VFIO_NOIOMMU */
 
 /*
  * IOMMU driver registration
@@ -535,8 +531,7 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 	struct vfio_group *group;
 
 	iommu_group = iommu_group_get(dev);
-#ifdef CONFIG_VFIO_NOIOMMU
-	if (!iommu_group && noiommu) {
+	if (!iommu_group && vfio_noiommu) {
 		/*
 		 * With noiommu enabled, create an IOMMU group for devices that
 		 * don't already have one, implying no IOMMU hardware/driver
@@ -550,7 +545,7 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 		}
 		return group;
 	}
-#endif
+
 	if (!iommu_group)
 		return ERR_PTR(-EINVAL);
 
@@ -2101,11 +2096,11 @@ static int __init vfio_init(void)
 	if (ret)
 		goto err_alloc_chrdev;
 
-#ifdef CONFIG_VFIO_NOIOMMU
-	ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
-#endif
-	if (ret)
-		goto err_driver_register;
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
+		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
+		if (ret)
+			goto err_driver_register;
+	}
 
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
@@ -2124,9 +2119,9 @@ static void __exit vfio_cleanup(void)
 {
 	WARN_ON(!list_empty(&vfio.group_list));
 
-#ifdef CONFIG_VFIO_NOIOMMU
-	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
-#endif
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+
 	ida_destroy(&vfio.group_ida);
 	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
 	class_destroy(vfio.class);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/8] vfio: Split out container code from the init/cleanup functions
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2022-08-31  1:01 ` [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU Jason Gunthorpe
@ 2022-08-31  1:01 ` Jason Gunthorpe
  2022-08-31  8:49   ` Tian, Kevin
  2022-08-31  1:02 ` [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension() Jason Gunthorpe
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:01 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

This miscdev, noiommu driver and a coule of globals are all container
items. Move this init into its own functions.

A following patch will move the vfio_container functions to their own .c
file.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 52 +++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f76cea312028a6..bfe13b5c12fed7 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -2067,14 +2067,11 @@ static struct miscdevice vfio_dev = {
 	.mode = S_IRUGO | S_IWUGO,
 };
 
-static int __init vfio_init(void)
+static int __init vfio_container_init(void)
 {
 	int ret;
 
-	ida_init(&vfio.group_ida);
-	mutex_init(&vfio.group_lock);
 	mutex_init(&vfio.iommu_drivers_lock);
-	INIT_LIST_HEAD(&vfio.group_list);
 	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
 
 	ret = misc_register(&vfio_dev);
@@ -2083,6 +2080,38 @@ static int __init vfio_init(void)
 		return ret;
 	}
 
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
+		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
+		if (ret)
+			goto err_misc;
+	}
+	return 0;
+
+err_misc:
+	misc_deregister(&vfio_dev);
+	return ret;
+}
+
+static void vfio_container_cleanup(void)
+{
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+	misc_deregister(&vfio_dev);
+	mutex_destroy(&vfio.iommu_drivers_lock);
+}
+
+static int __init vfio_init(void)
+{
+	int ret;
+
+	ida_init(&vfio.group_ida);
+	mutex_init(&vfio.group_lock);
+	INIT_LIST_HEAD(&vfio.group_list);
+
+	ret = vfio_container_init();
+	if (ret)
+		return ret;
+
 	/* /dev/vfio/$GROUP */
 	vfio.class = class_create(THIS_MODULE, "vfio");
 	if (IS_ERR(vfio.class)) {
@@ -2096,22 +2125,14 @@ static int __init vfio_init(void)
 	if (ret)
 		goto err_alloc_chrdev;
 
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
-		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
-		if (ret)
-			goto err_driver_register;
-	}
-
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;
 
-err_driver_register:
-	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
 err_alloc_chrdev:
 	class_destroy(vfio.class);
 	vfio.class = NULL;
 err_class:
-	misc_deregister(&vfio_dev);
+	vfio_container_cleanup();
 	return ret;
 }
 
@@ -2119,14 +2140,11 @@ static void __exit vfio_cleanup(void)
 {
 	WARN_ON(!list_empty(&vfio.group_list));
 
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
-		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
-
 	ida_destroy(&vfio.group_ida);
 	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
 	class_destroy(vfio.class);
+	vfio_container_cleanup();
 	vfio.class = NULL;
-	misc_deregister(&vfio_dev);
 	xa_destroy(&vfio_device_set_xa);
 }
 
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension()
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2022-08-31  1:01 ` [PATCH 5/8] vfio: Split out container code from the init/cleanup functions Jason Gunthorpe
@ 2022-08-31  1:02 ` Jason Gunthorpe
  2022-08-31  8:50   ` Tian, Kevin
  2022-08-31  1:02 ` [PATCH 7/8] vfio: Split the register_device ops call into functions Jason Gunthorpe
  2022-08-31  1:02 ` [PATCH 8/8] vfio: Move container code into drivers/vfio/container.c Jason Gunthorpe
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:02 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

To vfio_container_ioctl_check_extension().

A following patch will turn this into a non-static function, make it clear
it is related to the container.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index bfe13b5c12fed7..e1a424d243351a 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -705,8 +705,9 @@ EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 /*
  * VFIO base fd, /dev/vfio/vfio
  */
-static long vfio_ioctl_check_extension(struct vfio_container *container,
-				       unsigned long arg)
+static long
+vfio_container_ioctl_check_extension(struct vfio_container *container,
+				     unsigned long arg)
 {
 	struct vfio_iommu_driver *driver;
 	long ret = 0;
@@ -863,7 +864,7 @@ static long vfio_fops_unl_ioctl(struct file *filep,
 		ret = VFIO_API_VERSION;
 		break;
 	case VFIO_CHECK_EXTENSION:
-		ret = vfio_ioctl_check_extension(container, arg);
+		ret = vfio_container_ioctl_check_extension(container, arg);
 		break;
 	case VFIO_SET_IOMMU:
 		ret = vfio_ioctl_set_iommu(container, arg);
@@ -1770,8 +1771,8 @@ bool vfio_file_enforced_coherent(struct file *file)
 
 	down_read(&group->group_rwsem);
 	if (group->container) {
-		ret = vfio_ioctl_check_extension(group->container,
-						 VFIO_DMA_CC_IOMMU);
+		ret = vfio_container_ioctl_check_extension(group->container,
+							   VFIO_DMA_CC_IOMMU);
 	} else {
 		/*
 		 * Since the coherency state is determined only once a container
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 7/8] vfio: Split the register_device ops call into functions
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2022-08-31  1:02 ` [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension() Jason Gunthorpe
@ 2022-08-31  1:02 ` Jason Gunthorpe
  2022-09-02  3:52   ` Tian, Kevin
  2022-08-31  1:02 ` [PATCH 8/8] vfio: Move container code into drivers/vfio/container.c Jason Gunthorpe
  7 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:02 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

This is a container item.

A following patch will move the vfio_container functions to their own .c
file.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_main.c | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e1a424d243351a..cf4b5418f497a9 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1110,9 +1110,28 @@ static void vfio_device_unassign_container(struct vfio_device *device)
 	up_write(&device->group->group_rwsem);
 }
 
+static void vfio_container_register_device(struct vfio_device *device)
+{
+	struct vfio_iommu_driver *iommu_driver =
+		device->group->container->iommu_driver;
+
+	if (iommu_driver && iommu_driver->ops->register_device)
+		iommu_driver->ops->register_device(
+			device->group->container->iommu_data, device);
+}
+
+static void vfio_container_unregister_device(struct vfio_device *device)
+{
+	struct vfio_iommu_driver *iommu_driver =
+		device->group->container->iommu_driver;
+
+	if (iommu_driver && iommu_driver->ops->unregister_device)
+		iommu_driver->ops->unregister_device(
+			device->group->container->iommu_data, device);
+}
+
 static struct file *vfio_device_open(struct vfio_device *device)
 {
-	struct vfio_iommu_driver *iommu_driver;
 	struct file *filep;
 	int ret;
 
@@ -1143,12 +1162,7 @@ static struct file *vfio_device_open(struct vfio_device *device)
 			if (ret)
 				goto err_undo_count;
 		}
-
-		iommu_driver = device->group->container->iommu_driver;
-		if (iommu_driver && iommu_driver->ops->register_device)
-			iommu_driver->ops->register_device(
-				device->group->container->iommu_data, device);
-
+		vfio_container_register_device(device);
 		up_read(&device->group->group_rwsem);
 	}
 	mutex_unlock(&device->dev_set->lock);
@@ -1186,10 +1200,7 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	if (device->open_count == 1 && device->ops->close_device) {
 		device->ops->close_device(device);
 
-		iommu_driver = device->group->container->iommu_driver;
-		if (iommu_driver && iommu_driver->ops->unregister_device)
-			iommu_driver->ops->unregister_device(
-				device->group->container->iommu_data, device);
+		vfio_container_unregister_device(device);
 	}
 err_undo_count:
 	up_read(&device->group->group_rwsem);
@@ -1368,7 +1379,6 @@ static const struct file_operations vfio_group_fops = {
 static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_device *device = filep->private_data;
-	struct vfio_iommu_driver *iommu_driver;
 
 	mutex_lock(&device->dev_set->lock);
 	vfio_assert_device_open(device);
@@ -1376,10 +1386,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 	if (device->open_count == 1 && device->ops->close_device)
 		device->ops->close_device(device);
 
-	iommu_driver = device->group->container->iommu_driver;
-	if (iommu_driver && iommu_driver->ops->unregister_device)
-		iommu_driver->ops->unregister_device(
-			device->group->container->iommu_data, device);
+	vfio_container_unregister_device(device);
 	up_read(&device->group->group_rwsem);
 	device->open_count--;
 	if (device->open_count == 0)
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 8/8] vfio: Move container code into drivers/vfio/container.c
  2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2022-08-31  1:02 ` [PATCH 7/8] vfio: Split the register_device ops call into functions Jason Gunthorpe
@ 2022-08-31  1:02 ` Jason Gunthorpe
  7 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31  1:02 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, Kevin Tian, kvm

All the functions that dereference struct vfio_container are moved into
container.c.

Simple code motion, no functional change.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/Makefile    |   1 +
 drivers/vfio/container.c | 680 ++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio.h      |  47 +++
 drivers/vfio/vfio_main.c | 693 +--------------------------------------
 4 files changed, 729 insertions(+), 692 deletions(-)
 create mode 100644 drivers/vfio/container.c

diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 1a32357592e3ea..d5ae6921eb4ece 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 vfio_virqfd-y := virqfd.o
 
+vfio-y += container.o
 vfio-y += vfio_main.o
 
 obj-$(CONFIG_VFIO) += vfio.o
diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
new file mode 100644
index 00000000000000..2b1e021ae58a83
--- /dev/null
+++ b/drivers/vfio/container.c
@@ -0,0 +1,680 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ *
+ * VFIO container (/dev/vfio/vfio)
+ */
+#include <linux/file.h>
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/capability.h>
+#include <linux/iommu.h>
+#include <linux/miscdevice.h>
+#include <linux/vfio.h>
+#include <uapi/linux/vfio.h>
+
+#include "vfio.h"
+
+struct vfio_container {
+	struct kref			kref;
+	struct list_head		group_list;
+	struct rw_semaphore		group_lock;
+	struct vfio_iommu_driver	*iommu_driver;
+	void				*iommu_data;
+	bool				noiommu;
+};
+
+static struct vfio {
+	struct list_head		iommu_drivers_list;
+	struct mutex			iommu_drivers_lock;
+} vfio;
+
+#ifdef CONFIG_VFIO_NOIOMMU
+bool vfio_noiommu __read_mostly;
+module_param_named(enable_unsafe_noiommu_mode,
+		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
+#endif
+
+static void *vfio_noiommu_open(unsigned long arg)
+{
+	if (arg != VFIO_NOIOMMU_IOMMU)
+		return ERR_PTR(-EINVAL);
+	if (!capable(CAP_SYS_RAWIO))
+		return ERR_PTR(-EPERM);
+
+	return NULL;
+}
+
+static void vfio_noiommu_release(void *iommu_data)
+{
+}
+
+static long vfio_noiommu_ioctl(void *iommu_data,
+			       unsigned int cmd, unsigned long arg)
+{
+	if (cmd == VFIO_CHECK_EXTENSION)
+		return vfio_noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
+
+	return -ENOTTY;
+}
+
+static int vfio_noiommu_attach_group(void *iommu_data,
+		struct iommu_group *iommu_group, enum vfio_group_type type)
+{
+	return 0;
+}
+
+static void vfio_noiommu_detach_group(void *iommu_data,
+				      struct iommu_group *iommu_group)
+{
+}
+
+static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
+	.name = "vfio-noiommu",
+	.owner = THIS_MODULE,
+	.open = vfio_noiommu_open,
+	.release = vfio_noiommu_release,
+	.ioctl = vfio_noiommu_ioctl,
+	.attach_group = vfio_noiommu_attach_group,
+	.detach_group = vfio_noiommu_detach_group,
+};
+
+/*
+ * Only noiommu containers can use vfio-noiommu and noiommu containers can only
+ * use vfio-noiommu.
+ */
+static bool vfio_iommu_driver_allowed(struct vfio_container *container,
+				      const struct vfio_iommu_driver *driver)
+{
+	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+		return true;
+	return container->noiommu == (driver->ops == &vfio_noiommu_ops);
+}
+
+/*
+ * IOMMU driver registration
+ */
+int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops)
+{
+	struct vfio_iommu_driver *driver, *tmp;
+
+	if (WARN_ON(!ops->register_device != !ops->unregister_device))
+		return -EINVAL;
+
+	driver = kzalloc(sizeof(*driver), GFP_KERNEL);
+	if (!driver)
+		return -ENOMEM;
+
+	driver->ops = ops;
+
+	mutex_lock(&vfio.iommu_drivers_lock);
+
+	/* Check for duplicates */
+	list_for_each_entry(tmp, &vfio.iommu_drivers_list, vfio_next) {
+		if (tmp->ops == ops) {
+			mutex_unlock(&vfio.iommu_drivers_lock);
+			kfree(driver);
+			return -EINVAL;
+		}
+	}
+
+	list_add(&driver->vfio_next, &vfio.iommu_drivers_list);
+
+	mutex_unlock(&vfio.iommu_drivers_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vfio_register_iommu_driver);
+
+void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
+{
+	struct vfio_iommu_driver *driver;
+
+	mutex_lock(&vfio.iommu_drivers_lock);
+	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
+		if (driver->ops == ops) {
+			list_del(&driver->vfio_next);
+			mutex_unlock(&vfio.iommu_drivers_lock);
+			kfree(driver);
+			return;
+		}
+	}
+	mutex_unlock(&vfio.iommu_drivers_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
+
+/*
+ * Container objects - containers are created when /dev/vfio/vfio is
+ * opened, but their lifecycle extends until the last user is done, so
+ * it's freed via kref.  Must support container/group/device being
+ * closed in any order.
+ */
+static void vfio_container_release(struct kref *kref)
+{
+	struct vfio_container *container;
+	container = container_of(kref, struct vfio_container, kref);
+
+	kfree(container);
+}
+
+static void vfio_container_get(struct vfio_container *container)
+{
+	kref_get(&container->kref);
+}
+
+static void vfio_container_put(struct vfio_container *container)
+{
+	kref_put(&container->kref, vfio_container_release);
+}
+
+void vfio_container_register_device(struct vfio_device *device)
+{
+	struct vfio_iommu_driver *iommu_driver =
+		device->group->container->iommu_driver;
+
+	if (iommu_driver && iommu_driver->ops->register_device)
+		iommu_driver->ops->register_device(
+			device->group->container->iommu_data, device);
+}
+
+void vfio_container_unregister_device(struct vfio_device *device)
+{
+	struct vfio_iommu_driver *iommu_driver =
+		device->group->container->iommu_driver;
+
+	if (iommu_driver && iommu_driver->ops->unregister_device)
+		iommu_driver->ops->unregister_device(
+			device->group->container->iommu_data, device);
+}
+
+long vfio_container_ioctl_check_extension(struct vfio_container *container,
+					  unsigned long arg)
+{
+	struct vfio_iommu_driver *driver;
+	long ret = 0;
+
+	down_read(&container->group_lock);
+
+	driver = container->iommu_driver;
+
+	switch (arg) {
+		/* No base extensions yet */
+	default:
+		/*
+		 * If no driver is set, poll all registered drivers for
+		 * extensions and return the first positive result.  If
+		 * a driver is already set, further queries will be passed
+		 * only to that driver.
+		 */
+		if (!driver) {
+			mutex_lock(&vfio.iommu_drivers_lock);
+			list_for_each_entry(driver, &vfio.iommu_drivers_list,
+					    vfio_next) {
+
+				if (!list_empty(&container->group_list) &&
+				    !vfio_iommu_driver_allowed(container,
+							       driver))
+					continue;
+				if (!try_module_get(driver->ops->owner))
+					continue;
+
+				ret = driver->ops->ioctl(NULL,
+							 VFIO_CHECK_EXTENSION,
+							 arg);
+				module_put(driver->ops->owner);
+				if (ret > 0)
+					break;
+			}
+			mutex_unlock(&vfio.iommu_drivers_lock);
+		} else
+			ret = driver->ops->ioctl(container->iommu_data,
+						 VFIO_CHECK_EXTENSION, arg);
+	}
+
+	up_read(&container->group_lock);
+
+	return ret;
+}
+
+/* hold write lock on container->group_lock */
+static int __vfio_container_attach_groups(struct vfio_container *container,
+					  struct vfio_iommu_driver *driver,
+					  void *data)
+{
+	struct vfio_group *group;
+	int ret = -ENODEV;
+
+	list_for_each_entry(group, &container->group_list, container_next) {
+		ret = driver->ops->attach_group(data, group->iommu_group,
+						group->type);
+		if (ret)
+			goto unwind;
+	}
+
+	return ret;
+
+unwind:
+	list_for_each_entry_continue_reverse(group, &container->group_list,
+					     container_next) {
+		driver->ops->detach_group(data, group->iommu_group);
+	}
+
+	return ret;
+}
+
+static long vfio_ioctl_set_iommu(struct vfio_container *container,
+				 unsigned long arg)
+{
+	struct vfio_iommu_driver *driver;
+	long ret = -ENODEV;
+
+	down_write(&container->group_lock);
+
+	/*
+	 * The container is designed to be an unprivileged interface while
+	 * the group can be assigned to specific users.  Therefore, only by
+	 * adding a group to a container does the user get the privilege of
+	 * enabling the iommu, which may allocate finite resources.  There
+	 * is no unset_iommu, but by removing all the groups from a container,
+	 * the container is deprivileged and returns to an unset state.
+	 */
+	if (list_empty(&container->group_list) || container->iommu_driver) {
+		up_write(&container->group_lock);
+		return -EINVAL;
+	}
+
+	mutex_lock(&vfio.iommu_drivers_lock);
+	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
+		void *data;
+
+		if (!vfio_iommu_driver_allowed(container, driver))
+			continue;
+		if (!try_module_get(driver->ops->owner))
+			continue;
+
+		/*
+		 * The arg magic for SET_IOMMU is the same as CHECK_EXTENSION,
+		 * so test which iommu driver reported support for this
+		 * extension and call open on them.  We also pass them the
+		 * magic, allowing a single driver to support multiple
+		 * interfaces if they'd like.
+		 */
+		if (driver->ops->ioctl(NULL, VFIO_CHECK_EXTENSION, arg) <= 0) {
+			module_put(driver->ops->owner);
+			continue;
+		}
+
+		data = driver->ops->open(arg);
+		if (IS_ERR(data)) {
+			ret = PTR_ERR(data);
+			module_put(driver->ops->owner);
+			continue;
+		}
+
+		ret = __vfio_container_attach_groups(container, driver, data);
+		if (ret) {
+			driver->ops->release(data);
+			module_put(driver->ops->owner);
+			continue;
+		}
+
+		container->iommu_driver = driver;
+		container->iommu_data = data;
+		break;
+	}
+
+	mutex_unlock(&vfio.iommu_drivers_lock);
+	up_write(&container->group_lock);
+
+	return ret;
+}
+
+static long vfio_fops_unl_ioctl(struct file *filep,
+				unsigned int cmd, unsigned long arg)
+{
+	struct vfio_container *container = filep->private_data;
+	struct vfio_iommu_driver *driver;
+	void *data;
+	long ret = -EINVAL;
+
+	if (!container)
+		return ret;
+
+	switch (cmd) {
+	case VFIO_GET_API_VERSION:
+		ret = VFIO_API_VERSION;
+		break;
+	case VFIO_CHECK_EXTENSION:
+		ret = vfio_container_ioctl_check_extension(container, arg);
+		break;
+	case VFIO_SET_IOMMU:
+		ret = vfio_ioctl_set_iommu(container, arg);
+		break;
+	default:
+		driver = container->iommu_driver;
+		data = container->iommu_data;
+
+		if (driver) /* passthrough all unrecognized ioctls */
+			ret = driver->ops->ioctl(data, cmd, arg);
+	}
+
+	return ret;
+}
+
+static int vfio_fops_open(struct inode *inode, struct file *filep)
+{
+	struct vfio_container *container;
+
+	container = kzalloc(sizeof(*container), GFP_KERNEL);
+	if (!container)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&container->group_list);
+	init_rwsem(&container->group_lock);
+	kref_init(&container->kref);
+
+	filep->private_data = container;
+
+	return 0;
+}
+
+static int vfio_fops_release(struct inode *inode, struct file *filep)
+{
+	struct vfio_container *container = filep->private_data;
+	struct vfio_iommu_driver *driver = container->iommu_driver;
+
+	if (driver && driver->ops->notify)
+		driver->ops->notify(container->iommu_data,
+				    VFIO_IOMMU_CONTAINER_CLOSE);
+
+	filep->private_data = NULL;
+
+	vfio_container_put(container);
+
+	return 0;
+}
+
+static const struct file_operations vfio_fops = {
+	.owner		= THIS_MODULE,
+	.open		= vfio_fops_open,
+	.release	= vfio_fops_release,
+	.unlocked_ioctl	= vfio_fops_unl_ioctl,
+	.compat_ioctl	= compat_ptr_ioctl,
+};
+
+struct vfio_container *vfio_container_from_file(struct file *file)
+{
+	struct vfio_container *container;
+
+	/* Sanity check, is this really our fd? */
+	if (file->f_op != &vfio_fops)
+		return NULL;
+
+	container = file->private_data;
+	WARN_ON(!container); /* fget ensures we don't race vfio_release */
+	return container;
+}
+
+static struct miscdevice vfio_dev = {
+	.minor = VFIO_MINOR,
+	.name = "vfio",
+	.fops = &vfio_fops,
+	.nodename = "vfio/vfio",
+	.mode = S_IRUGO | S_IWUGO,
+};
+
+int vfio_container_attach_group(struct vfio_group *group,
+				struct vfio_container *container)
+{
+	struct vfio_iommu_driver *driver;
+	int ret = 0;
+
+	lockdep_assert_held_write(&group->group_rwsem);
+
+	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	down_write(&container->group_lock);
+
+	/* Real groups and fake groups cannot mix */
+	if (!list_empty(&container->group_list) &&
+	    container->noiommu != (group->type == VFIO_NO_IOMMU)) {
+		ret = -EPERM;
+		goto out_unlock_container;
+	}
+
+	if (group->type == VFIO_IOMMU) {
+		ret = iommu_group_claim_dma_owner(group->iommu_group, group);
+		if (ret)
+			goto out_unlock_container;
+	}
+
+	driver = container->iommu_driver;
+	if (driver) {
+		ret = driver->ops->attach_group(container->iommu_data,
+						group->iommu_group,
+						group->type);
+		if (ret) {
+			if (group->type == VFIO_IOMMU)
+				iommu_group_release_dma_owner(
+					group->iommu_group);
+			goto out_unlock_container;
+		}
+	}
+
+	group->container = container;
+	group->container_users = 1;
+	container->noiommu = (group->type == VFIO_NO_IOMMU);
+	list_add(&group->container_next, &container->group_list);
+
+	/* Get a reference on the container and mark a user within the group */
+	vfio_container_get(container);
+
+out_unlock_container:
+	up_write(&container->group_lock);
+	return ret;
+}
+
+void vfio_container_detatch_group(struct vfio_group *group)
+{
+	struct vfio_container *container = group->container;
+	struct vfio_iommu_driver *driver;
+
+	lockdep_assert_held_write(&group->group_rwsem);
+	WARN_ON(group->container_users != 1);
+
+	down_write(&container->group_lock);
+
+	driver = container->iommu_driver;
+	if (driver)
+		driver->ops->detach_group(container->iommu_data,
+					  group->iommu_group);
+
+	if (group->type == VFIO_IOMMU)
+		iommu_group_release_dma_owner(group->iommu_group);
+
+	group->container = NULL;
+	group->container_users = 0;
+	list_del(&group->container_next);
+
+	/* Detaching the last group deprivileges a container, remove iommu */
+	if (driver && list_empty(&container->group_list)) {
+		driver->ops->release(container->iommu_data);
+		module_put(driver->ops->owner);
+		container->iommu_driver = NULL;
+		container->iommu_data = NULL;
+	}
+
+	up_write(&container->group_lock);
+
+	vfio_container_put(container);
+}
+
+int vfio_device_assign_container(struct vfio_device *device)
+{
+	struct vfio_group *group = device->group;
+
+	lockdep_assert_held_write(&group->group_rwsem);
+
+	if (!group->container || !group->container->iommu_driver ||
+	    WARN_ON(!group->container_users))
+		return -EINVAL;
+
+	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	get_file(group->opened_file);
+	group->container_users++;
+	return 0;
+}
+
+void vfio_device_unassign_container(struct vfio_device *device)
+{
+	down_write(&device->group->group_rwsem);
+	WARN_ON(device->group->container_users <= 1);
+	device->group->container_users--;
+	fput(device->group->opened_file);
+	up_write(&device->group->group_rwsem);
+}
+
+/*
+ * Pin contiguous user pages and return their associated host pages for local
+ * domain only.
+ * @device [in]  : device
+ * @iova [in]    : starting IOVA of user pages to be pinned.
+ * @npage [in]   : count of pages to be pinned.  This count should not
+ *		   be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in]    : protection flags
+ * @pages[out]   : array of host pages
+ * Return error or number of pages pinned.
+ */
+int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
+		   int npage, int prot, struct page **pages)
+{
+	struct vfio_container *container;
+	struct vfio_group *group = device->group;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!pages || !npage || !vfio_assert_device_open(device))
+		return -EINVAL;
+
+	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+		return -E2BIG;
+
+	if (group->dev_counter > 1)
+		return -EINVAL;
+
+	/* group->container cannot change while a vfio device is open */
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->pin_pages))
+		ret = driver->ops->pin_pages(container->iommu_data,
+					     group->iommu_group, iova,
+					     npage, prot, pages);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_pin_pages);
+
+/*
+ * Unpin contiguous host pages for local domain only.
+ * @device [in]  : device
+ * @iova [in]    : starting address of user pages to be unpinned.
+ * @npage [in]   : count of pages to be unpinned.  This count should not
+ *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ */
+void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+
+	if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
+		return;
+
+	if (WARN_ON(!vfio_assert_device_open(device)))
+		return;
+
+	/* group->container cannot change while a vfio device is open */
+	container = device->group->container;
+	driver = container->iommu_driver;
+
+	driver->ops->unpin_pages(container->iommu_data, iova, npage);
+}
+EXPORT_SYMBOL(vfio_unpin_pages);
+
+/*
+ * This interface allows the CPUs to perform some sort of virtual DMA on
+ * behalf of the device.
+ *
+ * CPUs read/write from/into a range of IOVAs pointing to user space memory
+ * into/from a kernel buffer.
+ *
+ * As the read/write of user space memory is conducted via the CPUs and is
+ * not a real device DMA, it is not necessary to pin the user space memory.
+ *
+ * @device [in]		: VFIO device
+ * @iova [in]		: base IOVA of a user space buffer
+ * @data [in]		: pointer to kernel buffer
+ * @len [in]		: kernel buffer length
+ * @write		: indicate read or write
+ * Return error code on failure or 0 on success.
+ */
+int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
+		size_t len, bool write)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret = 0;
+
+	if (!data || len <= 0 || !vfio_assert_device_open(device))
+		return -EINVAL;
+
+	/* group->container cannot change while a vfio device is open */
+	container = device->group->container;
+	driver = container->iommu_driver;
+
+	if (likely(driver && driver->ops->dma_rw))
+		ret = driver->ops->dma_rw(container->iommu_data,
+					  iova, data, len, write);
+	else
+		ret = -ENOTTY;
+	return ret;
+}
+EXPORT_SYMBOL(vfio_dma_rw);
+
+int __init vfio_container_init(void)
+{
+	int ret;
+
+	mutex_init(&vfio.iommu_drivers_lock);
+	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
+
+	ret = misc_register(&vfio_dev);
+	if (ret) {
+		pr_err("vfio: misc device register failed\n");
+		return ret;
+	}
+
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
+		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
+		if (ret)
+			goto err_misc;
+	}
+	return 0;
+
+err_misc:
+	misc_deregister(&vfio_dev);
+	return ret;
+}
+
+void vfio_container_cleanup(void)
+{
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
+		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
+	misc_deregister(&vfio_dev);
+	mutex_destroy(&vfio.iommu_drivers_lock);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 093784f1dea7a9..c4f11225cca015 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -11,6 +11,8 @@
 #include <linux/module.h>
 
 struct iommu_group;
+struct vfio_device;
+struct vfio_container;
 
 enum vfio_group_type {
 	/*
@@ -36,6 +38,25 @@ enum vfio_group_type {
 	VFIO_NO_IOMMU,
 };
 
+struct vfio_group {
+	struct device 			dev;
+	struct cdev			cdev;
+	refcount_t			users;
+	unsigned int			container_users;
+	struct iommu_group		*iommu_group;
+	struct vfio_container		*container;
+	struct list_head		device_list;
+	struct mutex			device_lock;
+	struct list_head		vfio_next;
+	struct list_head		container_next;
+	enum vfio_group_type		type;
+	unsigned int			dev_counter;
+	struct rw_semaphore		group_rwsem;
+	struct kvm			*kvm;
+	struct file			*opened_file;
+	struct blocking_notifier_head	notifier;
+};
+
 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {
 	VFIO_IOMMU_CONTAINER_CLOSE = 0,
@@ -75,7 +96,33 @@ struct vfio_iommu_driver_ops {
 				  enum vfio_iommu_notify_type event);
 };
 
+struct vfio_iommu_driver {
+	const struct vfio_iommu_driver_ops	*ops;
+	struct list_head			vfio_next;
+};
+
 int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);
 
+bool vfio_assert_device_open(struct vfio_device *device);
+
+struct vfio_container *vfio_container_from_file(struct file *filep);
+int vfio_device_assign_container(struct vfio_device *device);
+void vfio_device_unassign_container(struct vfio_device *device);
+int vfio_container_attach_group(struct vfio_group *group,
+                                struct vfio_container *container);
+void vfio_container_detatch_group(struct vfio_group *group);
+void vfio_container_register_device(struct vfio_device *device);
+void vfio_container_unregister_device(struct vfio_device *device);
+long vfio_container_ioctl_check_extension(struct vfio_container *container,
+					  unsigned long arg);
+int __init vfio_container_init(void);
+void vfio_container_cleanup(void);
+
+#ifdef CONFIG_VFIO_NOIOMMU
+extern bool vfio_noiommu __read_mostly;
+#else
+enum { vfio_noiommu = false };
+#endif
+
 #endif
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index cf4b5418f497a9..dcbb3482b1b53b 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -40,56 +40,12 @@
 
 static struct vfio {
 	struct class			*class;
-	struct list_head		iommu_drivers_list;
-	struct mutex			iommu_drivers_lock;
 	struct list_head		group_list;
 	struct mutex			group_lock; /* locks group_list */
 	struct ida			group_ida;
 	dev_t				group_devt;
 } vfio;
 
-struct vfio_iommu_driver {
-	const struct vfio_iommu_driver_ops	*ops;
-	struct list_head			vfio_next;
-};
-
-struct vfio_container {
-	struct kref			kref;
-	struct list_head		group_list;
-	struct rw_semaphore		group_lock;
-	struct vfio_iommu_driver	*iommu_driver;
-	void				*iommu_data;
-	bool				noiommu;
-};
-
-struct vfio_group {
-	struct device 			dev;
-	struct cdev			cdev;
-	refcount_t			users;
-	unsigned int			container_users;
-	struct iommu_group		*iommu_group;
-	struct vfio_container		*container;
-	struct list_head		device_list;
-	struct mutex			device_lock;
-	struct list_head		vfio_next;
-	struct list_head		container_next;
-	enum vfio_group_type		type;
-	unsigned int			dev_counter;
-	struct rw_semaphore		group_rwsem;
-	struct kvm			*kvm;
-	struct file			*opened_file;
-	struct blocking_notifier_head	notifier;
-};
-
-#ifdef CONFIG_VFIO_NOIOMMU
-static bool vfio_noiommu __read_mostly;
-module_param_named(enable_unsafe_noiommu_mode,
-		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
-MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false)");
-#else
-enum { vfio_noiommu = false };
-#endif
-
 static DEFINE_XARRAY(vfio_device_set_xa);
 static const struct file_operations vfio_group_fops;
 
@@ -164,140 +120,8 @@ static void vfio_release_device_set(struct vfio_device *device)
 	xa_unlock(&vfio_device_set_xa);
 }
 
-static void *vfio_noiommu_open(unsigned long arg)
-{
-	if (arg != VFIO_NOIOMMU_IOMMU)
-		return ERR_PTR(-EINVAL);
-	if (!capable(CAP_SYS_RAWIO))
-		return ERR_PTR(-EPERM);
-
-	return NULL;
-}
-
-static void vfio_noiommu_release(void *iommu_data)
-{
-}
-
-static long vfio_noiommu_ioctl(void *iommu_data,
-			       unsigned int cmd, unsigned long arg)
-{
-	if (cmd == VFIO_CHECK_EXTENSION)
-		return vfio_noiommu && (arg == VFIO_NOIOMMU_IOMMU) ? 1 : 0;
-
-	return -ENOTTY;
-}
-
-static int vfio_noiommu_attach_group(void *iommu_data,
-		struct iommu_group *iommu_group, enum vfio_group_type type)
-{
-	return 0;
-}
-
-static void vfio_noiommu_detach_group(void *iommu_data,
-				      struct iommu_group *iommu_group)
-{
-}
-
-static const struct vfio_iommu_driver_ops vfio_noiommu_ops = {
-	.name = "vfio-noiommu",
-	.owner = THIS_MODULE,
-	.open = vfio_noiommu_open,
-	.release = vfio_noiommu_release,
-	.ioctl = vfio_noiommu_ioctl,
-	.attach_group = vfio_noiommu_attach_group,
-	.detach_group = vfio_noiommu_detach_group,
-};
-
-/*
- * Only noiommu containers can use vfio-noiommu and noiommu containers can only
- * use vfio-noiommu.
- */
-static bool vfio_iommu_driver_allowed(struct vfio_container *container,
-				      const struct vfio_iommu_driver *driver)
-{
-	if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU))
-		return true;
-	return container->noiommu == (driver->ops == &vfio_noiommu_ops);
-}
-
-/*
- * IOMMU driver registration
- */
-int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops)
-{
-	struct vfio_iommu_driver *driver, *tmp;
-
-	if (WARN_ON(!ops->register_device != !ops->unregister_device))
-		return -EINVAL;
-
-	driver = kzalloc(sizeof(*driver), GFP_KERNEL);
-	if (!driver)
-		return -ENOMEM;
-
-	driver->ops = ops;
-
-	mutex_lock(&vfio.iommu_drivers_lock);
-
-	/* Check for duplicates */
-	list_for_each_entry(tmp, &vfio.iommu_drivers_list, vfio_next) {
-		if (tmp->ops == ops) {
-			mutex_unlock(&vfio.iommu_drivers_lock);
-			kfree(driver);
-			return -EINVAL;
-		}
-	}
-
-	list_add(&driver->vfio_next, &vfio.iommu_drivers_list);
-
-	mutex_unlock(&vfio.iommu_drivers_lock);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(vfio_register_iommu_driver);
-
-void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
-{
-	struct vfio_iommu_driver *driver;
-
-	mutex_lock(&vfio.iommu_drivers_lock);
-	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
-		if (driver->ops == ops) {
-			list_del(&driver->vfio_next);
-			mutex_unlock(&vfio.iommu_drivers_lock);
-			kfree(driver);
-			return;
-		}
-	}
-	mutex_unlock(&vfio.iommu_drivers_lock);
-}
-EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
-
 static void vfio_group_get(struct vfio_group *group);
 
-/*
- * Container objects - containers are created when /dev/vfio/vfio is
- * opened, but their lifecycle extends until the last user is done, so
- * it's freed via kref.  Must support container/group/device being
- * closed in any order.
- */
-static void vfio_container_get(struct vfio_container *container)
-{
-	kref_get(&container->kref);
-}
-
-static void vfio_container_release(struct kref *kref)
-{
-	struct vfio_container *container;
-	container = container_of(kref, struct vfio_container, kref);
-
-	kfree(container);
-}
-
-static void vfio_container_put(struct vfio_container *container)
-{
-	kref_put(&container->kref, vfio_container_release);
-}
-
 /*
  * Group objects - create, release, get, put, search
  */
@@ -702,263 +526,9 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 
-/*
- * VFIO base fd, /dev/vfio/vfio
- */
-static long
-vfio_container_ioctl_check_extension(struct vfio_container *container,
-				     unsigned long arg)
-{
-	struct vfio_iommu_driver *driver;
-	long ret = 0;
-
-	down_read(&container->group_lock);
-
-	driver = container->iommu_driver;
-
-	switch (arg) {
-		/* No base extensions yet */
-	default:
-		/*
-		 * If no driver is set, poll all registered drivers for
-		 * extensions and return the first positive result.  If
-		 * a driver is already set, further queries will be passed
-		 * only to that driver.
-		 */
-		if (!driver) {
-			mutex_lock(&vfio.iommu_drivers_lock);
-			list_for_each_entry(driver, &vfio.iommu_drivers_list,
-					    vfio_next) {
-
-				if (!list_empty(&container->group_list) &&
-				    !vfio_iommu_driver_allowed(container,
-							       driver))
-					continue;
-				if (!try_module_get(driver->ops->owner))
-					continue;
-
-				ret = driver->ops->ioctl(NULL,
-							 VFIO_CHECK_EXTENSION,
-							 arg);
-				module_put(driver->ops->owner);
-				if (ret > 0)
-					break;
-			}
-			mutex_unlock(&vfio.iommu_drivers_lock);
-		} else
-			ret = driver->ops->ioctl(container->iommu_data,
-						 VFIO_CHECK_EXTENSION, arg);
-	}
-
-	up_read(&container->group_lock);
-
-	return ret;
-}
-
-/* hold write lock on container->group_lock */
-static int __vfio_container_attach_groups(struct vfio_container *container,
-					  struct vfio_iommu_driver *driver,
-					  void *data)
-{
-	struct vfio_group *group;
-	int ret = -ENODEV;
-
-	list_for_each_entry(group, &container->group_list, container_next) {
-		ret = driver->ops->attach_group(data, group->iommu_group,
-						group->type);
-		if (ret)
-			goto unwind;
-	}
-
-	return ret;
-
-unwind:
-	list_for_each_entry_continue_reverse(group, &container->group_list,
-					     container_next) {
-		driver->ops->detach_group(data, group->iommu_group);
-	}
-
-	return ret;
-}
-
-static long vfio_ioctl_set_iommu(struct vfio_container *container,
-				 unsigned long arg)
-{
-	struct vfio_iommu_driver *driver;
-	long ret = -ENODEV;
-
-	down_write(&container->group_lock);
-
-	/*
-	 * The container is designed to be an unprivileged interface while
-	 * the group can be assigned to specific users.  Therefore, only by
-	 * adding a group to a container does the user get the privilege of
-	 * enabling the iommu, which may allocate finite resources.  There
-	 * is no unset_iommu, but by removing all the groups from a container,
-	 * the container is deprivileged and returns to an unset state.
-	 */
-	if (list_empty(&container->group_list) || container->iommu_driver) {
-		up_write(&container->group_lock);
-		return -EINVAL;
-	}
-
-	mutex_lock(&vfio.iommu_drivers_lock);
-	list_for_each_entry(driver, &vfio.iommu_drivers_list, vfio_next) {
-		void *data;
-
-		if (!vfio_iommu_driver_allowed(container, driver))
-			continue;
-		if (!try_module_get(driver->ops->owner))
-			continue;
-
-		/*
-		 * The arg magic for SET_IOMMU is the same as CHECK_EXTENSION,
-		 * so test which iommu driver reported support for this
-		 * extension and call open on them.  We also pass them the
-		 * magic, allowing a single driver to support multiple
-		 * interfaces if they'd like.
-		 */
-		if (driver->ops->ioctl(NULL, VFIO_CHECK_EXTENSION, arg) <= 0) {
-			module_put(driver->ops->owner);
-			continue;
-		}
-
-		data = driver->ops->open(arg);
-		if (IS_ERR(data)) {
-			ret = PTR_ERR(data);
-			module_put(driver->ops->owner);
-			continue;
-		}
-
-		ret = __vfio_container_attach_groups(container, driver, data);
-		if (ret) {
-			driver->ops->release(data);
-			module_put(driver->ops->owner);
-			continue;
-		}
-
-		container->iommu_driver = driver;
-		container->iommu_data = data;
-		break;
-	}
-
-	mutex_unlock(&vfio.iommu_drivers_lock);
-	up_write(&container->group_lock);
-
-	return ret;
-}
-
-static long vfio_fops_unl_ioctl(struct file *filep,
-				unsigned int cmd, unsigned long arg)
-{
-	struct vfio_container *container = filep->private_data;
-	struct vfio_iommu_driver *driver;
-	void *data;
-	long ret = -EINVAL;
-
-	if (!container)
-		return ret;
-
-	switch (cmd) {
-	case VFIO_GET_API_VERSION:
-		ret = VFIO_API_VERSION;
-		break;
-	case VFIO_CHECK_EXTENSION:
-		ret = vfio_container_ioctl_check_extension(container, arg);
-		break;
-	case VFIO_SET_IOMMU:
-		ret = vfio_ioctl_set_iommu(container, arg);
-		break;
-	default:
-		driver = container->iommu_driver;
-		data = container->iommu_data;
-
-		if (driver) /* passthrough all unrecognized ioctls */
-			ret = driver->ops->ioctl(data, cmd, arg);
-	}
-
-	return ret;
-}
-
-static int vfio_fops_open(struct inode *inode, struct file *filep)
-{
-	struct vfio_container *container;
-
-	container = kzalloc(sizeof(*container), GFP_KERNEL);
-	if (!container)
-		return -ENOMEM;
-
-	INIT_LIST_HEAD(&container->group_list);
-	init_rwsem(&container->group_lock);
-	kref_init(&container->kref);
-
-	filep->private_data = container;
-
-	return 0;
-}
-
-static int vfio_fops_release(struct inode *inode, struct file *filep)
-{
-	struct vfio_container *container = filep->private_data;
-	struct vfio_iommu_driver *driver = container->iommu_driver;
-
-	if (driver && driver->ops->notify)
-		driver->ops->notify(container->iommu_data,
-				    VFIO_IOMMU_CONTAINER_CLOSE);
-
-	filep->private_data = NULL;
-
-	vfio_container_put(container);
-
-	return 0;
-}
-
-static const struct file_operations vfio_fops = {
-	.owner		= THIS_MODULE,
-	.open		= vfio_fops_open,
-	.release	= vfio_fops_release,
-	.unlocked_ioctl	= vfio_fops_unl_ioctl,
-	.compat_ioctl	= compat_ptr_ioctl,
-};
-
 /*
  * VFIO Group fd, /dev/vfio/$GROUP
  */
-static void vfio_container_detatch_group(struct vfio_group *group)
-{
-	struct vfio_container *container = group->container;
-	struct vfio_iommu_driver *driver;
-
-	lockdep_assert_held_write(&group->group_rwsem);
-	WARN_ON(group->container_users != 1);
-
-	down_write(&container->group_lock);
-
-	driver = container->iommu_driver;
-	if (driver)
-		driver->ops->detach_group(container->iommu_data,
-					  group->iommu_group);
-
-	if (group->type == VFIO_IOMMU)
-		iommu_group_release_dma_owner(group->iommu_group);
-
-	group->container = NULL;
-	group->container_users = 0;
-	list_del(&group->container_next);
-
-	/* Detaching the last group deprivileges a container, remove iommu */
-	if (driver && list_empty(&container->group_list)) {
-		driver->ops->release(container->iommu_data);
-		module_put(driver->ops->owner);
-		container->iommu_driver = NULL;
-		container->iommu_data = NULL;
-	}
-
-	up_write(&container->group_lock);
-
-	vfio_container_put(container);
-}
-
 /*
  * VFIO_GROUP_UNSET_CONTAINER should fail if there are other users or
  * if there was no container to unset.  Since the ioctl is called on
@@ -977,71 +547,6 @@ static int vfio_group_ioctl_unset_container(struct vfio_group *group)
 	return 0;
 }
 
-static struct vfio_container *vfio_container_from_file(struct file *file)
-{
-	struct vfio_container *container;
-
-	/* Sanity check, is this really our fd? */
-	if (file->f_op != &vfio_fops)
-		return NULL;
-
-	container = file->private_data;
-	WARN_ON(!container); /* fget ensures we don't race vfio_release */
-	return container;
-}
-
-static int vfio_container_attach_group(struct vfio_group *group,
-				       struct vfio_container *container)
-{
-	struct vfio_iommu_driver *driver;
-	int ret = 0;
-
-	lockdep_assert_held_write(&group->group_rwsem);
-
-	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
-	down_write(&container->group_lock);
-
-	/* Real groups and fake groups cannot mix */
-	if (!list_empty(&container->group_list) &&
-	    container->noiommu != (group->type == VFIO_NO_IOMMU)) {
-		ret = -EPERM;
-		goto out_unlock_container;
-	}
-
-	if (group->type == VFIO_IOMMU) {
-		ret = iommu_group_claim_dma_owner(group->iommu_group, group);
-		if (ret)
-			goto out_unlock_container;
-	}
-
-	driver = container->iommu_driver;
-	if (driver) {
-		ret = driver->ops->attach_group(container->iommu_data,
-						group->iommu_group,
-						group->type);
-		if (ret) {
-			if (group->type == VFIO_IOMMU)
-				iommu_group_release_dma_owner(
-					group->iommu_group);
-			goto out_unlock_container;
-		}
-	}
-
-	group->container = container;
-	group->container_users = 1;
-	container->noiommu = (group->type == VFIO_NO_IOMMU);
-	list_add(&group->container_next, &container->group_list);
-
-	/* Get a reference on the container and mark a user within the group */
-	vfio_container_get(container);
-
-out_unlock_container:
-	up_write(&container->group_lock);
-	return ret;
-}
-
 static int vfio_group_ioctl_set_container(struct vfio_group *group,
 					  int __user *arg)
 {
@@ -1078,58 +583,11 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 static const struct file_operations vfio_device_fops;
 
 /* true if the vfio_device has open_device() called but not close_device() */
-static bool vfio_assert_device_open(struct vfio_device *device)
+bool vfio_assert_device_open(struct vfio_device *device)
 {
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
-static int vfio_device_assign_container(struct vfio_device *device)
-{
-	struct vfio_group *group = device->group;
-
-	lockdep_assert_held_write(&group->group_rwsem);
-
-	if (!group->container || !group->container->iommu_driver ||
-	    WARN_ON(!group->container_users))
-		return -EINVAL;
-
-	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
-		return -EPERM;
-
-	get_file(group->opened_file);
-	group->container_users++;
-	return 0;
-}
-
-static void vfio_device_unassign_container(struct vfio_device *device)
-{
-	down_write(&device->group->group_rwsem);
-	WARN_ON(device->group->container_users <= 1);
-	device->group->container_users--;
-	fput(device->group->opened_file);
-	up_write(&device->group->group_rwsem);
-}
-
-static void vfio_container_register_device(struct vfio_device *device)
-{
-	struct vfio_iommu_driver *iommu_driver =
-		device->group->container->iommu_driver;
-
-	if (iommu_driver && iommu_driver->ops->register_device)
-		iommu_driver->ops->register_device(
-			device->group->container->iommu_data, device);
-}
-
-static void vfio_container_unregister_device(struct vfio_device *device)
-{
-	struct vfio_iommu_driver *iommu_driver =
-		device->group->container->iommu_driver;
-
-	if (iommu_driver && iommu_driver->ops->unregister_device)
-		iommu_driver->ops->unregister_device(
-			device->group->container->iommu_data, device);
-}
-
 static struct file *vfio_device_open(struct vfio_device *device)
 {
 	struct file *filep;
@@ -1951,114 +1409,6 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 }
 EXPORT_SYMBOL(vfio_set_irqs_validate_and_prepare);
 
-/*
- * Pin contiguous user pages and return their associated host pages for local
- * domain only.
- * @device [in]  : device
- * @iova [in]    : starting IOVA of user pages to be pinned.
- * @npage [in]   : count of pages to be pinned.  This count should not
- *		   be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- * @prot [in]    : protection flags
- * @pages[out]   : array of host pages
- * Return error or number of pages pinned.
- */
-int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova,
-		   int npage, int prot, struct page **pages)
-{
-	struct vfio_container *container;
-	struct vfio_group *group = device->group;
-	struct vfio_iommu_driver *driver;
-	int ret;
-
-	if (!pages || !npage || !vfio_assert_device_open(device))
-		return -EINVAL;
-
-	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
-		return -E2BIG;
-
-	if (group->dev_counter > 1)
-		return -EINVAL;
-
-	/* group->container cannot change while a vfio device is open */
-	container = group->container;
-	driver = container->iommu_driver;
-	if (likely(driver && driver->ops->pin_pages))
-		ret = driver->ops->pin_pages(container->iommu_data,
-					     group->iommu_group, iova,
-					     npage, prot, pages);
-	else
-		ret = -ENOTTY;
-
-	return ret;
-}
-EXPORT_SYMBOL(vfio_pin_pages);
-
-/*
- * Unpin contiguous host pages for local domain only.
- * @device [in]  : device
- * @iova [in]    : starting address of user pages to be unpinned.
- * @npage [in]   : count of pages to be unpinned.  This count should not
- *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
- */
-void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage)
-{
-	struct vfio_container *container;
-	struct vfio_iommu_driver *driver;
-
-	if (WARN_ON(npage <= 0 || npage > VFIO_PIN_PAGES_MAX_ENTRIES))
-		return;
-
-	if (WARN_ON(!vfio_assert_device_open(device)))
-		return;
-
-	/* group->container cannot change while a vfio device is open */
-	container = device->group->container;
-	driver = container->iommu_driver;
-
-	driver->ops->unpin_pages(container->iommu_data, iova, npage);
-}
-EXPORT_SYMBOL(vfio_unpin_pages);
-
-/*
- * This interface allows the CPUs to perform some sort of virtual DMA on
- * behalf of the device.
- *
- * CPUs read/write from/into a range of IOVAs pointing to user space memory
- * into/from a kernel buffer.
- *
- * As the read/write of user space memory is conducted via the CPUs and is
- * not a real device DMA, it is not necessary to pin the user space memory.
- *
- * @device [in]		: VFIO device
- * @iova [in]		: base IOVA of a user space buffer
- * @data [in]		: pointer to kernel buffer
- * @len [in]		: kernel buffer length
- * @write		: indicate read or write
- * Return error code on failure or 0 on success.
- */
-int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
-		size_t len, bool write)
-{
-	struct vfio_container *container;
-	struct vfio_iommu_driver *driver;
-	int ret = 0;
-
-	if (!data || len <= 0 || !vfio_assert_device_open(device))
-		return -EINVAL;
-
-	/* group->container cannot change while a vfio device is open */
-	container = device->group->container;
-	driver = container->iommu_driver;
-
-	if (likely(driver && driver->ops->dma_rw))
-		ret = driver->ops->dma_rw(container->iommu_data,
-					  iova, data, len, write);
-	else
-		ret = -ENOTTY;
-	return ret;
-}
-EXPORT_SYMBOL(vfio_dma_rw);
-
 /*
  * Module/class support
  */
@@ -2067,47 +1417,6 @@ static char *vfio_devnode(struct device *dev, umode_t *mode)
 	return kasprintf(GFP_KERNEL, "vfio/%s", dev_name(dev));
 }
 
-static struct miscdevice vfio_dev = {
-	.minor = VFIO_MINOR,
-	.name = "vfio",
-	.fops = &vfio_fops,
-	.nodename = "vfio/vfio",
-	.mode = S_IRUGO | S_IWUGO,
-};
-
-static int __init vfio_container_init(void)
-{
-	int ret;
-
-	mutex_init(&vfio.iommu_drivers_lock);
-	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
-
-	ret = misc_register(&vfio_dev);
-	if (ret) {
-		pr_err("vfio: misc device register failed\n");
-		return ret;
-	}
-
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU)) {
-		ret = vfio_register_iommu_driver(&vfio_noiommu_ops);
-		if (ret)
-			goto err_misc;
-	}
-	return 0;
-
-err_misc:
-	misc_deregister(&vfio_dev);
-	return ret;
-}
-
-static void vfio_container_cleanup(void)
-{
-	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU))
-		vfio_unregister_iommu_driver(&vfio_noiommu_ops);
-	misc_deregister(&vfio_dev);
-	mutex_destroy(&vfio.iommu_drivers_lock);
-}
-
 static int __init vfio_init(void)
 {
 	int ret;
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31  1:01 ` [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h Jason Gunthorpe
@ 2022-08-31  8:42   ` Tian, Kevin
  2022-08-31 15:36     ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> As is normal for headers.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> index 503bea6c843d56..093784f1dea7a9 100644
> --- a/drivers/vfio/vfio.h
> +++ b/drivers/vfio/vfio.h
> @@ -3,6 +3,14 @@
>   * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
>   *     Author: Alex Williamson <alex.williamson@redhat.com>
>   */
> +#ifndef __VFIO_VFIO_H__
> +#define __VFIO_VFIO_H__
> +
> +#include <linux/device.h>
> +#include <linux/cdev.h>
> +#include <linux/module.h>

Curious what is the criteria for which header inclusions should be
placed here. If it is for everything required by the definitions in
this file then the list is not complete, e.g. <linux/iommu.h> is
obviously missing.

btw while they are moved here the inclusions in vfio_main.c are
not removed in patch8.

> +
> +struct iommu_group;
> 
>  enum vfio_group_type {
>  	/*
> @@ -69,3 +77,5 @@ struct vfio_iommu_driver_ops {
> 
>  int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
>  void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops
> *ops);
> +
> +#endif
> --
> 2.37.2


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-08-31  1:01 ` [PATCH 2/8] vfio: Rename __vfio_group_unset_container() Jason Gunthorpe
@ 2022-08-31  8:46   ` Tian, Kevin
  2022-09-02  0:28     ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> To vfio_container_detatch_group(). This function is really a container
> function.
> 
> Fold the WARN_ON() into it as a precondition assertion.
> 
> A following patch will move the vfio_container functions to their own .c
> file.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio_main.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index bfa6119ba47337..e145c87f208f3a 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
>  /*
>   * VFIO Group fd, /dev/vfio/$GROUP
>   */
> -static void __vfio_group_unset_container(struct vfio_group *group)
> +static void vfio_container_detatch_group(struct vfio_group *group)

s/detatch/detach/

Given it's a vfio_container function is it better to have a container pointer
as the first parameter, i.e.:

static void vfio_container_detatch_group(struct vfio_container *container,
		struct vfio_group *group)

ditto for patch7:

+static void vfio_container_register_device(struct vfio_device *device)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group()
  2022-08-31  1:01 ` [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group() Jason Gunthorpe
@ 2022-08-31  8:47   ` Tian, Kevin
  2022-09-02  0:42     ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:47 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> +static int vfio_container_attach_group(struct vfio_group *group,
> +				       struct vfio_container *container)

exchange the order of parameters.

Otherwise,

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU
  2022-08-31  1:01 ` [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU Jason Gunthorpe
@ 2022-08-31  8:48   ` Tian, Kevin
  2022-08-31 15:28     ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:48 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
>  #ifdef CONFIG_VFIO_NOIOMMU
> -static bool noiommu __read_mostly;
> +static bool vfio_noiommu __read_mostly;
>  module_param_named(enable_unsafe_noiommu_mode,
> -		   noiommu, bool, S_IRUGO | S_IWUSR);
> +		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE,
> no-IOMMU mode.  This mode provides no device isolation, no DMA
> translation, no host kernel protection, cannot be used for device assignment
> to virtual machines, requires RAWIO permissions, and will taint the kernel.  If
> you do not know what this is for, step away. (default: false)");
> +#else
> +enum { vfio_noiommu = false };
>  #endif

what is the benefit of enum here?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 5/8] vfio: Split out container code from the init/cleanup functions
  2022-08-31  1:01 ` [PATCH 5/8] vfio: Split out container code from the init/cleanup functions Jason Gunthorpe
@ 2022-08-31  8:49   ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> This miscdev, noiommu driver and a coule of globals are all container
> items. Move this init into its own functions.
> 
> A following patch will move the vfio_container functions to their own .c
> file.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension()
  2022-08-31  1:02 ` [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension() Jason Gunthorpe
@ 2022-08-31  8:50   ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2022-08-31  8:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> To vfio_container_ioctl_check_extension().
> 
> A following patch will turn this into a non-static function, make it clear
> it is related to the container.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU
  2022-08-31  8:48   ` Tian, Kevin
@ 2022-08-31 15:28     ` Jason Gunthorpe
  2022-09-01  2:18       ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31 15:28 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, Cornelia Huck, kvm

On Wed, Aug 31, 2022 at 08:48:55AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, August 31, 2022 9:02 AM
> >  #ifdef CONFIG_VFIO_NOIOMMU
> > -static bool noiommu __read_mostly;
> > +static bool vfio_noiommu __read_mostly;
> >  module_param_named(enable_unsafe_noiommu_mode,
> > -		   noiommu, bool, S_IRUGO | S_IWUSR);
> > +		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
> >  MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE,
> > no-IOMMU mode.  This mode provides no device isolation, no DMA
> > translation, no host kernel protection, cannot be used for device assignment
> > to virtual machines, requires RAWIO permissions, and will taint the kernel.  If
> > you do not know what this is for, step away. (default: false)");
> > +#else
> > +enum { vfio_noiommu = false };
> >  #endif
> 
> what is the benefit of enum here?

It means we don't have to use #ifdef to protect references to
vfio_noiommu. Do mean enum vs #define? I prefer generally prefer enums
as they behave more like a variable.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31  8:42   ` Tian, Kevin
@ 2022-08-31 15:36     ` Jason Gunthorpe
  2022-08-31 18:02       ` Alex Williamson
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31 15:36 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, Cornelia Huck, kvm

On Wed, Aug 31, 2022 at 08:42:17AM +0000, Tian, Kevin wrote:

> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 503bea6c843d56..093784f1dea7a9 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -3,6 +3,14 @@
> >   * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> >   *     Author: Alex Williamson <alex.williamson@redhat.com>
> >   */
> > +#ifndef __VFIO_VFIO_H__
> > +#define __VFIO_VFIO_H__
> > +
> > +#include <linux/device.h>
> > +#include <linux/cdev.h>
> > +#include <linux/module.h>
> 
> Curious what is the criteria for which header inclusions should be
> placed here. If it is for everything required by the definitions in
> this file then the list is not complete, e.g. <linux/iommu.h> is
> obviously missing.

It isn't missing:

$ clang-14 -Wp,-MMD,drivers/vfio/.vfio_main.o.d -nostdinc -I../arch/x86/include -I./arch/x86/include/generated -I../include -I./include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/compiler-version.h -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=../= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 --target=x86_64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mstack-alignment=8 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wimplicit-fallthrough -Wno-gnu -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -Wno-unaligned-access -I ../drivers/vfio -I ./drivers/vfio  -DMODULE  -DKBUILD_BASENAME='"vfio_main"' -DKBUILD_MODNAME='"vfio"' -D__KBUILD_MODNAME=kmod_vfio -c -o /tmp/jnk.o ../drivers/vfio/vfio.h
[no error]

The criteria I like to use is if the header is able to compile
stand-alone.

> btw while they are moved here the inclusions in vfio_main.c are
> not removed in patch8.

?  I'm not sure I understand this

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31 15:36     ` Jason Gunthorpe
@ 2022-08-31 18:02       ` Alex Williamson
  2022-08-31 18:24         ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Williamson @ 2022-08-31 18:02 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Tian, Kevin, Cornelia Huck, kvm

On Wed, 31 Aug 2022 12:36:02 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Aug 31, 2022 at 08:42:17AM +0000, Tian, Kevin wrote:
> 
> > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > index 503bea6c843d56..093784f1dea7a9 100644
> > > --- a/drivers/vfio/vfio.h
> > > +++ b/drivers/vfio/vfio.h
> > > @@ -3,6 +3,14 @@
> > >   * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
> > >   *     Author: Alex Williamson <alex.williamson@redhat.com>
> > >   */
> > > +#ifndef __VFIO_VFIO_H__
> > > +#define __VFIO_VFIO_H__
> > > +
> > > +#include <linux/device.h>
> > > +#include <linux/cdev.h>
> > > +#include <linux/module.h>  
> > 
> > Curious what is the criteria for which header inclusions should be
> > placed here. If it is for everything required by the definitions in
> > this file then the list is not complete, e.g. <linux/iommu.h> is
> > obviously missing.  
> 
> It isn't missing:
> 
> $ clang-14 -Wp,-MMD,drivers/vfio/.vfio_main.o.d -nostdinc -I../arch/x86/include -I./arch/x86/include/generated -I../include -I./include -I../arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I../include/uapi -I./include/generated/uapi -include ../include/linux/compiler-version.h -include ../include/linux/kconfig.h -include ../include/linux/compiler_types.h -D__KERNEL__ -Qunused-arguments -fmacro-prefix-map=../= -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -std=gnu11 --target=x86_64-linux-gnu -fintegrated-as -Werror=unknown-warning-option -Werror=ignored-optimization-argument -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -fcf-protection=none -m64 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mstack-alignment=8 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -Wno-sign-compare -fno-asynchronous-u
 nwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-address-of-packed-member -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wimplicit-fallthrough -Wno-gnu -Wno-unused-but-set-variable -Wno-unused-const-variable -fomit-frame-pointer -fno-stack-clash-protection -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wcast-function-type -fno-strict-overflow -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -Wno-unaligned-access -I ../drivers/vfio -I ./drivers/vfio  -DMODULE  -DKBUILD_BASENAME='"vfio_main"' -DKBUILD_MODNAME='"vfio"' -D__KBUILD_MODNAME=kmod_vfio -c -o /tmp/jnk.o ../drivers/vfio/vfio.h
> [no error]
> 
> The criteria I like to use is if the header is able to compile
> stand-alone.

Is this stream of consciousness or is there some tooling for this? ;)

> > btw while they are moved here the inclusions in vfio_main.c are
> > not removed in patch8.  
> 
> ?  I'm not sure I understand this

I think Kevin is asking why these includes were not also removed from
vfio_main.c when adding them to vfio.h.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31 18:02       ` Alex Williamson
@ 2022-08-31 18:24         ` Jason Gunthorpe
  2022-09-01  2:17           ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-08-31 18:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, Cornelia Huck, kvm

On Wed, Aug 31, 2022 at 12:02:31PM -0600, Alex Williamson wrote:

> > The criteria I like to use is if the header is able to compile
> > stand-alone.
> 
> Is this stream of consciousness or is there some tooling for this? ;)

In my case I'm using clangd https://clangd.llvm.org/ in the editor,
which checks header files for self-consistency.

But there is also https://include-what-you-use.org/ (though I have
never tried it)

But the above wack of text is just the normal compiler invocation of
vfio_main.c with main.c replaced by the header.

> > > btw while they are moved here the inclusions in vfio_main.c are
> > > not removed in patch8.  
> > 
> > ?  I'm not sure I understand this
> 
> I think Kevin is asking why these includes were not also removed from
> vfio_main.c when adding them to vfio.h.  Thanks,

Oh, I am actually unclear what is policy/preference/consensus in that
area.

I know a strong camp is to avoid implicit includes, so the duplicated
includes are welcomed. include-what-you-use for instance is that
philosophy.

Do you have a preference?

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h
  2022-08-31 18:24         ` Jason Gunthorpe
@ 2022-09-01  2:17           ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2022-09-01  2:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson; +Cc: Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, September 1, 2022 2:25 AM
> 
> On Wed, Aug 31, 2022 at 12:02:31PM -0600, Alex Williamson wrote:
> 
> > > The criteria I like to use is if the header is able to compile
> > > stand-alone.
> >
> > Is this stream of consciousness or is there some tooling for this? ;)
> 
> In my case I'm using clangd https://clangd.llvm.org/ in the editor,
> which checks header files for self-consistency.
> 
> But there is also https://include-what-you-use.org/ (though I have
> never tried it)
> 
> But the above wack of text is just the normal compiler invocation of
> vfio_main.c with main.c replaced by the header.

Thanks. This is a good learning.

> 
> > > > btw while they are moved here the inclusions in vfio_main.c are
> > > > not removed in patch8.
> > >
> > > ?  I'm not sure I understand this
> >
> > I think Kevin is asking why these includes were not also removed from
> > vfio_main.c when adding them to vfio.h.  Thanks,

yes

> 
> Oh, I am actually unclear what is policy/preference/consensus in that
> area.

me either. That is why I raised this open in case there is already
a consensus in place. 😊

> 
> I know a strong camp is to avoid implicit includes, so the duplicated
> includes are welcomed. include-what-you-use for instance is that
> philosophy.
> 
> Do you have a preference?
> 
> Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU
  2022-08-31 15:28     ` Jason Gunthorpe
@ 2022-09-01  2:18       ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2022-09-01  2:18 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 11:29 PM
> 
> On Wed, Aug 31, 2022 at 08:48:55AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, August 31, 2022 9:02 AM
> > >  #ifdef CONFIG_VFIO_NOIOMMU
> > > -static bool noiommu __read_mostly;
> > > +static bool vfio_noiommu __read_mostly;
> > >  module_param_named(enable_unsafe_noiommu_mode,
> > > -		   noiommu, bool, S_IRUGO | S_IWUSR);
> > > +		   vfio_noiommu, bool, S_IRUGO | S_IWUSR);
> > >  MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable
> UNSAFE,
> > > no-IOMMU mode.  This mode provides no device isolation, no DMA
> > > translation, no host kernel protection, cannot be used for device
> assignment
> > > to virtual machines, requires RAWIO permissions, and will taint the kernel.
> If
> > > you do not know what this is for, step away. (default: false)");
> > > +#else
> > > +enum { vfio_noiommu = false };
> > >  #endif
> >
> > what is the benefit of enum here?
> 
> It means we don't have to use #ifdef to protect references to
> vfio_noiommu. Do mean enum vs #define? I prefer generally prefer enums
> as they behave more like a variable.
> 

Okay.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-08-31  8:46   ` Tian, Kevin
@ 2022-09-02  0:28     ` Jason Gunthorpe
  2022-09-02  3:51       ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:28 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, Cornelia Huck, kvm

On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, August 31, 2022 9:02 AM
> > 
> > To vfio_container_detatch_group(). This function is really a container
> > function.
> > 
> > Fold the WARN_ON() into it as a precondition assertion.
> > 
> > A following patch will move the vfio_container functions to their own .c
> > file.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >  drivers/vfio/vfio_main.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index bfa6119ba47337..e145c87f208f3a 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
> >  /*
> >   * VFIO Group fd, /dev/vfio/$GROUP
> >   */
> > -static void __vfio_group_unset_container(struct vfio_group *group)
> > +static void vfio_container_detatch_group(struct vfio_group *group)
> 
> s/detatch/detach/

Oops

> Given it's a vfio_container function is it better to have a container pointer
> as the first parameter, i.e.:
> 
> static void vfio_container_detatch_group(struct vfio_container *container,
> 		struct vfio_group *group)

Ah, I'm not so sure, it seems weird to make the caller do
group->container then pass the group in as well.

This call assumes the container is the container of the group, it
doesn't work in situations where that isn't true.

It is kind of weird layering in a way, arguably we should have the
current group stored in the container if we want things to work that
way, but that is a big change and not that wortwhile I think.

Patch 7 is pretty much the same, it doesn't work right unless the
container and device are already matched

Thanks,
Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group()
  2022-08-31  8:47   ` Tian, Kevin
@ 2022-09-02  0:42     ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2022-09-02  0:42 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Alex Williamson, Cornelia Huck, kvm

On Wed, Aug 31, 2022 at 08:47:46AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, August 31, 2022 9:02 AM
> > 
> > +static int vfio_container_attach_group(struct vfio_group *group,
> > +				       struct vfio_container *container)
> 
> exchange the order of parameters.
> 
> Otherwise,
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>

Done, thanks

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-02  0:28     ` Jason Gunthorpe
@ 2022-09-02  3:51       ` Tian, Kevin
  2022-09-02 14:39         ` Alex Williamson
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-09-02  3:51 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, September 2, 2022 8:29 AM
> 
> On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, August 31, 2022 9:02 AM
> > >
> > > To vfio_container_detatch_group(). This function is really a container
> > > function.
> > >
> > > Fold the WARN_ON() into it as a precondition assertion.
> > >
> > > A following patch will move the vfio_container functions to their own .c
> > > file.
> > >
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > >  drivers/vfio/vfio_main.c | 11 +++++------
> > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > index bfa6119ba47337..e145c87f208f3a 100644
> > > --- a/drivers/vfio/vfio_main.c
> > > +++ b/drivers/vfio/vfio_main.c
> > > @@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
> > >  /*
> > >   * VFIO Group fd, /dev/vfio/$GROUP
> > >   */
> > > -static void __vfio_group_unset_container(struct vfio_group *group)
> > > +static void vfio_container_detatch_group(struct vfio_group *group)
> >
> > s/detatch/detach/
> 
> Oops
> 
> > Given it's a vfio_container function is it better to have a container pointer
> > as the first parameter, i.e.:
> >
> > static void vfio_container_detatch_group(struct vfio_container *container,
> > 		struct vfio_group *group)
> 
> Ah, I'm not so sure, it seems weird to make the caller do
> group->container then pass the group in as well.
> 
> This call assumes the container is the container of the group, it
> doesn't work in situations where that isn't true.

Yes, this is a valid interpretation on doing this way.

Other places e.g. iommu_detach_group(domain, group) go the other way
even if internally domain is not used at all. I kind of leave that pattern
in mind thus raised this comment. But not a strong opinion.

> 
> It is kind of weird layering in a way, arguably we should have the
> current group stored in the container if we want things to work that
> way, but that is a big change and not that wortwhile I think.
> 
> Patch 7 is pretty much the same, it doesn't work right unless the
> container and device are already matched
> 

If Alex won't have a different preference and with the typo fixed,

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 7/8] vfio: Split the register_device ops call into functions
  2022-08-31  1:02 ` [PATCH 7/8] vfio: Split the register_device ops call into functions Jason Gunthorpe
@ 2022-09-02  3:52   ` Tian, Kevin
  0 siblings, 0 replies; 30+ messages in thread
From: Tian, Kevin @ 2022-09-02  3:52 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, August 31, 2022 9:02 AM
> 
> This is a container item.
> 
> A following patch will move the vfio_container functions to their own .c
> file.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-02  3:51       ` Tian, Kevin
@ 2022-09-02 14:39         ` Alex Williamson
  2022-09-02 16:24           ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Williamson @ 2022-09-02 14:39 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Jason Gunthorpe, Cornelia Huck, kvm

On Fri, 2 Sep 2022 03:51:01 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Friday, September 2, 2022 8:29 AM
> > 
> > On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:  
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, August 31, 2022 9:02 AM
> > > >
> > > > To vfio_container_detatch_group(). This function is really a container
> > > > function.
> > > >
> > > > Fold the WARN_ON() into it as a precondition assertion.
> > > >
> > > > A following patch will move the vfio_container functions to their own .c
> > > > file.
> > > >
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > ---
> > > >  drivers/vfio/vfio_main.c | 11 +++++------
> > > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > index bfa6119ba47337..e145c87f208f3a 100644
> > > > --- a/drivers/vfio/vfio_main.c
> > > > +++ b/drivers/vfio/vfio_main.c
> > > > @@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
> > > >  /*
> > > >   * VFIO Group fd, /dev/vfio/$GROUP
> > > >   */
> > > > -static void __vfio_group_unset_container(struct vfio_group *group)
> > > > +static void vfio_container_detatch_group(struct vfio_group *group)  
> > >
> > > s/detatch/detach/  
> > 
> > Oops
> >   
> > > Given it's a vfio_container function is it better to have a container pointer
> > > as the first parameter, i.e.:
> > >
> > > static void vfio_container_detatch_group(struct vfio_container *container,
> > > 		struct vfio_group *group)  
> > 
> > Ah, I'm not so sure, it seems weird to make the caller do
> > group->container then pass the group in as well.
> > 
> > This call assumes the container is the container of the group, it
> > doesn't work in situations where that isn't true.  
> 
> Yes, this is a valid interpretation on doing this way.
> 
> Other places e.g. iommu_detach_group(domain, group) go the other way
> even if internally domain is not used at all. I kind of leave that pattern
> in mind thus raised this comment. But not a strong opinion.
> 
> > 
> > It is kind of weird layering in a way, arguably we should have the
> > current group stored in the container if we want things to work that
> > way, but that is a big change and not that wortwhile I think.
> > 
> > Patch 7 is pretty much the same, it doesn't work right unless the
> > container and device are already matched
> >   
> 
> If Alex won't have a different preference and with the typo fixed,

I don't get it, if a group is detaching itself from a container, why
isn't it vfio_group_detach_container().  Given our naming scheme,
vfio_container_detach_group() absolutely implies the args Kevin
suggests, even if they're redundant.  vfio_foo_* functions should
operate on a a vfio_foo object.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-02 14:39         ` Alex Williamson
@ 2022-09-02 16:24           ` Jason Gunthorpe
  2022-09-06  3:35             ` Tian, Kevin
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Gunthorpe @ 2022-09-02 16:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, Cornelia Huck, kvm

On Fri, Sep 02, 2022 at 08:39:34AM -0600, Alex Williamson wrote:
> On Fri, 2 Sep 2022 03:51:01 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Friday, September 2, 2022 8:29 AM
> > > 
> > > On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:  
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Wednesday, August 31, 2022 9:02 AM
> > > > >
> > > > > To vfio_container_detatch_group(). This function is really a container
> > > > > function.
> > > > >
> > > > > Fold the WARN_ON() into it as a precondition assertion.
> > > > >
> > > > > A following patch will move the vfio_container functions to their own .c
> > > > > file.
> > > > >
> > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > ---
> > > > >  drivers/vfio/vfio_main.c | 11 +++++------
> > > > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > > index bfa6119ba47337..e145c87f208f3a 100644
> > > > > --- a/drivers/vfio/vfio_main.c
> > > > > +++ b/drivers/vfio/vfio_main.c
> > > > > @@ -928,12 +928,13 @@ static const struct file_operations vfio_fops = {
> > > > >  /*
> > > > >   * VFIO Group fd, /dev/vfio/$GROUP
> > > > >   */
> > > > > -static void __vfio_group_unset_container(struct vfio_group *group)
> > > > > +static void vfio_container_detatch_group(struct vfio_group *group)  
> > > >
> > > > s/detatch/detach/  
> > > 
> > > Oops
> > >   
> > > > Given it's a vfio_container function is it better to have a container pointer
> > > > as the first parameter, i.e.:
> > > >
> > > > static void vfio_container_detatch_group(struct vfio_container *container,
> > > > 		struct vfio_group *group)  
> > > 
> > > Ah, I'm not so sure, it seems weird to make the caller do
> > > group->container then pass the group in as well.
> > > 
> > > This call assumes the container is the container of the group, it
> > > doesn't work in situations where that isn't true.  
> > 
> > Yes, this is a valid interpretation on doing this way.
> > 
> > Other places e.g. iommu_detach_group(domain, group) go the other way
> > even if internally domain is not used at all. I kind of leave that pattern
> > in mind thus raised this comment. But not a strong opinion.
> > 
> > > 
> > > It is kind of weird layering in a way, arguably we should have the
> > > current group stored in the container if we want things to work that
> > > way, but that is a big change and not that wortwhile I think.
> > > 
> > > Patch 7 is pretty much the same, it doesn't work right unless the
> > > container and device are already matched
> > >   
> > 
> > If Alex won't have a different preference and with the typo fixed,
> 
> I don't get it, if a group is detaching itself from a container, why
> isn't it vfio_group_detach_container().  Given our naming scheme,
> vfio_container_detach_group() absolutely implies the args Kevin
> suggests, even if they're redundant.  vfio_foo_* functions should
> operate on a a vfio_foo object.

Look at the overall picture. This series moves struct vfio_container
into a .c file and then pulls all of the code that relies on it into
the c file too.

With our current function layout that results in these cut points:

struct vfio_container *vfio_container_from_file(struct file *filep);
int vfio_container_use(struct vfio_group *group);
void vfio_container_unuse(struct vfio_group *group);
int vfio_container_attach_group(struct vfio_container *container,
				struct vfio_group *group);
void vfio_container_detach_group(struct vfio_group *group);
void vfio_container_register_device(struct vfio_device *device);
void vfio_container_unregister_device(struct vfio_device *device);
long vfio_container_ioctl_check_extension(struct vfio_container *container,
					  unsigned long arg);
int vfio_container_pin_pages(struct vfio_device *device, dma_addr_t iova,
			     int npage, int prot, struct page **pages);
void vfio_container_unpin_pages(struct vfio_device *device, dma_addr_t iova,
				int npage);
int vfio_container_dma_rw(struct vfio_device *device, dma_addr_t iova,
			  void *data, size_t len, bool write);

Notice almost none of them use container as a function argument. The
container is always implied by another object that is linked to the
container.

Yet these are undeniably all container functions because they are only
necessary when the container code is actually being used. Every one of
this functions is bypassed if iommmufd is used. I even have a patch
that just replaces them all with nops if container and type 1 is
compiled out.

So, this presents two possiblities for naming. Either we call
everything in container.c vfio_container because it is primarily
related to the container.c *system*, or we label each function
according to OOP via the first argument type (vfio_XX_YY_container
perhaps) and still place them in container.c.

Keep in mind I have plans to see a group.c and rename vfio_main.c to
device.c, so having a vfio_group or vfio_device function in
container.c seems to get confusing.

In other words, I prefer to think of the group of functions above as
the exported API of the vfio container system (ie container.c) - not in
terms of an OOP modeling of a vfio_container object.

Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-02 16:24           ` Jason Gunthorpe
@ 2022-09-06  3:35             ` Tian, Kevin
  2022-09-06 19:38               ` Alex Williamson
  0 siblings, 1 reply; 30+ messages in thread
From: Tian, Kevin @ 2022-09-06  3:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson; +Cc: Cornelia Huck, kvm

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, September 3, 2022 12:25 AM
> 
> On Fri, Sep 02, 2022 at 08:39:34AM -0600, Alex Williamson wrote:
> > On Fri, 2 Sep 2022 03:51:01 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Friday, September 2, 2022 8:29 AM
> > > >
> > > > On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > Sent: Wednesday, August 31, 2022 9:02 AM
> > > > > >
> > > > > > To vfio_container_detatch_group(). This function is really a
> container
> > > > > > function.
> > > > > >
> > > > > > Fold the WARN_ON() into it as a precondition assertion.
> > > > > >
> > > > > > A following patch will move the vfio_container functions to their
> own .c
> > > > > > file.
> > > > > >
> > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > ---
> > > > > >  drivers/vfio/vfio_main.c | 11 +++++------
> > > > > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > > > index bfa6119ba47337..e145c87f208f3a 100644
> > > > > > --- a/drivers/vfio/vfio_main.c
> > > > > > +++ b/drivers/vfio/vfio_main.c
> > > > > > @@ -928,12 +928,13 @@ static const struct file_operations
> vfio_fops = {
> > > > > >  /*
> > > > > >   * VFIO Group fd, /dev/vfio/$GROUP
> > > > > >   */
> > > > > > -static void __vfio_group_unset_container(struct vfio_group *group)
> > > > > > +static void vfio_container_detatch_group(struct vfio_group *group)
> > > > >
> > > > > s/detatch/detach/
> > > >
> > > > Oops
> > > >
> > > > > Given it's a vfio_container function is it better to have a container
> pointer
> > > > > as the first parameter, i.e.:
> > > > >
> > > > > static void vfio_container_detatch_group(struct vfio_container
> *container,
> > > > > 		struct vfio_group *group)
> > > >
> > > > Ah, I'm not so sure, it seems weird to make the caller do
> > > > group->container then pass the group in as well.
> > > >
> > > > This call assumes the container is the container of the group, it
> > > > doesn't work in situations where that isn't true.
> > >
> > > Yes, this is a valid interpretation on doing this way.
> > >
> > > Other places e.g. iommu_detach_group(domain, group) go the other way
> > > even if internally domain is not used at all. I kind of leave that pattern
> > > in mind thus raised this comment. But not a strong opinion.
> > >
> > > >
> > > > It is kind of weird layering in a way, arguably we should have the
> > > > current group stored in the container if we want things to work that
> > > > way, but that is a big change and not that wortwhile I think.
> > > >
> > > > Patch 7 is pretty much the same, it doesn't work right unless the
> > > > container and device are already matched
> > > >
> > >
> > > If Alex won't have a different preference and with the typo fixed,
> >
> > I don't get it, if a group is detaching itself from a container, why
> > isn't it vfio_group_detach_container().  Given our naming scheme,
> > vfio_container_detach_group() absolutely implies the args Kevin
> > suggests, even if they're redundant.  vfio_foo_* functions should
> > operate on a a vfio_foo object.
> 
> Look at the overall picture. This series moves struct vfio_container
> into a .c file and then pulls all of the code that relies on it into
> the c file too.
> 
> With our current function layout that results in these cut points:
> 
> struct vfio_container *vfio_container_from_file(struct file *filep);
> int vfio_container_use(struct vfio_group *group);
> void vfio_container_unuse(struct vfio_group *group);
> int vfio_container_attach_group(struct vfio_container *container,
> 				struct vfio_group *group);
> void vfio_container_detach_group(struct vfio_group *group);
> void vfio_container_register_device(struct vfio_device *device);
> void vfio_container_unregister_device(struct vfio_device *device);
> long vfio_container_ioctl_check_extension(struct vfio_container *container,
> 					  unsigned long arg);
> int vfio_container_pin_pages(struct vfio_device *device, dma_addr_t iova,
> 			     int npage, int prot, struct page **pages);
> void vfio_container_unpin_pages(struct vfio_device *device, dma_addr_t
> iova,
> 				int npage);
> int vfio_container_dma_rw(struct vfio_device *device, dma_addr_t iova,
> 			  void *data, size_t len, bool write);
> 
> Notice almost none of them use container as a function argument. The
> container is always implied by another object that is linked to the
> container.
> 
> Yet these are undeniably all container functions because they are only
> necessary when the container code is actually being used. Every one of
> this functions is bypassed if iommmufd is used. I even have a patch
> that just replaces them all with nops if container and type 1 is
> compiled out.
> 
> So, this presents two possiblities for naming. Either we call
> everything in container.c vfio_container because it is primarily
> related to the container.c *system*, or we label each function
> according to OOP via the first argument type (vfio_XX_YY_container
> perhaps) and still place them in container.c.
> 
> Keep in mind I have plans to see a group.c and rename vfio_main.c to
> device.c, so having a vfio_group or vfio_device function in
> container.c seems to get confusing.

IMHO I don't see a big difference between two naming options if the
first parameter is kept as group or device object.

> 
> In other words, I prefer to think of the group of functions above as
> the exported API of the vfio container system (ie container.c) - not in
> terms of an OOP modeling of a vfio_container object.
> 

After reading above IMHO the OOP modeling wins a bit as far as
readability is concerned. Probably just my personal preference
but having most vfio_maintaier_xxx() helpers w/o explicitly providing
a vfio_maintainer object does affect my reading of related code.

At least vfio_XX_YY_container makes me feel better if we want to
avoid duplicating a vfio_maintainer object, e.g.:

struct vfio_container *vfio_container_from_file(struct file *filep);
int vfio_group_use_container(struct vfio_group *group);
void vfio_group_unuse_container(struct vfio_group *group);
int vfio_group_attach_container(struct vfio_group *group,
				vfio_container *container);
void vfio_group_detach_container(struct vfio_group *group);
void vfio_device_register_container(struct vfio_device *device);
void vfio_device_unregister_container(struct vfio_device *device);
long vfio_container_ioctl_check_extension(struct vfio_container *container,
					  unsigned long arg);
int vfio_pin_pages_container(struct vfio_device *device, dma_addr_t iova,
			     int npage, int prot, struct page **pages);
void vfio_unpin_pages_container(struct vfio_device *device, dma_addr_t
iova,
				int npage);
int vfio_dma_rw_container(struct vfio_device *device, dma_addr_t iova,
			  void *data, size_t len, bool write);

They are all container related. So although the first parameter is not
container we put them in container.c as the last word in the function
name is container.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-06  3:35             ` Tian, Kevin
@ 2022-09-06 19:38               ` Alex Williamson
  2022-09-06 23:06                 ` Jason Gunthorpe
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Williamson @ 2022-09-06 19:38 UTC (permalink / raw)
  To: Tian, Kevin; +Cc: Jason Gunthorpe, Cornelia Huck, kvm

On Tue, 6 Sep 2022 03:35:48 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, September 3, 2022 12:25 AM
> > 
> > On Fri, Sep 02, 2022 at 08:39:34AM -0600, Alex Williamson wrote:  
> > > On Fri, 2 Sep 2022 03:51:01 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >  
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Sent: Friday, September 2, 2022 8:29 AM
> > > > >
> > > > > On Wed, Aug 31, 2022 at 08:46:30AM +0000, Tian, Kevin wrote:  
> > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > Sent: Wednesday, August 31, 2022 9:02 AM
> > > > > > >
> > > > > > > To vfio_container_detatch_group(). This function is really a  
> > container  
> > > > > > > function.
> > > > > > >
> > > > > > > Fold the WARN_ON() into it as a precondition assertion.
> > > > > > >
> > > > > > > A following patch will move the vfio_container functions to their  
> > own .c  
> > > > > > > file.
> > > > > > >
> > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > ---
> > > > > > >  drivers/vfio/vfio_main.c | 11 +++++------
> > > > > > >  1 file changed, 5 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > > > > index bfa6119ba47337..e145c87f208f3a 100644
> > > > > > > --- a/drivers/vfio/vfio_main.c
> > > > > > > +++ b/drivers/vfio/vfio_main.c
> > > > > > > @@ -928,12 +928,13 @@ static const struct file_operations  
> > vfio_fops = {  
> > > > > > >  /*
> > > > > > >   * VFIO Group fd, /dev/vfio/$GROUP
> > > > > > >   */
> > > > > > > -static void __vfio_group_unset_container(struct vfio_group *group)
> > > > > > > +static void vfio_container_detatch_group(struct vfio_group *group)  
> > > > > >
> > > > > > s/detatch/detach/  
> > > > >
> > > > > Oops
> > > > >  
> > > > > > Given it's a vfio_container function is it better to have a container  
> > pointer  
> > > > > > as the first parameter, i.e.:
> > > > > >
> > > > > > static void vfio_container_detatch_group(struct vfio_container  
> > *container,  
> > > > > > 		struct vfio_group *group)  
> > > > >
> > > > > Ah, I'm not so sure, it seems weird to make the caller do
> > > > > group->container then pass the group in as well.
> > > > >
> > > > > This call assumes the container is the container of the group, it
> > > > > doesn't work in situations where that isn't true.  
> > > >
> > > > Yes, this is a valid interpretation on doing this way.
> > > >
> > > > Other places e.g. iommu_detach_group(domain, group) go the other way
> > > > even if internally domain is not used at all. I kind of leave that pattern
> > > > in mind thus raised this comment. But not a strong opinion.
> > > >  
> > > > >
> > > > > It is kind of weird layering in a way, arguably we should have the
> > > > > current group stored in the container if we want things to work that
> > > > > way, but that is a big change and not that wortwhile I think.
> > > > >
> > > > > Patch 7 is pretty much the same, it doesn't work right unless the
> > > > > container and device are already matched
> > > > >  
> > > >
> > > > If Alex won't have a different preference and with the typo fixed,  
> > >
> > > I don't get it, if a group is detaching itself from a container, why
> > > isn't it vfio_group_detach_container().  Given our naming scheme,
> > > vfio_container_detach_group() absolutely implies the args Kevin
> > > suggests, even if they're redundant.  vfio_foo_* functions should
> > > operate on a a vfio_foo object.  
> > 
> > Look at the overall picture. This series moves struct vfio_container
> > into a .c file and then pulls all of the code that relies on it into
> > the c file too.
> > 
> > With our current function layout that results in these cut points:
> > 
> > struct vfio_container *vfio_container_from_file(struct file *filep);
> > int vfio_container_use(struct vfio_group *group);
> > void vfio_container_unuse(struct vfio_group *group);
> > int vfio_container_attach_group(struct vfio_container *container,
> > 				struct vfio_group *group);
> > void vfio_container_detach_group(struct vfio_group *group);
> > void vfio_container_register_device(struct vfio_device *device);
> > void vfio_container_unregister_device(struct vfio_device *device);
> > long vfio_container_ioctl_check_extension(struct vfio_container *container,
> > 					  unsigned long arg);
> > int vfio_container_pin_pages(struct vfio_device *device, dma_addr_t iova,
> > 			     int npage, int prot, struct page **pages);
> > void vfio_container_unpin_pages(struct vfio_device *device, dma_addr_t
> > iova,
> > 				int npage);
> > int vfio_container_dma_rw(struct vfio_device *device, dma_addr_t iova,
> > 			  void *data, size_t len, bool write);
> > 
> > Notice almost none of them use container as a function argument. The
> > container is always implied by another object that is linked to the
> > container.
> > 
> > Yet these are undeniably all container functions because they are only
> > necessary when the container code is actually being used. Every one of
> > this functions is bypassed if iommmufd is used. I even have a patch
> > that just replaces them all with nops if container and type 1 is
> > compiled out.
> > 
> > So, this presents two possiblities for naming. Either we call
> > everything in container.c vfio_container because it is primarily
> > related to the container.c *system*, or we label each function
> > according to OOP via the first argument type (vfio_XX_YY_container
> > perhaps) and still place them in container.c.
> > 
> > Keep in mind I have plans to see a group.c and rename vfio_main.c to
> > device.c, so having a vfio_group or vfio_device function in
> > container.c seems to get confusing.  
> 
> IMHO I don't see a big difference between two naming options if the
> first parameter is kept as group or device object.
> 
> > 
> > In other words, I prefer to think of the group of functions above as
> > the exported API of the vfio container system (ie container.c) - not in
> > terms of an OOP modeling of a vfio_container object.
> >   
> 
> After reading above IMHO the OOP modeling wins a bit as far as
> readability is concerned. Probably just my personal preference
> but having most vfio_maintaier_xxx() helpers w/o explicitly providing
> a vfio_maintainer object does affect my reading of related code.
> 
> At least vfio_XX_YY_container makes me feel better if we want to
> avoid duplicating a vfio_maintainer object, e.g.:
> 
> struct vfio_container *vfio_container_from_file(struct file *filep);
> int vfio_group_use_container(struct vfio_group *group);
> void vfio_group_unuse_container(struct vfio_group *group);
> int vfio_group_attach_container(struct vfio_group *group,
> 				vfio_container *container);
> void vfio_group_detach_container(struct vfio_group *group);
> void vfio_device_register_container(struct vfio_device *device);
> void vfio_device_unregister_container(struct vfio_device *device);
> long vfio_container_ioctl_check_extension(struct vfio_container *container,
> 					  unsigned long arg);
> int vfio_pin_pages_container(struct vfio_device *device, dma_addr_t iova,
> 			     int npage, int prot, struct page **pages);
> void vfio_unpin_pages_container(struct vfio_device *device, dma_addr_t
> iova,
> 				int npage);
> int vfio_dma_rw_container(struct vfio_device *device, dma_addr_t iova,
> 			  void *data, size_t len, bool write);
> 
> They are all container related. So although the first parameter is not
> container we put them in container.c as the last word in the function
> name is container.

I might refine these to:

struct vfio_container *vfio_container_from_file(struct file *filep);

int vfio_group_use_container(struct vfio_group *group);
void vfio_group_unuse_container(struct vfio_group *group);

int vfio_container_attach_group(struct vfio_container *container,
				struct vfio_group *group);
void vfio_group_detach_container(struct vfio_group *group);

void vfio_device_container_register(struct vfio_device *device);
void vfio_device_container_unregister(struct vfio_device *device);

long vfio_container_ioctl_check_extension(struct vfio_container *container,
 					  unsigned long arg);

int vfio_device_pin_container_pages(struct vfio_device *device, dma_addr_t iova,
				    int npage, int prot, struct page **pages);
void vfio_device_unpin_container_pages(struct vfio_device *device, dma_addr_t
				       iova, int npage);

int vfio_device_dma_rw_container(struct vfio_device *device, dma_addr_t iova,
				 void *data, size_t len, bool write);

Overall, I'm still not really on board with sacrificing a "the name
tells you how to use it" API in order to break down devices, groups,
and containers into their own subsystems.  We should not only consider
the logical location of the function, but the usability and
intuitiveness of the API.

Are we necessarily finding the right splits here?  The {un}use,
{un}pin, and dma_rw in particular seem like they could be decomposed
further.  For example a vfio_group intermediary that calls
vfio_container_{un}use() with a vfio_container object.  Or a
vfio_device intermediary that can call vfio_container_ functions for
{un}pin/rw, also with a container object.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/8] vfio: Rename __vfio_group_unset_container()
  2022-09-06 19:38               ` Alex Williamson
@ 2022-09-06 23:06                 ` Jason Gunthorpe
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Gunthorpe @ 2022-09-06 23:06 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, Cornelia Huck, kvm

On Tue, Sep 06, 2022 at 01:38:11PM -0600, Alex Williamson wrote:

> I might refine these to:
> 
> struct vfio_container *vfio_container_from_file(struct file *filep);
> 
> int vfio_group_use_container(struct vfio_group *group);
> void vfio_group_unuse_container(struct vfio_group *group);
> 
> int vfio_container_attach_group(struct vfio_container *container,
> 				struct vfio_group *group);
> void vfio_group_detach_container(struct vfio_group *group);

IMHO it is weird to sacrifice the clear pair'd naming, keep the group
first?

> void vfio_device_container_register(struct vfio_device *device);
> void vfio_device_container_unregister(struct vfio_device *device);
> 
> long vfio_container_ioctl_check_extension(struct vfio_container *container,
>  					  unsigned long arg);

I'm fine with all of these, if you like it I'll change it - it is a
lot of work to change the names and rebase everything so please lets
all agree first.

> int vfio_device_pin_container_pages(struct vfio_device *device, dma_addr_t iova,
> 				    int npage, int prot, struct page **pages);
> void vfio_device_unpin_container_pages(struct vfio_device *device, dma_addr_t
> 				       iova, int npage);
> 
> int vfio_device_dma_rw_container(struct vfio_device *device, dma_addr_t iova,
> 				 void *data, size_t len, bool write);

These are exported symbols and I don't want to mess with them. The
fact they are in container.c is temporary artifact, as they do touch
the struct vfio_container. The iommufd series puts a wrapper in main.c
and the above can have signatures that only take in a container * -
which is what you suggest below. So lets leave them alone here.

> Overall, I'm still not really on board with sacrificing a "the name
> tells you how to use it" API in order to break down devices, groups,
> and containers into their own subsystems.  We should not only consider
> the logical location of the function, but the usability and
> intuitiveness of the API.

Sure, I'm not set on any particular naming scheme.

> Are we necessarily finding the right splits here?  The {un}use,
> {un}pin, and dma_rw in particular seem like they could be decomposed
> further.  

I can't think how to do anything better for {un}use. They are in
container.c because the entire container_users mechanism is gone when
the container code is gone, even though the mechanism is part of the
struct vfio_group. unuse doesn't even touch the vfio_container..

Lets just leave them with their group names and they slightly oddly
stay in container.c

> vfio_container_{un}use() with a vfio_container object.  Or a
> vfio_device intermediary that can call vfio_container_ functions for
> {un}pin/rw, also with a container object.  Thanks,

Some of the split is an artifact of how the code is right now. I don't
think you'd end up with exactly this interface if you designed it from
scratch, but we have what we have, and I don't have a lot of
enthusiasm to restructure significantly for the sake of naming..

Thanks,
Jason

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-09-06 23:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-31  1:01 [PATCH 0/8] vfio: Split the container code into a clean layer and dedicated file Jason Gunthorpe
2022-08-31  1:01 ` [PATCH 1/8] vfio: Add header guards and includes to drivers/vfio/vfio.h Jason Gunthorpe
2022-08-31  8:42   ` Tian, Kevin
2022-08-31 15:36     ` Jason Gunthorpe
2022-08-31 18:02       ` Alex Williamson
2022-08-31 18:24         ` Jason Gunthorpe
2022-09-01  2:17           ` Tian, Kevin
2022-08-31  1:01 ` [PATCH 2/8] vfio: Rename __vfio_group_unset_container() Jason Gunthorpe
2022-08-31  8:46   ` Tian, Kevin
2022-09-02  0:28     ` Jason Gunthorpe
2022-09-02  3:51       ` Tian, Kevin
2022-09-02 14:39         ` Alex Williamson
2022-09-02 16:24           ` Jason Gunthorpe
2022-09-06  3:35             ` Tian, Kevin
2022-09-06 19:38               ` Alex Williamson
2022-09-06 23:06                 ` Jason Gunthorpe
2022-08-31  1:01 ` [PATCH 3/8] vfio: Split the container logic into vfio_container_attach_group() Jason Gunthorpe
2022-08-31  8:47   ` Tian, Kevin
2022-09-02  0:42     ` Jason Gunthorpe
2022-08-31  1:01 ` [PATCH 4/8] vfio: Remove #ifdefs around CONFIG_VFIO_NOIOMMU Jason Gunthorpe
2022-08-31  8:48   ` Tian, Kevin
2022-08-31 15:28     ` Jason Gunthorpe
2022-09-01  2:18       ` Tian, Kevin
2022-08-31  1:01 ` [PATCH 5/8] vfio: Split out container code from the init/cleanup functions Jason Gunthorpe
2022-08-31  8:49   ` Tian, Kevin
2022-08-31  1:02 ` [PATCH 6/8] vfio: Rename vfio_ioctl_check_extension() Jason Gunthorpe
2022-08-31  8:50   ` Tian, Kevin
2022-08-31  1:02 ` [PATCH 7/8] vfio: Split the register_device ops call into functions Jason Gunthorpe
2022-09-02  3:52   ` Tian, Kevin
2022-08-31  1:02 ` [PATCH 8/8] vfio: Move container code into drivers/vfio/container.c Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.