* [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup
@ 2021-08-05 17:06 Alex Williamson
  2021-08-05 17:07 ` [PATCH 1/7] vfio: Create vfio_fs_type with inode per device Alex Williamson
                   ` (6 more replies)
  0 siblings, 7 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:06 UTC (permalink / raw)
  To: alex.williamson
  Cc: Andrew Morton, Jason Gunthorpe, linux-mm, linux-kernel, kvm, jgg, peterx

vfio-pci currently goes through some pretty nasty locking algorithms
since commit abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO
access on disabled memory") was added to invalidate and re-fault mmaps
to device MMIO around cases where device memory is disabled.  This
series greatly simplifies that by making use of an address space on
the vfio device file descriptor, as suggested by Jason Gunthorpe.
This allows us to use unmap_mapping_range() on the device fd to zap
such mappings, and by creating a vma-to-pfn callback, we can implement
a reverse function to restore all mappings.
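
In code terms the model is roughly the following (a simplified sketch
of what the later patches do, not literal code from this series;
"offset", "len", "pgstart", "pgend" and "pfn" stand in for values
computed by the driver):

	/* all mmaps of a device fd share mapping = device->inode->i_mapping */

	/* device memory disabled: zap every user mapping of the BARs */
	unmap_mapping_range(device->inode->i_mapping, offset, len, true);

	/* device memory re-enabled: walk the vmas on that mapping and
	 * re-establish them using the driver's vma-to-pfn callback
	 */
	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgstart, pgend)
		io_remap_pfn_range(vma, vma->vm_start, pfn,
				   vma->vm_end - vma->vm_start,
				   vma->vm_page_prot);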

This series was originally part of a larger series which also added a
callback to get a vfio device from a vma, which allows the IOMMU
backend to limit pfnmaps to vfio device memory.  The long term goal
is to implement the vma-to-pfn callback for all vfio device drivers to
enable this in the IOMMU backend and proceed with a mechanism to also
invalidate DMA mappings to device memory while disabled.

Given my slow progress towards that longer term goal, I'd like to get
this in as an interim cleanup since it seems worthwhile on its own.  I
intend to rework this on top of Jason's device_open/close series.
Thanks,

Alex

---

Alex Williamson (7):
      vfio: Create vfio_fs_type with inode per device
      vfio: Export unmap_mapping_range() wrapper
      vfio/pci: Use vfio_device_unmap_mapping_range()
      vfio,vfio-pci: Add vma to pfn callback
      mm/interval_tree.c: Export vma interval tree iterators
      vfio: Add vfio_device_io_remap_mapping_range()
      vfio/pci: Remove map-on-fault behavior


 drivers/vfio/pci/vfio_pci.c         | 279 +++++++---------------------
 drivers/vfio/pci/vfio_pci_config.c  |   8 +-
 drivers/vfio/pci/vfio_pci_private.h |   5 +-
 drivers/vfio/vfio.c                 |  69 ++++++-
 include/linux/vfio.h                |  10 +
 mm/interval_tree.c                  |   3 +
 6 files changed, 156 insertions(+), 218 deletions(-)



* [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
@ 2021-08-05 17:07 ` Alex Williamson
  2021-08-10  8:43   ` Christoph Hellwig
  2021-08-05 17:07 ` [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper Alex Williamson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:07 UTC (permalink / raw)
  To: alex.williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, jgg, peterx

By linking all the device fds we provide to userspace to an
address space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |    1 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 02cc51ce6891..b88de89bda31 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -21,8 +21,10 @@
 #include <linux/list.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
+#include <linux/mount.h>
 #include <linux/mutex.h>
 #include <linux/pci.h>
+#include <linux/pseudo_fs.h>
 #include <linux/rwsem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -37,6 +39,14 @@
 #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC	"VFIO - User Level meta-driver"
 
+/*
+ * Not exposed via UAPI
+ *
+ * XXX Adopt the following when available:
+ * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/
+ */
+#define VFIO_MAGIC 0x5646494f /* "VFIO" */
+
 static struct vfio {
 	struct class			*class;
 	struct list_head		iommu_drivers_list;
@@ -46,6 +56,8 @@ static struct vfio {
 	struct mutex			group_lock;
 	struct cdev			group_cdev;
 	dev_t				group_devt;
+	struct vfsmount			*vfio_fs_mnt;
+	int				vfio_fs_cnt;
 } vfio;
 
 struct vfio_iommu_driver {
@@ -519,6 +531,35 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
 	return group;
 }
 
+static int vfio_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type vfio_fs_type = {
+	.name = "vfio",
+	.owner = THIS_MODULE,
+	.init_fs_context = vfio_fs_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
+static struct inode *vfio_fs_inode_new(void)
+{
+	struct inode *inode;
+	int ret;
+
+	ret = simple_pin_fs(&vfio_fs_type,
+			    &vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+	if (ret)
+		return ERR_PTR(ret);
+
+	inode = alloc_anon_inode(vfio.vfio_fs_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		simple_release_fs(&vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+
+	return inode;
+}
+
 /**
  * Device objects - create, release, get, put, search
  */
@@ -783,6 +824,12 @@ int vfio_register_group_dev(struct vfio_device *device)
 		return -EBUSY;
 	}
 
+	device->inode = vfio_fs_inode_new();
+	if (IS_ERR(device->inode)) {
+		vfio_group_put(group);
+		return PTR_ERR(device->inode);
+	}
+
 	/* Our reference on group is moved to the device */
 	device->group = group;
 
@@ -907,6 +954,9 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	group->dev_counter--;
 	mutex_unlock(&group->device_lock);
 
+	iput(device->inode);
+	simple_release_fs(&vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+
 	/*
 	 * In order to support multiple devices per group, devices can be
 	 * plucked from the group while other devices in the group are still
@@ -1411,6 +1461,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	 */
 	filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
 
+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	atomic_inc(&group->container_users);
 
 	fd_install(ret, filep);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a2c5b30e1763..90bcc2e9c8eb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -24,6 +24,7 @@ struct vfio_device {
 	refcount_t refcount;
 	struct completion comp;
 	struct list_head group_next;
+	struct inode *inode;
 };
 
 /**




* [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
  2021-08-05 17:07 ` [PATCH 1/7] vfio: Create vfio_fs_type with inode per device Alex Williamson
@ 2021-08-05 17:07 ` Alex Williamson
  2021-08-10  8:45   ` Christoph Hellwig
  2021-08-10 18:56   ` Peter Xu
  2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:07 UTC (permalink / raw)
  To: alex.williamson; +Cc: linux-kernel, kvm, jgg, peterx

Allow bus drivers to use the vfio pseudo fs mapping to zap all mmaps
across an offset range of their device files.
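
For example, vfio-pci will use this in a later patch in this series to
zap only the BAR portion of the device fd:

	vfio_device_unmap_mapping_range(&vdev->vdev,
			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));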

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c  |    7 +++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b88de89bda31..1e4fc69fee7d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -560,6 +560,13 @@ static struct inode *vfio_fs_inode_new(void)
 	return inode;
 }
 
+void vfio_device_unmap_mapping_range(struct vfio_device *device,
+				     loff_t start, loff_t len)
+{
+	unmap_mapping_range(device->inode->i_mapping, start, len, true);
+}
+EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
+
 /**
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 90bcc2e9c8eb..712813703e5a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -66,6 +66,8 @@ int vfio_register_group_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
+extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
+					    loff_t start, loff_t len);
 
 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {




* [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
  2021-08-05 17:07 ` [PATCH 1/7] vfio: Create vfio_fs_type with inode per device Alex Williamson
  2021-08-05 17:07 ` [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper Alex Williamson
@ 2021-08-05 17:07 ` Alex Williamson
  2021-08-06  1:04   ` Jason Gunthorpe
                     ` (2 more replies)
  2021-08-05 17:07 ` [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback Alex Williamson
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:07 UTC (permalink / raw)
  To: alex.williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, jgg, peterx

With the vfio device fd tied to the address space of the pseudo fs
inode, we can use that address space to track all vmas that might be
mmap'ing device BARs, which removes our vma_list and all the
complicated lock ordering necessary to manually zap each related vma.

Note that we can no longer store the pfn in vm_pgoff if we want to use
unmap_mapping_range() to zap a selective portion of the device fd
corresponding to BAR mappings.
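
For reference, the reason a selective zap by file offset works is the
fixed per-region mmap offset layout (definitions below as currently
found in vfio_pci_private.h), from which the region index and
in-region page offset are recovered from vm_pgoff:

	#define VFIO_PCI_OFFSET_SHIFT	40
	#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)

	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
	pgoff = vma->vm_pgoff &
		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);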

This also converts our mmap fault handler to use vmf_insert_pfn(),
because without a vma_list we can no longer avoid the concurrency
problem of io_remap_pfn_range() being asked to fill the same range
twice.  This is a step towards removing the fault handler entirely,
at which point we'll return to using io_remap_pfn_range().

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |  238 +++++++----------------------------
 drivers/vfio/pci/vfio_pci_private.h |    2 
 2 files changed, 49 insertions(+), 191 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 318864d52837..c526edbf1173 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -225,7 +225,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 static void vfio_pci_disable(struct vfio_pci_device *vdev);
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1141,7 +1141,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		struct vfio_pci_group_info info;
 		struct vfio_devices devs = { .cur_index = 0 };
 		bool slot = false;
-		int i, group_idx, mem_idx = 0, count = 0, ret = 0;
+		int i, group_idx, count = 0, ret = 0;
 
 		minsz = offsetofend(struct vfio_pci_hot_reset, count);
 
@@ -1241,39 +1241,22 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		}
 
 		/*
-		 * We need to get memory_lock for each device, but devices
-		 * can share mmap_lock, therefore we need to zap and hold
-		 * the vma_lock for each device, and only then get each
-		 * memory_lock.
+		 * Try to get the memory_lock write lock for all devices and
+		 * zap all BAR mmaps.
 		 */
 		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-					    vfio_pci_try_zap_and_vma_lock_cb,
+					    vfio_pci_mem_trylock_and_zap_cb,
 					    &devs, slot);
-		if (ret)
-			goto hot_reset_release;
-
-		for (; mem_idx < devs.cur_index; mem_idx++) {
-			struct vfio_pci_device *tmp = devs.devices[mem_idx];
-
-			ret = down_write_trylock(&tmp->memory_lock);
-			if (!ret) {
-				ret = -EBUSY;
-				goto hot_reset_release;
-			}
-			mutex_unlock(&tmp->vma_lock);
-		}
 
 		/* User has access, do the reset */
-		ret = pci_reset_bus(vdev->pdev);
+		if (!ret)
+			ret = pci_reset_bus(vdev->pdev);
 
 hot_reset_release:
 		for (i = 0; i < devs.cur_index; i++) {
 			struct vfio_pci_device *tmp = devs.devices[i];
 
-			if (i < mem_idx)
-				up_write(&tmp->memory_lock);
-			else
-				mutex_unlock(&tmp->vma_lock);
+			up_write(&tmp->memory_lock);
 			vfio_device_put(&tmp->vdev);
 		}
 		kfree(devs.devices);
@@ -1424,100 +1407,18 @@ static ssize_t vfio_pci_write(struct vfio_device *core_vdev, const char __user *
 	return vfio_pci_rw(vdev, (char __user *)buf, count, ppos, true);
 }
 
-/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
-static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
 {
-	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
-
-	/*
-	 * Lock ordering:
-	 * vma_lock is nested under mmap_lock for vm_ops callback paths.
-	 * The memory_lock semaphore is used by both code paths calling
-	 * into this function to zap vmas and the vm_ops.fault callback
-	 * to protect the memory enable state of the device.
-	 *
-	 * When zapping vmas we need to maintain the mmap_lock => vma_lock
-	 * ordering, which requires using vma_lock to walk vma_list to
-	 * acquire an mm, then dropping vma_lock to get the mmap_lock and
-	 * reacquiring vma_lock.  This logic is derived from similar
-	 * requirements in uverbs_user_mmap_disassociate().
-	 *
-	 * mmap_lock must always be the top-level lock when it is taken.
-	 * Therefore we can only hold the memory_lock write lock when
-	 * vma_list is empty, as we'd need to take mmap_lock to clear
-	 * entries.  vma_list can only be guaranteed empty when holding
-	 * vma_lock, thus memory_lock is nested under vma_lock.
-	 *
-	 * This enables the vm_ops.fault callback to acquire vma_lock,
-	 * followed by memory_lock read lock, while already holding
-	 * mmap_lock without risk of deadlock.
-	 */
-	while (1) {
-		struct mm_struct *mm = NULL;
-
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock))
-				return 0;
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		while (!list_empty(&vdev->vma_list)) {
-			mmap_vma = list_first_entry(&vdev->vma_list,
-						    struct vfio_pci_mmap_vma,
-						    vma_next);
-			mm = mmap_vma->vma->vm_mm;
-			if (mmget_not_zero(mm))
-				break;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			mm = NULL;
-		}
-		if (!mm)
-			return 1;
-		mutex_unlock(&vdev->vma_lock);
-
-		if (try) {
-			if (!mmap_read_trylock(mm)) {
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mmap_read_lock(mm);
-		}
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock)) {
-				mmap_read_unlock(mm);
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		list_for_each_entry_safe(mmap_vma, tmp,
-					 &vdev->vma_list, vma_next) {
-			struct vm_area_struct *vma = mmap_vma->vma;
-
-			if (vma->vm_mm != mm)
-				continue;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-
-			zap_vma_ptes(vma, vma->vm_start,
-				     vma->vm_end - vma->vm_start);
-		}
-		mutex_unlock(&vdev->vma_lock);
-		mmap_read_unlock(mm);
-		mmput(mm);
-	}
+	vfio_device_unmap_mapping_range(&vdev->vdev,
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
 }
 
 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
 {
-	vfio_pci_zap_and_vma_lock(vdev, false);
 	down_write(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+	vfio_pci_zap_bars(vdev);
 }
 
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
@@ -1539,95 +1440,58 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
 	up_write(&vdev->memory_lock);
 }
 
-/* Caller holds vma_lock */
-static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
-			      struct vm_area_struct *vma)
+static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
+				   unsigned long *pfn)
 {
-	struct vfio_pci_mmap_vma *mmap_vma;
-
-	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
-	if (!mmap_vma)
-		return -ENOMEM;
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct pci_dev *pdev = vdev->pdev;
+	int index;
+	u64 pgoff;
 
-	mmap_vma->vma = vma;
-	list_add(&mmap_vma->vma_next, &vdev->vma_list);
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
 
-	return 0;
-}
+	if (index >= VFIO_PCI_ROM_REGION_INDEX ||
+	    !vdev->bar_mmap_supported[index] || !vdev->barmap[index])
+		return -EINVAL;
 
-/*
- * Zap mmaps on open so that we can fault them in on access and therefore
- * our vma_list only tracks mappings accessed since last zap.
- */
-static void vfio_pci_mmap_open(struct vm_area_struct *vma)
-{
-	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-}
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
 
-static void vfio_pci_mmap_close(struct vm_area_struct *vma)
-{
-	struct vfio_pci_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
+	*pfn = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
-	mutex_lock(&vdev->vma_lock);
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma) {
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			break;
-		}
-	}
-	mutex_unlock(&vdev->vma_lock);
+	return 0;
 }
 
 static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
-	vm_fault_t ret = VM_FAULT_NOPAGE;
+	unsigned long vaddr, pfn;
+	vm_fault_t ret = VM_FAULT_SIGBUS;
 
-	mutex_lock(&vdev->vma_lock);
-	down_read(&vdev->memory_lock);
-
-	if (!__vfio_pci_memory_enabled(vdev)) {
-		ret = VM_FAULT_SIGBUS;
-		goto up_out;
-	}
-
-	/*
-	 * We populate the whole vma on fault, so we need to test whether
-	 * the vma has already been mapped, such as for concurrent faults
-	 * to the same vma.  io_remap_pfn_range() will trigger a BUG_ON if
-	 * we ask it to fill the same range again.
-	 */
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma)
-			goto up_out;
-	}
+	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
+		return ret;
 
-	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       vma->vm_end - vma->vm_start,
-			       vma->vm_page_prot)) {
-		ret = VM_FAULT_SIGBUS;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-		goto up_out;
-	}
+	down_read(&vdev->memory_lock);
 
-	if (__vfio_pci_add_vma(vdev, vma)) {
-		ret = VM_FAULT_OOM;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+	if (__vfio_pci_memory_enabled(vdev)) {
+		for (vaddr = vma->vm_start;
+		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {
+			ret = vmf_insert_pfn(vma, vaddr, pfn);
+			if (ret != VM_FAULT_NOPAGE) {
+				zap_vma_ptes(vma, vma->vm_start,
+					     vaddr - vma->vm_start);
+				break;
+			}
+		}
 	}
 
-up_out:
 	up_read(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+
 	return ret;
 }
 
 static const struct vm_operations_struct vfio_pci_mmap_ops = {
-	.open = vfio_pci_mmap_open,
-	.close = vfio_pci_mmap_close,
 	.fault = vfio_pci_mmap_fault,
 };
 
@@ -1690,7 +1554,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 
 	vma->vm_private_data = vdev;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
 	/*
 	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
@@ -2016,8 +1880,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
-	mutex_init(&vdev->vma_lock);
-	INIT_LIST_HEAD(&vdev->vma_list);
 	init_rwsem(&vdev->memory_lock);
 
 	ret = vfio_pci_reflck_attach(vdev);
@@ -2261,7 +2123,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	return 0;
 }
 
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
 	struct vfio_device *device;
@@ -2281,15 +2143,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
 
 	vdev = container_of(device, struct vfio_pci_device, vdev);
 
-	/*
-	 * Locking multiple devices is prone to deadlock, runaway and
-	 * unwind if we hit contention.
-	 */
-	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
+	if (!down_write_trylock(&vdev->memory_lock)) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
 
+	vfio_pci_zap_bars(vdev);
+
 	devs->devices[devs->cur_index++] = vdev;
 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index bbc56c857ef0..0aa542fa1e26 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -140,8 +140,6 @@ struct vfio_pci_device {
 	struct list_head	ioeventfds_list;
 	struct vfio_pci_vf_token	*vf_token;
 	struct notifier_block	nb;
-	struct mutex		vma_lock;
-	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
 };
 




* [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
                   ` (2 preceding siblings ...)
  2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
@ 2021-08-05 17:07 ` Alex Williamson
  2021-08-06  1:01   ` Jason Gunthorpe
  2021-08-10  9:00   ` Christoph Hellwig
  2021-08-05 17:08 ` [PATCH 5/7] mm/interval_tree.c: Export vma interval tree iterators Alex Williamson
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:07 UTC (permalink / raw)
  To: alex.williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, jgg, peterx

Add a new vfio_device_ops callback to allow the vfio device driver to
translate a vma mapping of a vfio device fd to a pfn.  The
implementation is limited to vfio-pci here for the purpose of
supporting the reverse of unmap_mapping_range(), but it is expected to
be implemented by all vfio device drivers supporting DMA mapping of
device memory mmaps.
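
As a sketch of what an implementation could look like for a driver
with a single mmap'able region (entirely hypothetical; "foo_device",
"region_phys" and "region_size" are invented names, not part of this
series):

	static int foo_vma_to_pfn(struct vfio_device *core_vdev,
				  struct vm_area_struct *vma,
				  unsigned long *pfn)
	{
		struct foo_device *foo =
			container_of(core_vdev, struct foo_device, vdev);

		/* vm_pgoff is the page offset into the device fd */
		if (vma->vm_pgoff >= (foo->region_size >> PAGE_SHIFT))
			return -EINVAL;

		*pfn = (foo->region_phys >> PAGE_SHIFT) + vma->vm_pgoff;
		return 0;
	}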

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |    9 ++++++---
 drivers/vfio/vfio.c         |   18 ++++++++++++++++--
 include/linux/vfio.h        |    6 ++++++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index c526edbf1173..7a9f67cfc0a2 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1440,10 +1440,12 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
 	up_write(&vdev->memory_lock);
 }
 
-static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
+static int vfio_pci_bar_vma_to_pfn(struct vfio_device *core_vdev,
+				   struct vm_area_struct *vma,
 				   unsigned long *pfn)
 {
-	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_device *vdev =
+			container_of(core_vdev, struct vfio_pci_device, vdev);
 	struct pci_dev *pdev = vdev->pdev;
 	int index;
 	u64 pgoff;
@@ -1469,7 +1471,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	unsigned long vaddr, pfn;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 
-	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
+	if (vfio_pci_bar_vma_to_pfn(&vdev->vdev, vma, &pfn))
 		return ret;
 
 	down_read(&vdev->memory_lock);
@@ -1742,6 +1744,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.mmap		= vfio_pci_mmap,
 	.request	= vfio_pci_request,
 	.match		= vfio_pci_match,
+	.vma_to_pfn	= vfio_pci_bar_vma_to_pfn,
 };
 
 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1e4fc69fee7d..42ca93be152a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -875,6 +875,22 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 
+static const struct file_operations vfio_device_fops;
+
+int vfio_device_vma_to_pfn(struct vfio_device *device,
+			   struct vm_area_struct *vma, unsigned long *pfn)
+{
+	if (WARN_ON(!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops ||
+		    vma->vm_file->private_data != device))
+		return -EINVAL;
+
+	if (unlikely(!device->ops->vma_to_pfn))
+		return -EPERM;
+
+	return device->ops->vma_to_pfn(device, vma, pfn);
+}
+EXPORT_SYMBOL_GPL(vfio_device_vma_to_pfn);
+
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 						     char *buf)
 {
@@ -1407,8 +1423,6 @@ static int vfio_group_add_container_user(struct vfio_group *group)
 	return 0;
 }
 
-static const struct file_operations vfio_device_fops;
-
 static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 {
 	struct vfio_device *device;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 712813703e5a..5f07ebe0f85d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -41,6 +41,7 @@ struct vfio_device {
  * @match: Optional device name match callback (return: 0 for no-match, >0 for
  *         match, -errno for abort (ex. match with insufficient or incorrect
  *         additional args)
+ * @vma_to_pfn: Optional pfn from vma lookup against vma mapping device fd
  */
 struct vfio_device_ops {
 	char	*name;
@@ -55,6 +56,8 @@ struct vfio_device_ops {
 	int	(*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma);
 	void	(*request)(struct vfio_device *vdev, unsigned int count);
 	int	(*match)(struct vfio_device *vdev, char *buf);
+	int	(*vma_to_pfn)(struct vfio_device *vdev,
+			      struct vm_area_struct *vma, unsigned long *pfn);
 };
 
 extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
@@ -68,6 +71,9 @@ extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
 					    loff_t start, loff_t len);
+extern int vfio_device_vma_to_pfn(struct vfio_device *device,
+				  struct vm_area_struct *vma,
+				  unsigned long *pfn);
 
 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {




* [PATCH 5/7] mm/interval_tree.c: Export vma interval tree iterators
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
                   ` (3 preceding siblings ...)
  2021-08-05 17:07 ` [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback Alex Williamson
@ 2021-08-05 17:08 ` Alex Williamson
  2021-08-05 17:08 ` [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range() Alex Williamson
  2021-08-05 17:08 ` [PATCH 7/7] vfio/pci: Remove map-on-fault behavior Alex Williamson
  6 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:08 UTC (permalink / raw)
  To: alex.williamson; +Cc: Andrew Morton, linux-mm, linux-kernel, kvm, jgg, peterx

In order to make use of vma_interval_tree_foreach() from a module,
we need to export the vma_interval_tree_iter_first() and
vma_interval_tree_iter_next() iterators.  vfio code would like to use
this foreach helper to create a remapping helper, essentially the
reverse of unmap_mapping_range() for specific vmas mapping vfio
device memory.
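
The intended consumer (added later in this series) is essentially:

	i_mmap_lock_write(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgstart, pgend) {
		/* io_remap_pfn_range() each vma back in */
	}
	i_mmap_unlock_write(mapping);

where pgstart/pgend are derived from the file offset range being
restored.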

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 mm/interval_tree.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 32e390c42c53..faa50767496c 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -24,6 +24,9 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
 		     unsigned long, shared.rb_subtree_last,
 		     vma_start_pgoff, vma_last_pgoff, /* empty */, vma_interval_tree)
 
+EXPORT_SYMBOL_GPL(vma_interval_tree_iter_first);
+EXPORT_SYMBOL_GPL(vma_interval_tree_iter_next);
+
 /* Insert node immediately after prev in the interval tree */
 void vma_interval_tree_insert_after(struct vm_area_struct *node,
 				    struct vm_area_struct *prev,




* [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range()
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
                   ` (4 preceding siblings ...)
  2021-08-05 17:08 ` [PATCH 5/7] mm/interval_tree.c: Export vma interval tree iterators Alex Williamson
@ 2021-08-05 17:08 ` Alex Williamson
  2021-08-10  9:04   ` Christoph Hellwig
  2021-08-05 17:08 ` [PATCH 7/7] vfio/pci: Remove map-on-fault behavior Alex Williamson
  6 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:08 UTC (permalink / raw)
  To: alex.williamson; +Cc: linux-kernel, kvm, jgg, peterx

This provides a mirror of vfio_device_unmap_mapping_range() for
vmas mapping device memory where the pfn is provided by
vfio_device_vma_to_pfn().

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c  |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 42ca93be152a..c5b3a3446dd9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/wait.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/signal.h>
 
 #define DRIVER_VERSION	"0.3"
@@ -567,6 +568,49 @@ void vfio_device_unmap_mapping_range(struct vfio_device *device,
 }
 EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
 
+int vfio_device_io_remap_mapping_range(struct vfio_device *device,
+				       loff_t start, loff_t len)
+{
+	struct address_space *mapping = device->inode->i_mapping;
+	int ret = 0;
+
+	i_mmap_lock_write(mapping);
+	if (mapping_mapped(mapping)) {
+		struct rb_root_cached *root = &mapping->i_mmap;
+		pgoff_t pgstart = start >> PAGE_SHIFT;
+		pgoff_t pgend = (start + len - 1) >> PAGE_SHIFT;
+		struct vm_area_struct *vma;
+
+		vma_interval_tree_foreach(vma, root, pgstart, pgend) {
+			unsigned long pfn;
+			unsigned int flags;
+
+			ret = vfio_device_vma_to_pfn(device, vma, &pfn);
+			if (ret)
+				break;
+
+			/*
+			 * Force NOFS memory allocation context to avoid
+			 * deadlock while we hold i_mmap_rwsem.
+			 */
+			flags = memalloc_nofs_save();
+			ret = io_remap_pfn_range(vma, vma->vm_start, pfn,
+						 vma->vm_end - vma->vm_start,
+						 vma->vm_page_prot);
+			memalloc_nofs_restore(flags);
+			if (ret)
+				break;
+		}
+	}
+	i_mmap_unlock_write(mapping);
+
+	if (ret)
+		vfio_device_unmap_mapping_range(device, start, len);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_device_io_remap_mapping_range);
+
 /**
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5f07ebe0f85d..c2c51c7a6f05 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -71,6 +71,8 @@ extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
 					    loff_t start, loff_t len);
+extern int vfio_device_io_remap_mapping_range(struct vfio_device *device,
+					      loff_t start, loff_t len);
 extern int vfio_device_vma_to_pfn(struct vfio_device *device,
 				  struct vm_area_struct *vma,
 				  unsigned long *pfn);




* [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
                   ` (5 preceding siblings ...)
  2021-08-05 17:08 ` [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range() Alex Williamson
@ 2021-08-05 17:08 ` Alex Williamson
  2021-08-10  9:11   ` Christoph Hellwig
  2021-08-10 20:54   ` Peter Xu
  6 siblings, 2 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-05 17:08 UTC (permalink / raw)
  To: alex.williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, jgg, peterx

With vfio_device_io_remap_mapping_range() we can repopulate vmas with
device mappings after manipulating the device rather than waiting for
an access fault.  This allows us to go back to the more standard use
case of io_remap_pfn_range() for device memory while still preventing
access to device memory through mmaps when the device is disabled.
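
The resulting pattern around any operation that can disable device
memory becomes (simplified):

	vfio_pci_zap_and_down_write_memory_lock(vdev);	/* zap BAR mmaps */
	/* ... reset or reconfigure the device ... */
	vfio_pci_test_and_up_write_memory_lock(vdev);	/* remap if enabled */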

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |   80 +++++++++++++++++------------------
 drivers/vfio/pci/vfio_pci_config.c  |    8 ++--
 drivers/vfio/pci/vfio_pci_private.h |    3 +
 3 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7a9f67cfc0a2..196b8002447b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -447,6 +447,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 		kfree(dummy_res);
 	}
 
+	vdev->zapped_bars = false;
 	vdev->needs_reset = true;
 
 	/*
@@ -1057,7 +1058,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
 		ret = pci_try_reset_function(vdev->pdev);
-		up_write(&vdev->memory_lock);
+		vfio_pci_test_and_up_write_memory_lock(vdev);
 
 		return ret;
 
@@ -1256,7 +1257,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		for (i = 0; i < devs.cur_index; i++) {
 			struct vfio_pci_device *tmp = devs.devices[i];
 
-			up_write(&tmp->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(tmp);
 			vfio_device_put(&tmp->vdev);
 		}
 		kfree(devs.devices);
@@ -1413,6 +1414,14 @@ static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
 			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
 			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
 			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
+
+	/*
+	 * Modified under memory_lock write semaphore.  Device handoff
+	 * with memory enabled, therefore any disable will zap and setup
+	 * a remap when re-enabled.  io_remap_pfn_range() is not forgiving
+	 * of duplicate mappings so we must track.
+	 */
+	vdev->zapped_bars = true;
 }
 
 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
@@ -1421,6 +1430,18 @@ void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
 	vfio_pci_zap_bars(vdev);
 }
 
+void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device *vdev)
+{
+	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
+		WARN_ON(vfio_device_io_remap_mapping_range(&vdev->vdev,
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX)));
+		vdev->zapped_bars = false;
+	}
+	up_write(&vdev->memory_lock);
+}
+
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
 {
 	u16 cmd;
@@ -1464,39 +1485,6 @@ static int vfio_pci_bar_vma_to_pfn(struct vfio_device *core_vdev,
 	return 0;
 }
 
-static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct vfio_pci_device *vdev = vma->vm_private_data;
-	unsigned long vaddr, pfn;
-	vm_fault_t ret = VM_FAULT_SIGBUS;
-
-	if (vfio_pci_bar_vma_to_pfn(&vdev->vdev, vma, &pfn))
-		return ret;
-
-	down_read(&vdev->memory_lock);
-
-	if (__vfio_pci_memory_enabled(vdev)) {
-		for (vaddr = vma->vm_start;
-		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {
-			ret = vmf_insert_pfn(vma, vaddr, pfn);
-			if (ret != VM_FAULT_NOPAGE) {
-				zap_vma_ptes(vma, vma->vm_start,
-					     vaddr - vma->vm_start);
-				break;
-			}
-		}
-	}
-
-	up_read(&vdev->memory_lock);
-
-	return ret;
-}
-
-static const struct vm_operations_struct vfio_pci_mmap_ops = {
-	.fault = vfio_pci_mmap_fault,
-};
-
 static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev =
@@ -1504,6 +1492,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int index;
 	u64 phys_len, req_len, pgoff, req_start;
+	unsigned long pfn;
 	int ret;
 
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
@@ -1554,18 +1543,25 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 		}
 	}
 
-	vma->vm_private_data = vdev;
+	ret = vfio_pci_bar_vma_to_pfn(core_vdev, vma, &pfn);
+	if (ret)
+		return ret;
+
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
+	down_read(&vdev->memory_lock);
 	/*
-	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
-	 * change vm_flags within the fault handler.  Set them now.
+	 * Only perform the mapping now if BAR is not in zapped state, VFs
+	 * always report memory enabled so relying on device enable state
+	 * could lead to duplicate remaps.
 	 */
-	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
-	vma->vm_ops = &vfio_pci_mmap_ops;
+	if (!vdev->zapped_bars)
+		ret = io_remap_pfn_range(vma, vma->vm_start, pfn,
+					 vma->vm_end - vma->vm_start,
+					 vma->vm_page_prot);
+	up_read(&vdev->memory_lock);
 
-	return 0;
+	return ret;
 }
 
 static void vfio_pci_request(struct vfio_device *core_vdev, unsigned int count)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 70e28efbc51f..4220057b253c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -605,7 +605,7 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
 	if (count < 0) {
 		if (offset == PCI_COMMAND)
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		return count;
 	}
 
@@ -619,7 +619,7 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
-		up_write(&vdev->memory_lock);
+		vfio_pci_test_and_up_write_memory_lock(vdev);
 	}
 
 	/* Emulate INTx disable */
@@ -860,7 +860,7 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
 		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		}
 	}
 
@@ -942,7 +942,7 @@ static int vfio_af_config_write(struct vfio_pci_device *vdev, int pos,
 		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		}
 	}
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 0aa542fa1e26..9aedb78a4ae3 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -128,6 +128,7 @@ struct vfio_pci_device {
 	bool			needs_reset;
 	bool			nointx;
 	bool			needs_pm_restore;
+	bool			zapped_bars;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	struct vfio_pci_reflck	*reflck;
@@ -186,6 +187,8 @@ extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
 extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
 extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
 						    *vdev);
+extern void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device
+						   *vdev);
 extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
 extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
 					       u16 cmd);




* Re: [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback
  2021-08-05 17:07 ` [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback Alex Williamson
@ 2021-08-06  1:01   ` Jason Gunthorpe
  2021-08-10  9:00     ` Christoph Hellwig
  2021-08-10  9:00   ` Christoph Hellwig
  1 sibling, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2021-08-06  1:01 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, peterx

On Thu, Aug 05, 2021 at 11:07:47AM -0600, Alex Williamson wrote:
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 1e4fc69fee7d..42ca93be152a 100644
> +++ b/drivers/vfio/vfio.c
> @@ -875,6 +875,22 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
>  
> +static const struct file_operations vfio_device_fops;
> +
> +int vfio_device_vma_to_pfn(struct vfio_device *device,
> +			   struct vm_area_struct *vma, unsigned long *pfn)

A comment here describing the locking conditions the caller must meet
would be a good addition.  It looks like this can only work under the
i_mmap_lock, and the returned pfn can only be used outside that lock
if it has been placed in a VMA.

Maybe this is not a great API then? Should it be 'populate vma' and
call io_remap_pfn_range under the op?
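
(To illustrate the idea only, nothing like this exists in the series
and the op name is made up here:

	int (*remap_vma)(struct vfio_device *vdev, struct vm_area_struct *vma);

ie the core would walk the vmas under the i_mmap lock and the driver
would do the io_remap_pfn_range() itself.)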

Jason


* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
@ 2021-08-06  1:04   ` Jason Gunthorpe
  2021-08-06 20:17     ` Alex Williamson
  2021-08-10  8:53   ` Christoph Hellwig
  2021-08-10 18:48   ` Peter Xu
  2 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2021-08-06  1:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, peterx

On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> @@ -2281,15 +2143,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
>  
>  	vdev = container_of(device, struct vfio_pci_device, vdev);
>  
> -	/*
> -	 * Locking multiple devices is prone to deadlock, runaway and
> -	 * unwind if we hit contention.
> -	 */
> -	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
> +	if (!down_write_trylock(&vdev->memory_lock)) {
>  		vfio_device_put(device);
>  		return -EBUSY;
>  	}

Now that this is simplified so much, I wonder if we can drop the
memory_lock and just use the dev_set->lock?

That avoids the whole down_write_trylock thing and makes it much more
understandable?

Jason


* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-06  1:04   ` Jason Gunthorpe
@ 2021-08-06 20:17     ` Alex Williamson
  2021-08-10  8:51       ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-06 20:17 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-kernel, kvm, peterx

On Thu, 5 Aug 2021 22:04:18 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> > @@ -2281,15 +2143,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
> >  
> >  	vdev = container_of(device, struct vfio_pci_device, vdev);
> >  
> > -	/*
> > -	 * Locking multiple devices is prone to deadlock, runaway and
> > -	 * unwind if we hit contention.
> > -	 */
> > -	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
> > +	if (!down_write_trylock(&vdev->memory_lock)) {
> >  		vfio_device_put(device);
> >  		return -EBUSY;
> >  	}  
> 
> Now that this is simplified so much, I wonder if we can drop the
> memory_lock and just use the dev_set->lock?
> 
> That avoids the whole down_write_trylock thing and makes it much more
> understandable?

Hmm, that would make this case a lot easier, but using a mutex,
potentially shared across multiple devices, taken on every non-mmap
read/write doesn't really feel like a good trade-off when we're
currently using a per device rwsem to retain concurrency here.  Thanks,

Alex



* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-05 17:07 ` [PATCH 1/7] vfio: Create vfio_fs_type with inode per device Alex Williamson
@ 2021-08-10  8:43   ` Christoph Hellwig
  2021-08-10 14:52     ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:43 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

> + * XXX Adopt the following when available:
> + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/

No need for this link.


* Re: [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper
  2021-08-05 17:07 ` [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper Alex Williamson
@ 2021-08-10  8:45   ` Christoph Hellwig
  2021-08-10 18:56   ` Peter Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:45 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, jgg, peterx

> +void vfio_device_unmap_mapping_range(struct vfio_device *device,
> +				     loff_t start, loff_t len)
> +{
> +	unmap_mapping_range(device->inode->i_mapping, start, len, true);
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);

Instead of mirroring the name of unmap_mapping_range maybe give this
a name to document the use case?

> +extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
> +					    loff_t start, loff_t len);

No need for the extern.


* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-06 20:17     ` Alex Williamson
@ 2021-08-10  8:51       ` Christoph Hellwig
  2021-08-10 11:57         ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:51 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Fri, Aug 06, 2021 at 02:17:45PM -0600, Alex Williamson wrote:
> > Now that this is simplified so much, I wonder if we can drop the
> > memory_lock and just use the dev_set->lock?
> > 
> > That avoids the whole down_write_trylock thing and makes it much more
> > understandable?
> 
> Hmm, that would make this case a lot easier, but using a mutex,
> potentially shared across multiple devices, taken on every non-mmap
> read/write doesn't really feel like a good trade-off when we're
> currently using a per device rwsem to retain concurrency here.  Thanks,

Using a per-set percpu_rw_semaphore might be a good plan here.  Probably
makes sense to do that incrementally after this change, though.
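
For illustration only, something along these lines (sketch; the
dev_set field name is made up):

	percpu_init_rwsem(&dev_set->memory_lock);

	/* hot paths: read/write/fault */
	percpu_down_read(&dev_set->memory_lock);
	...
	percpu_up_read(&dev_set->memory_lock);

	/* slow paths: reset, memory disable */
	percpu_down_write(&dev_set->memory_lock);
	...
	percpu_up_write(&dev_set->memory_lock);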


* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
  2021-08-06  1:04   ` Jason Gunthorpe
@ 2021-08-10  8:53   ` Christoph Hellwig
  2021-08-10 19:02     ` Peter Xu
  2021-08-10 18:48   ` Peter Xu
  2 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  8:53 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> +static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
>  {
> +	vfio_device_unmap_mapping_range(&vdev->vdev,
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));

Maybe make this a little more readable by having local variables:
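
E.g. something like (illustration only):

	loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
	loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);

	vfio_device_unmap_mapping_range(&vdev->vdev, start, end - start);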

> +static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
> +				   unsigned long *pfn)
>  {
> +	struct vfio_pci_device *vdev = vma->vm_private_data;
> +	struct pci_dev *pdev = vdev->pdev;
> +	int index;
> +	u64 pgoff;
>  
> +	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);

Nit: initialization at declaration time would be nice.

>  static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct vfio_pci_device *vdev = vma->vm_private_data;
> +	unsigned long vaddr, pfn;
> +	vm_fault_t ret = VM_FAULT_SIGBUS;
>  
> +	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
> +		return ret;
>  
> +	down_read(&vdev->memory_lock);
>  
> +	if (__vfio_pci_memory_enabled(vdev)) {
> +		for (vaddr = vma->vm_start;
> +		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {
> +			ret = vmf_insert_pfn(vma, vaddr, pfn);
> +			if (ret != VM_FAULT_NOPAGE) {
> +				zap_vma_ptes(vma, vma->vm_start,
> +					     vaddr - vma->vm_start);
> +				break;
> +			}
> +		}

Unwinding this with a goto for the not enabled case would be a little easier
to read.


* Re: [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback
  2021-08-06  1:01   ` Jason Gunthorpe
@ 2021-08-10  9:00     ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  9:00 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, linux-kernel, kvm, peterx

On Thu, Aug 05, 2021 at 10:01:46PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 05, 2021 at 11:07:47AM -0600, Alex Williamson wrote:
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 1e4fc69fee7d..42ca93be152a 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -875,6 +875,22 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
> >  
> > +static const struct file_operations vfio_device_fops;
> > +
> > +int vfio_device_vma_to_pfn(struct vfio_device *device,
> > +			   struct vm_area_struct *vma, unsigned long *pfn)
> 
> A comment here describing the locking conditions the caller must meet
> would be a good addition.. It looks like this can only work under the
> i_mmap_lock and the returned pfn can only be taken outside that lock
> if it is placed in a VMA
> 
> Maybe this is not a great API then? Should it be 'populate vma' and
> call io_remap_pfn_range under the op?

Yes, I think that would be a better API.


* Re: [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback
  2021-08-05 17:07 ` [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback Alex Williamson
  2021-08-06  1:01   ` Jason Gunthorpe
@ 2021-08-10  9:00   ` Christoph Hellwig
  1 sibling, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  9:00 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

>  static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 1e4fc69fee7d..42ca93be152a 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -875,6 +875,22 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
>  
> +static const struct file_operations vfio_device_fops;

If we need a forward declaration here it would be nice to keep it at
the top of the file.  Finding a way to not need it would be even better.

> +
> +int vfio_device_vma_to_pfn(struct vfio_device *device,
> +			   struct vm_area_struct *vma, unsigned long *pfn)
> +{
> +	if (WARN_ON(!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops ||
> +		    vma->vm_file->private_data != device))
> +		return -EINVAL;

WARN_ON_ONCE?

> +
> +	if (unlikely(!device->ops->vma_to_pfn))
> +		return -EPERM;
> +
> +	return device->ops->vma_to_pfn(device, vma, pfn);
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_vma_to_pfn);

This function is only used in vfio.c, so it can be marked static instead of
being exported.


* Re: [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range()
  2021-08-05 17:08 ` [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range() Alex Williamson
@ 2021-08-10  9:04   ` Christoph Hellwig
  0 siblings, 0 replies; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  9:04 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, jgg, peterx

> +int vfio_device_io_remap_mapping_range(struct vfio_device *device,
> +				       loff_t start, loff_t len)
> +{
> +	struct address_space *mapping = device->inode->i_mapping;
> +	int ret = 0;
> +
> +	i_mmap_lock_write(mapping);
> +	if (mapping_mapped(mapping)) {
> +		struct rb_root_cached *root = &mapping->i_mmap;
> +		pgoff_t pgstart = start >> PAGE_SHIFT;
> +		pgoff_t pgend = (start + len - 1) >> PAGE_SHIFT;
> +		struct vm_area_struct *vma;
> +
> +		vma_interval_tree_foreach(vma, root, pgstart, pgend) {

There is no need for the mapping_mapped check here,
vma_interval_tree_foreach will do the right thing for an empty tree.
That also allows moving a few more instructions out of the lock.

> +			/*
> +			 * Force NOFS memory allocation context to avoid
> +			 * deadlock while we hold i_mmap_rwsem.
> +			 */
> +			flags = memalloc_nofs_save();

Please move this out of the loop.

> +extern int vfio_device_io_remap_mapping_range(struct vfio_device *device,
> +					      loff_t start, loff_t len);

No need for the extern.


* Re: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-05 17:08 ` [PATCH 7/7] vfio/pci: Remove map-on-fault behavior Alex Williamson
@ 2021-08-10  9:11   ` Christoph Hellwig
  2021-08-10 15:04     ` Alex Williamson
  2021-08-10 20:54   ` Peter Xu
  1 sibling, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10  9:11 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Thu, Aug 05, 2021 at 11:08:21AM -0600, Alex Williamson wrote:
> +void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device *vdev)
> +{
> +	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
> +		WARN_ON(vfio_device_io_remap_mapping_range(&vdev->vdev,
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX)));
> +		vdev->zapped_bars = false;

Doing actual work inside a WARN_ON is pretty nasty.  Also the non-ONCE
version here will lead to massive log splatter if it actually hits.

I'd prefer something like:

	loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
	loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);

	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
		if (!vfio_device_io_remap_mapping_range(&vdev->vdev, start,
				end - start))
			vdev->zapped_bars = false;
		WARN_ON_ONCE(vdev->zapped_bars);
	}

>  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> -	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

What is the story with this appearing earlier and disappearing here
again?

> +extern void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device
> +						   *vdev);

No need for the extern.


* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10  8:51       ` Christoph Hellwig
@ 2021-08-10 11:57         ` Jason Gunthorpe
  2021-08-10 12:55           ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Jason Gunthorpe @ 2021-08-10 11:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Alex Williamson, linux-kernel, kvm, peterx

On Tue, Aug 10, 2021 at 09:51:07AM +0100, Christoph Hellwig wrote:
> On Fri, Aug 06, 2021 at 02:17:45PM -0600, Alex Williamson wrote:
> > > Now that this is simplified so much, I wonder if we can drop the
> > > memory_lock and just use the dev_set->lock?
> > > 
> > > That avoids the whole down_write_trylock thing and makes it much more
> > > understandable?
> > 
> > Hmm, that would make this case a lot easier, but using a mutex,
> > potentially shared across multiple devices, taken on every non-mmap
> > read/write doesn't really feel like a good trade-off when we're
> > currently using a per device rwsem to retain concurrency here.  Thanks,
> 
> Using a per-set percpu_rw_semaphore might be a good plan here.  Probably
> makes sense to do that incrementally after this change, though.

I'm not sure there is a real performance win to chase here? Doesn't
this only protect mmap against reset? The mmap isn't performance
sensitive, right?

If this really needs extra optimization adding a rwsem to the devset
and using that across the whole set would surely be sufficient.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 11:57         ` Jason Gunthorpe
@ 2021-08-10 12:55           ` Christoph Hellwig
  2021-08-10 21:50             ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10 12:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Alex Williamson, linux-kernel, kvm, peterx

On Tue, Aug 10, 2021 at 08:57:22AM -0300, Jason Gunthorpe wrote:
> I'm not sure there is a real performance win to chase here? Doesn't
> this only protect mmap against reset? The mmap isn't performance
> sensitive, right?
> 
> If this really needs extra optimization adding a rwsem to the devset
> and using that across the whole set would surely be sufficient.

Every mmio read or write takes memory_lock.
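
To illustrate the trade-off being discussed (names here are made up;
this is a sketch of the per-set percpu_rw_semaphore idea, not a
proposed patch):

	#include <linux/percpu-rwsem.h>

	/* Hypothetical: one lock shared by every device in the reset set. */
	struct vfio_device_set_sketch {
		struct percpu_rw_semaphore memory_lock;
	};

	/* Hot path, taken around every MMIO read/write: no shared atomics. */
	static inline void devset_mmio_begin(struct vfio_device_set_sketch *ds)
	{
		percpu_down_read(&ds->memory_lock);
	}

	static inline void devset_mmio_end(struct vfio_device_set_sketch *ds)
	{
		percpu_up_read(&ds->memory_lock);
	}

	/* Cold path, around reset/zap: waits out all readers. */
	static inline void devset_block_mmio(struct vfio_device_set_sketch *ds)
	{
		percpu_down_write(&ds->memory_lock);
	}

	static inline void devset_unblock_mmio(struct vfio_device_set_sketch *ds)
	{
		percpu_up_write(&ds->memory_lock);
	}

The write side of a percpu rwsem is far more expensive (it waits for
readers via an RCU grace period), which would be acceptable for a rare
reset path and is exactly why the read side becomes nearly free.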

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-10  8:43   ` Christoph Hellwig
@ 2021-08-10 14:52     ` Alex Williamson
  2021-08-10 14:57       ` Christoph Hellwig
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 14:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Tue, 10 Aug 2021 10:43:29 +0200
Christoph Hellwig <hch@infradead.org> wrote:

> > + * XXX Adopt the following when available:
> > + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/  
> 
> No need for this link.

Is that effort dead?  I've used the link several times myself to search
for progress, so it's been useful to me.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-10 14:52     ` Alex Williamson
@ 2021-08-10 14:57       ` Christoph Hellwig
  2021-08-10 18:49         ` Peter Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Christoph Hellwig @ 2021-08-10 14:57 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Christoph Hellwig, Jason Gunthorpe, linux-kernel, kvm, peterx

On Tue, Aug 10, 2021 at 08:52:54AM -0600, Alex Williamson wrote:
> On Tue, 10 Aug 2021 10:43:29 +0200
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > > + * XXX Adopt the following when available:
> > > + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/  
> > 
> > No need for this link.
> 
> Is that effort dead?  I've used the link several times myself to search
> for progress, so it's been useful to me.  Thanks,

No, but it seems odd to have a reference to an old patchset in the kernel
tree.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-10  9:11   ` Christoph Hellwig
@ 2021-08-10 15:04     ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 15:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Tue, 10 Aug 2021 11:11:49 +0200
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Aug 05, 2021 at 11:08:21AM -0600, Alex Williamson wrote:
> > +void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device *vdev)
> > +{
> > +	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
> > +		WARN_ON(vfio_device_io_remap_mapping_range(&vdev->vdev,
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX)));
> > +		vdev->zapped_bars = false;  
> 
> Doing actual work inside a WARN_ON is pretty nasty.  Also the non-ONCE
> version here will lead to massive log splatter if it actually hits.
> 
> I'd prefer something like:
> 
> 	loff_t start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
> 	loff_t end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);
> 
> 	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
> 		if (!vfio_device_io_remap_mapping_range(&vdev->vdev, start,
> 				end - start))
> 			vdev->zapped_bars = false;
> 		WARN_ON_ONCE(vdev->zapped_bars);
> 
> >  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > -	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);  
> 
> What is the story with this appearing earlier and disappearing here
> again?

We switched from io_remap_pfn_range() which includes this flag
implicitly, to vmf_insert_pfn() which does not, and back to
io_remap_pfn_range() when the fault handler is removed.

> > +extern void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device
> > +						   *vdev);  
> 
> No need for the extern.

Thanks much for the reviews!

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
  2021-08-06  1:04   ` Jason Gunthorpe
  2021-08-10  8:53   ` Christoph Hellwig
@ 2021-08-10 18:48   ` Peter Xu
  2021-08-10 19:59     ` Alex Williamson
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Xu @ 2021-08-10 18:48 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> @@ -1690,7 +1554,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
>  
>  	vma->vm_private_data = vdev;
>  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> -	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> +	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

This addition seems to be an accident. :) Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-10 14:57       ` Christoph Hellwig
@ 2021-08-10 18:49         ` Peter Xu
  2021-08-10 21:16           ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Xu @ 2021-08-10 18:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Alex Williamson, Jason Gunthorpe, linux-kernel, kvm

On Tue, Aug 10, 2021 at 03:57:29PM +0100, Christoph Hellwig wrote:
> On Tue, Aug 10, 2021 at 08:52:54AM -0600, Alex Williamson wrote:
> > On Tue, 10 Aug 2021 10:43:29 +0200
> > Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > > + * XXX Adopt the following when available:
> > > > + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/  
> > > 
> > > No need for this link.
> > 
> > Is that effort dead?  I've used the link several times myself to search
> > for progress, so it's been useful to me.  Thanks,
> 
> No, but it seems odd to have a reference to an old patchset in the kernel
> tree.

I learn from the reference too.  Maybe move it into the commit message?  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper
  2021-08-05 17:07 ` [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper Alex Williamson
  2021-08-10  8:45   ` Christoph Hellwig
@ 2021-08-10 18:56   ` Peter Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Xu @ 2021-08-10 18:56 UTC (permalink / raw)
  To: Alex Williamson; +Cc: linux-kernel, kvm, jgg

On Thu, Aug 05, 2021 at 11:07:22AM -0600, Alex Williamson wrote:
> +void vfio_device_unmap_mapping_range(struct vfio_device *device,
> +				     loff_t start, loff_t len)
> +{
> +	unmap_mapping_range(device->inode->i_mapping, start, len, true);

(not a big deal, but still raise this up)

It seems to me VFIO MMIO regions do not allow private mappings, so
even_cows==true should behave the same as even_cows==false.  even_cows==true
just checks the page mapping for each pte even though they should all match,
imho, so logically "false" should work the same and be a tiny bit faster.

Thanks,

> +}
> +EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
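
In code, the suggestion amounts to flipping the last argument (a sketch
of Peter's proposal, not the applied change):

	void vfio_device_unmap_mapping_range(struct vfio_device *device,
					     loff_t start, loff_t len)
	{
		/*
		 * Per the reasoning above: the MMIO mmaps are shared
		 * pfnmaps, so there are no private COW copies to zap and
		 * even_cows == false would be sufficient.
		 */
		unmap_mapping_range(device->inode->i_mapping, start, len, false);
	}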

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10  8:53   ` Christoph Hellwig
@ 2021-08-10 19:02     ` Peter Xu
  2021-08-10 20:51       ` Alex Williamson
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Xu @ 2021-08-10 19:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Alex Williamson, Jason Gunthorpe, linux-kernel, kvm

On Tue, Aug 10, 2021 at 10:53:50AM +0200, Christoph Hellwig wrote:
> On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> > +static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
> >  {
> > +	vfio_device_unmap_mapping_range(&vdev->vdev,
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
> 
> Maybe make this a little more readable by having local variables:

Or just pass in unmap_mapping_range(start=0, len=0)?  As unmap_mapping_range()
understands len==0 as "to the end of file".  Thanks,
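
Concretely, that suggestion would collapse the call site to something
like the sketch below (Alex's reply further down explains why the range
is intentionally limited to the BAR windows instead):

	static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
	{
		/* len == 0 means "unmap to the end of the device fd". */
		vfio_device_unmap_mapping_range(&vdev->vdev, 0, 0);
	}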

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 18:48   ` Peter Xu
@ 2021-08-10 19:59     ` Alex Williamson
  2021-08-10 20:20       ` Peter Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 19:59 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Tue, 10 Aug 2021 14:48:31 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> > @@ -1690,7 +1554,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
> >  
> >  	vma->vm_private_data = vdev;
> >  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > -	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> > +	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);  
> 
> This addition seems to be an accident. :) Thanks,

Nope, Jason noted on a previous version that io_remap_pfn_range() is
essentially:

  remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));

So since we switched to vmf_insert_pfn() I added this page protection
flag to the vma instead, then it gets removed later when we switch back
to io_remap_pfn_range().  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 19:59     ` Alex Williamson
@ 2021-08-10 20:20       ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2021-08-10 20:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Tue, Aug 10, 2021 at 01:59:32PM -0600, Alex Williamson wrote:
> On Tue, 10 Aug 2021 14:48:31 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:
> > > @@ -1690,7 +1554,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
> > >  
> > >  	vma->vm_private_data = vdev;
> > >  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > > -	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> > > +	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);  
> > 
> > This addition seems to be an accident. :) Thanks,
> 
> Nope, Jason noted on a previous version that io_remap_pfn_range() is
> essentially:
> 
>   remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));
> 
> So since we switched to vmf_insert_pfn() I added this page protection
> flag to the vma instead, then it gets removed later when we switch back
> to io_remap_pfn_range().  Thanks,

I see, I misread it as pgprot_noncached() previously.  Yes, it makes sense
to do so.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 19:02     ` Peter Xu
@ 2021-08-10 20:51       ` Alex Williamson
  0 siblings, 0 replies; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 20:51 UTC (permalink / raw)
  To: Peter Xu; +Cc: Christoph Hellwig, Jason Gunthorpe, linux-kernel, kvm

On Tue, 10 Aug 2021 15:02:10 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Tue, Aug 10, 2021 at 10:53:50AM +0200, Christoph Hellwig wrote:
> > On Thu, Aug 05, 2021 at 11:07:35AM -0600, Alex Williamson wrote:  
> > > +static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
> > >  {
> > > +	vfio_device_unmap_mapping_range(&vdev->vdev,
> > > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> > > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> > > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));  
> > 
> > Maybe make this a little more readable by having local variables:  
> 
> Or just pass in unmap_mapping_range(start=0, len=0)?  As unmap_mapping_range()
> understands len==0 as "to the end of file".  Thanks,

But we're not actually trying to unmap the entire device fd; we're only
targeting the ranges of it that correspond to standard MMIO resources
of the device.  Our vma-to-pfn function for vfio-pci also only knows
how to look up vmas in this range.  If there were mmap'able regions
outside of this, for example provided by vendor specific extensions, we
a) don't generically know if they're related to the device itself or to
some supporting information managed in software (ex. IGD OpRegion) and
b) don't know how to look up the pfn to remap them when MMIO is
re-enabled.

I don't think we have any such extensions today, but we might with
vfio-pci-core and we'd need to figure out how to include those regions
in some future work.  Thanks,
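
For readers following the arithmetic, the index-to-offset layout that
makes this range selection possible is the fixed 2^40-byte window per
region index from vfio_pci_private.h (reproduced from memory, so treat
as approximate):

	#define VFIO_PCI_OFFSET_SHIFT   40
	#define VFIO_PCI_OFFSET_TO_INDEX(off)	((off) >> VFIO_PCI_OFFSET_SHIFT)
	#define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
	#define VFIO_PCI_OFFSET_MASK	(((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)

So the [BAR0, ROM) offset range covers exactly the six standard BAR
windows in the device fd address space and nothing beyond them.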

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-05 17:08 ` [PATCH 7/7] vfio/pci: Remove map-on-fault behavior Alex Williamson
  2021-08-10  9:11   ` Christoph Hellwig
@ 2021-08-10 20:54   ` Peter Xu
  2021-08-10 21:45     ` Alex Williamson
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Xu @ 2021-08-10 20:54 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Thu, Aug 05, 2021 at 11:08:21AM -0600, Alex Williamson wrote:
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index 0aa542fa1e26..9aedb78a4ae3 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -128,6 +128,7 @@ struct vfio_pci_device {
>  	bool			needs_reset;
>  	bool			nointx;
>  	bool			needs_pm_restore;
> +	bool			zapped_bars;

Would it be nicer to invert the meaning of "zapped_bars" and rename it to
"memory_enabled"?  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-10 18:49         ` Peter Xu
@ 2021-08-10 21:16           ` Alex Williamson
  2021-08-10 22:18             ` Peter Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 21:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: Christoph Hellwig, Jason Gunthorpe, linux-kernel, kvm

On Tue, 10 Aug 2021 14:49:45 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Tue, Aug 10, 2021 at 03:57:29PM +0100, Christoph Hellwig wrote:
> > On Tue, Aug 10, 2021 at 08:52:54AM -0600, Alex Williamson wrote:  
> > > On Tue, 10 Aug 2021 10:43:29 +0200
> > > Christoph Hellwig <hch@infradead.org> wrote:
> > >   
> > > > > + * XXX Adopt the following when available:
> > > > > + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/    
> > > > 
> > > > No need for this link.  
> > > 
> > > Is that effort dead?  I've used the link several times myself to search
> > > for progress, so it's been useful to me.  Thanks,  
> > 
> > No, but it seems odd to have a reference to an old patchset in the kernel
> > tree.  
> 
> I learn from the reference too.  Maybe move it into the commit message?  Thanks,

TBH, I'm ok with it being "odd" if it's useful.  Right here we have two
instances of it being useful.  I don't think two lines of comment are
excessive, and we can always remove them when we either make the
conversion or give up on it.  Moving it to the commit log would just
bury it and make it pointless.

I don't have a more concise, current, or future-proof way to describe
the todo item than this link (ok, ok, I could s/lkml/r/ for 3 less
chars :-P).  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-10 20:54   ` Peter Xu
@ 2021-08-10 21:45     ` Alex Williamson
  2021-08-10 22:27       ` Peter Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 21:45 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Tue, 10 Aug 2021 16:54:19 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Aug 05, 2021 at 11:08:21AM -0600, Alex Williamson wrote:
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > index 0aa542fa1e26..9aedb78a4ae3 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -128,6 +128,7 @@ struct vfio_pci_device {
> >  	bool			needs_reset;
> >  	bool			nointx;
> >  	bool			needs_pm_restore;
> > +	bool			zapped_bars;  
> 
> Would it be nicer to invert the meaning of "zapped_bars" and rename it to
> "memory_enabled"?  Thanks,

I think this has its own downsides; for example, is this really less
confusing?:

  if (!vdev->memory_enabled && __vfio_pci_memory_enabled(vdev))

Are you specifically trying to invert the polarity or just get away
from the name proposed here?  We could use something like
"bars_unmapped", which would have the same polarity (OTOH,
"bars_mapped" suggests something to me about whether the user has
performed any mmaps of BARs).

I do wish there was a more elegant solution than an @var tracking this
state in general, but I haven't come up with such a solution yet.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 12:55           ` Christoph Hellwig
@ 2021-08-10 21:50             ` Alex Williamson
  2021-08-11 11:57               ` Jason Gunthorpe
  0 siblings, 1 reply; 38+ messages in thread
From: Alex Williamson @ 2021-08-10 21:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jason Gunthorpe, linux-kernel, kvm, peterx

On Tue, 10 Aug 2021 13:55:00 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Aug 10, 2021 at 08:57:22AM -0300, Jason Gunthorpe wrote:
> > I'm not sure there is a real performance win to chase here? Doesn't
> > this only protect mmap against reset? The mmap isn't performance
> > sensitive, right?
> > 
> > If this really needs extra optimization adding a rwsem to the devset
> > and using that across the whole set would surely be sufficient.  
> 
> Every mmio read or write takes memory_lock.

Exactly.  Ideally we're not using that path often, but I don't think
that's a good excuse to introduce memory access serialization, or even
dependencies between devices.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
  2021-08-10 21:16           ` Alex Williamson
@ 2021-08-10 22:18             ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2021-08-10 22:18 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Christoph Hellwig, Jason Gunthorpe, linux-kernel, kvm

On Tue, Aug 10, 2021 at 03:16:14PM -0600, Alex Williamson wrote:
> On Tue, 10 Aug 2021 14:49:45 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Tue, Aug 10, 2021 at 03:57:29PM +0100, Christoph Hellwig wrote:
> > > On Tue, Aug 10, 2021 at 08:52:54AM -0600, Alex Williamson wrote:  
> > > > On Tue, 10 Aug 2021 10:43:29 +0200
> > > > Christoph Hellwig <hch@infradead.org> wrote:
> > > >   
> > > > > > + * XXX Adopt the following when available:
> > > > > > + * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/    
> > > > > 
> > > > > No need for this link.  
> > > > 
> > > > Is that effort dead?  I've used the link several times myself to search
> > > > for progress, so it's been useful to me.  Thanks,  
> > > 
> > > No, but it seems odd to have a reference to an old patchset in the kernel
> > > tree.  
> > 
> > I learn from the reference too.  Maybe move it into the commit message?  Thanks,
> 
> TBH, I'm ok with it being "odd" if it's useful.  Right here we have two
> instances of it being useful.  I don't think two lines of comment are
> excessive, and we can always remove them when we either make the
> conversion or give up on it.  Moving it to the commit log would just
> bury it and make it pointless.
> 
> I don't have a more concise, current, or future-proof way to describe
> the todo item than this link (ok, ok, I could s/lkml/r/ for 3 less
> chars :-P).  Thanks,

Yep, fine by me - whether it's kept in the code, the commit message, or
the cover letter; it's still a useful reference until that series gets
merged.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
  2021-08-10 21:45     ` Alex Williamson
@ 2021-08-10 22:27       ` Peter Xu
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Xu @ 2021-08-10 22:27 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, linux-kernel, kvm

On Tue, Aug 10, 2021 at 03:45:12PM -0600, Alex Williamson wrote:
> On Tue, 10 Aug 2021 16:54:19 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Thu, Aug 05, 2021 at 11:08:21AM -0600, Alex Williamson wrote:
> > > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > > index 0aa542fa1e26..9aedb78a4ae3 100644
> > > --- a/drivers/vfio/pci/vfio_pci_private.h
> > > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > > @@ -128,6 +128,7 @@ struct vfio_pci_device {
> > >  	bool			needs_reset;
> > >  	bool			nointx;
> > >  	bool			needs_pm_restore;
> > > +	bool			zapped_bars;  
> > 
> > Would it be nicer to invert the meaning of "zapped_bars" and rename it to
> > "memory_enabled"?  Thanks,
> 
> I think this has its own downsides; for example, is this really less
> confusing?:
> 
>   if (!vdev->memory_enabled && __vfio_pci_memory_enabled(vdev))

Maybe "memory_enabled_last"?  No strong opinion, especially for namings. :)
zapped_bars still looks okay to me.  Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
  2021-08-10 21:50             ` Alex Williamson
@ 2021-08-11 11:57               ` Jason Gunthorpe
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Gunthorpe @ 2021-08-11 11:57 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Christoph Hellwig, linux-kernel, kvm, peterx

On Tue, Aug 10, 2021 at 03:50:58PM -0600, Alex Williamson wrote:
> On Tue, 10 Aug 2021 13:55:00 +0100
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Tue, Aug 10, 2021 at 08:57:22AM -0300, Jason Gunthorpe wrote:
> > > I'm not sure there is a real performance win to chase here? Doesn't
> > > this only protect mmap against reset? The mmap isn't performance
> > > sensitive, right?
> > > 
> > > If this really needs extra optimization adding a rwsem to the devset
> > > and using that across the whole set would surely be sufficient.  
> > 
> > Every mmio read or write takes memory_lock.
> 
> Exactly.  Ideally we're not using that path often, but I don't think
> that's a good excuse to introduce memory access serialization, or even
> dependencies between devices.  Thanks,

But a cross-device rwsem seems OK to me?  It won't contend unless we
are trying to reset, and the upgrade to a percpu rwsem to optimize away
the atomic doesn't seem warranted since this path already has a vmexit
and a syscall in it?

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2021-08-11 11:57 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-05 17:06 [PATCH 0/7] vfio: device fd address space and vfio-pci mmap invalidation cleanup Alex Williamson
2021-08-05 17:07 ` [PATCH 1/7] vfio: Create vfio_fs_type with inode per device Alex Williamson
2021-08-10  8:43   ` Christoph Hellwig
2021-08-10 14:52     ` Alex Williamson
2021-08-10 14:57       ` Christoph Hellwig
2021-08-10 18:49         ` Peter Xu
2021-08-10 21:16           ` Alex Williamson
2021-08-10 22:18             ` Peter Xu
2021-08-05 17:07 ` [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper Alex Williamson
2021-08-10  8:45   ` Christoph Hellwig
2021-08-10 18:56   ` Peter Xu
2021-08-05 17:07 ` [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range() Alex Williamson
2021-08-06  1:04   ` Jason Gunthorpe
2021-08-06 20:17     ` Alex Williamson
2021-08-10  8:51       ` Christoph Hellwig
2021-08-10 11:57         ` Jason Gunthorpe
2021-08-10 12:55           ` Christoph Hellwig
2021-08-10 21:50             ` Alex Williamson
2021-08-11 11:57               ` Jason Gunthorpe
2021-08-10  8:53   ` Christoph Hellwig
2021-08-10 19:02     ` Peter Xu
2021-08-10 20:51       ` Alex Williamson
2021-08-10 18:48   ` Peter Xu
2021-08-10 19:59     ` Alex Williamson
2021-08-10 20:20       ` Peter Xu
2021-08-05 17:07 ` [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback Alex Williamson
2021-08-06  1:01   ` Jason Gunthorpe
2021-08-10  9:00     ` Christoph Hellwig
2021-08-10  9:00   ` Christoph Hellwig
2021-08-05 17:08 ` [PATCH 5/7] mm/interval_tree.c: Export vma interval tree iterators Alex Williamson
2021-08-05 17:08 ` [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range() Alex Williamson
2021-08-10  9:04   ` Christoph Hellwig
2021-08-05 17:08 ` [PATCH 7/7] vfio/pci: Remove map-on-fault behavior Alex Williamson
2021-08-10  9:11   ` Christoph Hellwig
2021-08-10 15:04     ` Alex Williamson
2021-08-10 20:54   ` Peter Xu
2021-08-10 21:45     ` Alex Williamson
2021-08-10 22:27       ` Peter Xu
