* [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO
@ 2020-05-22 19:17 Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 1/3] vfio/type1: Support faulting PFNMAP vmas Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-22 19:17 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx, cai

v3:

The memory_lock semaphore is only held in the MSI-X path for callouts
to functions that may access MSI-X MMIO space of the device; this
should resolve the circular locking dependency reported by Qian
(re-testing very much appreciated).  I've also incorporated the
pci_map_rom() and pci_unmap_rom() calls under the memory_lock.  Commit
0cfd027be1d6 ("vfio_pci: Enable memory accesses before calling
pci_map_rom") made sure memory was enabled on the info path, but did
not provide locking to protect that state.  The r/w path of the BAR
access is expanded to include ROM mapping/unmapping.  Unless there
are objections, I'll plan to drop v2 from my next branch and replace
it with this.  Thanks,

Alex

v2:

Locking in 3/ is substantially changed to avoid the retry scenario
within the fault handler; therefore a caller who does not allow retry
will no longer receive a SIGBUS on contention.  IOMMU invalidations
are still not included here; I expect that will be a future follow-on
change, as we're not fundamentally changing that issue in this series.
The 'add to vma list only on fault' behavior is also still included
here; per the discussion, I think it's still a valid approach and has
some advantages, particularly in a VM scenario where we potentially
defer the mapping until the MMIO BAR is actually DMA mapped into the
VM address space (or the guest driver actually accesses the device
if that DMA mapping is eliminated at some point).  Further discussion
and review appreciated.  Thanks,

Alex

v1:

Add tracking of the device memory enable bit and block/fault accesses
to device MMIO space while disabled.  This provides synchronous fault
handling for CPU accesses to the device and prevents the user from
triggering platform level error handling present on some systems.
Device reset and MSI-X vector table accesses are also included such
that access is blocked across reset and vector table accesses do not
depend on the user configuration of the device.
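
To make the intent concrete, the check that ends up on the vfio device
fd read/write path (patch 3/) is shaped roughly like the sketch below.
This is only a sketch using the helper and semaphore names introduced
by this series; the real code is in vfio_pci_bar_rw():

	down_read(&vdev->memory_lock);
	if (!__vfio_pci_memory_enabled(vdev)) {
		up_read(&vdev->memory_lock);
		return -EIO;	/* memory decode disabled, block the access */
	}
	/* ... perform the BAR access ... */
	up_read(&vdev->memory_lock);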

This is based on the vfio for-linus branch currently in next, making
use of follow_pfn() in vaddr_get_pfn() and therefore requiring patch
1/ to force the user fault in the case that a PFNMAP vma might be
DMA mapped before user access.  Further PFNMAP iommu invalidation
tracking is not yet included here.

As noted in the comments, I'm copying quite a bit of the logic from
rdma code for performing the zap_vma_ptes() calls and I'm also
attempting to resolve lock ordering issues in the fault handler to
lockdep's satisfaction.  I appreciate extra eyes on these sections in
particular.

I expect this to be functionally equivalent for any well-behaved
userspace driver, but obviously there is a potential for the user to
get -EIO or SIGBUS on device access.  The device is provided to the
user enabled, and device resets will restore the command register, so
by my evaluation a user would need to explicitly disable the memory
enable bit to trigger these faults.  We could potentially remap vmas
to a zero page rather than SIGBUS if we experience regressions, but
without known code requiring that, SIGBUS seems the appropriate
response to this condition.  Thanks,

Alex

---

Alex Williamson (3):
      vfio/type1: Support faulting PFNMAP vmas
      vfio-pci: Fault mmaps to enable vma tracking
      vfio-pci: Invalidate mmaps and block MMIO access on disabled memory


 drivers/vfio/pci/vfio_pci.c         |  349 ++++++++++++++++++++++++++++++++---
 drivers/vfio/pci/vfio_pci_config.c  |   36 +++-
 drivers/vfio/pci/vfio_pci_intrs.c   |   14 +
 drivers/vfio/pci/vfio_pci_private.h |   15 ++
 drivers/vfio/pci/vfio_pci_rdwr.c    |   24 ++
 drivers/vfio/vfio_iommu_type1.c     |   36 +++-
 6 files changed, 435 insertions(+), 39 deletions(-)



* [PATCH v3 1/3] vfio/type1: Support faulting PFNMAP vmas
  2020-05-22 19:17 [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
@ 2020-05-22 19:17 ` Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 2/3] vfio-pci: Fault mmaps to enable vma tracking Alex Williamson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-22 19:17 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx, cai

With conversion to follow_pfn(), DMA mapping a PFNMAP range depends on
the range being faulted into the vma.  Add support to manually provide
that, in the same way as done on KVM with hva_to_pfn_remapped().

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio_iommu_type1.c |   36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cc1d64765ce7..4a4cb7cd86b2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -317,6 +317,32 @@ static int put_pfn(unsigned long pfn, int prot)
 	return 0;
 }
 
+static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
+			    unsigned long vaddr, unsigned long *pfn,
+			    bool write_fault)
+{
+	int ret;
+
+	ret = follow_pfn(vma, vaddr, pfn);
+	if (ret) {
+		bool unlocked = false;
+
+		ret = fixup_user_fault(NULL, mm, vaddr,
+				       FAULT_FLAG_REMOTE |
+				       (write_fault ?  FAULT_FLAG_WRITE : 0),
+				       &unlocked);
+		if (unlocked)
+			return -EAGAIN;
+
+		if (ret)
+			return ret;
+
+		ret = follow_pfn(vma, vaddr, pfn);
+	}
+
+	return ret;
+}
+
 static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 			 int prot, unsigned long *pfn)
 {
@@ -339,12 +365,16 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 	vaddr = untagged_addr(vaddr);
 
+retry:
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		if (!follow_pfn(vma, vaddr, pfn) &&
-		    is_invalid_reserved_pfn(*pfn))
-			ret = 0;
+		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+		if (ret == -EAGAIN)
+			goto retry;
+
+		if (!ret && !is_invalid_reserved_pfn(*pfn))
+			ret = -EFAULT;
 	}
 done:
 	up_read(&mm->mmap_sem);



* [PATCH v3 2/3] vfio-pci: Fault mmaps to enable vma tracking
  2020-05-22 19:17 [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 1/3] vfio/type1: Support faulting PFNMAP vmas Alex Williamson
@ 2020-05-22 19:17 ` Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Alex Williamson
  2020-05-22 22:08 ` [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Qian Cai
  3 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-22 19:17 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx, cai

Rather than calling remap_pfn_range() when a region is mmap'd, set up
a vm_ops handler to support dynamic faulting of the range on access.
This allows us to manage a list of vmas actively mapping the area that
we can later use to invalidate those mappings.  The open callback
invalidates the vma range so that all tracking is inserted in the
fault handler and removed in the close handler.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |   76 ++++++++++++++++++++++++++++++++++-
 drivers/vfio/pci/vfio_pci_private.h |    7 +++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 6c6b37b5c04e..66a545a01f8f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1299,6 +1299,70 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
+static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
+			    struct vm_area_struct *vma)
+{
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
+	if (!mmap_vma)
+		return -ENOMEM;
+
+	mmap_vma->vma = vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_add(&mmap_vma->vma_next, &vdev->vma_list);
+	mutex_unlock(&vdev->vma_lock);
+
+	return 0;
+}
+
+/*
+ * Zap mmaps on open so that we can fault them in on access and therefore
+ * our vma_list only tracks mappings accessed since last zap.
+ */
+static void vfio_pci_mmap_open(struct vm_area_struct *vma)
+{
+	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+}
+
+static void vfio_pci_mmap_close(struct vm_area_struct *vma)
+{
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_mmap_vma *mmap_vma;
+
+	mutex_lock(&vdev->vma_lock);
+	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
+		if (mmap_vma->vma == vma) {
+			list_del(&mmap_vma->vma_next);
+			kfree(mmap_vma);
+			break;
+		}
+	}
+	mutex_unlock(&vdev->vma_lock);
+}
+
+static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+
+	if (vfio_pci_add_vma(vdev, vma))
+		return VM_FAULT_OOM;
+
+	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
+		return VM_FAULT_SIGBUS;
+
+	return VM_FAULT_NOPAGE;
+}
+
+static const struct vm_operations_struct vfio_pci_mmap_ops = {
+	.open = vfio_pci_mmap_open,
+	.close = vfio_pci_mmap_close,
+	.fault = vfio_pci_mmap_fault,
+};
+
 static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
@@ -1357,8 +1421,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
 	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
 
-	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       req_len, vma->vm_page_prot);
+	/*
+	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
+	 * change vm_flags within the fault handler.  Set them now.
+	 */
+	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+	vma->vm_ops = &vfio_pci_mmap_ops;
+
+	return 0;
 }
 
 static void vfio_pci_request(void *device_data, unsigned int count)
@@ -1608,6 +1678,8 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	mutex_init(&vdev->vma_lock);
+	INIT_LIST_HEAD(&vdev->vma_list);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 36ec69081ecd..9b25f9f6ce1d 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -92,6 +92,11 @@ struct vfio_pci_vf_token {
 	int			users;
 };
 
+struct vfio_pci_mmap_vma {
+	struct vm_area_struct	*vma;
+	struct list_head	vma_next;
+};
+
 struct vfio_pci_device {
 	struct pci_dev		*pdev;
 	void __iomem		*barmap[PCI_STD_NUM_BARS];
@@ -132,6 +137,8 @@ struct vfio_pci_device {
 	struct list_head	ioeventfds_list;
 	struct vfio_pci_vf_token	*vf_token;
 	struct notifier_block	nb;
+	struct mutex		vma_lock;
+	struct list_head	vma_list;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)



* [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-22 19:17 [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 1/3] vfio/type1: Support faulting PFNMAP vmas Alex Williamson
  2020-05-22 19:17 ` [PATCH v3 2/3] vfio-pci: Fault mmaps to enable vma tracking Alex Williamson
@ 2020-05-22 19:17 ` Alex Williamson
  2020-05-23 19:34   ` Peter Xu
  2020-05-22 22:08 ` [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Qian Cai
  3 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2020-05-22 19:17 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx, cai

Accessing the disabled memory space of a PCI device would typically
result in a master abort response on conventional PCI, or an
unsupported request on PCI express.  The user would generally see
these as a -1 response for the read return data and the write would be
silently discarded, possibly with an uncorrected, non-fatal AER error
triggered on the host.  Some systems however take it upon themselves
to bring down the entire system when they see something that might
indicate a loss of data, such as this discarded write to a disabled
memory space.

To avoid this, we want to try to block the user from accessing memory
spaces while they're disabled.  We start with a semaphore around the
memory enable bit, where writers modify the memory enable state and
must be serialized, while readers make use of the memory region and
can access in parallel.  Writers include both direct manipulation via
the command register, as well as any reset path where the internal
mechanics of the reset may both explicitly and implicitly disable
memory access, and manipulation of the MSI-X configuration, where the
MSI-X vector table resides in MMIO space of the device.  Readers
include the read and write file ops to access the vfio device fd
offsets as well as memory mapped access.  In the latter case, we make
use of our new vma list support to zap, or invalidate, those memory
mappings in order to force them to be faulted back in on access.
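
For example, the writer-side pattern around a device reset (as used in
the VFIO_DEVICE_RESET ioctl below) reduces to roughly the following
sketch, using the helpers added by this patch; mappings are zapped and
memory_lock is held for write across the reset so that concurrent
read/write or mmap accesses either block or fault:

	vfio_pci_zap_and_down_write_memory_lock(vdev);
	ret = pci_try_reset_function(vdev->pdev);
	up_write(&vdev->memory_lock);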

Our semaphore usage will stall user access to MMIO spaces across
internal operations like reset, but the user might experience new
behavior when trying to access the MMIO space while disabled via the
PCI command register.  Access via read or write while disabled will
return -EIO and access via memory maps will result in a SIGBUS.  This
is expected to be compatible with known use cases and potentially
provides better error handling capabilities than present in the
hardware, while avoiding the more readily accessible and severe
platform error responses that might otherwise occur.

Fixes: CVE-2020-12888
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c         |  291 +++++++++++++++++++++++++++++++----
 drivers/vfio/pci/vfio_pci_config.c  |   36 ++++
 drivers/vfio/pci/vfio_pci_intrs.c   |   14 ++
 drivers/vfio/pci/vfio_pci_private.h |    8 +
 drivers/vfio/pci/vfio_pci_rdwr.c    |   24 ++-
 5 files changed, 330 insertions(+), 43 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 66a545a01f8f..aabba6439a5b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -26,6 +26,7 @@
 #include <linux/vfio.h>
 #include <linux/vgaarb.h>
 #include <linux/nospec.h>
+#include <linux/sched/mm.h>
 
 #include "vfio_pci_private.h"
 
@@ -184,6 +185,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 static void vfio_pci_disable(struct vfio_pci_device *vdev);
+static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -736,6 +738,12 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 	return 0;
 }
 
+struct vfio_devices {
+	struct vfio_device **devices;
+	int cur_index;
+	int max_index;
+};
+
 static long vfio_pci_ioctl(void *device_data,
 			   unsigned int cmd, unsigned long arg)
 {
@@ -809,7 +817,7 @@ static long vfio_pci_ioctl(void *device_data,
 		{
 			void __iomem *io;
 			size_t size;
-			u16 orig_cmd;
+			u16 cmd;
 
 			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
 			info.flags = 0;
@@ -829,10 +837,7 @@ static long vfio_pci_ioctl(void *device_data,
 			 * Is it really there?  Enable memory decode for
 			 * implicit access in pci_map_rom().
 			 */
-			pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd);
-			pci_write_config_word(pdev, PCI_COMMAND,
-					      orig_cmd | PCI_COMMAND_MEMORY);
-
+			cmd = vfio_pci_memory_lock_and_enable(vdev);
 			io = pci_map_rom(pdev, &size);
 			if (io) {
 				info.flags = VFIO_REGION_INFO_FLAG_READ;
@@ -840,8 +845,8 @@ static long vfio_pci_ioctl(void *device_data,
 			} else {
 				info.size = 0;
 			}
+			vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
-			pci_write_config_word(pdev, PCI_COMMAND, orig_cmd);
 			break;
 		}
 		case VFIO_PCI_VGA_REGION_INDEX:
@@ -984,8 +989,16 @@ static long vfio_pci_ioctl(void *device_data,
 		return ret;
 
 	} else if (cmd == VFIO_DEVICE_RESET) {
-		return vdev->reset_works ?
-			pci_try_reset_function(vdev->pdev) : -EINVAL;
+		int ret;
+
+		if (!vdev->reset_works)
+			return -EINVAL;
+
+		vfio_pci_zap_and_down_write_memory_lock(vdev);
+		ret = pci_try_reset_function(vdev->pdev);
+		up_write(&vdev->memory_lock);
+
+		return ret;
 
 	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
 		struct vfio_pci_hot_reset_info hdr;
@@ -1065,8 +1078,9 @@ static long vfio_pci_ioctl(void *device_data,
 		int32_t *group_fds;
 		struct vfio_pci_group_entry *groups;
 		struct vfio_pci_group_info info;
+		struct vfio_devices devs = { .cur_index = 0 };
 		bool slot = false;
-		int i, count = 0, ret = 0;
+		int i, group_idx, mem_idx = 0, count = 0, ret = 0;
 
 		minsz = offsetofend(struct vfio_pci_hot_reset, count);
 
@@ -1118,9 +1132,9 @@ static long vfio_pci_ioctl(void *device_data,
 		 * user interface and store the group and iommu ID.  This
 		 * ensures the group is held across the reset.
 		 */
-		for (i = 0; i < hdr.count; i++) {
+		for (group_idx = 0; group_idx < hdr.count; group_idx++) {
 			struct vfio_group *group;
-			struct fd f = fdget(group_fds[i]);
+			struct fd f = fdget(group_fds[group_idx]);
 			if (!f.file) {
 				ret = -EBADF;
 				break;
@@ -1133,8 +1147,9 @@ static long vfio_pci_ioctl(void *device_data,
 				break;
 			}
 
-			groups[i].group = group;
-			groups[i].id = vfio_external_user_iommu_id(group);
+			groups[group_idx].group = group;
+			groups[group_idx].id =
+					vfio_external_user_iommu_id(group);
 		}
 
 		kfree(group_fds);
@@ -1153,13 +1168,63 @@ static long vfio_pci_ioctl(void *device_data,
 		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
 						    vfio_pci_validate_devs,
 						    &info, slot);
-		if (!ret)
-			/* User has access, do the reset */
-			ret = pci_reset_bus(vdev->pdev);
+		if (ret)
+			goto hot_reset_release;
+
+		devs.max_index = count;
+		devs.devices = kcalloc(count, sizeof(struct vfio_device *),
+				       GFP_KERNEL);
+		if (!devs.devices) {
+			ret = -ENOMEM;
+			goto hot_reset_release;
+		}
+
+		/*
+		 * We need to get memory_lock for each device, but devices
+		 * can share mmap_sem, therefore we need to zap and hold
+		 * the vma_lock for each device, and only then get each
+		 * memory_lock.
+		 */
+		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
+					    vfio_pci_try_zap_and_vma_lock_cb,
+					    &devs, slot);
+		if (ret)
+			goto hot_reset_release;
+
+		for (; mem_idx < devs.cur_index; mem_idx++) {
+			struct vfio_pci_device *tmp;
+
+			tmp = vfio_device_data(devs.devices[mem_idx]);
+
+			ret = down_write_trylock(&tmp->memory_lock);
+			if (!ret) {
+				ret = -EBUSY;
+				goto hot_reset_release;
+			}
+			mutex_unlock(&tmp->vma_lock);
+		}
+
+		/* User has access, do the reset */
+		ret = pci_reset_bus(vdev->pdev);
 
 hot_reset_release:
-		for (i--; i >= 0; i--)
-			vfio_group_put_external_user(groups[i].group);
+		for (i = 0; i < devs.cur_index; i++) {
+			struct vfio_device *device;
+			struct vfio_pci_device *tmp;
+
+			device = devs.devices[i];
+			tmp = vfio_device_data(device);
+
+			if (i < mem_idx)
+				up_write(&tmp->memory_lock);
+			else
+				mutex_unlock(&tmp->vma_lock);
+			vfio_device_put(device);
+		}
+		kfree(devs.devices);
+
+		for (group_idx--; group_idx >= 0; group_idx--)
+			vfio_group_put_external_user(groups[group_idx].group);
 
 		kfree(groups);
 		return ret;
@@ -1299,8 +1364,126 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
-static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
-			    struct vm_area_struct *vma)
+/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
+static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+{
+	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
+
+	/*
+	 * Lock ordering:
+	 * vma_lock is nested under mmap_sem for vm_ops callback paths.
+	 * The memory_lock semaphore is used by both code paths calling
+	 * into this function to zap vmas and the vm_ops.fault callback
+	 * to protect the memory enable state of the device.
+	 *
+	 * When zapping vmas we need to maintain the mmap_sem => vma_lock
+	 * ordering, which requires using vma_lock to walk vma_list to
+	 * acquire an mm, then dropping vma_lock to get the mmap_sem and
+	 * reacquiring vma_lock.  This logic is derived from similar
+	 * requirements in uverbs_user_mmap_disassociate().
+	 *
+	 * mmap_sem must always be the top-level lock when it is taken.
+	 * Therefore we can only hold the memory_lock write lock when
+	 * vma_list is empty, as we'd need to take mmap_sem to clear
+	 * entries.  vma_list can only be guaranteed empty when holding
+	 * vma_lock, thus memory_lock is nested under vma_lock.
+	 *
+	 * This enables the vm_ops.fault callback to acquire vma_lock,
+	 * followed by memory_lock read lock, while already holding
+	 * mmap_sem without risk of deadlock.
+	 */
+	while (1) {
+		struct mm_struct *mm = NULL;
+
+		if (try) {
+			if (!mutex_trylock(&vdev->vma_lock))
+				return 0;
+		} else {
+			mutex_lock(&vdev->vma_lock);
+		}
+		while (!list_empty(&vdev->vma_list)) {
+			mmap_vma = list_first_entry(&vdev->vma_list,
+						    struct vfio_pci_mmap_vma,
+						    vma_next);
+			mm = mmap_vma->vma->vm_mm;
+			if (mmget_not_zero(mm))
+				break;
+
+			list_del(&mmap_vma->vma_next);
+			kfree(mmap_vma);
+			mm = NULL;
+		}
+		if (!mm)
+			return 1;
+		mutex_unlock(&vdev->vma_lock);
+
+		if (try) {
+			if (!down_read_trylock(&mm->mmap_sem)) {
+				mmput(mm);
+				return 0;
+			}
+		} else {
+			down_read(&mm->mmap_sem);
+		}
+		if (mmget_still_valid(mm)) {
+			if (try) {
+				if (!mutex_trylock(&vdev->vma_lock)) {
+					up_read(&mm->mmap_sem);
+					mmput(mm);
+					return 0;
+				}
+			} else {
+				mutex_lock(&vdev->vma_lock);
+			}
+			list_for_each_entry_safe(mmap_vma, tmp,
+						 &vdev->vma_list, vma_next) {
+				struct vm_area_struct *vma = mmap_vma->vma;
+
+				if (vma->vm_mm != mm)
+					continue;
+
+				list_del(&mmap_vma->vma_next);
+				kfree(mmap_vma);
+
+				zap_vma_ptes(vma, vma->vm_start,
+					     vma->vm_end - vma->vm_start);
+			}
+			mutex_unlock(&vdev->vma_lock);
+		}
+		up_read(&mm->mmap_sem);
+		mmput(mm);
+	}
+}
+
+void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
+{
+	vfio_pci_zap_and_vma_lock(vdev, false);
+	down_write(&vdev->memory_lock);
+	mutex_unlock(&vdev->vma_lock);
+}
+
+u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
+{
+	u16 cmd;
+
+	down_write(&vdev->memory_lock);
+	pci_read_config_word(vdev->pdev, PCI_COMMAND, &cmd);
+	if (!(cmd & PCI_COMMAND_MEMORY))
+		pci_write_config_word(vdev->pdev, PCI_COMMAND,
+				      cmd | PCI_COMMAND_MEMORY);
+
+	return cmd;
+}
+
+void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
+{
+	pci_write_config_word(vdev->pdev, PCI_COMMAND, cmd);
+	up_write(&vdev->memory_lock);
+}
+
+/* Caller holds vma_lock */
+static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
+			      struct vm_area_struct *vma)
 {
 	struct vfio_pci_mmap_vma *mmap_vma;
 
@@ -1309,10 +1492,7 @@ static int vfio_pci_add_vma(struct vfio_pci_device *vdev,
 		return -ENOMEM;
 
 	mmap_vma->vma = vma;
-
-	mutex_lock(&vdev->vma_lock);
 	list_add(&mmap_vma->vma_next, &vdev->vma_list);
-	mutex_unlock(&vdev->vma_lock);
 
 	return 0;
 }
@@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_device *vdev = vma->vm_private_data;
+	vm_fault_t ret = VM_FAULT_NOPAGE;
+
+	mutex_lock(&vdev->vma_lock);
+	down_read(&vdev->memory_lock);
+
+	if (!__vfio_pci_memory_enabled(vdev)) {
+		ret = VM_FAULT_SIGBUS;
+		mutex_unlock(&vdev->vma_lock);
+		goto up_out;
+	}
 
-	if (vfio_pci_add_vma(vdev, vma))
-		return VM_FAULT_OOM;
+	if (__vfio_pci_add_vma(vdev, vma)) {
+		ret = VM_FAULT_OOM;
+		mutex_unlock(&vdev->vma_lock);
+		goto up_out;
+	}
+
+	mutex_unlock(&vdev->vma_lock);
 
 	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
 			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
-		return VM_FAULT_SIGBUS;
+		ret = VM_FAULT_SIGBUS;
 
-	return VM_FAULT_NOPAGE;
+up_out:
+	up_read(&vdev->memory_lock);
+	return ret;
 }
 
 static const struct vm_operations_struct vfio_pci_mmap_ops = {
@@ -1680,6 +1877,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
 	mutex_init(&vdev->vma_lock);
 	INIT_LIST_HEAD(&vdev->vma_list);
+	init_rwsem(&vdev->memory_lock);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret)
@@ -1933,12 +2131,6 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
 }
 
-struct vfio_devices {
-	struct vfio_device **devices;
-	int cur_index;
-	int max_index;
-};
-
 static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
@@ -1969,6 +2161,39 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	return 0;
 }
 
+static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
+{
+	struct vfio_devices *devs = data;
+	struct vfio_device *device;
+	struct vfio_pci_device *vdev;
+
+	if (devs->cur_index == devs->max_index)
+		return -ENOSPC;
+
+	device = vfio_device_get_from_dev(&pdev->dev);
+	if (!device)
+		return -EINVAL;
+
+	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
+
+	vdev = vfio_device_data(device);
+
+	/*
+	 * Locking multiple devices is prone to deadlock, runaway and
+	 * unwind if we hit contention.
+	 */
+	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
+		vfio_device_put(device);
+		return -EBUSY;
+	}
+
+	devs->devices[devs->cur_index++] = device;
+	return 0;
+}
+
 /*
  * If a bus or slot reset is available for the provided device and:
  *  - All of the devices affected by that bus or slot reset are unused
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 90c0b80f8acf..3dcddbd572e6 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -395,6 +395,14 @@ static inline void p_setd(struct perm_bits *p, int off, u32 virt, u32 write)
 	*(__le32 *)(&p->write[off]) = cpu_to_le32(write);
 }
 
+/* Caller should hold memory_lock semaphore */
+bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev)
+{
+	u16 cmd = le16_to_cpu(*(__le16 *)&vdev->vconfig[PCI_COMMAND]);
+
+	return cmd & PCI_COMMAND_MEMORY;
+}
+
 /*
  * Restore the *real* BARs after we detect a FLR or backdoor reset.
  * (backdoor = some device specific technique that we didn't catch)
@@ -556,13 +564,18 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 
 		new_cmd = le32_to_cpu(val);
 
+		phys_io = !!(phys_cmd & PCI_COMMAND_IO);
+		virt_io = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_IO);
+		new_io = !!(new_cmd & PCI_COMMAND_IO);
+
 		phys_mem = !!(phys_cmd & PCI_COMMAND_MEMORY);
 		virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 		new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-		phys_io = !!(phys_cmd & PCI_COMMAND_IO);
-		virt_io = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_IO);
-		new_io = !!(new_cmd & PCI_COMMAND_IO);
+		if (!new_mem)
+			vfio_pci_zap_and_down_write_memory_lock(vdev);
+		else
+			down_write(&vdev->memory_lock);
 
 		/*
 		 * If the user is writing mem/io enable (new_mem/io) and we
@@ -579,8 +592,11 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 	}
 
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
-	if (count < 0)
+	if (count < 0) {
+		if (offset == PCI_COMMAND)
+			up_write(&vdev->memory_lock);
 		return count;
+	}
 
 	/*
 	 * Save current memory/io enable bits in vconfig to allow for
@@ -591,6 +607,8 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
+
+		up_write(&vdev->memory_lock);
 	}
 
 	/* Emulate INTx disable */
@@ -828,8 +846,11 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
 						 pos - offset + PCI_EXP_DEVCAP,
 						 &cap);
 
-		if (!ret && (cap & PCI_EXP_DEVCAP_FLR))
+		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
+			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
+			up_write(&vdev->memory_lock);
+		}
 	}
 
 	/*
@@ -907,8 +928,11 @@ static int vfio_af_config_write(struct vfio_pci_device *vdev, int pos,
 						pos - offset + PCI_AF_CAP,
 						&cap);
 
-		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP))
+		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
+			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
+			up_write(&vdev->memory_lock);
+		}
 	}
 
 	return count;
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 2056f3f85f59..1d9fb2592945 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -249,6 +249,7 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int flag = msix ? PCI_IRQ_MSIX : PCI_IRQ_MSI;
 	int ret;
+	u16 cmd;
 
 	if (!is_irq_none(vdev))
 		return -EINVAL;
@@ -258,13 +259,16 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
 		return -ENOMEM;
 
 	/* return the number of supported vectors if we can't get all: */
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
 	if (ret < nvec) {
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
+		vfio_pci_memory_unlock_and_restore(vdev, cmd);
 		kfree(vdev->ctx);
 		return ret;
 	}
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
 	vdev->num_ctx = nvec;
 	vdev->irq_type = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
@@ -287,6 +291,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	struct pci_dev *pdev = vdev->pdev;
 	struct eventfd_ctx *trigger;
 	int irq, ret;
+	u16 cmd;
 
 	if (vector < 0 || vector >= vdev->num_ctx)
 		return -EINVAL;
@@ -295,7 +300,11 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 
 	if (vdev->ctx[vector].trigger) {
 		irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
+
+		cmd = vfio_pci_memory_lock_and_enable(vdev);
 		free_irq(irq, vdev->ctx[vector].trigger);
+		vfio_pci_memory_unlock_and_restore(vdev, cmd);
+
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(vdev->ctx[vector].trigger);
 		vdev->ctx[vector].trigger = NULL;
@@ -323,6 +332,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	 * such a reset it would be unsuccessful. To avoid this, restore the
 	 * cached value of the message prior to enabling.
 	 */
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	if (msix) {
 		struct msi_msg msg;
 
@@ -332,6 +342,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 	if (ret) {
 		kfree(vdev->ctx[vector].name);
 		eventfd_ctx_put(trigger);
@@ -376,6 +387,7 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int i;
+	u16 cmd;
 
 	for (i = 0; i < vdev->num_ctx; i++) {
 		vfio_virqfd_disable(&vdev->ctx[i].unmask);
@@ -384,7 +396,9 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
 
 	vfio_msi_set_block(vdev, 0, vdev->num_ctx, NULL, msix);
 
+	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	pci_free_irq_vectors(pdev);
+	vfio_pci_memory_unlock_and_restore(vdev, cmd);
 
 	/*
 	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 9b25f9f6ce1d..86a02aff8735 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -139,6 +139,7 @@ struct vfio_pci_device {
 	struct notifier_block	nb;
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
+	struct rw_semaphore	memory_lock;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
@@ -181,6 +182,13 @@ extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
 				    pci_power_t state);
 
+extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
+extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
+						    *vdev);
+extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
+extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
+					       u16 cmd);
+
 #ifdef CONFIG_VFIO_PCI_IGD
 extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
 #else
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index a87992892a9f..916b184df3a5 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -162,6 +162,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	size_t x_start = 0, x_end = 0;
 	resource_size_t end;
 	void __iomem *io;
+	struct resource *res = &vdev->pdev->resource[bar];
 	ssize_t done;
 
 	if (pci_resource_start(pdev, bar))
@@ -177,6 +178,14 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 
 	count = min(count, (size_t)(end - pos));
 
+	if (res->flags & IORESOURCE_MEM) {
+		down_read(&vdev->memory_lock);
+		if (!__vfio_pci_memory_enabled(vdev)) {
+			up_read(&vdev->memory_lock);
+			return -EIO;
+		}
+	}
+
 	if (bar == PCI_ROM_RESOURCE) {
 		/*
 		 * The ROM can fill less space than the BAR, so we start the
@@ -184,13 +193,17 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 		 * filling large ROM BARs much faster.
 		 */
 		io = pci_map_rom(pdev, &x_start);
-		if (!io)
-			return -ENOMEM;
+		if (!io) {
+			done = -ENOMEM;
+			goto out;
+		}
 		x_end = end;
 	} else {
 		int ret = vfio_pci_setup_barmap(vdev, bar);
-		if (ret)
-			return ret;
+		if (ret) {
+			done = ret;
+			goto out;
+		}
 
 		io = vdev->barmap[bar];
 	}
@@ -207,6 +220,9 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 
 	if (bar == PCI_ROM_RESOURCE)
 		pci_unmap_rom(pdev, io);
+out:
+	if (res->flags & IORESOURCE_MEM)
+		up_read(&vdev->memory_lock);
 
 	return done;
 }



* Re: [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO
  2020-05-22 19:17 [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
                   ` (2 preceding siblings ...)
  2020-05-22 19:17 ` [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Alex Williamson
@ 2020-05-22 22:08 ` Qian Cai
  2020-05-22 22:25   ` Alex Williamson
  3 siblings, 1 reply; 24+ messages in thread
From: Qian Cai @ 2020-05-22 22:08 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, jgg, peterx

On Fri, May 22, 2020 at 01:17:09PM -0600, Alex Williamson wrote:
> v3:
> 
> The memory_lock semaphore is only held in the MSI-X path for callouts
> to functions that may access MSI-X MMIO space of the device, this
> should resolve the circular locking dependency reported by Qian
> (re-testing very much appreciated).  I've also incorporated the
> pci_map_rom() and pci_unmap_rom() calls under the memory_lock.  Commit
> 0cfd027be1d6 ("vfio_pci: Enable memory accesses before calling
> pci_map_rom") made sure memory was enabled on the info path, but did
> not provide locking to protect that state.  The r/w path of the BAR
> access is expanded to include ROM mapping/unmapping.  Unless there
> are objections, I'll plan to drop v2 from my next branch and replace
> it with this.  Thanks,

FYI, the lockdep warning is gone.


* Re: [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO
  2020-05-22 22:08 ` [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Qian Cai
@ 2020-05-22 22:25   ` Alex Williamson
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-22 22:25 UTC (permalink / raw)
  To: Qian Cai; +Cc: kvm, linux-kernel, cohuck, jgg, peterx

On Fri, 22 May 2020 18:08:58 -0400
Qian Cai <cai@lca.pw> wrote:

> On Fri, May 22, 2020 at 01:17:09PM -0600, Alex Williamson wrote:
> > v3:
> > 
> > The memory_lock semaphore is only held in the MSI-X path for callouts
> > to functions that may access MSI-X MMIO space of the device, this
> > should resolve the circular locking dependency reported by Qian
> > (re-testing very much appreciated).  I've also incorporated the
> > pci_map_rom() and pci_unmap_rom() calls under the memory_lock.  Commit
> > 0cfd027be1d6 ("vfio_pci: Enable memory accesses before calling
> > pci_map_rom") made sure memory was enabled on the info path, but did
> > not provide locking to protect that state.  The r/w path of the BAR
> > access is expanded to include ROM mapping/unmapping.  Unless there
> > are objections, I'll plan to drop v2 from my next branch and replace
> > it with this.  Thanks,  
> 
> FYI, the lockdep warning is gone.
> 

Thank you for testing!

Alex



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-22 19:17 ` [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Alex Williamson
@ 2020-05-23 19:34   ` Peter Xu
  2020-05-23 23:06     ` Alex Williamson
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2020-05-23 19:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, jgg, cai

Hi, Alex,

On Fri, May 22, 2020 at 01:17:43PM -0600, Alex Williamson wrote:
> @@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct vfio_pci_device *vdev = vma->vm_private_data;
> +	vm_fault_t ret = VM_FAULT_NOPAGE;
> +
> +	mutex_lock(&vdev->vma_lock);
> +	down_read(&vdev->memory_lock);

I remembered to have seen the fault() handling FAULT_FLAG_RETRY_NOWAIT at least
in the very first version, but it's not here any more...  Could I ask what's
the reason behind?  I probably have missed something along with the versions,
I'm just not sure whether e.g. this would potentially block a GUP caller even
if it's with FOLL_NOWAIT.

Side note: Another thing I thought about when reading this patch - there seems
to be quite some possibility that the VFIO_DEVICE_PCI_HOT_RESET ioctl will
start to return -EBUSY now.  Not a problem for this series, but maybe we should
remember to let the userspace handle -EBUSY properly as a follow-up too, since
QEMU doesn't seem to handle -EBUSY for the host reset path right now.

Thanks,

-- 
Peter Xu



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-23 19:34   ` Peter Xu
@ 2020-05-23 23:06     ` Alex Williamson
  2020-05-23 23:52       ` Peter Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Williamson @ 2020-05-23 23:06 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, linux-kernel, cohuck, jgg, cai

On Sat, 23 May 2020 15:34:17 -0400
Peter Xu <peterx@redhat.com> wrote:

> Hi, Alex,
> 
> On Fri, May 22, 2020 at 01:17:43PM -0600, Alex Williamson wrote:
> > @@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >  {
> >  	struct vm_area_struct *vma = vmf->vma;
> >  	struct vfio_pci_device *vdev = vma->vm_private_data;
> > +	vm_fault_t ret = VM_FAULT_NOPAGE;
> > +
> > +	mutex_lock(&vdev->vma_lock);
> > +	down_read(&vdev->memory_lock);  
> 
> I remembered to have seen the fault() handling FAULT_FLAG_RETRY_NOWAIT at least
> in the very first version, but it's not here any more...  Could I ask what's
> the reason behind?  I probably have missed something along with the versions,
> I'm just not sure whether e.g. this would potentially block a GUP caller even
> if it's with FOLL_NOWAIT.

This is largely what v2 was about, from the cover letter:

    Locking in 3/ is substantially changed to avoid the retry scenario
    within the fault handler, therefore a caller who does not allow
    retry will no longer receive a SIGBUS on contention.

The discussion thread starts here:

https://lore.kernel.org/kvm/20200501234849.GQ26002@ziepe.ca/

Feel free to interject if there's something that doesn't make sense,
the idea is that since we've fixed the lock ordering we never need to
release one lock to wait for another, therefore we can wait for the
lock.  I'm under the impression that we can wait for the lock
regardless of the flags under these conditions.

> Side note: Another thing I thought about when reading this patch - there seems
> to have quite some possibility that the VFIO_DEVICE_PCI_HOT_RESET ioctl will
> start to return -EBUSY now.  Not a problem for this series, but maybe we should
> rememeber to let the userspace handle -EBUSY properly as follow up too, since I
> saw QEMU seemed to not handle -EBUSY for host reset path right now.

I think this has always been the case, whether it's the device lock or
this lock, the only way I know to avoid potential deadlock is to use
the 'try' locking semantics.  In normal scenarios I expect access to
sibling devices is quiesced at this point, so a user driver actually
wanting to achieve a reset shouldn't be affected by this.  Thanks,

Alex



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-23 23:06     ` Alex Williamson
@ 2020-05-23 23:52       ` Peter Xu
  2020-05-24  0:02         ` Peter Xu
  2020-05-25 12:26         ` Jason Gunthorpe
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Xu @ 2020-05-23 23:52 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, jgg, cai

On Sat, May 23, 2020 at 05:06:02PM -0600, Alex Williamson wrote:
> On Sat, 23 May 2020 15:34:17 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > Hi, Alex,
> > 
> > On Fri, May 22, 2020 at 01:17:43PM -0600, Alex Williamson wrote:
> > > @@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > >  {
> > >  	struct vm_area_struct *vma = vmf->vma;
> > >  	struct vfio_pci_device *vdev = vma->vm_private_data;
> > > +	vm_fault_t ret = VM_FAULT_NOPAGE;
> > > +
> > > +	mutex_lock(&vdev->vma_lock);
> > > +	down_read(&vdev->memory_lock);  
> > 
> > I remembered to have seen the fault() handling FAULT_FLAG_RETRY_NOWAIT at least
> > in the very first version, but it's not here any more...  Could I ask what's
> > the reason behind?  I probably have missed something along with the versions,
> > I'm just not sure whether e.g. this would potentially block a GUP caller even
> > if it's with FOLL_NOWAIT.
> 
> This is largely what v2 was about, from the cover letter:
> 
>     Locking in 3/ is substantially changed to avoid the retry scenario
>     within the fault handler, therefore a caller who does not allow
>     retry will no longer receive a SIGBUS on contention.
> 
> The discussion thread starts here:
> 
> https://lore.kernel.org/kvm/20200501234849.GQ26002@ziepe.ca/

[1]

> 
> Feel free to interject if there's something that doesn't make sense,
> the idea is that since we've fixed the lock ordering we never need to
> release one lock to wait for another, therefore we can wait for the
> lock.  I'm under the impression that we can wait for the lock
> regardless of the flags under these conditions.

I see; thanks for the link.  Sorry I should probably follow up the discussion
and ask the question earlier, anyway...

For what I understand now, IMHO we should still need all those handlings of
FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
not sure what would be the side effect of that if fault() blocked it.  E.g.,
the caller could be in an atomic context.

But now I also agree that VM_FAULT_SIGBUS is probably not correct there in the
initial version [1] - I thought it was OK initially (after all after the
multiple fault retry series we should always be with FAULT_FLAG_ALLOW_RETRY..).
However after some thinking... it should be the common slow path where retry is
simply not allowed.  So IMHO instead of SIGBUS there, we should also use all
the slow path of the locks.  That'll be safe then because it's never going to
be with FAULT_FLAG_RETRY_NOWAIT (FAULT_FLAG_RETRY_NOWAIT depends on
FAULT_FLAG_ALLOW_RETRY).

A reference code could be __lock_page_or_retry() where the lock_page could wait
just like we taking the sems/mutexes, and the previous SIGBUS case would
corresponds to this chunk of __lock_page_or_retry():

	} else {
		if (flags & FAULT_FLAG_KILLABLE) {
			int ret;

			ret = __lock_page_killable(page);
			if (ret) {
				up_read(&mm->mmap_sem);
				return 0;
			}
		} else
			__lock_page(page);
		return 1;
	}
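
In other words (just a rough, untested sketch of the idea, reusing the
locks from this patch): when FAULT_FLAG_RETRY_NOWAIT is set we could
trylock and return VM_FAULT_RETRY on contention, and otherwise take the
blocking path as the patch does now:

	if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) {
		if (!mutex_trylock(&vdev->vma_lock))
			return VM_FAULT_RETRY;
		if (!down_read_trylock(&vdev->memory_lock)) {
			mutex_unlock(&vdev->vma_lock);
			return VM_FAULT_RETRY;
		}
	} else {
		mutex_lock(&vdev->vma_lock);
		down_read(&vdev->memory_lock);
	}
	/* ... rest of vfio_pci_mmap_fault() unchanged ... */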

Thanks,

-- 
Peter Xu



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-23 23:52       ` Peter Xu
@ 2020-05-24  0:02         ` Peter Xu
  2020-05-25 12:26         ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Xu @ 2020-05-24  0:02 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, jgg, cai, Andrea Arcangeli

(CC Andrea too)

On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> On Sat, May 23, 2020 at 05:06:02PM -0600, Alex Williamson wrote:
> > On Sat, 23 May 2020 15:34:17 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> > 
> > > Hi, Alex,
> > > 
> > > On Fri, May 22, 2020 at 01:17:43PM -0600, Alex Williamson wrote:
> > > > @@ -1346,15 +1526,32 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > > >  {
> > > >  	struct vm_area_struct *vma = vmf->vma;
> > > >  	struct vfio_pci_device *vdev = vma->vm_private_data;
> > > > +	vm_fault_t ret = VM_FAULT_NOPAGE;
> > > > +
> > > > +	mutex_lock(&vdev->vma_lock);
> > > > +	down_read(&vdev->memory_lock);  
> > > 
> > > I remembered to have seen the fault() handling FAULT_FLAG_RETRY_NOWAIT at least
> > > in the very first version, but it's not here any more...  Could I ask what's
> > > the reason behind?  I probably have missed something along with the versions,
> > > I'm just not sure whether e.g. this would potentially block a GUP caller even
> > > if it's with FOLL_NOWAIT.
> > 
> > This is largely what v2 was about, from the cover letter:
> > 
> >     Locking in 3/ is substantially changed to avoid the retry scenario
> >     within the fault handler, therefore a caller who does not allow
> >     retry will no longer receive a SIGBUS on contention.
> > 
> > The discussion thread starts here:
> > 
> > https://lore.kernel.org/kvm/20200501234849.GQ26002@ziepe.ca/
> 
> [1]
> 
> > 
> > Feel free to interject if there's something that doesn't make sense,
> > the idea is that since we've fixed the lock ordering we never need to
> > release one lock to wait for another, therefore we can wait for the
> > lock.  I'm under the impression that we can wait for the lock
> > regardless of the flags under these conditions.
> 
> I see; thanks for the link.  Sorry I should probably follow up the discussion
> and ask the question earlier, anyway...
> 
> For what I understand now, IMHO we should still need all those handlings of
> FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> not sure what would be the side effect of that if fault() blocked it.  E.g.,
> the caller could be in an atomic context.
> 
> But now I also agree that VM_FAULT_SIGBUS is probably not correct there in the
> initial version [1] - I thought it was OK initially (after all after the
> multiple fault retry series we should always be with FAULT_FLAG_ALLOW_RETRY..).
> However after some thinking... it should be the common slow path where retry is
> simply not allowed.  So IMHO instead of SIGBUS there, we should also use all
> the slow path of the locks.  That'll be safe then because it's never going to
> be with FAULT_FLAG_RETRY_NOWAIT (FAULT_FLAG_RETRY_NOWAIT depends on
> FAULT_FLAG_ALLOW_RETRY).
> 
> A reference code could be __lock_page_or_retry() where the lock_page could wait
> just like we taking the sems/mutexes, and the previous SIGBUS case would
> corresponds to this chunk of __lock_page_or_retry():
> 
> 	} else {
> 		if (flags & FAULT_FLAG_KILLABLE) {
> 			int ret;
> 
> 			ret = __lock_page_killable(page);
> 			if (ret) {
> 				up_read(&mm->mmap_sem);
> 				return 0;
> 			}
> 		} else
> 			__lock_page(page);
> 		return 1;
> 	}
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-23 23:52       ` Peter Xu
  2020-05-24  0:02         ` Peter Xu
@ 2020-05-25 12:26         ` Jason Gunthorpe
  2020-05-25 14:28           ` Peter Xu
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-05-25 12:26 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai

On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:

> For what I understand now, IMHO we should still need all those handlings of
> FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> not sure what would be the side effect of that if fault() blocked it.  E.g.,
> the caller could be in an atomic context.

AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
VM_FAULT_RETRY is returned, which this doesn't do?

It is not a generic 'do not sleep'

Do you know different?

Jason


* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 12:26         ` Jason Gunthorpe
@ 2020-05-25 14:28           ` Peter Xu
  2020-05-25 14:46             ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2020-05-25 14:28 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai

On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> 
> > For what I understand now, IMHO we should still need all those handlings of
> > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > the caller could be in an atomic context.
> 
> AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> VM_FAULT_RETRY is returned, which this doesn't do?

Yes, that's why I think we should still properly return VM_FAULT_RETRY if
needed..  because IMHO it is still possible that the caller calls with
FAULT_FLAG_RETRY_NOWAIT.

My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:

  - We cannot release the mmap_sem, and,
  - We cannot sleep

But we're allowed to return VM_FAULT_RETRY if any of the above is necessary.

Thanks,

-- 
Peter Xu



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 14:28           ` Peter Xu
@ 2020-05-25 14:46             ` Jason Gunthorpe
  2020-05-25 15:11               ` Peter Xu
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-05-25 14:46 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai

On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
> On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > 
> > > For what I understand now, IMHO we should still need all those handlings of
> > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > the caller could be in an atomic context.
> > 
> > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > VM_FAULT_RETRY is returned, which this doesn't do?
> 
> Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> needed..  because IMHO it is still possible that the caller calls with
> FAULT_FLAG_RETRY_NOWAIT.
> 
> My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> 
>   - We cannot release the mmap_sem, and,
>   - We cannot sleep

Sleeping looks fine, look at any FS implementation of fault, say,
xfs. The first thing it does is xfs_ilock() which does down_write().

I can't say when VM_FAULT_RETRY comes into play, but it is not so
simple as just sleeping..

Jason


* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 14:46             ` Jason Gunthorpe
@ 2020-05-25 15:11               ` Peter Xu
  2020-05-25 16:56                 ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2020-05-25 15:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai, Andrea Arcangeli

On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:
> On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
> > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > 
> > > > For what I understand now, IMHO we should still need all those handlings of
> > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > the caller could be in an atomic context.
> > > 
> > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > VM_FAULT_RETRY is returned, which this doesn't do?
> > 
> > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > needed..  because IMHO it is still possible that the caller calls with
> > FAULT_FLAG_RETRY_NOWAIT.
> > 
> > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > 
> >   - We cannot release the mmap_sem, and,
> >   - We cannot sleep
> 
> Sleeping looks fine, look at any FS implementation of fault, say,
> xfs. The first thing it does is xfs_ilock() which does down_write().

Yeah.  My wild guess is that maybe fs code will always be without
FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
the general #PF should be fine to sleep in fault(); gup should be special, but
I didn't observe any gup code called upon file systems)?

Or I must have missed something important...

Thanks,

-- 
Peter Xu



* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 15:11               ` Peter Xu
@ 2020-05-25 16:56                 ` Jason Gunthorpe
  2020-05-25 20:56                   ` John Hubbard
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-05-25 16:56 UTC (permalink / raw)
  To: Peter Xu, John Hubbard
  Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai, Andrea Arcangeli

On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:
> On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:
> > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
> > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > > 
> > > > > For what I understand now, IMHO we should still need all those handlings of
> > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > > the caller could be in an atomic context.
> > > > 
> > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > > VM_FAULT_RETRY is returned, which this doesn't do?
> > > 
> > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > > needed..  because IMHO it is still possible that the caller calls with
> > > FAULT_FLAG_RETRY_NOWAIT.
> > > 
> > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > > 
> > >   - We cannot release the mmap_sem, and,
> > >   - We cannot sleep
> > 
> > Sleeping looks fine, look at any FS implementation of fault, say,
> > xfs. The first thing it does is xfs_ilock() which does down_write().
> 
> Yeah.  My wild guess is that maybe fs code will always be without
> FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
> the general #PF should be fine to sleep in fault(); gup should be special, but
> I didn't observe any gup code called upon file systems)?

get_user_pages is called on filesystem backed pages.

I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
John was able to guess when he reworked that stuff?

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 16:56                 ` Jason Gunthorpe
@ 2020-05-25 20:56                   ` John Hubbard
  2020-05-26  0:37                     ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: John Hubbard @ 2020-05-25 20:56 UTC (permalink / raw)
  To: Jason Gunthorpe, Peter Xu
  Cc: Alex Williamson, kvm, linux-kernel, cohuck, cai, Andrea Arcangeli

On 2020-05-25 09:56, Jason Gunthorpe wrote:
> On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:
>> On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:
>>> On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
>>>> On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
>>>>> On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
>>>>>
>>>>>> For what I understand now, IMHO we should still need all those handlings of
>>>>>> FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
>>>>>> try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
>>>>>> not sure what would be the side effect of that if fault() blocked it.  E.g.,
>>>>>> the caller could be in an atomic context.
>>>>>
>>>>> AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
>>>>> VM_FAULT_RETRY is returned, which this doesn't do?
>>>>
>>>> Yes, that's why I think we should still properly return VM_FAULT_RETRY if
>>>> needed..  because IMHO it is still possible that the caller calls with
>>>> FAULT_FLAG_RETRY_NOWAIT.
>>>>
>>>> My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
>>>>
>>>>    - We cannot release the mmap_sem, and,
>>>>    - We cannot sleep
>>>
>>> Sleeping looks fine, look at any FS implementation of fault, say,
>>> xfs. The first thing it does is xfs_ilock() which does down_write().
>>
>> Yeah.  My wild guess is that maybe fs code will always be without
>> FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
>> the general #PF should be fine to sleep in fault(); gup should be special, but
>> I didn't observe any gup code called upon file systems)?
> 
> get_user_pages is called on filesystem backed pages.
> 
> I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
> John was able to guess when he reworked that stuff?
> 

Although I didn't end up touching that particular area, I'm sure it's going
to come up sometime soon, so I poked around just now, and found that
FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was
intended to make KVM and similar things behave better when doing GUP on
file-backed pages that might, or might not be in memory.

The idea is described in the changelog, but not in the code comments or
Documentation, sigh:

commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
Author: Gleb Natapov <gleb@redhat.com>
Date:   Tue Mar 22 16:30:51 2011 -0700

     mm: allow GUP to fail instead of waiting on a page

     GUP user may want to try to acquire a reference to a page if it is already
     in memory, but not if IO, to bring it in, is needed.  For example KVM may
     tell vcpu to schedule another guest process if current one is trying to
     access swapped out page.  Meanwhile, the page will be swapped in and the
     guest process, that depends on it, will be able to run again.

     This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
     FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
     conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
     it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
     instead.
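
To make that concrete, a fault handler that wanted to honor the flag would do
something roughly like the sketch below. The example_dev type and its lock are
made up and this is not the vfio code under discussion; the point is only that
the flag changes nothing unless the handler itself decides to bail out with
VM_FAULT_RETRY instead of sleeping:

struct example_dev {                            /* hypothetical private data */
        struct mutex lock;
};

static vm_fault_t example_fault(struct vm_fault *vmf)
{
        struct example_dev *dev = vmf->vma->vm_private_data;

        if (!mutex_trylock(&dev->lock)) {
                /* NOWAIT is only used in conjunction with ALLOW_RETRY */
                if ((vmf->flags & FAULT_FLAG_ALLOW_RETRY) &&
                    (vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
                        return VM_FAULT_RETRY; /* don't wait, keep mmap_sem */
                mutex_lock(&dev->lock);        /* otherwise sleeping is fine */
        }

        /* ... install the PFN under dev->lock, then ... */
        mutex_unlock(&dev->lock);
        return VM_FAULT_NOPAGE;
}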

If that helps, maybe documentation approximately like this might be welcome
(against linux-next, so I'm using mmap_lock, instead of mmap_sem), below.
Or is this overkill? People like minimal documentation in the code, so maybe
this belongs in Documentation, if anywhere:

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8429d5aa31e44..e32e8e52a57ac 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -430,6 +430,15 @@ extern pgprot_t protection_map[16];
   * continuous faults with flags (b).  We should always try to detect pending
   * signals before a retry to make sure the continuous page faults can still be
   * interrupted if necessary.
+ *
+ * About @FAULT_FLAG_RETRY_NOWAIT: this is intended for callers who would like
+ * to acquire a page, but only if the page is already in memory. If, on the
+ * other hand, the page requires IO in order to bring it into memory, then fault
+ * handlers will immediately return VM_FAULT_RETRY ("don't wait"), while leaving
+ * mmap_lock held ("don't drop mmap_lock"). For example, this is useful for
+ * virtual machines that have multiple guests running: if guest A attempts
+ * get_user_pages() on a swapped out page, another guest can be scheduled while
+ * waiting for IO to swap in guest A's page.
   */
  #define FAULT_FLAG_WRITE                       0x01
  #define FAULT_FLAG_MKWRITE                     0x02


thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-25 20:56                   ` John Hubbard
@ 2020-05-26  0:37                     ` Jason Gunthorpe
  2020-05-26  0:46                       ` John Hubbard
  2020-05-26 13:49                       ` Peter Xu
  0 siblings, 2 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-05-26  0:37 UTC (permalink / raw)
  To: John Hubbard
  Cc: Peter Xu, Alex Williamson, kvm, linux-kernel, cohuck, cai,
	Andrea Arcangeli

On Mon, May 25, 2020 at 01:56:28PM -0700, John Hubbard wrote:
> On 2020-05-25 09:56, Jason Gunthorpe wrote:
> > On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:
> > > On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
> > > > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > > > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > > > > 
> > > > > > > For what I understand now, IMHO we should still need all those handlings of
> > > > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > > > > the caller could be in an atomic context.
> > > > > > 
> > > > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > > > > VM_FAULT_RETRY is returned, which this doesn't do?
> > > > > 
> > > > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > > > > needed..  because IMHO it is still possible that the caller calls with
> > > > > FAULT_FLAG_RETRY_NOWAIT.
> > > > > 
> > > > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > > > > 
> > > > >    - We cannot release the mmap_sem, and,
> > > > >    - We cannot sleep
> > > > 
> > > > Sleeping looks fine, look at any FS implementation of fault, say,
> > > > xfs. The first thing it does is xfs_ilock() which does down_write().
> > > 
> > > Yeah.  My wild guess is that maybe fs code will always be without
> > > FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
> > > the general #PF should be fine to sleep in fault(); gup should be special, but
> > > I didn't observe any gup code called upon file systems)?
> > 
> > get_user_pages is called on filesystem backed pages.
> > 
> > I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
> > John was able to guess when he reworked that stuff?
> > 
> 
> Although I didn't end up touching that particular area, I'm sure it's going
> to come up sometime soon, so I poked around just now, and found that
> FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was
> intended to make KVM and similar things behave better when doing GUP on
> file-backed pages that might, or might not be in memory.
> 
> The idea is described in the changelog, but not in the code comments or
> Documentation, sigh:
> 
> commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
> Author: Gleb Natapov <gleb@redhat.com>
> Date:   Tue Mar 22 16:30:51 2011 -0700
> 
>     mm: allow GUP to fail instead of waiting on a page
> 
>     GUP user may want to try to acquire a reference to a page if it is already
>     in memory, but not if IO, to bring it in, is needed.  For example KVM may
>     tell vcpu to schedule another guest process if current one is trying to
>     access swapped out page.  Meanwhile, the page will be swapped in and the
>     guest process, that depends on it, will be able to run again.
> 
>     This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
>     FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
>     conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
>     it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
>     instead.

So, from kvm's perspective it was to avoid excessively long blocking in
common paths when it could rejoin the completed IO by somehow waiting
on a page itself?

It all seems like it should not be used unless the page is going to go
to IO?

Certainly there is no reason to optimize the fringe case of vfio
sleeping if there is an incorrect concurrent attempt to disable
a BAR.

> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 8429d5aa31e44..e32e8e52a57ac 100644
> +++ b/include/linux/mm.h
> @@ -430,6 +430,15 @@ extern pgprot_t protection_map[16];
>   * continuous faults with flags (b).  We should always try to detect pending
>   * signals before a retry to make sure the continuous page faults can still be
>   * interrupted if necessary.
> + *
> + * About @FAULT_FLAG_RETRY_NOWAIT: this is intended for callers who would like
> + * to acquire a page, but only if the page is already in memory. If, on the
> + * other hand, the page requires IO in order to bring it into memory, then fault
> + * handlers will immediately return VM_FAULT_RETRY ("don't wait"), while leaving
> + * mmap_lock held ("don't drop mmap_lock"). For example, this is useful for
> + * virtual machines that have multiple guests running: if guest A attempts
> + * get_user_pages() on a swapped out page, another guest can be scheduled while
> + * waiting for IO to swap in guest A's page.
>   */
>  #define FAULT_FLAG_WRITE                       0x01
>  #define FAULT_FLAG_MKWRITE                     0x02

It seems reasonable but people might complain about the kvm
specifics of the explanation.

It might be better to explain how the caller is supposed to know when
it is OK to try GUP again and expect success, as it seems to me this
is really about externalizing the sleep for page wait?
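
For instance, the caller-side pattern the flag is meant to enable is roughly
the following (only a sketch with a made-up helper name, using the
get_user_pages_unlocked() interface, not KVM's actual code):

/*
 * Try to take a reference without waiting for IO; the caller is expected to
 * go do other useful work and retry later, without FOLL_NOWAIT, if this
 * fails.  get_user_pages_unlocked() takes and drops mmap_sem internally.
 */
static struct page *try_get_user_page_nowait(unsigned long addr, bool write)
{
        struct page *page;
        unsigned int flags = FOLL_NOWAIT | (write ? FOLL_WRITE : 0);

        if (get_user_pages_unlocked(addr, 1, &page, flags) == 1)
                return page;            /* page was resident, ref taken */

        return NULL;                    /* would have required IO */
}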

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26  0:37                     ` Jason Gunthorpe
@ 2020-05-26  0:46                       ` John Hubbard
  2020-05-26 13:49                       ` Peter Xu
  1 sibling, 0 replies; 24+ messages in thread
From: John Hubbard @ 2020-05-26  0:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, Alex Williamson, kvm, linux-kernel, cohuck, cai,
	Andrea Arcangeli

On 2020-05-25 17:37, Jason Gunthorpe wrote:
...
>> commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
>> Author: Gleb Natapov <gleb@redhat.com>
>> Date:   Tue Mar 22 16:30:51 2011 -0700
>>
>>      mm: allow GUP to fail instead of waiting on a page
>>
>>      GUP user may want to try to acquire a reference to a page if it is already
>>      in memory, but not if IO, to bring it in, is needed.  For example KVM may
>>      tell vcpu to schedule another guest process if current one is trying to
>>      access swapped out page.  Meanwhile, the page will be swapped in and the
>>      guest process, that depends on it, will be able to run again.
>>
>>      This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
>>      FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
>>      conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
>>      it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
>>      instead.
> 
> So, from kvm's perspective it was to avoid excessively long blocking in
> common paths when it could rejoin the completed IO by somehow waiting
> on a page itself?


Or perhaps some variation on that, such as just retrying with an intervening
schedule() call. It's not clear just from that commit.

> 
> It all seems like it should not be used unless the page is going to go
> to IO?


That's my conclusion so far, yes.


> 
> Certainly there is no reason to optimize the fringe case of vfio
> sleeping if there is an incorrect concurrent attempt to disable
> a BAR.


Definitely agree with that position.

> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 8429d5aa31e44..e32e8e52a57ac 100644
>> +++ b/include/linux/mm.h
>> @@ -430,6 +430,15 @@ extern pgprot_t protection_map[16];
>>    * continuous faults with flags (b).  We should always try to detect pending
>>    * signals before a retry to make sure the continuous page faults can still be
>>    * interrupted if necessary.
>> + *
>> + * About @FAULT_FLAG_RETRY_NOWAIT: this is intended for callers who would like
>> + * to acquire a page, but only if the page is already in memory. If, on the
>> + * other hand, the page requires IO in order to bring it into memory, then fault
>> + * handlers will immediately return VM_FAULT_RETRY ("don't wait"), while leaving
>> + * mmap_lock held ("don't drop mmap_lock"). For example, this is useful for
>> + * virtual machines that have multiple guests running: if guest A attempts
>> + * get_user_pages() on a swapped out page, another guest can be scheduled while
>> + * waiting for IO to swap in guest A's page.
>>    */
>>   #define FAULT_FLAG_WRITE                       0x01
>>   #define FAULT_FLAG_MKWRITE                     0x02
> 
> It seems reasonable but people might complain about the kvm
> specifics of the explanation.
> 
> It might be better to explain how the caller is supposed to know when
> it is OK to try GUP again and expect success, as it seems to me this
> is really about externalizing the sleep for page wait?
> 

OK, good point. The example was helpful in the commit log, but not quite
appropriate in mm.h, yes.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26  0:37                     ` Jason Gunthorpe
  2020-05-26  0:46                       ` John Hubbard
@ 2020-05-26 13:49                       ` Peter Xu
  2020-05-26 14:32                         ` Alex Williamson
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Xu @ 2020-05-26 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: John Hubbard, Alex Williamson, kvm, linux-kernel, cohuck, cai,
	Andrea Arcangeli

On Mon, May 25, 2020 at 09:37:05PM -0300, Jason Gunthorpe wrote:
> On Mon, May 25, 2020 at 01:56:28PM -0700, John Hubbard wrote:
> > On 2020-05-25 09:56, Jason Gunthorpe wrote:
> > > On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:
> > > > On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:
> > > > > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:
> > > > > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > > > > > 
> > > > > > > > For what I understand now, IMHO we should still need all those handlings of
> > > > > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > > > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > > > > > the caller could be in an atomic context.
> > > > > > > 
> > > > > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > > > > > VM_FAULT_RETRY is returned, which this doesn't do?
> > > > > > 
> > > > > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > > > > > needed..  because IMHO it is still possible that the caller calls with
> > > > > > FAULT_FLAG_RETRY_NOWAIT.
> > > > > > 
> > > > > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > > > > > 
> > > > > >    - We cannot release the mmap_sem, and,
> > > > > >    - We cannot sleep
> > > > > 
> > > > > Sleeping looks fine, look at any FS implementation of fault, say,
> > > > > xfs. The first thing it does is xfs_ilock() which does down_write().
> > > > 
> > > > Yeah.  My wild guess is that maybe fs code will always be without
> > > > FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
> > > > the general #PF should be fine to sleep in fault(); gup should be special, but
> > > > I didn't observe any gup code called upon file systems)?
> > > 
> > > get_user_pages is called on filesystem backed pages.
> > > 
> > > I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
> > > John was able to guess when he reworked that stuff?
> > > 
> > 
> > Although I didn't end up touching that particular area, I'm sure it's going
> > to come up sometime soon, so I poked around just now, and found that
> > FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was
> > intended to make KVM and similar things behave better when doing GUP on
> > file-backed pages that might, or might not be in memory.
> > 
> > The idea is described in the changelog, but not in the code comments or
> > Documentation, sigh:
> > 
> > commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
> > Author: Gleb Natapov <gleb@redhat.com>
> > Date:   Tue Mar 22 16:30:51 2011 -0700
> > 
> >     mm: allow GUP to fail instead of waiting on a page
> > 
> >     GUP user may want to try to acquire a reference to a page if it is already
> >     in memory, but not if IO, to bring it in, is needed.  For example KVM may
> >     tell vcpu to schedule another guest process if current one is trying to
> >     access swapped out page.  Meanwhile, the page will be swapped in and the
> >     guest process, that depends on it, will be able to run again.
> > 
> >     This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
> >     FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
> >     conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
> >     it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
> >     instead.
> 
> So, from kvm's perspective it was to avoid excessively long blocking in
> common paths when it could rejoin the completed IO by somehow waiting
> on a page itself?
> 
> It all seems like it should not be used unless the page is going to go
> to IO?

I think NOWAIT is used as a common flag for kvm for its initial attempt to
fault in a normal page, however...  I just noticed another fact that actually
__get_user_pages() won't work with PFNMAP (check_vma_flags should fail), but
KVM just started to support fault() for PFNMAP from commit add6a0cd1c5b (2016)
using fixup_user_fault(), where nvidia seems to have a similar request to have
a fault handler on some mapped BARs.
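
(For reference, the pattern I mean is roughly the sketch below: a simplified
version of the fixup_user_fault() + follow_pfn() dance with a made-up helper
name and current v5.7-era signatures, not a copy of KVM's code.)

/* Caller holds mmap_sem for read; *unlocked reports whether it was dropped. */
static int example_pfnmap_to_pfn(struct vm_area_struct *vma,
                                 unsigned long addr, bool write,
                                 unsigned long *pfn)
{
        unsigned int flags = write ? FAULT_FLAG_WRITE : 0;
        bool unlocked = false;
        int ret;

        /* per the above: this path is always ALLOW_RETRY && !RETRY_NOWAIT */
        ret = fixup_user_fault(current, current->mm, addr, flags, &unlocked);
        if (ret)
                return ret;

        /* the vma's fault() has populated the pte, so this should succeed */
        return follow_pfn(vma, addr, pfn);
}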

> 
> Certainly there is no reason to optimize the fringe case of vfio
> sleeping if there is an incorrect concurrent attempt to disable
> a BAR.

If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is
the only path for the new fault(), then the current way seems ok.  Not sure
whether it would be worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to make
that fact clear.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26 13:49                       ` Peter Xu
@ 2020-05-26 14:32                         ` Alex Williamson
  2020-05-26 14:46                           ` Peter Xu
  2020-05-26 15:53                           ` Jason Gunthorpe
  0 siblings, 2 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-26 14:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: Jason Gunthorpe, John Hubbard, kvm, linux-kernel, cohuck, cai,
	Andrea Arcangeli

On Tue, 26 May 2020 09:49:54 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Mon, May 25, 2020 at 09:37:05PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 25, 2020 at 01:56:28PM -0700, John Hubbard wrote:  
> > > On 2020-05-25 09:56, Jason Gunthorpe wrote:  
> > > > On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:  
> > > > > On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:  
> > > > > > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:  
> > > > > > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:  
> > > > > > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > > > > > >   
> > > > > > > > > For what I understand now, IMHO we should still need all those handlings of
> > > > > > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > > > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > > > > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > > > > > > the caller could be in an atomic context.  
> > > > > > > > 
> > > > > > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > > > > > > VM_FAULT_RETRY is returned, which this doesn't do?  
> > > > > > > 
> > > > > > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > > > > > > needed..  because IMHO it is still possible that the caller calls with
> > > > > > > FAULT_FLAG_RETRY_NOWAIT.
> > > > > > > 
> > > > > > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > > > > > > 
> > > > > > >    - We cannot release the mmap_sem, and,
> > > > > > >    - We cannot sleep  
> > > > > > 
> > > > > > Sleeping looks fine, look at any FS implementation of fault, say,
> > > > > > xfs. The first thing it does is xfs_ilock() which does down_write().  
> > > > > 
> > > > > Yeah.  My wild guess is that maybe fs code will always be without
> > > > > FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
> > > > > the general #PF should be fine to sleep in fault(); gup should be special, but
> > > > > I didn't observe any gup code called upon file systems)?  
> > > > 
> > > > get_user_pages is called on filesystem backed pages.
> > > > 
> > > > I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
> > > > John was able to guess when he reworked that stuff?
> > > >   
> > > 
> > > Although I didn't end up touching that particular area, I'm sure it's going
> > > to come up sometime soon, so I poked around just now, and found that
> > > FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was
> > > intended to make KVM and similar things behave better when doing GUP on
> > > file-backed pages that might, or might not be in memory.
> > > 
> > > The idea is described in the changelog, but not in the code comments or
> > > Documentation, sigh:
> > > 
> > > commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
> > > Author: Gleb Natapov <gleb@redhat.com>
> > > Date:   Tue Mar 22 16:30:51 2011 -0700
> > > 
> > >     mm: allow GUP to fail instead of waiting on a page
> > > 
> > >     GUP user may want to try to acquire a reference to a page if it is already
> > >     in memory, but not if IO, to bring it in, is needed.  For example KVM may
> > >     tell vcpu to schedule another guest process if current one is trying to
> > >     access swapped out page.  Meanwhile, the page will be swapped in and the
> > >     guest process, that depends on it, will be able to run again.
> > > 
> > >     This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
> > >     FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
> > >     conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
> > >     it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
> > >     instead.  
> > 
> > So, from kvm's perspective it was to avoid excessively long blocking in
> > common paths when it could rejoin the completed IO by somehow waiting
> > on a page itself?
> > 
> > It all seems like it should not be used unless the page is going to go
> > to IO?  
> 
> I think NOWAIT is used as a common flag for kvm for its initial attempt to
> fault in a normal page, however...  I just noticed another fact that actually
> __get_user_pages() won't work with PFNMAP (check_vma_flags should fail), but
> KVM just started to support fault() for PFNMAP from commit add6a0cd1c5b (2016)
> using fixup_user_fault(), where nvidia seems to have a similar request to have
> a fault handler on some mapped BARs.
> 
> > 
> > > Certainly there is no reason to optimize the fringe case of vfio
> > > sleeping if there is an incorrect concurrent attempt to disable
> > > a BAR.  
> 
> > If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is
> > the only path for the new fault(), then the current way seems ok.  Not sure
> > whether it would be worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to make
> > that fact clear.

Thanks for the discussion over the weekend folks.  Peter, I take it
you'd be satisfied if this patch were updated as:

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index aabba6439a5b..35bd7cd4e268 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1528,6 +1528,13 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	struct vfio_pci_device *vdev = vma->vm_private_data;
 	vm_fault_t ret = VM_FAULT_NOPAGE;
 
+	/*
+	 * We don't expect to be called with NOWAIT and there are conflicting
+	 * opinions on whether NOWAIT suggests we shouldn't wait for locks or
+	 * just shouldn't wait for I/O.
+	 */
+	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
+
 	mutex_lock(&vdev->vma_lock);
 	down_read(&vdev->memory_lock);
 

Is that correct?  Thanks,

Alex


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26 14:32                         ` Alex Williamson
@ 2020-05-26 14:46                           ` Peter Xu
  2020-05-26 15:53                           ` Jason Gunthorpe
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Xu @ 2020-05-26 14:46 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, John Hubbard, kvm, linux-kernel, cohuck, cai,
	Andrea Arcangeli

On Tue, May 26, 2020 at 08:32:18AM -0600, Alex Williamson wrote:
> On Tue, 26 May 2020 09:49:54 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Mon, May 25, 2020 at 09:37:05PM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 25, 2020 at 01:56:28PM -0700, John Hubbard wrote:  
> > > > On 2020-05-25 09:56, Jason Gunthorpe wrote:  
> > > > > On Mon, May 25, 2020 at 11:11:42AM -0400, Peter Xu wrote:  
> > > > > > On Mon, May 25, 2020 at 11:46:51AM -0300, Jason Gunthorpe wrote:  
> > > > > > > On Mon, May 25, 2020 at 10:28:06AM -0400, Peter Xu wrote:  
> > > > > > > > On Mon, May 25, 2020 at 09:26:07AM -0300, Jason Gunthorpe wrote:  
> > > > > > > > > On Sat, May 23, 2020 at 07:52:57PM -0400, Peter Xu wrote:
> > > > > > > > >   
> > > > > > > > > > For what I understand now, IMHO we should still need all those handlings of
> > > > > > > > > > FAULT_FLAG_RETRY_NOWAIT like in the initial version.  E.g., IIUC KVM gup will
> > > > > > > > > > try with FOLL_NOWAIT when async is allowed, before the complete slow path.  I'm
> > > > > > > > > > not sure what would be the side effect of that if fault() blocked it.  E.g.,
> > > > > > > > > > the caller could be in an atomic context.  
> > > > > > > > > 
> > > > > > > > > AFAICT FAULT_FLAG_RETRY_NOWAIT only impacts what happens when
> > > > > > > > > VM_FAULT_RETRY is returned, which this doesn't do?  
> > > > > > > > 
> > > > > > > > Yes, that's why I think we should still properly return VM_FAULT_RETRY if
> > > > > > > > needed..  because IMHO it is still possible that the caller calls with
> > > > > > > > FAULT_FLAG_RETRY_NOWAIT.
> > > > > > > > 
> > > > > > > > My understanding is that FAULT_FLAG_RETRY_NOWAIT majorly means:
> > > > > > > > 
> > > > > > > >    - We cannot release the mmap_sem, and,
> > > > > > > >    - We cannot sleep  
> > > > > > > 
> > > > > > > Sleeping looks fine, look at any FS implementation of fault, say,
> > > > > > > xfs. The first thing it does is xfs_ilock() which does down_write().  
> > > > > > 
> > > > > > Yeah.  My wild guess is that maybe fs code will always be without
> > > > > > FAULT_FLAG_RETRY_NOWAIT so it's safe to sleep unconditionally (e.g., I think
> > > > > > the general #PF should be fine to sleep in fault(); gup should be special, but
> > > > > > I didn't observe any gup code called upon file systems)?  
> > > > > 
> > > > > get_user_pages is called on filesystem backed pages.
> > > > > 
> > > > > I have no idea what FAULT_FLAG_RETRY_NOWAIT is supposed to do. Maybe
> > > > > John was able to guess when he reworked that stuff?
> > > > >   
> > > > 
> > > > Although I didn't end up touching that particular area, I'm sure it's going
> > > > to come up sometime soon, so I poked around just now, and found that
> > > > FAULT_FLAG_RETRY_NOWAIT was added almost exactly 9 years ago. This flag was
> > > > intended to make KVM and similar things behave better when doing GUP on
> > > > file-backed pages that might, or might not be in memory.
> > > > 
> > > > The idea is described in the changelog, but not in the code comments or
> > > > Documentation, sigh:
> > > > 
> > > > commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
> > > > Author: Gleb Natapov <gleb@redhat.com>
> > > > Date:   Tue Mar 22 16:30:51 2011 -0700
> > > > 
> > > >     mm: allow GUP to fail instead of waiting on a page
> > > > 
> > > >     GUP user may want to try to acquire a reference to a page if it is already
> > > >     in memory, but not if IO, to bring it in, is needed.  For example KVM may
> > > >     tell vcpu to schedule another guest process if current one is trying to
> > > >     access swapped out page.  Meanwhile, the page will be swapped in and the
> > > >     guest process, that depends on it, will be able to run again.
> > > > 
> > > >     This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
> > > >     FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
> > > >     conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
> > > >     it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
> > > >     instead.  
> > > 
> > > So, from kvm's perspective it was to avoid excessively long blocking in
> > > common paths when it could rejoin the completed IO by somehow waiting
> > > on a page itself?
> > > 
> > > It all seems like it should not be used unless the page is going to go
> > > to IO?  
> > 
> > I think NOWAIT is used as a common flag for kvm for its initial attempt to
> > fault in a normal page, however...  I just noticed another fact that actually
> > __get_user_pages() won't work with PFNMAP (check_vma_flags should fail), but
> > KVM just started to support fault() for PFNMAP from commit add6a0cd1c5b (2016)
> > using fixup_user_fault(), where nvidia seems to have a similar request to have
> > a fault handler on some mapped BARs.
> > 
> > > 
> > > Certainly there is no reason to optimize the fringe case of vfio
> > > sleeping if there is an incorrect concurrent attempt to disable
> > > a BAR.  
> > 
> > If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is
> > the only path for the new fault(), then the current way seems ok.  Not sure
> > whether it would be worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to make
> > that fact clear.
> 
> Thanks for the discussion over the weekend folks.  Peter, I take it
> you'd be satisfied if this patch were updated as:
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index aabba6439a5b..35bd7cd4e268 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1528,6 +1528,13 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  	struct vfio_pci_device *vdev = vma->vm_private_data;
>  	vm_fault_t ret = VM_FAULT_NOPAGE;
>  
> +	/*
> +	 * We don't expect to be called with NOWAIT and there are conflicting
> +	 * opinions on whether NOWAIT suggests we shouldn't wait for locks or
> +	 * just shouldn't wait for I/O.
> +	 */
> +	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);
> +
>  	mutex_lock(&vdev->vma_lock);
>  	down_read(&vdev->memory_lock);
>  
> 
> Is that correct?  Thanks,

Yes, actually either with or without it. :)

Since I've read through the patch too and I don't see any issue from my side
besides this one, FWIW.. please also feel free to take my r-b:

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26 14:32                         ` Alex Williamson
  2020-05-26 14:46                           ` Peter Xu
@ 2020-05-26 15:53                           ` Jason Gunthorpe
  2020-05-26 15:57                             ` Alex Williamson
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-05-26 15:53 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Peter Xu, John Hubbard, kvm, linux-kernel, cohuck, cai, Andrea Arcangeli

On Tue, May 26, 2020 at 08:32:18AM -0600, Alex Williamson wrote:
> > > Certainly there is no reason to optimize the fringe case of vfio
> > > sleeping if there is an incorrect concurrent attempt to disable
> > > a BAR.  
> > 
> > If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is
> > the only path for the new fault(), then the current way seems ok.  Not sure
> > whether it would be worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to make
> > that fact clear.
> 
> Thanks for the discussion over the weekend folks.  Peter, I take it
> you'd be satisfied if this patch were updated as:
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index aabba6439a5b..35bd7cd4e268 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1528,6 +1528,13 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  	struct vfio_pci_device *vdev = vma->vm_private_data;
>  	vm_fault_t ret = VM_FAULT_NOPAGE;
>  
> +	/*
> +	 * We don't expect to be called with NOWAIT and there are conflicting
> +	 * opinions on whether NOWAIT suggests we shouldn't wait for locks or
> +	 * just shouldn't wait for I/O.
> +	 */
> +	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);

I don't think this is right; it implies there is some reason this
code fails with FAULT_FLAG_RETRY_NOWAIT, but it is fine as written,
AFAICT

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
  2020-05-26 15:53                           ` Jason Gunthorpe
@ 2020-05-26 15:57                             ` Alex Williamson
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Williamson @ 2020-05-26 15:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Xu, John Hubbard, kvm, linux-kernel, cohuck, cai, Andrea Arcangeli

On Tue, 26 May 2020 12:53:31 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Tue, May 26, 2020 at 08:32:18AM -0600, Alex Williamson wrote:
> > > > Certainly there is no reason to optimize the fringe case of vfio
> > > > sleeping if there is an incorrect concurrent attempt to disable
> > > > a BAR.    
> > > 
> > > If fixup_user_fault() (which is always with ALLOW_RETRY && !RETRY_NOWAIT) is
> > > the only path for the new fault(), then the current way seems ok.  Not sure
> > > whether it would be worth a WARN_ON_ONCE(RETRY_NOWAIT) in the fault() to make
> > > that fact clear.  
> > 
> > Thanks for the discussion over the weekend folks.  Peter, I take it
> > you'd be satisfied if this patch were updated as:
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index aabba6439a5b..35bd7cd4e268 100644
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -1528,6 +1528,13 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >  	struct vfio_pci_device *vdev = vma->vm_private_data;
> >  	vm_fault_t ret = VM_FAULT_NOPAGE;
> >  
> > +	/*
> > +	 * We don't expect to be called with NOWAIT and there are conflicting
> > +	 * opinions on whether NOWAIT suggests we shouldn't wait for locks or
> > +	 * just shouldn't wait for I/O.
> > +	 */
> > +	WARN_ON_ONCE(vmf->flags & FAULT_FLAG_RETRY_NOWAIT);  
> 
> I don't think this is right; it implies there is some reason this
> code fails with FAULT_FLAG_RETRY_NOWAIT, but it is fine as written,
> AFAICT

Ok, Peter said he's fine either way, I'll use the patch as originally
posted and include Peter's R-b.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory
@ 2020-05-25 10:09 kbuild test robot
  0 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2020-05-25 10:09 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 49807 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <159017506369.18853.17306023099999811263.stgit@gimli.home>
References: <159017506369.18853.17306023099999811263.stgit@gimli.home>
TO: Alex Williamson <alex.williamson@redhat.com>

Hi Alex,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.7-rc7]
[cannot apply to vfio/next linux/master next-20200522]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Alex-Williamson/vfio-pci-Block-user-access-to-disabled-device-MMIO/20200523-031907
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4286d192c803571e8ca43b0f1f8ea04d663a278a
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build):
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>


cppcheck warnings: (new ones prefixed by >>)

>> drivers/vfio/pci/vfio_pci.c:820:8: warning: Local variable 'cmd' shadows outer argument [shadowArgument]
      u16 cmd;
          ^
   drivers/vfio/pci/vfio_pci.c:748:20: note: Shadowed declaration
         unsigned int cmd, unsigned long arg)
                      ^
   drivers/vfio/pci/vfio_pci.c:820:8: note: Shadow variable
      u16 cmd;
          ^
>> drivers/vfio/vfio_iommu_type1.c:1226:12: warning: Local variable 'n' shadows outer variable [shadowVariable]
       size_t n = dma->iova + dma->size - iova;
              ^
   drivers/vfio/vfio_iommu_type1.c:1183:18: note: Shadowed declaration
    struct rb_node *n;
                    ^
   drivers/vfio/vfio_iommu_type1.c:1226:12: note: Shadow variable
       size_t n = dma->iova + dma->size - iova;
              ^
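
For anyone unfamiliar with the diagnostic: "shadows" means an inner declaration
reuses an outer name, so the inner one silently wins inside its scope.  A
minimal stand-alone illustration (not related to the vfio code itself):

#include <stdio.h>

static long do_ioctl(unsigned int cmd)          /* outer argument */
{
        if (cmd == 1) {
                unsigned short cmd = 2;         /* shadows the argument */
                printf("%hu\n", cmd);           /* refers to the inner cmd */
        }
        return 0;
}

int main(void)
{
        return (int)do_ioctl(1);
}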

# https://github.com/0day-ci/linux/commit/3c9d9870c815814c4891775ab2950062e014a85c
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 3c9d9870c815814c4891775ab2950062e014a85c
vim +/cmd +820 drivers/vfio/pci/vfio_pci.c

3c9d9870c81581 Alex Williamson         2020-05-22   746  
89e1f7d4c66d85 Alex Williamson         2012-07-31   747  static long vfio_pci_ioctl(void *device_data,
89e1f7d4c66d85 Alex Williamson         2012-07-31   748  			   unsigned int cmd, unsigned long arg)
89e1f7d4c66d85 Alex Williamson         2012-07-31   749  {
89e1f7d4c66d85 Alex Williamson         2012-07-31   750  	struct vfio_pci_device *vdev = device_data;
89e1f7d4c66d85 Alex Williamson         2012-07-31   751  	unsigned long minsz;
89e1f7d4c66d85 Alex Williamson         2012-07-31   752  
89e1f7d4c66d85 Alex Williamson         2012-07-31   753  	if (cmd == VFIO_DEVICE_GET_INFO) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   754  		struct vfio_device_info info;
89e1f7d4c66d85 Alex Williamson         2012-07-31   755  
89e1f7d4c66d85 Alex Williamson         2012-07-31   756  		minsz = offsetofend(struct vfio_device_info, num_irqs);
89e1f7d4c66d85 Alex Williamson         2012-07-31   757  
89e1f7d4c66d85 Alex Williamson         2012-07-31   758  		if (copy_from_user(&info, (void __user *)arg, minsz))
89e1f7d4c66d85 Alex Williamson         2012-07-31   759  			return -EFAULT;
89e1f7d4c66d85 Alex Williamson         2012-07-31   760  
89e1f7d4c66d85 Alex Williamson         2012-07-31   761  		if (info.argsz < minsz)
89e1f7d4c66d85 Alex Williamson         2012-07-31   762  			return -EINVAL;
89e1f7d4c66d85 Alex Williamson         2012-07-31   763  
89e1f7d4c66d85 Alex Williamson         2012-07-31   764  		info.flags = VFIO_DEVICE_FLAGS_PCI;
89e1f7d4c66d85 Alex Williamson         2012-07-31   765  
89e1f7d4c66d85 Alex Williamson         2012-07-31   766  		if (vdev->reset_works)
89e1f7d4c66d85 Alex Williamson         2012-07-31   767  			info.flags |= VFIO_DEVICE_FLAGS_RESET;
89e1f7d4c66d85 Alex Williamson         2012-07-31   768  
28541d41c9e04c Alex Williamson         2016-02-22   769  		info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
89e1f7d4c66d85 Alex Williamson         2012-07-31   770  		info.num_irqs = VFIO_PCI_NUM_IRQS;
89e1f7d4c66d85 Alex Williamson         2012-07-31   771  
8160c4e455820d Michael S. Tsirkin      2016-02-28   772  		return copy_to_user((void __user *)arg, &info, minsz) ?
8160c4e455820d Michael S. Tsirkin      2016-02-28   773  			-EFAULT : 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   774  
89e1f7d4c66d85 Alex Williamson         2012-07-31   775  	} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   776  		struct pci_dev *pdev = vdev->pdev;
89e1f7d4c66d85 Alex Williamson         2012-07-31   777  		struct vfio_region_info info;
188ad9d6cbbce4 Alex Williamson         2016-02-22   778  		struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
28541d41c9e04c Alex Williamson         2016-02-22   779  		int i, ret;
89e1f7d4c66d85 Alex Williamson         2012-07-31   780  
89e1f7d4c66d85 Alex Williamson         2012-07-31   781  		minsz = offsetofend(struct vfio_region_info, offset);
89e1f7d4c66d85 Alex Williamson         2012-07-31   782  
89e1f7d4c66d85 Alex Williamson         2012-07-31   783  		if (copy_from_user(&info, (void __user *)arg, minsz))
89e1f7d4c66d85 Alex Williamson         2012-07-31   784  			return -EFAULT;
89e1f7d4c66d85 Alex Williamson         2012-07-31   785  
89e1f7d4c66d85 Alex Williamson         2012-07-31   786  		if (info.argsz < minsz)
89e1f7d4c66d85 Alex Williamson         2012-07-31   787  			return -EINVAL;
89e1f7d4c66d85 Alex Williamson         2012-07-31   788  
89e1f7d4c66d85 Alex Williamson         2012-07-31   789  		switch (info.index) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   790  		case VFIO_PCI_CONFIG_REGION_INDEX:
89e1f7d4c66d85 Alex Williamson         2012-07-31   791  			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   792  			info.size = pdev->cfg_size;
89e1f7d4c66d85 Alex Williamson         2012-07-31   793  			info.flags = VFIO_REGION_INFO_FLAG_READ |
89e1f7d4c66d85 Alex Williamson         2012-07-31   794  				     VFIO_REGION_INFO_FLAG_WRITE;
89e1f7d4c66d85 Alex Williamson         2012-07-31   795  			break;
89e1f7d4c66d85 Alex Williamson         2012-07-31   796  		case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
89e1f7d4c66d85 Alex Williamson         2012-07-31   797  			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   798  			info.size = pci_resource_len(pdev, info.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   799  			if (!info.size) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   800  				info.flags = 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   801  				break;
89e1f7d4c66d85 Alex Williamson         2012-07-31   802  			}
89e1f7d4c66d85 Alex Williamson         2012-07-31   803  
89e1f7d4c66d85 Alex Williamson         2012-07-31   804  			info.flags = VFIO_REGION_INFO_FLAG_READ |
89e1f7d4c66d85 Alex Williamson         2012-07-31   805  				     VFIO_REGION_INFO_FLAG_WRITE;
05f0c03fbac181 Yongji Xie              2016-06-30   806  			if (vdev->bar_mmap_supported[info.index]) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   807  				info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
188ad9d6cbbce4 Alex Williamson         2016-02-22   808  				if (info.index == vdev->msix_bar) {
a32295c612c579 Alexey Kardashevskiy    2017-12-13   809  					ret = msix_mmappable_cap(vdev, &caps);
188ad9d6cbbce4 Alex Williamson         2016-02-22   810  					if (ret)
188ad9d6cbbce4 Alex Williamson         2016-02-22   811  						return ret;
188ad9d6cbbce4 Alex Williamson         2016-02-22   812  				}
188ad9d6cbbce4 Alex Williamson         2016-02-22   813  			}
188ad9d6cbbce4 Alex Williamson         2016-02-22   814  
89e1f7d4c66d85 Alex Williamson         2012-07-31   815  			break;
89e1f7d4c66d85 Alex Williamson         2012-07-31   816  		case VFIO_PCI_ROM_REGION_INDEX:
89e1f7d4c66d85 Alex Williamson         2012-07-31   817  		{
89e1f7d4c66d85 Alex Williamson         2012-07-31   818  			void __iomem *io;
89e1f7d4c66d85 Alex Williamson         2012-07-31   819  			size_t size;
3c9d9870c81581 Alex Williamson         2020-05-22  @820  			u16 cmd;
89e1f7d4c66d85 Alex Williamson         2012-07-31   821  
89e1f7d4c66d85 Alex Williamson         2012-07-31   822  			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   823  			info.flags = 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   824  
89e1f7d4c66d85 Alex Williamson         2012-07-31   825  			/* Report the BAR size, not the ROM size */
89e1f7d4c66d85 Alex Williamson         2012-07-31   826  			info.size = pci_resource_len(pdev, info.index);
a13b64591747e8 Alex Williamson         2016-02-22   827  			if (!info.size) {
a13b64591747e8 Alex Williamson         2016-02-22   828  				/* Shadow ROMs appear as PCI option ROMs */
a13b64591747e8 Alex Williamson         2016-02-22   829  				if (pdev->resource[PCI_ROM_RESOURCE].flags &
a13b64591747e8 Alex Williamson         2016-02-22   830  							IORESOURCE_ROM_SHADOW)
a13b64591747e8 Alex Williamson         2016-02-22   831  					info.size = 0x20000;
a13b64591747e8 Alex Williamson         2016-02-22   832  				else
89e1f7d4c66d85 Alex Williamson         2012-07-31   833  					break;
a13b64591747e8 Alex Williamson         2016-02-22   834  			}
89e1f7d4c66d85 Alex Williamson         2012-07-31   835  
0cfd027be1d6de Eric Auger              2019-02-15   836  			/*
0cfd027be1d6de Eric Auger              2019-02-15   837  			 * Is it really there?  Enable memory decode for
0cfd027be1d6de Eric Auger              2019-02-15   838  			 * implicit access in pci_map_rom().
0cfd027be1d6de Eric Auger              2019-02-15   839  			 */
3c9d9870c81581 Alex Williamson         2020-05-22   840  			cmd = vfio_pci_memory_lock_and_enable(vdev);
89e1f7d4c66d85 Alex Williamson         2012-07-31   841  			io = pci_map_rom(pdev, &size);
0cfd027be1d6de Eric Auger              2019-02-15   842  			if (io) {
0cfd027be1d6de Eric Auger              2019-02-15   843  				info.flags = VFIO_REGION_INFO_FLAG_READ;
0cfd027be1d6de Eric Auger              2019-02-15   844  				pci_unmap_rom(pdev, io);
0cfd027be1d6de Eric Auger              2019-02-15   845  			} else {
89e1f7d4c66d85 Alex Williamson         2012-07-31   846  				info.size = 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   847  			}
3c9d9870c81581 Alex Williamson         2020-05-22   848  			vfio_pci_memory_unlock_and_restore(vdev, cmd);
89e1f7d4c66d85 Alex Williamson         2012-07-31   849  
89e1f7d4c66d85 Alex Williamson         2012-07-31   850  			break;
89e1f7d4c66d85 Alex Williamson         2012-07-31   851  		}
84237a826b261d Alex Williamson         2013-02-18   852  		case VFIO_PCI_VGA_REGION_INDEX:
84237a826b261d Alex Williamson         2013-02-18   853  			if (!vdev->has_vga)
84237a826b261d Alex Williamson         2013-02-18   854  				return -EINVAL;
84237a826b261d Alex Williamson         2013-02-18   855  
84237a826b261d Alex Williamson         2013-02-18   856  			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
84237a826b261d Alex Williamson         2013-02-18   857  			info.size = 0xc0000;
84237a826b261d Alex Williamson         2013-02-18   858  			info.flags = VFIO_REGION_INFO_FLAG_READ |
84237a826b261d Alex Williamson         2013-02-18   859  				     VFIO_REGION_INFO_FLAG_WRITE;
84237a826b261d Alex Williamson         2013-02-18   860  
84237a826b261d Alex Williamson         2013-02-18   861  			break;
89e1f7d4c66d85 Alex Williamson         2012-07-31   862  		default:
c535d34569bbc6 Kirti Wankhede          2016-11-17   863  		{
dda01f787df9f9 Alex Williamson         2017-12-12   864  			struct vfio_region_info_cap_type cap_type = {
dda01f787df9f9 Alex Williamson         2017-12-12   865  					.header.id = VFIO_REGION_INFO_CAP_TYPE,
dda01f787df9f9 Alex Williamson         2017-12-12   866  					.header.version = 1 };
c535d34569bbc6 Kirti Wankhede          2016-11-17   867  
28541d41c9e04c Alex Williamson         2016-02-22   868  			if (info.index >=
28541d41c9e04c Alex Williamson         2016-02-22   869  			    VFIO_PCI_NUM_REGIONS + vdev->num_regions)
89e1f7d4c66d85 Alex Williamson         2012-07-31   870  				return -EINVAL;
0e714d27786ce1 Gustavo A. R. Silva     2018-07-17   871  			info.index = array_index_nospec(info.index,
0e714d27786ce1 Gustavo A. R. Silva     2018-07-17   872  							VFIO_PCI_NUM_REGIONS +
0e714d27786ce1 Gustavo A. R. Silva     2018-07-17   873  							vdev->num_regions);
28541d41c9e04c Alex Williamson         2016-02-22   874  
28541d41c9e04c Alex Williamson         2016-02-22   875  			i = info.index - VFIO_PCI_NUM_REGIONS;
28541d41c9e04c Alex Williamson         2016-02-22   876  
28541d41c9e04c Alex Williamson         2016-02-22   877  			info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
28541d41c9e04c Alex Williamson         2016-02-22   878  			info.size = vdev->region[i].size;
28541d41c9e04c Alex Williamson         2016-02-22   879  			info.flags = vdev->region[i].flags;
28541d41c9e04c Alex Williamson         2016-02-22   880  
c535d34569bbc6 Kirti Wankhede          2016-11-17   881  			cap_type.type = vdev->region[i].type;
c535d34569bbc6 Kirti Wankhede          2016-11-17   882  			cap_type.subtype = vdev->region[i].subtype;
c535d34569bbc6 Kirti Wankhede          2016-11-17   883  
dda01f787df9f9 Alex Williamson         2017-12-12   884  			ret = vfio_info_add_capability(&caps, &cap_type.header,
dda01f787df9f9 Alex Williamson         2017-12-12   885  						       sizeof(cap_type));
28541d41c9e04c Alex Williamson         2016-02-22   886  			if (ret)
28541d41c9e04c Alex Williamson         2016-02-22   887  				return ret;
c535d34569bbc6 Kirti Wankhede          2016-11-17   888  
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   889  			if (vdev->region[i].ops->add_capability) {
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   890  				ret = vdev->region[i].ops->add_capability(vdev,
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   891  						&vdev->region[i], &caps);
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   892  				if (ret)
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   893  					return ret;
c2c0f1cde0ef56 Alexey Kardashevskiy    2018-12-19   894  			}
c535d34569bbc6 Kirti Wankhede          2016-11-17   895  		}
89e1f7d4c66d85 Alex Williamson         2012-07-31   896  		}
89e1f7d4c66d85 Alex Williamson         2012-07-31   897  
188ad9d6cbbce4 Alex Williamson         2016-02-22   898  		if (caps.size) {
188ad9d6cbbce4 Alex Williamson         2016-02-22   899  			info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
188ad9d6cbbce4 Alex Williamson         2016-02-22   900  			if (info.argsz < sizeof(info) + caps.size) {
188ad9d6cbbce4 Alex Williamson         2016-02-22   901  				info.argsz = sizeof(info) + caps.size;
188ad9d6cbbce4 Alex Williamson         2016-02-22   902  				info.cap_offset = 0;
188ad9d6cbbce4 Alex Williamson         2016-02-22   903  			} else {
188ad9d6cbbce4 Alex Williamson         2016-02-22   904  				vfio_info_cap_shift(&caps, sizeof(info));
c4aec3101319f8 Dan Carpenter           2016-02-25   905  				if (copy_to_user((void __user *)arg +
188ad9d6cbbce4 Alex Williamson         2016-02-22   906  						  sizeof(info), caps.buf,
c4aec3101319f8 Dan Carpenter           2016-02-25   907  						  caps.size)) {
188ad9d6cbbce4 Alex Williamson         2016-02-22   908  					kfree(caps.buf);
c4aec3101319f8 Dan Carpenter           2016-02-25   909  					return -EFAULT;
188ad9d6cbbce4 Alex Williamson         2016-02-22   910  				}
188ad9d6cbbce4 Alex Williamson         2016-02-22   911  				info.cap_offset = sizeof(info);
188ad9d6cbbce4 Alex Williamson         2016-02-22   912  			}
188ad9d6cbbce4 Alex Williamson         2016-02-22   913  
188ad9d6cbbce4 Alex Williamson         2016-02-22   914  			kfree(caps.buf);
89e1f7d4c66d85 Alex Williamson         2012-07-31   915  		}
89e1f7d4c66d85 Alex Williamson         2012-07-31   916  
8160c4e455820d Michael S. Tsirkin      2016-02-28   917  		return copy_to_user((void __user *)arg, &info, minsz) ?
8160c4e455820d Michael S. Tsirkin      2016-02-28   918  			-EFAULT : 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   919  
89e1f7d4c66d85 Alex Williamson         2012-07-31   920  	} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   921  		struct vfio_irq_info info;
89e1f7d4c66d85 Alex Williamson         2012-07-31   922  
89e1f7d4c66d85 Alex Williamson         2012-07-31   923  		minsz = offsetofend(struct vfio_irq_info, count);
89e1f7d4c66d85 Alex Williamson         2012-07-31   924  
89e1f7d4c66d85 Alex Williamson         2012-07-31   925  		if (copy_from_user(&info, (void __user *)arg, minsz))
89e1f7d4c66d85 Alex Williamson         2012-07-31   926  			return -EFAULT;
89e1f7d4c66d85 Alex Williamson         2012-07-31   927  
89e1f7d4c66d85 Alex Williamson         2012-07-31   928  		if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS)
89e1f7d4c66d85 Alex Williamson         2012-07-31   929  			return -EINVAL;
89e1f7d4c66d85 Alex Williamson         2012-07-31   930  
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   931  		switch (info.index) {
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   932  		case VFIO_PCI_INTX_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX:
6140a8f5623820 Alex Williamson         2015-02-06   933  		case VFIO_PCI_REQ_IRQ_INDEX:
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   934  			break;
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   935  		case VFIO_PCI_ERR_IRQ_INDEX:
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   936  			if (pci_is_pcie(vdev->pdev))
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   937  				break;
544c05a60aef7d Gustavo A. R. Silva     2018-07-09   938  		/* fall through */
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   939  		default:
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   940  			return -EINVAL;
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   941  		}
dad9f8972e04cd Vijay Mohan Pandarathil 2013-03-11   942  
89e1f7d4c66d85 Alex Williamson         2012-07-31   943  		info.flags = VFIO_IRQ_INFO_EVENTFD;
89e1f7d4c66d85 Alex Williamson         2012-07-31   944  
89e1f7d4c66d85 Alex Williamson         2012-07-31   945  		info.count = vfio_pci_get_irq_count(vdev, info.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   946  
89e1f7d4c66d85 Alex Williamson         2012-07-31   947  		if (info.index == VFIO_PCI_INTX_IRQ_INDEX)
89e1f7d4c66d85 Alex Williamson         2012-07-31   948  			info.flags |= (VFIO_IRQ_INFO_MASKABLE |
89e1f7d4c66d85 Alex Williamson         2012-07-31   949  				       VFIO_IRQ_INFO_AUTOMASKED);
89e1f7d4c66d85 Alex Williamson         2012-07-31   950  		else
89e1f7d4c66d85 Alex Williamson         2012-07-31   951  			info.flags |= VFIO_IRQ_INFO_NORESIZE;
89e1f7d4c66d85 Alex Williamson         2012-07-31   952  
8160c4e455820d Michael S. Tsirkin      2016-02-28   953  		return copy_to_user((void __user *)arg, &info, minsz) ?
8160c4e455820d Michael S. Tsirkin      2016-02-28   954  			-EFAULT : 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   955  
89e1f7d4c66d85 Alex Williamson         2012-07-31   956  	} else if (cmd == VFIO_DEVICE_SET_IRQS) {
89e1f7d4c66d85 Alex Williamson         2012-07-31   957  		struct vfio_irq_set hdr;
89e1f7d4c66d85 Alex Williamson         2012-07-31   958  		u8 *data = NULL;
05692d7005a364 Vlad Tsyrklevich        2016-10-12   959  		int max, ret = 0;
ef198aaa169c61 Kirti Wankhede          2016-11-17   960  		size_t data_size = 0;
89e1f7d4c66d85 Alex Williamson         2012-07-31   961  
89e1f7d4c66d85 Alex Williamson         2012-07-31   962  		minsz = offsetofend(struct vfio_irq_set, count);
89e1f7d4c66d85 Alex Williamson         2012-07-31   963  
89e1f7d4c66d85 Alex Williamson         2012-07-31   964  		if (copy_from_user(&hdr, (void __user *)arg, minsz))
89e1f7d4c66d85 Alex Williamson         2012-07-31   965  			return -EFAULT;
89e1f7d4c66d85 Alex Williamson         2012-07-31   966  
05692d7005a364 Vlad Tsyrklevich        2016-10-12   967  		max = vfio_pci_get_irq_count(vdev, hdr.index);
89e1f7d4c66d85 Alex Williamson         2012-07-31   968  
ef198aaa169c61 Kirti Wankhede          2016-11-17   969  		ret = vfio_set_irqs_validate_and_prepare(&hdr, max,
ef198aaa169c61 Kirti Wankhede          2016-11-17   970  						 VFIO_PCI_NUM_IRQS, &data_size);
ef198aaa169c61 Kirti Wankhede          2016-11-17   971  		if (ret)
ef198aaa169c61 Kirti Wankhede          2016-11-17   972  			return ret;
89e1f7d4c66d85 Alex Williamson         2012-07-31   973  
ef198aaa169c61 Kirti Wankhede          2016-11-17   974  		if (data_size) {
3a1f7041ddd59e Fengguang Wu            2012-12-07   975  			data = memdup_user((void __user *)(arg + minsz),
ef198aaa169c61 Kirti Wankhede          2016-11-17   976  					    data_size);
3a1f7041ddd59e Fengguang Wu            2012-12-07   977  			if (IS_ERR(data))
3a1f7041ddd59e Fengguang Wu            2012-12-07   978  				return PTR_ERR(data);
89e1f7d4c66d85 Alex Williamson         2012-07-31   979  		}
89e1f7d4c66d85 Alex Williamson         2012-07-31   980  
89e1f7d4c66d85 Alex Williamson         2012-07-31   981  		mutex_lock(&vdev->igate);
89e1f7d4c66d85 Alex Williamson         2012-07-31   982  
89e1f7d4c66d85 Alex Williamson         2012-07-31   983  		ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index,
89e1f7d4c66d85 Alex Williamson         2012-07-31   984  					      hdr.start, hdr.count, data);
89e1f7d4c66d85 Alex Williamson         2012-07-31   985  
89e1f7d4c66d85 Alex Williamson         2012-07-31   986  		mutex_unlock(&vdev->igate);
89e1f7d4c66d85 Alex Williamson         2012-07-31   987  		kfree(data);
89e1f7d4c66d85 Alex Williamson         2012-07-31   988  
89e1f7d4c66d85 Alex Williamson         2012-07-31   989  		return ret;
89e1f7d4c66d85 Alex Williamson         2012-07-31   990  
8b27ee60bfd6bb Alex Williamson         2013-09-04   991  	} else if (cmd == VFIO_DEVICE_RESET) {
3c9d9870c81581 Alex Williamson         2020-05-22   992  		int ret;
3c9d9870c81581 Alex Williamson         2020-05-22   993  
3c9d9870c81581 Alex Williamson         2020-05-22   994  		if (!vdev->reset_works)
3c9d9870c81581 Alex Williamson         2020-05-22   995  			return -EINVAL;
3c9d9870c81581 Alex Williamson         2020-05-22   996  
3c9d9870c81581 Alex Williamson         2020-05-22   997  		vfio_pci_zap_and_down_write_memory_lock(vdev);
3c9d9870c81581 Alex Williamson         2020-05-22   998  		ret = pci_try_reset_function(vdev->pdev);
3c9d9870c81581 Alex Williamson         2020-05-22   999  		up_write(&vdev->memory_lock);
3c9d9870c81581 Alex Williamson         2020-05-22  1000  
3c9d9870c81581 Alex Williamson         2020-05-22  1001  		return ret;
89e1f7d4c66d85 Alex Williamson         2012-07-31  1002  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1003  	} else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1004  		struct vfio_pci_hot_reset_info hdr;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1005  		struct vfio_pci_fill_info fill = { 0 };
8b27ee60bfd6bb Alex Williamson         2013-09-04  1006  		struct vfio_pci_dependent_device *devices = NULL;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1007  		bool slot = false;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1008  		int ret = 0;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1009  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1010  		minsz = offsetofend(struct vfio_pci_hot_reset_info, count);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1011  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1012  		if (copy_from_user(&hdr, (void __user *)arg, minsz))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1013  			return -EFAULT;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1014  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1015  		if (hdr.argsz < minsz)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1016  			return -EINVAL;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1017  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1018  		hdr.flags = 0;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1019  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1020  		/* Can we do a slot or bus reset or neither? */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1021  		if (!pci_probe_reset_slot(vdev->pdev->slot))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1022  			slot = true;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1023  		else if (pci_probe_reset_bus(vdev->pdev->bus))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1024  			return -ENODEV;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1025  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1026  		/* How many devices are affected? */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1027  		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1028  						    vfio_pci_count_devs,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1029  						    &fill.max, slot);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1030  		if (ret)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1031  			return ret;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1032  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1033  		WARN_ON(!fill.max); /* Should always be at least one */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1034  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1035  		/*
8b27ee60bfd6bb Alex Williamson         2013-09-04  1036  		 * If there's enough space, fill it now, otherwise return
8b27ee60bfd6bb Alex Williamson         2013-09-04  1037  		 * -ENOSPC and the number of devices affected.
8b27ee60bfd6bb Alex Williamson         2013-09-04  1038  		 */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1039  		if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1040  			ret = -ENOSPC;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1041  			hdr.count = fill.max;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1042  			goto reset_info_exit;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1043  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1044  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1045  		devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1046  		if (!devices)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1047  			return -ENOMEM;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1048  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1049  		fill.devices = devices;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1050  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1051  		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1052  						    vfio_pci_fill_devs,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1053  						    &fill, slot);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1054  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1055  		/*
8b27ee60bfd6bb Alex Williamson         2013-09-04  1056  		 * If a device was removed between counting and filling,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1057  		 * we may come up short of fill.max.  If a device was
8b27ee60bfd6bb Alex Williamson         2013-09-04  1058  		 * added, we'll have a return of -EAGAIN above.
8b27ee60bfd6bb Alex Williamson         2013-09-04  1059  		 */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1060  		if (!ret)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1061  			hdr.count = fill.cur;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1062  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1063  reset_info_exit:
8b27ee60bfd6bb Alex Williamson         2013-09-04  1064  		if (copy_to_user((void __user *)arg, &hdr, minsz))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1065  			ret = -EFAULT;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1066  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1067  		if (!ret) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1068  			if (copy_to_user((void __user *)(arg + minsz), devices,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1069  					 hdr.count * sizeof(*devices)))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1070  				ret = -EFAULT;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1071  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1072  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1073  		kfree(devices);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1074  		return ret;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1075  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1076  	} else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1077  		struct vfio_pci_hot_reset hdr;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1078  		int32_t *group_fds;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1079  		struct vfio_pci_group_entry *groups;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1080  		struct vfio_pci_group_info info;
3c9d9870c81581 Alex Williamson         2020-05-22  1081  		struct vfio_devices devs = { .cur_index = 0 };
8b27ee60bfd6bb Alex Williamson         2013-09-04  1082  		bool slot = false;
3c9d9870c81581 Alex Williamson         2020-05-22  1083  		int i, group_idx, mem_idx = 0, count = 0, ret = 0;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1084  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1085  		minsz = offsetofend(struct vfio_pci_hot_reset, count);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1086  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1087  		if (copy_from_user(&hdr, (void __user *)arg, minsz))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1088  			return -EFAULT;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1089  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1090  		if (hdr.argsz < minsz || hdr.flags)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1091  			return -EINVAL;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1092  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1093  		/* Can we do a slot or bus reset or neither? */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1094  		if (!pci_probe_reset_slot(vdev->pdev->slot))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1095  			slot = true;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1096  		else if (pci_probe_reset_bus(vdev->pdev->bus))
8b27ee60bfd6bb Alex Williamson         2013-09-04  1097  			return -ENODEV;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1098  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1099  		/*
8b27ee60bfd6bb Alex Williamson         2013-09-04  1100  		 * We can't let userspace give us an arbitrarily large
8b27ee60bfd6bb Alex Williamson         2013-09-04  1101  		 * buffer to copy, so verify how many we think there
8b27ee60bfd6bb Alex Williamson         2013-09-04  1102  		 * could be.  Note groups can have multiple devices so
8b27ee60bfd6bb Alex Williamson         2013-09-04  1103  		 * one group per device is the max.
8b27ee60bfd6bb Alex Williamson         2013-09-04  1104  		 */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1105  		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1106  						    vfio_pci_count_devs,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1107  						    &count, slot);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1108  		if (ret)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1109  			return ret;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1110  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1111  		/* Somewhere between 1 and count is OK */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1112  		if (!hdr.count || hdr.count > count)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1113  			return -EINVAL;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1114  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1115  		group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1116  		groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1117  		if (!group_fds || !groups) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1118  			kfree(group_fds);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1119  			kfree(groups);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1120  			return -ENOMEM;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1121  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1122  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1123  		if (copy_from_user(group_fds, (void __user *)(arg + minsz),
8b27ee60bfd6bb Alex Williamson         2013-09-04  1124  				   hdr.count * sizeof(*group_fds))) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1125  			kfree(group_fds);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1126  			kfree(groups);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1127  			return -EFAULT;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1128  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1129  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1130  		/*
8b27ee60bfd6bb Alex Williamson         2013-09-04  1131  		 * For each group_fd, get the group through the vfio external
8b27ee60bfd6bb Alex Williamson         2013-09-04  1132  		 * user interface and store the group and iommu ID.  This
8b27ee60bfd6bb Alex Williamson         2013-09-04  1133  		 * ensures the group is held across the reset.
8b27ee60bfd6bb Alex Williamson         2013-09-04  1134  		 */
3c9d9870c81581 Alex Williamson         2020-05-22  1135  		for (group_idx = 0; group_idx < hdr.count; group_idx++) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1136  			struct vfio_group *group;
3c9d9870c81581 Alex Williamson         2020-05-22  1137  			struct fd f = fdget(group_fds[group_idx]);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1138  			if (!f.file) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1139  				ret = -EBADF;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1140  				break;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1141  			}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1142  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1143  			group = vfio_group_get_external_user(f.file);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1144  			fdput(f);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1145  			if (IS_ERR(group)) {
8b27ee60bfd6bb Alex Williamson         2013-09-04  1146  				ret = PTR_ERR(group);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1147  				break;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1148  			}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1149  
3c9d9870c81581 Alex Williamson         2020-05-22  1150  			groups[group_idx].group = group;
3c9d9870c81581 Alex Williamson         2020-05-22  1151  			groups[group_idx].id =
3c9d9870c81581 Alex Williamson         2020-05-22  1152  					vfio_external_user_iommu_id(group);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1153  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1154  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1155  		kfree(group_fds);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1156  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1157  		/* release reference to groups on error */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1158  		if (ret)
8b27ee60bfd6bb Alex Williamson         2013-09-04  1159  			goto hot_reset_release;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1160  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1161  		info.count = hdr.count;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1162  		info.groups = groups;
8b27ee60bfd6bb Alex Williamson         2013-09-04  1163  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1164  		/*
8b27ee60bfd6bb Alex Williamson         2013-09-04  1165  		 * Test whether all the affected devices are contained
8b27ee60bfd6bb Alex Williamson         2013-09-04  1166  		 * by the set of groups provided by the user.
8b27ee60bfd6bb Alex Williamson         2013-09-04  1167  		 */
8b27ee60bfd6bb Alex Williamson         2013-09-04  1168  		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1169  						    vfio_pci_validate_devs,
8b27ee60bfd6bb Alex Williamson         2013-09-04  1170  						    &info, slot);
3c9d9870c81581 Alex Williamson         2020-05-22  1171  		if (ret)
3c9d9870c81581 Alex Williamson         2020-05-22  1172  			goto hot_reset_release;
3c9d9870c81581 Alex Williamson         2020-05-22  1173  
3c9d9870c81581 Alex Williamson         2020-05-22  1174  		devs.max_index = count;
3c9d9870c81581 Alex Williamson         2020-05-22  1175  		devs.devices = kcalloc(count, sizeof(struct vfio_device *),
3c9d9870c81581 Alex Williamson         2020-05-22  1176  				       GFP_KERNEL);
3c9d9870c81581 Alex Williamson         2020-05-22  1177  		if (!devs.devices) {
3c9d9870c81581 Alex Williamson         2020-05-22  1178  			ret = -ENOMEM;
3c9d9870c81581 Alex Williamson         2020-05-22  1179  			goto hot_reset_release;
3c9d9870c81581 Alex Williamson         2020-05-22  1180  		}
3c9d9870c81581 Alex Williamson         2020-05-22  1181  
3c9d9870c81581 Alex Williamson         2020-05-22  1182  		/*
3c9d9870c81581 Alex Williamson         2020-05-22  1183  		 * We need to get memory_lock for each device, but devices
3c9d9870c81581 Alex Williamson         2020-05-22  1184  		 * can share mmap_sem, therefore we need to zap and hold
3c9d9870c81581 Alex Williamson         2020-05-22  1185  		 * the vma_lock for each device, and only then get each
3c9d9870c81581 Alex Williamson         2020-05-22  1186  		 * memory_lock.
3c9d9870c81581 Alex Williamson         2020-05-22  1187  		 */
3c9d9870c81581 Alex Williamson         2020-05-22  1188  		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
3c9d9870c81581 Alex Williamson         2020-05-22  1189  					    vfio_pci_try_zap_and_vma_lock_cb,
3c9d9870c81581 Alex Williamson         2020-05-22  1190  					    &devs, slot);
3c9d9870c81581 Alex Williamson         2020-05-22  1191  		if (ret)
3c9d9870c81581 Alex Williamson         2020-05-22  1192  			goto hot_reset_release;
3c9d9870c81581 Alex Williamson         2020-05-22  1193  
3c9d9870c81581 Alex Williamson         2020-05-22  1194  		for (; mem_idx < devs.cur_index; mem_idx++) {
3c9d9870c81581 Alex Williamson         2020-05-22  1195  			struct vfio_pci_device *tmp;
3c9d9870c81581 Alex Williamson         2020-05-22  1196  
3c9d9870c81581 Alex Williamson         2020-05-22  1197  			tmp = vfio_device_data(devs.devices[mem_idx]);
3c9d9870c81581 Alex Williamson         2020-05-22  1198  
3c9d9870c81581 Alex Williamson         2020-05-22  1199  			ret = down_write_trylock(&tmp->memory_lock);
3c9d9870c81581 Alex Williamson         2020-05-22  1200  			if (!ret) {
3c9d9870c81581 Alex Williamson         2020-05-22  1201  				ret = -EBUSY;
3c9d9870c81581 Alex Williamson         2020-05-22  1202  				goto hot_reset_release;
3c9d9870c81581 Alex Williamson         2020-05-22  1203  			}
3c9d9870c81581 Alex Williamson         2020-05-22  1204  			mutex_unlock(&tmp->vma_lock);
3c9d9870c81581 Alex Williamson         2020-05-22  1205  		}
3c9d9870c81581 Alex Williamson         2020-05-22  1206  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1207  		/* User has access, do the reset */
c6a44ba950d147 Sinan Kaya              2018-07-19  1208  		ret = pci_reset_bus(vdev->pdev);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1209  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1210  hot_reset_release:
3c9d9870c81581 Alex Williamson         2020-05-22  1211  		for (i = 0; i < devs.cur_index; i++) {
3c9d9870c81581 Alex Williamson         2020-05-22  1212  			struct vfio_device *device;
3c9d9870c81581 Alex Williamson         2020-05-22  1213  			struct vfio_pci_device *tmp;
3c9d9870c81581 Alex Williamson         2020-05-22  1214  
3c9d9870c81581 Alex Williamson         2020-05-22  1215  			device = devs.devices[i];
3c9d9870c81581 Alex Williamson         2020-05-22  1216  			tmp = vfio_device_data(device);
3c9d9870c81581 Alex Williamson         2020-05-22  1217  
3c9d9870c81581 Alex Williamson         2020-05-22  1218  			if (i < mem_idx)
3c9d9870c81581 Alex Williamson         2020-05-22  1219  				up_write(&tmp->memory_lock);
3c9d9870c81581 Alex Williamson         2020-05-22  1220  			else
3c9d9870c81581 Alex Williamson         2020-05-22  1221  				mutex_unlock(&tmp->vma_lock);
3c9d9870c81581 Alex Williamson         2020-05-22  1222  			vfio_device_put(device);
3c9d9870c81581 Alex Williamson         2020-05-22  1223  		}
3c9d9870c81581 Alex Williamson         2020-05-22  1224  		kfree(devs.devices);
3c9d9870c81581 Alex Williamson         2020-05-22  1225  
3c9d9870c81581 Alex Williamson         2020-05-22  1226  		for (group_idx--; group_idx >= 0; group_idx--)
3c9d9870c81581 Alex Williamson         2020-05-22  1227  			vfio_group_put_external_user(groups[group_idx].group);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1228  
8b27ee60bfd6bb Alex Williamson         2013-09-04  1229  		kfree(groups);
8b27ee60bfd6bb Alex Williamson         2013-09-04  1230  		return ret;
30656177c40804 Alex Williamson         2018-03-21  1231  	} else if (cmd == VFIO_DEVICE_IOEVENTFD) {
30656177c40804 Alex Williamson         2018-03-21  1232  		struct vfio_device_ioeventfd ioeventfd;
30656177c40804 Alex Williamson         2018-03-21  1233  		int count;
30656177c40804 Alex Williamson         2018-03-21  1234  
30656177c40804 Alex Williamson         2018-03-21  1235  		minsz = offsetofend(struct vfio_device_ioeventfd, fd);
30656177c40804 Alex Williamson         2018-03-21  1236  
30656177c40804 Alex Williamson         2018-03-21  1237  		if (copy_from_user(&ioeventfd, (void __user *)arg, minsz))
30656177c40804 Alex Williamson         2018-03-21  1238  			return -EFAULT;
30656177c40804 Alex Williamson         2018-03-21  1239  
30656177c40804 Alex Williamson         2018-03-21  1240  		if (ioeventfd.argsz < minsz)
30656177c40804 Alex Williamson         2018-03-21  1241  			return -EINVAL;
30656177c40804 Alex Williamson         2018-03-21  1242  
30656177c40804 Alex Williamson         2018-03-21  1243  		if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK)
30656177c40804 Alex Williamson         2018-03-21  1244  			return -EINVAL;
30656177c40804 Alex Williamson         2018-03-21  1245  
30656177c40804 Alex Williamson         2018-03-21  1246  		count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK;
30656177c40804 Alex Williamson         2018-03-21  1247  
30656177c40804 Alex Williamson         2018-03-21  1248  		if (hweight8(count) != 1 || ioeventfd.fd < -1)
30656177c40804 Alex Williamson         2018-03-21  1249  			return -EINVAL;
30656177c40804 Alex Williamson         2018-03-21  1250  
30656177c40804 Alex Williamson         2018-03-21  1251  		return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
30656177c40804 Alex Williamson         2018-03-21  1252  					  ioeventfd.data, count, ioeventfd.fd);
43eeeecc8ed5fa Alex Williamson         2020-03-24  1253  	} else if (cmd == VFIO_DEVICE_FEATURE) {
43eeeecc8ed5fa Alex Williamson         2020-03-24  1254  		struct vfio_device_feature feature;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1255  		uuid_t uuid;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1256  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1257  		minsz = offsetofend(struct vfio_device_feature, flags);
43eeeecc8ed5fa Alex Williamson         2020-03-24  1258  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1259  		if (copy_from_user(&feature, (void __user *)arg, minsz))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1260  			return -EFAULT;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1261  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1262  		if (feature.argsz < minsz)
43eeeecc8ed5fa Alex Williamson         2020-03-24  1263  			return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1264  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1265  		/* Check unknown flags */
43eeeecc8ed5fa Alex Williamson         2020-03-24  1266  		if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
43eeeecc8ed5fa Alex Williamson         2020-03-24  1267  				      VFIO_DEVICE_FEATURE_SET |
43eeeecc8ed5fa Alex Williamson         2020-03-24  1268  				      VFIO_DEVICE_FEATURE_GET |
43eeeecc8ed5fa Alex Williamson         2020-03-24  1269  				      VFIO_DEVICE_FEATURE_PROBE))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1270  			return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1271  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1272  		/* GET & SET are mutually exclusive except with PROBE */
43eeeecc8ed5fa Alex Williamson         2020-03-24  1273  		if (!(feature.flags & VFIO_DEVICE_FEATURE_PROBE) &&
43eeeecc8ed5fa Alex Williamson         2020-03-24  1274  		    (feature.flags & VFIO_DEVICE_FEATURE_SET) &&
43eeeecc8ed5fa Alex Williamson         2020-03-24  1275  		    (feature.flags & VFIO_DEVICE_FEATURE_GET))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1276  			return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1277  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1278  		switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
43eeeecc8ed5fa Alex Williamson         2020-03-24  1279  		case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
43eeeecc8ed5fa Alex Williamson         2020-03-24  1280  			if (!vdev->vf_token)
43eeeecc8ed5fa Alex Williamson         2020-03-24  1281  				return -ENOTTY;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1282  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1283  			/*
43eeeecc8ed5fa Alex Williamson         2020-03-24  1284  			 * We do not support GET of the VF Token UUID as this
43eeeecc8ed5fa Alex Williamson         2020-03-24  1285  			 * could expose the token of the previous device user.
43eeeecc8ed5fa Alex Williamson         2020-03-24  1286  			 */
43eeeecc8ed5fa Alex Williamson         2020-03-24  1287  			if (feature.flags & VFIO_DEVICE_FEATURE_GET)
43eeeecc8ed5fa Alex Williamson         2020-03-24  1288  				return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1289  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1290  			if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
43eeeecc8ed5fa Alex Williamson         2020-03-24  1291  				return 0;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1292  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1293  			/* Don't SET unless told to do so */
43eeeecc8ed5fa Alex Williamson         2020-03-24  1294  			if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1295  				return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1296  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1297  			if (feature.argsz < minsz + sizeof(uuid))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1298  				return -EINVAL;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1299  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1300  			if (copy_from_user(&uuid, (void __user *)(arg + minsz),
43eeeecc8ed5fa Alex Williamson         2020-03-24  1301  					   sizeof(uuid)))
43eeeecc8ed5fa Alex Williamson         2020-03-24  1302  				return -EFAULT;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1303  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1304  			mutex_lock(&vdev->vf_token->lock);
43eeeecc8ed5fa Alex Williamson         2020-03-24  1305  			uuid_copy(&vdev->vf_token->uuid, &uuid);
43eeeecc8ed5fa Alex Williamson         2020-03-24  1306  			mutex_unlock(&vdev->vf_token->lock);
43eeeecc8ed5fa Alex Williamson         2020-03-24  1307  
43eeeecc8ed5fa Alex Williamson         2020-03-24  1308  			return 0;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1309  		default:
43eeeecc8ed5fa Alex Williamson         2020-03-24  1310  			return -ENOTTY;
43eeeecc8ed5fa Alex Williamson         2020-03-24  1311  		}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1312  	}
8b27ee60bfd6bb Alex Williamson         2013-09-04  1313  
89e1f7d4c66d85 Alex Williamson         2012-07-31  1314  	return -ENOTTY;
89e1f7d4c66d85 Alex Williamson         2012-07-31  1315  }
89e1f7d4c66d85 Alex Williamson         2012-07-31  1316  
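[Editor's note, for illustration only; this is not part of the quoted vfio-pci source above nor of this series.]  A minimal userspace sketch of exercising the VFIO_DEVICE_SET_IRQS branch in the listing, routing MSI vector 0 to an eventfd.  The helper name and the abbreviated error handling are assumptions of the sketch:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Illustrative helper: trigger-route MSI vector 0 of an already-open
 * vfio-pci device fd to a new eventfd via VFIO_DEVICE_SET_IRQS.
 * Returns the eventfd on success, -1 on failure.
 */
static int vfio_enable_msi0(int device_fd)
{
	size_t sz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
	struct vfio_irq_set *set = calloc(1, sz);
	int efd, ret;

	if (!set)
		return -1;

	efd = eventfd(0, EFD_CLOEXEC);
	if (efd < 0) {
		free(set);
		return -1;
	}

	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSI_IRQ_INDEX;
	set->start = 0;
	set->count = 1;
	memcpy(set->data, &efd, sizeof(int32_t));	/* one eventfd for vector 0 */

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);
	if (ret) {
		close(efd);
		return -1;
	}
	return efd;
}

The vector count reported by VFIO_DEVICE_GET_IRQ_INFO for VFIO_PCI_MSI_IRQ_INDEX is what bounds start + count in the validate step shown above.
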

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

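[Editor's note, likewise illustrative and not from the quoted source.]  A hypothetical sketch of the two-step handshake behind the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO and VFIO_DEVICE_PCI_HOT_RESET branches in the listing; it assumes the caller already holds an open fd for every VFIO group on the affected bus or slot, and error handling is abbreviated:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Illustrative helper: probe how many devices a bus/slot reset of this
 * vfio-pci device would affect, then request the reset while handing
 * the kernel one fd per owning VFIO group.
 */
static int vfio_hot_reset(int device_fd, const int32_t *group_fds, int ngroups)
{
	struct vfio_pci_hot_reset_info probe = {
		.argsz = sizeof(probe),	/* no room for entries: just get the count */
	};
	struct vfio_pci_hot_reset *reset;
	size_t sz;
	int ret;

	/* Expect -1/ENOSPC with probe.count filled in by the kernel */
	if (ioctl(device_fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, &probe) &&
	    errno != ENOSPC)
		return -1;

	/*
	 * A complete caller would retry with space for probe.count
	 * vfio_pci_dependent_device entries and open a group fd for each
	 * reported group_id; here we assume group_fds[] already covers them.
	 */
	sz = sizeof(*reset) + ngroups * sizeof(reset->group_fds[0]);
	reset = calloc(1, sz);
	if (!reset)
		return -1;

	reset->argsz = sz;
	reset->count = ngroups;
	memcpy(reset->group_fds, group_fds, ngroups * sizeof(*group_fds));

	ret = ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
	free(reset);
	return ret;
}

Note that with the locking added in this series, the kernel write-locks each affected device's memory_lock before issuing the bus reset, so the ioctl can now also fail with -EBUSY if that trylock cannot be taken.
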
Thread overview: 24+ messages
2020-05-22 19:17 [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Alex Williamson
2020-05-22 19:17 ` [PATCH v3 1/3] vfio/type1: Support faulting PFNMAP vmas Alex Williamson
2020-05-22 19:17 ` [PATCH v3 2/3] vfio-pci: Fault mmaps to enable vma tracking Alex Williamson
2020-05-22 19:17 ` [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory Alex Williamson
2020-05-23 19:34   ` Peter Xu
2020-05-23 23:06     ` Alex Williamson
2020-05-23 23:52       ` Peter Xu
2020-05-24  0:02         ` Peter Xu
2020-05-25 12:26         ` Jason Gunthorpe
2020-05-25 14:28           ` Peter Xu
2020-05-25 14:46             ` Jason Gunthorpe
2020-05-25 15:11               ` Peter Xu
2020-05-25 16:56                 ` Jason Gunthorpe
2020-05-25 20:56                   ` John Hubbard
2020-05-26  0:37                     ` Jason Gunthorpe
2020-05-26  0:46                       ` John Hubbard
2020-05-26 13:49                       ` Peter Xu
2020-05-26 14:32                         ` Alex Williamson
2020-05-26 14:46                           ` Peter Xu
2020-05-26 15:53                           ` Jason Gunthorpe
2020-05-26 15:57                             ` Alex Williamson
2020-05-22 22:08 ` [PATCH v3 0/3] vfio-pci: Block user access to disabled device MMIO Qian Cai
2020-05-22 22:25   ` Alex Williamson
2020-05-25 10:09 [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory kbuild test robot
