All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH kernel] vfio-pci: Allow mapping of MSI-X table to userspace
@ 2017-06-29 14:50 Alexey Kardashevskiy
  2017-06-29 20:42 ` Alex Williamson
  0 siblings, 1 reply; 2+ messages in thread
From: Alexey Kardashevskiy @ 2017-06-29 14:50 UTC (permalink / raw)
  To: kvm
  Cc: Alexey Kardashevskiy, David Gibson, Yongji Xie, Alex Williamson,
	Eric Auger, Paul Mackerras

At the moment VFIO does not allow mapping MSIX BAR to the userspace
and provides information about what can be mapped via a PCI region
capability. This ensures that the userspace does not get access
to MSIX vector table and cannot break MSIX setup. For most platforms
the userspace also chooses not to map MSIX as it has to emulate it for
fully virtualized guests.

This creates a problem when an MSIX BAR is not aligned to a system
page size and in order to prevent mapping, chunks before and/or after
the MSIX BAR can't also be mapped to the userspace. If these chunks
happen to be used very often (for example, Emulex LightPulse FC),
we see half of expected performance.

This enables mapping of MSIX BARs to the userspace by advertising
these BARs as a single chunk (as any other BAR). The userspace
still is expected not to mmap an MSIX BAR if it want to emulate it
(QEMU still has this code).

This does not break protection as at the moment for the Type1 IOMMU,
vfio_iommu_type1_attach_group() hook fails if MSIX remapping
is not enabled and this ensures that even ill-behaved device cannot
harm the host. For sPAPR IOMMU, the hardware always provides such
isolation by default. The only case when such mapping protection
is rather needed is allow_unsafe_interrupts but as its name suggests,
it is not safe anyway.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

---

This is to replace 6 patches from Yongji
"[RESEND PATCH v2 0/6] vfio-pci: Add support for mmapping MSI-X table"

Previously we tried to add a MSI_REMAP flag to a PCI bus and set it
via various means for various platforms. It relied on
MSI_FLAG_IRQ_REMAPPING from msi_domain_info which never made it to
the upstream. Instead Type1 IOMMU learned to check remapping
capabilities and refuse to work and it seems like a good idea to use
this.

How much do we care about allow_unsafe_interrupts? If we care enough,
I'll need a way to pass the msi_remap flag from
vfio_iommu_type1_attach_group() down to vfio_pci_bar_rw/vfio_pci_mmap,
it does not seem easy to make it nice.

What else did I miss?

Please comment. Thanks.
---
 drivers/vfio/pci/vfio_pci.c      | 41 ++++------------------------------------
 drivers/vfio/pci/vfio_pci_rdwr.c |  5 -----
 2 files changed, 4 insertions(+), 42 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..222bc07b9e41 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -559,15 +559,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 				struct vfio_info_cap *caps)
 {
 	struct vfio_region_info_cap_sparse_mmap *sparse;
-	size_t end, size;
-	int nr_areas = 2, i = 0, ret;
-
-	end = pci_resource_len(vdev->pdev, vdev->msix_bar);
-
-	/* If MSI-X table is aligned to the start or end, only one area */
-	if (((vdev->msix_offset & PAGE_MASK) == 0) ||
-	    (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
-		nr_areas = 1;
+	size_t size;
+	int nr_areas = 1, i = 0, ret;
 
 	size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
 
@@ -577,18 +570,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
 
 	sparse->nr_areas = nr_areas;
 
-	if (vdev->msix_offset & PAGE_MASK) {
-		sparse->areas[i].offset = 0;
-		sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
-		i++;
-	}
-
-	if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) {
-		sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset +
-						     vdev->msix_size);
-		sparse->areas[i].size = end - sparse->areas[i].offset;
-		i++;
-	}
+	sparse->areas[i].offset = 0;
+	sparse->areas[i].size = pci_resource_len(vdev->pdev, vdev->msix_bar);
 
 	ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
 				       sparse);
@@ -1115,22 +1098,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
-		/*
-		 * Disallow mmaps overlapping the MSI-X table; users don't
-		 * get to touch this directly.  We could find somewhere
-		 * else to map the overlap, but page granularity is only
-		 * a recommendation, not a requirement, so the user needs
-		 * to know which bits are real.  Requiring them to mmap
-		 * around the table makes that clear.
-		 */
-
-		/* If neither entirely above nor below, then it overlaps */
-		if (!(req_start >= vdev->msix_offset + vdev->msix_size ||
-		      req_start + req_len <= vdev->msix_offset))
-			return -EINVAL;
-	}
-
 	/*
 	 * Even though we don't make use of the barmap for the mmap,
 	 * we need to request the region and the barmap tracks that.
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 357243d76f10..916b72c5c4fc 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -164,11 +164,6 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
 	} else
 		io = vdev->barmap[bar];
 
-	if (bar == vdev->msix_bar) {
-		x_start = vdev->msix_offset;
-		x_end = vdev->msix_offset + vdev->msix_size;
-	}
-
 	done = do_io_rw(io, buf, pos, count, x_start, x_end, iswrite);
 
 	if (done >= 0)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH kernel] vfio-pci: Allow mapping of MSI-X table to userspace
  2017-06-29 14:50 [RFC PATCH kernel] vfio-pci: Allow mapping of MSI-X table to userspace Alexey Kardashevskiy
@ 2017-06-29 20:42 ` Alex Williamson
  0 siblings, 0 replies; 2+ messages in thread
From: Alex Williamson @ 2017-06-29 20:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, David Gibson, Yongji Xie, Eric Auger, Paul Mackerras

On Fri, 30 Jun 2017 00:50:57 +1000
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> At the moment VFIO does not allow mapping MSIX BAR to the userspace
> and provides information about what can be mapped via a PCI region
> capability. This ensures that the userspace does not get access
> to MSIX vector table and cannot break MSIX setup. For most platforms
> the userspace also chooses not to map MSIX as it has to emulate it for
> fully virtualized guests.
> 
> This creates a problem when an MSIX BAR is not aligned to a system
> page size and in order to prevent mapping, chunks before and/or after
> the MSIX BAR can't also be mapped to the userspace. If these chunks
> happen to be used very often (for example, Emulex LightPulse FC),
> we see half of expected performance.
> 
> This enables mapping of MSIX BARs to the userspace by advertising
> these BARs as a single chunk (as any other BAR). The userspace
> still is expected not to mmap an MSIX BAR if it want to emulate it
> (QEMU still has this code).
> 
> This does not break protection as at the moment for the Type1 IOMMU,
> vfio_iommu_type1_attach_group() hook fails if MSIX remapping
> is not enabled and this ensures that even ill-behaved device cannot
> harm the host. For sPAPR IOMMU, the hardware always provides such
> isolation by default. The only case when such mapping protection
> is rather needed is allow_unsafe_interrupts but as its name suggests,
> it is not safe anyway.

I don't think allow_unsafe_interrupts is sufficiently un-safe to
entirely disregard it like this either.  It's currently an opt-in, but
we don't consider it unsafe enough that we taint the kernel as a result
of using it.  Allowing user write access to the MSI-X vector table when
there's no IOMMU protection upgrades an MSI injection attack from
somewhat involved to almost trivially easy.  So nak to that.

 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> ---
> 
> This is to replace 6 patches from Yongji
> "[RESEND PATCH v2 0/6] vfio-pci: Add support for mmapping MSI-X table"
> 
> Previously we tried to add a MSI_REMAP flag to a PCI bus and set it
> via various means for various platforms. It relied on
> MSI_FLAG_IRQ_REMAPPING from msi_domain_info which never made it to
> the upstream. Instead Type1 IOMMU learned to check remapping
> capabilities and refuse to work and it seems like a good idea to use
> this.

Type1 has always known how to do this, but interrupt remapping is a
property on the IOMMU, not the bus, which is why Yongji was trying to
create a PCI bus property to expose to the device.
 
> How much do we care about allow_unsafe_interrupts? If we care enough,
> I'll need a way to pass the msi_remap flag from
> vfio_iommu_type1_attach_group() down to vfio_pci_bar_rw/vfio_pci_mmap,
> it does not seem easy to make it nice.

As above, I think you're abusing an opt-in security concern as an
excuse to allow an almost trivially easy exploit.  I also prefer
Yongji's approach of exposing the capability through bus properties
rather than creating a back channel between vfio modules as you seem to
suggest.

> What else did I miss?
> 
> Please comment. Thanks.
> ---
>  drivers/vfio/pci/vfio_pci.c      | 41 ++++------------------------------------
>  drivers/vfio/pci/vfio_pci_rdwr.c |  5 -----
>  2 files changed, 4 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 324c52e3a1a4..222bc07b9e41 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -559,15 +559,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
>  				struct vfio_info_cap *caps)
>  {
>  	struct vfio_region_info_cap_sparse_mmap *sparse;
> -	size_t end, size;
> -	int nr_areas = 2, i = 0, ret;
> -
> -	end = pci_resource_len(vdev->pdev, vdev->msix_bar);
> -
> -	/* If MSI-X table is aligned to the start or end, only one area */
> -	if (((vdev->msix_offset & PAGE_MASK) == 0) ||
> -	    (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
> -		nr_areas = 1;
> +	size_t size;
> +	int nr_areas = 1, i = 0, ret;
>  
>  	size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
>  
> @@ -577,18 +570,8 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
>  
>  	sparse->nr_areas = nr_areas;
>  
> -	if (vdev->msix_offset & PAGE_MASK) {
> -		sparse->areas[i].offset = 0;
> -		sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
> -		i++;
> -	}
> -
> -	if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) {
> -		sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset +
> -						     vdev->msix_size);
> -		sparse->areas[i].size = end - sparse->areas[i].offset;
> -		i++;
> -	}
> +	sparse->areas[i].offset = 0;
> +	sparse->areas[i].size = pci_resource_len(vdev->pdev, vdev->msix_bar);
>  
>  	ret = vfio_info_add_capability(caps, VFIO_REGION_INFO_CAP_SPARSE_MMAP,
>  				       sparse);

This doesn't make any sense to me, why even expose it as a sparse mmap
capability if there's nothing sparse about it?

> @@ -1115,22 +1098,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>  	if (req_start + req_len > phys_len)
>  		return -EINVAL;
>  
> -	if (index == vdev->msix_bar) {
> -		/*
> -		 * Disallow mmaps overlapping the MSI-X table; users don't
> -		 * get to touch this directly.  We could find somewhere
> -		 * else to map the overlap, but page granularity is only
> -		 * a recommendation, not a requirement, so the user needs
> -		 * to know which bits are real.  Requiring them to mmap
> -		 * around the table makes that clear.
> -		 */
> -
> -		/* If neither entirely above nor below, then it overlaps */
> -		if (!(req_start >= vdev->msix_offset + vdev->msix_size ||
> -		      req_start + req_len <= vdev->msix_offset))
> -			return -EINVAL;
> -	}
> -
>  	/*
>  	 * Even though we don't make use of the barmap for the mmap,
>  	 * we need to request the region and the barmap tracks that.
> diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
> index 357243d76f10..916b72c5c4fc 100644
> --- a/drivers/vfio/pci/vfio_pci_rdwr.c
> +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
> @@ -164,11 +164,6 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
>  	} else
>  		io = vdev->barmap[bar];
>  
> -	if (bar == vdev->msix_bar) {
> -		x_start = vdev->msix_offset;
> -		x_end = vdev->msix_offset + vdev->msix_size;
> -	}
> -
>  	done = do_io_rw(io, buf, pos, count, x_start, x_end, iswrite);
>  
>  	if (done >= 0)

Nope, I prefer Yongji's approach, sorry.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-06-29 20:42 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29 14:50 [RFC PATCH kernel] vfio-pci: Allow mapping of MSI-X table to userspace Alexey Kardashevskiy
2017-06-29 20:42 ` Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.