linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* revert dma direct internals abuse
@ 2019-04-08 10:55 Christoph Hellwig
  2019-04-08 10:55 ` [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection" Christoph Hellwig
  2019-04-08 18:47 ` revert dma direct internals abuse Thomas Hellstrom
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2019-04-08 10:55 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Hellstrom; +Cc: Deepak Rawat, iommu, linux-kernel

Hi Linus,

unfortunately it took less than a merge window for the:

/*
 * All the dma_direct_* declarations are here just for the indirect call bypass,
 * and must not be used directly drivers!
 */

warning in dma-mapping.h to be ignored.  This series reverts the offender
to keep our API clean.

Thomas: please talk me first about your needs and I'll be happy to figure out
a proper API for what you want to do.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection"
  2019-04-08 10:55 revert dma direct internals abuse Christoph Hellwig
@ 2019-04-08 10:55 ` Christoph Hellwig
  2019-04-08 20:05   ` Thomas Hellstrom
  2019-04-08 18:47 ` revert dma direct internals abuse Thomas Hellstrom
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-04-08 10:55 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Hellstrom; +Cc: Deepak Rawat, iommu, linux-kernel

This reverts commit 9ddac734aa310c5fbc0ec93602335d2a39092451.
---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 6165fe2c4504..9294b76c8084 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -26,7 +26,6 @@
  **************************************************************************/
 #include <linux/module.h>
 #include <linux/console.h>
-#include <linux/dma-mapping.h>
 
 #include <drm/drmP.h>
 #include "vmwgfx_drv.h"
@@ -35,6 +34,7 @@
 #include <drm/ttm/ttm_placement.h>
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_module.h>
+#include <linux/intel-iommu.h>
 
 #define VMWGFX_DRIVER_DESC "Linux drm driver for VMware graphics devices"
 #define VMWGFX_CHIP_SVGAII 0
@@ -545,21 +545,6 @@ static void vmw_get_initial_size(struct vmw_private *dev_priv)
 	dev_priv->initial_height = height;
 }
 
-/**
- * vmw_assume_iommu - Figure out whether coherent dma-remapping might be
- * taking place.
- * @dev: Pointer to the struct drm_device.
- *
- * Return: true if iommu present, false otherwise.
- */
-static bool vmw_assume_iommu(struct drm_device *dev)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev->dev);
-
-	return !dma_is_direct(ops) && ops &&
-		ops->map_page != dma_direct_map_page;
-}
-
 /**
  * vmw_dma_select_mode - Determine how DMA mappings should be set up for this
  * system.
@@ -583,7 +568,7 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
 
 	if (vmw_force_coherent)
 		dev_priv->map_mode = vmw_dma_alloc_coherent;
-	else if (vmw_assume_iommu(dev_priv->dev))
+	else if (intel_iommu_enabled)
 		dev_priv->map_mode = vmw_dma_map_populate;
 	else if (!vmw_force_iommu)
 		dev_priv->map_mode = vmw_dma_phys;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-08 10:55 revert dma direct internals abuse Christoph Hellwig
  2019-04-08 10:55 ` [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection" Christoph Hellwig
@ 2019-04-08 18:47 ` Thomas Hellstrom
  2019-04-09  9:57   ` hch
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-08 18:47 UTC (permalink / raw)
  To: hch, torvalds; +Cc: linux-kernel, Deepak Singh Rawat, iommu

Christoph,


On Mon, 2019-04-08 at 12:55 +0200, Christoph Hellwig wrote:
> Hi Linus,
> 
> unfortunately it took less than a merge window for the:
> 
> /*
>  * All the dma_direct_* declarations are here just for the indirect
> call bypass,
>  * and must not be used directly drivers!
>  */
> 
> warning in dma-mapping.h to be ignored.  This series reverts the
> offender
> to keep our API clean.
> 
> Thomas: please talk me first about your needs and I'll be happy to
> figure out
> a proper API for what you want to do.

We HAVE discussed our needs, although admittedly some of my emails
ended up unanswered.

We've as you're well aware of had a discussion with the other
subsystems doing user-space DMA-buffers wanting this functionality from
the dma api (AMD graphics and RDMA people IIRC). that is a bool that
tells us whether streaming dma mappings are coherent, and I described
in detail why we couldn't use the dma_sync_* API and
dma_alloc_coherent().

The other option we have is to just fail miserably without messages if
streaming DMA is not coherent, which I think the other drivers might
do... That's all I'm trying to avoid here. I'd much prefer to have the
dma API export this bool.

/Thomas







^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection"
  2019-04-08 10:55 ` [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection" Christoph Hellwig
@ 2019-04-08 20:05   ` Thomas Hellstrom
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-08 20:05 UTC (permalink / raw)
  To: hch, torvalds; +Cc: linux-kernel, Deepak Singh Rawat, iommu

On Mon, 2019-04-08 at 12:55 +0200, Christoph Hellwig wrote:
> This reverts commit 9ddac734aa310c5fbc0ec93602335d2a39092451.

IMHO, rather than reverting, I'd like to see the dma API provide a
function telling us whether streaming DMA operations are coherent or
not. 

With that we could fix all drivers that would otherwise just blow up if
somebody happened to enable swiotlb with bounce buffers...

/Thomas



> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 19 ++-----------------
>  1 file changed, 2 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 6165fe2c4504..9294b76c8084 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -26,7 +26,6 @@
>  
> *********************************************************************
> *****/
>  #include <linux/module.h>
>  #include <linux/console.h>
> -#include <linux/dma-mapping.h>
>  
>  #include <drm/drmP.h>
>  #include "vmwgfx_drv.h"
> @@ -35,6 +34,7 @@
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_module.h>
> +#include <linux/intel-iommu.h>
>  
>  #define VMWGFX_DRIVER_DESC "Linux drm driver for VMware graphics
> devices"
>  #define VMWGFX_CHIP_SVGAII 0
> @@ -545,21 +545,6 @@ static void vmw_get_initial_size(struct
> vmw_private *dev_priv)
>  	dev_priv->initial_height = height;
>  }
>  
> -/**
> - * vmw_assume_iommu - Figure out whether coherent dma-remapping
> might be
> - * taking place.
> - * @dev: Pointer to the struct drm_device.
> - *
> - * Return: true if iommu present, false otherwise.
> - */
> -static bool vmw_assume_iommu(struct drm_device *dev)
> -{
> -	const struct dma_map_ops *ops = get_dma_ops(dev->dev);
> -
> -	return !dma_is_direct(ops) && ops &&
> -		ops->map_page != dma_direct_map_page;
> -}
> -
>  /**
>   * vmw_dma_select_mode - Determine how DMA mappings should be set up
> for this
>   * system.
> @@ -583,7 +568,7 @@ static int vmw_dma_select_mode(struct vmw_private
> *dev_priv)
>  
>  	if (vmw_force_coherent)
>  		dev_priv->map_mode = vmw_dma_alloc_coherent;
> -	else if (vmw_assume_iommu(dev_priv->dev))
> +	else if (intel_iommu_enabled)
>  		dev_priv->map_mode = vmw_dma_map_populate;
>  	else if (!vmw_force_iommu)
>  		dev_priv->map_mode = vmw_dma_phys;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-08 18:47 ` revert dma direct internals abuse Thomas Hellstrom
@ 2019-04-09  9:57   ` hch
  2019-04-09 13:04     ` Thomas Hellstrom
  0 siblings, 1 reply; 14+ messages in thread
From: hch @ 2019-04-09  9:57 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: hch, torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Mon, Apr 08, 2019 at 06:47:52PM +0000, Thomas Hellstrom wrote:
> We HAVE discussed our needs, although admittedly some of my emails
> ended up unanswered.

And than you haven't followed up, and instead ignored the layering
instructions and just commited a broken patch?

> We've as you're well aware of had a discussion with the other
> subsystems doing user-space DMA-buffers wanting this functionality from
> the dma api (AMD graphics and RDMA people IIRC). that is a bool that
> tells us whether streaming dma mappings are coherent, and I described
> in detail why we couldn't use the dma_sync_* API and
> dma_alloc_coherent().

And that is not at all what you either check or claim to do in the
changelog (which btw, are two different things).

> The other option we have is to just fail miserably without messages if
> streaming DMA is not coherent, which I think the other drivers might
> do... That's all I'm trying to avoid here. I'd much prefer to have the
> dma API export this bool.

Both DMA direct and non-DMA direct streaming mappings can be either
coherent or incoherent, so your patch doesn't archive that.  The
commit log claims the following:

"drm/vmwgfx: Improve on IOMMU detection
    
 instead of relying on intel_iommu_enabled, use the fact that the
 dma_map_ops::map_page != dma_direct_map_page"

which has nothing to do with the fact that streaming mappings are
coherent.  It also is incorrect as there are direct mapping that
do not use dma_direct_map_page (e.g. on ARM, or x86 with VMD).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09  9:57   ` hch
@ 2019-04-09 13:04     ` Thomas Hellstrom
  2019-04-09 13:31       ` hch
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-09 13:04 UTC (permalink / raw)
  To: hch; +Cc: torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, 2019-04-09 at 11:57 +0200, hch@lst.de wrote:
> On Mon, Apr 08, 2019 at 06:47:52PM +0000, Thomas Hellstrom wrote:
> > We HAVE discussed our needs, although admittedly some of my emails
> > ended up unanswered.
> 
> And than you haven't followed up, and instead ignored the layering
> instructions and just commited a broken patch?

Yes, I'm really sorry for not observing the layering instructions. To
be completely honest I didn't observe that layering warning, rather
than ignoring it. The patch doesn't claim to be generic and I believe
that on the VMware platform it is functionally correct, please see
below. I also would like not to focus on who did what and why. I can't
see how that will help this discussion moving forward.

> 
> > We've as you're well aware of had a discussion with the other
> > subsystems doing user-space DMA-buffers wanting this functionality
> > from
> > the dma api (AMD graphics and RDMA people IIRC). that is a bool
> > that
> > tells us whether streaming dma mappings are coherent, and I
> > described
> > in detail why we couldn't use the dma_sync_* API and
> > dma_alloc_coherent().
> 
> And that is not at all what you either check or claim to do in the
> changelog (which btw, are two different things).
> 
> > The other option we have is to just fail miserably without messages
> > if
> > streaming DMA is not coherent, which I think the other drivers
> > might
> > do... That's all I'm trying to avoid here. I'd much prefer to have
> > the
> > dma API export this bool.
> 
> Both DMA direct and non-DMA direct streaming mappings can be either
> coherent or incoherent, so your patch doesn't archive that.  The
> commit log claims the following:
> 
> "drm/vmwgfx: Improve on IOMMU detection
>     
>  instead of relying on intel_iommu_enabled, use the fact that the
>  dma_map_ops::map_page != dma_direct_map_page"
> 
> which has nothing to do with the fact that streaming mappings are
> coherent.  It also is incorrect as there are direct mapping that
> do not use dma_direct_map_page (e.g. on ARM, or x86 with VMD).

On the VMware platform we have two possible vIOMMUS, the AMD iommu and
Intel VTD, Given those conditions I belive the patch is functionally
correct. We can't cover the AMD case with intel_iommu_enabled.
Furthermore the only form of incoherency that can affect our graphics
device is someone forcing SWIOTLB in which case that person would be
happier with software rendering. In any case, observing the fact that
the direct_ops are not used makes sure that SWIOTLB is not used.
Knowing that we're on the VMware platform, we're coherent and can
safely have the dma layer do dma address translation for us. All this
information was not explicilty written in the changelog, no.


In any case, assuming that that patch is reverted due to the layering
violation, Are you willing to help out with a small API to detect the
situation where streaming DMA is incoherent?

/Thomas



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09 13:04     ` Thomas Hellstrom
@ 2019-04-09 13:31       ` hch
  2019-04-09 14:17         ` Thomas Hellstrom
  0 siblings, 1 reply; 14+ messages in thread
From: hch @ 2019-04-09 13:31 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: hch, torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, Apr 09, 2019 at 01:04:51PM +0000, Thomas Hellstrom wrote:
> On the VMware platform we have two possible vIOMMUS, the AMD iommu and
> Intel VTD, Given those conditions I belive the patch is functionally
> correct. We can't cover the AMD case with intel_iommu_enabled.
> Furthermore the only form of incoherency that can affect our graphics
> device is someone forcing SWIOTLB in which case that person would be
> happier with software rendering. In any case, observing the fact that
> the direct_ops are not used makes sure that SWIOTLB is not used.
> Knowing that we're on the VMware platform, we're coherent and can
> safely have the dma layer do dma address translation for us. All this
> information was not explicilty written in the changelog, no.

We have a series pending that might bounce your buffers even when
using the Intel IOMMU, which should eventually also find its way
to other IOMMUs:

    https://lists.linuxfoundation.org/pipermail/iommu/2019-March/034090.html

> In any case, assuming that that patch is reverted due to the layering
> violation, Are you willing to help out with a small API to detect the
> situation where streaming DMA is incoherent?

The short but sad answer is that we can't ever guarantee that you
can skip the dma_*sync_* calls.  There are too many factors in play
that might require it at any time - working around unaligned addresses
in iommus, CPUs that are coherent for some device and not some, addressing
limitations both in physical CPUs and VMs (see the various "secure VM"
concepts floating around at the moment).

If you want to avoid the dma_*sync_* calls you must use
dma_alloc_coherent to allocate the memory.  Note that the memory for
dma_alloc_coherent actually comes from the normal page pool most of
the time, and for certain on x86, which seems to be what you care
about.  The times of it dipping into the tiny swiotlb pool are long
gone.  So at least for you I see absolutely no reason to not simply
always use dma_alloc_coherent to start with.  For other uses that
involve platforms without DMA coherent devices like arm the tradeoffs
might be a little different.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09 13:31       ` hch
@ 2019-04-09 14:17         ` Thomas Hellstrom
  2019-04-09 15:25           ` hch
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-09 14:17 UTC (permalink / raw)
  To: hch; +Cc: torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, 2019-04-09 at 15:31 +0200, hch@lst.de wrote:
> On Tue, Apr 09, 2019 at 01:04:51PM +0000, Thomas Hellstrom wrote:
> > On the VMware platform we have two possible vIOMMUS, the AMD iommu
> > and
> > Intel VTD, Given those conditions I belive the patch is
> > functionally
> > correct. We can't cover the AMD case with intel_iommu_enabled.
> > Furthermore the only form of incoherency that can affect our
> > graphics
> > device is someone forcing SWIOTLB in which case that person would
> > be
> > happier with software rendering. In any case, observing the fact
> > that
> > the direct_ops are not used makes sure that SWIOTLB is not used.
> > Knowing that we're on the VMware platform, we're coherent and can
> > safely have the dma layer do dma address translation for us. All
> > this
> > information was not explicilty written in the changelog, no.
> 
> We have a series pending that might bounce your buffers even when
> using the Intel IOMMU, which should eventually also find its way
> to other IOMMUs:
> 
>     
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fiommu%2F2019-March%2F034090.html&amp;data=02%7C01%7Cthellstrom%40vmware.com%7C9933ee7b805842607ea908d6bcefc505%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636904135345010687&amp;sdata=Px500%2B1FjL%2FZLedUdbAXz4a%2BT5DaZBFf6wnesTyFvZY%3D&amp;reserved=0

If that's the case, I think most of the graphics drivers will stop
functioning. I don't think people would want that, and even if the
graphics drivers are "to blame" due to not implementing the sync calls,
I think the work involved to get things right is impressive if at all
possible.

> 
> > In any case, assuming that that patch is reverted due to the
> > layering
> > violation, Are you willing to help out with a small API to detect
> > the
> > situation where streaming DMA is incoherent?
> 
> The short but sad answer is that we can't ever guarantee that you
> can skip the dma_*sync_* calls.  There are too many factors in play
> that might require it at any time - working around unaligned
> addresses
> in iommus, CPUs that are coherent for some device and not some,
> addressing
> limitations both in physical CPUs and VMs (see the various "secure
> VM"
> concepts floating around at the moment).
> 
> If you want to avoid the dma_*sync_* calls you must use
> dma_alloc_coherent to allocate the memory.  Note that the memory for
> dma_alloc_coherent actually comes from the normal page pool most of
> the time, and for certain on x86, which seems to be what you care
> about.  The times of it dipping into the tiny swiotlb pool are long
> gone.  So at least for you I see absolutely no reason to not simply
> always use dma_alloc_coherent to start with.  For other uses that
> involve platforms without DMA coherent devices like arm the tradeoffs
> might be a little different.

There are two things that concerns me with dma_alloc_coherent:

1) It seems to want pages mapped either in the kernel map or vmapped.
Graphics drivers allocate huge amounts of memory, Typically up to 50%
of system memory or more. In a 32 bit PAE system I'm afraid of running
out of vmap space as well as not being able to allocate as much memory
as I want. Perhaps a dma_alloc_coherent() interface that returns a page
rather than a virtual address would do the trick.

2) Exporting using dma-buf. A page allocated using dma_alloc_coherent()
for one device might not be coherent for another device. What happens
if I allocate a page using dma_alloc_coherent() for device 1 and then
want to map it using dma_map_page() for device 2?

Thanks,
Thomas




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09 14:17         ` Thomas Hellstrom
@ 2019-04-09 15:25           ` hch
  2019-04-09 17:24             ` Thomas Hellstrom
  0 siblings, 1 reply; 14+ messages in thread
From: hch @ 2019-04-09 15:25 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: hch, torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, Apr 09, 2019 at 02:17:40PM +0000, Thomas Hellstrom wrote:
> If that's the case, I think most of the graphics drivers will stop
> functioning. I don't think people would want that, and even if the
> graphics drivers are "to blame" due to not implementing the sync calls,
> I think the work involved to get things right is impressive if at all
> possible.

Note that this only affects external, untrusted devices.  But that
may include eGPU, so yes GPU folks finally need to up their game and
stop thinking they are above the law^H^H^Hinterface.

> There are two things that concerns me with dma_alloc_coherent:
> 
> 1) It seems to want pages mapped either in the kernel map or vmapped.
> Graphics drivers allocate huge amounts of memory, Typically up to 50%
> of system memory or more. In a 32 bit PAE system I'm afraid of running
> out of vmap space as well as not being able to allocate as much memory
> as I want. Perhaps a dma_alloc_coherent() interface that returns a page
> rather than a virtual address would do the trick.

We can't just simply export a page.  For devices that are not cache
coherent we need to remap the returned memory to be uncached.  In the
common cases that is either done by setting an uncached bit in the page
tables, or by using a special address space alias.  So the virtual
address to access the page matter, and we can't just kmap a random
page an expect it to be coherent.  If you want memory that is not
mapped into the kernel direct mapping and DMA to it you need to
use the proper DMA streaming interface and obey their rules.

> 2) Exporting using dma-buf. A page allocated using dma_alloc_coherent()
> for one device might not be coherent for another device. What happens
> if I allocate a page using dma_alloc_coherent() for device 1 and then
> want to map it using dma_map_page() for device 2?

The problem in this case isn't really the coherency - once a page
is mapped uncached it is 'coherent' for all devices, even those not
requiring it.  The problem is addressability - the DMA address for
the same memory might be different for different devices, and something
that looks contigous to one device that is using an IOMMU might not
for another one using the direct mapping.

We have the dma_get_sgtable API to map a piece of coherent memory
using the streaming APIs for another device, but it has all sorts of
problems.

That being said: your driver already uses the dma coherent API
under various circumstances, so you already have the above issues.

In the end swiotlb_nr_tbl() might be the best hint that some bounce
buffering could happen.  This isn't proper use of the API, but at
least a little better than your old intel_iommu_emabled check
and much better than we we have right now.  Something like this:

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 6165fe2c4504..ff00bea026c5 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -545,21 +545,6 @@ static void vmw_get_initial_size(struct vmw_private *dev_priv)
 	dev_priv->initial_height = height;
 }
 
-/**
- * vmw_assume_iommu - Figure out whether coherent dma-remapping might be
- * taking place.
- * @dev: Pointer to the struct drm_device.
- *
- * Return: true if iommu present, false otherwise.
- */
-static bool vmw_assume_iommu(struct drm_device *dev)
-{
-	const struct dma_map_ops *ops = get_dma_ops(dev->dev);
-
-	return !dma_is_direct(ops) && ops &&
-		ops->map_page != dma_direct_map_page;
-}
-
 /**
  * vmw_dma_select_mode - Determine how DMA mappings should be set up for this
  * system.
@@ -581,25 +566,14 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
 		[vmw_dma_map_populate] = "Keeping DMA mappings.",
 		[vmw_dma_map_bind] = "Giving up DMA mappings early."};
 
-	if (vmw_force_coherent)
-		dev_priv->map_mode = vmw_dma_alloc_coherent;
-	else if (vmw_assume_iommu(dev_priv->dev))
-		dev_priv->map_mode = vmw_dma_map_populate;
-	else if (!vmw_force_iommu)
-		dev_priv->map_mode = vmw_dma_phys;
-	else if (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl())
+	if (vmw_force_coherent ||
+	    (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl()))
 		dev_priv->map_mode = vmw_dma_alloc_coherent;
+	else if (vmw_restrict_iommu)
+		dev_priv->map_mode = vmw_dma_map_bind;
 	else
 		dev_priv->map_mode = vmw_dma_map_populate;
 
-	if (dev_priv->map_mode == vmw_dma_map_populate && vmw_restrict_iommu)
-		dev_priv->map_mode = vmw_dma_map_bind;
-
-	/* No TTM coherent page pool? FIXME: Ask TTM instead! */
-        if (!(IS_ENABLED(CONFIG_SWIOTLB) || IS_ENABLED(CONFIG_INTEL_IOMMU)) &&
-	    (dev_priv->map_mode == vmw_dma_alloc_coherent))
-		return -EINVAL;
-
 	DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09 15:25           ` hch
@ 2019-04-09 17:24             ` Thomas Hellstrom
  2019-04-10  6:43               ` hch
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-09 17:24 UTC (permalink / raw)
  To: hch; +Cc: torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, 2019-04-09 at 17:25 +0200, hch@lst.de wrote:
> On Tue, Apr 09, 2019 at 02:17:40PM +0000, Thomas Hellstrom wrote:
> > If that's the case, I think most of the graphics drivers will stop
> > functioning. I don't think people would want that, and even if the
> > graphics drivers are "to blame" due to not implementing the sync
> > calls,
> > I think the work involved to get things right is impressive if at
> > all
> > possible.
> 
> Note that this only affects external, untrusted devices.  But that
> may include eGPU,

What about discrete graphics cards, like Radeon and Nvidia? Who gets to
determine what's trusted?

> so yes GPU folks finally need to up their game and
> stop thinking they are above the law^H^H^Hinterface.

And others doing user-space DMA. But I don't think a good way is to
break their drivers.

> 
> > There are two things that concerns me with dma_alloc_coherent:
> > 
> > 1) It seems to want pages mapped either in the kernel map or
> > vmapped.
> > Graphics drivers allocate huge amounts of memory, Typically up to
> > 50%
> > of system memory or more. In a 32 bit PAE system I'm afraid of
> > running
> > out of vmap space as well as not being able to allocate as much
> > memory
> > as I want. Perhaps a dma_alloc_coherent() interface that returns a
> > page
> > rather than a virtual address would do the trick.
> 
> We can't just simply export a page.  For devices that are not cache
> coherent we need to remap the returned memory to be uncached.  In the
> common cases that is either done by setting an uncached bit in the
> page
> tables, or by using a special address space alias.  So the virtual
> address to access the page matter, and we can't just kmap a random
> page an expect it to be coherent.  If you want memory that is not
> mapped into the kernel direct mapping and DMA to it you need to
> use the proper DMA streaming interface and obey their rules.

GPU libraries traditionally have been taking care of the CPU mapping
caching modes since the first AGP drivers. GPU MMU ptes commonly
support various caching options and pages are changing caching mode
dynamically.
So even if the DMA layer needs to do the remapping, couldn't we do that
on-demand when needed with a simple interface?

> 
> > 2) Exporting using dma-buf. A page allocated using
> > dma_alloc_coherent()
> > for one device might not be coherent for another device. What
> > happens
> > if I allocate a page using dma_alloc_coherent() for device 1 and
> > then
> > want to map it using dma_map_page() for device 2?
> 
> The problem in this case isn't really the coherency - once a page
> is mapped uncached it is 'coherent' for all devices, even those not
> requiring it.  The problem is addressability - the DMA address for
> the same memory might be different for different devices, and
> something
> that looks contigous to one device that is using an IOMMU might not
> for another one using the direct mapping.
> 
> We have the dma_get_sgtable API to map a piece of coherent memory
> using the streaming APIs for another device, but it has all sorts of
> problems.
> 
> That being said: your driver already uses the dma coherent API
> under various circumstances, so you already have the above issues.

Yes, but they are hidden behind driver options. We can't have someone
upgrade their kernel and suddenly things don't work anymore, That said,
I think the SWIOTLB case is rare enough for the below solution to be
acceptable, although the TTM check for the coherent page pool being
available still needs to remain.

Thanks,
Thomas


> 
> In the end swiotlb_nr_tbl() might be the best hint that some bounce
> buffering could happen.  This isn't proper use of the API, but at
> least a little better than your old intel_iommu_emabled check
> and much better than we we have right now.  Something like this:
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 6165fe2c4504..ff00bea026c5 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -545,21 +545,6 @@ static void vmw_get_initial_size(struct
> vmw_private *dev_priv)
>  	dev_priv->initial_height = height;
>  }
>  
> -/**
> - * vmw_assume_iommu - Figure out whether coherent dma-remapping
> might be
> - * taking place.
> - * @dev: Pointer to the struct drm_device.
> - *
> - * Return: true if iommu present, false otherwise.
> - */
> -static bool vmw_assume_iommu(struct drm_device *dev)
> -{
> -	const struct dma_map_ops *ops = get_dma_ops(dev->dev);
> -
> -	return !dma_is_direct(ops) && ops &&
> -		ops->map_page != dma_direct_map_page;
> -}
> -
>  /**
>   * vmw_dma_select_mode - Determine how DMA mappings should be set up
> for this
>   * system.
> @@ -581,25 +566,14 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>  		[vmw_dma_map_populate] = "Keeping DMA mappings.",
>  		[vmw_dma_map_bind] = "Giving up DMA mappings early."};
>  
> -	if (vmw_force_coherent)
> -		dev_priv->map_mode = vmw_dma_alloc_coherent;
> -	else if (vmw_assume_iommu(dev_priv->dev))
> -		dev_priv->map_mode = vmw_dma_map_populate;
> -	else if (!vmw_force_iommu)
> -		dev_priv->map_mode = vmw_dma_phys;
> -	else if (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl())
> +	if (vmw_force_coherent ||
> +	    (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl()))
>  		dev_priv->map_mode = vmw_dma_alloc_coherent;
> +	else if (vmw_restrict_iommu)
> +		dev_priv->map_mode = vmw_dma_map_bind;
>  	else
>  		dev_priv->map_mode = vmw_dma_map_populate;
>  
> -	if (dev_priv->map_mode == vmw_dma_map_populate &&
> vmw_restrict_iommu)
> -		dev_priv->map_mode = vmw_dma_map_bind;
> -
> -	/* No TTM coherent page pool? FIXME: Ask TTM instead! */
> -        if (!(IS_ENABLED(CONFIG_SWIOTLB) ||
> IS_ENABLED(CONFIG_INTEL_IOMMU)) &&
> -	    (dev_priv->map_mode == vmw_dma_alloc_coherent))
> -		return -EINVAL;
> -
>  	DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-09 17:24             ` Thomas Hellstrom
@ 2019-04-10  6:43               ` hch
  2019-04-10 15:01                 ` Thomas Hellstrom
  0 siblings, 1 reply; 14+ messages in thread
From: hch @ 2019-04-10  6:43 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: hch, torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Tue, Apr 09, 2019 at 05:24:48PM +0000, Thomas Hellstrom wrote:
> > Note that this only affects external, untrusted devices.  But that
> > may include eGPU,
> 
> What about discrete graphics cards, like Radeon and Nvidia? Who gets to
> determine what's trusted?

Based on firmware tables.  discrete graphics would not qualify unless
they are attached through thunderbolt bridges or external PCIe ports.

> GPU libraries traditionally have been taking care of the CPU mapping
> caching modes since the first AGP drivers. GPU MMU ptes commonly
> support various caching options and pages are changing caching mode
> dynamically.
> So even if the DMA layer needs to do the remapping, couldn't we do that
> on-demand when needed with a simple interface?

The problem is that there is no "simple" interface as the details
depend on the architecture.  We have the following base variants
to create coherent memory:

  1) do nothing - this works on x86-like platforms where I/O is always
     coherent
  2) use a special kernel segment, after flushing the caches for the
     normal segment, done on platforms like mips that have this
     special segment
  3) remap the existing kernel direct mapping, after flushing the
     caches, done by openrisc and in some cases arm32
  4) create a new mapping in vmap or ioremap space after flushing the
     caches - done by most architectures with an MMU and non-coherent
     devices
  5) use a special pool of uncached memory set aside by the hardware
     or firmware - done by most architectures without an MMU but with
     non-coherent devices

So that is just five major variants with a lot of details on how
it is done in practice.  Add to that that many of the operations
are fairly expensive and need to be pre-loaded.

> > That being said: your driver already uses the dma coherent API
> > under various circumstances, so you already have the above issues.
> 
> Yes, but they are hidden behind driver options. We can't have someone
> upgrade their kernel and suddenly things don't work anymore, That said,
> I think the SWIOTLB case is rare enough for the below solution to be
> acceptable, although the TTM check for the coherent page pool being
> available still needs to remain.

So can you please respin a version acceptable to you and submit it
for 5.1 ASAP?  Otherwise I'll need to move ahead with the simple
revert.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-10  6:43               ` hch
@ 2019-04-10 15:01                 ` Thomas Hellstrom
  2019-04-22 17:56                   ` hch
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-10 15:01 UTC (permalink / raw)
  To: hch; +Cc: torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Wed, 2019-04-10 at 08:43 +0200, hch@lst.de wrote:
> On Tue, Apr 09, 2019 at 05:24:48PM +0000, Thomas Hellstrom wrote:
> > > Note that this only affects external, untrusted devices.  But
> > > that
> > > may include eGPU,
> > 
> > What about discrete graphics cards, like Radeon and Nvidia? Who
> > gets to
> > determine what's trusted?
> 
> Based on firmware tables.  discrete graphics would not qualify unless
> they are attached through thunderbolt bridges or external PCIe ports.
> 
> > GPU libraries traditionally have been taking care of the CPU
> > mapping
> > caching modes since the first AGP drivers. GPU MMU ptes commonly
> > support various caching options and pages are changing caching mode
> > dynamically.
> > So even if the DMA layer needs to do the remapping, couldn't we do
> > that
> > on-demand when needed with a simple interface?
> 
> The problem is that there is no "simple" interface as the details
> depend on the architecture.  We have the following base variants
> to create coherent memory:
> 
>   1) do nothing - this works on x86-like platforms where I/O is
> always
>      coherent
>   2) use a special kernel segment, after flushing the caches for the
>      normal segment, done on platforms like mips that have this
>      special segment
>   3) remap the existing kernel direct mapping, after flushing the
>      caches, done by openrisc and in some cases arm32
>   4) create a new mapping in vmap or ioremap space after flushing the
>      caches - done by most architectures with an MMU and non-coherent
>      devices
>   5) use a special pool of uncached memory set aside by the hardware
>      or firmware - done by most architectures without an MMU but with
>      non-coherent devices
> 

Understood. Unfortunately IMO this severly limits the use of the
dma_alloc_coherent() method.

> So that is just five major variants with a lot of details on how
> it is done in practice.  Add to that that many of the operations
> are fairly expensive and need to be pre-loaded.
> 
> > > That being said: your driver already uses the dma coherent API
> > > under various circumstances, so you already have the above
> > > issues.
> > 
> > Yes, but they are hidden behind driver options. We can't have
> > someone
> > upgrade their kernel and suddenly things don't work anymore, That
> > said,
> > I think the SWIOTLB case is rare enough for the below solution to
> > be
> > acceptable, although the TTM check for the coherent page pool being
> > available still needs to remain.
> 
> So can you please respin a version acceptable to you and submit it
> for 5.1 ASAP?  Otherwise I'll need to move ahead with the simple
> revert.

I will. 
I need to do some testing to investigate how to best choose between the
options, but will have something ready for 5.1.

/Thomas


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-10 15:01                 ` Thomas Hellstrom
@ 2019-04-22 17:56                   ` hch
  2019-04-23 12:26                     ` Thomas Hellstrom
  0 siblings, 1 reply; 14+ messages in thread
From: hch @ 2019-04-22 17:56 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: hch, torvalds, linux-kernel, Deepak Singh Rawat, iommu

On Wed, Apr 10, 2019 at 03:01:14PM +0000, Thomas Hellstrom wrote:
> > So can you please respin a version acceptable to you and submit it
> > for 5.1 ASAP?  Otherwise I'll need to move ahead with the simple
> > revert.
> 
> I will. 
> I need to do some testing to investigate how to best choose between the
> options, but will have something ready for 5.1.

I still don't see anything in -rc6..

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: revert dma direct internals abuse
  2019-04-22 17:56                   ` hch
@ 2019-04-23 12:26                     ` Thomas Hellstrom
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Hellstrom @ 2019-04-23 12:26 UTC (permalink / raw)
  To: hch; +Cc: torvalds, linux-kernel, Deepak Singh Rawat, iommu

Hi, Christoph,

On Mon, 2019-04-22 at 19:56 +0200, hch@lst.de wrote:
> On Wed, Apr 10, 2019 at 03:01:14PM +0000, Thomas Hellstrom wrote:
> > > So can you please respin a version acceptable to you and submit
> > > it
> > > for 5.1 ASAP?  Otherwise I'll need to move ahead with the simple
> > > revert.
> > 
> > I will. 
> > I need to do some testing to investigate how to best choose between
> > the
> > options, but will have something ready for 5.1.
> 
> I still don't see anything in -rc6..

Been on easter vacation. I have a patch ready for review, though, will
send it out in a moment.

It turns out that to do something well-behaved in case someone sets
swiotlb=force, the variable swiotlb_force would need to be exported.
(I can't rely on swiotlb_nr_tbl()).

That's not currently done in the patch, but instead the driver just
malfunctions like most other graphics drivers.

/Thomas


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-04-23 12:26 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 10:55 revert dma direct internals abuse Christoph Hellwig
2019-04-08 10:55 ` [PATCH] Revert "drm/vmwgfx: Improve on IOMMU detection" Christoph Hellwig
2019-04-08 20:05   ` Thomas Hellstrom
2019-04-08 18:47 ` revert dma direct internals abuse Thomas Hellstrom
2019-04-09  9:57   ` hch
2019-04-09 13:04     ` Thomas Hellstrom
2019-04-09 13:31       ` hch
2019-04-09 14:17         ` Thomas Hellstrom
2019-04-09 15:25           ` hch
2019-04-09 17:24             ` Thomas Hellstrom
2019-04-10  6:43               ` hch
2019-04-10 15:01                 ` Thomas Hellstrom
2019-04-22 17:56                   ` hch
2019-04-23 12:26                     ` Thomas Hellstrom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).