* [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
@ 2018-10-10 23:33 John Stultz
  2018-10-12 17:51 ` Laura Abbott
From: John Stultz @ 2018-10-10 23:33 UTC
  To: lkml
  Cc: John Stultz, Beata Michalska, Matt Szczesiak, Anders Pedersen,
	John Reitan, Liam Mark, Laura Abbott, Sumit Semwal,
	Greg Kroah-Hartman, Todd Kjos, Martijn Coenen, dri-devel

Since 4.12 we have seen graphics performance issues on the
HiKey960, much later narrowed down to commit 2a55e7b5e544
("staging: android: ion: Call dma_map_sg for syncing and mapping").

This was initially confounded by the fact that the out-of-tree
DRM driver was using a HiSi custom ION heap, which broke with the
4.12 ION ABI changes, so there was a lot of suspicion that the
performance problems were due to switching to a fairly simple
CMA-based DRM driver for the HiKey960. Additionally, as no
performance regression was seen w/ the original HiKey board
(which is SMP, not big.LITTLE as w/ the HiKey960), there was some
thought that the out-of-tree EAS code wasn't quite optimized.

But after chasing a number of other leads, I found that
reverting the ION code to its 4.11-era state got the majority
of the graphics performance back (there may yet be further EAS
tweaks needed), which led me to the dma_map_sg change.

In talking w/ Laura and Liam, it was suspected that the extra
cache operations were causing the trouble. Additionally, I found
that part of the reason we didn't see this w/ the original
HiKey board is that its (proprietary blob) GL code uses ion_mmap,
so ion_map_dma_buf is called very rarely, whereas with the
HiKey960, the (also proprietary blob) GL code calls
ion_map_dma_buf much more frequently via the kernel driver.

Anyway, with the cause of the performance regression isolated,
I've tried to find a way to improve the performance of the
current code.

This approach, which I've mostly copied from the drm_prime
implementation, is to track the direction in which we're mapping
the buffers, so we can avoid calling dma_map/unmap_sg on every
ion_map_dma_buf/ion_unmap_dma_buf call and instead do the work
in the attach/detach paths.

I'm not 100% sure of the correctness here, so close review would
be good, but it gets the performance back to being similar to
that of the 4.11-era ION code.

Feedback would be greatly appreciated!

Cc: Beata Michalska <Beata.Michalska@arm.com>
Cc: Matt Szczesiak <matt.szczesiak@arm.com>
Cc: Anders Pedersen <Anders.Pedersen@arm.com>
Cc: John Reitan <John.Reitan@arm.com>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Todd Kjos <tkjos@android.com>
Cc: Martijn Coenen <maco@android.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/staging/android/ion/ion.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 9907332..a4d7fca 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -199,6 +199,7 @@ struct ion_dma_buf_attachment {
 	struct device *dev;
 	struct sg_table *table;
 	struct list_head list;
+	enum dma_data_direction dir;
 };
 
 static int ion_dma_buf_attach(struct dma_buf *dmabuf,
@@ -220,6 +221,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
 
 	a->table = table;
 	a->dev = attachment->dev;
+	a->dir = DMA_NONE;
 	INIT_LIST_HEAD(&a->list);
 
 	attachment->priv = a;
@@ -236,6 +238,18 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
 {
 	struct ion_dma_buf_attachment *a = attachment->priv;
 	struct ion_buffer *buffer = dmabuf->priv;
+	struct sg_table *table;
+
+	if (!a)
+		return;
+
+	table = a->table;
+	if (table) {
+		if (a->dir != DMA_NONE)
+			dma_unmap_sg(attachment->dev, table->sgl, table->nents,
+				     a->dir);
+		sg_free_table(table);
+	}
 
 	free_duped_table(a->table);
 	mutex_lock(&buffer->lock);
@@ -243,6 +257,7 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
 	mutex_unlock(&buffer->lock);
 
 	kfree(a);
+	attachment->priv = NULL;
 }
 
 static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
@@ -251,12 +266,24 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
 	struct ion_dma_buf_attachment *a = attachment->priv;
 	struct sg_table *table;
 
-	table = a->table;
+	if (WARN_ON(direction == DMA_NONE || !a))
+		return ERR_PTR(-EINVAL);
 
-	if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-			direction))
-		return ERR_PTR(-ENOMEM);
+	if (a->dir == direction)
+		return a->table;
 
+	if (WARN_ON(a->dir != DMA_NONE))
+		return ERR_PTR(-EBUSY);
+
+	table = a->table;
+	if (!IS_ERR(table)) {
+		if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
+				direction)) {
+			table = ERR_PTR(-ENOMEM);
+		} else {
+			a->dir = direction;
+		}
+	}
 	return table;
 }
 
@@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
 			      struct sg_table *table,
 			      enum dma_data_direction direction)
 {
-	dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
 }
 
 static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
-- 
2.7.4



* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-10 23:33 [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping John Stultz
@ 2018-10-12 17:51 ` Laura Abbott
  2018-10-14  6:01   ` Liam Mark
                     ` (2 more replies)
From: Laura Abbott @ 2018-10-12 17:51 UTC
  To: John Stultz, lkml
  Cc: Beata Michalska, Matt Szczesiak, Anders Pedersen, John Reitan,
	Liam Mark, Sumit Semwal, Greg Kroah-Hartman, Todd Kjos,
	Martijn Coenen, dri-devel

On 10/10/2018 04:33 PM, John Stultz wrote:
...
> @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
>   			      struct sg_table *table,
>   			      enum dma_data_direction direction)
>   {
> -	dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);

This changes the semantics so that the only time a buffer
gets unmapped is on detach. I don't think we want to restrict
Ion to that behavior but I also don't know if anyone else
is relying on that. I thought there might have been some Qualcomm
stuff that did that (Liam? Todd?)

I suspect most of the cost of the dma_map/dma_unmap is from the
cache flushing and not the actual mapping operations. If this
is the case, another option might be to figure out how to
incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
to decide when they actually want to sync.
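
Something like the following, as an untested sketch of just the map
side (the open question being how the attaching driver would opt in;
unconditionally skipping the sync wouldn't be safe in general):

	/*
	 * Skip the implicit CPU cache maintenance; the attaching
	 * driver takes responsibility for syncing when it matters.
	 */
	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
			      direction, DMA_ATTR_SKIP_CPU_SYNC))
		return ERR_PTR(-ENOMEM);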

Thanks,
Laura


* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-12 17:51 ` Laura Abbott
@ 2018-10-14  6:01   ` Liam Mark
  2018-10-15 16:31     ` John Stultz
  2018-10-15 16:29   ` John Stultz
  2018-10-18  2:53   ` John Stultz
From: Liam Mark @ 2018-10-14  6:01 UTC
  To: Laura Abbott
  Cc: John Stultz, lkml, Beata Michalska, Matt Szczesiak,
	Anders Pedersen, John Reitan, Sumit Semwal, Greg Kroah-Hartman,
	Todd Kjos, Martijn Coenen, dri-devel

On Fri, 12 Oct 2018, Laura Abbott wrote:

> On 10/10/2018 04:33 PM, John Stultz wrote:
...
> 
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. 

I also have a concern with this patch: wouldn't it run into
difficulties if multiple devices were attached to the same cached ION
buffer and one of those devices was IO-coherent but the other wasn't?
For example, if device A (which is non-IO-coherent) wrote to the buffer
but skipped the cache invalidation on dma_unmap, and the buffer was
then dma-mapped into device B (which is IO-coherent), device B could
end up reading stale cache entries when it tried to read the buffer.

I don't believe there is any requirement for device A to detach (which 
would do the cache invalidation) before having device B dma-map the 
buffer.

I believe there would also be issues if device B wrote to the buffer and 
then device A tried to read or write it.
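
To spell the first case out (my reading of the hazard, assuming a
CPU-cached buffer and the unmap-time cache maintenance skipped as in
this patch):

	/*
	 * dev A: dma_map_sg()   -> CPU cache lines cleaned/invalidated
	 * dev A: DMA write      -> CPU may speculatively refill stale lines
	 * dev A: dma_unmap_sg() -> invalidation skipped under this patch
	 * dev B: dma_map_sg()   -> no maintenance, B is IO-coherent
	 * dev B: DMA read       -> can snoop the stale CPU cache lines
	 */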

> I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)

Yes, we have a form of "lazy mapping", which clients can opt into
using, which results in IOMMU page table mappings not being unmapped
on dma-unmap. Instead they are unmapped when the buffer is destroyed.

It is important to note that in our "lazy mapping" implementation,
cache maintenance is still applied on dma-map and dma-unmap.
If you need a full description of this feature, I can provide it.

> 
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

We have support for this locally on our 4.14 branch.
We have added a dma_map_attrs field to the dma_buf_attachment struct;
clients can then specify DMA attributes there, such as
DMA_ATTR_SKIP_CPU_SYNC, before dma-mapping the buffer, and we ensure
that these attributes are passed to dma_map_sg_attrs when
ion_map_dma_buf is called (same for ion_unmap_dma_buf).

I hope to try and upstream this at some point.
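
As a rough sketch of the shape of it (hypothetical names here; the
actual code on our branch differs in the details):

	struct dma_buf_attachment {
		/* ... existing fields ... */
		unsigned long dma_map_attrs;	/* e.g. DMA_ATTR_SKIP_CPU_SYNC */
	};

	static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
						enum dma_data_direction direction)
	{
		struct ion_dma_buf_attachment *a = attachment->priv;
		struct sg_table *table = a->table;

		/* pass whatever attrs the client set on the attachment */
		if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
				      direction, attachment->dma_map_attrs))
			return ERR_PTR(-ENOMEM);

		return table;
	}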


Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-12 17:51 ` Laura Abbott
  2018-10-14  6:01   ` Liam Mark
@ 2018-10-15 16:29   ` John Stultz
  2018-10-18  2:53   ` John Stultz
From: John Stultz @ 2018-10-15 16:29 UTC
  To: Laura Abbott
  Cc: lkml, Beata Michalska, Matt Szczesiak, Anders Pedersen,
	John Reitan, Liam Mark, Sumit Semwal, Greg Kroah-Hartman,
	Todd Kjos, Martijn Coenen, dri-devel

On Fri, Oct 12, 2018 at 10:51 AM, Laura Abbott <labbott@redhat.com> wrote:
> On 10/10/2018 04:33 PM, John Stultz wrote:
...
>>   @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
>>                               struct sg_table *table,
>>                               enum dma_data_direction direction)
>>   {
>> -       dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>
>
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)
>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

Ok. Thanks so much for the feedback and the suggestion. I'll try to
look into dma_attrs here shortly.

thanks
-john


* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-14  6:01   ` Liam Mark
@ 2018-10-15 16:31     ` John Stultz
From: John Stultz @ 2018-10-15 16:31 UTC
  To: Liam Mark
  Cc: Laura Abbott, lkml, Beata Michalska, Matt Szczesiak,
	Anders Pedersen, John Reitan, Sumit Semwal, Greg Kroah-Hartman,
	Todd Kjos, Martijn Coenen, dri-devel

On Sat, Oct 13, 2018 at 11:01 PM, Liam Mark <lmark@codeaurora.org> wrote:
> On Fri, 12 Oct 2018, Laura Abbott wrote:
>> I thought there might have been some Qualcomm
>> stuff that did that (Liam? Todd?)
>
> Yes, we have a form of "lazy mapping", which clients can opt into
> using, which results in IOMMU page table mappings not being unmapped
> on dma-unmap. Instead they are unmapped when the buffer is destroyed.
>
> It is important to note that in our "lazy mapping" implementation,
> cache maintenance is still applied on dma-map and dma-unmap.
> If you need a full description of this feature, I can provide it.
>
>>
>> I suspect most of the cost of the dma_map/dma_unmap is from the
>> cache flushing and not the actual mapping operations. If this
>> is the case, another option might be to figure out how to
>> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
>> to decide when they actually want to sync.
>
> We have support for this locally on our 4.14 branch.
>
> We have added a dma_map_attrs field to the dma_buf_attachment struct;
> clients can then specify DMA attributes there, such as
> DMA_ATTR_SKIP_CPU_SYNC, before dma-mapping the buffer, and we ensure
> that these attributes are passed to dma_map_sg_attrs when
> ion_map_dma_buf is called (same for ion_unmap_dma_buf).

Do you happen to have a pointer to the tree, if it's public?

thanks
-john


* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-12 17:51 ` Laura Abbott
  2018-10-14  6:01   ` Liam Mark
  2018-10-15 16:29   ` John Stultz
@ 2018-10-18  2:53   ` John Stultz
  2018-10-21  9:13     ` Laura Abbott
From: John Stultz @ 2018-10-18  2:53 UTC
  To: Laura Abbott
  Cc: lkml, Beata Michalska, Matt Szczesiak, Anders Pedersen,
	John Reitan, Liam Mark, Sumit Semwal, Greg Kroah-Hartman,
	Todd Kjos, Martijn Coenen, dri-devel

On Fri, Oct 12, 2018 at 10:51 AM, Laura Abbott <labbott@redhat.com> wrote:
>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

So just to confirm on this point, I basically tested the following
change:

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 24cb666..e76b0e2 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -273,8 +273,8 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
 
 	table = a->table;
 
-	if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
-			direction))
+	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
+			      direction, DMA_ATTR_SKIP_CPU_SYNC))
 		return ERR_PTR(-ENOMEM);
 
 	return table;
@@ -284,7 +284,7 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
 			      struct sg_table *table,
 			      enum dma_data_direction direction)
 {
-	dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
+	dma_unmap_sg_attrs(attachment->dev, table->sgl, table->nents, direction, DMA_ATTR_SKIP_CPU_SYNC);
 }
 
 static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)


And indeed, that performed similarly to the pre-4.12 ION code (and it
also had some of the same image corruption from caching errors we've
seen w/ 4.9-era kernels, which my earlier patch didn't have).

So yes, it seems having some way to conditionally skip the CPU sync
would be good.  Though I'm not sure what sort of interface for using
this you might have in mind?

thanks
-john


* Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping
  2018-10-18  2:53   ` John Stultz
@ 2018-10-21  9:13     ` Laura Abbott
From: Laura Abbott @ 2018-10-21  9:13 UTC
  To: John Stultz
  Cc: lkml, Beata Michalska, Matt Szczesiak, Anders Pedersen,
	John Reitan, Liam Mark, Sumit Semwal, Greg Kroah-Hartman,
	Todd Kjos, Martijn Coenen, dri-devel

On 10/17/2018 07:53 PM, John Stultz wrote:
...
> And indeed, that performed similarly to the pre-4.12 ION code (and it
> also had some of the same image corruption from caching errors we've
> seen w/ 4.9-era kernels, which my earlier patch didn't have).
> 
> So yes, it seems having some way to conditionally skip the CPU sync
> would be good.  Though I'm not sure what sort of interface for using
> this you might have in mind?
> 

I hadn't quite gotten anything fully formed, but I think solving this
will require changes at the dma-buf layer. One idea I had was allowing
dma_attrs to be set per attachment; the other was extending
dma_buf_map_attachment() to take an attrs argument. I'm at OSSEU this
week, but I didn't want to forget about this. I'll see if I can turn
this into a more coherent proposal unless you get to it first.
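
As a strawman for the second idea (hypothetical signature, not an
existing dma-buf API):

	struct sg_table *
	dma_buf_map_attachment_attrs(struct dma_buf_attachment *attach,
				     enum dma_data_direction direction,
				     unsigned long attrs);

with the current dma_buf_map_attachment() becoming a wrapper that
passes attrs == 0, plus a matching unmap variant.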

Thanks,
Laura

