* [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
@ 2018-04-11 17:11 Robin Murphy
  2018-04-11 17:11 ` [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing Robin Murphy
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Robin Murphy @ 2018-04-11 17:11 UTC (permalink / raw)
  To: amd-gfx, dri-devel; +Cc: okaya, alexander.deucher, christian.koenig

For dma_map_sg(), DMA API implementations are free to merge consecutive
segments into a single DMA mapping if conditions are suitable, thus the
resulting DMA addresses may be packed into fewer entries than
ttm->sg->nents implies.

drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
its callers either have to reject the 0 < count < nents case or risk
getting bogus addresses back later. Fortunately this is relatively easy
to deal with, without having to rejig structures to also store the mapped count,
since the total DMA length should still be equal to the total buffer
length. All we need is a separate scatterlist cursor to iterate the DMA
addresses separately from the CPU addresses.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

Off the back of Sinan's proposal for a workaround, I took a closer look
and this jumped out - I have no hardware to test it, nor do I really
know my way around this code, so I'm probably missing something, but at
face value this seems like the only obvious problem, and worth fixing
either way.

These patches are based on drm-next, and compile-tested (for arm64) only.

Robin.

 drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7856a9b3f8a8..db3dc8489afc 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -933,16 +933,18 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
 				     dma_addr_t *addrs, int max_entries)
 {
 	unsigned count;
-	struct scatterlist *sg;
+	struct scatterlist *sg, *dma_sg;
 	struct page *page;
-	u32 len, index;
+	u32 len, dma_len, index;
 	dma_addr_t addr;
 
 	index = 0;
+	dma_sg = sgt->sgl;
+	dma_len = sg_dma_len(dma_sg);
+	addr = sg_dma_address(dma_sg);
 	for_each_sg(sgt->sgl, sg, sgt->nents, count) {
 		len = sg->length;
 		page = sg_page(sg);
-		addr = sg_dma_address(sg);
 
 		while (len > 0) {
 			if (WARN_ON(index >= max_entries))
@@ -957,6 +959,12 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
 			len -= PAGE_SIZE;
 			index++;
 		}
+
+		if (dma_len == 0) {
+			dma_sg = sg_next(dma_sg);
+			dma_len = sg_dma_len(dma_sg);
+			addr = sg_dma_address(dma_sg);
+		}
 	}
 	return 0;
 }
-- 
2.16.1.dirty
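
For context, the dma_map_sg() behaviour the commit message relies on looks
roughly like this from a caller's point of view (a minimal sketch, not part
of the patch; the function name and device here are made up for illustration):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Sketch only: after dma_map_sg(), only the first 'count' entries of the
 * list carry a valid sg_dma_address()/sg_dma_len(); the remaining
 * (nents - count) entries must not be used for DMA, even though their
 * CPU-side page/length fields are still intact.
 */
static int example_map(struct device *dev, struct sg_table *sgt)
{
	struct scatterlist *sg;
	int count, i;

	count = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	if (count == 0)
		return -ENOMEM;		/* genuine mapping failure */

	/* Walk the DMA view: 'count' entries, not sgt->nents. */
	for_each_sg(sgt->sgl, sg, count, i)
		pr_debug("DMA segment %d: %pad + %u\n",
			 i, &sg_dma_address(sg), sg_dma_len(sg));

	return 0;
}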

* [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing
  2018-04-11 17:11 [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
@ 2018-04-11 17:11 ` Robin Murphy
  2018-04-11 18:28   ` Christian König
  2018-04-11 17:45 ` [1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
       [not found] ` <0901c8c3b9adbcb851ba58dfca6b16d12ccbcb0f.1523465719.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
  2 siblings, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2018-04-11 17:11 UTC (permalink / raw)
  To: amd-gfx, dri-devel; +Cc: okaya, alexander.deucher, christian.koenig

Now that drm_prime_sg_to_page_addr_arrays() understands the case where
dma_map_sg() has coalesced segments and returns 0 < count < nents, we
can relax the check to only consider genuine failure.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 205da3ff9cd0..f81e96a4242f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 
 	r = -ENOMEM;
 	nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
-	if (nents != ttm->sg->nents)
+	if (nents == 0)
 		goto release_sg;
 
 	drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
-- 
2.16.1.dirty
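
Taken together with patch 1, the intended calling pattern is roughly the
following (a self-contained sketch with illustrative names, not a verbatim
copy of amdgpu_ttm_tt_pin_userptr(); TTM's own bookkeeping is omitted):

#include <linux/dma-mapping.h>
#include <drm/drm_prime.h>

static int example_pin(struct device *dev, struct sg_table *sgt,
		       struct page **pages, dma_addr_t *addrs, int npages)
{
	int nents;

	nents = dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
	if (nents == 0)			/* only genuine failure is fatal now */
		return -ENOMEM;

	/*
	 * A coalesced result (0 < nents < sgt->nents) is fine here: the
	 * helper from patch 1 walks the DMA addresses with its own cursor
	 * and still fills in one dma_addr_t per page.
	 */
	return drm_prime_sg_to_page_addr_arrays(sgt, pages, addrs, npages);
}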

* Re: [1/2] drm/prime: Iterate SG DMA addresses separately
  2018-04-11 17:11 [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
  2018-04-11 17:11 ` [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing Robin Murphy
@ 2018-04-11 17:45 ` Robin Murphy
       [not found] ` <0901c8c3b9adbcb851ba58dfca6b16d12ccbcb0f.1523465719.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
  2 siblings, 0 replies; 13+ messages in thread
From: Robin Murphy @ 2018-04-11 17:45 UTC (permalink / raw)
  To: amd-gfx, dri-devel; +Cc: okaya, alexander.deucher, christian.koenig

On 11/04/18 18:11, Robin Murphy wrote:
> For dma_map_sg(), DMA API implementations are free to merge consecutive
> segments into a single DMA mapping if conditions are suitable, thus the
> resulting DMA addresses may be packed into fewer entries than
> ttm->sg->nents implies.
> 
> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> its callers either have to reject the 0 < count < nents case or risk
> getting bogus addresses back later. Fortunately this is relatively easy
> to deal with having to rejig structures to also store the mapped count,
> since the total DMA length should still be equal to the total buffer
> length. All we need is a separate scatterlist cursor to iterate the DMA
> addresses separately from the CPU addresses.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> Off the back of Sinan's proposal for a workaround, I took a closer look
> and this jumped out - I have no hardware to test it, nor do I really
> know my way around this code, so I'm probably missing something, but at
> face value this seems like the only obvious problem, and worth fixing
> either way.
> 
> These patches are based on drm-next, and compile-tested (for arm64) only.
> 
> Robin.
> 
>   drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 7856a9b3f8a8..db3dc8489afc 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -933,16 +933,18 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
>   				     dma_addr_t *addrs, int max_entries)
>   {
>   	unsigned count;
> -	struct scatterlist *sg;
> +	struct scatterlist *sg, *dma_sg;
>   	struct page *page;
> -	u32 len, index;
> +	u32 len, dma_len, index;
>   	dma_addr_t addr;
>   
>   	index = 0;
> +	dma_sg = sgt->sgl;
> +	dma_len = sg_dma_len(dma_sg);
> +	addr = sg_dma_address(dma_sg);
>   	for_each_sg(sgt->sgl, sg, sgt->nents, count) {
>   		len = sg->length;
>   		page = sg_page(sg);
> -		addr = sg_dma_address(sg);
>   
>   		while (len > 0) {
>   			if (WARN_ON(index >= max_entries))
> @@ -957,6 +959,12 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
>   			len -= PAGE_SIZE;

+			dma_len -= PAGE_SIZE;

Ugh, somehow that bit got lost. Told you I'd have missed something, 
although I was rather assuming it would be something less obvious...

Robin.

>   			index++;
>   		}
> +
> +		if (dma_len == 0) {
> +			dma_sg = sg_next(dma_sg);
> +			dma_len = sg_dma_len(dma_sg);
> +			addr = sg_dma_address(dma_sg);
> +		}
>   	}
>   	return 0;
>   }
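
Folding that missing decrement back in, the reworked function would read
roughly as follows. This is a sketch assembled from the hunks quoted above
plus the fix-up, not the final committed code; the lines elided between the
two hunks are assumed from the surrounding context, and the !sg_is_last()
guard is an extra precaution added here so the cursor never walks past the
end of the list (the v1 posting does not have it):

/* Sketch of the patched helper; in-tree it lives in drivers/gpu/drm/drm_prime.c */
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <drm/drm_prime.h>

int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
				     dma_addr_t *addrs, int max_entries)
{
	unsigned count;
	struct scatterlist *sg, *dma_sg;
	struct page *page;
	u32 len, dma_len, index;
	dma_addr_t addr;

	index = 0;
	dma_sg = sgt->sgl;
	dma_len = sg_dma_len(dma_sg);
	addr = sg_dma_address(dma_sg);
	for_each_sg(sgt->sgl, sg, sgt->nents, count) {
		len = sg->length;
		page = sg_page(sg);

		while (len > 0) {
			if (WARN_ON(index >= max_entries))
				return -1;
			if (pages)
				pages[index] = page;
			if (addrs)
				addrs[index] = addr;

			page++;
			addr += PAGE_SIZE;
			len -= PAGE_SIZE;
			dma_len -= PAGE_SIZE;	/* the line missing from v1 */
			index++;
		}

		/* CPU segment consumed; if the (possibly coalesced) DMA
		 * segment is also used up, advance to the next one. */
		if (dma_len == 0 && !sg_is_last(dma_sg)) {
			dma_sg = sg_next(dma_sg);
			dma_len = sg_dma_len(dma_sg);
			addr = sg_dma_address(dma_sg);
		}
	}
	return 0;
}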
> 
* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found] ` <0901c8c3b9adbcb851ba58dfca6b16d12ccbcb0f.1523465719.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
@ 2018-04-11 18:26   ` Christian König
       [not found]     ` <67b1875d-9f77-5fb8-bfc6-53d34c15ab16-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-04-11 18:26 UTC (permalink / raw)
  To: Robin Murphy, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo,
	David1.Zhou-5C7GfCeVMHo

Am 11.04.2018 um 19:11 schrieb Robin Murphy:
> For dma_map_sg(), DMA API implementations are free to merge consecutive
> segments into a single DMA mapping if conditions are suitable, thus the
> resulting DMA addresses may be packed into fewer entries than
> ttm->sg->nents implies.
>
> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> its callers either have to reject the 0 < count < nents case or risk
> getting bogus addresses back later. Fortunately this is relatively easy
> to deal with having to rejig structures to also store the mapped count,
> since the total DMA length should still be equal to the total buffer
> length. All we need is a separate scatterlist cursor to iterate the DMA
> addresses separately from the CPU addresses.

Mhm, I think I like Sinan's approach better.

You see, the hardware actually needs the dma_address on a page-by-page basis.

Joining multiple consecutive pages into one entry is just additional 
overhead which we don't need.

Regards,
Christian.

>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> Off the back of Sinan's proposal for a workaround, I took a closer look
> and this jumped out - I have no hardware to test it, nor do I really
> know my way around this code, so I'm probably missing something, but at
> face value this seems like the only obvious problem, and worth fixing
> either way.
>
> These patches are based on drm-next, and compile-tested (for arm64) only.
>
> Robin.
>
>   drivers/gpu/drm/drm_prime.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 7856a9b3f8a8..db3dc8489afc 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -933,16 +933,18 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
>   				     dma_addr_t *addrs, int max_entries)
>   {
>   	unsigned count;
> -	struct scatterlist *sg;
> +	struct scatterlist *sg, *dma_sg;
>   	struct page *page;
> -	u32 len, index;
> +	u32 len, dma_len, index;
>   	dma_addr_t addr;
>   
>   	index = 0;
> +	dma_sg = sgt->sgl;
> +	dma_len = sg_dma_len(dma_sg);
> +	addr = sg_dma_address(dma_sg);
>   	for_each_sg(sgt->sgl, sg, sgt->nents, count) {
>   		len = sg->length;
>   		page = sg_page(sg);
> -		addr = sg_dma_address(sg);
>   
>   		while (len > 0) {
>   			if (WARN_ON(index >= max_entries))
> @@ -957,6 +959,12 @@ int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
>   			len -= PAGE_SIZE;
>   			index++;
>   		}
> +
> +		if (dma_len == 0) {
> +			dma_sg = sg_next(dma_sg);
> +			dma_len = sg_dma_len(dma_sg);
> +			addr = sg_dma_address(dma_sg);
> +		}
>   	}
>   	return 0;
>   }

* Re: [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing
  2018-04-11 17:11 ` [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing Robin Murphy
@ 2018-04-11 18:28   ` Christian König
  2018-04-12 17:53     ` Robin Murphy
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-04-11 18:28 UTC (permalink / raw)
  To: Robin Murphy, amd-gfx, dri-devel; +Cc: okaya, alexander.deucher

Am 11.04.2018 um 19:11 schrieb Robin Murphy:
> Now that drm_prime_sg_to_page_addr_arrays() understands the case where
> dma_map_sg() has coalesced segments and returns 0 < count < nents, we
> can relax the check to only consider genuine failure.

That pattern is repeated in pretty much all drivers using TTM.

So you would need to fix all of them, but as I said I don't think that 
this approach is a good idea.

We essentially wanted to get rid of the dma_address array in the mid term,
and that change goes in exactly the opposite direction.

Regards,
Christian.

>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 205da3ff9cd0..f81e96a4242f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt *ttm)
>   
>   	r = -ENOMEM;
>   	nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, direction);
> -	if (nents != ttm->sg->nents)
> +	if (nents == 0)
>   		goto release_sg;
>   
>   	drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found]     ` <67b1875d-9f77-5fb8-bfc6-53d34c15ab16-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-12  9:11       ` Lucas Stach
       [not found]         ` <1523524317.4981.24.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Lucas Stach @ 2018-04-12  9:11 UTC (permalink / raw)
  To: Christian König, Robin Murphy,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo

Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
> > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > segments into a single DMA mapping if conditions are suitable, thus the
> > resulting DMA addresses may be packed into fewer entries than
> > ttm->sg->nents implies.
> > 
> > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > its callers either have to reject the 0 < count < nents case or risk
> > getting bogus addresses back later. Fortunately this is relatively easy
> > to deal with having to rejig structures to also store the mapped count,
> > since the total DMA length should still be equal to the total buffer
> > length. All we need is a separate scatterlist cursor to iterate the DMA
> > addresses separately from the CPU addresses.
> 
> Mhm, I think I like Sinas approach better.
> 
> See the hardware actually needs the dma_address on a page by page basis.
> 
> Joining multiple consecutive pages into one entry is just additional 
> overhead which we don't need.

But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
be in front of your GPU from mapping large pages as such, causing additional
overhead by means of additional TLB misses and page walks on the IOMMU
side.

And wouldn't you like to use huge pages at the GPU side, if the IOMMU
already provides you the service of producing one large consecutive
address range, rather than mapping them via a number of small pages?
Detecting such a condition is much easier if the DMA map implementation
gives you the coalesced scatter entries.

Regards,
Lucas

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found]         ` <1523524317.4981.24.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2018-04-12  9:35           ` Christian König
       [not found]             ` <58ef1aab-6c3a-8e69-0a7e-98218ba9fe96-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Christian König @ 2018-04-12  9:35 UTC (permalink / raw)
  To: Lucas Stach, Robin Murphy,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo

Am 12.04.2018 um 11:11 schrieb Lucas Stach:
> Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>> For dma_map_sg(), DMA API implementations are free to merge consecutive
>>> segments into a single DMA mapping if conditions are suitable, thus the
>>> resulting DMA addresses may be packed into fewer entries than
>>> ttm->sg->nents implies.
>>>
>>> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
>>> its callers either have to reject the 0 < count < nents case or risk
>>> getting bogus addresses back later. Fortunately this is relatively easy
>>> to deal with having to rejig structures to also store the mapped count,
>>> since the total DMA length should still be equal to the total buffer
>>> length. All we need is a separate scatterlist cursor to iterate the DMA
>>> addresses separately from the CPU addresses.
>> Mhm, I think I like Sinas approach better.
>>
>> See the hardware actually needs the dma_address on a page by page basis.
>>
>> Joining multiple consecutive pages into one entry is just additional
>> overhead which we don't need.
> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
> be in front of your GPU to map large pages as such, causing additional
> overhead by means of additional TLB misses and page walks on the IOMMU
> side.
>
> And wouldn't you like to use huge pages at the GPU side, if the IOMMU
> already provides you the service of producing one large consecutive
> address range, rather than mapping them via a number of small pages?

No, I wouldn't like to use that. We're already using that :)

But we use huge pages by allocating consecutive chunks of memory so that 
both the CPU as well as the GPU can increase their TLB hit rate.

What currently happens is that the DMA subsystem tries to coalesce 
multiple pages into one SG entry and we de-coalesce that inside the 
driver again to create our random access array.

That is a huge waste of memory and CPU cycles and I actually wanted to 
get rid of it for quite some time now. Sinan's approach seems to be a good 
step in the right direction to me to actually clean that up.

> Detecting such a condition is much easier if the DMA map implementation
> gives you the coalesced scatter entries.

A way which preserves both paths would indeed be nice to have, but that 
only allows for the TLB optimization for the GPU and not the CPU any 
more. So I actually see that as really minor use case.

Regards,
Christian.

>
> Regards,
> Lucas

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found]             ` <58ef1aab-6c3a-8e69-0a7e-98218ba9fe96-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-12  9:49               ` Lucas Stach
       [not found]                 ` <1523526540.4981.26.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Lucas Stach @ 2018-04-12  9:49 UTC (permalink / raw)
  To: Christian König, Robin Murphy,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo

Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:
> Am 12.04.2018 um 11:11 schrieb Lucas Stach:
> > Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
> > > Am 11.04.2018 um 19:11 schrieb Robin Murphy:
> > > > For dma_map_sg(), DMA API implementations are free to merge consecutive
> > > > segments into a single DMA mapping if conditions are suitable, thus the
> > > > resulting DMA addresses may be packed into fewer entries than
> > > > ttm->sg->nents implies.
> > > > 
> > > > drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
> > > > its callers either have to reject the 0 < count < nents case or risk
> > > > getting bogus addresses back later. Fortunately this is relatively easy
> > > > to deal with having to rejig structures to also store the mapped count,
> > > > since the total DMA length should still be equal to the total buffer
> > > > length. All we need is a separate scatterlist cursor to iterate the DMA
> > > > addresses separately from the CPU addresses.
> > > 
> > > Mhm, I think I like Sinas approach better.
> > > 
> > > See the hardware actually needs the dma_address on a page by page basis.
> > > 
> > > Joining multiple consecutive pages into one entry is just additional
> > > overhead which we don't need.
> > 
> > But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
> > be in front of your GPU to map large pages as such, causing additional
> > overhead by means of additional TLB misses and page walks on the IOMMU
> > side.
> > 
> > And wouldn't you like to use huge pages at the GPU side, if the IOMMU
> > already provides you the service of producing one large consecutive
> > address range, rather than mapping them via a number of small pages?
> 
> No, I wouldn't like to use that. We're already using that :)
> 
> But we use huge pages by allocating consecutive chunks of memory so that 
> both the CPU as well as the GPU can increase their TLB hit rate.

I'm not convinced that this is a universal win. Many GPU buffers aren't
accessed by the CPU and allocating huge pages puts much more strain on
the kernel MM subsystem.

> What currently happens is that the DMA subsystem tries to coalesce 
> multiple pages into on SG entry and we de-coalesce that inside the 
> driver again to create our random access array.

I guess the right thing would be to have a flag that tells the DMA
implementation to not coalesce the entries. But (ab-)using max segment
size tells the DMA API to split the segments if they are larger than
the given size, which is probably not what you want either as you now
need to coalesce the segments again when you are mapping real huge
pages.

> That is a huge waste of memory and CPU cycles and I actually wanted to 
> get rid of it for quite some time now. Sinas approach seems to be a good 
> step into the right direction to me to actually clean that up.
> 
> > Detecting such a condition is much easier if the DMA map implementation
> > gives you the coalesced scatter entries.
> 
> A way which preserves both path would be indeed nice to have, but that 
> only allows for the TLB optimization for the GPU and not the CPU any 
> more. So I actually see that as really minor use case.

Some of the Tegras have much larger TLBs and better page-walk
performance on the system IOMMU compared to the GPU MMU, so they would
probably benefit a good deal even if the hugepage optimization only
targets the GPU side.

Regards,
Lucas
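
For reference, the max-segment-size workaround being debated here would
presumably be along these lines (a hypothetical sketch, not the actual
patch under discussion):

#include <linux/dma-mapping.h>

/*
 * Hypothetical sketch: cap the device's DMA segment size at one page so
 * that dma_map_sg() never merges consecutive pages into a single segment.
 * As noted above, the trade-off is that an IOMMU behind this device is
 * then never presented with larger contiguous chunks to map. Assumes the
 * device has dma_parms set up, as PCI devices do.
 */
static int example_limit_segments(struct device *dev)
{
	return dma_set_max_seg_size(dev, PAGE_SIZE);
}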

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found]                 ` <1523526540.4981.26.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2018-04-12 10:33                   ` Christian König
       [not found]                     ` <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
  2018-04-12 13:18                     ` Robin Murphy
  0 siblings, 2 replies; 13+ messages in thread
From: Christian König @ 2018-04-12 10:33 UTC (permalink / raw)
  To: Lucas Stach, Robin Murphy,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo

Am 12.04.2018 um 11:49 schrieb Lucas Stach:
> Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:
>> Am 12.04.2018 um 11:11 schrieb Lucas Stach:
>>> Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
>>>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>>>> For dma_map_sg(), DMA API implementations are free to merge consecutive
>>>>> segments into a single DMA mapping if conditions are suitable, thus the
>>>>> resulting DMA addresses may be packed into fewer entries than
>>>>> ttm->sg->nents implies.
>>>>>
>>>>> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
>>>>> its callers either have to reject the 0 < count < nents case or risk
>>>>> getting bogus addresses back later. Fortunately this is relatively easy
>>>>> to deal with having to rejig structures to also store the mapped count,
>>>>> since the total DMA length should still be equal to the total buffer
>>>>> length. All we need is a separate scatterlist cursor to iterate the DMA
>>>>> addresses separately from the CPU addresses.
>>>> Mhm, I think I like Sinas approach better.
>>>>
>>>> See the hardware actually needs the dma_address on a page by page basis.
>>>>
>>>> Joining multiple consecutive pages into one entry is just additional
>>>> overhead which we don't need.
>>> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
>>> be in front of your GPU to map large pages as such, causing additional
>>> overhead by means of additional TLB misses and page walks on the IOMMU
>>> side.
>>>
>>> And wouldn't you like to use huge pages at the GPU side, if the IOMMU
>>> already provides you the service of producing one large consecutive
>>> address range, rather than mapping them via a number of small pages?
>> No, I wouldn't like to use that. We're already using that :)
>>
>> But we use huge pages by allocating consecutive chunks of memory so that
>> both the CPU as well as the GPU can increase their TLB hit rate.
> I'm not convinced that this is a universal win. Many GPU buffers aren't
> accessed by the CPU and allocating huge pages puts much more strain on
> the kernel MM subsystem.

Yeah, we indeed see the extra overhead during allocation.

>> What currently happens is that the DMA subsystem tries to coalesce
>> multiple pages into on SG entry and we de-coalesce that inside the
>> driver again to create our random access array.
> I guess the right thing would be to have a flag that tells the the DMA
> implementation to not coalesce the entries. But (ab-)using max segment
> size tells the DMA API to split the segments if they are larger than
> the given size, which is probably not what you want either as you now
> need to coalesce the segments again when you are mapping real huge
> pages.

No, even with huge pages I need an array with every single dma address 
for every 4K page (or whatever the normal page size is).

The problem is that I need a random access array for the DMA addresses, 
even when they are contiguous.

I agree that max segment size is a bit ugly here, but at least for 
radeon, amdgpu and pretty much TTM in general it is exactly what I need.

I could fix TTM to not need that, but for radeon and amdgpu it is the 
hardware which needs this.

Christian.

>
>> That is a huge waste of memory and CPU cycles and I actually wanted to
>> get rid of it for quite some time now. Sinas approach seems to be a good
>> step into the right direction to me to actually clean that up.
>>
>>> Detecting such a condition is much easier if the DMA map implementation
>>> gives you the coalesced scatter entries.
>> A way which preserves both path would be indeed nice to have, but that
>> only allows for the TLB optimization for the GPU and not the CPU any
>> more. So I actually see that as really minor use case.
> Some of the Tegras have much larger TLBs and better page-walk
> performance on the system IOMMU compared to the GPU MMU, so they would
> probably benefit a good deal even if the hugepage optimization only
> targets the GPU side.
>
> Regards,
> Lucas

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
       [not found]                     ` <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-12 11:48                       ` okaya-sgV2jX0FEOL9JmXXK+q4OQ
  0 siblings, 0 replies; 13+ messages in thread
From: okaya-sgV2jX0FEOL9JmXXK+q4OQ @ 2018-04-12 11:48 UTC (permalink / raw)
  To: Christian König
  Cc: alexander.deucher-5C7GfCeVMHo, Robin Murphy,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Lucas Stach

On 2018-04-12 06:33, Christian König wrote:
> Am 12.04.2018 um 11:49 schrieb Lucas Stach:
>> Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:
>>> Am 12.04.2018 um 11:11 schrieb Lucas Stach:
>>>> Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
>>>>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>>>>> For dma_map_sg(), DMA API implementations are free to merge 
>>>>>> consecutive
>>>>>> segments into a single DMA mapping if conditions are suitable, 
>>>>>> thus the
>>>>>> resulting DMA addresses may be packed into fewer entries than
>>>>>> ttm->sg->nents implies.
>>>>>> 
>>>>>> drm_prime_sg_to_page_addr_arrays() does not account for this, 
>>>>>> meaning
>>>>>> its callers either have to reject the 0 < count < nents case or 
>>>>>> risk
>>>>>> getting bogus addresses back later. Fortunately this is relatively 
>>>>>> easy
>>>>>> to deal with having to rejig structures to also store the mapped 
>>>>>> count,
>>>>>> since the total DMA length should still be equal to the total 
>>>>>> buffer
>>>>>> length. All we need is a separate scatterlist cursor to iterate 
>>>>>> the DMA
>>>>>> addresses separately from the CPU addresses.
>>>>> Mhm, I think I like Sinas approach better.
>>>>> 
>>>>> See the hardware actually needs the dma_address on a page by page 
>>>>> basis.
>>>>> 
>>>>> Joining multiple consecutive pages into one entry is just 
>>>>> additional
>>>>> overhead which we don't need.
>>>> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that 
>>>> might
>>>> be in front of your GPU to map large pages as such, causing 
>>>> additional
>>>> overhead by means of additional TLB misses and page walks on the 
>>>> IOMMU
>>>> side.
>>>> 
>>>> And wouldn't you like to use huge pages at the GPU side, if the 
>>>> IOMMU
>>>> already provides you the service of producing one large consecutive
>>>> address range, rather than mapping them via a number of small pages?
>>> No, I wouldn't like to use that. We're already using that :)
>>> 
>>> But we use huge pages by allocating consecutive chunks of memory so 
>>> that
>>> both the CPU as well as the GPU can increase their TLB hit rate.
>> I'm not convinced that this is a universal win. Many GPU buffers 
>> aren't
>> accessed by the CPU and allocating huge pages puts much more strain on
>> the kernel MM subsystem.
> 
> Yeah, we indeed see the extra overhead during allocation.
> 
>>> What currently happens is that the DMA subsystem tries to coalesce
>>> multiple pages into on SG entry and we de-coalesce that inside the
>>> driver again to create our random access array.
>> I guess the right thing would be to have a flag that tells the the DMA
>> implementation to not coalesce the entries. But (ab-)using max segment
>> size tells the DMA API to split the segments if they are larger than
>> the given size, which is probably not what you want either as you now
>> need to coalesce the segments again when you are mapping real huge
>> pages.
> 
> No, even with huge pages I need an array with every single dma address
> for a 4K pages (or whatever the normal page size is).
> 
> The problem is that I need a random access array for the DMA
> addresses, even when they are continuously.
> 
> I agree that max segment size is a bit ugly here, but at least for
> radeon, amdgpu and pretty much TTM in general it is exactly what I
> need.
> 
> I could fix TTM to not need that, but for radeon and amdgpu it is the
> hardware which needs this.

I can implement the i915 approach, as Robin suggested, as an alternative. 
Is that better?


> 
> Christian.
> 
>> 
>>> That is a huge waste of memory and CPU cycles and I actually wanted 
>>> to
>>> get rid of it for quite some time now. Sinas approach seems to be a 
>>> good
>>> step into the right direction to me to actually clean that up.
>>> 
>>>> Detecting such a condition is much easier if the DMA map 
>>>> implementation
>>>> gives you the coalesced scatter entries.
>>> A way which preserves both path would be indeed nice to have, but 
>>> that
>>> only allows for the TLB optimization for the GPU and not the CPU any
>>> more. So I actually see that as really minor use case.
>> Some of the Tegras have much larger TLBs and better page-walk
>> performance on the system IOMMU compared to the GPU MMU, so they would
>> probably benefit a good deal even if the hugepage optimization only
>> targets the GPU side.
>> 
>> Regards,
>> Lucas

* Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
  2018-04-12 10:33                   ` Christian König
       [not found]                     ` <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-12 13:18                     ` Robin Murphy
  1 sibling, 0 replies; 13+ messages in thread
From: Robin Murphy @ 2018-04-12 13:18 UTC (permalink / raw)
  To: Christian König, Lucas Stach, amd-gfx, dri-devel
  Cc: okaya, alexander.deucher

On 12/04/18 11:33, Christian König wrote:
> Am 12.04.2018 um 11:49 schrieb Lucas Stach:
>> Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:
>>> Am 12.04.2018 um 11:11 schrieb Lucas Stach:
>>>> Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
>>>>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>>>>> For dma_map_sg(), DMA API implementations are free to merge 
>>>>>> consecutive
>>>>>> segments into a single DMA mapping if conditions are suitable, 
>>>>>> thus the
>>>>>> resulting DMA addresses may be packed into fewer entries than
>>>>>> ttm->sg->nents implies.
>>>>>>
>>>>>> drm_prime_sg_to_page_addr_arrays() does not account for this, meaning
>>>>>> its callers either have to reject the 0 < count < nents case or risk
>>>>>> getting bogus addresses back later. Fortunately this is relatively 
>>>>>> easy
>>>>>> to deal with having to rejig structures to also store the mapped 
>>>>>> count,
>>>>>> since the total DMA length should still be equal to the total buffer
>>>>>> length. All we need is a separate scatterlist cursor to iterate 
>>>>>> the DMA
>>>>>> addresses separately from the CPU addresses.
>>>>> Mhm, I think I like Sinas approach better.
>>>>>
>>>>> See the hardware actually needs the dma_address on a page by page 
>>>>> basis.
>>>>>
>>>>> Joining multiple consecutive pages into one entry is just additional
>>>>> overhead which we don't need.

Note that the merging done inside dma_map_sg() is pretty trivial in 
itself (it's effectively just the inverse of the logic in this patch). 
The "overhead" here is inherent in calling sg_alloc_table_from_pages() 
and dma_map_sg() at all.

>>>> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that might
>>>> be in front of your GPU to map large pages as such, causing additional
>>>> overhead by means of additional TLB misses and page walks on the IOMMU
>>>> side.
>>>>
>>>> And wouldn't you like to use huge pages at the GPU side, if the IOMMU
>>>> already provides you the service of producing one large consecutive
>>>> address range, rather than mapping them via a number of small pages?
>>> No, I wouldn't like to use that. We're already using that :)
>>>
>>> But we use huge pages by allocating consecutive chunks of memory so that
>>> both the CPU as well as the GPU can increase their TLB hit rate.
>> I'm not convinced that this is a universal win. Many GPU buffers aren't
>> accessed by the CPU and allocating huge pages puts much more strain on
>> the kernel MM subsystem.
> 
> Yeah, we indeed see the extra overhead during allocation.
> 
>>> What currently happens is that the DMA subsystem tries to coalesce
>>> multiple pages into on SG entry and we de-coalesce that inside the
>>> driver again to create our random access array.
>> I guess the right thing would be to have a flag that tells the the DMA
>> implementation to not coalesce the entries. But (ab-)using max segment
>> size tells the DMA API to split the segments if they are larger than
>> the given size, which is probably not what you want either as you now
>> need to coalesce the segments again when you are mapping real huge
>> pages.
> 
> No, even with huge pages I need an array with every single dma address 
> for a 4K pages (or whatever the normal page size is).
> 
> The problem is that I need a random access array for the DMA addresses, 
> even when they are continuously.

OK, that makes me wonder if you even need dma_map_sg() in this case. 
 From the sound of that it would be a lot simpler to just call 
dma_map_page() in a loop over the pair of arrays. AFAICS that's what TTM 
already does in most places.

> I agree that max segment size is a bit ugly here, but at least for 
> radeon, amdgpu and pretty much TTM in general it is exactly what I need.
> 
> I could fix TTM to not need that, but for radeon and amdgpu it is the 
> hardware which needs this.

Sorry, I don't follow - how does the hardware care about the format of 
an intermediate data structure used to *generate* the dma_address array? 
That's all that I'm proposing to fix here.

Robin.

> 
> Christian.
> 
>>
>>> That is a huge waste of memory and CPU cycles and I actually wanted to
>>> get rid of it for quite some time now. Sinas approach seems to be a good
>>> step into the right direction to me to actually clean that up.
>>>
>>>> Detecting such a condition is much easier if the DMA map implementation
>>>> gives you the coalesced scatter entries.
>>> A way which preserves both path would be indeed nice to have, but that
>>> only allows for the TLB optimization for the GPU and not the CPU any
>>> more. So I actually see that as really minor use case.
>> Some of the Tegras have much larger TLBs and better page-walk
>> performance on the system IOMMU compared to the GPU MMU, so they would
>> probably benefit a good deal even if the hugepage optimization only
>> targets the GPU side.
>>
>> Regards,
>> Lucas
> 
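
The dma_map_page() loop suggested above would look roughly like this (a
sketch with illustrative names and error handling, not code from any of the
drivers being discussed):

#include <linux/dma-mapping.h>

static int example_map_pages(struct device *dev, struct page **pages,
			     dma_addr_t *addrs, unsigned int npages)
{
	unsigned int i;

	/* Map each page individually; no scatterlist is involved at all. */
	for (i = 0; i < npages; i++) {
		addrs[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
					DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, addrs[i]))
			goto unmap;
	}
	return 0;

unmap:
	while (i--)
		dma_unmap_page(dev, addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
	return -ENOMEM;
}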

* Re: [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing
  2018-04-11 18:28   ` Christian König
@ 2018-04-12 17:53     ` Robin Murphy
       [not found]       ` <24948fc1-ff01-1276-e3fa-c2b0e5713b5b-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Robin Murphy @ 2018-04-12 17:53 UTC (permalink / raw)
  To: Christian König, amd-gfx, dri-devel; +Cc: okaya, alexander.deucher

On 11/04/18 19:28, Christian König wrote:
> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>> Now that drm_prime_sg_to_page_addr_arrays() understands the case where
>> dma_map_sg() has coalesced segments and returns 0 < count < nents, we
>> can relax the check to only consider genuine failure.
> 
> That pattern is repeated in pretty much all drivers using TTM.

AFAICS from a look through drivers/gpu/ only 3 drivers consider the 
actual value of what dma_map_sg() returns - others only handle the 
failure case of 0, and some don't check it at all - and of those it's 
only amdgpu and radeon barfing on it being different from nents (vmwgfx 
appears to just stash it to use slightly incorrectly later).

> So you would need to fix all of them, but as I said I don't think that 
> this approach is a good idea.
> 
> We essentially wanted to get rid of the dma_address array in the mid 
> term and that change goes into the exactly opposite direction.

But this patch bears no intention of making any fundamental change to 
the existing behaviour of this particular driver, it simply permits said 
behaviour to work at all on more systems than it currently does. I would 
consider that entirely orthogonal to future TTM-wide development :/

Robin.

> 
> Regards,
> Christian.
> 
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 205da3ff9cd0..f81e96a4242f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_tt 
>> *ttm)
>>       r = -ENOMEM;
>>       nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, 
>> direction);
>> -    if (nents != ttm->sg->nents)
>> +    if (nents == 0)
>>           goto release_sg;
>>       drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
> 

* Re: [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing
       [not found]       ` <24948fc1-ff01-1276-e3fa-c2b0e5713b5b-5wv7dgnIgG8@public.gmane.org>
@ 2018-04-12 19:08         ` Christian König
  0 siblings, 0 replies; 13+ messages in thread
From: Christian König @ 2018-04-12 19:08 UTC (permalink / raw)
  To: Robin Murphy, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: okaya-sgV2jX0FEOL9JmXXK+q4OQ, alexander.deucher-5C7GfCeVMHo,
	David1.Zhou-5C7GfCeVMHo

Am 12.04.2018 um 19:53 schrieb Robin Murphy:
> On 11/04/18 19:28, Christian König wrote:
>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>> Now that drm_prime_sg_to_page_addr_arrays() understands the case where
>>> dma_map_sg() has coalesced segments and returns 0 < count < nents, we
>>> can relax the check to only consider genuine failure.
>>
>> That pattern is repeated in pretty much all drivers using TTM.
>
> AFAICS from a look through drivers/gpu/ only 3 drivers consider the 
> actual value of what dma_map_sg() returns - others only handle the 
> failure case of 0, and some don't check it at all - and of those it's 
> only amdgpu and radeon barfing on it being different from nents 
> (vmwgfx appears to just stash it to use slightly incorrectly later).
>
>> So you would need to fix all of them, but as I said I don't think 
>> that this approach is a good idea.
>>
>> We essentially wanted to get rid of the dma_address array in the mid 
>> term and that change goes into the exactly opposite direction.
>
> But this patch bears no intention of making any fundamental change to 
> the existing behaviour of this particular driver, it simply permits 
> said behaviour to work at all on more systems than it currently does. 
> I would consider that entirely orthogonal to future TTM-wide 
> development :/

That's a really good point. Please try to fix radeon as well and maybe 
ping the vmwgfx maintainers.

With that in place I'm perfectly ok with going ahead with that approach.

Christian.

>
> Robin.
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 205da3ff9cd0..f81e96a4242f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -813,7 +813,7 @@ static int amdgpu_ttm_tt_pin_userptr(struct 
>>> ttm_tt *ttm)
>>>       r = -ENOMEM;
>>>       nents = dma_map_sg(adev->dev, ttm->sg->sgl, ttm->sg->nents, 
>>> direction);
>>> -    if (nents != ttm->sg->nents)
>>> +    if (nents == 0)
>>>           goto release_sg;
>>>       drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
>>

end of thread, newest: 2018-04-12 19:08 UTC

Thread overview: 13+ messages
2018-04-11 17:11 [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
2018-04-11 17:11 ` [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing Robin Murphy
2018-04-11 18:28   ` Christian König
2018-04-12 17:53     ` Robin Murphy
     [not found]       ` <24948fc1-ff01-1276-e3fa-c2b0e5713b5b-5wv7dgnIgG8@public.gmane.org>
2018-04-12 19:08         ` Christian König
2018-04-11 17:45 ` [1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
     [not found] ` <0901c8c3b9adbcb851ba58dfca6b16d12ccbcb0f.1523465719.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-04-11 18:26   ` [PATCH 1/2] " Christian König
     [not found]     ` <67b1875d-9f77-5fb8-bfc6-53d34c15ab16-5C7GfCeVMHo@public.gmane.org>
2018-04-12  9:11       ` Lucas Stach
     [not found]         ` <1523524317.4981.24.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2018-04-12  9:35           ` Christian König
     [not found]             ` <58ef1aab-6c3a-8e69-0a7e-98218ba9fe96-5C7GfCeVMHo@public.gmane.org>
2018-04-12  9:49               ` Lucas Stach
     [not found]                 ` <1523526540.4981.26.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2018-04-12 10:33                   ` Christian König
     [not found]                     ` <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
2018-04-12 11:48                       ` okaya-sgV2jX0FEOL9JmXXK+q4OQ
2018-04-12 13:18                     ` Robin Murphy
