linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fixes to drm-next - TTM DMA code (v1)
@ 2011-12-12 20:09 Konrad Rzeszutek Wilk
  2011-12-12 20:09 ` [PATCH 1/3] drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-12 20:09 UTC (permalink / raw)
  To: linux-kernel, j.glisse, dri-devel; +Cc: bskeggs, thellstrom

Jerome pointed me to some accounting error in the DMA API debugging code and
while I can't figure it out yet, I did notice some extreme slowness - which
is due to the nouveau driver calling the unpopulate (now that unbind +
unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
did not have those issues but I can't recall).

Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
that I noticed.




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool.
  2011-12-12 20:09 [PATCH] fixes to drm-next - TTM DMA code (v1) Konrad Rzeszutek Wilk
@ 2011-12-12 20:09 ` Konrad Rzeszutek Wilk
  2011-12-12 20:09 ` [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page Konrad Rzeszutek Wilk
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-12 20:09 UTC (permalink / raw)
  To: linux-kernel, j.glisse, dri-devel
  Cc: bskeggs, thellstrom, Konrad Rzeszutek Wilk

Otherwise we are doing redundant work. Especially since the 'unbind'
and 'unpopulate' have been merged and nouveau driver ends up calling
it quite excessivly. On a GeForce 8600 GT with Gnome Shell (GNOME 3)
we end up spending about 54% CPU time in __change_page_attr_set_clr
checking the page flags when I move the mouse cursor all over the screen.

The callgraph (annotated) looks as so before this patch:

    53.29%  gnome-shell  [kernel.kallsyms]                   [k] static_protections
            |
            --- static_protections
               |
               |--91.80%-- __change_page_attr_set_clr
               |          change_page_attr_set_clr
               |          set_pages_array_wb
               |          |
               |          |--96.55%-- ttm_dma_unpopulate
               |          |          nouveau_ttm_tt_unpopulate
               |          |          ttm_tt_destroy
               |          |          ttm_bo_cleanup_memtype_use
               |          |          ttm_bo_release
               |          |          kref_put
               |          |          ttm_bo_unref
               |          |          nouveau_gem_object_del
               |          |          drm_gem_object_free
               |          |          kref_put
               |          |          drm_gem_object_unreference_unlocked
               |          |          drm_gem_object_handle_unreference_unlocked.part.1
               |          |          drm_gem_handle_delete
               |          |          drm_gem_close_ioctl
               |          |          drm_ioctl
               |          |          do_vfs_ioctl
               |          |          sys_ioctl
               |          |          system_call_fastpath
               |          |          __GI___ioctl
               |          |
               |           --3.45%-- ttm_dma_pages_put
               |                     ttm_dma_page_pool_free
               |                     ttm_dma_unpopulate
               |                     nouveau_ttm_tt_unpopulate
               |                     ttm_tt_destroy
               |                     ttm_bo_cleanup_memtype_use
               |                     ttm_bo_release
               |                     kref_put
               |                     ttm_bo_unref
               |                     nouveau_gem_object_del
               |                     drm_gem_object_free
               |                     kref_put
               |                     drm_gem_object_unreference_unlocked
               |                     drm_gem_object_handle_unreference_unlocked.part.1
               |                     drm_gem_handle_delete
               |                     drm_gem_close_ioctl
               |                     drm_ioctl
               |                     do_vfs_ioctl
               |                     sys_ioctl
               |                     system_call_fastpath
               |                     __GI___ioctl
               |
                --8.20%-- change_page_attr_set_clr
                          set_pages_array_wb
                          |
                          |--93.76%-- ttm_dma_unpopulate
                          |          nouveau_ttm_tt_unpopulate
                          |          ttm_tt_destroy
                          |          ttm_bo_cleanup_memtype_use
                          |          ttm_bo_release
                          |          kref_put
                          |          ttm_bo_unref
                          |          nouveau_gem_object_del
                          |          drm_gem_object_free
                          |          kref_put
                          |          drm_gem_object_unreference_unlocked
                          |          drm_gem_object_handle_unreference_unlocked.part.1
                          |          drm_gem_handle_delete
                          |          drm_gem_close_ioctl
                          |          drm_ioctl
                          |          do_vfs_ioctl
                          |          sys_ioctl
                          |          system_call_fastpath
                          |          __GI___ioctl
                          |
                           --6.24%-- ttm_dma_pages_put
                                     ttm_dma_page_pool_free
                                     ttm_dma_unpopulate
                                     nouveau_ttm_tt_unpopulate
                                     ttm_tt_destroy
                                     ttm_bo_cleanup_memtype_use
                                     ttm_bo_release
                                     kref_put
                                     ttm_bo_unref
                                     nouveau_gem_object_del
                                     drm_gem_object_free
                                     kref_put
                                     drm_gem_object_unreference_unlocked
                                     drm_gem_object_handle_unreference_unlocked.part.1
                                     drm_gem_handle_delete
                                     drm_gem_close_ioctl
                                     drm_ioctl
                                     do_vfs_ioctl
                                     sys_ioctl
                                     system_call_fastpath
                                     __GI___ioctl

and after this patch all of that disappears.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index 6678abc..6c06d0b 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -384,7 +384,9 @@ static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages,
 {
 	struct dma_page *d_page, *tmp;
 
-	if (npages && set_pages_array_wb(pages, npages))
+	/* Don't set WB on WB page pool. */
+	if (npages && !(pool->type & IS_CACHED) &&
+	    set_pages_array_wb(pages, npages))
 		pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n",
 			pool->dev_name, npages);
 
@@ -396,7 +398,8 @@ static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages,
 
 static void ttm_dma_page_put(struct dma_pool *pool, struct dma_page *d_page)
 {
-	if (set_pages_array_wb(&d_page->p, 1))
+	/* Don't set WB on WB page pool. */
+	if (!(pool->type & IS_CACHED) && set_pages_array_wb(&d_page->p, 1))
 		pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n",
 			pool->dev_name, 1);
 
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page.
  2011-12-12 20:09 [PATCH] fixes to drm-next - TTM DMA code (v1) Konrad Rzeszutek Wilk
  2011-12-12 20:09 ` [PATCH 1/3] drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool Konrad Rzeszutek Wilk
@ 2011-12-12 20:09 ` Konrad Rzeszutek Wilk
  2011-12-21 15:17   ` Jerome Glisse
  2011-12-12 20:09 ` [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool Konrad Rzeszutek Wilk
  2011-12-13 16:07 ` [PATCH] fixes to drm-next - TTM DMA code (v1) Jerome Glisse
  3 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-12 20:09 UTC (permalink / raw)
  To: linux-kernel, j.glisse, dri-devel
  Cc: bskeggs, thellstrom, Konrad Rzeszutek Wilk

The code to figure out how many pages to shrink the pool
ends up capping the 'count' at _manager->options.max_size - which is OK.
Except that the 'count' is also used when accounting for how many pages
are recycled - which we end up with the invalid values. This fixes
it by using a different value for the amount of pages to shrink.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index 6c06d0b..e57aa24 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	struct dma_page *d_page, *next;
 	enum pool_type type;
 	bool is_cached = false;
-	unsigned count = 0, i;
+	unsigned count = 0, i, npages;
 	unsigned long irq_flags;
 
 	type = ttm_to_type(ttm->page_flags, ttm->caching_state);
@@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	pool->npages_in_use -= count;
 	if (is_cached) {
 		pool->nfrees += count;
+		npages = count;
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
+		npages = count;
 		if (pool->npages_free > _manager->options.max_size) {
-			count = pool->npages_free - _manager->options.max_size;
+			npages = pool->npages_free - _manager->options.max_size;
 		}
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	}
 
 	/* shrink pool if necessary */
-	if (count)
-		ttm_dma_page_pool_free(pool, count);
+	if (npages)
+		ttm_dma_page_pool_free(pool, npages);
 	ttm->state = tt_unpopulated;
 }
 EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool.
  2011-12-12 20:09 [PATCH] fixes to drm-next - TTM DMA code (v1) Konrad Rzeszutek Wilk
  2011-12-12 20:09 ` [PATCH 1/3] drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool Konrad Rzeszutek Wilk
  2011-12-12 20:09 ` [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page Konrad Rzeszutek Wilk
@ 2011-12-12 20:09 ` Konrad Rzeszutek Wilk
  2011-12-21 15:22   ` Jerome Glisse
  2011-12-13 16:07 ` [PATCH] fixes to drm-next - TTM DMA code (v1) Jerome Glisse
  3 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-12 20:09 UTC (permalink / raw)
  To: linux-kernel, j.glisse, dri-devel
  Cc: bskeggs, thellstrom, Konrad Rzeszutek Wilk

We would free the page pool - even if it was for cached pages
(which are deleted from the pool - so there are no free pages).

Also we also missed the opportunity to batch the amount of pages
to free (similar to how ttm_page_alloc.c does it). This reintroduces
the code that was lost during rebasing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index e57aa24..156ddcd 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	struct dma_page *d_page, *next;
 	enum pool_type type;
 	bool is_cached = false;
-	unsigned count = 0, i, npages;
+	unsigned count = 0, i, npages = 0;
 	unsigned long irq_flags;
 
 	type = ttm_to_type(ttm->page_flags, ttm->caching_state);
@@ -971,13 +971,16 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 	pool->npages_in_use -= count;
 	if (is_cached) {
 		pool->nfrees += count;
-		npages = count;
 	} else {
 		pool->npages_free += count;
 		list_splice(&ttm_dma->pages_list, &pool->free_list);
 		npages = count;
 		if (pool->npages_free > _manager->options.max_size) {
 			npages = pool->npages_free - _manager->options.max_size;
+			/* free at least NUM_PAGES_TO_ALLOC number of pages
+			 * to reduce calls to set_memory_wb */
+			if (npages < NUM_PAGES_TO_ALLOC)
+				npages = NUM_PAGES_TO_ALLOC;
 		}
 	}
 	spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -1001,7 +1004,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
 		ttm_dma->dma_address[i] = 0;
 	}
 
-	/* shrink pool if necessary */
+	/* shrink pool if necessary (only on !is_cached pools)*/
 	if (npages)
 		ttm_dma_page_pool_free(pool, npages);
 	ttm->state = tt_unpopulated;
-- 
1.7.7.3


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-12 20:09 [PATCH] fixes to drm-next - TTM DMA code (v1) Konrad Rzeszutek Wilk
                   ` (2 preceding siblings ...)
  2011-12-12 20:09 ` [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool Konrad Rzeszutek Wilk
@ 2011-12-13 16:07 ` Jerome Glisse
  2011-12-13 16:23   ` Thomas Hellstrom
  3 siblings, 1 reply; 14+ messages in thread
From: Jerome Glisse @ 2011-12-13 16:07 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, dri-devel, bskeggs, thellstrom

On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
> Jerome pointed me to some accounting error in the DMA API debugging code and
> while I can't figure it out yet, I did notice some extreme slowness - which
> is due to the nouveau driver calling the unpopulate (now that unbind +
> unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
> did not have those issues but I can't recall).
> 
> Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
> that I noticed.
> 

Gonna review those today and test them.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-13 16:07 ` [PATCH] fixes to drm-next - TTM DMA code (v1) Jerome Glisse
@ 2011-12-13 16:23   ` Thomas Hellstrom
  2011-12-13 16:40     ` Konrad Rzeszutek Wilk
  2011-12-13 22:09     ` James Simmons
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Hellstrom @ 2011-12-13 16:23 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Konrad Rzeszutek Wilk, linux-kernel, dri-devel, bskeggs

On 12/13/2011 05:07 PM, Jerome Glisse wrote:
> On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
>    
>> Jerome pointed me to some accounting error in the DMA API debugging code and
>> while I can't figure it out yet, I did notice some extreme slowness - which
>> is due to the nouveau driver calling the unpopulate (now that unbind +
>> unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
>> did not have those issues but I can't recall).
>>
>> Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
>> that I noticed.
>>
>>      
> Gonna review those today and test them.
>
> Cheers,
> Jerome
>    
Hi!

I'm not whether any drivers are still using the AGP backend?
Calling unpopulate / (previous clear) each time unbind is done should be 
quite
inefficient with that one, as AGP sets up its own data structures and 
copies page tables
on each populate. That should really be avoided unless there is a good 
reason to have it.

/Thomas



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-13 16:23   ` Thomas Hellstrom
@ 2011-12-13 16:40     ` Konrad Rzeszutek Wilk
  2011-12-19 19:51       ` Thomas Hellstrom
  2012-01-05 19:37       ` Jerome Glisse
  2011-12-13 22:09     ` James Simmons
  1 sibling, 2 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-13 16:40 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: Jerome Glisse, linux-kernel, dri-devel, bskeggs

On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
> On 12/13/2011 05:07 PM, Jerome Glisse wrote:
> >On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
> >>Jerome pointed me to some accounting error in the DMA API debugging code and
> >>while I can't figure it out yet, I did notice some extreme slowness - which
> >>is due to the nouveau driver calling the unpopulate (now that unbind +
> >>unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
> >>did not have those issues but I can't recall).
> >>
> >>Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
> >>that I noticed.
> >>
> >Gonna review those today and test them.
> >
> >Cheers,
> >Jerome
> Hi!
> 
> I'm not whether any drivers are still using the AGP backend?

Uh, probably they do if the cards are AGP?
The problem I encountered was with an PCIe Nvidia card:

01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1

> Calling unpopulate / (previous clear) each time unbind is done
> should be quite
> inefficient with that one, as AGP sets up its own data structures
> and copies page tables
> on each populate. That should really be avoided unless there is a
> good reason to have it.

nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that
were causing the unpopulate calls. It did happen _a lot_ when I moved the
cursor madly.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-13 16:23   ` Thomas Hellstrom
  2011-12-13 16:40     ` Konrad Rzeszutek Wilk
@ 2011-12-13 22:09     ` James Simmons
  1 sibling, 0 replies; 14+ messages in thread
From: James Simmons @ 2011-12-13 22:09 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: Jerome Glisse, bskeggs, linux-kernel, dri-devel


> Hi!
> 
> I'm not whether any drivers are still using the AGP backend?

Actually the openchrome projects KMS kernel uses a AGP backend.


> Calling unpopulate / (previous clear) each time unbind is done should be quite
> inefficient with that one, as AGP sets up its own data structures and copies
> page tables
> on each populate. That should really be avoided unless there is a good reason
> to have it.
> 
> /Thomas
> 
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-13 16:40     ` Konrad Rzeszutek Wilk
@ 2011-12-19 19:51       ` Thomas Hellstrom
  2011-12-20 17:35         ` Konrad Rzeszutek Wilk
  2012-01-05 19:37       ` Jerome Glisse
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Hellstrom @ 2011-12-19 19:51 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jerome Glisse
  Cc: Thomas Hellstrom, linux-kernel, dri-devel, bskeggs

On 12/13/2011 05:40 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
>    
>> On 12/13/2011 05:07 PM, Jerome Glisse wrote:
>>      
>>> On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
>>>        
>>>> Jerome pointed me to some accounting error in the DMA API debugging code and
>>>> while I can't figure it out yet, I did notice some extreme slowness - which
>>>> is due to the nouveau driver calling the unpopulate (now that unbind +
>>>> unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
>>>> did not have those issues but I can't recall).
>>>>
>>>> Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
>>>> that I noticed.
>>>>
>>>>          
>>> Gonna review those today and test them.
>>>
>>> Cheers,
>>> Jerome
>>>        
>> Hi!
>>
>> I'm not whether any drivers are still using the AGP backend?
>>      
> Uh, probably they do if the cards are AGP?
> The problem I encountered was with an PCIe Nvidia card:
>
> 01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
>
>    
>> Calling unpopulate / (previous clear) each time unbind is done
>> should be quite
>> inefficient with that one, as AGP sets up its own data structures
>> and copies page tables
>> on each populate. That should really be avoided unless there is a
>> good reason to have it.
>>      
> nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that
> were causing the unpopulate calls. It did happen _a lot_ when I moved the
> cursor madly.
>    

Konrad, Jerome
Was there a resolution to this. If the ttm_tt rewrite results in 
unpopulate being called more often than before,
and that causes performance regressions, that must be fixed as soon as 
possible.

/Thomas



> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>    


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-19 19:51       ` Thomas Hellstrom
@ 2011-12-20 17:35         ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-20 17:35 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: Jerome Glisse, linux-kernel, dri-devel, bskeggs

On Mon, Dec 19, 2011 at 08:51:15PM +0100, Thomas Hellstrom wrote:
> On 12/13/2011 05:40 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
> >>On 12/13/2011 05:07 PM, Jerome Glisse wrote:
> >>>On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
> >>>>Jerome pointed me to some accounting error in the DMA API debugging code and
> >>>>while I can't figure it out yet, I did notice some extreme slowness - which
> >>>>is due to the nouveau driver calling the unpopulate (now that unbind +
> >>>>unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
> >>>>did not have those issues but I can't recall).
> >>>>
> >>>>Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
> >>>>that I noticed.
> >>>>
> >>>Gonna review those today and test them.
> >>>
> >>>Cheers,
> >>>Jerome
> >>Hi!
> >>
> >>I'm not whether any drivers are still using the AGP backend?
> >Uh, probably they do if the cards are AGP?
> >The problem I encountered was with an PCIe Nvidia card:
> >
> >01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
> >
> >>Calling unpopulate / (previous clear) each time unbind is done
> >>should be quite
> >>inefficient with that one, as AGP sets up its own data structures
> >>and copies page tables
> >>on each populate. That should really be avoided unless there is a
> >>good reason to have it.
> >nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that
> >were causing the unpopulate calls. It did happen _a lot_ when I moved the
> >cursor madly.
> 
> Konrad, Jerome
> Was there a resolution to this. If the ttm_tt rewrite results in
> unpopulate being called more often than before,
> and that causes performance regressions, that must be fixed as soon
> as possible.

I am waiting for Jerome to look over my patches. Next week I was
going to setup a test-bed with an Radeon and Nvidia AGP card
and bootup two kernels - one without the ttm_bind/ttm_populate
squashing and one with it.

Then run some benchmark code.. Thought not sure what kind of benchmark
code I should run? Any thoughts? The regression I observed were just by
moving the mouse around but that is not very "scientfic".

openareana? alien? etuxracer? Any of them offer some timed demo mode
to get an idea of performance impact?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page.
  2011-12-12 20:09 ` [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page Konrad Rzeszutek Wilk
@ 2011-12-21 15:17   ` Jerome Glisse
  2011-12-21 15:30     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Jerome Glisse @ 2011-12-21 15:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, dri-devel, bskeggs, thellstrom

On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> The code to figure out how many pages to shrink the pool
> ends up capping the 'count' at _manager->options.max_size - which is OK.
> Except that the 'count' is also used when accounting for how many pages
> are recycled - which we end up with the invalid values. This fixes
> it by using a different value for the amount of pages to shrink.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |   10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index 6c06d0b..e57aa24 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>        struct dma_page *d_page, *next;
>        enum pool_type type;
>        bool is_cached = false;
> -       unsigned count = 0, i;
> +       unsigned count = 0, i, npages;
>        unsigned long irq_flags;
>
>        type = ttm_to_type(ttm->page_flags, ttm->caching_state);
> @@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>        pool->npages_in_use -= count;
>        if (is_cached) {
>                pool->nfrees += count;
> +               npages = count;

This is wrong, i thought we didn't wanted to shrink cached page, your
3/3 patch remove this line. Thought i am kind of wondering why we
don't shrink the cached pool, i can't remember the reason.

>        } else {
>                pool->npages_free += count;
>                list_splice(&ttm_dma->pages_list, &pool->free_list);
> +               npages = count;
>                if (pool->npages_free > _manager->options.max_size) {
> -                       count = pool->npages_free - _manager->options.max_size;
> +                       npages = pool->npages_free - _manager->options.max_size;
>                }
>        }
>        spin_unlock_irqrestore(&pool->lock, irq_flags);
> @@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>        }
>
>        /* shrink pool if necessary */
> -       if (count)
> -               ttm_dma_page_pool_free(pool, count);
> +       if (npages)
> +               ttm_dma_page_pool_free(pool, npages);
>        ttm->state = tt_unpopulated;
>  }
>  EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
> --
> 1.7.7.3
>

Otherwise Reviewed-by:Jerome Glisse <jglisse@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool.
  2011-12-12 20:09 ` [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool Konrad Rzeszutek Wilk
@ 2011-12-21 15:22   ` Jerome Glisse
  0 siblings, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2011-12-21 15:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: linux-kernel, dri-devel, bskeggs, thellstrom

On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> We would free the page pool - even if it was for cached pages
> (which are deleted from the pool - so there are no free pages).
>
> Also we also missed the opportunity to batch the amount of pages
> to free (similar to how ttm_page_alloc.c does it). This reintroduces
> the code that was lost during rebasing.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |    9 ++++++---
>  1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index e57aa24..156ddcd 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>        struct dma_page *d_page, *next;
>        enum pool_type type;
>        bool is_cached = false;
> -       unsigned count = 0, i, npages;
> +       unsigned count = 0, i, npages = 0;
>        unsigned long irq_flags;
>
>        type = ttm_to_type(ttm->page_flags, ttm->caching_state);
> @@ -971,13 +971,16 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>        pool->npages_in_use -= count;
>        if (is_cached) {
>                pool->nfrees += count;
> -               npages = count;

See patch 2/3 comment otherwise Reviewed-by: Jerome Glisse <jglisse@redhat.com>

>        } else {
>                pool->npages_free += count;
>                list_splice(&ttm_dma->pages_list, &pool->free_list);
>                npages = count;
>                if (pool->npages_free > _manager->options.max_size) {
>                        npages = pool->npages_free - _manager->options.max_size;
> +                       /* free at least NUM_PAGES_TO_ALLOC number of pages
> +                        * to reduce calls to set_memory_wb */
> +                       if (npages < NUM_PAGES_TO_ALLOC)
> +                               npages = NUM_PAGES_TO_ALLOC;
>                }
>        }
>        spin_unlock_irqrestore(&pool->lock, irq_flags);
> @@ -1001,7 +1004,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
>                ttm_dma->dma_address[i] = 0;
>        }
>
> -       /* shrink pool if necessary */
> +       /* shrink pool if necessary (only on !is_cached pools)*/
>        if (npages)
>                ttm_dma_page_pool_free(pool, npages);
>        ttm->state = tt_unpopulated;
> --
> 1.7.7.3
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page.
  2011-12-21 15:17   ` Jerome Glisse
@ 2011-12-21 15:30     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-21 15:30 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: linux-kernel, dri-devel, bskeggs, thellstrom

On Wed, Dec 21, 2011 at 10:17:36AM -0500, Jerome Glisse wrote:
> On Mon, Dec 12, 2011 at 3:09 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > The code to figure out how many pages to shrink the pool
> > ends up capping the 'count' at _manager->options.max_size - which is OK.
> > Except that the 'count' is also used when accounting for how many pages
> > are recycled - which we end up with the invalid values. This fixes
> > it by using a different value for the amount of pages to shrink.
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > ---
> >  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |   10 ++++++----
> >  1 files changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> > index 6c06d0b..e57aa24 100644
> > --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> > @@ -949,7 +949,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
> >        struct dma_page *d_page, *next;
> >        enum pool_type type;
> >        bool is_cached = false;
> > -       unsigned count = 0, i;
> > +       unsigned count = 0, i, npages;
> >        unsigned long irq_flags;
> >
> >        type = ttm_to_type(ttm->page_flags, ttm->caching_state);
> > @@ -971,11 +971,13 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
> >        pool->npages_in_use -= count;
> >        if (is_cached) {
> >                pool->nfrees += count;
> > +               npages = count;
> 
> This is wrong, i thought we didn't wanted to shrink cached page, your
> 3/3 patch remove this line. Thought i am kind of wondering why we
> don't shrink the cached pool, i can't remember the reason.

We delete the cached ones:

 988         if (is_cached) {
 989                 list_for_each_entry_safe(d_page, next, &ttm_dma->pages_list, page_list) {
 990                         ttm_mem_global_free_page(ttm->glob->mem_glob,
 991                                                  d_page->p);
 992                         ttm_dma_page_put(pool, d_page);
 993                 }

And then we end up calling in ttm_dma_page_pool_free for (count) pages
that we had already deleted. So it ends up being a bit redundant.

The 3/3 patch fixes that (by removing the call to ttm_dma_page_pool_free
for is_cached. This patch (2/3) tries to preserve the logic if it is either
is_cached or !is_cached.

> 
> >        } else {
> >                pool->npages_free += count;
> >                list_splice(&ttm_dma->pages_list, &pool->free_list);
> > +               npages = count;
> >                if (pool->npages_free > _manager->options.max_size) {
> > -                       count = pool->npages_free - _manager->options.max_size;
> > +                       npages = pool->npages_free - _manager->options.max_size;
> >                }
> >        }
> >        spin_unlock_irqrestore(&pool->lock, irq_flags);
> > @@ -1000,8 +1002,8 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev)
> >        }
> >
> >        /* shrink pool if necessary */
> > -       if (count)
> > -               ttm_dma_page_pool_free(pool, count);
> > +       if (npages)
> > +               ttm_dma_page_pool_free(pool, npages);
> >        ttm->state = tt_unpopulated;
> >  }
> >  EXPORT_SYMBOL_GPL(ttm_dma_unpopulate);
> > --
> > 1.7.7.3
> >
> 
> Otherwise Reviewed-by:Jerome Glisse <jglisse@redhat.com>

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fixes to drm-next - TTM DMA code (v1)
  2011-12-13 16:40     ` Konrad Rzeszutek Wilk
  2011-12-19 19:51       ` Thomas Hellstrom
@ 2012-01-05 19:37       ` Jerome Glisse
  1 sibling, 0 replies; 14+ messages in thread
From: Jerome Glisse @ 2012-01-05 19:37 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Thomas Hellstrom, linux-kernel, dri-devel, bskeggs

On Tue, Dec 13, 2011 at 11:40:31AM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 13, 2011 at 05:23:30PM +0100, Thomas Hellstrom wrote:
> > On 12/13/2011 05:07 PM, Jerome Glisse wrote:
> > >On Mon, Dec 12, 2011 at 03:09:26PM -0500, Konrad Rzeszutek Wilk wrote:
> > >>Jerome pointed me to some accounting error in the DMA API debugging code and
> > >>while I can't figure it out yet, I did notice some extreme slowness - which
> > >>is due to the nouveau driver calling the unpopulate (now that unbind +
> > >>unpopulate are squashed) quite a lot (this is using Gnome Shell - I think GNOME2
> > >>did not have those issues but I can't recall).
> > >>
> > >>Anyhow these patches fix the 50% perf regression I saw and also some minor bugs
> > >>that I noticed.
> > >>
> > >Gonna review those today and test them.
> > >
> > >Cheers,
> > >Jerome
> > Hi!
> > 
> > I'm not whether any drivers are still using the AGP backend?
> 
> Uh, probably they do if the cards are AGP?
> The problem I encountered was with an PCIe Nvidia card:
> 
> 01:00.0 VGA compatible controller: nVidia Corporation G84 [GeForce 8600 GT] (rev a1
> 
> > Calling unpopulate / (previous clear) each time unbind is done
> > should be quite
> > inefficient with that one, as AGP sets up its own data structures
> > and copies page tables
> > on each populate. That should really be avoided unless there is a
> > good reason to have it.
> 
> nouveau_bo_rd32 and nv50_crtc_cursor_set showed up as the callers that
> were causing the unpopulate calls. It did happen _a lot_ when I moved the
> cursor madly.

Looked at that and i am pretty sure that those function can't trigger
unpopulate event (but i might have missed a path). I tested before and
after the ttm change and those function are as time consuming as before.

I think nouveau is doing something stupid in userspace when it comes to
cursor (might be the ddx or just gnome3 or X server haven't tracked that
down).

Anyway, bottom line is i believe we don't call unpopulate more than
before the ttm changes. It seems to be the same from code inspection,
previously each time we called clear we also freed the pages so clear
is equivalent to unpopulate. In theory we don't call unpopulate on
each unbind but given the current user we do (only path were unbind
is not followed by unpopulate is ttm_bo_move_ttm same as before the
patch so no change there).

So, Konrad patch to avoid calling set_pages_array_wb when the page is
cached is a proper fix, this is what happened before, and this is what
the ttm change broke. I believe there is no performance regression with
this ttm change on radeon or nouveau (tested only with gnome3+nexuiz
on nv50 and r600,r700,evergreen).

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-01-05 19:33 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-12 20:09 [PATCH] fixes to drm-next - TTM DMA code (v1) Konrad Rzeszutek Wilk
2011-12-12 20:09 ` [PATCH 1/3] drm/ttm/dma: Only call set_pages_array_wb when the page is not in WB pool Konrad Rzeszutek Wilk
2011-12-12 20:09 ` [PATCH 2/3] drm/ttm/dma: Fix accounting error when calling ttm_mem_global_free_page Konrad Rzeszutek Wilk
2011-12-21 15:17   ` Jerome Glisse
2011-12-21 15:30     ` Konrad Rzeszutek Wilk
2011-12-12 20:09 ` [PATCH 3/3] drm/ttm/dma: Optimize when free-ing the recycled page pool Konrad Rzeszutek Wilk
2011-12-21 15:22   ` Jerome Glisse
2011-12-13 16:07 ` [PATCH] fixes to drm-next - TTM DMA code (v1) Jerome Glisse
2011-12-13 16:23   ` Thomas Hellstrom
2011-12-13 16:40     ` Konrad Rzeszutek Wilk
2011-12-19 19:51       ` Thomas Hellstrom
2011-12-20 17:35         ` Konrad Rzeszutek Wilk
2012-01-05 19:37       ` Jerome Glisse
2011-12-13 22:09     ` James Simmons

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).