* Adding sync_shrinkers function for TTM pool optimization
@ 2021-08-20 12:05 Christian König
  2021-08-20 12:05 ` [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2 Christian König
  2021-08-20 12:05 ` [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2 Christian König
  0 siblings, 2 replies; 14+ messages in thread
From: Christian König @ 2021-08-20 12:05 UTC (permalink / raw)
  To: akpm; +Cc: dri-devel, daniel

Hi Andrew,

Daniel suggested that I ping you once more about this.

Basically we want to add a barrier function to make sure that our TTM pool shrinker is not freeing up pages from a device while the device is being unplugged.

Currently we use a global mutex to serialize all of this, but it causes contention when unmapping the freed pages in the IOMMU.
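
For illustration, the whole barrier is just one write-lock/unlock cycle on the shrinker rwsem. A condensed sketch of what patch 1 adds:

void sync_shrinkers(void)
{
	/* Shrinkers run with the read side of shrinker_rwsem held, so
	 * taking the write side waits for every shrinker already in
	 * flight; releasing it immediately lets new invocations proceed.
	 */
	down_write(&shrinker_rwsem);
	up_write(&shrinker_rwsem);
}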

We just need your Acked-by, and I hope the explanation is now clearer than last time.

Cheers,
Christian.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-20 12:05 Adding sync_shrinkers function for TTM pool optimization Christian König
@ 2021-08-20 12:05 ` Christian König
  2021-08-22 21:31   ` Andrew Morton
  2021-08-26 13:27   ` Daniel Vetter
  2021-08-20 12:05 ` [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2 Christian König
  1 sibling, 2 replies; 14+ messages in thread
From: Christian König @ 2021-08-20 12:05 UTC (permalink / raw)
  To: akpm; +Cc: dri-devel, daniel

From: Christian König <ckoenig.leichtzumerken@gmail.com>

While unplugging a device, the TTM shrinker implementation
needs a barrier to make sure that all concurrent shrink
operations are done and no other CPU is referring to a
device-specific pool any more.

Taking and releasing the shrinker semaphore on the write
side after unmapping and freeing all pages from the device
pool should make sure that no shrinker is running in
parallel.

This allows us to avoid the contended mutex in the TTM pool
implementation for every alloc/free operation.

v2: rework the commit message to make clear why we need this

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/linux/shrinker.h |  1 +
 mm/vmscan.c              | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 9814fff58a69..1de17f53cdbc 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
 extern int register_shrinker(struct shrinker *shrinker);
 extern void unregister_shrinker(struct shrinker *shrinker);
 extern void free_prealloced_shrinker(struct shrinker *shrinker);
+extern void sync_shrinkers(void);
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4620df62f0ff..fde1aabcfa7f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
 }
 EXPORT_SYMBOL(unregister_shrinker);
 
+/**
+ * sync_shrinkers - Wait for all running shrinkers to complete.
+ */
+void sync_shrinkers(void)
+{
+	down_write(&shrinker_rwsem);
+	up_write(&shrinker_rwsem);
+}
+EXPORT_SYMBOL(sync_shrinkers);
+
 #define SHRINK_BATCH 128
 
 static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
-- 
2.25.1
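
For context, the only intended caller so far is the TTM pool teardown in patch 2. Condensed, the call site looks roughly like this (a sketch abbreviating that patch, not additional code in this one):

void ttm_pool_fini(struct ttm_pool *pool)
{
	/* ... unlink every pool type from the global shrinker list and
	 * unmap/free all pages still held in them ...
	 */

	/* The pool types are off the list, but a shrinker that took a
	 * page before the unlink may still be freeing it. Wait for that
	 * to finish before the caller frees the pool and device.
	 */
	sync_shrinkers();
}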


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-08-20 12:05 Adding sync_shrinkers function for TTM pool optimization Christian König
  2021-08-20 12:05 ` [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2 Christian König
@ 2021-08-20 12:05 ` Christian König
  1 sibling, 0 replies; 14+ messages in thread
From: Christian König @ 2021-08-20 12:05 UTC (permalink / raw)
  To: akpm; +Cc: dri-devel, daniel

Switch back to using a spinlock by moving the IOMMU unmap outside
of the locked region.

This avoids contention, especially while freeing pages.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 40 +++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..7d4f76d4141d 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	spin_lock_init(&pt->lock);
 	INIT_LIST_HEAD(&pt->pages);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_del(&pt->shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	while ((p = ttm_pool_type_take(pt)))
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,23 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
 	struct ttm_pool_type *pt;
-	unsigned int num_freed;
+	unsigned int num_pages;
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+	list_move_tail(&pt->shrinker_list, &shrinker_list);
+	spin_unlock(&shrinker_lock);
 
 	p = ttm_pool_type_take(pt);
 	if (p) {
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-		num_freed = 1 << pt->order;
+		num_pages = 1 << pt->order;
 	} else {
-		num_freed = 0;
+		num_pages = 0;
 	}
 
-	list_move_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
-
-	return num_freed;
+	return num_pages;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +529,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
 			for (j = 0; j < MAX_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
+
+	/* We removed the pool types from the LRU, but we need to also make sure
+	 * that no shrinker is concurrently freeing pages from the pool.
+	 */
+	sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +608,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 {
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	seq_puts(m, "wc\t:");
 	ttm_pool_debugfs_orders(global_write_combined, m);
 	seq_puts(m, "uc\t:");
@@ -613,7 +617,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 	ttm_pool_debugfs_orders(global_dma32_write_combined, m);
 	seq_puts(m, "uc 32\t:");
 	ttm_pool_debugfs_orders(global_dma32_uncached, m);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 
@@ -640,7 +644,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
 		seq_puts(m, "DMA ");
 		switch (i) {
@@ -656,7 +660,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 		}
 		ttm_pool_debugfs_orders(pool->caching[i].orders, m);
 	}
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 	return 0;
@@ -693,7 +697,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
-	mutex_init(&shrinker_lock);
+	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
 	for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1
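
For readability, this is ttm_pool_shrink() as it looks with the hunk above applied (reconstructed from the diff, so treat it as a sketch), with the reasoning spelled out as comments:

static unsigned int ttm_pool_shrink(void)
{
	struct ttm_pool_type *pt;
	unsigned int num_pages;
	struct page *p;

	/* The critical section now covers only the list rotation: pick
	 * the least recently shrunk pool type and move it to the tail.
	 */
	spin_lock(&shrinker_lock);
	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
	list_move_tail(&pt->shrinker_list, &shrinker_list);
	spin_unlock(&shrinker_lock);

	/* The expensive part (taking a page and unmapping/freeing it,
	 * which may block in the IOMMU) runs without the lock. pt cannot
	 * disappear underneath us because ttm_pool_fini() calls
	 * sync_shrinkers() before the pool is torn down.
	 */
	p = ttm_pool_type_take(pt);
	if (p) {
		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
		num_pages = 1 << pt->order;
	} else {
		num_pages = 0;
	}

	return num_pages;
}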


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-20 12:05 ` [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2 Christian König
@ 2021-08-22 21:31   ` Andrew Morton
  2021-08-26 13:27   ` Daniel Vetter
  1 sibling, 0 replies; 14+ messages in thread
From: Andrew Morton @ 2021-08-22 21:31 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel, daniel

On Fri, 20 Aug 2021 14:05:27 +0200 "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> While unplugging a device, the TTM shrinker implementation
> needs a barrier to make sure that all concurrent shrink
> operations are done and no other CPU is referring to a
> device-specific pool any more.
> 
> Taking and releasing the shrinker semaphore on the write
> side after unmapping and freeing all pages from the device
> pool should make sure that no shrinker is running in
> parallel.
> 
> This allows us to avoid the contended mutex in the TTM pool
> implementation for every alloc/free operation.
> 
> v2: rework the commit message to make clear why we need this

Acked-by: Andrew Morton <akpm@linux-foundation.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-20 12:05 ` [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2 Christian König
  2021-08-22 21:31   ` Andrew Morton
@ 2021-08-26 13:27   ` Daniel Vetter
  2021-08-26 13:28     ` Daniel Vetter
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2021-08-26 13:27 UTC (permalink / raw)
  To: Christian König; +Cc: akpm, dri-devel, daniel

On Fri, Aug 20, 2021 at 02:05:27PM +0200, Christian König wrote:
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> 
> While unplugging a device, the TTM shrinker implementation
> needs a barrier to make sure that all concurrent shrink
> operations are done and no other CPU is referring to a
> device-specific pool any more.
> 
> Taking and releasing the shrinker semaphore on the write
> side after unmapping and freeing all pages from the device
> pool should make sure that no shrinker is running in
> parallel.
> 
> This allows us to avoid the contended mutex in the TTM pool
> implementation for every alloc/free operation.
> 
> v2: rework the commit message to make clear why we need this
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> Acked-by: Huang Rui <ray.huang@amd.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  include/linux/shrinker.h |  1 +
>  mm/vmscan.c              | 10 ++++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> index 9814fff58a69..1de17f53cdbc 100644
> --- a/include/linux/shrinker.h
> +++ b/include/linux/shrinker.h
> @@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
>  extern int register_shrinker(struct shrinker *shrinker);
>  extern void unregister_shrinker(struct shrinker *shrinker);
>  extern void free_prealloced_shrinker(struct shrinker *shrinker);
> +extern void sync_shrinkers(void);
>  #endif
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4620df62f0ff..fde1aabcfa7f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
>  }
>  EXPORT_SYMBOL(unregister_shrinker);
>  
> +/**
> + * sync_shrinkers - Wait for all running shrinkers to complete.

I think it would be good to add a bit more text here maybe:

"This is equivalent to calling unregister_shrink() and
register_shrinker(), but atomically and with less overhead. This is useful
to guarantee that all shrinker invocations have seen an update, before
freeing memory, similar to rcu."

Also a bit a bikeshed, but if we look at the equivalent in irq land it's
called synchronize_irq() and synchronize_hardirq(). I think it'd be good
to bikeshed that for more conceptual consistency.
-Daniel

> + */
> +void sync_shrinkers(void)
> +{
> +	down_write(&shrinker_rwsem);
> +	up_write(&shrinker_rwsem);
> +}
> +EXPORT_SYMBOL(sync_shrinkers);
> +
>  #define SHRINK_BATCH 128
>  
>  static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-26 13:27   ` Daniel Vetter
@ 2021-08-26 13:28     ` Daniel Vetter
  2021-08-26 14:58       ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2021-08-26 13:28 UTC (permalink / raw)
  To: Christian König; +Cc: akpm, dri-devel, daniel

On Thu, Aug 26, 2021 at 03:27:30PM +0200, Daniel Vetter wrote:
> On Fri, Aug 20, 2021 at 02:05:27PM +0200, Christian König wrote:
> > From: Christian König <ckoenig.leichtzumerken@gmail.com>
> > 
> > While unplugging a device, the TTM shrinker implementation
> > needs a barrier to make sure that all concurrent shrink
> > operations are done and no other CPU is referring to a
> > device-specific pool any more.
> > 
> > Taking and releasing the shrinker semaphore on the write
> > side after unmapping and freeing all pages from the device
> > pool should make sure that no shrinker is running in
> > parallel.
> > 
> > This allows us to avoid the contended mutex in the TTM pool
> > implementation for every alloc/free operation.
> > 
> > v2: rework the commit message to make clear why we need this
> > 
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > Acked-by: Huang Rui <ray.huang@amd.com>
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  include/linux/shrinker.h |  1 +
> >  mm/vmscan.c              | 10 ++++++++++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> > index 9814fff58a69..1de17f53cdbc 100644
> > --- a/include/linux/shrinker.h
> > +++ b/include/linux/shrinker.h
> > @@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
> >  extern int register_shrinker(struct shrinker *shrinker);
> >  extern void unregister_shrinker(struct shrinker *shrinker);
> >  extern void free_prealloced_shrinker(struct shrinker *shrinker);
> > +extern void sync_shrinkers(void);
> >  #endif
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 4620df62f0ff..fde1aabcfa7f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
> >  }
> >  EXPORT_SYMBOL(unregister_shrinker);
> >  
> > +/**
> > + * sync_shrinkers - Wait for all running shrinkers to complete.
> 
> I think it would be good to add a bit more text here maybe:
> 
> "This is equivalent to calling unregister_shrink() and
> register_shrinker(), but atomically and with less overhead. This is useful
> to guarantee that all shrinker invocations have seen an update, before
> freeing memory, similar to rcu."
> 
> Also a bit a bikeshed, but if we look at the equivalent in irq land it's
> called synchronize_irq() and synchronize_hardirq(). I think it'd be good
> to bikeshed that for more conceptual consistency.

Oh also synchronize_*rcu* also spells them all out, so even more reasons
to do the same.
-Daniel

> > + */
> > +void sync_shrinkers(void)
> > +{
> > +	down_write(&shrinker_rwsem);
> > +	up_write(&shrinker_rwsem);
> > +}
> > +EXPORT_SYMBOL(sync_shrinkers);
> > +
> >  #define SHRINK_BATCH 128
> >  
> >  static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> > -- 
> > 2.25.1
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-26 13:28     ` Daniel Vetter
@ 2021-08-26 14:58       ` Christian König
  2021-08-26 15:34         ` Daniel Vetter
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2021-08-26 14:58 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: akpm, dri-devel

Am 26.08.21 um 15:28 schrieb Daniel Vetter:
> On Thu, Aug 26, 2021 at 03:27:30PM +0200, Daniel Vetter wrote:
>> On Fri, Aug 20, 2021 at 02:05:27PM +0200, Christian König wrote:
>>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>>>
>>> While unplugging a device, the TTM shrinker implementation
>>> needs a barrier to make sure that all concurrent shrink
>>> operations are done and no other CPU is referring to a
>>> device-specific pool any more.
>>>
>>> Taking and releasing the shrinker semaphore on the write
>>> side after unmapping and freeing all pages from the device
>>> pool should make sure that no shrinker is running in
>>> parallel.
>>>
>>> This allows us to avoid the contended mutex in the TTM pool
>>> implementation for every alloc/free operation.
>>>
>>> v2: rework the commit message to make clear why we need this
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>> Acked-by: Huang Rui <ray.huang@amd.com>
>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> ---
>>>   include/linux/shrinker.h |  1 +
>>>   mm/vmscan.c              | 10 ++++++++++
>>>   2 files changed, 11 insertions(+)
>>>
>>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>>> index 9814fff58a69..1de17f53cdbc 100644
>>> --- a/include/linux/shrinker.h
>>> +++ b/include/linux/shrinker.h
>>> @@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
>>>   extern int register_shrinker(struct shrinker *shrinker);
>>>   extern void unregister_shrinker(struct shrinker *shrinker);
>>>   extern void free_prealloced_shrinker(struct shrinker *shrinker);
>>> +extern void sync_shrinkers(void);
>>>   #endif
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index 4620df62f0ff..fde1aabcfa7f 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
>>>   }
>>>   EXPORT_SYMBOL(unregister_shrinker);
>>>   
>>> +/**
>>> + * sync_shrinkers - Wait for all running shrinkers to complete.
>> I think it would be good to add a bit more text here maybe:
>>
>> "This is equivalent to calling unregister_shrink() and
>> register_shrinker(), but atomically and with less overhead. This is useful
>> to guarantee that all shrinker invocations have seen an update, before
>> freeing memory, similar to rcu."
>>
>> Also a bit a bikeshed, but if we look at the equivalent in irq land it's
>> called synchronize_irq() and synchronize_hardirq(). I think it'd be good
>> to bikeshed that for more conceptual consistency.
> Oh also synchronize_*rcu* also spells them all out, so even more reasons
> to do the same.

I will just go with the explanation above.

The synchronize_rcu() explanation is so extensive that most people will 
probably stop reading after the first paragraph.

Thanks,
Christian.

> -Daniel
>
>>> + */
>>> +void sync_shrinkers(void)
>>> +{
>>> +	down_write(&shrinker_rwsem);
>>> +	up_write(&shrinker_rwsem);
>>> +}
>>> +EXPORT_SYMBOL(sync_shrinkers);
>>> +
>>>   #define SHRINK_BATCH 128
>>>   
>>>   static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>> -- 
>>> 2.25.1
>>>
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-26 14:58       ` Christian König
@ 2021-08-26 15:34         ` Daniel Vetter
  2021-08-27  6:09           ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vetter @ 2021-08-26 15:34 UTC (permalink / raw)
  To: Christian König; +Cc: Daniel Vetter, akpm, dri-devel

On Thu, Aug 26, 2021 at 04:58:06PM +0200, Christian König wrote:
> Am 26.08.21 um 15:28 schrieb Daniel Vetter:
> > On Thu, Aug 26, 2021 at 03:27:30PM +0200, Daniel Vetter wrote:
> > > On Fri, Aug 20, 2021 at 02:05:27PM +0200, Christian König wrote:
> > > > From: Christian König <ckoenig.leichtzumerken@gmail.com>
> > > > 
> > > > While unplugging a device, the TTM shrinker implementation
> > > > needs a barrier to make sure that all concurrent shrink
> > > > operations are done and no other CPU is referring to a
> > > > device-specific pool any more.
> > > > 
> > > > Taking and releasing the shrinker semaphore on the write
> > > > side after unmapping and freeing all pages from the device
> > > > pool should make sure that no shrinker is running in
> > > > parallel.
> > > > 
> > > > This allows us to avoid the contended mutex in the TTM pool
> > > > implementation for every alloc/free operation.
> > > > 
> > > > v2: rework the commit message to make clear why we need this
> > > > 
> > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > Acked-by: Huang Rui <ray.huang@amd.com>
> > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > ---
> > > >   include/linux/shrinker.h |  1 +
> > > >   mm/vmscan.c              | 10 ++++++++++
> > > >   2 files changed, 11 insertions(+)
> > > > 
> > > > diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
> > > > index 9814fff58a69..1de17f53cdbc 100644
> > > > --- a/include/linux/shrinker.h
> > > > +++ b/include/linux/shrinker.h
> > > > @@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
> > > >   extern int register_shrinker(struct shrinker *shrinker);
> > > >   extern void unregister_shrinker(struct shrinker *shrinker);
> > > >   extern void free_prealloced_shrinker(struct shrinker *shrinker);
> > > > +extern void sync_shrinkers(void);
> > > >   #endif
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 4620df62f0ff..fde1aabcfa7f 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
> > > >   }
> > > >   EXPORT_SYMBOL(unregister_shrinker);
> > > > +/**
> > > > + * sync_shrinkers - Wait for all running shrinkers to complete.
> > > I think it would be good to add a bit more text here maybe:
> > > 
> > > "This is equivalent to calling unregister_shrink() and
> > > register_shrinker(), but atomically and with less overhead. This is useful
> > > to guarantee that all shrinker invocations have seen an update, before
> > > freeing memory, similar to rcu."
> > > 
> > > Also a bit a bikeshed, but if we look at the equivalent in irq land it's
> > > called synchronize_irq() and synchronize_hardirq(). I think it'd be good
> > > to bikeshed that for more conceptual consistency.
> > Oh also synchronize_*rcu* also spells them all out, so even more reasons
> > to do the same.
> 
> I will just go with the explanation above.
> 
> The synchronize_rcu() explanation is so extensive that most people will
> probably stop reading after the first paragraph.

Ack, my comment was only about the function name (spelled out instead of
abbreviated), not about pulling the entire kerneldoc in from these.
-Daniel

> 
> Thanks,
> Christian.
> 
> > -Daniel
> > 
> > > > + */
> > > > +void sync_shrinkers(void)
> > > > +{
> > > > +	down_write(&shrinker_rwsem);
> > > > +	up_write(&shrinker_rwsem);
> > > > +}
> > > > +EXPORT_SYMBOL(sync_shrinkers);
> > > > +
> > > >   #define SHRINK_BATCH 128
> > > >   static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
> > > > -- 
> > > > 2.25.1
> > > > 
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] mm/vmscan: add sync_shrinkers function v2
  2021-08-26 15:34         ` Daniel Vetter
@ 2021-08-27  6:09           ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2021-08-27  6:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: akpm, dri-devel



Am 26.08.21 um 17:34 schrieb Daniel Vetter:
> On Thu, Aug 26, 2021 at 04:58:06PM +0200, Christian König wrote:
>> Am 26.08.21 um 15:28 schrieb Daniel Vetter:
>>> On Thu, Aug 26, 2021 at 03:27:30PM +0200, Daniel Vetter wrote:
>>>> On Fri, Aug 20, 2021 at 02:05:27PM +0200, Christian König wrote:
>>>>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>>>>>
>>>>> While unplugging a device, the TTM shrinker implementation
>>>>> needs a barrier to make sure that all concurrent shrink
>>>>> operations are done and no other CPU is referring to a
>>>>> device-specific pool any more.
>>>>>
>>>>> Taking and releasing the shrinker semaphore on the write
>>>>> side after unmapping and freeing all pages from the device
>>>>> pool should make sure that no shrinker is running in
>>>>> parallel.
>>>>>
>>>>> This allows us to avoid the contended mutex in the TTM pool
>>>>> implementation for every alloc/free operation.
>>>>>
>>>>> v2: rework the commit message to make clear why we need this
>>>>>
>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> Acked-by: Huang Rui <ray.huang@amd.com>
>>>>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>> ---
>>>>>    include/linux/shrinker.h |  1 +
>>>>>    mm/vmscan.c              | 10 ++++++++++
>>>>>    2 files changed, 11 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
>>>>> index 9814fff58a69..1de17f53cdbc 100644
>>>>> --- a/include/linux/shrinker.h
>>>>> +++ b/include/linux/shrinker.h
>>>>> @@ -93,4 +93,5 @@ extern void register_shrinker_prepared(struct shrinker *shrinker);
>>>>>    extern int register_shrinker(struct shrinker *shrinker);
>>>>>    extern void unregister_shrinker(struct shrinker *shrinker);
>>>>>    extern void free_prealloced_shrinker(struct shrinker *shrinker);
>>>>> +extern void sync_shrinkers(void);
>>>>>    #endif
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index 4620df62f0ff..fde1aabcfa7f 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -638,6 +638,16 @@ void unregister_shrinker(struct shrinker *shrinker)
>>>>>    }
>>>>>    EXPORT_SYMBOL(unregister_shrinker);
>>>>> +/**
>>>>> + * sync_shrinkers - Wait for all running shrinkers to complete.
>>>> I think it would be good to add a bit more text here maybe:
>>>>
>>>> "This is equivalent to calling unregister_shrink() and
>>>> register_shrinker(), but atomically and with less overhead. This is useful
>>>> to guarantee that all shrinker invocations have seen an update, before
>>>> freeing memory, similar to rcu."
>>>>
>>>> Also a bit a bikeshed, but if we look at the equivalent in irq land it's
>>>> called synchronize_irq() and synchronize_hardirq(). I think it'd be good
>>>> to bikeshed that for more conceptual consistency.
>>> Oh also synchronize_*rcu* also spells them all out, so even more reasons
>>> to do the same.
>> I will just go with the explanation above.
>>
>> The synchronize_rcu() explanation is so extensive that most people will
>> probably stop reading after the first paragraph.
> Ack, my comment was only about the function name (spelled out instead of
> abbreviated), not about pulling the entire kerneldoc in from these.

Ah, good point. Going to change that as well.

Christian.

> -Daniel
>
>> Thanks,
>> Christian.
>>
>>> -Daniel
>>>
>>>>> + */
>>>>> +void sync_shrinkers(void)
>>>>> +{
>>>>> +	down_write(&shrinker_rwsem);
>>>>> +	up_write(&shrinker_rwsem);
>>>>> +}
>>>>> +EXPORT_SYMBOL(sync_shrinkers);
>>>>> +
>>>>>    #define SHRINK_BATCH 128
>>>>>    static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
>>>>> -- 
>>>>> 2.25.1
>>>>>
>>>> -- 
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch
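
Putting the thread's two agreed changes together, Daniel's extra kerneldoc text and a spelled-out name, the function would end up roughly as below. This is a sketch of the outcome; the exact name and wording that got merged may differ:

/**
 * synchronize_shrinkers - Wait for all running shrinkers to complete.
 *
 * This is equivalent to calling unregister_shrinker() and
 * register_shrinker(), but atomically and with less overhead. This is
 * useful to guarantee that all shrinker invocations have seen an update
 * before freeing memory, similar to RCU.
 */
void synchronize_shrinkers(void)
{
	down_write(&shrinker_rwsem);
	up_write(&shrinker_rwsem);
}
EXPORT_SYMBOL(synchronize_shrinkers);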


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-04-16  7:08     ` Christian König
@ 2021-04-26 11:15       ` Christian König
  0 siblings, 0 replies; 14+ messages in thread
From: Christian König @ 2021-04-26 11:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, dri-devel, linux-mm, ray.huang, vbabka

Just a gentle ping?

Are you ok with this explanation, Andrew, or should I look for a
different approach?

Thanks,
Christian.

Am 16.04.21 um 09:08 schrieb Christian König:
> Am 15.04.21 um 22:33 schrieb Andrew Morton:
>> On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" 
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>>               for (j = 0; j < MAX_ORDER; ++j)
>>>                       ttm_pool_type_fini(&pool->caching[i].orders[j]);
>>>       }
>>> +
>>> +    /* We removed the pool types from the LRU, but we need to also make sure
>>> +     * that no shrinker is concurrently freeing pages from the pool.
>>> +     */
>>> +    sync_shrinkers();
>> It isn't immediately clear to me how this works. ttm_pool_fini() has
>> already freed all the pages, hasn't it?  So why would it care if some
>> shrinkers are still playing with the pages?
>
> Yes, ttm_pool_fini() has freed up all pages which were in the pool
> when the function was called.
>
> But the problem is that a shrinker running in parallel may have taken
> a page from the pool and still be in the process of freeing it up.
>
> When I return here, the pool structure and especially the device
> structure are freed while that parallel shrinker is still using them.
>
> I could go for a design where we have one shrinker per device instead,
> but that would put a bit too much pressure on the pool in my opinion.
>
>> Or is it the case that ttm_pool_fini() is assuming that there will be
>> some further action against these pages, which requires that shrinkers
>> no longer be accessing the pages and which further assumes that future
>> shrinker invocations will not be able to look up these pages?
>>
>> IOW, a bit more explanation about the dynamics here would help!
>
> Sorry, I'm not a native speaker of English and sometimes still have a 
> hard time explaining things.
>
> Regards,
> Christian.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-04-15 20:33   ` Andrew Morton
@ 2021-04-16  7:08     ` Christian König
  2021-04-26 11:15       ` Christian König
  0 siblings, 1 reply; 14+ messages in thread
From: Christian König @ 2021-04-16  7:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, dri-devel, linux-mm, ray.huang, vbabka

Am 15.04.21 um 22:33 schrieb Andrew Morton:
> On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:
>
>> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>>   			for (j = 0; j < MAX_ORDER; ++j)
>>   				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>>   	}
>> +
>> +	/* We removed the pool types from the LRU, but we need to also make sure
>> +	 * that no shrinker is concurrently freeing pages from the pool.
>> +	 */
>> +	sync_shrinkers();
> It isn't immediately clear to me how this works.  ttm_pool_fini() has
> already freed all the pages, hasn't it?  So why would it care if some
> shrinkers are still playing with the pages?

Yes, ttm_pool_fini() has freed up all pages which were in the pool
when the function was called.

But the problem is that a shrinker running in parallel may have taken
a page from the pool and still be in the process of freeing it up.

When I return here, the pool structure and especially the device
structure are freed while that parallel shrinker is still using them.

I could go for a design where we have one shrinker per device instead,
but that would put a bit too much pressure on the pool in my opinion.

> Or is it the case that ttm_pool_fini() is assuming that there will be
> some further action against these pages, which requires that shrinkers
> no longer be accessing the pages and which further assumes that future
> shrinker invocations will not be able to look up these pages?
>
> IOW, a bit more explanation about the dynamics here would help!

Sorry, I'm not a native speaker of English and sometimes still have a 
hard time explaining things.

Regards,
Christian.
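
To make the race concrete, a simplified interleaving reconstructed from the explanation above:

/*
 *   CPU A: ttm_pool_fini()              CPU B: pool shrinker
 *   ----------------------              --------------------
 *                                       down_read(&shrinker_rwsem);
 *                                       p = ttm_pool_type_take(pt);
 *   ttm_pool_type_fini(pt);
 *     (unlinks pt, frees what is left)
 *   return; caller frees the pool and
 *   device structures
 *                                       ttm_pool_free_page(pt->pool, ...);
 *                                         use-after-free of pool/device
 *
 * With sync_shrinkers() at the end of ttm_pool_fini(), CPU A blocks in
 * down_write() until CPU B drops the read side, closing the window.
 */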

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-04-15 11:56 ` [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2 Christian König
  2021-04-15 14:33   ` Huang Rui
@ 2021-04-15 20:33   ` Andrew Morton
  2021-04-16  7:08     ` Christian König
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2021-04-15 20:33 UTC (permalink / raw)
  To: Christian König
  Cc: linux-kernel, dri-devel, linux-mm, ray.huang, vbabka

On Thu, 15 Apr 2021 13:56:24 +0200 "Christian König" <ckoenig.leichtzumerken@gmail.com> wrote:

> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>  			for (j = 0; j < MAX_ORDER; ++j)
>  				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>  	}
> +
> +	/* We removed the pool types from the LRU, but we need to also make sure
> +	 * that no shrinker is concurrently freeing pages from the pool.
> +	 */
> +	sync_shrinkers();

It isn't immediately clear to me how this works.  ttm_pool_fini() has
already freed all the pages, hasn't it?  So why would it care if some
shrinkers are still playing with the pages?

Or is it the case that ttm_pool_fini() is assuming that there will be
some further action against these pages, which requires that shrinkers
no longer be accessing the pages and which further assumes that future
shrinker invocations will not be able to look up these pages?

IOW, a bit more explanation about the dynamics here would help!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-04-15 11:56 ` [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2 Christian König
@ 2021-04-15 14:33   ` Huang Rui
  2021-04-15 20:33   ` Andrew Morton
  1 sibling, 0 replies; 14+ messages in thread
From: Huang Rui @ 2021-04-15 14:33 UTC (permalink / raw)
  To: Christian König; +Cc: linux-kernel, dri-devel, linux-mm, akpm, vbabka

On Thu, Apr 15, 2021 at 07:56:24PM +0800, Christian König wrote:
> Switch back to using a spinlock by moving the IOMMU unmap outside
> of the locked region.
> 
> v2: Add a comment explaining why we need sync_shrinkers().
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>

Series looks good to me as well.

Acked-by: Huang Rui <ray.huang@amd.com>

> ---
>  drivers/gpu/drm/ttm/ttm_pool.c | 44 +++++++++++++++++-----------------
>  1 file changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index cb38b1a17b09..955836d569cc 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
>  static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
>  static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
>  
> -static struct mutex shrinker_lock;
> +static spinlock_t shrinker_lock;
>  static struct list_head shrinker_list;
>  static struct shrinker mm_shrinker;
>  
> @@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
>  	spin_lock_init(&pt->lock);
>  	INIT_LIST_HEAD(&pt->pages);
>  
> -	mutex_lock(&shrinker_lock);
> +	spin_lock(&shrinker_lock);
>  	list_add_tail(&pt->shrinker_list, &shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> +	spin_unlock(&shrinker_lock);
>  }
>  
>  /* Remove a pool_type from the global shrinker list and free all pages */
> @@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
>  {
>  	struct page *p;
>  
> -	mutex_lock(&shrinker_lock);
> +	spin_lock(&shrinker_lock);
>  	list_del(&pt->shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> +	spin_unlock(&shrinker_lock);
>  
>  	while ((p = ttm_pool_type_take(pt)))
>  		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
> @@ -313,24 +313,19 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
>  static unsigned int ttm_pool_shrink(void)
>  {
>  	struct ttm_pool_type *pt;
> -	unsigned int num_freed;
>  	struct page *p;
>  
> -	mutex_lock(&shrinker_lock);
> +	spin_lock(&shrinker_lock);
>  	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
> +	list_move_tail(&pt->shrinker_list, &shrinker_list);
> +	spin_unlock(&shrinker_lock);
>  
>  	p = ttm_pool_type_take(pt);
> -	if (p) {
> -		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
> -		num_freed = 1 << pt->order;
> -	} else {
> -		num_freed = 0;
> -	}
> -
> -	list_move_tail(&pt->shrinker_list, &shrinker_list);
> -	mutex_unlock(&shrinker_lock);
> +	if (!p)
> +		return 0;
>  
> -	return num_freed;
> +	ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
> +	return 1 << pt->order;
>  }
>  
>  /* Return the allocation order based for a page */
> @@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
>  			for (j = 0; j < MAX_ORDER; ++j)
>  				ttm_pool_type_fini(&pool->caching[i].orders[j]);
>  	}
> +
> +	/* We removed the pool types from the LRU, but we need to also make sure
> +	 * that no shrinker is concurrently freeing pages from the pool.
> +	 */
> +	sync_shrinkers();
>  }
>  
>  /* As long as pages are available make sure to release at least one */
> @@ -604,7 +604,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
>  {
>  	ttm_pool_debugfs_header(m);
>  
> -	mutex_lock(&shrinker_lock);
> +	spin_lock(&shrinker_lock);
>  	seq_puts(m, "wc\t:");
>  	ttm_pool_debugfs_orders(global_write_combined, m);
>  	seq_puts(m, "uc\t:");
> @@ -613,7 +613,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
>  	ttm_pool_debugfs_orders(global_dma32_write_combined, m);
>  	seq_puts(m, "uc 32\t:");
>  	ttm_pool_debugfs_orders(global_dma32_uncached, m);
> -	mutex_unlock(&shrinker_lock);
> +	spin_unlock(&shrinker_lock);
>  
>  	ttm_pool_debugfs_footer(m);
>  
> @@ -640,7 +640,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>  
>  	ttm_pool_debugfs_header(m);
>  
> -	mutex_lock(&shrinker_lock);
> +	spin_lock(&shrinker_lock);
>  	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
>  		seq_puts(m, "DMA ");
>  		switch (i) {
> @@ -656,7 +656,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
>  		}
>  		ttm_pool_debugfs_orders(pool->caching[i].orders, m);
>  	}
> -	mutex_unlock(&shrinker_lock);
> +	spin_unlock(&shrinker_lock);
>  
>  	ttm_pool_debugfs_footer(m);
>  	return 0;
> @@ -693,7 +693,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>  	if (!page_pool_size)
>  		page_pool_size = num_pages;
>  
> -	mutex_init(&shrinker_lock);
> +	spin_lock_init(&shrinker_lock);
>  	INIT_LIST_HEAD(&shrinker_list);
>  
>  	for (i = 0; i < MAX_ORDER; ++i) {
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/2] drm/ttm: optimize the pool shrinker a bit v2
  2021-04-15 11:56 [PATCH 1/2] mm/vmscan: add sync_shrinkers function Christian König
@ 2021-04-15 11:56 ` Christian König
  2021-04-15 14:33   ` Huang Rui
  2021-04-15 20:33   ` Andrew Morton
  0 siblings, 2 replies; 14+ messages in thread
From: Christian König @ 2021-04-15 11:56 UTC (permalink / raw)
  To: linux-mm, linux-kernel, dri-devel; +Cc: akpm, ray.huang, vbabka

Switch back to using a spinlock by moving the IOMMU unmap outside
of the locked region.

v2: Add a comment explaining why we need sync_shrinkers().

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 44 +++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index cb38b1a17b09..955836d569cc 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -70,7 +70,7 @@ static struct ttm_pool_type global_uncached[MAX_ORDER];
 static struct ttm_pool_type global_dma32_write_combined[MAX_ORDER];
 static struct ttm_pool_type global_dma32_uncached[MAX_ORDER];
 
-static struct mutex shrinker_lock;
+static spinlock_t shrinker_lock;
 static struct list_head shrinker_list;
 static struct shrinker mm_shrinker;
 
@@ -263,9 +263,9 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
 	spin_lock_init(&pt->lock);
 	INIT_LIST_HEAD(&pt->pages);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_add_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 }
 
 /* Remove a pool_type from the global shrinker list and free all pages */
@@ -273,9 +273,9 @@ static void ttm_pool_type_fini(struct ttm_pool_type *pt)
 {
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	list_del(&pt->shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	while ((p = ttm_pool_type_take(pt)))
 		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
@@ -313,24 +313,19 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
 static unsigned int ttm_pool_shrink(void)
 {
 	struct ttm_pool_type *pt;
-	unsigned int num_freed;
 	struct page *p;
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	pt = list_first_entry(&shrinker_list, typeof(*pt), shrinker_list);
+	list_move_tail(&pt->shrinker_list, &shrinker_list);
+	spin_unlock(&shrinker_lock);
 
 	p = ttm_pool_type_take(pt);
-	if (p) {
-		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
-		num_freed = 1 << pt->order;
-	} else {
-		num_freed = 0;
-	}
-
-	list_move_tail(&pt->shrinker_list, &shrinker_list);
-	mutex_unlock(&shrinker_lock);
+	if (!p)
+		return 0;
 
-	return num_freed;
+	ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
+	return 1 << pt->order;
 }
 
 /* Return the allocation order based for a page */
@@ -530,6 +525,11 @@ void ttm_pool_fini(struct ttm_pool *pool)
 			for (j = 0; j < MAX_ORDER; ++j)
 				ttm_pool_type_fini(&pool->caching[i].orders[j]);
 	}
+
+	/* We removed the pool types from the LRU, but we need to also make sure
+	 * that no shrinker is concurrently freeing pages from the pool.
+	 */
+	sync_shrinkers();
 }
 
 /* As long as pages are available make sure to release at least one */
@@ -604,7 +604,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 {
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	seq_puts(m, "wc\t:");
 	ttm_pool_debugfs_orders(global_write_combined, m);
 	seq_puts(m, "uc\t:");
@@ -613,7 +613,7 @@ static int ttm_pool_debugfs_globals_show(struct seq_file *m, void *data)
 	ttm_pool_debugfs_orders(global_dma32_write_combined, m);
 	seq_puts(m, "uc 32\t:");
 	ttm_pool_debugfs_orders(global_dma32_uncached, m);
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 
@@ -640,7 +640,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 
 	ttm_pool_debugfs_header(m);
 
-	mutex_lock(&shrinker_lock);
+	spin_lock(&shrinker_lock);
 	for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
 		seq_puts(m, "DMA ");
 		switch (i) {
@@ -656,7 +656,7 @@ int ttm_pool_debugfs(struct ttm_pool *pool, struct seq_file *m)
 		}
 		ttm_pool_debugfs_orders(pool->caching[i].orders, m);
 	}
-	mutex_unlock(&shrinker_lock);
+	spin_unlock(&shrinker_lock);
 
 	ttm_pool_debugfs_footer(m);
 	return 0;
@@ -693,7 +693,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
 	if (!page_pool_size)
 		page_pool_size = num_pages;
 
-	mutex_init(&shrinker_lock);
+	spin_lock_init(&shrinker_lock);
 	INIT_LIST_HEAD(&shrinker_list);
 
 	for (i = 0; i < MAX_ORDER; ++i) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread
