* [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
@ 2018-03-27 23:06 Shakeel Butt
  2018-03-28  0:16 ` David Rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Shakeel Butt @ 2018-03-27 23:06 UTC (permalink / raw)
  To: Andrey Ryabinin, Vladimir Davydov, Alexander Potapenko,
	Greg Thelen, Dmitry Vyukov, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton
  Cc: linux-mm, linux-kernel, Shakeel Butt

The kasan quarantine is designed to delay freeing slab objects to catch
use-after-free. The quarantine can be large (several percent of machine
memory size). When kmem_caches are deleted, related objects are flushed
from the quarantine, but this requires scanning the entire quarantine,
which can be very slow. We have seen the kernel busily working on this
while holding slab_mutex and badly affecting cache_reaper, slabinfo
readers and memcg kmem cache creations.
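
Conceptually, flushing a dying cache means walking every object queued
in the quarantine just to find the (usually zero) objects that belong
to it. A rough sketch, using hypothetical names rather than the real
mm/kasan/quarantine.c helpers:

	/*
	 * Illustration only (hypothetical types and helpers): the cost is
	 * proportional to the total quarantine size, not to the number of
	 * objects the dying cache actually has queued.
	 */
	static void quarantine_flush_cache_sketch(struct list_head *quarantine,
						  struct kmem_cache *cache)
	{
		struct quarantined_object *obj, *next;	/* hypothetical */

		list_for_each_entry_safe(obj, next, quarantine, list)
			if (obj->cache == cache)	/* never true for an empty cache */
				free_quarantined_object(obj);	/* hypothetical */
	}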

It can be easily reproduced by the following script:

	yes . | head -1000000 | xargs stat > /dev/null
	for i in `seq 1 10`; do
		seq 500 | (cd /cg/memory && xargs mkdir)
		seq 500 | xargs -I{} sh -c 'echo $BASHPID > \
			/cg/memory/{}/tasks && exec stat .' > /dev/null
		seq 500 | (cd /cg/memory && xargs rmdir)
	done

The busy stack:
    kasan_cache_shutdown
    shutdown_cache
    memcg_destroy_kmem_caches
    mem_cgroup_css_free
    css_free_rwork_fn
    process_one_work
    worker_thread
    kthread
    ret_from_fork

This patch is based on the observation that if the kmem_cache to be
destroyed is empty then there should not be any objects of this cache in
the quarantine.

Without the patch, the script got stuck for a couple of hours. With the
patch, the script completed within a second.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/kasan/kasan.c |  3 ++-
 mm/slab.c        | 12 ++++++++++++
 mm/slab.h        |  1 +
 mm/slub.c        | 11 +++++++++++
 4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 49fffb0ca83b..135ce2838c89 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -382,7 +382,8 @@ void kasan_cache_shrink(struct kmem_cache *cache)
 
 void kasan_cache_shutdown(struct kmem_cache *cache)
 {
-	quarantine_remove_cache(cache);
+	if (!__kmem_cache_empty(cache))
+		quarantine_remove_cache(cache);
 }
 
 size_t kasan_metadata_size(struct kmem_cache *cache)
diff --git a/mm/slab.c b/mm/slab.c
index 9212c64bb705..b59f2cdf28d1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2291,6 +2291,18 @@ static int drain_freelist(struct kmem_cache *cache,
 	return nr_freed;
 }
 
+bool __kmem_cache_empty(struct kmem_cache *s)
+{
+	int node;
+	struct kmem_cache_node *n;
+
+	for_each_kmem_cache_node(s, node, n)
+		if (!list_empty(&n->slabs_full) ||
+		    !list_empty(&n->slabs_partial))
+			return false;
+	return true;
+}
+
 int __kmem_cache_shrink(struct kmem_cache *cachep)
 {
 	int ret = 0;
diff --git a/mm/slab.h b/mm/slab.h
index e8981e811c45..68bdf498da3b 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -166,6 +166,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 			      SLAB_TEMPORARY | \
 			      SLAB_ACCOUNT)
 
+bool __kmem_cache_empty(struct kmem_cache *);
 int __kmem_cache_shutdown(struct kmem_cache *);
 void __kmem_cache_release(struct kmem_cache *);
 int __kmem_cache_shrink(struct kmem_cache *);
diff --git a/mm/slub.c b/mm/slub.c
index 1edc8d97c862..44aa7847324a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3707,6 +3707,17 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
 		discard_slab(s, page);
 }
 
+bool __kmem_cache_empty(struct kmem_cache *s)
+{
+	int node;
+	struct kmem_cache_node *n;
+
+	for_each_kmem_cache_node(s, node, n)
+		if (n->nr_partial || slabs_node(s, node))
+			return false;
+	return true;
+}
+
 /*
  * Release all resources used by a slab cache.
  */
-- 
2.17.0.rc1.321.gba9d0f2565-goog

* Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
  2018-03-27 23:06 [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown() Shakeel Butt
@ 2018-03-28  0:16 ` David Rientjes
  2018-03-28  5:41   ` Shakeel Butt
  2018-03-28 17:03 ` Andrey Ryabinin
  2018-03-28 22:00 ` Christopher Lameter
  2 siblings, 1 reply; 6+ messages in thread
From: David Rientjes @ 2018-03-28  0:16 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrey Ryabinin, Vladimir Davydov, Alexander Potapenko,
	Greg Thelen, Dmitry Vyukov, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel

On Tue, 27 Mar 2018, Shakeel Butt wrote:

> diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
> index 49fffb0ca83b..135ce2838c89 100644
> --- a/mm/kasan/kasan.c
> +++ b/mm/kasan/kasan.c
> @@ -382,7 +382,8 @@ void kasan_cache_shrink(struct kmem_cache *cache)
>  
>  void kasan_cache_shutdown(struct kmem_cache *cache)
>  {
> -	quarantine_remove_cache(cache);
> +	if (!__kmem_cache_empty(cache))
> +		quarantine_remove_cache(cache);
>  }
>  
>  size_t kasan_metadata_size(struct kmem_cache *cache)
> diff --git a/mm/slab.c b/mm/slab.c
> index 9212c64bb705..b59f2cdf28d1 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2291,6 +2291,18 @@ static int drain_freelist(struct kmem_cache *cache,
>  	return nr_freed;
>  }
>  
> +bool __kmem_cache_empty(struct kmem_cache *s)
> +{
> +	int node;
> +	struct kmem_cache_node *n;
> +
> +	for_each_kmem_cache_node(s, node, n)
> +		if (!list_empty(&n->slabs_full) ||
> +		    !list_empty(&n->slabs_partial))
> +			return false;
> +	return true;
> +}
> +
>  int __kmem_cache_shrink(struct kmem_cache *cachep)
>  {
>  	int ret = 0;
> diff --git a/mm/slab.h b/mm/slab.h
> index e8981e811c45..68bdf498da3b 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -166,6 +166,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>  			      SLAB_TEMPORARY | \
>  			      SLAB_ACCOUNT)
>  
> +bool __kmem_cache_empty(struct kmem_cache *);
>  int __kmem_cache_shutdown(struct kmem_cache *);
>  void __kmem_cache_release(struct kmem_cache *);
>  int __kmem_cache_shrink(struct kmem_cache *);
> diff --git a/mm/slub.c b/mm/slub.c
> index 1edc8d97c862..44aa7847324a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3707,6 +3707,17 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
>  		discard_slab(s, page);
>  }
>  
> +bool __kmem_cache_empty(struct kmem_cache *s)
> +{
> +	int node;
> +	struct kmem_cache_node *n;
> +
> +	for_each_kmem_cache_node(s, node, n)
> +		if (n->nr_partial || slabs_node(s, node))
> +			return false;
> +	return true;
> +}
> +
>  /*
>   * Release all resources used by a slab cache.
>   */

Any reason not to just make quarantine_remove_cache() part of 
__kmem_cache_shutdown() instead of duplicating its logic?

* Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
  2018-03-28  0:16 ` David Rientjes
@ 2018-03-28  5:41   ` Shakeel Butt
  2018-03-28 21:29     ` David Rientjes
  0 siblings, 1 reply; 6+ messages in thread
From: Shakeel Butt @ 2018-03-28  5:41 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrey Ryabinin, Vladimir Davydov, Alexander Potapenko,
	Greg Thelen, Dmitry Vyukov, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Linux MM, LKML

On Tue, Mar 27, 2018 at 5:16 PM, David Rientjes <rientjes@google.com> wrote:
> On Tue, 27 Mar 2018, Shakeel Butt wrote:
>
>> diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
>> index 49fffb0ca83b..135ce2838c89 100644
>> --- a/mm/kasan/kasan.c
>> +++ b/mm/kasan/kasan.c
>> @@ -382,7 +382,8 @@ void kasan_cache_shrink(struct kmem_cache *cache)
>>
>>  void kasan_cache_shutdown(struct kmem_cache *cache)
>>  {
>> -     quarantine_remove_cache(cache);
>> +     if (!__kmem_cache_empty(cache))
>> +             quarantine_remove_cache(cache);
>>  }
>>
>>  size_t kasan_metadata_size(struct kmem_cache *cache)
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 9212c64bb705..b59f2cdf28d1 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -2291,6 +2291,18 @@ static int drain_freelist(struct kmem_cache *cache,
>>       return nr_freed;
>>  }
>>
>> +bool __kmem_cache_empty(struct kmem_cache *s)
>> +{
>> +     int node;
>> +     struct kmem_cache_node *n;
>> +
>> +     for_each_kmem_cache_node(s, node, n)
>> +             if (!list_empty(&n->slabs_full) ||
>> +                 !list_empty(&n->slabs_partial))
>> +                     return false;
>> +     return true;
>> +}
>> +
>>  int __kmem_cache_shrink(struct kmem_cache *cachep)
>>  {
>>       int ret = 0;
>> diff --git a/mm/slab.h b/mm/slab.h
>> index e8981e811c45..68bdf498da3b 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -166,6 +166,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
>>                             SLAB_TEMPORARY | \
>>                             SLAB_ACCOUNT)
>>
>> +bool __kmem_cache_empty(struct kmem_cache *);
>>  int __kmem_cache_shutdown(struct kmem_cache *);
>>  void __kmem_cache_release(struct kmem_cache *);
>>  int __kmem_cache_shrink(struct kmem_cache *);
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1edc8d97c862..44aa7847324a 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3707,6 +3707,17 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
>>               discard_slab(s, page);
>>  }
>>
>> +bool __kmem_cache_empty(struct kmem_cache *s)
>> +{
>> +     int node;
>> +     struct kmem_cache_node *n;
>> +
>> +     for_each_kmem_cache_node(s, node, n)
>> +             if (n->nr_partial || slabs_node(s, node))
>> +                     return false;
>> +     return true;
>> +}
>> +
>>  /*
>>   * Release all resources used by a slab cache.
>>   */
>
> Any reason not to just make quarantine_remove_cache() part of
> __kmem_cache_shutdown() instead of duplicating its logic?

Can you please explain what you mean by making
quarantine_remove_cache() part of __kmem_cache_shutdown()? Do you mean
calling quarantine_remove_cache() inside __kmem_cache_shutdown()? The
__kmem_cache_shutdown() of both SLAB & SLUB does per-cpu
draining/flushing, and we want to free the quarantined objects before
that. So, I am not sure how to incorporate quarantine_remove_cache()
into __kmem_cache_shutdown() without duplicating the for-loop &
if-condition.
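
For reference, the ordering is roughly the following (a simplified
sketch of shutdown_cache() from mm/slab_common.c, not verbatim):

	/* Rough sketch only -- error handling and unlinking elided. */
	static int shutdown_cache(struct kmem_cache *s)
	{
		/* 1. free the KASAN-quarantined objects of this cache */
		kasan_cache_shutdown(s);

		/* 2. SLAB/SLUB drain per-cpu state and free remaining slabs */
		if (__kmem_cache_shutdown(s) != 0)
			return -EBUSY;

		/* 3. unlink the cache and schedule its release (elided) */
		return 0;
	}

So the quarantined objects need to be freed at step 1, before the
per-cpu draining in step 2.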

* Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
  2018-03-27 23:06 [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown() Shakeel Butt
  2018-03-28  0:16 ` David Rientjes
@ 2018-03-28 17:03 ` Andrey Ryabinin
  2018-03-28 22:00 ` Christopher Lameter
  2 siblings, 0 replies; 6+ messages in thread
From: Andrey Ryabinin @ 2018-03-28 17:03 UTC (permalink / raw)
  To: Shakeel Butt, Vladimir Davydov, Alexander Potapenko, Greg Thelen,
	Dmitry Vyukov, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton
  Cc: linux-mm, linux-kernel



On 03/28/2018 02:06 AM, Shakeel Butt wrote:
> The kasan quarantine is designed to delay freeing slab objects to catch
> use-after-free. The quarantine can be large (several percent of machine
> memory size). When kmem_caches are deleted, related objects are flushed
> from the quarantine, but this requires scanning the entire quarantine,
> which can be very slow. We have seen the kernel busily working on this
> while holding slab_mutex and badly affecting cache_reaper, slabinfo
> readers and memcg kmem cache creations.
> 
> It can be easily reproduced by the following script:
> 
> 	yes . | head -1000000 | xargs stat > /dev/null
> 	for i in `seq 1 10`; do
> 		seq 500 | (cd /cg/memory && xargs mkdir)
> 		seq 500 | xargs -I{} sh -c 'echo $BASHPID > \
> 			/cg/memory/{}/tasks && exec stat .' > /dev/null
> 		seq 500 | (cd /cg/memory && xargs rmdir)
> 	done
> 
> The busy stack:
>     kasan_cache_shutdown
>     shutdown_cache
>     memcg_destroy_kmem_caches
>     mem_cgroup_css_free
>     css_free_rwork_fn
>     process_one_work
>     worker_thread
>     kthread
>     ret_from_fork
> 
> This patch is based on the observation that if the kmem_cache to be
> destroyed is empty then there should not be any objects of this cache in
> the quarantine.
> 
> Without the patch, the script got stuck for a couple of hours. With the
> patch, the script completed within a second.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> 

Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

* Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
  2018-03-28  5:41   ` Shakeel Butt
@ 2018-03-28 21:29     ` David Rientjes
  0 siblings, 0 replies; 6+ messages in thread
From: David Rientjes @ 2018-03-28 21:29 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrey Ryabinin, Vladimir Davydov, Alexander Potapenko,
	Greg Thelen, Dmitry Vyukov, Christoph Lameter, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Linux MM, LKML

On Tue, 27 Mar 2018, Shakeel Butt wrote:

> On Tue, Mar 27, 2018 at 5:16 PM, David Rientjes <rientjes@google.com> wrote:
> > On Tue, 27 Mar 2018, Shakeel Butt wrote:
> >
> >> diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
> >> index 49fffb0ca83b..135ce2838c89 100644
> >> --- a/mm/kasan/kasan.c
> >> +++ b/mm/kasan/kasan.c
> >> @@ -382,7 +382,8 @@ void kasan_cache_shrink(struct kmem_cache *cache)
> >>
> >>  void kasan_cache_shutdown(struct kmem_cache *cache)
> >>  {
> >> -     quarantine_remove_cache(cache);
> >> +     if (!__kmem_cache_empty(cache))
> >> +             quarantine_remove_cache(cache);
> >>  }
> >>
> >>  size_t kasan_metadata_size(struct kmem_cache *cache)
> >> diff --git a/mm/slab.c b/mm/slab.c
> >> index 9212c64bb705..b59f2cdf28d1 100644
> >> --- a/mm/slab.c
> >> +++ b/mm/slab.c
> >> @@ -2291,6 +2291,18 @@ static int drain_freelist(struct kmem_cache *cache,
> >>       return nr_freed;
> >>  }
> >>
> >> +bool __kmem_cache_empty(struct kmem_cache *s)
> >> +{
> >> +     int node;
> >> +     struct kmem_cache_node *n;
> >> +
> >> +     for_each_kmem_cache_node(s, node, n)
> >> +             if (!list_empty(&n->slabs_full) ||
> >> +                 !list_empty(&n->slabs_partial))
> >> +                     return false;
> >> +     return true;
> >> +}
> >> +
> >>  int __kmem_cache_shrink(struct kmem_cache *cachep)
> >>  {
> >>       int ret = 0;
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index e8981e811c45..68bdf498da3b 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -166,6 +166,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
> >>                             SLAB_TEMPORARY | \
> >>                             SLAB_ACCOUNT)
> >>
> >> +bool __kmem_cache_empty(struct kmem_cache *);
> >>  int __kmem_cache_shutdown(struct kmem_cache *);
> >>  void __kmem_cache_release(struct kmem_cache *);
> >>  int __kmem_cache_shrink(struct kmem_cache *);
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 1edc8d97c862..44aa7847324a 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -3707,6 +3707,17 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
> >>               discard_slab(s, page);
> >>  }
> >>
> >> +bool __kmem_cache_empty(struct kmem_cache *s)
> >> +{
> >> +     int node;
> >> +     struct kmem_cache_node *n;
> >> +
> >> +     for_each_kmem_cache_node(s, node, n)
> >> +             if (n->nr_partial || slabs_node(s, node))
> >> +                     return false;
> >> +     return true;
> >> +}
> >> +
> >>  /*
> >>   * Release all resources used by a slab cache.
> >>   */
> >
> > Any reason not to just make quarantine_remove_cache() part of
> > __kmem_cache_shutdown() instead of duplicating its logic?
> 
> Can you please explain what you mean by making
> quarantine_remove_cache() part of __kmem_cache_shutdown()? Do you mean
> calling quarantine_remove_cache() inside __kmem_cache_shutdown()? The
> __kmem_cache_shutdown() of both SLAB & SLUB does per-cpu
> draining/flushing and we want the free the quarantined objects before
> that. So, I am not sure how to incorporate  quarantine_remove_cache()
> into __kmem_cache_shutdown() without duplicating the for-loop &
> if-condition.
> 

__kmem_cache_empty() is largely a copy and paste of 
__kmem_cache_shutdown() logic; is there no way to simplify this? I was 
thinking of generalizing the iteration (for_each_kmem_cache_node_nonempty?) 
to eliminate the need for __kmem_cache_empty().

kasan_cache_shutdown() would do

	for_each_kmem_cache_node_nonempty(cache, node, n) {
		quarantine_remove_cache(cache);
		break;
	}

and __kmem_cache_shutdown() would use it for the iteration over 
kmem_cache_node's to drain.
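
For concreteness, the SLUB flavor of such a helper might look something
like this (purely a sketch layered on the existing
for_each_kmem_cache_node(); SLAB would need a condition based on the
slabs_full/slabs_partial lists instead):

	/* Hypothetical macro -- sketch only, SLUB condition shown. */
	#define for_each_kmem_cache_node_nonempty(__s, __node, __n)	\
		for_each_kmem_cache_node(__s, __node, __n)		\
			if ((__n)->nr_partial || slabs_node(__s, __node))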

* Re: [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown()
  2018-03-27 23:06 [PATCH] slab, slub: skip unnecessary kasan_cache_shutdown() Shakeel Butt
  2018-03-28  0:16 ` David Rientjes
  2018-03-28 17:03 ` Andrey Ryabinin
@ 2018-03-28 22:00 ` Christopher Lameter
  2 siblings, 0 replies; 6+ messages in thread
From: Christopher Lameter @ 2018-03-28 22:00 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrey Ryabinin, Vladimir Davydov, Alexander Potapenko,
	Greg Thelen, Dmitry Vyukov, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, linux-mm, linux-kernel

On Tue, 27 Mar 2018, Shakeel Butt wrote:

> The kasan quarantine is designed to delay freeing slab objects to catch
> use-after-free. The quarantine can be large (several percent of machine
> memory size). When kmem_caches are deleted, related objects are flushed
> from the quarantine, but this requires scanning the entire quarantine,
> which can be very slow. We have seen the kernel busily working on this
> while holding slab_mutex and badly affecting cache_reaper, slabinfo
> readers and memcg kmem cache creations.

Acked-by: Christoph Lameter <cl@linux.com>
