linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] mm: memcg/slab: Prevent recursive kfree() loop
@ 2021-05-02 18:07 Waiman Long
  2021-05-02 18:07 ` [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab Waiman Long
From: Waiman Long @ 2021-05-02 18:07 UTC
  To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Vlastimil Babka, Roman Gushchin, Shakeel Butt
  Cc: linux-kernel, cgroups, linux-mm, Waiman Long

Since the merging of the new slab memory controller in v5.9, the page
structure of a slab page stores a pointer to an obj_cgroup pointer
array in its memcg_data field. When the slab has no used objects left,
it can be freed in free_slab(), which will call kfree() to free the
obj_cgroup pointer array that was allocated in
memcg_alloc_page_obj_cgroups(). If that obj_cgroup array happens to be
the last used object in its own slab, that slab may then be freed as
well, causing kfree() to be called again.

With the right workload, the slab caches may be set up in a way that
allows this recursive kfree() loop to nest deep enough to cause a
kernel stack overflow and panic the system. In fact, we have a
reproducer that can cause a kernel stack overflow on an s390 system
involving the kmalloc-rcl-256 and kmalloc-rcl-128 slabs, with the
following kfree() loop recursively entered 74 times:

  [  285.520739]  [<000000000ec432fc>] kfree+0x4bc/0x560
  [  285.520740]  [<000000000ec43466>] __free_slab+0xc6/0x228
  [  285.520741]  [<000000000ec41fc2>] __slab_free+0x3c2/0x3e0
  [  285.520742]  [<000000000ec432fc>] kfree+0x4bc/0x560
					:
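
A simplified sketch of the cycle, using the function names from the
trace above and the patch below (bodies elided; an illustration only,
not actual kernel code):

  /*
   * kfree(object)
   *   -> __slab_free()          object was the last used one in its slab
   *     -> __free_slab()
   *       -> memcg_free_page_obj_cgroups()
   *         -> kfree(objcg pointer array)
   *           -> __slab_free()  the array was itself the last object
   *                             in its own slab, so the chain repeats
   */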

One way to prevent this from happening is to defer the freeing of the
obj_cgroup array to a later time, for example by using kfree_rcu(),
even though we don't really need RCU protection in this case.

The size of struct rcu_head is just two pointers, so the allocated
obj_cgroup array (one pointer per object) should not be smaller than
that as long as the slab holds at least two objects. To be safe,
however, additional code is added to make sure that the array is
always big enough to hold a struct rcu_head.

Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/memcontrol.c |  9 ++++++++-
 mm/slab.h       | 11 ++++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c100265dc393..b0695d3aa530 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2866,10 +2866,17 @@ static struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg)
 int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 				 gfp_t gfp, bool new_page)
 {
-	unsigned int objects = objs_per_slab_page(s, page);
+	unsigned int objects;
 	unsigned long memcg_data;
 	void *vec;
 
+	/*
+	 * Since kfree_rcu() is used for freeing, we have to make
+	 * sure that the allocated buffer is big enough for rcu_head.
+	 */
+	objects = max(objs_per_slab_page(s, page),
+		      (int)(sizeof(struct rcu_head)/sizeof(void *)));
+
 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
 			   page_to_nid(page));
 	if (!vec)
diff --git a/mm/slab.h b/mm/slab.h
index 18c1927cd196..6244a00d30ce 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -242,8 +242,17 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 
 static inline void memcg_free_page_obj_cgroups(struct page *page)
 {
-	kfree(page_objcgs(page));
+	struct {
+		struct rcu_head rcu;
+	} *objcgs = (void *)page_objcgs(page);
+
+	/*
+	 * We don't actually need to use rcu to protect objcg pointers.
+	 * kfree_rcu() is used here just to defer the actual freeing to avoid
+	 * a recursive kfree() loop which may lead to kernel stack overflow.
+	 */
 	page->memcg_data = 0;
+	kfree_rcu(objcgs, rcu);
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
-- 
2.18.1



* [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-02 18:07 [PATCH 1/2] mm: memcg/slab: Prevent recursive kfree() loop Waiman Long
@ 2021-05-02 18:07 ` Waiman Long
  2021-05-03 12:22   ` Vlastimil Babka
From: Waiman Long @ 2021-05-02 18:07 UTC
  To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
	Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Vlastimil Babka, Roman Gushchin, Shakeel Butt
  Cc: linux-kernel, cgroups, linux-mm, Waiman Long

The obj_cgroup array (memcg_data) embedded in the page structure is
allocated the first time an accounted memory allocation happens in the
slab. With the right object size, it is possible that the allocated
obj_cgroup array comes from the same slab that requires memory
accounting. If this happens, the slab will never become empty again as
there is at least one object left (the obj_cgroup array) in the slab.

With instrumentation code added to detect this situation, I got 76
hits on the kmalloc-192 slab when booting up a test kernel on a VM.
So this can really happen.

To avoid the creation of these unfreeable slabs, a check is added to
memcg_alloc_page_obj_cgroups() to detect such a case and, when it
happens, double the size of the array to make sure that it comes from
a different kmemcache.

This change, however, does not completely eliminate the presence of
unfreeable slabs, which can still happen if a circular obj_cgroup
array dependency is formed.
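
An illustration of such a cycle (hypothetical cache sizes, for
exposition only):

  kmalloc-256 slab A --- objcg array stored in ---> kmalloc-512 slab B
  kmalloc-512 slab B --- objcg array stored in ---> kmalloc-256 slab A

Each slab keeps one object (the other's objcg array) permanently
allocated, so neither slab can ever become empty or be freed, and the
same-slab check added by this patch cannot catch it.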

Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
Signed-off-by: Waiman Long <longman@redhat.com>
---
 mm/memcontrol.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b0695d3aa530..44852ac048c3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2876,12 +2876,24 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
 	 */
 	objects = max(objs_per_slab_page(s, page),
 		      (int)(sizeof(struct rcu_head)/sizeof(void *)));
-
+retry:
 	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
 			   page_to_nid(page));
 	if (!vec)
 		return -ENOMEM;
 
+	/*
+	 * The allocated vector should not come from the same slab.
+	 * Otherwise, this slab will never become empty. Double the size
+	 * in this case to make sure that the vector comes from a different
+	 * kmemcache.
+	 */
+	if (unlikely(virt_to_head_page(vec) == page)) {
+		kfree(vec);
+		objects *= 2;
+		goto retry;
+	}
+
 	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
 	if (new_page) {
 		/*
-- 
2.18.1



* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-02 18:07 ` [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab Waiman Long
@ 2021-05-03 12:22   ` Vlastimil Babka
  2021-05-03 14:20     ` Waiman Long
From: Vlastimil Babka @ 2021-05-03 12:22 UTC
  To: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Shakeel Butt
  Cc: linux-kernel, cgroups, linux-mm

On 5/2/21 8:07 PM, Waiman Long wrote:
> The obj_cgroup array (memcg_data) embedded in the page structure is
> allocated the first time an accounted memory allocation happens in the
> slab. With the right object size, it is possible that the allocated
> obj_cgroup array comes from the same slab that requires memory
> accounting. If this happens, the slab will never become empty again as
> there is at least one object left (the obj_cgroup array) in the slab.
> 
> With instrumentation code added to detect this situation, I got 76
> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
> So this can really happen.
> 
> To avoid the creation of these unfreeable slabs, a check is added to
> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
> happens, double the size of the array to make sure that it comes from
> a different kmemcache.
> 
> This change, however, does not completely eliminate the presence of
> unfreeable slabs, which can still happen if a circular obj_cgroup
> array dependency is formed.

Hm, this looks like only a half fix then.
I'm afraid the proper fix is for kmemcg to create its own set of caches
for the arrays. It would also solve the recursive kfree() issue.

> Fixes: 286e04b8ed7a ("mm: memcg/slab: allocate obj_cgroups for non-root slab pages")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  mm/memcontrol.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b0695d3aa530..44852ac048c3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2876,12 +2876,24 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
>  	 */
>  	objects = max(objs_per_slab_page(s, page),
>  		      (int)(sizeof(struct rcu_head)/sizeof(void *)));
> -
> +retry:
>  	vec = kcalloc_node(objects, sizeof(struct obj_cgroup *), gfp,
>  			   page_to_nid(page));
>  	if (!vec)
>  		return -ENOMEM;
>  
> +	/*
> +	 * The allocated vector should not come from the same slab.
> +	 * Otherwise, this slab will never become empty. Double the size
> +	 * in this case to make sure that the vector comes from a different
> +	 * kmemcache.
> +	 */
> +	if (unlikely(virt_to_head_page(vec) == page)) {
> +		kfree(vec);
> +		objects *= 2;
> +		goto retry;
> +	}
> +
>  	memcg_data = (unsigned long) vec | MEMCG_DATA_OBJCGS;
>  	if (new_page) {
>  		/*
> 



* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-03 12:22   ` Vlastimil Babka
@ 2021-05-03 14:20     ` Waiman Long
  2021-05-03 15:32       ` Vlastimil Babka
From: Waiman Long @ 2021-05-03 14:20 UTC
  To: Vlastimil Babka, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Shakeel Butt
  Cc: linux-kernel, cgroups, linux-mm

On 5/3/21 8:22 AM, Vlastimil Babka wrote:
> On 5/2/21 8:07 PM, Waiman Long wrote:
>> The obj_cgroup array (memcg_data) embedded in the page structure is
>> allocated the first time an accounted memory allocation happens in the
>> slab. With the right object size, it is possible that the allocated
>> obj_cgroup array comes from the same slab that requires memory
>> accounting. If this happens, the slab will never become empty again as
>> there is at least one object left (the obj_cgroup array) in the slab.
>>
>> With instrumentation code added to detect this situation, I got 76
>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>> So this can really happen.
>>
>> To avoid the creation of these unfreeable slabs, a check is added to
>> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
>> happens, double the size of the array to make sure that it comes from
>> a different kmemcache.
>>
>> This change, however, does not completely eliminate the presence of
>> unfreeable slabs, which can still happen if a circular obj_cgroup
>> array dependency is formed.
> Hm, this looks like only a half fix then.
> I'm afraid the proper fix is for kmemcg to create its own set of caches
> for the arrays. It would also solve the recursive kfree() issue.

Right, this is a possible solution. However, the objcg pointers array
shouldn't need that much memory. Creating its own set of kmemcaches may
seem like overkill.

Cheers,
Longman



* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-03 14:20     ` Waiman Long
@ 2021-05-03 15:32       ` Vlastimil Babka
  2021-05-03 16:24         ` Shakeel Butt
From: Vlastimil Babka @ 2021-05-03 15:32 UTC
  To: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, Shakeel Butt
  Cc: linux-kernel, cgroups, linux-mm

On 5/3/21 4:20 PM, Waiman Long wrote:
> On 5/3/21 8:22 AM, Vlastimil Babka wrote:
>> On 5/2/21 8:07 PM, Waiman Long wrote:
>>> The obj_cgroup array (memcg_data) embedded in the page structure is
>>> allocated the first time an accounted memory allocation happens in the
>>> slab. With the right object size, it is possible that the allocated
>>> obj_cgroup array comes from the same slab that requires memory
>>> accounting. If this happens, the slab will never become empty again as
>>> there is at least one object left (the obj_cgroup array) in the slab.
>>>
>>> With instrumentation code added to detect this situation, I got 76
>>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>>> So this can really happen.
>>>
>>> To avoid the creation of these unfreeable slabs, a check is added to
>>> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
>>> happens, double the size of the array to make sure that it comes from
>>> a different kmemcache.
>>>
>>> This change, however, does not completely eliminate the presence of
>>> unfreeable slabs, which can still happen if a circular obj_cgroup
>>> array dependency is formed.
>> Hm, this looks like only a half fix then.
>> I'm afraid the proper fix is for kmemcg to create its own set of caches
>> for the arrays. It would also solve the recursive kfree() issue.
> 
> Right, this is a possible solution. However, the objcg pointers array
> shouldn't need that much memory. Creating its own set of kmemcaches may
> seem like overkill.

Well, if we go that way, there might be additional benefits:

depending on gfp flags, kmalloc() would allocate from:

- kmalloc-* caches that never have kmemcg objects and thus can be used
  for the objcg pointer arrays
- kmalloc-cg-* caches that have only kmemcg unreclaimable objects
- kmalloc-rcl-* and dma-kmalloc-* caches, which can stay with on-demand
  memcg_alloc_page_obj_cgroups()

This way we fully solve the issues that this patchset solves. In
addition we get better separation between kmemcg and !kmemcg and thus
save memory - no allocation of the array as soon as a single object
appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful
data and 8 bytes for the obj_cgroup pointer.
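
Something like this could be the selection logic - a rough sketch only,
modeled on the existing kmalloc_type() helper in include/linux/slab.h
(KMALLOC_CGROUP and the exact flag test here are assumptions, not an
actual implementation):

  /* Sketch: add a KMALLOC_CGROUP type next to the existing ones. */
  enum kmalloc_cache_type {
  	KMALLOC_NORMAL = 0,
  	KMALLOC_CGROUP,		/* assumed new type for __GFP_ACCOUNT */
  	KMALLOC_RECLAIM,
  	KMALLOC_DMA,
  	NR_KMALLOC_TYPES
  };

  static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
  {
  	/* Common case first: no DMA, not reclaimable, not accounted. */
  	if (likely((flags & (__GFP_DMA | __GFP_RECLAIMABLE |
  			     __GFP_ACCOUNT)) == 0))
  		return KMALLOC_NORMAL;
  	if (flags & __GFP_DMA)
  		return KMALLOC_DMA;
  	if (flags & __GFP_RECLAIMABLE)
  		return KMALLOC_RECLAIM;
  	return KMALLOC_CGROUP;	/* served from a kmalloc-cg-* cache */
  }

The objcg pointer arrays themselves would be allocated without
__GFP_ACCOUNT, so they would always land in the plain kmalloc-* caches,
which under this scheme never need an objcg array of their own.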

Vlastimil

> Cheers,
> Longman
> 



* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-03 15:32       ` Vlastimil Babka
@ 2021-05-03 16:24         ` Shakeel Butt
  2021-05-03 17:21           ` Waiman Long
From: Shakeel Butt @ 2021-05-03 16:24 UTC
  To: Vlastimil Babka
  Cc: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, LKML, Cgroups, Linux MM

On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 5/3/21 4:20 PM, Waiman Long wrote:
> > On 5/3/21 8:22 AM, Vlastimil Babka wrote:
> >> On 5/2/21 8:07 PM, Waiman Long wrote:
> >>> The obj_cgroup array (memcg_data) embedded in the page structure is
> >>> allocated the first time an accounted memory allocation happens in the
> >>> slab. With the right object size, it is possible that the allocated
> >>> obj_cgroup array comes from the same slab that requires memory
> >>> accounting. If this happens, the slab will never become empty again as
> >>> there is at least one object left (the obj_cgroup array) in the slab.
> >>>
> >>> With instrumentation code added to detect this situation, I got 76
> >>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
> >>> So this can really happen.
> >>>
> >>> To avoid the creation of these unfreeable slabs, a check is added to
> >>> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
> >>> happens, double the size of the array to make sure that it comes from
> >>> a different kmemcache.
> >>>
> >>> This change, however, does not completely eliminate the presence of
> >>> unfreeable slabs, which can still happen if a circular obj_cgroup
> >>> array dependency is formed.
> >> Hm, this looks like only a half fix then.
> >> I'm afraid the proper fix is for kmemcg to create its own set of caches
> >> for the arrays. It would also solve the recursive kfree() issue.
> >
> > Right, this is a possible solution. However, the objcg pointers array
> > shouldn't need that much memory. Creating its own set of kmemcaches may
> > seem like overkill.
>
> Well, if we go that way, there might be additional benefits:
>
> depending on gfp flags, kmalloc() would allocate from:
>
> - kmalloc-* caches that never have kmemcg objects and thus can be used
>   for the objcg pointer arrays
> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
> - kmalloc-rcl-* and dma-kmalloc-* caches, which can stay with on-demand
>   memcg_alloc_page_obj_cgroups()
>
> This way we fully solve the issues that this patchset solves. In
> addition we get better separation between kmemcg and !kmemcg and thus
> save memory - no allocation of the array as soon as a single object
> appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful
> data and 8 bytes for the obj_cgroup pointer.
>

Yes, this seems like a better approach.


* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-03 16:24         ` Shakeel Butt
@ 2021-05-03 17:21           ` Waiman Long
  2021-05-03 20:15             ` Waiman Long
From: Waiman Long @ 2021-05-03 17:21 UTC
  To: Shakeel Butt, Vlastimil Babka
  Cc: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, LKML, Cgroups, Linux MM

On 5/3/21 12:24 PM, Shakeel Butt wrote:
> On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>> On 5/3/21 4:20 PM, Waiman Long wrote:
>>> On 5/3/21 8:22 AM, Vlastimil Babka wrote:
>>>> On 5/2/21 8:07 PM, Waiman Long wrote:
>>>>> The obj_cgroup array (memcg_data) embedded in the page structure is
>>>>> allocated the first time an accounted memory allocation happens in the
>>>>> slab. With the right object size, it is possible that the allocated
>>>>> obj_cgroup array comes from the same slab that requires memory
>>>>> accounting. If this happens, the slab will never become empty again as
>>>>> there is at least one object left (the obj_cgroup array) in the slab.
>>>>>
>>>>> With instrumentation code added to detect this situation, I got 76
>>>>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>>>>> So this can really happen.
>>>>>
>>>>> To avoid the creation of these unfreeable slabs, a check is added to
>>>>> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
>>>>> happens, double the size of the array to make sure that it comes from
>>>>> a different kmemcache.
>>>>>
>>>>> This change, however, does not completely eliminate the presence of
>>>>> unfreeable slabs, which can still happen if a circular obj_cgroup
>>>>> array dependency is formed.
>>>> Hm, this looks like only a half fix then.
>>>> I'm afraid the proper fix is for kmemcg to create its own set of caches
>>>> for the arrays. It would also solve the recursive kfree() issue.
>>> Right, this is a possible solution. However, the objcg pointers array
>>> shouldn't need that much memory. Creating its own set of kmemcaches may
>>> seem like overkill.
>> Well, if we go that way, there might be additional benefits:
>>
>> depending on gfp flags, kmalloc() would allocate from:
>>
>> - kmalloc-* caches that never have kmemcg objects and thus can be used
>>   for the objcg pointer arrays
>> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
>> - kmalloc-rcl-* and dma-kmalloc-* caches, which can stay with on-demand
>>   memcg_alloc_page_obj_cgroups()
>>
>> This way we fully solve the issues that this patchset solves. In
>> addition we get better separation between kmemcg and !kmemcg and thus
>> save memory - no allocation of the array as soon as a single object
>> appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful
>> data and 8 bytes for the obj_cgroup pointer.
>>
> Yes, this seems like a better approach.
>
OK, I will try to go this route then if there is no objection from others.

From slabinfo, the objs/slab numbers range from 4 to 512. That means we
need kmalloc-cg-{32,64,128,256,512,1k,2k,4k}, an init function to set up
the new kmemcaches, and an allocation function that uses the proper
kmemcache to allocate from.
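
A very rough sketch of the setup side, just to make the idea concrete
(kmalloc_cg_caches and the function name are made up for illustration,
and the non-power-of-two 96/192 sizes are ignored for brevity):

  /* Sketch: one kmalloc-cg-<n> cache per existing kmalloc size. */
  static struct kmem_cache *kmalloc_cg_caches[KMALLOC_SHIFT_HIGH + 1];

  void __init kmalloc_cg_caches_init(void)
  {
  	int i;

  	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
  		unsigned int size = 1U << i;
  		char *name = kasprintf(GFP_NOWAIT, "kmalloc-cg-%u", size);

  		/* SLAB_ACCOUNT makes every allocation memcg-accounted. */
  		kmalloc_cg_caches[i] = kmem_cache_create(name, size,
  						ARCH_KMALLOC_MINALIGN,
  						SLAB_ACCOUNT, NULL);
  	}
  }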

Cheers,
Longman



* Re: [PATCH 2/2] mm: memcg/slab: Don't create unfreeable slab
  2021-05-03 17:21           ` Waiman Long
@ 2021-05-03 20:15             ` Waiman Long
From: Waiman Long @ 2021-05-03 20:15 UTC
  To: Shakeel Butt, Vlastimil Babka
  Cc: Waiman Long, Johannes Weiner, Michal Hocko, Vladimir Davydov,
	Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Roman Gushchin, LKML, Cgroups, Linux MM

On 5/3/21 1:21 PM, Waiman Long wrote:
> On 5/3/21 12:24 PM, Shakeel Butt wrote:
>> On Mon, May 3, 2021 at 8:32 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>> On 5/3/21 4:20 PM, Waiman Long wrote:
>>>> On 5/3/21 8:22 AM, Vlastimil Babka wrote:
>>>>> On 5/2/21 8:07 PM, Waiman Long wrote:
>>>>>> The obj_cgroup array (memcg_data) embedded in the page structure is
>>>>>> allocated the first time an accounted memory allocation happens in the
>>>>>> slab. With the right object size, it is possible that the allocated
>>>>>> obj_cgroup array comes from the same slab that requires memory
>>>>>> accounting. If this happens, the slab will never become empty again as
>>>>>> there is at least one object left (the obj_cgroup array) in the slab.
>>>>>>
>>>>>> With instrumentation code added to detect this situation, I got 76
>>>>>> hits on the kmalloc-192 slab when booting up a test kernel on a VM.
>>>>>> So this can really happen.
>>>>>>
>>>>>> To avoid the creation of these unfreeable slabs, a check is added to
>>>>>> memcg_alloc_page_obj_cgroups() to detect such a case and, when it
>>>>>> happens, double the size of the array to make sure that it comes from
>>>>>> a different kmemcache.
>>>>>>
>>>>>> This change, however, does not completely eliminate the presence of
>>>>>> unfreeable slabs, which can still happen if a circular obj_cgroup
>>>>>> array dependency is formed.
>>>>> Hm, this looks like only a half fix then.
>>>>> I'm afraid the proper fix is for kmemcg to create its own set of caches
>>>>> for the arrays. It would also solve the recursive kfree() issue.
>>>> Right, this is a possible solution. However, the objcg pointers array
>>>> shouldn't need that much memory. Creating its own set of kmemcaches may
>>>> seem like overkill.
>>> Well, if we go that way, there might be additional benefits:
>>>
>>> depending on gfp flags, kmalloc() would allocate from:
>>>
>>> - kmalloc-* caches that never have kmemcg objects and thus can be used
>>>   for the objcg pointer arrays
>>> - kmalloc-cg-* caches that have only kmemcg unreclaimable objects
>>> - kmalloc-rcl-* and dma-kmalloc-* caches, which can stay with on-demand
>>>   memcg_alloc_page_obj_cgroups()
>>>
>>> This way we fully solve the issues that this patchset solves. In
>>> addition we get better separation between kmemcg and !kmemcg and thus
>>> save memory - no allocation of the array as soon as a single object
>>> appears in a slab. For "kmalloc-8" we now have 8 bytes for the useful
>>> data and 8 bytes for the obj_cgroup pointer.
>>>
>> Yes, this seems like a better approach.
>>
> OK, I will try to go this route then if there is no objection from
> others.
>
> From slabinfo, the objs/slab numbers range from 4 to 512. That means we
> need kmalloc-cg-{32,64,128,256,512,1k,2k,4k}, an init function to set up
> the new kmemcaches, and an allocation function that uses the proper
> kmemcache to allocate from.

I think I had misinterpreted the kmalloc-* setup. In this case, the
kmalloc-cg-* caches should have the same set of sizes as the kmalloc-*
caches.

Cheers,
Longman


