linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
@ 2018-04-12 14:52 Kirill Tkhai
  2018-04-13  8:55 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-12 14:52 UTC (permalink / raw)
  To: akpm, hannes, mhocko, vdavydov.dev, cgroups, linux-mm, linux-kernel

In case of memcg_online_kmem() fail, memcg_cgroup::id remains hashed
in mem_cgroup_idr even after memcg memory is freed. This leads to leak
of ID in mem_cgroup_idr.

This patch adds removing into mem_cgroup_css_alloc(), which fixes
the problem. For better readability, it adds generic helper, which
will be used in mem_cgroup_alloc() and mem_cgroup_id_put_many() too.

Fixes 73f576c04b94 "mm: memcontrol: fix cgroup creation failure after many small jobs"
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 mm/memcontrol.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e7942c301a8..448db08d97a0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4263,6 +4263,14 @@ static struct cftype mem_cgroup_legacy_files[] = {
 
 static DEFINE_IDR(mem_cgroup_idr);
 
+static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
+{
+	if (memcg->id.id > 0) {
+		idr_remove(&mem_cgroup_idr, memcg->id.id);
+		memcg->id.id = 0;
+	}
+}
+
 static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	VM_BUG_ON(atomic_read(&memcg->id.ref) <= 0);
@@ -4273,8 +4281,7 @@ static void mem_cgroup_id_put_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	VM_BUG_ON(atomic_read(&memcg->id.ref) < n);
 	if (atomic_sub_and_test(n, &memcg->id.ref)) {
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
-		memcg->id.id = 0;
+		mem_cgroup_id_remove(memcg);
 
 		/* Memcg ID pins CSS */
 		css_put(&memcg->css);
@@ -4411,8 +4418,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
 fail:
-	if (memcg->id.id > 0)
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
+	mem_cgroup_id_remove(memcg);
 	__mem_cgroup_free(memcg);
 	return NULL;
 }
@@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	return &memcg->css;
 fail:
+	mem_cgroup_id_remove(memcg);
 	mem_cgroup_free(memcg);
 	return ERR_PTR(-ENOMEM);
 }

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-12 14:52 [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure Kirill Tkhai
@ 2018-04-13  8:55 ` Michal Hocko
  2018-04-13  9:35   ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13  8:55 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
[...]
> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  
>  	return &memcg->css;
>  fail:
> +	mem_cgroup_id_remove(memcg);
>  	mem_cgroup_free(memcg);
>  	return ERR_PTR(-ENOMEM);
>  }

The only path which jumps to fail: here (in the current mmotm tree) is 
	error = memcg_online_kmem(memcg);
	if (error)
		goto fail;

AFAICS and the only failure path in memcg_online_kmem
	memcg_id = memcg_alloc_cache_id();
	if (memcg_id < 0)
		return memcg_id;

I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
up properly. Or am I missing something?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13  8:55 ` Michal Hocko
@ 2018-04-13  9:35   ` Kirill Tkhai
  2018-04-13 11:02     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-13  9:35 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On 13.04.2018 11:55, Michal Hocko wrote:
> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
> [...]
>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>  
>>  	return &memcg->css;
>>  fail:
>> +	mem_cgroup_id_remove(memcg);
>>  	mem_cgroup_free(memcg);
>>  	return ERR_PTR(-ENOMEM);
>>  }
> 
> The only path which jumps to fail: here (in the current mmotm tree) is 
> 	error = memcg_online_kmem(memcg);
> 	if (error)
> 		goto fail;
> 
> AFAICS and the only failure path in memcg_online_kmem
> 	memcg_id = memcg_alloc_cache_id();
> 	if (memcg_id < 0)
> 		return memcg_id;
> 
> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
> up properly. Or am I missing something?

memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
size of every of them. In case of memory pressure it can fail. If this occurs,
mem_cgroup::id is not unhashed from IDR and we leak this id.

After further iterations, all IDs may be occupied, and there won't be able to create
a memcg in the system ever. You may reproduce the situation with the patch:

[root@localhost ~]# cd /sys/fs/cgroup/memory/
[root@localhost memory]# mkdir 1
mkdir: cannot create directory `1': Cannot allocate memory
[root@localhost memory]# for i in {1..65535}; do mkdir 1 2>/dev/null; done
[root@localhost memory]# mkdir 1
mkdir: cannot create directory `1': No space left on device

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3e7942c301a8..5e17bfee9e6f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2156,6 +2156,7 @@ static int memcg_alloc_cache_id(void)
 	err = memcg_update_all_caches(size);
 	if (!err)
 		err = memcg_update_all_list_lrus(size);
+	err = -ENOMEM;
 	if (!err)
 		memcg_nr_cache_ids = size;
 
@@ -4422,7 +4423,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
 	struct mem_cgroup *memcg;
-	long error = -ENOMEM;
+	long error = -ENOSPC;
 
 	memcg = mem_cgroup_alloc();
 	if (!memcg)

ENOSPC was added to the second hunk to show that the function fails on IDR allocation.

Kirill

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13  9:35   ` Kirill Tkhai
@ 2018-04-13 11:02     ` Michal Hocko
  2018-04-13 11:06       ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 11:02 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
> On 13.04.2018 11:55, Michal Hocko wrote:
> > On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
> > [...]
> >> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >>  
> >>  	return &memcg->css;
> >>  fail:
> >> +	mem_cgroup_id_remove(memcg);
> >>  	mem_cgroup_free(memcg);
> >>  	return ERR_PTR(-ENOMEM);
> >>  }
> > 
> > The only path which jumps to fail: here (in the current mmotm tree) is 
> > 	error = memcg_online_kmem(memcg);
> > 	if (error)
> > 		goto fail;
> > 
> > AFAICS and the only failure path in memcg_online_kmem
> > 	memcg_id = memcg_alloc_cache_id();
> > 	if (memcg_id < 0)
> > 		return memcg_id;
> > 
> > I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
> > up properly. Or am I missing something?
> 
> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
> size of every of them. In case of memory pressure it can fail. If this occurs,
> mem_cgroup::id is not unhashed from IDR and we leak this id.

OK, my bad I was looking at the bad code path. So you want to clean up
after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
sense. Sorry for the confusion on my end.

Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
to mem_cgroup_alloc?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:02     ` Michal Hocko
@ 2018-04-13 11:06       ` Kirill Tkhai
  2018-04-13 11:20         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-13 11:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On 13.04.2018 14:02, Michal Hocko wrote:
> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
>> On 13.04.2018 11:55, Michal Hocko wrote:
>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
>>> [...]
>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>>>  
>>>>  	return &memcg->css;
>>>>  fail:
>>>> +	mem_cgroup_id_remove(memcg);
>>>>  	mem_cgroup_free(memcg);
>>>>  	return ERR_PTR(-ENOMEM);
>>>>  }
>>>
>>> The only path which jumps to fail: here (in the current mmotm tree) is 
>>> 	error = memcg_online_kmem(memcg);
>>> 	if (error)
>>> 		goto fail;
>>>
>>> AFAICS and the only failure path in memcg_online_kmem
>>> 	memcg_id = memcg_alloc_cache_id();
>>> 	if (memcg_id < 0)
>>> 		return memcg_id;
>>>
>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
>>> up properly. Or am I missing something?
>>
>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
>> size of every of them. In case of memory pressure it can fail. If this occurs,
>> mem_cgroup::id is not unhashed from IDR and we leak this id.
> 
> OK, my bad I was looking at the bad code path. So you want to clean up
> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
> sense. Sorry for the confusion on my end.
> 
> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
> to mem_cgroup_alloc?

We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
earlier, then from mem_cgroup_css_free().

Kirill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:06       ` Kirill Tkhai
@ 2018-04-13 11:20         ` Michal Hocko
  2018-04-13 11:29           ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 11:20 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
> On 13.04.2018 14:02, Michal Hocko wrote:
> > On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
> >> On 13.04.2018 11:55, Michal Hocko wrote:
> >>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
> >>> [...]
> >>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >>>>  
> >>>>  	return &memcg->css;
> >>>>  fail:
> >>>> +	mem_cgroup_id_remove(memcg);
> >>>>  	mem_cgroup_free(memcg);
> >>>>  	return ERR_PTR(-ENOMEM);
> >>>>  }
> >>>
> >>> The only path which jumps to fail: here (in the current mmotm tree) is 
> >>> 	error = memcg_online_kmem(memcg);
> >>> 	if (error)
> >>> 		goto fail;
> >>>
> >>> AFAICS and the only failure path in memcg_online_kmem
> >>> 	memcg_id = memcg_alloc_cache_id();
> >>> 	if (memcg_id < 0)
> >>> 		return memcg_id;
> >>>
> >>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
> >>> up properly. Or am I missing something?
> >>
> >> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
> >> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
> >> size of every of them. In case of memory pressure it can fail. If this occurs,
> >> mem_cgroup::id is not unhashed from IDR and we leak this id.
> > 
> > OK, my bad I was looking at the bad code path. So you want to clean up
> > after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
> > sense. Sorry for the confusion on my end.
> > 
> > Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
> > to mem_cgroup_alloc?
> 
> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
> earlier, then from mem_cgroup_css_free().

Are you sure. It's been some time since I've looked at the quite complex
cgroup tear down code but from what I remember, css_free is called on
the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
seems to unpin the css reference so we should have idr_remove by the
time when css_free is called. Or am I still wrong and should go over the
brain hurting cgroup removal code again?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:20         ` Michal Hocko
@ 2018-04-13 11:29           ` Kirill Tkhai
  2018-04-13 11:38             ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-13 11:29 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On 13.04.2018 14:20, Michal Hocko wrote:
> On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
>> On 13.04.2018 14:02, Michal Hocko wrote:
>>> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
>>>> On 13.04.2018 11:55, Michal Hocko wrote:
>>>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
>>>>> [...]
>>>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>>>>>  
>>>>>>  	return &memcg->css;
>>>>>>  fail:
>>>>>> +	mem_cgroup_id_remove(memcg);
>>>>>>  	mem_cgroup_free(memcg);
>>>>>>  	return ERR_PTR(-ENOMEM);
>>>>>>  }
>>>>>
>>>>> The only path which jumps to fail: here (in the current mmotm tree) is 
>>>>> 	error = memcg_online_kmem(memcg);
>>>>> 	if (error)
>>>>> 		goto fail;
>>>>>
>>>>> AFAICS and the only failure path in memcg_online_kmem
>>>>> 	memcg_id = memcg_alloc_cache_id();
>>>>> 	if (memcg_id < 0)
>>>>> 		return memcg_id;
>>>>>
>>>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
>>>>> up properly. Or am I missing something?
>>>>
>>>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
>>>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
>>>> size of every of them. In case of memory pressure it can fail. If this occurs,
>>>> mem_cgroup::id is not unhashed from IDR and we leak this id.
>>>
>>> OK, my bad I was looking at the bad code path. So you want to clean up
>>> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
>>> sense. Sorry for the confusion on my end.
>>>
>>> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
>>> to mem_cgroup_alloc?
>>
>> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
>> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
>> earlier, then from mem_cgroup_css_free().
> 
> Are you sure. It's been some time since I've looked at the quite complex
> cgroup tear down code but from what I remember, css_free is called on
> the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
> seems to unpin the css reference so we should have idr_remove by the
> time when css_free is called. Or am I still wrong and should go over the
> brain hurting cgroup removal code again?

mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
Thus, we release ID earlier, then all references to css are freed.

You may look at the commit 73f576c04b94, and it describes the reason we do that earlier:

Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Wed Jul 20 15:44:57 2016 -0700

    mm: memcontrol: fix cgroup creation failure after many small jobs
    
    The memory controller has quite a bit of state that usually outlives the
    cgroup and pins its CSS until said state disappears.  At the same time
    it imposes a 16-bit limit on the CSS ID space to economically store IDs
    in the wild.  Consequently, when we use cgroups to contain frequent but
    small and short-lived jobs that leave behind some page cache, we quickly
    run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
    fails with -ENOSPC while there are only a few, or even no user-visible
    cgroups in existence.
    ...

Kirill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:29           ` Kirill Tkhai
@ 2018-04-13 11:38             ` Michal Hocko
  2018-04-13 11:49               ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 11:38 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
> On 13.04.2018 14:20, Michal Hocko wrote:
> > On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
> >> On 13.04.2018 14:02, Michal Hocko wrote:
> >>> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
> >>>> On 13.04.2018 11:55, Michal Hocko wrote:
> >>>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
> >>>>> [...]
> >>>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> >>>>>>  
> >>>>>>  	return &memcg->css;
> >>>>>>  fail:
> >>>>>> +	mem_cgroup_id_remove(memcg);
> >>>>>>  	mem_cgroup_free(memcg);
> >>>>>>  	return ERR_PTR(-ENOMEM);
> >>>>>>  }
> >>>>>
> >>>>> The only path which jumps to fail: here (in the current mmotm tree) is 
> >>>>> 	error = memcg_online_kmem(memcg);
> >>>>> 	if (error)
> >>>>> 		goto fail;
> >>>>>
> >>>>> AFAICS and the only failure path in memcg_online_kmem
> >>>>> 	memcg_id = memcg_alloc_cache_id();
> >>>>> 	if (memcg_id < 0)
> >>>>> 		return memcg_id;
> >>>>>
> >>>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
> >>>>> up properly. Or am I missing something?
> >>>>
> >>>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
> >>>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
> >>>> size of every of them. In case of memory pressure it can fail. If this occurs,
> >>>> mem_cgroup::id is not unhashed from IDR and we leak this id.
> >>>
> >>> OK, my bad I was looking at the bad code path. So you want to clean up
> >>> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
> >>> sense. Sorry for the confusion on my end.
> >>>
> >>> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
> >>> to mem_cgroup_alloc?
> >>
> >> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
> >> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
> >> earlier, then from mem_cgroup_css_free().
> > 
> > Are you sure. It's been some time since I've looked at the quite complex
> > cgroup tear down code but from what I remember, css_free is called on
> > the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
> > seems to unpin the css reference so we should have idr_remove by the
> > time when css_free is called. Or am I still wrong and should go over the
> > brain hurting cgroup removal code again?
> 
> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
> Thus, we release ID earlier, then all references to css are freed.

Right and so what. If we have released the idr then we are not going to
do that again in css_free. That is why we have that memcg->id.id > 0
check before idr_remove and memcg->id.id = 0 for the last memcg ref.
count. So again, why cannot we do the clean up in mem_cgroup_free and
have a less confusing code? Or am I just not getting your point and
being dense here?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:38             ` Michal Hocko
@ 2018-04-13 11:49               ` Kirill Tkhai
  2018-04-13 11:54                 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-13 11:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On 13.04.2018 14:38, Michal Hocko wrote:
> On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
>> On 13.04.2018 14:20, Michal Hocko wrote:
>>> On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
>>>> On 13.04.2018 14:02, Michal Hocko wrote:
>>>>> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
>>>>>> On 13.04.2018 11:55, Michal Hocko wrote:
>>>>>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
>>>>>>> [...]
>>>>>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>>>>>>>  
>>>>>>>>  	return &memcg->css;
>>>>>>>>  fail:
>>>>>>>> +	mem_cgroup_id_remove(memcg);
>>>>>>>>  	mem_cgroup_free(memcg);
>>>>>>>>  	return ERR_PTR(-ENOMEM);
>>>>>>>>  }
>>>>>>>
>>>>>>> The only path which jumps to fail: here (in the current mmotm tree) is 
>>>>>>> 	error = memcg_online_kmem(memcg);
>>>>>>> 	if (error)
>>>>>>> 		goto fail;
>>>>>>>
>>>>>>> AFAICS and the only failure path in memcg_online_kmem
>>>>>>> 	memcg_id = memcg_alloc_cache_id();
>>>>>>> 	if (memcg_id < 0)
>>>>>>> 		return memcg_id;
>>>>>>>
>>>>>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
>>>>>>> up properly. Or am I missing something?
>>>>>>
>>>>>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
>>>>>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
>>>>>> size of every of them. In case of memory pressure it can fail. If this occurs,
>>>>>> mem_cgroup::id is not unhashed from IDR and we leak this id.
>>>>>
>>>>> OK, my bad I was looking at the bad code path. So you want to clean up
>>>>> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
>>>>> sense. Sorry for the confusion on my end.
>>>>>
>>>>> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
>>>>> to mem_cgroup_alloc?
>>>>
>>>> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
>>>> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
>>>> earlier, then from mem_cgroup_css_free().
>>>
>>> Are you sure. It's been some time since I've looked at the quite complex
>>> cgroup tear down code but from what I remember, css_free is called on
>>> the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
>>> seems to unpin the css reference so we should have idr_remove by the
>>> time when css_free is called. Or am I still wrong and should go over the
>>> brain hurting cgroup removal code again?
>>
>> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
>> Thus, we release ID earlier, then all references to css are freed.
> 
> Right and so what. If we have released the idr then we are not going to
> do that again in css_free. That is why we have that memcg->id.id > 0
> check before idr_remove and memcg->id.id = 0 for the last memcg ref.
> count. So again, why cannot we do the clean up in mem_cgroup_free and
> have a less confusing code? Or am I just not getting your point and
> being dense here?

We can, but mem_cgroup_free() called from mem_cgroup_css_alloc() is unlikely case.
The likely case is mem_cgroup_free() is called from mem_cgroup_css_free(), where
this idr manipulations will be a noop. Noop in likely case looks more confusing
for me.

Less confusing will be to move

        memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
                                 1, MEM_CGROUP_ID_MAX,
                                 GFP_KERNEL);

into mem_cgroup_css_alloc(). How are you think about this?

Kirill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:49               ` Kirill Tkhai
@ 2018-04-13 11:54                 ` Michal Hocko
  2018-04-13 12:07                   ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 11:54 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 14:49:32, Kirill Tkhai wrote:
> On 13.04.2018 14:38, Michal Hocko wrote:
> > On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
[...]
> >> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
> >> Thus, we release ID earlier, then all references to css are freed.
> > 
> > Right and so what. If we have released the idr then we are not going to
> > do that again in css_free. That is why we have that memcg->id.id > 0
> > check before idr_remove and memcg->id.id = 0 for the last memcg ref.
> > count. So again, why cannot we do the clean up in mem_cgroup_free and
> > have a less confusing code? Or am I just not getting your point and
> > being dense here?
> 
> We can, but mem_cgroup_free() called from mem_cgroup_css_alloc() is unlikely case.
> The likely case is mem_cgroup_free() is called from mem_cgroup_css_free(), where
> this idr manipulations will be a noop. Noop in likely case looks more confusing
> for me.

Well, I would really prefer to have _free being symmetric to _alloc so
that you can rely that the full state is gone after _free is called.
This confused the hell out of me. Because I _did_ expect that
mem_cgroup_free would do that and so I was looking at completely
different place.
 
> Less confusing will be to move
> 
>         memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
>                                  1, MEM_CGROUP_ID_MAX,
>                                  GFP_KERNEL);
> 
> into mem_cgroup_css_alloc(). How are you think about this?

I would have to double check. Maybe it can be done on top. But for the
actual fix and a stable backport potentially should be as clear as
possible. Your original patch would be just fine but if I would prefer 
mem_cgroup_free for the symmetry.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 11:54                 ` Michal Hocko
@ 2018-04-13 12:07                   ` Kirill Tkhai
  2018-04-13 12:14                     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-04-13 12:07 UTC (permalink / raw)
  To: Michal Hocko; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On 13.04.2018 14:54, Michal Hocko wrote:
> On Fri 13-04-18 14:49:32, Kirill Tkhai wrote:
>> On 13.04.2018 14:38, Michal Hocko wrote:
>>> On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
> [...]
>>>> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
>>>> Thus, we release ID earlier, then all references to css are freed.
>>>
>>> Right and so what. If we have released the idr then we are not going to
>>> do that again in css_free. That is why we have that memcg->id.id > 0
>>> check before idr_remove and memcg->id.id = 0 for the last memcg ref.
>>> count. So again, why cannot we do the clean up in mem_cgroup_free and
>>> have a less confusing code? Or am I just not getting your point and
>>> being dense here?
>>
>> We can, but mem_cgroup_free() called from mem_cgroup_css_alloc() is unlikely case.
>> The likely case is mem_cgroup_free() is called from mem_cgroup_css_free(), where
>> this idr manipulations will be a noop. Noop in likely case looks more confusing
>> for me.
> 
> Well, I would really prefer to have _free being symmetric to _alloc so
> that you can rely that the full state is gone after _free is called.
> This confused the hell out of me. Because I _did_ expect that
> mem_cgroup_free would do that and so I was looking at completely
> different place.
>  
>> Less confusing will be to move
>>
>>         memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
>>                                  1, MEM_CGROUP_ID_MAX,
>>                                  GFP_KERNEL);
>>
>> into mem_cgroup_css_alloc(). How are you think about this?
> 
> I would have to double check. Maybe it can be done on top. But for the
> actual fix and a stable backport potentially should be as clear as
> possible. Your original patch would be just fine but if I would prefer 
> mem_cgroup_free for the symmetry.

We definitely can move id allocation to mem_cgroup_css_alloc(), but this
is really not for an easy fix, which will be backported to stable.

Moving idr destroy to mem_cgroup_free() hides IDR trick. My IMHO it's less
readable for a reader.

The main problem is allocation asymmetric, and we shouldn't handle it on free path...

Kirill

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 12:07                   ` Kirill Tkhai
@ 2018-04-13 12:14                     ` Michal Hocko
  2018-04-13 12:51                       ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 12:14 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 15:07:14, Kirill Tkhai wrote:
> On 13.04.2018 14:54, Michal Hocko wrote:
> > On Fri 13-04-18 14:49:32, Kirill Tkhai wrote:
> >> On 13.04.2018 14:38, Michal Hocko wrote:
> >>> On Fri 13-04-18 14:29:11, Kirill Tkhai wrote:
> > [...]
> >>>> mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
> >>>> Thus, we release ID earlier, then all references to css are freed.
> >>>
> >>> Right and so what. If we have released the idr then we are not going to
> >>> do that again in css_free. That is why we have that memcg->id.id > 0
> >>> check before idr_remove and memcg->id.id = 0 for the last memcg ref.
> >>> count. So again, why cannot we do the clean up in mem_cgroup_free and
> >>> have a less confusing code? Or am I just not getting your point and
> >>> being dense here?
> >>
> >> We can, but mem_cgroup_free() called from mem_cgroup_css_alloc() is unlikely case.
> >> The likely case is mem_cgroup_free() is called from mem_cgroup_css_free(), where
> >> this idr manipulations will be a noop. Noop in likely case looks more confusing
> >> for me.
> > 
> > Well, I would really prefer to have _free being symmetric to _alloc so
> > that you can rely that the full state is gone after _free is called.
> > This confused the hell out of me. Because I _did_ expect that
> > mem_cgroup_free would do that and so I was looking at completely
> > different place.
> >  
> >> Less confusing will be to move
> >>
> >>         memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
> >>                                  1, MEM_CGROUP_ID_MAX,
> >>                                  GFP_KERNEL);
> >>
> >> into mem_cgroup_css_alloc(). How are you think about this?
> > 
> > I would have to double check. Maybe it can be done on top. But for the
> > actual fix and a stable backport potentially should be as clear as
> > possible. Your original patch would be just fine but if I would prefer 
> > mem_cgroup_free for the symmetry.
> 
> We definitely can move id allocation to mem_cgroup_css_alloc(), but this
> is really not for an easy fix, which will be backported to stable.
> 
> Moving idr destroy to mem_cgroup_free() hides IDR trick. My IMHO it's less
> readable for a reader.
> 
> The main problem is allocation asymmetric, and we shouldn't handle it on free path...

Well, this is probably a matter of taste. I will not argue. I will not
object if Johannes is OK with your patch. But the whole thing confused
hell out of me so I would rather un-clutter it...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 12:14                     ` Michal Hocko
@ 2018-04-13 12:51                       ` Michal Hocko
  2018-07-26 23:25                         ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-04-13 12:51 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri 13-04-18 14:14:33, Michal Hocko wrote:
[...]
> Well, this is probably a matter of taste. I will not argue. I will not
> object if Johannes is OK with your patch. But the whole thing confused
> hell out of me so I would rather un-clutter it...

In other words, this

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8c2ed1c2b72c..ca7e981a8a1a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4351,6 +4351,14 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
 
+	/*
+	 * We are trying to remove the idr key when the last memcg
+	 * reference drops which can be sooner than when the last
+	 * css reference is dropped to recycle ids faster.
+	 */
+	if (memcg->id.id > 0)
+		idr_remove(&mem_cgroup_idr, memcg->id.id);
+
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
 	free_percpu(memcg->stat_cpu);
@@ -4411,8 +4419,6 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
 fail:
-	if (memcg->id.id > 0)
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
 	__mem_cgroup_free(memcg);
 	return NULL;
 }
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-04-13 12:51                       ` Michal Hocko
@ 2018-07-26 23:25                         ` Andrew Morton
  2018-07-27 19:31                           ` Johannes Weiner
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2018-07-26 23:25 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill Tkhai, hannes, vdavydov.dev, cgroups, linux-mm, linux-kernel

On Fri, 13 Apr 2018 14:51:01 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Fri 13-04-18 14:14:33, Michal Hocko wrote:
> [...]
> > Well, this is probably a matter of taste. I will not argue. I will not
> > object if Johannes is OK with your patch. But the whole thing confused
> > hell out of me so I would rather un-clutter it...
> 
> In other words, this
> 

This discussion has rather petered out.  afaict we're waiting for
hannes to offer an opinion?


From: Kirill Tkhai <ktkhai@virtuozzo.com>
Subject: memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure

In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed in
mem_cgroup_idr even after memcg memory is freed.  This leads to leak of ID
in mem_cgroup_idr.

This patch adds removal into mem_cgroup_css_alloc(), which fixes the
problem.  For better readability, it adds a generic helper which is used
in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well.

Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain
Fixes 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff -puN mm/memcontrol.c~memcg-remove-memcg_cgroup-id-from-idr-on-mem_cgroup_css_alloc-failure mm/memcontrol.c
--- a/mm/memcontrol.c~memcg-remove-memcg_cgroup-id-from-idr-on-mem_cgroup_css_alloc-failure
+++ a/mm/memcontrol.c
@@ -4037,6 +4037,14 @@ static struct cftype mem_cgroup_legacy_f
 
 static DEFINE_IDR(mem_cgroup_idr);
 
+static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
+{
+	if (memcg->id.id > 0) {
+		idr_remove(&mem_cgroup_idr, memcg->id.id);
+		memcg->id.id = 0;
+	}
+}
+
 static void mem_cgroup_id_get_many(struct mem_cgroup *memcg, unsigned int n)
 {
 	VM_BUG_ON(atomic_read(&memcg->id.ref) <= 0);
@@ -4047,8 +4055,7 @@ static void mem_cgroup_id_put_many(struc
 {
 	VM_BUG_ON(atomic_read(&memcg->id.ref) < n);
 	if (atomic_sub_and_test(n, &memcg->id.ref)) {
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
-		memcg->id.id = 0;
+		mem_cgroup_id_remove(memcg);
 
 		/* Memcg ID pins CSS */
 		css_put(&memcg->css);
@@ -4185,8 +4192,7 @@ static struct mem_cgroup *mem_cgroup_all
 	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
 fail:
-	if (memcg->id.id > 0)
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
+	mem_cgroup_id_remove(memcg);
 	__mem_cgroup_free(memcg);
 	return NULL;
 }
@@ -4245,6 +4251,7 @@ mem_cgroup_css_alloc(struct cgroup_subsy
 
 	return &memcg->css;
 fail:
+	mem_cgroup_id_remove(memcg);
 	mem_cgroup_free(memcg);
 	return ERR_PTR(-ENOMEM);
 }
_


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-26 23:25                         ` Andrew Morton
@ 2018-07-27 19:31                           ` Johannes Weiner
  2018-07-29 19:26                             ` Vladimir Davydov
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Weiner @ 2018-07-27 19:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Kirill Tkhai, vdavydov.dev, cgroups, linux-mm,
	linux-kernel

On Thu, Jul 26, 2018 at 04:25:12PM -0700, Andrew Morton wrote:
> On Fri, 13 Apr 2018 14:51:01 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Fri 13-04-18 14:14:33, Michal Hocko wrote:
> > [...]
> > > Well, this is probably a matter of taste. I will not argue. I will not
> > > object if Johannes is OK with your patch. But the whole thing confused
> > > hell out of me so I would rather un-clutter it...
> > 
> > In other words, this
> > 
> 
> This discussion has rather petered out.  afaict we're waiting for
> hannes to offer an opinion?
> 
> 
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> Subject: memcg: remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
> 
> In case of memcg_online_kmem() failure, memcg_cgroup::id remains hashed in
> mem_cgroup_idr even after memcg memory is freed.  This leads to leak of ID
> in mem_cgroup_idr.
> 
> This patch adds removal into mem_cgroup_css_alloc(), which fixes the
> problem.  For better readability, it adds a generic helper which is used
> in mem_cgroup_alloc() and mem_cgroup_id_put_many() as well.
> 
> Link: http://lkml.kernel.org/r/152354470916.22460.14397070748001974638.stgit@localhost.localdomain
> Fixes 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

I also do wonder if we can do it cleaner, but since it's a fix I don't
want that discussion to hold things up:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

That said, the lifetime of the root reference on the ID is the online
state, we put that in css_offline. Is there a reason we need to have
the ID ready and the memcg in the IDR before onlining it? Can we do
something like this and not mess with the alloc/free sequence at all?

Michal, Vladimir, am I missing something?

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c59519d600ea..865e6d41d3d1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4144,12 +4144,6 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg)
 		return NULL;
 
-	memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
-				 1, MEM_CGROUP_ID_MAX,
-				 GFP_KERNEL);
-	if (memcg->id.id < 0)
-		goto fail;
-
 	memcg->stat_cpu = alloc_percpu(struct mem_cgroup_stat_cpu);
 	if (!memcg->stat_cpu)
 		goto fail;
@@ -4176,11 +4170,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
 #endif
-	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
 fail:
-	if (memcg->id.id > 0)
-		idr_remove(&mem_cgroup_idr, memcg->id.id);
 	__mem_cgroup_free(memcg);
 	return NULL;
 }
@@ -4246,10 +4237,17 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	int i;
+
+	i = idr_alloc(&mem_cgroup_idr, memcg, 1, MEM_CGROUP_ID_MAX, GFP_KERNEL);
+	if (i < 0)
+		return i;
 
 	/* Online state pins memcg ID, memcg ID pins CSS */
+	memcg->id.id = i;
 	atomic_set(&memcg->id.ref, 1);
 	css_get(css);
+
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-27 19:31                           ` Johannes Weiner
@ 2018-07-29 19:26                             ` Vladimir Davydov
  2018-07-30 15:31                               ` Johannes Weiner
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Davydov @ 2018-07-29 19:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Fri, Jul 27, 2018 at 03:31:34PM -0400, Johannes Weiner wrote:
> That said, the lifetime of the root reference on the ID is the online
> state, we put that in css_offline. Is there a reason we need to have
> the ID ready and the memcg in the IDR before onlining it?

I fail to see any reason for this in the code.

> Can we do something like this and not mess with the alloc/free
> sequence at all?

I guess so, and this definitely looks better to me.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-29 19:26                             ` Vladimir Davydov
@ 2018-07-30 15:31                               ` Johannes Weiner
  2018-07-31 23:39                                 ` Andrew Morton
  2018-08-01 16:16                                 ` [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure Vladimir Davydov
  0 siblings, 2 replies; 23+ messages in thread
From: Johannes Weiner @ 2018-07-30 15:31 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Andrew Morton, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Sun, Jul 29, 2018 at 10:26:21PM +0300, Vladimir Davydov wrote:
> On Fri, Jul 27, 2018 at 03:31:34PM -0400, Johannes Weiner wrote:
> > That said, the lifetime of the root reference on the ID is the online
> > state, we put that in css_offline. Is there a reason we need to have
> > the ID ready and the memcg in the IDR before onlining it?
> 
> I fail to see any reason for this in the code.

Me neither, thanks for double checking.

The patch also survives stress testing cgroup creation and destruction
with the script from 73f576c04b94 ("mm: memcontrol: fix cgroup
creation failure after many small jobs").

> > Can we do something like this and not mess with the alloc/free
> > sequence at all?
> 
> I guess so, and this definitely looks better to me.

Cool, then I think we should merge Kirill's patch as the fix and mine
as a follow-up cleanup.

---

From b4106ea1f163479da805eceada60c942bd66e524 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 30 Jul 2018 11:03:55 -0400
Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
 unwinding

The memcg ID is allocated early in the multi-step memcg creation
process, which needs 2-step ID allocation and IDR publishing, as well
as two separate IDR cleanup/unwind sites on error.

Defer the IDR allocation until the last second during onlining to
eliminate all this complexity. There is no requirement to have the ID
and IDR entry earlier than that. And the root reference to the ID is
put in the offline path, so this matches nicely.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 12205159b462..12339ae779ca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4151,12 +4151,6 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg)
 		return NULL;
 
-	memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
-				 1, MEM_CGROUP_ID_MAX,
-				 GFP_KERNEL);
-	if (memcg->id.id < 0)
-		goto fail;
-
 	memcg->stat_cpu = alloc_percpu(struct mem_cgroup_stat_cpu);
 	if (!memcg->stat_cpu)
 		goto fail;
@@ -4183,10 +4177,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 #ifdef CONFIG_CGROUP_WRITEBACK
 	INIT_LIST_HEAD(&memcg->cgwb_list);
 #endif
-	idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
 	return memcg;
 fail:
-	mem_cgroup_id_remove(memcg);
 	__mem_cgroup_free(memcg);
 	return NULL;
 }
@@ -4245,7 +4237,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	return &memcg->css;
 fail:
-	mem_cgroup_id_remove(memcg);
 	mem_cgroup_free(memcg);
 	return ERR_PTR(-ENOMEM);
 }
@@ -4253,10 +4244,17 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+	int i;
+
+	i = idr_alloc(&mem_cgroup_idr, memcg, 1, MEM_CGROUP_ID_MAX, GFP_KERNEL);
+	if (i < 0)
+		return i;
 
 	/* Online state pins memcg ID, memcg ID pins CSS */
+	memcg->id.id = i;
 	atomic_set(&memcg->id.ref, 1);
 	css_get(css);
+
 	return 0;
 }
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-30 15:31                               ` Johannes Weiner
@ 2018-07-31 23:39                                 ` Andrew Morton
  2018-08-01 15:55                                   ` Johannes Weiner
  2018-08-01 16:16                                 ` [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure Vladimir Davydov
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2018-07-31 23:39 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Vladimir Davydov, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Mon, 30 Jul 2018 11:31:13 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:

> Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
>  unwinding
> 
> The memcg ID is allocated early in the multi-step memcg creation
> process, which needs 2-step ID allocation and IDR publishing, as well
> as two separate IDR cleanup/unwind sites on error.
> 
> Defer the IDR allocation until the last second during onlining to
> eliminate all this complexity. There is no requirement to have the ID
> and IDR entry earlier than that. And the root reference to the ID is
> put in the offline path, so this matches nicely.

This patch isn't aware of Kirill's later "mm, memcg: assign memcg-aware
shrinkers bitmap to memcg", which altered mem_cgroup_css_online():

@@ -4356,6 +4470,11 @@ static int mem_cgroup_css_online(struct
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+	if (memcg_alloc_shrinker_maps(memcg)) {
+		mem_cgroup_id_remove(memcg);
+		return -ENOMEM;
+	}
+
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	atomic_set(&memcg->id.ref, 1);
 	css_get(css);


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-31 23:39                                 ` Andrew Morton
@ 2018-08-01 15:55                                   ` Johannes Weiner
  2018-08-01 16:22                                     ` Vladimir Davydov
  0 siblings, 1 reply; 23+ messages in thread
From: Johannes Weiner @ 2018-08-01 15:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vladimir Davydov, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Tue, Jul 31, 2018 at 04:39:08PM -0700, Andrew Morton wrote:
> On Mon, 30 Jul 2018 11:31:13 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
> >  unwinding
> > 
> > The memcg ID is allocated early in the multi-step memcg creation
> > process, which needs 2-step ID allocation and IDR publishing, as well
> > as two separate IDR cleanup/unwind sites on error.
> > 
> > Defer the IDR allocation until the last second during onlining to
> > eliminate all this complexity. There is no requirement to have the ID
> > and IDR entry earlier than that. And the root reference to the ID is
> > put in the offline path, so this matches nicely.
> 
> This patch isn't aware of Kirill's later "mm, memcg: assign memcg-aware
> shrinkers bitmap to memcg", which altered mem_cgroup_css_online():
> 
> @@ -4356,6 +4470,11 @@ static int mem_cgroup_css_online(struct
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  
> +	if (memcg_alloc_shrinker_maps(memcg)) {
> +		mem_cgroup_id_remove(memcg);
> +		return -ENOMEM;
> +	}
> +
>  	/* Online state pins memcg ID, memcg ID pins CSS */
>  	atomic_set(&memcg->id.ref, 1);
>  	css_get(css);
> 

Hm, that looks out of place too. The bitmaps are allocated for the
entire lifetime of the css, not just while it's online.

Any objections to the following fixup to that patch?

From bbb785f1daca74024232aa34ba29a8a108556ace Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 1 Aug 2018 11:42:55 -0400
Subject: [PATCH] mm, memcg: assign memcg-aware shrinkers bitmap to memcg fix

The shrinker bitmap allocation is a bit out of place in the css
onlining path.

Allocate and free those bitmaps as part of the memcg alloc and free
procedures.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/memcontrol.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c9098200326f..82eb40b715da 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4342,6 +4342,7 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
 {
 	int node;
 
+	memcg_free_shrinker_maps(memcg);
 	for_each_node(node)
 		free_mem_cgroup_per_node_info(memcg, node);
 	free_percpu(memcg->stat_cpu);
@@ -4381,6 +4382,9 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 		if (alloc_mem_cgroup_per_node_info(memcg, node))
 			goto fail;
 
+	if (memcg_alloc_shrinker_maps(memcg))
+		goto fail;
+
 	if (memcg_wb_domain_init(memcg, GFP_KERNEL))
 		goto fail;
 
@@ -4470,11 +4474,6 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
-	if (memcg_alloc_shrinker_maps(memcg)) {
-		mem_cgroup_id_remove(memcg);
-		return -ENOMEM;
-	}
-
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	atomic_set(&memcg->id.ref, 1);
 	css_get(css);
@@ -4527,7 +4526,6 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 	vmpressure_cleanup(&memcg->vmpressure);
 	cancel_work_sync(&memcg->high_work);
 	mem_cgroup_remove_from_trees(memcg);
-	memcg_free_shrinker_maps(memcg);
 	memcg_free_kmem(memcg);
 	mem_cgroup_free(memcg);
 }
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-07-30 15:31                               ` Johannes Weiner
  2018-07-31 23:39                                 ` Andrew Morton
@ 2018-08-01 16:16                                 ` Vladimir Davydov
  1 sibling, 0 replies; 23+ messages in thread
From: Vladimir Davydov @ 2018-08-01 16:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Mon, Jul 30, 2018 at 11:31:13AM -0400, Johannes Weiner wrote:
> On Sun, Jul 29, 2018 at 10:26:21PM +0300, Vladimir Davydov wrote:
> > On Fri, Jul 27, 2018 at 03:31:34PM -0400, Johannes Weiner wrote:
> > > That said, the lifetime of the root reference on the ID is the online
> > > state, we put that in css_offline. Is there a reason we need to have
> > > the ID ready and the memcg in the IDR before onlining it?
> > 
> > I fail to see any reason for this in the code.
> 
> Me neither, thanks for double checking.
> 
> The patch also survives stress testing cgroup creation and destruction
> with the script from 73f576c04b94 ("mm: memcontrol: fix cgroup
> creation failure after many small jobs").
> 
> > > Can we do something like this and not mess with the alloc/free
> > > sequence at all?
> > 
> > I guess so, and this definitely looks better to me.
> 
> Cool, then I think we should merge Kirill's patch as the fix and mine
> as a follow-up cleanup.
> 
> ---
> 
> From b4106ea1f163479da805eceada60c942bd66e524 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Mon, 30 Jul 2018 11:03:55 -0400
> Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
>  unwinding
> 
> The memcg ID is allocated early in the multi-step memcg creation
> process, which needs 2-step ID allocation and IDR publishing, as well
> as two separate IDR cleanup/unwind sites on error.
> 
> Defer the IDR allocation until the last second during onlining to
> eliminate all this complexity. There is no requirement to have the ID
> and IDR entry earlier than that. And the root reference to the ID is
> put in the offline path, so this matches nicely.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-08-01 15:55                                   ` Johannes Weiner
@ 2018-08-01 16:22                                     ` Vladimir Davydov
  2018-08-02  8:03                                       ` Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Davydov @ 2018-08-01 16:22 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, Kirill Tkhai, cgroups, linux-mm,
	linux-kernel

On Wed, Aug 01, 2018 at 11:55:52AM -0400, Johannes Weiner wrote:
> On Tue, Jul 31, 2018 at 04:39:08PM -0700, Andrew Morton wrote:
> > On Mon, 30 Jul 2018 11:31:13 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> > 
> > > Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
> > >  unwinding
> > > 
> > > The memcg ID is allocated early in the multi-step memcg creation
> > > process, which needs 2-step ID allocation and IDR publishing, as well
> > > as two separate IDR cleanup/unwind sites on error.
> > > 
> > > Defer the IDR allocation until the last second during onlining to
> > > eliminate all this complexity. There is no requirement to have the ID
> > > and IDR entry earlier than that. And the root reference to the ID is
> > > put in the offline path, so this matches nicely.
> > 
> > This patch isn't aware of Kirill's later "mm, memcg: assign memcg-aware
> > shrinkers bitmap to memcg", which altered mem_cgroup_css_online():
> > 
> > @@ -4356,6 +4470,11 @@ static int mem_cgroup_css_online(struct
> >  {
> >  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> >  
> > +	if (memcg_alloc_shrinker_maps(memcg)) {
> > +		mem_cgroup_id_remove(memcg);
> > +		return -ENOMEM;
> > +	}
> > +
> >  	/* Online state pins memcg ID, memcg ID pins CSS */
> >  	atomic_set(&memcg->id.ref, 1);
> >  	css_get(css);
> > 
> 
> Hm, that looks out of place too. The bitmaps are allocated for the
> entire lifetime of the css, not just while it's online.
> 
> Any objections to the following fixup to that patch?

That would be incorrect. Memory cgroups that haven't been put online
are invisible to for_each_mem_cgroup(), which is used for expanding
shrinker maps of all cgroups - see memcg_expand_shrinker_maps(). So if
memcg_expand_shrinker_maps() is called between css_alloc and css_online,
it will miss this cgroup and its shrinker_map won't be reallocated to
fit the new id. Allocating the shrinker map in css_online guarantees
that it won't happen. Looks like this code lacks a comment...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure
  2018-08-01 16:22                                     ` Vladimir Davydov
@ 2018-08-02  8:03                                       ` Kirill Tkhai
  2018-08-02  8:13                                         ` [PATCH] memcg: Add comment to mem_cgroup_css_online() Kirill Tkhai
  0 siblings, 1 reply; 23+ messages in thread
From: Kirill Tkhai @ 2018-08-02  8:03 UTC (permalink / raw)
  To: Vladimir Davydov, Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, cgroups, linux-mm, linux-kernel

On 01.08.2018 19:22, Vladimir Davydov wrote:
> On Wed, Aug 01, 2018 at 11:55:52AM -0400, Johannes Weiner wrote:
>> On Tue, Jul 31, 2018 at 04:39:08PM -0700, Andrew Morton wrote:
>>> On Mon, 30 Jul 2018 11:31:13 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>
>>>> Subject: [PATCH] mm: memcontrol: simplify memcg idr allocation and error
>>>>  unwinding
>>>>
>>>> The memcg ID is allocated early in the multi-step memcg creation
>>>> process, which needs 2-step ID allocation and IDR publishing, as well
>>>> as two separate IDR cleanup/unwind sites on error.
>>>>
>>>> Defer the IDR allocation until the last second during onlining to
>>>> eliminate all this complexity. There is no requirement to have the ID
>>>> and IDR entry earlier than that. And the root reference to the ID is
>>>> put in the offline path, so this matches nicely.
>>>
>>> This patch isn't aware of Kirill's later "mm, memcg: assign memcg-aware
>>> shrinkers bitmap to memcg", which altered mem_cgroup_css_online():
>>>
>>> @@ -4356,6 +4470,11 @@ static int mem_cgroup_css_online(struct
>>>  {
>>>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>>>  
>>> +	if (memcg_alloc_shrinker_maps(memcg)) {
>>> +		mem_cgroup_id_remove(memcg);
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>>  	/* Online state pins memcg ID, memcg ID pins CSS */
>>>  	atomic_set(&memcg->id.ref, 1);
>>>  	css_get(css);
>>>
>>
>> Hm, that looks out of place too. The bitmaps are allocated for the
>> entire lifetime of the css, not just while it's online.
>>
>> Any objections to the following fixup to that patch?
> 
> That would be incorrect. Memory cgroups that haven't been put online
> are invisible to for_each_mem_cgroup(), which is used for expanding
> shrinker maps of all cgroups - see memcg_expand_shrinker_maps(). So if
> memcg_expand_shrinker_maps() is called between css_alloc and css_online,
> it will miss this cgroup and its shrinker_map won't be reallocated to
> fit the new id. Allocating the shrinker map in css_online guarantees
> that it won't happen.

Yes, doubtless.

>Looks like this code lacks a comment...

Ok.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] memcg: Add comment to mem_cgroup_css_online()
  2018-08-02  8:03                                       ` Kirill Tkhai
@ 2018-08-02  8:13                                         ` Kirill Tkhai
  0 siblings, 0 replies; 23+ messages in thread
From: Kirill Tkhai @ 2018-08-02  8:13 UTC (permalink / raw)
  To: Vladimir Davydov, Johannes Weiner
  Cc: Andrew Morton, Michal Hocko, cgroups, linux-mm, linux-kernel

Explain relationships between allocation and expanding.

Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d90993ef1d7d..34e5ff72ce87 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4703,6 +4703,11 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 {
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+	/*
+	 * A memcg must be visible for memcg_expand_shrinker_maps()
+	 * by the time the maps are allocated. So, we allocate maps
+	 * here, when for_each_mem_cgroup() can't skip it.
+	 */
 	if (memcg_alloc_shrinker_maps(memcg)) {
 		mem_cgroup_id_remove(memcg);
 		return -ENOMEM;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-08-02  8:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 14:52 [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure Kirill Tkhai
2018-04-13  8:55 ` Michal Hocko
2018-04-13  9:35   ` Kirill Tkhai
2018-04-13 11:02     ` Michal Hocko
2018-04-13 11:06       ` Kirill Tkhai
2018-04-13 11:20         ` Michal Hocko
2018-04-13 11:29           ` Kirill Tkhai
2018-04-13 11:38             ` Michal Hocko
2018-04-13 11:49               ` Kirill Tkhai
2018-04-13 11:54                 ` Michal Hocko
2018-04-13 12:07                   ` Kirill Tkhai
2018-04-13 12:14                     ` Michal Hocko
2018-04-13 12:51                       ` Michal Hocko
2018-07-26 23:25                         ` Andrew Morton
2018-07-27 19:31                           ` Johannes Weiner
2018-07-29 19:26                             ` Vladimir Davydov
2018-07-30 15:31                               ` Johannes Weiner
2018-07-31 23:39                                 ` Andrew Morton
2018-08-01 15:55                                   ` Johannes Weiner
2018-08-01 16:22                                     ` Vladimir Davydov
2018-08-02  8:03                                       ` Kirill Tkhai
2018-08-02  8:13                                         ` [PATCH] memcg: Add comment to mem_cgroup_css_online() Kirill Tkhai
2018-08-01 16:16                                 ` [PATCH] memcg: Remove memcg_cgroup::id from IDR on mem_cgroup_css_alloc() failure Vladimir Davydov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).