* [PATCH] memcg: enable accounting for pids in nested pid namespaces
@ 2021-04-22  5:44 Vasily Averin
  2021-04-24 11:54   ` Vasily Averin
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Vasily Averin @ 2021-04-22  5:44 UTC (permalink / raw)
  To: cgroups, Michal Hocko
  Cc: Christian Brauner, Serge Hallyn, Roman Gushchin

init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
was left disabled for nested pid namespaces.

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 kernel/pid_namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 6cd6715..a46a372 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
 	mutex_lock(&pid_caches_mutex);
 	/* Name collision forces to do allocation under mutex. */
 	if (!*pkc)
-		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
+		*pkc = kmem_cache_create(name, len, 0,
+					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
 	mutex_unlock(&pid_caches_mutex);
 	/* current can fail, but someone else can succeed. */
 	return READ_ONCE(*pkc);
-- 
1.8.3.1
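
As an illustration only (not part of the patch, and not kernel code), here is a minimal user-space sketch of what the one-flag change means: an allocation charges the owning group's counter only when its cache carries an accounting flag. The names `toy_cache`, `toy_memcg`, `toy_alloc`, `TOY_HWCACHE_ALIGN` and `TOY_ACCOUNT` are hypothetical, the flag values are not the kernel's, and the model ignores everything else the slab allocator does.

```c
#include <stddef.h>

/* Hypothetical flag bits; they stand in for SLAB_HWCACHE_ALIGN and
 * SLAB_ACCOUNT but do not use the kernel's real values. */
#define TOY_HWCACHE_ALIGN 0x1u
#define TOY_ACCOUNT       0x2u

/* Hypothetical stand-in for a memory cgroup's usage counter. */
struct toy_memcg {
	size_t charged_bytes;
};

/* Hypothetical stand-in for a slab cache: an object size plus flags. */
struct toy_cache {
	size_t object_size;
	unsigned int flags;
};

/* Model one allocation: charge the group only if the cache is accounted.
 * Returns the number of bytes charged. */
static size_t toy_alloc(const struct toy_cache *cache, struct toy_memcg *memcg)
{
	if (!(cache->flags & TOY_ACCOUNT))
		return 0;	/* unaccounted: the group's counter does not move */
	memcg->charged_bytes += cache->object_size;
	return cache->object_size;
}
```

In this toy model, a cache created with only `TOY_HWCACHE_ALIGN` never moves the counter, while the same cache with `TOY_HWCACHE_ALIGN | TOY_ACCOUNT` charges every object, which mirrors the before and after lines of the diff.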


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
       [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2021-04-23  1:00   ` Roman Gushchin
  2021-04-23  2:09     ` Vasily Averin
  2021-04-23 16:54   ` Michal Koutný
  2021-07-14  7:43   ` Christian Brauner
  2 siblings, 1 reply; 16+ messages in thread
From: Roman Gushchin @ 2021-04-23  1:00 UTC (permalink / raw)
  To: Vasily Averin
  Cc: cgroups, Michal Hocko, Christian Brauner,
	Serge Hallyn

On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
> was left disabled for nested pid namespaces.
> 
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---
>  kernel/pid_namespace.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6cd6715..a46a372 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
>  	mutex_lock(&pid_caches_mutex);
>  	/* Name collision forces to do allocation under mutex. */
>  	if (!*pkc)
> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
> +		*pkc = kmem_cache_create(name, len, 0,
> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
>  	mutex_unlock(&pid_caches_mutex);
>  	/* current can fail, but someone else can succeed. */
>  	return READ_ONCE(*pkc);
> -- 
> 1.8.3.1
> 

It looks good to me! It makes total sense to apply the same rules to the root
and non-root levels.

Acked-by: Roman Gushchin <guro@fb.com>

Btw, is there any reason why this patch is not included into the series?

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
  2021-04-23  1:00   ` [PATCH] " Roman Gushchin
@ 2021-04-23  2:09     ` Vasily Averin
       [not found]       ` <38945563-59ad-fb5e-9f7f-eb65ae4bf55e-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vasily Averin @ 2021-04-23  2:09 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: cgroups, Michal Hocko, Christian Brauner,
	Serge Hallyn

On 4/23/21 4:00 AM, Roman Gushchin wrote:
> On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
>> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
>> was left disabled for nested pid namespaces.
>>
>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>> ---
>>  kernel/pid_namespace.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
>> index 6cd6715..a46a372 100644
>> --- a/kernel/pid_namespace.c
>> +++ b/kernel/pid_namespace.c
>> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
>>  	mutex_lock(&pid_caches_mutex);
>>  	/* Name collision forces to do allocation under mutex. */
>>  	if (!*pkc)
>> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
>> +		*pkc = kmem_cache_create(name, len, 0,
>> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
>>  	mutex_unlock(&pid_caches_mutex);
>>  	/* current can fail, but someone else can succeed. */
>>  	return READ_ONCE(*pkc);
>> -- 
>> 1.8.3.1
>>
> 
> It looks good to me! It makes total sense to apply the same rules to the root
> and non-root levels.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> 
> Btw, is there any reason why this patch is not included into the series?

It is a bugfix and I think it should be added to upstream ASAP.
The other patches add new functionality; they may raise questions or objections
and can wait anyway.

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
       [not found]       ` <38945563-59ad-fb5e-9f7f-eb65ae4bf55e-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2021-04-23  2:30         ` Roman Gushchin
  2021-04-23  2:53           ` Vasily Averin
  0 siblings, 1 reply; 16+ messages in thread
From: Roman Gushchin @ 2021-04-23  2:30 UTC (permalink / raw)
  To: Vasily Averin
  Cc: cgroups, Michal Hocko, Christian Brauner,
	Serge Hallyn

On Fri, Apr 23, 2021 at 05:09:01AM +0300, Vasily Averin wrote:
> On 4/23/21 4:00 AM, Roman Gushchin wrote:
> > On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
> >> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
> >> was left disabled for nested pid namespaces.
> >>
> >> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> >> ---
> >>  kernel/pid_namespace.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> >> index 6cd6715..a46a372 100644
> >> --- a/kernel/pid_namespace.c
> >> +++ b/kernel/pid_namespace.c
> >> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
> >>  	mutex_lock(&pid_caches_mutex);
> >>  	/* Name collision forces to do allocation under mutex. */
> >>  	if (!*pkc)
> >> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
> >> +		*pkc = kmem_cache_create(name, len, 0,
> >> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
> >>  	mutex_unlock(&pid_caches_mutex);
> >>  	/* current can fail, but someone else can succeed. */
> >>  	return READ_ONCE(*pkc);
> >> -- 
> >> 1.8.3.1
> >>
> > 
> > It looks good to me! It makes total sense to apply the same rules to the root
> > and non-root levels.
> > 
> > Acked-by: Roman Gushchin <guro@fb.com>
> > 
> > Btw, is there any reason why this patch is not included into the series?
> 
> It is a bugfix and I think it should be added to upstream ASAP.

Then it would be really useful to add some details on why it's a bug,
what kind of problems it causes, etc. If it has to be backported to
stable, please, add cc stable/fixes tag.

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
  2021-04-23  2:30         ` Roman Gushchin
@ 2021-04-23  2:53           ` Vasily Averin
       [not found]             ` <cd6680e3-edd0-88fa-bb83-b9f2d5a65d5b-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
  0 siblings, 1 reply; 16+ messages in thread
From: Vasily Averin @ 2021-04-23  2:53 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: cgroups, Michal Hocko, Christian Brauner,
	Serge Hallyn

On 4/23/21 5:30 AM, Roman Gushchin wrote:
> On Fri, Apr 23, 2021 at 05:09:01AM +0300, Vasily Averin wrote:
>> On 4/23/21 4:00 AM, Roman Gushchin wrote:
>>> On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
>>>> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
>>>> was left disabled for nested pid namespaces.
>>>>
>>>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
>>>> ---
>>>>  kernel/pid_namespace.c | 3 ++-
>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
>>>> index 6cd6715..a46a372 100644
>>>> --- a/kernel/pid_namespace.c
>>>> +++ b/kernel/pid_namespace.c
>>>> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
>>>>  	mutex_lock(&pid_caches_mutex);
>>>>  	/* Name collision forces to do allocation under mutex. */
>>>>  	if (!*pkc)
>>>> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
>>>> +		*pkc = kmem_cache_create(name, len, 0,
>>>> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
>>>>  	mutex_unlock(&pid_caches_mutex);
>>>>  	/* current can fail, but someone else can succeed. */
>>>>  	return READ_ONCE(*pkc);
>>>> -- 
>>>> 1.8.3.1
>>>>
>>>
>>> It looks good to me! It makes total sense to apply the same rules to the root
>>> and non-root levels.
>>>
>>> Acked-by: Roman Gushchin <guro@fb.com>
>>>
>>> Btw, is there any reason why this patch is not included into the series?
>>
>> It is a bugfix and I think it should be added to upstream ASAP.
> 
> Then it would be really useful to add some details on why it's a bug,
> what kind of problems it causes, etc. If it has to be backported to
> stable, please, add cc stable/fixes tag.

I mean, in this case we had already decided to account pids, but forgot to do it.
In the other cases we have not made a final decision about accounting.

I doubt we deliberately excluded pids in nested pid namespaces from accounting,
especially because they consume more memory.
One would expect all pids to be accounted, but in fact they are not.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
       [not found]             ` <cd6680e3-edd0-88fa-bb83-b9f2d5a65d5b-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2021-04-23  7:34               ` Christian Brauner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2021-04-23  7:34 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Roman Gushchin, cgroups, Michal Hocko,
	Serge Hallyn

On Fri, Apr 23, 2021 at 05:53:43AM +0300, Vasily Averin wrote:
> On 4/23/21 5:30 AM, Roman Gushchin wrote:
> > On Fri, Apr 23, 2021 at 05:09:01AM +0300, Vasily Averin wrote:
> >> On 4/23/21 4:00 AM, Roman Gushchin wrote:
> >>> On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
> >>>> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
> >>>> was left disabled for nested pid namespaces.
> >>>>
> >>>> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> >>>> ---
> >>>>  kernel/pid_namespace.c | 3 ++-
> >>>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> >>>> index 6cd6715..a46a372 100644
> >>>> --- a/kernel/pid_namespace.c
> >>>> +++ b/kernel/pid_namespace.c
> >>>> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
> >>>>  	mutex_lock(&pid_caches_mutex);
> >>>>  	/* Name collision forces to do allocation under mutex. */
> >>>>  	if (!*pkc)
> >>>> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
> >>>> +		*pkc = kmem_cache_create(name, len, 0,
> >>>> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
> >>>>  	mutex_unlock(&pid_caches_mutex);
> >>>>  	/* current can fail, but someone else can succeed. */
> >>>>  	return READ_ONCE(*pkc);
> >>>> -- 
> >>>> 1.8.3.1
> >>>>
> >>>
> >>> It looks good to me! It makes total sense to apply the same rules to the root
> >>> and non-root levels.
> >>>
> >>> Acked-by: Roman Gushchin <guro@fb.com>
> >>>
> >>> Btw, is there any reason why this patch is not included into the series?
> >>
> >> It is a bugfix and I think it should be added to upstream ASAP.
> > 
> > Then it would be really useful to add some details on why it's a bug,
> > what kind of problems it causes, etc. If it has to be backported to
> > stable, please, add cc stable/fixes tag.
> 
> I mean, in this case we had already decided to account pids, but forgot to do it.
> In the other cases we have not made a final decision about accounting.
> 
> I doubt we deliberately excluded pids in nested pid namespaces from accounting,
> especially because they consume more memory.
> One would expect all pids to be accounted, but in fact they are not.

As Roman said, you should probably explain this in the cover letter and
essentially justify why this should be backported. The thing with
changes such as this is that it's easy for someone to reply and go
"no one noticed for <n> years, so why do we care now?". Otherwise:

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
       [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
  2021-04-23  1:00   ` [PATCH] " Roman Gushchin
@ 2021-04-23 16:54   ` Michal Koutný
  2021-07-14  7:43   ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Michal Koutný @ 2021-04-23 16:54 UTC (permalink / raw)
  To: Vasily Averin
  Cc: cgroups, Michal Hocko, Christian Brauner,
	Serge Hallyn, Roman Gushchin

On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin <vvs@virtuozzo.com> wrote:
> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
> was left disabled for nested pid namespaces.
Good catch.
A cursory grep of user_namespace.c and nsproxy.c suggests it's the only
case of a namespace-induced new cache.

Reviewed-by: Michal Koutný <mkoutny@suse.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 0/1] memcg: enable accounting for pids in nested pid namespaces
       [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2021-04-24 11:54   ` Vasily Averin
  2021-04-23 16:54   ` Michal Koutný
  2021-07-14  7:43   ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Vasily Averin @ 2021-04-24 11:54 UTC (permalink / raw)
  To: Michal Hocko, cgroups
  Cc: linux-kernel, Roman Gushchin, Christian Brauner,
	Michal Koutný,
	Serge Hallyn

Pid was one of the first kernel objects enabled for memcg accounting, see
5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg").

init_pid_ns.pid_cachep is marked with SLAB_ACCOUNT, so one would expect any
new pid in the system to be memcg-accounted.

However, I recently noticed that this is not the case. Nested pid namespaces
create their own slab caches for pid objects; nested pids are larger because
they contain an id for each parent pid namespace as well as for their own.
The problem is that these slab caches are _NOT_ marked with SLAB_ACCOUNT;
as a result, pids allocated in nested pid namespaces are not memcg-accounted.

A pid struct in a nested pid namespace consumes up to 500 bytes of memory,
so 100000 such objects amount to up to ~50 MB of unaccounted memory.
This allows a container to exceed its assigned memcg limits.

To me this looks like a bug, and I would like to ask that this fix be pushed
both upstream and to stable.

Vasily Averin (1):
  memcg: enable accounting for pids in nested pid namespaces

 kernel/pid_namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
1.8.3.1
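
The estimate in the cover letter can be checked with a short calculation. The sketch below is illustrative only; the helper names `unaccounted_bytes` and `bytes_to_mb` are hypothetical, and the inputs are the upper-bound figures quoted above (500 bytes per pid, 100000 pids), not measurements.

```c
/* Upper bound on memory that escapes accounting: object count times
 * the per-object size. */
static unsigned long long unaccounted_bytes(unsigned long long objects,
					    unsigned long long bytes_per_object)
{
	return objects * bytes_per_object;
}

/* Convert bytes to decimal megabytes (1 MB = 1000000 bytes), rounding down. */
static unsigned long long bytes_to_mb(unsigned long long bytes)
{
	return bytes / 1000000ULL;
}
```

With the figures from the cover letter, `unaccounted_bytes(100000, 500)` is 50000000 bytes, which `bytes_to_mb` reports as 50 MB (about 47.7 MiB).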


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/1] memcg: enable accounting for pids in nested pid namespaces
       [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
@ 2021-04-24 11:54   ` Vasily Averin
  2021-04-23 16:54   ` Michal Koutný
  2021-07-14  7:43   ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Vasily Averin @ 2021-04-24 11:54 UTC (permalink / raw)
  To: Michal Hocko, cgroups
  Cc: linux-kernel, Roman Gushchin, Christian Brauner,
	Michal Koutný,
	Serge Hallyn

Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
but forgot to adjust the setting for nested pid namespaces.
As a result, pid memory is not accounted exactly where it is really needed,
inside memcg-limited containers with their own pid namespaces.

Pid was one of the first kernel objects enabled for memcg accounting.
init_pid_ns.pid_cachep is marked with SLAB_ACCOUNT, so one would expect
any new pid in the system to be memcg-accounted.

However, I recently noticed that this is not the case. Nested pid namespaces
create their own slab caches for pid objects; nested pids are larger because
they contain an id for each parent pid namespace as well as for their own.
The problem is that these slab caches are _NOT_ marked with SLAB_ACCOUNT;
as a result, pids allocated in nested pid namespaces are not memcg-accounted.

A pid struct in a nested pid namespace consumes up to 500 bytes of memory,
so 100000 such objects amount to up to ~50 MB of unaccounted memory,
which allows a container to exceed its assigned memcg limits.

Fixes: 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Roman Gushchin <guro@fb.com>
---
 kernel/pid_namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 6cd6715..a46a372 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
 	mutex_lock(&pid_caches_mutex);
 	/* Name collision forces to do allocation under mutex. */
 	if (!*pkc)
-		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
+		*pkc = kmem_cache_create(name, len, 0,
+					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
 	mutex_unlock(&pid_caches_mutex);
 	/* current can fail, but someone else can succeed. */
 	return READ_ONCE(*pkc);
-- 
1.8.3.1
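
As an illustration only (not part of the patch), the following user-space sketch models why a pid object grows with nesting depth, assuming, as the commit message describes, one id record per pid namespace the pid is visible in. The types `toy_upid` and `toy_pid` and the helper `toy_pid_size` are hypothetical and do not reproduce the kernel's actual structure layouts or sizes.

```c
#include <stddef.h>

/* Hypothetical per-namespace record: the id seen in one pid namespace. */
struct toy_upid {
	int nr;		/* numeric id within that namespace */
	void *ns;	/* stand-in for a pointer to the namespace */
};

/* Hypothetical pid object: a fixed header plus one record per level. */
struct toy_pid {
	unsigned int level;		/* nesting depth; 0 is the initial namespace */
	struct toy_upid numbers[];	/* level + 1 entries follow the header */
};

/* Bytes needed for a pid that lives `level` namespaces below the initial
 * one: the header plus one record for each namespace on the path. */
static size_t toy_pid_size(unsigned int level)
{
	return sizeof(struct toy_pid) + ((size_t)level + 1) * sizeof(struct toy_upid);
}
```

In this model each extra nesting level adds exactly one `struct toy_upid` to the object, so objects of different depths have different sizes and fit naturally into one cache per level. Those per-level caches are the ones the patch marks with SLAB_ACCOUNT.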


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] memcg: enable accounting for pids in nested pid namespaces
  2021-04-24 11:54   ` Vasily Averin
@ 2021-04-26 19:39     ` Shakeel Butt
  -1 siblings, 0 replies; 16+ messages in thread
From: Shakeel Butt @ 2021-04-26 19:39 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Michal Hocko, Cgroups, LKML, Roman Gushchin, Christian Brauner,
	Michal Koutný,
	Serge Hallyn

On Sat, Apr 24, 2021 at 4:54 AM Vasily Averin <vvs@virtuozzo.com> wrote:
>
> Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
> but forgot to adjust the setting for nested pid namespaces.
> As a result, pid memory is not accounted exactly where it is really needed,
> inside memcg-limited containers with their own pid namespaces.
>
> Pid was one of the first kernel objects enabled for memcg accounting.
> init_pid_ns.pid_cachep is marked with SLAB_ACCOUNT, so one would expect
> any new pid in the system to be memcg-accounted.
>
> However, I recently noticed that this is not the case. Nested pid namespaces
> create their own slab caches for pid objects; nested pids are larger because
> they contain an id for each parent pid namespace as well as for their own.
> The problem is that these slab caches are _NOT_ marked with SLAB_ACCOUNT;
> as a result, pids allocated in nested pid namespaces are not memcg-accounted.
>
> A pid struct in a nested pid namespace consumes up to 500 bytes of memory,
> so 100000 such objects amount to up to ~50 MB of unaccounted memory,
> which allows a container to exceed its assigned memcg limits.
>
> Fixes: 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> Acked-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/1] memcg: enable accounting for pids in nested pid namespaces
@ 2021-07-14  6:31     ` Vasily Averin
  0 siblings, 0 replies; 16+ messages in thread
From: Vasily Averin @ 2021-07-14  6:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Roman Gushchin, Christian Brauner,
	Michal Koutný,
	Serge Hallyn, cgroups, Michal Hocko

Dear Andrew,
could you please pick up this patch and add
 Reviewed-by: Shakeel Butt <shakeelb@google.com>

Thank you,
	Vasily Averin

On 4/24/21 2:54 PM, Vasily Averin wrote:
> Commit 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> enabled memcg accounting for pids allocated from init_pid_ns.pid_cachep,
> but forgot to adjust the setting for nested pid namespaces.
> As a result, pid memory is not accounted exactly where it is really needed,
> inside memcg-limited containers with their own pid namespaces.
> 
> Pid was one of the first kernel objects enabled for memcg accounting.
> init_pid_ns.pid_cachep is marked with SLAB_ACCOUNT, so one would expect
> any new pid in the system to be memcg-accounted.
> 
> However, I recently noticed that this is not the case. Nested pid namespaces
> create their own slab caches for pid objects; nested pids are larger because
> they contain an id for each parent pid namespace as well as for their own.
> The problem is that these slab caches are _NOT_ marked with SLAB_ACCOUNT;
> as a result, pids allocated in nested pid namespaces are not memcg-accounted.
> 
> A pid struct in a nested pid namespace consumes up to 500 bytes of memory,
> so 100000 such objects amount to up to ~50 MB of unaccounted memory,
> which allows a container to exceed its assigned memcg limits.
> 
> Fixes: 5d097056c9a0 ("kmemcg: account certain kmem allocations to memcg")
> Cc: stable@vger.kernel.org
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> Reviewed-by: Michal Koutný <mkoutny@suse.com>
> Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> ---
>  kernel/pid_namespace.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 6cd6715..a46a372 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -51,7 +51,8 @@ static struct kmem_cache *create_pid_cachep(unsigned int level)
>  	mutex_lock(&pid_caches_mutex);
>  	/* Name collision forces to do allocation under mutex. */
>  	if (!*pkc)
> -		*pkc = kmem_cache_create(name, len, 0, SLAB_HWCACHE_ALIGN, 0);
> +		*pkc = kmem_cache_create(name, len, 0,
> +					 SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT, 0);
>  	mutex_unlock(&pid_caches_mutex);
>  	/* current can fail, but someone else can succeed. */
>  	return READ_ONCE(*pkc);
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] memcg: enable accounting for pids in nested pid namespaces
       [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
  2021-04-23  1:00   ` [PATCH] " Roman Gushchin
  2021-04-23 16:54   ` Michal Koutný
@ 2021-07-14  7:43   ` Christian Brauner
  2 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2021-07-14  7:43 UTC (permalink / raw)
  To: Vasily Averin
  Cc: cgroups, Michal Hocko, Serge Hallyn,
	Roman Gushchin

On Thu, Apr 22, 2021 at 08:44:15AM +0300, Vasily Averin wrote:
> init_pid_ns.pid_cachep has memcg accounting enabled, but the setting
> was left disabled for nested pid namespaces.
> 
> Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
> ---

Not sure whether I already acked this, but it looks good,
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

Thread overview: 16+ messages
2021-04-22  5:44 [PATCH] memcg: enable accounting for pids in nested pid namespaces Vasily Averin
2021-04-24 11:54 ` [PATCH v2 0/1] " Vasily Averin
2021-04-24 11:54 ` [PATCH v2 1/1] " Vasily Averin
2021-04-26 19:39   ` Shakeel Butt
2021-07-14  6:31   ` Vasily Averin
     [not found] ` <7b777e22-5b0d-7444-343d-92cbfae5f8b4-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
2021-04-23  1:00   ` [PATCH] " Roman Gushchin
2021-04-23  2:09     ` Vasily Averin
     [not found]       ` <38945563-59ad-fb5e-9f7f-eb65ae4bf55e-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
2021-04-23  2:30         ` Roman Gushchin
2021-04-23  2:53           ` Vasily Averin
     [not found]             ` <cd6680e3-edd0-88fa-bb83-b9f2d5a65d5b-5HdwGun5lf+gSpxsJD1C4w@public.gmane.org>
2021-04-23  7:34               ` Christian Brauner
2021-04-23 16:54   ` Michal Koutný
2021-07-14  7:43   ` Christian Brauner