linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] pidns: limit the nesting depth of pid namespaces
@ 2012-10-12 12:30 Andrew Vagin
  2012-10-19 14:25 ` Andrey Wagin
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Andrew Vagin @ 2012-10-12 12:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Vagin, Andrew Morton, Oleg Nesterov, Cyrill Gorcunov,
	Eric W. Biederman, Pavel Emelyanov

'struct pid' is a "variable sized struct" - a header with an array
of upids at the end.

A size of the array depends on a level (depth) of pid namespaces. Now
a level of pidns is not limited, so 'struct pid' can be more than one
page.

Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
is not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.

I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
expand "struct pid" and it's more than enough for all known for me
use-cases.  When someone finds a reasonable use case, we can add a
config option or a sysctl parameter.

In addition it will reduce effect of another problem, when we have many
nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
will be called for each namespace and find_vpid will be called for each
process in a namespace. find_vpid will be called minimum max_level^2 / 2
times. The reason of that is that when we found a bit in pidmap, we
can't determine this pidns is top for this process or it isn't.

vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can do a system inaccessible for a long time.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
 kernel/pid_namespace.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index b051fa6..598bfb3 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -70,12 +70,18 @@ err_alloc:
 	return NULL;
 }
 
+/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
+#define MAX_PID_NS_LEVEL 32
+
 static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
 {
 	struct pid_namespace *ns;
 	unsigned int level = parent_pid_ns->level + 1;
 	int i, err = -ENOMEM;
 
+	if (level > MAX_PID_NS_LEVEL)
+		goto out;
+
 	ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
 	if (ns == NULL)
 		goto out;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-12 12:30 [PATCH] pidns: limit the nesting depth of pid namespaces Andrew Vagin
@ 2012-10-19 14:25 ` Andrey Wagin
  2012-10-20 15:21 ` Oleg Nesterov
  2012-10-23 23:56 ` Andrew Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Andrey Wagin @ 2012-10-19 14:25 UTC (permalink / raw)
  To: Andrew Morton, Oleg Nesterov
  Cc: Andrew Vagin, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov, linux-kernel

Hello Andrew and Oleg,

Andrew, what do you think about this patch? I reworked it according
with your comments to the previous version.

Oleg, could you send Ack in this version, if it's ok for you.

Thanks.

2012/10/12 Andrew Vagin <avagin@openvz.org>:
> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
>
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
>
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
>
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
>
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
>
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andrew Vagin <avagin@openvz.org>
> ---
>  kernel/pid_namespace.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index b051fa6..598bfb3 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
>         return NULL;
>  }
>
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
>  {
>         struct pid_namespace *ns;
>         unsigned int level = parent_pid_ns->level + 1;
>         int i, err = -ENOMEM;
>
> +       if (level > MAX_PID_NS_LEVEL)
> +               goto out;
> +
>         ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>         if (ns == NULL)
>                 goto out;
> --
> 1.7.1
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-12 12:30 [PATCH] pidns: limit the nesting depth of pid namespaces Andrew Vagin
  2012-10-19 14:25 ` Andrey Wagin
@ 2012-10-20 15:21 ` Oleg Nesterov
  2012-10-23 23:56 ` Andrew Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Oleg Nesterov @ 2012-10-20 15:21 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: linux-kernel, Andrew Morton, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov

On 10/12, Andrew Vagin wrote:
>
> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
>
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
>
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
>
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
>
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
>
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andrew Vagin <avagin@openvz.org>


Acked-by: Oleg Nesterov <oleg@redhat.com>



> ---
>  kernel/pid_namespace.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index b051fa6..598bfb3 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
>  	return NULL;
>  }
>
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
>  {
>  	struct pid_namespace *ns;
>  	unsigned int level = parent_pid_ns->level + 1;
>  	int i, err = -ENOMEM;
>
> +	if (level > MAX_PID_NS_LEVEL)
> +		goto out;
> +
>  	ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>  	if (ns == NULL)
>  		goto out;
> --
> 1.7.1
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-12 12:30 [PATCH] pidns: limit the nesting depth of pid namespaces Andrew Vagin
  2012-10-19 14:25 ` Andrey Wagin
  2012-10-20 15:21 ` Oleg Nesterov
@ 2012-10-23 23:56 ` Andrew Morton
  2012-10-24  5:38   ` Andrey Wagin
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2012-10-23 23:56 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: linux-kernel, Oleg Nesterov, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov

On Fri, 12 Oct 2012 16:30:42 +0400
Andrew Vagin <avagin@openvz.org> wrote:

> 'struct pid' is a "variable sized struct" - a header with an array
> of upids at the end.
> 
> A size of the array depends on a level (depth) of pid namespaces. Now
> a level of pidns is not limited, so 'struct pid' can be more than one
> page.
> 
> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
> is not calculated from PAGE_SIZE, because in this case it depends on
> architectures, config options and it will be reduced, if someone adds a
> new fields in struct pid or struct upid.
> 
> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
> expand "struct pid" and it's more than enough for all known for me
> use-cases.  When someone finds a reasonable use case, we can add a
> config option or a sysctl parameter.
> 
> In addition it will reduce effect of another problem, when we have many
> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
> will be called for each namespace and find_vpid will be called for each
> process in a namespace. find_vpid will be called minimum max_level^2 / 2
> times. The reason of that is that when we found a bit in pidmap, we
> can't determine this pidns is top for this process or it isn't.
> 
> vpid is a heavy operation, so a fork bomb, which create many nested
> namespace, can do a system inaccessible for a long time.
> 
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -70,12 +70,18 @@ err_alloc:
>  	return NULL;
>  }
>  
> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
> +#define MAX_PID_NS_LEVEL 32
> +
>  static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
>  {
>  	struct pid_namespace *ns;
>  	unsigned int level = parent_pid_ns->level + 1;
>  	int i, err = -ENOMEM;
>  
> +	if (level > MAX_PID_NS_LEVEL)
> +		goto out;
> +
>  	ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>  	if (ns == NULL)
>  		goto out;

hm, OK.  This will kind-of fix the free_pid_ns()-excessive-recursion
issue which we already fixed by other means ;)

I don't think this problem is serious enough to bother backporting into
-stable but I guess we can/should do it in 3.7.  Agree?

I think that returning -ENOMEM in response to an excessive nesting
attempt is misleading - the system *didn't* run out of memory.  EINVAL
is better?



From: Andrew Morton <akpm@linux-foundation.org>
Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix

return -EINVAL in response to excessive nesting, not -ENOMEM

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 kernel/pid_namespace.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff -puN kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix kernel/pid_namespace.c
--- a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix
+++ a/kernel/pid_namespace.c
@@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_
 {
 	struct pid_namespace *ns;
 	unsigned int level = parent_pid_ns->level + 1;
-	int i, err = -ENOMEM;
+	int i;
+	int err;
 
-	if (level > MAX_PID_NS_LEVEL)
+	if (level > MAX_PID_NS_LEVEL) {
+		err = -EINVAL;
 		goto out;
+	}
 
+	err = -ENOMEM;
 	ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
 	if (ns == NULL)
 		goto out;
_


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-23 23:56 ` Andrew Morton
@ 2012-10-24  5:38   ` Andrey Wagin
  2012-10-24 19:46     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Andrey Wagin @ 2012-10-24  5:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Oleg Nesterov, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov

2012/10/24 Andrew Morton <akpm@linux-foundation.org>:
> On Fri, 12 Oct 2012 16:30:42 +0400
> Andrew Vagin <avagin@openvz.org> wrote:
>
>> 'struct pid' is a "variable sized struct" - a header with an array
>> of upids at the end.
>>
>> A size of the array depends on a level (depth) of pid namespaces. Now
>> a level of pidns is not limited, so 'struct pid' can be more than one
>> page.
>>
>> Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
>> is not calculated from PAGE_SIZE, because in this case it depends on
>> architectures, config options and it will be reduced, if someone adds a
>> new fields in struct pid or struct upid.
>>
>> I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
>> expand "struct pid" and it's more than enough for all known for me
>> use-cases.  When someone finds a reasonable use case, we can add a
>> config option or a sysctl parameter.
>>
>> In addition it will reduce effect of another problem, when we have many
>> nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
>> will be called for each namespace and find_vpid will be called for each
>> process in a namespace. find_vpid will be called minimum max_level^2 / 2
>> times. The reason of that is that when we found a bit in pidmap, we
>> can't determine this pidns is top for this process or it isn't.
>>
>> vpid is a heavy operation, so a fork bomb, which create many nested
>> namespace, can do a system inaccessible for a long time.
>>
>> --- a/kernel/pid_namespace.c
>> +++ b/kernel/pid_namespace.c
>> @@ -70,12 +70,18 @@ err_alloc:
>>       return NULL;
>>  }
>>
>> +/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
>> +#define MAX_PID_NS_LEVEL 32
>> +
>>  static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
>>  {
>>       struct pid_namespace *ns;
>>       unsigned int level = parent_pid_ns->level + 1;
>>       int i, err = -ENOMEM;
>>
>> +     if (level > MAX_PID_NS_LEVEL)
>> +             goto out;
>> +
>>       ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>>       if (ns == NULL)
>>               goto out;
>
> hm, OK.  This will kind-of fix the free_pid_ns()-excessive-recursion
> issue which we already fixed by other means ;)

I caught both problems with help of an one program, which create many
nested namespaces. Actually this patch prevents another problem. A few
thousand nested namespaces is destroyed for more than one minutes and
in this period a system is inaccessible.

>
> I don't think this problem is serious enough to bother backporting into
> -stable but I guess we can/should do it in 3.7.  Agree?

Yes.

>
> I think that returning -ENOMEM in response to an excessive nesting
> attempt is misleading - the system *didn't* run out of memory.  EINVAL
> is better?

I chose ENOMEM by analogy with max_pid.  When a new PID can not be
allocated, ENOMEM is returned too.

>
>
>
> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: pidns-limit-the-nesting-depth-of-pid-namespaces-fix
>
> return -EINVAL in response to excessive nesting, not -ENOMEM
>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Andrew Vagin <avagin@openvz.org>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  kernel/pid_namespace.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff -puN kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix kernel/pid_namespace.c
> --- a/kernel/pid_namespace.c~pidns-limit-the-nesting-depth-of-pid-namespaces-fix
> +++ a/kernel/pid_namespace.c
> @@ -78,11 +78,15 @@ static struct pid_namespace *create_pid_
>  {
>         struct pid_namespace *ns;
>         unsigned int level = parent_pid_ns->level + 1;
> -       int i, err = -ENOMEM;
> +       int i;
> +       int err;
>
> -       if (level > MAX_PID_NS_LEVEL)
> +       if (level > MAX_PID_NS_LEVEL) {
> +               err = -EINVAL;
>                 goto out;
> +       }
>
> +       err = -ENOMEM;
>         ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
>         if (ns == NULL)
>                 goto out;
> _
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-24  5:38   ` Andrey Wagin
@ 2012-10-24 19:46     ` Andrew Morton
  2012-10-25 15:02       ` Andrey Wagin
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2012-10-24 19:46 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: linux-kernel, Oleg Nesterov, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov

On Wed, 24 Oct 2012 09:38:57 +0400
Andrey Wagin <avagin@gmail.com> wrote:

> >
> > I think that returning -ENOMEM in response to an excessive nesting
> > attempt is misleading - the system *didn't* run out of memory.  EINVAL
> > is better?
> 
> I chose ENOMEM by analogy with max_pid.  When a new PID can not be
> allocated, ENOMEM is returned too.

I don't know what this means - please be carefully specific when
identifying kernel code.

If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is
appropriate there, because a failure *is* caused by memory allocation
failure.

But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't
tell the user "you ran out of memory" when he didn't!  -EINVAL isn't
really appropriate either ("Invalid argument") but it has become a
general you-screwed-up catchall and seems to me to be the most
appropriate errno we have available.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] pidns: limit the nesting depth of pid namespaces
  2012-10-24 19:46     ` Andrew Morton
@ 2012-10-25 15:02       ` Andrey Wagin
  0 siblings, 0 replies; 7+ messages in thread
From: Andrey Wagin @ 2012-10-25 15:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Oleg Nesterov, Cyrill Gorcunov, Eric W. Biederman,
	Pavel Emelyanov

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

2012/10/24 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 24 Oct 2012 09:38:57 +0400
> Andrey Wagin <avagin@gmail.com> wrote:
>
>> >
>> > I think that returning -ENOMEM in response to an excessive nesting
>> > attempt is misleading - the system *didn't* run out of memory.  EINVAL
>> > is better?
>>
>> I chose ENOMEM by analogy with max_pid.  When a new PID can not be
>> allocated, ENOMEM is returned too.
>
> I don't know what this means - please be carefully specific when
> identifying kernel code.

Sorry.

>
> If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is
> appropriate there, because a failure *is* caused by memory allocation
> failure.

I'm referring to alloc_pidmap().
For example I set pid_max to 500 and try to create more than 500 processes.

[pid   345] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f8721716a10) = -1 ENOMEM (Cannot allocate memory)

Actually I'm agree with EINVAL and a patch is attached to this message.

Thanks.
>
> But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't
> tell the user "you ran out of memory" when he didn't!  -EINVAL isn't
> really appropriate either ("Invalid argument") but it has become a
> general you-screwed-up catchall and seems to me to be the most
> appropriate errno we have available.
>

[-- Attachment #2: 0001-pidns-limit-the-nesting-depth-of-pid-namespaces-v2.patch --]
[-- Type: application/octet-stream, Size: 2566 bytes --]

From 53286ba0cf56585048b5363844dec924081fe06d Mon Sep 17 00:00:00 2001
From: Andrew Vagin <avagin@openvz.org>
Date: Wed, 10 Oct 2012 12:22:16 +0400
Subject: [PATCH] pidns: limit the nesting depth of pid namespaces (v2)

'struct pid' is a "variable sized struct" - a header with an array
of upids at the end.

A size of the array depends on a level (depth) of pid namespaces. Now
a level of pidns is not limited, so 'struct pid' can be more than one
page.

Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
is not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.

I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
expand "struct pid" and it's more than enough for all known for me
use-cases.  When someone finds a reasonable use case, we can add a
config option or a sysctl parameter.

In addition it will reduce effect of another problem, when we have many
nested namespaces and the oldest one starts dying.  zap_pid_ns_processe
will be called for each namespace and find_vpid will be called for each
process in a namespace. find_vpid will be called minimum max_level^2 / 2
times. The reason of that is that when we found a bit in pidmap, we
can't determine this pidns is top for this process or it isn't.

vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can do a system inaccessible for a long time.

v2: return EINVAL in response to an excessive nesting attempt

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
 kernel/pid_namespace.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 478bad2..c923bb5 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -71,12 +71,18 @@ err_alloc:
 	return NULL;
 }
 
+/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
+#define MAX_PID_NS_LEVEL 32
+
 static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
 {
 	struct pid_namespace *ns;
 	unsigned int level = parent_pid_ns->level + 1;
 	int i, err = -ENOMEM;
 
+	if (level > MAX_PID_NS_LEVEL)
+		return ERR_PTR(-EINVAL);
+
 	ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
 	if (ns == NULL)
 		goto out;
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-10-25 15:02 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-12 12:30 [PATCH] pidns: limit the nesting depth of pid namespaces Andrew Vagin
2012-10-19 14:25 ` Andrey Wagin
2012-10-20 15:21 ` Oleg Nesterov
2012-10-23 23:56 ` Andrew Morton
2012-10-24  5:38   ` Andrey Wagin
2012-10-24 19:46     ` Andrew Morton
2012-10-25 15:02       ` Andrey Wagin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).