From: Andrey Wagin <avagin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, Oleg Nesterov <oleg@redhat.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Pavel Emelyanov <xemul@parallels.com>
Subject: Re: [PATCH] pidns: limit the nesting depth of pid namespaces
Date: Thu, 25 Oct 2012 19:02:18 +0400 [thread overview]
Message-ID: <CANaxB-x413bEkfA=1FRrYz6=P+T-V372UUuP0o67KT2=7-eTPA@mail.gmail.com> (raw)
In-Reply-To: <20121024124607.43f599e8.akpm@linux-foundation.org>
[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]
2012/10/24 Andrew Morton <akpm@linux-foundation.org>:
> On Wed, 24 Oct 2012 09:38:57 +0400
> Andrey Wagin <avagin@gmail.com> wrote:
>
>> >
>> > I think that returning -ENOMEM in response to an excessive nesting
>> > attempt is misleading - the system *didn't* run out of memory. EINVAL
>> > is better?
>>
>> I chose ENOMEM by analogy with max_pid. When a new PID can not be
>> allocated, ENOMEM is returned too.
>
> I don't know what this means - please be carefully specific when
> identifying kernel code.
Sorry.
>
> If you're referring to kernel/pid.c:alloc_pid() then -ENOMEM is
> appropriate there, because a failure *is* caused by memory allocation
> failure.
I'm referring to alloc_pidmap().
For example I set pid_max to 500 and try to create more than 500 processes.
[pid 345] clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f8721716a10) = -1 ENOMEM (Cannot allocate memory)
Actually I'm agree with EINVAL and a patch is attached to this message.
Thanks.
>
> But ENOMEM isn't appropriate for nesting-depth-exceeded - we shouldn't
> tell the user "you ran out of memory" when he didn't! -EINVAL isn't
> really appropriate either ("Invalid argument") but it has become a
> general you-screwed-up catchall and seems to me to be the most
> appropriate errno we have available.
>
[-- Attachment #2: 0001-pidns-limit-the-nesting-depth-of-pid-namespaces-v2.patch --]
[-- Type: application/octet-stream, Size: 2566 bytes --]
From 53286ba0cf56585048b5363844dec924081fe06d Mon Sep 17 00:00:00 2001
From: Andrew Vagin <avagin@openvz.org>
Date: Wed, 10 Oct 2012 12:22:16 +0400
Subject: [PATCH] pidns: limit the nesting depth of pid namespaces (v2)
'struct pid' is a "variable sized struct" - a header with an array
of upids at the end.
A size of the array depends on a level (depth) of pid namespaces. Now
a level of pidns is not limited, so 'struct pid' can be more than one
page.
Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL
is not calculated from PAGE_SIZE, because in this case it depends on
architectures, config options and it will be reduced, if someone adds a
new fields in struct pid or struct upid.
I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to
expand "struct pid" and it's more than enough for all known for me
use-cases. When someone finds a reasonable use case, we can add a
config option or a sysctl parameter.
In addition it will reduce effect of another problem, when we have many
nested namespaces and the oldest one starts dying. zap_pid_ns_processe
will be called for each namespace and find_vpid will be called for each
process in a namespace. find_vpid will be called minimum max_level^2 / 2
times. The reason of that is that when we found a bit in pidmap, we
can't determine this pidns is top for this process or it isn't.
vpid is a heavy operation, so a fork bomb, which create many nested
namespace, can do a system inaccessible for a long time.
v2: return EINVAL in response to an excessive nesting attempt
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
kernel/pid_namespace.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 478bad2..c923bb5 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -71,12 +71,18 @@ err_alloc:
return NULL;
}
+/* MAX_PID_NS_LEVEL is needed for limiting size of 'struct pid' */
+#define MAX_PID_NS_LEVEL 32
+
static struct pid_namespace *create_pid_namespace(struct pid_namespace *parent_pid_ns)
{
struct pid_namespace *ns;
unsigned int level = parent_pid_ns->level + 1;
int i, err = -ENOMEM;
+ if (level > MAX_PID_NS_LEVEL)
+ return ERR_PTR(-EINVAL);
+
ns = kmem_cache_zalloc(pid_ns_cachep, GFP_KERNEL);
if (ns == NULL)
goto out;
--
1.7.1
prev parent reply other threads:[~2012-10-25 15:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-10-12 12:30 [PATCH] pidns: limit the nesting depth of pid namespaces Andrew Vagin
2012-10-19 14:25 ` Andrey Wagin
2012-10-20 15:21 ` Oleg Nesterov
2012-10-23 23:56 ` Andrew Morton
2012-10-24 5:38 ` Andrey Wagin
2012-10-24 19:46 ` Andrew Morton
2012-10-25 15:02 ` Andrey Wagin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CANaxB-x413bEkfA=1FRrYz6=P+T-V372UUuP0o67KT2=7-eTPA@mail.gmail.com' \
--to=avagin@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=gorcunov@openvz.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).