linux-kernel.vger.kernel.org archive mirror
From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: mhocko@suse.com, avagin@openvz.org, peterz@infradead.org,
	linux-kernel@vger.kernel.org, oleg@redhat.com,
	rppt@linux.vnet.ibm.com, luto@kernel.org, gorcunov@openvz.org,
	akpm@linux-foundation.org, mingo@kernel.org, serge@hallyn.com
Subject: Re: [PATCH] pid_ns: Allow to get pid_for_children ns before child_reaper is created
Date: Thu, 8 Jun 2017 15:17:25 +0300
Message-ID: <f7c774ef-7b93-c2b9-f257-f538cce09730@virtuozzo.com>
In-Reply-To: <8de7d233-b7d3-a63e-4980-eb32e8761c30@virtuozzo.com>

ping

On 29.05.2017 13:49, Kirill Tkhai wrote:
> On 27.05.2017 14:01, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>>
>>> This patch prohibits pid allocation until the child_reaper
>>> of a pid namespace is set, which makes it possible and safe
>>> to get a just-unshared pid_ns from the "/proc/[pid]/ns/pid_for_children"
>>> file. This may be useful to determine the user_ns of such a newly
>>> created pid_ns, which is not possible now.
>>>
>>> It was prohibited until now because the architecture of pid namespaces
>>> assumes the child reaper is the first process created in the namespace,
>>> and it initializes pid_namespace::proc_mnt. Child reaper creation
>>> must not race with the creation of other processes in this namespace;
>>> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
>>> is populated, causing a NULL pointer dereference in proc_flush_task().
>>> Also, the child reaper must not die before the other processes in the namespace.
>>
>> This patch introduces the possibility that two or more processes may
>> have the same pid namespace (with no processes) as pid_ns_for_children.
>>
>> Which means you can now have a race for the first pid in alloc_pid,
>> making it indeterminate who allocates the init process.  Which is not
>> acceptable.
>>
>> It is not acceptable on two grounds.
>> 1) It is a bogus user space semantic.  Because userspace needs to
>>    know who allocates init.
>> 2) It is horrible for maintenance because now the code has to be very
>>    clever to deal with a case that no one cares about.  Which is
>>    a general formula for buggy code.
> 
> We may disallow setns() while no child reaper has been created, and
> this solves all of the above issues. Please see v2 below; it does not
> have the problems you pointed out.
> 
> [PATCH v2] pid_ns: Allow to get pid_for_children ns before child_reaper is created
> 
> This patch prohibits setns() on a pid namespace until its child_reaper
> is set, which makes it possible and safe to get a just-unshared pid_ns
> from the "/proc/[pid]/ns/pid_for_children" file. This may be useful
> to determine the user_ns of such a newly created pid_ns, which is not
> possible now.
> 
> It was not possible until now because the architecture of pid namespaces
> assumes the child reaper is the first process created in the namespace,
> and it initializes pid_namespace::proc_mnt. Child reaper creation
> must not race with the creation of other processes in this namespace;
> otherwise a process with pid > 1 may die before pid_namespace::proc_mnt
> is populated, causing a NULL pointer dereference in proc_flush_task().
> Also, the child reaper must not die before the other processes in the namespace.
> 
> The patch prevents such races. It allows setns() on a pid namespace
> only if ns->child_reaper is already set, which guarantees that
> only the pid namespace creator may establish the child reaper.
> So we can safely allow "/proc/[pid]/ns/pid_for_children" to be
> obtained and analysed as soon as the namespace is created.
> 
> v2: Don't race for child reaper creation.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
> CC: Oleg Nesterov <oleg@redhat.com>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Serge Hallyn <serge@hallyn.com>
> CC: Michal Hocko <mhocko@suse.com>
> CC: Andrei Vagin <avagin@openvz.org>
> CC: Cyrill Gorcunov <gorcunov@openvz.org>
> CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/pid_namespace.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 74a5a7255b4d..5e7b3fd0d4c2 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -385,15 +385,6 @@ static struct ns_common *pidns_for_children_get(struct task_struct *task)
>  	}
>  	task_unlock(task);
>  
> -	if (ns) {
> -		read_lock(&tasklist_lock);
> -		if (!ns->child_reaper) {
> -			put_pid_ns(ns);
> -			ns = NULL;
> -		}
> -		read_unlock(&tasklist_lock);
> -	}
> -
>  	return ns ? &ns->ns : NULL;
>  }
>  
> @@ -428,6 +419,15 @@ static int pidns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	if (ancestor != active)
>  		return -EINVAL;
>  
> +	/*
> +	 * Disallow processes to use pid namespace till its
> +	 * creator makes child reaper. Otherwise, several
> +	 * processes race for that, and it's not clear who
> +	 * establishes init.
> +	 */
> +	if (!new->child_reaper)
> +		return -ESRCH;
> +
>  	put_pid_ns(nsproxy->pid_ns_for_children);
>  	nsproxy->pid_ns_for_children = get_pid_ns(new);
>  	return 0;
> 
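
For illustration, the use case from the changelog (finding the user_ns
that owns a just-unshared pid_ns) could look roughly like this from
user space. This is only a sketch: owning_userns_fd() is a hypothetical
helper name, and it assumes a kernel that provides the NS_GET_USERNS
ioctl from <linux/nsfs.h> and the pid_for_children file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nsfs.h>

/*
 * Hypothetical helper: open a namespace file such as
 * "/proc/self/ns/pid_for_children" and ask the kernel for the
 * user namespace that owns it.
 *
 * Returns a new fd referring to the owning user_ns on success,
 * or a negative errno value on failure.
 */
int owning_userns_fd(const char *ns_path)
{
	int ns_fd, userns_fd, err;

	assert(ns_path != NULL);

	ns_fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0)
		return -errno;

	/* NS_GET_USERNS returns an fd for the owning user namespace. */
	userns_fd = ioctl(ns_fd, NS_GET_USERNS);
	err = errno;	/* save before close() can overwrite it */
	close(ns_fd);

	return userns_fd < 0 ? -err : userns_fd;
}
```

A caller would do unshare(CLONE_NEWPID) and then call
owning_userns_fd("/proc/self/ns/pid_for_children") before forking the
first child. Without the change above, that open is expected to fail
until the child reaper exists, because pidns_for_children_get()
returns NULL in that window.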

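The check added to pidns_install() can also be modelled outside the
kernel to make the intended semantics explicit. The sketch below is
not kernel code: the model_* types and functions are hypothetical
stand-ins that mirror only the new child_reaper test and the
reference swap that follows it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace. */
struct model_pid_ns {
	const void *child_reaper;	/* NULL until the namespace's init exists */
	int refcount;
};

/* Hypothetical stand-in for struct nsproxy. */
struct model_nsproxy {
	struct model_pid_ns *pid_ns_for_children;
};

struct model_pid_ns *model_get_pid_ns(struct model_pid_ns *ns)
{
	ns->refcount++;
	return ns;
}

void model_put_pid_ns(struct model_pid_ns *ns)
{
	ns->refcount--;
}

/*
 * Mirrors the v2 behaviour: refuse to install a pid namespace that
 * has no child reaper yet, so only its creator can fork init.
 */
int model_pidns_install(struct model_nsproxy *proxy, struct model_pid_ns *new)
{
	if (!new->child_reaper)
		return -ESRCH;

	model_put_pid_ns(proxy->pid_ns_for_children);
	proxy->pid_ns_for_children = model_get_pid_ns(new);
	return 0;
}

/* Walks through both cases: before and after init is created. */
void model_self_check(void)
{
	int init_task;	/* only its address is used, as a non-NULL marker */
	struct model_pid_ns old = { .child_reaper = &init_task, .refcount = 1 };
	struct model_pid_ns fresh = { .child_reaper = NULL, .refcount = 1 };
	struct model_nsproxy proxy = { .pid_ns_for_children = &old };

	/* No init yet: the install is rejected and nothing changes. */
	assert(model_pidns_install(&proxy, &fresh) == -ESRCH);
	assert(proxy.pid_ns_for_children == &old);
	assert(old.refcount == 1 && fresh.refcount == 1);

	/* Creator has forked init: the install now succeeds. */
	fresh.child_reaper = &init_task;
	assert(model_pidns_install(&proxy, &fresh) == 0);
	assert(proxy.pid_ns_for_children == &fresh);
	assert(old.refcount == 0 && fresh.refcount == 2);
}
```

The point of rejecting early is that the namespace creator is the only
task whose pid_ns_for_children can point at an empty namespace, so the
question of who allocates pid 1 stays deterministic.
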
Thread overview: 4+ messages
2017-05-23 16:29 [PATCH] pid_ns: Allow to get pid_for_children ns before child_reaper is created Kirill Tkhai
2017-05-27 11:01 ` Eric W. Biederman
2017-05-29 10:49   ` Kirill Tkhai
2017-06-08 12:17     ` Kirill Tkhai [this message]
