From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Konstantin Khlebnikov
	<khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>
Cc: Roman Gushchin <klamm-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Serge Hallyn
	<serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Chen Fan <chen.fan.fnst-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [PATCH RFC v3 2/2] pidns: introduce syscall getvpid
Date: Mon, 28 Sep 2015 11:22:21 -0500
Message-ID: <87a8s6a4zm.fsf@x220.int.ebiederm.org>
In-Reply-To: <20150925135247.27620.37109.stgit@buzz> (Konstantin Khlebnikov's message of "Fri, 25 Sep 2015 16:52:47 +0300")

Konstantin Khlebnikov <khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org> writes:

> pid_t getvpid(pid_t pid, int source, int target);
>
> This syscall converts pid from source pid-namespace into pid visible
> in target pid-namespace. If pid is unreachable from target namespace
> then getvpid() returns zero.

Two minor things.

Can we please call this translate_pid? getvpid does not really cover
what this syscall does.

Can you please split wiring up into a separate patch?  You goofed it
up this round and it just adds noise in reviewing the core syscall.

> Namespaces are defined by file descriptors pointing to entries in
> proc (/proc/[pid]/ns/pid). If argument is negative then current pid
> namespace is used.
>
> If pid is negative then getvpid() returns pid of parent task for -pid.
>
> Possible error codes:
> ESRCH    - task not found
> EBADF    - closed file descriptor
> EINVAL   - not pid-namespace file descriptor
>
> Such conversion is required for interaction between processes from
> different pid-namespaces. For example system service at host system
> who provide access to restricted set of privileged operations for
> clients from containers have to convert pids back and forward.
>
> Recent kernels expose virtual pids in /proc/[pid]/status:NSpid, but
> this interface works only in one way and even that is non-trivial.
>
> Other option is passing pids with credentials via unix socket, but
> this solution requires a lot of preparation and CAP_SYS_ADMIN for
> sending arbitrary pids.
>
> This syscall works in both directions, it's fast and simple.
>
> Examples:
> getvpid(pid, ns, -1)      - get pid in our pid namespace
> getvpid(pid, -1, ns)      - get pid in container
> getvpid(pid, -1, ns) > 0  - is pid reachable from container?
> getvpid(1, ns1, ns2) > 0  - is ns1 inside ns2?
> getvpid(1, ns1, ns2) == 0 - is ns1 outside ns2?
> getvpid(1, ns, -1)        - get init task of pid-namespace
> getvpid(-1, ns, -1)       - get reaper of init task in parent pid-namespace
> getvpid(-pid, -1, -1)     - get ppid by pid
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov-XoJtRXgx1JseBXzfvpsJ4g@public.gmane.org>
>
> --
>
> v1: https://lkml.org/lkml/2015/9/15/411
> v2: https://lkml.org/lkml/2015/9/24/278
> v3:
>  * use proc_ns_fdget()
>  * update description
>  * rebase to next-20150925
>  * fix conflict with mlock2
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |    1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |    1 +
>  include/linux/syscalls.h               |    1 +
>  include/uapi/asm-generic/unistd.h      |    4 ++-
>  kernel/sys.c                           |   51 ++++++++++++++++++++++++++++++++
>  5 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 143ef9f37932..c36c2c65d204 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -383,3 +383,4 @@
>  374	i386	userfaultfd		sys_userfaultfd
>  375	i386	membarrier		sys_membarrier
>  376	i386	mlock2			sys_mlock2
> +377	i386	getvpid			sys_getvpid
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index 314a90bfc09c..90bbbc7fdbe0 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -332,6 +332,7 @@
>  323	common	userfaultfd		sys_userfaultfd
>  324	common	membarrier		sys_membarrier
>  325	common	mlock2			sys_mlock2
> +326	common	getvpid			sys_getvpid
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a156b82dd14c..dbb5638260b5 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -222,6 +222,7 @@ asmlinkage long sys_nanosleep(struct timespec __user *rqtp, struct timespec __us
>  asmlinkage long sys_alarm(unsigned int seconds);
>  asmlinkage long sys_getpid(void);
>  asmlinkage long sys_getppid(void);
> +asmlinkage long sys_getvpid(pid_t pid, int source, int target);
>  asmlinkage long sys_getuid(void);
>  asmlinkage long sys_geteuid(void);
>  asmlinkage long sys_getgid(void);
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 1324b0292ec2..2c1123130f90 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -715,9 +715,11 @@ __SYSCALL(__NR_userfaultfd, sys_userfaultfd)
>  __SYSCALL(__NR_membarrier, sys_membarrier)
>  #define __NR_mlock2 284
>  __SYSCALL(__NR_mlock2, sys_mlock2)
> +#define __NR_mlock2 285
> +__SYSCALL(__NR_getvpid, sys_getvpid)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 285
> +#define __NR_syscalls 286
>  
>  /*
>   * All syscalls below here should go away really,
> diff --git a/kernel/sys.c b/kernel/sys.c
> index fa2f2f671a5c..1e28a36b84fa 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -46,6 +46,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/kprobes.h>
>  #include <linux/user_namespace.h>
> +#include <linux/proc_ns.h>
>  #include <linux/binfmts.h>
>  
>  #include <linux/sched.h>
> @@ -855,6 +856,56 @@ SYSCALL_DEFINE0(getppid)
>  	return pid;
>  }
>  
> +SYSCALL_DEFINE3(getvpid, pid_t, pid, int, source, int, target)
> +{
> +	struct pid_namespace *source_ns, *target_ns;
> +	struct fd source_fd = {}, target_fd = {};
> +	struct pid *struct_pid;
> +	struct ns_common *ns;
> +	pid_t result;
> +
> +	if (source >= 0) {
> +		ns = proc_ns_fdget(source, CLONE_NEWPID, &source_fd);
> +		result = PTR_ERR(ns);
> +		if (IS_ERR(ns))
> +			goto out;
> +		source_ns = container_of(ns, struct pid_namespace, ns);
> +	} else
> +		source_ns = task_active_pid_ns(current);
> +
> +	if (target >= 0) {
> +		ns = proc_ns_fdget(target, CLONE_NEWPID, &target_fd);
> +		result = PTR_ERR(ns);
> +		if (IS_ERR(ns))
> +			goto out;
> +		target_ns = container_of(ns, struct pid_namespace, ns);
> +	} else
> +		target_ns = task_active_pid_ns(current);
> +
> +	rcu_read_lock();
> +	struct_pid = find_pid_ns(abs(pid), source_ns);
> +
> +	if (struct_pid && pid < 0) {
> +		struct task_struct *task;
> +
> +		task = pid_task(struct_pid, PIDTYPE_PID);
> +		if (task)
> +			task = rcu_dereference(task->real_parent);
> +		struct_pid = task ? task_pid(task) : NULL;
> +	}
> +
> +	if (struct_pid)
> +		result = pid_nr_ns(struct_pid, target_ns);
> +	else
> +		result = -ESRCH;
> +	rcu_read_unlock();
> +
> +out:
> +	fdput(target_fd);
> +	fdput(source_fd);
> +	return result;
> +}
> +
>  SYSCALL_DEFINE0(getuid)
>  {
>  	/* Only we change this so SMP safe */
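
The "returns zero if pid is unreachable" rule in the quoted description
comes from the pid_nr_ns() call at the end of the syscall: a struct pid
carries one number per namespace level, and the lookup yields 0 when the
target namespace is not one of the namespaces the pid was allocated in.
Below is a minimal userspace sketch of that lookup, not kernel code. The
model_* names and MODEL_MAX_LEVEL are hypothetical and exist only for
illustration; the real structures hold much more state.

```c
#include <stddef.h>
#include <sys/types.h>

#define MODEL_MAX_LEVEL 4	/* arbitrary depth limit for this sketch */

/* Hypothetical stand-in for struct pid_namespace: only the depth matters. */
struct model_ns {
	unsigned int level;	/* 0 for the initial pid namespace */
};

/* Hypothetical stand-in for struct pid: one number per namespace level. */
struct model_pid {
	unsigned int level;	/* level of the namespace the pid was created in */
	struct {
		pid_t nr;			/* number seen at this level */
		const struct model_ns *ns;	/* namespace owning that number */
	} numbers[MODEL_MAX_LEVEL];
};

/*
 * Return the number of pid as seen from ns, or 0 when ns cannot see it:
 * either ns is nested deeper than the pid, or ns sits on another branch.
 */
static pid_t model_pid_nr_ns(const struct model_pid *pid,
			     const struct model_ns *ns)
{
	if (pid && ns->level <= pid->level &&
	    pid->numbers[ns->level].ns == ns)
		return pid->numbers[ns->level].nr;
	return 0;
}
```

With a task created in a level-1 container, the model gives the host
number when asked from the host namespace, the container number when
asked from the container, and 0 when asked from a sibling container.
That is the same result the "is pid reachable from container?" example
in the description relies on.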
