All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Brauner <christian@brauner.io>
To: Oleg Nesterov <oleg@redhat.com>
Cc: jannh@google.com, viro@zeniv.linux.org.uk,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com,
	dhowells@redhat.com, ebiederm@xmission.com,
	elena.reshetova@intel.com, keescook@chromium.org,
	luto@amacapital.net, luto@kernel.org, tglx@linutronix.de,
	linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kselftest@vger.kernel.org, joel@joelfernandes.org,
	dancol@google.com, serge@hallyn.c
Subject: Re: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 14:57:17 +0000	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io>
To: Oleg Nesterov <oleg@redhat.com>
Cc: jannh@google.com, viro@zeniv.linux.org.uk,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com,
	dhowells@redhat.com, ebiederm@xmission.com,
	elena.reshetova@intel.com, keescook@chromium.org,
	luto@amacapital.net, luto@kernel.org, tglx@linutronix.de,
	linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kselftest@vger.kernel.org, joel@joelfernandes.org,
	dancol@google.com, serge@hallyn.com,
	Geert Uytterhoeven <geert@linux-m68k.org>
Subject: Re: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io>
To: Oleg Nesterov <oleg@redhat.com>
Cc: jannh@google.com, viro@zeniv.linux.org.uk,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	arnd@arndb.de, akpm@linux-foundation.org, cyphar@cyphar.com,
	dhowells@redhat.com, ebiederm@xmission.com,
	elena.reshetova@intel.com, keescook@chromium.org,
	luto@amacapital.net, luto@kernel.org, tglx@linutronix.de,
	linux-alpha@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org,
	linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, linux-sh@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kselftest@vger.kernel.org, joel@joelfernandes.org,
	dancol@google.com, serge@hallyn.c
Subject: Re: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: christian at brauner.io (Christian Brauner)
Subject: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg at redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: christian@brauner.io (Christian Brauner)
Subject: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
Message-ID: <20190516145717.FAz8YSZ70Qj2Gsqlny2qQf7XrKGZBzR8Esfmv5LIHNw@z> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019@04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg at redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	linux-mips@vger.kernel.org, dhowells@redhat.com,
	joel@joelfernandes.org, linux-kselftest@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-api@vger.kernel.org,
	elena.reshetova@intel.com, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, dancol@google.com,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	serge@hallyn.com, linux-xtensa@linux-xtensa.org,
	keescook@chromium.org, arnd@arndb.de, jannh@google.com,
	linux-m68k@lists.linux-m68k.org, viro@zeniv.linux.org.uk,
	luto@kernel.org, tglx@linutronix.de,
	linux-arm-kernel@lists.infradead.org,
	linux-parisc@vger.kernel.org, cyphar@cyphar.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	luto@amacapital.net, ebiederm@xmission.com,
	linux-alpha@vger.kernel.org, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io>
To: Oleg Nesterov <oleg@redhat.com>
Cc: linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org,
	linux-mips@vger.kernel.org, dhowells@redhat.com,
	joel@joelfernandes.org, linux-kselftest@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-api@vger.kernel.org,
	elena.reshetova@intel.com, linux-arch@vger.kernel.org,
	linux-s390@vger.kernel.org, dancol@google.com,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	serge@hallyn.com, linux-xtensa@linux-xtensa.org,
	keescook@chromium.org, arnd@arndb.de, jannh@google.com,
	linux-m68k@lists.linux-m68k.org, viro@zeniv.linux.org.uk,
	luto@kernel.org, tglx@linutronix.de,
	linux-arm-kernel@lists.infradead.org,
	linux-parisc@vger.kernel.org, cyphar@cyphar.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	luto@amacapital.net, ebiederm@xmission.com,
	linux-alpha@vger.kernel.org, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v1 1/2] pid: add pidfd_open()
Date: Thu, 16 May 2019 16:57:17 +0200	[thread overview]
Message-ID: <20190516145716.ool2pzdqbfclnnqi@brauner.io> (raw)
In-Reply-To: <20190516142659.GB22564@redhat.com>

On Thu, May 16, 2019 at 04:27:00PM +0200, Oleg Nesterov wrote:
> On 05/16, Christian Brauner wrote:
> >
> > With the introduction of pidfds through CLONE_PIDFD it is possible to
> > created pidfds at process creation time.
> 
> Now I am wondering why do we need CLONE_PIDFD, you can just do
> 
> 	pid = fork();
> 	pidfd_open(pid);

CLONE_PIDFD eliminates the race at the source and let's us avoid two
syscalls for the sake of one. That'll obviously matter even more when we
enable CLONE_THREAD | CLONE_PIDFD.
pidfd_open() is really just a necessity for anyone who does non-parent
process management aka LMK or service managers.
I also would like to reserve the ability at some point (e.g. with cloneX
or sm) to be able to specify specific additional flags at process
creation time that modify pidfd behavior.

> 
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +	int fd, ret;
> > +	struct pid *p;
> > +	struct task_struct *tsk;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +
> > +	if (pid <= 0)
> > +		return -EINVAL;
> > +
> > +	p = find_get_pid(pid);
> > +	if (!p)
> > +		return -ESRCH;
> > +
> > +	ret = 0;
> > +	rcu_read_lock();
> > +	/*
> > +	 * If this returns non-NULL the pid was used as a thread-group
> > +	 * leader. Note, we race with exec here: If it changes the
> > +	 * thread-group leader we might return the old leader.
> > +	 */
> > +	tsk = pid_task(p, PIDTYPE_TGID);
> > +	if (!tsk)
> > +		ret = -ESRCH;
> > +	rcu_read_unlock();
> > +
> > +	fd = ret ?: pidfd_create(p);
> > +	put_pid(p);
> > +	return fd;
> > +}
> 
> Looks correct, feel free to add Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> 
> But why do we need task_struct *tsk?
> 
> 	rcu_read_lock();
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_read_unlock();

Sure, that's simpler. I'll rework and add your Reviewed-by.

> 
> and in fact we do not even need rcu_read_lock(), we could do
> 
> 	// shut up rcu_dereference_check()
> 	rcu_lock_acquire(&rcu_lock_map);
> 	if (!pid_task(PIDTYPE_TGID))
> 		ret = -ESRCH;
> 	rcu_lock_release(&rcu_lock_map);
> 
> Well... I won't insist, but the comment about the race with exec looks a bit
> confusing to me. It is true, but we do not care at all, we are not going to
> use the task_struct returned by pid_task().

Yeah, I can remove it.

Thanks!
Christian

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2019-05-16 14:57 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-16 13:59 [PATCH v1 1/2] pid: add pidfd_open() Christian Brauner
2019-05-16 13:59 ` Christian Brauner
2019-05-16 13:59 ` Christian Brauner
2019-05-16 13:59 ` Christian Brauner
2019-05-16 13:59 ` christian
2019-05-16 13:59 ` Christian Brauner
2019-05-16 13:59 ` [PATCH v1 2/2] tests: add pidfd_open() tests Christian Brauner
2019-05-16 13:59   ` Christian Brauner
2019-05-16 13:59   ` Christian Brauner
2019-05-16 13:59   ` Christian Brauner
2019-05-16 13:59   ` christian
2019-05-16 13:59   ` Christian Brauner
2019-05-16 14:27 ` [PATCH v1 1/2] pid: add pidfd_open() Oleg Nesterov
2019-05-16 14:27   ` Oleg Nesterov
2019-05-16 14:27   ` Oleg Nesterov
2019-05-16 14:27   ` Oleg Nesterov
2019-05-16 14:27   ` oleg
2019-05-16 14:27   ` Oleg Nesterov
2019-05-16 14:27   ` Oleg Nesterov
2019-05-16 14:56   ` Aleksa Sarai
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 14:56     ` cyphar
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 14:56     ` Aleksa Sarai
2019-05-16 15:06     ` Oleg Nesterov
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:06       ` oleg
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:06       ` Oleg Nesterov
2019-05-16 15:12       ` Aleksa Sarai
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:12         ` cyphar
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:12         ` Aleksa Sarai
2019-05-16 15:22         ` Oleg Nesterov
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:22           ` oleg
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:22           ` Oleg Nesterov
2019-05-16 15:29           ` Christian Brauner
2019-05-16 15:29             ` Christian Brauner
2019-05-16 15:29             ` Christian Brauner
2019-05-16 15:29             ` Christian Brauner
2019-05-16 15:29             ` christian
2019-05-16 15:29             ` Christian Brauner
2019-05-16 15:29             ` Christian Brauner
2019-05-16 14:57   ` Christian Brauner [this message]
2019-05-16 14:57     ` Christian Brauner
2019-05-16 14:57     ` Christian Brauner
2019-05-16 14:57     ` Christian Brauner
2019-05-16 14:57     ` christian
2019-05-16 14:57     ` Christian Brauner
2019-05-16 14:57     ` Christian Brauner
2019-05-16 14:56 ` Geert Uytterhoeven
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:56   ` geert
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:56   ` Geert Uytterhoeven
2019-05-16 14:58   ` Christian Brauner
2019-05-16 14:58     ` Christian Brauner
2019-05-16 14:58     ` Christian Brauner
2019-05-16 14:58     ` Christian Brauner
2019-05-16 14:58     ` Christian Brauner
2019-05-16 14:58     ` christian
2019-05-16 14:58     ` Christian Brauner
2019-05-16 14:58     ` Christian Brauner
2019-05-18  9:48 ` Joel Fernandes
2019-05-18  9:48   ` Joel Fernandes
2019-05-18  9:48   ` Joel Fernandes
2019-05-18  9:48   ` Joel Fernandes
2019-05-18  9:48   ` joel
2019-05-18  9:48   ` Joel Fernandes
2019-05-18  9:48   ` Joel Fernandes
2019-05-18 10:04   ` Christian Brauner
2019-05-18 10:04     ` Christian Brauner
2019-05-18 10:04     ` Christian Brauner
2019-05-18 10:04     ` Christian Brauner
2019-05-18 10:04     ` christian
2019-05-18 10:04     ` Christian Brauner
2019-05-18 10:04     ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516145716.ool2pzdqbfclnnqi@brauner.io \
    --to=christian@brauner.io \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=elena.reshetova@intel.com \
    --cc=jannh@google.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-ia64@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux-xtensa@linux-xtensa.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.c \
    --cc=sparclinux@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.