LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v1 0/2] proc: allow signaling processes via file descriptors
@ 2018-11-19 10:32 Christian Brauner
  2018-11-19 10:32 ` [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid> Christian Brauner
                   ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 10:32 UTC (permalink / raw)
  To: ebiederm, linux-kernel
  Cc: serge, jannh, luto, akpm, oleg, cyphar, viro, linux-fsdevel,
	linux-api, dancol, timmurray, linux-man, Christian Brauner

Hey,

This little series introduces the ability to signal processes via file
descriptors to eliminate race-conditions caused by pid recycling.
With this patch an open() call on /proc/<pid> will give userspace a
handle to struct pid of the process associated with /proc/<pid>. This
allows to maintain a stable handle on a process.
Discussion has shown that a dedicated syscall is prefered over an ioctl().
Thus, the  new syscall procfd_signal() is introduced to solve this
problem. It operates on a process file descriptor. More details are
found in the individual commit messages.

With this series a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <signal.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 int main(int argc, char *argv[])
 {
         int ret;
         char buf[1000];

         if (argc < 2)
                 exit(EXIT_FAILURE);

         ret = snprintf(buf, sizeof(buf), "/proc/%s", argv[1]);
         if (ret < 0)
                 exit(EXIT_FAILURE);

         int fd = open(buf, O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), buf);
                 exit(EXIT_FAILURE);
         }

         ret = syscall(__NR_procfd_signal, fd, SIGKILL, NULL, 0);
         if (ret < 0) {
                 printf("Failed to send SIGKILL \"%s\"\n", strerror(errno));
                 close(fd);
                 exit(EXIT_FAILURE);
         }

         close(fd);

         exit(EXIT_SUCCESS);
 }

Thanks!
Christian

Christian Brauner (2):
  proc: get process file descriptor from /proc/<pid>
  signal: add procfd_signal() syscall
  procfd_signal.2: document procfd_signal syscall

 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/proc/base.c                         | 23 ++++++++
 include/linux/proc_fs.h                |  1 +
 include/linux/syscalls.h               |  2 +
 kernel/signal.c                        | 76 ++++++++++++++++++++++++--
 6 files changed, 98 insertions(+), 6 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid>
  2018-11-19 10:32 [PATCH v1 0/2] proc: allow signaling processes via file descriptors Christian Brauner
@ 2018-11-19 10:32 ` Christian Brauner
  2018-11-19 15:32   ` Andy Lutomirski
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
  2018-11-19 10:32 ` [PATCH] procfd_signal.2: document procfd_signal syscall Christian Brauner
  2 siblings, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 10:32 UTC (permalink / raw)
  To: ebiederm, linux-kernel
  Cc: serge, jannh, luto, akpm, oleg, cyphar, viro, linux-fsdevel,
	linux-api, dancol, timmurray, linux-man, Christian Brauner,
	Kees Cook

With this patch an open() call on /proc/<pid> will give userspace a handle
to struct pid of the process associated with /proc/<pid>. This allows to
maintain a stable handle on a process.
I have been discussing various approaches extensively during technical
conferences this year culminating in a long argument with Eric at Linux
Plumbers. The general consensus was that having a handle on a process
should be something that is very simple and easy to maintain with the
option of being extensible via a more advanced api if the need arises. I
believe that this patch is the most simple, dumb, and therefore
maintainable solution.

[1]: https://lkml.org/lkml/2018/10/30/118

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
---
Changelog:
v1:
- remove ioctl() to signal processes and replace with a dedicated syscall
  in the next patch
---
 fs/proc/base.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ce3465479447..6365a4fea314 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3032,10 +3032,27 @@ static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
 				   tgid_base_stuff, ARRAY_SIZE(tgid_base_stuff));
 }
 
+static int proc_tgid_open(struct inode *inode, struct file *file)
+{
+	/* grab reference to struct pid and stash the pointer away */
+	file->private_data = get_pid(proc_pid(inode));
+	return 0;
+}
+
+static int proc_tgid_release(struct inode *inode, struct file *file)
+{
+	struct pid *pid = file->private_data;
+	/* drop reference to struct pid */
+	put_pid(pid);
+	return 0;
+}
+
 static const struct file_operations proc_tgid_base_operations = {
+	.open		= proc_tgid_open,
 	.read		= generic_read_dir,
 	.iterate_shared	= proc_tgid_base_readdir,
 	.llseek		= generic_file_llseek,
+	.release	= proc_tgid_release,
 };
 
 static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
-- 
2.19.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 [PATCH v1 0/2] proc: allow signaling processes via file descriptors Christian Brauner
  2018-11-19 10:32 ` [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid> Christian Brauner
@ 2018-11-19 10:32 ` Christian Brauner
  2018-11-19 15:45   ` Andy Lutomirski
                     ` (8 more replies)
  2018-11-19 10:32 ` [PATCH] procfd_signal.2: document procfd_signal syscall Christian Brauner
  2 siblings, 9 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 10:32 UTC (permalink / raw)
  To: ebiederm, linux-kernel
  Cc: serge, jannh, luto, akpm, oleg, cyphar, viro, linux-fsdevel,
	linux-api, dancol, timmurray, linux-man, Christian Brauner,
	Kees Cook

The kill() syscall operates on process identifiers. After a process has
exited its pid can be reused by another process. If a caller sends a signal
to a reused pid it will end up signaling the wrong process. This issue has
often surfaced and there has been a push [1] to address this problem.

A prior patch has introduced the ability to get a file descriptor
referencing struct pid by opening /proc/<pid>. This guarantees a stable
handle on a process which can be used to send signals to the referenced
process. Discussion has shown that a dedicated syscall is preferable over
ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
this problem. It operates on a process file descriptor.
The syscall takes an additional siginfo_t and flags argument. If siginfo_t
is NULL then procfd_signal() behaves like kill() if it is not NULL it
behaves like rt_sigqueueinfo.
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0.

With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <signal.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 int main(int argc, char *argv[])
 {
         int ret;
         char buf[1000];

         if (argc < 2)
                 exit(EXIT_FAILURE);

         ret = snprintf(buf, sizeof(buf), "/proc/%s", argv[1]);
         if (ret < 0)
                 exit(EXIT_FAILURE);

         int fd = open(buf, O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), buf);
                 exit(EXIT_FAILURE);
         }

         ret = syscall(__NR_procfd_signal, fd, SIGKILL, NULL, 0);
         if (ret < 0) {
                 printf("Failed to send SIGKILL \"%s\"\n", strerror(errno));
                 close(fd);
                 exit(EXIT_FAILURE);
         }

         close(fd);

         exit(EXIT_SUCCESS);
 }

[1]: https://lkml.org/lkml/2018/11/18/130

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
---
Changelog:
v1:
- patch introduced
---
 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 fs/proc/base.c                         |  6 ++
 include/linux/proc_fs.h                |  1 +
 include/linux/syscalls.h               |  2 +
 kernel/signal.c                        | 76 ++++++++++++++++++++++++--
 6 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 3cf7b533b3d1..e637eab883e9 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -398,3 +398,4 @@
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
 385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
+387	i386	procfd_signal		sys_procfd_signal		__ia32_sys_procfd_signal
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..e95f6741ab42 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	procfd_signal		__x64_sys_procfd_signal
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6365a4fea314..a12c9de92bd0 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3055,6 +3055,12 @@ static const struct file_operations proc_tgid_base_operations = {
 	.release	= proc_tgid_release,
 };
 
+bool proc_is_procfd(const struct file *file)
+{
+	return d_is_dir(file->f_path.dentry) &&
+	       (file->f_op == &proc_tgid_base_operations);
+}
+
 static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
 {
 	return proc_pident_lookup(dir, dentry,
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index d0e1f1522a78..2d53a47fba34 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -73,6 +73,7 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
 						    int (*show)(struct seq_file *, void *),
 						    proc_write_t write,
 						    void *data);
+extern bool proc_is_procfd(const struct file *file);
 
 #else /* CONFIG_PROC_FS */
 
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 2ac3d13a915b..a5ca8cb84566 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -907,6 +907,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_procfd_signal(int fd, int sig, siginfo_t __user *info,
+				  int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/kernel/signal.c b/kernel/signal.c
index 9a32bc2088c9..e8a8929de710 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -19,7 +19,9 @@
 #include <linux/sched/task.h>
 #include <linux/sched/task_stack.h>
 #include <linux/sched/cputime.h>
+#include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/proc_fs.h>
 #include <linux/tty.h>
 #include <linux/binfmts.h>
 #include <linux/coredump.h>
@@ -3286,6 +3288,16 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait, compat_sigset_t __user *, uthese,
 }
 #endif
 
+static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
+{
+	clear_siginfo(info);
+	info->si_signo = sig;
+	info->si_errno = 0;
+	info->si_code = SI_USER;
+	info->si_pid = task_tgid_vnr(current);
+	info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
+}
+
 /**
  *  sys_kill - send a signal to a process
  *  @pid: the PID of the process
@@ -3295,16 +3307,68 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 {
 	struct kernel_siginfo info;
 
-	clear_siginfo(&info);
-	info.si_signo = sig;
-	info.si_errno = 0;
-	info.si_code = SI_USER;
-	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+	prepare_kill_siginfo(sig, &info);
 
 	return kill_something_info(sig, &info, pid);
 }
 
+/**
+ *  sys_procfd_signal - send a signal to a process through a process file
+ *                      descriptor
+ *  @fd: the file descriptor of the process
+ *  @sig: signal to be sent
+ *  @info: the signal info
+ *  @flags: future flags to be passed
+ */
+SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
+		int, flags)
+{
+	int ret;
+	struct pid *pid;
+	kernel_siginfo_t kinfo;
+	struct fd f;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags)
+		return -EINVAL;
+
+	f = fdget_raw(fd);
+	if (!f.file)
+		return -EBADF;
+
+	ret = -EINVAL;
+	/* Is this a process file descriptor? */
+	if (!proc_is_procfd(f.file))
+		goto err;
+
+	pid = f.file->private_data;
+	if (!pid)
+		goto err;
+
+	if (info) {
+		ret = __copy_siginfo_from_user(sig, &kinfo, info);
+		if (unlikely(ret))
+			goto err;
+		/*
+		 * Not even root can pretend to send signals from the kernel.
+		 * Nor can they impersonate a kill()/tgkill(), which adds
+		 * source info.
+		 */
+		ret = -EPERM;
+		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
+		    (task_pid(current) != pid))
+			goto err;
+	} else {
+		prepare_kill_siginfo(sig, &kinfo);
+	}
+
+	ret = kill_pid_info(sig, &kinfo, pid);
+
+err:
+	fdput(f);
+	return ret;
+}
+
 static int
 do_send_specific(pid_t tgid, pid_t pid, int sig, struct kernel_siginfo *info)
 {
-- 
2.19.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] procfd_signal.2: document procfd_signal syscall
  2018-11-19 10:32 [PATCH v1 0/2] proc: allow signaling processes via file descriptors Christian Brauner
  2018-11-19 10:32 ` [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid> Christian Brauner
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
@ 2018-11-19 10:32 ` Christian Brauner
  2018-11-20 13:29   ` Michael Kerrisk (man-pages)
  2018-11-28 20:59   ` Florian Weimer
  2 siblings, 2 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 10:32 UTC (permalink / raw)
  To: ebiederm, linux-kernel
  Cc: serge, jannh, luto, akpm, oleg, cyphar, viro, linux-fsdevel,
	linux-api, dancol, timmurray, linux-man, Christian Brauner

Signed-off-by: Christian Brauner <christian@brauner.io>
---
Changelog:
v1:
- patch introduced
---
 man2/procfd_signal.2 | 147 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 man2/procfd_signal.2

diff --git a/man2/procfd_signal.2 b/man2/procfd_signal.2
new file mode 100644
index 000000000..6af0b74f4
--- /dev/null
+++ b/man2/procfd_signal.2
@@ -0,0 +1,147 @@
+.\" Copyright (C) 2018 Christian Brauner <christian@brauner.io>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH PROCFD_SIGNAL 2 2017-09-15 "Linux" "Linux Programmer's Manual"
+.SH NAME
+procfd_signal \- send signal to a process through a process descriptor
+.SH SYNOPSIS
+.nf
+.B #include <sys/types.h>
+.B #include <signal.h>
+.PP
+.BI "int procfd_signal(int " fd ", int " sig ", siginfo_t *" info ", int " flags );
+.fi
+.SH DESCRIPTION
+.BR procfd_signal ()
+sends the signal specified in
+.I sig
+to the process identified by the file descriptor
+.IR fd .
+The permissions required to send a signal are the same as for
+.BR kill (2).
+As with
+.BR kill (2),
+the null signal (0) can be used to check if a process with a given
+PID exists.
+.PP
+The optional
+.I info
+argument specifies the data to accompany the signal.
+This argument is a pointer to a structure of type
+.IR siginfo_t ,
+described in
+.BR sigaction (2)
+(and defined by including
+.IR <sigaction.h> ).
+The caller should set the following fields in this structure:
+.TP
+.I si_code
+This must be one of the
+.B SI_*
+codes in the Linux kernel source file
+.IR include/asm-generic/siginfo.h ,
+with the restriction that the code must be negative
+(i.e., cannot be
+.BR SI_USER ,
+which is used by the kernel to indicate a signal sent by
+.BR kill (2))
+and cannot (since Linux 2.6.39) be
+.BR SI_TKILL
+(which is used by the kernel to indicate a signal sent using
+.\" tkill(2) or
+.BR tgkill (2)).
+.TP
+.I si_pid
+This should be set to a process ID,
+typically the process ID of the sender.
+.TP
+.I si_uid
+This should be set to a user ID,
+typically the real user ID of the sender.
+.TP
+.I si_value
+This field contains the user data to accompany the signal.
+For more information, see the description of the last
+.RI ( "union sigval" )
+argument of
+.BR sigqueue (3).
+.PP
+Internally, the kernel sets the
+.I si_signo
+field to the value specified in
+.IR sig ,
+so that the receiver of the signal can also obtain
+the signal number via that field.
+.PP
+The
+.I flags
+argument is reserved for future extension and must be set to 0.
+.PP
+.SH RETURN VALUE
+On success, this system call returns 0.
+On error, it returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EBADF
+.I fd
+is not a valid file descriptor.
+.TP
+.B EINVAL
+An invalid signal was specified.
+.TP
+.B EINVAL
+.I fd
+does not refer to a process.
+.TP
+.B EINVAL
+The flags argument was not 0.
+.TP
+.B EPERM
+The caller does not have permission to send the signal to the target.
+For the required permissions, see
+.BR kill (2).
+Or:
+.I uinfo->si_code
+is invalid.
+.TP
+.B ESRCH
+The process or process group does not exist.
+Note that an existing process might be a zombie,
+a process that has terminated execution, but
+has not yet been
+.BR wait (2)ed
+for.
+.SH CONFORMING TO
+This system call is Linux-specific.
+.SH SEE ALSO
+.BR kill (2),
+.BR sigaction (2),
+.BR sigprocmask (2),
+.BR tgkill (2),
+.BR pthread_sigqueue (3),
+.BR rt_sigqueueinfo (2),
+.BR sigqueue (3),
+.BR signal (7)
-- 
2.19.1


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid>
  2018-11-19 10:32 ` [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid> Christian Brauner
@ 2018-11-19 15:32   ` Andy Lutomirski
  2018-11-19 18:20     ` Christian Brauner
  0 siblings, 1 reply; 50+ messages in thread
From: Andy Lutomirski @ 2018-11-19 15:32 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, LKML, Serge E. Hallyn, Jann Horn,
	Andrew Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Daniel Colascione,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 2:33 AM Christian Brauner <christian@brauner.io> wrote:
>
> With this patch an open() call on /proc/<pid> will give userspace a handle
> to struct pid of the process associated with /proc/<pid>. This allows to
> maintain a stable handle on a process.
> I have been discussing various approaches extensively during technical
> conferences this year culminating in a long argument with Eric at Linux
> Plumbers. The general consensus was that having a handle on a process
> should be something that is very simple and easy to maintain with the
> option of being extensible via a more advanced api if the need arises. I
> believe that this patch is the most simple, dumb, and therefore
> maintainable solution.

How does the mechanism you're adding here differ from proc_pid()?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
@ 2018-11-19 15:45   ` Andy Lutomirski
  2018-11-19 15:57     ` Daniel Colascione
  2018-11-19 18:39     ` Christian Brauner
  2018-11-19 15:59   ` Daniel Colascione
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 50+ messages in thread
From: Andy Lutomirski @ 2018-11-19 15:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, LKML, Serge E. Hallyn, Jann Horn,
	Andrew Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Daniel Colascione,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 2:33 AM Christian Brauner <christian@brauner.io> wrote:
>
> The kill() syscall operates on process identifiers. After a process has
> exited its pid can be reused by another process. If a caller sends a signal
> to a reused pid it will end up signaling the wrong process. This issue has
> often surfaced and there has been a push [1] to address this problem.
>
> A prior patch has introduced the ability to get a file descriptor
> referencing struct pid by opening /proc/<pid>. This guarantees a stable
> handle on a process which can be used to send signals to the referenced
> process. Discussion has shown that a dedicated syscall is preferable over
> ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
> this problem. It operates on a process file descriptor.
> The syscall takes an additional siginfo_t and flags argument. If siginfo_t
> is NULL then procfd_signal() behaves like kill() if it is not NULL it
> behaves like rt_sigqueueinfo.
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0.

A few questions.  First: you've made this work on /proc/PID, but
should it also work on /proc/PID/task/TID to send signals to a
specific thread?

> +bool proc_is_procfd(const struct file *file)
> +{
> +       return d_is_dir(file->f_path.dentry) &&
> +              (file->f_op == &proc_tgid_base_operations);
> +}

Maybe rename to proc_is_tgid_procfd() to leave room for proc_is_tid_procfd()?

> +       if (info) {
> +               ret = __copy_siginfo_from_user(sig, &kinfo, info);
> +               if (unlikely(ret))
> +                       goto err;
> +               /*
> +                * Not even root can pretend to send signals from the kernel.
> +                * Nor can they impersonate a kill()/tgkill(), which adds
> +                * source info.
> +                */
> +               ret = -EPERM;
> +               if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> +                   (task_pid(current) != pid))
> +                       goto err;

Is the exception for signaling yourself actually useful here?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 15:45   ` Andy Lutomirski
@ 2018-11-19 15:57     ` Daniel Colascione
  2018-11-19 18:39     ` Christian Brauner
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 15:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christian Brauner, Eric W. Biederman, LKML, Serge E. Hallyn,
	Jann Horn, Andrew Morton, Oleg Nesterov, Aleksa Sarai, Al Viro,
	Linux FS Devel, Linux API, Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 7:45 AM, Andy Lutomirski <luto@kernel.org> wrote:
> On Mon, Nov 19, 2018 at 2:33 AM Christian Brauner <christian@brauner.io> wrote:
>>
>> The kill() syscall operates on process identifiers. After a process has
>> exited its pid can be reused by another process. If a caller sends a signal
>> to a reused pid it will end up signaling the wrong process. This issue has
>> often surfaced and there has been a push [1] to address this problem.
>>
>> A prior patch has introduced the ability to get a file descriptor
>> referencing struct pid by opening /proc/<pid>. This guarantees a stable
>> handle on a process which can be used to send signals to the referenced
>> process. Discussion has shown that a dedicated syscall is preferable over
>> ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
>> this problem. It operates on a process file descriptor.
>> The syscall takes an additional siginfo_t and flags argument. If siginfo_t
>> is NULL then procfd_signal() behaves like kill() if it is not NULL it
>> behaves like rt_sigqueueinfo.
>> The flags argument is added to allow for future extensions of this syscall.
>> It currently needs to be passed as 0.
>
> A few questions.  First: you've made this work on /proc/PID, but
> should it also work on /proc/PID/task/TID to send signals to a
> specific thread?

+1

>> +       if (info) {
>> +               ret = __copy_siginfo_from_user(sig, &kinfo, info);
>> +               if (unlikely(ret))
>> +                       goto err;
>> +               /*
>> +                * Not even root can pretend to send signals from the kernel.
>> +                * Nor can they impersonate a kill()/tgkill(), which adds
>> +                * source info.
>> +                */
>> +               ret = -EPERM;
>> +               if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
>> +                   (task_pid(current) != pid))
>> +                       goto err;
>
> Is the exception for signaling yourself actually useful here?

All the signal functions exempt the current process from access
checks. Whether that's useful or not (and I think it is), being
inconsistent here would be wrong.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
  2018-11-19 15:45   ` Andy Lutomirski
@ 2018-11-19 15:59   ` Daniel Colascione
  2018-11-19 18:29     ` Christian Brauner
  2018-11-19 17:10   ` Eugene Syromiatnikov
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 15:59 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Serge E. Hallyn, Jann Horn,
	Andy Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 2:32 AM, Christian Brauner <christian@brauner.io> wrote:
> The kill() syscall operates on process identifiers. After a process has
> exited its pid can be reused by another process. If a caller sends a signal
> to a reused pid it will end up signaling the wrong process. This issue has
> often surfaced and there has been a push [1] to address this problem.
>
> A prior patch has introduced the ability to get a file descriptor
> referencing struct pid by opening /proc/<pid>. This guarantees a stable
> handle on a process which can be used to send signals to the referenced
> process. Discussion has shown that a dedicated syscall is preferable over
> ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
> this problem. It operates on a process file descriptor.
> The syscall takes an additional siginfo_t and flags argument. If siginfo_t
> is NULL then procfd_signal() behaves like kill() if it is not NULL it
> behaves like rt_sigqueueinfo.
> The flags argument is added to allow for future extensions of this syscall.
> It currently needs to be passed as 0.
>
> With this patch a process can be killed via:
>
>  #define _GNU_SOURCE
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <stdio.h>
>  #include <stdlib.h>
>  #include <string.h>
>  #include <signal.h>
>  #include <sys/stat.h>
>  #include <sys/syscall.h>
>  #include <sys/types.h>
>  #include <unistd.h>
>
>  int main(int argc, char *argv[])
>  {
>          int ret;
>          char buf[1000];
>
>          if (argc < 2)
>                  exit(EXIT_FAILURE);
>
>          ret = snprintf(buf, sizeof(buf), "/proc/%s", argv[1]);
>          if (ret < 0)
>                  exit(EXIT_FAILURE);
>
>          int fd = open(buf, O_DIRECTORY | O_CLOEXEC);
>          if (fd < 0) {
>                  printf("%s - Failed to open \"%s\"\n", strerror(errno), buf);
>                  exit(EXIT_FAILURE);
>          }
>
>          ret = syscall(__NR_procfd_signal, fd, SIGKILL, NULL, 0);
>          if (ret < 0) {
>                  printf("Failed to send SIGKILL \"%s\"\n", strerror(errno));
>                  close(fd);
>                  exit(EXIT_FAILURE);
>          }
>
>          close(fd);
>
>          exit(EXIT_SUCCESS);
>  }
>
> [1]: https://lkml.org/lkml/2018/11/18/130
>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Serge Hallyn <serge@hallyn.com>
> Cc: Jann Horn <jannh@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirsky <luto@kernel.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Aleksa Sarai <cyphar@cyphar.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Christian Brauner <christian@brauner.io>
> ---
> Changelog:
> v1:
> - patch introduced
> ---
>  arch/x86/entry/syscalls/syscall_32.tbl |  1 +
>  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
>  fs/proc/base.c                         |  6 ++
>  include/linux/proc_fs.h                |  1 +
>  include/linux/syscalls.h               |  2 +
>  kernel/signal.c                        | 76 ++++++++++++++++++++++++--
>  6 files changed, 81 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 3cf7b533b3d1..e637eab883e9 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -398,3 +398,4 @@
>  384    i386    arch_prctl              sys_arch_prctl                  __ia32_compat_sys_arch_prctl
>  385    i386    io_pgetevents           sys_io_pgetevents               __ia32_compat_sys_io_pgetevents
>  386    i386    rseq                    sys_rseq                        __ia32_sys_rseq
> +387    i386    procfd_signal           sys_procfd_signal               __ia32_sys_procfd_signal
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index f0b1709a5ffb..e95f6741ab42 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -343,6 +343,7 @@
>  332    common  statx                   __x64_sys_statx
>  333    common  io_pgetevents           __x64_sys_io_pgetevents
>  334    common  rseq                    __x64_sys_rseq
> +335    common  procfd_signal           __x64_sys_procfd_signal
>
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 6365a4fea314..a12c9de92bd0 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3055,6 +3055,12 @@ static const struct file_operations proc_tgid_base_operations = {
>         .release        = proc_tgid_release,
>  };
>
> +bool proc_is_procfd(const struct file *file)
> +{
> +       return d_is_dir(file->f_path.dentry) &&
> +              (file->f_op == &proc_tgid_base_operations);
> +}
> +
>  static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
>  {
>         return proc_pident_lookup(dir, dentry,
> diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> index d0e1f1522a78..2d53a47fba34 100644
> --- a/include/linux/proc_fs.h
> +++ b/include/linux/proc_fs.h
> @@ -73,6 +73,7 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
>                                                     int (*show)(struct seq_file *, void *),
>                                                     proc_write_t write,
>                                                     void *data);
> +extern bool proc_is_procfd(const struct file *file);
>
>  #else /* CONFIG_PROC_FS */
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 2ac3d13a915b..a5ca8cb84566 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -907,6 +907,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
>                           unsigned mask, struct statx __user *buffer);
>  asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
>                          int flags, uint32_t sig);
> +asmlinkage long sys_procfd_signal(int fd, int sig, siginfo_t __user *info,
> +                                 int flags);
>
>  /*
>   * Architecture-specific system calls
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9a32bc2088c9..e8a8929de710 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -19,7 +19,9 @@
>  #include <linux/sched/task.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/sched/cputime.h>
> +#include <linux/file.h>
>  #include <linux/fs.h>
> +#include <linux/proc_fs.h>
>  #include <linux/tty.h>
>  #include <linux/binfmts.h>
>  #include <linux/coredump.h>
> @@ -3286,6 +3288,16 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait, compat_sigset_t __user *, uthese,
>  }
>  #endif
>
> +static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
> +{
> +       clear_siginfo(info);
> +       info->si_signo = sig;
> +       info->si_errno = 0;
> +       info->si_code = SI_USER;
> +       info->si_pid = task_tgid_vnr(current);
> +       info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
> +}
> +
>  /**
>   *  sys_kill - send a signal to a process
>   *  @pid: the PID of the process
> @@ -3295,16 +3307,68 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
>  {
>         struct kernel_siginfo info;
>
> -       clear_siginfo(&info);
> -       info.si_signo = sig;
> -       info.si_errno = 0;
> -       info.si_code = SI_USER;
> -       info.si_pid = task_tgid_vnr(current);
> -       info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
> +       prepare_kill_siginfo(sig, &info);
>
>         return kill_something_info(sig, &info, pid);
>  }
>
> +/**
> + *  sys_procfd_signal - send a signal to a process through a process file
> + *                      descriptor
> + *  @fd: the file descriptor of the process
> + *  @sig: signal to be sent
> + *  @info: the signal info
> + *  @flags: future flags to be passed
> + */
> +SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
> +               int, flags)
> +{
> +       int ret;
> +       struct pid *pid;
> +       kernel_siginfo_t kinfo;
> +       struct fd f;
> +
> +       /* Enforce flags be set to 0 until we add an extension. */
> +       if (flags)
> +               return -EINVAL;
> +
> +       f = fdget_raw(fd);
> +       if (!f.file)
> +               return -EBADF;
> +
> +       ret = -EINVAL;
> +       /* Is this a process file descriptor? */
> +       if (!proc_is_procfd(f.file))
> +               goto err;
> +
> +       pid = f.file->private_data;

You never addressed my comment on the previous patch about your use of
private_data here. Why can't you use the struct pid reference that's
already in the inode?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
  2018-11-19 15:45   ` Andy Lutomirski
  2018-11-19 15:59   ` Daniel Colascione
@ 2018-11-19 17:10   ` Eugene Syromiatnikov
  2018-11-19 18:23     ` Christian Brauner
  2018-11-19 17:14   ` Eugene Syromiatnikov
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 50+ messages in thread
From: Eugene Syromiatnikov @ 2018-11-19 17:10 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> index 3cf7b533b3d1..e637eab883e9 100644
> --- a/arch/x86/entry/syscalls/syscall_32.tbl
> +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> @@ -398,3 +398,4 @@
>  384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
>  385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
>  386	i386	rseq			sys_rseq			__ia32_sys_rseq
> +387	i386	procfd_signal		sys_procfd_signal		__ia32_sys_procfd_signal
> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index f0b1709a5ffb..e95f6741ab42 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -343,6 +343,7 @@
>  332	common	statx			__x64_sys_statx
>  333	common	io_pgetevents		__x64_sys_io_pgetevents
>  334	common	rseq			__x64_sys_rseq
> +335	common	procfd_signal		__x64_sys_procfd_signal

You have wired up the syscall on x86 but have not added the syscall number
to the generic syscall header (include/uapi/asm-generic/unistd.h), why?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (2 preceding siblings ...)
  2018-11-19 17:10   ` Eugene Syromiatnikov
@ 2018-11-19 17:14   ` Eugene Syromiatnikov
  2018-11-19 20:28   ` Aleksa Sarai
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 50+ messages in thread
From: Eugene Syromiatnikov @ 2018-11-19 17:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> +/**
> + *  sys_procfd_signal - send a signal to a process through a process file
> + *                      descriptor
> + *  @fd: the file descriptor of the process
> + *  @sig: signal to be sent
> + *  @info: the signal info
> + *  @flags: future flags to be passed
> + */
> +SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
> +		int, flags)

It could be considered better to use an unsigned type for the "flags"
argument.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid>
  2018-11-19 15:32   ` Andy Lutomirski
@ 2018-11-19 18:20     ` Christian Brauner
  0 siblings, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 18:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, LKML, Serge E. Hallyn, Jann Horn,
	Andrew Morton, Oleg Nesterov, Aleksa Sarai, Al Viro,
	Linux FS Devel, Linux API, Daniel Colascione, Tim Murray,
	linux-man, Kees Cook

On Mon, Nov 19, 2018 at 07:32:33AM -0800, Andy Lutomirski wrote:
> On Mon, Nov 19, 2018 at 2:33 AM Christian Brauner <christian@brauner.io> wrote:
> >
> > With this patch an open() call on /proc/<pid> will give userspace a handle
> > to struct pid of the process associated with /proc/<pid>. This allows to
> > maintain a stable handle on a process.
> > I have been discussing various approaches extensively during technical
> > conferences this year culminating in a long argument with Eric at Linux
> > Plumbers. The general consensus was that having a handle on a process
> > should be something that is very simple and easy to maintain with the
> > option of being extensible via a more advanced api if the need arises. I
> > believe that this patch is the most simple, dumb, and therefore
> > maintainable solution.
> 
> How does the mechanism you're adding here differ from proc_pid()?

I'm sorry, I am missing the context around proc_pid()?. Are you talking
about the the proc_pid() function in "internal.h"? What exactly is the
proc_pid() mechanism?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 17:10   ` Eugene Syromiatnikov
@ 2018-11-19 18:23     ` Christian Brauner
  0 siblings, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 18:23 UTC (permalink / raw)
  To: Eugene Syromiatnikov
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 06:10:53PM +0100, Eugene Syromiatnikov wrote:
> On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> > index 3cf7b533b3d1..e637eab883e9 100644
> > --- a/arch/x86/entry/syscalls/syscall_32.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> > @@ -398,3 +398,4 @@
> >  384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
> >  385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
> >  386	i386	rseq			sys_rseq			__ia32_sys_rseq
> > +387	i386	procfd_signal		sys_procfd_signal		__ia32_sys_procfd_signal
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index f0b1709a5ffb..e95f6741ab42 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -343,6 +343,7 @@
> >  332	common	statx			__x64_sys_statx
> >  333	common	io_pgetevents		__x64_sys_io_pgetevents
> >  334	common	rseq			__x64_sys_rseq
> > +335	common	procfd_signal		__x64_sys_procfd_signal
> 
> You have wired up the syscall on x86 but have not added the syscall number
> to the generic syscall header (include/uapi/asm-generic/unistd.h), why?

Oversight on my part, sorry.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 15:59   ` Daniel Colascione
@ 2018-11-19 18:29     ` Christian Brauner
  2018-11-19 19:02       ` Eric W. Biederman
  0 siblings, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 18:29 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Eric W. Biederman, linux-kernel, Serge E. Hallyn, Jann Horn,
	Andy Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 07:59:24AM -0800, Daniel Colascione wrote:
> On Mon, Nov 19, 2018 at 2:32 AM, Christian Brauner <christian@brauner.io> wrote:
> > The kill() syscall operates on process identifiers. After a process has
> > exited its pid can be reused by another process. If a caller sends a signal
> > to a reused pid it will end up signaling the wrong process. This issue has
> > often surfaced and there has been a push [1] to address this problem.
> >
> > A prior patch has introduced the ability to get a file descriptor
> > referencing struct pid by opening /proc/<pid>. This guarantees a stable
> > handle on a process which can be used to send signals to the referenced
> > process. Discussion has shown that a dedicated syscall is preferable over
> > ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
> > this problem. It operates on a process file descriptor.
> > The syscall takes an additional siginfo_t and flags argument. If siginfo_t
> > is NULL then procfd_signal() behaves like kill() if it is not NULL it
> > behaves like rt_sigqueueinfo.
> > The flags argument is added to allow for future extensions of this syscall.
> > It currently needs to be passed as 0.
> >
> > With this patch a process can be killed via:
> >
> >  #define _GNU_SOURCE
> >  #include <errno.h>
> >  #include <fcntl.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> >  #include <string.h>
> >  #include <signal.h>
> >  #include <sys/stat.h>
> >  #include <sys/syscall.h>
> >  #include <sys/types.h>
> >  #include <unistd.h>
> >
> >  int main(int argc, char *argv[])
> >  {
> >          int ret;
> >          char buf[1000];
> >
> >          if (argc < 2)
> >                  exit(EXIT_FAILURE);
> >
> >          ret = snprintf(buf, sizeof(buf), "/proc/%s", argv[1]);
> >          if (ret < 0)
> >                  exit(EXIT_FAILURE);
> >
> >          int fd = open(buf, O_DIRECTORY | O_CLOEXEC);
> >          if (fd < 0) {
> >                  printf("%s - Failed to open \"%s\"\n", strerror(errno), buf);
> >                  exit(EXIT_FAILURE);
> >          }
> >
> >          ret = syscall(__NR_procfd_signal, fd, SIGKILL, NULL, 0);
> >          if (ret < 0) {
> >                  printf("Failed to send SIGKILL \"%s\"\n", strerror(errno));
> >                  close(fd);
> >                  exit(EXIT_FAILURE);
> >          }
> >
> >          close(fd);
> >
> >          exit(EXIT_SUCCESS);
> >  }
> >
> > [1]: https://lkml.org/lkml/2018/11/18/130
> >
> > Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> > Cc: Serge Hallyn <serge@hallyn.com>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andy Lutomirsky <luto@kernel.org>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Aleksa Sarai <cyphar@cyphar.com>
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Signed-off-by: Christian Brauner <christian@brauner.io>
> > ---
> > Changelog:
> > v1:
> > - patch introduced
> > ---
> >  arch/x86/entry/syscalls/syscall_32.tbl |  1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl |  1 +
> >  fs/proc/base.c                         |  6 ++
> >  include/linux/proc_fs.h                |  1 +
> >  include/linux/syscalls.h               |  2 +
> >  kernel/signal.c                        | 76 ++++++++++++++++++++++++--
> >  6 files changed, 81 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> > index 3cf7b533b3d1..e637eab883e9 100644
> > --- a/arch/x86/entry/syscalls/syscall_32.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> > @@ -398,3 +398,4 @@
> >  384    i386    arch_prctl              sys_arch_prctl                  __ia32_compat_sys_arch_prctl
> >  385    i386    io_pgetevents           sys_io_pgetevents               __ia32_compat_sys_io_pgetevents
> >  386    i386    rseq                    sys_rseq                        __ia32_sys_rseq
> > +387    i386    procfd_signal           sys_procfd_signal               __ia32_sys_procfd_signal
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index f0b1709a5ffb..e95f6741ab42 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -343,6 +343,7 @@
> >  332    common  statx                   __x64_sys_statx
> >  333    common  io_pgetevents           __x64_sys_io_pgetevents
> >  334    common  rseq                    __x64_sys_rseq
> > +335    common  procfd_signal           __x64_sys_procfd_signal
> >
> >  #
> >  # x32-specific system call numbers start at 512 to avoid cache impact
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index 6365a4fea314..a12c9de92bd0 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -3055,6 +3055,12 @@ static const struct file_operations proc_tgid_base_operations = {
> >         .release        = proc_tgid_release,
> >  };
> >
> > +bool proc_is_procfd(const struct file *file)
> > +{
> > +       return d_is_dir(file->f_path.dentry) &&
> > +              (file->f_op == &proc_tgid_base_operations);
> > +}
> > +
> >  static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
> >  {
> >         return proc_pident_lookup(dir, dentry,
> > diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
> > index d0e1f1522a78..2d53a47fba34 100644
> > --- a/include/linux/proc_fs.h
> > +++ b/include/linux/proc_fs.h
> > @@ -73,6 +73,7 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
> >                                                     int (*show)(struct seq_file *, void *),
> >                                                     proc_write_t write,
> >                                                     void *data);
> > +extern bool proc_is_procfd(const struct file *file);
> >
> >  #else /* CONFIG_PROC_FS */
> >
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 2ac3d13a915b..a5ca8cb84566 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -907,6 +907,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
> >                           unsigned mask, struct statx __user *buffer);
> >  asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
> >                          int flags, uint32_t sig);
> > +asmlinkage long sys_procfd_signal(int fd, int sig, siginfo_t __user *info,
> > +                                 int flags);
> >
> >  /*
> >   * Architecture-specific system calls
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index 9a32bc2088c9..e8a8929de710 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -19,7 +19,9 @@
> >  #include <linux/sched/task.h>
> >  #include <linux/sched/task_stack.h>
> >  #include <linux/sched/cputime.h>
> > +#include <linux/file.h>
> >  #include <linux/fs.h>
> > +#include <linux/proc_fs.h>
> >  #include <linux/tty.h>
> >  #include <linux/binfmts.h>
> >  #include <linux/coredump.h>
> > @@ -3286,6 +3288,16 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait, compat_sigset_t __user *, uthese,
> >  }
> >  #endif
> >
> > +static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
> > +{
> > +       clear_siginfo(info);
> > +       info->si_signo = sig;
> > +       info->si_errno = 0;
> > +       info->si_code = SI_USER;
> > +       info->si_pid = task_tgid_vnr(current);
> > +       info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
> > +}
> > +
> >  /**
> >   *  sys_kill - send a signal to a process
> >   *  @pid: the PID of the process
> > @@ -3295,16 +3307,68 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
> >  {
> >         struct kernel_siginfo info;
> >
> > -       clear_siginfo(&info);
> > -       info.si_signo = sig;
> > -       info.si_errno = 0;
> > -       info.si_code = SI_USER;
> > -       info.si_pid = task_tgid_vnr(current);
> > -       info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
> > +       prepare_kill_siginfo(sig, &info);
> >
> >         return kill_something_info(sig, &info, pid);
> >  }
> >
> > +/**
> > + *  sys_procfd_signal - send a signal to a process through a process file
> > + *                      descriptor
> > + *  @fd: the file descriptor of the process
> > + *  @sig: signal to be sent
> > + *  @info: the signal info
> > + *  @flags: future flags to be passed
> > + */
> > +SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
> > +               int, flags)
> > +{
> > +       int ret;
> > +       struct pid *pid;
> > +       kernel_siginfo_t kinfo;
> > +       struct fd f;
> > +
> > +       /* Enforce flags be set to 0 until we add an extension. */
> > +       if (flags)
> > +               return -EINVAL;
> > +
> > +       f = fdget_raw(fd);
> > +       if (!f.file)
> > +               return -EBADF;
> > +
> > +       ret = -EINVAL;
> > +       /* Is this a process file descriptor? */
> > +       if (!proc_is_procfd(f.file))
> > +               goto err;
> > +
> > +       pid = f.file->private_data;
> 
> You never addressed my comment on the previous patch about your use of

Sorry, that thread exploded so quickly that I might have missed it.

> private_data here. Why can't you use the struct pid reference that's
> already in the inode?

If that's what people prefer we can probably use that. There was
precedent for stashing away such data in fs/proc/base.c already for
various other things including user namespaces and struct mm so I
followed this model. A prior version of my patch (I didn't send out) did
retrive the inode via proc_pid() in .open() took an additional reference
via get_pid() and dropped it in .release(). Do we prefer that?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 15:45   ` Andy Lutomirski
  2018-11-19 15:57     ` Daniel Colascione
@ 2018-11-19 18:39     ` Christian Brauner
  1 sibling, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 18:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric W. Biederman, LKML, Serge E. Hallyn, Jann Horn,
	Andrew Morton, Oleg Nesterov, Aleksa Sarai, Al Viro,
	Linux FS Devel, Linux API, Daniel Colascione, Tim Murray,
	linux-man, Kees Cook

On Mon, Nov 19, 2018 at 07:45:04AM -0800, Andy Lutomirski wrote:
> On Mon, Nov 19, 2018 at 2:33 AM Christian Brauner <christian@brauner.io> wrote:
> >
> > The kill() syscall operates on process identifiers. After a process has
> > exited its pid can be reused by another process. If a caller sends a signal
> > to a reused pid it will end up signaling the wrong process. This issue has
> > often surfaced and there has been a push [1] to address this problem.
> >
> > A prior patch has introduced the ability to get a file descriptor
> > referencing struct pid by opening /proc/<pid>. This guarantees a stable
> > handle on a process which can be used to send signals to the referenced
> > process. Discussion has shown that a dedicated syscall is preferable over
> > ioctl()s. Thus, the  new syscall procfd_signal() is introduced to solve
> > this problem. It operates on a process file descriptor.
> > The syscall takes an additional siginfo_t and flags argument. If siginfo_t
> > is NULL then procfd_signal() behaves like kill() if it is not NULL it
> > behaves like rt_sigqueueinfo.
> > The flags argument is added to allow for future extensions of this syscall.
> > It currently needs to be passed as 0.
> 
> A few questions.  First: you've made this work on /proc/PID, but
> should it also work on /proc/PID/task/TID to send signals to a
> specific thread?

Yeah, so I thought about that. Your point being to combine: kill(),
tgkill() aka rt_sigqueueinfo() and rt_tg_sigqueueinfo(). If I understand
this correctly the implication is to also get file descriptors to
/proc/PID/task/TID and pass them to procfd_signal()? Can we hold of on
that one? Adding this in the future should be easily doable by simply
getting /proc/PID/task/TID file descriptors but I would like this
patchset to be as small as possible.

> 
> > +bool proc_is_procfd(const struct file *file)
> > +{
> > +       return d_is_dir(file->f_path.dentry) &&
> > +              (file->f_op == &proc_tgid_base_operations);
> > +}
> 
> Maybe rename to proc_is_tgid_procfd() to leave room for proc_is_tid_procfd()?

Yes, good idea!

> 
> > +       if (info) {
> > +               ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > +               if (unlikely(ret))
> > +                       goto err;
> > +               /*
> > +                * Not even root can pretend to send signals from the kernel.
> > +                * Nor can they impersonate a kill()/tgkill(), which adds
> > +                * source info.
> > +                */
> > +               ret = -EPERM;
> > +               if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > +                   (task_pid(current) != pid))
> > +                       goto err;
> 
> Is the exception for signaling yourself actually useful here?

I tried to strictly follow the sigqueue-based permission checks. I'm not
comfortable removing this check without signal-experts telling me that
it is safe to do.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 18:29     ` Christian Brauner
@ 2018-11-19 19:02       ` Eric W. Biederman
  2018-11-19 19:31         ` Christian Brauner
  0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2018-11-19 19:02 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Daniel Colascione, linux-kernel, Serge E. Hallyn, Jann Horn,
	Andy Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

Christian Brauner <christian@brauner.io> writes:

> On Mon, Nov 19, 2018 at 07:59:24AM -0800, Daniel Colascione wrote:
>> You never addressed my comment on the previous patch about your use of
>
> Sorry, that thread exploded so quickly that I might have missed it.
>
>> private_data here. Why can't you use the struct pid reference that's
>> already in the inode?
>
> If that's what people prefer we can probably use that. There was
> precedent for stashing away such data in fs/proc/base.c already for
> various other things including user namespaces and struct mm so I
> followed this model. A prior version of my patch (I didn't send out) did
> retrive the inode via proc_pid() in .open() took an additional reference
> via get_pid() and dropped it in .release(). Do we prefer that?

If you are using proc/<pid>/ directories as your file descriptors, you
don't need to add an open or a release method at all.  The existing file
descriptors hold a reference to the inode which holds a reference the
the struct pid.

The only time you need to get a reference is when you need a task
and kill_pid_info already performs that work for you.

So using proc_pid is all you need to do to get the pid from the existing
file descriptors.

Eric


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 19:02       ` Eric W. Biederman
@ 2018-11-19 19:31         ` Christian Brauner
  2018-11-19 19:39           ` Daniel Colascione
  0 siblings, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 19:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Colascione, linux-kernel, Serge E. Hallyn, Jann Horn,
	Andy Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 01:02:06PM -0600, Eric W. Biederman wrote:
> Christian Brauner <christian@brauner.io> writes:
> 
> > On Mon, Nov 19, 2018 at 07:59:24AM -0800, Daniel Colascione wrote:
> >> You never addressed my comment on the previous patch about your use of
> >
> > Sorry, that thread exploded so quickly that I might have missed it.
> >
> >> private_data here. Why can't you use the struct pid reference that's
> >> already in the inode?
> >
> > If that's what people prefer we can probably use that. There was
> > precedent for stashing away such data in fs/proc/base.c already for
> > various other things including user namespaces and struct mm so I
> > followed this model. A prior version of my patch (I didn't send out) did
> > retrive the inode via proc_pid() in .open() took an additional reference
> > via get_pid() and dropped it in .release(). Do we prefer that?
> 
> If you are using proc/<pid>/ directories as your file descriptors, you
> don't need to add an open or a release method at all.  The existing file
> descriptors hold a reference to the inode which holds a reference the
> the struct pid.
> 
> The only time you need to get a reference is when you need a task
> and kill_pid_info already performs that work for you.

Oh, I see what you and Andy are saying now. Sweet, that means we can
trim down the patch even more. Less code, less headache.

Thanks!

> 
> So using proc_pid is all you need to do to get the pid from the existing
> file descriptors.
> 
> Eric
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 19:31         ` Christian Brauner
@ 2018-11-19 19:39           ` Daniel Colascione
  0 siblings, 0 replies; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 19:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, linux-kernel, Serge E. Hallyn, Jann Horn,
	Andy Lutomirski, Andrew Morton, Oleg Nesterov, Aleksa Sarai,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

Yep. That's also what I was talking about, FWIW.

On Mon, Nov 19, 2018 at 11:31 AM, Christian Brauner
<christian@brauner.io> wrote:
> On Mon, Nov 19, 2018 at 01:02:06PM -0600, Eric W. Biederman wrote:
>> Christian Brauner <christian@brauner.io> writes:
>>
>> > On Mon, Nov 19, 2018 at 07:59:24AM -0800, Daniel Colascione wrote:
>> >> You never addressed my comment on the previous patch about your use of
>> >
>> > Sorry, that thread exploded so quickly that I might have missed it.
>> >
>> >> private_data here. Why can't you use the struct pid reference that's
>> >> already in the inode?
>> >
>> > If that's what people prefer we can probably use that. There was
>> > precedent for stashing away such data in fs/proc/base.c already for
>> > various other things including user namespaces and struct mm so I
>> > followed this model. A prior version of my patch (I didn't send out) did
>> > retrive the inode via proc_pid() in .open() took an additional reference
>> > via get_pid() and dropped it in .release(). Do we prefer that?
>>
>> If you are using proc/<pid>/ directories as your file descriptors, you
>> don't need to add an open or a release method at all.  The existing file
>> descriptors hold a reference to the inode which holds a reference the
>> the struct pid.
>>
>> The only time you need to get a reference is when you need a task
>> and kill_pid_info already performs that work for you.
>
> Oh, I see what you and Andy are saying now. Sweet, that means we can
> trim down the patch even more. Less code, less headache.
>
> Thanks!
>
>>
>> So using proc_pid is all you need to do to get the pid from the existing
>> file descriptors.
>>
>> Eric
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (3 preceding siblings ...)
  2018-11-19 17:14   ` Eugene Syromiatnikov
@ 2018-11-19 20:28   ` Aleksa Sarai
  2018-11-19 20:55     ` Christian Brauner
  2018-11-19 22:39   ` Tycho Andersen
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 50+ messages in thread
From: Aleksa Sarai @ 2018-11-19 20:28 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook


[-- Attachment #1: Type: text/plain, Size: 1232 bytes --]

On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> +	if (info) {
> +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> +		if (unlikely(ret))
> +			goto err;
> +		/*
> +		 * Not even root can pretend to send signals from the kernel.
> +		 * Nor can they impersonate a kill()/tgkill(), which adds
> +		 * source info.
> +		 */
> +		ret = -EPERM;
> +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> +		    (task_pid(current) != pid))
> +			goto err;
> +	} else {
> +		prepare_kill_siginfo(sig, &kinfo);
> +	}

I wonder whether we should also have a pidns restriction here, since
currently it isn't possible for a container process using a pidns to
signal processes outside its pidns. AFAICS, this isn't done through an
explicit check -- it's a side-effect of processes in a pidns not being
able to address non-descendant-pidns processes.

But maybe it's reasonable to allow sending a procfd to a different pidns
and the same operations working on it? If we extend the procfd API to
allow process creation this would allow a container to create a process
outside its pidns.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 20:28   ` Aleksa Sarai
@ 2018-11-19 20:55     ` Christian Brauner
  2018-11-19 21:13       ` Christian Brauner
  2018-11-19 21:18       ` Aleksa Sarai
  0 siblings, 2 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 20:55 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > +	if (info) {
> > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > +		if (unlikely(ret))
> > +			goto err;
> > +		/*
> > +		 * Not even root can pretend to send signals from the kernel.
> > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > +		 * source info.
> > +		 */
> > +		ret = -EPERM;
> > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > +		    (task_pid(current) != pid))
> > +			goto err;
> > +	} else {
> > +		prepare_kill_siginfo(sig, &kinfo);
> > +	}
> 
> I wonder whether we should also have a pidns restriction here, since
> currently it isn't possible for a container process using a pidns to
> signal processes outside its pidns. AFAICS, this isn't done through an
> explicit check -- it's a side-effect of processes in a pidns not being
> able to address non-descendant-pidns processes.
> 
> But maybe it's reasonable to allow sending a procfd to a different pidns
> and the same operations working on it? If we extend the procfd API to

No, I don't think so. I really don't want any fancy semantics in here.
Fancy doesn't get merged and fancy is hard to maintain. So we should do
something like:

if (proc_pid_ns() != current_pid_ns)
	return EINVAL

> allow process creation this would allow a container to create a process
> outside its pidns.
> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 20:55     ` Christian Brauner
@ 2018-11-19 21:13       ` Christian Brauner
  2018-11-19 21:18       ` Aleksa Sarai
  1 sibling, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 21:13 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 09:55:18PM +0100, Christian Brauner wrote:
> On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > +	if (info) {
> > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > +		if (unlikely(ret))
> > > +			goto err;
> > > +		/*
> > > +		 * Not even root can pretend to send signals from the kernel.
> > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > +		 * source info.
> > > +		 */
> > > +		ret = -EPERM;
> > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > +		    (task_pid(current) != pid))
> > > +			goto err;
> > > +	} else {
> > > +		prepare_kill_siginfo(sig, &kinfo);
> > > +	}
> > 
> > I wonder whether we should also have a pidns restriction here, since
> > currently it isn't possible for a container process using a pidns to
> > signal processes outside its pidns. AFAICS, this isn't done through an
> > explicit check -- it's a side-effect of processes in a pidns not being
> > able to address non-descendant-pidns processes.
> > 
> > But maybe it's reasonable to allow sending a procfd to a different pidns
> > and the same operations working on it? If we extend the procfd API to
> 
> No, I don't think so. I really don't want any fancy semantics in here.
> Fancy doesn't get merged and fancy is hard to maintain. So we should do
> something like:
> 
> if (proc_pid_ns() != current_pid_ns)
> 	return EINVAL

To be more precise, we need to detect if fd refers to an ancestor pidns
and if so return EINVAL.

> 
> > allow process creation this would allow a container to create a process
> > outside its pidns.
> > 
> > -- 
> > Aleksa Sarai
> > Senior Software Engineer (Containers)
> > SUSE Linux GmbH
> > <https://www.cyphar.com/>
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 20:55     ` Christian Brauner
  2018-11-19 21:13       ` Christian Brauner
@ 2018-11-19 21:18       ` Aleksa Sarai
  2018-11-19 21:20         ` Christian Brauner
                           ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Aleksa Sarai @ 2018-11-19 21:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook


[-- Attachment #1: Type: text/plain, Size: 2436 bytes --]

On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > +	if (info) {
> > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > +		if (unlikely(ret))
> > > +			goto err;
> > > +		/*
> > > +		 * Not even root can pretend to send signals from the kernel.
> > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > +		 * source info.
> > > +		 */
> > > +		ret = -EPERM;
> > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > +		    (task_pid(current) != pid))
> > > +			goto err;
> > > +	} else {
> > > +		prepare_kill_siginfo(sig, &kinfo);
> > > +	}
> > 
> > I wonder whether we should also have a pidns restriction here, since
> > currently it isn't possible for a container process using a pidns to
> > signal processes outside its pidns. AFAICS, this isn't done through an
> > explicit check -- it's a side-effect of processes in a pidns not being
> > able to address non-descendant-pidns processes.
> > 
> > But maybe it's reasonable to allow sending a procfd to a different pidns
> > and the same operations working on it? If we extend the procfd API to
> 
> No, I don't think so. I really don't want any fancy semantics in here.
> Fancy doesn't get merged and fancy is hard to maintain. So we should do
> something like:
> 
> if (proc_pid_ns() != current_pid_ns)
> 	return EINVAL

This isn't quite sufficient. The key thing is that you have to be in an
*ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
the check already in pidns_get_parent, and expose it. It would be
something as trivial as:

bool pidns_is_descendant(struct pid_namespace *ns,
                         struct pid_namespace *ancestor)
{
    for (;;) {
        if (!ns)
            return false;
        if (ns == ancestor)
            break;
        ns = ns->parent;
    }
    return true;
}

And you can rewrite pidns_get_parent to use it. So you would instead be
doing:

    if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
        return -EPERM;

(Or you can just copy the 5-line loop into procfd_signal -- though I
imagine we'll need this for all of the procfd_* APIs.)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:18       ` Aleksa Sarai
@ 2018-11-19 21:20         ` Christian Brauner
  2018-11-19 21:21         ` Christian Brauner
  2018-11-19 21:23         ` Aleksa Sarai
  2 siblings, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 21:20 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Tue, Nov 20, 2018 at 08:18:10AM +1100, Aleksa Sarai wrote:
> On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > > +	if (info) {
> > > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > +		if (unlikely(ret))
> > > > +			goto err;
> > > > +		/*
> > > > +		 * Not even root can pretend to send signals from the kernel.
> > > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > > +		 * source info.
> > > > +		 */
> > > > +		ret = -EPERM;
> > > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > +		    (task_pid(current) != pid))
> > > > +			goto err;
> > > > +	} else {
> > > > +		prepare_kill_siginfo(sig, &kinfo);
> > > > +	}
> > > 
> > > I wonder whether we should also have a pidns restriction here, since
> > > currently it isn't possible for a container process using a pidns to
> > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > explicit check -- it's a side-effect of processes in a pidns not being
> > > able to address non-descendant-pidns processes.
> > > 
> > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > and the same operations working on it? If we extend the procfd API to
> > 
> > No, I don't think so. I really don't want any fancy semantics in here.
> > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > something like:
> > 
> > if (proc_pid_ns() != current_pid_ns)
> > 	return EINVAL
> 
> This isn't quite sufficient. The key thing is that you have to be in an
> *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use

See my next mail.

> the check already in pidns_get_parent, and expose it. It would be
> something as trivial as:
> 
> bool pidns_is_descendant(struct pid_namespace *ns,
>                          struct pid_namespace *ancestor)
> {
>     for (;;) {
>         if (!ns)
>             return false;
>         if (ns == ancestor)
>             break;
>         ns = ns->parent;
>     }
>     return true;
> }
> 
> And you can rewrite pidns_get_parent to use it. So you would instead be
> doing:
> 
>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
>         return -EPERM;
> 
> (Or you can just copy the 5-line loop into procfd_signal -- though I
> imagine we'll need this for all of the procfd_* APIs.)
> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:18       ` Aleksa Sarai
  2018-11-19 21:20         ` Christian Brauner
@ 2018-11-19 21:21         ` Christian Brauner
  2018-11-19 21:25           ` Aleksa Sarai
  2018-11-19 21:26           ` Daniel Colascione
  2018-11-19 21:23         ` Aleksa Sarai
  2 siblings, 2 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 21:21 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Tue, Nov 20, 2018 at 08:18:10AM +1100, Aleksa Sarai wrote:
> On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > > +	if (info) {
> > > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > +		if (unlikely(ret))
> > > > +			goto err;
> > > > +		/*
> > > > +		 * Not even root can pretend to send signals from the kernel.
> > > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > > +		 * source info.
> > > > +		 */
> > > > +		ret = -EPERM;
> > > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > +		    (task_pid(current) != pid))
> > > > +			goto err;
> > > > +	} else {
> > > > +		prepare_kill_siginfo(sig, &kinfo);
> > > > +	}
> > > 
> > > I wonder whether we should also have a pidns restriction here, since
> > > currently it isn't possible for a container process using a pidns to
> > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > explicit check -- it's a side-effect of processes in a pidns not being
> > > able to address non-descendant-pidns processes.
> > > 
> > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > and the same operations working on it? If we extend the procfd API to
> > 
> > No, I don't think so. I really don't want any fancy semantics in here.
> > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > something like:
> > 
> > if (proc_pid_ns() != current_pid_ns)
> > 	return EINVAL
> 
> This isn't quite sufficient. The key thing is that you have to be in an
> *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
> the check already in pidns_get_parent, and expose it. It would be
> something as trivial as:
> 
> bool pidns_is_descendant(struct pid_namespace *ns,
>                          struct pid_namespace *ancestor)
> {
>     for (;;) {
>         if (!ns)
>             return false;
>         if (ns == ancestor)
>             break;
>         ns = ns->parent;
>     }
>     return true;
> }

That can be done without a loop by comparing the level counter for the
two pid namespaces.

> 
> And you can rewrite pidns_get_parent to use it. So you would instead be
> doing:
> 
>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
>         return -EPERM;
> 
> (Or you can just copy the 5-line loop into procfd_signal -- though I
> imagine we'll need this for all of the procfd_* APIs.)
> 
> -- 
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:18       ` Aleksa Sarai
  2018-11-19 21:20         ` Christian Brauner
  2018-11-19 21:21         ` Christian Brauner
@ 2018-11-19 21:23         ` Aleksa Sarai
  2018-11-22  7:41           ` Serge E. Hallyn
  2 siblings, 1 reply; 50+ messages in thread
From: Aleksa Sarai @ 2018-11-19 21:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook


[-- Attachment #1: Type: text/plain, Size: 2866 bytes --]

On 2018-11-20, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > > +	if (info) {
> > > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > +		if (unlikely(ret))
> > > > +			goto err;
> > > > +		/*
> > > > +		 * Not even root can pretend to send signals from the kernel.
> > > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > > +		 * source info.
> > > > +		 */
> > > > +		ret = -EPERM;
> > > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > +		    (task_pid(current) != pid))
> > > > +			goto err;
> > > > +	} else {
> > > > +		prepare_kill_siginfo(sig, &kinfo);
> > > > +	}
> > > 
> > > I wonder whether we should also have a pidns restriction here, since
> > > currently it isn't possible for a container process using a pidns to
> > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > explicit check -- it's a side-effect of processes in a pidns not being
> > > able to address non-descendant-pidns processes.
> > > 
> > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > and the same operations working on it? If we extend the procfd API to
> > 
> > No, I don't think so. I really don't want any fancy semantics in here.
> > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > something like:
> > 
> > if (proc_pid_ns() != current_pid_ns)
> > 	return EINVAL
> 
> This isn't quite sufficient. The key thing is that you have to be in an
> *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
> the check already in pidns_get_parent, and expose it. It would be
> something as trivial as:
> 
> bool pidns_is_descendant(struct pid_namespace *ns,
>                          struct pid_namespace *ancestor)
> {
>     for (;;) {
>         if (!ns)
>             return false;
>         if (ns == ancestor)
>             break;
>         ns = ns->parent;
>     }
>     return true;
> }
> 
> And you can rewrite pidns_get_parent to use it. So you would instead be
> doing:
> 
>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
>         return -EPERM;

Scratch the last bit, -EPERM is wrong here. I would argue that -EINVAL
is *somewhat* wrong because arguable the more semantically consistent
error (with kill(2)) would be -ESRCH -- but then you're mixing the "pid
is dead" and "pid is not visible to you" cases. I'm not sure what the
right errno would be here (I'm sure some of the LKML greybeards will
have a better clue.) :P

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:21         ` Christian Brauner
@ 2018-11-19 21:25           ` Aleksa Sarai
  2018-11-19 21:26           ` Daniel Colascione
  1 sibling, 0 replies; 50+ messages in thread
From: Aleksa Sarai @ 2018-11-19 21:25 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook


[-- Attachment #1: Type: text/plain, Size: 2686 bytes --]

On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> On Tue, Nov 20, 2018 at 08:18:10AM +1100, Aleksa Sarai wrote:
> > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > > > +	if (info) {
> > > > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > > +		if (unlikely(ret))
> > > > > +			goto err;
> > > > > +		/*
> > > > > +		 * Not even root can pretend to send signals from the kernel.
> > > > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > > > +		 * source info.
> > > > > +		 */
> > > > > +		ret = -EPERM;
> > > > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > > +		    (task_pid(current) != pid))
> > > > > +			goto err;
> > > > > +	} else {
> > > > > +		prepare_kill_siginfo(sig, &kinfo);
> > > > > +	}
> > > > 
> > > > I wonder whether we should also have a pidns restriction here, since
> > > > currently it isn't possible for a container process using a pidns to
> > > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > > explicit check -- it's a side-effect of processes in a pidns not being
> > > > able to address non-descendant-pidns processes.
> > > > 
> > > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > > and the same operations working on it? If we extend the procfd API to
> > > 
> > > No, I don't think so. I really don't want any fancy semantics in here.
> > > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > > something like:
> > > 
> > > if (proc_pid_ns() != current_pid_ns)
> > > 	return EINVAL
> > 
> > This isn't quite sufficient. The key thing is that you have to be in an
> > *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
> > the check already in pidns_get_parent, and expose it. It would be
> > something as trivial as:
> > 
> > bool pidns_is_descendant(struct pid_namespace *ns,
> >                          struct pid_namespace *ancestor)
> > {
> >     for (;;) {
> >         if (!ns)
> >             return false;
> >         if (ns == ancestor)
> >             break;
> >         ns = ns->parent;
> >     }
> >     return true;
> > }
> 
> That can be done without a loop by comparing the level counter for the
> two pid namespaces.

If so, we can refactor how pidns_get_parent() and family work. :P

But yes, I agree with doing the above check.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:21         ` Christian Brauner
  2018-11-19 21:25           ` Aleksa Sarai
@ 2018-11-19 21:26           ` Daniel Colascione
  2018-11-19 21:36             ` Aleksa Sarai
  2018-11-19 21:37             ` Christian Brauner
  1 sibling, 2 replies; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 21:26 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Aleksa Sarai, Eric W. Biederman, linux-kernel, Serge E. Hallyn,
	Jann Horn, Andy Lutomirski, Andrew Morton, Oleg Nesterov,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> That can be done without a loop by comparing the level counter for the
> two pid namespaces.
>
>>
>> And you can rewrite pidns_get_parent to use it. So you would instead be
>> doing:
>>
>>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
>>         return -EPERM;
>>
>> (Or you can just copy the 5-line loop into procfd_signal -- though I
>> imagine we'll need this for all of the procfd_* APIs.)

Why is any of this even necessary? Why does the child namespace we're
considering even have a file descriptor to its ancestor's procfs? If
it has one of these FDs, it can already *read* all sorts of
information it really shouldn't be able to acquire, so the additional
ability to send a signal (subject to the usual permission checks)
feels like sticking a finger in a dike that's already well-perforated.
IMHO, we shouldn't bother with this check. The patch would be simpler
without it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:26           ` Daniel Colascione
@ 2018-11-19 21:36             ` Aleksa Sarai
  2018-11-19 21:37             ` Christian Brauner
  1 sibling, 0 replies; 50+ messages in thread
From: Aleksa Sarai @ 2018-11-19 21:36 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Christian Brauner, Eric W. Biederman, linux-kernel,
	Serge E. Hallyn, Jann Horn, Andy Lutomirski, Andrew Morton,
	Oleg Nesterov, Al Viro, Linux FS Devel, Linux API, Tim Murray,
	linux-man, Kees Cook


[-- Attachment #1: Type: text/plain, Size: 2466 bytes --]

On 2018-11-19, Daniel Colascione <dancol@google.com> wrote:
> On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> > That can be done without a loop by comparing the level counter for the
> > two pid namespaces.
> >
> >>
> >> And you can rewrite pidns_get_parent to use it. So you would instead be
> >> doing:
> >>
> >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> >>         return -EPERM;
> >>
> >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> >> imagine we'll need this for all of the procfd_* APIs.)
> 
> Why is any of this even necessary? Why does the child namespace we're
> considering even have a file descriptor to its ancestor's procfs? If
> it has one of these FDs, it can already *read* all sorts of
> information it really shouldn't be able to acquire, so the additional
> ability to send a signal (subject to the usual permission checks)
> feels like sticking a finger in a dike that's already well-perforated.
> IMHO, we shouldn't bother with this check. The patch would be simpler
> without it.

First of all, currently it isn't possible to signal processes in an
ancestor pidns. Given the long thread about exit code visibility
semantics, I'm sure you see why bringing up this question is reasonable.

Some people (stupidly) bind-mount / into containers. There were several
CVEs in both LXC and runc where you could access the host filesystem
(including the host /proc). I'd prefer to not provide a mechanism for
such escalations to start sending signals to host processes, since I
don't see a strong reason why it should be allowed (and allowing it
would add more cracks to the isolation of pidns).

I think there is a huge difference between having read access to /proc
and being able to use /proc to signal processes which you ordinarily
would not be able to signal.

And another important point is that of semantics.

If we move forward with procfd_new() and the rest of the API we are
discussing, I'd argue we'd want to allow passing an nsfs fd to specify
what pidns we want the process to be created in (for procfd_new()). This
will obviously require a permission check to make sure we aren't
creating processes in a parent pidns -- and so for consistency all
procfd_* operations should have similar checks.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:26           ` Daniel Colascione
  2018-11-19 21:36             ` Aleksa Sarai
@ 2018-11-19 21:37             ` Christian Brauner
  2018-11-19 21:41               ` Daniel Colascione
  1 sibling, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 21:37 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Aleksa Sarai, Eric W. Biederman, linux-kernel, Serge E. Hallyn,
	Jann Horn, Andy Lutomirski, Andrew Morton, Oleg Nesterov,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
> On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> > That can be done without a loop by comparing the level counter for the
> > two pid namespaces.
> >
> >>
> >> And you can rewrite pidns_get_parent to use it. So you would instead be
> >> doing:
> >>
> >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> >>         return -EPERM;
> >>
> >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> >> imagine we'll need this for all of the procfd_* APIs.)
> 
> Why is any of this even necessary? Why does the child namespace we're
> considering even have a file descriptor to its ancestor's procfs? If

Because you can send file descriptors between processes and container
runtimes tend to do that.

> it has one of these FDs, it can already *read* all sorts of
> information it really shouldn't be able to acquire, so the additional
> ability to send a signal (subject to the usual permission checks)
> feels like sticking a finger in a dike that's already well-perforated.
> IMHO, we shouldn't bother with this check. The patch would be simpler
> without it.

We will definitely not allow signaling processes in an ancestor pid
namespace! That is a security issue! I can imagine container runtimes
killing their monitoring process etc. pp. Not happening, unless someone
with deep expertise in signals can convince me otherwise.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:37             ` Christian Brauner
@ 2018-11-19 21:41               ` Daniel Colascione
  2018-11-20  4:59                 ` Eric W. Biederman
  0 siblings, 1 reply; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 21:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Aleksa Sarai, Eric W. Biederman, linux-kernel, Serge E. Hallyn,
	Jann Horn, Andy Lutomirski, Andrew Morton, Oleg Nesterov,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner <christian@brauner.io> wrote:
>
> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> > > That can be done without a loop by comparing the level counter for the
> > > two pid namespaces.
> > >
> > >>
> > >> And you can rewrite pidns_get_parent to use it. So you would instead be
> > >> doing:
> > >>
> > >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> > >>         return -EPERM;
> > >>
> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> > >> imagine we'll need this for all of the procfd_* APIs.)
> >
> > Why is any of this even necessary? Why does the child namespace we're
> > considering even have a file descriptor to its ancestor's procfs? If
>
> Because you can send file descriptors between processes and container
> runtimes tend to do that.

Right. But why *would* a container runtime send one of these procfs
FDs to a container?

> > it has one of these FDs, it can already *read* all sorts of
> > information it really shouldn't be able to acquire, so the additional
> > ability to send a signal (subject to the usual permission checks)
> > feels like sticking a finger in a dike that's already well-perforated.
> > IMHO, we shouldn't bother with this check. The patch would be simpler
> > without it.
>
> We will definitely not allow signaling processes in an ancestor pid
> namespace! That is a security issue! I can imagine container runtimes
> killing their monitoring process etc. pp. Not happening, unless someone
> with deep expertise in signals can convince me otherwise.

If parent namespace procfs FDs or mounts really can leak into child
namespaces as easily as Aleksa says, then I don't mind adding the
check. I was under the impression that if you find yourself in this
situation, you already have a big problem.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (4 preceding siblings ...)
  2018-11-19 20:28   ` Aleksa Sarai
@ 2018-11-19 22:39   ` Tycho Andersen
  2018-11-19 22:49     ` Daniel Colascione
  2018-11-22  7:48     ` Serge E. Hallyn
  2018-11-19 23:35   ` kbuild test robot
                     ` (2 subsequent siblings)
  8 siblings, 2 replies; 50+ messages in thread
From: Tycho Andersen @ 2018-11-19 22:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
>
> +/**
> + *  sys_procfd_signal - send a signal to a process through a process file
> + *                      descriptor
> + *  @fd: the file descriptor of the process
> + *  @sig: signal to be sent
> + *  @info: the signal info
> + *  @flags: future flags to be passed
> + */
> +SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
> +		int, flags)
> +{

Can I just register an objection here that I think using a syscall
just for this is silly?

My understanding is that the concern is that some code might do:

unknown_fd = recv_fd();
ioctl(unknown_fd, SOME_IOCTL, NULL); // where SOME_IOCTL == PROC_FD_KILL
// whoops, unknown_fd was a procfd and we killed a task!

In my experience when writing fd sending/receiving code, the sender and
receiver are fairly tightly coupled. Has anyone ever actually fixed a
bug where they had an fd that they lost track of what "type" it was
and screwed up like this? It seems completely theoretical to me.

The ioctl() approach has the benefit of being extensible. Adding a
syscall means that everyone has to do all the boilerplate for each new
pid op in the kernel, arches, libc, strace, etc.

Tycho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 22:39   ` Tycho Andersen
@ 2018-11-19 22:49     ` Daniel Colascione
  2018-11-19 23:07       ` Tycho Andersen
  2018-11-22  7:48     ` Serge E. Hallyn
  1 sibling, 1 reply; 50+ messages in thread
From: Daniel Colascione @ 2018-11-19 22:49 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Christian Brauner, Eric W. Biederman, linux-kernel,
	Serge E. Hallyn, Jann Horn, Andy Lutomirski, Andrew Morton,
	Oleg Nesterov, Aleksa Sarai, Al Viro, Linux FS Devel, Linux API,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 2:40 PM Tycho Andersen <tycho@tycho.ws> wrote:
> Can I just register an objection here that I think using a syscall
> just for this is silly?

Yes, you can argue that the bikeshed should be ioctl-colored and not
syscall-colored.

> My understanding is that the concern is that some code might do:
>
> unknown_fd = recv_fd();
> ioctl(unknown_fd, SOME_IOCTL, NULL); // where SOME_IOCTL == PROC_FD_KILL
> // whoops, unknown_fd was a procfd and we killed a task!
>
> In my experience when writing fd sending/receiving code, the sender and
> receiver are fairly tightly coupled. Has anyone ever actually fixed a
> bug where they had an fd that they lost track of what "type" it was
> and screwed up like this? It seems completely theoretical to me.

Yes, I have fixed bugs of this form.

> The ioctl() approach has the benefit of being extensible.

The system call table is also extensible. ioctl is for when a given
piece of functionality *can't* realistically get its own system call
because it's separated from the main kernel somehow. procfs is a core
part of the kernel, so we can and should expose interfaces to it using
system calls.

> Adding a
> syscall means that everyone has to do all the boilerplate for each new
> pid op in the kernel, arches, libc, strace, etc.

These tools also care about ioctls. Adding a system call is a pain,
but the solution is to make adding system calls less of a pain, not to
permanently make the Linux ABI worse.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 22:49     ` Daniel Colascione
@ 2018-11-19 23:07       ` Tycho Andersen
  2018-11-20  0:27         ` Andy Lutomirski
  0 siblings, 1 reply; 50+ messages in thread
From: Tycho Andersen @ 2018-11-19 23:07 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Christian Brauner, Eric W. Biederman, linux-kernel,
	Serge E. Hallyn, Jann Horn, Andy Lutomirski, Andrew Morton,
	Oleg Nesterov, Aleksa Sarai, Al Viro, Linux FS Devel, Linux API,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 02:49:22PM -0800, Daniel Colascione wrote:
> On Mon, Nov 19, 2018 at 2:40 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > Can I just register an objection here that I think using a syscall
> > just for this is silly?
> 
> Yes, you can argue that the bikeshed should be ioctl-colored and not
> syscall-colored.
> 
> > My understanding is that the concern is that some code might do:
> >
> > unknown_fd = recv_fd();
> > ioctl(unknown_fd, SOME_IOCTL, NULL); // where SOME_IOCTL == PROC_FD_KILL
> > // whoops, unknown_fd was a procfd and we killed a task!
> >
> > In my experience when writing fd sending/receiving code, the sender and
> > receiver are fairly tightly coupled. Has anyone ever actually fixed a
> > bug where they had an fd that they lost track of what "type" it was
> > and screwed up like this? It seems completely theoretical to me.
> 
> Yes, I have fixed bugs of this form.
> 
> > The ioctl() approach has the benefit of being extensible.
> 
> The system call table is also extensible.

But not infinitely so. The x32 ABI starts at 512, and right now I see
334 syscalls on x86_64. So the next 178 people can say "let's just
define a syscall", and after that? I suppose we could move to setting
BIT(10), but how much userspace assumes > 512 => compat syscall and
blocks it via seccomp or whatever?

Contrast that with the ioctl space, which is an unsigned long and
fairly sparse still (Documentation/ioctl/ioctl-number.txt).

> ioctl is for when a given piece of functionality *can't*
> realistically get its own system call because it's separated from
> the main kernel somehow. procfs is a core part of the kernel, so we
> can and should expose interfaces to it using system calls.

I suppose it's obvious, but I disagree.

> > Adding a
> > syscall means that everyone has to do all the boilerplate for each new
> > pid op in the kernel, arches, libc, strace, etc.
> 
> These tools also care about ioctls. Adding a system call is a pain,
> but the solution is to make adding system calls less of a pain, not to
> permanently make the Linux ABI worse.

For user-defined values of "worse" :)

Tycho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (5 preceding siblings ...)
  2018-11-19 22:39   ` Tycho Andersen
@ 2018-11-19 23:35   ` kbuild test robot
  2018-11-19 23:37   ` kbuild test robot
  2018-11-28 21:45   ` Joey Pabalinas
  8 siblings, 0 replies; 50+ messages in thread
From: kbuild test robot @ 2018-11-19 23:35 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kbuild-all, ebiederm, linux-kernel, serge, jannh, luto, akpm,
	oleg, cyphar, viro, linux-fsdevel, linux-api, dancol, timmurray,
	linux-man, Christian Brauner, Kees Cook


[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]

Hi Christian,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v4.20-rc3]
[cannot apply to next-20181119]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/proc-allow-signaling-processes-via-file-descriptors/20181120-063836
config: riscv-tinyconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=riscv 

All warnings (new ones prefixed by >>):

>> <stdin>:1338:2: warning: #warning syscall procfd_signal not implemented [-Wcpp]

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 4416 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (6 preceding siblings ...)
  2018-11-19 23:35   ` kbuild test robot
@ 2018-11-19 23:37   ` kbuild test robot
  2018-11-19 23:45     ` Christian Brauner
  2018-11-28 21:45   ` Joey Pabalinas
  8 siblings, 1 reply; 50+ messages in thread
From: kbuild test robot @ 2018-11-19 23:37 UTC (permalink / raw)
  To: Christian Brauner
  Cc: kbuild-all, ebiederm, linux-kernel, serge, jannh, luto, akpm,
	oleg, cyphar, viro, linux-fsdevel, linux-api, dancol, timmurray,
	linux-man, Christian Brauner, Kees Cook


[-- Attachment #1: Type: text/plain, Size: 2942 bytes --]

Hi Christian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.20-rc3]
[cannot apply to next-20181119]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/proc-allow-signaling-processes-via-file-descriptors/20181120-063836
config: riscv-tinyconfig (attached as .config)
compiler: riscv64-linux-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.1.0 make.cross ARCH=riscv 

All errors (new ones prefixed by >>):

   kernel/signal.c: In function '__do_sys_procfd_signal':
>> kernel/signal.c:3341:7: error: implicit declaration of function 'proc_is_procfd'; did you mean 'clockid_to_fd'? [-Werror=implicit-function-declaration]
     if (!proc_is_procfd(f.file))
          ^~~~~~~~~~~~~~
          clockid_to_fd
   cc1: some warnings being treated as errors

vim +3341 kernel/signal.c

  3314	
  3315	/**
  3316	 *  sys_procfd_signal - send a signal to a process through a process file
  3317	 *                      descriptor
  3318	 *  @fd: the file descriptor of the process
  3319	 *  @sig: signal to be sent
  3320	 *  @info: the signal info
  3321	 *  @flags: future flags to be passed
  3322	 */
  3323	SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
  3324			int, flags)
  3325	{
  3326		int ret;
  3327		struct pid *pid;
  3328		kernel_siginfo_t kinfo;
  3329		struct fd f;
  3330	
  3331		/* Enforce flags be set to 0 until we add an extension. */
  3332		if (flags)
  3333			return -EINVAL;
  3334	
  3335		f = fdget_raw(fd);
  3336		if (!f.file)
  3337			return -EBADF;
  3338	
  3339		ret = -EINVAL;
  3340		/* Is this a process file descriptor? */
> 3341		if (!proc_is_procfd(f.file))
  3342			goto err;
  3343	
  3344		pid = f.file->private_data;
  3345		if (!pid)
  3346			goto err;
  3347	
  3348		if (info) {
  3349			ret = __copy_siginfo_from_user(sig, &kinfo, info);
  3350			if (unlikely(ret))
  3351				goto err;
  3352			/*
  3353			 * Not even root can pretend to send signals from the kernel.
  3354			 * Nor can they impersonate a kill()/tgkill(), which adds
  3355			 * source info.
  3356			 */
  3357			ret = -EPERM;
  3358			if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
  3359			    (task_pid(current) != pid))
  3360				goto err;
  3361		} else {
  3362			prepare_kill_siginfo(sig, &kinfo);
  3363		}
  3364	
  3365		ret = kill_pid_info(sig, &kinfo, pid);
  3366	
  3367	err:
  3368		fdput(f);
  3369		return ret;
  3370	}
  3371	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 4416 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 23:37   ` kbuild test robot
@ 2018-11-19 23:45     ` Christian Brauner
  0 siblings, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-19 23:45 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, ebiederm, linux-kernel, serge, jannh, luto, akpm,
	oleg, cyphar, viro, linux-fsdevel, linux-api, dancol, timmurray,
	linux-man, Kees Cook

On Tue, Nov 20, 2018 at 07:37:58AM +0800, kbuild test robot wrote:
> Hi Christian,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v4.20-rc3]
> [cannot apply to next-20181119]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> 
> url:    https://github.com/0day-ci/linux/commits/Christian-Brauner/proc-allow-signaling-processes-via-file-descriptors/20181120-063836
> config: riscv-tinyconfig (attached as .config)
> compiler: riscv64-linux-gcc (GCC) 8.1.0
> reproduce:
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=8.1.0 make.cross ARCH=riscv 
> 
> All errors (new ones prefixed by >>):
> 
>    kernel/signal.c: In function '__do_sys_procfd_signal':
> >> kernel/signal.c:3341:7: error: implicit declaration of function 'proc_is_procfd'; did you mean 'clockid_to_fd'? [-Werror=implicit-function-declaration]
>      if (!proc_is_procfd(f.file))
>           ^~~~~~~~~~~~~~

On my radar and fixed. This happens when CONFIG_PROC_FS unset.

>           clockid_to_fd
>    cc1: some warnings being treated as errors
> 
> vim +3341 kernel/signal.c
> 
>   3314	
>   3315	/**
>   3316	 *  sys_procfd_signal - send a signal to a process through a process file
>   3317	 *                      descriptor
>   3318	 *  @fd: the file descriptor of the process
>   3319	 *  @sig: signal to be sent
>   3320	 *  @info: the signal info
>   3321	 *  @flags: future flags to be passed
>   3322	 */
>   3323	SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
>   3324			int, flags)
>   3325	{
>   3326		int ret;
>   3327		struct pid *pid;
>   3328		kernel_siginfo_t kinfo;
>   3329		struct fd f;
>   3330	
>   3331		/* Enforce flags be set to 0 until we add an extension. */
>   3332		if (flags)
>   3333			return -EINVAL;
>   3334	
>   3335		f = fdget_raw(fd);
>   3336		if (!f.file)
>   3337			return -EBADF;
>   3338	
>   3339		ret = -EINVAL;
>   3340		/* Is this a process file descriptor? */
> > 3341		if (!proc_is_procfd(f.file))
>   3342			goto err;
>   3343	
>   3344		pid = f.file->private_data;
>   3345		if (!pid)
>   3346			goto err;
>   3347	
>   3348		if (info) {
>   3349			ret = __copy_siginfo_from_user(sig, &kinfo, info);
>   3350			if (unlikely(ret))
>   3351				goto err;
>   3352			/*
>   3353			 * Not even root can pretend to send signals from the kernel.
>   3354			 * Nor can they impersonate a kill()/tgkill(), which adds
>   3355			 * source info.
>   3356			 */
>   3357			ret = -EPERM;
>   3358			if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
>   3359			    (task_pid(current) != pid))
>   3360				goto err;
>   3361		} else {
>   3362			prepare_kill_siginfo(sig, &kinfo);
>   3363		}
>   3364	
>   3365		ret = kill_pid_info(sig, &kinfo, pid);
>   3366	
>   3367	err:
>   3368		fdput(f);
>   3369		return ret;
>   3370	}
>   3371	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 23:07       ` Tycho Andersen
@ 2018-11-20  0:27         ` Andy Lutomirski
  2018-11-20  0:32           ` Christian Brauner
  2018-11-20  0:49           ` Daniel Colascione
  0 siblings, 2 replies; 50+ messages in thread
From: Andy Lutomirski @ 2018-11-20  0:27 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Daniel Colascione, Christian Brauner, Eric W. Biederman, LKML,
	Serge E. Hallyn, Jann Horn, Andrew Lutomirski, Andrew Morton,
	Oleg Nesterov, Aleksa Sarai, Al Viro, Linux FS Devel, Linux API,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 3:07 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > These tools also care about ioctls. Adding a system call is a pain,
> > but the solution is to make adding system calls less of a pain, not to
> > permanently make the Linux ABI worse.
>
> For user-defined values of "worse" :)
>

I tend to agree with Tycho here.  But I'm wondering if it might be
worth considering a better ioctl.

/me dons flame-proof hat

We could do:

long better_ioctl(int fd, u32 nr, const void *inbuf, size_t inlen,
const void *outbuf, size_t outlen);

and have a central table in the kernel listing all possible nr values
along with which driver they belong to.  We could have a sane
signature and get rid of the nr collision problem.

The major problem I see is that u32 isn't really enough to have a sane
way to allow out-of-tree drivers to use this, and that we can't
readily use anything bigger than u32 without indirection because we're
out of syscall argument space.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-20  0:27         ` Andy Lutomirski
@ 2018-11-20  0:32           ` Christian Brauner
  2018-11-20  0:34             ` Andy Lutomirski
  2018-11-20  0:49           ` Daniel Colascione
  1 sibling, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-20  0:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, Daniel Colascione, Eric W. Biederman, LKML,
	Serge E. Hallyn, Jann Horn, Andrew Morton, Oleg Nesterov,
	Aleksa Sarai, Al Viro, Linux FS Devel, Linux API, Tim Murray,
	linux-man, Kees Cook

On Mon, Nov 19, 2018 at 04:27:49PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 19, 2018 at 3:07 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > > These tools also care about ioctls. Adding a system call is a pain,
> > > but the solution is to make adding system calls less of a pain, not to
> > > permanently make the Linux ABI worse.
> >
> > For user-defined values of "worse" :)
> >
> 
> I tend to agree with Tycho here.  But I'm wondering if it might be
> worth considering a better ioctl.
> 
> /me dons flame-proof hat
> 
> We could do:
> 
> long better_ioctl(int fd, u32 nr, const void *inbuf, size_t inlen,
> const void *outbuf, size_t outlen);

I'm the writer of this patch so take this with a grain of salt.
I think it is a bad idea to stop this patch and force us to introduce a
new type of ioctl().
An ioctl() is also not easy to use for this task because we want to add
a siginfo_t argument which I actually think provides value and makes
this interface more useful.

> 
> and have a central table in the kernel listing all possible nr values
> along with which driver they belong to.  We could have a sane
> signature and get rid of the nr collision problem.
> 
> The major problem I see is that u32 isn't really enough to have a sane
> way to allow out-of-tree drivers to use this, and that we can't
> readily use anything bigger than u32 without indirection because we're
> out of syscall argument space.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-20  0:32           ` Christian Brauner
@ 2018-11-20  0:34             ` Andy Lutomirski
  0 siblings, 0 replies; 50+ messages in thread
From: Andy Lutomirski @ 2018-11-20  0:34 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Andrew Lutomirski, Tycho Andersen, Daniel Colascione,
	Eric W. Biederman, LKML, Serge E. Hallyn, Jann Horn,
	Andrew Morton, Oleg Nesterov, Aleksa Sarai, Al Viro,
	Linux FS Devel, Linux API, Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 4:33 PM Christian Brauner <christian@brauner.io> wrote:
>
> On Mon, Nov 19, 2018 at 04:27:49PM -0800, Andy Lutomirski wrote:
> > On Mon, Nov 19, 2018 at 3:07 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > > > These tools also care about ioctls. Adding a system call is a pain,
> > > > but the solution is to make adding system calls less of a pain, not to
> > > > permanently make the Linux ABI worse.
> > >
> > > For user-defined values of "worse" :)
> > >
> >
> > I tend to agree with Tycho here.  But I'm wondering if it might be
> > worth considering a better ioctl.
> >
> > /me dons flame-proof hat
> >
> > We could do:
> >
> > long better_ioctl(int fd, u32 nr, const void *inbuf, size_t inlen,
> > const void *outbuf, size_t outlen);
>
> I'm the writer of this patch so take this with a grain of salt.
> I think it is a bad idea to stop this patch and force us to introduce a
> new type of ioctl().

I agree completely.

> An ioctl() is also not easy to use for this task because we want to add
> a siginfo_t argument which I actually think provides value and makes
> this interface more useful.
>

You could always have a struct procfd_kill and pass a pointer to
*that*.  But sure, it's ugly.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-20  0:27         ` Andy Lutomirski
  2018-11-20  0:32           ` Christian Brauner
@ 2018-11-20  0:49           ` Daniel Colascione
  1 sibling, 0 replies; 50+ messages in thread
From: Daniel Colascione @ 2018-11-20  0:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tycho Andersen, Christian Brauner, Eric W. Biederman,
	linux-kernel, Serge E. Hallyn, Jann Horn, Andrew Morton,
	Oleg Nesterov, Aleksa Sarai, Al Viro, Linux FS Devel, Linux API,
	Tim Murray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 4:28 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Nov 19, 2018 at 3:07 PM Tycho Andersen <tycho@tycho.ws> wrote:
> > > These tools also care about ioctls. Adding a system call is a pain,
> > > but the solution is to make adding system calls less of a pain, not to
> > > permanently make the Linux ABI worse.
> >
> > For user-defined values of "worse" :)
> >
>
> I tend to agree with Tycho here.  But I'm wondering if it might be
> worth considering a better ioctl.
>
> /me dons flame-proof hat
>
> We could do:
>
> long better_ioctl(int fd, u32 nr, const void *inbuf, size_t inlen,
> const void *outbuf, size_t outlen);
>
> and have a central table in the kernel listing all possible nr values
> along with which driver they belong to.  We could have a sane
> signature and get rid of the nr collision problem.

The essential difference between a regular system call and an ioctl is
that in the former, the invoked kernel-side code depends on the
operation number, and in the latter, the invoked kernel-side code
depends on the operation number and file descriptor type. By creating
a new kind of collision-free ioctl, all you've done is re-invent the
system call, but with a funky calling convention and less operand
space. It makes no sense. Previous system call multiplexers --- e.g.,
socketcall --- are widely regarded as mistakes, and there's no reason
to repeat these mistakes.

System call numbers are not scarce, and your other proposal to clean
up the x86 numbering will make wiring up a new system call less
annoying. The *only* purpose of an ioctl is to solve the system call
numbering coordination problem --- if the invoked kernel-side code
depends on (DRIVER, OPERATION_NUMBER), and DRIVER can vary out-of-tree
with ioctl, ioctl lets out-of-tree code expose interfaces. For in-tree
code, this problem doesn't exist, so there's no reason to use the
awkward ioctl workaround!

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:41               ` Daniel Colascione
@ 2018-11-20  4:59                 ` Eric W. Biederman
  2018-11-20 10:31                   ` Christian Brauner
  0 siblings, 1 reply; 50+ messages in thread
From: Eric W. Biederman @ 2018-11-20  4:59 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Christian Brauner, Aleksa Sarai, linux-kernel, Serge E. Hallyn,
	Jann Horn, Andy Lutomirski, Andrew Morton, Oleg Nesterov,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

Daniel Colascione <dancol@google.com> writes:

> On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner <christian@brauner.io> wrote:
>>
>> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
>> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
>> > > That can be done without a loop by comparing the level counter for the
>> > > two pid namespaces.
>> > >
>> > >>
>> > >> And you can rewrite pidns_get_parent to use it. So you would instead be
>> > >> doing:
>> > >>
>> > >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
>> > >>         return -EPERM;
>> > >>
>> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I
>> > >> imagine we'll need this for all of the procfd_* APIs.)
>> >
>> > Why is any of this even necessary? Why does the child namespace we're
>> > considering even have a file descriptor to its ancestor's procfs? If
>>
>> Because you can send file descriptors between processes and container
>> runtimes tend to do that.
>
> Right. But why *would* a container runtime send one of these procfs
> FDs to a container?
>
>> > it has one of these FDs, it can already *read* all sorts of
>> > information it really shouldn't be able to acquire, so the additional
>> > ability to send a signal (subject to the usual permission checks)
>> > feels like sticking a finger in a dike that's already well-perforated.
>> > IMHO, we shouldn't bother with this check. The patch would be simpler
>> > without it.
>>
>> We will definitely not allow signaling processes in an ancestor pid
>> namespace! That is a security issue! I can imagine container runtimes
>> killing their monitoring process etc. pp. Not happening, unless someone
>> with deep expertise in signals can convince me otherwise.
>
> If parent namespace procfs FDs or mounts really can leak into child
> namespaces as easily as Aleksa says, then I don't mind adding the
> check. I was under the impression that if you find yourself in this
> situation, you already have a big problem.

There is one big reason to have the check, and I have not seen it
mentioned yet in this thread.

When SI_USER is set we report the pid of the sender of the signal in
si_pid.  When the signal comes from the kernel si_pid == 0.  When signal
is sent from an ancestor pid namespace si_pid also equals 0 (which is
reasonable).

A signal out to a process in a parent pid namespace such as SIGCHLD is
reasonable as we can map the pid.  I really don't see the point of
forbidding that.  From the perspective of the process in the parent pid
namespace it is just another process in it's pid namespace.  So it
should pose no problem from the perspective of the receiving process.

A signal to a process in a pid namespace that is neither a parent nor a
descendent pid namespace would be a problem, as there is no well defined
notion of what si_pid should be set to.  So for that case perhaps we
should have something like a noprocess pid that we can set.  Perhaps we
could set si_pid to 0xffffffff.  That would take a small extension to
pid_nr_ns.

File descriptors are not namespaced.  It is completely legitimate to use
file descriptors to get around limitations of namespaces.

Adding limitations to a file descriptor based api because someone else
can't set up their processes in such a way as to get the restrictions
they are looking for seems very sad.

Frankly I think it is one of the better features of namespaces that we
have to carefully handle and define these cases so that when the
inevitable leaks happen you are not immediately in a world of hurt.  All
of the other permission checks etc continue to do their job.  Plus you
are prepared for the case when someone wants their containers to have an
interesting communication primitive.

Eric





^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-20  4:59                 ` Eric W. Biederman
@ 2018-11-20 10:31                   ` Christian Brauner
  2018-11-21 21:39                     ` Serge E. Hallyn
  0 siblings, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-20 10:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Daniel Colascione, Aleksa Sarai, linux-kernel, Serge E. Hallyn,
	Jann Horn, Andy Lutomirski, Andrew Morton, Oleg Nesterov,
	Al Viro, Linux FS Devel, Linux API, Tim Murray, linux-man,
	Kees Cook

On Mon, Nov 19, 2018 at 10:59:12PM -0600, Eric W. Biederman wrote:
> Daniel Colascione <dancol@google.com> writes:
> 
> > On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner <christian@brauner.io> wrote:
> >>
> >> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
> >> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> >> > > That can be done without a loop by comparing the level counter for the
> >> > > two pid namespaces.
> >> > >
> >> > >>
> >> > >> And you can rewrite pidns_get_parent to use it. So you would instead be
> >> > >> doing:
> >> > >>
> >> > >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> >> > >>         return -EPERM;
> >> > >>
> >> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> >> > >> imagine we'll need this for all of the procfd_* APIs.)
> >> >
> >> > Why is any of this even necessary? Why does the child namespace we're
> >> > considering even have a file descriptor to its ancestor's procfs? If
> >>
> >> Because you can send file descriptors between processes and container
> >> runtimes tend to do that.
> >
> > Right. But why *would* a container runtime send one of these procfs
> > FDs to a container?
> >
> >> > it has one of these FDs, it can already *read* all sorts of
> >> > information it really shouldn't be able to acquire, so the additional
> >> > ability to send a signal (subject to the usual permission checks)
> >> > feels like sticking a finger in a dike that's already well-perforated.
> >> > IMHO, we shouldn't bother with this check. The patch would be simpler
> >> > without it.
> >>
> >> We will definitely not allow signaling processes in an ancestor pid
> >> namespace! That is a security issue! I can imagine container runtimes
> >> killing their monitoring process etc. pp. Not happening, unless someone
> >> with deep expertise in signals can convince me otherwise.
> >
> > If parent namespace procfs FDs or mounts really can leak into child
> > namespaces as easily as Aleksa says, then I don't mind adding the
> > check. I was under the impression that if you find yourself in this
> > situation, you already have a big problem.
> 
> There is one big reason to have the check, and I have not seen it
> mentioned yet in this thread.
> 
> When SI_USER is set we report the pid of the sender of the signal in
> si_pid.  When the signal comes from the kernel si_pid == 0.  When signal
> is sent from an ancestor pid namespace si_pid also equals 0 (which is
> reasonable).
> 
> A signal out to a process in a parent pid namespace such as SIGCHLD is
> reasonable as we can map the pid.  I really don't see the point of
> forbidding that.  From the perspective of the process in the parent pid
> namespace it is just another process in it's pid namespace.  So it
> should pose no problem from the perspective of the receiving process.
> 
> A signal to a process in a pid namespace that is neither a parent nor a
> descendent pid namespace would be a problem, as there is no well defined
> notion of what si_pid should be set to.  So for that case perhaps we
> should have something like a noprocess pid that we can set.  Perhaps we
> could set si_pid to 0xffffffff.  That would take a small extension to
> pid_nr_ns.
> 
> File descriptors are not namespaced.  It is completely legitimate to use
> file descriptors to get around limitations of namespaces.

Frankly, I don't see a good argument for why we would allow that even if
safe. I have not heard a legitimate use-case or need for this.
At this point I care about very simple semantics. Being able to signal
into ancestor pid namespaces and cousin namespaces is interesting but
makes the syscall more brittle and harder to understand.
Changing pid_nr_ns() might be the solution but this function is called
all over the place in the kernel and I'm not going to risk breaking
something by changing it for a feature that no one so far has ever
asked for.
If you are ok with this then we should hold off on this. We can always
add this feature later by removing the check when someone has a use-case
for it.
I'll send a v2 of the patch that keeps the restriction for now. If you
insist on it being removed we can make the change in a follow-up
iteration.

Christian

> 
> Adding limitations to a file descriptor based api because someone else
> can't set up their processes in such a way as to get the restrictions
> they are looking for seems very sad.
> 
> Frankly I think it is one of the better features of namespaces that we
> have to carefully handle and define these cases so that when the
> inevitable leaks happen you are not immediately in a world of hurt.  All
> of the other permission checks etc continue to do their job.  Plus you
> are prepared for the case when someone wants their containers to have an
> interesting communication primitive.
> 
> Eric
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] procfd_signal.2: document procfd_signal syscall
  2018-11-19 10:32 ` [PATCH] procfd_signal.2: document procfd_signal syscall Christian Brauner
@ 2018-11-20 13:29   ` Michael Kerrisk (man-pages)
  2018-11-28 20:59   ` Florian Weimer
  1 sibling, 0 replies; 50+ messages in thread
From: Michael Kerrisk (man-pages) @ 2018-11-20 13:29 UTC (permalink / raw)
  To: Christian Brauner, ebiederm, linux-kernel
  Cc: mtk.manpages, serge, jannh, luto, akpm, oleg, cyphar, viro,
	linux-fsdevel, linux-api, dancol, timmurray, linux-man

Hello Christian,

On 11/19/18 11:32 AM, Christian Brauner wrote:
> Signed-off-by: Christian Brauner <christian@brauner.io>
> ---
> Changelog:
> v1:
> - patch introduced
> ---
>  man2/procfd_signal.2 | 147 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 man2/procfd_signal.2
> 
> diff --git a/man2/procfd_signal.2 b/man2/procfd_signal.2
> new file mode 100644
> index 000000000..6af0b74f4
> --- /dev/null
> +++ b/man2/procfd_signal.2
> @@ -0,0 +1,147 @@
> +.\" Copyright (C) 2018 Christian Brauner <christian@brauner.io>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein.  The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH PROCFD_SIGNAL 2 2017-09-15 "Linux" "Linux Programmer's Manual"

Bad timestamp :-)

> +.SH NAME
> +procfd_signal \- send signal to a process through a process descriptor

s/through/via/

> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/types.h>
> +.B #include <signal.h>
> +.PP
> +.BI "int procfd_signal(int " fd ", int " sig ", siginfo_t *" info ", int " flags );
> +.fi
> +.SH DESCRIPTION
> +.BR procfd_signal ()
> +sends the signal specified in
> +.I sig
> +to the process identified by the file descriptor
> +.IR fd .

Here, I think we need some words about how one obtains that 
file descriptor.

> +The permissions required to send a signal are the same as for
> +.BR kill (2).
> +As with
> +.BR kill (2),
> +the null signal (0) can be used to check if a process with a given
> +PID exists.

But there is no PID mentioned on this page?

I suppose:

As with
.BR kill (2),
the null signal (0) can be used to check if the process referred to by
.I fd
exists.

?

> +.PP
> +The optional
> +.I info
> +argument specifies the data to accompany the signal.
> +This argument is a pointer to a structure of type
> +.IR siginfo_t ,
> +described in
> +.BR sigaction (2)
> +(and defined by including
> +.IR <sigaction.h> ).
> +The caller should set the following fields in this structure:
> +.TP
> +.I si_code
> +This must be one of the
> +.B SI_*
> +codes in the Linux kernel source file
> +.IR include/asm-generic/siginfo.h ,
> +with the restriction that the code must be negative
> +(i.e., cannot be
> +.BR SI_USER ,
> +which is used by the kernel to indicate a signal sent by
> +.BR kill (2))
> +and cannot (since Linux 2.6.39) be
> +.BR SI_TKILL
> +(which is used by the kernel to indicate a signal sent using
> +.\" tkill(2) or
> +.BR tgkill (2)).
> +.TP
> +.I si_pid
> +This should be set to a process ID,
> +typically the process ID of the sender.
> +.TP
> +.I si_uid
> +This should be set to a user ID,
> +typically the real user ID of the sender.
> +.TP
> +.I si_value
> +This field contains the user data to accompany the signal.
> +For more information, see the description of the last
> +.RI ( "union sigval" )
> +argument of
> +.BR sigqueue (3).

With sigqueue(3), one sends only a signal plus accompanying
data. It is of course based on the lower level rt_sigqueueinfo(2).
The man page for that system call says:

    These system calls are not intended for  direct  application  use;
    they  are  provided to allow the implementation of sigqueue(3) and
    pthread_sigqueue(3).

Is procfd_signal() intended to be the API directly used from
user space? If it is, then I think there should be some
explanation of why there is a 'siginfo_t' argument (vis-à-vis
sigqueue(3), which makes do with just union sigval).
If procfd_signal() is not intended to be the API used directly
from user space, then I think there needs to be a paragraph
similar to the one in the rt_sigqueueinfo(2) page queoted above.

> +.PP
> +Internally, the kernel sets the
> +.I si_signo
> +field to the value specified in
> +.IR sig ,
> +so that the receiver of the signal can also obtain
> +the signal number via that field.
> +.PP
> +The
> +.I flags
> +argument is reserved for future extension and must be set to 0.
> +.PP
> +.SH RETURN VALUE
> +On success, this system call returns 0.
> +On error, it returns \-1 and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EBADF
> +.I fd
> +is not a valid file descriptor.
> +.TP
> +.B EINVAL
> +An invalid signal was specified.
> +.TP
> +.B EINVAL
> +.I fd
> +does not refer to a process.
> +.TP
> +.B EINVAL
> +The flags argument was not 0.
> +.TP
> +.B EPERM
> +The caller does not have permission to send the signal to the target.
> +For the required permissions, see
> +.BR kill (2).
> +Or:
> +.I uinfo->si_code
> +is invalid.
> +.TP
> +.B ESRCH
> +The process or process group does not exist.

"or process group"? I suspect a cut and paste error here :-)

The connection between the preceding sentence and the one
that follows it is not quite clear:

> +Note that an existing process might be a zombie,
> +a process that has terminated execution, but
> +has not yet been
> +.BR wait (2)ed
> +for.

Is it the case that using procfd_signal() with a file
descriptor that refers to a zombie will yield EINVAL?
If yes, this could be made clearer with the following:

.B ESRCH
The specified process no longer exists or is a process in the
zombie state (a process that has terminated execution, but
has not yet been
BR wait (2)ed
for).

> +.SH CONFORMING TO
> +This system call is Linux-specific.
> +.SH SEE ALSO
> +.BR kill (2),
> +.BR sigaction (2),
> +.BR sigprocmask (2),
> +.BR tgkill (2),
> +.BR pthread_sigqueue (3),
> +.BR rt_sigqueueinfo (2),
> +.BR sigqueue (3),
> +.BR signal (7)

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-20 10:31                   ` Christian Brauner
@ 2018-11-21 21:39                     ` Serge E. Hallyn
  0 siblings, 0 replies; 50+ messages in thread
From: Serge E. Hallyn @ 2018-11-21 21:39 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, Daniel Colascione, Aleksa Sarai, linux-kernel,
	Serge E. Hallyn, Jann Horn, Andy Lutomirski, Andrew Morton,
	Oleg Nesterov, Al Viro, Linux FS Devel, Linux API, Tim Murray,
	linux-man, Kees Cook

On Tue, Nov 20, 2018 at 11:31:13AM +0100, Christian Brauner wrote:
> On Mon, Nov 19, 2018 at 10:59:12PM -0600, Eric W. Biederman wrote:
> > Daniel Colascione <dancol@google.com> writes:
> > 
> > > On Mon, Nov 19, 2018 at 1:37 PM Christian Brauner <christian@brauner.io> wrote:
> > >>
> > >> On Mon, Nov 19, 2018 at 01:26:22PM -0800, Daniel Colascione wrote:
> > >> > On Mon, Nov 19, 2018 at 1:21 PM, Christian Brauner <christian@brauner.io> wrote:
> > >> > > That can be done without a loop by comparing the level counter for the
> > >> > > two pid namespaces.
> > >> > >
> > >> > >>
> > >> > >> And you can rewrite pidns_get_parent to use it. So you would instead be
> > >> > >> doing:
> > >> > >>
> > >> > >>     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> > >> > >>         return -EPERM;
> > >> > >>
> > >> > >> (Or you can just copy the 5-line loop into procfd_signal -- though I
> > >> > >> imagine we'll need this for all of the procfd_* APIs.)
> > >> >
> > >> > Why is any of this even necessary? Why does the child namespace we're
> > >> > considering even have a file descriptor to its ancestor's procfs? If
> > >>
> > >> Because you can send file descriptors between processes and container
> > >> runtimes tend to do that.
> > >
> > > Right. But why *would* a container runtime send one of these procfs
> > > FDs to a container?
> > >
> > >> > it has one of these FDs, it can already *read* all sorts of
> > >> > information it really shouldn't be able to acquire, so the additional
> > >> > ability to send a signal (subject to the usual permission checks)
> > >> > feels like sticking a finger in a dike that's already well-perforated.
> > >> > IMHO, we shouldn't bother with this check. The patch would be simpler
> > >> > without it.
> > >>
> > >> We will definitely not allow signaling processes in an ancestor pid
> > >> namespace! That is a security issue! I can imagine container runtimes
> > >> killing their monitoring process etc. pp. Not happening, unless someone
> > >> with deep expertise in signals can convince me otherwise.
> > >
> > > If parent namespace procfs FDs or mounts really can leak into child
> > > namespaces as easily as Aleksa says, then I don't mind adding the
> > > check. I was under the impression that if you find yourself in this
> > > situation, you already have a big problem.
> > 
> > There is one big reason to have the check, and I have not seen it
> > mentioned yet in this thread.
> > 
> > When SI_USER is set we report the pid of the sender of the signal in
> > si_pid.  When the signal comes from the kernel si_pid == 0.  When signal
> > is sent from an ancestor pid namespace si_pid also equals 0 (which is
> > reasonable).
> > 
> > A signal out to a process in a parent pid namespace such as SIGCHLD is
> > reasonable as we can map the pid.  I really don't see the point of
> > forbidding that.  From the perspective of the process in the parent pid
> > namespace it is just another process in it's pid namespace.  So it
> > should pose no problem from the perspective of the receiving process.
> > 
> > A signal to a process in a pid namespace that is neither a parent nor a
> > descendent pid namespace would be a problem, as there is no well defined
> > notion of what si_pid should be set to.  So for that case perhaps we
> > should have something like a noprocess pid that we can set.  Perhaps we
> > could set si_pid to 0xffffffff.  That would take a small extension to
> > pid_nr_ns.
> > 
> > File descriptors are not namespaced.  It is completely legitimate to use
> > file descriptors to get around limitations of namespaces.
> 
> Frankly, I don't see a good argument for why we would allow that even if
> safe. I have not heard a legitimate use-case or need for this.
> At this point I care about very simple semantics. Being able to signal
> into ancestor pid namespaces and cousin namespaces is interesting but
> makes the syscall more brittle and harder to understand.

Yeah, I'm with you on that.  We can always open that door later if a good
use case comes up, but I prefer simple at first.

> Changing pid_nr_ns() might be the solution but this function is called
> all over the place in the kernel and I'm not going to risk breaking
> something by changing it for a feature that no one so far has ever
> asked for.
> If you are ok with this then we should hold off on this. We can always
> add this feature later by removing the check when someone has a use-case
> for it.
> I'll send a v2 of the patch that keeps the restriction for now. If you
> insist on it being removed we can make the change in a follow-up
> iteration.
> 
> Christian
> 
> > 
> > Adding limitations to a file descriptor based api because someone else
> > can't set up their processes in such a way as to get the restrictions
> > they are looking for seems very sad.
> > 
> > Frankly I think it is one of the better features of namespaces that we
> > have to carefully handle and define these cases so that when the
> > inevitable leaks happen you are not immediately in a world of hurt.  All
> > of the other permission checks etc continue to do their job.  Plus you
> > are prepared for the case when someone wants their containers to have an
> > interesting communication primitive.
> > 
> > Eric
> > 
> > 
> > 
> > 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 21:23         ` Aleksa Sarai
@ 2018-11-22  7:41           ` Serge E. Hallyn
  0 siblings, 0 replies; 50+ messages in thread
From: Serge E. Hallyn @ 2018-11-22  7:41 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: Christian Brauner, ebiederm, linux-kernel, serge, jannh, luto,
	akpm, oleg, viro, linux-fsdevel, linux-api, dancol, timmurray,
	linux-man, Kees Cook

On Tue, Nov 20, 2018 at 08:23:43AM +1100, Aleksa Sarai wrote:
> On 2018-11-20, Aleksa Sarai <cyphar@cyphar.com> wrote:
> > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > On Tue, Nov 20, 2018 at 07:28:57AM +1100, Aleksa Sarai wrote:
> > > > On 2018-11-19, Christian Brauner <christian@brauner.io> wrote:
> > > > > +	if (info) {
> > > > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > > > +		if (unlikely(ret))
> > > > > +			goto err;
> > > > > +		/*
> > > > > +		 * Not even root can pretend to send signals from the kernel.
> > > > > +		 * Nor can they impersonate a kill()/tgkill(), which adds
> > > > > +		 * source info.
> > > > > +		 */
> > > > > +		ret = -EPERM;
> > > > > +		if ((kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL) &&
> > > > > +		    (task_pid(current) != pid))
> > > > > +			goto err;
> > > > > +	} else {
> > > > > +		prepare_kill_siginfo(sig, &kinfo);
> > > > > +	}
> > > > 
> > > > I wonder whether we should also have a pidns restriction here, since
> > > > currently it isn't possible for a container process using a pidns to
> > > > signal processes outside its pidns. AFAICS, this isn't done through an
> > > > explicit check -- it's a side-effect of processes in a pidns not being
> > > > able to address non-descendant-pidns processes.
> > > > 
> > > > But maybe it's reasonable to allow sending a procfd to a different pidns
> > > > and the same operations working on it? If we extend the procfd API to
> > > 
> > > No, I don't think so. I really don't want any fancy semantics in here.
> > > Fancy doesn't get merged and fancy is hard to maintain. So we should do
> > > something like:
> > > 
> > > if (proc_pid_ns() != current_pid_ns)
> > > 	return EINVAL
> > 
> > This isn't quite sufficient. The key thing is that you have to be in an
> > *ancestor* (or same) pidns, not the *same* pidns. Ideally you can re-use
> > the check already in pidns_get_parent, and expose it. It would be
> > something as trivial as:
> > 
> > bool pidns_is_descendant(struct pid_namespace *ns,
> >                          struct pid_namespace *ancestor)
> > {
> >     for (;;) {
> >         if (!ns)
> >             return false;
> >         if (ns == ancestor)
> >             break;
> >         ns = ns->parent;
> >     }
> >     return true;
> > }
> > 
> > And you can rewrite pidns_get_parent to use it. So you would instead be
> > doing:
> > 
> >     if (pidns_is_descendant(proc_pid_ns, task_active_pid_ns(current)))
> >         return -EPERM;
> 
> Scratch the last bit, -EPERM is wrong here. I would argue that -EINVAL
> is *somewhat* wrong because arguable the more semantically consistent
> error (with kill(2)) would be -ESRCH -- but then you're mixing the "pid
> is dead" and "pid is not visible to you" cases. I'm not sure what the
> right errno would be here (I'm sure some of the LKML greybeards will
> have a better clue.) :P

Actually I like EXDEV for this.  ERMOTE also works.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 22:39   ` Tycho Andersen
  2018-11-19 22:49     ` Daniel Colascione
@ 2018-11-22  7:48     ` Serge E. Hallyn
  1 sibling, 0 replies; 50+ messages in thread
From: Serge E. Hallyn @ 2018-11-22  7:48 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Christian Brauner, ebiederm, linux-kernel, serge, jannh, luto,
	akpm, oleg, cyphar, viro, linux-fsdevel, linux-api, dancol,
	timmurray, linux-man, Kees Cook

On Mon, Nov 19, 2018 at 03:39:54PM -0700, Tycho Andersen wrote:
> On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> >
> > +/**
> > + *  sys_procfd_signal - send a signal to a process through a process file
> > + *                      descriptor
> > + *  @fd: the file descriptor of the process
> > + *  @sig: signal to be sent
> > + *  @info: the signal info
> > + *  @flags: future flags to be passed
> > + */
> > +SYSCALL_DEFINE4(procfd_signal, int, fd, int, sig, siginfo_t __user *, info,
> > +		int, flags)
> > +{
> 
> Can I just register an objection here that I think using a syscall
> just for this is silly?
> 
> My understanding is that the concern is that some code might do:
> 
> unknown_fd = recv_fd();
> ioctl(unknown_fd, SOME_IOCTL, NULL); // where SOME_IOCTL == PROC_FD_KILL
> // whoops, unknown_fd was a procfd and we killed a task!

This could just be my own mental model, but for something like "kill a
task", an ioctl just seems wrong.  Syscall seems more natural.

I'd ack either method.

-serge

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] procfd_signal.2: document procfd_signal syscall
  2018-11-19 10:32 ` [PATCH] procfd_signal.2: document procfd_signal syscall Christian Brauner
  2018-11-20 13:29   ` Michael Kerrisk (man-pages)
@ 2018-11-28 20:59   ` Florian Weimer
  2018-11-28 21:12     ` Christian Brauner
  1 sibling, 1 reply; 50+ messages in thread
From: Florian Weimer @ 2018-11-28 20:59 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man

* Christian Brauner:

> +.\" Copyright (C) 2018 Christian Brauner <christian@brauner.io>

The text seems to be largely derived from rt_sigqueueinfo, so I'm not
sure if this appropriate here.

> +the null signal (0) can be used to check if a process with a given
> +PID exists.

What does this mean if hte process is identified by file descriptor?

> +.PP
> +The optional
> +.I info
> +argument specifies the data to accompany the signal.
> +This argument is a pointer to a structure of type
> +.IR siginfo_t ,
> +described in
> +.BR sigaction (2)
> +(and defined by including
> +.IR <sigaction.h> ).
> +The caller should set the following fields in this structure:
> +.TP
> +.I si_code
> +This must be one of the
> +.B SI_*
> +codes in the Linux kernel source file
> +.IR include/asm-generic/siginfo.h ,
> +with the restriction that the code must be negative
> +(i.e., cannot be
> +.BR SI_USER ,
> +which is used by the kernel to indicate a signal sent by
> +.BR kill (2))
> +and cannot (since Linux 2.6.39) be

Obsolete reference in this context.

> +.TP
> +.B ESRCH
> +The process or process group does not exist.
> +Note that an existing process might be a zombie,
> +a process that has terminated execution, but
> +has not yet been
> +.BR wait (2)ed
> +for.

Again: What does this mean if the process identified by a descriptor?
Does a process in zombie state exist in this sense or not?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] procfd_signal.2: document procfd_signal syscall
  2018-11-28 20:59   ` Florian Weimer
@ 2018-11-28 21:12     ` Christian Brauner
  0 siblings, 0 replies; 50+ messages in thread
From: Christian Brauner @ 2018-11-28 21:12 UTC (permalink / raw)
  To: Florian Weimer
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man

On November 29, 2018 9:59:52 AM GMT+13:00, Florian Weimer <fweimer@redhat.com> wrote:
>* Christian Brauner:
>
>> +.\" Copyright (C) 2018 Christian Brauner <christian@brauner.io>
>
>The text seems to be largely derived from rt_sigqueueinfo, so I'm not
>sure if this appropriate here.
>
>> +the null signal (0) can be used to check if a process with a given
>> +PID exists.
>
>What does this mean if hte process is identified by file descriptor?
>
>> +.PP
>> +The optional
>> +.I info
>> +argument specifies the data to accompany the signal.
>> +This argument is a pointer to a structure of type
>> +.IR siginfo_t ,
>> +described in
>> +.BR sigaction (2)
>> +(and defined by including
>> +.IR <sigaction.h> ).
>> +The caller should set the following fields in this structure:
>> +.TP
>> +.I si_code
>> +This must be one of the
>> +.B SI_*
>> +codes in the Linux kernel source file
>> +.IR include/asm-generic/siginfo.h ,
>> +with the restriction that the code must be negative
>> +(i.e., cannot be
>> +.BR SI_USER ,
>> +which is used by the kernel to indicate a signal sent by
>> +.BR kill (2))
>> +and cannot (since Linux 2.6.39) be
>
>Obsolete reference in this context.
>
>> +.TP
>> +.B ESRCH
>> +The process or process group does not exist.
>> +Note that an existing process might be a zombie,
>> +a process that has terminated execution, but
>> +has not yet been
>> +.BR wait (2)ed
>> +for.
>
>Again: What does this mean if the process identified by a descriptor?
>Does a process in zombie state exist in this sense or not?
>
>Thanks,
>Florian

Updating the document is on my Todo.
Florian, can you take a look at the actual patch too, please?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
                     ` (7 preceding siblings ...)
  2018-11-19 23:37   ` kbuild test robot
@ 2018-11-28 21:45   ` Joey Pabalinas
  2018-11-28 22:05     ` Christian Brauner
  8 siblings, 1 reply; 50+ messages in thread
From: Joey Pabalinas @ 2018-11-28 21:45 UTC (permalink / raw)
  To: Christian Brauner
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook, Joey Pabalinas


[-- Attachment #1: Type: text/plain, Size: 510 bytes --]

On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> +	if (info) {
> +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> +		if (unlikely(ret))
> +			goto err;

What's the reason you don't propagate up the errors from __copy_siginfo_from_user()?
Granted, I admit that -E2BIG is kind of weird to return, but -EFAULT seems like a
fairly sane error.

Or is there some reason it's more useful to just return -EINVAL for all of the
failure cases here?

-- 
Cheers,
Joey Pabalinas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-28 21:45   ` Joey Pabalinas
@ 2018-11-28 22:05     ` Christian Brauner
  2018-11-28 23:02       ` Joey Pabalinas
  0 siblings, 1 reply; 50+ messages in thread
From: Christian Brauner @ 2018-11-28 22:05 UTC (permalink / raw)
  To: Joey Pabalinas
  Cc: ebiederm, linux-kernel, serge, jannh, luto, akpm, oleg, cyphar,
	viro, linux-fsdevel, linux-api, dancol, timmurray, linux-man,
	Kees Cook

On Wed, Nov 28, 2018 at 11:45:34AM -1000, Joey Pabalinas wrote:
> On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> > +	if (info) {
> > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > +		if (unlikely(ret))
> > +			goto err;
> 

I think you're misreading here. It jumps to:

err:
        fdput(f);
	return ret;

and does propagate the error. This is also an old iteration of the patch.

Christian

> What's the reason you don't propagate up the errors from __copy_siginfo_from_user()?
> Granted, I admit that -E2BIG is kind of weird to return, but -EFAULT seems like a
> fairly sane error.
> 
> Or is there some reason it's more useful to just return -EINVAL for all of the
> failure cases here?
> 
> -- 
> Cheers,
> Joey Pabalinas



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v1 2/2] signal: add procfd_signal() syscall
  2018-11-28 22:05     ` Christian Brauner
@ 2018-11-28 23:02       ` Joey Pabalinas
  0 siblings, 0 replies; 50+ messages in thread
From: Joey Pabalinas @ 2018-11-28 23:02 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Joey Pabalinas, ebiederm, linux-kernel, serge, jannh, luto, akpm,
	oleg, cyphar, viro, linux-fsdevel, linux-api, dancol, timmurray,
	linux-man, Kees Cook


[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

On Wed, Nov 28, 2018 at 11:05:49PM +0100, Christian Brauner wrote:
> On Wed, Nov 28, 2018 at 11:45:34AM -1000, Joey Pabalinas wrote:
> > On Mon, Nov 19, 2018 at 11:32:39AM +0100, Christian Brauner wrote:
> > > +	if (info) {
> > > +		ret = __copy_siginfo_from_user(sig, &kinfo, info);
> > > +		if (unlikely(ret))
> > > +			goto err;
> > 
> 
> I think you're misreading here. It jumps to:
> 
> err:
>         fdput(f);
> 	return ret;
> 
> and does propagate the error. This is also an old iteration of the patch.

Whoops, that I am. Sorry about that.

-- 
Cheers,
Joey Pabalinas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, back to index

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19 10:32 [PATCH v1 0/2] proc: allow signaling processes via file descriptors Christian Brauner
2018-11-19 10:32 ` [PATCH v1 1/2] proc: get process file descriptor from /proc/<pid> Christian Brauner
2018-11-19 15:32   ` Andy Lutomirski
2018-11-19 18:20     ` Christian Brauner
2018-11-19 10:32 ` [PATCH v1 2/2] signal: add procfd_signal() syscall Christian Brauner
2018-11-19 15:45   ` Andy Lutomirski
2018-11-19 15:57     ` Daniel Colascione
2018-11-19 18:39     ` Christian Brauner
2018-11-19 15:59   ` Daniel Colascione
2018-11-19 18:29     ` Christian Brauner
2018-11-19 19:02       ` Eric W. Biederman
2018-11-19 19:31         ` Christian Brauner
2018-11-19 19:39           ` Daniel Colascione
2018-11-19 17:10   ` Eugene Syromiatnikov
2018-11-19 18:23     ` Christian Brauner
2018-11-19 17:14   ` Eugene Syromiatnikov
2018-11-19 20:28   ` Aleksa Sarai
2018-11-19 20:55     ` Christian Brauner
2018-11-19 21:13       ` Christian Brauner
2018-11-19 21:18       ` Aleksa Sarai
2018-11-19 21:20         ` Christian Brauner
2018-11-19 21:21         ` Christian Brauner
2018-11-19 21:25           ` Aleksa Sarai
2018-11-19 21:26           ` Daniel Colascione
2018-11-19 21:36             ` Aleksa Sarai
2018-11-19 21:37             ` Christian Brauner
2018-11-19 21:41               ` Daniel Colascione
2018-11-20  4:59                 ` Eric W. Biederman
2018-11-20 10:31                   ` Christian Brauner
2018-11-21 21:39                     ` Serge E. Hallyn
2018-11-19 21:23         ` Aleksa Sarai
2018-11-22  7:41           ` Serge E. Hallyn
2018-11-19 22:39   ` Tycho Andersen
2018-11-19 22:49     ` Daniel Colascione
2018-11-19 23:07       ` Tycho Andersen
2018-11-20  0:27         ` Andy Lutomirski
2018-11-20  0:32           ` Christian Brauner
2018-11-20  0:34             ` Andy Lutomirski
2018-11-20  0:49           ` Daniel Colascione
2018-11-22  7:48     ` Serge E. Hallyn
2018-11-19 23:35   ` kbuild test robot
2018-11-19 23:37   ` kbuild test robot
2018-11-19 23:45     ` Christian Brauner
2018-11-28 21:45   ` Joey Pabalinas
2018-11-28 22:05     ` Christian Brauner
2018-11-28 23:02       ` Joey Pabalinas
2018-11-19 10:32 ` [PATCH] procfd_signal.2: document procfd_signal syscall Christian Brauner
2018-11-20 13:29   ` Michael Kerrisk (man-pages)
2018-11-28 20:59   ` Florian Weimer
2018-11-28 21:12     ` Christian Brauner

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git